# Analysis for Continuous Improvement

Author Name: Ayah Haviland

9-digit PID: 730236204

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template: "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. The course should use data from experiments beyond computer science topics because a large majority of students are not majoring in computer science; this should increase students' understanding and interest and decrease the chance of the course being too difficult for them.
2. Attendance should be made mandatory by instructional staff to support student success.
3. Post-lesson questions should continue to be assigned by instructional staff because they help students prepare for quizzes.
4. The instructional staff and the college should ease the rigor and requirements of the course because not enough students understand the material.
5. Because students who attend tutoring find it effective, instructional staff should offer it more frequently and flexibly.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: Because students who attend tutoring find it effective, instructional staff should offer it more frequently and flexibly.

2. Suggestion for how to collect data to support this idea in the future: After each tutoring session, have students take a short survey rating how effectively tutoring helped them understand the material, similar to the Course Care survey that asks students how effective the person helping them at office hours was. You could also follow up with the same students after quizzes and survey them to see whether tutoring helped their studying and success.

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: The course should use data from experiments beyond computer science topics because a large majority of students are not majoring in computer science; this should increase students' understanding and interest and decrease the chance of the course being too difficult.

2. This idea is more valuable than the others brainstormed because: From the data I have, I can examine the relationship between a person's major, whether they find the course intellectually interesting, their perception of difficulty, and their understanding of course material. It is important that the course is intellectually interesting for as many students as possible. That requires a high level of understanding, which in turn requires that the material and assignments not be extremely difficult.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [3]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [4]:
# TODO: You complete the code blocks from here forward!



Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

### Code to access and read the CSV survey data:

In [5]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

from data_utils import read_csv_rows
data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)

print(f"Data File Read: {SURVEY_DATA_CSV_FILE_PATH}")
print(f"{len(data_rows)} rows")
print(f"{len(data_rows[0].keys())} columns")
print(f"Columns names: {data_rows[0].keys()}")




Data File Read: ../../data/survey.csv
620 rows
35 columns
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])


### Code to access all values in the `primary_major` column and print the first 5 entries:

In [6]:
from data_utils import column_values

primary_major: list[str] = column_values(data_rows, "primary_major")

print(f"Column 'primary_major' has {len(primary_major)} values.")
print("The first five values are:")
for i in range(5):
    print(primary_major[i])


Column 'primary_major' has 620 values.
The first five values are:
Mathematics
Mathematics
Computer Science
Computer Science
Computer Science


### Code to access all values in the `difficulty` column and print the first 5 entries:

In [14]:
difficulty: list[str] = column_values(data_rows, "difficulty")

print(f"Column 'difficulty' has {len(difficulty)} values.")
print("The first five values are:")
for i in range(5):
    print(difficulty[i])

Column 'difficulty' has 620 values.
The first five values are:
1
6
4
4
5


### Code to access all values in the `understanding` column and print the first 5 entries:

In [15]:
understanding: list[str] = column_values(data_rows, "understanding")

print(f"Column 'understanding' has {len(understanding)} values.")
print("The first five values are:")
for i in range(5):
    print(understanding[i])

Column 'understanding' has 620 values.
The first five values are:
7
3
6
5
5


### Code to access all values in the `interesting` column and print the first 5 entries:

In [17]:
interesting: list[str] = column_values(data_rows, "interesting")

print(f"Column 'interesting' has {len(interesting)} values.")
print("The first five values are:")
for i in range(5):
    print(interesting[i])

Column 'interesting' has 620 values.
The first five values are:
5
4
7
6
6


### Code to transform the row-oriented table into a column-oriented table for all of the data in the CSV file:


In [18]:
from data_utils import columnar

data_cols: dict[str, list[str]] = columnar(data_rows)

print(f"{len(data_cols.keys())} columns")
print(f"{len(data_cols['unc_status'])} rows")
print(f"Columns names: {data_cols.keys()}")

35 columns
620 rows
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])


### Code to preview all of the survey data by displaying the first 5 entries in a table:

In [12]:
from data_utils import head
from tabulate import tabulate
data_cols_head: dict[str, list[str]] = head(data_cols, 5)
tabulate(data_cols_head, data_cols_head.keys(), "html")

row,year,unc_status,comp_major,primary_major,data_science,prereqs,prior_exp,ap_principles,ap_a,other_comp,prior_time,languages,hours_online_social,hours_online_work,lesson_time,sync_perf,all_sync,flipped_class,no_hybrid,own_notes,own_examples,oh_visits,ls_effective,lsqs_effective,programming_effective,qz_effective,oh_effective,tutoring_effective,pace,difficulty,understanding,interesting,valuable,would_recommend
0,22,Returning UNC Student,No,Mathematics,No,"MATH 233, MATH 347, MATH 381",7-12 months,No,No,UNC,1 month or so,"Python, R / Matlab / SAS",3 to 5 hours,0 to 2 hours,6,2,2,1,2,4,4,0,7,3,7,5,,,1,1,7,5,6,5
1,25,Returning UNC Student,No,Mathematics,Yes,"MATH 130, MATH 231, STOR 155",None to less than one month!,,,,,,0 to 2 hours,5 to 10 hours,4,3,3,1,2,6,4,5,5,5,5,5,7.0,6.0,6,6,3,4,6,4
2,25,Incoming First-year Student,Yes - BA,Computer Science,No,"MATH 130, MATH 152, MATH 210",None to less than one month!,,,,,,3 to 5 hours,5 to 10 hours,3,3,4,2,1,7,7,2,5,6,7,7,4.0,,6,4,6,7,7,7
3,24,Returning UNC Student,Yes - BS,Computer Science,Maybe,"MATH 231, MATH 232, STOR 155",2-6 months,No,No,High school course (IB or other),None to less than one month!,Python,3 to 5 hours,3 to 5 hours,5,5,4,3,3,6,5,1,6,3,5,5,5.0,4.0,4,4,5,6,6,6
4,25,Incoming First-year Student,Yes - BA,Computer Science,No,MATH 130,None to less than one month!,,,,,,0 to 2 hours,3 to 5 hours,7,3,3,3,2,6,3,5,6,6,6,6,7.0,3.0,6,5,5,6,6,7


## The following code blocks evaluate the relationship among major and levels of difficulty, understanding, and interest:

### Code to display a sample of the data, from the first 100 entries, as a table, in order to see how students from a diverse set of majors rate the difficulty of the course, their understanding of the material, and how intellectually interesting they find the course material:

In [22]:
from data_utils import select

from tabulate import tabulate

# Columns are listed in the order they appear in the table displayed below.
selected_data: dict[str, list[str]] = select(data_cols, ["primary_major", "difficulty", "understanding", "interesting"])

tabulate(head(selected_data, 100), selected_data.keys(), "html")

| primary_major | difficulty | understanding | interesting |
|---|---|---|---|
| Mathematics | 1 | 7 | 5 |
| Mathematics | 6 | 3 | 4 |
| Computer Science | 4 | 6 | 7 |
| Computer Science | 4 | 5 | 6 |
| Computer Science | 5 | 5 | 6 |
| Computer Science | 3 | 6 | 7 |
| Computer Science | 4 | 6 | 7 |
| Neuroscience | 4 | 7 | 7 |
| Computer Science | 4 | 6 | 7 |
| Neuroscience | 4 | 6 | 7 |
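
The table shows individual responses, but comparing majors is easier with one average per group. The following is a minimal sketch of such a comparison, assuming the ratings are whole numbers stored as strings and that blank answers should be skipped. `mean_by_group` is a hypothetical helper written for illustration; it is not part of `data_utils`. The sample below contains only the ten rows displayed above, which is far too few to draw a conclusion from.

```python
# Hypothetical helper (not part of data_utils): average a rating column
# separately for one major and for all other majors combined.
def mean_by_group(
    table: dict[str, list[str]], rating: str, major: str = "Computer Science"
) -> dict[str, float]:
    other: str = f"Not {major}"
    groups: dict[str, list[int]] = {major: [], other: []}
    for row_major, value in zip(table["primary_major"], table[rating]):
        if value == "":
            continue  # assumption: blank answers are skipped
        key = major if row_major == major else other
        groups[key].append(int(value))
    # Only report groups that have at least one response.
    return {name: sum(vals) / len(vals) for name, vals in groups.items() if vals}


# The ten rows displayed above, in column-oriented form.
sample: dict[str, list[str]] = {
    "primary_major": [
        "Mathematics", "Mathematics", "Computer Science", "Computer Science",
        "Computer Science", "Computer Science", "Computer Science",
        "Neuroscience", "Computer Science", "Neuroscience",
    ],
    "difficulty": ["1", "6", "4", "4", "5", "3", "4", "4", "4", "4"],
    "understanding": ["7", "3", "6", "5", "5", "6", "6", "7", "6", "6"],
    "interesting": ["5", "4", "7", "6", "6", "7", "7", "7", "7", "7"],
}

for column in ["difficulty", "understanding", "interesting"]:
    print(column, mean_by_group(sample, column))
```

In the notebook, passing `selected_data` in place of `sample` would run the same comparison over all 620 responses.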


### Code to count how many students listed each major in the survey, in order to get an understanding of the diversity of student majors in the class:

In [10]:
from data_utils import count

unc_major: dict[str, int] = count(selected_data["primary_major"])
print(f"UNC Major Count: {unc_major}")

UNC Major Count: {'Mathematics': 10, 'Computer Science': 223, 'Neuroscience': 55, 'Psychology': 70, 'Environmental Science/Studies': 17, 'Economics': 50, 'Media and Journalism': 6, 'Exercise and Sports Science': 3, 'Biology': 60, 'Undecided': 4, 'Asian Studies': 3, 'Information Science': 16, 'Chemistry': 4, 'Communication': 2, 'Political Science': 7, 'Statistics and Analytics': 29, 'Business': 35, 'Advertising and PR': 1, 'English': 1, 'Radiology': 1, 'Linguistics': 1, 'HPM': 1, 'Physics': 2, 'Nursing': 2, 'Peace, War, and Defense': 1, 'Philosophy': 3, 'Clinical Lab Science': 1, 'Music Preformance': 1, 'Medical Anthropology': 2, 'Interdisciplinary Studies': 1, 'Geology': 1, 'Cultural Anthropology': 1, 'Sports Administration': 1, 'Earth Science': 1, 'Studio Art': 1, 'Communications': 2, 'Nutrition': 1}
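
The raw counts can be turned into a percentage to show how large the computer science group is relative to everyone else. This is a small sketch; `percent_share` is a hypothetical helper, and the two-entry dictionary collapses the printed counts into 223 computer science responses and the remaining 397 responses (620 minus 223).

```python
# Hypothetical helper (not part of data_utils): percentage of all counted
# responses that fall under one key.
def percent_share(counts: dict[str, int], key: str) -> float:
    total: int = sum(counts.values())
    return 100 * counts[key] / total


# Collapsed version of the counts printed above: 223 of 620 responses.
major_counts: dict[str, int] = {"Computer Science": 223, "All other majors": 397}

cs_share: float = percent_share(major_counts, "Computer Science")
print(f"Computer Science: {cs_share:.1f}%")        # prints "Computer Science: 36.0%"
print(f"All other majors: {100 - cs_share:.1f}%")  # prints "All other majors: 64.0%"
```

In the notebook, `percent_share(unc_major, "Computer Science")` would give the same figure directly from the full count dictionary.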


### Code to count the frequencies for each level of difficulty to get an understanding of where the majority of the class stands as a whole:

In [25]:
unc_difficulty: dict[str, int] = count(selected_data["difficulty"])
print(f"Frequency count for level of difficulty: {unc_difficulty}")

Frequency count for level of difficulty: {'1': 23, '6': 88, '4': 160, '5': 155, '3': 84, '7': 56, '2': 54}


### Code to count the frequencies for each level of understanding to see where the majority of the class stands as a whole:

In [24]:
unc_understanding: dict[str, int] = count(selected_data["understanding"])
print(f"Frequency count for level of understanding: {unc_understanding}")

Frequency count for level of understanding: {'7': 74, '3': 68, '6': 182, '5': 172, '4': 82, '2': 28, '1': 14}


### Code to count the frequencies for each level of finding the course intellectually interesting to get an understanding of where the majority of the class stands as a whole:

In [23]:
unc_interesting: dict[str, int] = count(selected_data["interesting"])
print(f"Frequency count for level of interest: {unc_interesting}")

Frequency count for level of interest: {'5': 106, '4': 47, '7': 293, '6': 144, '1': 7, '3': 16, '2': 7}
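
A frequency dictionary is hard to compare at a glance, so one option is to reduce each one to a single average rating. The sketch below does this for the three distributions printed above, assuming each key is a whole-number rating from 1 to 7. `mean_from_counts` is a hypothetical helper, not part of `data_utils`.

```python
# Hypothetical helper (not part of data_utils): mean rating computed from a
# dictionary that maps each rating (as a string) to its frequency.
def mean_from_counts(counts: dict[str, int]) -> float:
    total: int = sum(counts.values())
    weighted: int = sum(int(level) * n for level, n in counts.items())
    return weighted / total


# Frequency counts copied from the outputs above.
difficulty_counts = {"1": 23, "6": 88, "4": 160, "5": 155, "3": 84, "7": 56, "2": 54}
understanding_counts = {"7": 74, "3": 68, "6": 182, "5": 172, "4": 82, "2": 28, "1": 14}
interesting_counts = {"5": 106, "4": 47, "7": 293, "6": 144, "1": 7, "3": 16, "2": 7}

print(f"Mean difficulty: {mean_from_counts(difficulty_counts):.2f}")        # 4.38
print(f"Mean understanding: {mean_from_counts(understanding_counts):.2f}")  # 4.95
print(f"Mean interest: {mean_from_counts(interesting_counts):.2f}")         # 5.97
```

These are class-wide averages. They describe the class as a whole and say nothing yet about differences between majors.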


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion



The recommendation that I made before my analysis was that the course should use data from experiments beyond computer science topics because a large majority of students are not majoring in computer science; this should increase students' understanding and interest and decrease the chance of the course being too difficult for them.

After completing my data analysis, I could visualize the relationship between students' major and their perceived levels of difficulty, understanding, and interest. I could roughly see from the data that computer science majors tended to report higher understanding and interest and lower perceived difficulty. In contrast, students in other majors, specifically the more humanities-based ones, found the class more difficult and reported lower understanding and interest. Of the 620 entries, 223 were from students who listed computer science as their primary major. That means about 64% of the students enrolled are not planning to major in computer science, which suggests that a large majority of enrolled students may find this course too difficult, not especially intellectually interesting, and harder to understand. With this in mind, I think that using data from a wider range of fields and making assignments more interdisciplinary would increase students' interest and understanding.

There are some potential downsides and trade-offs in doing this. Computer science majors still make up a large portion of the course, roughly 36%, and their interest in the course could decrease; as a result, UNC could see fewer computer science majors in the future. The stakeholders involved are the students enrolled, the instructional staff, the academic institution (UNC), and the societal workforce. The instructional staff's expertise is specifically in computer science, which could make developing interdisciplinary exercises more difficult. The change could also affect the course requirements and the way the academic institution presents the class. The societal workforce could be affected positively, by potentially gaining a more diverse group of majors interested in data science, or negatively, if the number of computer science majors decreases.

For future work, this idea could be extended and refined. One option is to offer more open-ended assignments, like this one, in which students explore a topic that interests them, whether it relates to computer science or to another field of their choice, and use data on that topic in a coding project. We could survey students' interest, understanding, and perceived difficulty before such an assignment and again afterward, then compare the responses to evaluate whether this exercise style is effective. If understanding and interest increase, this strategy could be used more often in future coding assignments to make the class easier and more enjoyable for more of the students enrolled.