# Analysis for Continuous Improvement

Author Name: Avery West 

9-digit PID: 730325952 

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. This course should include more group discussions because it will increase collaboration for students in the class. 
2. This course should provide more time before lesson quizzes are due, because they typically take longer than an in-person lecture and the extra time will help slow the pace for students.
3. This course should provide more real-world examples because they will provide insight into the material's relevance outside of a classroom setting, particularly in a career setting, for students.
4. This course should have more frequent quizzes because it will increase the understanding of specific topics for students in the class.
5. This course should have more reading assignments like The Ethical Algorithm because they will give students in the class more context on how the material applies at a societal level.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: Idea 5

2. Suggestion for how to collect data to support this idea in the future: A survey question, and therefore a new column, could ask students how effective they believe the reading assignments in the class are, answered on a 1-7 scale.

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: Idea 2

2. This idea is more valuable than the others brainstormed because: This idea could be supported by the survey questions regarding pace (`pace`), lesson effectiveness (`ls_effective`), and post-lesson question effectiveness (`lsqs_effective`). This idea is valuable because it could improve students' understanding of each lesson and improve grades on the lesson quizzes. This would ultimately lead to a better understanding of the material and improved final grades within the class.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [90]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [91]:
from data_utils import read_csv_rows, columnar, head, select, count, threshold

Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

In [92]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

Next, I use `read_csv_rows` to read the file into a list of rows and print the keys of the first row to see which columns are available.

In [93]:
data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)
print(data_rows[0].keys())


dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
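The implementation of `read_csv_rows` lives in `data_utils` and is not shown in this notebook. As a rough illustration of what such a helper might do, here is a minimal sketch built on the standard library's `csv.DictReader`. The name `read_rows_sketch` is hypothetical and is not the function imported above.

```python
import csv


def read_rows_sketch(path: str) -> list[dict[str, str]]:
    """Hypothetical stand-in for read_csv_rows: one dict per CSV row."""
    rows: list[dict[str, str]] = []
    # newline="" is what the csv module recommends when opening files
    with open(path, "r", encoding="utf8", newline="") as handle:
        for row in csv.DictReader(handle):
            # Each row maps column name -> cell text; every value stays a str
            rows.append(dict(row))
    return rows
```

Every value comes back as a string, which is why later steps of an analysis like this one treat responses such as `'6'` as text keys rather than numbers.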


Next, I use the `columnar` function to convert the data from a row-oriented table to a column-oriented table, which makes it easier to analyze one survey question at a time.

In [94]:
data_cols: dict[str, list[str]] = columnar(data_rows)
print(data_cols.keys())

dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
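To make the row-to-column conversion concrete, the sketch below shows one way it could work. `columnar_sketch` is a hypothetical name, and it assumes every row has the same keys as the first row; the real `columnar` in `data_utils` may differ.

```python
def columnar_sketch(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Hypothetical stand-in for columnar: list of rows -> dict of columns."""
    columns: dict[str, list[str]] = {}
    if len(rows) == 0:
        return columns
    # Assumes all rows share the keys of the first row
    for name in rows[0]:
        columns[name] = [row[name] for row in rows]
    return columns


# Two rows in the same shape as the survey data, limited to two columns
example_rows = [
    {"pace": "1", "ls_effective": "7"},
    {"pace": "6", "ls_effective": "5"},
]
print(columnar_sketch(example_rows))
# {'pace': ['1', '6'], 'ls_effective': ['7', '5']}
```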


Next, I use the `head` function to look at the first 10 rows of data, which gives a first impression of the kinds of responses in the survey, although 10 rows cannot represent the entire class. I used `tabulate` to display the result as a table.

In [95]:
from tabulate import tabulate
data_cols_head: dict[str, list[str]] = head(data_cols, 10)
tabulate(data_cols_head, data_cols_head.keys(), "html")

row,year,unc_status,comp_major,primary_major,data_science,prereqs,prior_exp,ap_principles,ap_a,other_comp,prior_time,languages,hours_online_social,hours_online_work,lesson_time,sync_perf,all_sync,flipped_class,no_hybrid,own_notes,own_examples,oh_visits,ls_effective,lsqs_effective,programming_effective,qz_effective,oh_effective,tutoring_effective,pace,difficulty,understanding,interesting,valuable,would_recommend
0,22,Returning UNC Student,No,Mathematics,No,"MATH 233, MATH 347, MATH 381",7-12 months,No,No,UNC,1 month or so,"Python, R / Matlab / SAS",3 to 5 hours,0 to 2 hours,6,2,2,1,2,4,4,0,7,3,7,5,,,1,1,7,5,6,5
1,25,Returning UNC Student,No,Mathematics,Yes,"MATH 130, MATH 231, STOR 155",None to less than one month!,,,,,,0 to 2 hours,5 to 10 hours,4,3,3,1,2,6,4,5,5,5,5,5,7.0,6.0,6,6,3,4,6,4
2,25,Incoming First-year Student,Yes - BA,Computer Science,No,"MATH 130, MATH 152, MATH 210",None to less than one month!,,,,,,3 to 5 hours,5 to 10 hours,3,3,4,2,1,7,7,2,5,6,7,7,4.0,,6,4,6,7,7,7
3,24,Returning UNC Student,Yes - BS,Computer Science,Maybe,"MATH 231, MATH 232, STOR 155",2-6 months,No,No,High school course (IB or other),None to less than one month!,Python,3 to 5 hours,3 to 5 hours,5,5,4,3,3,6,5,1,6,3,5,5,5.0,4.0,4,4,5,6,6,6
4,25,Incoming First-year Student,Yes - BA,Computer Science,No,MATH 130,None to less than one month!,,,,,,0 to 2 hours,3 to 5 hours,7,3,3,3,2,6,3,5,6,6,6,6,7.0,3.0,6,5,5,6,6,7
5,25,Incoming First-year Student,Yes - BS,Computer Science,Maybe,"MATH 129P, MATH 231, MATH 232, STOR 155",2-6 months,No,No,High school course (IB or other),1 month or so,"Python, Java / C#, JavaScript / TypeScript, HTML / CSS",10+ hours,5 to 10 hours,5,2,2,2,1,5,5,0,7,7,7,7,,,4,3,6,7,7,7
6,25,Incoming First-year Student,Yes - BA,Computer Science,Yes,"MATH 129P, MATH 130",2-6 months,Yes,No,High school course (IB or other),7-12 months,"Python, Java / C#, JavaScript / TypeScript, HTML / CSS, Bash",3 to 5 hours,5 to 10 hours,5,3,3,5,3,7,7,2,7,5,7,5,4.0,4.0,4,4,6,7,7,7
7,24,Returning UNC Student,Yes - BA,Neuroscience,No,"MATH 130, MATH 152, MATH 231, MATH 232, MATH 233, MATH 381, PSYC 210, STOR 155",None to less than one month!,,,,,,5 to 10 hours,5 to 10 hours,1,2,2,1,1,7,7,0,7,7,7,7,7.0,,4,4,7,7,7,7
8,25,Incoming First-year Student,Yes - BS,Computer Science,Yes,STOR 120,None to less than one month!,,,,,,0 to 2 hours,10+ hours,1,5,4,6,5,7,7,1,7,7,7,7,7.0,7.0,5,4,6,7,7,7
9,22,Returning UNC Student,No,Neuroscience,No,"MATH 130, MATH 231, MATH 232, PSYC 210",None to less than one month!,,,,,,3 to 5 hours,5 to 10 hours,5,2,2,1,1,7,7,2,7,5,7,7,7.0,,7,4,6,7,7,7


Next, I use the `select` function to isolate the columns that are relevant to my analysis of Idea 2: `pace`, `ls_effective`, and `lsqs_effective`. I again used `tabulate` to display the result, and I used the `head` function to show only the first ten rows of the selected columns so that the output stays readable.

In [96]:
selected_data: dict[str, list[str]] = select(data_cols, ["pace", "ls_effective", "lsqs_effective"])
tabulate(head(selected_data, 10), selected_data.keys(), "html")

pace,ls_effective,lsqs_effective
1,7,3
6,5,5
6,5,6
4,6,3
6,6,6
4,7,7
4,7,5
4,7,7
5,7,7
7,7,5
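The two helpers used in this step can be pictured as small dictionary operations. The sketch below uses the hypothetical names `select_sketch` and `head_sketch`; it assumes the table is a dictionary of equal-length lists, and it is not the actual `data_utils` code.

```python
def select_sketch(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Hypothetical stand-in for select: keep only the named columns."""
    return {name: table[name] for name in names}


def head_sketch(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Hypothetical stand-in for head: keep the first n values of each column."""
    # Slicing is safe even when a column has fewer than n values
    return {name: values[:n] for name, values in table.items()}


# First three survey rows, limited to three columns for brevity
example_table = {
    "pace": ["1", "6", "6"],
    "ls_effective": ["7", "5", "5"],
    "difficulty": ["1", "6", "4"],
}
picked = select_sketch(example_table, ["pace", "ls_effective"])
print(head_sketch(picked, 2))
# {'pace': ['1', '6'], 'ls_effective': ['7', '5']}
```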


Next, I apply the `count` function to each of the selected columns to determine how many students chose each number as their response.

In [97]:
pace_counts: dict[str, int] = count(selected_data["pace"])
print(f"pace_counts: {pace_counts}")

ls_effective_counts: dict[str, int] = count(selected_data["ls_effective"])
print(f"ls_effective_counts: {ls_effective_counts}")

lsqs_effective_counts: dict[str, int] = count(selected_data["lsqs_effective"])
print(f"lsqs_effective_counts: {lsqs_effective_counts}")

pace_counts: {'1': 2, '6': 137, '4': 173, '5': 203, '7': 69, '3': 27, '2': 9}
ls_effective_counts: {'7': 257, '5': 120, '6': 154, '4': 46, '1': 8, '3': 28, '2': 7}
lsqs_effective_counts: {'3': 39, '5': 156, '6': 165, '7': 158, '4': 74, '1': 8, '2': 20}
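A frequency count like the one above can be sketched in a few lines. `count_sketch` is a hypothetical name used for illustration only. The example input is the first ten `pace` responses shown in the selected-columns table earlier.

```python
def count_sketch(values: list[str]) -> dict[str, int]:
    """Hypothetical stand-in for count: tally how often each value appears."""
    tallies: dict[str, int] = {}
    for value in values:
        # Start at 0 the first time a value is seen, then add one
        tallies[value] = tallies.get(value, 0) + 1
    return tallies


first_ten_pace = ["1", "6", "6", "4", "6", "4", "4", "4", "5", "7"]
print(count_sketch(first_ten_pace))
# {'1': 1, '6': 3, '4': 4, '5': 1, '7': 1}
```

The keys appear in the order each response is first encountered, which is consistent with the unsorted key order in the printed counts above.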


Finally, I use the `threshold` function to find which responses were selected most often, which gives a sense of students' general opinions about the pace, lesson effectiveness, and lesson quiz effectiveness.

In [98]:
pace_average: list[str] = threshold(pace_counts)
print(f"pace: {pace_average}")

ls_effective_average: list[str] = threshold(ls_effective_counts)
print(f"ls_effective: {ls_effective_average}")

lsqs_effective_average: list[str] = threshold(lsqs_effective_counts)
print(f"lsqs_effective: {lsqs_effective_average}")

pace: ['6', '4', '5']
ls_effective: ['7', '5', '6']
lsqs_effective: ['5', '6', '7']
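The body of `threshold` is defined in `data_utils` and does not appear in this notebook, so the following is only a guess at its behavior: keep every response whose count meets a cutoff. The name `threshold_sketch` and the cutoff of 100 are assumptions. With these counts, any cutoff from 75 to 120 would return the same lists as the output above.

```python
def threshold_sketch(counts: dict[str, int], cutoff: int = 100) -> list[str]:
    """Hypothetical stand-in for threshold; the cutoff of 100 is an assumption."""
    return [key for key, n in counts.items() if n >= cutoff]


# Counts copied from the output of the count step
pace_counts = {"1": 2, "6": 137, "4": 173, "5": 203, "7": 69, "3": 27, "2": 9}
ls_effective_counts = {"7": 257, "5": 120, "6": 154, "4": 46, "1": 8, "3": 28, "2": 7}
lsqs_effective_counts = {"3": 39, "5": 156, "6": 165, "7": 158, "4": 74, "1": 8, "2": 20}

print(threshold_sketch(pace_counts))            # ['6', '4', '5']
print(threshold_sketch(ls_effective_counts))    # ['7', '5', '6']
print(threshold_sketch(lsqs_effective_counts))  # ['5', '6', '7']
```

Note that a list built this way follows the order of the dictionary keys, so the first element is not necessarily the most frequent response.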


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion



The idea I chose to analyze was that this course should provide more time before lesson quizzes are due, because they typically take longer than an in-person lecture and the extra time would help slow the pace for students. The data I analyzed for this idea was inconclusive.

Using the functions above, particularly `count` and `threshold`, I saw that many students believe COMP 110 has a fast pace: 203 students chose 5 out of 7 and 137 students chose 6 out of 7. However, this is very general data. While the lesson pace potentially contributes to this perception, it could also be attributed to other ways the class is structured. The lesson and lesson quiz effectiveness columns relate more directly to my idea, and the data showed that students find both very helpful: 257 students selected 7 out of 7 for `ls_effective` and 158 students selected 7 out of 7 for `lsqs_effective`. These results do not support my idea; however, this could be because the survey does not ask the right question. Adding a column that asks specifically whether students feel they have enough time to complete lesson videos and lesson quizzes, answered on a 1-7 scale, would give further insight into why students feel the pace of the class is fast and whether the lesson structure has anything to do with it.

If this question were added to the survey, the responses supported the idea, and more time were allotted to students to complete their work, there would be potential costs. Students would be more susceptible to falling behind, since they would have more time to complete lessons while all other aspects of the class continue at the same pace. Students might begin working on exercises without having completed the lessons and so lack the knowledge to do them correctly. This could lower class averages and substantially increase traffic in office hours. The proposal therefore carries both benefits and costs.
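The raw counts quoted in this conclusion are easier to compare as proportions. The sketch below is one possible extension, not part of the original analysis: it recomputes the share of responses at or above a given rating and the mean rating, using the counts printed in Part 4. The helper names `share_at_or_above` and `mean_rating` are hypothetical.

```python
def share_at_or_above(counts: dict[str, int], floor: int) -> float:
    """Fraction of responses whose rating is at least `floor`."""
    total = sum(counts.values())
    selected = sum(n for rating, n in counts.items() if int(rating) >= floor)
    return selected / total


def mean_rating(counts: dict[str, int]) -> float:
    """Average rating, weighting each rating by how often it was chosen."""
    total = sum(counts.values())
    return sum(int(rating) * n for rating, n in counts.items()) / total


# Counts copied from the output of the count step in Part 4
survey_counts = {
    "pace": {"1": 2, "6": 137, "4": 173, "5": 203, "7": 69, "3": 27, "2": 9},
    "ls_effective": {"7": 257, "5": 120, "6": 154, "4": 46, "1": 8, "3": 28, "2": 7},
    "lsqs_effective": {"3": 39, "5": 156, "6": 165, "7": 158, "4": 74, "1": 8, "2": 20},
}

for column, counts in survey_counts.items():
    print(
        f"{column}: n={sum(counts.values())}, "
        f"share rated 5+ = {share_at_or_above(counts, 5):.1%}, "
        f"mean = {mean_rating(counts):.2f}"
    )
```

Each of the three columns sums to 620 responses, so the shares are directly comparable across questions.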