# Analysis for Continuous Improvement

Author Name: Keegan McDowell

9-digit PID: 730234932

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. The course should have more in-person learning because it will make learning the material easier for students who do not learn well through videos.
2. The course should use more real data sets because they will provide better context and make the material easier for students to understand.
3. The course should have more, but potentially smaller, programming assignments because they make programming the focus and give students hands-on use of the language.
4. The course should focus on how the skills are used in potential career settings because it will help students understand the scope and importance of the course.
5. The course should have more opportunities for one-on-one help because that is the best way to engage students directly.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze them. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: My second idea, using more real-life data sets, has the least support in the data.

2. Suggestion for how to collect data to support this idea in the future: Ask students whether learning with small, abstract data sets or with larger, real-world data sets is more useful to them, potentially on the same 1 to 7 scale used in other survey questions.

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: Idea 1, using more in-person lectures.

2. This idea is more valuable than the others brainstormed because: The survey has questions that directly capture students' views on how effective synchronous classes are, as well as on flipped classrooms. Another angle would be whether those views differ between students who have taken other COMP classes or have prior experience and those who have not.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [92]:
%reload_ext autoreload
%autoreload 2

#### Organization of Data

We continue by importing the helper functions from `data_utils`.

In [93]:
from data_utils import read_csv_rows, head, select, columnar, several_means, count

Next, we use the `read_csv_rows` function to read the rows of a csv file into a table and get an overview of the data.

In [94]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

survey_data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)

print(f"{len(survey_data_rows)} rows")
print(f"{len(survey_data_rows[0].keys())} columns")
print(f"Columns names: {survey_data_rows[0].keys()}")

620 rows
35 columns
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
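For readers without access to `data_utils`, here is a minimal sketch of how a row reader like this could work. It is an assumption about the approach, not the course's actual implementation: the name `read_csv_rows_sketch` is hypothetical, and the two-row demo file is a stand-in so the sketch runs without `survey.csv`.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_rows_sketch(filename: str) -> list[dict[str, str]]:
    """Read a CSV file into a list of rows, one dict per row (hypothetical helper)."""
    rows: list[dict[str, str]] = []
    with open(filename, "r", encoding="utf8", newline="") as file_handle:
        # DictReader treats the first line of the file as the column names.
        for row in csv.DictReader(file_handle):
            rows.append(dict(row))
    return rows


# Tiny stand-in file; the values are illustrative and not the survey data.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.csv"
    demo_path.write_text("row,sync_perf,all_sync\n0,2,2\n1,3,3\n", encoding="utf8")
    demo_rows = read_csv_rows_sketch(str(demo_path))

print(f"{len(demo_rows)} rows")
print(f"{len(demo_rows[0].keys())} columns")
```

Every value comes back as a string, which is why later steps must convert responses to numbers before averaging them.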


Then, we use the `columnar` function in order to convert a row-oriented table to a column-oriented table.

In [95]:
survey_data: dict[str, list[str]] = columnar(survey_data_rows)

print(f"{len(survey_data.keys())} columns")
print(f"Columns names: {survey_data.keys()}")

35 columns
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
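The row-to-column conversion can be sketched as follows. This is a hedged illustration rather than the actual `columnar` from `data_utils`: the name `columnar_sketch` is hypothetical, and it assumes every row has the same keys as the first row.

```python
def columnar_sketch(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a list of row dicts into a dict of column lists (hypothetical helper)."""
    table: dict[str, list[str]] = {}
    if len(rows) == 0:
        return table
    # Assumption: every row shares the keys of the first row.
    for column_name in rows[0]:
        table[column_name] = [row[column_name] for row in rows]
    return table


# Two small rows in the same shape as the survey rows (strings throughout).
example_rows: list[dict[str, str]] = [
    {"sync_perf": "2", "all_sync": "2"},
    {"sync_perf": "3", "all_sync": "3"},
]
example_table = columnar_sketch(example_rows)
print(example_table)
```

A column-oriented table makes per-question summaries simple, because each question's responses sit together in one list.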


We then use the `select` function to cut the table down to the columns we believe will help answer the question, and the `head` function to display only the first ten rows.

In [96]:
from tabulate import tabulate

selected_data: dict[str, list[str]] = select(survey_data, ["year", "prior_exp", "other_comp",
 "prior_time", "sync_perf", "all_sync",
 "flipped_class", "ls_effective","difficulty", "understanding"])

tabulate(head(selected_data, 10), selected_data.keys(), "html")

| year | prior_exp | other_comp | prior_time | sync_perf | all_sync | flipped_class | ls_effective | difficulty | understanding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 22 | 7-12 months | UNC | 1 month or so | 2 | 2 | 1 | 7 | 1 | 7 |
| 25 | None to less than one month! | | | 3 | 3 | 1 | 5 | 6 | 3 |
| 25 | None to less than one month! | | | 3 | 4 | 2 | 5 | 4 | 6 |
| 24 | 2-6 months | High school course (IB or other) | None to less than one month! | 5 | 4 | 3 | 6 | 4 | 5 |
| 25 | None to less than one month! | | | 3 | 3 | 3 | 6 | 5 | 5 |
| 25 | 2-6 months | High school course (IB or other) | 1 month or so | 2 | 2 | 2 | 7 | 3 | 6 |
| 25 | 2-6 months | High school course (IB or other) | 7-12 months | 3 | 3 | 5 | 7 | 4 | 6 |
| 24 | None to less than one month! | | | 2 | 2 | 1 | 7 | 4 | 7 |
| 25 | None to less than one month! | | | 5 | 4 | 6 | 7 | 4 | 6 |
| 22 | None to less than one month! | | | 2 | 2 | 1 | 7 | 4 | 6 |
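The column selection and row preview used in this step can be sketched like this. The names `select_sketch` and `head_sketch` are hypothetical and the bodies are an assumption about how such helpers might be written, not the `data_utils` code itself. The small table reuses values from the first three displayed rows.

```python
def select_sketch(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Keep only the named columns, in the order requested (hypothetical helper)."""
    return {name: table[name] for name in names}


def head_sketch(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Keep only the first n values of every column (hypothetical helper)."""
    # Slicing never raises, so an n larger than the column returns the whole column.
    return {name: values[:n] for name, values in table.items()}


small_table: dict[str, list[str]] = {
    "year": ["22", "25", "25"],
    "sync_perf": ["2", "3", "3"],
    "flipped_class": ["1", "1", "2"],
}
narrowed = select_sketch(small_table, ["sync_perf", "flipped_class"])
preview = head_sketch(narrowed, 2)
print(preview)
```

Selecting first and previewing second keeps the display narrow without discarding any rows from the table used in later steps.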


To find out whether there is any basis for the idea that students prefer synchronous learning to asynchronous learning, we compute the means of several relevant columns.

In [97]:
several_means(selected_data, ["sync_perf", "all_sync", "flipped_class", "ls_effective"])

sync_perf mean: 3.0387096774193547
all_sync mean: 2.753225806451613
flipped_class mean: 2.988709677419355
ls_effective mean: 5.827419354838709
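To make the averaging step concrete, here is a minimal sketch of how one column mean could be computed and compared with the middle of the scale. The helper `column_mean` and the constant `SCALE_MIDPOINT` are hypothetical, and the sketch assumes these questions use the 1 to 7 scale mentioned earlier. The sample holds only the `sync_perf` values from the ten rows displayed above, so its mean differs from the full-survey mean.

```python
def column_mean(values: list[str]) -> float:
    """Average a column of numeric strings (hypothetical helper, not the course's code)."""
    # Skip blank responses so they do not break the int conversion.
    numbers = [int(value) for value in values if value != ""]
    if len(numbers) == 0:
        raise ValueError("cannot take the mean of an empty column")
    return sum(numbers) / len(numbers)


SCALE_MIDPOINT = 4  # assumption: responses run from 1 to 7, so 4 is neutral

# sync_perf values from the ten displayed rows, not the full 620 responses.
sample_sync_perf = ["2", "3", "3", "5", "3", "2", "3", "2", "5", "2"]
sample_mean = column_mean(sample_sync_perf)
print(f"sample sync_perf mean: {sample_mean}")
print(f"below midpoint: {sample_mean < SCALE_MIDPOINT}")
```

Under that assumption, a mean below 4 leans toward disagreement and a mean above 4 leans toward agreement, which is how the four means above are read in the conclusion.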


Next, we look at how many students in the class have prior experience with coding or Python, to get a better understanding of who responded.

In [100]:
print(count(selected_data["prior_exp"]))
print(count(selected_data["prior_time"]))

{'7-12 months': 59, 'None to less than one month!': 369, '2-6 months': 142, '1-2 years': 31, 'Over 2 years': 19}
{'1 month or so': 69, '': 369, 'None to less than one month!': 102, '7-12 months': 14, '2-6 months': 49, '1-2 years': 10, '> 2 years': 7}
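One way to follow up on these counts is to compare a preference column between students with and without prior experience. The sketch below is an assumption about how that comparison could be done, not part of the original analysis: `mean_by_experience` and the two group labels are hypothetical, and the inputs are the `prior_exp` and `all_sync` values from the ten displayed rows only, which is far too few to draw conclusions from.

```python
NO_EXPERIENCE = "None to less than one month!"


def mean_by_experience(prior_exp: list[str], scores: list[str]) -> dict[str, float]:
    """Average scores separately for students with and without prior experience."""
    groups: dict[str, list[int]] = {"no prior experience": [], "some prior experience": []}
    for experience, score in zip(prior_exp, scores):
        if score == "":
            continue  # skip blank responses
        label = "no prior experience" if experience == NO_EXPERIENCE else "some prior experience"
        groups[label].append(int(score))
    # Leave out a group entirely if nobody fell into it.
    return {label: sum(vals) / len(vals) for label, vals in groups.items() if len(vals) > 0}


# Values from the ten displayed rows, in order.
sample_prior_exp = [
    "7-12 months", NO_EXPERIENCE, NO_EXPERIENCE, "2-6 months", NO_EXPERIENCE,
    "2-6 months", "2-6 months", NO_EXPERIENCE, NO_EXPERIENCE, NO_EXPERIENCE,
]
sample_all_sync = ["2", "3", "4", "4", "3", "2", "3", "2", "4", "2"]
group_means = mean_by_experience(sample_prior_exp, sample_all_sync)
print(group_means)
```

Running the same comparison on the full `selected_data` columns would show whether the preference for synchronous classes differs between the two groups.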


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion

My analysis showed, as seen in the means (which are imperfect measures of the distribution of students' answers), that students feel their performance would not be better in synchronous classes (a `sync_perf` mean of about 3.04) and are not in favor of returning the class to synchronous lectures (an `all_sync` mean of about 2.75). They generally believe the lesson videos are effective, rating them close to 6 out of 7 on average (about 5.83). This suggests that, at the very least, students in the class prefer the current form of teaching and believe they are performing at their best because of it.
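Because a mean can hide the shape of the responses, a count of each score and the share of responses below the midpoint would give a fuller picture. This is a minimal sketch under stated assumptions: `score_distribution` and `share_below` are hypothetical helpers, the midpoint of 4 assumes a 1 to 7 scale, and the sample is only the `all_sync` values from the ten rows displayed in Part 4.

```python
from collections import Counter


def score_distribution(values: list[str]) -> dict[int, int]:
    """Count how many responses gave each score, ordered by score."""
    counts = Counter(int(value) for value in values if value != "")
    return dict(sorted(counts.items()))


def share_below(values: list[str], threshold: int) -> float:
    """Fraction of non-blank responses strictly below the threshold."""
    numbers = [int(value) for value in values if value != ""]
    if len(numbers) == 0:
        raise ValueError("no responses to summarize")
    return sum(1 for number in numbers if number < threshold) / len(numbers)


# all_sync values from the ten displayed rows, not the full survey.
sample_all_sync = ["2", "3", "4", "4", "3", "2", "3", "2", "4", "2"]
distribution = score_distribution(sample_all_sync)
below_midpoint = share_below(sample_all_sync, 4)  # 4 assumed neutral on a 1-7 scale
print(distribution)
print(below_midpoint)
```

If most responses cluster below the midpoint, the low mean reflects broad disagreement; if responses split between the extremes, the same mean would hide two opposed groups.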

Although students do not want it, the proposed change would put them in classroom seats more often, which would likely lead to higher grades because of increased attention. On the other hand, recorded lectures can be rewatched as often as needed, letting students find the particular parts they need when they need them. The professor and TAs would also have to spend more time in the classroom, especially with multiple sections, essentially teaching the same material over and over, so the change is inefficient on that front.

The main problem with the validity of the findings is that the data comes from surveys of students who have an interest in answering a certain way. Most students will say they would prefer not to get out of bed and go to class every day, and they may not be truthful about how the difference between synchronous and asynchronous classes affects their performance. The data actually needed would come from a much larger study comparing the performance of classes that are fully synchronous against classes that are not.
