# Analysis for Continuous Improvement

Author Name: Benjamin Eldridge

9-digit PID: 730518701

Continuous Improvement embraces the belief that there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. what the expected value created is, and c. which specific stakeholders would benefit. If helpful, expand on the following template "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. COMP 110 should include brief written summaries of lessons delivered in the video format to help students who do not keep their own notes, which will help them prepare for quizzes and the final exam.
2. COMP 110 should include some course material relevant to Microsoft Excel (likely related to csv files) to help students gain a commonly required skill in the workforce.
3. COMP 110 should add a monthly course calendar to the course website (below the "🌙 on the horizon 🌙" box) to give a better visual indication of upcoming dates that will help students plan better.
4. COMP 110 should use data from surveys conducted regarding some of UNC's services and organizations such as Carolina Dining Services, the Student Union, and Public Safety in exercises similar to EX07 and EX08 to provide a collection of student-made analyses to benefit UNC and their operation of such services.
5. The COMP 110 website should collect data on the number of clicks and the time of day links are clicked, giving the instructional staff resource usage data that will help improve decisions related to future lesson and resource creation.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze them. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: "COMP 110 should add a monthly course calendar to the course website (below the "🌙 on the horizon 🌙" box) to give a better visual indication of upcoming dates that will help students plan better."

2. Suggestion for how to collect data to support this idea in the future: Currently the course survey does not collect any data concerning the effectiveness of the course website in organizing and providing course material to students. A question that could be asked in the future to collect data on this subject is, "How effective do you feel the course website was in organizing course material and relevant dates for your use? (Scale from 1 to 7, 1 being Not Effective and 7 being Extremely Effective)"

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea that has data to support its analysis and that you believe is _most valuable_ to analyze relative to the others. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: "COMP 110 should include brief written summaries of lessons delivered in the video format to help students who do not keep their own notes, which will help them prepare for quizzes and the final exam."

2. This idea is more valuable than the others brainstormed because: Adding a summary for each lesson delivered in the video format can meet the needs of many students in the class who do not take notes, with only a small addition to each lesson. This change would not require a great allocation of resources, only UTAs with advanced knowledge of the course material who can sum up what was shown in each video in a short write-up. Several columns of the survey data can support analysis of the need for this idea, including `own_notes`, `ls_effective`, `qz_effective`, `difficulty`, and `understanding`.

## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [30]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`. These include `average`, `median_it`, and `equal_to_masked` for statistical analysis, which are imported under the shorter aliases `avg`, `med`, and `etm`.

In [31]:
from data_utils import read_csv_rows, column_values, columnar, head, select, concat, count
from data_utils import median_it as med
from data_utils import average as avg
from data_utils import equal_to_masked as etm

Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

In [32]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

I am beginning the process of turning the CSV file into column-based data by turning it into row-based data with `read_csv_rows`. Then I will turn it into column-based data using `columnar`.

In [33]:
survey_data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)
survey_data_cols: dict[str, list[str]] = columnar(survey_data_rows)
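
The `data_utils` module itself is not shown in this notebook, so the following is only a minimal sketch of what the row-to-column step might look like. The names `read_csv_rows_sketch` and `columnar_sketch` are hypothetical stand-ins, and the sample CSV text is made up for illustration rather than taken from `survey.csv`.

```python
import csv
import io


def read_csv_rows_sketch(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a row-based table: one dict per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def columnar_sketch(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a row-based table into a column-based table."""
    if not rows:
        return {}
    # Use the first row's keys as the column names (assumes uniform rows).
    return {name: [row[name] for row in rows] for name in rows[0]}


# Made-up sample data, not taken from survey.csv.
SAMPLE_CSV = "own_notes,difficulty\n4,1\n6,6\n"
sample_rows = read_csv_rows_sketch(SAMPLE_CSV)
sample_cols = columnar_sketch(sample_rows)
print(sample_cols)
```

The sketch reads from a string instead of a file path so it can run anywhere; the real `read_csv_rows` takes a path, as the cell above shows.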

From this column-based data I will use the `select` function to keep only the columns relevant to this analysis: `own_notes`, `ls_effective`, `qz_effective`, `difficulty`, and `understanding`. To confirm that we selected the right columns and that the values look correct, we use the `head` function to view the first five rows of the selected data.

In [34]:
selected_columns: list[str] = ["own_notes", "ls_effective", "qz_effective", "difficulty", "understanding"]
selected_survey_data: dict[str, list[str]] = select(survey_data_cols, selected_columns)

first_five_rows_data: dict[str, list[str]] = head(selected_survey_data, 5)
print(first_five_rows_data)

{'own_notes': ['4', '6', '7', '6', '6'], 'ls_effective': ['7', '5', '5', '6', '6'], 'qz_effective': ['5', '5', '7', '5', '6'], 'difficulty': ['1', '6', '4', '4', '5'], 'understanding': ['7', '3', '6', '5', '5']}
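
For readers without access to `data_utils`, here is a rough sketch of how column selection and a row preview could be written for a column-based table. `select_sketch` and `head_sketch` are hypothetical names, not the author's implementations; the sample values are two of the columns from the output printed above.

```python
def select_sketch(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Keep only the named columns, in the order requested."""
    return {name: table[name] for name in names}


def head_sketch(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Keep only the first n values of every column."""
    return {name: values[:n] for name, values in table.items()}


# Values copied from the first five rows printed above, plus one extra column.
sample_table = {
    "own_notes": ["4", "6", "7", "6", "6"],
    "ls_effective": ["7", "5", "5", "6", "6"],
    "difficulty": ["1", "6", "4", "4", "5"],
}
selected = select_sketch(sample_table, ["own_notes", "difficulty"])
preview = head_sketch(selected, 2)
print(preview)
```

Because this version slices each column, asking for more rows than the table holds simply returns every row instead of raising an error.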


To ensure the relevance of this analysis to a reasonable portion of the surveyed class, we will use the `count` function to see the number of responses for each level on the 1-7 scale for the column `own_notes`. We will be able to see what level of notes most students take and determine if the addition of written note assistance will be of use to enough people.

In [35]:
levels_of_note_taking: dict[str, int] = count(selected_survey_data["own_notes"])
print(levels_of_note_taking)

{'4': 57, '6': 129, '7': 276, '5': 86, '3': 39, '1': 15, '2': 18}
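
As a small illustration, a frequency count like this can be written with a plain dictionary, and the printed counts can then be tallied to see how many respondents fall in the low note-taking range. `count_sketch` is a hypothetical stand-in for the `count` helper, whose source is not shown here; the `levels` dictionary is copied from the output above.

```python
def count_sketch(values: list[str]) -> dict[str, int]:
    """Count how many times each value appears in a column."""
    counts: dict[str, int] = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return counts


# Counts copied from the output printed above.
levels: dict[str, int] = {"4": 57, "6": 129, "7": 276, "5": 86, "3": 39, "1": 15, "2": 18}

total_responses = sum(levels.values())
low_note_takers = sum(levels[level] for level in ("1", "2", "3"))
low_share_percent = round(100 * low_note_takers / total_responses, 1)

print(low_note_takers, total_responses, low_share_percent)  # prints: 72 620 11.6
```

Respondents who answered 1-3 therefore make up roughly 12% of the 620 responses.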


We can see that, while a definite minority, the students who take few notes would still fill several smaller classes. 72 people answered 1-3 to this question, which is enough to justify further analysis. To continue, we will use our `etm` function, which combines the mask and masked functions learned in class and adds a loop so that it goes through an entire column-based table and keeps only the desired rows in each column. We will use the `head` function to make sure it works appropriately.

In [36]:
one_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "1")
first_note_takers_head: dict[str, list[str]] = head(one_note_takers, 18)
print(first_note_takers_head)

{'own_notes': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'], 'ls_effective': ['6', '4', '1', '7', '7', '7', '7', '7', '6', '6', '5', '7', '5', '2', '6'], 'qz_effective': ['6', '1', '4', '3', '5', '7', '7', '5', '4', '5', '5', '7', '7', '1', '6'], 'difficulty': ['4', '1', '1', '2', '2', '6', '4', '3', '4', '7', '6', '1', '1', '2', '2'], 'understanding': ['6', '7', '7', '7', '7', '7', '6', '6', '7', '2', '3', '7', '7', '7', '7']}
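
The mask-then-filter idea described above can be sketched as follows. This is an assumed reconstruction, not the author's `equal_to_masked`: the function names are hypothetical, and the argument order simply mirrors the calls in this notebook (table, column to test, target value).

```python
def mask_sketch(values: list[str], target: str) -> list[bool]:
    """Build a True/False mask marking the rows equal to the target."""
    return [value == target for value in values]


def masked_sketch(values: list[str], mask: list[bool]) -> list[str]:
    """Keep only the values whose mask entry is True."""
    return [value for value, keep in zip(values, mask) if keep]


def equal_to_masked_sketch(
    table: dict[str, list[str]], column: list[str], target: str
) -> dict[str, list[str]]:
    """Apply one mask to every column so rows stay aligned."""
    mask = mask_sketch(column, target)
    return {name: masked_sketch(values, mask) for name, values in table.items()}


# Two columns from the first five rows printed earlier in this notebook.
sample_table = {
    "own_notes": ["4", "6", "7", "6", "6"],
    "understanding": ["7", "3", "6", "5", "5"],
}
six_only = equal_to_masked_sketch(sample_table, sample_table["own_notes"], "6")
print(six_only)
```

Building the mask once and reusing it for every column is what keeps each respondent's answers lined up across columns after filtering.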


### IT WORKED!!!

Now that we know that `etm` works, we can gather tables for all 7 scores on the `own_notes` question. We will gather these tables, find their medians and averages, and then graph them on a line graph. I'm gonna go off the rails a bit and create a mega-data table for all of these, with type `list[dict[str, list[str]]]`.

In [37]:
two_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "2")
three_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "3")
four_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "4")
five_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "5")
six_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "6")
seven_note_takers: dict[str, list[str]] = etm(selected_survey_data, selected_survey_data["own_notes"], "7")

mega_data: list[dict[str, list[str]]] = [one_note_takers, two_note_takers, three_note_takers, four_note_takers, five_note_takers, six_note_takers, seven_note_takers]

Now that the mega-data megalodon has been created, we can find the medians and averages of all of these data points with some `for...in` loops.

In [43]:
# Average of every column, one dict per own_notes level (1 through 7).
average_mega_data: list[dict[str, float]] = []
for table in mega_data:
    average_mega_data.append(avg(table))

# Median of every column, built one column at a time for each level.
median_mega_data: list[dict[str, float]] = []
for table in mega_data:
    medians: dict[str, float] = {}
    for key in table:
        medians[key] = med(table[key])
    median_mega_data.append(medians)

print(average_mega_data)
print(median_mega_data)

[{'own_notes': 1.0, 'ls_effective': 5.533333333333333, 'qz_effective': 4.866666666666666, 'difficulty': 3.066666666666667, 'understanding': 6.2}, {'own_notes': 2.0, 'ls_effective': 5.944444444444445, 'qz_effective': 5.666666666666667, 'difficulty': 3.611111111111111, 'understanding': 5.555555555555555}, {'own_notes': 3.0, 'ls_effective': 5.153846153846154, 'qz_effective': 4.564102564102564, 'difficulty': 4.230769230769231, 'understanding': 4.641025641025641}, {'own_notes': 4.0, 'ls_effective': 5.4035087719298245, 'qz_effective': 4.315789473684211, 'difficulty': 4.017543859649122, 'understanding': 4.859649122807017}, {'own_notes': 5.0, 'ls_effective': 5.558139534883721, 'qz_effective': 5.046511627906977, 'difficulty': 4.488372093023256, 'understanding': 4.709302325581396}, {'own_notes': 6.0, 'ls_effective': 5.7984496124031, 'qz_effective': 4.930232558139535, 'difficulty': 4.449612403100775, 'understanding': 4.875968992248062}, {'own_notes': 7.0, 'ls_effective': 6.115942028985507, 'qz_ef
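
As a cross-check on the first entry of the printed averages, the same statistics can be recomputed with Python's standard `statistics` module. This is a sketch under the assumption that `avg` and `med` convert the string responses to numbers before summarizing; `column_average` and `column_median` are hypothetical names. The values are the `difficulty` and `understanding` columns of the students who answered 1, copied from the earlier `etm` output.

```python
import statistics


def column_average(values: list[str]) -> float:
    """Average of a column of numeric strings."""
    return statistics.mean(float(value) for value in values)


def column_median(values: list[str]) -> float:
    """Median of a column of numeric strings."""
    return statistics.median(float(value) for value in values)


# Columns for the own_notes == "1" group, copied from the etm output above.
difficulty = ["4", "1", "1", "2", "2", "6", "4", "3", "4", "7", "6", "1", "1", "2", "2"]
understanding = ["6", "7", "7", "7", "7", "7", "6", "6", "7", "2", "3", "7", "7", "7", "7"]

difficulty_average = column_average(difficulty)
understanding_average = column_average(understanding)
difficulty_median = column_median(difficulty)

print(round(difficulty_average, 2), round(understanding_average, 2), difficulty_median)
```

The recomputed averages agree with the `difficulty` (about 3.07) and `understanding` (6.2) values in the first dictionary of the printed output.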

With these two mega-data tables made, we can now see the average and median scores for each group of note-taking students in the four categories we wish to analyze. Now we must graph these mega-beasts, which I will do below.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.catplot(x="own_notes"
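
The plotting cell above is cut off in the source, so the intended `sns.catplot` call cannot be recovered. The following is only a sketch of one way to draw the line graph described earlier, using matplotlib directly instead of seaborn. `to_series` and `plot_series` are hypothetical helpers, and the sample rows are the first two printed averages rounded to two decimals.

```python
def to_series(
    per_level: list[dict[str, float]], x_key: str = "own_notes"
) -> dict[str, tuple[list[float], list[float]]]:
    """Reshape one-dict-per-level summaries into (x, y) lists per column."""
    if not per_level:
        return {}
    xs = [row[x_key] for row in per_level]
    return {
        name: (xs, [row[name] for row in per_level])
        for name in per_level[0]
        if name != x_key
    }


def plot_series(series: dict[str, tuple[list[float], list[float]]], title: str):
    """Draw one line per column; returns the matplotlib figure."""
    # Imported here so the reshaping step works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, (xs, ys) in series.items():
        ax.plot(xs, ys, marker="o", label=name)
    ax.set_xlabel("own_notes response (1-7)")
    ax.set_ylabel("response value")
    ax.set_title(title)
    ax.legend()
    return fig


# First two entries of the printed averages, rounded to two decimals.
sample_averages = [
    {"own_notes": 1.0, "ls_effective": 5.53, "difficulty": 3.07},
    {"own_notes": 2.0, "ls_effective": 5.94, "difficulty": 3.61},
]
series = to_series(sample_averages)
print(series)
```

In the notebook, something like `plot_series(to_series(average_mega_data), "Average response by note-taking level")` would draw one line per survey column, assuming `average_mega_data` has the shape printed earlier.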

## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why they refute your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead.

### Part 5. Conclusion

The analysis of the data did not support my idea: very few students in this class do not take notes (72 of the 620 respondents answered 1-3 on `own_notes`), so the need for this change is questionable. The cost of this work would not be very high; however, it would depend on how extensively the summaries of the lessons are developed. This work could be done by a ULA or by the instructor, but creating these easy guides carries the risk of discouraging the class from taking their own notes, which may lead to laziness and worse grades for the whole class. As a future refinement, written summaries could be released on a separate timeline that only prepares students for quizzes, so as not to discourage note-taking.