# Analysis for Continuous Improvement

Author Name: Max Bauman

9-digit PID: 730405989

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. This course should have an updated code sheet so that we can easily refer back to topics covered months ago. This would help everybody, and it could especially help non-computer-science majors who want to remember the code after the course ends.
2. This course should have practice problems, like those in many textbooks, so that students can review material at any time. (Quizzes happen only every few weeks, and the practice problems are not released until a few days before each quiz. The Gradescope lesson responses include only one or two practice problems that use code, and none of them ask you to write your own code.)
3. Reading responses should be geared more towards practical uses of coding so that every student gets background on how to apply their coding skills.
4. This course should have more optional learning activities to further computer science experience and skills for students who want to go above and beyond in coding and challenge themselves more.
5. This course should relate Python to other languages more often in order to help students understand how Python relates to other key languages, helping students who are learning their first computing language and have plans to go into others.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: 1, 2, 3

2. Suggestion for how to collect data to support this idea in the future: add the following survey questions.
   - For idea 2: "Do you wish there was more practice to review topics in the course?"
   - For idea 1: "What are ways that the course could better prepare you for future courses and after COMP 110?"
   - For idea 3: "Reading assignments are effective in helping students learn the topics of the course." (rated 1 through 7)


## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: 5. This course should relate Python to other languages more often in order to help students understand how Python relates to other key languages, helping students who are learning their first computing language and have plans to go into others. 

2. This idea is more valuable than the others brainstormed because: The survey has good data for testing this hypothesis, such as how many programming languages someone knows, how much prior coding experience someone has, their intended major, and how valuable they find the course. As an intro COMP course, this class sets the basic understanding of programming. Although many languages differ, contextualizing other languages and how they differ from Python will prepare students well for the subsequent levels of coding. Given the difficulty of higher-level coding courses at UNC, a little extra help and background on other languages and more complex programming in COMP 110 could go a long way. I believe this is a very simple change that could significantly improve students' preparedness for higher levels of programming.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [None]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [None]:
from data_utils_ex08 import read_csv_rows, head, columnar, select, count, convert_to_list
from data_utils_ex08 import countlist, prop_finder, mean

Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

In [None]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

To begin, I am going to read the data into a column-oriented dictionary that allows me to transform it. After that, I will use the `head` function to get a general idea of the table and how to use it.


In [None]:
from tabulate import tabulate
data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)
data_cols: dict[str, list[str]] = columnar(data_rows)

if len(data_cols.keys()) == 0:
    print("Complete your implementation of columnar in data_utils.py")
    print("Be sure to follow the guidelines above and save your work before re-evaluating!")
else:
    print(f"{len(data_cols.keys())} columns")
    print(f"{len(data_cols['row'])} rows")
    print(f"Column names: {data_cols.keys()}")

data_cols_head: dict[str, list[str]] = head(data_cols, 5)
tabulate(data_cols_head, data_cols_head.keys(), "html")
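
The helper functions themselves are not shown in this notebook. As a rough sketch of what `read_csv_rows` and `columnar` might be doing (the `_sketch` functions below are hypothetical stand-ins, not the actual `data_utils_ex08` code, and the sample data is made up), the row-to-column transformation can be written with the standard `csv` module:

```python
import csv
import io


def read_csv_rows_sketch(file_obj) -> list[dict[str, str]]:
    """Read every CSV row into a dict keyed by column name (hypothetical helper)."""
    return [dict(row) for row in csv.DictReader(file_obj)]


def columnar_sketch(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a list of row dicts into a dict of column lists (hypothetical helper)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


# Tiny made-up sample standing in for survey.csv; the values are illustrative only.
sample_csv = io.StringIO('row,languages\n0,Python\n1,\n2,"Python, Java"\n')
sample_rows = read_csv_rows_sketch(sample_csv)
sample_cols = columnar_sketch(sample_rows)
print(sample_cols)
```

Keeping the data column-oriented makes it cheap to pull out one survey question at a time, which is what the later steps rely on.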


With the table showing which columns are available, the first goal is to count the students who have little to no programming experience in languages other than Python. There are 620 rows, so 620 students responded.

In [None]:
selected_data: dict[str, list[str]] = select(data_cols, ["row", "primary_major", "prior_exp", "languages", "valuable"])

tabulate(head(selected_data, 10), selected_data.keys(), "html")

# Count the students with no language experience beyond Python.
# Pull the `languages` column out as a list.
language_list = convert_to_list(selected_data, 'languages')
# Count responses that were left blank or that listed only Python.
blank_count = countlist(language_list, '')
python_only_count = countlist(language_list, 'Python')
print(blank_count)
print(python_only_count)
# Proportion of blank or Python-only responses among all 620 rows
# (in this dataset the two counts are 369 and 48).
prop_finder(blank_count + python_only_count, 620)

In [29]:

# Count respondents with less than one month of prior coding experience.
exp_list = convert_to_list(selected_data, 'prior_exp')
beginner_count = countlist(exp_list, 'None to less than one month!')
print(beginner_count)
print(prop_finder(beginner_count, 620))

369
0.5951612903225807


The first two steps show the number of students who know only Python or no language at all, and the number of beginners in coding. About 67% of students ((369 + 48) / 620) have a background in Python only or in no language, and just under 60% (369 / 620) of students in COMP 110 are beginners. This background information sets the stage for the next part of the analysis.
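
To make the arithmetic behind these percentages explicit, here is a minimal sketch. The functions `count_matching` and `proportion` are hypothetical stand-ins for `countlist` and `prop_finder`, whose real implementations are not shown here; the counts are the ones reported above.

```python
def count_matching(values: list[str], target: str) -> int:
    """Count entries that equal target exactly (hypothetical stand-in for countlist)."""
    return sum(1 for value in values if value == target)


def proportion(part: int, total: int) -> float:
    """Return part / total, refusing an empty dataset (hypothetical stand-in for prop_finder)."""
    if total == 0:
        raise ValueError("total must be non-zero")
    return part / total


# Counts reported in the analysis above, out of 620 survey rows.
python_or_blank = proportion(369 + 48, 620)
beginners = proportion(369, 620)
print(f"{python_or_blank:.1%}")
print(f"{beginners:.1%}")

# Made-up list showing how the exact-match count behaves.
print(count_matching(["Python", "", "Python, Java", "Python"], "Python"))
```

Note that an exact-match count treats a response such as "Python, Java" as different from "Python", so only respondents who listed Python alone are counted as Python-only.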

With this information, I will find the number of students in majors that relate to data science, who will likely need further courses in data science in order to fulfill major credits and enter the professional workforce.

In [30]:
# Pull the `primary_major` column out as a list, as before.
major_list = convert_to_list(selected_data, 'primary_major')
# Count how many students listed each major.
counts_major = count(major_list)
print(counts_major)
# Print the counts for the majors that involve further coding.
print(countlist(major_list, 'Computer Science'))
print(countlist(major_list, 'Statistics and Analytics'))
print(countlist(major_list, 'Mathematics'))
print(countlist(major_list, 'Neuroscience'))
# Proportion of students in those four majors (counts taken from the output).
prop_finder((223 + 29 + 10 + 55), 620)

{'Mathematics': 10, 'Computer Science': 223, 'Neuroscience': 55, 'Psychology': 70, 'Environmental Science/Studies': 17, 'Economics': 50, 'Media and Journalism': 6, 'Exercise and Sports Science': 3, 'Biology': 60, 'Undecided': 4, 'Asian Studies': 3, 'Information Science': 16, 'Chemistry': 4, 'Communication': 2, 'Political Science': 7, 'Statistics and Analytics': 29, 'Business': 35, 'Advertising and PR': 1, 'English': 1, 'Radiology': 1, 'Linguistics': 1, 'HPM': 1, 'Physics': 2, 'Nursing': 2, 'Peace, War, and Defense': 1, 'Philosophy': 3, 'Clinical Lab Science': 1, 'Music Preformance': 1, 'Medical Anthropology': 2, 'Interdisciplinary Studies': 1, 'Geology': 1, 'Cultural Anthropology': 1, 'Sports Administration': 1, 'Earth Science': 1, 'Studio Art': 1, 'Communications': 2, 'Nutrition': 1}
223
29
10
55


0.5112903225806451

This information gives an extremely rough estimate of the number of students who are continuing on in coding-related fields, and it likely significantly underestimates that number. These majors (Statistics and Analytics, Neuroscience, Mathematics, and of course Computer Science) all take programming classes in languages other than Python. Therefore, more than half of the students in this class are new to coding, more than half know no language other than Python, and just over half (about 51%) are in majors with further programming ahead. Next, we will use the survey question on how valuable the course is, assuming those students are taking it because they plan on using coding in the future.
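
As a small illustration of this estimate, the proportion can be computed directly from a dictionary of counts rather than from numbers typed in by hand. This is only a sketch: `share_of_majors` is a hypothetical helper, and the dictionary below holds just a subset of the counts printed above.

```python
def share_of_majors(counts: dict[str, int], majors: list[str], total: int) -> float:
    """Fraction of all respondents whose major is in `majors` (hypothetical helper)."""
    selected = sum(counts.get(major, 0) for major in majors)
    return selected / total


# Subset of the major counts printed above; the survey has 620 rows in total.
major_counts: dict[str, int] = {
    "Computer Science": 223,
    "Statistics and Analytics": 29,
    "Mathematics": 10,
    "Neuroscience": 55,
    "Psychology": 70,
}
coding_majors = ["Computer Science", "Statistics and Analytics", "Mathematics", "Neuroscience"]

coding_share = share_of_majors(major_counts, coding_majors, 620)
print(round(coding_share, 4))
```

Changing the `coding_majors` list (for example, adding Information Science) shows how sensitive the estimate is to which majors are assumed to involve further coding.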

A stronger analysis would look at all three conditions at the same time: students who are beginners, who know only Python, and who are in these majors. That combined count is not computed here. In the last step, I will further support my point by looking at how valuable students consider the course, alongside the number of students continuing in coding-related fields without a background outside of Python.
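
The combined condition mentioned above (knows only Python, is a beginner, and is in one of these majors) could be sketched as a single filter over the row-oriented data. This is an illustration only: the function name and the sample rows are made up, and the result here says nothing about the real survey.

```python
CODING_MAJORS = {"Computer Science", "Statistics and Analytics", "Mathematics", "Neuroscience"}


def meets_all_three(row: dict[str, str]) -> bool:
    """True if the respondent is a Python-only beginner in a coding-related major."""
    knows_only_python = row["languages"] in ("", "Python")
    is_beginner = row["prior_exp"] == "None to less than one month!"
    in_coding_major = row["primary_major"] in CODING_MAJORS
    return knows_only_python and is_beginner and in_coding_major


# Made-up rows using the survey's column names; the values are illustrative only.
sample_rows: list[dict[str, str]] = [
    {"primary_major": "Computer Science", "prior_exp": "None to less than one month!", "languages": ""},
    {"primary_major": "Psychology", "prior_exp": "None to less than one month!", "languages": "Python"},
    {"primary_major": "Mathematics", "prior_exp": "2-6 months", "languages": "Python, Java"},
    {"primary_major": "Neuroscience", "prior_exp": "None to less than one month!", "languages": "Python"},
]

matches = [row for row in sample_rows if meets_all_three(row)]
print(len(matches) / len(sample_rows))
```

Filtering row by row avoids multiplying the three separate percentages together, which would only be valid if the three conditions were independent.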

In [None]:
tabulate(head(selected_data, 10), selected_data.keys(), "html")

In [31]:
# Find mean of responses to valuable question through mean function
print(mean(selected_data['valuable']))

6.124193548387097
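
Because every column is stored as a list of strings, the `mean` helper has to convert the responses to numbers before averaging. A minimal sketch of that idea follows; `mean_of_strings` is a hypothetical stand-in (the real `mean` from `data_utils_ex08` is not shown), and skipping blank responses is an assumption about how missing answers should be handled.

```python
def mean_of_strings(values: list[str]) -> float:
    """Average a column of numeric strings, skipping blank responses (hypothetical helper)."""
    numbers = [int(value) for value in values if value.strip() != ""]
    if not numbers:
        raise ValueError("no numeric responses to average")
    return sum(numbers) / len(numbers)


# Made-up responses to a rating question; the blank entry is ignored.
sample_ratings = ["7", "6", "", "5", "7"]
print(mean_of_strings(sample_ratings))
```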


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion



# Combining all of these calculations, we can draw 4 major conclusions:

1. Around 67% of the class came into this class with no prior coding experience outside of Python programming. (This could be somewhat overstated, because it treats blank responses as an indication of no other prior coding experience, and the question was optional. As a counter to that, those who did have coding experience would very likely be the ones to answer this question, so I would argue that this is pretty close to accurate.)
2. Around 60% of the class came into this class with less than 1 month of prior coding experience.
3. Approximately 51% of the class are in majors that involve future coding classes.
4. The average response to the `valuable` question was about 6.12, which suggests that students in COMP 110 broadly agree the course will be valuable in the long term.

# What does this tell us?

A good portion of the students in this class are new to coding and know either only Python or no programming language at all. Also, a good portion of the class is in a major where future coding will be done, specifically in languages other than Python. Mathematics uses MATLAB, Neuroscience and Statistics take R classes among other languages, and Computer Science classes seem to use just about everything.

# Final Conclusion
As COMP 110 is an introductory programming course meant to build a solid background in coding, it needs to do a good job preparing students for all languages, even those other than Python. Although I understand that students who want to learn more languages can take other courses to build their coding repertoire, I still believe it would be extremely valuable for COMP 110 to explicitly compare and contrast its core syntax and ideas with those of other major programming languages more often. Given the difficulty of later coding courses, specifically in Computer Science, this could be a good step toward better preparing UNC students for future programming courses.

# Trade-offs, costs, stakeholders
I believe the main trade-off is a reduced ability to focus in depth on Python, since the course would become somewhat broader. However, I am not suggesting too great a focus on other languages, just a few more examples here and there of other languages and how they contrast with Python. Another option would be to post optional examples on the website that follow along with the class and show the same ideas in other languages, such as syntax differences between Python and R, or for..in loops in C++, to reduce the effect on class time. All in all, this would benefit those continuing in future coding classes without affecting the other students very much.