This project uses the Python pandas library in a Jupyter Notebook to analyze the performance of the schools in the district. We are looking for possible correlations between performance and school budget, school size, and school type. There are two deliverables:
- Replacing the ninth-grade reading and math scores at Thomas High School with NaN.
- Repeating the School District Analysis, consisting of:
a. District Summary
b. School Summary
c. Top and Bottom 5 performing schools (based on overall passing rates)
d. Average Math score for each grade level for each school
e. Average Reading score for each grade level for each school
f. Scores by:
i. School spending per student
ii. School Size
iii. School Type
After removing incorrect prefixes and suffixes such as "Mr.", "Dr.", and "PhD" from the student names, our student dataframe looked like this:
In our challenge, we go a step further and use the .loc indexer to replace every score for Thomas High School's 9th-grade students with NaN (Not a Number). The resulting student dataframe looks like this:
Notice that only the scores of 9th-grade students at Thomas High School have been removed. All other scores are unchanged.
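The two preparation steps above can be sketched in pandas as follows. This is a minimal sketch, not the notebook's exact code. The column names (student_name, school_name, grade, reading_score, math_score), the "9th" grade label, and the list of prefixes and suffixes are assumptions, and the sample rows are made up. Setting the scores to NaN with .loc keeps the rows themselves, so those students remain in the dataframe with missing scores.

```python
import re

import numpy as np
import pandas as pd

# Illustrative honorifics/degrees to strip; assumed, not necessarily the notebook's list.
PREFIXES_SUFFIXES = ["Dr. ", "Mr. ", "Ms. ", "Mrs. ", "Miss ", " MD", " DDS", " DVM", " PhD"]


def clean_student_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the listed prefixes/suffixes removed from student_name."""
    pattern = "|".join(re.escape(token) for token in PREFIXES_SUFFIXES)
    cleaned = df.copy()
    cleaned["student_name"] = (
        cleaned["student_name"].str.replace(pattern, "", regex=True).str.strip()
    )
    return cleaned


def blank_thomas_ninth_grade(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Thomas High School 9th-grade scores set to NaN."""
    blanked = df.copy()
    mask = (blanked["school_name"] == "Thomas High School") & (blanked["grade"] == "9th")
    # .loc[row_mask, columns] = value overwrites only those cells; the rows stay.
    blanked.loc[mask, ["reading_score", "math_score"]] = np.nan
    return blanked


# Hypothetical sample rows (floats, so assigning NaN does not change the dtype).
students = pd.DataFrame({
    "student_name": ["Dr. Jane Doe", "John Smith PhD", "Mrs. Ann Lee"],
    "school_name": ["Thomas High School", "Thomas High School", "Huang High School"],
    "grade": ["9th", "10th", "9th"],
    "reading_score": [80.0, 90.0, 85.0],
    "math_score": [75.0, 88.0, 92.0],
})

modified = blank_thomas_ninth_grade(clean_student_names(students))
print(modified)
```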
Now that we have a modified dataset to analyze, let us go through each step of the School District Analysis one more time.
Let us look at the district summary dataframes for both the original and modified analysis:
Notice the subtle differences in the average scores and passing percentages. The changes are small, and they come entirely from removing the 9th-grade scores at Thomas High School.
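The district averages shift only slightly, and the reason shows up in how pandas treats missing values. The sketch below uses four made-up scores and assumes the removed scores are stored as NaN. Series.mean() skips NaN by default (skipna=True). A comparison such as NaN >= 70 evaluates to False, so a blanked score never counts as passing. The write-up does not say whether the passing percentages were divided by all students or only by students who still have a score, so the sketch computes both.

```python
import numpy as np
import pandas as pd

# Hypothetical math scores; NaN marks a removed score.
math = pd.Series([90.0, 65.0, np.nan, 80.0])

# mean() skips NaN by default, so this averages only the three remaining scores.
average_math = math.mean()  # (90 + 65 + 80) / 3

# NaN >= 70 is False, so the removed score is never counted as passing.
passing = (math >= 70).sum()

# The choice of denominator changes the percentage.
pct_of_all_students = passing / len(math) * 100        # divides by 4 rows
pct_of_scored_students = passing / math.count() * 100  # divides by 3 non-NaN scores

print(average_math, pct_of_all_students, pct_of_scored_students)
```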
Next, let us look at the per school summary dataframes for both the original and modified analysis:
Let's take a look at Thomas High School. Its passing percentages drop slightly once more, and this is again caused by the removal of its 9th-grade scores.
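A per-school summary like this one can be built with groupby. This is a minimal sketch with made-up rows, and the column names are assumptions. The averages ignore the blanked scores. The passing rates here divide by every row in the school, so blanked scores count as not passing. That denominator is an assumption, not necessarily the notebook's choice.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the first Thomas High School row stands in for a blanked 9th grader.
students = pd.DataFrame({
    "school_name": ["Thomas High School"] * 3 + ["Huang High School"] * 2,
    "math_score": [np.nan, 88.0, 62.0, 91.0, 70.0],
    "reading_score": [np.nan, 95.0, 74.0, 68.0, 83.0],
})

school = students["school_name"]

per_school_summary = pd.DataFrame({
    # Group means skip NaN within each school.
    "Average Math Score": students.groupby("school_name")["math_score"].mean(),
    "Average Reading Score": students.groupby("school_name")["reading_score"].mean(),
    # Mean of a boolean "passing" column = share of rows passing (NaN >= 70 is False).
    "% Passing Math": (students["math_score"] >= 70).groupby(school).mean() * 100,
    "% Passing Reading": (students["reading_score"] >= 70).groupby(school).mean() * 100,
})
print(per_school_summary)
```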
When we look at the top 5 performing schools:
Since Thomas High School ranks second, its scores and passing percentages differ between the original and modified analyses.
When we look at the bottom 5 performing schools:
Since Thomas High School is not in this list, the bottom 5 dataframe is identical in both analyses.
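Item c above ranks schools by overall passing rate. That ranking comes down to a sort followed by head(5). The sketch below assumes the per-school summary has a "% Overall Passing" column. The school names and rates are placeholders, not results from the analysis.

```python
import pandas as pd

# Placeholder per-school overall passing rates (not real results).
per_school_summary = pd.DataFrame(
    {"% Overall Passing": [95, 90, 55, 85, 60, 80, 50, 70]},
    index=[f"School {letter}" for letter in "ABCDEFGH"],
)

# Top 5: highest overall passing rate first.
top_schools = per_school_summary.sort_values("% Overall Passing", ascending=False).head(5)

# Bottom 5: lowest overall passing rate first.
bottom_schools = per_school_summary.sort_values("% Overall Passing").head(5)

print(top_schools)
print(bottom_schools)
```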
- Average Math Score by Grade (per School)
Let us look at the average math scores per grade:
Notice that the 9th grade math average for Thomas High School is NaN in the new dataframe.
- Average Reading Score by Grade (per School)
Let us look at the average reading scores per grade:
Notice that the 9th grade reading average for Thomas High School is NaN in the new dataframe.
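Both per-grade tables can be produced by grouping on school and grade, then unstacking the grades into columns. The sketch below assumes grade labels like "9th" and "10th" and uses made-up rows. It also shows why Thomas High School's 9th-grade cell becomes NaN: every score in that group is missing, and the mean of an all-NaN group is NaN.

```python
import numpy as np
import pandas as pd

# Column order for display; add "11th" and "12th" for the full dataset.
GRADE_ORDER = ["9th", "10th"]

# Hypothetical rows: both Thomas High School 9th graders have blanked scores.
students = pd.DataFrame({
    "school_name": ["Thomas High School"] * 3 + ["Huang High School"] * 2,
    "grade": ["9th", "9th", "10th", "9th", "10th"],
    "math_score": [np.nan, np.nan, 84.0, 71.0, 90.0],
    "reading_score": [np.nan, np.nan, 88.0, 75.0, 93.0],
})


def average_by_grade(df: pd.DataFrame, score_column: str) -> pd.DataFrame:
    """Rows = schools, columns = grades, values = mean score (NaN if no scores)."""
    return (
        df.groupby(["school_name", "grade"])[score_column]
        .mean()
        .unstack("grade")
        # Grade labels sort as strings ("10th" before "9th"), so reorder the columns.
        .reindex(columns=GRADE_ORDER)
    )


math_by_grade = average_by_grade(students, "math_score")
reading_by_grade = average_by_grade(students, "reading_score")
print(math_by_grade)
print(reading_by_grade)
```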
Let us analyze the school performance when we take the budget per student into account:
Since Thomas High School falls in the $630-$645 spending range, we notice slight changes in the average scores for that range.
More importantly, as the budget per student increases, performance actually decreases. From our analysis, we can infer that a higher budget does not, by itself, lead to better school performance.
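Spending ranges like these can be produced with pd.cut applied to the budget per student. This is a sketch under stated assumptions. The write-up names only the $630-$645 range, so the other bin edges, every label, and all the column names are assumptions, and the budgets, sizes, and scores are placeholders. By default pd.cut uses right-closed intervals, so a school spending $638 per student lands in (630, 645].

```python
import pandas as pd

# Hypothetical per-school data; all values are placeholders.
per_school = pd.DataFrame(
    {
        "Total School Budget": [578_000.0, 1_202_000.0, 957_000.0, 2_608_000.0],
        "Total Students": [1000, 2000, 1500, 4000],
        "Average Math Score": [81.0, 79.0, 83.0, 77.0],
    },
    index=["School A", "School B", "School C", "School D"],
)

# Budget per student = total budget / number of students.
per_school["Per Student Budget"] = per_school["Total School Budget"] / per_school["Total Students"]

# Assumed bin edges and labels (only $630-$645 comes from the write-up).
SPENDING_BINS = [0, 585, 630, 645, 675]
SPENDING_LABELS = ["<$585", "$585-$630", "$630-$645", "$645-$675"]

per_school["Spending Range (Per Student)"] = pd.cut(
    per_school["Per Student Budget"], bins=SPENDING_BINS, labels=SPENDING_LABELS
)

# Average score within each spending range; observed=False keeps empty bins.
spending_summary = per_school.groupby("Spending Range (Per Student)", observed=False)[
    ["Average Math Score"]
].mean()
print(spending_summary)
```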
Now let us analyze school performance based on school size:
We can see that there are some minor changes between the initial and new dataframes.
As school size increases, there is no substantial difference in performance between the small and medium bins. This could be because those two bins are closer together in range.
However, compared with the large bin, overall performance drops noticeably. From our analysis, we can infer that larger schools are more likely to perform worse.
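The sketch below bins schools by size and then compares the original and modified summaries by subtracting one from the other. Only the bin that contains the edited school changes. The bin edges (1000 and 2000 students), the labels, and the column names are assumptions, because the write-up names only small, medium, and large bins. The rows are hypothetical, and "School C" stands in for the school whose scores were edited.

```python
import pandas as pd

# Assumed bin edges and labels; the write-up names only small, medium, and large bins.
SIZE_BINS = [0, 1000, 2000, 5000]
SIZE_LABELS = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]


def summarize_by_size(per_school: pd.DataFrame) -> pd.DataFrame:
    """Average the score column within each school-size bin."""
    binned = per_school.assign(
        school_size=pd.cut(per_school["Total Students"], bins=SIZE_BINS, labels=SIZE_LABELS)
    )
    # observed=False keeps every bin in the result, even an empty one.
    return binned.groupby("school_size", observed=False)[["Average Math Score"]].mean()


# Hypothetical per-school summaries (placeholder values, not real results).
original = pd.DataFrame(
    {"Total Students": [450, 1800, 1500, 4000], "Average Math Score": [83.0, 82.0, 84.0, 77.0]},
    index=["School A", "School B", "School C", "School D"],
)
# School C plays the role of the edited school in the modified analysis.
modified = original.copy()
modified.loc["School C", "Average Math Score"] = 83.0

# Element-wise difference of the two binned summaries.
diff = summarize_by_size(modified) - summarize_by_size(original)
print(diff)
```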
Finally, let us analyze school performance based on school type:
Once again, since Thomas High School is a Charter school, there are minute changes between the initial and new dataframes. However, the dataframes clearly show that Charter schools perform better than District schools.
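Grouping by school type can be sketched as a single groupby with named aggregation. The "School Type" column name and the metric names are assumptions, and the values are placeholders.

```python
import pandas as pd

# Hypothetical per-school rows; "Charter"/"District" follow the write-up, values are placeholders.
per_school = pd.DataFrame({
    "School Type": ["Charter", "District", "Charter", "District"],
    "Average Math Score": [80.0, 70.0, 90.0, 76.0],
    "% Overall Passing": [85.0, 60.0, 91.0, 58.0],
})

# One row per school type: number of schools and the mean of each per-school metric.
type_summary = per_school.groupby("School Type").agg(
    schools=("Average Math Score", "count"),
    avg_math=("Average Math Score", "mean"),
    overall_passing=("% Overall Passing", "mean"),
)
print(type_summary)
```

This averages the per-school figures, so each school counts equally regardless of its size. Weighting by the number of students would give a different result.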
- When we removed the 9th-grade scores at Thomas High School, we removed 1174 student grades out of a total of 39170, roughly 3% of the dataset. This may explain why we do not see substantial differences between the original and modified dataframes.
- We also noticed that higher budgets per student coincided with lower performance. This can be helpful when making budget plans for the new year.
- Larger schools have performed worse than smaller ones.
- Lastly, Charter schools perform better than District schools.
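The roughly 3% share quoted in the first point follows directly from the two counts given there:

```python
# Counts taken from the write-up: removed grades vs. total student grades.
removed_records = 1174
total_records = 39170

share_removed = removed_records / total_records * 100
print(f"{share_removed:.1f}% of records removed")  # → 3.0% of records removed
```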
You can find the original analysis here:
original_analysis
You can also find the modified analysis here:
modified_analysis