For this analysis, Maria asked me to help her analyze school district data. The goal was to extract meaningful information that would help the school board make informed decisions about allocating funds. The data covered 39,170 students and 15 schools. I used the Pandas library in Python to perform the analysis and calculations. I generated a school district summary, the highest- and lowest-performing schools, average math and reading scores by grade, and scores grouped by school spending per student and by school size. Later, the school board informed Maria that there was evidence of academic dishonesty among the 9th grade students of Thomas High School (THS). I was therefore asked to replace all math and reading scores for those students with NaNs. After doing that, I repeated all of the calculations listed above. Finally, I was asked to write a report on how these changes affected the overall analysis.
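The replacement step described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code used in the analysis: the DataFrame name `student_df`, the column names, and the sample values are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the student table; the column names and
# values are assumptions for illustration, not the real district data.
student_df = pd.DataFrame({
    "school_name": ["Thomas High School"] * 3 + ["Huang High School"],
    "grade": ["9th", "10th", "9th", "9th"],
    "math_score": [90.0, 85.0, 70.0, 66.0],
    "reading_score": [88.0, 92.0, 75.0, 94.0],
})

# Select only the ninth graders at Thomas High School.
ths_ninth = (
    (student_df["school_name"] == "Thomas High School")
    & (student_df["grade"] == "9th")
)

# Overwrite both score columns with NaN for those rows.
student_df.loc[ths_ninth, ["math_score", "reading_score"]] = np.nan

# mean() skips NaN by default, so only the remaining scores are averaged.
avg_math_new = student_df["math_score"].mean()
```

Because pandas skips NaN values when averaging, the affected students drop out of the score averages without any rows being deleted from the table.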
- The district summary tables below show that the Average Math Score and the passing percentages decreased after removing the Thomas High School ninth graders. The decrease was very small, but it was present.
- For the school summary tables, all values decreased except the Average Reading Score, which increased slightly. Please see the images below.
- The images below show that THS ranks second in both cases, so its performance relative to the other schools did not change after removing the ninth graders.
- The only change in the math and reading scores by grade is that the THS ninth grade values are now NaN.
- For the spending ranges, only a very small difference appears, and only when the columns are unformatted. After formatting the columns, both tables show the same values. This can be seen in the two images below:
- The scores by school size and school type show a similar result, as seen below:
To differentiate between the two sets of results, I added 'new' to the names of the tables from which the THS ninth grade students were removed.
The following four changes can be clearly seen from the analysis:
- For district_summary_df, Average Math Score, % Passing Math, % Passing Reading and % Overall Passing decreased slightly, while Average Reading Score remained the same.
- For the school summary tables, Average Math Score, % Passing Math, % Passing Reading and % Overall Passing decreased, while Average Reading Score increased slightly.
- The one change in the math and reading scores by grade is that the ninth grade values for THS have been replaced by NaNs.
- For the spending ranges, only a very small difference appears, and only when the columns are unformatted. After formatting the columns, both tables show the same values.
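The formatting effect noted for the spending ranges can be illustrated with a small sketch. Everything here is assumed for demonstration: the per-school figures, the bin edges and labels, the one-decimal format, and the column names (`avg_math` for the original table, `avg_math_new` for the table with THS ninth graders removed).

```python
import pandas as pd

# Hypothetical per-school values; the second row stands in for a school
# whose average shifts slightly once some scores are replaced with NaN.
per_school = pd.DataFrame({
    "per_student_budget": [578.0, 638.0, 639.0, 655.0],
    "avg_math": [83.00, 83.42, 76.80, 77.00],
    "avg_math_new": [83.00, 83.40, 76.80, 77.00],
})

# Assumed bin edges and labels for the spending ranges.
bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]
per_school["spending_range"] = pd.cut(
    per_school["per_student_budget"], bins=bins, labels=labels
)

# Average each score column within its spending range,
# dropping ranges that contain no schools.
by_spending = per_school.groupby("spending_range", observed=True)[
    ["avg_math", "avg_math_new"]
].mean()

# Unformatted values differ slightly between the two tables.
raw_differs = by_spending["avg_math"].tolist() != by_spending["avg_math_new"].tolist()

# After formatting to one decimal place, the two columns look identical.
original_fmt = by_spending["avg_math"].map("{:.1f}".format).tolist()
new_fmt = by_spending["avg_math_new"].map("{:.1f}".format).tolist()
formatted_same = original_fmt == new_fmt
```

In this sketch the raw group averages differ in the second decimal place, but both columns render the same once formatted, which matches the behavior described for the spending tables.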