# PyCitySchools

## Objective

To analyse the district-wide standardised test results for the math and reading units by aggregating the data and showcasing obvious trends in school performance.

The math and reading scores of every student were provided, along with information about the schools they attend.

The following report includes: 

- District Summary
- School Summary
- Top Performing Schools (By % Overall Passing)
- Bottom Performing Schools (By % Overall Passing)
- Math Scores by Grade
- Reading Scores by Grade
- Scores by School Spending
- Scores by School Size
- Scores by School Type

NOTE: This assignment covers the pandas library. The tasks presented here demonstrate part of my skills and knowledge of Python and pandas, and cover (but are not limited to):

- Reading CSV files.
- Printing the first and last rows of a DataFrame.
- Creating DataFrames.
- Math and statistical operations with Series and DataFrames.
- The `groupby` and `set_index` functions.
- Renaming columns.
- Merging, concatenating and dropping.
- Filtering and binning.
- Copying and formatting.

```python
# Importing the pandas library
import pandas as pd
```

```python
# Loading the data and storing it in pandas DataFrames
school_df = pd.read_csv("Resources/schools_complete.csv")
student_df = pd.read_csv("Resources/students_complete.csv")

# Printing the first 5 rows of both DataFrames
print("--------------- School Dataframe ---------------")
print(school_df.head())
print("--------------- Student Dataframe ---------------")
print(student_df.head())
```

```
--------------- School Dataframe ---------------
   School ID            school_name      type  size   budget
0          0      Huang High School  District  2917  1910635
1          1   Figueroa High School  District  2949  1884411
2          2    Shelton High School   Charter  1761  1056600
3          3  Hernandez High School  District  4635  3022020
4          4    Griffin High School   Charter  1468   917500
--------------- Student Dataframe ---------------
   Student ID       student_name gender grade        school_name  \
0           0       Paul Bradley      M   9th  Huang High School   
1           1       Victor Smith      M  12th  Huang High School   
2           2    Kevin Rodriguez      M  12th  Huang High School   
3           3  Dr. Richard Scott      M  12th  Huang High School   
4           4         Bonnie Ray      F   9th  Huang High School   

   reading_score  math_score  
0             66          79  
1             94          61  
2             90          60  
3 
```

## District Summary

```python
# Assembling the District Summary dataframe
district_summary = pd.DataFrame({
    "Total Schools": [school_df["school_name"].count()],
    "Total Students": [school_df["size"].sum()],
    "Total Budget": [school_df['budget'].sum()],
    "Average Math Score": [student_df['math_score'].mean()],
    "Average Reading Score": [student_df['reading_score'].mean()],
    "% Passing Math": [student_df["Student ID"][student_df["math_score"] >= 70].count()/student_df["Student ID"].count()*100],
    "% Passing Reading": [student_df["Student ID"][student_df["reading_score"] >= 70].count()/student_df["Student ID"].count()*100],
    "% Overall Passing": [student_df["Student ID"][(student_df["reading_score"] >= 70) & (student_df["math_score"] >= 70)].count()/student_df["Student ID"].count()*100]
})

# Applying format
district_summary_format = district_summary.copy()
district_summary_format["Total Budget"] = district_summary_format["Total Budget"].map("${:,.2f}".format)
district_summary_format["Average Math Score"] = district_summary_format["Average Math Score"].map("{:,.3f}".format)
district_summary_format["Average Reading Score"] = district_summary_format["Average Reading Score"].map("{:,.3f}".format)
district_summary_format["% Passing Math"] = district_summary_format["% Passing Math"].map("{:,.2f}%".format)
district_summary_format["% Passing Reading"] = district_summary_format["% Passing Reading"].map("{:,.2f}%".format)
district_summary_format["% Overall Passing"] = district_summary_format["% Overall Passing"].map("{:,.2f}%".format)
district_summary_format
```

| | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | $24,649,428.00 | 78.985 | 81.878 | 74.98% | 85.81% | 65.17% |
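
The `% Passing` figures above divide a filtered count by a total count. An equivalent and shorter formulation takes the mean of a boolean mask, because `True` counts as 1 and `False` as 0. This is a sketch only. It uses a small made-up DataFrame in place of `students_complete.csv`, with the same `math_score`/`reading_score` column names and the 70-point pass mark used in this notebook.

```python
import pandas as pd

# Made-up scores standing in for student_df (same column names, not real data)
students = pd.DataFrame({
    "math_score": [79, 61, 60, 90, 70],
    "reading_score": [66, 94, 90, 85, 69],
})

PASS_MARK = 70  # pass threshold used throughout the notebook

passed_math = students["math_score"] >= PASS_MARK
passed_reading = students["reading_score"] >= PASS_MARK

# The mean of a boolean Series is the fraction of True values
pct_math = passed_math.mean() * 100
pct_reading = passed_reading.mean() * 100
pct_overall = (passed_math & passed_reading).mean() * 100

print(f"{pct_math:.2f}% {pct_reading:.2f}% {pct_overall:.2f}%")
```

This approach does not need a separate `Student ID` column for counting.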


## School Summary

```python
# Putting together the School Summary

# Calculating the average reading and math score per school
# (only the score columns are averaged: from pandas 2.0, .mean() raises a TypeError on text columns such as student_name)
student_school_df = student_df.groupby("school_name")[["reading_score", "math_score"]].mean()
student_school_df = student_school_df.rename(columns={
    "reading_score": "Average Reading Score",
    "math_score": "Average Math Score"
})

# Calculating % passing math, % passing reading and % overall passing by counting the students who passed at each school
student_school_df["% Passing Math"] = student_df["school_name"][student_df["math_score"] >= 70].value_counts()/student_df["school_name"].value_counts()*100
student_school_df["% Passing Reading"] = student_df["school_name"][student_df["reading_score"] >= 70].value_counts()/student_df["school_name"].value_counts()*100
student_school_df["% Overall Passing"] = student_df["school_name"][(student_df["math_score"] >= 70) & (student_df["reading_score"] >= 70)].value_counts()/student_df["school_name"].value_counts()*100

# Assembling the School Summary with information from the school dataframe
school_summary = pd.DataFrame({
    "School Name": school_df["school_name"],
    "School Type": school_df["type"],
    "Total Students": school_df["size"],
    "Total School Budget": school_df["budget"],
    "Per Student Budget": school_df["budget"]/school_df["size"]
})

# Merging the per-school averages and pass rates (indexed by school_name) into the School Summary
school_summary = school_summary.merge(student_school_df, left_on="School Name", right_index=True, how="outer")

# Setting the index to School Name, sorting alphabetically and ordering the columns
school_summary = school_summary.set_index("School Name").sort_index()
school_summary = school_summary[["School Type", "Total Students", "Total School Budget", "Per Student Budget", "Average Math Score",
                                 "Average Reading Score", "% Passing Math", "% Passing Reading", "% Overall Passing"]]

# Applying format
school_summary_format = school_summary.copy()
school_summary_format["Total School Budget"] = school_summary_format["Total School Budget"].map("${:,.2f}".format)
school_summary_format["Per Student Budget"] = school_summary_format["Per Student Budget"].map("${:,.2f}".format)
school_summary_format["Average Math Score"] = school_summary_format["Average Math Score"].map("{:,.3f}".format)
school_summary_format["Average Reading Score"] = school_summary_format["Average Reading Score"].map("{:,.3f}".format)
school_summary_format["% Passing Math"] = school_summary_format["% Passing Math"].map("{:,.2f}%".format)
school_summary_format["% Passing Reading"] = school_summary_format["% Passing Reading"].map("{:,.2f}%".format)
school_summary_format["% Overall Passing"] = school_summary_format["% Overall Passing"].map("{:,.2f}%".format)
school_summary_format
```


| School Name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Bailey High School | District | 4976 | $3,124,928.00 | $628.00 | 77.048 | 81.034 | 66.68% | 81.93% | 54.64% |
| Cabrera High School | Charter | 1858 | $1,081,356.00 | $582.00 | 83.062 | 83.976 | 94.13% | 97.04% | 91.33% |
| Figueroa High School | District | 2949 | $1,884,411.00 | $639.00 | 76.712 | 81.158 | 65.99% | 80.74% | 53.20% |
| Ford High School | District | 2739 | $1,763,916.00 | $644.00 | 77.103 | 80.746 | 68.31% | 79.30% | 54.29% |
| Griffin High School | Charter | 1468 | $917,500.00 | $625.00 | 83.351 | 83.817 | 93.39% | 97.14% | 90.60% |
| Hernandez High School | District | 4635 | $3,022,020.00 | $652.00 | 77.29 | 80.934 | 66.75% | 80.86% | 53.53% |
| Holden High School | Charter | 427 | $248,087.00 | $581.00 | 83.803 | 83.815 | 92.51% | 96.25% | 89.23% |
| Huang High School | District | 2917 | $1,910,635.00 | $655.00 | 76.629 | 81.183 | 65.68% | 81.32% | 53.51% |
| Johnson High School | District | 4761 | $3,094,650.00 | $650.00 | 77.072 | 80.966 | 66.06% | 81.22% | 53.54% |
| Pena High School | Charter | 962 | $585,858.00 | $609.00 | 83.84 | 84.045 | 94.59% | 95.95% | 90.54% |
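
The per-school percentages above divide two `value_counts()` results, and pandas aligns them on the school name. A possible alternative is to add boolean pass columns and let `groupby(...).mean()` compute every rate at once. This sketch uses a toy DataFrame with two hypothetical schools, `A` and `B`, that are not part of the real data. One difference from the `value_counts` approach: a school where nobody passes gets 0 here, whereas the division of counts gives NaN for that school.

```python
import pandas as pd

# Toy data standing in for student_df: two hypothetical schools
students = pd.DataFrame({
    "school_name": ["A", "A", "A", "B", "B"],
    "math_score": [80, 65, 90, 50, 60],
    "reading_score": [75, 85, 60, 55, 95],
})

PASS_MARK = 70

# One boolean column per pass criterion
flags = pd.DataFrame({
    "school_name": students["school_name"],
    "% Passing Math": students["math_score"] >= PASS_MARK,
    "% Passing Reading": students["reading_score"] >= PASS_MARK,
})
flags["% Overall Passing"] = flags["% Passing Math"] & flags["% Passing Reading"]

# The mean of each boolean column per school is the share of students who passed
pass_rates = flags.groupby("school_name").mean() * 100
print(pass_rates.round(2))
```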


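```python
# Top and Bottom Performing Schools (By % Overall Passing)
# Ranking on the numeric column of school_summary, then reordering the formatted copy to match,
# because sorting the formatted strings (e.g. "91.33%") would compare text rather than numbers
ranked_schools = school_summary.sort_values(by="% Overall Passing", ascending=False).index
school_summary_format = school_summary_format.loc[ranked_schools]
```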
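
Sorting the formatted `% Overall Passing` strings compares characters, not magnitudes. In this notebook's data every value has two digits before the decimal point, so the text order happens to match the numeric order. The two orders diverge as soon as the widths differ. Here is a small illustration with made-up values:

```python
import pandas as pd

# Made-up pass rates, once as numbers and once as formatted strings
rates = pd.Series([100.0, 91.33, 9.5])
labels = rates.map("{:,.2f}%".format)  # "100.00%", "91.33%", "9.50%"

# Text comparison: "100.00%" starts with "1", so it sorts below both strings starting with "9"
text_order = labels.sort_values(ascending=False).tolist()

# Numeric comparison gives the intended ranking, formatted afterwards
numeric_order = rates.sort_values(ascending=False).map("{:,.2f}%".format).tolist()

print(text_order)
print(numeric_order)
```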

## Top Performing Schools

```python
# Top 5 schools
school_summary_format.head(5)
```

| School Name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Cabrera High School | Charter | 1858 | $1,081,356.00 | $582.00 | 83.062 | 83.976 | 94.13% | 97.04% | 91.33% |
| Thomas High School | Charter | 1635 | $1,043,130.00 | $638.00 | 83.418 | 83.849 | 93.27% | 97.31% | 90.95% |
| Griffin High School | Charter | 1468 | $917,500.00 | $625.00 | 83.351 | 83.817 | 93.39% | 97.14% | 90.60% |
| Wilson High School | Charter | 2283 | $1,319,574.00 | $578.00 | 83.274 | 83.989 | 93.87% | 96.54% | 90.58% |
| Pena High School | Charter | 962 | $585,858.00 | $609.00 | 83.84 | 84.045 | 94.59% | 95.95% | 90.54% |


## Bottom Performing Schools

```python
# Bottom 5 schools
school_summary_format.tail(5)
```

| School Name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Johnson High School | District | 4761 | $3,094,650.00 | $650.00 | 77.072 | 80.966 | 66.06% | 81.22% | 53.54% |
| Hernandez High School | District | 4635 | $3,022,020.00 | $652.00 | 77.29 | 80.934 | 66.75% | 80.86% | 53.53% |
| Huang High School | District | 2917 | $1,910,635.00 | $655.00 | 76.629 | 81.183 | 65.68% | 81.32% | 53.51% |
| Figueroa High School | District | 2949 | $1,884,411.00 | $639.00 | 76.712 | 81.158 | 65.99% | 80.74% | 53.20% |
| Rodriguez High School | District | 3999 | $2,547,363.00 | $637.00 | 76.843 | 80.745 | 66.37% | 80.22% | 52.99% |
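
`DataFrame.nlargest` and `DataFrame.nsmallest` offer another way to get the top or bottom *n* rows by a numeric column, without sorting the whole table and slicing it with `head`/`tail`. This sketch runs on a hypothetical three-school summary whose names and numbers are placeholders. It would apply to the numeric table before any string formatting.

```python
import pandas as pd

# Hypothetical numeric summary indexed by school name (placeholder values)
summary = pd.DataFrame(
    {"% Overall Passing": [91.3, 53.2, 89.2]},
    index=pd.Index(["School X", "School Y", "School Z"], name="School Name"),
)

top_2 = summary.nlargest(2, "% Overall Passing")      # highest rates, best first
bottom_2 = summary.nsmallest(2, "% Overall Passing")  # lowest rates, worst first

print(top_2)
print(bottom_2)
```

The ordering differs slightly from the approach above. `nsmallest` lists the worst school first, whereas `tail(5)` of a descending sort lists it last.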


## Reading Scores by Grade

```python
# Reading Scores by Grade

# Splitting the students by grade
ninth_g = student_df[student_df["grade"] == "9th"]
tenth_g = student_df[student_df["grade"] == "10th"]
eleventh_g = student_df[student_df["grade"] == "11th"]
twelfth_g = student_df[student_df["grade"] == "12th"]

# Averaging the scores per school for each grade
# (only the score columns are averaged: from pandas 2.0, .mean() raises a TypeError on text columns)
ninth_g = ninth_g.groupby("school_name")[["reading_score", "math_score"]].mean()
tenth_g = tenth_g.groupby("school_name")[["reading_score", "math_score"]].mean()
eleventh_g = eleventh_g.groupby("school_name")[["reading_score", "math_score"]].mean()
twelfth_g = twelfth_g.groupby("school_name")[["reading_score", "math_score"]].mean()

# Extracting the reading averages and naming each Series after its grade
ninth_reading = ninth_g["reading_score"].rename("9th")
tenth_reading = tenth_g["reading_score"].rename("10th")
eleventh_reading = eleventh_g["reading_score"].rename("11th")
twelfth_reading = twelfth_g["reading_score"].rename("12th")

# Combining the four grades side by side (one column per grade)
reading_scores_grade = pd.concat([ninth_reading, tenth_reading, eleventh_reading, twelfth_reading], axis=1)

# Applying format
reading_scores_grade_format = reading_scores_grade.copy()
reading_scores_grade_format["9th"] = reading_scores_grade_format["9th"].map("{:,.3f}".format)
reading_scores_grade_format["10th"] = reading_scores_grade_format["10th"].map("{:,.3f}".format)
reading_scores_grade_format["11th"] = reading_scores_grade_format["11th"].map("{:,.3f}".format)
reading_scores_grade_format["12th"] = reading_scores_grade_format["12th"].map("{:,.3f}".format)
reading_scores_grade_format
```

| school_name | 9th | 10th | 11th | 12th |
|---|---|---|---|---|
| Bailey High School | 81.303 | 80.907 | 80.946 | 80.912 |
| Cabrera High School | 83.676 | 84.253 | 83.788 | 84.288 |
| Figueroa High School | 81.199 | 81.409 | 80.64 | 81.385 |
| Ford High School | 80.633 | 81.263 | 80.404 | 80.662 |
| Griffin High School | 83.369 | 83.707 | 84.288 | 84.014 |
| Hernandez High School | 80.867 | 80.66 | 81.396 | 80.857 |
| Holden High School | 83.677 | 83.325 | 83.816 | 84.699 |
| Huang High School | 81.29 | 81.512 | 81.417 | 80.306 |
| Johnson High School | 81.261 | 80.773 | 80.616 | 81.228 |
| Pena High School | 83.807 | 83.612 | 84.336 | 84.591 |
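
The four filter-then-groupby steps could also be written as a single `pivot_table` call, with schools as rows and grades as columns. This sketch uses made-up data rather than the real student file. `reindex` puts the grade columns back in school-year order, since `pivot_table` would otherwise sort them as strings ("10th", "11th", "12th", "9th").

```python
import pandas as pd

# Toy data standing in for student_df (made-up schools and scores)
students = pd.DataFrame({
    "school_name": ["A", "A", "A", "B", "B", "B"],
    "grade": ["9th", "10th", "9th", "11th", "12th", "9th"],
    "reading_score": [80, 70, 90, 85, 75, 65],
})

grade_order = ["9th", "10th", "11th", "12th"]

# Rows: schools; columns: grades; values: mean reading score (NaN where a school has no students in a grade)
reading_by_grade = (
    students.pivot_table(index="school_name", columns="grade",
                         values="reading_score", aggfunc="mean")
    .reindex(columns=grade_order)
)
print(reading_by_grade)
```

Changing `values="reading_score"` to `values="math_score"` would give the math table in the same shape.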


## Math Scores by Grade

```python
# Math Scores by Grade, reusing the per-grade school averages computed for the reading table
ninth_math = ninth_g["math_score"].rename("9th")
tenth_math = tenth_g["math_score"].rename("10th")
eleventh_math = eleventh_g["math_score"].rename("11th")
twelfth_math = twelfth_g["math_score"].rename("12th")

# Combining the four grades side by side (one column per grade)
math_scores_grade = pd.concat([ninth_math, tenth_math, eleventh_math, twelfth_math], axis=1)

# Applying format
math_scores_grade_format = math_scores_grade.copy()
math_scores_grade_format["9th"] = math_scores_grade_format["9th"].map("{:,.3f}".format)
math_scores_grade_format["10th"] = math_scores_grade_format["10th"].map("{:,.3f}".format)
math_scores_grade_format["11th"] = math_scores_grade_format["11th"].map("{:,.3f}".format)
math_scores_grade_format["12th"] = math_scores_grade_format["12th"].map("{:,.3f}".format)
math_scores_grade_format
```

| school_name | 9th | 10th | 11th | 12th |
|---|---|---|---|---|
| Bailey High School | 77.084 | 76.997 | 77.516 | 76.492 |
| Cabrera High School | 83.095 | 83.155 | 82.766 | 83.277 |
| Figueroa High School | 76.403 | 76.54 | 76.884 | 77.151 |
| Ford High School | 77.361 | 77.672 | 76.918 | 76.18 |
| Griffin High School | 82.044 | 84.229 | 83.842 | 83.356 |
| Hernandez High School | 77.438 | 77.337 | 77.136 | 77.187 |
| Holden High School | 83.787 | 83.43 | 85.0 | 82.855 |
| Huang High School | 77.027 | 75.909 | 76.447 | 77.226 |
| Johnson High School | 77.188 | 76.691 | 77.492 | 76.863 |
| Pena High School | 83.625 | 83.372 | 84.328 | 84.122 |


## Scores by School Spending

### Observation 1:
Schools with a lower budget per student performed better in math and reading than schools with a higher budget per student. The lowest spending range (<$584) has an average overall passing rate of 90.37%, while the highest range ($645-675) has 53.53%.

```python
# Creating bins based on Spending Ranges (Per Student)
# pd.cut intervals are right-inclusive: (0, 584], (584, 629], (629, 644], (644, 675]
bins_stu = [0, 584, 629, 644, 675]
bins_labels_stu = ["<$584", "$585-629", "$630-644", "$645-675"]

scores_budget = school_summary.copy()
scores_budget["Spending Ranges (Per Student)"] = pd.cut(scores_budget["Per Student Budget"], bins_stu, labels=bins_labels_stu)

# Averaging the numeric columns per spending range
# (numeric_only skips "School Type"; observed=False keeps every range, including empty ones)
scores_budget_grouped = (
    scores_budget.groupby("Spending Ranges (Per Student)", observed=False)
    .mean(numeric_only=True)
    .drop(columns=["Total Students", "Total School Budget", "Per Student Budget"])
)

# Applying format
scores_budget_grouped_format = scores_budget_grouped.copy()
scores_budget_grouped_format["Average Math Score"] = scores_budget_grouped_format["Average Math Score"].map("{:,.3f}".format)
scores_budget_grouped_format["Average Reading Score"] = scores_budget_grouped_format["Average Reading Score"].map("{:,.3f}".format)
scores_budget_grouped_format["% Passing Math"] = scores_budget_grouped_format["% Passing Math"].map("{:,.2f}%".format)
scores_budget_grouped_format["% Passing Reading"] = scores_budget_grouped_format["% Passing Reading"].map("{:,.2f}%".format)
scores_budget_grouped_format["% Overall Passing"] = scores_budget_grouped_format["% Overall Passing"].map("{:,.2f}%".format)
scores_budget_grouped_format
```

| Spending Ranges (Per Student) | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| <$584 | 83.455 | 83.934 | 93.46% | 96.61% | 90.37% |
| $585-629 | 81.9 | 83.155 | 87.13% | 92.72% | 81.42% |
| $630-644 | 78.519 | 81.624 | 73.48% | 84.39% | 62.86% |
| $645-675 | 76.997 | 81.028 | 66.16% | 81.13% | 53.53% |
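
`pd.cut` builds right-inclusive intervals by default (`right=True`). With the edges `[0, 584, 629, 644, 675]`, a school spending exactly $584 per student lands in the first bin, so the `<$584` label includes $584 itself, and a school spending $585 lands in the second. Here is a quick check with made-up per-student budgets placed on and just past each edge:

```python
import pandas as pd

bins_stu = [0, 584, 629, 644, 675]
bins_labels_stu = ["<$584", "$585-629", "$630-644", "$645-675"]

# Made-up per-student budgets sitting on and just past the bin edges
budgets = pd.Series([584, 585, 629, 630, 644, 645])
ranges = pd.cut(budgets, bins_stu, labels=bins_labels_stu)

print(pd.DataFrame({"Per Student Budget": budgets, "Spending Range": ranges}))
```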


## Scores by School Size

### Observation 2:
Small and medium schools performed better in math and reading than large schools. Their average overall passing rates are 89.88% (small) and 90.62% (medium), compared with 58.29% for large schools.

```python
# Creating bins based on School Size
bins_sch = [0, 1000, 2000, 5000]
bins_labels_sch = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

scores_size = school_summary.copy()
scores_size["School Size"] = pd.cut(scores_size["Total Students"], bins_sch, labels=bins_labels_sch)

# Averaging the numeric columns per size bin
# (numeric_only skips "School Type"; observed=False keeps every bin, including empty ones)
scores_size_grouped = (
    scores_size.groupby("School Size", observed=False)
    .mean(numeric_only=True)
    .drop(columns=["Total Students", "Total School Budget", "Per Student Budget"])
)

# Applying format
scores_size_grouped_format = scores_size_grouped.copy()
scores_size_grouped_format["Average Math Score"] = scores_size_grouped_format["Average Math Score"].map("{:,.3f}".format)
scores_size_grouped_format["Average Reading Score"] = scores_size_grouped_format["Average Reading Score"].map("{:,.3f}".format)
scores_size_grouped_format["% Passing Math"] = scores_size_grouped_format["% Passing Math"].map("{:,.2f}%".format)
scores_size_grouped_format["% Passing Reading"] = scores_size_grouped_format["% Passing Reading"].map("{:,.2f}%".format)
scores_size_grouped_format["% Overall Passing"] = scores_size_grouped_format["% Overall Passing"].map("{:,.2f}%".format)
scores_size_grouped_format
```

| School Size | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| Small (<1000) | 83.822 | 83.93 | 93.55% | 96.10% | 89.88% |
| Medium (1000-2000) | 83.375 | 83.864 | 93.60% | 96.79% | 90.62% |
| Large (2000-5000) | 77.746 | 81.344 | 69.96% | 82.77% | 58.29% |
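
The grouped rows above are plain (unweighted) means of the per-school figures. Within its bin, a school with a few hundred students therefore counts as much as one with several thousand. If a student-weighted rate were wanted instead, one option is `numpy.average` with the school sizes as weights. This sketch uses two hypothetical schools with made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical per-school results within one size bin (made-up numbers)
schools = pd.DataFrame({
    "Total Students": [400, 1600],
    "% Overall Passing": [90.0, 60.0],
})

# Unweighted: each school counts once (what groupby(...).mean() computes)
unweighted = schools["% Overall Passing"].mean()

# Weighted: each school counts in proportion to its number of students
weighted = np.average(schools["% Overall Passing"], weights=schools["Total Students"])

print(unweighted, weighted)
```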


## Scores by School Type

### Observation 3:
Overall, charter schools scored higher than district schools in both math and reading. Their average overall passing rate is 90.43%, compared with 53.67% for district schools.

```python
# School Type
# Grouping directly on the "School Type" column and averaging the numeric columns
scores_type_grouped = (
    school_summary.groupby("School Type")
    .mean(numeric_only=True)
    .drop(columns=["Total Students", "Total School Budget", "Per Student Budget"])
)

# Applying format
scores_type_grouped_format = scores_type_grouped.copy()
scores_type_grouped_format["Average Math Score"] = scores_type_grouped_format["Average Math Score"].map("{:,.3f}".format)
scores_type_grouped_format["Average Reading Score"] = scores_type_grouped_format["Average Reading Score"].map("{:,.3f}".format)
scores_type_grouped_format["% Passing Math"] = scores_type_grouped_format["% Passing Math"].map("{:,.2f}%".format)
scores_type_grouped_format["% Passing Reading"] = scores_type_grouped_format["% Passing Reading"].map("{:,.2f}%".format)
scores_type_grouped_format["% Overall Passing"] = scores_type_grouped_format["% Overall Passing"].map("{:,.2f}%".format)
scores_type_grouped_format
```

| School Type | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| Charter | 83.474 | 83.896 | 93.62% | 96.59% | 90.43% |
| District | 76.957 | 80.967 | 66.55% | 80.80% | 53.67% |
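
Every summary in this notebook repeats one `.map(...format)` line per column. A small helper could take a mapping of column names to format strings instead. `apply_formats` below is a hypothetical name, not part of pandas, and the numbers are made up. The helper works on a copy, so the numeric DataFrame stays available for sorting and further calculations.

```python
import pandas as pd

def apply_formats(df: pd.DataFrame, formats: dict) -> pd.DataFrame:
    """Return a copy of df with each listed column rendered through its format string."""
    formatted = df.copy()
    for column, fmt in formats.items():
        formatted[column] = formatted[column].map(fmt.format)
    return formatted

# Format strings matching those used throughout the notebook
SCORE_FORMATS = {
    "Average Math Score": "{:,.3f}",
    "Average Reading Score": "{:,.3f}",
    "% Passing Math": "{:,.2f}%",
}

# Made-up numbers for illustration
example = pd.DataFrame({
    "Average Math Score": [80.1234],
    "Average Reading Score": [70.9876],
    "% Passing Math": [55.4321],
})
print(apply_formats(example, SCORE_FORMATS))
```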


## Final Conclusion
Based on the three observations, I can conclude that students enrolled in small- to medium-sized charter schools with low per-student spending are more likely to pass the math and reading standardised tests.