>
***

## Report Summary

* We analysed school data from the local government area to provide evidence to support future strategy and budget prioritisation decisions.


### Trends observed in this data

* Across all curicula, we observed a negative correlation between both school funding and school size, and academic performance, such that the schools with higher average passes in maths, reading, and overall, tended to be smaller and to aportion a smaller budget per student.

* This trend appeared to be associated with the smaller size of independant versus government schools. We saw that four of the bottom five performing schools on overall academic performance were government funded schools, whereas only two of the top five performing schools on overall academic performance were government funded. 


### Conclusions

* The data suggests that for optimal academic acheivement, limiting school size should feature in the strategic plans of our school boards.





***




### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [52]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas DataFrames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset.  
school_data_complete = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])

## Local Government Area Summary

* Calculate the total number of schools

* Calculate the total number of students

* Calculate the total budget

* Calculate the average maths score 

* Calculate the average reading score

* Calculate the percentage of students with a passing maths score (50 or greater)

* Calculate the percentage of students with a passing reading score (50 or greater)

* Calculate the percentage of students who passed maths **and** reading (% Overall Passing)

* Create a dataframe to hold the above results

* Optional: give the displayed data cleaner formatting

In [53]:
total_number_of_schools = len(school_data_complete["School ID"].unique())
total_number_of_schools

15

In [54]:
total_number_of_students = len(school_data_complete["Student ID"].unique())
total_number_of_students

39170

In [55]:
total_school_budget = school_data["budget"].sum()
total_school_budget

24649428

In [56]:
per_student_budget = total_school_budget / total_number_of_students
per_student_budget

629.2935409752362

In [57]:
average_maths_score = school_data_complete["maths_score"].mean()
average_maths_score

70.33819249425581

In [58]:
average_reading_score = school_data_complete["reading_score"].mean()
average_reading_score

69.98013786060761

In [59]:
percentage_pass_maths = len(school_data_complete.loc[school_data_complete["maths_score"]>50, :]) \
/ total_number_of_students * 100
percentage_pass_maths

84.4319632371713

In [60]:
percentage_pass_reading = len(school_data_complete.loc[school_data_complete["reading_score"]>50, :]) \
/ total_number_of_students * 100
percentage_pass_reading

82.644881286699

In [61]:
percentage_pass_overall = len(school_data_complete.loc[(school_data_complete["maths_score"]>50) & (school_data_complete["reading_score"]>50), :]) \
/ total_number_of_students * 100
percentage_pass_overall

69.94383456727087

In [62]:
# merge all variables into the 'area_summary' dataframe and tidy up
area_summary = pd.DataFrame({
                                            "Total Number of Schools": [total_number_of_schools],
                                             "Total Number of Students": [total_number_of_students],
                                             "Total Schools Budget": [total_school_budget],
                                             "Per Student Budget": [per_student_budget],
                                             "Average Maths Score": [average_maths_score],
                                             "Average Reading Score": [average_reading_score],
                                             "Percent Passing Maths": [percentage_pass_maths],
                                             "Percent Passing Reading": [percentage_pass_reading],
                                             "Percent Passing Overall": [percentage_pass_overall]})
area_summary["Total Number of Students"] = area_summary["Total Number of Students"].map("{:,}".format)
area_summary["Total Schools Budget"] = area_summary["Total Schools Budget"].map("${:,.2f}".format)
area_summary["Per Student Budget"] = area_summary["Per Student Budget"].map("${:,.2f}".format)
area_summary["Average Maths Score"] = area_summary["Average Maths Score"].round(2)
area_summary["Average Reading Score"] = area_summary["Average Reading Score"].round(2)
area_summary["Percent Passing Maths"] = area_summary["Percent Passing Maths"].map("{:.1f}%".format)
area_summary["Percent Passing Reading"] = area_summary["Percent Passing Reading"].map("{:.1f}%".format)
area_summary["Percent Passing Overall"] = area_summary["Percent Passing Overall"].map("{:.1f}%".format)
area_summary

Unnamed: 0,Total Number of Schools,Total Number of Students,Total Schools Budget,Per Student Budget,Average Maths Score,Average Reading Score,Percent Passing Maths,Percent Passing Reading,Percent Passing Overall
0,15,39170,"$24,649,428.00",$629.29,70.34,69.98,84.4%,82.6%,69.9%


## School Summary

* Create an overview table that summarises key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Maths Score
  * Average Reading Score
  * % Passing Maths
  * % Passing Reading
  * % Overall Passing (The percentage of students that passed maths **and** reading.)
  
* Create a dataframe to hold the above results

In [63]:
count_students_by_school = school_data_complete[["school_name", "Student ID"]].groupby("school_name", as_index = False).count()
count_students_by_school = count_students_by_school.rename(columns={"school_name": "school_name", "Student ID": "student_count"})
count_students_by_school

Unnamed: 0,school_name,student_count
0,Bailey High School,4976
1,Cabrera High School,1858
2,Figueroa High School,2949
3,Ford High School,2739
4,Griffin High School,1468
5,Hernandez High School,4635
6,Holden High School,427
7,Huang High School,2917
8,Johnson High School,4761
9,Pena High School,962


In [64]:
school_data_complete["maths_pass"] = school_data_complete["maths_score"] >= 50
school_data_complete["read_pass"] = school_data_complete["reading_score"] >= 50
school_data_complete["overall_pass"] = (school_data_complete["reading_score"] >= 50) & (school_data_complete["maths_score"] >= 50)
subject_pass_by_school = school_data_complete[["school_name", "maths_pass", "read_pass", "overall_pass"]].groupby("school_name", as_index = False).sum()
subject_pass_by_school.head()

Unnamed: 0,school_name,maths_pass,read_pass,overall_pass
0,Bailey High School,4560,4348,3985
1,Cabrera High School,1688,1655,1501
2,Figueroa High School,2408,2442,1995
3,Ford High School,2258,2252,1848
4,Griffin High School,1339,1299,1194


In [65]:
percent_pass_by_school = pd.DataFrame({"school_name": subject_pass_by_school["school_name"],
                                       "Percent Maths Pass": subject_pass_by_school["maths_pass"] / count_students_by_school["student_count"] * 100,
                                       "Percent Read Pass": subject_pass_by_school["read_pass"] / count_students_by_school["student_count"] * 100,
                                       "Percent Overall Pass": subject_pass_by_school["overall_pass"] / count_students_by_school["student_count"] * 100
                                       })
percent_pass_by_school


Unnamed: 0,school_name,Percent Maths Pass,Percent Read Pass,Percent Overall Pass
0,Bailey High School,91.639871,87.379421,80.084405
1,Cabrera High School,90.850377,89.074273,80.785791
2,Figueroa High School,81.654798,82.807731,67.650051
3,Ford High School,82.438846,82.219788,67.46988
4,Griffin High School,91.212534,88.487738,81.33515
5,Hernandez High School,80.949299,81.877023,66.364617
6,Holden High School,89.929742,88.52459,78.922717
7,Huang High School,81.693521,81.453548,66.712376
8,Johnson High School,82.062592,81.978576,67.191766
9,Pena High School,91.683992,86.590437,79.209979


In [66]:
mean_scores_by_school = school_data_complete[["school_name","reading_score","maths_score"]].groupby("school_name", as_index = False).mean()
mean_scores_by_school

Unnamed: 0,school_name,reading_score,maths_score
0,Bailey High School,71.008842,72.352894
1,Cabrera High School,71.359526,71.657158
2,Figueroa High School,69.077993,68.698542
3,Ford High School,69.572472,69.091274
4,Griffin High School,71.245232,71.788147
5,Hernandez High School,69.186408,68.874865
6,Holden High School,71.660422,72.583138
7,Huang High School,68.910525,68.935207
8,Johnson High School,69.039277,68.8431
9,Pena High School,71.613306,72.088358


In [67]:
#Merge all school summary data into 'per_school_summary' dataframe and tidy up 
per_school_summary = pd.merge(school_data, count_students_by_school, how="left", on=["school_name"])
per_school_summary = pd.merge(per_school_summary, percent_pass_by_school, how="left", on=["school_name"])
per_school_summary = pd.merge(per_school_summary, mean_scores_by_school, how="left", on=["school_name"])
per_school_summary["per_student_budget"] = per_school_summary["budget"] / per_school_summary["student_count"]
budget_per_student = per_school_summary["budget"] / per_school_summary["student_count"]
del per_school_summary["School ID"]
del per_school_summary["size"]
per_school_summary = per_school_summary[["school_name", "type", "student_count", "budget", "per_student_budget", "maths_score", "reading_score", \
                                         "Percent Maths Pass", "Percent Read Pass", "Percent Overall Pass"]]
per_school_summary = per_school_summary.rename(columns={"school_name":"School Name", "type":"School Type", "student_count":"Total Students", \
                                                        "budget":"Total School Budget", "per_student_budget":"Per Student Budget", "maths_score":"Average Maths Score", \
                                                        "reading_score":"Average Reading Score", "Percent Maths Pass":"% Passing Maths", \
                                                        "Percent Read Pass":"% Passing Reading", "Percent Overall Pass":"% Passing Overall"})
#per_school_summary["Total Students"] = per_school_summary["Total Students"].map("{:,}".format)
per_school_summary["Total School Budget"] = per_school_summary["Total School Budget"].map("${:,.2f}".format)
per_school_summary["Per Student Budget"] = per_school_summary["Per Student Budget"].map("${:,.2f}".format)
per_school_summary["Average Maths Score"] = per_school_summary["Average Maths Score"].round(2)
per_school_summary["Average Reading Score"] = per_school_summary["Average Reading Score"].round(2)
per_school_summary["% Passing Maths"] = per_school_summary["% Passing Maths"].round(2)
per_school_summary["% Passing Reading"] = per_school_summary["% Passing Reading"].round(2)
per_school_summary["% Passing Overall"] = per_school_summary["% Passing Overall"].round(2)
per_school_summary

Unnamed: 0,School Name,School Type,Total Students,Total School Budget,Per Student Budget,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Passing Overall
0,Huang High School,Government,2917,"$1,910,635.00",$655.00,68.94,68.91,81.69,81.45,66.71
1,Figueroa High School,Government,2949,"$1,884,411.00",$639.00,68.7,69.08,81.65,82.81,67.65
2,Shelton High School,Independent,1761,"$1,056,600.00",$600.00,72.03,70.26,91.54,86.71,78.88
3,Hernandez High School,Government,4635,"$3,022,020.00",$652.00,68.87,69.19,80.95,81.88,66.36
4,Griffin High School,Independent,1468,"$917,500.00",$625.00,71.79,71.25,91.21,88.49,81.34
5,Wilson High School,Independent,2283,"$1,319,574.00",$578.00,69.17,68.88,82.79,81.3,67.46
6,Cabrera High School,Independent,1858,"$1,081,356.00",$582.00,71.66,71.36,90.85,89.07,80.79
7,Bailey High School,Government,4976,"$3,124,928.00",$628.00,72.35,71.01,91.64,87.38,80.08
8,Holden High School,Independent,427,"$248,087.00",$581.00,72.58,71.66,89.93,88.52,78.92
9,Pena High School,Independent,962,"$585,858.00",$609.00,72.09,71.61,91.68,86.59,79.21


## Top Performing Schools (By % Overall Passing)

* Sort and display the top five performing schools by % overall passing.

In [71]:
top_schools = percent_pass_by_school[["school_name","Percent Overall Pass"]].sort_values("Percent Overall Pass", ascending=False)
top_schools = top_schools.reset_index(drop=True)
top_schools.rename(columns={"school_name": "School Name", "Percent Overall Pass": "% Overall Passing"}, inplace = True)
top_schools["% Overall Passing"] = top_schools["% Overall Passing"].round(2)
top_schools.head()

Unnamed: 0,School Name,% Overall Passing
0,Griffin High School,81.34
1,Cabrera High School,80.79
2,Bailey High School,80.08
3,Wright High School,79.72
4,Rodriguez High School,79.42


## Bottom Performing Schools (By % Overall Passing)

* Sort and display the five worst-performing schools by % overall passing.

In [70]:
bottom_schools = percent_pass_by_school[["school_name", "Percent Overall Pass"]].sort_values("Percent Overall Pass", ascending=True)
bottom_schools = bottom_schools.reset_index(drop=True)
bottom_schools.rename(columns={"school_name": "School Name", "Percent Overall Pass": "% Overall Passing"}, inplace = True)
bottom_schools["% Overall Passing"] = bottom_schools["% Overall Passing"].round(2)
bottom_schools.head()

Unnamed: 0,School Name,% Overall Passing
0,Hernandez High School,66.36
1,Huang High School,66.71
2,Johnson High School,67.19
3,Wilson High School,67.46
4,Ford High School,67.47


## Maths Scores by Year

* Create a table that lists the average maths score for students of each year level (9, 10, 11, 12) at each school.

  * Create a pandas series for each year. Hint: use a conditional statement.
  
  * Group each series by school
  
  * Combine the series into a dataframe
  
  * Optional: give the displayed data cleaner formatting

In [73]:
school_data_year_9 = school_data_complete.loc[school_data_complete["year"] == 9, :].groupby("school_name", as_index = False).mean()
school_data_year_9.drop(['Student ID', 'year', 'reading_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
school_data_year_9 = school_data_year_9.rename(columns={"maths_score": "maths_av_year_9"})
school_data_year_9

Unnamed: 0,school_name,maths_av_year_9
0,Bailey High School,72.493827
1,Cabrera High School,72.32197
2,Figueroa High School,68.477804
3,Ford High School,69.021609
4,Griffin High School,72.789731
5,Hernandez High School,68.586831
6,Holden High School,70.543307
7,Huang High School,69.081754
8,Johnson High School,69.469286
9,Pena High School,71.996364


In [74]:
school_data_year_10 = school_data_complete.loc[school_data_complete["year"] == 10, :].groupby("school_name", as_index = False).mean()
school_data_year_10.drop(['Student ID', 'year', 'reading_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
school_data_year_10 = school_data_year_10.rename(columns={"maths_score": "maths_av_year_10"})
school_data_year_10

Unnamed: 0,school_name,maths_av_year_10
0,Bailey High School,71.897498
1,Cabrera High School,72.437768
2,Figueroa High School,68.331586
3,Ford High School,69.387006
4,Griffin High School,71.093596
5,Hernandez High School,68.867156
6,Holden High School,75.105263
7,Huang High School,68.533246
8,Johnson High School,67.99022
9,Pena High School,72.396


In [75]:
school_data_year_11 = school_data_complete.loc[school_data_complete["year"] == 11, :].groupby("school_name", as_index = False).mean()
school_data_year_11.drop(['Student ID', 'year', 'reading_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
school_data_year_11 = school_data_year_11.rename(columns={"maths_score": "maths_av_year_11"})
school_data_year_11

Unnamed: 0,school_name,maths_av_year_11
0,Bailey High School,72.3749
1,Cabrera High School,71.008299
2,Figueroa High School,68.811001
3,Ford High School,69.248862
4,Griffin High School,71.692521
5,Hernandez High School,69.154412
6,Holden High School,71.640777
7,Huang High School,69.431345
8,Johnson High School,68.63773
9,Pena High School,72.523438


In [76]:
school_data_year_12 = school_data_complete.loc[school_data_complete["year"] == 12, :].groupby("school_name", as_index = False).mean()
school_data_year_12.drop(['Student ID', 'year', 'reading_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
school_data_year_12 = school_data_year_12.rename(columns={"maths_score": "maths_av_year_12"})
school_data_year_12

Unnamed: 0,school_name,maths_av_year_12
0,Bailey High School,72.675097
1,Cabrera High School,70.604712
2,Figueroa High School,69.325282
3,Ford High School,68.617811
4,Griffin High School,71.469178
5,Hernandez High School,68.985075
6,Holden High School,73.409639
7,Huang High School,68.639316
8,Johnson High School,69.287393
9,Pena High School,71.187845


In [77]:
maths_score_by_year = pd.merge(school_data_year_9, school_data_year_10, how="left", on=["school_name"])
maths_score_by_year = pd.merge(maths_score_by_year, school_data_year_11, how="left", on=["school_name"])
maths_score_by_year = pd.merge(maths_score_by_year, school_data_year_12, how="left", on=["school_name"])
maths_score_by_year = maths_score_by_year.rename(columns={"school_name":"School Name", "maths_av_year_9":"Maths Average Year 9", \
                                                          "maths_av_year_10":"Maths Average Year 10", "maths_av_year_11":"Maths Average Year 11", "maths_av_year_12":"Maths Average Year 12"})
maths_score_by_year["Maths Average Year 9"] = maths_score_by_year["Maths Average Year 9"].map("{:.1f}%".format)
maths_score_by_year["Maths Average Year 10"] = maths_score_by_year["Maths Average Year 10"].map("{:.1f}%".format)
maths_score_by_year["Maths Average Year 11"] = maths_score_by_year["Maths Average Year 11"].map("{:.1f}%".format)
maths_score_by_year["Maths Average Year 12"] = maths_score_by_year["Maths Average Year 12"].map("{:.1f}%".format)
maths_score_by_year


Unnamed: 0,School Name,Maths Average Year 9,Maths Average Year 10,Maths Average Year 11,Maths Average Year 12
0,Bailey High School,72.5%,71.9%,72.4%,72.7%
1,Cabrera High School,72.3%,72.4%,71.0%,70.6%
2,Figueroa High School,68.5%,68.3%,68.8%,69.3%
3,Ford High School,69.0%,69.4%,69.2%,68.6%
4,Griffin High School,72.8%,71.1%,71.7%,71.5%
5,Hernandez High School,68.6%,68.9%,69.2%,69.0%
6,Holden High School,70.5%,75.1%,71.6%,73.4%
7,Huang High School,69.1%,68.5%,69.4%,68.6%
8,Johnson High School,69.5%,68.0%,68.6%,69.3%
9,Pena High School,72.0%,72.4%,72.5%,71.2%


## Reading Score by Year

* Perform the same operations as above for reading scores

In [78]:
read_school_data_year_9 = school_data_complete.loc[school_data_complete["year"] == 9, :].groupby("school_name", as_index = False).mean()
read_school_data_year_9.drop(['Student ID', 'year', 'maths_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
read_school_data_year_9 = read_school_data_year_9.rename(columns={"reading_score": "reading_av_year_9"})
read_school_data_year_9

Unnamed: 0,school_name,reading_av_year_9
0,Bailey High School,70.90192
1,Cabrera High School,71.172348
2,Figueroa High School,70.261682
3,Ford High School,69.615846
4,Griffin High School,72.026895
5,Hernandez High School,68.477569
6,Holden High School,71.598425
7,Huang High School,68.670616
8,Johnson High School,68.719286
9,Pena High School,70.949091


In [79]:
read_school_data_year_10 = school_data_complete.loc[school_data_complete["year"] == 9, :].groupby("school_name", as_index = False).mean()
read_school_data_year_10.drop(['Student ID', 'year', 'maths_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
read_school_data_year_10 = read_school_data_year_10.rename(columns={"reading_score": "reading_av_year_10"})
read_school_data_year_10

Unnamed: 0,school_name,reading_av_year_10
0,Bailey High School,70.90192
1,Cabrera High School,71.172348
2,Figueroa High School,70.261682
3,Ford High School,69.615846
4,Griffin High School,72.026895
5,Hernandez High School,68.477569
6,Holden High School,71.598425
7,Huang High School,68.670616
8,Johnson High School,68.719286
9,Pena High School,70.949091


## Scores by School Spending

In [80]:
read_school_data_year_11 = school_data_complete.loc[school_data_complete["year"] == 9, :].groupby("school_name", as_index = False).mean()
read_school_data_year_11.drop(['Student ID', 'year', 'maths_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
read_school_data_year_11 = read_school_data_year_11.rename(columns={"reading_score": "reading_av_year_11"})
read_school_data_year_11

Unnamed: 0,school_name,reading_av_year_11
0,Bailey High School,70.90192
1,Cabrera High School,71.172348
2,Figueroa High School,70.261682
3,Ford High School,69.615846
4,Griffin High School,72.026895
5,Hernandez High School,68.477569
6,Holden High School,71.598425
7,Huang High School,68.670616
8,Johnson High School,68.719286
9,Pena High School,70.949091


In [81]:
read_school_data_year_12 = school_data_complete.loc[school_data_complete["year"] == 9, :].groupby("school_name", as_index = False).mean()
read_school_data_year_12.drop(['Student ID', 'year', 'maths_score', 'School ID', 'size', 'budget', 'maths_pass', 'read_pass', 'overall_pass'], axis=1, inplace=True) 
read_school_data_year_12 = read_school_data_year_12.rename(columns={"reading_score": "reading_av_year_12"})
read_school_data_year_12

Unnamed: 0,school_name,reading_av_year_12
0,Bailey High School,70.90192
1,Cabrera High School,71.172348
2,Figueroa High School,70.261682
3,Ford High School,69.615846
4,Griffin High School,72.026895
5,Hernandez High School,68.477569
6,Holden High School,71.598425
7,Huang High School,68.670616
8,Johnson High School,68.719286
9,Pena High School,70.949091


In [82]:
read_score_by_year = pd.merge(read_school_data_year_9, read_school_data_year_10, how="left", on=["school_name"])
read_score_by_year = pd.merge(read_score_by_year, read_school_data_year_11, how="left", on=["school_name"])
read_score_by_year = pd.merge(read_score_by_year, read_school_data_year_12, how="left", on=["school_name"])
read_score_by_year = read_score_by_year.rename(columns={"school_name":"School Name", "reading_av_year_9":"Reading Average Year 9", \
                                                        "reading_av_year_10":"Reading Average Year 10", "reading_av_year_11":"Reading Average Year 11", "reading_av_year_12":"Reading Average Year 12"})
read_score_by_year["Reading Average Year 9"] = read_score_by_year["Reading Average Year 9"].map("{:.1f}%".format)
read_score_by_year["Reading Average Year 10"] = read_score_by_year["Reading Average Year 10"].map("{:.1f}%".format)
read_score_by_year["Reading Average Year 11"] = read_score_by_year["Reading Average Year 11"].map("{:.1f}%".format)
read_score_by_year["Reading Average Year 12"] = read_score_by_year["Reading Average Year 12"].map("{:.1f}%".format)
read_score_by_year

Unnamed: 0,School Name,Reading Average Year 9,Reading Average Year 10,Reading Average Year 11,Reading Average Year 12
0,Bailey High School,70.9%,70.9%,70.9%,70.9%
1,Cabrera High School,71.2%,71.2%,71.2%,71.2%
2,Figueroa High School,70.3%,70.3%,70.3%,70.3%
3,Ford High School,69.6%,69.6%,69.6%,69.6%
4,Griffin High School,72.0%,72.0%,72.0%,72.0%
5,Hernandez High School,68.5%,68.5%,68.5%,68.5%
6,Holden High School,71.6%,71.6%,71.6%,71.6%
7,Huang High School,68.7%,68.7%,68.7%,68.7%
8,Johnson High School,68.7%,68.7%,68.7%,68.7%
9,Pena High School,70.9%,70.9%,70.9%,70.9%


* Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
  * Average Maths Score
  * Average Reading Score
  * % Passing Maths
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)

In [84]:
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]
scores_by_spending = pd.cut(budget_per_student, spending_bins, labels=labels, include_lowest=True)
school_spending_df = per_school_summary.copy()
school_spending_df["Spending Ranges (Per Student)"] = scores_by_spending
spending_maths_scores = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["Average Maths Score"].round(2)
spending_reading_scores = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["Average Reading Score"].round(2)
spending_passing_maths = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Maths"].round(2)
spending_passing_reading = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Reading"].round(2)
overall_passing_spending = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Overall"].round(2)
spending_summary = pd.DataFrame({"Average Maths Score": spending_maths_scores, 
                                         "Average Reading Score": spending_reading_scores,
                                         "% Passing Maths": spending_passing_maths,
                                         "% Passing Reading": spending_passing_reading,
                                         "Overall Passing Rate": overall_passing_spending })
spending_summary


Unnamed: 0_level_0,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,Overall Passing Rate
Spending Ranges (Per Student),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
<$585,71.36,70.72,88.84,86.39,76.72
$585-630,72.06,71.03,91.52,87.29,79.88
$630-645,69.86,69.84,84.68,83.76,71.0
$645-680,68.88,69.05,81.57,81.77,66.75


## Scores by School Size

* Perform the same operations as above, based on school size.

In [87]:
size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
scores_by_size = pd.cut(per_school_summary["Total Students"], size_bins, labels=labels, include_lowest=True)
per_school_summary["School Size"] = scores_by_size
size_maths = per_school_summary.groupby(["School Size"]).mean()["Average Maths Score"].round(2)
size_reading = per_school_summary.groupby(["School Size"]).mean()["Average Reading Score"].round(2)
size_passing_maths = per_school_summary.groupby(["School Size"]).mean()["% Passing Maths"].round(2)
size_passing_reading = per_school_summary.groupby(["School Size"]).mean()["% Passing Reading"].round(2)
size_passing_overall = per_school_summary.groupby(["School Size"]).mean()["% Passing Overall"].round(2)
size_summary = pd.DataFrame({"Average Maths Score": size_maths, 
                                         "Average Reading Score": size_reading,
                                         "% Passing Maths": size_passing_maths,
                                         "% Passing Reading": size_passing_reading,
                                         "Overall Passing Rate": size_passing_overall })
size_summary

Unnamed: 0_level_0,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,Overall Passing Rate
School Size,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Small (<1000),72.34,71.63,90.8,87.56,79.06
Medium (1000-2000),71.42,70.72,89.85,86.71,78.04
Large (2000-5000),69.75,69.58,84.25,83.3,70.29


## Scores by School Type

* Perform the same operations as above, based on school type

In [89]:
type_maths = per_school_summary.groupby(["School Type"]).mean()["Average Maths Score"].round(2)
type_reading = per_school_summary.groupby(["School Type"]).mean()["Average Reading Score"].round(2)
type_passing_maths = per_school_summary.groupby(["School Type"]).mean()["% Passing Maths"].round(2)
type_passing_reading = per_school_summary.groupby(["School Type"]).mean()["% Passing Reading"].round(2)
type_passing_overall = per_school_summary.groupby(["School Type"]).mean()["% Passing Overall"].round(2)
type_summary = pd.DataFrame({"Average Maths Score": type_maths, 
                                         "Average Reading Score": type_reading,
                                         "% Passing Maths": type_passing_maths,
                                         "% Passing Reading": type_passing_reading,
                                         "Overall Passing Rate": type_passing_overall })
type_summary

Unnamed: 0_level_0,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,Overall Passing Rate
School Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Government,69.83,69.68,84.46,83.59,70.7
Independent,71.37,70.72,89.2,86.25,76.97
