In [None]:
# PyCity Schools Data Analysis

Introduction:

Several factors can affect academic performance.  This report evaluates each in turn to identify which are associated with student success and which with failure.  I hope this information will support data-driven decisions for the betterment of the education system and our entire community.

Factors for Consideration:
    
•	Consistency across all grades (9-12)
•	Budgetary allocation (per student spend)
•	Size of School (# of students enrolled)
•	Type of School:  Charter vs District

Benchmark for Comparison: 
    
•	Standardized testing scores for Math
•	Standardized testing scores for Reading
•	Measure of Success:  % of students with passing scores in both subjects

Consistency

    *See FIGURE 1 in Images folder.
 
    CONCLUSION:  Testing scores are relatively consistent across all grades within each school.  The factors influencing academic success therefore appear to operate at the level of school administration rather than to stem from inconsistencies in the curriculum itself.

Budgetary Allocation (per student spending)

    *See FIGURE 2 in Images folder.
 
    NOTE:  There is a spending gap of $77 per student between the best-funded and the least-funded school.  Furthermore, there is an average disparity of $44.07 per student between District schools and Charter schools.

    *See FIGURE 3 in Images folder.
 
    CONCLUSION:  There is a clear and consistent negative correlation between per-student spending and academic success.  According to this analysis, higher funding does not appear to benefit student success.  A thorough evaluation of how funds are allocated within each school may help explain this result.

Size of School  

    *See FIGURE 4 in Images folder.
 
    CONCLUSION:  Small and medium-sized schools show a consistent correlation with higher testing scores.  There is a substantial drop in academic success once enrollment exceeds 2000 students.  Subsequent analysis should assess class size (i.e., student-to-teacher ratio) for each of the schools included in this report.

Type of School:  Charter vs. District

    *See FIGURE 5 in Images folder.

    With the exception of Wilson High School, a large charter school, far fewer students at large (district) schools achieve the same degree of academic success as their counterparts at charter schools.

    *See FIGURE 6 in Images folder.
 
    CONCLUSION:  Students who attend a charter school perform significantly better on standardized tests than those who attend a district school, and this success is achieved at a lower per-student cost to the school district.  If neither spending nor the curriculum is a contributor to this success, we must conclude that school size has considerable influence over academic performance.
  
---

In [2]:
# Dependencies
import pandas as pd
from pathlib import Path

# Store file path as variable
school = Path("schools_complete.csv")
student = Path("students_complete.csv")

# Store both csv files as DataFrame
school_df = pd.read_csv(school)
student_df = pd.read_csv(student)

# Merge DFs 
comprehensive = pd.merge(student_df, school_df, how="left", on="school_name")
comprehensive.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
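
The left merge above assumes every student row finds exactly one matching school. A minimal sketch of how that assumption could be checked, using the `validate` and `indicator` options of `pd.merge`; the two small frames are hypothetical stand-ins for `student_df` and `school_df`, not the real files.

```python
import pandas as pd

# Hypothetical miniature stand-ins for student_df and school_df
students = pd.DataFrame({
    "student_name": ["A", "B", "C"],
    "school_name": ["Huang High School", "Huang High School", "Pena High School"],
})
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Pena High School"],
    "type": ["District", "Charter"],
})

# validate raises MergeError if a school name appears more than once in `schools`;
# indicator adds a `_merge` column recording where each row found a match
merged = pd.merge(students, schools, how="left", on="school_name",
                  validate="many_to_one", indicator=True)

# Rows whose school was missing from `schools` would show up here
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), len(unmatched))
```

If `unmatched` is empty and the row count equals the number of students, the merge neither dropped nor duplicated anyone.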


## District Summary

In [3]:
# Count of unique schools
NumSchool = len(comprehensive["school_name"].unique())
NumSchool

15

In [4]:
# Count of students
SumStudent = len(comprehensive["Student ID"].unique())
SumStudent

39170

In [5]:
# Calculate total budget for all schools combined
TotBudget = comprehensive.drop_duplicates(subset=['school_name'])
SumBudget = TotBudget.budget.sum()
SumBudget

24649428

In [6]:
# average math score
MeanMath = comprehensive.math_score.mean()
MeanMath

78.98537145774827

In [7]:
# average reading score
MeanRead = comprehensive.reading_score.mean()
MeanRead

81.87784018381414

In [8]:
# % of students with passing math scores
MathPass = comprehensive[(comprehensive["math_score"] >= 70)].count()["student_name"]
MathPass_pct = MathPass / float(SumStudent) * 100
MathPass_pct

74.9808526933878

In [9]:
# % of students with passing reading scores  
ReadPass = comprehensive[(comprehensive["reading_score"] >= 70)].count()["student_name"]
ReadPass_pct = ReadPass / float(SumStudent) * 100
ReadPass_pct

85.80546336482001

In [10]:
# % of students that passed both courses
AllPass = comprehensive[(comprehensive["math_score"] >= 70) & (comprehensive["reading_score"] >= 70)].count()["student_name"]
AllPass_pct = AllPass / float(SumStudent) * 100
AllPass_pct

65.17232575950983
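
The three passing percentages above share one pattern: count the rows that meet a threshold, then divide by the number of students. A hedged alternative sketch takes the mean of a boolean mask, which gives the same ratio as long as there is one row per student. The four score pairs are borrowed from the `head()` output earlier; `passing_score` is a name introduced here for illustration.

```python
import pandas as pd

# Four score pairs taken from the first rows shown by comprehensive.head()
scores = pd.DataFrame({
    "math_score": [79, 61, 60, 84],
    "reading_score": [66, 94, 90, 97],
})

passing_score = 70  # same threshold as the cells above
passed_math = scores["math_score"] >= passing_score
passed_reading = scores["reading_score"] >= passing_score

# The mean of a boolean Series is the fraction of True values
pct_math = passed_math.mean() * 100
pct_reading = passed_reading.mean() * 100
pct_both = (passed_math & passed_reading).mean() * 100
print(pct_math, pct_reading, pct_both)
```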

In [11]:
# Summary/Analysis of data set
# Create new DF using prev variables
CompSummary = pd.DataFrame({"Total Schools": [NumSchool], "Total Students": [SumStudent], 
                           "Total Budget": [SumBudget], "Avg Math Score": [MeanMath],
                           "Avg Reading Score": [MeanRead], "% Passing Math": [MathPass_pct],
                           "% Passing Reading": [ReadPass_pct], "% Overall Passing": [AllPass_pct]})

# Format values
CompSummary["Total Students"] = CompSummary["Total Students"].map("{:,}".format)
CompSummary["Total Budget"] = CompSummary["Total Budget"].map("${:,.2f}".format)

# Display the (new) summary df
CompSummary

Unnamed: 0,Total Schools,Total Students,Total Budget,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,15,39170,"$24,649,428.00",78.985371,81.87784,74.980853,85.805463,65.172326


## School Summary

In [12]:
# Select the type (District or Charter) of each school
Type_ps = comprehensive[['school_name', 'type']].drop_duplicates().reset_index(drop=True)
Type_ps

Unnamed: 0,school_name,type
0,Huang High School,District
1,Figueroa High School,District
2,Shelton High School,Charter
3,Hernandez High School,District
4,Griffin High School,Charter
5,Wilson High School,Charter
6,Cabrera High School,Charter
7,Bailey High School,District
8,Holden High School,Charter
9,Pena High School,Charter


In [13]:
# Calculate the total student count per school
Students_ps = comprehensive.school_name.value_counts()
Students_ps = Students_ps.reset_index()
Students_ps.columns = ['school_name', 'Student Count']
Students_ps

Unnamed: 0,school_name,Student Count
0,Bailey High School,4976
1,Johnson High School,4761
2,Hernandez High School,4635
3,Rodriguez High School,3999
4,Figueroa High School,2949
5,Huang High School,2917
6,Ford High School,2739
7,Wilson High School,2283
8,Cabrera High School,1858
9,Wright High School,1800


In [14]:
# Calculate the total budget per school
Budget_ps = TotBudget.set_index("school_name")
Budget_ps = Budget_ps['budget']
Budget_ps = Budget_ps.reset_index()
Budget_ps

Unnamed: 0,school_name,budget
0,Huang High School,1910635
1,Figueroa High School,1884411
2,Shelton High School,1056600
3,Hernandez High School,3022020
4,Griffin High School,917500
5,Wilson High School,1319574
6,Cabrera High School,1081356
7,Bailey High School,3124928
8,Holden High School,248087
9,Pena High School,585858


In [15]:
# merge DF for budget per school and students per school
StudBud_df = pd.merge(Budget_ps, Students_ps, on="school_name")

# calculate per capita spending per school
StudBud_df["Spend per Capita"] = StudBud_df["budget"] / StudBud_df["Student Count"]
StudBud_df["Spend per Capita"] = StudBud_df["Spend per Capita"].map("${:,.2f}".format)
StudBud_df = StudBud_df.set_index("school_name")
StudBud_df = StudBud_df.drop(["budget", "Student Count"], axis=1)
StudBud_df

Unnamed: 0_level_0,Spend per Capita
school_name,Unnamed: 1_level_1
Huang High School,$655.00
Figueroa High School,$639.00
Shelton High School,$600.00
Hernandez High School,$652.00
Griffin High School,$625.00
Wilson High School,$578.00
Cabrera High School,$582.00
Bailey High School,$628.00
Holden High School,$581.00
Pena High School,$609.00
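
Formatting `Spend per Capita` as a string at this point means the dollar signs have to be stripped again before binning later on. A minimal sketch of an alternative, assuming it is acceptable to keep the numeric column and format only a display copy; it also reproduces the $77 spending gap quoted in the introduction. The budgets and enrollments are copied from the tables in this notebook for four schools only.

```python
import pandas as pd

# Budgets and enrollments for four schools, copied from the tables in this notebook
spend = pd.DataFrame({
    "school_name": ["Huang High School", "Wilson High School",
                    "Cabrera High School", "Holden High School"],
    "budget": [1910635, 1319574, 1081356, 248087],
    "Student Count": [2917, 2283, 1858, 427],
}).set_index("school_name")

# Keep the per-student figure numeric so it can be compared and binned directly
spend["Spend per Capita"] = spend["budget"] / spend["Student Count"]

# Gap between the highest and lowest per-student spend among these schools
gap = spend["Spend per Capita"].max() - spend["Spend per Capita"].min()

# Format a separate copy for display only
display_copy = spend["Spend per Capita"].map("${:,.2f}".format)
print(gap)
```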


In [16]:
# Calculate the average math score per school
math_ps = comprehensive[["school_name", "math_score"]]
avg_math_ps = math_ps.groupby(["school_name"]).mean()
avg_math_ps

Unnamed: 0_level_0,math_score
school_name,Unnamed: 1_level_1
Bailey High School,77.048432
Cabrera High School,83.061895
Figueroa High School,76.711767
Ford High School,77.102592
Griffin High School,83.351499
Hernandez High School,77.289752
Holden High School,83.803279
Huang High School,76.629414
Johnson High School,77.072464
Pena High School,83.839917


In [17]:
# Calculate the average reading score per school
reading_ps = comprehensive[["school_name", "reading_score"]]
avg_reading_ps = reading_ps.groupby(["school_name"]).mean()
avg_reading_ps

Unnamed: 0_level_0,reading_score
school_name,Unnamed: 1_level_1
Bailey High School,81.033963
Cabrera High School,83.97578
Figueroa High School,81.15802
Ford High School,80.746258
Griffin High School,83.816757
Hernandez High School,80.934412
Holden High School,83.814988
Huang High School,81.182722
Johnson High School,80.966394
Pena High School,84.044699


In [18]:
# Filter to include only students with math score >= 70
Stud_MathPass = comprehensive.loc[comprehensive["math_score"] >= 70, :]

# Count the remaining students per school
Stud_MathPass_ps = Stud_MathPass["school_name"].value_counts()
Stud_MathPass_ps = Stud_MathPass_ps.reset_index()
Stud_MathPass_ps.columns = ['school_name', 'Students Passing Math']
Stud_MathPass_ps

Unnamed: 0,school_name,Students Passing Math
0,Bailey High School,3318
1,Johnson High School,3145
2,Hernandez High School,3094
3,Rodriguez High School,2654
4,Wilson High School,2143
5,Figueroa High School,1946
6,Huang High School,1916
7,Ford High School,1871
8,Cabrera High School,1749
9,Wright High School,1680


In [19]:
# Filter to include only students with reading score >= 70
Stud_ReadPass = comprehensive.loc[comprehensive["reading_score"] >= 70, :]

# Count the remaining students per school
Stud_ReadPass_ps = Stud_ReadPass["school_name"].value_counts()
Stud_ReadPass_ps = Stud_ReadPass_ps.reset_index()
Stud_ReadPass_ps.columns = ['school_name', 'Students Passing Reading']
Stud_ReadPass_ps

Unnamed: 0,school_name,Students Passing Reading
0,Bailey High School,4077
1,Johnson High School,3867
2,Hernandez High School,3748
3,Rodriguez High School,3208
4,Figueroa High School,2381
5,Huang High School,2372
6,Wilson High School,2204
7,Ford High School,2172
8,Cabrera High School,1803
9,Wright High School,1739


In [20]:
# Calculate # of students per school that passed both courses
Stud_AllPass = comprehensive[(comprehensive["math_score"] >= 70) & (comprehensive["reading_score"] >= 70)]
Stud_AllPass_ps = Stud_AllPass.groupby(["school_name"]).size()
Stud_AllPass_ps = Stud_AllPass_ps.reset_index()
Stud_AllPass_ps.columns = ['school_name', 'Students Passing Both']
Stud_AllPass_ps

Unnamed: 0,school_name,Students Passing Both
0,Bailey High School,2719
1,Cabrera High School,1697
2,Figueroa High School,1569
3,Ford High School,1487
4,Griffin High School,1330
5,Hernandez High School,2481
6,Holden High School,381
7,Huang High School,1561
8,Johnson High School,2549
9,Pena High School,871


In [21]:
# Create new DF by merging #students per school and #passing math per school
MathPass_df = pd.merge(Stud_MathPass_ps, Students_ps, on="school_name")

# Calculate % of students per school passing math
MathPass_df["% Passing Math"] = MathPass_df["Students Passing Math"] / MathPass_df["Student Count"] *100
MathPass_df = MathPass_df[['school_name', '% Passing Math']]
MathPass_df

Unnamed: 0,school_name,% Passing Math
0,Bailey High School,66.680064
1,Johnson High School,66.057551
2,Hernandez High School,66.752967
3,Rodriguez High School,66.366592
4,Wilson High School,93.867718
5,Figueroa High School,65.988471
6,Huang High School,65.683922
7,Ford High School,68.309602
8,Cabrera High School,94.133477
9,Wright High School,93.333333


In [22]:
# Create new DF by merging #students per school and #passing reading per school
ReadPass_df = pd.merge(Stud_ReadPass_ps, Students_ps, on="school_name")

# Calculate % of students per school passing reading
ReadPass_df["% Passing Reading"] = ReadPass_df["Students Passing Reading"] / ReadPass_df["Student Count"] *100
ReadPass_df = ReadPass_df[['school_name', '% Passing Reading']]
ReadPass_df

Unnamed: 0,school_name,% Passing Reading
0,Bailey High School,81.93328
1,Johnson High School,81.222432
2,Hernandez High School,80.862999
3,Rodriguez High School,80.220055
4,Figueroa High School,80.739234
5,Huang High School,81.316421
6,Wilson High School,96.539641
7,Ford High School,79.299014
8,Cabrera High School,97.039828
9,Wright High School,96.611111


In [23]:
# Create new DF by merging #students per school and #passing both courses per school
AllPass_df = pd.merge(Stud_AllPass_ps, Students_ps, on="school_name")

# Calculate % of students per school passing both courses
AllPass_df["% Passing Both"] = AllPass_df["Students Passing Both"] / AllPass_df["Student Count"] *100
AllPass_df = AllPass_df[['school_name', '% Passing Both']]
AllPass_df

Unnamed: 0,school_name,% Passing Both
0,Bailey High School,54.642283
1,Cabrera High School,91.334769
2,Figueroa High School,53.204476
3,Ford High School,54.289887
4,Griffin High School,90.599455
5,Hernandez High School,53.527508
6,Holden High School,89.227166
7,Huang High School,53.513884
8,Johnson High School,53.539172
9,Pena High School,90.540541


In [26]:
# Create the per-school summary DF (`ps_summary`) from the calculations above.
ps_summary1 = pd.merge(Type_ps, Students_ps, on="school_name")
ps_summary2 = pd.merge(ps_summary1, Budget_ps, on="school_name")
ps_summary3 = pd.merge(ps_summary2, StudBud_df, on="school_name")
ps_summary4 = pd.merge(ps_summary3, avg_math_ps, on="school_name")
ps_summary5 = pd.merge(ps_summary4, avg_reading_ps, on="school_name")
ps_summary6 = pd.merge(ps_summary5, MathPass_df, on="school_name")
ps_summary7 = pd.merge(ps_summary6, ReadPass_df, on="school_name")
ps_summary8 = pd.merge(ps_summary7, AllPass_df, on="school_name")
ps_summary9 = ps_summary8.rename(columns={"school_name":"School Name", "type":"School Type", "Student Count":"Total Students",
                                          "budget":"Total School Budget", "math_score":"Avg Math Score",
                                          "reading_score":"Avg Reading Score", "% Passing Both":"% Overall Passing"})

# Format Total Budget
ps_summary9["Total School Budget"] = ps_summary9["Total School Budget"].map("${:,.2f}".format)
ps_summary = ps_summary9.sort_values('School Name')
ps_summary = ps_summary.set_index("School Name")

# Display the DataFrame
ps_summary

Unnamed: 0_level_0,School Type,Total Students,Total School Budget,Spend per Capita,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541
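
The per-school summary above is assembled from eight successive merges. As a hedged sketch of a more compact route, the same kinds of metrics can be computed in a single `groupby` with named aggregation. The school names and scores below are hypothetical toy data, and the output column names are illustrative rather than the ones used in `ps_summary`.

```python
import pandas as pd

# Hypothetical toy data standing in for `comprehensive`
students = pd.DataFrame({
    "school_name": ["North", "North", "North", "South", "South"],
    "math_score": [90, 65, 72, 80, 50],
    "reading_score": [85, 75, 60, 95, 70],
})

# Boolean pass flags; their per-school mean is the pass rate
flags = students.assign(
    pass_math=students["math_score"] >= 70,
    pass_reading=students["reading_score"] >= 70,
)
flags["pass_both"] = flags["pass_math"] & flags["pass_reading"]

# One groupby with named aggregation instead of a chain of merges
summary = flags.groupby("school_name").agg(
    total_students=("math_score", "count"),
    avg_math=("math_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_math=("pass_math", "mean"),
    pct_reading=("pass_reading", "mean"),
    pct_both=("pass_both", "mean"),
)

# Convert fractions to percentages
pct_cols = ["pct_math", "pct_reading", "pct_both"]
summary[pct_cols] = summary[pct_cols] * 100
print(summary)
```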


## Highest-Performing Schools (by % Overall Passing)

In [27]:
# Sort the schools by `% Overall Passing` in descending order and display the top 5 rows.
top_schools = ps_summary.sort_values('% Overall Passing', ascending=False)
top_schools.head(5)

Unnamed: 0_level_0,School Type,Total Students,Total School Budget,Spend per Capita,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Thomas High School,Charter,1635,"$1,043,130.00",$638.00,83.418349,83.84893,93.272171,97.308869,90.948012
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Wilson High School,Charter,2283,"$1,319,574.00",$578.00,83.274201,83.989488,93.867718,96.539641,90.582567
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541


## Bottom-Performing Schools (by % Overall Passing)

In [28]:
# Sort the schools by `% Overall Passing` in ascending order and display the top 5 rows.
bottom_schools = ps_summary.sort_values('% Overall Passing')
bottom_schools.head(5)

Unnamed: 0_level_0,School Type,Total Students,Total School Budget,Spend per Capita,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Rodriguez High School,District,3999,"$2,547,363.00",$637.00,76.842711,80.744686,66.366592,80.220055,52.988247
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172


## Math Scores by Grade

In [31]:
# Use the code provided to separate the data by grade
ninth = comprehensive[(comprehensive["grade"] == "9th")]
tenth = comprehensive[(comprehensive["grade"] == "10th")]
eleventh = comprehensive[(comprehensive["grade"] == "11th")]
twelfth = comprehensive[(comprehensive["grade"] == "12th")]

# Group by `school_name` and take the mean of the `math_score` column for each.
math9 = ninth[["school_name", "math_score"]]
avg_math9 = math9.groupby(["school_name"]).mean()
               
math10 = tenth[["school_name", "math_score"]]
avg_math10 = math10.groupby(["school_name"]).mean()
                
math11 = eleventh[["school_name", "math_score"]]
avg_math11 = math11.groupby(["school_name"]).mean()
                   
math12 = twelfth[["school_name", "math_score"]]
avg_math12 = math12.groupby(["school_name"]).mean()

# Combine each of the scores above into a single DataFrame (`math_bg`)
math_bg1 = pd.merge(avg_math9, avg_math10, on="school_name")
math_bg101 = math_bg1.rename(columns={"math_score_x":"9th", "math_score_y":"10th"})
math_bg2 = pd.merge(math_bg101, avg_math11, on="school_name")
math_bg201 = math_bg2.rename(columns={"math_score":"11th"})
math_bg3 = pd.merge(math_bg201, avg_math12, on="school_name")
math_bg = math_bg3.rename(columns={"math_score":"12th"})

# Minor data wrangling
math_bg.index.name = None

# Display the DataFrame
math_bg

Unnamed: 0,9th,10th,11th,12th
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547
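
The grade-by-grade table above is built by filtering four times and merging three times. A minimal sketch of the same reshaping with `pivot_table`, assuming a long table with one row per student; the schools and scores here are hypothetical. The explicit column list at the end is needed because the grade labels would otherwise sort alphabetically, putting "10th" before "9th".

```python
import pandas as pd

# Hypothetical toy data: one row per student
students = pd.DataFrame({
    "school_name": ["North", "North", "North", "North", "North",
                    "South", "South", "South", "South"],
    "grade": ["9th", "9th", "10th", "11th", "12th",
              "9th", "10th", "11th", "12th"],
    "math_score": [80, 90, 70, 75, 60, 88, 92, 81, 79],
})

# Mean math score for every school/grade combination in one step
by_grade = students.pivot_table(index="school_name", columns="grade",
                                values="math_score", aggfunc="mean")

# Restore the natural grade order
by_grade = by_grade[["9th", "10th", "11th", "12th"]]
print(by_grade)
```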


## Reading Scores by Grade

In [34]:
# Group by `school_name` and take the mean of `reading_score`
read9 = ninth[["school_name", "reading_score"]]
avg_read9 = read9.groupby(["school_name"]).mean()
               
read10 = tenth[["school_name", "reading_score"]]
avg_read10 = read10.groupby(["school_name"]).mean()
                
read11 = eleventh[["school_name", "reading_score"]]
avg_read11 = read11.groupby(["school_name"]).mean()
                   
read12 = twelfth[["school_name", "reading_score"]]
avg_read12 = read12.groupby(["school_name"]).mean()


# Combine each of the scores above into a single DataFrame (`read_bg`)
read_bg1 = pd.merge(avg_read9, avg_read10, on="school_name")
read_bg101 = read_bg1.rename(columns={"reading_score_x":"9th", "reading_score_y":"10th"})
read_bg2 = pd.merge(read_bg101, avg_read11, on="school_name")
read_bg201 = read_bg2.rename(columns={"reading_score":"11th"})
read_bg3 = pd.merge(read_bg201, avg_read12, on="school_name")
read_bg = read_bg3.rename(columns={"reading_score":"12th"})

# Minor data wrangling
read_bg.index.name = None

# Display the DataFrame
read_bg

Unnamed: 0,9th,10th,11th,12th
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


## Scores by School Spending

In [35]:
# Establish the bins 
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

In [36]:
# copy the school summary since it has the "Spend per Capita" column
ps_spending_df = ps_summary.copy()

In [39]:
# Use `pd.cut` to categorize spending based on the bins.
ps_spending_df["Spend per Capita"] = ps_spending_df["Spend per Capita"].replace(r'[\$,]', '', regex=True).astype(float)

ps_spending_df["Spending Range (per student)"] = pd.cut(ps_spending_df["Spend per Capita"],
                                                       spending_bins, labels=labels, include_lowest=True)
ps_spending_df["Spend per Capita"] = ps_spending_df["Spend per Capita"].map("${:.2f}".format)
ps_spending_df

Unnamed: 0_level_0,School Type,Total Students,Total School Budget,Spend per Capita,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing,Spending Range (per student)
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283,$585-630
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769,<$585
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476,$630-645
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887,$630-645
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455,$585-630
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508,$645-680
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166,<$585
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884,$645-680
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172,$645-680
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541,$585-630
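
One detail of `pd.cut` worth keeping in mind when reading the spending ranges: by default the bins are closed on the right, so a value that sits exactly on an edge falls into the lower range. The sketch below reuses the bins and labels defined above; 578 and 644 appear in the table, while 585 and 630 are hypothetical edge values added for illustration.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# 578 and 644 occur in the table above; 585 and 630 are hypothetical edge cases
sample = pd.Series([578.0, 585.0, 630.0, 644.0], name="Spend per Capita")

# Default right=True gives intervals (0, 585], (585, 630], (630, 645], (645, 680]
ranges = pd.cut(sample, spending_bins, labels=labels, include_lowest=True)
print(ranges.tolist())
# → ['<$585', '<$585', '$585-630', '$630-645']
```

A school spending exactly $585 would therefore be labelled "<$585" even though the label suggests otherwise; passing `right=False` would shift the edges the other way.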


In [40]:
#  Calculate averages for the desired columns. 
spending_math = ps_spending_df.groupby(["Spending Range (per student)"])["Avg Math Score"].mean()
spending_reading = ps_spending_df.groupby(["Spending Range (per student)"])["Avg Reading Score"].mean()
spending_MathPass = ps_spending_df.groupby(["Spending Range (per student)"])["% Passing Math"].mean()
spending_ReadPass = ps_spending_df.groupby(["Spending Range (per student)"])["% Passing Reading"].mean()
spending_AllPass = ps_spending_df.groupby(["Spending Range (per student)"])["% Overall Passing"].mean()

In [43]:
# Assemble into DataFrame
SpendSummary1 = pd.merge(spending_math, spending_reading, on="Spending Range (per student)")
SpendSummary2 = pd.merge(SpendSummary1, spending_MathPass, on="Spending Range (per student)")
SpendSummary3 = pd.merge(SpendSummary2, spending_ReadPass, on="Spending Range (per student)")
SpendSummary = pd.merge(SpendSummary3, spending_AllPass, on="Spending Range (per student)")

# Display results
SpendSummary

Unnamed: 0_level_0,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Spending Range (per student),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
<$585,83.455399,83.933814,93.460096,96.610877,90.369459
$585-630,81.899826,83.155286,87.133538,92.718205,81.418596
$630-645,78.518855,81.624473,73.484209,84.391793,62.857656
$645-680,76.99721,81.027843,66.164813,81.133951,53.526855
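
The binned averages above fall steadily as spending rises. As a rough cross-check, the sketch below computes the Pearson correlation directly on the per-school figures. It uses only the ten schools printed in the summary table above, so it is an illustration under that assumption rather than a result for the full data set; the column names are introduced here.

```python
import pandas as pd

# Per-student spend and % overall passing for the ten schools printed above
schools = pd.DataFrame({
    "spend": [628, 582, 639, 644, 625, 652, 581, 655, 650, 609],
    "pct_overall_passing": [54.642283, 91.334769, 53.204476, 54.289887, 90.599455,
                            53.527508, 89.227166, 53.513884, 53.539172, 90.540541],
})

# Series.corr uses the Pearson method by default
r = schools["spend"].corr(schools["pct_overall_passing"])
print(round(r, 2))
```

A strongly negative `r` agrees with the binned table, though a single coefficient cannot separate the effect of spending from that of school type or size.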


## Scores by School Size

In [44]:
# Establish the bins.
size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

In [47]:
# copy the school summary since it has the "Total Students"
ps_size = ps_summary.copy()

# Use `pd.cut` to categorize score based upon school size.
ps_size["School Size"] = pd.cut(ps_size["Total Students"], size_bins, labels=labels, include_lowest=True)
ps_size

Unnamed: 0_level_0,School Type,Total Students,Total School Budget,Spend per Capita,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing,School Size
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283,Large (2000-5000)
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769,Medium (1000-2000)
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476,Large (2000-5000)
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887,Large (2000-5000)
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455,Medium (1000-2000)
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508,Large (2000-5000)
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166,Small (<1000)
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884,Large (2000-5000)
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172,Large (2000-5000)
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541,Small (<1000)


In [48]:
# Calculate averages for the desired columns. 
size_math = ps_size.groupby(["School Size"])["Avg Math Score"].mean()
size_reading = ps_size.groupby(["School Size"])["Avg Reading Score"].mean()
size_MathPass = ps_size.groupby(["School Size"])["% Passing Math"].mean()
size_ReadPass = ps_size.groupby(["School Size"])["% Passing Reading"].mean()
size_AllPass = ps_size.groupby(["School Size"])["% Overall Passing"].mean()

In [54]:
# Create new DataFrame with mean scores above.
SizeSummary1 = pd.merge(size_math, size_reading, on="School Size")
SizeSummary2 = pd.merge(SizeSummary1, size_MathPass, on="School Size")
SizeSummary3 = pd.merge(SizeSummary2, size_ReadPass, on="School Size")
SizeSummary = pd.merge(SizeSummary3, size_AllPass, on="School Size")

# Display results
SizeSummary

Unnamed: 0_level_0,Avg Math Score,Avg Reading Score,% Passing Math,% Passing Reading,% Overall Passing
School Size,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Small (<1000),83.821598,83.929843,93.550225,96.099437,89.883853
Medium (1000-2000),83.374684,83.864438,93.599695,96.79068,90.621535
Large (2000-5000),77.746417,81.344493,69.963361,82.766634,58.286003
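
The size summary above is produced by five separate `groupby` calls followed by four merges. A hedged sketch of a shorter form selects all metric columns at once and takes the mean in one call. The four rows are copied from the table above (Holden, Pena, Cabrera, Bailey) and only two of the five metrics are included to keep it brief; `size_labels` and `metric_cols` are names introduced here.

```python
import pandas as pd

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Four schools copied from the table above: Holden, Pena, Cabrera, Bailey
schools = pd.DataFrame({
    "Total Students": [427, 962, 1858, 4976],
    "Avg Math Score": [83.803279, 83.839917, 83.061895, 77.048432],
    "% Overall Passing": [89.227166, 90.540541, 91.334769, 54.642283],
})
schools["School Size"] = pd.cut(schools["Total Students"], size_bins,
                                labels=size_labels, include_lowest=True)

# One groupby over several columns replaces the per-column groupby plus merges
metric_cols = ["Avg Math Score", "% Overall Passing"]
size_summary = schools.groupby("School Size", observed=True)[metric_cols].mean()
print(size_summary)
```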


## Scores by School Type

In [50]:
# Group the per-school summary DataFrame (`ps_summary`) by "School Type" and average the results.
AvgMath_bt = ps_summary.groupby(["School Type"])["Avg Math Score"].mean()
AvgRead_bt = ps_summary.groupby(["School Type"])["Avg Reading Score"].mean()
AvgMathPass_bt = ps_summary.groupby(["School Type"])["% Passing Math"].mean()
AvgReadPass_bt = ps_summary.groupby(["School Type"])["% Passing Reading"].mean()
AvgAllPass_bt = ps_summary.groupby(["School Type"])["% Overall Passing"].mean()

In [52]:
# Create new DataFrame with mean scores above.
TypSum1 = pd.merge(AvgMath_bt, AvgRead_bt, on="School Type")
TypSum2 = pd.merge(TypSum1, AvgMathPass_bt, on="School Type")
TypSum3 = pd.merge(TypSum2, AvgReadPass_bt, on="School Type")
TypSum = pd.merge(TypSum3, AvgAllPass_bt, on="School Type")

# Display results
TypSum
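
Because the report's conclusions about school type and school size draw on the same schools, it can help to see how the two categories overlap. A minimal sketch with `pd.crosstab`, limited to the eleven schools whose type and enrollment are printed in the tables above; it is an illustration for that subset, not for all fifteen schools.

```python
import pandas as pd

# Type and enrollment for the eleven schools printed in the tables above
schools = pd.DataFrame({
    "School Type": ["District", "Charter", "District", "District", "Charter",
                    "District", "Charter", "District", "District", "Charter",
                    "Charter"],
    "Total Students": [4976, 1858, 2949, 2739, 1468,
                       4635, 427, 2917, 4761, 962, 2283],
})

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Convert the categorical result to plain strings for simple label lookups
schools["School Size"] = pd.cut(schools["Total Students"], size_bins,
                                labels=size_labels, include_lowest=True).astype(str)

# Count schools in each type/size combination
overlap = pd.crosstab(schools["School Type"], schools["School Size"])
print(overlap)
```

In this subset every district school is large and only one charter school (Wilson) is, which is why type and size are hard to tell apart in the averages.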