# PyCity Schools Analysis

## Analysis Summary of District and School Performance

This analysis evaluates the academic performance of schools within a district, considering metrics such as average test scores, budget allocation, school size, and school type. Key calculations include average math and reading scores and the percentage of students passing in each subject, segmented by school size, per-student spending, and school type.

### Key Findings

1. Impact of School Spending on Academic Performance:
   Schools in the lowest per-student spending range (under $585) had the highest average math and reading scores and the highest passing rates, and results fell with each higher spending range. Higher spending per student therefore does not, on its own, predict better performance in this district. The lowest-spending schools listed in the summary tables are charter schools, so this pattern may reflect school type as much as spending.
2. School Size and Passing Rates:
   Small schools (fewer than 1,000 students) and medium-sized schools (1,000 to 2,000 students) had higher average scores and passing rates than large schools (2,000 to 5,000 students). Smaller enrollment may foster a more supportive learning environment.
3. School Type and Academic Performance:
   Charter schools substantially outperformed district schools in math and reading scores and in passing rates. The five schools with the highest overall passing rates are all charter schools (about 90-91%), while the five with the lowest are all district schools (about 53%). More flexible educational models or targeted resource allocation are possible explanations.
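The three findings above are not independent of one another. As a rough check, the sketch below cross-tabulates school type against a size band for the ten schools displayed in the per-school summary table later in this notebook. The `schools` frame, the `size_band` column, and the 2,000-student cutoff are illustrative assumptions chosen to mirror the size bins used further down.

```python
import pandas as pd

# Type and enrollment for the ten schools displayed in the per-school summary
schools = pd.DataFrame({
    "school_name": ["Bailey", "Cabrera", "Figueroa", "Ford", "Griffin",
                    "Hernandez", "Holden", "Huang", "Johnson", "Pena"],
    "type": ["District", "Charter", "District", "District", "Charter",
             "District", "Charter", "District", "District", "Charter"],
    "size": [4976, 1858, 2949, 2739, 1468, 4635, 427, 2917, 4761, 962],
})

# Hypothetical two-way size split (assumed cutoff: 2,000 students)
schools["size_band"] = schools["size"].apply(
    lambda n: "under 2000" if n < 2000 else "2000 or more"
)

# Count schools in each (type, size band) combination
type_by_size = pd.crosstab(schools["type"], schools["size_band"])
print(type_by_size)
```

In these ten rows every charter school has fewer than 2,000 students and every district school has more, so size and type cannot be separated from this table alone. Wilson High School, a charter with 2,283 students that appears among the top schools, falls outside these ten rows, so the full picture is slightly less clean.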

### Concluding Observations

Across the district, smaller charter schools with modest per-student spending achieved the best academic outcomes. More individual attention in smaller and charter schools and efficient resource management may contribute to these results, but spending, size, and school type overlap in this dataset, so their separate effects cannot be isolated from these summaries alone. District policymakers can use these findings when weighing budget allocations, school sizes, and support for different school types.

Of the factors examined, school type stands out most clearly, with charter schools showing the strongest results.

In [288]:
# Dependencies and Setup
import pandas as pd
from pathlib import Path

# File to Load (Remember to Change These)
school_data_to_load = Path("Resources/schools_complete.csv")
student_data_to_load = Path("Resources/students_complete.csv")

# Read School and Student Data File and store into Pandas DataFrames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset.
school_data_complete = pd.merge(student_data, school_data, how="left", on="school_name")
school_data_complete.head()


,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
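A left merge silently duplicates rows if a school name repeats in the school table and leaves missing values if a name fails to match. The sketch below shows one way to guard against both using the `validate` and `indicator` parameters of `pd.merge`. The two small frames are toy stand-ins for the CSV files, not the real data.

```python
import pandas as pd

# Toy stand-ins for the student and school files (illustrative values only)
students = pd.DataFrame({
    "student_name": ["Ann", "Ben", "Cal"],
    "school_name": ["Huang High School", "Huang High School", "Pena High School"],
})
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Pena High School"],
    "type": ["District", "Charter"],
})

merged = pd.merge(
    students, schools,
    how="left", on="school_name",
    validate="many_to_one",  # raises MergeError if a school name repeats in `schools`
    indicator=True,          # adds a `_merge` column: "both", "left_only" or "right_only"
)

# Rows whose school name found no match in the school table
unmatched = (merged["_merge"] != "both").sum()
print(len(merged), unmatched)
```

If `unmatched` is zero and the merged row count equals the student row count, every student was paired with exactly one school.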


## District Summary

In [289]:
# Calculate the total number of unique schools
school_count = school_data_complete["school_name"].nunique()
school_count


15

In [290]:
# Calculate the total number of students
student_count = school_data_complete["student_name"].count()
student_count


39170
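`Series.count()` counts non-null values only, so the figure above equals the number of students only if no `student_name` is missing. Here the result, 39,170, matches the sum of the fifteen `size` values printed later in the notebook, which suggests no names are missing. The sketch below, on a made-up three-row frame, shows how the counting approaches would differ if a name were missing.

```python
import pandas as pd

# Toy frame with one missing name (illustrative data, not from the source files)
df = pd.DataFrame({
    "Student ID": [0, 1, 2],
    "student_name": ["Paul Bradley", None, "Kevin Rodriguez"],
})

non_null_names = df["student_name"].count()  # skips the missing name
row_count = len(df)                          # counts every row
unique_ids = df["Student ID"].nunique()      # counts distinct student IDs

print(non_null_names, row_count, unique_ids)
```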

In [291]:
# Calculate the total budget
total_budget = school_data["budget"].sum()
total_budget


24649428

In [292]:
# Calculate the average (mean) math score
average_math_score = school_data_complete["math_score"].mean()
average_math_score

78.98537145774827

In [293]:
# Calculate the average (mean) reading score
average_reading_score = school_data_complete["reading_score"].mean()
average_reading_score

81.87784018381414

In [294]:
# Use the following to calculate the percentage of students who passed math (math scores greater than or equal to 70)
passing_math_count = school_data_complete[(school_data_complete["math_score"] >= 70)].count()["student_name"]
passing_math_percentage = passing_math_count / float(student_count) * 100
passing_math_percentage


74.9808526933878

In [295]:
# Use the following to calculate the percentage of students who passed reading (reading score >= 70)
passing_reading_count = school_data_complete[(school_data_complete["reading_score"] >= 70)].count()["student_name"]
passing_reading_percentage = passing_reading_count / float(student_count) * 100
passing_reading_percentage

85.80546336482001

In [296]:
# Use the following to calculate the percentage of students that passed math and reading
passing_math_reading_count = school_data_complete[
    (school_data_complete["math_score"] >= 70) & (school_data_complete["reading_score"] >= 70)
].count()["student_name"]
overall_passing_rate = passing_math_reading_count /  float(student_count) * 100
overall_passing_rate


65.17232575950983
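The three passing percentages can also be computed without separate counts, because the mean of a boolean Series is the fraction of `True` values. The sketch below applies this to the five students shown in the merged preview; the `scores` frame and variable names are illustrative.

```python
import pandas as pd

# Scores of the five students shown in the merged preview
scores = pd.DataFrame({
    "math_score": [79, 61, 60, 58, 84],
    "reading_score": [66, 94, 90, 67, 97],
})

# Boolean masks: True where the student reached the passing mark of 70
passed_math = scores["math_score"] >= 70
passed_reading = scores["reading_score"] >= 70

# The mean of a boolean mask is the share of True values
pct_math = passed_math.mean() * 100
pct_reading = passed_reading.mean() * 100
pct_both = (passed_math & passed_reading).mean() * 100

print(round(pct_math, 2), round(pct_reading, 2), round(pct_both, 2))
```

Two of the five students pass math, three pass reading, and one passes both, which is what the masks report for this tiny sample.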

In [297]:
# Create a high-level snapshot of the district's key metrics in a DataFrame
district_summary = pd.DataFrame({
    "Total Schools": [school_count],
    "Total Students": [student_count],
    "Total Budget": [total_budget],
    "Average Math Score": [average_math_score],
    "Average Reading Score": [average_reading_score],
    "% Passing Math": [passing_math_percentage],
    "% Passing Reading": [passing_reading_percentage],
    "% Overall Passing": [overall_passing_rate]
})

# Formatting
district_summary["Total Students"] = district_summary["Total Students"].map("{:,}".format)
district_summary["Total Budget"] = district_summary["Total Budget"].map("${:,.2f}".format)

# Display the DataFrame
district_summary


,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,15,39170,"$24,649,428.00",78.985371,81.87784,74.980853,85.805463,65.172326


## School Summary

In [298]:
# Use the code provided to select the type per school from school_data
school_types = school_data.set_index(["school_name"])[["type"]]

# Display the DataFrame
school_types.head()

school_name,type
Huang High School,District
Figueroa High School,District
Shelton High School,Charter
Hernandez High School,District
Griffin High School,Charter


In [299]:
# Calculate the total student count per school from school_data
per_school_counts = school_data.set_index(["school_name"])[["size"]]

# Display the DataFrame
per_school_counts.head(15)

school_name,size
Huang High School,2917
Figueroa High School,2949
Shelton High School,1761
Hernandez High School,4635
Griffin High School,1468
Wilson High School,2283
Cabrera High School,1858
Bailey High School,4976
Holden High School,427
Pena High School,962


In [300]:
# Calculate the total school budget and per capita spending per school from school_data
per_school_budget = school_data.set_index(["school_name"])[["budget"]]
per_school_capita = per_school_budget["budget"] / per_school_counts["size"]

# Combine them into one DataFrame
school_budget_summary = pd.DataFrame({
    "Total Budget": per_school_budget["budget"],
    "Per Capita Spending": per_school_capita
})

# Display the combined DataFrame
school_budget_summary.head()

school_name,Total Budget,Per Capita Spending
Huang High School,1910635,655.0
Figueroa High School,1884411,639.0
Shelton High School,1056600,600.0
Hernandez High School,3022020,652.0
Griffin High School,917500,625.0
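The per capita division works because pandas aligns the two Series on their shared `school_name` index rather than on row position. A minimal sketch using two schools from the table above, deliberately listed in a different order in each Series:

```python
import pandas as pd

# Budget and enrollment for two schools, intentionally in different orders
budget = pd.Series({"Huang High School": 1910635, "Griffin High School": 917500})
size = pd.Series({"Griffin High School": 1468, "Huang High School": 2917})

# Division pairs values by index label, not by position
per_capita = budget / size
print(per_capita)
```

Each school is divided by its own enrollment, giving 655.0 for Huang and 625.0 for Griffin as in the table, even though the rows were not in the same order.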


In [301]:
# Calculate the average math score per school
per_school_math = school_data_complete.groupby("school_name")["math_score"].mean()

# Calculate the average reading score per school
per_school_reading = school_data_complete.groupby("school_name")["reading_score"].mean()

# Combine them into a single DataFrame
school_scores_summary = pd.DataFrame({
    "Average Math Score": per_school_math,
    "Average Reading Score": per_school_reading
})

# Display the combined DataFrame
school_scores_summary.head()

school_name,Average Math Score,Average Reading Score
Bailey High School,77.048432,81.033963
Cabrera High School,83.061895,83.97578
Figueroa High School,76.711767,81.15802
Ford High School,77.102592,80.746258
Griffin High School,83.351499,83.816757


In [302]:
# Calculate the number of students with math scores of 70 or higher
students_passing_math = school_data_complete[school_data_complete["math_score"] >= 70]

# Calculate the number of students per school who passed math
school_students_passing_math = students_passing_math.groupby("school_name").size()

# Convert to a DataFrame for display
school_students_passing_math = school_students_passing_math.to_frame(name="Students Passing Math (>=70)")

# Display the DataFrame
school_students_passing_math.head()

school_name,Students Passing Math (>=70)
Bailey High School,3318
Cabrera High School,1749
Figueroa High School,1946
Ford High School,1871
Griffin High School,1371


In [303]:
# Calculate the number of students with reading scores of 70 or higher
students_passing_reading = school_data_complete[school_data_complete["reading_score"] >= 70]

# Calculate the number of students per school who passed reading
school_students_passing_reading = students_passing_reading.groupby("school_name").size()

# Convert to a DataFrame for display
school_students_passing_reading = school_students_passing_reading.to_frame(name="Students Passing Reading (>=70)")

# Display the DataFrame
school_students_passing_reading.head()

school_name,Students Passing Reading (>=70)
Bailey High School,4077
Cabrera High School,1803
Figueroa High School,2381
Ford High School,2172
Griffin High School,1426


In [304]:
# Use the provided code to calculate the number of students per school that passed both math and reading
students_passing_math_and_reading = school_data_complete[
    (school_data_complete["reading_score"] >= 70) & (school_data_complete["math_score"] >= 70)
]

# Calculate the number of students per school who passed both math and reading
school_students_passing_math_and_reading = students_passing_math_and_reading.groupby("school_name").size()

# Convert to a DataFrame for display
school_students_passing_math_and_reading = school_students_passing_math_and_reading.to_frame(name="Students Passing Math and Reading (>=70)")

# Display the DataFrame
school_students_passing_math_and_reading.head()

school_name,Students Passing Math and Reading (>=70)
Bailey High School,2719
Cabrera High School,1697
Figueroa High School,1569
Ford High School,1487
Griffin High School,1330


In [305]:
# Filter the students who passed math, reading, and both subjects (scores of 70 or higher)
students_passing_math = school_data_complete[school_data_complete["math_score"] >= 70]
students_passing_reading = school_data_complete[school_data_complete["reading_score"] >= 70]
students_passing_math_and_reading = school_data_complete[
    (school_data_complete["reading_score"] >= 70) & (school_data_complete["math_score"] >= 70)
]

# Total number of students in the whole district (not per school)
total_students = len(school_data_complete)

# NOTE: dividing by the district total gives district-wide rates, not per-school rates,
# so the `per_school_*` names below are misleading. They are recalculated per school in a later cell.

# District-wide passing rate for math (students passing math / total students)
passing_math_count = len(students_passing_math)
per_school_passing_math = (passing_math_count / total_students) * 100

# District-wide passing rate for reading (students passing reading / total students)
passing_reading_count = len(students_passing_reading)
per_school_passing_reading = (passing_reading_count / total_students) * 100

# District-wide overall passing rate (students passing both subjects / total students)
passing_math_reading_count = len(students_passing_math_and_reading)
overall_passing_rate = (passing_math_reading_count / total_students) * 100

# Combine the results into a single-row DataFrame
school_passing_rates_df = pd.DataFrame({
    "% Passing Math": [per_school_passing_math],
    "% Passing Reading": [per_school_passing_reading],
    "% Overall Passing": [overall_passing_rate]
})

# Display the DataFrame
school_passing_rates_df

,% Passing Math,% Passing Reading,% Overall Passing
0,74.980853,85.805463,65.172326
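The single row above repeats the district-wide figures (74.98, 85.81 and 65.17) because the denominator is the total number of students in the district. One way to get a rate per school in a single step is to group boolean pass flags by school and take the mean. The sketch below uses made-up scores for two hypothetical schools, "A High" and "B High".

```python
import pandas as pd

# Made-up scores for two hypothetical schools
df = pd.DataFrame({
    "school_name": ["A High", "A High", "A High", "A High", "B High", "B High"],
    "math_score": [90, 65, 72, 50, 88, 95],
    "reading_score": [80, 75, 60, 68, 91, 70],
})

# One boolean column per criterion; True means the student passed
passed = pd.DataFrame({
    "school_name": df["school_name"],
    "% Passing Math": df["math_score"] >= 70,
    "% Passing Reading": df["reading_score"] >= 70,
})
passed["% Overall Passing"] = passed["% Passing Math"] & passed["% Passing Reading"]

# The group mean of a boolean column is that school's pass fraction
per_school_rates = passed.groupby("school_name").mean() * 100
print(per_school_rates)
```

Each school is divided by its own student count, so the two rows differ. The result is a DataFrame indexed by `school_name`, which can be combined with the other per-school Series.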


In [306]:
# Print the type and content of each variable to check if they are Series or scalar
print("per_school_counts:", type(per_school_counts), per_school_counts)
print("per_school_budget:", type(per_school_budget), per_school_budget)
print("per_school_capita:", type(per_school_capita), per_school_capita)
print("per_school_math:", type(per_school_math), per_school_math)
print("per_school_reading:", type(per_school_reading), per_school_reading)
print("per_school_passing_math:", type(per_school_passing_math), per_school_passing_math)
print("per_school_passing_reading:", type(per_school_passing_reading), per_school_passing_reading)
print("overall_passing_rate:", type(overall_passing_rate), overall_passing_rate)

per_school_counts: <class 'pandas.core.frame.DataFrame'>                        size
school_name                
Huang High School      2917
Figueroa High School   2949
Shelton High School    1761
Hernandez High School  4635
Griffin High School    1468
Wilson High School     2283
Cabrera High School    1858
Bailey High School     4976
Holden High School      427
Pena High School        962
Wright High School     1800
Rodriguez High School  3999
Johnson High School    4761
Ford High School       2739
Thomas High School     1635
per_school_budget: <class 'pandas.core.frame.DataFrame'>                         budget
school_name                   
Huang High School      1910635
Figueroa High School   1884411
Shelton High School    1056600
Hernandez High School  3022020
Griffin High School     917500
Wilson High School     1319574
Cabrera High School    1081356
Bailey High School     3124928
Holden High School      248087
Pena High School        585858
Wright High School     1049400
Rodrigu

In [307]:
# Recalculate the passing rates per school as Series
per_school_passing_math = (school_students_passing_math["Students Passing Math (>=70)"] / per_school_counts["size"]) * 100
per_school_passing_reading = (school_students_passing_reading["Students Passing Reading (>=70)"] / per_school_counts["size"]) * 100
overall_passing_rate = (school_students_passing_math_and_reading["Students Passing Math and Reading (>=70)"] / per_school_counts["size"]) * 100

# Now create the `per_school_summary` DataFrame
per_school_summary = pd.DataFrame({
    "School Type": school_types["type"],                    # Type of each school
    "Total Students": per_school_counts["size"],             # Total number of students per school
    "Total School Budget": per_school_budget["budget"],      # Total budget for each school
    "Per Student Budget": per_school_capita,                 # Budget per student
    "Average Math Score": per_school_math,                   # Average math score per school
    "Average Reading Score": per_school_reading,             # Average reading score per school
    "% Passing Math": per_school_passing_math,               # Percentage of students passing math
    "% Passing Reading": per_school_passing_reading,         # Percentage of students passing reading
    "% Overall Passing": overall_passing_rate                # Overall passing rate per school
})

# Formatting for the budget columns
per_school_summary["Total School Budget"] = per_school_summary["Total School Budget"].map("${:,.2f}".format)
per_school_summary["Per Student Budget"] = per_school_summary["Per Student Budget"].map("${:,.2f}".format)

# Display the DataFrame
per_school_summary

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541
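Formatting the budget columns with `map` turns them into strings, which is why a later cell has to strip `$` and `,` before the spending values can be binned. If the formatting is only needed for display, one alternative is to format a copy and keep the numeric frame for calculations. This is a sketch under that assumption; `summary` and `display_summary` are illustrative names.

```python
import pandas as pd

# Two rows from the per-school summary, kept numeric
summary = pd.DataFrame(
    {"Total School Budget": [3124928, 1081356], "Per Student Budget": [628.0, 582.0]},
    index=["Bailey High School", "Cabrera High School"],
)

# Format a copy for display; `summary` itself stays numeric
display_summary = summary.copy()
for column in ["Total School Budget", "Per Student Budget"]:
    display_summary[column] = display_summary[column].map("${:,.2f}".format)

print(display_summary)
print(summary.dtypes)
```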


## Highest-Performing Schools (by % Overall Passing)

In [308]:
# Sort the schools by `% Overall Passing` in descending order and display the top 5 rows
top_schools = per_school_summary.sort_values(by="% Overall Passing", ascending=False)

# Display the top 5 schools
top_schools.head(5)


school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Thomas High School,Charter,1635,"$1,043,130.00",$638.00,83.418349,83.84893,93.272171,97.308869,90.948012
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Wilson High School,Charter,2283,"$1,319,574.00",$578.00,83.274201,83.989488,93.867718,96.539641,90.582567
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541


## Lowest-Performing Schools (by % Overall Passing)

In [309]:
# Sort the schools by `% Overall Passing` in ascending order and display the bottom 5 rows
bottom_schools = per_school_summary.sort_values(by="% Overall Passing", ascending=True)

# Display the bottom 5 schools
bottom_schools.head(5)

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Rodriguez High School,District,3999,"$2,547,363.00",$637.00,76.842711,80.744686,66.366592,80.220055,52.988247
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172
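Sorting the whole frame and then calling `head` works well at this scale. As an alternative, `DataFrame.nlargest` and `DataFrame.nsmallest` select the extremes directly. The sketch below uses four overall passing rates taken from the tables above; the `overall` frame is an illustrative stand-in for `per_school_summary`.

```python
import pandas as pd

# Four overall passing rates from the tables above
overall = pd.DataFrame(
    {"% Overall Passing": [91.334769, 54.642283, 52.988247, 90.948012]},
    index=["Cabrera High School", "Bailey High School",
           "Rodriguez High School", "Thomas High School"],
)

top_two = overall.nlargest(2, "% Overall Passing")      # highest first
bottom_two = overall.nsmallest(2, "% Overall Passing")  # lowest first

print(top_two)
print(bottom_two)
```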


## Math Scores by Grade

In [310]:
# Separate the data by grade
ninth_graders = school_data_complete[school_data_complete["grade"] == "9th"]
tenth_graders = school_data_complete[school_data_complete["grade"] == "10th"]
eleventh_graders = school_data_complete[school_data_complete["grade"] == "11th"]
twelfth_graders = school_data_complete[school_data_complete["grade"] == "12th"]

# Group by `school_name` and take the mean of the `math_score` column for each
ninth_grade_math_scores = ninth_graders.groupby("school_name")["math_score"].mean()
tenth_grade_math_scores = tenth_graders.groupby("school_name")["math_score"].mean()
eleventh_grade_math_scores = eleventh_graders.groupby("school_name")["math_score"].mean()
twelfth_grade_math_scores = twelfth_graders.groupby("school_name")["math_score"].mean()

# Combine each of the scores above into a single DataFrame called `math_scores_by_grade`
math_scores_by_grade = pd.DataFrame({
    "9th": ninth_grade_math_scores,
    "10th": tenth_grade_math_scores,
    "11th": eleventh_grade_math_scores,
    "12th": twelfth_grade_math_scores
})

# Minor data wrangling
math_scores_by_grade.index.name = None

# Display the DataFrame
math_scores_by_grade

,9th,10th,11th,12th
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547


## Reading Scores by Grade

In [311]:
# Separate the data by grade
ninth_graders = school_data_complete[school_data_complete["grade"] == "9th"]
tenth_graders = school_data_complete[school_data_complete["grade"] == "10th"]
eleventh_graders = school_data_complete[school_data_complete["grade"] == "11th"]
twelfth_graders = school_data_complete[school_data_complete["grade"] == "12th"]

# Group by `school_name` and take the mean of the `reading_score` column for each
ninth_grade_reading_scores = ninth_graders.groupby("school_name")["reading_score"].mean()
tenth_grade_reading_scores = tenth_graders.groupby("school_name")["reading_score"].mean()
eleventh_grade_reading_scores = eleventh_graders.groupby("school_name")["reading_score"].mean()
twelfth_grade_reading_scores = twelfth_graders.groupby("school_name")["reading_score"].mean()

# Combine each of the scores above into a single DataFrame called `reading_scores_by_grade`
reading_scores_by_grade = pd.DataFrame({
    "9th": ninth_grade_reading_scores,
    "10th": tenth_grade_reading_scores,
    "11th": eleventh_grade_reading_scores,
    "12th": twelfth_grade_reading_scores
})

# Minor data wrangling
reading_scores_by_grade = reading_scores_by_grade[["9th", "10th", "11th", "12th"]]
reading_scores_by_grade.index.name = None

# Display the DataFrame
reading_scores_by_grade

,9th,10th,11th,12th
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


## Scores by School Spending

In [312]:
import warnings

# Ignore warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    # Remove dollar signs and commas from "Per Student Budget" and convert to numeric
    per_school_summary["Per Student Budget"] = per_school_summary["Per Student Budget"].replace(r'[\$,]', '', regex=True).astype(float)

    # Establish the bins and labels for spending ranges
    spending_bins = [0, 585, 630, 645, 680]
    labels = ["<$585", "$585-630", "$630-645", "$645-680"]

    # Categorize the per-student spending into these bins
    per_school_summary["Spending Ranges (Per Student)"] = pd.cut(per_school_summary["Per Student Budget"], bins=spending_bins, labels=labels)

# Display the modified DataFrame
per_school_summary

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,Spending Ranges (Per Student)
Bailey High School,District,4976,"$3,124,928.00",628.0,77.048432,81.033963,66.680064,81.93328,54.642283,$585-630
Cabrera High School,Charter,1858,"$1,081,356.00",582.0,83.061895,83.97578,94.133477,97.039828,91.334769,<$585
Figueroa High School,District,2949,"$1,884,411.00",639.0,76.711767,81.15802,65.988471,80.739234,53.204476,$630-645
Ford High School,District,2739,"$1,763,916.00",644.0,77.102592,80.746258,68.309602,79.299014,54.289887,$630-645
Griffin High School,Charter,1468,"$917,500.00",625.0,83.351499,83.816757,93.392371,97.138965,90.599455,$585-630
Hernandez High School,District,4635,"$3,022,020.00",652.0,77.289752,80.934412,66.752967,80.862999,53.527508,$645-680
Holden High School,Charter,427,"$248,087.00",581.0,83.803279,83.814988,92.505855,96.252927,89.227166,<$585
Huang High School,District,2917,"$1,910,635.00",655.0,76.629414,81.182722,65.683922,81.316421,53.513884,$645-680
Johnson High School,District,4761,"$3,094,650.00",650.0,77.072464,80.966394,66.057551,81.222432,53.539172,$645-680
Pena High School,Charter,962,"$585,858.00",609.0,83.839917,84.044699,94.594595,95.945946,90.540541,$585-630
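By default `pd.cut` builds right-closed intervals, so with these bins a school spending exactly $585 would be labelled `<$585` even though the label suggests otherwise. None of the per-student budgets displayed in this notebook falls exactly on a bin edge, so the results here are unaffected. The sketch below shows the boundary behaviour on three hypothetical values, together with the `right=False` alternative.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Hypothetical budgets just below, on, and just above the first bin edge
sample = pd.Series([584.0, 585.0, 586.0], name="Per Student Budget")

# Default right=True: intervals are (0, 585], (585, 630], ...
right_closed = pd.cut(sample, bins=spending_bins, labels=labels)

# right=False: intervals are [0, 585), [585, 630), ...
left_closed = pd.cut(sample, bins=spending_bins, labels=labels, right=False)

print(pd.DataFrame({"value": sample, "right=True": right_closed, "right=False": left_closed}))
```

With `right=False` the value 585 moves into `$585-630`, which matches the label text more closely.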


In [313]:
# Create a copy of the school summary for later aggregations
school_spending_df = per_school_summary.copy()

# Display the copied DataFrame to confirm
school_spending_df.head()

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,Spending Ranges (Per Student)
Bailey High School,District,4976,"$3,124,928.00",628.0,77.048432,81.033963,66.680064,81.93328,54.642283,$585-630
Cabrera High School,Charter,1858,"$1,081,356.00",582.0,83.061895,83.97578,94.133477,97.039828,91.334769,<$585
Figueroa High School,District,2949,"$1,884,411.00",639.0,76.711767,81.15802,65.988471,80.739234,53.204476,$630-645
Ford High School,District,2739,"$1,763,916.00",644.0,77.102592,80.746258,68.309602,79.299014,54.289887,$630-645
Griffin High School,Charter,1468,"$917,500.00",625.0,83.351499,83.816757,93.392371,97.138965,90.599455,$585-630


In [314]:
import warnings

# Suppress warnings
warnings.filterwarnings("ignore", category=SyntaxWarning)

# Create a copy of the school summary for later aggregations
school_spending_df = per_school_summary.copy()

# Define spending ranges and apply them
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Ensure correct formatting for Per Student Budget by removing dollar signs and converting to float
school_spending_df["Per Student Budget"] = school_spending_df["Per Student Budget"].replace(r'[\$,]', '', regex=True).astype(float)

# Use `pd.cut` on the per_student_budget to categorize based on spending bins
school_spending_df["Spending Ranges (Per Student)"] = pd.cut(school_spending_df["Per Student Budget"], bins=spending_bins, labels=labels)

# Convert Spending Ranges (Per Student) to a string
school_spending_df["Spending Ranges (Per Student)"] = school_spending_df["Spending Ranges (Per Student)"].astype(str)

# Display the DataFrame
school_spending_df

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,Spending Ranges (Per Student)
Bailey High School,District,4976,"$3,124,928.00",628.0,77.048432,81.033963,66.680064,81.93328,54.642283,$585-630
Cabrera High School,Charter,1858,"$1,081,356.00",582.0,83.061895,83.97578,94.133477,97.039828,91.334769,<$585
Figueroa High School,District,2949,"$1,884,411.00",639.0,76.711767,81.15802,65.988471,80.739234,53.204476,$630-645
Ford High School,District,2739,"$1,763,916.00",644.0,77.102592,80.746258,68.309602,79.299014,54.289887,$630-645
Griffin High School,Charter,1468,"$917,500.00",625.0,83.351499,83.816757,93.392371,97.138965,90.599455,$585-630
Hernandez High School,District,4635,"$3,022,020.00",652.0,77.289752,80.934412,66.752967,80.862999,53.527508,$645-680
Holden High School,Charter,427,"$248,087.00",581.0,83.803279,83.814988,92.505855,96.252927,89.227166,<$585
Huang High School,District,2917,"$1,910,635.00",655.0,76.629414,81.182722,65.683922,81.316421,53.513884,$645-680
Johnson High School,District,4761,"$3,094,650.00",650.0,77.072464,80.966394,66.057551,81.222432,53.539172,$645-680
Pena High School,Charter,962,"$585,858.00",609.0,83.839917,84.044699,94.594595,95.945946,90.540541,$585-630


In [315]:
# Calculate averages for the desired columns.
spending_math_scores = school_spending_df.groupby(["Spending Ranges (Per Student)"])["Average Math Score"].mean()
spending_reading_scores = school_spending_df.groupby(["Spending Ranges (Per Student)"])["Average Reading Score"].mean()
spending_passing_math = school_spending_df.groupby(["Spending Ranges (Per Student)"])["% Passing Math"].mean()
spending_passing_reading = school_spending_df.groupby(["Spending Ranges (Per Student)"])["% Passing Reading"].mean()
overall_passing_spending = school_spending_df.groupby(["Spending Ranges (Per Student)"])["% Overall Passing"].mean()

# Display the results
print("Spending Math Scores:\n", spending_math_scores)
print("\nSpending Reading Scores:\n", spending_reading_scores)
print("\nSpending Passing Math:\n", spending_passing_math)
print("\nSpending Passing Reading:\n", spending_passing_reading)
print("\nOverall Passing Spending:\n", overall_passing_spending)

Spending Math Scores:
 Spending Ranges (Per Student)
$585-630    81.899826
$630-645    78.518855
$645-680    76.997210
<$585       83.455399
Name: Average Math Score, dtype: float64

Spending Reading Scores:
 Spending Ranges (Per Student)
$585-630    83.155286
$630-645    81.624473
$645-680    81.027843
<$585       83.933814
Name: Average Reading Score, dtype: float64

Spending Passing Math:
 Spending Ranges (Per Student)
$585-630    87.133538
$630-645    73.484209
$645-680    66.164813
<$585       93.460096
Name: % Passing Math, dtype: float64

Spending Passing Reading:
 Spending Ranges (Per Student)
$585-630    92.718205
$630-645    84.391793
$645-680    81.133951
<$585       96.610877
Name: % Passing Reading, dtype: float64

Overall Passing Spending:
 Spending Ranges (Per Student)
$585-630    81.418596
$630-645    62.857656
$645-680    53.526855
<$585       90.369459
Name: % Overall Passing, dtype: float64


In [316]:
# Assemble into DataFrame
spending_summary = pd.DataFrame({
    "Average Math Score": spending_math_scores,
    "Average Reading Score": spending_reading_scores,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "% Overall Passing": overall_passing_spending
})

# Display results
spending_summary

Spending Ranges (Per Student),Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
$585-630,81.899826,83.155286,87.133538,92.718205,81.418596
$630-645,78.518855,81.624473,73.484209,84.391793,62.857656
$645-680,76.99721,81.027843,66.164813,81.133951,53.526855
<$585,83.455399,83.933814,93.460096,96.610877,90.369459

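The five separate `groupby` calls and the manual `pd.DataFrame` assembly above can also be written as a single grouped mean over a list of columns. The following is a minimal sketch, not the notebook's own method: `demo_df` and `metric_columns` are hypothetical names, and the four rows are copied from the per-school table shown in this notebook.

```python
import pandas as pd

# Hypothetical mini-frame: four rows copied from the per-school table in this notebook.
demo_df = pd.DataFrame(
    {
        "Spending Ranges (Per Student)": ["$585-630", "<$585", "$630-645", "$630-645"],
        "Average Math Score": [77.048432, 83.061895, 76.711767, 77.102592],
        "Average Reading Score": [81.033963, 83.975780, 81.158020, 80.746258],
        "% Passing Math": [66.680064, 94.133477, 65.988471, 68.309602],
        "% Passing Reading": [81.933280, 97.039828, 80.739234, 79.299014],
        "% Overall Passing": [54.642283, 91.334769, 53.204476, 54.289887],
    },
    index=[
        "Bailey High School",
        "Cabrera High School",
        "Figueroa High School",
        "Ford High School",
    ],
)

# Columns to average within each spending range.
metric_columns = [
    "Average Math Score",
    "Average Reading Score",
    "% Passing Math",
    "% Passing Reading",
    "% Overall Passing",
]

# One groupby produces the whole summary table: one row per range, one column per metric.
demo_summary = demo_df.groupby("Spending Ranges (Per Student)")[metric_columns].mean()
print(demo_summary)
```

Selecting the columns before calling `.mean()` keeps non-numeric columns such as "School Type" out of the aggregation.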

## Scores by School Size

In [317]:
# Establish the bins and labels
size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Apply pd.cut to categorize schools by size
school_spending_df["School Size"] = pd.cut(school_spending_df["Total Students"], bins=size_bins, labels=labels)

# Display the updated DataFrame to check the categorization
school_spending_df

| school_name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing | Spending Ranges (Per Student) | School Size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bailey High School | District | 4976 | $3,124,928.00 | 628.0 | 77.048432 | 81.033963 | 66.680064 | 81.93328 | 54.642283 | $585-630 | Large (2000-5000) |
| Cabrera High School | Charter | 1858 | $1,081,356.00 | 582.0 | 83.061895 | 83.97578 | 94.133477 | 97.039828 | 91.334769 | <$585 | Medium (1000-2000) |
| Figueroa High School | District | 2949 | $1,884,411.00 | 639.0 | 76.711767 | 81.15802 | 65.988471 | 80.739234 | 53.204476 | $630-645 | Large (2000-5000) |
| Ford High School | District | 2739 | $1,763,916.00 | 644.0 | 77.102592 | 80.746258 | 68.309602 | 79.299014 | 54.289887 | $630-645 | Large (2000-5000) |
| Griffin High School | Charter | 1468 | $917,500.00 | 625.0 | 83.351499 | 83.816757 | 93.392371 | 97.138965 | 90.599455 | $585-630 | Medium (1000-2000) |
| Hernandez High School | District | 4635 | $3,022,020.00 | 652.0 | 77.289752 | 80.934412 | 66.752967 | 80.862999 | 53.527508 | $645-680 | Large (2000-5000) |
| Holden High School | Charter | 427 | $248,087.00 | 581.0 | 83.803279 | 83.814988 | 92.505855 | 96.252927 | 89.227166 | <$585 | Small (<1000) |
| Huang High School | District | 2917 | $1,910,635.00 | 655.0 | 76.629414 | 81.182722 | 65.683922 | 81.316421 | 53.513884 | $645-680 | Large (2000-5000) |
| Johnson High School | District | 4761 | $3,094,650.00 | 650.0 | 77.072464 | 80.966394 | 66.057551 | 81.222432 | 53.539172 | $645-680 | Large (2000-5000) |
| Pena High School | Charter | 962 | $585,858.00 | 609.0 | 83.839917 | 84.044699 | 94.594595 | 95.945946 | 90.540541 | $585-630 | Small (<1000) |

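One detail worth checking with these bins: `pd.cut` closes each interval on the right by default, so a school with exactly 1,000 students would receive the "Small (<1000)" label even though the label text suggests otherwise. None of the ten schools listed above sits on a bin edge, so the results here are unaffected. The sketch below uses hypothetical enrollment figures (not from the dataset) to compare the default with `right=False`.

```python
import pandas as pd

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Hypothetical enrollments chosen to sit on or next to the bin edges.
enrollments = pd.Series([999, 1000, 1001, 2000], name="Total Students")

# Default: intervals are (0, 1000], (1000, 2000], (2000, 5000].
right_closed = pd.cut(enrollments, bins=size_bins, labels=size_labels)

# right=False: intervals are [0, 1000), [1000, 2000), [2000, 5000).
left_closed = pd.cut(enrollments, bins=size_bins, labels=size_labels, right=False)

comparison = pd.DataFrame(
    {
        "Total Students": enrollments,
        "right=True (default)": right_closed.astype(str),
        "right=False": left_closed.astype(str),
    }
)
print(comparison)
```

With `right=False`, the labels "<1000" and "1000-2000" match the intervals as written; note that a value equal to the last edge (5000) would then fall outside every bin.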

In [318]:
# Create a copy of the school summary for later aggregations
school_size_df = per_school_summary.copy()

# Display the copied DataFrame to confirm
school_size_df.head()

| school_name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing | Spending Ranges (Per Student) |
|---|---|---|---|---|---|---|---|---|---|---|
| Bailey High School | District | 4976 | $3,124,928.00 | 628.0 | 77.048432 | 81.033963 | 66.680064 | 81.93328 | 54.642283 | $585-630 |
| Cabrera High School | Charter | 1858 | $1,081,356.00 | 582.0 | 83.061895 | 83.97578 | 94.133477 | 97.039828 | 91.334769 | <$585 |
| Figueroa High School | District | 2949 | $1,884,411.00 | 639.0 | 76.711767 | 81.15802 | 65.988471 | 80.739234 | 53.204476 | $630-645 |
| Ford High School | District | 2739 | $1,763,916.00 | 644.0 | 77.102592 | 80.746258 | 68.309602 | 79.299014 | 54.289887 | $630-645 |
| Griffin High School | Charter | 1468 | $917,500.00 | 625.0 | 83.351499 | 83.816757 | 93.392371 | 97.138965 | 90.599455 | $585-630 |


In [319]:
# pd.cut needs one fewer label than bin edges: five edges define four spending ranges.
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Use `pd.cut` to categorize per student spending based on the bins.
school_spending_df["Spending Ranges (Per Student)"] = pd.cut(
    per_school_capita, bins=spending_bins, labels=labels
)

# Optional: convert to string. This drops the categorical order, so groupby sorts the ranges as text and lists "<$585" last.
school_spending_df["Spending Ranges (Per Student)"] = school_spending_df["Spending Ranges (Per Student)"].astype(str)

# Display to confirm changes
school_spending_df.head()

| school_name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing | Spending Ranges (Per Student) | School Size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bailey High School | District | 4976 | $3,124,928.00 | 628.0 | 77.048432 | 81.033963 | 66.680064 | 81.93328 | 54.642283 | $585-630 | Large (2000-5000) |
| Cabrera High School | Charter | 1858 | $1,081,356.00 | 582.0 | 83.061895 | 83.97578 | 94.133477 | 97.039828 | 91.334769 | <$585 | Medium (1000-2000) |
| Figueroa High School | District | 2949 | $1,884,411.00 | 639.0 | 76.711767 | 81.15802 | 65.988471 | 80.739234 | 53.204476 | $630-645 | Large (2000-5000) |
| Ford High School | District | 2739 | $1,763,916.00 | 644.0 | 77.102592 | 80.746258 | 68.309602 | 79.299014 | 54.289887 | $630-645 | Large (2000-5000) |
| Griffin High School | Charter | 1468 | $917,500.00 | 625.0 | 83.351499 | 83.816757 | 93.392371 | 97.138965 | 90.599455 | $585-630 | Medium (1000-2000) |

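The string conversion in the cell above explains why the grouped spending output earlier in this notebook lists "<$585" after "$645-680": once the labels are plain text, `groupby` sorts them lexicographically, and `$` sorts before `<`. A minimal sketch of the difference follows, assuming the same bins and labels; the Series names are hypothetical and the five values are copied from the table above.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
spending_labels = ["<$585", "$585-630", "$630-645", "$645-680"]

schools = ["Bailey", "Cabrera", "Figueroa", "Ford", "Hernandez"]

# Per-student budgets and math averages for five schools, copied from the table above.
per_student_budget = pd.Series([628.0, 582.0, 639.0, 644.0, 652.0], index=schools)
math_scores = pd.Series(
    [77.048432, 83.061895, 76.711767, 77.102592, 77.289752], index=schools
)

# Categorical result of pd.cut: the categories remember the order of the labels.
ranges_categorical = pd.cut(per_student_budget, bins=spending_bins, labels=spending_labels)

# Plain strings: the order information is gone.
ranges_as_text = ranges_categorical.astype(str)

by_category = math_scores.groupby(ranges_categorical, observed=False).mean()
by_text = math_scores.groupby(ranges_as_text).mean()

print(list(by_category.index))  # lowest spending range first
print(list(by_text.index))      # alphabetical, "<$585" last
```

Keeping the column categorical until after the summary is built is one way to get the rows in ascending spending order without a manual reindex.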

In [320]:
# Establish the bins and labels for school size, ensuring they match correctly
size_bins = [0, 1000, 2000, 5000]  # 4 bin edges define 3 bins (Small, Medium, Large)
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]  # 3 labels, one per bin

# Assign school sizes based on the updated bins and labels
school_size_df["School Size"] = pd.cut(
    per_school_counts.squeeze(),  # Collapses a single-column DataFrame to a Series; a Series passes through unchanged
    bins=size_bins,
    labels=size_labels
)

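# Average each metric by size category. Passing observed=False explicitly keeps every category
# in the result and avoids the pandas FutureWarning about the changing default for categorical groupers.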
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

size_math_scores = school_size_df.groupby("School Size", observed=False)["Average Math Score"].mean()
size_reading_scores = school_size_df.groupby("School Size", observed=False)["Average Reading Score"].mean()
size_passing_math = school_size_df.groupby("School Size", observed=False)["% Passing Math"].mean()
size_passing_reading = school_size_df.groupby("School Size", observed=False)["% Passing Reading"].mean()
size_overall_passing = school_size_df.groupby("School Size", observed=False)["% Overall Passing"].mean()

# Displaying one of the results to verify
size_math_scores

School Size
Small (<1000)         83.821598
Medium (1000-2000)    83.374684
Large (2000-5000)     77.746417
Name: Average Math Score, dtype: float64

In [321]:
# Create the `size_summary` DataFrame based on the calculated averages by school size
size_summary = pd.DataFrame({
    "Average Math Score": size_math_scores,
    "Average Reading Score": size_reading_scores,
    "% Passing Math": size_passing_math,
    "% Passing Reading": size_passing_reading,
    "% Overall Passing": size_overall_passing
})

# Display results
size_summary


| School Size | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| Small (<1000) | 83.821598 | 83.929843 | 93.550225 | 96.099437 | 89.883853 |
| Medium (1000-2000) | 83.374684 | 83.864438 | 93.599695 | 96.79068 | 90.621535 |
| Large (2000-5000) | 77.746417 | 81.344493 | 69.963361 | 82.766634 | 58.286003 |

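The `observed=False` argument used in the size grouping matters only when a category has no rows. The sketch below illustrates this with a hypothetical three-school frame (values copied from the tables above, but deliberately without a small school): `observed=False` keeps the empty "Small (<1000)" category with a `NaN` mean, while `observed=True` drops it.

```python
import pandas as pd

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Hypothetical subset with no small school, to show how an empty category is reported.
demo = pd.DataFrame(
    {
        "Total Students": [1858, 1468, 4976],
        "Average Math Score": [83.061895, 83.351499, 77.048432],
    },
    index=["Cabrera High School", "Griffin High School", "Bailey High School"],
)
demo["School Size"] = pd.cut(demo["Total Students"], bins=size_bins, labels=size_labels)

# observed=False: every category appears, empty ones with NaN.
all_categories = demo.groupby("School Size", observed=False)["Average Math Score"].mean()

# observed=True: only categories that actually occur in the data.
observed_only = demo.groupby("School Size", observed=True)["Average Math Score"].mean()

print(all_categories)
print(observed_only)
```

In the full dataset every size category contains at least one school, so both settings would give the same summary table.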

## Scores by School Type

In [322]:
# Group the per_school_summary DataFrame by "School Type" and average the results.
average_math_score_by_type = per_school_summary.groupby(["School Type"])["Average Math Score"].mean()
print("\nAverage Math Score by Type:\n", average_math_score_by_type)

average_reading_score_by_type = per_school_summary.groupby(["School Type"])["Average Reading Score"].mean()
print("\nAverage Reading Score by Type:\n", average_reading_score_by_type)

average_percent_passing_math_by_type = per_school_summary.groupby(["School Type"])["% Passing Math"].mean()
print("\nAverage % Passing Math by Type:\n", average_percent_passing_math_by_type)

average_percent_passing_reading_by_type = per_school_summary.groupby(["School Type"])["% Passing Reading"].mean()
print("\nAverage % Passing Reading by Type:\n", average_percent_passing_reading_by_type)

average_percent_overall_passing_by_type = per_school_summary.groupby(["School Type"])["% Overall Passing"].mean()
print("\nAverage % Overall Passing by Type:\n", average_percent_overall_passing_by_type)


Average Math Score by Type:
 School Type
Charter     83.473852
District    76.956733
Name: Average Math Score, dtype: float64

Average Reading Score by Type:
 School Type
Charter     83.896421
District    80.966636
Name: Average Reading Score, dtype: float64

Average % Passing Math by Type:
 School Type
Charter     93.620830
District    66.548453
Name: % Passing Math, dtype: float64

Average % Passing Reading by Type:
 School Type
Charter     96.586489
District    80.799062
Name: % Passing Reading, dtype: float64

Average % Overall Passing by Type:
 School Type
Charter     90.432244
District    53.672208
Name: % Overall Passing, dtype: float64


In [323]:
# Assemble the new data by type into a DataFrame called `type_summary`
type_summary = pd.DataFrame({
    "Average Math Score": average_math_score_by_type,
    "Average Reading Score": average_reading_score_by_type,
    "% Passing Math": average_percent_passing_math_by_type,
    "% Passing Reading": average_percent_passing_reading_by_type,
    "% Overall Passing": average_percent_overall_passing_by_type
})

# Display the `type_summary` DataFrame with results grouped by School Type
type_summary

| School Type | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| Charter | 83.473852 | 83.896421 | 93.62083 | 96.586489 | 90.432244 |
| District | 76.956733 | 80.966636 | 66.548453 | 80.799062 | 53.672208 |

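The figures in `type_summary` are means of per-school averages, so a school with 427 students counts as much as one with 4,976. The sketch below shows how an enrollment-weighted mean could be computed for comparison. It is an illustration only: `schools`, `unweighted`, and `weighted` are hypothetical names, and the four rows are a subset copied from the per-school table, so the numbers will not match the full-district summary.

```python
import pandas as pd

# Four schools copied from the per-school table (a subset, for illustration only).
schools = pd.DataFrame(
    {
        "School Type": ["Charter", "Charter", "District", "District"],
        "Total Students": [1858, 427, 4976, 2739],
        "Average Math Score": [83.061895, 83.803279, 77.048432, 77.102592],
    },
    index=[
        "Cabrera High School",
        "Holden High School",
        "Bailey High School",
        "Ford High School",
    ],
)

# Unweighted: each school contributes equally, as in the cells above.
unweighted = schools.groupby("School Type")["Average Math Score"].mean()

# Weighted: each school's average is scaled by its enrollment before averaging.
score_times_students = schools["Average Math Score"] * schools["Total Students"]
weighted = (
    score_times_students.groupby(schools["School Type"]).sum()
    / schools.groupby("School Type")["Total Students"].sum()
)

print(pd.DataFrame({"Unweighted": unweighted, "Enrollment-weighted": weighted}))
```

Which version is appropriate depends on the question: the unweighted mean describes the typical school of each type, while the weighted mean describes the typical student.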