# PyCity Schools Analysis

* As a whole, schools with higher per-student budgets did not yield better test results. On the contrary, schools spending $645-680 per student underperformed schools spending less than $585 per student.

* As a whole, small and medium-sized schools dramatically outperformed large schools in math passing rates (about 94% passing vs. about 70%).

* As a whole, charter schools outperformed district schools across all metrics. However, more analysis is needed to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
school_data_to_load = "schools_complete.csv"
student_data_to_load = "students_complete.csv"

# Read School and Student Data File and store into Pandas DataFrames
school_dataDF = pd.read_csv(school_data_to_load)
student_dataDF = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset.  
school_data_completeDF = pd.merge(student_dataDF, school_dataDF, how="left", on="school_name")
school_data_completeDF.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
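
A left merge like the one above silently duplicates student rows if a school name appears twice in the school table, and leaves missing values if a school name has no match. The following is a minimal sketch of one way to guard against both, using tiny made-up frames in place of the real CSV files (the `students` and `schools` names are illustrative only):

```python
import pandas as pd

# Tiny made-up stand-ins for students_complete.csv and schools_complete.csv.
students = pd.DataFrame({
    "student_name": ["A", "B", "C"],
    "school_name": ["Huang High School", "Huang High School", "Pena High School"],
    "math_score": [79, 61, 84],
})
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Pena High School"],
    "type": ["District", "Charter"],
    "budget": [1910635, 585858],
})

# validate="many_to_one" raises a MergeError if a school name is duplicated
# in the right-hand table; indicator=True adds a "_merge" column.
merged = pd.merge(
    students, schools, how="left", on="school_name",
    validate="many_to_one", indicator=True,
)

# Count student rows that did not find a matching school.
unmatched = (merged["_merge"] != "both").sum()
print(len(merged), unmatched)
```

If `unmatched` is zero and the row count equals the number of students, the merge neither dropped nor duplicated anyone.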


In [2]:
# Count the number of students enrolled at each school
schoolCountsDF = pd.DataFrame(school_data_completeDF["school_name"]).value_counts()
schoolCountsDF.head()

school_name          
Bailey High School       4976
Johnson High School      4761
Hernandez High School    4635
Rodriguez High School    3999
Figueroa High School     2949
dtype: int64

## District Summary

In [43]:
# Calculate the total number of unique schools
num_unique_schools = school_data_completeDF['school_name'].nunique() 
print("The total number of unique schools is:", num_unique_schools)

The total number of unique schools is: 15


In [45]:
# Calculate the total number of students
student_count = school_data_completeDF['Student ID'].count()
print("The total number of students is:", student_count)

The total number of students is: 39170


In [46]:
# Calculate the total budget (one budget value per school, taken from the school table)
total_budget = school_dataDF['budget'].sum()
print("The total budget is:", total_budget)

The total budget is: 24649428
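
A note on totaling budgets from a student-level table: every student row repeats its school's budget, so the values have to be reduced to one per school before summing. Reducing with `.unique()` is only safe if no two schools share the same budget. The sketch below uses made-up numbers (schools "X" and "Y" are hypothetical) to show the difference:

```python
import pandas as pd

# Hypothetical data: schools "X" and "Y" happen to have the same budget.
schools = pd.DataFrame({
    "school_name": ["X", "Y", "Z"],
    "budget": [1000, 1000, 500],
})
students = pd.DataFrame({"school_name": ["X", "X", "Y", "Z"]})
merged = students.merge(schools, on="school_name", how="left")

# Summing unique values collapses the two equal budgets into one.
unique_sum = merged["budget"].unique().sum()

# Keeping one row per school preserves both.
per_school_sum = merged.drop_duplicates("school_name")["budget"].sum()

print(unique_sum, per_school_sum)  # 1500 2500
```

Summing the budget column of the school table directly gives the same result as the per-school version.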


In [6]:
# Calculate the average (mean) math score
average_math_score = school_data_completeDF['math_score'].mean()
print("The average math score is:", average_math_score)

The average math score is: 78.98537145774827


In [7]:
# Calculate the average (mean) reading score
average_reading_score = school_data_completeDF['reading_score'].mean()
print("The average reading score is:", average_reading_score)

The average reading score is: 81.87784018381414


In [8]:
# Use the following to calculate the percentage of students who passed math (math scores greater than or equal to 70)

#calculate the number of students who passed math
passing_math_count = school_data_completeDF[(school_data_completeDF['math_score'] >= 70)]['student_name'].count()

#Calculate the total number of students
total_students = school_data_completeDF['student_name'].count()

#calculate the percentage of students who passed math
passing_math_percentage = (passing_math_count / total_students) * 100

print("Percentage of students who passed math:", passing_math_percentage)

Percentage of students who passed math: 74.9808526933878
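
The same percentage can be computed in one step, because the mean of a boolean Series is the fraction of `True` values. A minimal sketch on made-up scores:

```python
import pandas as pd

scores = pd.DataFrame({"math_score": [65, 70, 88, 92]})  # made-up scores

# True counts as 1 and False as 0, so the mean is the passing fraction.
passing_math_percentage = (scores["math_score"] >= 70).mean() * 100
print(passing_math_percentage)  # 75.0
```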


In [9]:
# Calculate the percentage of students who passed reading (hint: look at how the math percentage was calculated)
#calculate the number of students who passed reading
passing_reading_count = school_data_completeDF[(school_data_completeDF["reading_score"] >= 70)]["student_name"].count()

#calculate the total number of students
total_students = school_data_completeDF['student_name'].count()

#calculate the percentage of students who passed reading
passing_reading_percentage = (passing_reading_count / total_students) * 100

print("Percentage of students who passed reading:", passing_reading_percentage)


Percentage of students who passed reading: 85.80546336482001


In [10]:
# Use the following to calculate the percentage of students that passed math and reading
#find the number of students who passed both math and reading
passed_both = school_data_completeDF[(school_data_completeDF['reading_score']>=70)&(school_data_completeDF['math_score']>=70)]
#calculate the percentage of students who passed both
overall_passing_rate = (len(passed_both)/len(school_data_completeDF)) * 100
print("The percent of students who passed both math and reading is:", overall_passing_rate)


The percent of students who passed both math and reading is: 65.17232575950983


In [11]:
# Create a high-level snapshot of the district's key metrics in a DataFrame
#Calculate the total number of schools
total_schools = len(school_data_completeDF['School ID'].unique())

#calculate the total number of students
total_students = len(school_data_completeDF)

#calculate total budget (one budget value per school, taken from the school table)
total_budget = school_dataDF['budget'].sum()

#calculate the avg math score
average_math_score = school_data_completeDF['math_score'].mean()

#calculate the avg reading score
average_reading_score = school_data_completeDF['reading_score'].mean()

#calculate the percentage of students who passed math
passing_math_perc = (len(school_data_completeDF[school_data_completeDF['math_score']>=70])/len(school_data_completeDF))*100

#calculate the percentage of students who passed reading
passing_read_perc = (len(school_data_completeDF[school_data_completeDF['reading_score']>=70])/len(school_data_completeDF))*100

#calculate percentage passing both
passed_both= (len(school_data_completeDF[(school_data_completeDF['reading_score']>=70)&(school_data_completeDF['math_score']>=70)])/ len(school_data_completeDF))*100

#create a new dataframe with the values
district_summaryDF = pd.DataFrame({
    'Total Schools': [total_schools],
    'Total Students': [total_students],
    'Total Budget': [total_budget],
    'Average Math Score':[average_math_score],
    'Average Reading Score': [average_reading_score],
    '% Students passing Math':[passing_math_perc],
    '% Students passing Reading': [passing_read_perc],
    '% Students passing Overall': [passed_both] 
})

# Formatting
district_summaryDF['Total Students'] = district_summaryDF['Total Students'].apply(lambda x: '{:,.0f}'.format(x))
district_summaryDF['Total Budget'] = district_summaryDF['Total Budget'].apply(lambda x:'${:,.2f}'.format(x))

# Display the DataFrame
district_summaryDF

Unnamed: 0,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Students passing Math,% Students passing Reading,% Students passing Overall
0,15,39170,"$24,649,428.00",78.985371,81.87784,74.980853,85.805463,65.172326


## School Summary

In [12]:
# Use the code provided to select the school type
school_types = school_data_completeDF["type"]
print(school_types)

0        District
1        District
2        District
3        District
4        District
           ...   
39165     Charter
39166     Charter
39167     Charter
39168     Charter
39169     Charter
Name: type, Length: 39170, dtype: object


In [13]:
# Calculate the total student count across the district (the per-school counts are computed later, in `per_school_summary`)
per_school_counts = len(school_data_completeDF['Student ID'])
print(per_school_counts)

39170


In [47]:
# Calculate the total budget across all schools and the district-wide per capita spending
total_budget = school_dataDF['budget'].sum()
per_school_capita = total_budget / per_school_counts

print(f"The school budget is ${total_budget:,.2f}.")
print(f"The per capita spending is ${per_school_capita:,.2f}.")

The school budget is $24,649,428.00.
The per capita spending is $629.29.


In [55]:
# Calculate the average test scores
avg_math_score = school_data_completeDF['math_score'].mean()
print("The average math score is:", avg_math_score)

avg_reading_score = school_data_completeDF['reading_score'].mean()
print("The average reading score is:", avg_reading_score)
# per_school_math = 
# per_school_reading = 

The average math score is: 78.98537145774827
The average reading score is: 81.87784018381414


In [15]:
# Count the schools with at least one student who scored 70 or higher in math
#Create a df of the passing rows
school_passing_mathDF = school_data_completeDF[school_data_completeDF['math_score'] >=70]

#Count the number of unique schools in the new df
num_schools_passed = school_passing_mathDF['School ID'].nunique()
print(f"The number of schools that passed math is {num_schools_passed}.")

The number of schools that passed math is 15.


In [16]:
# Count the schools with at least one student who scored 70 or higher in reading
school_passing_readingDF = school_data_completeDF[school_data_completeDF['reading_score'] >=70]

#Count the number of unique schools in the new df
num_schools_passed = school_passing_readingDF['School ID'].nunique()
print(f"The number of schools that passed reading is {num_schools_passed}.")

The number of schools that passed reading is 15.


In [17]:
# Count the schools with at least one student who scored 70 or higher in both math and reading

#create a df to do the count
passing_math_and_readingDF = school_data_completeDF[(school_data_completeDF['math_score'] >= 70) & (school_data_completeDF['reading_score'] >= 70)]

#count the number of schools in the df
num_schools = passing_math_and_readingDF['School ID'].nunique()
print(f"The number of schools that passed math and reading is {num_schools}.")


The number of schools that passed math and reading is 15.
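
The three cells above count a school as soon as one of its students reaches 70, which is why each reports all 15 schools. If the question is instead how many schools have an average score of 70 or higher, one possible approach is to aggregate per school first. The sketch below uses made-up data with two hypothetical schools:

```python
import pandas as pd

df = pd.DataFrame({
    "school_name": ["A", "A", "B", "B"],
    "math_score": [95, 40, 72, 75],
})

# Schools with at least one passing student (what a row filter counts).
any_passing = df.loc[df["math_score"] >= 70, "school_name"].nunique()

# Schools whose mean score is at least 70.
school_means = df.groupby("school_name")["math_score"].mean()
mean_passing = int((school_means >= 70).sum())

print(any_passing, mean_passing)  # 2 1
```

School "A" has one passing student but a mean of 67.5, so it is counted by the first measure and not by the second.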


In [18]:
# Use the provided code to calculate the passing rates

per_school_passing_math = len(school_data_completeDF[school_data_completeDF['math_score']>=70])/ len(school_data_completeDF)*100

per_school_passing_reading = len(school_data_completeDF[school_data_completeDF['reading_score']>=70])/ len(school_data_completeDF)*100

overall_passing_rate = len(school_data_completeDF[(school_data_completeDF['math_score']>=70)& (school_data_completeDF['reading_score']>=70)]) / len(school_data_completeDF)*100

print(f"The percentage of students who passed math is {per_school_passing_math:.2f}%")
print(f"The percentage of students who passed reading is {per_school_passing_reading:.2f}%")
print(f"The percentage of students who passed both is {overall_passing_rate:.2f}%")



The percentage of students who passed math is 74.98%
The percentage of students who passed reading is 85.81%
The percentage of students who passed both is 65.17%


In [19]:
# Create a DataFrame called `per_school_summary` with columns for the calculations above.

#Group the data by school name and type
grouped_school_data = school_data_completeDF.groupby(['school_name', 'type'])

#Calculate the total number of students for each school
total_students = grouped_school_data['Student ID'].count()

#Calculate the total budget for each school
total_budget = grouped_school_data['budget'].first()

#Calculate the per student budget for each school

per_student_budget = total_budget / total_students

#Calculate the avg math score for each school
avg_math_score = grouped_school_data['math_score'].mean()

#Calculate the avg reading score for each school
avg_reading_score = grouped_school_data['reading_score'].mean()

#Calculate the percentage of the students passing math for each school
passing_math = school_data_completeDF[school_data_completeDF['math_score'] >= 70].groupby(['school_name', 'type'])\
    ['math_score'].count() /total_students * 100

#Calculate the percentage of the students passing reading for each school
passing_reading = school_data_completeDF[school_data_completeDF['reading_score'] >=70].groupby(['school_name', 'type'])\
    ['reading_score'].count() / total_students * 100

#Calculate the percentage of students passing both math and reading for each school
passing_overall = school_data_completeDF[
    (school_data_completeDF['math_score'] >= 70) & (school_data_completeDF['reading_score'] >= 70)
].groupby(['school_name', 'type'])['Student ID'].count() / total_students * 100

#Create a new dataframe with the calculated values
per_school_summary = pd.DataFrame({
    
    'Total Students': total_students,
    'Total School Budget': total_budget.map('${:,.2f}'.format),
    'Per Student Budget': per_student_budget.map('${:,.2f}'.format),
    'Average Math Score': avg_math_score,
    'Average Reading Score': avg_reading_score,
    '% Passing Math': passing_math,
    '% Passing Reading': passing_reading,
    '% Overall Passing': passing_overall
})

#Reset index to make school name and type columns
per_school_summary = per_school_summary.reset_index()

#Rename the type column to School Type
per_school_summary = per_school_summary.rename(columns={'type': 'School Type'})

per_school_summary = per_school_summary.set_index('school_name')
per_school_summary.index.name = None

# Display the DataFrame
per_school_summary


# # Formatting
# per_school_summary["Total School Budget"] = per_school_summary["Total School Budget"].map("${:,.2f}".format)
# per_school_summary["Per Student Budget"] = per_school_summary["Per Student Budget"].map("${:,.2f}".format)



Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541
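
The per-school metrics can also be produced in a single `groupby(...).agg(...)` call by first adding boolean pass/fail columns, whose group mean is the passing fraction. This is a sketch on made-up data, not the notebook's method; the lowercase column names (`pass_math`, `pct_math`, and so on) are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "school_name": ["A", "A", "B", "B"],
    "type": ["District", "District", "Charter", "Charter"],
    "budget": [1000, 1000, 600, 600],
    "math_score": [60, 80, 90, 70],
    "reading_score": [75, 65, 85, 95],
})

# Boolean helper columns: their mean is the passing fraction per group.
flags = df.assign(
    pass_math=df["math_score"] >= 70,
    pass_reading=df["reading_score"] >= 70,
)
flags["pass_both"] = flags["pass_math"] & flags["pass_reading"]

# Named aggregation: output column = (input column, aggregation).
summary = flags.groupby(["school_name", "type"]).agg(
    total_students=("math_score", "size"),
    budget=("budget", "first"),
    avg_math=("math_score", "mean"),
    pct_math=("pass_math", "mean"),
    pct_both=("pass_both", "mean"),
)
summary["per_student_budget"] = summary["budget"] / summary["total_students"]
summary[["pct_math", "pct_both"]] *= 100
print(summary)
```

Keeping the budget columns numeric here, and formatting them only for display, avoids having to parse the strings back later.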


## Highest-Performing Schools (by % Overall Passing)

In [20]:
# Sort the schools by `% Overall Passing` in descending order and display the top 5 rows.
sorted_schools = per_school_summary.sort_values('% Overall Passing', ascending=False)
sorted_schools.head()

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Thomas High School,Charter,1635,"$1,043,130.00",$638.00,83.418349,83.84893,93.272171,97.308869,90.948012
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Wilson High School,Charter,2283,"$1,319,574.00",$578.00,83.274201,83.989488,93.867718,96.539641,90.582567
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541


## Bottom-Performing Schools (by % Overall Passing)

In [21]:
# Sort the schools by `% Overall Passing` in ascending order and display the first 5 rows (the lowest performers).
sorted_schools = per_school_summary.sort_values('% Overall Passing', ascending=True)
sorted_schools.head()

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Rodriguez High School,District,3999,"$2,547,363.00",$637.00,76.842711,80.744686,66.366592,80.220055,52.988247
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172
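
For top-N and bottom-N selections like the two above, `nlargest` and `nsmallest` are an alternative to sorting and taking `head()`. A short sketch, using rounded values for four of the schools listed earlier and shortened school names for illustration:

```python
import pandas as pd

summary = pd.DataFrame(
    {"% Overall Passing": [54.6, 91.3, 53.2, 90.6]},
    index=["Bailey", "Cabrera", "Figueroa", "Griffin"],
)

# Two best and two worst schools by overall passing rate.
top = summary.nlargest(2, "% Overall Passing")
bottom = summary.nsmallest(2, "% Overall Passing")
print(top.index.tolist(), bottom.index.tolist())
```

Both methods require a numeric column, so they would not work on the string-formatted budget columns.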


## Math Scores by Grade

In [23]:
# Use the code provided to separate the data by grade
# ninth_graders = school_data_complete[(school_data_complete["grade"] == "9th")]
# tenth_graders = school_data_complete[(school_data_complete["grade"] == "10th")]
# eleventh_graders = school_data_complete[(school_data_complete["grade"] == "11th")]
# twelfth_graders = school_data_complete[(school_data_complete["grade"] == "12th")]

# Group by school and grade, then take the mean math score of each group.
math_scores_by_grade = student_dataDF.groupby(['school_name','grade'])['math_score'].mean().unstack()

# Put the grade columns in chronological order (the default order is alphabetical)
math_scores_by_grade = math_scores_by_grade[['9th', '10th', '11th', '12th']]

# Minor data wrangling
math_scores_by_grade.index.name = None

# Use the code to select only the `math_score`.
# ninth_grade_math_scores = ninth_graders_scores["math_score"]
# tenth_grader_math_scores = tenth_graders_scores["math_score"]
# eleventh_grader_math_scores = eleventh_graders_scores.mean()["math_score"]
# twelfth_grader_math_scores = twelfth_graders_scores["math_score"]

# Combine each of the scores above into single DataFrame called `math_scores_by_grade`

# Display the DataFrame
math_scores_by_grade

grade,9th,10th,11th,12th
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547
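
Grade labels sort alphabetically by default, which puts "10th" before "9th". One way to get chronological columns without reordering by hand is to make `grade` an ordered categorical before pivoting. This is a sketch on made-up scores, with `pivot_table` standing in for the `groupby`/`unstack` combination:

```python
import pandas as pd

df = pd.DataFrame({
    "school_name": ["A", "A", "A", "A"],
    "grade": ["10th", "9th", "12th", "11th"],
    "math_score": [70, 80, 90, 60],
})

# Declare the intended grade order once; sorting then follows it.
grade_order = ["9th", "10th", "11th", "12th"]
df["grade"] = pd.Categorical(df["grade"], categories=grade_order, ordered=True)

by_grade = df.pivot_table(
    index="school_name", columns="grade", values="math_score",
    aggfunc="mean", observed=False,
)
print(by_grade.columns.tolist())  # ['9th', '10th', '11th', '12th']
```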


## Reading Score by Grade 

In [24]:
# Use the code provided to separate the data by grade
# ninth_graders = school_data_complete[(school_data_complete["grade"] == "9th")]
# tenth_graders = school_data_complete[(school_data_complete["grade"] == "10th")]
# eleventh_graders = school_data_complete[(school_data_complete["grade"] == "11th")]
# twelfth_graders = school_data_complete[(school_data_complete["grade"] == "12th")]

# Group by school and grade, then take the mean reading score of each group.
reading_scores_by_grade = student_dataDF.groupby(['school_name','grade'])['reading_score'].mean().unstack()

# Use the code to select only the `reading_score`.
# ninth_grade_reading_scores = ninth_graders_scores["reading_score"]
# tenth_grader_reading_scores = tenth_graders_scores["reading_score"]
# eleventh_grader_reading_scores = eleventh_graders_scores.mean()["reading_score"]
# twelfth_grader_reading_scores = twelfth_graders_scores["reading_score"]

# Combine each of the scores above into single DataFrame called `reading_scores_by_grade`

# Put the grade columns in chronological order (the default order is alphabetical)
reading_scores_by_grade = reading_scores_by_grade[['9th', '10th', '11th', '12th']]

# Minor data wrangling
reading_scores_by_grade.index.name = None

# Display the DataFrame
reading_scores_by_grade

grade,9th,10th,11th,12th
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


## Scores by School Spending

In [25]:
# Establish the bins 
spending_bins = [0, 585, 630, 645, 680]
spending_labels = ["<$585", "$585-630", "$630-645", "$645-680"]
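
With these bins, it helps to know which side of each edge is inclusive. By default `pd.cut` uses `right=True`, so each bin includes its right edge and excludes its left one. A small sketch with made-up budgets placed on and around the edges:

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
spending_labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Made-up per-student budgets chosen to sit on and around bin edges.
budgets = pd.Series([581, 585, 586, 630, 655])

# With the default right=True, each bin includes its right edge,
# so 585 falls in the first bin and 630 in the second.
ranges = pd.cut(budgets, spending_bins, labels=spending_labels)
print(ranges.tolist())
# ['<$585', '<$585', '$585-630', '$585-630', '$645-680']
```

A budget of exactly $585 therefore lands in the bin labeled "<$585", so that label is slightly loose at the boundary.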

In [31]:
# Create a copy of the school summary since it has the "Per Student Budget" 
school_spending = per_school_summary.copy()

# Convert the formatted budget columns back to a numeric data type
# (regex=False treats "$" as a literal character rather than a regex anchor)
school_spending["Total School Budget"] = pd.to_numeric(school_spending["Total School Budget"].str.replace("$", "", regex=False).str.replace(",", "", regex=False))
school_spending["Per Student Budget"] = pd.to_numeric(school_spending["Per Student Budget"].str.replace("$", "", regex=False).str.replace(",", "", regex=False))

school_spending["Total Students"] = pd.to_numeric(school_spending["Total Students"])
school_spending["Average Math Score"] = pd.to_numeric(school_spending["Average Math Score"])
school_spending["Average Reading Score"] = pd.to_numeric(school_spending["Average Reading Score"])
school_spending["% Passing Math"] = pd.to_numeric(school_spending["% Passing Math"])
school_spending["% Passing Reading"] = pd.to_numeric(school_spending["% Passing Reading"])


In [33]:
# Use `pd.cut` to categorize spending based on the bins.
school_spending["Spending Ranges (Per Student)"] = pd.cut(school_spending["Per Student Budget"], spending_bins, labels=spending_labels)
school_spending

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,Spending Ranges (Per Student)
Bailey High School,District,4976,3124928.0,628.0,77.048432,81.033963,66.680064,81.93328,54.642283,$585-630
Cabrera High School,Charter,1858,1081356.0,582.0,83.061895,83.97578,94.133477,97.039828,91.334769,<$585
Figueroa High School,District,2949,1884411.0,639.0,76.711767,81.15802,65.988471,80.739234,53.204476,$630-645
Ford High School,District,2739,1763916.0,644.0,77.102592,80.746258,68.309602,79.299014,54.289887,$630-645
Griffin High School,Charter,1468,917500.0,625.0,83.351499,83.816757,93.392371,97.138965,90.599455,$585-630
Hernandez High School,District,4635,3022020.0,652.0,77.289752,80.934412,66.752967,80.862999,53.527508,$645-680
Holden High School,Charter,427,248087.0,581.0,83.803279,83.814988,92.505855,96.252927,89.227166,<$585
Huang High School,District,2917,1910635.0,655.0,76.629414,81.182722,65.683922,81.316421,53.513884,$645-680
Johnson High School,District,4761,3094650.0,650.0,77.072464,80.966394,66.057551,81.222432,53.539172,$645-680
Pena High School,Charter,962,585858.0,609.0,83.839917,84.044699,94.594595,95.945946,90.540541,$585-630
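
The conversion step above is needed only because the budget columns were stored as formatted strings. One possible alternative is to keep the summary numeric and apply the currency format to a display copy. The sketch below uses two values from the table above; `display_copy` and `parsed` are illustrative names:

```python
import pandas as pd

summary = pd.DataFrame(
    {"Per Student Budget": [628.0, 582.0]},
    index=["Bailey High School", "Cabrera High School"],
)

# Build a formatted copy for display; the original stays numeric.
display_copy = summary.copy()
display_copy["Per Student Budget"] = display_copy["Per Student Budget"].map("${:,.2f}".format)

# If only formatted strings are available, strip the "$" and "," characters
# in one pass with a character class, then convert.
parsed = pd.to_numeric(
    display_copy["Per Student Budget"].str.replace(r"[$,]", "", regex=True)
)
print(display_copy["Per Student Budget"].tolist())  # ['$628.00', '$582.00']
print(parsed.tolist())  # [628.0, 582.0]
```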


In [37]:
#  Calculate averages for the desired columns.
# Assemble into DataFrame
# spending_summary = school_spending

#group the data by spending range and calculate the avg of the relevant columns
school_spending = school_spending.groupby("Spending Ranges (Per Student)").agg({
    "Average Math Score": "mean",
    "Average Reading Score": "mean",
    "% Passing Math": "mean",
    "% Passing Reading": "mean",
    "% Overall Passing": "mean"       
}).reset_index()

# spending_math_scores = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["Average Math Score"]
# spending_reading_scores = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["Average Reading Score"]
# spending_passing_math = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Math"]
# spending_passing_reading = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Reading"]
# overall_passing_spending = school_spending_df.groupby(["Spending Ranges (Per Student)"]).mean()["% Overall Passing"]

# Display results
school_spending


Unnamed: 0,Spending Ranges (Per Student),Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,<$585,83.455399,83.933814,93.460096,96.610877,90.369459
1,$585-630,81.899826,83.155286,87.133538,92.718205,81.418596
2,$630-645,78.518855,81.624473,73.484209,84.391793,62.857656
3,$645-680,76.99721,81.027843,66.164813,81.133951,53.526855
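
The figures above are unweighted means of school-level values, so a school with 400 students counts as much as one with 4,000. Whether that is appropriate depends on the question being asked. As a sketch only, with made-up numbers and a hypothetical `range` column, a student-weighted version could look like this:

```python
import pandas as pd

schools = pd.DataFrame({
    "range": ["low", "low", "high"],
    "Total Students": [400, 1600, 3000],
    "% Passing Math": [90.0, 95.0, 66.0],
})

# Unweighted: each school counts equally within its range.
unweighted = schools.groupby("range")["% Passing Math"].mean()

# Weighted by enrollment: large schools count proportionally more.
weighted = (
    (schools["% Passing Math"] * schools["Total Students"]).groupby(schools["range"]).sum()
    / schools.groupby("range")["Total Students"].sum()
)
print(unweighted["low"], weighted["low"])  # 92.5 94.0
```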


## Scores by School Size

In [38]:
# Establish the bins.
size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

#create a copy of the school summary
size_summary = per_school_summary.copy()


In [39]:
# Categorize each school by size based on the bins
# Use `pd.cut` on the "Total Students" column of the `per_school_summary` DataFrame.

size_summary["School Size"] = pd.cut(size_summary["Total Students"], size_bins, labels=labels)
size_summary

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,School Size
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283,Large (2000-5000)
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769,Medium (1000-2000)
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476,Large (2000-5000)
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887,Large (2000-5000)
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455,Medium (1000-2000)
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508,Large (2000-5000)
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166,Small (<1000)
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884,Large (2000-5000)
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172,Large (2000-5000)
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541,Small (<1000)


In [None]:
# Calculate averages for the desired columns. 
# size_math_scores = per_school_summary.groupby(["School Size"]).mean()["Average Math Score"]
# size_reading_scores = per_school_summary.groupby(["School Size"]).mean()["Average Reading Score"]
# size_passing_math = per_school_summary.groupby(["School Size"]).mean()["% Passing Math"]
# size_passing_reading = per_school_summary.groupby(["School Size"]).mean()["% Passing Reading"]
# size_overall_passing = per_school_summary.groupby(["School Size"]).mean()["% Overall Passing"]

In [40]:
# Create a DataFrame called `size_summary` that breaks down school performance based on school size (small, medium, or large).
# Use the scores above to create a new DataFrame called `size_summary`
size_summary = size_summary.groupby("School Size").agg({
    "Average Math Score": "mean",
    "Average Reading Score": "mean",
    "% Passing Math": "mean",
    "% Passing Reading": "mean",
    "% Overall Passing": "mean"
}).reset_index()  

# Display results
size_summary

Unnamed: 0,School Size,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,Small (<1000),83.821598,83.929843,93.550225,96.099437,89.883853
1,Medium (1000-2000),83.374684,83.864438,93.599695,96.79068,90.621535
2,Large (2000-5000),77.746417,81.344493,69.963361,82.766634,58.286003


## Scores by School Type

In [41]:
# Group the per_school_summary DataFrame by "School Type" and average the results.
# type_math_scores = 
# type_reading_scores = 
# type_passing_math = 
# type_passing_reading = 
# type_overall_passing = 
type_summary = per_school_summary.copy()

# Use the code provided to select new column data
# average_math_score_by_type = type_math_scores["Average Math Score"]
# average_reading_score_by_type = type_reading_scores["Average Reading Score"]
# average_percent_passing_math_by_type = type_passing_math["% Passing Math"]
# average_percent_passing_reading_by_type = type_passing_reading["% Passing Reading"]
# average_percent_overall_passing_by_type = type_overall_passing["% Overall Passing"]

In [42]:
# Assemble the new data by type into a DataFrame called `type_summary`
type_summary = type_summary.groupby("School Type").agg({
    "Average Math Score": "mean",
    "Average Reading Score": "mean",
    "% Passing Math": "mean",
    "% Passing Reading": "mean",
    "% Overall Passing": "mean"
}).reset_index()

# Display results
type_summary

Unnamed: 0,School Type,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,Charter,83.473852,83.896421,93.62083,96.586489,90.432244
1,District,76.956733,80.966636,66.548453,80.799062,53.672208


## Analysis

Summary: This exercise examined data from 15 high schools, a mix of charter and district schools. It compared math and reading scores across grade levels and between schools.

Conclusion #1 - Charter schools have smaller student populations than district schools, and their total budgets are correspondingly smaller in dollar terms. However, per-student budgets differ by only about $45 on average between the two school types.

Conclusion #2 - Charter schools have higher academic scores than district schools. This is consistent with the idea that smaller schools contribute to higher test scores, although this analysis measures school size rather than class size and does not separate the effect of size from that of school type.
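
School type and school size overlap heavily in this data, which makes it hard to attribute the score gap to either one alone. A quick cross-tabulation makes the overlap visible. The sketch below uses only the ten schools shown in the size table above, not the full set of 15, so the counts are partial:

```python
import pandas as pd

# School type and enrollment for the ten schools listed in the size table.
schools = pd.DataFrame({
    "School Type": ["District", "Charter", "District", "District", "Charter",
                    "District", "Charter", "District", "District", "Charter"],
    "Total Students": [4976, 1858, 2949, 2739, 1468, 4635, 427, 2917, 4761, 962],
})

size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
schools["School Size"] = pd.cut(schools["Total Students"], size_bins, labels=labels)

# Rows: school type; columns: size category; cells: number of schools.
overlap = pd.crosstab(schools["School Type"], schools["School Size"])
print(overlap)
```

Among these ten schools, every district school is large and every charter school is small or medium, so size and type cannot be told apart without further data.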