# PyCity Schools Analysis

* As a whole, schools with higher budgets did not yield better test results. In fact, schools with the highest spending per student (\$645-675) underperformed schools with the lowest spending (<\$585 per student).

* As a whole, small and medium-sized schools dramatically outperformed large schools in math passing rates (89-91% passing vs. 67%).

* As a whole, charter schools outperformed the public district schools across all metrics. However, more analysis is needed to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np
import os, csv

# File to Load 
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

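# Combine the data into a single dataset: a left join keeps every student row
# and attaches the matching school's columns via the shared school_name key
all_schools_df = pd.merge(student_data, school_data, how='left', on='school_name')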
all_schools_df

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
...,...,...,...,...,...,...,...,...,...,...,...
39165,39165,Donna Howard,F,12th,Thomas High School,99,90,14,Charter,1635,1043130
39166,39166,Dawn Bell,F,10th,Thomas High School,95,70,14,Charter,1635,1043130
39167,39167,Rebecca Tanner,F,9th,Thomas High School,73,84,14,Charter,1635,1043130
39168,39168,Desiree Kidd,F,10th,Thomas High School,99,90,14,Charter,1635,1043130
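The left join above relies on each school appearing exactly once in the school table. A minimal sketch of how that assumption could be checked, using hypothetical toy frames (`students`, `schools`) rather than the real CSV files:

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical rows, not the real data)
students = pd.DataFrame({
    "student_name": ["A", "B", "C"],
    "school_name": ["Huang High School", "Huang High School", "Thomas High School"],
    "math_score": [79, 61, 90],
})
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Thomas High School"],
    "type": ["District", "Charter"],
    "size": [2917, 1635],
})

# validate="many_to_one" raises pandas.errors.MergeError if a school name
# appears more than once in the right-hand (school) table
merged = pd.merge(students, schools, how="left", on="school_name",
                  validate="many_to_one")

# A left join against unique keys should not change the number of rows
assert len(merged) == len(students)
```

If the row count grew after the merge, that would suggest duplicate school names in the school file.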


## District Summary

In [2]:
calculated_df = all_schools_df.groupby(['school_name'])

calculated_df.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
...,...,...,...,...,...,...,...,...,...,...,...
37535,37535,Norma Mata,F,10th,Thomas High School,76,76,14,Charter,1635,1043130
37536,37536,Cody Miller,M,11th,Thomas High School,84,82,14,Charter,1635,1043130
37537,37537,Erik Snyder,M,9th,Thomas High School,80,90,14,Charter,1635,1043130
37538,37538,Tanya Martinez,F,9th,Thomas High School,71,69,14,Charter,1635,1043130


In [3]:
# Calculate the Totals (Schools and Students)
total_schools = all_schools_df['school_name'].nunique()
# Note: counting on the grouped frame yields one student count per school
# (a Series indexed by school_name), not a single district-wide total
total_students = calculated_df['student_name'].count()
total_schools

15

In [4]:
# Calculate the Total Budget
# Sum one budget per school from the school table; drop_duplicates() on the merged
# frame would silently skip a school whose budget happened to equal another's
total_budget = school_data['budget'].sum()
total_budget

24649428

In [5]:
# Calculate the Average Scores (one value per school, because calculated_df is grouped by school_name)
average_math = calculated_df['math_score'].mean()
average_reading = calculated_df['reading_score'].mean()


In [9]:
# Calculate the Percentage Pass Rates (a score of 70 or higher counts as passing)
passed_math = all_schools_df.loc[all_schools_df['math_score']>=70]
passed_reading = all_schools_df.loc[all_schools_df['reading_score']>=70]

# Divide by the district-wide student count (one row per student). Dividing by
# total_students, which holds one count per school, returns a ratio per school
# (e.g. 5.902331 for Bailey High School) rather than a district pass rate.
district_student_count = len(all_schools_df)
passed_p_math = passed_math['student_name'].count() / district_student_count * 100
passed_p_reading = passed_reading['student_name'].count() / district_student_count * 100
passed_p_math
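The district-level figures computed in this section can be collected into a one-row summary table. The following is a rough sketch on a hypothetical toy frame (`toy`), not the notebook's data; the column labels and the "overall passing" definition (passing both subjects) are assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for all_schools_df
toy = pd.DataFrame({
    "school_name": ["X", "X", "Y", "Y"],
    "math_score": [80, 60, 90, 70],
    "reading_score": [75, 65, 95, 69],
    "budget": [1000, 1000, 500, 500],
})

# Boolean masks: the mean of a mask is the fraction of True values
pass_math = toy["math_score"] >= 70
pass_reading = toy["reading_score"] >= 70

district_summary = pd.DataFrame({
    "Total Schools": [toy["school_name"].nunique()],
    "Total Students": [len(toy)],  # one row per student
    # Budget repeats on every student row, so take one value per school first
    "Total Budget": [toy.groupby("school_name")["budget"].first().sum()],
    "Average Math Score": [toy["math_score"].mean()],
    "Average Reading Score": [toy["reading_score"].mean()],
    "% Passing Math": [pass_math.mean() * 100],
    "% Passing Reading": [pass_reading.mean() * 100],
    # Assumption: "overall" means passing both subjects
    "% Overall Passing": [(pass_math & pass_reading).mean() * 100],
})
```

Taking the mean of a boolean mask avoids tracking the numerator and denominator separately, which is where the per-school versus district-wide mix-up can creep in.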

In [13]:
# Minor Data Cleanup
#calculated_df = calculated_df.rename(columns = {'school_name' : 'School Name', 'student_name' : "Student Name",
                                 #"reading_score" : "Reading Score", 'math_score' : "Math Score" , "budget" : "Budget"})

In [29]:
# numeric_only=True skips text columns (names, gender, grade, type), which
# recent pandas versions refuse to average
collected_data = calculated_df.mean(numeric_only=True).rename(columns = {'size' : "Student Count",
                                 "reading_score" : "Reading Score", 'math_score' : "Math Score" , "budget" : "Budget"})
# Averaged ID columns carry no meaning, so drop them
collected_data = collected_data.drop(columns=['School ID', 'Student ID'])
collected_data.index.names = ['School Name']

# Display the data frame
collected_data


School Name,Reading Score,Math Score,Student Count,Budget
Bailey High School,81.033963,77.048432,4976.0,3124928.0
Cabrera High School,83.97578,83.061895,1858.0,1081356.0
Figueroa High School,81.15802,76.711767,2949.0,1884411.0
Ford High School,80.746258,77.102592,2739.0,1763916.0
Griffin High School,83.816757,83.351499,1468.0,917500.0
Hernandez High School,80.934412,77.289752,4635.0,3022020.0
Holden High School,83.814988,83.803279,427.0,248087.0
Huang High School,81.182722,76.629414,2917.0,1910635.0
Johnson High School,80.966394,77.072464,4761.0,3094650.0
Pena High School,84.044699,83.839917,962.0,585858.0


## School Summary

In [None]:
# Determine the School Type

# Calculate the total student count

# Calculate the total school budget and per capita spending
# per_school_budget = school_data_complete.groupby(["school_name"]).mean()["budget"]

# Calculate the average test scores

# Calculate the passing scores by creating a filtered data frame

# Convert to data frame

# Minor data munging

# Display the data frame
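One possible way to carry out the steps outlined in the cell above is a single named aggregation per school. This is only a sketch on hypothetical toy data; the output column names (`per_student_budget`, `pct_pass_math`, and so on) are illustrative choices, and the passing threshold of 70 follows the district cell.

```python
import pandas as pd

# Hypothetical toy data standing in for all_schools_df
toy = pd.DataFrame({
    "school_name": ["X", "X", "Y", "Y"],
    "type": ["District", "District", "Charter", "Charter"],
    "size": [2, 2, 2, 2],
    "budget": [1300, 1300, 1100, 1100],
    "math_score": [80, 60, 90, 70],
    "reading_score": [75, 65, 95, 69],
})

# Flag passing students once so that per-school means become pass rates
toy = toy.assign(
    pass_math=toy["math_score"] >= 70,
    pass_reading=toy["reading_score"] >= 70,
)

school_summary = toy.groupby("school_name").agg(
    school_type=("type", "first"),       # constant within a school
    total_students=("size", "first"),    # constant within a school
    total_budget=("budget", "first"),    # constant within a school
    avg_math=("math_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_pass_math=("pass_math", "mean"),
    pct_pass_reading=("pass_reading", "mean"),
)

# Per capita spending and pass rates expressed as percentages
school_summary["per_student_budget"] = (
    school_summary["total_budget"] / school_summary["total_students"]
)
rate_columns = ["pct_pass_math", "pct_pass_reading"]
school_summary[rate_columns] = school_summary[rate_columns] * 100
```

Using `"first"` for `type`, `size`, and `budget` avoids averaging values that are already constant within each school.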


## Top Performing Schools (By Passing Rate)

In [None]:
# Sort and show top five schools


## Bottom Performing Schools (By Passing Rate)

In [None]:
# Sort and show bottom five schools
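Both this ranking and the top-five ranking in the previous section come down to sorting a per-school table by its passing rate. A small sketch, assuming a hypothetical `pct_overall_passing` column and made-up values:

```python
import pandas as pd

# Hypothetical per-school pass rates (not the notebook's actual results)
rates = pd.DataFrame(
    {"pct_overall_passing": [95.0, 73.5, 90.2, 74.1, 94.3, 73.8]},
    index=pd.Index(["A", "B", "C", "D", "E", "F"], name="school_name"),
)

# Highest rates first for the top five, lowest first for the bottom five
top_five = rates.sort_values("pct_overall_passing", ascending=False).head(5)
bottom_five = rates.sort_values("pct_overall_passing", ascending=True).head(5)
```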


## Math Scores by Grade

In [None]:
# Create data series of scores by grade levels using conditionals

# Group each by school name

# Combine series into single data frame

# Minor data munging

# Display the data frame
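The outline above builds one series per grade and then combines them. As an alternative, a pivot table produces the same school-by-grade layout in one step. The sketch below uses hypothetical toy data, and `scores_by_grade` is an illustrative helper name rather than anything defined in this notebook; it works for `reading_score` as well.

```python
import pandas as pd

# Hypothetical toy data standing in for all_schools_df
toy = pd.DataFrame({
    "school_name": ["X", "X", "X", "Y", "Y", "Y"],
    "grade": ["9th", "9th", "10th", "9th", "10th", "10th"],
    "math_score": [80, 60, 90, 70, 85, 95],
    "reading_score": [70, 80, 90, 60, 75, 85],
})

def scores_by_grade(df, score_column):
    """Average of score_column per school (rows) and grade (columns)."""
    table = df.pivot_table(index="school_name", columns="grade",
                           values=score_column, aggfunc="mean")
    # Grades sort alphabetically ("10th" before "9th"), so restore school order
    grade_order = ["9th", "10th", "11th", "12th"]
    return table.reindex(columns=[g for g in grade_order if g in table.columns])

math_by_grade = scores_by_grade(toy, "math_score")
```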


## Reading Score by Grade 

In [None]:
# Create data series of scores by grade levels using conditionals

# Group each by school name

# Combine series into single data frame

# Minor data munging

# Display the data frame


## Scores by School Spending

In [None]:
# Establish the bins -- choose any set of bins you would like, but see below for testing bins
# to test, set your bins as follows: [0, 585, 615, 645, 675]
# ALSO -- Note that the values for `% Passing Math`, `% Passing Reading` and `% Overall Passing Rate`
# were computed using averages of averages -- your results may vary if you use weighted averages 

# Categorize the spending based on the bins

# Assemble into data frame

# Minor data munging

# Display results
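The binning step can be sketched with `pd.cut`, using the testing bins listed in the cell above. The per-school figures and the bin labels below are hypothetical, and the result is an unweighted average of per-school values (an average of averages, as the note above describes).

```python
import pandas as pd

# Hypothetical per-school figures (not the notebook's actual results)
schools = pd.DataFrame(
    {
        "per_student_budget": [580, 600, 630, 650, 570],
        "pct_pass_math": [93.0, 92.0, 80.0, 66.0, 95.0],
    },
    index=pd.Index(["A", "B", "C", "D", "E"], name="school_name"),
)

# Bin edges from the notebook's testing suggestion; labels are assumed
spending_bins = [0, 585, 615, 645, 675]
group_names = ["<$585", "$585-615", "$615-645", "$645-675"]

# Intervals are right-inclusive by default: (0, 585], (585, 615], ...
schools["spending_range"] = pd.cut(
    schools["per_student_budget"], bins=spending_bins, labels=group_names
)

# Each school counts equally within its bin
by_spending = schools.groupby("spending_range", observed=False)["pct_pass_math"].mean()
```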


## Scores by School Size

In [None]:
# Establish the bins 

# Categorize the spending based on the bins

# Calculate the scores based on bins

# Assemble into data frame

# Minor data munging

# Display results
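The same binning idea applies to school size. The sketch below reuses six size and average math score pairs from the per-school table earlier in this notebook; the bin edges and labels are assumptions, since the notebook does not specify them. It also contrasts an unweighted average (each school counts once) with a student-weighted one.

```python
import pandas as pd

# Sizes and average math scores copied from the per-school table above
schools = pd.DataFrame(
    {
        "size": [427, 962, 1468, 1858, 2917, 4976],
        "avg_math": [83.803279, 83.839917, 83.351499, 83.061895, 76.629414, 77.048432],
    },
    index=pd.Index(
        ["Holden High School", "Pena High School", "Griffin High School",
         "Cabrera High School", "Huang High School", "Bailey High School"],
        name="school_name",
    ),
)

# Assumed bin edges and labels; the notebook leaves them open
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
schools["size_range"] = pd.cut(schools["size"], bins=size_bins, labels=size_labels)

grouped = schools.groupby("size_range", observed=False)

# Unweighted: every school counts equally (an average of averages)
unweighted = grouped["avg_math"].mean()

# Weighted: every student counts equally
score_points = schools["avg_math"] * schools["size"]
weighted = (
    score_points.groupby(schools["size_range"], observed=False).sum()
    / grouped["size"].sum()
)
```

The two results differ whenever schools within a bin differ in size, which is why the choice should be stated alongside the results.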


## Scores by School Type

In [None]:
# Type | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate

# Assemble into data frame

# Minor data munging

# Display results
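Grouping a per-school summary by `type` gives the comparison this section outlines. A brief sketch with hypothetical per-school values and illustrative column names, where each school counts equally within its type:

```python
import pandas as pd

# Hypothetical per-school summary (not the notebook's actual results)
per_school = pd.DataFrame(
    {
        "type": ["Charter", "Charter", "District", "District"],
        "avg_math": [83.0, 84.0, 77.0, 76.0],
        "avg_reading": [84.0, 83.0, 81.0, 80.0],
        "pct_pass_math": [94.0, 93.0, 66.0, 67.0],
        "pct_pass_reading": [97.0, 96.0, 81.0, 80.0],
    },
    index=pd.Index(["A", "B", "C", "D"], name="school_name"),
)

# Average the per-school metrics within each school type
by_type = per_school.groupby("type").mean(numeric_only=True)
```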
