# Schools/Students Analysis

* Surprisingly, schools with lower per-student budgets performed better in reading: the average reading passing rate is 100% for schools spending less than $615 per student, then falls to about 83% in the $615 - 644 range and about 67% above $644.

* Charter schools have a much higher reading passing rate than district schools: every charter school shows a 100% reading passing rate, compared with about 67% for district schools. The gap in math is smaller but still clear, with charter schools about 11 percentage points ahead (94.08% versus 83.15%).

* Average math and reading scores barely change from grade to grade within a school, and the average math score varies by only about two points across school types and spending ranges.
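The charter versus district comparison can be stated either as a difference in percentage points or as a relative difference, and the two are easy to confuse. The following is a minimal sketch of the arithmetic, using the rounded values from the school type table at the end of this notebook; the variable names are illustrative only.

```python
# Values copied from the "Analysis of Schools per School Type" table.
charter = {"% Pass Math": 94.08, "% Pass Reading": 100.0}
district = {"% Pass Math": 83.15, "% Pass Reading": 66.9}

# Gap in percentage points: a plain difference of two percentages.
math_gap_points = round(charter["% Pass Math"] - district["% Pass Math"], 2)
reading_gap_points = round(charter["% Pass Reading"] - district["% Pass Reading"], 2)

# Relative gap: the same difference expressed as a share of the district rate.
math_gap_relative = round(math_gap_points / district["% Pass Math"] * 100, 1)

print(math_gap_points, reading_gap_points, math_gap_relative)
```

Under these inputs the math gap is about 11 percentage points, which corresponds to a relative difference of roughly 13%.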

## District Analysis

In [1]:
import pandas as pd
import numpy as np
import os

# Creating a path to the generated data
schools_file = os.path.join("Generators","PyCitySchools","generated_data","schools_complete.csv")
students_file = os.path.join("Generators","PyCitySchools","generated_data","students_complete.csv")

# Reading the two csv files
schools_pd = pd.read_csv(schools_file)
students_pd = pd.read_csv(students_file)

# Counting the students who passed each subject (score of 70 or higher)
passed_reading = students_pd.loc[students_pd['reading_score'] >= 70].count()["Student ID"]
passed_math = students_pd.loc[students_pd['math_score'] >= 70].count()["Student ID"]

# Calculating totals and averages of schools and students
total_schools = len(schools_pd['school_name'].unique())
total_students = students_pd['Student ID'].count()
total_budget = '${:,.2f}'.format(schools_pd['budget'].sum())
average_reading = round(students_pd['reading_score'].mean(), 2)
average_math = round(students_pd['math_score'].mean(),2)

# Calculating the passing rates for reading and math; the overall rate is the average of the two
rate_reading = round((passed_reading/total_students)*100,2)
rate_math = round((passed_math/total_students)*100,2)
rate_overall = round(((rate_reading+rate_math)/2),2)

# Creating and organizing the summary table
summary_table = pd.DataFrame({
    "Total Schools": [total_schools],
    "Total Students": [total_students],
    "Total Budget": [total_budget],
    "Average Math Score": [average_math],
    "Average Reading Score": [average_reading],
    "% Passing Math": [rate_math],
    "% Passing Reading": [rate_reading],
    "% Passing Overall": [rate_overall]
})
summary_table_organized = summary_table[['Total Schools', 'Total Students', 'Total Budget', 'Average Math Score',
                                         'Average Reading Score', '% Passing Math', '% Passing Reading',
                                         '% Passing Overall']]
summary_table_organized

Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Passing Overall
11,29376,"$18,648,468.00",82.27,82.87,86.86,78.22,82.54
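The overall passing rate above is the average of the math and reading rates. Another common definition is the share of students who pass both subjects, which can give a noticeably lower number. The sketch below contrasts the two on hypothetical toy data; only the column names are taken from `students_complete.csv`, and the scores are made up.

```python
import pandas as pd

# Hypothetical toy data; the scores are invented for illustration.
toy = pd.DataFrame({
    "Student ID": [0, 1, 2, 3],
    "reading_score": [90, 65, 80, 72],
    "math_score": [60, 85, 75, 68],
})

passed_reading_mask = toy["reading_score"] >= 70
passed_math_mask = toy["math_score"] >= 70

# The mean of a boolean Series is the fraction of True values.
rate_reading_toy = passed_reading_mask.mean() * 100
rate_math_toy = passed_math_mask.mean() * 100

# Definition used in this notebook: average of the two rates.
overall_as_average = (rate_reading_toy + rate_math_toy) / 2

# Alternative definition: share of students passing both subjects.
overall_both = (passed_reading_mask & passed_math_mask).mean() * 100

print(overall_as_average, overall_both)
```

On this toy data the averaged rate is 62.5 while only 25% of students pass both subjects, so it is worth stating which definition a report uses.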


## School Summary

In [13]:
# Grouping and indexing for easier values to acquire
schools_indexed = schools_pd.set_index('school_name')
students_indexed = students_pd.set_index('school_name')
schools_grouped = students_indexed.groupby(['school_name'])
schools_students = pd.merge(schools_pd, students_pd, on="school_name")

# Grabbing the fields from the indexed schools
types = schools_indexed['type']
size = schools_indexed['size']
budget = schools_indexed['budget']
budgetps = schools_indexed['budget']/schools_indexed['size']

# Grabbing the means from the grouped schools
reading_mean = round(schools_grouped['reading_score'].mean(),2)
math_mean = round(schools_grouped['math_score'].mean(),2)

passed_math = students_indexed.loc[students_indexed['math_score'] >= 70].groupby('school_name').count()["Student ID"]
passed_reading = students_indexed.loc[students_indexed['reading_score'] >= 70].groupby('school_name').count()["Student ID"]

# Calculating the % passing 
pass_math = round(passed_math/size*100,2)
pass_reading = round(passed_reading/size*100,2)
pass_overall = round((pass_math+pass_reading)/2,2)

# Creating and formatting the school summary
school_unformatted = pd.DataFrame({
    "School Type": types,
    "School Size": size,
    "Total School Budget": budget,
    "Budget Per Student": budgetps,
    "Average Math Score": math_mean,
    "Average Reading Score": reading_mean,
    "% Pass Math": pass_math,
    "% Pass Reading": pass_reading,
    "% Pass Overall": pass_overall
})
school_summary = school_unformatted[['School Type','School Size','Total School Budget','Budget Per Student',\
                                    'Average Math Score','Average Reading Score','% Pass Math','% Pass Reading',\
                                    '% Pass Overall']]
school_summary

school_name,School Type,School Size,Total School Budget,Budget Per Student,Average Math Score,Average Reading Score,% Pass Math,% Pass Reading,% Pass Overall
Campbell High School,Charter,271,157993,583.0,83.59,93.77,95.2,100.0,97.6
Galloway High School,Charter,2471,1445535,585.0,83.57,94.03,94.21,100.0,97.1
Glass High School,District,3271,2155589,659.0,81.29,76.89,82.33,67.35,74.84
Gomez High School,Charter,2154,1288092,598.0,83.84,94.03,93.96,100.0,96.98
Gonzalez High School,Charter,1855,1192765,643.0,83.44,94.14,93.21,100.0,96.6
Hawkins High School,District,4555,2851430,626.0,81.72,77.01,83.91,66.92,75.41
Kelly High School,District,3307,2225611,673.0,81.68,76.83,83.4,66.56,74.98
Macdonald High School,Charter,901,550511,611.0,83.78,93.93,94.34,100.0,97.17
Miller High School,Charter,2424,1418040,585.0,83.61,94.0,93.56,100.0,96.78
Sherman High School,District,3213,2152710,670.0,81.5,77.29,82.94,67.29,75.12
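The per-school averages and passing rates can also be computed in a single grouped aggregation. This is a sketch on hypothetical toy data, not the cell's actual method: it divides by the number of student rows in each group rather than by the `size` column, which assumes the two agree. The output column names are illustrative.

```python
import pandas as pd

# Hypothetical toy data with the same column names as students_complete.csv.
toy = pd.DataFrame({
    "school_name": ["A", "A", "A", "B", "B"],
    "math_score": [70, 69, 90, 50, 80],
    "reading_score": [71, 95, 60, 70, 99],
})

per_school = (
    toy.assign(
        pass_math=toy["math_score"] >= 70,
        pass_reading=toy["reading_score"] >= 70,
    )
    .groupby("school_name")
    .agg(
        avg_math=("math_score", "mean"),
        avg_reading=("reading_score", "mean"),
        # The mean of a boolean column is the fraction of passing students.
        pct_pass_math=("pass_math", "mean"),
        pct_pass_reading=("pass_reading", "mean"),
    )
)

# Convert the fractions to percentages, then round everything to two decimals.
pct_cols = ["pct_pass_math", "pct_pass_reading"]
per_school[pct_cols] = per_school[pct_cols] * 100
per_school = per_school.round(2)
print(per_school)
```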


## Top Performing Schools

In [14]:
# Sorting for best passing overall
top_schools = school_summary.sort_values(["% Pass Overall"], ascending=False)
top_schools

school_name,School Type,School Size,Total School Budget,Budget Per Student,Average Math Score,Average Reading Score,% Pass Math,% Pass Reading,% Pass Overall
Campbell High School,Charter,271,157993,583.0,83.59,93.77,95.2,100.0,97.6
Macdonald High School,Charter,901,550511,611.0,83.78,93.93,94.34,100.0,97.17
Galloway High School,Charter,2471,1445535,585.0,83.57,94.03,94.21,100.0,97.1
Gomez High School,Charter,2154,1288092,598.0,83.84,94.03,93.96,100.0,96.98
Miller High School,Charter,2424,1418040,585.0,83.61,94.0,93.56,100.0,96.78
Gonzalez High School,Charter,1855,1192765,643.0,83.44,94.14,93.21,100.0,96.6
Hawkins High School,District,4555,2851430,626.0,81.72,77.01,83.91,66.92,75.41
Sherman High School,District,3213,2152710,670.0,81.5,77.29,82.94,67.29,75.12
Kelly High School,District,3307,2225611,673.0,81.68,76.83,83.4,66.56,74.98
Glass High School,District,3271,2155589,659.0,81.29,76.89,82.33,67.35,74.84


## Bottom Performing Schools

In [15]:
# Sorting for worst passing overall
bottom_schools = school_summary.sort_values(["% Pass Overall"], ascending=True)
bottom_schools

school_name,School Type,School Size,Total School Budget,Budget Per Student,Average Math Score,Average Reading Score,% Pass Math,% Pass Reading,% Pass Overall
Smith High School,District,4954,3210192,648.0,81.54,77.15,83.17,66.37,74.77
Glass High School,District,3271,2155589,659.0,81.29,76.89,82.33,67.35,74.84
Kelly High School,District,3307,2225611,673.0,81.68,76.83,83.4,66.56,74.98
Sherman High School,District,3213,2152710,670.0,81.5,77.29,82.94,67.29,75.12
Hawkins High School,District,4555,2851430,626.0,81.72,77.01,83.91,66.92,75.41
Gonzalez High School,Charter,1855,1192765,643.0,83.44,94.14,93.21,100.0,96.6
Miller High School,Charter,2424,1418040,585.0,83.61,94.0,93.56,100.0,96.78
Gomez High School,Charter,2154,1288092,598.0,83.84,94.03,93.96,100.0,96.98
Galloway High School,Charter,2471,1445535,585.0,83.57,94.03,94.21,100.0,97.1
Macdonald High School,Charter,901,550511,611.0,83.78,93.93,94.34,100.0,97.17
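The two sorts above return every school in ranked order. When only the top or bottom few are wanted, `DataFrame.nlargest` and `DataFrame.nsmallest` select them directly. A minimal sketch, using a handful of `% Pass Overall` values copied from the tables above; the cutoff of two schools is an arbitrary choice for illustration.

```python
import pandas as pd

# A few rows copied from the school summary tables above.
summary = pd.DataFrame(
    {"% Pass Overall": [97.6, 97.17, 75.41, 74.77, 74.84]},
    index=[
        "Campbell High School",
        "Macdonald High School",
        "Hawkins High School",
        "Smith High School",
        "Glass High School",
    ],
)

# Highest and lowest overall passing rates, already sorted.
top_two = summary.nlargest(2, "% Pass Overall")
bottom_two = summary.nsmallest(2, "% Pass Overall")

print(top_two)
print(bottom_two)
```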


## Average Math Score Per Grade

In [16]:
# Grabbing the average math score per grade
ninth = round(students_indexed.loc[students_indexed['grade'] == '9th'].groupby("school_name")['math_score'].mean(), 2)
tenth = round(students_indexed.loc[students_indexed['grade'] == '10th'].groupby("school_name")['math_score'].mean(), 2)
eleventh = round(students_indexed.loc[students_indexed['grade'] == '11th'].groupby("school_name")['math_score'].mean(), 2)
twelfth = round(students_indexed.loc[students_indexed['grade'] == '12th'].groupby("school_name")['math_score'].mean(), 2)
math_unformatted = pd.DataFrame({
    "9th Grade": ninth,
    "10th Grade": tenth,
    "11th Grade": eleventh,
    "12th Grade": twelfth
})
math_summary = math_unformatted[['9th Grade','10th Grade','11th Grade','12th Grade']]
math_summary

school_name,9th Grade,10th Grade,11th Grade,12th Grade
Campbell High School,83.84,84.27,83.94,82.06
Galloway High School,83.53,83.55,83.98,83.2
Glass High School,81.87,81.04,81.39,80.82
Gomez High School,83.68,83.97,83.87,83.83
Gonzalez High School,83.55,83.95,83.2,82.84
Hawkins High School,81.67,81.48,81.89,81.94
Kelly High School,81.79,81.88,81.5,81.45
Macdonald High School,84.26,83.81,83.48,83.52
Miller High School,83.82,83.62,83.64,83.3
Sherman High School,81.5,81.53,81.23,81.74
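The four per-grade filters can be collapsed into one reshaping step. The following sketch uses `pivot_table` on hypothetical toy data with the same column names as `students_complete.csv`; the explicit `grade_order` list is needed because the grade labels sort alphabetically ("10th" before "9th") by default.

```python
import pandas as pd

# Hypothetical toy data; only two grades are included to keep it short.
toy = pd.DataFrame({
    "school_name": ["A", "A", "A", "B", "B", "B"],
    "grade": ["9th", "9th", "10th", "9th", "10th", "10th"],
    "math_score": [80, 90, 70, 60, 75, 85],
})

# Desired column order; without it the columns come out as 10th, 9th.
grade_order = ["9th", "10th"]

math_by_grade = (
    toy.pivot_table(index="school_name", columns="grade",
                    values="math_score", aggfunc="mean")
    .reindex(columns=grade_order)
    .round(2)
)
print(math_by_grade)
```

Swapping `values="math_score"` for `values="reading_score"` would give the reading table in the same way.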


## Average Reading Score Per Grade

In [17]:
# Grabbing the average reading score per grade
ninth = round(students_indexed.loc[students_indexed['grade'] == '9th'].groupby("school_name")['reading_score'].mean(), 2)
tenth = round(students_indexed.loc[students_indexed['grade'] == '10th'].groupby("school_name")['reading_score'].mean(), 2)
eleventh = round(students_indexed.loc[students_indexed['grade'] == '11th'].groupby("school_name")['reading_score'].mean(),2)
twelfth = round(students_indexed.loc[students_indexed['grade'] == '12th'].groupby("school_name")['reading_score'].mean(),2)
reading_unformatted = pd.DataFrame({
    "9th Grade": ninth,
    "10th Grade": tenth,
    "11th Grade": eleventh,
    "12th Grade": twelfth
})
reading_summary = reading_unformatted[['9th Grade','10th Grade','11th Grade','12th Grade']]
reading_summary

school_name,9th Grade,10th Grade,11th Grade,12th Grade
Campbell High School,93.47,93.88,94.08,93.71
Galloway High School,94.07,93.96,93.98,94.13
Glass High School,76.44,77.32,77.13,76.62
Gomez High School,94.19,93.97,93.81,94.13
Gonzalez High School,94.04,94.1,94.42,94.04
Hawkins High School,76.52,77.17,77.53,76.85
Kelly High School,76.37,77.27,76.64,76.97
Macdonald High School,94.05,94.14,93.8,93.67
Miller High School,93.9,94.04,94.24,93.82
Sherman High School,77.29,77.11,77.31,77.5


## Analysis of Schools per Budget per Student

In [18]:
# Binning section for the budget
budget_bins = [0, 584.99, 614.999, 644.999, np.inf]  # np.inf leaves the top bin open-ended
budgets = ['< $585', "$585 - 614", "$615 - 644", "> $644"]

# Executing the bin
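# Work on an explicit copy so the new column is not assigned to a slice of school_unformatted
school_summary = school_summary.copy()
school_summary["Spending Ranges"] = pd.cut(school_summary["Budget Per Student"], budget_bins, labels=budgets)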
binned = school_summary.groupby("Spending Ranges")

# Calculating the averages in the bins
binned_math = round(binned["Average Math Score"].mean(), 2)
binned_reading = round(binned["Average Reading Score"].mean(), 2)
binned_pmath = round(binned["% Pass Math"].mean(), 2)
binned_preading = round(binned["% Pass Reading"].mean(), 2)
binned_poverall = round(binned["% Pass Overall"].mean(), 2)

binned_unformatted = pd.DataFrame({
    "Average Math Score": binned_math,
    "Average Reading Score": binned_reading,
    "% Pass Math": binned_pmath,
    "% Pass Reading": binned_preading,
    "% Pass Overall": binned_poverall    
})

binned_summary = binned_unformatted[["Average Math Score","Average Reading Score","% Pass Math","% Pass Reading","% Pass Overall"]]
binned_summary



Spending Ranges,Average Math Score,Average Reading Score,% Pass Math,% Pass Reading,% Pass Overall
< $585,83.59,93.77,95.2,100.0,97.6
$585 - 614,83.7,94.0,94.02,100.0,97.01
$615 - 644,82.58,85.58,88.56,83.46,86.0
> $644,81.5,77.04,82.96,66.89,74.93
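The bin edges above rely on fractional cutoffs such as 584.99 to keep a value of exactly $585 out of the lowest range. An alternative is to use whole-dollar edges with `right=False`, so each bin includes its lower edge and excludes its upper edge. This is a sketch under that assumption, with hypothetical per-student budgets chosen to sit on the boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical per-student budgets, including values right on the bin edges.
per_student = pd.Series([583.0, 585.0, 614.0, 615.0, 644.0, 673.0])

edges = [0, 585, 615, 645, np.inf]
labels = ["< $585", "$585 - 614", "$615 - 644", "> $644"]

# right=False makes each bin closed on the left: [0, 585), [585, 615), ...
ranges = pd.cut(per_student, bins=edges, labels=labels, right=False)
print(ranges.tolist())
```

With these edges a budget of exactly 585 lands in "$585 - 614" and 644 lands in "$615 - 644", matching the labels.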


## Analysis of Schools per School Type

In [8]:
# Grouping by type
typed = school_summary.groupby("School Type")

# Calculating the averages in the types
typed_math = round(typed["Average Math Score"].mean(), 2)
typed_reading = round(typed["Average Reading Score"].mean(), 2)
typed_pmath = round(typed["% Pass Math"].mean(), 2)
typed_preading = round(typed["% Pass Reading"].mean(), 2)
typed_poverall = round(typed["% Pass Overall"].mean(), 2)

typed_unformatted = pd.DataFrame({
    "Average Math Score": typed_math,
    "Average Reading Score": typed_reading,
    "% Pass Math": typed_pmath,
    "% Pass Reading": typed_preading,
    "% Pass Overall": typed_poverall    
})

typed_summary = typed_unformatted[["Average Math Score","Average Reading Score","% Pass Math","% Pass Reading","% Pass Overall"]]
typed_summary

School Type,Average Math Score,Average Reading Score,% Pass Math,% Pass Reading,% Pass Overall
Charter,83.64,93.98,94.08,100.0,97.04
District,81.55,77.03,83.15,66.9,75.02
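The type averages above are means of school-level percentages, so a small school counts as much as a large one. Weighting each school by its enrollment gives the rate for students of that type instead. The sketch below shows the difference on hypothetical toy data; the sizes and passing rates are made up, and only the column names follow the school summary.

```python
import pandas as pd

# Hypothetical schools: sizes and passing rates are invented for illustration.
toy = pd.DataFrame({
    "School Type": ["District", "District", "Charter"],
    "School Size": [4000, 1000, 500],
    "% Pass Math": [80.0, 90.0, 95.0],
})

# Unweighted: each school counts once, as in the groupby mean above.
unweighted = toy.groupby("School Type")["% Pass Math"].mean()

# Weighted: each school counts in proportion to its enrollment.
weighted_sum = (toy["% Pass Math"] * toy["School Size"]).groupby(toy["School Type"]).sum()
total_size = toy.groupby("School Type")["School Size"].sum()
weighted = weighted_sum / total_size

print(unweighted)
print(weighted)
```

In the toy data the District rate is 85.0 unweighted but 82.0 when weighted by size, because the larger school has the lower passing rate.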
