### Analysis of Py District Schools
   >* OBSERVATION 1: Smaller schools show better overall pass rates. Pass rates may also be correlated with school _type_, since the Charter schools are generally smaller than the District schools. It would be useful to check whether smaller schools also have lower student-teacher ratios than larger ones, as one might assume, but the current data does not include faculty size.
   
   >* OBSERVATION 2: Higher per-student budgets do not correspond to better performance; in the spending table below, the two lowest-spending groups have the highest pass rates. The narrow range of per-student budgets (under $100 between the lowest and highest school) likely limits what can be concluded. With a larger set, say all the districts in the state, the results might differ.
   
   >* OBSERVATION 3: Within a given school, average math and reading scores do not differ materially from grade to grade. Said differently, average scores by subject are fairly stable across grades within an individual school. Because the gap is between schools rather than between grades, an audit of either the base subject curricula or subject department leadership may be warranted.
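
To put rough numbers on Observations 1 and 2, the sketch below computes Pearson correlations using only the ten schools listed in the top- and bottom-performing tables in this notebook. The `sample` frame and its column names are assumptions made for illustration. Since it holds only the best and worst performers rather than all 15 schools, the coefficients are indicative at best.

```python
import pandas as pd

# Ten schools copied from the top- and bottom-performing tables in this notebook
# (hypothetical frame; values are size, per-student budget, overall pass rate)
sample = pd.DataFrame(
    {
        "size": [1858, 1635, 962, 1468, 2283, 3999, 2949, 2917, 4761, 2739],
        "per_student_budget": [582, 638, 609, 625, 578, 637, 639, 655, 650, 644],
        "overall_pass_rate": [95.6, 95.3, 95.3, 95.3, 95.2, 73.3, 73.4, 73.5, 73.6, 73.8],
    }
)

# Pearson correlation of each candidate driver with the overall pass rate
size_corr = sample["size"].corr(sample["overall_pass_rate"])
budget_corr = sample["per_student_budget"].corr(sample["overall_pass_rate"])

print(f"size vs. pass rate:   {size_corr:.2f}")
print(f"budget vs. pass rate: {budget_corr:.2f}")
```

On this subset both coefficients are negative: larger schools and higher per-student budgets both go with lower pass rates. Because the top five are all Charter schools and the bottom five are all District schools, school type may be driving both relationships.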

In [1]:
import os
import pandas as pd
import numpy as np

school_path = os.path.join("Resources", "schools_complete.csv")
student_path = os.path.join("Resources", "students_complete.csv")

school_info = pd.read_csv(school_path)
student_info = pd.read_csv(student_path)

primary_df = pd.merge(school_info, student_info, how='outer', on='school_name', sort=False)
primary_df = (primary_df.rename(columns={'school_name': 'School Name', 
                                         'type': 'School Type', 
                                         'size': 'Size', 
                                         'budget': 'Budget', 
                                         'student_name': 'Student',
                                         'grade': 'Grade', 
                                         'gender': 'Gender', 
                                         'reading_score': 'Reading Score', 
                                         'math_score': 'Math Score'}))
# primary_df.head()

# pull requested data points for summary table
school_count = primary_df['School Name'].nunique()
school_names = primary_df['School Name'].unique()
student_count = primary_df['Student ID'].nunique()
# sum budgets from the school table so that equal budgets are not dropped as duplicates
total_budget = school_info['budget'].sum()
avg_math = ((primary_df['Math Score'].sum()) / student_count).round(2)
avg_read = ((primary_df['Reading Score'].sum()) / student_count).round(2)
math_pass = (len(primary_df.loc[primary_df['Math Score'] >= 70]) / student_count) * 100
read_pass = (len(primary_df.loc[primary_df['Reading Score'] >= 70]) / student_count) * 100
overall_pass = (math_pass + read_pass) / 2

### District Summary

In [2]:
# create dictionary and then dataframe for summary; added code to format numerical output
district_summary_dict = ({'Total Schools': [school_count], 
                          'Total Students': [student_count],
                          'Total Budget': [total_budget], 
                          'Average Math Score': [avg_math], 
                          'Average Reading Score': [avg_read], 
                          '% Passing Math': [math_pass], 
                          '% Passing Reading': [read_pass],
                          'Overall Pass Rate': [overall_pass],
                          'SUMMARY': 'District Results'})

district_summary_table_df = pd.DataFrame(district_summary_dict)
district_summary_table_df.set_index('SUMMARY', inplace = True)
district_summary_fin = (district_summary_table_df.style.format(
    {'Total Students': '{0:,d}',
     'Total Budget': '${0:,d}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))

district_summary_fin

SUMMARY,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Pass Rate
District Results,15,39170,"$24,649,428",79.0,81.9,75.0%,85.8%,80.4%
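
The pass rates above are counts of scores at or above 70 divided by the student count. As a minimal sketch on made-up scores (the `toy` frame is hypothetical, not the district data), the same figure can be read off as the mean of a boolean mask, and the overall rate is the simple average of the two subject rates, as in the cell above.

```python
import pandas as pd

PASSING_SCORE = 70  # threshold used throughout this notebook

# Hypothetical scores for five students (not taken from the district files)
toy = pd.DataFrame(
    {
        "Math Score": [65, 70, 88, 92, 54],
        "Reading Score": [71, 69, 90, 85, 77],
    }
)

# The mean of a boolean mask is the fraction of True values
math_pass = (toy["Math Score"] >= PASSING_SCORE).mean() * 100
read_pass = (toy["Reading Score"] >= PASSING_SCORE).mean() * 100

# Overall rate as defined in this notebook: average of the two subject rates
overall_pass = (math_pass + read_pass) / 2

print(math_pass, read_pass, overall_pass)
```

Note that this overall rate is not the share of students who passed both subjects; it only averages the two subject-level rates.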


### School Summary [Data Holder]

In [3]:
# generate base dataframe and data (lists) for use in subsequent dataframes
school_groups = primary_df.groupby('School Name')

# create a list of number of students passing math (per school)
school_math_list = []
for school in school_names:
    school_mask = primary_df['School Name'] == school
    math_mask = primary_df['Math Score'] >= 70
    school_math_pass_count = len(primary_df[school_mask & math_mask])
    school_math_list.append(school_math_pass_count)

# create a list of number of students passing reading (per school)    
school_read_list = []
for school in school_names:
    school_mask = primary_df['School Name'] == school
    reading_mask = primary_df['Reading Score'] >= 70
    school_reading_pass_count = len(primary_df[school_mask & reading_mask])
    school_read_list.append(school_reading_pass_count)

# create a list of total number of students (per school)    
students_per_school = []
for school in school_names:
    test_school_df = primary_df[primary_df['School Name'] == school]
    count = test_school_df['Student ID'].nunique()
    students_per_school.append(count)

# create dataframe to hold data summary by school
# keep one row of school-level attributes per school, indexed by school name
by_school_summary_df = (
    primary_df[['School Name', 'School Type', 'Size', 'Budget']]
    .drop_duplicates('School Name')
    .set_index('School Name'))
by_school_summary_df['Per Student Budget'] = (
    by_school_summary_df['Budget']/by_school_summary_df['Size'])
by_school_summary_df['Average Math Score'] = (
    primary_df.groupby('School Name')['Math Score'].mean())
by_school_summary_df['Average Reading Score'] = (
    primary_df.groupby('School Name')['Reading Score'].mean())
by_school_summary_df['% Passing Math'] = (
    school_math_list / by_school_summary_df['Size'] * 100)
by_school_summary_df['% Passing Reading'] = (
    school_read_list / by_school_summary_df['Size'] * 100)
by_school_summary_df['Overall Pass Rate'] = (
    (by_school_summary_df['% Passing Math'] + by_school_summary_df['% Passing Reading']) / 2)
by_school_summary_df.rename(columns={"Size": "Total Students", "Budget": "Total School Budget"}, inplace = True)
by_school_summary_df.columns.name = 'SUMMARY'
by_school_summary_fin = (by_school_summary_df.style.format(
    {'Total Students': '{0:,d}',
     'Total School Budget': '${0:,d}',
     'Per Student Budget': '${0:,.2f}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))
# by_school_summary_fin
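
The three loops in the cell above can also be expressed as a single grouped aggregation. The following is a sketch on a small made-up frame (the `toy` data and school names are hypothetical); it uses the same 70-point threshold and score column names as the notebook.

```python
import pandas as pd

# Hypothetical student rows for two schools
toy = pd.DataFrame(
    {
        "School Name": ["A High", "A High", "A High", "B High", "B High"],
        "Math Score": [72, 65, 90, 70, 50],
        "Reading Score": [80, 69, 75, 60, 95],
    }
)

# Flag passing scores, then average the flags within each school
passed = toy[["Math Score", "Reading Score"]].ge(70)
pass_rates = passed.groupby(toy["School Name"]).mean() * 100
pass_rates.columns = ["% Passing Math", "% Passing Reading"]

# Overall rate: average of the two subject rates, as elsewhere in this notebook
pass_rates["Overall Pass Rate"] = pass_rates.mean(axis=1)

print(pass_rates.round(1))
```

Because the result is indexed by school name, it lines up with other per-school columns by label rather than by list position.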

### Top Performing Schools (By Passing Rate)

In [4]:
# return dataframe of top 5 overall pass rates
top_5_schools_df = by_school_summary_df.nlargest(5, 'Overall Pass Rate')
top_5_schools_df.columns.name = 'Top Performing Schools'
top_5_schools_fin = (top_5_schools_df.style.format(
    {'Total Students': '{0:,d}',
     'Total School Budget': '${0:,d}',
     'Per Student Budget': '${0:,.2f}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))
top_5_schools_fin

School Name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Pass Rate
Cabrera High School,Charter,1858,"$1,081,356",$582.00,83.1,84.0,94.1%,97.0%,95.6%
Thomas High School,Charter,1635,"$1,043,130",$638.00,83.4,83.8,93.3%,97.3%,95.3%
Pena High School,Charter,962,"$585,858",$609.00,83.8,84.0,94.6%,95.9%,95.3%
Griffin High School,Charter,1468,"$917,500",$625.00,83.4,83.8,93.4%,97.1%,95.3%
Wilson High School,Charter,2283,"$1,319,574",$578.00,83.3,84.0,93.9%,96.5%,95.2%


### Bottom Performing Schools (By Passing Rate)

In [5]:
# return dataframe of lowest 5 overall pass rates
bottom_5_schools_df = by_school_summary_df.nsmallest(5, 'Overall Pass Rate')
bottom_5_schools_df.columns.name = 'Schools Needing Focus'
bottom_5_schools_fin = (bottom_5_schools_df.style.format(
    {'Total Students': '{0:,d}',
     'Total School Budget': '${0:,d}',
     'Per Student Budget': '${0:,.2f}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))
bottom_5_schools_fin

School Name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Pass Rate
Rodriguez High School,District,3999,"$2,547,363",$637.00,76.8,80.7,66.4%,80.2%,73.3%
Figueroa High School,District,2949,"$1,884,411",$639.00,76.7,81.2,66.0%,80.7%,73.4%
Huang High School,District,2917,"$1,910,635",$655.00,76.6,81.2,65.7%,81.3%,73.5%
Johnson High School,District,4761,"$3,094,650",$650.00,77.1,81.0,66.1%,81.2%,73.6%
Ford High School,District,2739,"$1,763,916",$644.00,77.1,80.7,68.3%,79.3%,73.8%


### Math Scores by Grade

In [6]:
# average math score for 9th graders per school
# (select the score column before .mean() so text columns are not averaged)
frosh_math = primary_df[primary_df['Grade'] == '9th']
frosh_math_df = frosh_math.groupby('School Name')['Math Score'].mean().to_frame()

# average math score for 10th graders per school
soph_math = primary_df[primary_df['Grade'] == '10th']
soph_math_df = soph_math.groupby('School Name')['Math Score'].mean().to_frame()

# average math score for 11th graders per school
junior_math = primary_df[primary_df['Grade'] == '11th']
junior_math_df = junior_math.groupby('School Name')['Math Score'].mean().to_frame()

# average math score for 12th graders per school
senior_math = primary_df[primary_df['Grade'] == '12th']
senior_math_df = senior_math.groupby('School Name')['Math Score'].mean().to_frame()

# merge these into a summary of average math scores per school, by grade
math_avg_by_school_df = frosh_math_df.merge(soph_math_df, on = 'School Name')
math_avg_by_school_df.rename(columns={"Math Score_x": "9th Grade", "Math Score_y": "10th Grade"}, inplace = True)
math_avg_by_school_df = math_avg_by_school_df.merge(junior_math_df, on = 'School Name')
math_avg_by_school_df.rename(columns={"Math Score": "11th Grade"}, inplace = True)
math_avg_by_school_df = math_avg_by_school_df.merge(senior_math_df, on = 'School Name')
math_avg_by_school_df.rename(columns={"Math Score": "12th Grade"}, inplace = True)
math_avg_by_school_df.columns.name = 'Average Math Scores'
# these are average scores, not percentages, so no % sign in the format
math_avg_by_school_fin = math_avg_by_school_df.style.format('{:,.1f}')
math_avg_by_school_fin

School Name,9th Grade,10th Grade,11th Grade,12th Grade
Bailey High School,77.1,77.0,77.5,76.5
Cabrera High School,83.1,83.2,82.8,83.3
Figueroa High School,76.4,76.5,76.9,77.2
Ford High School,77.4,77.7,76.9,76.2
Griffin High School,82.0,84.2,83.8,83.4
Hernandez High School,77.4,77.3,77.1,77.2
Holden High School,83.8,83.4,85.0,82.9
Huang High School,77.0,75.9,76.4,77.2
Johnson High School,77.2,76.7,77.5,76.9
Pena High School,83.6,83.4,84.3,84.1
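
The four per-grade frames and merges above follow one pattern, so as a hedged alternative the same table can be sketched with `pivot_table`. The `toy` frame below is made up for illustration; only the column names and grade labels come from the notebook.

```python
import pandas as pd

# Hypothetical rows: one score per school and grade
toy = pd.DataFrame(
    {
        "School Name": ["A High"] * 4 + ["B High"] * 4,
        "Grade": ["9th", "10th", "11th", "12th"] * 2,
        "Math Score": [80, 82, 84, 86, 70, 72, 74, 76],
    }
)

grade_order = ["9th", "10th", "11th", "12th"]

# One row per school, one column per grade, cells hold the mean score
math_by_grade = toy.pivot_table(
    index="School Name", columns="Grade", values="Math Score", aggfunc="mean"
)

# pivot_table sorts column labels as strings, so restore the grade order
math_by_grade = math_by_grade[grade_order]

print(math_by_grade)
```

Swapping `"Math Score"` for `"Reading Score"` would give the reading table from the same call.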


### Reading Scores by Grade

In [7]:
# average reading score for 9th graders per school
# (select the score column before .mean() so text columns are not averaged)
frosh_read = primary_df[primary_df['Grade'] == '9th']
frosh_read_df = frosh_read.groupby('School Name')['Reading Score'].mean().to_frame()

# average reading score for 10th graders per school
soph_read = primary_df[primary_df['Grade'] == '10th']
soph_read_df = soph_read.groupby('School Name')['Reading Score'].mean().to_frame()

# average reading score for 11th graders per school
junior_read = primary_df[primary_df['Grade'] == '11th']
junior_read_df = junior_read.groupby('School Name')['Reading Score'].mean().to_frame()

# average reading score for 12th graders per school
senior_read = primary_df[primary_df['Grade'] == '12th']
senior_read_df = senior_read.groupby('School Name')['Reading Score'].mean().to_frame()

# merge into a summary of average reading scores per school, by grade
read_avg_by_school_df = frosh_read_df.merge(soph_read_df, on = 'School Name')
read_avg_by_school_df.rename(columns={"Reading Score_x": "9th Grade", "Reading Score_y": "10th Grade"}, inplace = True)
read_avg_by_school_df = read_avg_by_school_df.merge(junior_read_df, on = 'School Name')
read_avg_by_school_df.rename(columns={"Reading Score": "11th Grade"}, inplace = True)
read_avg_by_school_df = read_avg_by_school_df.merge(senior_read_df, on = 'School Name')
read_avg_by_school_df.rename(columns={"Reading Score": "12th Grade"}, inplace = True)
read_avg_by_school_df.columns.name = 'Average Reading Scores'
# style the reading frame (not the math frame); scores are not percentages
read_avg_by_school_fin = read_avg_by_school_df.style.format('{:,.1f}')
read_avg_by_school_fin


### Scores by School Spending

In [8]:
# create 4 bins grouping schools into per student budget groups
budget_bounds = [0, 595, 615, 635, 655]
budget_labels = ['< $595', '$595 - $615', '$615 - $635', '$635+']

# create a new dataframe summarizing performance by per student spending
per_student_budget_df = by_school_summary_df.reset_index(inplace = False)
per_student_budget_df['Budget per Student'] = (
    pd.cut(per_student_budget_df['Per Student Budget'], budget_bounds, labels = budget_labels))
# average only the numeric columns within each spending bin
grp_by_student_budget_df = (
    per_student_budget_df.groupby('Budget per Student').mean(numeric_only = True))
grp_by_student_budget_df = (
    grp_by_student_budget_df.drop(columns = ['Total Students', 'Total School Budget', 'Per Student Budget']))
# scores are plain numbers; only the pass rates are percentages
grp_by_student_budget_fin = grp_by_student_budget_df.style.format(
    {'Average Math Score': '{:,.1f}',
     'Average Reading Score': '{:,.1f}',
     '% Passing Math': '{:,.1f}%',
     '% Passing Reading': '{:,.1f}%',
     'Overall Pass Rate': '{:,.1f}%'})
grp_by_student_budget_fin

Budget per Student,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Pass Rate
< $595,83.5,83.9,93.5%,96.6%,95.0%
$595 - $615,83.6,83.9,94.2%,95.9%,95.1%
$615 - $635,80.2,82.4,80.0%,89.5%,84.8%
$635+,77.9,81.4,70.3%,83.0%,76.7%
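
One detail worth keeping in mind for the bins above: by default `pd.cut` treats each interval as open on the left and closed on the right. A small sketch with hypothetical per-student budgets chosen to sit on or near the bin edges:

```python
import pandas as pd

budget_bounds = [0, 595, 615, 635, 655]
budget_labels = ["< $595", "$595 - $615", "$615 - $635", "$635+"]

# Hypothetical per-student budgets, picked to land on or near the edges
budgets = pd.Series([578, 595, 596, 635, 655, 656])

# Default right=True gives the intervals (0, 595], (595, 615], (615, 635], (635, 655]
binned = pd.cut(budgets, budget_bounds, labels=budget_labels)
print(binned.tolist())
```

A value of exactly 595 lands in the lowest bin despite its "<" label, and anything above the last bound (656 here) becomes `NaN`, so the top edge needs to cover the largest per-student budget in the data.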


### Scores by School Size

In [9]:
# create 3 bins grouping schools into small, medium, and large school size groups
school_size_bounds = [0, 1500, 3000, 5500]
school_size_labels = ['Small (<1500)', 'Medium (1500-3000)', 'Large (3000+)']

# create a new dataframe summarizing performance by school size
per_school_size_df = by_school_summary_df.reset_index(inplace = False)
per_school_size_df['School Size (Students)'] = (
    pd.cut(per_school_size_df['Total Students'], school_size_bounds, labels = school_size_labels))
# average only the numeric columns within each size bin
grp_per_school_size_df = (
    per_school_size_df.groupby('School Size (Students)').mean(numeric_only = True))
grp_per_school_size_df = (
    grp_per_school_size_df.drop(columns = ['Total Students', 'Total School Budget', 'Per Student Budget']))
# scores are plain numbers; only the pass rates are percentages
grp_per_school_size_fin = grp_per_school_size_df.style.format(
    {'Average Math Score': '{:,.1f}',
     'Average Reading Score': '{:,.1f}',
     '% Passing Math': '{:,.1f}%',
     '% Passing Reading': '{:,.1f}%',
     'Overall Pass Rate': '{:,.1f}%'})
grp_per_school_size_fin

School Size (Students),Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Pass Rate
Small (<1500),83.7,83.9,93.5%,96.4%,95.0%
Medium (1500-3000),80.9,82.8,83.6%,90.6%,87.1%
Large (3000+),77.1,80.9,66.5%,81.1%,73.8%


### Scores by School Type

In [10]:
# columns holding the performance metrics to average
metric_cols = ['Average Math Score', 'Average Reading Score',
               '% Passing Math', '% Passing Reading', 'Overall Pass Rate']

# average each metric across the schools of each type (Charter, District);
# this avoids reusing the district_summary_fin name from the District Summary cell
school_type_summary_df = by_school_summary_df.groupby('School Type')[metric_cols].mean()
# scores are plain numbers; only the pass rates are percentages
school_type_summary_fin = school_type_summary_df.style.format(
    {'Average Math Score': '{:,.1f}',
     'Average Reading Score': '{:,.1f}',
     '% Passing Math': '{:,.1f}%',
     '% Passing Reading': '{:,.1f}%',
     'Overall Pass Rate': '{:,.1f}%'})
school_type_summary_fin

School Type,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Pass Rate
Charter,83.5,83.9,93.6%,96.6%,95.1%
District,77.0,81.0,66.5%,80.8%,73.7%
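
Observation 1 notes that school type and school size overlap. As a sketch of how that overlap could be checked, the snippet below cross-tabulates type against the size bins from the previous section, using only the ten schools shown in the top and bottom tables (the `sample` frame is an assumption for illustration, not the full 15-school data).

```python
import pandas as pd

# Type and size of the ten schools in the top and bottom tables of this notebook
sample = pd.DataFrame(
    {
        "School Type": ["Charter"] * 5 + ["District"] * 5,
        "Total Students": [1858, 1635, 962, 1468, 2283, 3999, 2949, 2917, 4761, 2739],
    }
)

size_bounds = [0, 1500, 3000, 5500]
size_labels = ["Small (<1500)", "Medium (1500-3000)", "Large (3000+)"]
sample["School Size"] = pd.cut(sample["Total Students"], size_bounds, labels=size_labels)

# Count schools in each (type, size bin) combination
overlap = pd.crosstab(sample["School Type"], sample["School Size"])
print(overlap)
```

In this subset no Charter school falls in the large bin and no District school in the small bin, so the effects of size and type cannot be separated with these ten rows alone.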
