### Analysis of Py District Schools
   >* OBSERVATION 1: Smaller schools have better overall pass rates. Pass performance may also be correlated with school _type_, since the Charter schools are generally smaller than the District schools. It would be worth checking whether smaller schools also have a lower student-teacher ratio than larger schools, as one might assume, but the current data does not include faculty size.
   
   >* OBSERVATION 2: There is no apparent correlation between a higher per-student budget and better performance. This is likely because the per-student budget varies little across schools: the spread is under $100. With a larger set, say all the districts in the state, the results might differ.
   
   >* OBSERVATION 3: There is no material difference in results across grades within a school for either the math scores or the reading scores. Said differently, average scores by subject, across grades within an individual school, are fairly stable. Based on this, there may need to be an audit of either the base subject curricula or subject department leadership.
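
The relationships described in Observations 1 and 2 could be checked numerically rather than by eye. The following is a minimal sketch, assuming a per-school summary table with the column names used later in this notebook. The `toy_summary` values are made up purely to show the mechanics and are not results from the district data.

```python
import pandas as pd

# Hypothetical per-school summary; every value here is invented for illustration.
toy_summary = pd.DataFrame({
    'Per Student Budget': [580.0, 600.0, 625.0, 640.0, 650.0],
    'Total Students': [900, 1600, 2300, 3900, 4800],
    'Overall Pass Rate': [95.0, 94.5, 80.0, 74.0, 73.5],
})

# Pearson correlation of each candidate driver with the overall pass rate.
corr_with_pass = (toy_summary.corr()['Overall Pass Rate']
                  .drop('Overall Pass Rate'))

# Range of per-student budgets, to judge how much variation there is to explain.
budget_spread = (toy_summary['Per Student Budget'].max()
                 - toy_summary['Per Student Budget'].min())

print(corr_with_pass)
print(budget_spread)
```

Run against the real summary table, a coefficient near zero would support Observation 2, while a strongly negative coefficient for school size would support Observation 1.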

In [None]:
import os
import pandas as pd
import numpy as np

school_path = os.path.join("Resources", "schools_complete.csv")
student_path = os.path.join("Resources", "students_complete.csv")

school_info = pd.read_csv(school_path)
student_info = pd.read_csv(student_path)

primary_df = pd.merge(school_info, student_info, how='outer', on='school_name', sort=False)
primary_df = (primary_df.rename(columns={'school_name': 'School Name', 
                                         'type': 'School Type', 
                                         'size': 'Size', 
                                         'budget': 'Budget', 
                                         'student_name': 'Student',
                                         'grade': 'Grade', 
                                         'gender': 'Gender', 
                                         'reading_score': 'Reading Score', 
                                         'math_score': 'Math Score'}))
# primary_df.head()

school_count = primary_df['School Name'].nunique()
school_names = primary_df['School Name'].unique()
student_count = primary_df['Student ID'].nunique()
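# Sum one budget per school; drop_duplicates() on the merged column would
# silently discard a school whose budget happens to equal another school's
total_budget = school_info['budget'].sum()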
avg_math = ((primary_df['Math Score'].sum()) / student_count).round(2)
avg_read = ((primary_df['Reading Score'].sum()) / student_count).round(2)
math_pass = (len(primary_df.loc[primary_df['Math Score'] >= 70]) / student_count) * 100
read_pass = (len(primary_df.loc[primary_df['Reading Score'] >= 70]) / student_count) * 100
overall_pass = (math_pass + read_pass) / 2

## EXCEPTION NOTES ['Overall Pass' & Scores by School Spending, Size, and Type] :

Validate with the client that they want 'Overall Pass Rate' calculated this way, as the straight average of the math and reading pass rates.
This recommendation carries through all 'Overall Pass Rate' calculations in the project.

Propose this be calculated as:
`((Count of students whose average of math & reading scores >= 70) / total students) * 100`

The proposal calculates the percentage of students who pass on their individual average, versus the less accurate straight average of the two pass rates.

The numbers are significantly different:
* `overall_pass`, as calculated above, results in 80.4%
* `overall_avg_pass_pct`, as recommended here and defined in the commented-out cell below, results in 89.4%
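
To see why the two formulas diverge, here is a small worked sketch on five hypothetical students. The scores in `toy_scores` are made up; only the 70-point pass mark comes from this notebook.

```python
import pandas as pd

# Made-up scores for five hypothetical students.
toy_scores = pd.DataFrame({
    'Math Score': [65, 90, 72, 50, 88],
    'Reading Score': [80, 68, 75, 55, 95],
})
pass_mark = 70

# Current method: average the two subject pass rates.
math_pass_pct = (toy_scores['Math Score'] >= pass_mark).mean() * 100
read_pass_pct = (toy_scores['Reading Score'] >= pass_mark).mean() * 100
current_overall = (math_pass_pct + read_pass_pct) / 2

# Proposed method: share of students whose own two-subject average passes.
student_avg = toy_scores[['Math Score', 'Reading Score']].mean(axis=1)
proposed_overall = (student_avg >= pass_mark).mean() * 100

print(current_overall, proposed_overall)
```

In this toy set, three of five students pass each subject, so the current method gives 60%, while four of five students have an individual average of at least 70, so the proposed method gives 80%. A student who narrowly fails one subject but does well in the other counts against the first figure and towards the second.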

In addition to the exceptions for each 'Overall Pass Rate', the averaging calculations in the 'Scores by School 
Spending', 'Scores by School Size', and 'Scores by School Type' sections in the back half of this exercise should also 
be reviewed for accuracy. These are straight averages of the per-school figures rather than averages weighted by student count. 
A school with 4976 students is weighted equally with schools that have fewer than 1000 students, which is not mathematically accurate. The numbers for these last three sections should be calculated from the student-level rows, as they are in the 'by Grade' breakdowns.
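
The weighting concern can be illustrated with two hypothetical schools. This is a sketch only: the enrollment of 4976 is taken from the note above, while the second school's size and both pass rates are invented for the example.

```python
import numpy as np
import pandas as pd

# Two hypothetical schools; pass rates and the smaller enrollment are made up.
toy_schools = pd.DataFrame(
    {'Total Students': [4976, 900],
     '% Passing Math': [66.0, 94.0]},
    index=['Large school (hypothetical)', 'Small school (hypothetical)'])

# Straight average: each school counts once, regardless of enrollment.
straight_avg = toy_schools['% Passing Math'].mean()

# Student-weighted average: each school counts in proportion to its enrollment.
weighted_avg = np.average(toy_schools['% Passing Math'],
                          weights=toy_schools['Total Students'])

print(round(straight_avg, 1), round(weighted_avg, 1))
```

The straight average is 80.0%, but the student-weighted figure is about 70.3%, because the large school contributes most of the students. Computing the rate directly from student-level rows gives the same result as the weighted average.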

In [None]:
# # overall pass (by individual student average)
# avg_overall_pass_by_student_df = primary_df
# avg_overall_pass_by_student_df['Student Average'] = (
#     (avg_overall_pass_by_student_df['Math Score'] 
#      + 
#      avg_overall_pass_by_student_df['Reading Score']) 
#      / 2)
# overall_avg_mask = avg_overall_pass_by_student_df['Student Average'] >= 70
# overall_avg_pass_pct = (len(primary_df[overall_avg_mask])/student_count) * 100
# overall_avg_pass_pct

### District Summary

In [None]:
district_summary_dict = ({'Total Schools': [school_count], 
                          'Total Students': [student_count],
                          'Total Budget': [total_budget], 
                          'Average Math Score': [avg_math], 
                          'Average Reading Score': [avg_read], 
                          '% Passing Math': [math_pass], 
                          '% Passing Reading': [read_pass],
                          'Overall Pass Rate': [overall_pass],
                          'SUMMARY': 'District Results'})

district_summary_table_df = pd.DataFrame(district_summary_dict)
district_summary_table_df.set_index('SUMMARY', inplace = True)
district_summary_fin = (district_summary_table_df.style.format(
    {'Total Students': '{0:,d}',
     'Total Budget': '${0:,d}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))

district_summary_fin

### School Summary [Data Holder]

In [None]:
school_groups = primary_df.groupby('School Name')

school_math_list = []
for school in school_names:
    school_mask = primary_df['School Name'] == school
    math_mask = primary_df['Math Score'] >= 70
    school_math_pass_count = len(primary_df[school_mask & math_mask])
    school_math_list.append(school_math_pass_count)

school_read_list = []
for school in school_names:
    school_mask = primary_df['School Name'] == school
    reading_mask = primary_df['Reading Score'] >= 70
    school_reading_pass_count = len(primary_df[school_mask & reading_mask])
    school_read_list.append(school_reading_pass_count)

students_per_school = []
for school in school_names:
    test_school_df = primary_df[primary_df['School Name'] == school]
    count = test_school_df['Student ID'].nunique()
    students_per_school.append(count)

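# One row per school, in first-appearance order (the same order as school_names).
# Indexing a groupby with a bare tuple of column names fails on pandas 2.x.
by_school_summary_df = (
    primary_df[['School Name', 'School Type', 'Size', 'Budget']]
    .drop_duplicates(subset='School Name')
    .set_index('School Name'))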
by_school_summary_df['Per Student Budget'] = (
    by_school_summary_df['Budget']/by_school_summary_df['Size'])
by_school_summary_df['Average Math Score'] = (
    primary_df.groupby('School Name')['Math Score'].mean())
by_school_summary_df['Average Reading Score'] = (
    primary_df.groupby('School Name')['Reading Score'].mean())
by_school_summary_df['% Passing Math'] = (
    school_math_list / by_school_summary_df['Size'] * 100)
by_school_summary_df['% Passing Reading'] = (
    school_read_list / by_school_summary_df['Size'] * 100)
by_school_summary_df['Overall Pass Rate'] = (
    (by_school_summary_df['% Passing Math'] + by_school_summary_df['% Passing Reading']) / 2)
by_school_summary_df.rename(columns={"Size": "Total Students", "Budget": "Total School Budget"}, inplace = True)
by_school_summary_df.columns.name = 'SUMMARY'
by_school_summary_fin = (by_school_summary_df.style.format(
    {'Total Students': '{0:,d}',
     'Total School Budget': '${0:,d}',
     'Per Student Budget': '${0:,.2f}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))
# by_school_summary_fin

### Top Performing Schools (By Passing Rate)

In [None]:
top_5_schools_df = by_school_summary_df.nlargest(5, 'Overall Pass Rate')
top_5_schools_df.columns.name = 'Top Performing Schools'
top_5_schools_fin = (top_5_schools_df.style.format(
    {'Total Students': '{0:,d}',
     'Total School Budget': '${0:,d}',
     'Per Student Budget': '${0:,.2f}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))
top_5_schools_fin

### Bottom Performing Schools (By Passing Rate)

In [None]:
bottom_5_schools_df = by_school_summary_df.nsmallest(5, 'Overall Pass Rate')
bottom_5_schools_df.columns.name = 'Schools Needing Focus'
bottom_5_schools_fin = (bottom_5_schools_df.style.format(
    {'Total Students': '{0:,d}',
     'Total School Budget': '${0:,d}',
     'Per Student Budget': '${0:,.2f}',
     'Average Math Score': '{0:,.1f}',
     'Average Reading Score': '{0:,.1f}',
     '% Passing Math': '{0:,.1f}%',
     '% Passing Reading': '{0:,.1f}%',
     'Overall Pass Rate': '{0:,.1f}%'}))
bottom_5_schools_fin

### Math Scores by Grade
Do we know why this section contained asterisks?

In [None]:
# Select 'Math Score' before averaging so text columns are never passed to mean()
frosh_math = primary_df[primary_df['Grade'] == '9th']
frosh_math_df = frosh_math.groupby('School Name')['Math Score'].mean().to_frame()

soph_math = primary_df[primary_df['Grade'] == '10th']
soph_math_df = soph_math.groupby('School Name')['Math Score'].mean().to_frame()

junior_math = primary_df[primary_df['Grade'] == '11th']
junior_math_df = junior_math.groupby('School Name')['Math Score'].mean().to_frame()

senior_math = primary_df[primary_df['Grade'] == '12th']
senior_math_df = senior_math.groupby('School Name')['Math Score'].mean().to_frame()

math_avg_by_school_df = frosh_math_df.merge(soph_math_df, on = 'School Name')
math_avg_by_school_df.rename(columns={"Math Score_x": "9th Grade", "Math Score_y": "10th Grade"}, inplace = True)
math_avg_by_school_df = math_avg_by_school_df.merge(junior_math_df, on = 'School Name')
math_avg_by_school_df.rename(columns={"Math Score": "11th Grade"}, inplace = True)
math_avg_by_school_df = math_avg_by_school_df.merge(senior_math_df, on = 'School Name')
math_avg_by_school_df.rename(columns={"Math Score": "12th Grade"}, inplace = True)
math_avg_by_school_df.columns.name = 'Average Math Scores'
# Average scores are points, not percentages, so no % sign (matches the summary tables)
math_avg_by_school_fin = math_avg_by_school_df.style.format('{:,.1f}')
math_avg_by_school_fin

### Reading Scores by Grade

In [None]:
# Select 'Reading Score' before averaging so text columns are never passed to mean()
frosh_read = primary_df[primary_df['Grade'] == '9th']
frosh_read_df = frosh_read.groupby('School Name')['Reading Score'].mean().to_frame()

soph_read = primary_df[primary_df['Grade'] == '10th']
soph_read_df = soph_read.groupby('School Name')['Reading Score'].mean().to_frame()

junior_read = primary_df[primary_df['Grade'] == '11th']
junior_read_df = junior_read.groupby('School Name')['Reading Score'].mean().to_frame()

senior_read = primary_df[primary_df['Grade'] == '12th']
senior_read_df = senior_read.groupby('School Name')['Reading Score'].mean().to_frame()

read_avg_by_school_df = frosh_read_df.merge(soph_read_df, on = 'School Name')
read_avg_by_school_df.rename(columns={"Reading Score_x": "9th Grade", "Reading Score_y": "10th Grade"}, inplace = True)
read_avg_by_school_df = read_avg_by_school_df.merge(junior_read_df, on = 'School Name')
read_avg_by_school_df.rename(columns={"Reading Score": "11th Grade"}, inplace = True)
read_avg_by_school_df = read_avg_by_school_df.merge(senior_read_df, on = 'School Name')
read_avg_by_school_df.rename(columns={"Reading Score": "12th Grade"}, inplace = True)
read_avg_by_school_df.columns.name = 'Average Reading Scores'
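# Style the reading table (the original styled the math table here by mistake);
# average scores are points, not percentages, so no % sign
read_avg_by_school_fin = read_avg_by_school_df.style.format('{:,.1f}')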
read_avg_by_school_fin

### Scores by School Spending

In [None]:
budget_bounds = [0, 595, 615, 635, 655]
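budget_labels = ['< $595', '$595 - $615', '$615 - $635', '$635+']
per_student_budget_df = by_school_summary_df.reset_index()
per_student_budget_df['Budget per Student'] = (
    pd.cut(per_student_budget_df['Per Student Budget'], budget_bounds, labels = budget_labels))
# numeric_only skips the text columns (school name and type), which cannot be averaged
grp_by_student_budget_df = (
    per_student_budget_df.groupby('Budget per Student', observed = False).mean(numeric_only = True))
grp_by_student_budget_df = (
    grp_by_student_budget_df.drop(columns = ['Total Students', 'Total School Budget', 'Per Student Budget']))
# Only the pass-rate columns are percentages; average scores are plain points
grp_by_student_budget_fin = grp_by_student_budget_df.style.format(
    {'Average Math Score': '{:,.1f}',
     'Average Reading Score': '{:,.1f}',
     '% Passing Math': '{:,.1f}%',
     '% Passing Reading': '{:,.1f}%',
     'Overall Pass Rate': '{:,.1f}%'})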
grp_by_student_budget_fin
##formatting in Budget per Student index column???
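
When reading the spending table, it helps to know how `pd.cut` treats the bin edges. The sketch below reuses the bounds from the cell above on a few made-up per-student budgets; `toy_budgets` is hypothetical and chosen only to land on or near the edges.

```python
import pandas as pd

budget_bounds = [0, 595, 615, 635, 655]
budget_labels = ['< $595', '$595 - $615', '$615 - $635', '$635+']

# Made-up per-student budgets that sit on or near the bin edges.
toy_budgets = pd.Series([578.0, 595.0, 600.0, 635.0, 655.0, 660.0])

# By default pd.cut uses right-closed bins: (0, 595], (595, 615], ...
binned = pd.cut(toy_budgets, budget_bounds, labels=budget_labels)

print(pd.DataFrame({'budget': toy_budgets, 'bin': binned}))
```

With the default right-closed bins, a budget of exactly $595 falls in the '< $595' group and $635 falls in '$615 - $635', so the labels are slightly off at the edges. Any value above the last bound (655) is assigned no bin at all and shows up as `NaN`, which would drop that school from the grouped averages.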

### Scores by School Size

In [None]:
school_size_bounds = [0, 1500, 3000, 5500]
school_size_labels = ['Small (<1500)', 'Medium (1500-3000)', 'Large (3000+)']
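per_school_size_df = by_school_summary_df.reset_index()
per_school_size_df['School Size (Students)'] = (
    pd.cut(per_school_size_df['Total Students'], school_size_bounds, labels = school_size_labels))
# numeric_only skips the text columns (school name and type), which cannot be averaged
grp_per_school_size_df = (
    per_school_size_df.groupby('School Size (Students)', observed = False).mean(numeric_only = True))
grp_per_school_size_df = (
    grp_per_school_size_df.drop(columns = ['Total Students', 'Total School Budget', 'Per Student Budget']))
# Only the pass-rate columns are percentages; average scores are plain points
grp_per_school_size_fin = grp_per_school_size_df.style.format(
    {'Average Math Score': '{:,.1f}',
     'Average Reading Score': '{:,.1f}',
     '% Passing Math': '{:,.1f}%',
     '% Passing Reading': '{:,.1f}%',
     'Overall Pass Rate': '{:,.1f}%'})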
grp_per_school_size_fin

### Scores by School Type

In [None]:
district_df = pd.DataFrame(by_school_summary_df[by_school_summary_df['School Type'] == 'District'])

district_df.reset_index(inplace = True)
district_df_clean = (
    district_df.drop(columns = ['School Name', 'Total Students', 'Total School Budget', 'Per Student Budget']))
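# numeric_only skips the remaining text column ('School Type'). The result gets its own
# name so it does not overwrite the styled district summary table defined earlier.
district_type_summary = pd.DataFrame(district_df_clean.mean(numeric_only = True)).transpose()
district_type_summary['School Type'] = 'District'

charter_df = pd.DataFrame(by_school_summary_df[by_school_summary_df['School Type'] == 'Charter'])

charter_df.reset_index(inplace = True)
charter_df_clean = (
    charter_df.drop(columns = ['School Name', 'Total Students', 'Total School Budget', 'Per Student Budget']))
charter_type_summary = pd.DataFrame(charter_df_clean.mean(numeric_only = True)).transpose()
charter_type_summary['School Type'] = 'Charter'

school_type_summary_df = charter_type_summary.merge(district_type_summary, how = 'outer')
school_type_summary_df = school_type_summary_df.set_index('School Type')
# Only the pass-rate columns are percentages; average scores are plain points
school_type_summary_fin = school_type_summary_df.style.format(
    {'Average Math Score': '{:,.1f}',
     'Average Reading Score': '{:,.1f}',
     '% Passing Math': '{:,.1f}%',
     '% Passing Reading': '{:,.1f}%',
     'Overall Pass Rate': '{:,.1f}%'})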
school_type_summary_fin