# Academy of Py Analysis
**Description:** This script reads a student information file and a school information file, then summarizes
student head count, school budgets, and student performance.

**Key Observations:**

    1. The highest overall passing rates belong to charter schools with fewer than 2,000 students
    2. Average reading scores are higher than average math scores, regardless of school type or size
    3. Higher spending per student does not correspond to higher overall passing rates; the highest budget tiers
       have the lowest rates
    
**Assumptions:** 

    1. Overall Passing Rates are calculated as the average of the math and reading passing rates; this does not
       indicate whether students are passing BOTH reading and math
    2. Bin sizes have been set to accommodate the test data sets and may require adjustment; please review and
       update if necessary
    3. The passing threshold has been set at 65; please update the passthreshold variable if necessary
       
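
The difference between the two passing-rate definitions in assumption 1 can be seen on a tiny example. The sketch below uses four hypothetical students (the scores are made up and do not come from the input files) and the same threshold of 65.

```python
import pandas as pd

# Hypothetical scores for four students (not taken from the real files).
scores = pd.DataFrame({
    "math_score": [70, 60, 80, 50],
    "reading_score": [90, 70, 60, 40],
})
passthreshold = 65

pass_math = scores["math_score"] >= passthreshold
pass_read = scores["reading_score"] >= passthreshold

# The mean of a boolean Series is the fraction of True values.
pct_math = pass_math.mean() * 100
pct_read = pass_read.mean() * 100

# Definition used in the summary tables: average of the two subject rates.
avg_of_rates = (pct_math + pct_read) / 2

# Stricter definition: share of students who pass both subjects.
pct_both = (pass_math & pass_read).mean() * 100

print(avg_of_rates, pct_both)  # 50.0 25.0
```

Here half of the students pass each subject, yet only one in four passes both, so the averaged rate can overstate how many students clear both subjects.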
       


In [95]:
#Import Dependencies
import pandas as pd
import numpy as np
import os
from collections import OrderedDict
#Minimum score counted as passing; change it here if the threshold changes
passthreshold=65

In [96]:
#Filepath/Read Files/Assign to DataFrame
schoolfilepath="schools_complete.csv"
studentfilepath="students_complete.csv"
school_df=pd.read_csv(schoolfilepath)
student_df=pd.read_csv(studentfilepath)
school_df=school_df.rename(columns={"name":"School"})
student_df=student_df.rename(columns={"school":"School"})
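
Renaming both key columns to `School` is what allows the two tables to be merged later. As a hedged sketch, the join can also be checked for duplicate school names or unmatched students. The miniature tables below are hypothetical, and the left join with `validate` and `indicator` is a variation on the plain `pd.merge` call used in this notebook, not the notebook's own method.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical miniature versions of the two tables after renaming.
schools = pd.DataFrame({"School": ["A High", "B High"],
                        "type": ["District", "Charter"]})
students = pd.DataFrame({"Student ID": [0, 1, 2],
                         "School": ["A High", "A High", "B High"]})

# validate="many_to_one" raises MergeError if a school name repeats in `schools`.
merged = pd.merge(students, schools, on="School", how="left",
                  validate="many_to_one", indicator=True)

# Students whose school is missing from the school table are marked "left_only".
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))  # 3 0

# A school table with a repeated name is rejected instead of duplicating students.
duplicated_schools = pd.concat([schools, schools.iloc[[0]]], ignore_index=True)
try:
    pd.merge(students, duplicated_schools, on="School", validate="many_to_one")
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
print(duplicate_detected)  # True
```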

### District Summary


In [97]:
#Filter School Table by District
d_schoolfilter=school_df.loc[(school_df["type"] =="District")]
studentschoolinner=pd.merge(school_df,student_df,on="School")
d_students=studentschoolinner.loc[studentschoolinner["type"]=="District",:]
#%Passing Math
d_studentspassmath=d_students.loc[d_students["math_score"]>=passthreshold]
d_studentspassmathcount=d_studentspassmath["Student ID"].nunique()
d_studentspassmathnum=(d_studentspassmathcount/d_students["Student ID"].nunique())*100
#%Passing Reading
d_studentspassreading=d_students.loc[d_students["reading_score"]>=passthreshold]
d_studentspassreadcount=d_studentspassreading["Student ID"].nunique()
d_studentspassreadnum=(d_studentspassreadcount/d_students["Student ID"].nunique())*100
#Overall Passing Rate (average of math and reading)
avgpassrate=(d_studentspassreadnum + d_studentspassmathnum)/2

In [98]:
#Overall Passing Rate (students passing BOTH math and reading) - This is an additional calculation
pass_readmath=d_students.loc[(d_students["reading_score"]>=passthreshold)&
                                  (d_students["math_score"]>=passthreshold)]
pass_readmathcount=pass_readmath["Student ID"].nunique()
avgpassreadmath=(pass_readmathcount/d_students["Student ID"].nunique())*100
avgpassreadmath

73.52461447212337

In [99]:
#Print District Summary Table
d_summarydata=pd.DataFrame(OrderedDict({
    "Total Schools":[d_schoolfilter["School"].nunique()],"Total Students":[d_students["Student ID"].nunique()],
    "Total Budget":[d_schoolfilter["budget"].sum()],"Average Math Score":[d_students["math_score"].mean()],
     "Average Reading Score":[d_students["reading_score"].mean()],"% Passing Math":[d_studentspassmathnum],
     "% Passing Reading":[d_studentspassreadnum],"Overall Passing Rate":[avgpassrate]
     }))
d_summarydata["Total Students"]=d_summarydata["Total Students"].map("{:,.0f}".format)
d_summarydata["Total Budget"]=d_summarydata["Total Budget"].map("${:,.0f}".format)
d_summarydata["Average Math Score"]=d_summarydata["Average Math Score"].map("{:.2f}".format)
d_summarydata["Average Reading Score"]=d_summarydata["Average Reading Score"].map("{:.2f}".format)
d_summarydata["% Passing Math"]=d_summarydata["% Passing Math"].map("{:.2f}%".format)
d_summarydata["% Passing Reading"]=d_summarydata["% Passing Reading"].map("{:.2f}%".format)
d_summarydata["Overall Passing Rate"]=d_summarydata["Overall Passing Rate"].map("{:.2f}%".format)
d_summarydata

Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Passing Rate
7,26976,"$17,347,923",76.99,80.96,77.82%,94.48%,86.15%


**Note:** The table above reports an overall passing rate of 86%. That figure is the average of the math and reading
passing rates, so it does not mean that 86% of students pass both subjects. The share of students passing both
reading and math is 74%.

### School Summary

In [100]:
#GroupBy Schools/Count Students
school_groups=studentschoolinner.groupby("School")
budg_student=school_groups["budget"].max()/school_groups["Student ID"].count()
school_avgread=school_groups["reading_score"].mean()
school_avgmath=school_groups["math_score"].mean()

In [101]:
#% Passing Reading, % Passing Math and % Overall Passing by School
st_numpassread=school_groups["reading_score"].apply(lambda x: x[x>=passthreshold].count())
st_perpassread=(st_numpassread/school_groups["Student ID"].count())*100
st_numpassmath=school_groups["math_score"].apply(lambda x: x[x>=passthreshold].count())
st_perpassmath=(st_numpassmath/school_groups["Student ID"].count())*100
st_avgpass=(st_perpassread+st_perpassmath)/2
#Numeric table, used for the sorting and binning steps below
school_group_df=pd.DataFrame(OrderedDict({"School Type": school_groups["type"].max(),
                                          "Total Students":school_groups["Student ID"].count(),
                                          "Total School Budget":school_groups["budget"].max(),
                                          "Per Student Budget":budg_student,"Average Math Score":school_avgmath,
                                          "Average Reading Score":school_avgread,"% Passing Math":st_perpassmath,
                                          "% Passing Reading":st_perpassread,"% Overall Passing":st_avgpass}))
#Formatted copy, for display only
school_group_df_formats=school_group_df.copy()
school_group_df_formats["Total Students"]=school_group_df_formats["Total Students"].map("{:,.0f}".format)
school_group_df_formats["Total School Budget"]=school_group_df_formats["Total School Budget"].map("${:,.0f}".format)
school_group_df_formats["Per Student Budget"]=school_group_df_formats["Per Student Budget"].map("${:,.2f}".format)
school_group_df_formats["Average Math Score"]=school_group_df_formats["Average Math Score"].map("{:.2f}".format)
school_group_df_formats["Average Reading Score"]=school_group_df_formats["Average Reading Score"].map("{:.2f}".format)
school_group_df_formats["% Passing Math"]=school_group_df_formats["% Passing Math"].map("{:.2f}%".format)
school_group_df_formats["% Passing Reading"]=school_group_df_formats["% Passing Reading"].map("{:.2f}%".format)
#Format the existing column in place instead of adding a second, differently named column
school_group_df_formats["% Overall Passing"]=school_group_df_formats["% Overall Passing"].map("{:.2f}%".format)

school_group_df_formats

School,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Bailey High School,District,4976,"$3,124,928",$628.00,77.05,81.03,77.91%,94.55%,86.23%
Cabrera High School,Charter,1858,"$1,081,356",$582.00,83.06,83.98,100.00%,100.00%,100.00%
Figueroa High School,District,2949,"$1,884,411",$639.00,76.71,81.16,77.18%,94.54%,85.86%
Ford High School,District,2739,"$1,763,916",$644.00,77.1,80.75,78.20%,93.87%,86.04%
Griffin High School,Charter,1468,"$917,500",$625.00,83.35,83.82,100.00%,100.00%,100.00%
Hernandez High School,District,4635,"$3,022,020",$652.00,77.29,80.93,77.73%,94.61%,86.17%
Holden High School,Charter,427,"$248,087",$581.00,83.8,83.81,100.00%,100.00%,100.00%
Huang High School,District,2917,"$1,910,635",$655.00,76.63,81.18,77.72%,94.48%,86.10%
Johnson High School,District,4761,"$3,094,650",$650.00,77.07,80.97,77.97%,94.48%,86.22%
Pena High School,Charter,962,"$585,858",$609.00,83.84,84.04,100.00%,100.00%,100.00%

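The cell above keeps two tables: `school_group_df` with numeric values and `school_group_df_formats` with formatted strings. A short sketch of why the numeric table is the one to sort and bin, using three hypothetical schools: once percentages become strings, sorting compares characters rather than numbers.

```python
import pandas as pd

# Hypothetical pass rates for three schools.
raw = pd.DataFrame({"% Overall Passing": [100.0, 86.23, 9.5]},
                   index=["School X", "School Y", "School Z"])

# Display copy: values become strings such as "86.23%".
display = raw.copy()
display["% Overall Passing"] = display["% Overall Passing"].map("{:.2f}%".format)

numeric_order = raw.sort_values("% Overall Passing").index.tolist()
string_order = display.sort_values("% Overall Passing").index.tolist()

print(numeric_order)  # ['School Z', 'School Y', 'School X']
print(string_order)   # ['School X', 'School Y', 'School Z']
```

The string sort places "100.00%" first because "1" sorts before "8" and "9", which is the reverse of the numeric ranking in this example.
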

### Display Bottom 5 Schools Based on % Overall Passing

In [102]:
school_group_df.sort_values("% Overall Passing").head(5)

School,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Figueroa High School,District,2949,1884411,639.0,76.711767,81.15802,77.178705,94.540522,85.859613
Ford High School,District,2739,1763916,644.0,77.102592,80.746258,78.203724,93.866375,86.035049
Huang High School,District,2917,1910635,655.0,76.629414,81.182722,77.716832,94.480631,86.098732
Hernandez High School,District,4635,3022020,652.0,77.289752,80.934412,77.734628,94.606257,86.170442
Johnson High School,District,4761,3094650,650.0,77.072464,80.966394,77.966814,94.47595,86.221382


### Display Top 5 Schools Based on % Overall Passing

In [103]:
school_group_df.sort_values("% Overall Passing",ascending=False).head(5)

School,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Cabrera High School,Charter,1858,1081356,582.0,83.061895,83.97578,100.0,100.0,100.0
Griffin High School,Charter,1468,917500,625.0,83.351499,83.816757,100.0,100.0,100.0
Holden High School,Charter,427,248087,581.0,83.803279,83.814988,100.0,100.0,100.0
Pena High School,Charter,962,585858,609.0,83.839917,84.044699,100.0,100.0,100.0
Shelton High School,Charter,1761,1056600,600.0,83.359455,83.725724,100.0,100.0,100.0
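
The top and bottom rankings above use `sort_values(...).head(5)`. As an alternative sketch, pandas also offers `nsmallest` and `nlargest`, and their `keep="all"` option makes ties visible, which matters here because every school in the top 5 sits at 100%. The rates below are hypothetical.

```python
import pandas as pd

# Hypothetical per-school pass rates.
rates = pd.DataFrame(
    {"% Overall Passing": [86.2, 100.0, 85.9, 99.1, 90.4, 86.0, 97.3]},
    index=["A", "B", "C", "D", "E", "F", "G"],
)

bottom3 = rates.nsmallest(3, "% Overall Passing")
top3 = rates.nlargest(3, "% Overall Passing")
print(bottom3.index.tolist())  # ['C', 'F', 'A']
print(top3.index.tolist())     # ['B', 'D', 'G']

# With ties at the cutoff, keep="all" returns every tied row instead of cutting at n.
tied = pd.DataFrame({"% Overall Passing": [100.0, 100.0, 100.0, 90.0]},
                    index=["P", "Q", "R", "S"])
top2_with_ties = tied.nlargest(2, "% Overall Passing", keep="all")
print(top2_with_ties.index.tolist())  # ['P', 'Q', 'R']
```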


### Average Math Score by Grade/School

In [104]:
mathavgbygrade=studentschoolinner["math_score"].groupby([studentschoolinner["School"],studentschoolinner["grade"]]).mean().unstack()
mathavgbygrade

School,10th,11th,12th,9th
Bailey High School,76.996772,77.515588,76.492218,77.083676
Cabrera High School,83.154506,82.76556,83.277487,83.094697
Figueroa High School,76.539974,76.884344,77.151369,76.403037
Ford High School,77.672316,76.918058,76.179963,77.361345
Griffin High School,84.229064,83.842105,83.356164,82.04401
Hernandez High School,77.337408,77.136029,77.186567,77.438495
Holden High School,83.429825,85.0,82.855422,83.787402
Huang High School,75.908735,76.446602,77.225641,77.027251
Johnson High School,76.691117,77.491653,76.863248,77.187857
Pena High School,83.372,84.328125,84.121547,83.625455

### Average Reading Score by Grade/School

In [105]:
readavgbygrade=studentschoolinner["reading_score"].groupby([studentschoolinner["School"],studentschoolinner["grade"]]).mean().unstack()
readavgbygrade


School,10th,11th,12th,9th
Bailey High School,80.907183,80.945643,80.912451,81.303155
Cabrera High School,84.253219,83.788382,84.287958,83.676136
Figueroa High School,81.408912,80.640339,81.384863,81.198598
Ford High School,81.262712,80.403642,80.662338,80.632653
Griffin High School,83.706897,84.288089,84.013699,83.369193
Hernandez High School,80.660147,81.39614,80.857143,80.86686
Holden High School,83.324561,83.815534,84.698795,83.677165
Huang High School,81.512386,81.417476,80.305983,81.290284
Johnson High School,80.773431,80.616027,81.227564,81.260714
Pena High School,83.612,84.335938,84.59116,83.807273
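
In both grade tables the columns appear as 10th, 11th, 12th, 9th because the grade labels are sorted as text. A minimal sketch of one way to put them in school order, assuming the four labels shown above are the only grades present; the rows are hypothetical and `pivot_table` is used here in place of the `groupby(...).unstack()` call from the cells above.

```python
import pandas as pd

# Hypothetical rows in the shape of the merged student table.
students = pd.DataFrame({
    "School": ["A High"] * 4 + ["B High"] * 4,
    "grade": ["9th", "10th", "11th", "12th"] * 2,
    "math_score": [70, 80, 90, 60, 75, 85, 95, 65],
})

# Assumed grade labels, listed in the order they should be displayed.
grade_order = ["9th", "10th", "11th", "12th"]

by_grade = (
    students.pivot_table(index="School", columns="grade",
                         values="math_score", aggfunc="mean")
    .reindex(columns=grade_order)  # reorder columns; values are unchanged
)
print(by_grade.columns.tolist())  # ['9th', '10th', '11th', '12th']
```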

### Scores by Per-Student Budget

In [106]:
#Per-student budget bins (edges chosen for the test data; adjust as needed)
budgbins=[0,580,605,630,655]
budgbinlabels=["<580","580-605","605-630","630-655"]
school_group_df["Student Budget Tiers"]=pd.cut(school_group_df["Per Student Budget"],budgbins,labels=budgbinlabels)
#numeric_only=True skips text columns such as School Type; pandas 2.0 and later raise an error without it
budgroup=school_group_df.groupby("Student Budget Tiers").mean(numeric_only=True)
budgroup[["Average Math Score","Average Reading Score","% Passing Math","% Passing Reading","% Overall Passing"]]

Student Budget Tiers,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
<580,83.274201,83.989488,100.0,100.0,100.0
580-605,83.476713,83.867873,100.0,100.0,100.0
605-630,81.413283,82.96514,92.637996,98.18462,95.411308
630-655,77.866721,81.368774,80.963598,95.227627,88.095613
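
The budget tiers depend on fixed bin edges, which is why the assumptions ask for them to be reviewed. The sketch below, with hypothetical budgets, shows two `pd.cut` behaviors worth checking: intervals include their right edge by default, and a value beyond the last edge becomes missing rather than raising an error. The open-ended top bin and its `>630` label are an illustrative alternative, not the notebook's setting.

```python
import numpy as np
import pandas as pd

# Hypothetical per-student budgets; 660 lies above the last fixed edge of 655.
per_student = pd.Series([578.0, 605.0, 640.0, 660.0])

fixed_bins = [0, 580, 605, 630, 655]
fixed_labels = ["<580", "580-605", "605-630", "630-655"]
fixed = pd.cut(per_student, fixed_bins, labels=fixed_labels)
# 605 falls in the (580, 605] interval; 660 falls in no interval and becomes NaN.
print(fixed.tolist())  # ['<580', '580-605', '630-655', nan]

# An open-ended top bin keeps every school in some tier.
open_bins = [0, 580, 605, 630, np.inf]
open_labels = ["<580", "580-605", "605-630", ">630"]
open_ended = pd.cut(per_student, open_bins, labels=open_labels)
print(open_ended.tolist())  # ['<580', '580-605', '>630', '>630']
```

A school dropped into NaN would silently disappear from the grouped averages, so counting missing tiers after binning is a cheap check.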

### Scores by School Size

In [107]:
#School size bins by total students (edges chosen for the test data; adjust as needed)
schsizebins=[0,1000,2000,5000]
schsizebinlabels=["<1000","1000-2000","2000-5000"]
school_group_df["School Size"]=pd.cut(school_group_df["Total Students"],bins=schsizebins,labels=schsizebinlabels)
#numeric_only=True skips text and category columns; pandas 2.0 and later raise an error without it
schsizegroup=school_group_df.groupby("School Size").mean(numeric_only=True)
schsizegroup[["Average Math Score","Average Reading Score","% Passing Math","% Passing Reading","% Overall Passing"]]

School Size,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
<1000,83.821598,83.929843,100.0,100.0,100.0
1000-2000,83.374684,83.864438,100.0,100.0,100.0
2000-5000,77.746417,81.344493,80.582397,95.143406,87.862902
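
The size, budget, and type summaries all average a table that mixes text and numeric columns. One hedged alternative is to select the score columns before calling `mean`, so only the intended columns are aggregated. The table below is hypothetical, and `observed=False` is passed explicitly so that empty size bins would still appear in the result.

```python
import pandas as pd

# Hypothetical school-level table mixing text and numeric columns.
schools = pd.DataFrame({
    "School Type": ["Charter", "Charter", "District", "District"],
    "Total Students": [900, 1500, 3000, 4500],
    "Average Math Score": [84.0, 83.0, 77.0, 76.0],
})

schools["School Size"] = pd.cut(schools["Total Students"],
                                bins=[0, 1000, 2000, 5000],
                                labels=["<1000", "1000-2000", "2000-5000"])

# Select the columns to average first; text columns never reach mean().
score_cols = ["Average Math Score"]
by_size = schools.groupby("School Size", observed=False)[score_cols].mean()
print(by_size)
```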

### Scores by School Type

In [108]:
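#numeric_only=True skips the category columns added above; pandas 2.0 and later raise an error without it
schoolscoretype=school_group_df.groupby("School Type").mean(numeric_only=True)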
schoolscoretype[["Average Math Score","Average Reading Score","% Passing Math","% Passing Reading","% Overall Passing"]]

School Type,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Charter,83.473852,83.896421,100.0,100.0,100.0
District,76.956733,80.966636,77.808454,94.449607,86.12903
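
The observation about spending and passing rates can be checked with a correlation coefficient. This is a rough sketch that uses only the ten rows shown in the School Summary table, with values copied from that table and rounded to two decimals. It does not separate the effect of school type, which lines up closely with both budget and passing rate in this data.

```python
import pandas as pd

# Values copied from the School Summary table (only the rows shown there).
summary = pd.DataFrame({
    "Per Student Budget": [628, 582, 639, 644, 625, 652, 581, 655, 650, 609],
    "% Overall Passing": [86.23, 100.00, 85.86, 86.04, 100.00,
                          86.17, 100.00, 86.10, 86.22, 100.00],
})

# Pearson correlation (the default method of Series.corr).
r = summary["Per Student Budget"].corr(summary["% Overall Passing"])
print(round(r, 2))
```

A negative value means that, among these schools, higher per-student budgets go with lower overall passing rates. It does not show that spending causes lower rates.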
