# PyCity Schools Analysis

* As a whole, schools with higher budgets, did not yield better test results. By contrast, schools with higher spending per student actually (\$645-675) underperformed compared to schools with smaller budgets (<\$585 per student).

* As a whole, smaller and medium sized schools dramatically out-performed large sized schools on passing math performances (89-91% passing vs 67%).

* As a whole, charter schools out-performed the public district schools across all metrics. However, more analysis will be required to glean if the effect is due to school practices or the fact that charter schools tend to serve smaller student populations per school. 
---

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [56]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
school_data_to_load = "../Resources/schools_complete.csv"
student_data_to_load = "../Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset
sdc = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])

SyntaxError: invalid syntax (<ipython-input-56-c0be00b24948>, line 6)

In [2]:
# Identify the columns in dataframe
school_data.columns

Index(['School ID', 'school_name', 'type', 'size', 'budget'], dtype='object')

In [3]:
# Identify the columns in dataframe
sdc.columns

Index(['Student ID', 'student_name', 'gender', 'grade', 'school_name',
       'reading_score', 'math_score', 'School ID', 'type', 'size', 'budget'],
      dtype='object')

In [4]:
# Identify the first five columns in dataframe
sdc.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635


## District Summary

* Calculate the total number of schools

* Calculate the total number of students

* Calculate the total budget

* Calculate the average math score 

* Calculate the average reading score

* Calculate the overall passing rate (overall average score), i.e. (avg. math score + avg. reading score)/2

* Calculate the percentage of students with a passing math score (70 or greater)

* Calculate the percentage of students with a passing reading score (70 or greater)

* Create a dataframe to hold the above results

* Optional: give the displayed data cleaner formatting

In [5]:
# Identify the number of schools
schools_qty = school_data["School ID"].unique()
schools =len(schools_qty)
schools

15

In [6]:
# list the total number of students 
student_total = len(sdc['student_name'])
student_total

39170

In [7]:
# Identify the total budget for all schools
sch_buget = school_data['budget'].sum()
sch_buget

24649428

In [8]:
# Identify the average math scores
avg_math_score = sdc['math_score'].mean()
avg_math_score

78.98537145774827

In [9]:
# Identify the average reading scores
avg_reading_score=sdc['reading_score'].mean()
avg_reading_score

81.87784018381414

In [10]:
# Identify the number of student with passing math scores
math_pass = sdc.loc[sdc['math_score'][:] > 69]

# used to get a count of student with passing math scores
math_pass['math_score'].count()

29370

In [11]:
# Identify the number of student with passing reading scores
reading_pass = sdc.loc[sdc['reading_score'][:] > 69]

In [12]:
# Identify the percentage of student with passing math scores
x = math_pass['math_score'].count()/student_total*100
x

74.980852693387803

In [13]:
# Identify the percentage of student with passing reading scores
y = reading_pass['reading_score'].count()/student_total*100
y

85.805463364820014

In [14]:
# Identify the percentage of overall student with passing scores
overall = (avg_math_score + avg_reading_score)/2
overall

80.43160582078121

In [15]:
# created dataframe with the info above
school_df = {"Total Schools":[schools],"Total Students":[student_total],"Total Budget":[sch_buget],"Average Math Score":[avg_math_score],"Average Reading Score":[avg_reading_score],"% Passing Math":[x],"% Passing Reading":[y],"% Overall Passing Rate":[overall]}
school_info_df = pd.DataFrame(school_df)

# Some cause the dataframe to print in reverse, below printed the headers in the correct order
school_info_df = school_info_df[["Total Schools", "Total Students", "Total Budget", "Average Math Score","Average Reading Score", "% Passing Math", "% Passing Reading", "% Overall Passing Rate"]]
school_info_df

Unnamed: 0,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing Rate
0,15,39170,24649428,78.985371,81.87784,74.980853,85.805463,80.431606


## School Summary

* Create an overview table that summarizes key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)
  
* Create a dataframe to hold the above results

## Top Performing Schools (By Passing Rate)

* Sort and display the top five schools in overall passing rate

In [16]:
# Below each column are the individual columns created 
# separated before combining to make the total summarized output

In [17]:
sdc.columns

Index(['Student ID', 'student_name', 'gender', 'grade', 'school_name',
       'reading_score', 'math_score', 'School ID', 'type', 'size', 'budget'],
      dtype='object')

In [18]:
type_sch=sdc.set_index(["school_name"])["type"]
type_sch =sdc.groupby(["school_name"])["type"]
type_sch

<pandas.core.groupby.SeriesGroupBy object at 0x112ad81d0>

In [19]:
size_sch = sdc["school_name"].value_counts()
size_sch

Bailey High School       4976
Johnson High School      4761
Hernandez High School    4635
Rodriguez High School    3999
Figueroa High School     2949
Huang High School        2917
Ford High School         2739
Wilson High School       2283
Cabrera High School      1858
Wright High School       1800
Shelton High School      1761
Thomas High School       1635
Griffin High School      1468
Pena High School          962
Holden High School        427
Name: school_name, dtype: int64

In [20]:
sch_budget = sdc.groupby(["school_name"])["budget"].mean()
sch_budget

school_name
Bailey High School       3124928
Cabrera High School      1081356
Figueroa High School     1884411
Ford High School         1763916
Griffin High School       917500
Hernandez High School    3022020
Holden High School        248087
Huang High School        1910635
Johnson High School      3094650
Pena High School          585858
Rodriguez High School    2547363
Shelton High School      1056600
Thomas High School       1043130
Wilson High School       1319574
Wright High School       1049400
Name: budget, dtype: int64

In [21]:
# Drop the school id column

school_data.drop(school_data.columns[0], axis=1, inplace=True)

school_data

Unnamed: 0,school_name,type,size,budget
0,Huang High School,District,2917,1910635
1,Figueroa High School,District,2949,1884411
2,Shelton High School,Charter,1761,1056600
3,Hernandez High School,District,4635,3022020
4,Griffin High School,Charter,1468,917500
5,Wilson High School,Charter,2283,1319574
6,Cabrera High School,Charter,1858,1081356
7,Bailey High School,District,4976,3124928
8,Holden High School,Charter,427,248087
9,Pena High School,Charter,962,585858


In [22]:
# created the total budget per student column
Per_Stud_Budget = sch_budget/size_sch
Per_Stud_Budget

Bailey High School       628.0
Cabrera High School      582.0
Figueroa High School     639.0
Ford High School         644.0
Griffin High School      625.0
Hernandez High School    652.0
Holden High School       581.0
Huang High School        655.0
Johnson High School      650.0
Pena High School         609.0
Rodriguez High School    637.0
Shelton High School      600.0
Thomas High School       638.0
Wilson High School       578.0
Wright High School       583.0
dtype: float64

In [23]:
# The individual school average math score
sch_math_avg = sdc.groupby(['school_name'])['math_score'].mean()

sch_math_avg

school_name
Bailey High School       77.048432
Cabrera High School      83.061895
Figueroa High School     76.711767
Ford High School         77.102592
Griffin High School      83.351499
Hernandez High School    77.289752
Holden High School       83.803279
Huang High School        76.629414
Johnson High School      77.072464
Pena High School         83.839917
Rodriguez High School    76.842711
Shelton High School      83.359455
Thomas High School       83.418349
Wilson High School       83.274201
Wright High School       83.682222
Name: math_score, dtype: float64

In [24]:
# The individual school average reading score
sch_reading_avg = sdc.groupby(['school_name'])['reading_score'].mean()

sch_reading_avg

school_name
Bailey High School       81.033963
Cabrera High School      83.975780
Figueroa High School     81.158020
Ford High School         80.746258
Griffin High School      83.816757
Hernandez High School    80.934412
Holden High School       83.814988
Huang High School        81.182722
Johnson High School      80.966394
Pena High School         84.044699
Rodriguez High School    80.744686
Shelton High School      83.725724
Thomas High School       83.848930
Wilson High School       83.989488
Wright High School       83.955000
Name: reading_score, dtype: float64

In [25]:
# The individual school average overall score
stu_overall = (sch_math_avg + sch_reading_avg)/2
stu_overall

school_name
Bailey High School       79.041198
Cabrera High School      83.518837
Figueroa High School     78.934893
Ford High School         78.924425
Griffin High School      83.584128
Hernandez High School    79.112082
Holden High School       83.809133
Huang High School        78.906068
Johnson High School      79.019429
Pena High School         83.942308
Rodriguez High School    78.793698
Shelton High School      83.542589
Thomas High School       83.633639
Wilson High School       83.631844
Wright High School       83.818611
dtype: float64

In [26]:
# The individual school passing reading score
stu_read_pas = sdc.loc[sdc['reading_score'] >= 70]
#stu_read_pas

In [27]:
total_stud = sdc['school_name'].value_counts()
total_stud

Bailey High School       4976
Johnson High School      4761
Hernandez High School    4635
Rodriguez High School    3999
Figueroa High School     2949
Huang High School        2917
Ford High School         2739
Wilson High School       2283
Cabrera High School      1858
Wright High School       1800
Shelton High School      1761
Thomas High School       1635
Griffin High School      1468
Pena High School          962
Holden High School        427
Name: school_name, dtype: int64

In [28]:
stu_grp_reading = stu_read_pas['school_name'].value_counts()
stu_grp_reading

stu_per_reading =stu_grp_reading/total_stud*100
stu_per_reading

Bailey High School       81.933280
Cabrera High School      97.039828
Figueroa High School     80.739234
Ford High School         79.299014
Griffin High School      97.138965
Hernandez High School    80.862999
Holden High School       96.252927
Huang High School        81.316421
Johnson High School      81.222432
Pena High School         95.945946
Rodriguez High School    80.220055
Shelton High School      95.854628
Thomas High School       97.308869
Wilson High School       96.539641
Wright High School       96.611111
Name: school_name, dtype: float64

In [29]:
# The individual school passing math score
stu_math_pas = sdc.loc[(sdc['math_score'] >= 70)]
stu_grp_math = stu_math_pas['school_name'].value_counts()
stu_per_math =stu_grp_math/total_stud*100

stu_per_math



Bailey High School       66.680064
Cabrera High School      94.133477
Figueroa High School     65.988471
Ford High School         68.309602
Griffin High School      93.392371
Hernandez High School    66.752967
Holden High School       92.505855
Huang High School        65.683922
Johnson High School      66.057551
Pena High School         94.594595
Rodriguez High School    66.366592
Shelton High School      93.867121
Thomas High School       93.272171
Wilson High School       93.867718
Wright High School       93.333333
Name: school_name, dtype: float64

In [30]:
# The individual school average overall score
stu_per_overall = (stu_per_math + stu_per_reading)/2
stu_per_overall

Bailey High School       74.306672
Cabrera High School      95.586652
Figueroa High School     73.363852
Ford High School         73.804308
Griffin High School      95.265668
Hernandez High School    73.807983
Holden High School       94.379391
Huang High School        73.500171
Johnson High School      73.639992
Pena High School         95.270270
Rodriguez High School    73.293323
Shelton High School      94.860875
Thomas High School       95.290520
Wilson High School       95.203679
Wright High School       94.972222
Name: school_name, dtype: float64

In [31]:
# An attempt at printing out the results from the indivdual columns above for the schools
school_per_df = pd.DataFrame({"School Type": type_sch,"Total Students":size_sch,"Total School Budget":sch_budget,
                              "Per Student Budget":Per_Stud_Budget,"Average Math Score":sch_math_avg,"Average Reading Score":sch_reading_avg,
                              "% Passing Math":stu_per_math,"% Passing Reading":stu_per_reading,"% Overall Passing Rate":stu_per_overall})

school_per_df

Unnamed: 0,% Overall Passing Rate,% Passing Math,% Passing Reading,Average Math Score,Average Reading Score,Per Student Budget,School Type,Total School Budget,Total Students
Bailey High School,74.306672,66.680064,81.93328,77.048432,81.033963,628.0,"(Bailey High School, [District, District, Dist...",3124928,4976
Cabrera High School,95.586652,94.133477,97.039828,83.061895,83.97578,582.0,"(Cabrera High School, [Charter, Charter, Chart...",1081356,1858
Figueroa High School,73.363852,65.988471,80.739234,76.711767,81.15802,639.0,"(Figueroa High School, [District, District, Di...",1884411,2949
Ford High School,73.804308,68.309602,79.299014,77.102592,80.746258,644.0,"(Ford High School, [District, District, Distri...",1763916,2739
Griffin High School,95.265668,93.392371,97.138965,83.351499,83.816757,625.0,"(Griffin High School, [Charter, Charter, Chart...",917500,1468
Hernandez High School,73.807983,66.752967,80.862999,77.289752,80.934412,652.0,"(Hernandez High School, [District, District, D...",3022020,4635
Holden High School,94.379391,92.505855,96.252927,83.803279,83.814988,581.0,"(Holden High School, [Charter, Charter, Charte...",248087,427
Huang High School,73.500171,65.683922,81.316421,76.629414,81.182722,655.0,"(Huang High School, [District, District, Distr...",1910635,2917
Johnson High School,73.639992,66.057551,81.222432,77.072464,80.966394,650.0,"(Johnson High School, [District, District, Dis...",3094650,4761
Pena High School,95.27027,94.594595,95.945946,83.839917,84.044699,609.0,"(Pena High School, [Charter, Charter, Charter,...",585858,962


In [32]:
school_per_df.sort_values(["% Passing Reading","% Passing Math"],ascending=False).head()
#school_per_df = school_per_df.reset_index()

Unnamed: 0,% Overall Passing Rate,% Passing Math,% Passing Reading,Average Math Score,Average Reading Score,Per Student Budget,School Type,Total School Budget,Total Students
Thomas High School,95.29052,93.272171,97.308869,83.418349,83.84893,638.0,"(Thomas High School, [Charter, Charter, Charte...",1043130,1635
Griffin High School,95.265668,93.392371,97.138965,83.351499,83.816757,625.0,"(Griffin High School, [Charter, Charter, Chart...",917500,1468
Cabrera High School,95.586652,94.133477,97.039828,83.061895,83.97578,582.0,"(Cabrera High School, [Charter, Charter, Chart...",1081356,1858
Wright High School,94.972222,93.333333,96.611111,83.682222,83.955,583.0,"(Wright High School, [Charter, Charter, Charte...",1049400,1800
Wilson High School,95.203679,93.867718,96.539641,83.274201,83.989488,578.0,"(Wilson High School, [Charter, Charter, Charte...",1319574,2283


## Bottom Performing Schools (By Passing Rate)

* Sort and display the five worst-performing schools

In [33]:
school_per_df.sort_values(["% Passing Reading","% Passing Math"],ascending=False).tail()

Unnamed: 0,% Overall Passing Rate,% Passing Math,% Passing Reading,Average Math Score,Average Reading Score,Per Student Budget,School Type,Total School Budget,Total Students
Johnson High School,73.639992,66.057551,81.222432,77.072464,80.966394,650.0,"(Johnson High School, [District, District, Dis...",3094650,4761
Hernandez High School,73.807983,66.752967,80.862999,77.289752,80.934412,652.0,"(Hernandez High School, [District, District, D...",3022020,4635
Figueroa High School,73.363852,65.988471,80.739234,76.711767,81.15802,639.0,"(Figueroa High School, [District, District, Di...",1884411,2949
Rodriguez High School,73.293323,66.366592,80.220055,76.842711,80.744686,637.0,"(Rodriguez High School, [District, District, D...",2547363,3999
Ford High School,73.804308,68.309602,79.299014,77.102592,80.746258,644.0,"(Ford High School, [District, District, Distri...",1763916,2739


## Math Scores by Grade

* Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

  * Create a pandas series for each grade. Hint: use a conditional statement.
  
  * Group each series by school
  
  * Combine the series into a dataframe
  
  * Optional: give the displayed data cleaner formatting

In [34]:
sdc.columns

Index(['Student ID', 'student_name', 'gender', 'grade', 'school_name',
       'reading_score', 'math_score', 'School ID', 'type', 'size', 'budget'],
      dtype='object')

In [35]:
df_9 = sdc[sdc['grade']=='9th'].groupby(['school_name'])['math_score'].mean()
df_10 = sdc[sdc['grade']=='10th'].groupby(['school_name'])['math_score'].mean()
df_11 = sdc[sdc['grade']=='11th'].groupby(['school_name'])['math_score'].mean()
df_12 = sdc[sdc['grade']=='9th'].groupby(['school_name'])['math_score'].mean()
df_9,df_10

(school_name
 Bailey High School       77.083676
 Cabrera High School      83.094697
 Figueroa High School     76.403037
 Ford High School         77.361345
 Griffin High School      82.044010
 Hernandez High School    77.438495
 Holden High School       83.787402
 Huang High School        77.027251
 Johnson High School      77.187857
 Pena High School         83.625455
 Rodriguez High School    76.859966
 Shelton High School      83.420755
 Thomas High School       83.590022
 Wilson High School       83.085578
 Wright High School       83.264706
 Name: math_score, dtype: float64, school_name
 Bailey High School       76.996772
 Cabrera High School      83.154506
 Figueroa High School     76.539974
 Ford High School         77.672316
 Griffin High School      84.229064
 Hernandez High School    77.337408
 Holden High School       83.429825
 Huang High School        75.908735
 Johnson High School      76.691117
 Pena High School         83.372000
 Rodriguez High School    76.612500
 She

In [36]:
school_math_df = pd.DataFrame({"9th":df_9,"10th":df_10,"11th":df_11,"12th":df_12})

school_math_df

Unnamed: 0_level_0,10th,11th,12th,9th
school_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bailey High School,76.996772,77.515588,77.083676,77.083676
Cabrera High School,83.154506,82.76556,83.094697,83.094697
Figueroa High School,76.539974,76.884344,76.403037,76.403037
Ford High School,77.672316,76.918058,77.361345,77.361345
Griffin High School,84.229064,83.842105,82.04401,82.04401
Hernandez High School,77.337408,77.136029,77.438495,77.438495
Holden High School,83.429825,85.0,83.787402,83.787402
Huang High School,75.908735,76.446602,77.027251,77.027251
Johnson High School,76.691117,77.491653,77.187857,77.187857
Pena High School,83.372,84.328125,83.625455,83.625455


## Reading Score by Grade 

* Perform the same operations as above for reading scores

In [37]:
df_9 = sdc[sdc['grade']=='9th'].groupby(['school_name'])['reading_score'].mean()
df_10 = sdc[sdc['grade']=='10th'].groupby(['school_name'])['reading_score'].mean()
df_11 = sdc[sdc['grade']=='11th'].groupby(['school_name'])['reading_score'].mean()
df_12 = sdc[sdc['grade']=='9th'].groupby(['school_name'])['reading_score'].mean()
df_9,df_10

school_reading_df = pd.DataFrame({"9th":df_9,"10th":df_10,"11th":df_11,"12th":df_12})

school_reading_df

Unnamed: 0_level_0,10th,11th,12th,9th
school_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bailey High School,80.907183,80.945643,81.303155,81.303155
Cabrera High School,84.253219,83.788382,83.676136,83.676136
Figueroa High School,81.408912,80.640339,81.198598,81.198598
Ford High School,81.262712,80.403642,80.632653,80.632653
Griffin High School,83.706897,84.288089,83.369193,83.369193
Hernandez High School,80.660147,81.39614,80.86686,80.86686
Holden High School,83.324561,83.815534,83.677165,83.677165
Huang High School,81.512386,81.417476,81.290284,81.290284
Johnson High School,80.773431,80.616027,81.260714,81.260714
Pena High School,83.612,84.335938,83.807273,83.807273


## Scores by School Spending

* Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)

In [38]:
# Sample bins. Feel free to create your own bins.
spending_bins = [0, 585, 615, 645, 675]
group_names = ["<$585", "$585-615", "$615-645", "$645-675"]

school_per_df.head()

Unnamed: 0,% Overall Passing Rate,% Passing Math,% Passing Reading,Average Math Score,Average Reading Score,Per Student Budget,School Type,Total School Budget,Total Students
Bailey High School,74.306672,66.680064,81.93328,77.048432,81.033963,628.0,"(Bailey High School, [District, District, Dist...",3124928,4976
Cabrera High School,95.586652,94.133477,97.039828,83.061895,83.97578,582.0,"(Cabrera High School, [Charter, Charter, Chart...",1081356,1858
Figueroa High School,73.363852,65.988471,80.739234,76.711767,81.15802,639.0,"(Figueroa High School, [District, District, Di...",1884411,2949
Ford High School,73.804308,68.309602,79.299014,77.102592,80.746258,644.0,"(Ford High School, [District, District, Distri...",1763916,2739
Griffin High School,95.265668,93.392371,97.138965,83.351499,83.816757,625.0,"(Griffin High School, [Charter, Charter, Chart...",917500,1468


In [39]:
school_per_df = school_per_df.reset_index(drop=True)

In [40]:
print(type(school_per_df))

<class 'pandas.core.frame.DataFrame'>


In [260]:
#scores_by_sch["School Size"]=pd.cut(school_summery_df["Total Students"],size_bins , labels=group_names)

In [42]:
school_per_df["School Spending per Student"]=pd.cut(school_per_df["Total School Budget"]/school_per_df["Total Students"],spending_bins , labels=group_names)

In [43]:
grp_sch_spending = school_per_df.groupby(school_per_df["School Spending per Student"])
grp_sch_spending.mean()

Unnamed: 0_level_0,% Overall Passing Rate,% Passing Math,% Passing Reading,Average Math Score,Average Reading Score,Per Student Budget,Total School Budget,Total Students
School Spending per Student,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
$585-615,95.065572,94.230858,95.900287,83.599686,83.885211,604.5,821229.0,1361.5
$615-645,80.887391,75.668212,86.106569,79.079225,81.891436,635.166667,1880208.0,2961.0
$645-675,73.649382,66.164813,81.133951,76.99721,81.027843,652.333333,2675768.0,4104.333333
<$585,95.035486,93.460096,96.610877,83.455399,83.933814,581.0,924604.2,1592.0


## Scores by School Size

* Perform the same operations as above, based on school size.

In [175]:
#school_per_df = school_per_df.reset_index(drop=True)

In [44]:
# Sample bins. Feel free to create your own bins.
size_bins = [0, 1000, 2000, 5000]
group_names = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

In [46]:
school_per_df["School Size"]=pd.cut(school_per_df["Total Students"],size_bins , labels=group_names)

In [49]:
grp_sch_size = school_per_df.groupby(school_per_df["School Size"])
grp_sch_size.mean()

Unnamed: 0_level_0,% Overall Passing Rate,% Passing Math,% Passing Reading,Average Math Score,Average Reading Score,Per Student Budget,Total School Budget,Total Students
School Size,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Large (2000-5000),76.364998,69.963361,82.766634,77.746417,81.344493,635.375,2333437.125,3657.375
Medium (1000-2000),95.195187,93.599695,96.79068,83.374684,83.864438,605.6,1029597.2,1704.4
Small (<1000),94.824831,93.550225,96.099437,83.821598,83.929843,595.0,416972.5,694.5


## Scores by School Type

* Perform the same operations as above, based on school type.

In [188]:
#school_per_df = school_per_df.reset_index(drop=True)

In [55]:
# Sample bins. Feel free to create your own bins.
#size_bins = [1, 2]
#group_names = ["District", "Charter"]
#school_per_df["School Types"]=pd.cut(school_per_df["School Type"],size_bins , labels=group_names)

In [53]:
grp_sch_type = school_per_df.groupby(["School Type"])
grp_sch_type.mean()

TypeError: 'Series' objects are mutable, thus they cannot be hashed

In [None]:
# A key observation:
# The average reading scores are not much diffent between the top performing schools and the bottom 
# performing schools. I can see that there is a greater difference between the average math scores.
# A larger student body my identify why there is a difference in passing percentages for both
# reading and math.

# A key observation:
# There is little difference in money spent per student between the top performing school and the
# bottom schools. It is however worth noting that the top schools are charter and bottom are 
# district. I would not say that money is key but a closer look should be given at what 
# methods are successful.