# PyCity Schools Analysis

* Overall, schools with higher budgets did not yield better test results. On the contrary, schools with higher spending per student (\$645-675) underperformed schools with lower spending (<\$585 per student).

* Overall, small and medium-sized schools dramatically outperformed large schools on math passing rates (89-91% passing vs. 67%).

* Overall, charter schools outperformed district schools across all metrics. However, more analysis is required to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset
school_data_complete = pd.merge(student_data, school_data, how="left", on="school_name")

school_data_complete = school_data_complete.rename(columns={"school_name":"School Name"})

school_data_complete.head(20)


Unnamed: 0,Student ID,student_name,gender,grade,School Name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
5,5,Bryan Miranda,M,9th,Huang High School,94,94,0,District,2917,1910635
6,6,Sheena Carter,F,11th,Huang High School,82,80,0,District,2917,1910635
7,7,Nicole Baker,F,12th,Huang High School,96,69,0,District,2917,1910635
8,8,Michael Roth,M,10th,Huang High School,95,87,0,District,2917,1910635
9,9,Matthew Greene,M,10th,Huang High School,96,84,0,District,2917,1910635
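
The merge above assumes that every school name appears exactly once in the school file and that every student matches a school. A minimal sketch of how that assumption could be checked, using small made-up frames (the rows below are illustrative stand-ins, not the real CSV contents):

```python
import pandas as pd

# Toy stand-ins for the two CSV files (illustrative values only)
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Pena High School"],
    "type": ["District", "Charter"],
    "budget": [1910635, 585858],
})
students = pd.DataFrame({
    "student_name": ["Student A", "Student B", "Student C"],
    "school_name": ["Huang High School", "Huang High School", "Pena High School"],
    "math_score": [79, 61, 84],
})

# validate="many_to_one" makes pandas raise an error if a school name
# is duplicated in the right-hand (school) table
merged = pd.merge(students, schools, how="left", on="school_name",
                  validate="many_to_one")

# A left merge keeps every student; a missing budget would mean
# the student's school was not found in the school table
unmatched = int(merged["budget"].isna().sum())
print(len(merged), "rows,", unmatched, "unmatched")
```

With a left merge the row count should equal the number of students, so comparing `len(merged)` with `len(students)` is a cheap extra sanity check.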


# District Summary

 * Create a high level snapshot (in table form) of the district's key metrics, including:
 * Total Schools
 * Total Students
 * Total Budget
 * Average Math Score
 * Average Reading Score
 * % Passing Math 
    * use conditionals to keep only scores greater than or equal to the passing limit (60? 70?) and divide by the total number of students
 * % Passing Reading
 * Overall Passing Rate (Average of the above two)
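
The passing-rate step in the list above can be sketched with a boolean mask. This is only an illustration: the five score pairs are the first rows of the merged table shown earlier, and the threshold of 70 is an assumption (the instructions leave the limit open).

```python
import pandas as pd

PASSING_SCORE = 70  # assumed threshold; the instructions suggest 60 or 70

# First five students from the merged table shown earlier
scores = pd.DataFrame({
    "reading_score": [66, 94, 90, 67, 97],
    "math_score": [79, 61, 60, 58, 84],
})

# A comparison yields True/False per student; the mean of a boolean
# Series is the fraction of True values
pct_math = 100 * (scores["math_score"] >= PASSING_SCORE).mean()
pct_reading = 100 * (scores["reading_score"] >= PASSING_SCORE).mean()

# Overall rate defined as the average of the two percentages
overall = (pct_math + pct_reading) / 2
print(pct_math, pct_reading, overall)
```

Taking the mean of the mask avoids building a filtered copy of the table just to count its rows.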

In [2]:
school_data_complete.columns

Index(['Student ID', 'student_name', 'gender', 'grade', 'School Name',
       'reading_score', 'math_score', 'School ID', 'type', 'size', 'budget'],
      dtype='object')

In [3]:
# define percent calculator function. Takes a count of a population and the total population and returns the percent of the total 

def percent_calc(number,total):
    percent = 100*(number/total)
    return percent

In [4]:
total_schools = school_data_complete.loc[:,"School Name"].nunique()

In [5]:
total_students = school_data_complete.loc[:,"student_name"].count()

In [6]:
# can't just sum the budget column of the merged table because that adds each school's budget once per student; the result would be far too high!
# summing the unique budget values would also drop a school whenever two schools share the same budget,
# so sum the budget column of the original one-row-per-school table instead

total_budget = school_data["budget"].sum()
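
To make the budget pitfall concrete, here is a small sketch with hypothetical schools ("North", "South", "East" and their budgets are made up). It compares three ways of totalling a budget column that is repeated on every student row:

```python
import pandas as pd

# Hypothetical merged table: one row per student, budget repeated per row.
# North and South deliberately share the same budget.
merged = pd.DataFrame({
    "School Name": ["North", "North", "South", "East"],
    "budget": [1000, 1000, 1000, 2500],
})

# Counts North twice because it has two students
naive_total = merged["budget"].sum()

# Collapses North and South into one value because their budgets are equal
unique_total = merged["budget"].unique().sum()

# Keeps exactly one row per school before summing
per_school_total = merged.drop_duplicates("School Name")["budget"].sum()

print(naive_total, unique_total, per_school_total)
```

Only the last total matches the true sum of the three school budgets in this toy example.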

In [7]:
# calculate the average math scores and passing percentages
average_math = school_data_complete.loc[:,"math_score"].mean()

average_reading = school_data_complete.loc[:,"reading_score"].mean()

total_math_pass = pd.DataFrame(school_data_complete.loc[school_data_complete["math_score"] >= 70, :])

percent_math_pass = percent_calc(len(total_math_pass), total_students)

In [8]:
# calculate the same for reading scores
total_reading_pass = pd.DataFrame(school_data_complete.loc[school_data_complete["reading_score"] >= 70, :])

percent_reading_pass = percent_calc(len(total_reading_pass), total_students)

overall_passing_rate = (percent_math_pass + percent_reading_pass)/2

In [9]:
# define dataframe to summarize these values

data_summary = {"Total Schools":[total_schools],
                "Total Budget":[total_budget],
                "Average Math Score":[average_math],
                "Average Reading Score":[average_reading],
                "% Passing Math":[percent_math_pass],
                "% Passing Reading":[percent_reading_pass],
                "Overall Passing Rate":[overall_passing_rate]}

district_summary_df = pd.DataFrame(data_summary) 

district_summary_df

Unnamed: 0,Total Schools,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Passing Rate
0,15,24649428,78.985371,81.87784,74.980853,85.805463,80.393158


# School summary

##### Create an overview table that summarizes key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)
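
As an alternative to assembling each metric separately, the whole per-school table can be sketched with a single named aggregation. The school names, scores, and the helper `pct_passing` below are hypothetical, and the threshold of 70 is an assumption:

```python
import pandas as pd

# Hypothetical student-level data for two schools
df = pd.DataFrame({
    "School Name": ["A High", "A High", "A High", "B High", "B High"],
    "type": ["District", "District", "District", "Charter", "Charter"],
    "budget": [3000, 3000, 3000, 1000, 1000],
    "math_score": [80, 65, 90, 72, 68],
    "reading_score": [75, 85, 60, 95, 71],
})

def pct_passing(scores, threshold=70):
    """Percent of scores at or above the (assumed) passing threshold."""
    return 100 * (scores >= threshold).mean()

# One pass over the groups; each keyword names an output column
summary = df.groupby("School Name").agg(
    school_type=("type", "first"),
    total_students=("math_score", "size"),
    budget=("budget", "first"),
    avg_math=("math_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_math=("math_score", pct_passing),
    pct_reading=("reading_score", pct_passing),
)
summary["per_student_budget"] = summary["budget"] / summary["total_students"]
summary["overall_pass"] = (summary["pct_math"] + summary["pct_reading"]) / 2
print(summary)
```

Because every column comes from the same `groupby`, the rows are aligned by construction and no concatenation step is needed.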



In [10]:
# group by school and perform a similar analysis as above
school_groups = school_data_complete.groupby(["School Name"])

In [11]:
# capture the school names, student count, school type, school budget, and budget per student

school_student_count = school_groups["student_name"].count()
school_student_count = school_student_count.rename("Student Count")

# school type should be the same for each school, just aggregate them by the first value
school_type = school_groups["type"].first()
school_type = school_type.rename("School Type")

# All the values should be the same for each school, so just aggregate them by the first value
school_budget =  school_groups["budget"].first()
school_budget = school_budget.rename("School Budget")


per_student_budget = school_budget/school_student_count
per_student_budget = per_student_budget.rename("Per Student Budget")

school_budget
per_student_budget

School Name
Bailey High School       628.0
Cabrera High School      582.0
Figueroa High School     639.0
Ford High School         644.0
Griffin High School      625.0
Hernandez High School    652.0
Holden High School       581.0
Huang High School        655.0
Johnson High School      650.0
Pena High School         609.0
Rodriguez High School    637.0
Shelton High School      600.0
Thomas High School       638.0
Wilson High School       578.0
Wright High School       583.0
Name: Per Student Budget, dtype: float64

In [12]:
# define new dataframes for the total math scores and the passing math score (from part 1) that have been grouped by school.

school_math_score = pd.DataFrame(school_groups["math_score"].mean())
school_math_score = school_math_score.rename(columns={"math_score":"Math Score"})

school_math_pass = pd.DataFrame(total_math_pass.groupby(["School Name"]).count())

school_math_percent = 100*(school_math_pass["math_score"]/school_student_count)
school_math_percent = school_math_percent.rename("Math Pass Rate")

school_math_percent

School Name
Bailey High School       66.680064
Cabrera High School      94.133477
Figueroa High School     65.988471
Ford High School         68.309602
Griffin High School      93.392371
Hernandez High School    66.752967
Holden High School       92.505855
Huang High School        65.683922
Johnson High School      66.057551
Pena High School         94.594595
Rodriguez High School    66.366592
Shelton High School      93.867121
Thomas High School       93.272171
Wilson High School       93.867718
Wright High School       93.333333
Name: Math Pass Rate, dtype: float64

In [13]:
# do the same thing as the previous cell for reading scores

school_reading_score = pd.DataFrame(school_groups["reading_score"].mean())
school_reading_score = school_reading_score.rename(columns={"reading_score":"Reading Score"})

school_reading_pass = pd.DataFrame(total_reading_pass.groupby(["School Name"]).count())

school_reading_percent = 100*(school_reading_pass["reading_score"]/school_student_count)
school_reading_percent = school_reading_percent.rename("Reading Pass Rate")


In [14]:
# average the math and reading pass rates to get the overall passing rate per school

school_overall_passing = (school_math_percent+school_reading_percent)/2
school_overall_passing = school_overall_passing.rename("Overall Pass Rate")


In [15]:
summary_list = [school_type, school_student_count, school_budget, per_student_budget, 
                school_math_score, school_math_percent, school_reading_score, school_reading_percent, school_overall_passing]

# for item in summary_list:
    #print(type(item))

In [16]:
# define dataframe to summarize school values

# I have a mix of series and dataframes for the summary data, but I can concatenate them all at once horizontally (axis = 1)
school_summary = pd.concat(summary_list, axis=1)

school_summary_df = pd.DataFrame(school_summary) 

school_summary_df

Unnamed: 0_level_0,School Type,Student Count,School Budget,Per Student Budget,Math Score,Math Pass Rate,Reading Score,Reading Pass Rate,Overall Pass Rate
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Bailey High School,District,4976,3124928,628.0,77.048432,66.680064,81.033963,81.93328,74.306672
Cabrera High School,Charter,1858,1081356,582.0,83.061895,94.133477,83.97578,97.039828,95.586652
Figueroa High School,District,2949,1884411,639.0,76.711767,65.988471,81.15802,80.739234,73.363852
Ford High School,District,2739,1763916,644.0,77.102592,68.309602,80.746258,79.299014,73.804308
Griffin High School,Charter,1468,917500,625.0,83.351499,93.392371,83.816757,97.138965,95.265668
Hernandez High School,District,4635,3022020,652.0,77.289752,66.752967,80.934412,80.862999,73.807983
Holden High School,Charter,427,248087,581.0,83.803279,92.505855,83.814988,96.252927,94.379391
Huang High School,District,2917,1910635,655.0,76.629414,65.683922,81.182722,81.316421,73.500171
Johnson High School,District,4761,3094650,650.0,77.072464,66.057551,80.966394,81.222432,73.639992
Pena High School,Charter,962,585858,609.0,83.839917,94.594595,84.044699,95.945946,95.27027
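
The spending comparison in the summary at the top can be sketched by binning the per-student budget with `pd.cut`. The bin edges and labels below are assumptions chosen to match the ranges quoted in the summary, and only the ten schools visible in the table above are used, so the averages will not match a full 15-school analysis.

```python
import pandas as pd

# Per-student budget and overall pass rate copied from the table above
schools = pd.DataFrame({
    "School Name": ["Bailey", "Cabrera", "Figueroa", "Ford", "Griffin",
                    "Hernandez", "Holden", "Huang", "Johnson", "Pena"],
    "Per Student Budget": [628.0, 582.0, 639.0, 644.0, 625.0,
                           652.0, 581.0, 655.0, 650.0, 609.0],
    "Overall Pass Rate": [74.306672, 95.586652, 73.363852, 73.804308, 95.265668,
                          73.807983, 94.379391, 73.500171, 73.639992, 95.27027],
})

# Assumed bin edges (right-inclusive by default) and labels
bins = [0, 585, 615, 645, 675]
labels = ["<$585", "$585-615", "$615-645", "$645-675"]
schools["Spending Range"] = pd.cut(schools["Per Student Budget"], bins=bins, labels=labels)

# Number of schools and mean pass rate per spending range
by_spending = schools.groupby("Spending Range", observed=True)["Overall Pass Rate"].agg(["count", "mean"])
print(by_spending)
```

The same pattern would apply to school size by swapping the binned column and the edges.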


### Top Performing Schools (By Passing Rate)

* Create a table that highlights the top 5 performing schools based on Overall Passing Rate. Include:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)

In [17]:
# sort the previous summary table by descending overall passing rate. Display first 5 values.
top_five = school_summary_df.sort_values("Overall Pass Rate", ascending=False)

top_five.head()

Unnamed: 0_level_0,School Type,Student Count,School Budget,Per Student Budget,Math Score,Math Pass Rate,Reading Score,Reading Pass Rate,Overall Pass Rate
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Cabrera High School,Charter,1858,1081356,582.0,83.061895,94.133477,83.97578,97.039828,95.586652
Thomas High School,Charter,1635,1043130,638.0,83.418349,93.272171,83.84893,97.308869,95.29052
Pena High School,Charter,962,585858,609.0,83.839917,94.594595,84.044699,95.945946,95.27027
Griffin High School,Charter,1468,917500,625.0,83.351499,93.392371,83.816757,97.138965,95.265668
Wilson High School,Charter,2283,1319574,578.0,83.274201,93.867718,83.989488,96.539641,95.203679


### Bottom Performing Schools (By Passing Rate)

* Create a table that highlights the bottom 5 performing schools based on Overall Passing Rate. Include all of the same metrics as above.

In [18]:
# create the same table, but with ascending overall passing rate. Display first 5 values.
bottom_five = school_summary_df.sort_values("Overall Pass Rate", ascending=True)

bottom_five.head()

Unnamed: 0_level_0,School Type,Student Count,School Budget,Per Student Budget,Math Score,Math Pass Rate,Reading Score,Reading Pass Rate,Overall Pass Rate
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Rodriguez High School,District,3999,2547363,637.0,76.842711,66.366592,80.744686,80.220055,73.293323
Figueroa High School,District,2949,1884411,639.0,76.711767,65.988471,81.15802,80.739234,73.363852
Huang High School,District,2917,1910635,655.0,76.629414,65.683922,81.182722,81.316421,73.500171
Johnson High School,District,4761,3094650,650.0,77.072464,66.057551,80.966394,81.222432,73.639992
Ford High School,District,2739,1763916,644.0,77.102592,68.309602,80.746258,79.299014,73.804308
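
Sorting the full table and taking the head works; a shorter route is `nlargest` and `nsmallest`, sketched here on six rows copied from the two tables above (only the pass-rate column is kept for brevity):

```python
import pandas as pd

# Overall pass rates copied from the top and bottom tables above
rates = pd.DataFrame(
    {"Overall Pass Rate": [95.586652, 95.29052, 95.27027,
                           73.293323, 73.363852, 73.500171]},
    index=pd.Index(["Cabrera High School", "Thomas High School", "Pena High School",
                    "Rodriguez High School", "Figueroa High School", "Huang High School"],
                   name="School Name"),
)

# Highest and lowest n rows by the named column, already ordered
top_two = rates.nlargest(2, "Overall Pass Rate")
bottom_two = rates.nsmallest(2, "Overall Pass Rate")

print(top_two)
print(bottom_two)
```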


### Math Scores by Grade

* Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

In [233]:
# create a list of all the grades

grade_list = school_data_complete["grade"].unique()

# sort the grades in ascending order. Slightly more complicated than it seems because the grades are strings, so '9th' sorts after '12th'
grade_list = sorted(grade_list, reverse= False)

# sort by length next to deal with the '9th' problem
grade_list = sorted(grade_list, key=len)

grade_list

['9th', '10th', '11th', '12th']
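
Sorting by length happens to work for these four labels. A sketch of a more explicit alternative is to sort on the numeric part of each label, assuming every label ends in a two-letter suffix such as "th"; an ordered categorical then lets pandas sort the labels in that order too.

```python
import pandas as pd

grades = ["9th", "12th", "10th", "11th"]

# Plain string sorting compares character by character, so '9th' lands last
lexicographic = sorted(grades)

# Strip the two-letter suffix and sort on the integer that remains
by_number = sorted(grades, key=lambda g: int(g[:-2]))

# An ordered categorical makes pandas sort the labels in grade order
grade_dtype = pd.CategoricalDtype(categories=by_number, ordered=True)
sorted_series = pd.Series(grades).astype(grade_dtype).sort_values()

print(lexicographic)
print(by_number)
print(sorted_series.tolist())
```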

In [234]:
school_data_complete.index

Int64Index([    0,     1,     2,     3,     4,     5,     6,     7,     8,
                9,
            ...
            39160, 39161, 39162, 39163, 39164, 39165, 39166, 39167, 39168,
            39169],
           dtype='int64', length=39170)

# loop through the grades and take the mean of the math score for that grade at each school
index = 0

grade9_scores = [school_data_complete.iloc[index,3:7] for index in school_data_complete.index if (school_data_complete.iloc[index,3] == grade_list[0])]

grade9_scores = pd.DataFrame(grade9_scores)

# type(school_data_complete.loc[1,"grade"])

In [235]:
# loop through the grades and take the mean of the math score for that grade at each school. I'm creating a list of dataframes for the test scores at each grade.
index = 0

# define array where the data frame for each grade will go. Need to do this, otherwise you'll get an 'Index out of Range' error
grade_scores = [''] * len(grade_list)

for grade in grade_list:
    # for every row in the complete data frame, use list comprehension to check if it's the right grade level (9th thru 12th). If it is, grab the math scores, reading scores, grade, and school name. Store that data in the grade_scores array.
    i = list(grade_list).index(grade)
    grade_scores[i] = [school_data_complete.iloc[index,3:7] for index in school_data_complete.index if (school_data_complete.iloc[index,3] == grade)]
    # convert the grabbed data into a data frame
    grade_scores[i] = pd.DataFrame(grade_scores[i])


In [236]:
# define list to hold the group dataframes. Do this to avoid 'Index out of Range Error'

grouped_grades = [''] * len(grade_list)

for grade in grade_list:
    i = list(grade_list).index(grade)
    # group the data by school and then by grade, and convert it to a DataFrame. Store it in the 'grouped_grades' array.
    grouped_grades[i] = pd.DataFrame(grade_scores[i].groupby(["School Name", "grade"]).mean())
    
grouped_grades[0:(len(grade_list))]


[                             reading_score  math_score
 School Name           grade                           
 Bailey High School    9th        81.303155   77.083676
 Cabrera High School   9th        83.676136   83.094697
 Figueroa High School  9th        81.198598   76.403037
 Ford High School      9th        80.632653   77.361345
 Griffin High School   9th        83.369193   82.044010
 Hernandez High School 9th        80.866860   77.438495
 Holden High School    9th        83.677165   83.787402
 Huang High School     9th        81.290284   77.027251
 Johnson High School   9th        81.260714   77.187857
 Pena High School      9th        83.807273   83.625455
 Rodriguez High School 9th        80.993127   76.859966
 Shelton High School   9th        84.122642   83.420755
 Thomas High School    9th        83.728850   83.590022
 Wilson High School    9th        83.939778   83.085578
 Wright High School    9th        83.833333   83.264706,
                              reading_score  ma

In [237]:
# merge all dataframes together to compare the grade performance at each school!

#grade_comp = pd.concat(grouped_grades[0:len(grade_list)], axis=1, join ='')

# grade_list is ['9th', '10th', '11th', '12th'], so the list indices must line up with the suffixes
grade_comp_910 = pd.merge(grouped_grades[0], grouped_grades[1], on="School Name",how='inner', suffixes=('_9th','_10th'))
grade_comp_1112 = pd.merge(grouped_grades[2], grouped_grades[3], on="School Name",how='inner', suffixes=('_11th','_12th'))
grade_comp = pd.merge(grade_comp_910, grade_comp_1112, on="School Name",how='inner')

grade_comp


Unnamed: 0_level_0,reading_score_9th,math_score_9th,reading_score_10th,math_score_10th,reading_score_11th,math_score_11th,reading_score_12th,math_score_12th
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Bailey High School,81.303155,77.083676,80.907183,76.996772,80.945643,77.515588,80.912451,76.492218
Cabrera High School,83.676136,83.094697,84.253219,83.154506,83.788382,82.76556,84.287958,83.277487
Figueroa High School,81.198598,76.403037,81.408912,76.539974,80.640339,76.884344,81.384863,77.151369
Ford High School,80.632653,77.361345,81.262712,77.672316,80.403642,76.918058,80.662338,76.179963
Griffin High School,83.369193,82.04401,83.706897,84.229064,84.288089,83.842105,84.013699,83.356164
Hernandez High School,80.86686,77.438495,80.660147,77.337408,81.39614,77.136029,80.857143,77.186567
Holden High School,83.677165,83.787402,83.324561,83.429825,83.815534,85.0,84.698795,82.855422
Huang High School,81.290284,77.027251,81.512386,75.908735,81.417476,76.446602,80.305983,77.225641
Johnson High School,81.260714,77.187857,80.773431,76.691117,80.616027,77.491653,81.227564,76.863248
Pena High School,83.807273,83.625455,83.612,83.372,84.335938,84.328125,84.59116,84.121547


In [238]:
# a better option: use pandas to groupby "School Name" and "grade" first! Then split that dataframe by grade, and merge those data frames horizontally. This way, we don't have to pass through the entire data set 4 separate times!

In [290]:
# select the two score columns before averaging so that text columns (names, gender, type) are never passed to mean()
grade_groups = school_data_complete.groupby(["grade", "School Name"])[["reading_score", "math_score"]].mean()

# reset index so we can iterate easily through the list
grade_groups = grade_groups.reset_index()

# grade_groups = grade_groups.set_index("grade")

grade_groups.head(20)

Unnamed: 0,grade,School Name,reading_score,math_score
0,10th,Bailey High School,80.907183,76.996772
1,10th,Cabrera High School,84.253219,83.154506
2,10th,Figueroa High School,81.408912,76.539974
3,10th,Ford High School,81.262712,77.672316
4,10th,Griffin High School,83.706897,84.229064
5,10th,Hernandez High School,80.660147,77.337408
6,10th,Holden High School,83.324561,83.429825
7,10th,Huang High School,81.512386,75.908735
8,10th,Johnson High School,80.773431,76.691117
9,10th,Pena High School,83.612,83.372


In [291]:
# loop through the summarized data to split off each grade into a separate data frame

# define array to hold the dataframes with each grade's score. Do this to avoid 'Index out of Range Error'
grade_container = [''] * len(grade_list)

for grade in grade_list:
    i = list(grade_list).index(grade)
    grade_container[i] = [grade_groups.iloc[index,:] for index in range(len(grade_groups)) if grade_groups.iloc[index,0] == grade]
    grade_container[i] = pd.DataFrame(grade_container[i])
    grade_container[i] = grade_container[i].set_index("grade")
    
for grade in grade_list:
    i = list(grade_list).index(grade)
    print(f"{grade_container[i]} \n")
# grade_groups.index

                 School Name  reading_score  math_score
grade                                                  
9th       Bailey High School      81.303155   77.083676
9th      Cabrera High School      83.676136   83.094697
9th     Figueroa High School      81.198598   76.403037
9th         Ford High School      80.632653   77.361345
9th      Griffin High School      83.369193   82.044010
9th    Hernandez High School      80.866860   77.438495
9th       Holden High School      83.677165   83.787402
9th        Huang High School      81.290284   77.027251
9th      Johnson High School      81.260714   77.187857
9th         Pena High School      83.807273   83.625455
9th    Rodriguez High School      80.993127   76.859966
9th      Shelton High School      84.122642   83.420755
9th       Thomas High School      83.728850   83.590022
9th       Wilson High School      83.939778   83.085578
9th       Wright High School      83.833333   83.264706 

                 School Name  reading_score  m

In [340]:
#pd.DataFrame(grade_comparison)
for grade in grade_list:
    i = list(grade_list).index(grade)
    if i == 0:
        # if it's the first grade in the list, assign it to the comparison data frame
        grade_comparison = grade_container[0]
        # automatically update column names to have the grade
        grade_comparison = grade_comparison.rename(columns = {"reading_score":("Reading Score "+ grade + " Grade"), "math_score":("Math Score "+ grade + " Grade")})
    else:
        # merge next grade's data into the comparison dataframe
        grade_comparison = pd.merge(grade_comparison, grade_container[i], on="School Name", how='outer')
        # automatically update column names to have the grade
        grade_comparison = grade_comparison.rename(columns = {"reading_score":("Reading Score "+ grade + " Grade"), "math_score":("Math Score " + grade + " Grade")})


grade_comparison = grade_comparison.set_index("School Name")

grade_comparison

Unnamed: 0_level_0,Reading Score 9th Grade,Math Score 9th Grade,Reading Score 10th Grade,Math Score 10th Grade,Reading Score 11th Grade,Math Score 11th Grade,Reading Score 12th Grade,Math Score 12th Grade
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Bailey High School,81.303155,77.083676,80.907183,76.996772,80.945643,77.515588,80.912451,76.492218
Cabrera High School,83.676136,83.094697,84.253219,83.154506,83.788382,82.76556,84.287958,83.277487
Figueroa High School,81.198598,76.403037,81.408912,76.539974,80.640339,76.884344,81.384863,77.151369
Ford High School,80.632653,77.361345,81.262712,77.672316,80.403642,76.918058,80.662338,76.179963
Griffin High School,83.369193,82.04401,83.706897,84.229064,84.288089,83.842105,84.013699,83.356164
Hernandez High School,80.86686,77.438495,80.660147,77.337408,81.39614,77.136029,80.857143,77.186567
Holden High School,83.677165,83.787402,83.324561,83.429825,83.815534,85.0,84.698795,82.855422
Huang High School,81.290284,77.027251,81.512386,75.908735,81.417476,76.446602,80.305983,77.225641
Johnson High School,81.260714,77.187857,80.773431,76.691117,80.616027,77.491653,81.227564,76.863248
Pena High School,83.807273,83.625455,83.612,83.372,84.335938,84.328125,84.59116,84.121547
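
A wide school-by-grade table like the one above can also be sketched without splitting and merging, by letting `pivot_table` spread the grades across columns. The schools and scores below are hypothetical, and only the math score is pivoted to keep the example short:

```python
import pandas as pd

# Hypothetical student-level data: two schools, two grades
df = pd.DataFrame({
    "School Name": ["A High"] * 4 + ["B High"] * 4,
    "grade": ["9th", "9th", "10th", "10th", "9th", "10th", "10th", "9th"],
    "math_score": [80, 70, 90, 60, 50, 100, 80, 70],
})

# Rows become schools, columns become grades, cells hold the mean score
math_by_grade = df.pivot_table(index="School Name", columns="grade",
                               values="math_score", aggfunc="mean")

# Columns come back in string order ('10th' before '9th'), so reorder them
math_by_grade = math_by_grade[["9th", "10th"]]
print(math_by_grade)
```

The explicit column reorder is needed for the same reason the grade list had to be sorted by hand earlier: the labels are strings.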


In [351]:
# filter the large table for only the columns that contain 'Math' in the title
contains_math = grade_comparison.columns.str.contains("Math")

# select the matching columns directly with the boolean mask
math_comp = grade_comparison.loc[:, contains_math]

math_comp


Unnamed: 0_level_0,Math Score 9th Grade,Math Score 10th Grade,Math Score 11th Grade,Math Score 12th Grade
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547
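
Selecting columns by a substring of their name can also be written with `DataFrame.filter`. A small sketch, using two schools and two grades copied from the comparison table above:

```python
import pandas as pd

# Two rows and four columns copied from the grade comparison table above
wide = pd.DataFrame(
    {
        "Reading Score 9th Grade": [81.303155, 83.676136],
        "Math Score 9th Grade": [77.083676, 83.094697],
        "Reading Score 10th Grade": [80.907183, 84.253219],
        "Math Score 10th Grade": [76.996772, 83.154506],
    },
    index=pd.Index(["Bailey High School", "Cabrera High School"], name="School Name"),
)

# Keep every column whose label contains the substring "Math"
math_only = wide.filter(like="Math")

# Equivalent selection with an explicit boolean mask over the column labels
mask_version = wide.loc[:, wide.columns.str.contains("Math")]

print(math_only)
```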


In [352]:
# filter the large table for only the columns that contain 'Reading' in the title
contains_reading = grade_comparison.columns.str.contains("Reading")

# select the matching columns directly with the boolean mask
reading_comp = grade_comparison.loc[:, contains_reading]

reading_comp

Unnamed: 0_level_0,Reading Score 9th Grade,Reading Score 10th Grade,Reading Score 11th Grade,Reading Score 12th Grade
School Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


In [None]:
# group the original data by school then by grade, aggregate by taking the mean

# numeric_only=True skips the text columns, which cannot be averaged
scores_gradelevel = pd.DataFrame(school_data_complete.groupby(["School Name", "grade"]).mean(numeric_only=True))

scores_gradelevel_trimmed = scores_gradelevel.loc[:,["reading_score","math_score"]]

scores_gradelevel_trimmed

# math_gradelevel.sort_values("grade", ascending=False)

# math_gradelevel

In [None]:
# print columns for reference
school_data_complete.columns

In [None]:
# use groupby to group by school name

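# numeric_only=True skips the text columns, which cannot be averaged
school_summary = pd.DataFrame(school_groups.mean(numeric_only=True))
# drop() returns a new DataFrame, so assign the result back
school_summary = school_summary.drop(columns=["Student ID","budget"])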
# school_trim = school_summary.iloc[:,[""]]

school_summary

Unnamed: 0_level_0,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing Rate
School Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Charter,83.473852,83.896421,93.62083,96.586489,95.10366
District,76.956733,80.966636,66.548453,80.799062,73.673757
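
The code that produced the school-type table above is not shown. One plausible way to build a table of that shape is to group a per-school summary by school type and average the metrics. The sketch below is an assumption about the method, not the notebook's actual cell, and it uses only four schools copied from the school summary, so its numbers will differ from the table above.

```python
import pandas as pd

# Four rows copied from the per-school summary table earlier in the notebook
per_school = pd.DataFrame({
    "School Name": ["Bailey High School", "Huang High School",
                    "Cabrera High School", "Pena High School"],
    "School Type": ["District", "District", "Charter", "Charter"],
    "% Passing Math": [66.680064, 65.683922, 94.133477, 94.594595],
    "% Passing Reading": [81.93328, 81.316421, 97.039828, 95.945946],
})

# Average each metric across the schools of each type
by_type = per_school.groupby("School Type")[["% Passing Math", "% Passing Reading"]].mean()
by_type["% Overall Passing Rate"] = (by_type["% Passing Math"] + by_type["% Passing Reading"]) / 2
print(by_type)
```

Note that this averages school-level rates, which weights a small school the same as a large one; grouping the student-level data by type would weight by enrollment instead.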
