# PyCity Schools Analysis

* As a whole, schools with higher budgets did not yield better test results. On the contrary, schools with higher spending per student (\$640-670) underperformed compared to schools with lower spending (<\$580 per student).

* As a whole, small and medium-sized schools dramatically outperformed large schools on math passing rates (about 94% passing vs. 69%).

* As a whole, charter schools outperformed the public district schools across all metrics. However, more analysis is required to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset
school_data_complete = pd.merge(student_data, school_data, how="left", on="school_name")

# Sort both frames by school name so their rows line up with groupby results, which are ordered by school name
school_data_complete.sort_values('school_name', inplace=True)
school_data.sort_values('school_name', inplace=True)
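
A left merge like the one above silently duplicates student rows if a school name appears twice in the school table. The sketch below shows one way to guard against that, using toy frames whose school names and values are made up for illustration; `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are illustrative only)
schools = pd.DataFrame({
    "school_name": ["Alpha High", "Beta High"],
    "type": ["District", "Charter"],
    "size": [2, 1],
})
students = pd.DataFrame({
    "student_name": ["A", "B", "C"],
    "school_name": ["Alpha High", "Alpha High", "Beta High"],
    "math_score": [72, 65, 90],
})

# validate="many_to_one" raises MergeError if a school name is duplicated
# on the right side; indicator=True adds a "_merge" column for auditing
merged = pd.merge(
    students, schools,
    how="left", on="school_name",
    validate="many_to_one", indicator=True,
)

# Every student row should have found its school
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), len(unmatched))
```

If `unmatched` is not empty, some students reference a school that is missing from the school table.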

In [2]:
school_data.head()

Unnamed: 0,School ID,school_name,type,size,budget
7,7,Bailey High School,District,4976,3124928
6,6,Cabrera High School,Charter,1858,1081356
1,1,Figueroa High School,District,2949,1884411
13,13,Ford High School,District,2739,1763916
4,4,Griffin High School,Charter,1468,917500


In [3]:
school_data_complete.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
19584,19584,Tammie Fox,F,11th,Bailey High School,82,92,7,District,4976,3124928
21193,21193,Jennifer Murray,F,9th,Bailey High School,88,89,7,District,4976,3124928
21192,21192,Lisa Pineda,F,9th,Bailey High School,86,67,7,District,4976,3124928
21191,21191,Cameron Miller,M,11th,Bailey High School,70,75,7,District,4976,3124928
21190,21190,Thomas Rasmussen,M,12th,Bailey High School,77,82,7,District,4976,3124928


## District Summary

* Calculate the total number of schools

* Calculate the total number of students

* Calculate the total budget

* Calculate the average math score 

* Calculate the average reading score

* Calculate the overall passing rate (overall average score), i.e. (avg. math score + avg. reading score)/2

* Calculate the percentage of students with a passing math score (70 or greater)

* Calculate the percentage of students with a passing reading score (70 or greater)

* Create a dataframe to hold the above results

* Optional: give the displayed data cleaner formatting
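
As a rough illustration of the calculations listed above, the sketch below runs them on a hypothetical five-student sample. Only the column names and the passing threshold of 70 come from this notebook; the scores are made up. It uses the mean of a boolean mask as a shortcut for "count passing, divide by total".

```python
import pandas as pd

# Hypothetical five-student sample; only the column names match the notebook
toy = pd.DataFrame({
    "math_score": [55, 70, 82, 91, 68],
    "reading_score": [63, 74, 88, 95, 70],
})

PASSING_SCORE = 70  # threshold stated in the instructions

# The mean of a boolean Series is the fraction of True values
pct_passing_math = (toy["math_score"] >= PASSING_SCORE).mean() * 100
pct_passing_reading = (toy["reading_score"] >= PASSING_SCORE).mean() * 100

avg_math = toy["math_score"].mean()
avg_reading = toy["reading_score"].mean()
overall = (avg_math + avg_reading) / 2

print(pct_passing_math, pct_passing_reading, overall)
```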

# Total Schools in District

In [4]:
total_schools = len(school_data['school_name'].unique())
total_schools

15

# Total Students in District

In [5]:
total_students = len(student_data['Student ID'].unique())
total_students

39170

# Total Budget in District

In [6]:
total_budget = school_data['budget'].sum()
total_budget

24649428

# Average Math Score for District

In [7]:
average_math_score = student_data['math_score'].mean()
average_math_score

78.98537145774827

# Average Reading Score for District

In [8]:
average_reading_score = student_data['reading_score'].mean()
average_reading_score

81.87784018381414

# Overall Passing by Students for District

In [9]:
overall_passing_rate = (average_math_score + average_reading_score) / 2
overall_passing_rate

80.43160582078121

# Percent that Passed Math in District

In [10]:
# Keep only students with a math score of 70 or above (considered passing)
students_passing_math = student_data[student_data['math_score'] >= 70]
# Count the passing students and convert the count to a percentage
percent_students_passing_math = len(students_passing_math['math_score']) / total_students * 100
percent_students_passing_math

74.9808526933878

# Percent that passed Reading in District

In [11]:
# Keep only students with a reading score of 70 or above (considered passing)
students_passing_reading = student_data[student_data['reading_score'] >= 70]
# Count the passing students and convert the count to a percentage
percent_students_passing_reading = len(students_passing_reading['reading_score']) / total_students * 100
percent_students_passing_reading

85.80546336482001

# District Summary Table covering entire District

In [12]:
# Collect the district metrics in a dictionary of one-element lists
district_summary_data = {
    'total_schools': [total_schools],
    'total_students': [total_students],
    'total_budget': [total_budget],
    'average_math_score': [average_math_score],
    'average_reading_score': [average_reading_score],
    'overall_passing_rate': [overall_passing_rate],
    'percent_students_passing_math': [percent_students_passing_math],
    'percent_students_passing_reading': [percent_students_passing_reading]
}

district_summary_df = pd.DataFrame(district_summary_data)
district_summary_df.head()

Unnamed: 0,total_schools,total_students,total_budget,average_math_score,average_reading_score,overall_passing_rate,percent_students_passing_math,percent_students_passing_reading
0,15,39170,24649428,78.985371,81.87784,80.431606,74.980853,85.805463
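
The optional "cleaner formatting" step could look something like the sketch below. It rebuilds a few columns of the summary from the values shown above and formats a copy, so the numeric frame remains usable for arithmetic. The name `display_df` is introduced here for illustration.

```python
import pandas as pd

# One-row summary using values reported in the table above
summary = pd.DataFrame({
    "total_students": [39170],
    "total_budget": [24649428],
    "average_math_score": [78.985371],
})

# Format a copy so the numeric frame stays available for later math
display_df = summary.copy()
display_df["total_students"] = display_df["total_students"].map("{:,}".format)
display_df["total_budget"] = display_df["total_budget"].map("${:,.2f}".format)
display_df["average_math_score"] = display_df["average_math_score"].map("{:.2f}".format)

print(display_df)
```

Formatting turns the columns into strings, which is why it is applied last and to a copy.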


## School Summary

* Create an overview table that summarizes key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)
  
* Create a dataframe to hold the above results
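
One possible alternative to assembling the table column by column is a single `groupby` with named aggregation, sketched below on toy data (school names, budgets, and scores are invented). Because every column comes out of the same grouped object, the rows stay aligned by school name without relying on sort order or `.values`.

```python
import pandas as pd

# Toy merged student/school rows; school names and scores are made up
toy = pd.DataFrame({
    "school_name": ["Alpha High"] * 3 + ["Beta High"] * 2,
    "type": ["District"] * 3 + ["Charter"] * 2,
    "budget": [1800] * 3 + [1300] * 2,
    "math_score": [60, 75, 90, 85, 95],
    "reading_score": [72, 68, 80, 90, 88],
})

# Flag passing rows once so the percentages become simple group means
toy = toy.assign(
    passed_math=toy["math_score"] >= 70,
    passed_reading=toy["reading_score"] >= 70,
)

per_school = toy.groupby("school_name").agg(
    type=("type", "first"),
    total_students=("math_score", "size"),
    budget=("budget", "first"),
    average_math_score=("math_score", "mean"),
    average_reading_score=("reading_score", "mean"),
    percent_passing_math=("passed_math", "mean"),
    percent_passing_reading=("passed_reading", "mean"),
)

# Convert fractions to percentages and derive the remaining columns
pct_cols = ["percent_passing_math", "percent_passing_reading"]
per_school[pct_cols] = per_school[pct_cols] * 100
per_school["budget_per_student"] = per_school["budget"] / per_school["total_students"]
per_school["overall_passing_rate"] = per_school[pct_cols].mean(axis=1)

print(per_school)
```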

# School Name

In [13]:
school_names = school_data['school_name']
school_names

7        Bailey High School
6       Cabrera High School
1      Figueroa High School
13         Ford High School
4       Griffin High School
3     Hernandez High School
8        Holden High School
0         Huang High School
12      Johnson High School
9          Pena High School
11    Rodriguez High School
2       Shelton High School
14       Thomas High School
5        Wilson High School
10       Wright High School
Name: school_name, dtype: object

# School Type 

In [14]:
school_types = school_data['type']
school_types

7     District
6      Charter
1     District
13    District
4      Charter
3     District
8      Charter
0     District
12    District
9      Charter
11    District
2      Charter
14     Charter
5      Charter
10     Charter
Name: type, dtype: object

# Total Students for Each School

In [15]:
school_total_students = school_data['size']
school_total_students

7     4976
6     1858
1     2949
13    2739
4     1468
3     4635
8      427
0     2917
12    4761
9      962
11    3999
2     1761
14    1635
5     2283
10    1800
Name: size, dtype: int64

# Total Budget for Each School

In [16]:
school_budgets = school_data['budget']
school_budgets

7     3124928
6     1081356
1     1884411
13    1763916
4      917500
3     3022020
8      248087
0     1910635
12    3094650
9      585858
11    2547363
2     1056600
14    1043130
5     1319574
10    1049400
Name: budget, dtype: int64

# Total Budget per Student

In [17]:
# Divides the budget for each school by the total number of students in each school
school_budgets_per_student = school_budgets / school_total_students
school_budgets_per_student

7     628.0
6     582.0
1     639.0
13    644.0
4     625.0
3     652.0
8     581.0
0     655.0
12    650.0
9     609.0
11    637.0
2     600.0
14    638.0
5     578.0
10    583.0
dtype: float64

In [18]:
# Group data by school, select the reading and math score columns, then take the mean
# (selecting before .mean() avoids averaging non-numeric columns, which fails in pandas 2.x)
school_average_scores_df = school_data_complete.groupby('school_name')[['reading_score', 'math_score']].mean()
school_average_scores_df

school_name,reading_score,math_score
Bailey High School,81.033963,77.048432
Cabrera High School,83.97578,83.061895
Figueroa High School,81.15802,76.711767
Ford High School,80.746258,77.102592
Griffin High School,83.816757,83.351499
Hernandez High School,80.934412,77.289752
Holden High School,83.814988,83.803279
Huang High School,81.182722,76.629414
Johnson High School,80.966394,77.072464
Pena High School,84.044699,83.839917


# Average Math Score for Each School

In [19]:
# average math score, converts to array by using .values
school_average_math_scores = school_average_scores_df['math_score'].values
school_average_math_scores

array([77.04843248, 83.06189451, 76.7117667 , 77.10259219, 83.35149864,
       77.28975189, 83.80327869, 76.62941378, 77.07246377, 83.83991684,
       76.84271068, 83.35945486, 83.41834862, 83.27420061, 83.68222222])

# Average Reading Score for Each School

In [20]:
# average reading score, converts to array by using .values
school_average_reading_scores = school_average_scores_df['reading_score'].values
school_average_reading_scores

array([81.03396302, 83.97578041, 81.15801967, 80.74625776, 83.81675749,
       80.93441208, 83.81498829, 81.18272197, 80.96639361, 84.04469854,
       80.74468617, 83.72572402, 83.84892966, 83.98948752, 83.955     ])

# Percent Passing Math for Each School

In [21]:
school_total_passing_math_df = school_data_complete[school_data_complete['math_score'] >= 70]

school_and_score_grouped = school_total_passing_math_df[['school_name', 'math_score', 'size']].groupby('school_name')

# Convert to an array by using .values
school_total_passing_math = school_and_score_grouped.count()['math_score'].values

# Calculate the percentage
school_percent_passing_math = school_total_passing_math / school_total_students.values * 100
school_percent_passing_math

array([66.68006431, 94.13347686, 65.98847067, 68.30960204, 93.39237057,
       66.75296656, 92.5058548 , 65.68392184, 66.05755093, 94.59459459,
       66.36659165, 93.86712095, 93.27217125, 93.86771792, 93.33333333])

# Percent Passing Reading for Each School

In [22]:
school_total_passing_reading_df = school_data_complete[school_data_complete['reading_score'] >= 70]

school_and_score_grouped = school_total_passing_reading_df[['school_name', 'reading_score', 'size']].groupby('school_name')

# Convert to an array by using .values
school_total_passing_reading = school_and_score_grouped.count()['reading_score'].values

# Calculate the percentage
school_percent_passing_reading = school_total_passing_reading / school_total_students.values * 100
school_percent_passing_reading

array([81.93327974, 97.03982777, 80.73923364, 79.29901424, 97.13896458,
       80.86299892, 96.2529274 , 81.31642098, 81.22243226, 95.94594595,
       80.22005501, 95.85462805, 97.3088685 , 96.53964082, 96.61111111])

# Percent Overall Passing Rate for Each School

In [23]:
school_overall_passing_rate = (school_percent_passing_math + school_percent_passing_reading) / 2
school_overall_passing_rate

array([74.30667203, 95.58665231, 73.36385215, 73.80430814, 95.26566757,
       73.80798274, 94.3793911 , 73.50017141, 73.6399916 , 95.27027027,
       73.29332333, 94.8608745 , 95.29051988, 95.20367937, 94.97222222])

# School Data Summary Table

In [25]:
school_summary_data = {
    'name': school_names,
    'type': school_types,
    'total_students': school_total_students,
    'budget': school_budgets,
    'budget_per_student': school_budgets_per_student,
    'average_math_scores': school_average_math_scores,
    'average_reading_scores': school_average_reading_scores,
    'percent_passing_math': school_percent_passing_math,
    'percent_passing_reading': school_percent_passing_reading,
    'overall_passing_rate': school_overall_passing_rate
}
school_summary_df = pd.DataFrame(school_summary_data).set_index('name')
school_summary_df

name,type,total_students,budget,budget_per_student,average_math_scores,average_reading_scores,percent_passing_math,percent_passing_reading,overall_passing_rate
Bailey High School,District,4976,3124928,628.0,77.048432,81.033963,66.680064,81.93328,74.306672
Cabrera High School,Charter,1858,1081356,582.0,83.061895,83.97578,94.133477,97.039828,95.586652
Figueroa High School,District,2949,1884411,639.0,76.711767,81.15802,65.988471,80.739234,73.363852
Ford High School,District,2739,1763916,644.0,77.102592,80.746258,68.309602,79.299014,73.804308
Griffin High School,Charter,1468,917500,625.0,83.351499,83.816757,93.392371,97.138965,95.265668
Hernandez High School,District,4635,3022020,652.0,77.289752,80.934412,66.752967,80.862999,73.807983
Holden High School,Charter,427,248087,581.0,83.803279,83.814988,92.505855,96.252927,94.379391
Huang High School,District,2917,1910635,655.0,76.629414,81.182722,65.683922,81.316421,73.500171
Johnson High School,District,4761,3094650,650.0,77.072464,80.966394,66.057551,81.222432,73.639992
Pena High School,Charter,962,585858,609.0,83.839917,84.044699,94.594595,95.945946,95.27027


## Top Performing Schools (By Passing Rate)

* Sort and display the top five schools in overall passing rate

In [26]:
# Sort values by passing rate, ascending=False to visualize scores from greatest to least
school_summary_df.sort_values('overall_passing_rate', ascending=False)['overall_passing_rate'].head()

name
Cabrera High School    95.586652
Thomas High School     95.290520
Pena High School       95.270270
Griffin High School    95.265668
Wilson High School     95.203679
Name: overall_passing_rate, dtype: float64

## Bottom Performing Schools (By Passing Rate)

* Sort and display the five worst-performing schools

In [27]:
# Sort overall passing rate, leave as is to visualize scores from least to greatest
school_summary_df.sort_values('overall_passing_rate')['overall_passing_rate'].head()

name
Rodriguez High School    73.293323
Figueroa High School     73.363852
Huang High School        73.500171
Johnson High School      73.639992
Ford High School         73.804308
Name: overall_passing_rate, dtype: float64
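
For top-N and bottom-N lookups like the two above, `Series.nlargest` and `Series.nsmallest` are a possible shorthand for sorting and then taking `head()`. The sketch below uses six of the overall passing rates reported in this notebook.

```python
import pandas as pd

# Overall passing rates copied from the outputs above
rates = pd.Series({
    "Cabrera High School": 95.586652,
    "Thomas High School": 95.290520,
    "Pena High School": 95.270270,
    "Rodriguez High School": 73.293323,
    "Figueroa High School": 73.363852,
    "Huang High School": 73.500171,
}, name="overall_passing_rate")

top_two = rates.nlargest(2)      # highest rates first
bottom_two = rates.nsmallest(2)  # lowest rates first

print(top_two)
print(bottom_two)
```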

## Math Scores by Grade

* Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

  * Create a pandas series for each grade. Hint: use a conditional statement.
  
  * Group each series by school
  
  * Combine the series into a dataframe
  
  * Optional: give the displayed data cleaner formatting
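
The filter, group, and combine steps above can also be expressed as a single pivot. The sketch below is one possible version on toy rows (school names and scores are invented); only the column names mirror `school_data_complete`.

```python
import pandas as pd

# Toy rows; only the column names mirror school_data_complete
toy = pd.DataFrame({
    "school_name": ["Alpha High", "Alpha High", "Alpha High", "Beta High", "Beta High"],
    "grade": ["9th", "9th", "10th", "9th", "10th"],
    "math_score": [70, 80, 90, 60, 85],
})

grade_order = ["9th", "10th", "11th", "12th"]

# One row per school, one column per grade, mean math score in each cell
math_by_grade = toy.pivot_table(
    index="school_name", columns="grade",
    values="math_score", aggfunc="mean",
)

# Columns sort as strings ("10th" before "9th"), so restore grade order;
# grades absent from the toy data are left out of the ordering
present = [g for g in grade_order if g in math_by_grade.columns]
math_by_grade = math_by_grade[present]

print(math_by_grade)
```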

In [28]:
# For each grade, keep only that grade's rows, group by school name, and average the math score
ninth_grade_math_scores = school_data_complete[school_data_complete['grade'] == '9th'].groupby('school_name')['math_score'].mean()
tenth_grade_math_scores = school_data_complete[school_data_complete['grade'] == '10th'].groupby('school_name')['math_score'].mean()
eleventh_grade_math_scores = school_data_complete[school_data_complete['grade'] == '11th'].groupby('school_name')['math_score'].mean()
twelfth_grade_math_scores = school_data_complete[school_data_complete['grade'] == '12th'].groupby('school_name')['math_score'].mean()

In [29]:
# Establish columns for results
math_scores_by_grade = {
    '9th': ninth_grade_math_scores,
    '10th': tenth_grade_math_scores,
    '11th': eleventh_grade_math_scores,
    '12th': twelfth_grade_math_scores
}

math_scores_by_grade_df = pd.DataFrame(math_scores_by_grade)
math_scores_by_grade_df

school_name,9th,10th,11th,12th
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547


## Reading Score by Grade 

* Perform the same operations as above for reading scores

In [30]:
# For each grade, keep only that grade's rows, group by school name, and average the reading score
ninth_grade_reading_scores = school_data_complete[school_data_complete['grade'] == '9th'].groupby('school_name')['reading_score'].mean()
tenth_grade_reading_scores = school_data_complete[school_data_complete['grade'] == '10th'].groupby('school_name')['reading_score'].mean()
eleventh_grade_reading_scores = school_data_complete[school_data_complete['grade'] == '11th'].groupby('school_name')['reading_score'].mean()
twelfth_grade_reading_scores = school_data_complete[school_data_complete['grade'] == '12th'].groupby('school_name')['reading_score'].mean()

In [31]:
# Establish columns for results
reading_scores_by_grade = {
    '9th': ninth_grade_reading_scores,
    '10th': tenth_grade_reading_scores,
    '11th': eleventh_grade_reading_scores,
    '12th': twelfth_grade_reading_scores
}

reading_scores_by_grade_df = pd.DataFrame(reading_scores_by_grade)
reading_scores_by_grade_df

school_name,9th,10th,11th,12th
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


## Scores by School Spending

* Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)

In [32]:
# Bins: up to 580, then steps of 30 (580-610, 610-640, 640-670)
spending_bins = [0, 580, 610, 640, 670]
group_names = ["<$580", "$580-610", "$610-640", "$640-670"]
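
With its default `right=True`, `pd.cut` treats each bin as open on the left and closed on the right, so a value of exactly 580 would land in the first bin despite its "<" label. The sketch below illustrates this with three per-student budgets from this notebook plus a hypothetical school at exactly 580, and shows `right=False` as one way to make the labels literal.

```python
import pandas as pd

spending_bins = [0, 580, 610, 640, 670]
group_names = ["<$580", "$580-610", "$610-640", "$640-670"]

# Per-student budgets from the notebook output, plus a hypothetical
# school at exactly 580 to show which bin an edge value lands in
per_student = pd.Series([578.0, 580.0, 581.0, 655.0])

# Default: intervals are (0, 580], (580, 610], (610, 640], (640, 670]
binned = pd.cut(per_student, spending_bins, labels=group_names)

# right=False: intervals are [0, 580), [580, 610), [610, 640), [640, 670)
binned_left = pd.cut(per_student, spending_bins, labels=group_names, right=False)

print(binned.tolist())
print(binned_left.tolist())
```

No school in this dataset sits exactly on a bin edge, so both settings give the same grouping here; the distinction matters only if the data changes.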

In [33]:
school_data_complete.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
19584,19584,Tammie Fox,F,11th,Bailey High School,82,92,7,District,4976,3124928
21193,21193,Jennifer Murray,F,9th,Bailey High School,88,89,7,District,4976,3124928
21192,21192,Lisa Pineda,F,9th,Bailey High School,86,67,7,District,4976,3124928
21191,21191,Cameron Miller,M,11th,Bailey High School,70,75,7,District,4976,3124928
21190,21190,Thomas Rasmussen,M,12th,Bailey High School,77,82,7,District,4976,3124928


In [34]:
# Select the columns needed for the remaining computations
scores_by_school_columns = ['school_name','math_score', 'reading_score', 'size', 'budget', 'type']
relevant_school_data = school_data_complete[scores_by_school_columns]
relevant_school_data.head()

Unnamed: 0,school_name,math_score,reading_score,size,budget,type
19584,Bailey High School,92,82,4976,3124928,District
21193,Bailey High School,89,88,4976,3124928,District
21192,Bailey High School,67,86,4976,3124928,District
21191,Bailey High School,75,70,4976,3124928,District
21190,Bailey High School,82,77,4976,3124928,District


## Calculate Budget per Student Relative to School Spending

In [35]:
# budget per student
budget_per_student = relevant_school_data["budget"] / relevant_school_data["size"]

# Assign 'budget per student' to add column
school_data_with_budget_per_student = school_data_complete.assign(budget_per_student = budget_per_student)
school_data_with_budget_per_student.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget,budget_per_student
19584,19584,Tammie Fox,F,11th,Bailey High School,82,92,7,District,4976,3124928,628.0
21193,21193,Jennifer Murray,F,9th,Bailey High School,88,89,7,District,4976,3124928,628.0
21192,21192,Lisa Pineda,F,9th,Bailey High School,86,67,7,District,4976,3124928,628.0
21191,21191,Cameron Miller,M,11th,Bailey High School,70,75,7,District,4976,3124928,628.0
21190,21190,Thomas Rasmussen,M,12th,Bailey High School,77,82,7,District,4976,3124928,628.0


## Create Budget per Student Relative to School Spending Column

In [36]:
# Bin per-student budgets into the spending ranges defined earlier, using pd.cut
budget_groups = pd.cut(school_data_with_budget_per_student['budget_per_student'], spending_bins, labels=group_names)

# pd.cut already returns a Series named 'budget_per_student', so it can be used directly
budget_groups_binned = budget_groups
budget_groups_binned.head()

19584    $610-640
21193    $610-640
21192    $610-640
21191    $610-640
21190    $610-640
Name: budget_per_student, dtype: category
Categories (4, object): [<$580 < $580-610 < $610-640 < $640-670]

In [37]:
# Assign 'budget groups' column
school_data_with_budgets = school_data_with_budget_per_student.assign(budget_groups=budget_groups_binned)
school_data_with_budgets.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget,budget_per_student,budget_groups
19584,19584,Tammie Fox,F,11th,Bailey High School,82,92,7,District,4976,3124928,628.0,$610-640
21193,21193,Jennifer Murray,F,9th,Bailey High School,88,89,7,District,4976,3124928,628.0,$610-640
21192,21192,Lisa Pineda,F,9th,Bailey High School,86,67,7,District,4976,3124928,628.0,$610-640
21191,21191,Cameron Miller,M,11th,Bailey High School,70,75,7,District,4976,3124928,628.0,$610-640
21190,21190,Thomas Rasmussen,M,12th,Bailey High School,77,82,7,District,4976,3124928,628.0,$610-640


In [38]:
school_data_with_budgets.describe()

Unnamed: 0,Student ID,reading_score,math_score,School ID,size,budget,budget_per_student
count,39170.0,39170.0,39170.0,39170.0,39170.0,39170.0,39170.0
mean,19584.5,81.87784,78.985371,6.978172,3332.95711,2117241.0,629.293541
std,11307.549359,10.23958,12.309968,4.444329,1323.914069,874998.7,25.034815
min,0.0,63.0,55.0,0.0,427.0,248087.0,578.0
25%,9792.25,73.0,69.0,3.0,1858.0,1081356.0,625.0
50%,19584.5,82.0,79.0,7.0,2949.0,1910635.0,638.0
75%,29376.75,91.0,89.0,11.0,4635.0,3022020.0,650.0
max,39169.0,99.0,99.0,14.0,4976.0,3124928.0,655.0


In [39]:
# Group by budget group
grouped_with_budgets = school_data_with_budgets.groupby('budget_groups')

## Calculate Average Math Score relative to School Spending

In [40]:
average_math_score_by_budget = grouped_with_budgets['math_score'].mean()
average_math_score_by_budget.head()

budget_groups
<$580       83.274201
$580-610    83.459313
$610-640    78.236441
$640-670    77.058995
Name: math_score, dtype: float64

## Calculate Average Reading Score relative to School Spending

In [41]:
average_reading_score_by_budget = grouped_with_budgets['reading_score'].mean()
average_reading_score_by_budget.head()

budget_groups
<$580       83.989488
$580-610    83.905259
$610-640    81.559460
$640-670    80.958411
Name: reading_score, dtype: float64

## Calculate Total Students relative to School Spending

In [42]:
total_students_by_budget = school_data_with_budgets.groupby('budget_groups').count()['Student ID'].values
total_students_by_budget

array([ 2283,  6808, 15027, 15052], dtype=int64)

## Calculate Percent Passing Math relative to School Spending

In [43]:
passing_math_by_budget_count = school_data_with_budgets[school_data_with_budgets['math_score'] >= 70].groupby('budget_groups').count()['Student ID'].values
percent_passing_math_by_budget = passing_math_by_budget_count / total_students_by_budget * 100
percent_passing_math_by_budget

array([93.86771792, 93.81609871, 71.9637985 , 66.60908849])

## Calculate Percent Passing Reading relative to School Spending

In [44]:
passing_reading_by_budget_count = school_data_with_budgets[school_data_with_budgets['reading_score'] >= 70].groupby('budget_groups').count()['Student ID'].values
percent_passing_reading_by_budget = passing_reading_by_budget_count / total_students_by_budget * 100
percent_passing_reading_by_budget

array([96.53964082, 96.4159812 , 84.40141079, 80.7799628 ])

## Calculate Overall Passing Rate relative to School Spending

In [45]:
percent_overall_passing_by_budget = (percent_passing_math_by_budget + percent_passing_reading_by_budget) / 2
percent_overall_passing_by_budget

array([95.20367937, 95.11603995, 78.18260464, 73.69452564])

In [46]:
data_by_budget = {
    'Average Math Score': average_math_score_by_budget,
    'Average Reading Score': average_reading_score_by_budget,
    '% Passing Math': percent_passing_math_by_budget,
    '% Passing Reading': percent_passing_reading_by_budget,
    '% Overall Passing Rate': percent_overall_passing_by_budget,
}

data_by_budget_df = pd.DataFrame(data_by_budget)
data_by_budget_df.head()

budget_groups,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing Rate
<$580,83.274201,83.989488,93.867718,96.539641,95.203679
$580-610,83.459313,83.905259,93.816099,96.415981,95.11604
$610-640,78.236441,81.55946,71.963798,84.401411,78.182605
$640-670,77.058995,80.958411,66.609088,80.779963,73.694526


## Scores by School Size

* Perform the same operations as above, based on school size.
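
Since the same five metrics are computed for spending, size, and type, the repeated cells could be folded into one helper. The sketch below is one possible shape for it; `summarize_by` is a hypothetical name, and the toy rows are invented. Only the column names, the size bins, and the passing threshold follow this notebook.

```python
import pandas as pd

def summarize_by(df, key, passing_score=70):
    """Return the five summary metrics for each value of `key`."""
    flagged = df.assign(
        passed_math=df["math_score"] >= passing_score,
        passed_reading=df["reading_score"] >= passing_score,
    )
    # Group means of the boolean flags are the passing fractions
    out = flagged.groupby(key, observed=True).agg(**{
        "Average Math Score": ("math_score", "mean"),
        "Average Reading Score": ("reading_score", "mean"),
        "% Passing Math": ("passed_math", "mean"),
        "% Passing Reading": ("passed_reading", "mean"),
    })
    pct_cols = ["% Passing Math", "% Passing Reading"]
    out[pct_cols] = out[pct_cols] * 100
    out["% Overall Passing Rate"] = out[pct_cols].mean(axis=1)
    return out

# Toy student rows with made-up sizes and scores
toy = pd.DataFrame({
    "size": [900, 900, 1500, 3000, 3000],
    "type": ["Charter", "Charter", "Charter", "District", "District"],
    "math_score": [80, 90, 85, 60, 75],
    "reading_score": [85, 95, 65, 72, 68],
})
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
toy["size_groups"] = pd.cut(toy["size"], size_bins, labels=size_labels)

by_size = summarize_by(toy, "size_groups")
by_type = summarize_by(toy, "type")
print(by_size)
print(by_type)
```

Grouping on the categorical column keeps the rows in bin order (small, medium, large) rather than alphabetical order.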

In [47]:
# Sample bins. Feel free to create your own bins.
size_bins = [0, 1000, 2000, 5000]
group_names = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

In [48]:
size_categories = pd.cut(relevant_school_data['size'], size_bins,labels=group_names)
size_groups = size_categories.to_frame()
len(size_groups)

39170

In [49]:
school_data_with_groups = relevant_school_data.assign(size_groups=size_groups)
school_data_with_groups.head()

Unnamed: 0,school_name,math_score,reading_score,size,budget,type,size_groups
19584,Bailey High School,92,82,4976,3124928,District,Large (2000-5000)
21193,Bailey High School,89,88,4976,3124928,District,Large (2000-5000)
21192,Bailey High School,67,86,4976,3124928,District,Large (2000-5000)
21191,Bailey High School,75,70,4976,3124928,District,Large (2000-5000)
21190,Bailey High School,82,77,4976,3124928,District,Large (2000-5000)


In [50]:
average_score_by_group = school_data_with_groups.groupby('size_groups').mean(numeric_only=True)
average_score_by_group

size_groups,math_score,reading_score,size,budget
Large (2000-5000),77.477597,81.198674,3924.352268,2510395.0
Medium (1000-2000),83.372682,83.867989,1715.739732,1033931.0
Small (<1000),83.828654,83.974082,797.532757,482022.0


## Calculate Average Math Score relative to School Size

In [51]:
average_math_score_by_group = average_score_by_group['math_score']
average_math_score_by_group

size_groups
Large (2000-5000)     77.477597
Medium (1000-2000)    83.372682
Small (<1000)         83.828654
Name: math_score, dtype: float64

## Calculate Average Reading Score relative to School Size

In [52]:
average_reading_score_by_group = average_score_by_group['reading_score']
average_reading_score_by_group

size_groups
Large (2000-5000)     81.198674
Medium (1000-2000)    83.867989
Small (<1000)         83.974082
Name: reading_score, dtype: float64

## Calculate Total Students relative to School Size

In [53]:
total_students_groups = school_data_with_groups.groupby('size_groups').count()['school_name']
total_students_groups

size_groups
Large (2000-5000)     29259
Medium (1000-2000)     8522
Small (<1000)          1389
Name: school_name, dtype: int64

## Calculate Math Percent Passing Rate relative to School Size

In [54]:
passing_math_groups = school_data_with_groups[school_data_with_groups['math_score'] >= 70].groupby('size_groups').count()['school_name']
percent_passing_math_groups = passing_math_groups.values / total_students_groups.values * 100
percent_passing_math_groups

array([68.65238046, 93.61652194, 93.9524838 ])

## Calculate Reading Percent Passing Rate relative to School Size

In [55]:
passing_reading_groups = school_data_with_groups[school_data_with_groups['reading_score'] >= 70].groupby('size_groups').count()['school_name']
percent_passing_reading_groups = passing_reading_groups.values / total_students_groups.values * 100
percent_passing_reading_groups

array([82.12515807, 96.77305797, 96.04031677])

## Calculate Overall Passing Rate relative to School Size

In [56]:
percent_overall_groups = (percent_passing_math_groups + percent_passing_reading_groups) / 2
percent_overall_groups

array([75.38876927, 95.19478996, 94.99640029])

In [57]:
data_by_group = {
    'Average Math Score': average_math_score_by_group,
    'Average Reading Score': average_reading_score_by_group,
    '% Passing Math': percent_passing_math_groups,
    '% Passing Reading': percent_passing_reading_groups,
    '% Overall Passing Rate': percent_overall_groups,
}

data_by_group_df = pd.DataFrame(data_by_group)
data_by_group_df.sort_index(ascending=False).head()

size_groups,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing Rate
Small (<1000),83.828654,83.974082,93.952484,96.040317,94.9964
Medium (1000-2000),83.372682,83.867989,93.616522,96.773058,95.19479
Large (2000-5000),77.477597,81.198674,68.65238,82.125158,75.388769


## Scores by School Type

* Perform the same operations as above, based on school type.

In [58]:
average_score_by_type = relevant_school_data.groupby('type').mean(numeric_only=True)
average_score_by_type

type,math_score,reading_score,size,budget
Charter,83.406183,83.902821,1717.352468,1024543.0
District,76.987026,80.962485,4063.261195,2611175.0


## Calculate Average Math Score relative to School Type

In [59]:
average_math_type = average_score_by_type['math_score']
average_math_type

type
Charter     83.406183
District    76.987026
Name: math_score, dtype: float64

## Calculate Average Reading Score relative to School Type

In [60]:
average_reading_type = average_score_by_type['reading_score']
average_reading_type

type
Charter     83.902821
District    80.962485
Name: reading_score, dtype: float64

## Calculate Total Students relative to School Type

In [61]:
total_students_type = relevant_school_data.groupby('type').count()['school_name']
total_students_type

type
Charter     12194
District    26976
Name: school_name, dtype: int64

## Calculate Math Percent Passing Rate relative to School Type

In [62]:
passing_math_type = relevant_school_data[relevant_school_data['math_score'] >= 70].groupby('type').count()['school_name']
percent_passing_math_type = passing_math_type.values / total_students_type.values * 100
percent_passing_math_type

array([93.70182057, 66.51838671])

## Calculate Reading Percent Passing Rate relative to School Type

In [63]:
passing_reading_type = relevant_school_data[relevant_school_data['reading_score'] >= 70].groupby('type').count()['school_name']
percent_passing_reading_type = passing_reading_type.values / total_students_type.values * 100
percent_passing_reading_type

array([96.64589142, 80.90524911])

## Calculate Overall Passing Rate relative to School Type

In [64]:
percent_overall_type = (percent_passing_math_type + percent_passing_reading_type) / 2
percent_overall_type

array([95.17385599, 73.71181791])

In [65]:
data_by_type = {
    'Average Math Score': average_math_type,
    'Average Reading Score': average_reading_type,
    '% Passing Math': percent_passing_math_type,
    '% Passing Reading': percent_passing_reading_type,
    '% Overall Passing Rate': percent_overall_type,
}

data_by_type_df = pd.DataFrame(data_by_type)
data_by_type_df.head()

type,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing Rate
Charter,83.406183,83.902821,93.701821,96.645891,95.173856
District,76.987026,80.962485,66.518387,80.905249,73.711818
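
The opening analysis notes that the charter advantage may be confounded with school size. A first step toward checking that could be a cross-tabulation of type against size group, sketched below with the school types and sizes copied from the In [14] and In [15] outputs. The name `size_group` is introduced here for illustration.

```python
import pandas as pd

# School type and size copied from the In [14] and In [15] outputs
schools = pd.DataFrame({
    "school_name": [
        "Bailey", "Cabrera", "Figueroa", "Ford", "Griffin",
        "Hernandez", "Holden", "Huang", "Johnson", "Pena",
        "Rodriguez", "Shelton", "Thomas", "Wilson", "Wright",
    ],
    "type": [
        "District", "Charter", "District", "District", "Charter",
        "District", "Charter", "District", "District", "Charter",
        "District", "Charter", "Charter", "Charter", "Charter",
    ],
    "size": [
        4976, 1858, 2949, 2739, 1468,
        4635, 427, 2917, 4761, 962,
        3999, 1761, 1635, 2283, 1800,
    ],
})

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
schools["size_group"] = pd.cut(schools["size"], size_bins, labels=size_labels)

# Count schools in each (type, size group) combination
counts = pd.crosstab(schools["type"], schools["size_group"])
print(counts)
```

In this data every district school falls in the large bin and all but one charter school is small or medium, so the type and size comparisons above are largely measuring the same split. Separating the two effects would need more data or a different design.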
