# Strategic Analysis of District-Wide Standardized Test Results
## Identifying Trends in School Performance for Budget and Priority Planning
Prepared by the Chief Data Scientist for the City School District, this analysis examines student math and reading scores across the district's schools to guide strategic decisions about future school budgets and priorities.

# Table of Contents

1. [Load Data](#load-data)
2. [Calculating Key Metrics for Schools and Students](#calculating-key-metrics-for-schools-and-students)
3. [Analysis of Student Performance Metrics](#analysis-of-student-performance-metrics)
4. [Detailed School Performance Analysis and Summary](#detailed-school-performance-analysis-and-summary)
5. [Top and Bottom 5 Schools by Overall Passing Rate](#top-and-bottom-5-schools-by-overall-passing-rate)
6. [Average Score by Grade](#average-score-by-grade)
7. [Grades Based Off of Spending](#grades-based-off-of-spending)
8. [Scores by School Size](#scores-by-school-size)
9. [Scores by School Type](#scores-by-school-type)


# Load Data

```python
import pandas as pd

# File paths to the raw CSV files
csv_path_schools = "/Users/kaylabiddle/Downloads/Starter_Code-4/PyCitySchools/Resources/schools_complete.csv"
csv_path_students = "/Users/kaylabiddle/Downloads/Starter_Code-4/PyCitySchools/Resources/students_complete.csv"

# Load each CSV into a DataFrame
school_data = pd.read_csv(csv_path_schools)
student_data = pd.read_csv(csv_path_students)

# Preview the student data (a notebook cell only renders its last expression,
# so a separate school_data.head() call here would not be displayed)
student_data.head()
```

| | Student ID | student_name | gender | grade | school_name | reading_score | math_score |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Paul Bradley | M | 9th | Huang High School | 66 | 79 |
| 1 | 1 | Victor Smith | M | 12th | Huang High School | 94 | 61 |
| 2 | 2 | Kevin Rodriguez | M | 12th | Huang High School | 90 | 60 |
| 3 | 3 | Dr. Richard Scott | M | 12th | Huang High School | 67 | 58 |
| 4 | 4 | Bonnie Ray | F | 9th | Huang High School | 97 | 84 |
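Later cells assume two things: `Student ID` identifies each student uniquely, and every student's `school_name` has a matching row in the schools table. The sketch below checks both before any metrics are computed. `check_inputs` is a hypothetical helper, and the two small frames are made up to stand in for the CSVs.

```python
import pandas as pd

def check_inputs(students: pd.DataFrame, schools: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper; column names mirror the notebook's CSV files.
    """
    problems = []
    if not students["Student ID"].is_unique:
        problems.append("duplicate Student ID values")
    if not schools["school_name"].is_unique:
        problems.append("duplicate school_name values in schools table")
    # Students whose school has no row in the schools table
    unknown = set(students["school_name"]) - set(schools["school_name"])
    if unknown:
        problems.append(f"students at unknown schools: {sorted(unknown)}")
    return problems

# Made-up frames standing in for the real CSVs
schools = pd.DataFrame({"school_name": ["Huang High School", "Ford High School"]})
students = pd.DataFrame({
    "Student ID": [0, 1, 2],
    "school_name": ["Huang High School", "Ford High School", "Pena High School"],
})

print(check_inputs(students, schools))
# → ["students at unknown schools: ['Pena High School']"]
```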


# Calculating Key Metrics for Schools and Students

```python
# Number of unique schools
def calculate_total_schools(df):
    return df["school_name"].nunique()

total_schools = calculate_total_schools(school_data)
print(f"Total Number of Schools: {total_schools}")

# Number of students: count unique Student IDs
# (names are not unique, so counting unique names would undercount)
def calculate_total_students(df):
    return df["Student ID"].nunique()

total_students = calculate_total_students(student_data)
print(f"Total Students: {total_students:,}")

# Total district budget: sum of every school's budget
def calculate_total_budget(df):
    return df["budget"].sum()

total_budget = calculate_total_budget(school_data)
print(f"Total Budget: ${total_budget:,.2f}")
```

```
Total Number of Schools: 15
Total Students: 39,170
Total Budget: $24,649,428.00
```
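Counting unique names undercounts whenever two students share a name, which is why students are counted by `Student ID`. The illustration below uses made-up rows.

```python
import pandas as pd

# Two different students who happen to share a name (made-up rows)
students = pd.DataFrame({
    "Student ID": [0, 1, 2],
    "student_name": ["Paul Bradley", "Victor Smith", "Paul Bradley"],
})

by_name = students["student_name"].nunique()  # merges the two Paul Bradleys
by_id = students["Student ID"].nunique()       # one count per student
print(by_name, by_id)  # → 2 3
```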



# Analysis of Student Performance Metrics

```python
PASSING_SCORE = 70  # a score of 70 or above counts as passing

# Average math score across all students (a score in points, not a percentage)
def calculate_math_score(df):
    return df["math_score"].mean()

average_math_score = calculate_math_score(student_data)
print(f"Average math score: {average_math_score:.2f}")

# Average reading score across all students
def calculate_reading_score(df):
    return df["reading_score"].mean()

average_read_score = calculate_reading_score(student_data)
print(f"Average reading score: {average_read_score:.2f}")

# % of students passing math
def calculate_math_passing(df, passing_score=PASSING_SCORE):
    return (df["math_score"] >= passing_score).mean() * 100

percent_passed_math = calculate_math_passing(student_data)
print(f"Passing math: {percent_passed_math:.2f}%")

# % of students passing reading
def calculate_reading_passing(df, passing_score=PASSING_SCORE):
    return (df["reading_score"] >= passing_score).mean() * 100

percent_passed_reading = calculate_reading_passing(student_data)
print(f"Passing reading: {percent_passed_reading:.2f}%")

# % of students passing both math and reading
def calculate_both_passing(df, passing_score=PASSING_SCORE):
    passed_both = (df["math_score"] >= passing_score) & (df["reading_score"] >= passing_score)
    return passed_both.mean() * 100

percent_passed_both = calculate_both_passing(student_data)
print(f"Overall passing: {percent_passed_both:.2f}%")
```

```
Average math score: 78.99
Average reading score: 81.88
Passing math: 74.98%
Passing reading: 85.81%
Overall passing: 65.17%
```
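A comparison such as `df["math_score"] >= 70` yields a Boolean Series, and its `.mean()` is the share of `True` values. That means all three passing rates can come from one small helper. The sketch below applies a hypothetical `passing_rates` function to the five sample rows from the data preview.

```python
import pandas as pd

def passing_rates(df: pd.DataFrame, passing_score: int = 70) -> dict:
    """Percent of students passing math, reading, and both (hypothetical helper)."""
    passed_math = df["math_score"] >= passing_score
    passed_reading = df["reading_score"] >= passing_score
    return {
        "Passing math": passed_math.mean() * 100,
        "Passing reading": passed_reading.mean() * 100,
        "Overall passing": (passed_math & passed_reading).mean() * 100,
    }

# The five sample students from the preview table
toy = pd.DataFrame({
    "math_score": [79, 61, 60, 58, 84],
    "reading_score": [66, 94, 90, 67, 97],
})
rates = passing_rates(toy)
for label, value in rates.items():
    print(f"{label}: {value:.2f}%")
```

On these rows the helper should report 40.00% passing math, 60.00% passing reading, and 20.00% passing both. Only the last student clears 70 in both subjects.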


# Detailed School Performance Analysis and Summary

```python
# Join each student row to its school's attributes (type, size, budget)
merged_data = pd.merge(student_data, school_data, on="school_name")

# Students per school (this replaces the district-wide total_students computed earlier)
total_students = merged_data['school_name'].value_counts()

# Total school budget (the same value repeats on every student row, so the mean recovers it)
total_budget = merged_data.groupby('school_name')['budget'].mean()

# Per student budget
per_student_budget = total_budget / total_students

# Average math and reading scores per school
average_math_score = merged_data.groupby('school_name')['math_score'].mean()
average_reading_score = merged_data.groupby('school_name')['reading_score'].mean()

# A score of 70 or above counts as passing
passing_score = 70

# Flag each student as passing math, reading, and both
merged_data['passed_math'] = merged_data['math_score'] >= passing_score
merged_data['passed_reading'] = merged_data['reading_score'] >= passing_score
merged_data['overall_passing'] = merged_data['passed_math'] & merged_data['passed_reading']

# Percentage of students passing, per school (the mean of a Boolean column is the share of True)
percent_passing_math = merged_data.groupby('school_name')['passed_math'].mean() * 100
percent_passing_reading = merged_data.groupby('school_name')['passed_reading'].mean() * 100
percent_overall_passing = merged_data.groupby('school_name')['overall_passing'].mean() * 100

# Build the summary. Values are formatted as strings for display only,
# so use the numeric Series above for any further arithmetic.
per_school_summary = pd.DataFrame({
    'School Type': school_data.set_index('school_name')['type'],
    'Total Students': total_students,
    'Total School Budget': total_budget.map('${:,.2f}'.format),
    'Per Student Budget': per_student_budget.map('${:,.2f}'.format),
    'Average Math Score': average_math_score.map('{:.2f}'.format),
    'Average Reading Score': average_reading_score.map('{:.2f}'.format),
    '% Passing Math': percent_passing_math.map('{:.2f}%'.format),
    '% Passing Reading': percent_passing_reading.map('{:.2f}%'.format),
    '% Overall Passing': percent_overall_passing.map('{:.2f}%'.format)
})

# Display the first rows of the summary
per_school_summary.head()
```

| school_name | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Bailey High School | District | 4976 | $3,124,928.00 | $628.00 | 77.05 | 81.03 | 66.68% | 81.93% | 54.64% |
| Cabrera High School | Charter | 1858 | $1,081,356.00 | $582.00 | 83.06 | 83.98 | 94.13% | 97.04% | 91.33% |
| Figueroa High School | District | 2949 | $1,884,411.00 | $639.00 | 76.71 | 81.16 | 65.99% | 80.74% | 53.20% |
| Ford High School | District | 2739 | $1,763,916.00 | $644.00 | 77.10 | 80.75 | 68.31% | 79.30% | 54.29% |
| Griffin High School | Charter | 1468 | $917,500.00 | $625.00 | 83.35 | 83.82 | 93.39% | 97.14% | 90.60% |
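One way to keep a per-school table numeric, so that later cells can average it, is to aggregate first and format only for display. The sketch below does this with named aggregation in `groupby(...).agg`. It also passes `validate="many_to_one"` to `pd.merge`, which raises `pandas.errors.MergeError` if a school name appears more than once in the schools table. The two toy frames are made up; their column names follow the notebook's CSVs.

```python
import pandas as pd

# Made-up students and schools
students = pd.DataFrame({
    "Student ID": [0, 1, 2, 3],
    "school_name": ["A High", "A High", "B High", "B High"],
    "math_score": [80, 60, 90, 70],
    "reading_score": [75, 85, 65, 95],
})
schools = pd.DataFrame({
    "school_name": ["A High", "B High"],
    "type": ["District", "Charter"],
    "budget": [1300, 1200],
})

# many_to_one: each student row must match at most one school row
merged = pd.merge(students, schools, on="school_name", validate="many_to_one")
merged["passed_both"] = (merged["math_score"] >= 70) & (merged["reading_score"] >= 70)

# Named aggregation: every resulting column stays numeric (or plain text for the type)
summary = merged.groupby("school_name").agg(
    school_type=("type", "first"),
    total_students=("Student ID", "count"),
    budget=("budget", "first"),
    avg_math=("math_score", "mean"),
    pct_overall_passing=("passed_both", "mean"),
)
summary["per_student_budget"] = summary["budget"] / summary["total_students"]
summary["pct_overall_passing"] *= 100
print(summary)
```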


# Top and Bottom 5 Schools by Overall Passing Rate

```python
# Recompute the pass flags so this cell can run on its own
passing_score = 70
merged_data = pd.merge(student_data, school_data, on="school_name")
merged_data['passed_math'] = merged_data['math_score'] >= passing_score
merged_data['passed_reading'] = merged_data['reading_score'] >= passing_score
merged_data['overall_passing'] = merged_data['passed_math'] & merged_data['passed_reading']

# Overall passing rate (%) and number of students per school
school_passing_summary = merged_data.groupby('school_name').agg({
    'overall_passing': 'mean',
    'Student ID': 'count'
}).rename(columns={'overall_passing': 'overall_passing_rate', 'Student ID': 'total_students'})
school_passing_summary['overall_passing_rate'] = school_passing_summary['overall_passing_rate'] * 100

# Top 5 schools: highest overall passing rate first
top_schools = school_passing_summary.sort_values(by='overall_passing_rate', ascending=False).head(5)

# Bottom 5 schools: lowest overall passing rate first
bottom_schools = school_passing_summary.sort_values(by='overall_passing_rate', ascending=True).head(5)
```



```python
print("Top 5 Passing Schools:")
top_schools
```

```
Top 5 Passing Schools:
```

| school_name | overall_passing_rate | total_students |
|---|---|---|
| Cabrera High School | 91.334769 | 1858 |
| Thomas High School | 90.948012 | 1635 |
| Griffin High School | 90.599455 | 1468 |
| Wilson High School | 90.582567 | 2283 |
| Pena High School | 90.540541 | 962 |


```python
print("Bottom 5 Schools:")
bottom_schools
```

```
Bottom 5 Schools:
```

| school_name | overall_passing_rate | total_students |
|---|---|---|
| Rodriguez High School | 52.988247 | 3999 |
| Figueroa High School | 53.204476 | 2949 |
| Huang High School | 53.513884 | 2917 |
| Hernandez High School | 53.527508 | 4635 |
| Johnson High School | 53.539172 | 4761 |
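`DataFrame.nlargest` and `DataFrame.nsmallest` return the same top and bottom rows as sorting and taking `head(n)`, in a single call each. The sketch below uses six overall passing rates copied (rounded) from the two tables above.

```python
import pandas as pd

# Overall passing rates (%) copied from the top and bottom tables, rounded
rates = pd.DataFrame(
    {"overall_passing_rate": [91.33, 90.95, 90.60, 52.99, 53.20, 53.51]},
    index=pd.Index(
        ["Cabrera High School", "Thomas High School", "Griffin High School",
         "Rodriguez High School", "Figueroa High School", "Huang High School"],
        name="school_name",
    ),
)

top = rates.nlargest(2, "overall_passing_rate")      # highest rates first
bottom = rates.nsmallest(2, "overall_passing_rate")  # lowest rates first
print(top.index.tolist())     # → ['Cabrera High School', 'Thomas High School']
print(bottom.index.tolist())  # → ['Rodriguez High School', 'Figueroa High School']
```

With the full table, the equivalent calls would be `school_passing_summary.nlargest(5, "overall_passing_rate")` and `school_passing_summary.nsmallest(5, "overall_passing_rate")`. The `keep` parameter controls which rows are kept when rates tie.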


# Average Score by Grade

```python
# Average score by school (rows) and grade (columns).
# Grade labels are strings, so the columns are put in school-year order explicitly.
grade_order = ['9th', '10th', '11th', '12th']
grade_labels = {grade: f'{grade} Grade' for grade in grade_order}

def scores_by_school_grade(df, score_column):
    """Mean of score_column for each school and grade, with columns ordered 9th to 12th."""
    table = df.pivot_table(index='school_name', columns='grade', values=score_column, aggfunc='mean')
    return table[grade_order].rename(columns=grade_labels)

# Calculate average math scores by school and grade
average_math_scores_by_school_grade = scores_by_school_grade(student_data, 'math_score')

# Calculate average reading scores by school and grade
average_reading_scores_by_school_grade = scores_by_school_grade(student_data, 'reading_score')
```


```python
average_math_scores_by_school_grade.head()
```

| school_name | 9th Grade | 10th Grade | 11th Grade | 12th Grade |
|---|---|---|---|---|
| Bailey High School | 77.083676 | 76.996772 | 77.515588 | 76.492218 |
| Cabrera High School | 83.094697 | 83.154506 | 82.76556 | 83.277487 |
| Figueroa High School | 76.403037 | 76.539974 | 76.884344 | 77.151369 |
| Ford High School | 77.361345 | 77.672316 | 76.918058 | 76.179963 |
| Griffin High School | 82.04401 | 84.229064 | 83.842105 | 83.356164 |


```python
average_reading_scores_by_school_grade.head()
```

| school_name | 9th Grade | 10th Grade | 11th Grade | 12th Grade |
|---|---|---|---|---|
| Bailey High School | 81.303155 | 80.907183 | 80.945643 | 80.912451 |
| Cabrera High School | 83.676136 | 84.253219 | 83.788382 | 84.287958 |
| Figueroa High School | 81.198598 | 81.408912 | 80.640339 | 81.384863 |
| Ford High School | 80.632653 | 81.262712 | 80.403642 | 80.662338 |
| Griffin High School | 83.369193 | 83.706897 | 84.288089 | 84.013699 |
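Grade labels are strings, so they sort lexicographically ("10th" < "11th" < "12th" < "9th"). That is why the grade columns are reordered explicitly. An alternative is to make `grade` an ordered categorical, so that grouping and unstacking follow school-year order automatically. The sketch below shows this on made-up rows.

```python
import pandas as pd

grade_order = ["9th", "10th", "11th", "12th"]
df = pd.DataFrame({
    "school_name": ["A High"] * 4,
    "grade": ["12th", "9th", "11th", "10th"],
    "math_score": [76, 79, 77, 78],
})

print(sorted(df["grade"]))  # lexicographic: ['10th', '11th', '12th', '9th']

# Ordered categorical: categories define the sort order
df["grade"] = pd.Categorical(df["grade"], categories=grade_order, ordered=True)
by_grade = (
    df.groupby(["school_name", "grade"], observed=False)["math_score"]
    .mean()
    .unstack()
)
print(by_grade.columns.tolist())  # → ['9th', '10th', '11th', '12th']
```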


# Grades Based Off of Spending

```python
# Scores by per-student spending.
# These averages are weighted by student (each student counts once), unlike
# the size and type summaries below, which average per-school values.
# Uses the passed_math, passed_reading and overall_passing columns added to merged_data earlier.
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Bin each student's school by its per-student budget (budget / size)
merged_data['spending_range'] = pd.cut(merged_data['budget'] / merged_data['size'], bins=spending_bins, labels=labels)

# observed=False keeps every spending range; recent pandas versions warn when it is left implicit
spending_summary = merged_data.groupby('spending_range', observed=False).agg({
    'math_score': 'mean',
    'reading_score': 'mean',
    'Student ID': 'count',
    'passed_math': 'mean',
    'passed_reading': 'mean',
    'overall_passing': 'mean'
}).rename(columns={
    'math_score': 'average_math_score',
    'reading_score': 'average_reading_score',
    'Student ID': 'total_students',
    'passed_math': 'percent_passing_math',
    'passed_reading': 'percent_passing_reading',
    'overall_passing': 'percent_overall_passing'
})

# Convert pass shares (0 to 1) to percentages
spending_summary['percent_passing_math'] *= 100
spending_summary['percent_passing_reading'] *= 100
spending_summary['percent_overall_passing'] *= 100

spending_summary
```

| spending_range | average_math_score | average_reading_score | total_students | percent_passing_math | percent_passing_reading | percent_overall_passing |
|---|---|---|---|---|---|---|
| <$585 | 83.363065 | 83.964039 | 6368 | 93.702889 | 96.686558 | 90.640704 |
| $585-630 | 79.982873 | 82.312643 | 9167 | 79.109851 | 88.513145 | 70.939239 |
| $630-645 | 77.821056 | 81.301007 | 11322 | 70.623565 | 82.600247 | 58.841194 |
| $645-680 | 77.049297 | 81.005604 | 12313 | 66.230813 | 81.109397 | 53.528791 |
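The spending table averages over students, so large schools carry more weight. The size and type tables average per-school values, so every school counts once. The two approaches can give noticeably different numbers, as this toy illustration with made-up scores shows.

```python
import pandas as pd

# One large and one small school in the same bin (made-up scores)
students = pd.DataFrame({
    "school_name": ["Big High"] * 3 + ["Small High"],
    "math_score": [70, 70, 70, 90],
})

# Student-weighted: every student counts once
student_weighted = students["math_score"].mean()

# School-averaged: average each school first, then average the schools
school_averaged = students.groupby("school_name")["math_score"].mean().mean()

print(student_weighted, school_averaged)  # → 75.0 80.0
```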


# Scores by School Size

```python
# Define bins and labels for school size
# (pd.cut intervals are right-closed: (0, 1000], (1000, 2000], (2000, 5000])
size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Add a "School Size" column to the display summary, based on students per school
per_school_summary['School Size'] = pd.cut(total_students, bins=size_bins, labels=labels)

# per_school_summary holds formatted strings, so aggregate the numeric per-school Series instead
numeric_summary = pd.DataFrame({
    'Average Math Score': average_math_score,
    'Average Reading Score': average_reading_score,
    '% Passing Math': percent_passing_math,
    '% Passing Reading': percent_passing_reading,
    '% Overall Passing': percent_overall_passing
})
numeric_summary['School Size'] = pd.cut(total_students, bins=size_bins, labels=labels)

# Average each metric across the schools in each size category (every school counts once)
size_summary = numeric_summary.groupby('School Size', observed=False).mean()

# Format the results for display
size_summary['Average Math Score'] = size_summary['Average Math Score'].map('{:.2f}'.format)
size_summary['Average Reading Score'] = size_summary['Average Reading Score'].map('{:.2f}'.format)
size_summary['% Passing Math'] = size_summary['% Passing Math'].map('{:.2f}%'.format)
size_summary['% Passing Reading'] = size_summary['% Passing Reading'].map('{:.2f}%'.format)
size_summary['% Overall Passing'] = size_summary['% Overall Passing'].map('{:.2f}%'.format)

# Display the size summary
size_summary
```

| School Size | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| Small (<1000) | 83.82 | 83.93 | 93.55% | 96.10% | 89.88% |
| Medium (1000-2000) | 83.37 | 83.86 | 93.60% | 96.79% | 90.62% |
| Large (2000-5000) | 77.75 | 81.34 | 69.96% | 82.77% | 58.29% |
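`pd.cut` uses right-closed intervals by default. With `size_bins = [0, 1000, 2000, 5000]`, a school of exactly 1,000 students is therefore labeled Small, despite the "<1000" wording, and one of exactly 2,000 is labeled Medium. Passing `right=False` switches to left-closed intervals. The quick check below uses a few sample sizes (4976 is Bailey's enrollment).

```python
import pandas as pd

size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
sizes = pd.Series([999, 1000, 1001, 2000, 4976])

# Default right=True: intervals (0, 1000], (1000, 2000], (2000, 5000]
right_closed = pd.cut(sizes, bins=size_bins, labels=labels).tolist()
print(right_closed)
# → ['Small (<1000)', 'Small (<1000)', 'Medium (1000-2000)', 'Medium (1000-2000)', 'Large (2000-5000)']

# right=False: intervals [0, 1000), [1000, 2000), [2000, 5000)
left_closed = pd.cut(sizes, bins=size_bins, labels=labels, right=False).tolist()
print(left_closed)
# → ['Small (<1000)', 'Medium (1000-2000)', 'Medium (1000-2000)', 'Large (2000-5000)', 'Large (2000-5000)']
```

Values outside the outer bin edges (for example, a school above 5,000 students) become `NaN`, so the edges should cover the full range of school sizes.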


# Scores by School Type

```python
# per_school_summary holds formatted strings, which cannot be averaged,
# so group the numeric per-school Series from the school summary instead
numeric_by_school = pd.DataFrame({
    "School Type": school_data.set_index("school_name")["type"],
    "Average Math Score": average_math_score,
    "Average Reading Score": average_reading_score,
    "% Passing Math": percent_passing_math,
    "% Passing Reading": percent_passing_reading,
    "% Overall Passing": percent_overall_passing,
})

# Average each metric across the schools of each type (every school counts once)
type_summary = numeric_by_school.groupby("School Type").mean()

# Results, rounded for display
type_summary.round(2)
```

| School Type | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|
| Charter | 83.47 | 83.90 | 93.62 | 96.59 | 90.43 |
| District | 76.96 | 80.97 | 66.55 | 80.80 | 53.67 |
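The formatted `per_school_summary` stores display strings such as `"91.33%"` and `"$1,081,356.00"`, which cannot be averaged directly. If only that formatted table is at hand, the strings can be converted back with `Series.str.replace` and `pd.to_numeric`. The minimal sketch below uses values copied from the per-school table; `to_number` is a hypothetical helper.

```python
import pandas as pd

def to_number(col: pd.Series) -> pd.Series:
    """Strip '$', ',' and '%' from formatted strings and parse them as numbers (hypothetical helper)."""
    return pd.to_numeric(col.str.replace(r"[$,%]", "", regex=True))

# Formatted values copied from the per-school summary table
formatted = pd.DataFrame(
    {
        "School Type": ["Charter", "Charter", "District"],
        "Total School Budget": ["$1,081,356.00", "$917,500.00", "$3,124,928.00"],
        "% Overall Passing": ["91.33%", "90.60%", "54.64%"],
    },
    index=["Cabrera High School", "Griffin High School", "Bailey High School"],
)

# Convert the formatted columns, then average per school type
numeric = formatted.assign(**{
    col: to_number(formatted[col]) for col in ["Total School Budget", "% Overall Passing"]
})
result = numeric.groupby("School Type").mean()
print(result)
```

Converting back from two-decimal strings averages already-rounded values, so results can differ slightly from averaging the unrounded numbers. Keeping the numeric Series is the more precise option.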
