# Pandas Analysis: NYC Public School SAT Performance

This project analyzes NYC public school SAT data with **Pandas** to answer three questions: which schools have the strongest math results, which ten schools have the highest combined SAT scores, and which borough shows the greatest variation in combined scores. The analysis relies on **data filtering, aggregation, and summary statistics**.

In [1]:
# Re-run this cell 
import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")

# Preview the data
schools.head()

# Schools whose average math score is at least 80% of the 800-point maximum, best first
filtered_schools = schools[schools['average_math'] >= 0.8 * 800]
best_math_schools = filtered_schools[['school_name', 'average_math']].sort_values(by='average_math', ascending=False)

In [2]:
best_math_schools.head()

| | school_name | average_math |
|---|---|---|
| 88 | Stuyvesant High School | 754 |
| 170 | Bronx High School of Science | 714 |
| 93 | Staten Island Technical High School | 711 |
| 365 | Queens High School for the Sciences at York Co... | 701 |
| 68 | High School for Mathematics, Science, and Engi... | 683 |
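
The math filter above keeps schools scoring at least 80% of the 800-point section maximum. The sketch below shows the same filter on a small made-up table, with the threshold given a name. The toy data, `MAX_SECTION_SCORE`, and `threshold` are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data; the real analysis reads schools.csv.
toy = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "average_math": [754, 650, 639, 500],
})

MAX_SECTION_SCORE = 800                 # maximum score for one SAT section
threshold = 0.8 * MAX_SECTION_SCORE     # 80% of the maximum

# Keep rows at or above the threshold, then sort best first.
best = (
    toy.loc[toy["average_math"] >= threshold, ["school_name", "average_math"]]
    .sort_values("average_math", ascending=False)
)
print(best)
```

Naming the threshold makes it easier to see that the cutoff is a score (640 here), not a share of schools.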


In [4]:
best_math_schools.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 88 to 45
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   school_name   10 non-null     object
 1   average_math  10 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 240.0+ bytes


In [5]:
# Combined SAT score: sum of the three section averages
schools['total_SAT'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']

In [6]:
schools.columns

Index(['school_name', 'borough', 'building_code', 'average_math',
       'average_reading', 'average_writing', 'percent_tested', 'total_SAT'],
      dtype='object')

In [72]:
# Rank schools by combined score and keep the ten highest
top_10_schools = schools[['school_name', 'total_SAT']].sort_values(by='total_SAT', ascending=False).head(10)

In [73]:
top_10_schools.head(10)

| | school_name | total_SAT |
|---|---|---|
| 88 | Stuyvesant High School | 2144 |
| 170 | Bronx High School of Science | 2041 |
| 93 | Staten Island Technical High School | 2041 |
| 174 | High School of American Studies at Lehman College | 2013 |
| 333 | Townsend Harris High School | 1981 |
| 365 | Queens High School for the Sciences at York Co... | 1947 |
| 5 | Bard High School Early College | 1914 |
| 280 | Brooklyn Technical High School | 1896 |
| 45 | Eleanor Roosevelt High School | 1889 |
| 68 | High School for Mathematics, Science, and Engi... | 1889 |
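
The ranking contains tied scores (two schools at 2041 and two at 1889). As a possible alternative to sorting and taking the head, the sketch below uses `DataFrame.nlargest`, whose `keep="all"` option retains every row tied for the last place. The toy data and variable names are hypothetical and only illustrate the behaviour.

```python
import pandas as pd

# Hypothetical toy data with a tie for second place.
toy = pd.DataFrame({
    "school_name": ["A", "B", "C", "D", "E"],
    "total_SAT": [2144, 2041, 2041, 1889, 1700],
})

# Default keep="first": exactly two rows, so one of the tied schools is dropped.
top2 = toy.nlargest(2, "total_SAT")

# keep="all": every row tied with the last selected value is kept.
top2_with_ties = toy.nlargest(2, "total_SAT", keep="all")

print(top2)
print(top2_with_ties)
```

Whether ties should expand the list is a judgment call; a fixed-length top ten, as in the notebook, is equally defensible.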


In [74]:
# Per-borough summary of combined scores: number of schools, mean, and sample standard deviation
std_dev = schools.groupby('borough')['total_SAT'].agg(['count', 'mean', 'std']).round(2)

In [75]:
print(std_dev)

               count     mean     std
borough                              
Bronx             98  1202.72  150.39
Brooklyn         109  1230.26  154.87
Manhattan         89  1340.13  230.29
Queens            69  1345.48  195.25
Staten Island     10  1439.00  222.30


In [76]:
# Keep the borough(s) whose standard deviation equals the maximum
largest_std_dev = std_dev[std_dev['std'] == std_dev['std'].max()]

In [77]:
print(largest_std_dev)

           count     mean     std
borough                          
Manhattan     89  1340.13  230.29


In [78]:
# Give the summary columns descriptive names
largest_std_dev = largest_std_dev.rename(columns={'count': 'num_schools', 'mean': 'average_SAT', 'std': 'std_SAT'})

In [79]:
print(largest_std_dev)

           num_schools  average_SAT  std_SAT
borough                                     
Manhattan           89      1340.13   230.29


## Result
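Manhattan had the largest standard deviation in combined SAT scores: 230.29 points across 89 schools, around a mean of 1340.13. Combined scores therefore vary more from school to school in Manhattan than in any other borough. Staten Island comes close (222.30), but that figure rests on only 10 schools.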
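
The borough comparison can also be written more compactly. The sketch below is one possible variant, not the notebook's method: it uses named aggregation to produce the final column names directly and `idxmax` to pick the borough with the largest spread. The toy data and the names `stats` and `widest` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: borough X is widely spread, borough Y is tightly clustered.
toy = pd.DataFrame({
    "borough": ["X", "X", "X", "Y", "Y", "Y"],
    "total_SAT": [1000, 1200, 1400, 1190, 1200, 1210],
})

# Named aggregation yields the descriptive column names in one step.
# pandas' "std" is the sample standard deviation (ddof=1).
stats = (
    toy.groupby("borough")["total_SAT"]
    .agg(num_schools="count", average_SAT="mean", std_SAT="std")
    .round(2)
)

# idxmax returns the index label of the largest value; the double
# brackets keep the result as a one-row DataFrame.
widest = stats.loc[[stats["std_SAT"].idxmax()]]
print(widest)
```

One difference worth noting: `idxmax` returns only the first label if two boroughs tie, whereas the boolean comparison against `.max()` used in the notebook keeps every tied row.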