### Introduction

This project focuses on analyzing the SAT performance of New York City (NYC) public schools using a dataset of school performance in math, reading, and writing, where each section is scored out of 800 points. The SAT results are a crucial metric for students, schools, and policymakers, as they influence college admissions and can reflect the quality of education in various schools across NYC.

The analysis aims to answer three key questions:

- Top Math Performers: Identifying schools with the best math results, defined as schools where the average math score is at least 80% of the maximum score (640 or higher).

- Top 10 Schools by SAT: Ranking the top 10 schools based on the combined SAT score (math, reading, and writing), creating a new metric for total performance.

- Borough Variability: Determining which borough has the largest variation in total SAT scores by calculating the standard deviation and identifying the borough with the highest variability.

This project aims to inform stakeholders about school performance, helping researchers, policymakers, and parents make data-driven decisions about the education system in NYC.

In [1]:
import pandas as pd
schools = pd.read_csv("Dataset/schools.csv")

In [2]:
schools.head()

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7


In [3]:
schools.describe()

Unnamed: 0,average_math,average_reading,average_writing,percent_tested
count,375.0,375.0,375.0,355.0
mean,432.944,424.504,418.458667,64.976338
std,71.952373,61.881069,64.548599,18.747634
min,317.0,302.0,284.0,18.5
25%,386.0,386.0,382.0,50.95
50%,415.0,413.0,403.0,64.8
75%,458.5,445.0,437.5,79.6
max,754.0,697.0,693.0,100.0


In [4]:
schools.shape

(375, 7)

In [5]:
schools.columns

Index(['school_name', 'borough', 'building_code', 'average_math',
       'average_reading', 'average_writing', 'percent_tested'],
      dtype='object')
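
The `describe()` output above reports 355 non-null values for `percent_tested` against 375 rows, so 20 values in that column are missing. A minimal sketch of how such gaps can be counted per column, using a made-up three-row frame (`toy` and its values are illustrative, not taken from `schools.csv`):

```python
import pandas as pd

# Hypothetical toy frame standing in for schools.csv; values are illustrative only.
toy = pd.DataFrame({
    "school_name": ["A", "B", "C"],
    "average_math": [657, 395, 418],
    "percent_tested": [None, 78.9, 65.1],
})

# Count missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Since the three score columns are complete, the missing `percent_tested` values do not affect the score-based questions below.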

### Finding schools with the best math scores

Since the maximum score is 800, we keep the schools whose average math score is at least 80% of that maximum, i.e. an average score of 640 or higher.

In [6]:
schools_math_score = schools[schools["average_math"] >= 640]

In [7]:
schools_math_score.shape

(10, 7)
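
The 640 cutoff can also be derived from the maximum score instead of being hard-coded, which keeps the "at least 80%" definition visible in the code. A minimal sketch on made-up data; `MAX_SCORE`, `THRESHOLD_PERCENT`, and the `toy` frame are hypothetical names introduced here for illustration:

```python
import pandas as pd

MAX_SCORE = 800          # maximum score per SAT section (from the text)
THRESHOLD_PERCENT = 80   # hypothetical name for the 80% cutoff

# Illustrative values only, not taken from schools.csv.
toy = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "average_math": [754, 640, 639, 410],
})

# 800 * 80 / 100 = 640.0
cutoff = MAX_SCORE * THRESHOLD_PERCENT / 100

# ">=" keeps a school that scores exactly 640, matching "at least 80%".
best_math = toy[toy["average_math"] >= cutoff]
print(best_math)
```

With `>` instead of `>=`, school B in this toy frame would be excluded even though it meets the stated threshold.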

In [8]:
top_math_schools = schools_math_score[["school_name", "average_math"]].sort_values(by="average_math", ascending=False)


In [9]:
top_math_schools

Unnamed: 0,school_name,average_math
88,Stuyvesant High School,754
170,Bronx High School of Science,714
93,Staten Island Technical High School,711
365,Queens High School for the Sciences at York Co...,701
68,"High School for Mathematics, Science, and Engi...",683
280,Brooklyn Technical High School,682
333,Townsend Harris High School,680
174,High School of American Studies at Lehman College,669
0,"New Explorations into Science, Technology and ...",657
45,Eleanor Roosevelt High School,641


To make the result easier to read, we can reset the index so that it starts at 1 and doubles as a rank.

In [10]:
top_math_schools.reset_index(drop=True, inplace=True)
top_math_schools.index = top_math_schools.index + 1

In [11]:

top_math_schools

Unnamed: 0,school_name,average_math
1,Stuyvesant High School,754
2,Bronx High School of Science,714
3,Staten Island Technical High School,711
4,Queens High School for the Sciences at York Co...,701
5,"High School for Mathematics, Science, and Engi...",683
6,Brooklyn Technical High School,682
7,Townsend Harris High School,680
8,High School of American Studies at Lehman College,669
9,"New Explorations into Science, Technology and ...",657
10,Eleanor Roosevelt High School,641


We now have the 10 schools in the dataset that meet the threshold, sorted by average math score in descending order (the maximum possible score is 800).

To make the results easier to read, we can also express each score as a percentage of the maximum score.

In [12]:
max_score = 800
top_math_schools['success_percentage'] = (top_math_schools['average_math'] / max_score) * 100


In [13]:
top_math_schools

Unnamed: 0,school_name,average_math,success_percentage
1,Stuyvesant High School,754,94.25
2,Bronx High School of Science,714,89.25
3,Staten Island Technical High School,711,88.875
4,Queens High School for the Sciences at York Co...,701,87.625
5,"High School for Mathematics, Science, and Engi...",683,85.375
6,Brooklyn Technical High School,682,85.25
7,Townsend Harris High School,680,85.0
8,High School of American Studies at Lehman College,669,83.625
9,"New Explorations into Science, Technology and ...",657,82.125
10,Eleanor Roosevelt High School,641,80.125


### Identifying the top 10 performing schools across the three SAT sections

In [14]:
schools.columns

Index(['school_name', 'borough', 'building_code', 'average_math',
       'average_reading', 'average_writing', 'percent_tested'],
      dtype='object')

In [15]:
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]

In [16]:
schools["total_SAT"]

0      1859
1      1193
2      1261
3      1529
4      1197
       ... 
370    1086
371    1114
372    1280
373    1207
374    1716
Name: total_SAT, Length: 375, dtype: int64

In [17]:
schools.head()

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested,total_SAT
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,,1859
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9,1193
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1,1261
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9,1529
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7,1197


In [18]:
Top10_school = schools.sort_values(by='total_SAT', ascending=False).head(10)

In [19]:
Top10_school.reset_index(drop=True, inplace=True)
Top10_school.index = Top10_school.index + 1

In [20]:
Top10_school

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested,total_SAT
1,Stuyvesant High School,Manhattan,M477,754,697,693,97.4,2144
2,Bronx High School of Science,Bronx,X445,714,660,667,97.0,2041
3,Staten Island Technical High School,Staten Island,R440,711,660,670,99.7,2041
4,High School of American Studies at Lehman College,Bronx,X905,669,672,672,91.8,2013
5,Townsend Harris High School,Queens,Q515,680,640,661,97.1,1981
6,Queens High School for the Sciences at York Co...,Queens,Q774,701,621,625,97.9,1947
7,Bard High School Early College,Manhattan,M097,634,641,639,70.8,1914
8,Brooklyn Technical High School,Brooklyn,K430,682,608,606,95.5,1896
9,Eleanor Roosevelt High School,Manhattan,M855,641,617,631,86.0,1889
10,"High School for Mathematics, Science, and Engi...",Manhattan,M812,683,610,596,92.6,1889


We now have the top 10 schools ranked by combined score across the three SAT sections.
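
The ranking above contains ties (two schools at 2041 and two at 1889). `sort_values(...).head(10)` always returns exactly 10 rows, so a school tied with the 10th place would be dropped; whether such a tie exists in this dataset is not checked here. A minimal sketch of an alternative, assuming made-up data, that uses `DataFrame.nlargest` with `keep="all"` to retain every row tied at the cutoff:

```python
import pandas as pd

# Illustrative values only, not taken from schools.csv.
toy = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "total_SAT": [2144, 1889, 1889, 1200],
})

# Default behaviour: exactly n rows, ties at the cutoff broken by first occurrence.
top2_first = toy.nlargest(2, "total_SAT")

# keep="all": every row tied with the last retained value is kept as well.
top2_all = toy.nlargest(2, "total_SAT", keep="all")

print(len(top2_first), len(top2_all))
```

Comparing the two lengths is a quick way to see whether the cutoff falls inside a tie.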

### Which single borough has the largest standard deviation in the combined SAT score?

In [47]:
schools_borought = schools.groupby('borough')['total_SAT'].agg(["min", "max", "std"])


In [50]:
schools_borought.sort_values( by="std", ascending=False).round(2)

borough,min,max,std
Manhattan,1005,2144,230.29
Staten Island,1258,2041,222.3
Queens,978,1981,195.25
Brooklyn,926,1896,154.87
Bronx,924,2041,150.39


Manhattan has the largest standard deviation in combined SAT scores (230.29), which indicates that it has the greatest disparity in academic performance across its schools.
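
The borough with the largest spread can also be selected programmatically rather than read off the sorted table. A minimal sketch on made-up data (the `toy` frame and borough labels `X` and `Y` are hypothetical); note that pandas' `std` uses the sample standard deviation (`ddof=1`) by default, and that adding `count` shows how many schools each estimate is based on:

```python
import pandas as pd

# Illustrative values only, not taken from schools.csv.
toy = pd.DataFrame({
    "borough": ["X", "X", "X", "Y", "Y", "Y"],
    "total_SAT": [1000, 1500, 2000, 1400, 1500, 1600],
})

# Per-borough summary; "std" is the sample standard deviation (ddof=1).
stats = toy.groupby("borough")["total_SAT"].agg(["count", "mean", "std"])

# Label of the borough whose standard deviation is largest.
largest_std_borough = stats["std"].idxmax()

print(stats.round(2))
print(largest_std_borough)
```

Applied to the real data, the same `idxmax` call on the `std` column of `schools_borought` should return the borough at the top of the sorted table.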