<img src = "./Desktop/DATA ANALYSIS PROJECT/Project/workspace (7)/workspace/schoolbus.jpg">

Photo by [Jannis Lucas](https://unsplash.com/@jannis_lucas) on [Unsplash](https://unsplash.com).
<br>

Every year, American high school students take the SAT, a standardized test intended to measure literacy, numeracy, and writing skills. It has three sections (reading, math, and writing), each with a **maximum score of 800 points**, so the maximum combined score is 2,400. The results matter to both students and colleges because they play a pivotal role in the admissions process.

Analyzing school performance is useful to a variety of stakeholders, including policy and education professionals, researchers, government, and parents deciding which school their children should attend.

You have been provided with a dataset called `schools.csv`, which is previewed below.

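You have been tasked with answering three key questions about New York City (NYC) public school SAT performance: which schools have the best math results, which ten schools have the highest combined SAT scores, and which borough has the largest standard deviation in combined SAT score.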

In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn as sk
from sklearn.linear_model import LinearRegression, LogisticRegression
import seaborn as sns
import scipy.stats as ss

In [14]:
schools = pd.read_csv("./Desktop/DATA ANALYSIS PROJECT/Project/workspace (7)/workspace/schools.csv")

In [31]:
schools.head()

,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested,total_SAT
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,,1859
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9,1193
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1,1261
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9,1529
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7,1197


In [33]:
schools.columns

Index(['school_name', 'borough', 'building_code', 'average_math',
       'average_reading', 'average_writing', 'percent_tested', 'total_SAT'],
      dtype='object')

In [35]:
schools.shape

(375, 8)

In [39]:
schools.describe().T

,count,mean,std,min,25%,50%,75%,max
average_math,375.0,432.944,71.952373,317.0,386.0,415.0,458.5,754.0
average_reading,375.0,424.504,61.881069,302.0,386.0,413.0,445.0,697.0
average_writing,375.0,418.458667,64.548599,284.0,382.0,403.0,437.5,693.0
percent_tested,355.0,64.976338,18.747634,18.5,50.95,64.8,79.6,100.0
total_SAT,375.0,1275.906667,194.906283,924.0,1157.0,1226.0,1330.5,2144.0


In [41]:
schools.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 375 entries, 0 to 374
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   school_name      375 non-null    object 
 1   borough          375 non-null    object 
 2   building_code    375 non-null    object 
 3   average_math     375 non-null    int64  
 4   average_reading  375 non-null    int64  
 5   average_writing  375 non-null    int64  
 6   percent_tested   355 non-null    float64
 7   total_SAT        375 non-null    int64  
dtypes: float64(1), int64(4), object(3)
memory usage: 23.6+ KB
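
The `info()` output lists 355 non-null values for `percent_tested` against 375 rows, so 20 schools have no reported participation rate. A minimal sketch of how those gaps could be counted directly is given here. It assumes a small hand-built frame (`preview`, a name introduced only for this illustration) holding three columns from the five rows shown by `schools.head()`, so it runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Toy frame: values copied from the five rows shown by schools.head(),
# not the full 375-row dataset
preview = pd.DataFrame(
    {
        "building_code": ["M022", "M445", "M445", "M445", "M056"],
        "average_math": [657, 395, 418, 613, 410],
        "percent_tested": [np.nan, 78.9, 65.1, 95.9, 59.7],
    }
)

# Number of missing values in each column
missing_per_column = preview.isna().sum()
print(missing_per_column)

# Share of rows with a missing percent_tested value
missing_share = preview["percent_tested"].isna().mean()
print(f"{missing_share:.0%} of preview rows lack percent_tested")
```

Applied to the full `schools` frame, the same `isna().sum()` call should report the 20 missing `percent_tested` values implied by the `info()` output.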


In [43]:
# Count each distinct row; rows containing NaN are dropped by default
schools.value_counts()

school_name                                          borough    building_code  average_math  average_reading  average_writing  percent_tested  total_SAT
A. Philip Randolph Campus High School                Manhattan  M540           459           453              447              74.0            1359         1
Pathways College Preparatory School (College Board)  Queens     Q192           405           427              409              72.4            1241         1
Monroe Academy for Visual Arts and Design            Bronx      X420           361           354              351              46.8            1066         1
Millennium High School                               Manhattan  M824           577           560              567              94.0            1704         1
Millennium Brooklyn High School                      Brooklyn   K460           553           551              539              79.6            1643         1
                                                         

In [45]:
schools.dtypes

school_name         object
borough             object
building_code       object
average_math         int64
average_reading      int64
average_writing      int64
percent_tested     float64
total_SAT            int64
dtype: object

In [21]:
# Threshold: 80% of the 800-point maximum for a single section (640 points)
math_score_threshold = 0.8 * 800

# Keep schools whose average math score is at or above the threshold
schools_above_threshold = schools[schools["average_math"] >= math_score_threshold]

# Select the required columns and sort from highest to lowest math score
best_math_schools = schools_above_threshold[["school_name", "average_math"]].sort_values("average_math", ascending=False)

# Display the five highest-scoring schools
print(best_math_schools.head())

                                           school_name  average_math
88                              Stuyvesant High School           754
170                       Bronx High School of Science           714
93                 Staten Island Technical High School           711
365  Queens High School for the Sciences at York Co...           701
68   High School for Mathematics, Science, and Engi...           683
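
The threshold filter above can be wrapped in a small reusable function. This is only a sketch: `at_or_above` is a hypothetical helper, not part of pandas or of the original notebook, and `preview` is a toy frame built from the five rows shown by `schools.head()`.

```python
import pandas as pd


def at_or_above(df, column, fraction, max_score=800):
    """Hypothetical helper: rows where `column` >= fraction * max_score,
    sorted from highest to lowest value of `column`."""
    threshold = fraction * max_score
    return df[df[column] >= threshold].sort_values(column, ascending=False)


# Toy frame: values copied from the five rows shown by schools.head()
preview = pd.DataFrame(
    {
        "building_code": ["M022", "M445", "M445", "M445", "M056"],
        "average_math": [657, 395, 418, 613, 410],
    }
)

# 80% of 800 is 640, so only the 657-point row qualifies
strong_math = at_or_above(preview, "average_math", 0.8)
print(strong_math)
```

Because the comparison is `>=`, a school averaging exactly 640 would be included.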


In [23]:
# Combined SAT score: the sum of the three section averages
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]

In [25]:
# Ten schools with the highest combined SAT score
top_10_schools = schools[["school_name", "total_SAT"]].sort_values("total_SAT", ascending=False).head(10)

In [27]:
top_10_schools

,school_name,total_SAT
88,Stuyvesant High School,2144
170,Bronx High School of Science,2041
93,Staten Island Technical High School,2041
174,High School of American Studies at Lehman College,2013
333,Townsend Harris High School,1981
365,Queens High School for the Sciences at York Co...,1947
5,Bard High School Early College,1914
280,Brooklyn Technical High School,1896
45,Eleanor Roosevelt High School,1889
68,"High School for Mathematics, Science, and Engi...",1889
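
An alternative way to build the combined score and pick the top rows is sketched here, using `DataFrame.nlargest` in place of `sort_values(...).head(...)`. It assumes a toy frame (`preview`) with the section scores of the five rows shown by `schools.head()`, and it selects the top 3 rather than the top 10 because the toy frame has only five rows.

```python
import pandas as pd

# Toy frame: section averages copied from the five rows shown by schools.head()
preview = pd.DataFrame(
    {
        "building_code": ["M022", "M445", "M445", "M445", "M056"],
        "average_math": [657, 395, 418, 613, 410],
        "average_reading": [601, 411, 428, 453, 406],
        "average_writing": [601, 387, 415, 463, 381],
    }
)

# Row-wise sum of the three section averages
section_columns = ["average_math", "average_reading", "average_writing"]
preview["total_SAT"] = preview[section_columns].sum(axis=1)

# Three rows with the highest combined score, highest first
top_3 = preview.nlargest(3, "total_SAT")[["building_code", "total_SAT"]]
print(top_3)
```

The top-10 table above contains ties (2041 twice, 1889 twice). If ties at the cutoff matter, `nlargest` accepts `keep="all"`, which returns every row tied with the last selected value, so the result can then hold more than the requested number of rows.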


In [29]:
# Group by borough and calculate statistics
borough_stats = schools.groupby("borough")["total_SAT"].agg(["mean","std","count"]).reset_index()

# Rename columns
borough_stats.columns = ["borough", "average_SAT", "std_SAT", "num_schools"]

# Borough with the largest standard deviation, rounded to two decimals
largest_std_dev = borough_stats.sort_values("std_SAT", ascending=False).head(1).round(2)

print(largest_std_dev)

     borough  average_SAT  std_SAT  num_schools
2  Manhattan      1340.13   230.29           89
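
The group-and-rename steps above can also be written with pandas named aggregation, which assigns the output column names inside `agg` itself. The sketch below assumes a five-row toy frame (`sample`) whose borough and `total_SAT` values are copied from the `schools.value_counts()` output shown earlier, so its statistics are not those of the full dataset.

```python
import pandas as pd

# Toy frame: borough and total_SAT copied from the value_counts() output above
sample = pd.DataFrame(
    {
        "borough": ["Manhattan", "Queens", "Bronx", "Manhattan", "Brooklyn"],
        "total_SAT": [1359, 1241, 1066, 1704, 1643],
    }
)

# Named aggregation: output_name=(input_column, aggregation)
borough_stats = sample.groupby("borough").agg(
    average_SAT=("total_SAT", "mean"),
    std_SAT=("total_SAT", "std"),  # sample standard deviation (ddof=1)
    num_schools=("total_SAT", "count"),
)
print(borough_stats.round(2))

# idxmax skips NaN, which is what std yields for one-school groups
most_variable_borough = borough_stats["std_SAT"].idxmax()
print(most_variable_borough)
```

Note that pandas computes `std` with `ddof=1` by default, so a borough with a single school gets `NaN` rather than 0; `idxmax` ignores those entries when picking the borough with the largest spread.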
