![New York City schoolbus](schoolbus.jpg)

Photo by [Jannis Lucas](https://unsplash.com/@jannis_lucas) on [Unsplash](https://unsplash.com).
<br>

Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections (reading, math, and writing), each with a **maximum score of 800 points**. These tests matter to both students and colleges because they play a pivotal role in the admissions process.

Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend. 

You have been provided with a dataset called `schools.csv`, which is previewed below.

You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.

In [63]:
# Re-run this cell 
import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")


First, I selected the schools whose average math score is at least 80% of the maximum possible score of 800, that is, at least 640. The result is the `best_math_schools` DataFrame, which keeps only the `school_name` and `average_math` columns, sorted by `average_math` in descending order.

In [64]:
# Schools with the best math results (at least 80% of the 800-point maximum)
best_math_schools_aux = schools[schools["average_math"] >= 800*0.8]
best_math_schools = best_math_schools_aux[["school_name", "average_math"]].sort_values("average_math", ascending=False)
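
As a quick sanity check of the filter-and-sort step, here is a minimal sketch on a toy DataFrame. The school names and scores are hypothetical and are not rows from `schools.csv`; the constant `MAX_SECTION_SCORE` is a name introduced only for this illustration.

```python
import pandas as pd

# Hypothetical toy data; these are NOT rows from schools.csv.
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [655, 640, 639, 700],
})

MAX_SECTION_SCORE = 800               # maximum score per SAT section
threshold = MAX_SECTION_SCORE * 0.8   # 80% of the maximum, i.e. 640.0

# Keep rows at or above the threshold, select the two columns, sort descending.
toy_best = (
    toy.loc[toy["average_math"] >= threshold, ["school_name", "average_math"]]
    .sort_values("average_math", ascending=False)
)
print(toy_best)
```

Because the comparison is `>=`, a school scoring exactly 640 (School B here) is kept, while one scoring 639 is dropped.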

I created a column called `total_SAT`, which stores the sum of the values from the columns `average_math`, `average_reading`, and `average_writing` for each row. Then, I created the `top_10_schools` DataFrame, which stores the columns `school_name` and `total_SAT`, sorted in descending order by `total_SAT` and limited to the first 10 rows.

In [65]:
# Top 10 performing schools by combined SAT score
schools["total_SAT"] = schools[["average_math", "average_reading", "average_writing"]].sum(axis=1)
top_10_schools = schools[["school_name", "total_SAT"]].sort_values("total_SAT", ascending=False).head(10)
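
One detail worth knowing about the row-wise sum: by default, pandas skips missing values in `sum`, so a row with a missing section score would get a lower total rather than a missing one. Whether `schools.csv` contains missing values in these three columns is not stated here, so the sketch below uses hypothetical toy data to show the behavior, along with the `min_count` argument as one possible stricter alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; School B is missing its reading score.
toy = pd.DataFrame({
    "school_name": ["School A", "School B"],
    "average_math": [600, 500],
    "average_reading": [580, np.nan],
    "average_writing": [570, 480],
})
cols = ["average_math", "average_reading", "average_writing"]

# Default behavior: NaN is skipped, so School B's total covers only two sections.
lenient_total = toy[cols].sum(axis=1)

# Stricter variant: require all three scores, otherwise the total is NaN.
strict_total = toy[cols].sum(axis=1, min_count=len(cols))

print(lenient_total.tolist())
print(strict_total.tolist())
```

If the three score columns have no missing values, both variants give the same totals.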

I grouped the data by `borough` and, for the `total_SAT` column of each group, calculated the mean (stored in `average_SAT`), the standard deviation (stored in `std_SAT`), and the number of schools (stored in `num_schools`). All values were rounded to 2 decimal places.

I then identified the borough with the highest standard deviation and created the DataFrame `largest_std_dev`, which contains that borough's row.

In [None]:
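# Per-borough summary of total_SAT: mean, standard deviation, and school count
borough = schools.groupby("borough")["total_SAT"].agg(
    average_SAT="mean", std_SAT="std", num_schools="count"
).round(2)

# Index label (borough name) of the row with the largest standard deviation
borough_with_highest_std_dev = borough["std_SAT"].idxmax()

# Keep that borough's row as a one-row DataFrame
largest_std_dev = borough.loc[borough.index == borough_with_highest_std_dev]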
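
To see how the named aggregation and `idxmax` fit together, here is a minimal sketch on hypothetical toy data (the borough labels and totals are invented for illustration). Note that the pandas `"std"` aggregation computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical toy data; NOT taken from schools.csv.
toy = pd.DataFrame({
    "borough": ["Borough X", "Borough X", "Borough Y", "Borough Y", "Borough Y"],
    "total_SAT": [1000, 1200, 1100, 1100, 1100],
})

# Named aggregation: each keyword becomes an output column.
summary = toy.groupby("borough")["total_SAT"].agg(
    average_SAT="mean", std_SAT="std", num_schools="count"
).round(2)

# idxmax returns the index label (the borough name), not a row position.
top_label = summary["std_SAT"].idxmax()

# Passing a one-element list to .loc keeps the result a DataFrame.
toy_largest = summary.loc[[top_label]]
print(toy_largest)
```

Selecting with `summary.loc[[top_label]]` is an alternative to the boolean mask on the index; both return a one-row DataFrame when the index labels are unique.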
