# DataCamp Associate Data Scientist
Project #2 - NYC SAT results

Every year, American high school students take the SAT, a standardized test intended to measure literacy, numeracy, and writing skills. There are three sections (reading, math, and writing), each with a maximum score of 800 points. These tests are important for both students and colleges, as they play a pivotal role in the admissions process.

Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.

You have been provided with a dataset called schools.csv. The `schools.head()` call in the code below previews its first five rows.

You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.

---

1. Which NYC schools have the best math results?

   The best math results are at least 80% of the *maximum possible score of 800* for math. That means an average math score of at least 0.8 × 800 = 640 points.
   Save your results in a pandas DataFrame called best_math_schools. Include the "school_name" and "average_math" columns, and sort by "average_math" in descending order.

2. What are the top 10 performing schools based on the combined SAT score (math + reading + writing)?

   Save your results as a pandas DataFrame called top_10_schools. It should contain the "school_name" column and a new column named "total_SAT", with results ordered by "total_SAT" in descending order.

3. Which single borough has the largest standard deviation in the combined SAT score?

   Save your results as a pandas DataFrame called largest_std_dev. The DataFrame should contain one row, with:

   - "borough": the name of the NYC borough with the largest standard deviation of "total_SAT".
   - "num_schools": the number of schools in the borough.
   - "average_SAT": the mean of "total_SAT".
   - "std_SAT": the standard deviation of "total_SAT".

   Round all numeric values to two decimal places.
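
The first two requirements can be sanity-checked on a tiny, made-up DataFrame. The school names and scores below are invented and do not come from schools.csv.

The sketch derives the 640-point threshold from the 80% rule. It also uses `DataFrame.nlargest`, one alternative to `sort_values(...).head(10)`. Only 2 rows are taken here because the toy frame has 4 schools.

```python
import pandas as pd

# Hypothetical toy data standing in for schools.csv (all values invented)
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [700, 500, 640, 600],
    "average_reading": [650, 480, 600, 590],
    "average_writing": [660, 470, 610, 580],
})

# Q1: "best" means at least 80% of the 800-point maximum
MAX_SCORE = 800
threshold = 0.8 * MAX_SCORE  # 640.0
best_math_schools = (
    toy.loc[toy["average_math"] >= threshold, ["school_name", "average_math"]]
    .sort_values("average_math", ascending=False)
)

# Q2: combined score across the three sections, then the highest-scoring rows
toy["total_SAT"] = toy["average_math"] + toy["average_reading"] + toy["average_writing"]
top_schools = toy.nlargest(2, "total_SAT")[["school_name", "total_SAT"]]

print(best_math_schools)
print(top_schools)
```

The threshold comparison uses `>=`, so a school scoring exactly 640 (School C here) counts as "best".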

---

Solution

```python
# Re-run this cell
import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")

# Preview the data
schools.head()

# Start coding here...

# Which schools are best for math? (at least 80% of the 800-point maximum, i.e. 640)
best_math_schools = (
    schools[schools["average_math"] >= 0.8 * 800][["school_name", "average_math"]]
    .sort_values("average_math", ascending=False)
)

# Calculate total_SAT per school
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]

# Who are the top 10 performing schools?
top_10_schools = schools.sort_values("total_SAT", ascending=False)[["school_name", "total_SAT"]].head(10)

# Which NYC borough has the highest standard deviation for total_SAT?
boroughs = schools.groupby("borough")["total_SAT"].agg(["count", "mean", "std"]).round(2)

# Keep only the borough with the maximum std
largest_std_dev = boroughs[boroughs["std"] == boroughs["std"].max()]

# Rename the columns to match the required output
largest_std_dev = largest_std_dev.rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})

# Move borough from the index into a column (required: the output needs a "borough" column)
largest_std_dev = largest_std_dev.reset_index()
```
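
There is one subtlety in the solution above. Filtering with `boroughs["std"] == boroughs["std"].max()` keeps every borough that ties for the maximum, which is more likely after rounding to two decimals. The result could therefore have more than one row.

`Series.idxmax` returns the first label holding the maximum, so selecting with it guarantees exactly one row. The per-borough statistics below are made-up numbers chosen to force a tie.

```python
import pandas as pd

# Hypothetical per-borough statistics with a deliberate tie in "std"
boroughs = pd.DataFrame(
    {"count": [10, 20, 30], "mean": [1200.5, 1300.25, 1250.0], "std": [210.5, 210.5, 150.0]},
    index=pd.Index(["Manhattan", "Bronx", "Queens"], name="borough"),
)

# Equality filter: keeps every borough tied for the maximum
tied = boroughs[boroughs["std"] == boroughs["std"].max()]

# idxmax: first label with the maximum, so the result is exactly one row
single = (
    boroughs.loc[[boroughs["std"].idxmax()]]
    .rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})
    .reset_index()
)
print(len(tied), len(single))
```

The double brackets in `.loc[[...]]` keep the result a DataFrame rather than collapsing it to a Series.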

---

My code

```python
# Re-run this cell
import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")

# Preview the data
schools.head()

# Start coding here...
# Add as many cells as you like...

######### 1 #################
# Schools whose average math score is at least 80% of 800 (i.e. 640)
best_math = schools[schools["average_math"] >= 0.8 * 800][["school_name", "average_math"]]
best_math_schools = best_math.sort_values("average_math", ascending=False)


######### 2 #################
# Combined SAT score = math + reading + writing
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]
top_schools = schools.sort_values("total_SAT", ascending=False)
top_10_schools = top_schools[["school_name", "total_SAT"]].head(10)


######### 3 #################
group_borough = schools.groupby("borough")

# Per-borough statistics of total_SAT
std = group_borough["total_SAT"].std()
mean = group_borough["total_SAT"].mean()
num_schools = group_borough.size()  # number of rows (schools) per borough

# Label of the borough with the largest standard deviation
max_std_borough = std.idxmax()

# Create a one-row DataFrame with the required information
largest_std_dev = pd.DataFrame({
    "borough": [max_std_borough],
    "num_schools": [num_schools[max_std_borough]],
    "average_SAT": [mean[max_std_borough]],
    "std_SAT": [std[max_std_borough]],
})
largest_std_dev = largest_std_dev.round(2)
largest_std_dev
```
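
My code runs three separate aggregations on `group_borough`. These could be collapsed into one step with pandas named aggregation (`agg(new_name=(column, func))`), which also gives the output columns their final names directly.

The data below is invented. It also shows where `count` (used in the solution) and `size()` (used in my code) can differ:

- `size()` counts every row in a group.
- `count` skips missing `total_SAT` values, which would appear if a school were missing a section score.

On data with no missing totals, the two agree.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one Bronx school has a missing total_SAT
toy = pd.DataFrame({
    "borough": ["Bronx", "Bronx", "Bronx", "Queens", "Queens"],
    "total_SAT": [1200.0, 1500.0, np.nan, 1400.0, 1420.0],
})

grouped = toy.groupby("borough")

# Named aggregation: each keyword becomes an output column name
stats = grouped.agg(
    num_schools=("total_SAT", "count"),  # non-missing values only
    average_SAT=("total_SAT", "mean"),
    std_SAT=("total_SAT", "std"),        # sample std (ddof=1)
)

# Borough with the largest std as a one-row DataFrame, rounded to 2 decimals
largest_std_dev = stats.loc[[stats["std_SAT"].idxmax()]].reset_index().round(2)

sizes = grouped.size()  # counts all rows, including the NaN one
print(largest_std_dev)
print(sizes)
```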