In [None]:
import plotly.io as pio
pio.renderers.default = "notebook_connected+plotly_mimetype"

# Project 2
By Aileen Yang cy2830

Dataset 1: Math State Test Results for students in Grades 3–8 at the school district level. <br>
Data Source: https://infohub.nyced.org/reports/academics/test-results <br> <br>
Dataset 2: Demographic Snapshot, which provides data on annual enrollment at the citywide, borough, district, and school levels. <br>
Data Source: https://infohub.nyced.org/reports/students-and-schools/school-quality/information-and-data-overview

## Introduction
In this project, I wish to explore how economic disadvantage relates to student math performance across New York City school districts.

I have combined two public datasets from the NYC Public Schools InfoHub:

1. **Math State Test Results (Grades 3–8)** at the school district level  <br>
2. **Demographic Snapshot of Students**, which includes each district’s Economic Need Index (ENI), an estimate of the percentage of students facing economic hardship. A student's ENI is 1.0 if they meet either of these criteria: <br>
- Eligible for NYC public assistance (HRA)
- Has lived in temporary housing within the last four years 

If they do not meet any of these criteria, their ENI will be determined by the poverty rate of families in that student's census tract. 
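
The rule above can be sketched in a few lines. This is only an illustration of the definition as described here, not the official calculation: the function name `student_eni`, the sample students, and the step of averaging student values into a district figure are all assumptions made for the example.

```python
from statistics import mean

def student_eni(hra_eligible: bool, temp_housing: bool, tract_poverty_rate: float) -> float:
    """Return one student's ENI under the rule described above (illustrative only)."""
    if hra_eligible or temp_housing:
        return 1.0
    # Otherwise fall back to the poverty rate of the student's census tract
    return tract_poverty_rate

# Hypothetical students: (HRA eligible, temporary housing, census-tract poverty rate)
students = [
    (True, False, 0.20),
    (False, True, 0.10),
    (False, False, 0.30),
    (False, False, 0.50),
]

# Assumption: a district-level ENI is the average of its students' values
district_eni = mean(student_eni(*s) for s in students)
print(f"District ENI: {district_eni:.1%}")
```

In this toy case, two of the four students are assigned 1.0 regardless of their tract, which is how ENI picks up hardship that a tract-level poverty rate alone would miss.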

ENI therefore captures more than the income-based poverty rate alone: it also reflects housing instability, reliance on public assistance, and neighborhood poverty. This makes it a meaningful index for understanding the systemic barriers students face.

Hypothesis:
Districts with higher Economic Need Index tend to have lower math proficiency rates. 

The relationship between ENI and the percentage of students scoring at Level 3 or 4 on the math state tests (i.e., meeting or exceeding proficiency) will be explored.
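
As a quick sanity check on the outcome measure, the Level 3+4 percentage can be reproduced from the counts in the test results file. The sketch below uses the first rows of that file (displayed further down in this notebook); the helper name `pct_level_3_4` is hypothetical, and rounding to one decimal place is an assumption based on how the published values look.

```python
def pct_level_3_4(n_level_3: int, n_level_4: int, n_tested: int) -> float:
    """Share of tested students scoring Level 3 or 4, as a percentage."""
    return round(100 * (n_level_3 + n_level_4) / n_tested, 1)

# District 1, Grade 3, 2025, Asian: 34 at Level 3, 74 at Level 4, 119 tested
asian = pct_level_3_4(34, 74, 119)
# District 1, Grade 3, 2025, Black: 23 at Level 3, 7 at Level 4, 63 tested
black = pct_level_3_4(23, 7, 63)

print(asian, black)  # 90.8 47.6, matching the '% Level 3+4' column
```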

In [None]:
import pandas as pd 
# Importing first dataset that contains district and test results
test_result = pd.read_csv('district-math-results-2018-2025-public.csv')

# Importing second dataset that contains demographic snapshots
demo = pd.read_csv('demographic-snapshot-2020-21-to-2024-25-public.csv')

In [4]:
test_result.head()

Unnamed: 0,District,Grade,Year,Category,Number Tested,Mean Scale Score,# Level 1,% Level 1,# Level 2,% Level 2,# Level 3,% Level 3,# Level 4,% Level 4,# Level 3+4,% Level 3+4
0,1,3,2025,Asian,119,483,4,3.4,7,5.9,34,28.6,74,62.2,108,90.8
1,1,3,2025,Black,63,447,12,19.0,21,33.3,23,36.5,7,11.1,30,47.6
2,1,3,2025,Hispanic,221,450,38,17.2,73,33.0,80,36.2,30,13.6,110,49.8
3,1,3,2025,Multi-Racial,46,s,s,s,s,s,s,s,s,s,s,s
4,1,3,2025,Native American,3,s,s,s,s,s,s,s,s,s,s,s


In [3]:
# Standardize Year: take the two-digit ending year from the school-year string (characters 5-6) and convert it to a four-digit year
demo['Year'] = demo['Year'].str.slice(5, 7).astype(int) + 2000
demo.head()

Unnamed: 0,Administrative District,Year,Total Enrollment,Grade 3K,Grade PK (Half Day & Full Day),Grade K,Grade 1,Grade 2,Grade 3,Grade 4,...,% White,# Missing Race/Ethnicity Data,% Missing Race/Ethnicity Data,# Students with Disabilities,% Students with Disabilities,# English Language Learners,% English Language Learners,# Poverty,% Poverty,Economic Need Index
0,1,2021,11021,209,623,728,762,746,787,775,...,18.1%,101,0.9%,2501,22.7%,919,8.3%,7087,64.3%,67.0%
1,1,2022,10327,347,524,685,701,686,680,707,...,18.0%,29,0.3%,2440,23.6%,813,7.9%,6509,63.0%,65.5%
2,1,2023,9910,364,545,612,678,664,684,662,...,17.9%,78,0.8%,2406,24.3%,776,7.8%,6344,64.0%,67.3%
3,1,2024,10302,386,560,668,704,699,737,749,...,17.1%,37,0.4%,2396,23.3%,1225,11.9%,6961,67.6%,69.0%
4,1,2025,9827,314,459,619,659,668,689,675,...,17.3%,37,0.4%,2284,23.2%,1102,11.2%,6498,66.1%,67.9%


In [None]:
# Merge the two datasets by mapping the District and Year columns
merged = pd.merge(
    test_result,
    demo,
    left_on=['District', 'Year'],
    right_on=['Administrative District', 'Year'],
    how='left'
)
merged.head()

Unnamed: 0,District,Grade,Year,Category,Number Tested,Mean Scale Score,# Level 1,% Level 1,# Level 2,% Level 2,...,% White,# Missing Race/Ethnicity Data,% Missing Race/Ethnicity Data,# Students with Disabilities,% Students with Disabilities,# English Language Learners,% English Language Learners,# Poverty,% Poverty,Economic Need Index
0,1,3,2025,Asian,119,483,4,3.4,7,5.9,...,17.3%,37,0.4%,2284,23.2%,1102,11.2%,6498,66.1%,67.9%
1,1,3,2025,Black,63,447,12,19.0,21,33.3,...,17.3%,37,0.4%,2284,23.2%,1102,11.2%,6498,66.1%,67.9%
2,1,3,2025,Hispanic,221,450,38,17.2,73,33.0,...,17.3%,37,0.4%,2284,23.2%,1102,11.2%,6498,66.1%,67.9%
3,1,3,2025,Multi-Racial,46,s,s,s,s,s,...,17.3%,37,0.4%,2284,23.2%,1102,11.2%,6498,66.1%,67.9%
4,1,3,2025,Native American,3,s,s,s,s,s,...,17.3%,37,0.4%,2284,23.2%,1102,11.2%,6498,66.1%,67.9%
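
Because this is a left join, any test-result row without a matching district and year in the demographic file keeps its test data but gets a missing ENI. The file names suggest the two sources may cover different year ranges, so it can be worth checking how many rows fail to match. The sketch below shows one way to do that with the `indicator` option of `pd.merge`; the two small frames are made-up stand-ins, not the real data.

```python
import pandas as pd

# Toy stand-ins for test_result and demo (values are illustrative only)
tests = pd.DataFrame({
    'District': [1, 1, 2],
    'Year': [2019, 2025, 2025],
    '% Level 3+4': [40.0, 50.0, 60.0],
})
demo_toy = pd.DataFrame({
    'Administrative District': [1, 2],
    'Year': [2025, 2025],
    'Economic Need Index': ['67.9%', '55.0%'],
})

# indicator=True adds a '_merge' column recording where each row came from
checked = pd.merge(
    tests,
    demo_toy,
    left_on=['District', 'Year'],
    right_on=['Administrative District', 'Year'],
    how='left',
    indicator=True,
)

# Rows marked 'left_only' had no demographic match, so their ENI is NaN
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['District', 'Year']])
```

Rows flagged this way are the ones that later drop out of the plot once missing ENI values are removed.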


We can now examine how economic need and math achievement intersect. 

**x-axis:** Economic Need Index (percentage of economically disadvantaged students) <br>
**y-axis:** Percentage of students scoring Level 3 or 4 on the state math exam (proficient or above)


In [28]:
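# Clean the Economic Need Index column: strip '%' and ',' and convert to numeric.
# Anything that cannot be parsed (including missing values) becomes NaN.
merged['Economic Need Index'] = pd.to_numeric(
    merged['Economic Need Index']
    .astype(str)
    .str.replace('%', '', regex=False)
    .str.replace(',', '', regex=False)
    .str.strip(),
    errors='coerce'
)
# Drop rows without a usable ENI value. This must come after the numeric conversion:
# astype(str) turns NaN into the string 'nan', which dropna would not catch.
merged = merged.dropna(subset=['Economic Need Index']).copy()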

In [29]:
# Clean the Level 3+4 percentage (share of students at or above proficiency): strip '%' and convert to numeric; non-numeric entries such as 's' become NaN
merged['% Level 3+4'] = (
    merged['% Level 3+4']
    .astype(str)
    .str.replace('%', '', regex=False)
    .str.strip()
)
merged['% Level 3+4'] = pd.to_numeric(merged['% Level 3+4'], errors='coerce')

In [30]:
import plotly.express as px

plot_df = merged.dropna(subset=['Economic Need Index', '% Level 3+4'])

fig = px.scatter(
    plot_df,
    x='Economic Need Index',
    y='% Level 3+4',
    trendline='ols',
    opacity=0.7,
    title="Relationship Between Economic Need Index and Math Proficiency",
    labels={
        'Economic Need Index': 'Economic Need Index (%)',
        '% Level 3+4': 'Math Proficiency (% Level 3+4)'
    }
)

fig.update_layout(
    template='plotly_white',
    title_x=0.5
)

fig.show()

## Graphic Interpretation
The resulting scatterplot shows a negative relationship between a school district's ENI and its math proficiency rate: as ENI increases, the percentage of students scoring Level 3 or 4 on the math exam tends to decline. In other words, districts where more students face structural problems such as economic hardship, housing instability, or reliance on public assistance tend to have lower math proficiency, which is consistent with the hypothesis.

This downward trend is also captured by the regression line, which slopes downward across the plot.
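
To put a number on a downward trend like this, one can fit a least-squares line and compute the correlation coefficient. The sketch below does so with NumPy on made-up values, so the printed figures say nothing about the real data; it only shows the calculation.

```python
import numpy as np

# Illustrative (made-up) ENI and proficiency percentages, not the real data
eni = np.array([40.0, 55.0, 65.0, 75.0, 85.0, 90.0])
prof = np.array([78.0, 66.0, 60.0, 48.0, 41.0, 35.0])

# Degree-1 polynomial fit = ordinary least-squares line: prof ≈ slope * eni + intercept
slope, intercept = np.polyfit(eni, prof, deg=1)

# Pearson correlation: sign gives the direction, magnitude the strength
r = np.corrcoef(eni, prof)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.1f}, r={r:.2f}")
```

A negative slope reads as "each additional ENI point is associated with this many fewer percentage points of proficiency". For the actual figure, `px.get_trendline_results(fig)` should return the fitted model behind the `trendline='ols'` line, which would give the real slope and fit statistics.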

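However, the vertical spread of points is wide, which implies that points with similar ENI values can have very different proficiency outcomes. Each point in the plot is one district, year, grade, and student category, and all of a district's points in a given year share the same district-level ENI, so part of this spread reflects differences between grades and student groups within a district.

For example, even among points with an ENI above 85%, some reach proficiency rates above 70%, while others fall below 30%.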
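
One way to check a claim like this is to group the points into ENI bands and compare the range of proficiency within each band. The sketch below does this on a small made-up frame; the band edges and all values are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up points with the same column names as the merged data
toy = pd.DataFrame({
    'Economic Need Index': [45.0, 50.0, 72.0, 78.0, 88.0, 90.0, 92.0],
    '% Level 3+4': [80.0, 70.0, 55.0, 35.0, 72.0, 28.0, 45.0],
})

# Assign each point to an ENI band; intervals are (0, 60], (60, 85], (85, 100]
toy['ENI band'] = pd.cut(
    toy['Economic Need Index'],
    bins=[0, 60, 85, 100],
    labels=['0-60', '60-85', '85-100'],
)

# Minimum, maximum, and standard deviation of proficiency within each band
spread = (
    toy.groupby('ENI band', observed=True)['% Level 3+4']
    .agg(['min', 'max', 'std'])
)
print(spread)
```

A large gap between `min` and `max` within a single band is the tabular counterpart of the wide vertical spread seen in the scatterplot.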

While economic need is an important factor related to academic performance for young children, it is not the sole determinant. Other factors such as teaching quality, community supports, and district-level policies may also influence the result.

## Possible Policy Implication
Districts with high ENI scores tend to show lower math proficiency, which suggests that the local government should prioritize targeted, sustained resources, such as tutoring, smaller class sizes, and expanded student support services, in these districts.

## Summary
The visualization shows that structural economic hardship is associated with lower district-level math performance in NYC, although outcomes vary widely at any given level of need. This makes ENI a useful lens for understanding and addressing educational inequality across the city.