# Day 3: SQL via Python: NYC School Data Exploration

**Author:** Alexander Kuhn

**Date:** January 15, 2026

# This notebook shows:

1.  Installing the required package
2.  Database connection setup: importing libraries and previewing a table
3.  School Distribution: How many schools are there in each borough?
4.  Language Learners: What is the average percentage of English Language Learners (ELL) per borough?
5.  Schools supporting special needs

## 1. Installing the required package

In [53]:
%pip install psycopg2-binary

Note: you may need to restart the kernel to use updated packages.


## 2. Database connection setup: importing libraries and previewing a table

In [54]:
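import os

import pandas as pd
from sqlalchemy import create_engine

DATABASE = 'neondb'
USERNAME = 'neondb_owner'
# Read the password from the environment instead of hard-coding it in the notebook.
# NEON_DB_PASSWORD is an assumed variable name; set it before starting the kernel.
PASSWORD = os.environ['NEON_DB_PASSWORD']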
HOST = 'ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech'
PORT = '5432'

DATABASE_URL = f'postgresql+psycopg2://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}?sslmode=require'

engine = create_engine(DATABASE_URL)

df = pd.read_sql("SELECT * FROM nyc_schools.high_school_directory LIMIT 5;", engine)
df.head()

|  | dbn | school_name | borough | building_code | phone_number | fax_number | grade_span_min | grade_span_max | expgrade_span_min | expgrade_span_max | ... | number_programs | Location 1 | Community Board | Council District | Census Tract | Zip Codes | Community Districts | Borough Boundaries | City Council Districts | Police Precincts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27Q260 | Frederick Douglass Academy VI High School | Queens | Q465 | 718-471-2154 | 718-471-2890 | 9.0 | 12 |  |  | ... | 1 | {'latitude': '40.601989336', 'longitude': '-73... | 14 | 31 | 100802 | 20529 | 51 | 3 | 47 | 59 |
| 1 | 21K559 | Life Academy High School for Film and Music | Brooklyn | K400 | 718-333-7750 | 718-333-7775 | 9.0 | 12 |  |  | ... | 1 | {'latitude': '40.593593811', 'longitude': '-73... | 13 | 47 | 306 | 17616 | 21 | 2 | 45 | 35 |
| 2 | 16K393 | Frederick Douglass Academy IV Secondary School | Brooklyn | K026 | 718-574-2820 | 718-574-2821 | 9.0 | 12 |  |  | ... | 1 | {'latitude': '40.692133704', 'longitude': '-73... | 3 | 36 | 291 | 18181 | 69 | 2 | 49 | 52 |
| 3 | 08X305 | Pablo Neruda Academy | Bronx | X450 | 718-824-1682 | 718-824-1663 | 9.0 | 12 |  |  | ... | 1 | {'latitude': '40.822303765', 'longitude': '-73... | 9 | 18 | 16 | 11611 | 58 | 5 | 31 | 26 |
| 4 | 03M485 | Fiorello H. LaGuardia High School of Music & A... | Manhattan | M485 | 212-496-0700 | 212-724-5748 | 9.0 | 12 |  |  | ... | 6 | {'latitude': '40.773670507', 'longitude': '-73... | 7 | 6 | 151 | 12420 | 20 | 4 | 19 | 12 |



## 3. School Distribution: How many schools are there in each borough?

In [55]:
query = """
-- Group by borough and count unique dbns (school identifier)
SELECT borough, COUNT(DISTINCT dbn) AS school_count
FROM nyc_schools.high_school_directory
GROUP BY borough
ORDER BY school_count DESC
"""

df_school_dist = pd.read_sql(query, engine)
df_school_dist

|  | borough | school_count |
|---|---|---|
| 0 | Brooklyn | 121 |
| 1 | Bronx | 118 |
| 2 | Manhattan | 106 |
| 3 | Queens | 80 |
| 4 | Staten Island | 10 |


### Answer: Brooklyn has the most schools (121), followed by the Bronx (118), Manhattan (106), and Queens (80). Staten Island has only 10 schools.
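
To put these counts in proportion, each borough's share of the total can be computed directly from the query output. The following is a minimal sketch that uses only the counts shown in the table above; the variable names are illustrative and not part of the original notebook.

```python
# Counts copied from the query output above.
school_counts = {
    "Brooklyn": 121,
    "Bronx": 118,
    "Manhattan": 106,
    "Queens": 80,
    "Staten Island": 10,
}

# Total number of distinct schools across all boroughs.
total = sum(school_counts.values())

# Share of all listed schools per borough, in percent.
shares = {borough: 100 * count / total for borough, count in school_counts.items()}

for borough, share in shares.items():
    print(f"{borough}: {share:.1f}%")
```

With these counts, Staten Island accounts for about 2.3% of the 435 listed schools.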

## 4. Language Learners: What is the average percentage of English Language Learners (ELL) per borough?

In [56]:
query = """
-- Group by borough and calculate average of ell_percent
SELECT  dir.borough,
        AVG(dem.ell_percent) AS avg_ell_percent
FROM nyc_schools.high_school_directory dir
LEFT JOIN nyc_schools.school_demographics dem
ON dir.dbn = dem.dbn
GROUP BY dir.borough
ORDER BY avg_ell_percent
"""

df_ell = pd.read_sql(query, engine)
df_ell['avg_ell_percent'] = df_ell['avg_ell_percent'].round(2)
df_ell

|  | borough | avg_ell_percent |
|---|---|---|
| 0 | Manhattan | 7.57 |
| 1 | Brooklyn |  |
| 2 | Queens |  |
| 3 | Staten Island |  |
| 4 | Bronx |  |


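### Answer:
#### - Schools in Manhattan average 7.57% English Language Learners
#### - The query returns no ELL average for schools in the other boroughs
#### - The most likely cause is missing data: the LEFT JOIN keeps every borough from high_school_directory, and AVG returns NULL when none of the joined rows has an ell_percent value. An empty result does not mean that those schools have no English Language Learners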
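
One way to check whether an empty average comes from missing rows is to count the matched demographics rows per borough alongside the average. The sketch below reproduces the pattern on an in-memory SQLite database so that it runs without the Neon connection. The table contents are made-up toy rows (the DBNs `A1`, `B1`, `B2` and the percentages are hypothetical), and the schema prefix `nyc_schools.` is dropped for SQLite.

```python
import sqlite3

# In-memory database with hypothetical toy rows, not taken from the real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, ell_percent REAL);
INSERT INTO high_school_directory VALUES
    ('A1', 'Manhattan'), ('B1', 'Brooklyn'), ('B2', 'Brooklyn');
INSERT INTO school_demographics VALUES
    ('A1', 5.0), ('A1', 10.0);
""")

rows = conn.execute("""
SELECT  dir.borough,
        COUNT(DISTINCT dir.dbn) AS schools,       -- schools listed in the directory
        COUNT(dem.dbn)          AS matched_rows,  -- demographics rows found by the join
        AVG(dem.ell_percent)    AS avg_ell_percent
FROM high_school_directory dir
LEFT JOIN school_demographics dem
ON dir.dbn = dem.dbn
GROUP BY dir.borough
ORDER BY dir.borough
""").fetchall()

for row in rows:
    print(row)
```

In this toy setup, Brooklyn has two schools but zero matched rows, so its average is NULL (`None` in Python). A borough with `matched_rows = 0` in the real database would point to missing demographics data rather than to an actual ELL share of zero.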

## 5. Schools supporting special needs

##### Using combined data from school_demographics and high_school_directory, rank schools within each borough and extract the top 3 by percentage of special education students.

In [57]:
query = """
SELECT  r.borough, r.dbn, r.schoolyear, r.school_name, r.sped_percent
FROM (
    -- Join high school directory and school demographics on dbn
    -- Calculate a rank for each school within their borough
    SELECT  dir.borough,
            dir.dbn,
            dem.schoolyear,
            dir.school_name,
            dem.sped_percent,
            -- Calculate the rank with a window function that partitions (groups) by borough, sorts it by sped_percent,
            -- and then assigns the row number as the rank
            row_number() OVER (PARTITION BY dir.borough ORDER BY dem.sped_percent DESC) as rank
    FROM nyc_schools.high_school_directory dir
    LEFT JOIN nyc_schools.school_demographics dem
    ON dir.dbn = dem.dbn
    WHERE dem.sped_percent IS NOT NULL
) r
-- Filter for ranks <=3 and sort, to get top 3 schools per borough
WHERE r.rank <= 3
ORDER BY r.borough, r.rank
"""

df_sn = pd.read_sql(query, engine)
df_sn

|  | borough | dbn | schoolyear | school_name | sped_percent |
|---|---|---|---|---|---|
| 0 | Manhattan | 01M450 | 20092010 | East Side Community School | 28.8 |
| 1 | Manhattan | 01M450 | 20102011 | East Side Community School | 27.7 |
| 2 | Manhattan | 01M450 | 20082009 | East Side Community School | 26.7 |


### Answer:

##### - The three highest special education rates, 26.7% to 28.8%, all belong to a single Manhattan school, East Side Community School
##### - The school appears three times because school_demographics holds one row per school year
##### - The second query below accounts for this duplication
##### - Special education data is only available for Manhattan schools

### Schools supporting special needs, averaged across all school years

##### The school_demographics table contains multi-year data, causing East Side Community School to occupy all three top positions in Manhattan. To gain broader insights into special education support across schools, we can either filter for a specific school year or aggregate the data by calculating average percentages across all years per school.

In [58]:
query = """
SELECT  r.borough, r.dbn, r.school_name, r.avg_sped_percent
FROM (
    -- Same query as above, but calculate the average sped_percent for each school
    -- by grouping, to get rid of multiple entries for the same school (caused by different school years)
    SELECT  dir.borough,
            dir.dbn,
            dir.school_name,
            AVG(dem.sped_percent) AS avg_sped_percent,
            row_number() OVER (PARTITION BY dir.borough ORDER BY AVG(dem.sped_percent) DESC) as rank
    FROM nyc_schools.high_school_directory dir
    LEFT JOIN nyc_schools.school_demographics dem
    ON dir.dbn = dem.dbn
    WHERE dem.sped_percent IS NOT NULL
    GROUP BY dir.borough, dir.dbn, dir.school_name
) r
WHERE r.rank <= 3
ORDER BY r.borough, r.rank
"""

df_sn = pd.read_sql(query, engine)
df_sn['avg_sped_percent'] = df_sn['avg_sped_percent'].round(2)
df_sn

|  | borough | dbn | school_name | avg_sped_percent |
|---|---|---|---|---|
| 0 | Manhattan | 01M450 | East Side Community School | 26.29 |
| 1 | Manhattan | 01M292 | Henry Street School for International Studies | 23.01 |
| 2 | Manhattan | 01M509 | Marta Valle High School | 22.21 |


### Answer:

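#### - After averaging across school years, East Side Community School still ranks first (26.29%), followed by Henry Street School for International Studies (23.01%) and Marta Valle High School (22.21%).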
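
The other option mentioned earlier, filtering on a single school year instead of averaging, could be sketched as follows. This is an illustration under stated assumptions, not the notebook's method: it runs on an in-memory SQLite database (version 3.25 or later, for window functions), the three East Side Community School rows are copied from the first ranking output, and the second school (`99X999`) with its values is entirely hypothetical. The school year is stored as text here; in the real table it may be an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, school_name TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, schoolyear TEXT, sped_percent REAL);
INSERT INTO high_school_directory VALUES
    ('01M450', 'East Side Community School', 'Manhattan'),
    ('99X999', 'Hypothetical High School', 'Manhattan');
INSERT INTO school_demographics VALUES
    ('01M450', '20082009', 26.7),
    ('01M450', '20092010', 28.8),
    ('01M450', '20102011', 27.7),
    ('99X999', '20092010', 30.0),
    ('99X999', '20102011', 20.0);
""")

SCHOOL_YEAR = "20102011"  # assumed choice of year, for illustration only

query = """
SELECT r.borough, r.dbn, r.school_name, r.sped_percent
FROM (
    -- Restrict to one school year first, so each school has at most one row
    SELECT dir.borough, dir.dbn, dir.school_name, dem.sped_percent,
           row_number() OVER (
               PARTITION BY dir.borough ORDER BY dem.sped_percent DESC
           ) AS rnk
    FROM high_school_directory dir
    JOIN school_demographics dem ON dir.dbn = dem.dbn
    WHERE dem.schoolyear = ? AND dem.sped_percent IS NOT NULL
) r
WHERE r.rnk <= 3
ORDER BY r.borough, r.rnk
"""

rows = conn.execute(query, (SCHOOL_YEAR,)).fetchall()
for row in rows:
    print(row)
```

Compared with averaging, this keeps the ranking tied to one point in time, at the cost of dropping schools that have no record for the chosen year.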