# Do Kids in NYC Public Housing Have Access to High Quality Schools?

###### An Aside:
When I was told I could do my final Python project on anything, I thought, "why not do a fun lil project about [squirrels in Central Park?](https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/vfnx-vebw)" But my curiosity and lifelong dedication to improving public education for students of color and those from low-income households got the better of me once again... Please enjoy reading this project that ended up being [more complicated than the New York City high school admissions lottery.](https://www.the74million.org/article/adams-the-lottery-parents-fear-the-looming-horror-of-nycs-new-school-admissions-process/)


# Introduction

It is a begrudgingly understood fact that, in most places in the United States, a child's educational opportunities are constrained by his or her family's wealth. In New York City, 95% of Black and Latinx students attend schools where a [majority of the students are low-income.](https://www.theatlantic.com/education/archive/2016/02/concentration-poverty-american-schools/471414/) Many of these students live in New York City Housing Authority (NYCHA) developments that offer ["affordable housing for low- and moderate-income New Yorkers".](https://www.nyc.gov/assets/nycha/downloads/pdf/NYCHA-Fact-Sheet_2019_08-01.pdf) The project delves into the spatial relationship between the NYCHA developments and public schools in New York City.

### Questions:

1. What is the closest high quality elementary or middle school to each NYCHA development?
2. Are high quality elementary or middle schools accessible to children who live in NYCHA developments?

### Hypothesis:

Given that the NYC schools are [economically and racially segregated](https://steinhardt.nyu.edu/research-alliance/understanding-and-addressing-segregation-nyc-schools), and residents of NYCHA are overwhelmingly [people of color and low-income](https://www.nyc.gov/assets/nycha/downloads/pdf/Resident-Data-Book-Summary-2022.pdf), I hypothesize that students who live in NYCHA developments will not live near high quality elementary or middle schools, and if they do, they will not be accessible to them.

### Datasets:

I used three datasets to prove or disprove my hypothesis.

1. [2017 School Quality Report](https://data.cityofnewyork.us/Education/2017-School-Quality-Report/cxrn-zyvb)

This dataset contains the quality ratings (as of 2017) for all NYC public schools (charters included). I used only the elementary/middle school ratings because the high schools are a lottery system. All elementary schools are zoned, meaning that students go to the school assigned to them (typically in their neighborhood); some middle schools are zoned whereas some are a lottery system. There are 1269 schools in this dataset.

2. [School Locations](https://data.cityofnewyork.us/Education/School-Point-Locations/jfju-ynrr)

This dataset contains the addresses of all NYC schools.

3. [NYCHA Development Addresses](https://data.cityofnewyork.us/Housing-Development/NYCHA-Residential-Addresses/3ub5-4ph8)

This dataset contains the addresses of all the NYCHA developments in all five boroughs. There are 3793 individual NYCHA buildings.

# Data Clean Up & Initial Fiddling with the Data

### First, I need to import important packages before I can begin coding

In [1]:
### IMPORT PACKAGES
# Pandas is key
import pandas as pd
# Sometimes necessary for mapping
import plotly.express as px
# Imported because I don't want to overwhelm the API servers
import time
# Imported to do math later on
import numpy as np
# Imported just in case it was necessary
import json

# boilerplate for allowing PDF export
import plotly.io as pio

pio.renderers.default = "notebook_connected+pdf"

### Secondly, I load both the Schools Quality dataset and the School Locations dataset

When I load the School Locations dataset, I rename the column containing the school code to "dbn" so I can merge both data frames together on that column.

##### What is a DBN?
"DBN" stands for "District Borough Number." Each school has a name but also this unique identifier that never changes.

District ➤ There are 32 school districts (plus District 75 which houses schools that support students with learning and emotional disabilities)

Borough ➤ There are 5 boroughs: Brooklyn, Bronx, Manhattan, Queens, and Staten Island

Number ➤ The number assigned to the school

###### Here's an example:
_01M015_

01 = District 1

M = Manhattan

015 = Public School #15

In [2]:
# Load schools quality report dataset
quality = pd.read_csv("./2017_School_Quality_Report -elementary middle (2).csv")
quality.head(10)

Unnamed: 0,dbn,school_name,school_type,enrollment,rating_ri_disp,rating_ct_disp,rating_se_disp,rating_es_disp,rating_sf_disp,rating_tr_disp,...,val_rating_mean_mth_6gr,n_rating_mean_mth_6gr,val_rating_mean_mth_7gr,n_rating_mean_mth_7gr,val_rating_mean_mth_8gr,n_rating_mean_mth_8gr,val_rating_mean_sci_4gr,n_rating_mean_sci_4gr,val_rating_mean_sci_8gr,n_rating_mean_sci_8gr
0,01M015,P.S. 015 Roberto Clemente,Elementary,161,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Meeting Target,Exceeding Target,...,,,,,,,3.46,23.0,,
1,01M019,P.S. 019 Asher Levy,Elementary,247,Exceeding Target,Exceeding Target,Meeting Target,Exceeding Target,Meeting Target,Meeting Target,...,,,,,,,3.86,41.0,,
2,01M020,P.S. 020 Anna Silver,Elementary,499,Not Meeting Target,Approaching Target,Approaching Target,Meeting Target,Approaching Target,Approaching Target,...,,,,,,,3.64,70.0,,
3,01M034,P.S. 034 Franklin D. Roosevelt,K-8,337,Approaching Target,Approaching Target,Approaching Target,Approaching Target,Approaching Target,Approaching Target,...,2.55,36.0,2.16,56.0,2.36,53.0,3.09,27.0,2.83,59.0
4,01M063,The STAR Academy - P.S.63,Elementary,178,Exceeding Target,Exceeding Target,Meeting Target,Exceeding Target,Meeting Target,Meeting Target,...,,,,,,,4.04,24.0,,
5,01M064,P.S. 064 Robert Simon,Elementary,226,Meeting Target,Exceeding Target,Meeting Target,Exceeding Target,Meeting Target,Exceeding Target,...,,,,,,,3.34,31.0,,
6,01M110,P.S. 110 Florence Nightingale,Elementary,351,Approaching Target,Meeting Target,Meeting Target,Meeting Target,Meeting Target,Approaching Target,...,,,,,,,4.07,75.0,,
7,01M134,P.S. 134 Henrietta Szold,Elementary,332,Meeting Target,Meeting Target,Meeting Target,Meeting Target,Meeting Target,Meeting Target,...,,,,,,,3.33,55.0,,
8,01M140,P.S. 140 Nathan Straus,K-8,364,Meeting Target,Meeting Target,Approaching Target,Meeting Target,Approaching Target,Meeting Target,...,2.32,55.0,2.24,61.0,2.19,47.0,2.88,25.0,2.59,37.0
9,01M142,P.S. 142 Amalia Castro,Elementary,322,Meeting Target,Meeting Target,Meeting Target,Exceeding Target,Meeting Target,Exceeding Target,...,,,,,,,3.39,41.0,,


In [3]:
# Load school addresses dataset
schooladdress = pd.read_csv("./public-schools-points-2011-2012a.csv")
schooladdress.head(10)

# Rename ATS_CODE,C,26 to dbn, so I can merge both dfs
schooladdress = schooladdress.rename(columns={"ATS_CODE,C,26" : 'dbn'})

# Preview df again
schooladdress.head(10)

Unnamed: 0,dbn,"BORO,C,20","BORONUM,N,19,0","LOC_CODE,C,13","SCHOOLNAME,C,61","SCH_TYPE,C,31","MANAGED_BY,N,14,0","GEO_DISTRI,N,10,0","ADMIN_DIST,N,12,0","ADDRESS,C,37","STATE_CODE,C,13","ZIP,N,10,0","PRINCIPAL,C,27","PRIN_PH,C,24","FAX,C,13","GRADES,C,70","City,C,50"
0,15K001,K,2,K001,P.S. 001 THE BERGEN,Elementary,1,15,15,309 47 STREET,NY,11220,Jennifer Eusanio,718-567-7661,718-567-9771,"PK,0K,01,02,03,04,05,SE",BROOKLYN
1,17K002,K,2,K002,M.S. 002,Junior High-Intermediate-Middle,1,17,17,655 PARKSIDE AVENUE,NY,11226,ADRIENNE SPENCER,718-462-6992,718-284-7717,"06,07,08,SE",BROOKLYN
2,21K095,K,2,K095,P.S. 095 THE GRAVESEND,K-8,1,21,21,345 VAN SICKLEN STREET,NY,11223,Janet Ndzibah,718-449-5050,718-449-3047,"PK,0K,01,02,03,04,05,06,07,08,SE",BROOKLYN
3,21K096,K,2,K096,I.S. 096 SETH LOW,Junior High-Intermediate-Middle,1,21,21,99 AVENUE P,NY,11204,Denise Sandra Levinsky,718-236-1344,718-236-2397,"06,07,08,SE",BROOKLYN
4,21K097,K,2,K097,P.S. 97 THE HIGHLAWN,Elementary,1,21,21,1855 STILLWELL AVENUE,NY,11223,KRISTINE MUSTILLO,718-372-7393,718-372-3842,"PK,0K,01,02,03,04,05,SE",BROOKLYN
5,21K098,K,2,K098,I.S. 98 BAY ACADEMY,Junior High-Intermediate-Middle,1,21,21,1401 EMMONS AVENUE,NY,11235,MARIA TIMO,718-891-9005,718-891-3865,"06,07,08,SE",BROOKLYN
6,21K099,K,2,K099,P.S. 099 ISAAC ASIMOV,K-8,1,21,21,1120 EAST 10 STREET,NY,11230,GREGORY PIRRAGLIA,718-338-9201,718-951-0418,"PK,0K,01,02,03,04,05,06,07,08,SE",BROOKLYN
7,21K100,K,2,K100,P.S. 100 THE CONEY ISLAND SCHOOL,Elementary,1,21,21,2951 WEST 3 STREET,NY,11224,Katherine A. Moloney,718-266-9477,718-266-7112,"PK,0K,01,02,03,04,05,SE",BROOKLYN
8,21K101,K,2,K101,P.S. 101 THE VERRAZANO,Elementary,1,21,21,2360 BENSON AVENUE,NY,11214,GREGG KORROL,718-372-0221,718-372-1873,"PK,0K,01,02,03,04,05,SE",BROOKLYN
9,20K102,K,2,K102,P.S. 102 THE BAYVIEW,Elementary,1,20,20,211 72 STREET,NY,11209,Ms. Theresa Dovi,718-748-7404,718-836-9265,"0K,01,02,03,04,05,SE",BROOKLYN


#### And now we merge!

1. Keep only the columns that I want
2. Make all the column headers lowercase so the data are easier to work with
3. Drop the schools when the rating columns are blanks

##### What do the rating columns mean?
A school is rated in quality on seven factors:

1. Rigorous Instruction
2. Collaborative Teachers
3. Supportive Environment
4. Effective School Leadership
5. Strong Family-Community Ties
6. Trust 
7. Student Achievement

Schools can receive a rating ranging from:

1. Not Meeting Target
2. Approaching Target
3. Meeting Target
4. Exceeding Target


In [4]:
# Merge both datasets on "dbn" column
schools = pd.merge(schooladdress, quality, on="dbn")
schools.head(10)

# Keep the columns I want
schools = schools.loc[:, ["dbn", "SCHOOLNAME,C,61", "SCH_TYPE,C,31", "ADDRESS,C,37", "STATE_CODE,C,13", "ZIP,N,10,0", "school_name", "school_type", "enrollment", "rating_ri_disp", "rating_ct_disp", "rating_se_disp", "rating_es_disp", "rating_sf_disp", "rating_tr_disp", "rating_sa_disp"]]
schools #1125 rows

# Drop schools when the rating columns are blank
schools = schools.dropna(subset=["rating_ri_disp", "rating_ct_disp", "rating_se_disp", "rating_es_disp", "rating_sf_disp", "rating_tr_disp", "rating_sa_disp"], how='all')
schools #1113 rows

# Make all column headers lowercase
schools.columns = schools.columns.str.lower()
schools

Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,rating_ct_disp,rating_se_disp,rating_es_disp,rating_sf_disp,rating_tr_disp,rating_sa_disp
0,15K001,P.S. 001 THE BERGEN,Elementary,309 47 STREET,NY,11220,P.S. 001 The Bergen,Elementary,1207,Approaching Target,Approaching Target,Meeting Target,Approaching Target,Meeting Target,Approaching Target,Meeting Target
1,17K002,M.S. 002,Junior High-Intermediate-Middle,655 PARKSIDE AVENUE,NY,11226,Parkside Preparatory Academy,Middle,495,Meeting Target,Meeting Target,Exceeding Target,Exceeding Target,Meeting Target,Meeting Target,Meeting Target
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Meeting Target
3,21K096,I.S. 096 SETH LOW,Junior High-Intermediate-Middle,99 AVENUE P,NY,11204,I.S. 096 Seth Low,Middle,723,Meeting Target,Exceeding Target,Meeting Target,Exceeding Target,Approaching Target,Meeting Target,Meeting Target
4,21K097,P.S. 97 THE HIGHLAWN,Elementary,1855 STILLWELL AVENUE,NY,11223,P.S. 97 The Highlawn,Elementary,790,Meeting Target,Meeting Target,Meeting Target,Exceeding Target,Meeting Target,Exceeding Target,Meeting Target
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1120,84M518,East Harlem Scholars Academy Charter School,Elementary,1573 MADISON AVENUE,NY,10029,East Harlem Scholars Academy Charter School,Elementary,406,Meeting Target,Meeting Target,Meeting Target,Meeting Target,Meeting Target,Meeting Target,Meeting Target
1121,84M523,Upper West Success Academy Charter School,Elementary,145 WEST 84 STREET,NY,10024,Success Academy Charter School - Upper West,Elementary,649,,,,,,,Exceeding Target
1122,84Q359,Academy of the City Charter School,Elementary,36-14 12TH STREET,NY,11106,Academy of the City Charter School,Elementary,413,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target,Exceeding Target
1123,84X538,Icahn Charter School 5,Elementary,1500 PELHAM PARKWAY SOUTH,NY,10461,Icahn Charter School 5,K-8,285,Exceeding Target,Meeting Target,Meeting Target,Meeting Target,Approaching Target,Exceeding Target,


#### Now, I need to change the ratings from a string variable into a float variable (aka into numbers)

When a rating is:
<br>
- Blank, the cell equals 0
<br>
- Not Meeting Target, the cell equals 1
<br>
- Approaching Target, the cell equals 2
<br>
- Meeting Target, the cell equals 3
<br>
- Exceeding Target, the cell equals 4 

Then, I add up all seven factors for a school and then divide by 7.

That average is now stored in a new column called "average_rating."


In [5]:
# Turn ratings into ordinal numerical variables

# Replace blanks with a 0
schools.fillna(0, inplace=True)
schools

mapping = {"0": 0, "Not Meeting Target": 1, "Approaching Target": 2, "Meeting Target": 3, "Exceeding Target": 4}
schools["instruction"] = schools['rating_ri_disp'].map(mapping)
schools["teachers"] = schools["rating_ct_disp"].map(mapping)
schools["environment"] = schools["rating_se_disp"].map(mapping)
schools["leadership"] = schools["rating_es_disp"].map(mapping)
schools["familycommunity"] = schools['rating_sf_disp'].map(mapping)
schools["trust"] = schools["rating_tr_disp"].map(mapping)
schools["studentachievement"] = schools['rating_sa_disp'].map(mapping)

# Create column averaging the ratings
sum_cols = ["instruction", "teachers", "environment", "leadership", "familycommunity", "trust", "studentachievement"]
result = schools[sum_cols].sum(axis=1) / 7
# Add column of result
schools["average_rating"] = round(result, 2)
schools

Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,...,rating_tr_disp,rating_sa_disp,instruction,teachers,environment,leadership,familycommunity,trust,studentachievement,average_rating
0,15K001,P.S. 001 THE BERGEN,Elementary,309 47 STREET,NY,11220,P.S. 001 The Bergen,Elementary,1207,Approaching Target,...,Approaching Target,Meeting Target,2.0,2.0,3.0,2.0,3.0,2.0,3.0,2.43
1,17K002,M.S. 002,Junior High-Intermediate-Middle,655 PARKSIDE AVENUE,NY,11226,Parkside Preparatory Academy,Middle,495,Meeting Target,...,Meeting Target,Meeting Target,3.0,3.0,4.0,4.0,3.0,3.0,3.0,3.29
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,...,Exceeding Target,Meeting Target,4.0,4.0,4.0,4.0,4.0,4.0,3.0,3.86
3,21K096,I.S. 096 SETH LOW,Junior High-Intermediate-Middle,99 AVENUE P,NY,11204,I.S. 096 Seth Low,Middle,723,Meeting Target,...,Meeting Target,Meeting Target,3.0,4.0,3.0,4.0,2.0,3.0,3.0,3.14
4,21K097,P.S. 97 THE HIGHLAWN,Elementary,1855 STILLWELL AVENUE,NY,11223,P.S. 97 The Highlawn,Elementary,790,Meeting Target,...,Exceeding Target,Meeting Target,3.0,3.0,3.0,4.0,3.0,4.0,3.0,3.29
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1120,84M518,East Harlem Scholars Academy Charter School,Elementary,1573 MADISON AVENUE,NY,10029,East Harlem Scholars Academy Charter School,Elementary,406,Meeting Target,...,Meeting Target,Meeting Target,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.00
1121,84M523,Upper West Success Academy Charter School,Elementary,145 WEST 84 STREET,NY,10024,Success Academy Charter School - Upper West,Elementary,649,0,...,0,Exceeding Target,,,,,,,4.0,0.57
1122,84Q359,Academy of the City Charter School,Elementary,36-14 12TH STREET,NY,11106,Academy of the City Charter School,Elementary,413,Exceeding Target,...,Exceeding Target,Exceeding Target,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.00
1123,84X538,Icahn Charter School 5,Elementary,1500 PELHAM PARKWAY SOUTH,NY,10461,Icahn Charter School 5,K-8,285,Exceeding Target,...,Exceeding Target,0,4.0,3.0,3.0,3.0,2.0,4.0,,2.71


#### Additional clean up on the schools data frame

First, I concatenate (aka merge) the school address columns to have the info all in one column

Then, I get rid of any extra spaces (this is important for when I calculate distances between locations later

In [6]:
# Concatonate address, state, and zip code for full address
schools["full_school_address"] = schools["address,c,37"] + ' New York ' + schools["state_code,c,13"] + ' ' + schools["zip,n,10,0"].apply(str)
schools

# Create function to get rid of extra spaces
def extra_space(address):
    return " ".join(address.split())

# Run function
schools["full_school_address"] = schools["full_school_address"].apply(extra_space)
schools

Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,...,rating_sa_disp,instruction,teachers,environment,leadership,familycommunity,trust,studentachievement,average_rating,full_school_address
0,15K001,P.S. 001 THE BERGEN,Elementary,309 47 STREET,NY,11220,P.S. 001 The Bergen,Elementary,1207,Approaching Target,...,Meeting Target,2.0,2.0,3.0,2.0,3.0,2.0,3.0,2.43,309 47 STREET New York NY 11220
1,17K002,M.S. 002,Junior High-Intermediate-Middle,655 PARKSIDE AVENUE,NY,11226,Parkside Preparatory Academy,Middle,495,Meeting Target,...,Meeting Target,3.0,3.0,4.0,4.0,3.0,3.0,3.0,3.29,655 PARKSIDE AVENUE New York NY 11226
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,...,Meeting Target,4.0,4.0,4.0,4.0,4.0,4.0,3.0,3.86,345 VAN SICKLEN STREET New York NY 11223
3,21K096,I.S. 096 SETH LOW,Junior High-Intermediate-Middle,99 AVENUE P,NY,11204,I.S. 096 Seth Low,Middle,723,Meeting Target,...,Meeting Target,3.0,4.0,3.0,4.0,2.0,3.0,3.0,3.14,99 AVENUE P New York NY 11204
4,21K097,P.S. 97 THE HIGHLAWN,Elementary,1855 STILLWELL AVENUE,NY,11223,P.S. 97 The Highlawn,Elementary,790,Meeting Target,...,Meeting Target,3.0,3.0,3.0,4.0,3.0,4.0,3.0,3.29,1855 STILLWELL AVENUE New York NY 11223
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1120,84M518,East Harlem Scholars Academy Charter School,Elementary,1573 MADISON AVENUE,NY,10029,East Harlem Scholars Academy Charter School,Elementary,406,Meeting Target,...,Meeting Target,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.00,1573 MADISON AVENUE New York NY 10029
1121,84M523,Upper West Success Academy Charter School,Elementary,145 WEST 84 STREET,NY,10024,Success Academy Charter School - Upper West,Elementary,649,0,...,Exceeding Target,,,,,,,4.0,0.57,145 WEST 84 STREET New York NY 10024
1122,84Q359,Academy of the City Charter School,Elementary,36-14 12TH STREET,NY,11106,Academy of the City Charter School,Elementary,413,Exceeding Target,...,Exceeding Target,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.00,36-14 12TH STREET New York NY 11106
1123,84X538,Icahn Charter School 5,Elementary,1500 PELHAM PARKWAY SOUTH,NY,10461,Icahn Charter School 5,K-8,285,Exceeding Target,...,0,4.0,3.0,3.0,3.0,2.0,4.0,,2.71,1500 PELHAM PARKWAY SOUTH New York NY 10461


#### Now, I need to filter the schools data frame to only include "high quality" schools with an average rating of 3.5 or higher

This is an arbitrary number that I chose to signify "high quality." It is based in part that a "Meeting Target" rating is equal to a 3; so a high quality school should have a better rating than simply "meeting target."

Of the 1,113 schools I began with, only 213 are considered "high quality."

In [7]:
# Filter schools to only those that are "high quality" -- average rating of 3.5 or higher

# Make new df of only high quality schools
# keep rows where value of "average rating" column is 3.5 or higher
# .copy() to actually make the new df
hq_schools = schools[schools["average_rating"]>= 3.5].copy()
hq_schools #213 out of 1113 schools


Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,...,rating_sa_disp,instruction,teachers,environment,leadership,familycommunity,trust,studentachievement,average_rating,full_school_address
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,...,Meeting Target,4.0,4.0,4.0,4.0,4.0,4.0,3.0,3.86,345 VAN SICKLEN STREET New York NY 11223
5,21K098,I.S. 98 BAY ACADEMY,Junior High-Intermediate-Middle,1401 EMMONS AVENUE,NY,11235,I.S. 98 Bay Academy,Middle,1513,Exceeding Target,...,Exceeding Target,4.0,4.0,3.0,4.0,2.0,4.0,4.0,3.57,1401 EMMONS AVENUE New York NY 11235
6,21K099,P.S. 099 ISAAC ASIMOV,K-8,1120 EAST 10 STREET,NY,11230,P.S. 099 Isaac Asimov,K-8,843,Exceeding Target,...,Meeting Target,4.0,4.0,3.0,4.0,3.0,4.0,3.0,3.57,1120 EAST 10 STREET New York NY 11230
8,21K101,P.S. 101 THE VERRAZANO,Elementary,2360 BENSON AVENUE,NY,11214,P.S. 101 The Verrazano,Elementary,885,Exceeding Target,...,Exceeding Target,4.0,4.0,4.0,4.0,3.0,4.0,4.0,3.86,2360 BENSON AVENUE New York NY 11214
9,20K102,P.S. 102 THE BAYVIEW,Elementary,211 72 STREET,NY,11209,P.S. 102 The Bayview,Elementary,1428,Exceeding Target,...,Meeting Target,4.0,4.0,4.0,3.0,3.0,4.0,3.0,3.57,211 72 STREET New York NY 11209
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1082,10X228,JONAS BRONCK ACADEMY,Junior High-Intermediate-Middle,400 EAST FORDHAM ROAD,NY,10458,Jonas Bronck Academy,Middle,266,Exceeding Target,...,Meeting Target,4.0,4.0,4.0,4.0,3.0,3.0,3.0,3.57,400 EAST FORDHAM ROAD New York NY 10458
1088,10X056,P.S. 056 NORWOOD HEIGHTS,Elementary,341 EAST 207 STREET,NY,10467,P.S. 056 Norwood Heights,Elementary,699,Exceeding Target,...,Meeting Target,4.0,4.0,4.0,4.0,3.0,4.0,3.0,3.71,341 EAST 207 STREET New York NY 10467
1091,06M366,WASHINGTON HEIGHTS ACADEMY,Elementary,202 SHERMAN AVE,NY,10034,Washington Heights Academy,K-8,510,Exceeding Target,...,Exceeding Target,4.0,4.0,3.0,3.0,3.0,4.0,4.0,3.57,202 SHERMAN AVE New York NY 10034
1106,11X529,One World Middle School at Edenwald,Junior High-Intermediate-Middle,3750 BAYCHESTER AVENUE,NY,10466,One World Middle School at Edenwald,Middle,343,Exceeding Target,...,Meeting Target,4.0,4.0,3.0,4.0,3.0,4.0,3.0,3.57,3750 BAYCHESTER AVENUE New York NY 10466


#### Didn't your Python professor tell you 80% of coding is data cleaning?

In my analysis later on, I realized that the API could not read many of the school addresses because it would recognize a numerical street name (like 10th Ave) as an address number. So, I added suffixes to the street names. As you will see later, there are still errors reading the street addresses, but there are less.

In [8]:
# Create function to add suffixes to school streets
def append_st_suffix(address):
    # Split address string by each word, delimter = space
    split_addy = address.split(" ")
    # Makes a list of each word. Check for only the numbers. if number, then true
    digits = [word.isdigit() for word in split_addy]
    # Make sure there is a street that is a number, then do all below; but don't do the last number (zip)
    if True in digits[1:-1]:
        # Start by looking at the 2nd (1 in python counting) number to find the street number; but don't do the last number (zip)
        st_idx = digits[1:-1].index(True)
        # Define dictionary of suffixes
        suffixes = {"1": "st", "2": "nd", "3": "rd", "4": "th", "5": "th",
                    "6": "th", "7": "th", "8": "th", "9": "th", "0": "th",}
        # Look at the last number [-1] of the street [split_addy] to give the new suffix [suffix]
        if split_addy[1+st_idx] in ("11", "12", "13"):
            suffix = 'th'
        else:
            suffix = suffixes[split_addy[1+st_idx][-1]]
        # Now change from normal street number and add on the new suffix
        split_addy[1+st_idx] = split_addy[1+st_idx] + suffix
        # Puts full address back together 
    re_join = " ".join(split_addy)
    return re_join

# Run function
hq_schools["clean_school_address"] = hq_schools["full_school_address"].apply(append_st_suffix)
hq_schools

Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,...,instruction,teachers,environment,leadership,familycommunity,trust,studentachievement,average_rating,full_school_address,clean_school_address
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,...,4.0,4.0,4.0,4.0,4.0,4.0,3.0,3.86,345 VAN SICKLEN STREET New York NY 11223,345 VAN SICKLEN STREET New York NY 11223
5,21K098,I.S. 98 BAY ACADEMY,Junior High-Intermediate-Middle,1401 EMMONS AVENUE,NY,11235,I.S. 98 Bay Academy,Middle,1513,Exceeding Target,...,4.0,4.0,3.0,4.0,2.0,4.0,4.0,3.57,1401 EMMONS AVENUE New York NY 11235,1401 EMMONS AVENUE New York NY 11235
6,21K099,P.S. 099 ISAAC ASIMOV,K-8,1120 EAST 10 STREET,NY,11230,P.S. 099 Isaac Asimov,K-8,843,Exceeding Target,...,4.0,4.0,3.0,4.0,3.0,4.0,3.0,3.57,1120 EAST 10 STREET New York NY 11230,1120 EAST 10th STREET New York NY 11230
8,21K101,P.S. 101 THE VERRAZANO,Elementary,2360 BENSON AVENUE,NY,11214,P.S. 101 The Verrazano,Elementary,885,Exceeding Target,...,4.0,4.0,4.0,4.0,3.0,4.0,4.0,3.86,2360 BENSON AVENUE New York NY 11214,2360 BENSON AVENUE New York NY 11214
9,20K102,P.S. 102 THE BAYVIEW,Elementary,211 72 STREET,NY,11209,P.S. 102 The Bayview,Elementary,1428,Exceeding Target,...,4.0,4.0,4.0,3.0,3.0,4.0,3.0,3.57,211 72 STREET New York NY 11209,211 72nd STREET New York NY 11209
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1082,10X228,JONAS BRONCK ACADEMY,Junior High-Intermediate-Middle,400 EAST FORDHAM ROAD,NY,10458,Jonas Bronck Academy,Middle,266,Exceeding Target,...,4.0,4.0,4.0,4.0,3.0,3.0,3.0,3.57,400 EAST FORDHAM ROAD New York NY 10458,400 EAST FORDHAM ROAD New York NY 10458
1088,10X056,P.S. 056 NORWOOD HEIGHTS,Elementary,341 EAST 207 STREET,NY,10467,P.S. 056 Norwood Heights,Elementary,699,Exceeding Target,...,4.0,4.0,4.0,4.0,3.0,4.0,3.0,3.71,341 EAST 207 STREET New York NY 10467,341 EAST 207th STREET New York NY 10467
1091,06M366,WASHINGTON HEIGHTS ACADEMY,Elementary,202 SHERMAN AVE,NY,10034,Washington Heights Academy,K-8,510,Exceeding Target,...,4.0,4.0,3.0,3.0,3.0,4.0,4.0,3.57,202 SHERMAN AVE New York NY 10034,202 SHERMAN AVE New York NY 10034
1106,11X529,One World Middle School at Edenwald,Junior High-Intermediate-Middle,3750 BAYCHESTER AVENUE,NY,10466,One World Middle School at Edenwald,Middle,343,Exceeding Target,...,4.0,4.0,3.0,4.0,3.0,4.0,3.0,3.57,3750 BAYCHESTER AVENUE New York NY 10466,3750 BAYCHESTER AVENUE New York NY 10466


### Thirdly, I need to load the NYCHA addresses dataset

I initially began with 3,793 NYCHA buildings.

But many of these buildings are clustered together in developments (of the same name). Therefore, these buildings are all a similar distance from schools. So, I removed the duplicate developments. I am now down to 285 NYCHA developments.

Next, I make all column headers lowercase (to make my life easier).

Finally, I only keep the columns I want or that I may need in my analysis.

In [9]:
### Upload the NYCHA addresses dataset

nycha = pd.read_csv("./NYCHA_Residential_Addresses (1).csv")
nycha #3793 rows

# Remove duplicate developments
nycha = nycha.drop_duplicates(subset=["DEVELOPMENT"]) 
nycha #285 rows

# Make all column headers lowercase
nycha.columns = nycha.columns.str.lower()
nycha

# Keep only the columns I need/want
nycha = nycha.loc[:, ["development", "borough", "address", "city", "state", "zip code", "census tract (2010)", "latitude", "longitude"]]
nycha 

Unnamed: 0,development,borough,address,city,state,zip code,census tract (2010),latitude,longitude
0,1010 EAST 178TH STREET,BRONX,1010 EAST 178TH STREET,BRONX,NY,10460,361,40.841169,-73.880247
1,104-14 TAPSCOTT STREET,BROOKLYN,104 TAPSCOTT STREET,BROOKLYN,NY,11212,900,40.665125,-73.920422
2,1162-1176 WASHINGTON AVENUE,BRONX,1162 WASHINGTON AVENUE,BRONX,NY,10456,145,40.829928,-73.907535
3,131 SAINT NICHOLAS AVENUE,MANHATTAN,131 SAINT NICHOLAS AVENUE,NEW YORK,NY,10026,218,40.804045,-73.952962
4,1471 WATSON AVENUE,BRONX,1471 WATSON AVENUE,BRONX,NY,10472,52,40.825850,-73.880943
...,...,...,...,...,...,...,...,...,...
3751,WSUR (BROWNSTONES),MANHATTAN,125 WEST 93RD STREET,NEW YORK,NY,10025,177,40.791410,-73.969849
3787,WSUR (SITE A) 120 WEST 94TH STREET,MANHATTAN,120 WEST 94TH STREET,NEW YORK,NY,10025,177,40.791934,-73.969689
3788,WSUR (SITE B) 74 WEST 92ND STREET,MANHATTAN,74 WEST 92ND STREET,NEW YORK,NY,10025,177,40.790052,-73.969098
3789,WSUR (SITE C) 589 AMSTERDAM AVENUE,MANHATTAN,589 AMSTERDAM AVENUE,NEW YORK,NY,10024,173,40.789235,-73.973633


# Analysis of the data

### In order to calculate how far away schools are from each NYCHA development, I first need to convert the school addresses to longitude and latitude (because that's how the API works)

I used Nominatim API to convert addresses to longitude and latitude coordinates. To make sure this was working, I printed my results. I also wanted to know when an addresses didn't convert, so that is why there is the word "error" interspersed among school addresses. I tried many ways of cleaning the dataframe (as seen above in the data cleaning section), but ultimately I wanted to keep the hair on my head, so I used only the addresses that did convert.

Out of the 213 high quality schools, I now had the longitude and latitude of 146 of them.

In [10]:
### INSTALL Nominatim API to convert school addresses to longitude and latitude

In [11]:
# Install geopy package
# pip install geopy
# Install package to slow down requests to Nominatim to not time out the server
import time

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Python final project")

# Create function to get latitude and longitude of school addresses
def get_coordinates(address):
    print(address)
    try:
        school_location = geolocator.geocode(address)
        latitude = school_location.latitude
        longitude = school_location.longitude
        # Slow down requests
        time.sleep(1)
        return latitude, longitude
    except:
        print("error")
        return "na", "na"

# Run function on schools df
hq_schools["latitude"], hq_schools["longitude"] = zip(*hq_schools["clean_school_address"].apply(get_coordinates))
hq_schools

345 VAN SICKLEN STREET New York NY 11223
1401 EMMONS AVENUE New York NY 11235
1120 EAST 10th STREET New York NY 11230
error
2360 BENSON AVENUE New York NY 11214
211 72nd STREET New York NY 11209
1301 8th AVENUE New York NY 11215
error
200 LINWOOD STREET New York NY 11208
7115 15th AVENUE New York NY 11228
error
18 BEAVER STREET New York NY 11206
error
5301 20th AVENUE New York NY 11204
error
7805 7th AVENUE New York NY 11228
error
70 OCEAN PARKWAY New York NY 11218
4305 FT HAMILTON PARKWAY New York NY 11219
4001 18th AVENUE New York NY 11218
error
760 PROSPECT PLACE New York NY 11216
325 BUSHWICK AVENUE New York NY 11206
1625 11th AVENUE New York NY 11215
error
4211 14th AVENUE New York NY 11219
error
7109 6th AVENUE New York NY 11209
error
825 4th AVENUE New York NY 11232
error
1225 69th STREET New York NY 11219
1171 65th STREET New York NY 11219
8010 12th AVENUE New York NY 11228
error
8101 15th AVENUE New York NY 11228
error
1 ALBEMARLE ROAD New York NY 11218
6302 AVENUE U New York 

341 EAST 207th STREET New York NY 10467
202 SHERMAN AVE New York NY 10034
error
3750 BAYCHESTER AVENUE New York NY 10466
36-14 12TH STREET New York NY 11106
error


Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,...,environment,leadership,familycommunity,trust,studentachievement,average_rating,full_school_address,clean_school_address,latitude,longitude
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,...,4.0,4.0,4.0,4.0,3.0,3.86,345 VAN SICKLEN STREET New York NY 11223,345 VAN SICKLEN STREET New York NY 11223,40.595959,-73.974874
5,21K098,I.S. 98 BAY ACADEMY,Junior High-Intermediate-Middle,1401 EMMONS AVENUE,NY,11235,I.S. 98 Bay Academy,Middle,1513,Exceeding Target,...,3.0,4.0,2.0,4.0,4.0,3.57,1401 EMMONS AVENUE New York NY 11235,1401 EMMONS AVENUE New York NY 11235,40.583773,-73.95422
6,21K099,P.S. 099 ISAAC ASIMOV,K-8,1120 EAST 10 STREET,NY,11230,P.S. 099 Isaac Asimov,K-8,843,Exceeding Target,...,3.0,4.0,3.0,4.0,3.0,3.57,1120 EAST 10 STREET New York NY 11230,1120 EAST 10th STREET New York NY 11230,na,na
8,21K101,P.S. 101 THE VERRAZANO,Elementary,2360 BENSON AVENUE,NY,11214,P.S. 101 The Verrazano,Elementary,885,Exceeding Target,...,4.0,4.0,3.0,4.0,4.0,3.86,2360 BENSON AVENUE New York NY 11214,2360 BENSON AVENUE New York NY 11214,40.597373,-73.991937
9,20K102,P.S. 102 THE BAYVIEW,Elementary,211 72 STREET,NY,11209,P.S. 102 The Bayview,Elementary,1428,Exceeding Target,...,4.0,3.0,3.0,4.0,3.0,3.57,211 72 STREET New York NY 11209,211 72nd STREET New York NY 11209,40.634334,-74.029316
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1082,10X228,JONAS BRONCK ACADEMY,Junior High-Intermediate-Middle,400 EAST FORDHAM ROAD,NY,10458,Jonas Bronck Academy,Middle,266,Exceeding Target,...,4.0,4.0,3.0,3.0,3.0,3.57,400 EAST FORDHAM ROAD New York NY 10458,400 EAST FORDHAM ROAD New York NY 10458,40.860988,-73.89155
1088,10X056,P.S. 056 NORWOOD HEIGHTS,Elementary,341 EAST 207 STREET,NY,10467,P.S. 056 Norwood Heights,Elementary,699,Exceeding Target,...,4.0,4.0,3.0,4.0,3.0,3.71,341 EAST 207 STREET New York NY 10467,341 EAST 207th STREET New York NY 10467,40.875224,-73.875176
1091,06M366,WASHINGTON HEIGHTS ACADEMY,Elementary,202 SHERMAN AVE,NY,10034,Washington Heights Academy,K-8,510,Exceeding Target,...,3.0,3.0,3.0,4.0,4.0,3.57,202 SHERMAN AVE New York NY 10034,202 SHERMAN AVE New York NY 10034,na,na
1106,11X529,One World Middle School at Edenwald,Junior High-Intermediate-Middle,3750 BAYCHESTER AVENUE,NY,10466,One World Middle School at Edenwald,Middle,343,Exceeding Target,...,3.0,4.0,3.0,4.0,3.0,3.57,3750 BAYCHESTER AVENUE New York NY 10466,3750 BAYCHESTER AVENUE New York NY 10466,40.886335,-73.840519


In [12]:
# Filter out NA results in latitude and longitude columns
valid_hq_schools = hq_schools[hq_schools["latitude"] != "na"].copy()
valid_hq_schools #146 schools (out of the original 213)

Unnamed: 0,dbn,"schoolname,c,61","sch_type,c,31","address,c,37","state_code,c,13","zip,n,10,0",school_name,school_type,enrollment,rating_ri_disp,...,environment,leadership,familycommunity,trust,studentachievement,average_rating,full_school_address,clean_school_address,latitude,longitude
2,21K095,P.S. 095 THE GRAVESEND,K-8,345 VAN SICKLEN STREET,NY,11223,P.S. 095 The Gravesend,K-8,905,Exceeding Target,...,4.0,4.0,4.0,4.0,3.0,3.86,345 VAN SICKLEN STREET New York NY 11223,345 VAN SICKLEN STREET New York NY 11223,40.595959,-73.974874
5,21K098,I.S. 98 BAY ACADEMY,Junior High-Intermediate-Middle,1401 EMMONS AVENUE,NY,11235,I.S. 98 Bay Academy,Middle,1513,Exceeding Target,...,3.0,4.0,2.0,4.0,4.0,3.57,1401 EMMONS AVENUE New York NY 11235,1401 EMMONS AVENUE New York NY 11235,40.583773,-73.95422
8,21K101,P.S. 101 THE VERRAZANO,Elementary,2360 BENSON AVENUE,NY,11214,P.S. 101 The Verrazano,Elementary,885,Exceeding Target,...,4.0,4.0,3.0,4.0,4.0,3.86,2360 BENSON AVENUE New York NY 11214,2360 BENSON AVENUE New York NY 11214,40.597373,-73.991937
9,20K102,P.S. 102 THE BAYVIEW,Elementary,211 72 STREET,NY,11209,P.S. 102 The Bayview,Elementary,1428,Exceeding Target,...,4.0,3.0,3.0,4.0,3.0,3.57,211 72 STREET New York NY 11209,211 72nd STREET New York NY 11209,40.634334,-74.029316
14,19K108,P.S. 108 SAL ABBRACCIAMENTO,Elementary,200 LINWOOD STREET,NY,11208,P.S. 108 Sal Abbracciamento,Elementary,890,Exceeding Target,...,4.0,4.0,3.0,3.0,4.0,3.71,200 LINWOOD STREET New York NY 11208,200 LINWOOD STREET New York NY 11208,40.681277,-73.88431
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1078,06M346,COMMUNITY HEALTH ACADEMY OF THE HEIGHTS,Secondary School,512 W 182ND ST,NY,10033,Community Health Academy of the Heights,Middle,278,Meeting Target,...,3.0,4.0,4.0,4.0,3.0,3.57,512 W 182ND ST New York NY 10033,512 W 182ND ST New York NY 10033,40.848995,-73.931459
1079,03M243,M.S. 243 CENTER SCHOOL,Junior High-Intermediate-Middle,100 WEST 84 STREET,NY,10024,M.S. 243 Center School,Middle,234,Meeting Target,...,4.0,4.0,3.0,4.0,4.0,3.57,100 WEST 84 STREET New York NY 10024,100 WEST 84th STREET New York NY 10024,40.785102,-73.973863
1082,10X228,JONAS BRONCK ACADEMY,Junior High-Intermediate-Middle,400 EAST FORDHAM ROAD,NY,10458,Jonas Bronck Academy,Middle,266,Exceeding Target,...,4.0,4.0,3.0,3.0,3.0,3.57,400 EAST FORDHAM ROAD New York NY 10458,400 EAST FORDHAM ROAD New York NY 10458,40.860988,-73.89155
1088,10X056,P.S. 056 NORWOOD HEIGHTS,Elementary,341 EAST 207 STREET,NY,10467,P.S. 056 Norwood Heights,Elementary,699,Exceeding Target,...,4.0,4.0,3.0,4.0,3.0,3.71,341 EAST 207 STREET New York NY 10467,341 EAST 207th STREET New York NY 10467,40.875224,-73.875176


### Next, I needed to find the closest high quality school to each NYCHA development.

This is what I meant by overly complicated.

In [13]:
### Calculate difference between the school locations and NYCHA in miles

# Import math package because of complicated formulas
from math import *
def distance(location1, location2):
    # Convert coordinates to radians
    lat1, lon1, lat2, lon2 = map(radians, [location1[1], location1[0], location2[1], location2[0]])

    # Calculate differences in latitude and longitude
    dlat = lat2 - lat1
    dlon = lon2 - lon1

    # Calculate distance using Haversine formula
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    distance = 6371 * c * 0.621371 # Convert from km to miles

    return distance

# Create function to calculate the nearest school for each NYCHA development
def nycha_school(coords):
    long_nycha, lat_nycha = coords
    # Make copy of schools df
    temp_schools = valid_hq_schools.copy()
    # Store distance between nycha and school in a list
    distances = []
    for xx in range(len(temp_schools)):
        long_school = temp_schools.iloc[xx]['longitude']
        lat_school = temp_schools.iloc[xx]['latitude']
    # Looking at all rows in my temp df, loop and find long and lat
        distances.append(distance((long_nycha, lat_nycha),(long_school, lat_school)))
    # Create new distance column
    temp_schools['distance'] = distances
    # Sort by shortest distance
    temp_schools.sort_values(['distance'], ascending=True, inplace=True)
    return temp_schools.head(1)['dbn'].values[0], temp_schools.head(1)['distance'].values[0]

#### Note: before I decided to use Nominatim, I wanted to use the OpenRouteService API to calculate the distance between NYCHA developments and schools

But, due to usage restrictions, OpenRouteService was not a viable option. I discuss in my Limitations section at the end of this project why OpenRouteService was my preferred API. Please enjoy more complicated code.

In [14]:
# Import test run for calculating distances between locations
#import json

# ## Connect with API client once
# client = openrouteservice.Client(key=' ')
# #set location coordinates in longitude,latitude order
# coords = ((80.21787585263182,6.025423265401452),(80.23929481745174,6.019639381180123))
# #call API
# res = client.directions(coords)
# #test our response
# with(open('test.json','+w')) as f:
#  f.write(json.dumps(res,indent=4, sort_keys=True))

In [15]:
## Instead of getting distance between two locations in a separate file, instead we set up the steps to create a function to find the distance between two points in my additional dataframe
# Need to get the value of the key "distance"
# res['routes'] --First, acess the dictionary "routes", which is made up of a series of other dictionaries, ending with "distance" and it's paired value
# [0]-- Second, get first item "summary" in the list "routes" (bc distance is within "sumamry")
# ['summary']-- Third, "summary" is now a key and we want to access its pair (another dictionary), so we are calling the "summary" dictionary
# ['distance']-- Fourth, 'distance' and 'duration' (within curly brackets) are now new keys of our dictionary. We want to call the value of the key "distance."
# *0.00062137 -- Fifth, convert meters to miles
# round(......, 2) -- Finally, round to two decimal places

#round(res['routes'][0]['summary']['distance']*0.00062137, 2)


In [16]:
## Define a function to find distance between two location points
# First, define my function of finding distance between location 1 and location 2 in miles
# def distance(location1, location2):
#     ## Call up API
#     #set location coordinates in longitude,latitude order
#     coords = (location1, location2)
#     #call API
#     res = client.directions(coords)
#     time.sleep(1)
#     distance_miles = round(res['routes'][0]['summary']['distance']*0.00062137, 2)
#     return distance_miles

In [17]:
# Using API example coordinates, check that function works
# distance((80.21787585263182,6.025423265401452), (80.23929481745174,6.019639381180123))

### Below is additional code I needed to merge the NYCHA and Schools data frames so that I can map all coordinates to show the spatial connection between the developments and the schools

In [18]:
# Combine the longitude and latitude into a single column (for mapping)
nycha['cords'] = [(cords[0], cords[1]) for cords in nycha[['longitude', 'latitude']].values]

In [19]:
# Apply the nycha_school function to the new nycha coordinate column and then store the result in the school_dbn and distance
# School_dbn names the school closest to the nycha development and distance is how far that school is
nycha['school_dbn'], nycha['distance'] = zip(*nycha['cords'].apply(nycha_school))

In [20]:
# Keep the dbn of the closest school from the above function and then use that dbn to keep the correct school
nycha_and_school = nycha.merge(valid_hq_schools, left_on='school_dbn', right_on='dbn')

In [21]:
# View dataframe
nycha_and_school

Unnamed: 0,development,borough,address,city,state,zip code,census tract (2010),latitude_x,longitude_x,cords,...,environment,leadership,familycommunity,trust,studentachievement,average_rating,full_school_address,clean_school_address,latitude_y,longitude_y
0,1010 EAST 178TH STREET,BRONX,1010 EAST 178TH STREET,BRONX,NY,10460,361,40.841169,-73.880247,"(-73.8802474669211, 40.8411692981621)",...,3.0,4.0,3.0,4.0,4.0,3.57,2055 MAPES AVENUE New York NY 10460,2055 MAPES AVENUE New York NY 10460,40.845682,-73.886532
1,BRYANT AVENUE-EAST 174TH STREET,BRONX,1705 BRYANT AVENUE,BRONX,NY,10460,161,40.835883,-73.885564,"(-73.8855636241279, 40.8358825940019)",...,3.0,4.0,3.0,4.0,4.0,3.57,2055 MAPES AVENUE New York NY 10460,2055 MAPES AVENUE New York NY 10460,40.845682,-73.886532
2,EAST 180TH STREET-MONTEREY AVENUE,BRONX,2111 LAFONTAINE AVENUE,BRONX,NY,10457,37504,40.850206,-73.892279,"(-73.8922789585132, 40.8502058302162)",...,3.0,4.0,3.0,4.0,4.0,3.57,2055 MAPES AVENUE New York NY 10460,2055 MAPES AVENUE New York NY 10460,40.845682,-73.886532
3,HOE AVENUE-EAST 173RD STREET,BRONX,1700 HOE AVENUE,BRONX,NY,10460,161,40.836348,-73.887118,"(-73.8871179295976, 40.8363484252329)",...,3.0,4.0,3.0,4.0,4.0,3.57,2055 MAPES AVENUE New York NY 10460,2055 MAPES AVENUE New York NY 10460,40.845682,-73.886532
4,TWIN PARKS EAST (SITE 9),BRONX,2070 CLINTON AVENUE,BRONX,NY,10457,371,40.847116,-73.887986,"(-73.8879861549906, 40.8471160615418)",...,3.0,4.0,3.0,4.0,4.0,3.57,2055 MAPES AVENUE New York NY 10460,2055 MAPES AVENUE New York NY 10460,40.845682,-73.886532
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
280,RED HOOK WEST,BROOKLYN,135 RICHARDS STREET,BROOKLYN,NY,11231,85,40.676758,-74.009829,"(-74.0098290482315, 40.6767580807613)",...,4.0,4.0,4.0,4.0,4.0,3.86,26 BROADWAY New York NY 10004,26 BROADWAY New York NY 10004,40.705313,-74.012911
281,ROBBINS PLAZA,MANHATTAN,341 EAST 70TH STREET,NEW YORK,NY,10021,126,40.766801,-73.957299,"(-73.9572986917384, 40.7668006722677)",...,4.0,4.0,4.0,4.0,4.0,4.00,213 EAST 63RD STREET New York NY 10065,213 EAST 63RD STREET New York NY 10065,40.763912,-73.963976
282,SHELTON HOUSE,QUEENS,89-09 162ND STREET,JAMAICA,NY,11432,44601,40.706456,-73.798852,"(-73.798852415945, 40.7064558560137)",...,3.0,4.0,3.0,3.0,4.0,3.57,85-05 144 STREET New York NY 11435,85-05 144th STREET New York NY 11435,40.710377,-73.811964
283,WASHINGTON,MANHATTAN,1761 3RD AVENUE,NEW YORK,NY,10029,15602,40.786068,-73.948252,"(-73.9482520656376, 40.7860682233969)",...,3.0,4.0,4.0,4.0,4.0,3.71,323 East 91st Street New York NY 10128,323 East 91st Street New York NY 10128,40.78067,-73.947848


#### Here, I merge the two data frames into a very simplified data frame, which I will map

In [22]:
nycha_coords = nycha_and_school[['latitude_x', 'longitude_x']].copy()
nycha_coords['Organization Type'] = 'NYCHA'
nycha_coords.columns = ['latitude', 'longitude', 'Organization Type']
school_coords = nycha_and_school[['latitude_y', 'longitude_y']].copy()
school_coords['Organization Type'] = 'High Quality School'
school_coords.columns = ['latitude', 'longitude', 'Organization Type']
combined = pd.concat([nycha_coords, school_coords])
combined

Unnamed: 0,latitude,longitude,Organization Type
0,40.841169,-73.880247,NYCHA
1,40.835883,-73.885564,NYCHA
2,40.850206,-73.892279,NYCHA
3,40.836348,-73.887118,NYCHA
4,40.847116,-73.887986,NYCHA
...,...,...,...
280,40.705313,-74.012911,High Quality School
281,40.763912,-73.963976,High Quality School
282,40.710377,-73.811964,High Quality School
283,40.78067,-73.947848,High Quality School


In [23]:
# save output in excel
# Only 68 unique schools for 285 NYCHA developments
combined.to_csv("combined.csv")

# Findings

## 1. Not every NYCHA development is near a high quality elementary or middle school

Here are descriptive statistics that showcase the distance of high quality schools from NYCHA developments:
- **Mean:** .65 miles (about 13 city blocks)
- **Median:** .51 miles
- **Minimum:** .03 miles
- **Maximum:** 2.4 miles

In [24]:
# Calculate descriptive statistics regarding the distance of schools from the NYCHA developments
nycha_and_school[["distance"]].describe()

Unnamed: 0,distance
count,285.0
mean,0.644173
std,0.473333
min,0.029931
25%,0.305404
50%,0.499387
75%,0.84561
max,2.395565


### Let's take a look at this map of all NYCHA developments and the schools nearest to them

A couple things to take note of:
- There is quite a bit of clustering of NYCHA developments, and only a couple high quality schools near them (especially in lower Manhattan, upper Manhattan, the south Bronx, and eastern Brooklyn)
- There are 285 NYCHA developments and only 68 unique high quality schools that are closest to these developments
- The distance between schools and NYCHA developments grow larger in the more suburban areas of Staten Island, eastern Brooklyn, and Queenx

In [25]:
### MAP
import plotly.express as px
fig = px.scatter_mapbox(combined, lat="latitude", lon="longitude", zoom=10,
                       mapbox_style='carto-positron', color='Organization Type', title="Map of Nearest High Quality Elementary or Middle Schools to NYCHA Developments"
                       )

fig.show()

## 2. When there is a nearby high quality school, there may be too many NYCHA developments nearby that need to share. As a result, not every child in a development can go to their closest high quality school

### The likelihood is that they are instead going to a worse quality, but closer school that they are zoned into

Take, for example, P.S. 398 in District 17 in Brooklyn. _Fifteen_ NYCHA developments have this as their closest high quality school. There is no way that all students in this development will be able to attend this school.

In [26]:
# There is a lot of clustering in the map. How many NYCHA developments per high quality school?
# First, group by school_dbn
# .count() gives me the count of NYCHA devs per schools
# .iloc[:,1] selects only the first column, the count/borough column
# .sort_values(["borough"], ascending = False)) sorts from most to least
nycha_and_school.groupby(["school_dbn"]).count().iloc[:,1].to_frame().sort_values(["borough"], ascending = False)


Unnamed: 0_level_0,borough
school_dbn,Unnamed: 1_level_1
23K644,15
17K398,15
09X327,14
08X337,13
01M188,12
...,...
10X244,1
02M151,1
02M896,1
06M178,1


# In Conclusion...

## I was able to answer both of my questions:
1. What is the closest high quality elementary or middle school to each NYCHA development?
2. Are high quality elementary or middle schools accessible to children who live in NYCHA developments?

## My hypothesis was not fully correct:

#### My hypothesis:

>Given that the NYC schools are [economically and racially segregated](https://steinhardt.nyu.edu/research-alliance/understanding-and-addressing-segregation-nyc-schools), and residents of NYCHA are overwhelmingly [people of color and low-income](https://www.nyc.gov/assets/nycha/downloads/pdf/Resident-Data-Book-Summary-2022.pdf), I hypothesize that students who live in NYCHA developments will not live near high quality elementary or middle schools, and if they do, they will not be accessible to them.

1. Looking at the map above, we see that many NYCHA developments are within blocks of a high quality elementary or middle school, but many are not. This is especially the case for more suburban areas of the city. On average, a high quality school being .65 miles away is not bad. But it should be remembered that aggregated data can distort the true reality for thousands of students.


2. Due to the clustering on the map, I would argue that most students who live in NYCHA developments will not have access to their nearest high quality school. Most NYCHA developments are simply too clustered together and there are too few high quality schools for the students to attend (remember, only 213 of 1,113 schools are considered high quality). This is in line with students who are from low income backgrounds attending worse quality schools.



# Limitations 

#### There are a number of issues I ran into, or did not think of, before embarking on this project:
1. The 2017 School Quality dataset contained both elementary and middle schools. As stated at the beginning, elementary schools are zoned and students who live in the neighborhood attend their neighborhood school. Middle schools can be zoned but are sometimes lottery admission. I should have considered this in the beginning and taken the middle schools out of the dataset before continuing on with my analysis.


2. A school with an average rating of 3.5 is an arbitrary rating of "high quality" that I chose. In the end, I decided that a school that is "Meeting Target" on all seven levels would have a rating of 3. Thus, to be considered high quality, I thought a school should perform better than "Meeting Target."


3. I had issues with the Nominatim API when converting school addresses to coordinates. I tried many ways to fix this addresses issue, but could not figure it out. I went from 213 high quality schools to 146 schools -- a decrease of about a third. This would obviously affect the number of high quality schools available to kids in NYCHA developments.


4. I initially wanted to use the OpenServiceRoute API to determine the actually driving distance (in miles) between development and school. But there were daily usage limits that prevented me from being able to analyze the whole data frame using the API.


5. I should have found a way to determine the number of miles to a school via driving, biking, or walking rather than using the distance formula that calculates using the birds eye distance (aka from NYCHA development to school as if a bird were flying with no obstructions). Getting to school is not that simple in reality. A .05 mile distance (about 10 blocks) may take a lot longer to get through when on a busy NYC road.

6. It should also be noted that many New Yorkers, especially low income New Yorkers, rely on public transportation to get to and from school. When a high quality school is even half a mile away, this can prove difficult for young children in elementary school to navigate the bus and/or subway system.

>"When you live in a poor neighborhood, you are living in an area where you have poor schools. When you have poor schools, you have poor teachers. When you have poor teachers, you get a poor education. When you get a poor education, you can only work in a poor-paying job. And that poor-paying job enables you to live again in a poor neighborhood. So, it's a very vicious cycle."
<br>
<br>
~ Malcolm X