# Analysis of NYC public school results in ELA and math, grades 3-8

<span style="color: red;">**If the kernel can't connect to the server again, run this command:**
*netsh winsock reset*</span>

<a id="TOC"></a> 
## Table of Contents
1. [Data sources and definitions](#data)
2. [Imports: modules](#modules)
3. [Read and prepare data](#read)
4. [Best middle schools by math](#best)
5. [Maps of middle schools by math results](#maps) 


<a id="data"></a> 
#### Data:
1. New York City results for the grades 3-8 New York State English Language Arts and Math tests, 2013-2023:<br>https://infohub.nyced.org/reports/academics/test-results
2. NYS school locations:<br>
https://data.gis.ny.gov/maps/b6c624c740e4476689aa60fdc4aacb8f/about
3. Citywide or Boroughwide status:
<br>https://www.nycschoolhelp.com/borowide-citywide-middle-schools

#### Definitions of Performance Levels for the 2023 Grades 3-8 English Language Arts and Mathematics Tests  

**NYS Level 1**: Students performing at this level are below proficient in standards for their grade. They may demonstrate limited knowledge, skills, and practices embodied by the Learning Standards that are considered insufficient for the expectations at this grade. 

**NYS Level 2**: Students performing at this level are partially proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered partial but insufficient for the expectations at this grade. Students performing at Level 2 are considered on track to meet current New York high school graduation requirements but are not yet proficient in Learning Standards at this grade. 

**NYS Level 3**: Students performing at this level are proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered sufficient for the expectations at this grade.  

**NYS Level 4**: Students performing at this level excel in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered more than sufficient for the expectations at this grade.  

*Source: NYSED, 2023, https://www.p12.nysed.gov/irs/ela-math/2023/ela-math-score-ranges-performance-levels-2023.pdf*

## Questions
*1. How have the test results changed?*
<br>Compare the school's latest-year test results with its 10-year average, as a percentage of that average:
<br> school_change = (school_current_year - school_10year_average)/school_10year_average
<br> citywide_change = (city_current_year - city_10year_average)/city_10year_average
<br> relative_school_change = school_change - citywide_change
<br><br>
*2. How good is the school?*
<br>For some schools, the results of the last three testing periods (2019, 2022, 2023) differ from earlier years because of COVID disruptions, changes in testing procedures and, in District 15, changes in admission rules. The 10-year average therefore does not reflect a school's current situation well, so the results for these last three testing years are used instead.
<br><br>
*3. Is the school citywide or boroughwide?*
<br><br>
*4. Diversity?*
<br><br>
*5. Size?*
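
The formulas under question 1 can be sketched in pandas as follows. The yearly shares are made-up numbers and only three years are used to keep the example short; `relative_change` is a hypothetical helper name, not a function from this notebook.

```python
import pandas as pd

# Hypothetical Level 4 shares by year (illustration only, not real results).
school = pd.Series({2019: 0.20, 2022: 0.25, 2023: 0.30})
city = pd.Series({2019: 0.18, 2022: 0.20, 2023: 0.22})

def relative_change(series: pd.Series, current_year: int) -> float:
    """Current-year value minus the multi-year average, as a share of that average."""
    average = series.mean()
    return (series[current_year] - average) / average

school_change = relative_change(school, 2023)
citywide_change = relative_change(city, 2023)
relative_school_change = school_change - citywide_change

print(round(school_change, 3), round(citywide_change, 3), round(relative_school_change, 3))
```

A positive `relative_school_change` means the school improved against its own average by more than the city did against its average.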

<a id="modules"></a> 
#### Imports: modules

In [2]:
import os
import pandas as pd
import geopandas as gpd
# import matplotlib.pyplot as plt
import folium
from shapely.geometry import Point
# import difflib
# from fuzzywuzzy import process
# import fuzzywuzzy
# import base64
# from io import BytesIO
# import math
from tqdm import tqdm

pd.set_option('display.float_format', '{:.3f}'.format)

In [3]:
basePath = r"G:\My Drive\Kids\NYC_schools"
dataFolder = r"raw_data"
outputFolder = r"processed_data"

<a id="maps"></a> 
### Maps of the NYC middle schools by assessment results 

The maps below are made with the __[Folium library](https://python-visualization.github.io/folium/latest/)__

#### Read schools geolocation file

In [4]:
## Read GeoJSON into data frame
SchoolsFile = 'NYC_K-12_schools_public.geojson'
NYCSchoolsPath = os.path.join(basePath, dataFolder, SchoolsFile)
NYCSchoolsData = gpd.read_file(NYCSchoolsPath)

DistrictsFile = 'School Districts.geojson'
NYCDistrictsPath = os.path.join(basePath, dataFolder, DistrictsFile)
NYCDistrictsData = gpd.read_file(NYCDistrictsPath)

### Preparing layer with districts

In [5]:
# Read file with district wide Math test results to add to the map
DistrictMathFile = "DistrictsMSMAthNorm.xlsx"
DistrictMathPath = os.path.join(basePath, outputFolder, DistrictMathFile)
DistrictMSMathData = pd.read_excel(DistrictMathPath)
print(DistrictMSMathData.head(5))

# Read file with district wide ELA test results to add to the map
DistrictELAFile = "DistrictsMSELANorm.xlsx"
DistrictELAPath = os.path.join(basePath, outputFolder, DistrictELAFile)
DistrictMSELAData = pd.read_excel(DistrictELAPath)
print(DistrictMSELAData.head(5))

   Unnamed: 0  Year  # Level 1  # Level 2  # Level 3  # Level 4  District
0           0  2013      0.259      0.310      0.195      0.237         1
1           1  2014      0.256      0.313      0.218      0.214         1
2           2  2015      0.235      0.330      0.206      0.229         1
3           3  2016      0.225      0.330      0.178      0.267         1
4           4  2017      0.293      0.289      0.173      0.245         1
   Unnamed: 0  Year  # Level 1  # Level 2  # Level 3  # Level 4  District
0           0  2013      0.265      0.461      0.465      0.997         1
1           1  2014      0.214      0.438      0.518      0.997         1
2           2  2015      0.216      0.397      0.537      0.997         1
3           3  2016      0.156      0.425      0.535      0.997         1
4           4  2017      0.169      0.395      0.501      0.998         1
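
The `Unnamed: 0` column in both outputs is typically a DataFrame index that was written to the file and then read back as data; that is an assumption about how these files were saved. The sketch below shows two ways to avoid it, using an in-memory CSV so that it runs without any files. `pd.read_excel` and `DataFrame.to_excel` accept the same `index_col` and `index` arguments.

```python
import io
import pandas as pd

df = pd.DataFrame({"Year": [2013, 2014], "District": [1, 1]})

# Writing with the default index=True stores the index as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_extra = pd.read_csv(buffer)
print(list(with_extra.columns))

# Option 1: treat the first column as the index when reading.
buffer.seek(0)
as_index = pd.read_csv(buffer, index_col=0)
print(list(as_index.columns))

# Option 2: do not write the index in the first place.
clean_buffer = io.StringIO()
df.to_csv(clean_buffer, index=False)
clean_buffer.seek(0)
no_index = pd.read_csv(clean_buffer)
print(list(no_index.columns))
```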


In [8]:
mathColumns = {'# Level 1':'# Level 1 Math','# Level 2':'# Level 2 Math', '# Level 3':'# Level 3 Math','# Level 4':'# Level 4 Math'}
DistrictMSMathData.rename(columns = mathColumns, inplace = True) 
print(DistrictMSMathData.head())

ELAColumns = {'# Level 1':'# Level 1 ELA','# Level 2':'# Level 2 ELA', '# Level 3':'# Level 3 ELA','# Level 4':'# Level 4 ELA'}
DistrictMSELAData.rename(columns = ELAColumns, inplace = True)
DistrictMSELAData.head()

   Unnamed: 0  Year  # Level 1 Math  # Level 2 Math  # Level 3 Math  \
0           0  2013           0.259           0.310           0.195   
1           1  2014           0.256           0.313           0.218   
2           2  2015           0.235           0.330           0.206   
3           3  2016           0.225           0.330           0.178   
4           4  2017           0.293           0.289           0.173   

   # Level 4 Math  District  
0           0.237         1  
1           0.214         1  
2           0.229         1  
3           0.267         1  
4           0.245         1  


   Unnamed: 0  Year  # Level 1 ELA  # Level 2 ELA  # Level 3 ELA  # Level 4 ELA  District
0           0  2013          0.265          0.461          0.465          0.997         1
1           1  2014          0.214          0.438          0.518          0.997         1
2           2  2015          0.216          0.397          0.537          0.997         1
3           3  2016          0.156          0.425          0.535          0.997         1
4           4  2017          0.169          0.395          0.501          0.998         1
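
The two rename dictionaries above follow one rule: append the subject to every `# Level N` column. A minimal sketch of that rule as a function, assuming all column names are strings; `add_subject_suffix` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def add_subject_suffix(df: pd.DataFrame, subject: str) -> pd.DataFrame:
    """Return a copy with every '# Level N' column renamed to '# Level N <subject>'."""
    mapping = {
        col: f"{col} {subject}"
        for col in df.columns
        if col.startswith("# Level ")
    }
    return df.rename(columns=mapping)

# Small sample frame; the values are taken from the first Math row shown above.
sample = pd.DataFrame(
    {"Year": [2013], "# Level 1": [0.259], "# Level 4": [0.237], "District": [1]}
)
renamed = add_subject_suffix(sample, "Math")
print(list(renamed.columns))
```

Returning a copy instead of using `inplace = True` also makes the cell safe to re-run, since the original frame keeps its column names.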


In [12]:
DistrictMSMathData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 288 entries, 0 to 287
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      288 non-null    int64  
 1   Year            288 non-null    int64  
 2   # Level 1 Math  288 non-null    float64
 3   # Level 2 Math  288 non-null    float64
 4   # Level 3 Math  288 non-null    float64
 5   # Level 4 Math  288 non-null    float64
 6   District        288 non-null    int64  
dtypes: float64(4), int64(3)
memory usage: 15.9 KB


In [16]:
DistrictAllData = pd.merge(DistrictMSMathData, DistrictMSELAData, on = ['Year', 'District'], how = 'inner')
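
An inner merge drops any (Year, District) pair that is present in only one table, without a message. Here both the Math table and the merged table have 288 rows, so nothing was lost, but a check like the one below makes that explicit. The two small frames and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one pair exists only in math, one only in ela.
math = pd.DataFrame({"Year": [2022, 2023], "District": [1, 1], "# Level 4 Math": [0.25, 0.27]})
ela = pd.DataFrame({"Year": [2023, 2023], "District": [1, 2], "# Level 4 ELA": [0.30, 0.20]})

merged = pd.merge(
    math,
    ela,
    on=["Year", "District"],
    how="outer",
    validate="one_to_one",  # raises MergeError if a (Year, District) pair repeats on either side
    indicator=True,         # adds a "_merge" column: both / left_only / right_only
)

# Rows that an inner merge would have dropped
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["Year", "District", "_merge"]])
```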

In [21]:
DistrictAllData['# Level 4 Math+ELA'] = DistrictAllData['# Level 4 Math'] + DistrictAllData['# Level 4 ELA']

In [25]:
DistrictAllData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 288 entries, 0 to 287
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Unnamed: 0_x        288 non-null    int64  
 1   Year                288 non-null    int64  
 2   # Level 1 Math      288 non-null    float64
 3   # Level 2 Math      288 non-null    float64
 4   # Level 3 Math      288 non-null    float64
 5   # Level 4 Math      288 non-null    float64
 6   District            288 non-null    int64  
 7   Unnamed: 0_y        288 non-null    int64  
 8   # Level 1 ELA       288 non-null    float64
 9   # Level 2 ELA       288 non-null    float64
 10  # Level 3 ELA       288 non-null    float64
 11  # Level 4 ELA       288 non-null    float64
 12  # Level 4 Math+ELA  288 non-null    float64
dtypes: float64(9), int64(4)
memory usage: 31.5 KB


In [19]:
# Read saved GeoJSON with average tests results and plots
AVGTestsPlotFile = 'schoolDataPlots.geojson'
AVGTestsPlotPath = os.path.join(basePath, outputFolder,AVGTestsPlotFile)
NYCSchoolsAVGData = gpd.read_file(AVGTestsPlotPath)

In [20]:
NYCSchoolsAVGData.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 922 entries, 0 to 921
Data columns (total 47 columns):
 #   Column                         Non-Null Count  Dtype   
---  ------                         --------------  -----   
 0   OBJECTID                       922 non-null    int64   
 1   LEGAL_NAME                     922 non-null    object  
 2   PHYSADDRLINE1                  922 non-null    object  
 3   PHYSCITY                       922 non-null    object  
 4   COUNTY_DESC                    922 non-null    object  
 5   RECORD_TYPE_DESC               922 non-null    object  
 6   SDL_DESC                       922 non-null    object  
 7   DBN                            922 non-null    object  
 8   School Name_x                  922 non-null    object  
 9   10yrs avg Lvl 1 Math           914 non-null    float64 
 10  10yrs avg Lvl 2 Math           914 non-null    float64 
 11  10yrs avg Lvl 3 Math           914 non-null    float64 
 12  10yrs avg Lvl 4 Math        

#### Producing NYC map

In [24]:
from IPython.display import display, HTML  # IPython.core.display is the deprecated import path

display(HTML("<style>.output_scroll { height: auto !important; max-height: 1500px; }</style>"))

# Create a map object, centered at NYC
MathTestMS_map = folium.Map(location=[40.6839, -73.9026], zoom_start=11, tiles="cartodb positron")
   
# Add dataframes with coordinates and test results to the map

def my_style(x):
    level4 = x['properties']['3yrs avg Lvl 4 Math+Ela']
    openTo = x['properties']['Open to']
    color = 'yellow' if openTo == 'Citywide'  else '#3862e0' if openTo == 'Brooklyn' else '#54B96D'
    weight = 2 if openTo == 'Citywide'  else 2 if openTo == 'Brooklyn' else 0.5
    if level4 is None:
        level4 = 0
    #print(level4)
    return {
        "radius": (level4)*500,
        "color": color,
        "weight": weight
    }  


## Adding the layer to the map
districts = folium.Choropleth(
    geo_data = NYCDistrictsData,
    data = DistrictAllData[DistrictAllData['Year'] == 2023],
    columns = ['District','# Level 4 Math+ELA'],
    key_on = "feature.properties.school_dist",
    fill_color = "BuPu",
    fill_opacity = 0.8,
    line_opacity=0.3,
    nan_fill_color="white",
    legend_name = 'Sum of percentages of middle school test takers with Level 4 result in Math and ELA, 2023',
    popup = folium.GeoJsonPopup(fields=["school_dist", "Year", "# Level 1", "# Level 2", "# Level 3", "# Level 4"]), 
    name = "School districts"
).add_to(MathTestMS_map)

# folium.GeoJson(
#     NYCpublicSchoolsMath_mappable[NYCpublicSchoolsMath_mappable['Year'] == 2023],
#     marker = folium.Circle(radius=10, fill_color='yellow', fill_opacity=1.0, color="orange", weight=0.5),
#     tooltip = folium.GeoJsonTooltip(fields=["LEGAL_NAME","RECORD_TYPE_DESC", "# Level 4"]),
#     popup = folium.GeoJsonPopup(fields=["LEGAL_NAME","RECORD_TYPE_DESC", "# Level 4"]),
#     style_function = my_style,        
#     zoom_on_click = True,
#     name = "All middle schools"
# ).add_to(MathTestMS_map)

# Function to create iframe for a given row
def create_iframe(row):    
    html =  '<strong>{0}:</strong> {1}<br><strong>{2}:</strong> {3}<br><strong>{4}:</strong> {5}<br>\
    <strong>{6}:</strong> {7}<br><strong>{8}:</strong> {9}<br><strong>{10}:</strong> {11}<br>\
    <strong>{12}:</strong> <br><img src="data:image/png;base64,{13}"><br>\
    <img src="data:image/png;base64,{14}"><br> <img src="data:image/png;base64,{15}">'.format(
        'School Name', row['LEGAL_NAME'],
        'School Number', row['DBN'],
        'Total Enrollment, students', row['Total Enrollment'],
        'Level 4 share Avg 2019-2023 Math', round(row['3yrs avg Lvl 4 Math'], 2), 
        'Level 4 share Avg 2019-2023 ELA', round(row['3yrs avg Lvl 4 ELA'], 2),
        'School Math+ELA % Level 4 2023 - 10 years average', round(row['2023-10yAVG'], 3),
        'Citywide Math+ELA % Level 4 2023 - 10 years average: 0.100',
        row['plot Math'], row['plot ELA'], row['Dvst_chart'])
    return folium.IFrame(html, width=500, height=450)

def create_popup(x):
    iframe = create_iframe(x)
    popup = folium.Popup(iframe)
    return popup

# Iterate over the GeoDataFrame and add a popup to each feature
for _, row in tqdm(NYCSchoolsAVGData.iterrows(), total = len(NYCSchoolsAVGData)):
    iframe = create_iframe(row)
        
    data = gpd.GeoDataFrame(row.to_frame().T, crs=NYCSchoolsAVGData.crs)
    
    folium.GeoJson(
        data,
        marker = folium.Circle(radius=10, fill_color='green', fill_opacity=1.0, color="green", weight=0.5),
        popup = folium.Popup(iframe),
        style_function = my_style,
        control = False,
        # zoom_on_click = True,
    ).add_to(MathTestMS_map)
        
folium.LayerControl().add_to(MathTestMS_map)    
  
# # Display the map
# MathTestMS_map

# Save map to html
mfile = 'NYCMSmap.html'
mpath = os.path.join(basePath, outputFolder, mfile)
MathTestMS_map.save(mpath)

100%|████████████████████████████████████████████████████████████████████████████████| 922/922 [00:44<00:00, 20.70it/s]
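
The chained conditionals in `my_style` can also be written as a lookup table, with the radius scale passed in so that one definition serves both the citywide map (scale 500) and the District 15 map (scale 150). This is a sketch and not the notebook's code: `make_style`, `STYLE_BY_ADMISSION` and `DEFAULT_STYLE` are hypothetical names, and the NaN check is an addition to the original `None` check.

```python
import math

# Outline color and weight per 'Open to' value; every other value gets the default.
STYLE_BY_ADMISSION = {
    "Citywide": {"color": "yellow", "weight": 2},
    "Brooklyn": {"color": "#3862e0", "weight": 2},
}
DEFAULT_STYLE = {"color": "#54B96D", "weight": 0.5}

def make_style(radius_scale):
    """Build a style function that scales the circle radius by `radius_scale`."""
    def style(feature):
        props = feature["properties"]
        level4 = props.get("3yrs avg Lvl 4 Math+Ela")
        # Treat a missing value (None or NaN) as zero.
        if level4 is None or (isinstance(level4, float) and math.isnan(level4)):
            level4 = 0
        base = STYLE_BY_ADMISSION.get(props.get("Open to"), DEFAULT_STYLE)
        return {"radius": level4 * radius_scale, **base}
    return style

city_style = make_style(500)
district_style = make_style(150)

example = {"properties": {"3yrs avg Lvl 4 Math+Ela": 0.5, "Open to": "Citywide"}}
print(city_style(example))
```

Either returned function can be passed as `style_function` in place of `my_style`.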


#### Producing map zoomed on District 15

In [10]:
from IPython.display import display, HTML  # IPython.core.display is the deprecated import path

display(HTML("<style>.output_scroll { height: auto !important; max-height: 1500px; }</style>"))

# Create a map object, centered on District 15
MathTestMS_map = folium.Map(location=[40.666591, -73.995518], zoom_start=13, tiles="cartodb positron")
   
# Add dataframes with coordinates and test results to the map

def my_style(x):
    level4 = x['properties']['3yrs avg Lvl 4 Math+Ela']
    openTo = x['properties']['Open to']
    color = 'yellow' if openTo == 'Citywide'  else '#3862e0' if openTo == 'Brooklyn' else '#54B96D'
    weight = 2 if openTo == 'Citywide'  else 2 if openTo == 'Brooklyn' else 0.5
    if level4 is None:
        level4 = 0
    #print(level4)
    return {
        "radius": (level4)*150,
        "color": color,
        "weight": weight
    }  


## Adding the layer to the map
districts = folium.Choropleth(
    geo_data = NYCDistrictsData,
    data = DistrictMSMathData[DistrictMSMathData['Year'] == 2023],
    columns = ['District','# Level 4 Math'],  # '# Level 4' was renamed to '# Level 4 Math' in the rename cell above
    key_on = "feature.properties.school_dist",
    fill_color = "BuPu",
    fill_opacity = 0.8,
    line_opacity=0.3,
    nan_fill_color="white",
    legend_name = 'Percent of middle school test takers with Level 4 result in Math, 2023',
    popup = folium.GeoJsonPopup(fields=["school_dist", "Year", "# Level 1", "# Level 2", "# Level 3", "# Level 4"]), 
    name = "School districts"
).add_to(MathTestMS_map)

# folium.GeoJson(
#     NYCpublicSchoolsMath_mappable[NYCpublicSchoolsMath_mappable['Year'] == 2023],
#     marker = folium.Circle(radius=10, fill_color='yellow', fill_opacity=1.0, color="orange", weight=0.5),
#     tooltip = folium.GeoJsonTooltip(fields=["LEGAL_NAME","RECORD_TYPE_DESC", "# Level 4"]),
#     popup = folium.GeoJsonPopup(fields=["LEGAL_NAME","RECORD_TYPE_DESC", "# Level 4"]),
#     style_function = my_style,        
#     zoom_on_click = True,
#     name = "All middle schools"
# ).add_to(MathTestMS_map)

# Function to create iframe for a given row
def create_iframe(row):    
    html =  '<strong>{0}:</strong> {1}<br><strong>{2}:</strong> {3}<br><strong>{4}:</strong> {5}<br>\
    <strong>{6}:</strong> {7}<br><strong>{8}:</strong> {9}<br><img src="data:image/png;base64,{10}"><br>\
    <img src="data:image/png;base64,{11}"><br> <img src="data:image/png;base64,{12}">'.format(
        'School Name', row['LEGAL_NAME'],
        'School Number', row['DBN'],
        'Total Enrollment, students', row['Total Enrollment'],
        'Level 4 share Avg 2019-2023 Math', round(row['3yrs avg Lvl 4 Math'], 2), 
        'Level 4 share Avg 2019-2023 ELA', round(row['3yrs avg Lvl 4 ELA'], 2),
        row['plot Math'], row['plot ELA'], row['Dvst_chart'])
    return folium.IFrame(html, width=500, height=450)

def create_popup(x):
    iframe = create_iframe(x)
    popup = folium.Popup(iframe)
    return popup

# Iterate over the GeoDataFrame and add a popup to each feature
for _, row in tqdm(NYCSchoolsAVGData.iterrows(), total = len(NYCSchoolsAVGData)):
    iframe = create_iframe(row)
        
    data = gpd.GeoDataFrame(row.to_frame().T, crs=NYCSchoolsAVGData.crs)
    
    folium.GeoJson(
        data,
        marker = folium.Circle(radius=10, fill_color='green', fill_opacity=1.0, color="green", weight=0.5),
        popup = folium.Popup(iframe),
        style_function = my_style,
        control = False,
        # zoom_on_click = True,
    ).add_to(MathTestMS_map)
        
folium.LayerControl().add_to(MathTestMS_map)    
  
# # Display the map
# MathTestMS_map

# Save map to html
mfile = 'NYCMSmap_Dist15.html'
mpath = os.path.join(basePath, outputFolder, mfile)
MathTestMS_map.save(mpath)

100%|████████████████████████████████████████████████████████████████████████████████| 922/922 [00:44<00:00, 20.74it/s]
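
The popup HTML in `create_iframe` is built from one long format string with numbered placeholders, which is easy to get out of step. A sketch of an alternative that builds the same kind of markup from (label, value) pairs, using only the standard library. `build_popup_html` and the sample values are hypothetical.

```python
import html

def build_popup_html(fields, images):
    """Render (label, value) pairs as bold-label lines, followed by base64 PNG images."""
    lines = [
        f"<strong>{html.escape(str(label))}:</strong> {html.escape(str(value))}"
        for label, value in fields
    ]
    lines += [f'<img src="data:image/png;base64,{image}">' for image in images]
    return "<br>".join(lines)

# Hypothetical values for illustration only.
popup_html = build_popup_html(
    fields=[("School Name", "Example & Sons School"), ("School Number", "00K000")],
    images=["AAAA"],
)
print(popup_html)
```

The returned string can be passed to `folium.IFrame` in the same way as the `html` variable in `create_iframe`. Escaping the values keeps characters such as `&` in a school name from breaking the markup.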


In [27]:
import re
import csv

# Lists of schools
brooklyn_schools = [
"MS 113 Ronald Edmonds Learning Center (13K113) audition or Language criteria",
"Urban Assembly Institute of Math and Science for Young Women (13K527)",
"Fort Greene Prep (13K691) Language criteria",
"Juan Morel Campos Secondary School (14K071)",
"PS/MS 84 Jose de Diego (14K084) Language criteria",
"IS 318 Eugenio Maria De Hostos (14K318)",
"Young Women’s Leadership School of Brooklyn (14K614)",
"Lyons Community School (14K586)",
"JHS 88 Peter Rouget (15K088)",
"Boerum Hill School for International Studies (15K497)",
"IS 136 Charles O. Dewey (15K136)",
"Ebbets Field (17K352)",
"Science, Technology and Research Early College HS at Erasmus (17K543) school based assessment",
"East Flatbush Community Research School (18K581)",
"IS 285 Meyer Levin (18K285) audition and open",
"Legacy School of the Arts (19K907) audition",
"JHS 218 James P. Sinnott (19K218) Language criteria",
"JHS 220 John J. Pershing (20K220)",
"Urban Assembly School for Leadership and Empowerment (20K609)",
"IS 228 David Boody (21K228) audition, talent assessment, test",
"IS 281 Joseph B. Cavallaro (21K281)",
"Eagle Academy for Young Men (23K644)",
"Kappa V (23K518)",
"IS 349 Math, Science & Tech (32K349)",
"JHS 383 Philippa Schuyler (32K383) course grades",
"JHS 291 Roland Hayes (32K291)",
"Evergreen Middle School for Urban Exploration (32K562)",
"All City Leadership Secondary School (32K554) course grades"
]

citywide_schools = [
"New Explorations into Science, Technology & Math (01M539) course grades",
"The Anderson School (03M334) course grades",
"Tag Young Scholars (04M012) course grades",
"Brooklyn School of Inquiry (20K686) course grades",
"The 30th Avenue School (30Q300) course grades",
"School for Global Leaders  (01M378)",
"The 47 American Sign Language & English Lower School (02M347)",
"Ballet Tech, NYC Public School Dance (02M442) audition",
"Ella Baker School (02M225)",
"I.C.E. Institute for Collaborative Education (02M407)",
"School of the Future Middle and High School (02M413)",
"Professional Performing Arts School (02M408) audition",
"Quest to Learn (02M422)",
"Special Music School (03M859) audition",
"MS 224 Manhattan East School for Arts & Academics (04M224)",
"J.H.S. 123 James M. Kieran  (08X123)",
"Restoration Academy Magnet School of Global Exploration and Innovation (13K301)",
"J.H.S. 050 John D. Wells  (14K050) Language criteria",
"MS 448 Brooklyn Collaborative (15K448)",
"Park Slope Collegiate  (15K464)",
"M.S. 035 Stephen Decatur  (16K035)",
"The Brooklyn Green School  (16K898)",
"Medgar Evers College Preparatory School (17K590) school based assessment",
"Lenox Academy (18K235) course grades",
"I.S. 171 Abraham Lincoln  (19K171)",
"Van Siclen Community Middle School  (19K654)",
"Mark Twain IS 239 for the Gifted & Talented (21K239) variety of school based assessments and audition depending on the program",
"I.S. 392  (23K392)",
"Scholar’s Academy (27Q323) course grades",
"Catherine & Count Basie Middle School  (28Q072)",
"Redwood Middle School  (28Q332)",
"M.S. 358  (28Q358)",
"Baccalaureate School for Global Education (30Q580) school based assessment"
]

# Function to process schools
def process_schools(school_list, open_to):
    processed_schools = []
    # DBN = district digits + borough letter + three-digit school number.
    # X (Bronx) and R (Staten Island) are included so that entries such as 08X123 are not dropped.
    pattern = re.compile(r"(.+?)\s+\((\d+[KMQXR]\d{3})\)(.*)")
    for school in school_list:
        match = pattern.match(school)
        if match:
            school_name = match.group(1).strip()
            dbn = match.group(2).strip()
            comments = match.group(3).strip()
            processed_schools.append([dbn, school_name, open_to, comments])
    return processed_schools

# Process both lists
processed_brooklyn = process_schools(brooklyn_schools, "Brooklyn")
processed_citywide = process_schools(citywide_schools, "Citywide")

# Combine lists
all_schools = processed_brooklyn + processed_citywide

# Write to CSV in the raw data folder (csv_path, not the bare file name)
csv_filename = "cityBoroughWideschools.csv"
csv_path = os.path.join(basePath, dataFolder, csv_filename)
# UTF-8 keeps the curly apostrophes in some school names readable with pandas' default encoding
with open(csv_path, "w", newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['DBN', 'School Name', 'Open to', 'Comments']) # CSV header
    writer.writerows(all_schools)

print(f"CSV file '{csv_filename}' has been created with {len(all_schools)} schools.")


CSV file 'cityBoroughWideschools.csv' has been created with 61 schools.
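
Entries that do not match the pattern are left out of the result, so it can help to list them explicitly and compare the counts with the lengths of the input lists. The sketch below is a standalone check, not the notebook's function. `split_matches` is a hypothetical helper, and the pattern assumes a DBN of two district digits, one borough letter (M, X, K, Q or R) and three school digits.

```python
import re

# Assumption: DBN = two district digits + borough letter (M, X, K, Q, R) + three school digits.
DBN_PATTERN = re.compile(r"(.+?)\s+\((\d{2}[MXKQR]\d{3})\)(.*)")

def split_matches(school_list):
    """Return (matched, unmatched): parsed [dbn, name, comments] rows and the raw strings that failed."""
    matched, unmatched = [], []
    for school in school_list:
        m = DBN_PATTERN.match(school)
        if m:
            matched.append([m.group(2), m.group(1).strip(), m.group(3).strip()])
        else:
            unmatched.append(school)
    return matched, unmatched

sample = [
    "J.H.S. 123 James M. Kieran  (08X123)",
    "Lenox Academy (18K235) course grades",
    "School without a code",  # hypothetical malformed entry
]
matched, unmatched = split_matches(sample)
print(matched)
print(unmatched)
```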
