# Analysis of NYS public and charter school results in ELA and math, grades 6-8

## Data processing.

<a id="TOC"></a> 
## Table of Contents
1. [Data sources and definitions](#data)
2. [Imports: modules](#modules)
3. [Read and prepare data](#read)
4. [Generating geoJSON for mapping](#maps) 

<a id="data"></a> 
## Data sources and definitions

#### Data sources:

**1) State test Math and ELA results (2022-2023)**

Report Card Database (251.35 megabytes): This Access database contains assessment results (elementary- and intermediate-level ELA, Math, and Science; Annual Regents; Total Cohort Regents; NYSESLAT; NYSAA) for the state, districts, and public and charter schools, as well as by county and by Need to Resource Capacity group.
https://data.nysed.gov/downloads.php

**2) Schools locations**

NYS GIS Clearinghouse: NYS Schools
https://data.gis.ny.gov/maps/b6c624c740e4476689aa60fdc4aacb8f/about

#### Definitions of Performance Levels for the 2023 Grades 3-8 English Language Arts and Mathematics Tests  

**NYS Level 1**: Students performing at this level are below proficient in standards for their grade. They may demonstrate limited knowledge, skills, and practices embodied by the Learning Standards that are considered insufficient for the expectations at this grade. 

**NYS Level 2**: Students performing at this level are partially proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered partial but insufficient for the expectations at this grade. Students performing at Level 2 are considered on track to meet current New York high school graduation requirements but are not yet proficient in Learning Standards at this grade. 

**NYS Level 3**: Students performing at this level are proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered sufficient for the expectations at this grade.  

**NYS Level 4**: Students performing at this level excel in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered more than sufficient for the expectations at this grade.  

*Source: NYSED, 2023, https://www.p12.nysed.gov/irs/ela-math/2023/ela-math-score-ranges-performance-levels-2023.pdf*
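
To make the four-level scale concrete, the sketch below turns per-level student counts into shares and a proficiency rate (Levels 3 and 4 combined, following the definitions above). The function names and the sample counts are illustrative assumptions, not part of the notebook or the NYSED data.

```python
def level_shares(counts):
    """Return the share of test takers at each level, given counts for Levels 1-4."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no test takers")
    return [c / total for c in counts]


def proficiency_rate(counts):
    """Share of test takers at Level 3 or Level 4 (proficient or above)."""
    shares = level_shares(counts)
    return shares[2] + shares[3]


# Made-up counts for Levels 1, 2, 3, 4
sample_counts = [10, 30, 40, 20]
print(level_shares(sample_counts))
print(proficiency_rate(sample_counts))
```

The same row-wise division is what the notebook later applies to whole dataframes of `LEVEL1_COUNT` ... `LEVEL4_COUNT` columns.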

<a id="modules"></a> 
### Imports

In [2]:
# Appending the path to utils

import sys

parent_dir = 'C:\\GITHUB\\NY_schools_maps\\notebooks'
sys.path.append(parent_dir)

In [3]:
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import folium
from shapely.geometry import Point
from fuzzywuzzy import process
import fuzzywuzzy
import base64
from io import BytesIO
import math
from tqdm import tqdm
from utils import create_plot, match_name

pd.set_option('display.float_format', '{:.3f}'.format)

<a id="read"></a> 
### Read data

In [4]:
basePath = r"G:\My Drive\Kids\NYC_schools_mapped"
dataFolder = r"raw_data"
outputFolder = r"processed_data"

In [None]:
# Read GeoJSON into data frame
SchoolsFile = 'NYC_K-12_schools.geojson'
NYCSchoolsPath = os.path.join(basePath, dataFolder, SchoolsFile)
NYCSchoolsGeom = gpd.read_file(NYCSchoolsPath)

In [5]:
## Read schools test results files

# read schools math results file
fileName_math = "NYC_MATH_2022-2023_from_NYS.xlsx"
mathPath = os.path.join(basePath,dataFolder,fileName_math)
print(mathPath)
mathResultsDF = pd.read_excel(mathPath)

# read schools ELA results file
fileName_ELA = "NYC_ELA_2022-2023_from_NYS.xlsx"
ELAPath = os.path.join(basePath, dataFolder, fileName_ELA)
print(ELAPath)
ELAResultsDF = pd.read_excel(ELAPath)

G:\My Drive\Kids\NYC_schools_mapped\raw_data\NYC_MATH_2022-2023_from_NYS.xlsx
G:\My Drive\Kids\NYC_schools_mapped\raw_data\NYC_ELA_2022-2023_from_NYS.xlsx


In [None]:
## Read district results files

# Read file with district wide Math test results to add to the map
DistrictMathFile = "DistrictsMSMAthNorm.xlsx"
DistrictMathPath = os.path.join(basePath, outputFolder, DistrictMathFile)
DistrictMSMathData = pd.read_excel(DistrictMathPath)
print(DistrictMSMathData.head(5))

# Read file with district wide ELA test results to add to the map
DistrictELAFile = "DistrictsMSELANorm.xlsx"
DistrictELAPath = os.path.join(basePath, outputFolder, DistrictELAFile)
DistrictMSELAData = pd.read_excel(DistrictELAPath)
print(DistrictMSELAData.head(5))

In [None]:
mathResultsDF.info()

In [None]:
ELAResultsDF.info()

### Prepare layer with district data

In [None]:
mathColumns = {'# Level 1':'# Level 1 Math','# Level 2':'# Level 2 Math', '# Level 3':'# Level 3 Math','# Level 4':'# Level 4 Math'}
DistrictMSMathData.rename(columns = mathColumns, inplace = True) 
print(DistrictMSMathData.head())

ELAColumns = {'# Level 1':'# Level 1 ELA','# Level 2':'# Level 2 ELA', '# Level 3':'# Level 3 ELA','# Level 4':'# Level 4 ELA'}
DistrictMSELAData.rename(columns = ELAColumns, inplace = True)
DistrictMSELAData.head()

In [None]:
DistrictAllData = pd.merge(DistrictMSMathData, DistrictMSELAData, on = ['Year', 'District'], how = 'inner')
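
An inner merge silently drops any district that appears in only one of the two files. One way to check for that, sketched here on made-up data, is to run an outer merge with `indicator=True` and inspect the rows that did not match on both sides. The frame names and values below are assumptions for illustration only.

```python
import pandas as pd

# Made-up district tables; district 14 is missing from ELA, district 15 from Math
math_df = pd.DataFrame(
    {"Year": [2023, 2023], "District": [13, 14], "# Level 4 Math": [0.30, 0.25]}
)
ela_df = pd.DataFrame(
    {"Year": [2023, 2023], "District": [13, 15], "# Level 4 ELA": [0.28, 0.22]}
)

# The outer merge keeps every key; the _merge column records where each row came from
check = pd.merge(math_df, ela_df, on=["Year", "District"], how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Year", "District", "_merge"]])
```

If `unmatched` is empty for the real district files, the inner merge used in the notebook loses nothing.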

In [None]:
DistrictAllData['# Level 4 Math+ELA'] = DistrictAllData['# Level 4 Math'] + DistrictAllData['# Level 4 ELA']

### Prepare school layer

In [None]:
# Get locations for public schools only
# (keeps public schools, charter schools, and satellite sites for charter schools from the GeoJSON)

NYCSchoolsGeom = NYCSchoolsGeom[NYCSchoolsGeom['INST_TYPE_DESC'] == 'PUBLIC SCHOOLS']
NYCSchoolsGeom

In [None]:
# Make a dataframe from geoJSON with minimum needed columns

NYCSchoolsGeom_short = NYCSchoolsGeom[['OBJECTID', 'LEGAL_NAME', 'INSTSUBTYPDESC', 'SDL_DESC', 'geometry']]
NYCSchoolsGeom_short

In [None]:
name = 'NYCPubChSchools_temp.csv'
path = os.path.join(basePath, outputFolder, name)
NYCSchoolsGeom_short.to_csv(path)

del name, path

In [None]:
# Dictionary of school test results by subject
subjects = ['Math', 'ELA']
resultsDFs = {'Math': mathResultsDF, 'ELA': ELAResultsDF}

In [None]:
for subject in subjects:
    resultsDF = resultsDFs[subject]
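    resultsDF = resultsDF[resultsDF['Year'] == 2023]
    # .copy() makes this an independent dataframe, so later column assignments do not trigger SettingWithCopyWarning
    resultsDF = resultsDF[['ENTITY_NAME', 'Year', 'ASSESSMENT_NAME', 'LEVEL1_COUNT', 'LEVEL2_COUNT', 'LEVEL3_COUNT', 'LEVEL4_COUNT']].copy()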
    resultsDF.info()
    resultsDFs[subject] = resultsDF
    print(len(resultsDF))
    
del resultsDF

In [None]:
# resultsDF.info() showed that most of the count columns are stored as objects rather than numbers, so they need to be converted
for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF_colToConvert = ['LEVEL1_COUNT', 'LEVEL2_COUNT', 'LEVEL3_COUNT', 'LEVEL4_COUNT']
    resultsDF[resultsDF_colToConvert] = resultsDF[resultsDF_colToConvert].apply(pd.to_numeric, errors = 'coerce')
    resultsDF.info()
    resultsDFs[subject] = resultsDF
    print(len(resultsDF))
    
del resultsDF
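
The effect of `errors='coerce'` is easiest to see on a tiny example: any value that cannot be parsed as a number becomes `NaN` instead of raising an error. The sample below is made up, and the `"s"` marker for a suppressed value is an assumption about how non-numeric entries might look in the source file.

```python
import pandas as pd

# Made-up sample: counts stored as strings, with "s" standing in for a non-numeric entry
raw = pd.DataFrame({
    "LEVEL1_COUNT": ["12", "s", "7"],
    "LEVEL2_COUNT": ["30", "25", "s"],
})
count_cols = ["LEVEL1_COUNT", "LEVEL2_COUNT"]

# Unparseable values turn into NaN; the columns become numeric
converted = raw[count_cols].apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)

# Count how many values were lost to coercion in each column
print(converted.isna().sum())
```

Because `sum()` skips `NaN` by default, coerced values are effectively treated as zero in the aggregation that follows, so it may be worth checking how many there are before normalizing.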

In [None]:
results_Norm = {}

for subject in subjects:
        
    resultsDF = resultsDFs[subject]
    
    resultsDF_grouped = resultsDF.groupby(['ENTITY_NAME', 'Year'])[['LEVEL1_COUNT', 'LEVEL2_COUNT', 'LEVEL3_COUNT', 'LEVEL4_COUNT']].sum()
    # Change column names to include subject
    resultsDF_grouped.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}']
    
    # Dataframe for middle schools by years with normalized values
    results_Norm[subject] = resultsDF_grouped.div(resultsDF_grouped.sum(axis=1), axis=0)
    results_Norm[subject].reset_index(inplace=True)
    
    print(results_Norm[subject].head(20))
    
    # Dataframe with average
    
del resultsDF, resultsDF_grouped
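
The normalization step divides each row by its own total, so the four level columns become shares that sum to 1 for every school. A minimal sketch on made-up counts (school names and numbers are illustrative assumptions):

```python
import pandas as pd

# Made-up per-school counts for the four performance levels
counts = pd.DataFrame(
    {
        "LEVEL1_COUNT": [10, 0],
        "LEVEL2_COUNT": [30, 5],
        "LEVEL3_COUNT": [40, 10],
        "LEVEL4_COUNT": [20, 5],
    },
    index=pd.Index(["School A", "School B"], name="ENTITY_NAME"),
)

# counts.sum(axis=1) is the row total; axis=0 aligns that total with the row index
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
print(shares.sum(axis=1))
```

Since each subject's shares lie between 0 and 1, a sum of the Level 4 shares for Math and ELA lies between 0 and 2 rather than being a share itself.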

In [None]:
# Make a merged dataframe with both Math and ELA results
DFs = list(results_Norm.values())
allResultsDF = pd.merge(DFs[0], DFs[1], on = ['ENTITY_NAME', 'Year'], how = 'inner')
allResultsDF.head(5)

In [None]:
allResultsDF.info()

In [None]:
allResultsDF['Level 4 Math+Ela'] = allResultsDF[f'Level 4 {subjects[0]}']+allResultsDF[f'Level 4 {subjects[1]}']
allResultsDF.head(10)

In [None]:
name = 'NYCPubChSchoolsTestResults2023_temp.csv'
path = os.path.join(basePath, outputFolder, name)
allResultsDF.to_csv(path)

del name, path

In [None]:
# Make plots for popups in the map and add them as columns to the mappable dataframe


# List of school names

schoolsNames = allResultsDF['ENTITY_NAME'].to_list()
testResults = allResultsDF

# Create a dictionary to hold the dataframes by school
schoolDFs = {}

# Make dataframes by schools 
for name in schoolsNames:
    dfName = name
    schoolDFs[dfName] = testResults[testResults['ENTITY_NAME'] == name]

plotsDFs = {}

for subject in subjects:
    columns_to_plot = [f"Level 1 {subject}", f"Level 2 {subject}", f"Level 3 {subject}", f"Level 4 {subject}"]
    # Start a fresh list for each subject so one subject's plots do not leak into the next
    plots = []
    # Plot dataframes by school
    for schoolName, current_dataframe in schoolDFs.items():
        # schoolName is the dictionary key (the school name)
        # current_dataframe holds that school's rows

        # Create a plot
        fig = create_plot(current_dataframe, schoolName, columns_to_plot)

        # Convert the plot to a PNG image and then encode it as base64
        io_buf = BytesIO()
        fig.savefig(io_buf, format='png', bbox_inches='tight')
        io_buf.seek(0)
        base64_string = base64.b64encode(io_buf.read()).decode('utf8')
        # Close the figure to free memory
        plt.close(fig)

        plots.append((schoolName, base64_string))

    # Collect this subject's plots in a dataframe keyed by school name
    plotsDF = pd.DataFrame(plots, columns=['ENTITY_NAME', f'plot {subject}'])

    plotsDFs[subject] = plotsDF
            
for subject, df in plotsDFs.items():
    allResultsDF = pd.merge(allResultsDF, df, left_on = 'ENTITY_NAME', right_on='ENTITY_NAME')

In [None]:
allResultsDF.info()

In [None]:
# Match the combined school results with the spatial data (GeoJSON of school locations)

# Match names from allResultsDF['ENTITY_NAME'] to NYCSchoolsGeom_short['LEGAL_NAME']
allResultsDF['matched_name'] = allResultsDF['ENTITY_NAME'].apply(lambda x: match_name(x, NYCSchoolsGeom_short['LEGAL_NAME'], min_score=60))

name = 'NYCPubChSchoolsTestResults2023_tempMatched.csv'
path = os.path.join(basePath, outputFolder, name)
allResultsDF.to_csv(path)

del name, path

# Merge on the matched name; calling merge on the GeoDataFrame keeps the result a GeoDataFrame, which to_file requires
finalGeoDF = NYCSchoolsGeom_short.merge(allResultsDF, left_on='LEGAL_NAME', right_on='matched_name')
allData_Name = 'PublicCharterNYCschools.geojson'
allData_Path = os.path.join(basePath,outputFolder, allData_Name)
finalGeoDF.to_file(allData_Path, driver="GeoJSON")

del allData_Name, allData_Path
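
The `match_name` helper is imported from `utils` and its body is not shown in this notebook. As a rough idea of what thresholded fuzzy matching does, here is a stand-in built on the standard library's `difflib` rather than `fuzzywuzzy`. The function name `match_name_sketch`, its scoring, and the sample school names are all assumptions, so its scores will not agree with those of the real helper.

```python
from difflib import SequenceMatcher


def match_name_sketch(name, candidates, min_score=60):
    """Return the candidate most similar to `name`, or None if the best score is below min_score.

    Hypothetical stand-in for utils.match_name; scores run from 0 to 100.
    """
    best_name, best_score = None, -1.0
    for candidate in candidates:
        # Compare case-insensitively; ratio() is in [0, 1]
        score = 100 * SequenceMatcher(None, name.upper(), candidate.upper()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= min_score else None


# Made-up names for illustration
legal_names = ["PS 8 ROBERT FULTON", "MS 51 WILLIAM ALEXANDER"]
print(match_name_sketch("P.S. 8 Robert Fulton", legal_names))
print(match_name_sketch("QQQQ", legal_names))
```

With a threshold as permissive as 60, distinct schools with similar names can be paired by mistake, so it may be worth reviewing the lowest-scoring pairs in the saved `_tempMatched.csv` file before trusting the merge.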

In [None]:
finalGeoDF.info()

<a id="maps"></a> 
### Generating the map

In [None]:
# Prepare legend
legend_html = '''
     <div style="position: fixed; 
                 bottom: 50px; left: 50px; width: 300px; height: 110px; 
                 border:1px solid grey; z-index:9999; font-size:10px;
                 background-color: rgba(255, 255, 255, 0.7);
                 padding: 10px;
                 ">
                   <div><i class="fa fa-circle" style="border:0.5px solid #54B96D; color:green; border-radius:50%; display:inline-block;"></i><span style="margin-left: 5px;"> NYC public schools</span><br>
                   The size of each circle reflects the combined share of students scoring at the highest level (Level 4) on the 2023 state ELA and math tests. &nbsp; </div>
                   <br>
                   <div><i class="fa fa-circle" style="border:2px solid yellow; color:green; border-radius:50%; display:inline-block;"></i><span style="margin-left: 5px;">Open to all city residents </span></div>
                   <div><i class="fa fa-circle" style="border:2px solid #3862e0; color:green;border-radius:50%; display:inline-block;"></i><span style="margin-left: 5px;">Open to Brooklyn residents </span></div>
                   
      </div>
     '''

In [None]:
from IPython.display import display, HTML

display(HTML("<style>.output_scroll { height: auto !important; max-height: 1500px; }</style>"))

# Create a map object, centered at NYC
mapNYC = folium.Map(location=[40.6839, -73.9026], zoom_start=11, tiles="cartodb positron")
   
# Add dataframes with coordinates and test results to the map

def my_style(x):
    level4 = x['properties']['Level 4 Math+Ela']
    charter = x['properties']['INSTSUBTYPDESC']
    color = '#f0a607' if charter in ('CHARTER SCHOOL', 'SATELLITE SITE FOR CHARTER SCHOOLS') else '#06a6cf'
    #fill_color = '#f0a607' if charter == 'CHARTER SCHOOL'  else '#f0a607' if charter == 'SATELLITE SITE FOR CHARTER SCHOOLS' else '#06a6cf'
    if level4 is None:
        level4 = 0
    #print(level4)
    return {
        "radius": (level4)*500,
        "color": color,
        #"fill_color": fill_color,
    }  


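## Adding the district layer to the map
# NOTE: NYCDistrictsGeom (school district boundaries) is not loaded anywhere in this notebook;
# it must be read into a GeoDataFrame before this cell is run.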
districts = folium.Choropleth(
    geo_data = NYCDistrictsGeom,
    data = DistrictAllData[DistrictAllData['Year'] == 2023],
    columns = ['District','# Level 4 Math+ELA'],
    key_on = "feature.properties.school_dist",
    fill_color = "BuPu",
    fill_opacity = 0.8,
    line_opacity=0.3,
    nan_fill_color="white",
    legend_name = 'Sum of percentages of middle school test takers with Level 4 result in Math and ELA, 2023',
    popup = folium.GeoJsonPopup(fields=["school_dist", "Year", "# Level 1", "# Level 2", "# Level 3", "# Level 4"]), 
    name = "School districts"
).add_to(mapNYC)


# Function to create iframe for a given row
def create_iframe(row):    
    html =  '<strong>{0}:</strong> {1}<br><strong>{2}:</strong> {3}<br><strong>{4}:</strong> {5}<br>\
    <br><img src="data:image/png;base64,{6}"><br>\
    <img src="data:image/png;base64,{7}">'.format(
        'School Name', row['LEGAL_NAME'],
        'Level 4 share 2023 Math', round(row['Level 4 Math'], 2), 
        'Level 4 share 2023 ELA', round(row['Level 4 ELA'], 2),
        row['plot Math'], row['plot ELA'])
    return folium.IFrame(html, width=500, height=450)

def create_popup(x):
    iframe = create_iframe(x)
    popup = folium.Popup(iframe)
    return popup

# Iterate over the GeoDataFrame and add a popup to each feature
for _, row in tqdm(finalGeoDF.iterrows(), total = len(finalGeoDF)):
    iframe = create_iframe(row)
        
    data = gpd.GeoDataFrame(row.to_frame().T, crs=finalGeoDF.crs)
    
    folium.GeoJson(
    data,
    marker = folium.Circle(radius=10, fill_color='white', fill_opacity=0, color="green", weight=2),
    #marker = folium.Circle(radius=10),    
    popup = folium.Popup(iframe),
    style_function = my_style, 
    control = False    
    #zoom_on_click = True,    
).add_to(mapNYC)    
        
folium.LayerControl().add_to(mapNYC)    
  
# # Display the map
# mapNYC

# Adding the legend to the map

# Attach the legend HTML to the map's root HTML element
mapNYC.get_root().html.add_child(folium.Element(legend_html))

# Save map to html
mfile = 'NYCpublicAndCharter.html'
mpath = os.path.join(basePath, outputFolder, mfile)
mapNYC.save(mpath)
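
The styling logic in `my_style` can be exercised without folium, since it only reads a GeoJSON-like feature dictionary. The sketch below is a pure-Python variant, not the notebook's function: the name `style_sketch`, the `scale` parameter, and the extra `NaN` check are assumptions added for illustration. The `NaN` check matters because a missing value coming from pandas is typically `NaN` rather than `None`.

```python
import math

# Institution subtypes drawn in the charter color
CHARTER_TYPES = {"CHARTER SCHOOL", "SATELLITE SITE FOR CHARTER SCHOOLS"}


def style_sketch(feature, scale=500):
    """Return a style dict with a radius proportional to the Level 4 value."""
    props = feature["properties"]
    level4 = props.get("Level 4 Math+Ela")
    # Treat both None and NaN as "no data"
    if level4 is None or (isinstance(level4, float) and math.isnan(level4)):
        level4 = 0
    color = "#f0a607" if props.get("INSTSUBTYPDESC") in CHARTER_TYPES else "#06a6cf"
    return {"radius": level4 * scale, "color": color}


# Made-up feature for illustration
feature = {"properties": {"Level 4 Math+Ela": 0.5, "INSTSUBTYPDESC": "CHARTER SCHOOL"}}
print(style_sketch(feature))
```

Testing the function on a few hand-built features like this is a quick way to confirm the color and radius rules before rendering the full map.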

In [None]:
finalGeoDF['SDL_DESC'].unique()

In [None]:
NYCSchoolsGeom_short['SDL_DESC'].unique()

In [None]:
NYCSchoolsGeom_short.info()