# Analysis of NYS public and charter school results in ELA and math, grades 6-8

## Data processing.

<a id="TOC"></a> 
#### Table of Contents
1. [Data sources, definitions](#data)
2. [Imports: modules](#modules)
3. [Read and prepare data](#read)
4. [Processing school data, preparing plots for pop-ups](#processing)
    1. [Merging data by schools, averaging by years](#averaging)
    2. [Adding plots by schools, tests for map pop-ups](#plotting)
5. [Matching test and location data for final geoJSON](#maps)

<a id="data"></a> 
### Data, definitions

#### Data sources:

**1) State test Math and ELA results (2022-2023)**

New York State Education Department: Report Card Database (251.35 megabytes): "This Access database contains assessment results (elementary- and intermediate-level ELA, Math, and Science; Annual Regents; Total Cohort Regents; NYSESLAT; NYSAA), for the state, districts, public with charter schools, by county, and Need to Resource Capacity group."
https://data.nysed.gov/downloads.php

**2) Schools locations**

NYS GIS Clearinghouse: NYS Schools
https://data.gis.ny.gov/maps/b6c624c740e4476689aa60fdc4aacb8f/about

#### Definitions of Performance Levels for the 2023 Grades 3-8 English Language Arts and Mathematics Tests  

**NYS Level 1**: Students performing at this level are below proficient in standards for their grade. They may demonstrate limited knowledge, skills, and practices embodied by the Learning Standards that are considered insufficient for the expectations at this grade. 

**NYS Level 2**: Students performing at this level are partially proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered partial but insufficient for the expectations at this grade. Students performing at Level 2 are considered on track to meet current New York high school graduation requirements but are not yet proficient in Learning Standards at this grade. 

**NYS Level 3**: Students performing at this level are proficient in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered sufficient for the expectations at this grade.  

**NYS Level 4**: Students performing at this level excel in standards for their grade. They demonstrate knowledge, skills, and practices embodied by the Learning Standards that are considered more than sufficient for the expectations at this grade.  

*Source: NYSED, 2023, https://www.p12.nysed.gov/irs/ela-math/2023/ela-math-score-ranges-performance-levels-2023.pdf*

#### About this notebook

- This notebook '*1._NYS_public_and_charter_middle_schools_data_processing*' contains the steps for processing data on state testing of public and charter schools in New York State. 
- The notebook '*2._Generating_NYS_middle_schools_map*' contains code to generate the map from the processed data.
- The map is available at: https://nysmsmap.netlify.app

<a id="modules"></a>
### Imports

In [None]:
# Appending the path to 'utils' modules with this project's functions 

import sys

parent_dir = 'C:\\GITHUB\\NY_schools_maps\\notebooks'
sys.path.append(parent_dir)

In [None]:
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from fuzzywuzzy import process
import fuzzywuzzy
import base64
from io import BytesIO
import math
from tqdm import tqdm
from utils import create_plot, match_name

pd.set_option('display.float_format', '{:.3f}'.format)

<a id="read"></a>
### Read data

In [None]:
basePath = r"G:\My Drive\Kids\NYC_schools_mapped"
dataFolder = r"raw_data"
outputFolder = r"processed_data"

In [None]:
# Read GeoJSON into dataframe
SchoolsFile = 'NYS_Schools.geojson'
NYSchoolsPath = os.path.join(basePath, dataFolder, SchoolsFile)
NYSchoolsGeom = gpd.read_file(NYSchoolsPath)

In [None]:
## Read schools test results files

# read schools math results file
fileName_math = "NYS_MS_MATH_from_NYS.xlsx"
mathPath = os.path.join(basePath,dataFolder,fileName_math)
print(mathPath)
mathResultsDF = pd.read_excel(mathPath)

# read schools ELA results file
fileName_ELA = "NYS_MS_ELA_from_NYS.xlsx"
ELAPath = os.path.join(basePath, dataFolder, fileName_ELA)
print(ELAPath)
ELAResultsDF = pd.read_excel(ELAPath)

In [None]:
mathResultsDF.info()

In [None]:
ELAResultsDF.info()

<a id="processing"></a>
### Processing school data, preparing plots for pop-ups

<a id="averaging"></a>
#### Merging data by schools, averaging by years

In [None]:
# Dictionary of school test results dataframes to use throughout the analysis
subjects = ['Math', 'ELA']
resultsDFs = {'Math': mathResultsDF, 'ELA': ELAResultsDF}

In [None]:
# Renaming "YEAR" column to use the 'create_plot' function below
for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF = resultsDF.rename(columns = {'YEAR':'Year'})
    resultsDFs[subject] = resultsDF
    
del resultsDF

In [None]:
# .info() above showed that most of the columns in the test results dataframes
# are objects instead of numbers and needed to be converted, so:

for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF_colToConvert = ['LEVEL1_COUNT', 'LEVEL2_COUNT', 'LEVEL3_COUNT', 'LEVEL4_COUNT']
    resultsDF[resultsDF_colToConvert] = resultsDF[resultsDF_colToConvert].apply(pd.to_numeric, errors = 'coerce')
    resultsDF.info()
    resultsDFs[subject] = resultsDF
    print(len(resultsDF))
    
del resultsDF
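
To illustrate what `errors = 'coerce'` does in the conversion above, here is a minimal sketch on made-up data. The toy values, including the `'s'` placeholder standing in for a non-numeric entry, are assumptions for illustration and are not taken from the NYSED files.

```python
import pandas as pd

# Hypothetical toy data: counts stored as text, with a non-numeric placeholder
toy = pd.DataFrame({
    'LEVEL1_COUNT': ['12', 's', '7'],
    'LEVEL2_COUNT': ['30', '25', 's'],
})

# Non-numeric entries become NaN instead of raising an error
toy_numeric = toy.apply(pd.to_numeric, errors='coerce')

# Number of values lost to coercion, per column
n_coerced = toy_numeric.isna().sum()
print(n_coerced)
```

Counting the NaN values after the conversion is a quick way to see how many entries could not be read as numbers before they are summed.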

In [None]:
# Normalizing the results dataframes for making plots for pop-ups later

results_Norm = {}

for subject in subjects:
        
    resultsDF = resultsDFs[subject]
    
    resultsDF_grouped = resultsDF.groupby(['ENTITY_CD', 'ENTITY_NAME', 'Year'])[['LEVEL1_COUNT', 'LEVEL2_COUNT', 'LEVEL3_COUNT', 'LEVEL4_COUNT']].sum()
  
    # Change column names to include subject
    resultsDF_grouped.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}']
    
    # Dataframe for middle schools by years with normalized values
    results_Norm[subject] = resultsDF_grouped.div(resultsDF_grouped.sum(axis=1), axis=0)
    results_Norm[subject].reset_index(inplace=True)
    
    print(results_Norm[subject].head())
    
del resultsDF, resultsDF_grouped
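
The normalization step above divides each level's count by the row total, so every row becomes a set of shares that sum to 1. A minimal sketch on hypothetical counts (the school names and numbers are invented) shows the idea, and also what happens to a row whose counts are all zero:

```python
import pandas as pd

# Hypothetical counts: one regular school and one with no reported results
counts = pd.DataFrame(
    {'Level 1': [10, 0], 'Level 2': [20, 0], 'Level 3': [50, 0], 'Level 4': [20, 0]},
    index=['TOY SCHOOL', 'EMPTY SCHOOL'],
)

# Divide every column by the row total so each row sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```

A row with a zero total ends up as NaN shares (0 divided by 0), which may be worth checking for before plotting.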

In [None]:
# Make a merged dataframe with both Math and ELA results
DFs = list(results_Norm.values())
allResultsDF = pd.merge(DFs[0], DFs[1], on = ['ENTITY_CD', 'Year'], how = 'inner', suffixes=('', '_drop'))
# Drop the duplicated columns (e.g. ENTITY_NAME) coming from the second dataframe
allResultsDF = allResultsDF.loc[:, ~allResultsDF.columns.str.endswith('_drop')]

del DFs

# Keep head() as the last expression so the notebook displays it
allResultsDF.head(5)

In [None]:
allResultsDF.info()
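
The merge above uses `suffixes=('', '_drop')` so that columns present in both dataframes but not used as keys keep their name on the left and get a `_drop` suffix on the right, which makes them easy to discard. A small sketch with invented data:

```python
import pandas as pd

# Hypothetical per-subject dataframes that both carry ENTITY_NAME
math_df = pd.DataFrame({'ENTITY_CD': [1, 2], 'Year': [2023, 2023],
                        'ENTITY_NAME': ['A', 'B'], 'Level 4 Math': [0.2, 0.4]})
ela_df = pd.DataFrame({'ENTITY_CD': [1, 2], 'Year': [2023, 2023],
                       'ENTITY_NAME': ['A', 'B'], 'Level 4 ELA': [0.3, 0.1]})

merged = pd.merge(math_df, ela_df, on=['ENTITY_CD', 'Year'],
                  how='inner', suffixes=('', '_drop'))

# The right-hand ENTITY_NAME arrives as ENTITY_NAME_drop and is removed here
merged = merged.loc[:, ~merged.columns.str.endswith('_drop')]
print(list(merged.columns))
```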

In [None]:
# Calculating results pooled over the 2 years (counts summed across years, then normalized)

results_AVG2y = {}

for subject in subjects:
        
    resultsDF = resultsDFs[subject]
    
    resultsDF_grouped = resultsDF.groupby(['ENTITY_CD', 'ENTITY_NAME'])[['LEVEL1_COUNT', 'LEVEL2_COUNT', 'LEVEL3_COUNT', 'LEVEL4_COUNT']].sum()
    # Change column names to include subject
    resultsDF_grouped.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}']
    
    # Dataframe for middle schools with shares computed from counts pooled over both years
    results_AVG2y[subject] = resultsDF_grouped.div(resultsDF_grouped.sum(axis=1), axis=0)
    results_AVG2y[subject].reset_index(inplace=True)
    
    print(results_AVG2y[subject].head())
    
del resultsDF, resultsDF_grouped

In [None]:
# Make a merged dataframe with both Math and ELA results for 2 years average

DFs = list(results_AVG2y.values())
allResultsDFAVG2y = pd.merge(DFs[0], DFs[1], on = ['ENTITY_CD','ENTITY_NAME'], how = 'inner')
allResultsDFAVG2y.head()
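
Because the counts are summed over the years before normalizing, the two-year figure is a pooled share, which weights each year by the number of students tested. This can differ from a simple mean of the yearly shares. The sketch below, on invented numbers for a single hypothetical school, compares the two:

```python
import pandas as pd

# Hypothetical school tested in two years with very different cohort sizes
toy = pd.DataFrame({
    'Year': [2022, 2023],
    'LEVEL3_COUNT': [10, 90],
    'LEVEL4_COUNT': [10, 10],
})
levels = ['LEVEL3_COUNT', 'LEVEL4_COUNT']

# Pooled share: sum counts over years, then normalize (the approach used above)
pooled = toy[levels].sum()
pooled_share = pooled / pooled.sum()

# Alternative: normalize each year separately, then average the shares
yearly_share = toy[levels].div(toy[levels].sum(axis=1), axis=0)
mean_share = yearly_share.mean()

print(pooled_share['LEVEL4_COUNT'], mean_share['LEVEL4_COUNT'])
```

In this toy case the pooled Level 4 share is 20/120, while the mean of the yearly shares is 0.3, so the choice matters when cohort sizes change between years.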

In [None]:
# Adding column to classify the schools on the map

allResultsDFAVG2y['Level 4 Math+Ela'] = allResultsDFAVG2y[f'Level 4 {subjects[0]}']+allResultsDFAVG2y[f'Level 4 {subjects[1]}']
allResultsDFAVG2y.head()

<a id="plotting"></a>
#### Adding plots by schools, tests for map pop-ups

In [None]:
# Make plots for pop-ups in the map and add them as columns to the mappable dataframe

# Set interactive mode off
plt.ioff()

# list of schools names

schoolsNames = allResultsDF['ENTITY_NAME'].to_list()
testResults = allResultsDF
print("Schools' list ready.")

# Create dictionary to hold the dataframes by schools
schoolDFs = {}

# Make dataframes by schools 
for name in schoolsNames:
    dfName = name
    schoolDFs[dfName] = testResults[testResults['ENTITY_NAME'] == name]
print('Dataframes by schools ready.')    

plotsDFs = {}

print("Making test results plots ...")

for subject in subjects:
    plots = []
    columns_to_plot = [f"Level 1 {subject}", f"Level 2 {subject}", f"Level 3 {subject}", f"Level 4 {subject}"]  
   
    # Plot dataframes by school
    for schoolDF, current_dataframe in tqdm(schoolDFs.items()):
        # schoolDF is the school name (the dictionary key);
        # current_dataframe is that school's dataframe

        # Create a plot
        fig = create_plot(current_dataframe, schoolDF, columns_to_plot)

        # Save the plot to an in-memory PNG
        io_buf = BytesIO()
        fig.savefig(io_buf, format='png', bbox_inches='tight', dpi=85)
        # Close this figure to free memory
        plt.close(fig)
        # Read the buffer back and encode it as a base64 string
        io_buf.seek(0)
        base64_string = base64.b64encode(io_buf.read()).decode('utf8')

        pair = (schoolDF, base64_string)

        plots.append(pair)

    # Collect this subject's plots in a dataframe keyed by school name
    plotsDFs[subject] = pd.DataFrame(plots, columns=['ENTITY_NAME', f'plot {subject}'])
    
# Combine the per-subject plot dataframes on the school name
combined_plots_df = plotsDFs[subjects[0]]
for subject in subjects[1:]:
    combined_plots_df = pd.merge(combined_plots_df, plotsDFs[subject], on='ENTITY_NAME')

print('Adding plots to the data frame with test results.')
allResultsDFAVG2y = pd.merge(allResultsDFAVG2y, combined_plots_df, on='ENTITY_NAME')
print('Done.')    
# Set interactive mode on
# plt.ion()
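
The cell above stores each plot as a base64-encoded PNG string. As a rough, self-contained sketch of that step, the helper below (`figure_to_base64` is a hypothetical name, not part of this project's `utils`) encodes a toy bar chart and wraps it in an HTML `<img>` tag with a data URI. How the map notebook actually embeds the strings is not shown here, so the `<img>` usage is an assumption.

```python
import base64
from io import BytesIO

import matplotlib.pyplot as plt


def figure_to_base64(fig, dpi=85):
    """Hypothetical helper: render a figure to PNG and return a base64 string."""
    buf = BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight', dpi=dpi)
    plt.close(fig)  # close this specific figure to free memory
    return base64.b64encode(buf.getvalue()).decode('utf8')


# Toy chart with invented shares for the four performance levels
fig, ax = plt.subplots(figsize=(3, 2))
ax.bar(['Level 1', 'Level 2', 'Level 3', 'Level 4'], [0.1, 0.2, 0.5, 0.2])
encoded = figure_to_base64(fig)

# A data URI is one common way to show such a string in HTML (assumed usage)
img_tag = f'<img src="data:image/png;base64,{encoded}">'
print(len(encoded))
```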

In [None]:
allResultsDFAVG2y.info()

<a id="maps"></a> 
### Matching test and location data for final geoJSON

#### Filter schools geolocation data

In [None]:
# Keep locations for public schools only: rows with INST_TYPE_DESC == 'PUBLIC SCHOOLS',
# which in this geoJSON covers public, charter, and satellite sites for charter schools

NYSchoolsGeom = NYSchoolsGeom[NYSchoolsGeom['INST_TYPE_DESC'] == 'PUBLIC SCHOOLS']
NYSchoolsGeom

In [None]:
# Make a dataframe from geoJSON with minimum columns

NYSchoolsGeom_short = NYSchoolsGeom[['OBJECTID', 'LEGAL_NAME', 'INSTSUBTYPDESC', 'SDL_DESC', 'geometry']]
NYSchoolsGeom_short

In [None]:
# Matching school names in the 2-year results dataframe
# with names in the spatial data (geoJSON of school locations)

tqdm.pandas(desc="Matching Names")

matched_tuples = allResultsDFAVG2y['ENTITY_NAME'].progress_apply(
    lambda x: match_name(x, NYSchoolsGeom_short['LEGAL_NAME'], min_score=65))

print('Done.')
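
The `match_name` function comes from this project's `utils` module and is not reproduced in this notebook. To give a feel for threshold-based fuzzy matching, here is a stand-in written with the standard library's `difflib` only. The function name `match_name_sketch`, the 0-100 score scale, and the `('', -1)` result for unmatched names are assumptions made for illustration; the real helper may score names differently.

```python
from difflib import SequenceMatcher


def match_name_sketch(name, candidates, min_score=0):
    """Hypothetical stand-in for utils.match_name using only the standard library.

    Returns (best_candidate, score) with the score on a 0-100 scale,
    or ('', -1) when no candidate scores above min_score.
    """
    best_name, best_score = '', -1
    for candidate in candidates:
        score = round(100 * SequenceMatcher(None, name, candidate).ratio())
        if score > min_score and score > best_score:
            best_name, best_score = candidate, score
    return best_name, best_score


candidates = ['NEW YORK MILLS SCHOOL', 'SOUNDVIEW ACADEMY FOR CULTURE AND SCHOLARSHIP']
print(match_name_sketch('NY MILLS SCHOOL', candidates, min_score=65))
print(match_name_sketch('ZZZ', candidates, min_score=65))
```

Raising `min_score` trades wrong matches for more unmatched names, which is the trade-off examined in the cells that follow.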

In [None]:
print('Appending matches to the dataframe.')
# Split the (name, score) tuples into two sequences once
matched_names, matched_scores = zip(*matched_tuples)
allResultsDFAVG2y['matched_name'] = list(matched_names)
allResultsDFAVG2y['matched_score'] = list(matched_scores)
print('Done.')

In [None]:
# Checking how many rows remained unmatched to see if minimum score is optimal

(allResultsDFAVG2y['matched_score'] == -1).sum()

# 41 if minimal score = 70
# 15 if minimal score = 65 - better

In [None]:
# Names left unmatched or matched incorrectly, identified by visual inspection
# of the map or by analysing the geoJSON in the preferred GIS software.
# An empty string leaves the school without a matched location name.

unmatched = {
    'JOHNSTOWN JUNIOR-SENIOR HS':'JOHNSTOWN HIGH SCHOOL',
    'YOUNG WOMEN\'S COLLEGE PREP CS':'YOUNG WOMEN\'S COLLEGE PREPARATORY CHARTER SCHOOL OF ROCHESTER',
    'SEED HARLEM':'SCHOOL OF EARTH EXPLORATION AND DISCOVERY HARLEM (SEED HARLEM)',
    'PS/IS 210 21ST CENTURY ACADEMY':'PS/IS 210 TWENTY-FIRST CENTURY ACADEMY FOR COMMUNITY LEADERSHIP',
    'BGLIG-SHIRLEY RODRIGUEZ-REMENESKI CS':'BRONX GLOBAL LEARNING INSTITUTE FOR GIRLS CHARTER SCHOOL THE SHIRLEY RODRGUEZ-REMENESKI SCHOOL',
    'ARCHIMEDES ACAD-MATH, SCI, TECH':'ARCHIMEDES ACADEMY FOR MATH SCIENCE AND TECHNOLOGY APPLICATIONS',
    'QUEENS COLLEGIATE':'QUEENS COLLEGIATE - A COLLEGE BOARD SCHOOL',
    'VALENCE COLLEGE PREP CS':'VALENCE COLLEGE PREPARATORY CHARTER SCHOOL',
    'MEADOW HILL GLOBAL EXPLORATIONS MAGN':'MEADOW HILL SCHOOL',
    'A MACARTHUR BARR MS 5-6 ACADEMY':'A MACARTHUR BARR MIDDLE SCHOOL',
    'LAWRENCE ES-BROADWAY':'LAWRENCE ELEMENTARY SCHOOL AT BROADWAY CAMPUS',
    'BROOKLYN EAST COLLEGIATE CS':'',
    'SOUNDVIEW ACADEMY':'SOUNDVIEW ACADEMY FOR CULTURE AND SCHOLARSHIP',
    'COLLEGIATE ACADEMY-MATH-PERSONAL AWA':'COLLEGIATE ACADEMY FOR MATHEMATICS AND PERSONAL AWARENESS CHARTER SCHOOL',
    'MS 224 MANHATTAN EAST':'MS 224 MANHATTAN EAST SCHOOL FOR ARTS & ACADEMICS',
    'PATHWAYS COLLEGE PREPARATORY':'PATHWAYS COLLEGE PREPARATORY SCHOOL:  A COLLEGE BOARD SCHOOL',
    'GELLER HOUSE SCHOOL':'',
    '30TH AVENUE SCHOOL':'30TH AVENUE SCHOOL (THE) (G & T CITYWIDE)',
    'HUNTS POINT SCHOOL (THE)':'HUNTERS POINT COMMUNITY MIDDLE SCHOOL',
    'ACADEMY OF MEDICAL TECHNOLOGY':'ACADEMY OF MEDICAL TECHNOLOGY - A COLLEGE BOARD SCHOOL',
    'GEORGE WASHINGTON CARVER HS':'GEORGE WASHINGTON CARVER HIGH SCHOOL FOR THE SCIENCES',
    'SCIENCE AND TECHNOLOGY ACADEMY':'SCIENCE AND TECHNOLOGY ACADEMY:  A MOTT HALL SCHOOL',
    'ACHIEVEMENT FIRST NORTH BROOKLYN PRE':'ACHIEVEMENT FIRST NORTH BROOKLYN PREPARATORY CHARTER SCHOOL',
    'SULLIVAN WEST HIGH SCHOOL':'SULLIVAN WEST HIGH SCHOOL AT LAKE HUNTINGTON',
    'BUFFALO COLLEGIATE CHARTER SCHOOL':'',
    'FDA VIII MIDDLE SCHOOL':'',
    'NY MILLS SCHOOL':'NEW YORK MILLS SCHOOL',
    'DALTON-NUNDA INTERMEDIATE SCHOOL':'DALTON-NUNDA MIDDLE SCHOOL',
    'PATHWAYS COLLEGE PREPARATORY SCHOOL':'PATHWAYS COLLEGE PREPARATORY SCHOOL:  A COLLEGE BOARD SCHOOL',
    'WEST GENESEE MIDDLE SCHOOL':'WEST GENESEE INTERMEDIATE SCHOOL',
    'DENZEL WASHINGTON SCHOOL-ARTS':'DENZEL WASHINGTON SCHOOL OF THE ARTS AT NELLIE A THORNTON CAMPUS',
    'WEST HEMPSTEAD MIDDLE SCHOOL':'WEST HEMPSTEAD SECONDARY SCHOOL',
    'KAPPA V':'KAPPA V (KNOWLEDGE AND POWER PREP ACADEMY)',
    'LEADERSHIP ACADEMY FOR YOUNG MEN':'',
    'KIPP NYC WASHINGTON HEIGHTS ACADEMY':'KIPP NYC WASHINGTON HEIGHTS ACADEMY CHARTER SCHOOL',
    'GIRLS PREP CHARTER SCHOOL-BRONX':'GIRLS PREPARATORY CHARTER SCHOOL BRONX MIDDLE SCHOOL',
    'YOUNG WOMEN\'S LEADERSHIP OF SI':'YOUNG WOMEN\'S LEADERSHIP OF STATEN ISLAND',
    'GIRLS PREP CHARTER SCHOOL':'GIRLS PREPARATORY CHARTER SCHOOL OF NY MIDDLE SCHOOL',
    'FORTE PREPARATORY ACADEMY CS':'FORTE PREPARATORY ACADEMY CHARTER SCHOOL',
    'DOLGEVILLE MIDDLE SCHOOL':'',
    'PS/IS 157 BENJAMIN FRANKLIN':'PS/IS 157 BENJAMIN FRANKLIN HEATH AND SCIENCE ACADEMY (THE)',
    'KIPP AMP CHARTER SCHOOL':'KIPP ALWAYS MENTALLY PREPARED CHARTER SCHOOL',
    'MYERS MIDDLE SCHOOL':'',
    'MULLEN ELEMENTARY SCHOOL':'STANLEY G FALK SCHOOL - MULLEN ELEMENTARY',
    'FRONT STREET ELEMENTARY SCHOOL':'',
    'KEY COLLEGIATE CHARTER SCHOOL':'',
}

In [None]:
# Replacing the erroneous matches in the 'allResultsDFAVG2y' dataframe

def replace_values(row):
    if row['ENTITY_NAME'] in unmatched:
        row['matched_name'] = unmatched[row['ENTITY_NAME']]
    return row

allResultsDFAVG2y = allResultsDFAVG2y.apply(replace_values, axis = 1)
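
The row-wise `apply` above works, but the same replacement can also be written as a column operation. The sketch below shows one possible alternative on invented data, using `Series.map` with a dictionary and `fillna` to keep the existing match where no correction is listed; the toy names and the `corrections` dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical results with one wrong automatic match
toy = pd.DataFrame({
    'ENTITY_NAME': ['NY MILLS SCHOOL', 'SOME OTHER SCHOOL'],
    'matched_name': ['MILLS SCHOOL (WRONG)', 'SOME OTHER SCHOOL'],
})
corrections = {'NY MILLS SCHOOL': 'NEW YORK MILLS SCHOOL'}

# map() yields NaN for names without a correction; fillna() keeps the old match there
toy['matched_name'] = toy['ENTITY_NAME'].map(corrections).fillna(toy['matched_name'])
print(toy)
```

An empty-string correction is kept as an empty string by this approach, since `map` returns it as a regular value rather than NaN.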

In [None]:
# Merging dataframes based on the matched name

finalGeoDF = pd.merge(NYSchoolsGeom_short, allResultsDFAVG2y, left_on='LEGAL_NAME', right_on='matched_name')
allData_Name = 'PublicCharterNYSschools.geojson'
allData_Path = os.path.join(basePath,outputFolder, allData_Name)
print(f'Saving to {allData_Path} ...')
finalGeoDF.to_file(allData_Path, driver="GeoJSON")
print('Saved.')

del allData_Name, allData_Path
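
The merge above is an inner join, so schools whose `matched_name` has no counterpart in the location data drop out silently. One way to list them, sketched here on invented data, is a left merge with `indicator=True`; the toy dataframes below are assumptions and only mimic the column names used in this notebook.

```python
import pandas as pd

# Hypothetical location and results tables
locations = pd.DataFrame({'LEGAL_NAME': ['SCHOOL A', 'SCHOOL B']})
results = pd.DataFrame({'ENTITY_NAME': ['A', 'B', 'C'],
                        'matched_name': ['SCHOOL A', 'SCHOOL B', '']})

# Left merge keeps every results row; _merge tells where each row was found
check = pd.merge(results, locations, left_on='matched_name',
                 right_on='LEGAL_NAME', how='left', indicator=True)

# Schools that would be lost in an inner merge
lost = check.loc[check['_merge'] == 'left_only', 'ENTITY_NAME'].tolist()
print(lost)
```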