# Analysis of NJ public and charter school results in ELA and math for grades 6-8

<span style="color: red;">**If the kernel can't connect to the server again, run the command:**
*netsh winsock reset*</span>

<a id="TOC"></a> 
## Table of Contents
1. [Data sources and definitions](#data)
2. [Imports: modules](#modules)
3. [Read and prepare data](#read)
4. [Generating geoJSON for mapping](#maps) 

<a id="data"></a> 
## Data sources and definitions

#### Data:
1. New Jersey Student Learning Assessments (NJSLA) results for 2015-2023, grades 6-8, public and charter schools:
<br>State of New Jersey, Department of Education:
Statewide Assessment Reports
<br>https://www.nj.gov/education/assessment/results/reports/
2. NJ school locations: NJGIN Open Data <br>
https://njogis-newjersey.opendata.arcgis.com/datasets/d8223610010a4c3887cfb88b904545ff/explore

####  Performance levels for New Jersey Student Learning Standards for English Language Arts and Math  

**Level 1**: Did Not Yet Meet Expectations <br>
**Level 2**: Partially Met Expectations <br>
**Level 3**: Approached Expectations  <br>
**Level 4**: Met Expectations  <br>
**Level 5**: Exceeded Expectations  <br>

*Source: New Jersey Assessments Resource Center, 2022, https://nj.mypearsonsupport.com/resources/reporting/NJSLA_Score_Interpretation_Guide_Spring2022.pdf*

## Questions
*1. How have the test results changed?*
<br>Changes in the proportions of test results by performance level are charted for Math and ELA for 2015-2023 for the middle school grades (6-8).
<br><br>
*2. How good is the school?* 
<br>Schools are compared by the sum of their average Level 5 shares in Math and ELA over 2015-2023, for all middle grades combined (or over the years available in the NJ DOE data, where a school does not cover the whole period).


## Limitations
1. Some elementary schools go up to grade 6. In these schools the share of Level 5 results is usually higher than in schools with grades 6-8 or 7-8. Since they teach only the first of the middle grades, they were excluded to give a more grounded view of middle school quality.<br><br>
2. Some school names in the original NJSLA data are spelled inconsistently or contain errors across years. These discrepancies create separate entries in the `allResultsAVG2015_23DF` dataframe, so some schools appear on the map as multiple overlapping points whose pop-ups display data for different years.
While this affects the visual clarity and completeness of the map, the current representation still provides a broad overview of the academic proficiency of middle schools in New Jersey. Further data cleaning to eliminate the issue would have required more time and effort than the purpose of the project justified.
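
Limitation 2 could be reduced by grouping near-identical spellings before aggregating. The following is only a minimal sketch, assuming the standard-library `difflib` is acceptable for the task and that a similarity cutoff of 0.9 is strict enough. The helper name `canonical_names` and the cutoff are illustrative; they are not part of this notebook or of `utils`.

```python
from difflib import get_close_matches


def canonical_names(names, cutoff=0.9):
    """Map each name to the first previously seen name that is a close match.

    Hypothetical helper: names are processed in order, so the first spelling
    encountered becomes the canonical one for its group.
    """
    canonical = []   # spellings accepted as group representatives
    mapping = {}
    for name in names:
        match = get_close_matches(name, canonical, n=1, cutoff=cutoff)
        if match:
            mapping[name] = match[0]
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping


keys = [
    "ABINGTON AVENUE SCHOOL, ESSEX",
    "ABINGTON AVENEU SCHOOL, ESSEX",
    "ABRAHAM LINCOLN SCHOOL NO. 14, UNION",
]
name_map = canonical_names(keys)
```

Numbered schools with otherwise identical names can also score above the cutoff, so any mapping produced this way would need a manual review before it is applied.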

#### About this notebook

- This notebook '*1._Data_processing_by_NJ_middle_schools*' contains the steps for processing the state testing data for public and charter schools in New Jersey. 
- The notebook '*2._Generating_map_by_NJ_middle_schools*' contains code to generate the map from the processed data.
- The map is available at:

<a id="modules"></a> 
#### Imports: modules

In [1]:
# Appending the path to utils

import sys

parent_dir = 'C:\\GITHUB\\NY_schools_maps\\notebooks'
sys.path.append(parent_dir)

In [2]:
import os
import pandas as pd
import geopandas as gpd
# import folium
import matplotlib.pyplot as plt
import base64
from io import BytesIO
import math
from tqdm import tqdm
from utils import match_name, create_plot, process_schools, create_chart

pd.set_option('display.float_format', '{:.3f}'.format)



<a id="read"></a> 
#### Read data

In [3]:
basePath = r"G:\My Drive\Kids\NJ_schools_mapped"
dataFolder = r"raw_data"
outputFolder = r"processed_data"

The Excel files downloaded from NJ DOE were cleaned by removing the 'DFG' columns and unifying the case of the column headers.
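
The cleaning itself is not shown in this notebook. A minimal sketch of how it might be reproduced in pandas, assuming the unwanted columns contain the text 'DFG' in their header and that title case is the target convention. `clean_headers` is a hypothetical helper and the frame is made-up data.

```python
import pandas as pd


def clean_headers(df):
    """Drop 'DFG' columns and unify header case (hypothetical helper)."""
    # Keep every column whose header does not mention DFG
    keep = [c for c in df.columns if "DFG" not in str(c).upper()]
    cleaned = df[keep].copy()
    # Strip stray whitespace and apply title case, e.g. 'SCHOOL NAME' -> 'School Name'
    cleaned.columns = [str(c).strip().title() for c in cleaned.columns]
    return cleaned


raw = pd.DataFrame({"SCHOOL NAME": ["A"], "DFG": ["B"], "l5 percent": [1.0]})
cleaned = clean_headers(raw)
```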

In [4]:
# Reading data from annual files with results by schools

# Initialize an empty list to store dataframes
math_DFs = []

directory = os.path.join(basePath, dataFolder)

# Loop through each file in the directory
for filename in tqdm(os.listdir(directory), desc = 'Processing files'):
    if filename.endswith('.xlsx') and filename.startswith('MAT') and 'NJSLA DATA'  in filename:
        print(filename)
        
        # Construct the full file path
        file_path = os.path.join(directory, filename)
        
        # Read the Excel file
        df = pd.read_excel(file_path, skiprows=2)
        
        # Filter the dataframe 
        filtered_df = df[(df['Subgroup'].str.lower() == 'total') & (df['School Name'].str.lower() != 'district total') & pd.notna(df['School Name']) & (df['School Name'].str.strip() != '')]
        
        # Add a column with type of assessment and grade (ex: MAT06),
        # it is in the first 5 characters of the filename
        filtered_df['Assessment'] = filename[:5] 
        
        # Add a column with the year: the last 4 characters before the file extension in the filename
        filtered_df['Year'] = filename[-9:-5] 
        
        # Harmonize letter case in the text columns across the yearly tables
        column_to_upper = ['County Name', 'District Name', 'School Name', 'Subgroup', 'Subgroup_Type']
        for col in column_to_upper:
            filtered_df[col] = filtered_df[col].str.upper()
        
        # Append the filtered DataFrame to the list 'math_DFs'
        math_DFs.append(filtered_df)

print("Concatinatinating dataframes")        
# Concatenate all dataframes into one
mathResultsDF = pd.concat(math_DFs, ignore_index=True)

print("mathResultsDF is ready.")

Processing files:   0%|                                                                         | 0/46 [00:00<?, ?it/s]

MAT06 NJSLA DATA 2022-2023.xlsx


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Processing files:  13%|████████▍                                                        | 6/46 [00:04<00:33,  1.21it/s]

MAT07 NJSLA DATA 2022-2023.xlsx


Processing files:  15%|█████████▉                                                       | 7/46 [00:09<00:57,  1.47s/it]

MAT08 NJSLA DATA 2022-2023.xlsx


Processing files:  17%|███████████▎                                                     | 8/46 [00:12<01:14,  1.96s/it]

MAT06 NJSLA DATA 2021-2022.xlsx


Processing files:  26%|████████████████▋                                               | 12/46 [00:17<00:51,  1.51s/it]

MAT07 NJSLA DATA 2021-2022.xlsx


Processing files:  28%|██████████████████                                              | 13/46 [00:21<01:04,  1.95s/it]

MAT08 NJSLA DATA 2021-2022.xlsx


Processing files:  30%|███████████████████▍                                            | 14/46 [00:25<01:14,  2.33s/it]

MAT06 NJSLA DATA 2018-2019.xlsx


Processing files:  39%|█████████████████████████                                       | 18/46 [00:30<00:47,  1.70s/it]

MAT07 NJSLA DATA 2018-2019.xlsx


Processing files:  41%|██████████████████████████▍                                     | 19/46 [00:34<00:56,  2.09s/it]

MAT08 NJSLA DATA 2018-2019.xlsx


Processing files:  43%|███████████████████████████▊                                    | 20/46 [00:37<01:01,  2.37s/it]

MAT06 NJSLA DATA 2017-2018.xlsx


Processing files:  54%|██████████████████████████████████▊                             | 25/46 [00:42<00:32,  1.53s/it]

MAT07 NJSLA DATA 2017-2018.xlsx


Processing files:  57%|████████████████████████████████████▏                           | 26/46 [00:46<00:36,  1.84s/it]

MAT08 NJSLA DATA 2017-2018.xlsx


Processing files:  59%|█████████████████████████████████████▌                          | 27/46 [00:49<00:40,  2.12s/it]

MAT06 NJSLA DATA 2016-2017.xlsx


Processing files:  67%|███████████████████████████████████████████▏                    | 31/46 [00:53<00:24,  1.61s/it]

MAT07 NJSLA DATA 2016-2017.xlsx


Processing files:  70%|████████████████████████████████████████████▌                   | 32/46 [00:57<00:27,  1.94s/it]

MAT08 NJSLA DATA 2016-2017.xlsx


Processing files:  72%|█████████████████████████████████████████████▉                  | 33/46 [01:01<00:29,  2.25s/it]

MAT06 NJSLA DATA 2015-2016.xlsx


Processing files:  80%|███████████████████████████████████████████████████▍            | 37/46 [01:05<00:14,  1.64s/it]

MAT07 NJSLA DATA 2015-2016.xlsx


Processing files:  83%|████████████████████████████████████████████████████▊           | 38/46 [01:09<00:15,  1.93s/it]

MAT08 NJSLA DATA 2015-2016.xlsx


Processing files:  85%|██████████████████████████████████████████████████████▎         | 39/46 [01:12<00:15,  2.17s/it]

MAT06 NJSLA DATA 2014-2015.xlsx


Processing files:  93%|███████████████████████████████████████████████████████████▊    | 43/46 [01:15<00:04,  1.51s/it]

MAT07 NJSLA DATA 2014-2015.xlsx


Processing files:  96%|█████████████████████████████████████████████████████████████▏  | 44/46 [01:18<00:03,  1.74s/it]

MAT08 NJSLA DATA 2014-2015.xlsx


Processing files: 100%|████████████████████████████████████████████████████████████████| 46/46 [01:21<00:00,  1.78s/it]

Concatinatinating dataframes
mathResultsDF is ready.
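
The `SettingWithCopyWarning` in the output above appears because new columns are assigned to `filtered_df`, which pandas treats as a possible view of `df`. A minimal sketch of one common way to avoid it, assuming an explicit `.copy()` after filtering is acceptable. The small frame below is made-up illustration data, not NJSLA data.

```python
import pandas as pd

df = pd.DataFrame({
    "School Name": ["School A", "District Total", "School B"],
    "Subgroup": ["Total", "Total", "Female"],
})

mask = (df["Subgroup"].str.lower() == "total") & (
    df["School Name"].str.lower() != "district total"
)

# .copy() makes filtered_df an independent frame, so adding columns to it
# neither triggers the warning nor touches df
filtered_df = df[mask].copy()
filtered_df["Assessment"] = "MAT06"
```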





In [5]:
# Reading data from annual files with results by schools

# Initialize an empty list to store dataframes
ELA_DFs = []

directory = os.path.join(basePath, dataFolder)

# Loop through each file in the directory
for filename in tqdm(os.listdir(directory), desc = 'Processing files'):
    if filename.endswith('.xlsx') and filename.startswith('ELA'):
        print(filename)
        
        # Construct the full file path
        file_path = os.path.join(directory, filename)
        
        # Read the Excel file
        df = pd.read_excel(file_path, skiprows=2)
        
        # Filter the dataframe 
        filtered_df = df[(df['Subgroup'].str.lower() == 'total') & (df['School Name'].str.lower() != 'district total') & pd.notna(df['School Name']) & (df['School Name'].str.strip() != '')]
        
        # Add a column with type of assessment and grade (ex: ELA06),
        # it is in the first 5 characters of the filename
        filtered_df['Assessment'] = filename[:5] 
        
        # Add a column with the year: the last 4 characters before the file extension in the filename
        filtered_df['Year'] = filename[-9:-5] 
        
        # Harmonize letter case in the text columns across the yearly tables
        column_to_upper = ['County Name', 'District Name', 'School Name', 'Subgroup', 'Subgroup_Type']
        for col in column_to_upper:
            filtered_df[col] = filtered_df[col].str.upper()
        
        # Append the filtered dataframe to the list 'ELA_DFs'
        ELA_DFs.append(filtered_df)

print("Concatinating dataframes")        
# Concatenate all dataframes into one
ELAResultsDF = pd.concat(ELA_DFs, ignore_index=True)

print("ELAResultsDF is ready.")

Processing files:   0%|                                                                         | 0/46 [00:00<?, ?it/s]

ELA06 NJSLA DATA 2022-2023.xlsx


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Processing files:   7%|████▏                                                            | 3/46 [00:04<01:06,  1.54s/it]

ELA07 NJSLA DATA 2022-2023.xlsx


Processing files:   9%|█████▋                                                           | 4/46 [00:08<01:41,  2.42s/it]

ELA08 NJSLA DATA 2022-2023.xlsx


Processing files:  11%|███████                                                          | 5/46 [00:13<02:02,  2.98s/it]

ELA06 NJSLA DATA 2021-2022.xlsx


Processing files:  20%|████████████▋                                                    | 9/46 [00:17<01:07,  1.82s/it]

ELA07 NJSLA DATA 2021-2022.xlsx


Processing files:  22%|█████████████▉                                                  | 10/46 [00:21<01:20,  2.25s/it]

ELA08 NJSLA DATA 2021-2022.xlsx


Processing files:  24%|███████████████▎                                                | 11/46 [00:26<01:38,  2.82s/it]

ELA06 NJSLA DATA 2018-2019.xlsx


Processing files:  33%|████████████████████▊                                           | 15/46 [00:31<00:58,  1.88s/it]

ELA07 NJSLA DATA 2018-2019.xlsx


Processing files:  35%|██████████████████████▎                                         | 16/46 [00:35<01:07,  2.23s/it]

ELA08 NJSLA DATA 2018-2019.xlsx


Processing files:  37%|███████████████████████▋                                        | 17/46 [00:39<01:13,  2.54s/it]

ELA06 NJSLA DATA 2017-2018.xlsx


Processing files:  48%|██████████████████████████████▌                                 | 22/46 [00:43<00:37,  1.58s/it]

ELA07 NJSLA DATA 2017-2018.xlsx


Processing files:  50%|████████████████████████████████                                | 23/46 [00:47<00:44,  1.92s/it]

ELA08 NJSLA DATA 2017-2018.xlsx


Processing files:  52%|█████████████████████████████████▍                              | 24/46 [00:51<00:48,  2.22s/it]

ELA06 NJSLA DATA 2016-2017.xlsx


Processing files:  61%|██████████████████████████████████████▉                         | 28/46 [00:55<00:30,  1.69s/it]

ELA07 NJSLA DATA 2016-2017.xlsx


Processing files:  63%|████████████████████████████████████████▎                       | 29/46 [00:59<00:34,  2.01s/it]

ELA08 NJSLA DATA 2016-2017.xlsx


Processing files:  65%|█████████████████████████████████████████▋                      | 30/46 [01:03<00:37,  2.31s/it]

ELA06 NJSLA DATA 2015-2016.xlsx


Processing files:  74%|███████████████████████████████████████████████▎                | 34/46 [01:07<00:20,  1.69s/it]

ELA07 NJSLA DATA 2015-2016.xlsx


Processing files:  76%|████████████████████████████████████████████████▋               | 35/46 [01:11<00:21,  1.99s/it]

ELA08 NJSLA DATA 2015-2016.xlsx


Processing files:  78%|██████████████████████████████████████████████████              | 36/46 [01:14<00:22,  2.28s/it]

ELA06 NJSLA DATA 2014-2015.xlsx


Processing files:  87%|███████████████████████████████████████████████████████▋        | 40/46 [01:18<00:09,  1.56s/it]

ELA07 NJSLA DATA 2014-2015.xlsx


Processing files:  89%|█████████████████████████████████████████████████████████       | 41/46 [01:21<00:08,  1.79s/it]

ELA08 NJSLA DATA 2014-2015.xlsx


Processing files: 100%|████████████████████████████████████████████████████████████████| 46/46 [01:24<00:00,  1.84s/it]

Concatinating dataframes
ELAResultsDF is ready.
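
Both loops above take the assessment code and the year from fixed positions in the filename. A hedged alternative, assuming every file follows the pattern seen in the output (for example `ELA06 NJSLA DATA 2022-2023.xlsx`), is to parse the name with a regular expression so that unexpected names are reported rather than silently mis-sliced. `parse_filename` is a hypothetical helper.

```python
import re

# Pattern inferred from the filenames printed above (an assumption)
FILENAME_RE = re.compile(
    r"^(?P<subject>MAT|ELA)(?P<grade>\d{2}) NJSLA DATA \d{4}-(?P<year>\d{4})\.xlsx$"
)


def parse_filename(filename):
    """Return (assessment, grade, year); raise ValueError if the name does not match."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"Unexpected filename: {filename!r}")
    assessment = m.group("subject") + m.group("grade")
    # The year stays a string, as in the 'Year' column built above
    return assessment, int(m.group("grade")), m.group("year")


parsed = parse_filename("ELA06 NJSLA DATA 2022-2023.xlsx")
```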





In [6]:
# Set up the list of subjects and a dictionary of results dataframes by subject to simplify later processing
subjects = ['Math', 'ELA']
resultsDFs = {'Math': mathResultsDF, 'ELA': ELAResultsDF}

In [7]:
for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF.info()
    print(len(resultsDF))
    
del resultsDF    

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15943 entries, 0 to 15942
Data columns (total 19 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   County Code                15943 non-null  object 
 1   County Name                15943 non-null  object 
 2   District Code              15943 non-null  object 
 3   District Name              15943 non-null  object 
 4   School Code                15943 non-null  float64
 5   School Name                15943 non-null  object 
 6   Subgroup                   15943 non-null  object 
 7   Subgroup_Type              15943 non-null  object 
 8   Registered To Test         15943 non-null  object 
 9   Not Tested ** (See Below)  15943 non-null  object 
 10  Valid Scores               15943 non-null  object 
 11  Mean Scale Score           15943 non-null  object 
 12  L1 Percent                 15943 non-null  object 
 13  L2 Percent                 15943 non-null  obj

In [8]:
# resultsDF.info() showed that most numeric columns were read as objects, so they are converted to numbers here

for subject in subjects:
    resultsDF = resultsDFs[subject]
    resultsDF_colToConvert = ['Valid Scores',
     'Mean Scale Score',
     'L1 Percent',                             
     'L2 Percent',
     'L3 Percent',
     'L4 Percent',
     'L5 Percent']
    resultsDF[resultsDF_colToConvert] = resultsDF[resultsDF_colToConvert].apply(pd.to_numeric, errors = 'coerce')
    resultsDF.info()
    print(len(resultsDF))
    
del resultsDF

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15943 entries, 0 to 15942
Data columns (total 19 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   County Code                15943 non-null  object 
 1   County Name                15943 non-null  object 
 2   District Code              15943 non-null  object 
 3   District Name              15943 non-null  object 
 4   School Code                15943 non-null  float64
 5   School Name                15943 non-null  object 
 6   Subgroup                   15943 non-null  object 
 7   Subgroup_Type              15943 non-null  object 
 8   Registered To Test         15943 non-null  object 
 9   Not Tested ** (See Below)  15943 non-null  object 
 10  Valid Scores               15481 non-null  float64
 11  Mean Scale Score           15489 non-null  float64
 12  L1 Percent                 15489 non-null  float64
 13  L2 Percent                 15489 non-null  flo
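
The drop in non-null counts after conversion (for example from 15943 to 15481 for `Valid Scores`) comes from `errors='coerce'`, which turns any value that cannot be parsed as a number into `NaN`. Other output in this notebook shows `*` in some cells, so suppressed values of that kind are a likely source. The values below are made up for illustration.

```python
import pandas as pd

raw = pd.Series(["120", "745.5", "*"])
converted = pd.to_numeric(raw, errors="coerce")

# Number of entries that could not be parsed and became NaN
n_coerced = int(converted.isna().sum())
```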

In [9]:
# Add a separate 'Grade' column, taking the grade number from the 'Assessment' column
# Add estimated numbers of results for each level
# Add a column with a unique key for each school for further analysis

for subject in subjects:
    resultsDF = resultsDFs[subject]
    
    # Getting grades
    assessment = resultsDF['Assessment']
    resultsDF['Grade'] = assessment.str[-1]
    resultsDF['Grade'] = pd.to_numeric(resultsDF['Grade'])
    
    # Some schools in different school districts share the same name, which causes issues later
    # in the analysis. District names are recorded inconsistently in the data and cannot be used
    # to distinguish those schools; county names, however, are consistent, so we use them as a
    # proxy to make a unique key for schools
    resultsDF['School_Key'] = resultsDF['School Name'] + ', '+resultsDF['County Name']
    
    # We need the number of test results for each level, so we estimate these numbers backwards
    # from the percentage of results at each level
    levels = ['L1', 'L2', 'L3', 'L4', 'L5']
    for l in levels:        
        resultsDF[f'{l} Number'] = (resultsDF[f'{l} Percent']*0.01)*resultsDF['Valid Scores']
    
    print(resultsDF.head())

del resultsDF

  County Code County Name District Code                    District Name  \
0          01    ATLANTIC        10.000  ABSECON PUBLIC SCHOOLS DISTRICT   
1          01    ATLANTIC       110.000    ATLANTIC CITY SCHOOL DISTRICT   
2          01    ATLANTIC       110.000    ATLANTIC CITY SCHOOL DISTRICT   
3          01    ATLANTIC       110.000    ATLANTIC CITY SCHOOL DISTRICT   
4          01    ATLANTIC       110.000    ATLANTIC CITY SCHOOL DISTRICT   

   School Code              School Name Subgroup Subgroup_Type  \
0       50.000           EMMA C ATTALES    TOTAL  ALL STUDENTS   
1       30.000  SOVEREIGN AVENUE SCHOOL    TOTAL  ALL STUDENTS   
2       50.000   CHELSEA HEIGHTS SCHOOL    TOTAL  ALL STUDENTS   
3       60.000      TEXAS AVENUE SCHOOL    TOTAL  ALL STUDENTS   
4       70.000   NEW YORK AVENUE SCHOOL    TOTAL  ALL STUDENTS   

  Registered To Test Not Tested ** (See Below)  ...  L5 Percent  Assessment  \
0                  *                         *  ...       1.000    
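
The count columns are back-calculated as percent / 100 × valid scores, so they are estimates and need not be whole numbers when the published percentages are rounded. A small worked example with made-up numbers, using the same formulas as the cell above:

```python
import pandas as pd

row = pd.DataFrame({
    "School Name": ["EXAMPLE SCHOOL"],
    "County Name": ["ESSEX"],
    "Valid Scores": [85.0],
    "L5 Percent": [12.9],
})

# Unique key: school name plus county name
row["School_Key"] = row["School Name"] + ", " + row["County Name"]

# Estimated number of Level 5 results: 12.9% of 85 valid scores
row["L5 Number"] = (row["L5 Percent"] * 0.01) * row["Valid Scores"]
l5_estimate = row.loc[0, "L5 Number"]
```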

In [10]:
# Deleting rows for elementary K-6 schools

for subject in subjects:
    resultsDF = resultsDFs[subject]
    
    # list of schools names
    schoolsNames = resultsDF['School_Key'].to_list()
    print(f"Schools' list ready for {subject}.")
    
    # Create a dictionary to hold the dataframes by school
    schoolDFs = {}
    
    # List of schools to delete
    schools_to_delete = []
    
    # Make dataframes by schools 
    for name in schoolsNames:
        dfName = name
        schoolDFs[dfName] = resultsDF[resultsDF['School_Key'] == name]
    print(f'Dataframes by schools ready for {subject}.')

    print(f"Checking schools for grades for {subject}...")

    # Checking dataframes by school
    for schoolDF, current_dataframe in tqdm(schoolDFs.items()):
        # schoolDF is the school key; current_dataframe holds that school's rows
        # Mark the school for deletion if it has no grade 7 or grade 8 results
        if not (7 in current_dataframe['Grade'].values or 8 in current_dataframe['Grade'].values):
            schools_to_delete.append(schoolDF)

    print(f"Deleting the K-6 schools from {subject} results...")        
    # Deleting the K-6 schools from schoolDFs
    for schoolDF in tqdm(schools_to_delete):
        del schoolDFs[schoolDF]
    
    print(f'Finalizing the {subject} results dataframe...')
    # Concatenate the remaining school dataframes row-wise into one dataframe
    resultsDFs[subject] = pd.concat(list(schoolDFs.values()), axis=0)
                                    
    print(f"Dataframe for {subject} results ready.")

del resultsDF

Schools' list ready for Math.
Dataframes by schools ready for Math.
Checking schools for grades for Math...


100%|███████████████████████████████████████████████████████████████████████████| 1231/1231 [00:00<00:00, 10100.45it/s]


Deleting the K-6 schools from Math results...


100%|████████████████████████████████████████████████████████████████████████████| 257/257 [00:00<00:00, 128785.68it/s]

Finalizing the Math results dataframe...





Dataframe for Math results ready.
Schools' list ready for ELA.
Dataframes by schools ready for ELA.
Checking schools for grades for ELA...


100%|████████████████████████████████████████████████████████████████████████████| 1231/1231 [00:00<00:00, 7092.16it/s]


Deleting the K-6 schools from ELA results...


100%|████████████████████████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 127600.52it/s]

Finalizing the ELA results dataframe...





Dataframe for ELA results ready.
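
The same filter can be expressed without building one dataframe per school. A minimal sketch, assuming `School_Key` and `Grade` columns like those created earlier. It keeps rows in their original order, whereas the loop above regroups them by school. The data is made up.

```python
import pandas as pd

results = pd.DataFrame({
    "School_Key": ["K6 SCHOOL, ESSEX", "MS SCHOOL, UNION",
                   "MS SCHOOL, UNION", "K8 SCHOOL, ESSEX"],
    "Grade": [6, 6, 7, 8],
})

# Keys of schools that reported at least one grade 7 or grade 8 result
keys_with_upper_grades = set(
    results.loc[results["Grade"].isin([7, 8]), "School_Key"]
)

# Keep every row (including grade 6 rows) that belongs to those schools
middle_only = results[results["School_Key"].isin(keys_with_upper_grades)]
```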


## Analysis

#### Prepare schools dataframe with only middle school tests results (grades 6-8)

In [12]:
# Select middle school grades results from the dataframes with Math and ELA tests results

resultsMS_bySchl_Norm ={}

for subject in subjects:
        
    resultsDF = resultsDFs[subject]
       
    # Dataframe with only grades 6-8 results (middle schools and K-8) by years
    resultsMS = resultsDF[(resultsDF['Grade'] >= 6)&(resultsDF['Grade'] <= 8)]
    
    # Dataframe with results summed by school and year
    resultsMS_bySchl = resultsMS.groupby(['School_Key', 'School Name', 'Year'])[['L1 Number','L2 Number','L3 Number','L4 Number','L5 Number']].sum()
    
    # Change column names to include subject
    resultsMS_bySchl.columns = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}', f'Level 5 {subject}']
    
    # Normalize the counts to shares of all results for each school and year
    resultsMS_bySchl_Norm[subject] = resultsMS_bySchl.div(resultsMS_bySchl.sum(axis=1), axis=0)
    resultsMS_bySchl_Norm[subject].reset_index(inplace=True)
    resultsMS_bySchl_Norm[subject] = resultsMS_bySchl_Norm[subject].T.drop_duplicates().T
    
    print(resultsMS_bySchl_Norm[subject].head(20))
    
del resultsDF, resultsMS_bySchl

                                           School_Key  \
0                       ABINGTON AVENEU SCHOOL, ESSEX   
1                       ABINGTON AVENUE SCHOOL, ESSEX   
2                       ABINGTON AVENUE SCHOOL, ESSEX   
3                       ABINGTON AVENUE SCHOOL, ESSEX   
4                       ABINGTON AVENUE SCHOOL, ESSEX   
5                       ABINGTON AVENUE SCHOOL, ESSEX   
6                       ABINGTON AVENUE SCHOOL, ESSEX   
7        ABRAHAM LINCOLN MIDDLE SCHOOL NO. 4, PASSAIC   
8                ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
9                ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
10               ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
11               ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
12               ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
13               ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
14               ABRAHAM LINCOLN SCHOOL NO. 14, UNION   
15  ACADEMY FOR URBAN LEADERSHIP CHARTER SCHOOL, C...   
16  ACADEMY FOR URBAN LEADERSHI
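
The normalization step divides each level's estimated count by the row total, so the five shares for a school and year sum to 1. A worked example with made-up counts:

```python
import pandas as pd

counts = pd.DataFrame(
    {"Level 1 Math": [10.0], "Level 2 Math": [20.0], "Level 3 Math": [30.0],
     "Level 4 Math": [30.0], "Level 5 Math": [10.0]},
    index=["EXAMPLE SCHOOL, ESSEX"],
)

# Divide every column by the row sum (axis=0 aligns the divisor on the index)
shares = counts.div(counts.sum(axis=1), axis=0)
```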

In [14]:
# Make a merged dataframe with both Math and ELA results

DFs = list(resultsMS_bySchl_Norm.values())
allResultsDF = pd.merge(DFs[0], DFs[1], on = ['School_Key', 'School Name', 'Year'], how = 'inner')
allResultsDF.head()

Unnamed: 0,School_Key,School Name,Year,Level 1 Math,Level 2 Math,Level 3 Math,Level 4 Math,Level 5 Math,Level 1 ELA,Level 2 ELA,Level 3 ELA,Level 4 ELA,Level 5 ELA
0,"ABINGTON AVENEU SCHOOL, ESSEX",ABINGTON AVENEU SCHOOL,2019,0.157,0.24,0.282,0.268,0.052,0.162,0.13,0.218,0.333,0.158
1,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2015,0.182,0.356,0.279,0.178,0.005,0.157,0.23,0.301,0.273,0.038
2,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2016,0.192,0.272,0.276,0.256,0.004,0.13,0.126,0.236,0.451,0.059
3,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2017,0.192,0.226,0.277,0.284,0.022,0.114,0.156,0.26,0.393,0.076
4,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2018,0.19,0.233,0.234,0.293,0.05,0.15,0.15,0.202,0.387,0.112
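
Because the merge uses `how='inner'`, a school-year that is present in only one subject is dropped. A hedged sketch of how such rows could be listed, using the `indicator` parameter of `pd.merge` on made-up frames:

```python
import pandas as pd

math_df = pd.DataFrame({"School_Key": ["A, ESSEX", "B, UNION"],
                        "Year": ["2019", "2019"],
                        "Level 5 Math": [0.05, 0.10]})
ela_df = pd.DataFrame({"School_Key": ["A, ESSEX", "C, BERGEN"],
                       "Year": ["2019", "2019"],
                       "Level 5 ELA": [0.15, 0.20]})

# An outer merge with indicator=True adds a '_merge' column
# with the values 'both', 'left_only' or 'right_only'
check = pd.merge(math_df, ela_df, on=["School_Key", "Year"],
                 how="outer", indicator=True)

# Rows that an inner merge would silently drop
dropped_by_inner = check[check["_merge"] != "both"]
```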


In [15]:
# Add a column with the sum of the Level 5 shares for Math and ELA

allResultsDF['Level 5 Math+Ela'] = allResultsDF[f'Level 5 {subjects[0]}']+allResultsDF[f'Level 5 {subjects[1]}']
allResultsDF.head(10)

Unnamed: 0,School_Key,School Name,Year,Level 1 Math,Level 2 Math,Level 3 Math,Level 4 Math,Level 5 Math,Level 1 ELA,Level 2 ELA,Level 3 ELA,Level 4 ELA,Level 5 ELA,Level 5 Math+Ela
0,"ABINGTON AVENEU SCHOOL, ESSEX",ABINGTON AVENEU SCHOOL,2019,0.157,0.24,0.282,0.268,0.052,0.162,0.13,0.218,0.333,0.158,0.21
1,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2015,0.182,0.356,0.279,0.178,0.005,0.157,0.23,0.301,0.273,0.038,0.043
2,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2016,0.192,0.272,0.276,0.256,0.004,0.13,0.126,0.236,0.451,0.059,0.063
3,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2017,0.192,0.226,0.277,0.284,0.022,0.114,0.156,0.26,0.393,0.076,0.098
4,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2018,0.19,0.233,0.234,0.293,0.05,0.15,0.15,0.202,0.387,0.112,0.162
5,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2022,0.264,0.364,0.275,0.096,0.0,0.153,0.234,0.319,0.258,0.037,0.037
6,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,2023,0.296,0.354,0.26,0.083,0.007,0.228,0.167,0.264,0.281,0.061,0.068
7,"ABRAHAM LINCOLN MIDDLE SCHOOL NO. 4, PASSAIC",ABRAHAM LINCOLN MIDDLE SCHOOL NO. 4,2018,0.203,0.36,0.305,0.131,0.001,0.222,0.217,0.265,0.242,0.055,0.056
8,"ABRAHAM LINCOLN SCHOOL NO. 14, UNION",ABRAHAM LINCOLN SCHOOL NO. 14,2015,0.051,0.252,0.374,0.316,0.007,0.081,0.14,0.301,0.436,0.043,0.05
9,"ABRAHAM LINCOLN SCHOOL NO. 14, UNION",ABRAHAM LINCOLN SCHOOL NO. 14,2016,0.087,0.167,0.428,0.305,0.014,0.082,0.112,0.228,0.505,0.073,0.087


In [16]:
unique_values = allResultsDF['Year'].unique()
print(unique_values)

['2019' '2015' '2016' '2017' '2018' '2022' '2023']


#### Create dataframe with average 2015-2023 Math and ELA test results for all middle school grades

In [17]:
# Compute average 2015-2023 results by school, separately for Math and ELA

resultsMS_AVG2015_23 = {}

for subject in subjects:
    
    resultsDF = resultsDFs[subject]
   
  
    # Sum the estimated counts by school over all available years (grades 6-8, middle schools and K-8)
    resultsMS_bySchl_sumed = resultsDF.groupby(['School_Key', 'School Name'])[['L1 Number','L2 Number','L3 Number','L4 Number','L5 Number']].sum()
    
    # Rename columns
    resultsMS_bySchl_sumed.columns = [f'# Level 1 {subject}',f'# Level 2 {subject}',f'# Level 3 {subject}',f'# Level 4 {subject}', f'# Level 5 {subject}']

    
    # Normalize the summed counts to shares for each school
    resultsMS_bySchl_sumed_Norm = resultsMS_bySchl_sumed.div(resultsMS_bySchl_sumed.sum(axis=1), axis=0)
    resultsMS_bySchl_sumed_Norm.columns = [f'8yrs avg Lvl 1 {subject}',f'8yrs avg Lvl 2 {subject}',f'8yrs avg Lvl 3 {subject}', f'8yrs avg Lvl 4 {subject}', f'8yrs avg Lvl 5 {subject}']
    resultsMS_bySchl_sumed_Norm.reset_index(inplace = True)
    
    # Add the dataframe to the respective dictionary 
    resultsMS_AVG2015_23[subject] = resultsMS_bySchl_sumed_Norm
    print(subject)
    print(len(resultsMS_AVG2015_23[subject]))
    

# del resultsDF, resultsMS_bySchl_sumed_Norm, resultsMS_bySchl_sumed_sorted, fileName, filePath, resultsMS_bySchl_sumed
del resultsDF, resultsMS_bySchl_sumed_Norm, resultsMS_bySchl_sumed

Math
974
ELA
976
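
The multi-year figures above are pooled: counts are summed over all available years and then normalized, which weights each year by its number of valid scores. That can differ from a simple mean of the yearly shares, as this made-up example shows:

```python
import pandas as pd

yearly = pd.DataFrame({
    "Year": ["2018", "2019"],
    "L5 Number": [10.0, 40.0],
    "All Levels": [100.0, 200.0],   # total valid results in each year
})

# Pooled share: sum the counts first, then divide (the approach used above)
pooled_share = yearly["L5 Number"].sum() / yearly["All Levels"].sum()

# Unweighted alternative: average the yearly shares
mean_of_shares = (yearly["L5 Number"] / yearly["All Levels"]).mean()
```

Here the pooled share is 50/300 while the mean of the yearly shares is (0.10 + 0.20)/2, so the larger 2019 cohort pulls the pooled figure upward.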


In [44]:
# Make a merged dataframe with both Math and ELA average 2015-2023 results 

AVG2015_23_DFs = list(resultsMS_AVG2015_23.values())
allResultsAVG2015_23DF = pd.merge(AVG2015_23_DFs[0], AVG2015_23_DFs[1], on = ['School_Key','School Name'], how = 'inner')
allResultsAVG2015_23DF['8yrs avg Lvl 5 Math+Ela'] = allResultsAVG2015_23DF[f'8yrs avg Lvl 5 {subjects[0]}']+allResultsAVG2015_23DF[f'8yrs avg Lvl 5 {subjects[1]}']
del AVG2015_23_DFs

In [45]:
# Make plots for popups in the map and add them as columns to the mappable dataframe

# list of schools names

schoolsNames = allResultsDF['School_Key'].to_list()
testResults = allResultsDF

print("Schools' list ready.")
# Create a dictionary to hold the dataframes by school
schoolDFs = {}

# Make dataframes by schools 
for name in schoolsNames:
    dfName = name
    schoolDFs[dfName] = testResults[testResults['School_Key'] == name]
print('Dataframes by schools ready.')


plotsDFs = {}


print("Making plots of test results ...")

for subject in subjects:
    plots = []
    columns_to_plot = [f'Level 1 {subject}',f'Level 2 {subject}',f'Level 3 {subject}',f'Level 4 {subject}', f'Level 5 {subject}']  

    # Plot dataframes by school

    for schoolDF, current_dataframe in tqdm(schoolDFs.items()):
        # schoolDF is the school key; current_dataframe holds that school's yearly results
        # Create a plot
        fig = create_plot(current_dataframe, schoolDF, columns_to_plot)

        # Convert the plot to a PNG image and then encode it
        io_buf = BytesIO()
        fig.savefig(io_buf, format='png', bbox_inches='tight')
        # Close this figure explicitly to free its memory
        plt.close(fig)
        # Rewind the buffer and read it to get the base64 string
        io_buf.seek(0)
        base64_string = base64.b64encode(io_buf.read()).decode('utf8')

        pair = (schoolDF, base64_string)

        plots.append(pair) 
            
    # Collect the plots for this subject; the 'School Name' column holds the School_Key values
    plotsDFs[subject] = pd.DataFrame(plots, columns=['School Name', f'plot {subject}'])

           
# Concatenate all plots DataFrames along the columns before merging
combined_plots_df = pd.concat(plotsDFs.values(), axis=1)


print('Adding plots to the dataframe with test results.')    
allResultsAVG2015_23DF = pd.merge(allResultsAVG2015_23DF, combined_plots_df, left_on = 'School_Key', right_on=combined_plots_df.iloc[:, 0], suffixes=('', '_drop'))
allResultsAVG2015_23DF = allResultsAVG2015_23DF.loc[:, ~allResultsAVG2015_23DF.columns.str.endswith('_drop')]
print('Done.')  

Schools' list ready.
Dataframes by schools ready.
Making plots of test results ...


100%|████████████████████████████████████████████████████████████████████████████████| 973/973 [03:22<00:00,  4.80it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 973/973 [03:11<00:00,  5.09it/s]

Adding plots to the dataframe with test results.
Done.
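
The `plot Math` and `plot ELA` columns hold each chart as base64 text, and the values start with `iVBORw0KGgo`, which is the base64 form of the PNG file signature. One common way to show such a string in a map popup is an HTML `<img>` tag with an inline data URI. How the popups are actually built is not shown in this section, so the sketch below is an assumption, and the helper names `png_to_img_tag` and `looks_like_png` are hypothetical.

```python
import base64
import html

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # first 8 bytes of every PNG file

def png_to_img_tag(png_bytes, alt=''):
    """Wrap raw PNG bytes in an HTML <img> tag with an inline data URI."""
    encoded = base64.b64encode(png_bytes).decode('utf8')
    safe_alt = html.escape(alt, quote=True)
    return f'<img src="data:image/png;base64,{encoded}" alt="{safe_alt}">'

def looks_like_png(base64_string):
    """Check that a base64 string decodes to data starting with the PNG signature."""
    # 12 base64 characters decode to 9 bytes, enough to cover the 8-byte signature
    head = base64.b64decode(base64_string[:12])
    return head.startswith(PNG_SIGNATURE)

# The prefix below is copied from the 'plot Math' column shown in this notebook
print(looks_like_png('iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA'))
```

A check like `looks_like_png` is a cheap way to confirm that the encoded column survived a round trip through CSV or GeoJSON.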





In [47]:
allResultsAVG2015_23DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 973 entries, 0 to 972
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   School_Key               973 non-null    object 
 1   School Name              973 non-null    object 
 2   8yrs avg Lvl 1 Math      964 non-null    float64
 3   8yrs avg Lvl 2 Math      964 non-null    float64
 4   8yrs avg Lvl 3 Math      964 non-null    float64
 5   8yrs avg Lvl 4 Math      964 non-null    float64
 6   8yrs avg Lvl 5 Math      964 non-null    float64
 7   8yrs avg Lvl 1 ELA       965 non-null    float64
 8   8yrs avg Lvl 2 ELA       965 non-null    float64
 9   8yrs avg Lvl 3 ELA       965 non-null    float64
 10  8yrs avg Lvl 4 ELA       965 non-null    float64
 11  8yrs avg Lvl 5 ELA       965 non-null    float64
 12  8yrs avg Lvl 5 Math+Ela  964 non-null    float64
 13  plot Math                973 non-null    object 
 14  plot ELA                 9

In [21]:
allResultsAVG2015_23DF.head(10)

Unnamed: 0,School_Key,School Name,8yrs avg Lvl 1 Math,8yrs avg Lvl 2 Math,8yrs avg Lvl 3 Math,8yrs avg Lvl 4 Math,8yrs avg Lvl 5 Math,8yrs avg Lvl 1 ELA,8yrs avg Lvl 2 ELA,8yrs avg Lvl 3 ELA,8yrs avg Lvl 4 ELA,8yrs avg Lvl 5 ELA,8yrs avg Lvl 5 Math+Ela,plot Math,plot ELA
0,"ABINGTON AVENEU SCHOOL, ESSEX",ABINGTON AVENEU SCHOOL,0.157,0.24,0.282,0.268,0.052,0.162,0.13,0.218,0.333,0.158,0.21,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...
1,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,0.221,0.298,0.266,0.2,0.015,0.156,0.176,0.263,0.34,0.065,0.08,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...
2,"ABRAHAM LINCOLN MIDDLE SCHOOL NO. 4, PASSAIC",ABRAHAM LINCOLN MIDDLE SCHOOL NO. 4,0.203,0.36,0.305,0.131,0.001,0.222,0.217,0.265,0.242,0.055,0.056,iVBORw0KGgoAAAANSUhEUgAAAl4AAAGGCAYAAACqpI9ZAA...,iVBORw0KGgoAAAANSUhEUgAAAl4AAAGGCAYAAACqpI9ZAA...
3,"ABRAHAM LINCOLN SCHOOL NO. 14, UNION",ABRAHAM LINCOLN SCHOOL NO. 14,0.128,0.279,0.36,0.227,0.007,0.106,0.13,0.262,0.389,0.114,0.12,iVBORw0KGgoAAAANSUhEUgAAAhQAAAGGCAYAAAAjPBgwAA...,iVBORw0KGgoAAAANSUhEUgAAAhQAAAGGCAYAAAAjPBgwAA...
4,"ACADEMY FOR URBAN LEADERSHIP CHARTER SCHOOL, C...",ACADEMY FOR URBAN LEADERSHIP CHARTER SCHOOL,0.372,0.353,0.202,0.071,0.002,0.208,0.201,0.289,0.276,0.027,0.028,iVBORw0KGgoAAAANSUhEUgAAAtEAAAGGCAYAAAC9qt3VAA...,iVBORw0KGgoAAAANSUhEUgAAAtEAAAGGCAYAAAC9qt3VAA...
5,"ACADEMY I, HUDSON",ACADEMY I,0.042,0.056,0.106,0.552,0.244,0.024,0.031,0.057,0.409,0.479,0.723,iVBORw0KGgoAAAANSUhEUgAAAZEAAAGGCAYAAAC68rx0AA...,iVBORw0KGgoAAAANSUhEUgAAAZEAAAGGCAYAAAC68rx0AA...
6,"ACHIEVE COMMUNITY CHARTER SCHOOL, CHARTERS",ACHIEVE COMMUNITY CHARTER SCHOOL,0.291,0.355,0.212,0.133,0.01,0.132,0.142,0.275,0.358,0.093,0.103,iVBORw0KGgoAAAANSUhEUgAAAl8AAAGGCAYAAABFZuRnAA...,iVBORw0KGgoAAAANSUhEUgAAAl8AAAGGCAYAAABFZuRnAA...
7,"ACHIEVERS EARLY COLLEGE PREP CHARTER SCHOOL, C...",ACHIEVERS EARLY COLLEGE PREP CHARTER SCHOOL,0.326,0.414,0.192,0.068,0.0,0.197,0.223,0.284,0.254,0.042,0.042,iVBORw0KGgoAAAANSUhEUgAAAsIAAAGGCAYAAABxM+c+AA...,iVBORw0KGgoAAAANSUhEUgAAAsIAAAGGCAYAAABxM+c+AA...
8,"ALBERT E GRICE MIDDLE SCHOOL, MERCER",ALBERT E GRICE MIDDLE SCHOOL,0.191,0.303,0.303,0.185,0.018,0.139,0.18,0.278,0.303,0.1,0.118,iVBORw0KGgoAAAANSUhEUgAAAhAAAAGGCAYAAAAq17hKAA...,iVBORw0KGgoAAAANSUhEUgAAAhAAAAGGCAYAAAAq17hKAA...
9,"ALDER AVENUE MIDDLE SCHOOL, ATLANTIC",ALDER AVENUE MIDDLE SCHOOL,0.136,0.247,0.316,0.273,0.027,0.111,0.151,0.271,0.366,0.102,0.129,iVBORw0KGgoAAAANSUhEUgAAAhQAAAGGCAYAAAAjPBgwAA...,iVBORw0KGgoAAAANSUhEUgAAAhQAAAGGCAYAAAAjPBgwAA...


<a id="maps"></a> 
### Preparing geoJSON for mapping

#### Read schools geolocation file

In [22]:
# Read GeoJSON into dataframe

SchoolsFile = 'School_Point_Locations_of_NJ_(Public%2C_Private_and_Charter).geojson'
NJSchoolsPath = os.path.join(basePath, dataFolder, SchoolsFile)
NJSchoolsData = gpd.read_file(NJSchoolsPath)

In [23]:
# Add column with school-county key for each school

NJSchoolsData['School_Key'] = NJSchoolsData['SCHOOL']  + ', '+ NJSchoolsData['COUNTY']

In [24]:
NJSchoolsData.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 3803 entries, 0 to 3802
Data columns (total 28 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   OBJECTID     3803 non-null   int64   
 1   SCH_GUID     3797 non-null   object  
 2   COUNTYCODE   3803 non-null   object  
 3   COUNTY       3803 non-null   object  
 4   DIST_CODE    3803 non-null   object  
 5   DIST_NAME    3802 non-null   object  
 6   SCHOOLCODE   3803 non-null   object  
 7   SCHOOLTYPE   3232 non-null   object  
 8   SCHOOL       3803 non-null   object  
 9   SCHOOLNAME   3801 non-null   object  
 10  ADDRESS1     3803 non-null   object  
 11  ADDRESS2     233 non-null    object  
 12  CITY         3803 non-null   object  
 13  STATE        3802 non-null   object  
 14  ZIP          3803 non-null   object  
 15  PHONE        3723 non-null   object  
 16  X            3803 non-null   float64 
 17  Y            3803 non-null   float64 
 18  SOURCE       3513 no

#### Merge the GeoJSON and the results dataframe

In [25]:
#NJSchoolsData.info() # Too many columns --> make a smaller copy

NJSchoolsDataShort = NJSchoolsData[['OBJECTID', 'DIST_NAME', 'SCHOOLTYPE', 'SCHOOL', 'SCHOOLNAME', 'CITY', 'School_Key','geometry']].copy()
NJSchoolsDataShort.head()

Unnamed: 0,OBJECTID,DIST_NAME,SCHOOLTYPE,SCHOOL,SCHOOLNAME,CITY,School_Key,geometry
0,1,Glen Rock,"DAY CARE, TRANSITIONAL K",Glen Rock Cooperative Nursery School,Glen Rock Cooperative Nusery School,Glen Rock,"Glen Rock Cooperative Nursery School, BERGEN",POINT (-74.12435 40.96150)
1,2,North Brunswick Twp,CHILD CARE/PRE-SCHOOL,Creative Nursery School Childcare & Learning C...,Creative Nursery School,North Brunswick,Creative Nursery School Childcare & Learning C...,POINT (-74.47660 40.44052)
2,3,Old Bridge Twp.,PRE-SCHOOL/PRE-K,Good Shepherd Children's Center,Good Shepherd Children's Center,Old Bridge,"Good Shepherd Children's Center, MIDDLESEX",POINT (-74.30568 40.40041)
3,4,Lakewood Twp,SPECIAL EDUCATION,TREE OF KNOWLEDGE LEARNING ACADEMY,Tree Of Knowledge,Lakewood,"TREE OF KNOWLEDGE LEARNING ACADEMY, OCEAN",POINT (-74.21612 40.09282)
4,5,Hackensack City,DAY CARE,Sarkis & Siran Gabrellian Child Care Center,Sarkis & Siran Gabrellian Child Care and Learn...,Hackensack,"Sarkis & Siran Gabrellian Child Care Center, B...",POINT (-74.05591 40.88239)


In [26]:
# Match the school results with the spatial data (GeoJSON of school locations)
# by the 'School_Key' columns of 'allResultsAVG2015_23DF' and 'NJSchoolsDataShort'.
# The match scores are used later to find mismatched rows.

tqdm.pandas(desc="Matching Names")

matched_tuples = allResultsAVG2015_23DF['School_Key'].progress_apply(
    lambda x: match_name(x, NJSchoolsDataShort['School_Key'], min_score=70))

print('Done.')

Matching Names: 100%|████████████████████████████████████████████████████████████████| 973/973 [09:29<00:00,  1.71it/s]

Done.
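
The `match_name` function is defined earlier in the notebook and is not shown in this section. As a rough illustration of the idea, the sketch below does the same kind of fuzzy matching with the standard library's `difflib`. It is a hypothetical stand-in, not the notebook's implementation: the name `match_name_sketch`, the case-insensitive comparison, and the `('', -1)` return value for "no match" are assumptions (the `-1` score mirrors the check for unmatched rows used later). Scores from `difflib` will generally differ from those of other fuzzy-matching libraries.

```python
from difflib import SequenceMatcher

def match_name_sketch(name, candidates, min_score=0):
    """Return (best_candidate, score), or ('', -1) if nothing reaches min_score.

    Hypothetical stand-in for the notebook's match_name; scores range 0-100.
    """
    best_name, best_score = '', -1
    for candidate in candidates:
        # Case-insensitive similarity ratio scaled to 0-100
        ratio = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        score = round(ratio * 100)
        if score >= min_score and score > best_score:
            best_name, best_score = candidate, score
    return best_name, best_score

candidates = ['Abington Avenue School, ESSEX', 'Academy I, HUDSON']
print(match_name_sketch('ABINGTON AVENEU SCHOOL, ESSEX', candidates, min_score=70))
print(match_name_sketch('QQQQ', candidates, min_score=70))
```

Every name is compared with every candidate, so the cost grows with the product of the two list lengths, which is consistent with the matching step above taking several minutes.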





In [48]:
# Appending matches to the dataframe 'allResultsAVG2015_23DF'

print('Appending matches to the dataframe.')
allResultsAVG2015_23DF['matched_name'] = list(zip(*matched_tuples))[0]
allResultsAVG2015_23DF['matched_score'] = list(zip(*matched_tuples))[1]
print('Done.')

Appending matches to the dataframe.
Done.


In [49]:
allResultsAVG2015_23DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 973 entries, 0 to 972
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   School_Key               973 non-null    object 
 1   School Name              973 non-null    object 
 2   8yrs avg Lvl 1 Math      964 non-null    float64
 3   8yrs avg Lvl 2 Math      964 non-null    float64
 4   8yrs avg Lvl 3 Math      964 non-null    float64
 5   8yrs avg Lvl 4 Math      964 non-null    float64
 6   8yrs avg Lvl 5 Math      964 non-null    float64
 7   8yrs avg Lvl 1 ELA       965 non-null    float64
 8   8yrs avg Lvl 2 ELA       965 non-null    float64
 9   8yrs avg Lvl 3 ELA       965 non-null    float64
 10  8yrs avg Lvl 4 ELA       965 non-null    float64
 11  8yrs avg Lvl 5 ELA       965 non-null    float64
 12  8yrs avg Lvl 5 Math+Ela  964 non-null    float64
 13  plot Math                973 non-null    object 
 14  plot ELA                 9

In [50]:
# Check how many rows remained unmatched, to see whether the minimum score is well chosen

(allResultsAVG2015_23DF['matched_score'] == -1).sum()

# 19 with a minimum score of 70, which is acceptable for this case

19

In [35]:
# Saving 'allResultsAVG2015_23DF' dataframe to csv file to manually check mismatches

name = 'NJTestResults2023_tempMatched.csv'
path = os.path.join(basePath, outputFolder, name)
print(f'Saving to {path} ...')
allResultsAVG2015_23DF.to_csv(path)
print('Saved.')

del name, path

Saving to G:\My Drive\Kids\NJ_schools_mapped\processed_data\NJTestResults2023_tempMatched9.csv ...
Saved.


In [55]:
# Names left unmatched or matched incorrectly, identified by visual inspection
# of the map or by analysing the GeoJSON in the preferred software.
# Format: allResultsAVG2015_23DF['School_Key']: NJSchoolsDataShort['School_Key']
# If the school turned out to be closed, or the row is not a school, the match is
# set to an empty string '' so that the row is not merged to a school point.

unmatched = {
    'HORACE MANN #6, HUDSON':'Horace Mann Community School, HUDSON',
    'JOHN M. BAILEY #12, HUDSON':'John M. Bailey Community School, HUDSON',
    'RONALD REAGAN ACADEMY SCHOOL NO. 30, UNION':'Chessie Dentley Roberts Academy School No. 30, UNION',
    'MERIT PREPARATORY CHARTER SCHOOL OF NEWARK, CHARTERS':'',
    'LADY LIBERTY ACADEMY CHARTER SCHOOL, CHARTERS':'',
    'CLASSICAL ACADEMY CHARTER SCHOOL , CHARTERS':'Classical Academy Charter School of Clifton, PASSAIC',
    'MILLER STREET SCHOOL AT SPENCER, ESSEX':'',
    'WINFIELD TOWNSHIP, UNION':'',
    'CHARLES J. HUDSON SCHOOL NO. 25, UNION':'Sonia Sotomayor School No 25, UNION',
    'JOHN WITHERSPOON MIDDLE SCHOOL, MERCER':'Princeton Middle School, MERCER',
    'WOODROW WILSON #10, HUDSON':'Woodrow Wilson Community School, HUDSON',
    'CALIFON ELEMENTARY, HUNTERDON':'Califon Public School, HUNTERDON',
    'RAFAEL CORDERO MOLINA ELEMENTARY SCHOOL, CAMDEN':'Mastery Schools of Camden, Inc., HUDSON',
    'DON BOSCO ACADEMY, PASSAIC':'',
    'VETERANS MEMORIAL FAMILY SCHOOL, CAMDEN':'Veteran\'S Memorial Middle School, OCEAN',
    'CAMDENS PROMISE CHARTER SCHOOL, CHARTERS':'Camden\'s Promise Charter School, CAMDEN',
    'DR. MARTIN LUTHER KING MIDDLE SCHOOL, MERCER':'Dr. Martin Luther King, Jr., MERCER',
    'GALLOWAY COMMUNITY CHARTER SCHOOL, CHARTERS':'',
    'LINCOLN AVENUE MIDDLE SCHOOL, CUMBERLAND':'Sgt. Dominick Pilla Middle School, CUMBERLAND',
    'QUITMAN COMMUNITY SCHOOL, ESSEX':'Quitman Street School, ESSEX',
    'GRETTA R. OSTROVSKY MIDDLE SCHOOL, BERGEN':'',
    'ALTERNATIVE MIDDLE & HIGH SCHOOL, SALEM':'',
    'EAST CAMDEN MIDDLE SCHOOL, CAMDEN':'Mastery Schools Of Camden, Inc., CAMDEN',
    'HENRY L. BONSALL FAMILY SCHOOL, CAMDEN':'',
    'PYNE POYNT MIDDLE SCHOOL, CAMDEN':'',
    'STRIVE ALTERNATIVE MIDDLE SCHOOL, PASSAIC':'',
    'PORT NORRIS MIDDLE SCHOOL, CUMBERLAND':'',
    'EAST NEWARK PUBLIC SCHOOL, HUDSON':'East Newark Middle School, HUDSON',
    'MONONGAHELA MIDDLE SCHOOL, GLOUCESTER':'Deptford Township Middle School, GLOUCESTER',
    'MT HEBRON MIDDLE SCHOOL, ESSEX':'',
    'MT. HEBRON MIDDLE SCHOOL, ESSEX':'',
    'OXFORD STREET ELEMENTARY SCHOOL, WARREN':'Belvidere Elementary School, WARREN',
    'CLEVELAND AVENUE SCHOOL, ESSEX':'',
    'HAMMARSKJOLD MIDDLE SCHOOL, MIDDLESEX':'Hammarskjold Upper Elementary School, MIDDLESEX',
    'ORANGE PREPARATORY ACADEMY, ESSEX':'Orange Preparatory Academy School of Inquiry and Innovation, ESSEX',
    'CHARLES SUMNER ELEMENTARY SCHOOL, CAMDEN':'',
    'DEERFIELD TOWNSHIP SCHOOL DISTRICT, CUMBERLAND':'',
    'MIDTOWN COMMUNITY SCHOOL #8, HUDSON':'William Shemin Midtown Community School #8, HUDSON',
    'WESTWOOD JUNIONR/SENIOR HIGH SCHOOL, BERGEN':'Westwood Regional High School, BERGEN',
    'FRANKLIN ELEMENTARY SCHOOL, SUSSEX':'Franklin Borough School, SUSSEX',
    'WESTWOOD JUNIOR/SENIOR HIGH SCHOOL, BERGEN':'Westwood Regional High School, BERGEN',
    'WOODROW WILSON ELEMENTARY SCHOOL, HUDSON':'Woodrow Wilson Community School, HUDSON',
    'FRANKLIN MIDDLE SCHOOL, SOMERSET':'Franklin Middle School at Hamilton Street Campus, SOMERSET',
    'LANDIS MIDDLE SCHOOL, CUMBERLAND':'',
    'SCHOOL 11 (NEWCOMERS), PASSAIC':'',
    'FOREST STREET ELEMENTARY SCHOOL, ESSEX':'Forest Street Community Elementary School, ESSEX',
    'DEERFIELD TOWNSHIP SCHOOL, CUMBERLAND':'Deerfield Township Elementary School, CUMBERLAND',
    'OAKWOOD AVENUE ELEMENTARY SCHOOL, ESSEX':'Oakwood Avenue Community School, ESSEX',
    'SCHOOL NO. 5, PASSAIC':'School #5, PASSAIC',
    'SCHOOL #6, BERGEN':'School #6/Middle School, BERGEN',
    'SCHOOL 6, PASSAIC':'Martin Luther King, Jr. School No. 6, PASSAIC',
    'BEVERLY CITY SCHOOL DISTRICT, BURLINGTON':'',
    'CAMDENS PROMISE CHARTER SCHOOL, CHARTERS':'',
    'DEERFIELD TOWNSHIP SCHOOL DISTRICT, CUMBERLAND':'',
    'DEERFIELD TOWNSHIP SCHOOL, CUMBERLAND':'',
    'EASTAMPTON TOWNSHIP SCHOOL DISTRICT, BURLINGTON':'',    
    'EISENHOWER MIDDLE SCHOOL DISTRICT, MORRIS':'',
    'HAMPTON BOROUGH SCHOOL DISTRICT, HUNTERDON':'',
    'HARMONY TOWNSHIP SCHOOL DISTRICT, WARREN':'',
    'HARRINGTON PARK SCHOOL DISTRICT, BERGEN':'',
    'KITTATINNY HIGH SCHOOL DISTRICT, SUSSEX':'',
    'LAWNSIDE SCHOOL DISTRICT, CAMDEN':'',
    'MAURICE RIVER TOWNSHIP SCHOOL DISTRICT, CUMBERLAND':'',
    'MONMOUTH BEACH ELEMENTARY SCHOOL DISTRICT, MONMOUTH':'',
    'PORT REPUBLIC SCHOOL DISTRICT, ATLANTIC':'', 
    'QUINTON TOWNSHIP SCHOOL DISTRICT, SALEM':'', 
    'RIVERTON SCHOOL DISTRICT, BURLINGTON':'', 
    'SHREWSBURY BOROUGH SCHOOL DISTRICT, MONMOUTH':'', 
    'SOMERDALE SCHOOL DISTRICT, CAMDEN':'',
}
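
A Python dict literal keeps only the last value when a key is repeated, and it does so without a warning. Some keys in the dictionary above appear twice, for example 'DEERFIELD TOWNSHIP SCHOOL, CUMBERLAND' and 'CAMDENS PROMISE CHARTER SCHOOL, CHARTERS', so the later empty-string entry is the one that takes effect. Whether that is intended is not stated. One way to spot such repeats is to keep the overrides as a list of pairs and count the keys before building the dict. The helper `find_duplicate_keys` and the name `override_pairs` are hypothetical.

```python
from collections import Counter

def find_duplicate_keys(pairs):
    """Return the keys that occur more than once in a list of (key, value) pairs."""
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)

# Three entries copied from the cell above, kept as pairs so repeats stay visible
override_pairs = [
    ('DEERFIELD TOWNSHIP SCHOOL, CUMBERLAND', 'Deerfield Township Elementary School, CUMBERLAND'),
    ('SCHOOL NO. 5, PASSAIC', 'School #5, PASSAIC'),
    ('DEERFIELD TOWNSHIP SCHOOL, CUMBERLAND', ''),
]

print(find_duplicate_keys(override_pairs))
# dict() keeps the last value for a repeated key, here the empty string
print(dict(override_pairs)['DEERFIELD TOWNSHIP SCHOOL, CUMBERLAND'] == '')
```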

In [56]:
# Replacing the erroneous matches in the 'allResultsAVG2015_23DF' dataframe

def replace_values(row):
    if row['School_Key'] in unmatched:
        row['matched_name'] = unmatched[row['School_Key']]
    return row

allResultsAVG2015_23DF = allResultsAVG2015_23DF.apply(replace_values, axis = 1)

In [33]:
allResultsAVG2015_23DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 973 entries, 0 to 972
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   School_Key               973 non-null    object 
 1   School Name              973 non-null    object 
 2   8yrs avg Lvl 1 Math      964 non-null    float64
 3   8yrs avg Lvl 2 Math      964 non-null    float64
 4   8yrs avg Lvl 3 Math      964 non-null    float64
 5   8yrs avg Lvl 4 Math      964 non-null    float64
 6   8yrs avg Lvl 5 Math      964 non-null    float64
 7   8yrs avg Lvl 1 ELA       965 non-null    float64
 8   8yrs avg Lvl 2 ELA       965 non-null    float64
 9   8yrs avg Lvl 3 ELA       965 non-null    float64
 10  8yrs avg Lvl 4 ELA       965 non-null    float64
 11  8yrs avg Lvl 5 ELA       965 non-null    float64
 12  8yrs avg Lvl 5 Math+Ela  964 non-null    float64
 13  plot Math                973 non-null    object 
 14  plot ELA                 9

In [53]:
# Check whether several rows were matched to the same school, and list those rows

df_duplicates = allResultsAVG2015_23DF.groupby('matched_name').filter(lambda x: len(x) > 1)
df_duplicates

Unnamed: 0,School_Key,School Name,8yrs avg Lvl 1 Math,8yrs avg Lvl 2 Math,8yrs avg Lvl 3 Math,8yrs avg Lvl 4 Math,8yrs avg Lvl 5 Math,8yrs avg Lvl 1 ELA,8yrs avg Lvl 2 ELA,8yrs avg Lvl 3 ELA,8yrs avg Lvl 4 ELA,8yrs avg Lvl 5 ELA,8yrs avg Lvl 5 Math+Ela,plot Math,plot ELA,matched_name,matched_score
0,"ABINGTON AVENEU SCHOOL, ESSEX",ABINGTON AVENEU SCHOOL,0.157,0.240,0.282,0.268,0.052,0.162,0.130,0.218,0.333,0.158,0.210,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...,"Abington Avenue School, ESSEX",96
1,"ABINGTON AVENUE SCHOOL, ESSEX",ABINGTON AVENUE SCHOOL,0.221,0.298,0.266,0.200,0.015,0.156,0.176,0.263,0.340,0.065,0.080,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...,iVBORw0KGgoAAAANSUhEUgAAAdQAAAGGCAYAAADCYXCQAA...,"Abington Avenue School, ESSEX",100
14,"ALFRED S. FAUST MIDDLE SCHOOL, BERGEN",ALFRED S. FAUST MIDDLE SCHOOL,0.152,0.265,0.256,0.261,0.066,0.032,0.042,0.167,0.414,0.344,0.411,iVBORw0KGgoAAAANSUhEUgAAAhkAAAGGCAYAAADWwpOAAA...,iVBORw0KGgoAAAANSUhEUgAAAhkAAAGGCAYAAADWwpOAAA...,"Alfred S. Faust Middle School, BERGEN",100
15,"ALFRED S. FAUST, BERGEN",ALFRED S. FAUST,0.116,0.211,0.311,0.309,0.053,0.066,0.104,0.220,0.385,0.225,0.278,iVBORw0KGgoAAAANSUhEUgAAAakAAAGGCAYAAADB1n64AA...,iVBORw0KGgoAAAANSUhEUgAAAakAAAGGCAYAAADB1n64AA...,"Alfred S. Faust Middle School, BERGEN",75
22,"ALTERNATIVE MIDDLE & HIGH SCHOOL, SALEM",ALTERNATIVE MIDDLE & HIGH SCHOOL,,,,,,,,,,,,iVBORw0KGgoAAAANSUhEUgAAAiwAAAGGCAYAAABYGNr8AA...,iVBORw0KGgoAAAANSUhEUgAAAiwAAAGGCAYAAABYGNr8AA...,,79
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
960,"WOODROW WILSON COMMUNITY SCHOOL, HUDSON",WOODROW WILSON COMMUNITY SCHOOL,0.139,0.237,0.364,0.237,0.023,0.066,0.098,0.215,0.445,0.176,0.199,iVBORw0KGgoAAAANSUhEUgAAAlwAAAGGCAYAAACuUV9kAA...,iVBORw0KGgoAAAANSUhEUgAAAlwAAAGGCAYAAACuUV9kAA...,"Woodrow Wilson Community School, HUDSON",100
961,"WOODROW WILSON ELEMENTARY SCHOOL, HUDSON",WOODROW WILSON ELEMENTARY SCHOOL,0.002,0.043,0.268,0.581,0.106,0.004,0.000,0.050,0.360,0.586,0.691,iVBORw0KGgoAAAANSUhEUgAAAmIAAAGGCAYAAADYa+3vAA...,iVBORw0KGgoAAAANSUhEUgAAAmIAAAGGCAYAAADYa+3vAA...,"Woodrow Wilson Community School, HUDSON",86
970,"YOUNG MEN'S ACADEMY, PASSAIC",YOUNG MEN'S ACADEMY,0.375,0.375,0.125,0.125,0.000,0.208,0.292,0.208,0.271,0.021,0.021,iVBORw0KGgoAAAANSUhEUgAAAcgAAAGGCAYAAAD/8xH2AA...,iVBORw0KGgoAAAANSUhEUgAAAcgAAAGGCAYAAAD/8xH2AA...,"Young Men's Academy, PASSAIC",100
971,"YOUNG MENS ACADEMY, PASSAIC",YOUNG MENS ACADEMY,0.500,0.294,0.147,0.059,0.000,0.522,0.391,0.043,0.043,0.000,0.000,iVBORw0KGgoAAAANSUhEUgAAAcUAAAGGCAYAAAAKDZpGAA...,iVBORw0KGgoAAAANSUhEUgAAAcUAAAGGCAYAAAAKDZpGAA...,"Young Men's Academy, PASSAIC",94


In [59]:
# Saving the duplicates for visual checking

name = 'NJduplicates check.csv'
path = os.path.join(basePath, outputFolder, name)
print(f'Saving to {path} ...')
df_duplicates.to_csv(path)
print('Saved.')
del name, path

# A visual inspection showed that some school names are spelled inconsistently or
# contain errors in the records of different years. These discrepancies created
# separate entries in the allResultsAVG2015_23DF dataframe, so some schools have
# several overlapping points on the map, with pop-ups displaying data for
# different years.
# This reduces the visual clarity and completeness of the map, but the map still
# gives an overview of the academic proficiency of middle schools in New Jersey.
# Cleaning the data further would have taken more time and effort than the
# purpose of the project required.

Saving to G:\My Drive\Kids\NJ_schools_mapped\processed_data\NJduplicates check.csv ...
Saved.


In [57]:
# Merging dataframes based on the matched name - county key

print('Merging dataframes.')
schoolsData_mappable = pd.merge(NJSchoolsDataShort,allResultsAVG2015_23DF, left_on= ['School_Key'], right_on=['matched_name'], suffixes=('', '_drop'))
schoolsData_mappable = schoolsData_mappable.loc[:, ~schoolsData_mappable.columns.str.endswith('_drop')]
data_Name = 'NJpublicSchoolsData.geojson'
data_Path = os.path.join(basePath,outputFolder, data_Name)

print(f"Saving data to GeoJSON file {data_Path}...")
schoolsData_mappable.to_file(data_Path, driver="GeoJSON")

print('Saved.')
del data_Name, data_Path

Merging dataframes.
Saving data to GeoJSON file G:\My Drive\Kids\NJ_schools_mapped\processed_data\NJpublicSchoolsData.geojson...
Saved.
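
An inner merge can return more rows than the results table contains when a key value repeats on either side, for example when two location points share one `School_Key` or when two result rows are matched to the same school. The sketch below counts the rows an inner join would produce from two plain lists of keys. It is an illustration under that assumption, not a diagnosis of this dataset. The function name `inner_join_row_count` and the sample keys are hypothetical.

```python
from collections import Counter

def inner_join_row_count(left_keys, right_keys):
    """Number of rows an inner join on these two key lists would produce."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each shared key contributes (occurrences on the left) x (occurrences on the right)
    return sum(n * right_counts[key] for key, n in left_counts.items())

# Hypothetical keys: 'A, ESSEX' appears twice on both sides, 'B, HUDSON' has no partner
location_keys = ['A, ESSEX', 'A, ESSEX', 'B, HUDSON']
matched_names = ['A, ESSEX', 'A, ESSEX', 'C, UNION', '']
print(inner_join_row_count(location_keys, matched_names))
```

Rows whose `matched_name` is an empty string find no partner among the location keys, so they drop out of the merged result.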


In [58]:
schoolsData_mappable.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1025 entries, 0 to 1024
Data columns (total 24 columns):
 #   Column                   Non-Null Count  Dtype   
---  ------                   --------------  -----   
 0   OBJECTID                 1025 non-null   int64   
 1   DIST_NAME                1025 non-null   object  
 2   SCHOOLTYPE               1008 non-null   object  
 3   SCHOOL                   1025 non-null   object  
 4   SCHOOLNAME               1025 non-null   object  
 5   CITY                     1025 non-null   object  
 6   School_Key               1025 non-null   object  
 7   geometry                 1025 non-null   geometry
 8   School Name              1025 non-null   object  
 9   8yrs avg Lvl 1 Math      1021 non-null   float64 
 10  8yrs avg Lvl 2 Math      1021 non-null   float64 
 11  8yrs avg Lvl 3 Math      1021 non-null   float64 
 12  8yrs avg Lvl 4 Math      1021 non-null   float64 
 13  8yrs avg Lvl 5 Math      1021 non-null   float64 
 14  