<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preparation" data-toc-modified-id="Preparation-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Preparation</a></span></li><li><span><a href="#Process-routes-for-all-school-years" data-toc-modified-id="Process-routes-for-all-school-years-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Process routes for all school years</a></span></li><li><span><a href="#Handle-two-special-cases" data-toc-modified-id="Handle-two-special-cases-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Handle two special cases</a></span><ul class="toc-item"><li><span><a href="#Dyett-HS" data-toc-modified-id="Dyett-HS-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Dyett HS</a></span></li><li><span><a href="#Crane-HS" data-toc-modified-id="Crane-HS-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Crane HS</a></span></li></ul></li><li><span><a href="#Fix-geometries-for-school-year-14/15" data-toc-modified-id="Fix-geometries-for-school-year-14/15-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Fix geometries for school year 14/15</a></span></li><li><span><a href="#Fix-route-information-of-treated-schools-for-SY1314/SY1415" data-toc-modified-id="Fix-route-information-of-treated-schools-for-SY1314/SY1415-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Fix route information of treated schools for SY1314/SY1415</a></span></li><li><span><a href="#Save" data-toc-modified-id="Save-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Save</a></span></li></ul></div>

**Description**: Reads in the route shapefiles for each school year, preprocesses and concatenates the observations, harmonizes the school names, handles special cases, and saves the data.

---

In [1]:
import pickle
import sys
from pathlib import Path

import geopandas as gpd
import pandas as pd
from pandas.api.types import is_numeric_dtype
from pandas.api.types import is_string_dtype

sys.path.append('../..')
from src.prepare_data.routes import (read_routes_file, harmonize_dataframe,
                                     check_school_name_id_unique,
                                     harmonize_all_names)

In [2]:
data_path = Path('../../data')
data_routes_path = data_path / 'raw/routes'
school_name_path = data_path / 'raw/school_names.xlsx'

# Preparation

The folder for SY1314 contains two shapefiles; one of them is a buffer around the exact route. Only the exact route is needed for the analysis.

In [3]:
routes_files = [
    ('Chicago Public Schools - Safe Passage Routes SY1516/' +
     'geo_export_8764a65c-2db7-4490-b11c-cc6953061bef',
     'SY1516'),
    ('Chicago Public Schools - Safe Passage Routes SY1415/' +
     'geo_export_067b2aa2-e0a0-40e6-9b1c-9ca4c4398d2b',
     'SY1415'),
    ('Chicago Public Schools - Safe Passage Routes SY1314/' +
     'geo_export_179143f8-1d6e-4c28-aa3c-a072cf19f401',
     'SY1314'),
]

# Process routes for all school years

In [4]:
all_routes = []
for folder, SY in routes_files:
    path = str(data_routes_path / folder) + '.shp'
    routes_temp = read_routes_file(path, SY)
    # Adjust column names for function
    if SY == 'SY1415':
        routes_temp = routes_temp.rename(
            {
                'route_num': 'rt_num',
                'schoolname': 'school_nam',
                'school_id': 'schoolid'
            },
            axis='columns')
    elif SY == 'SY1314':
        routes_temp = routes_temp.rename(
            {
                'route_numb': 'rt_num'
            }, axis='columns')
    routes_temp = harmonize_dataframe(routes_temp)
    all_routes.append(routes_temp)
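
The column alignment in the loop above can also be written as a lookup table keyed by school year. The following is only a sketch on a toy dataframe: `RENAME_MAPS` and `align_columns` are hypothetical names that are not part of `src.prepare_data.routes`, and the target column names are the ones used in the loop.

```python
import pandas as pd

# Hypothetical per-year rename maps, mirroring the if/elif branches above.
RENAME_MAPS = {
    'SY1415': {
        'route_num': 'rt_num',
        'schoolname': 'school_nam',
        'school_id': 'schoolid',
    },
    'SY1314': {'route_numb': 'rt_num'},
}


def align_columns(df, school_year):
    """Rename columns to the common raw names; unknown years are unchanged."""
    return df.rename(columns=RENAME_MAPS.get(school_year, {}))


# Toy stand-in for a freshly read SY1415 shapefile.
toy = pd.DataFrame({'route_num': [1], 'schoolname': ['A'], 'school_id': [10]})
aligned = align_columns(toy, 'SY1415')
unchanged = align_columns(toy, 'SY1516')
```

Keeping the maps in one dictionary makes it easier to see at a glance which years deviate from the expected raw column names.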

Append all routes together

In [5]:
all_routes = pd.concat(all_routes, axis='rows', ignore_index=True)

Check some dtypes

In [6]:
assert all(
    map(is_numeric_dtype,
        [all_routes['school_id'], all_routes['route_number']]))
assert all(
    map(is_string_dtype,
        [all_routes['school_name'], all_routes['school_year']]))

Make sure that each school appears only once per school year

In [7]:
assert all(
    all_routes.groupby('school_year')['school_name'].size() ==
    all_routes.groupby('school_year')['school_name'].nunique())

# Harmonize school names across school years (uses raw/school_names.xlsx)
all_routes = harmonize_all_names(all_routes, 'school_name',
                                 school_name_path)
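
To illustrate what the uniqueness assertion above tests, here is a minimal sketch on toy data. The helper `names_unique_per_year` is a hypothetical name introduced only for this example; it compares the number of rows per school year with the number of distinct school names, as the assertion does.

```python
import pandas as pd


def names_unique_per_year(df):
    """True if no school name is repeated within any school year."""
    grouped = df.groupby('school_year')['school_name']
    # size() counts rows per year, nunique() counts distinct names per year.
    return bool((grouped.size() == grouped.nunique()).all())


# The same name may appear in different years ...
ok = pd.DataFrame({
    'school_year': ['SY1314', 'SY1314', 'SY1415'],
    'school_name': ['A', 'B', 'A'],
})
# ... but not twice within one year.
dup = pd.DataFrame({
    'school_year': ['SY1314', 'SY1314'],
    'school_name': ['A', 'A'],
})

ok_result = names_unique_per_year(ok)
dup_result = names_unique_per_year(dup)
```

Note that `nunique()` ignores missing values while `size()` counts them, so a missing school name would also make this check fail.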

# Handle two special cases

## Dyett HS

Delete DYETT HS (609736), which only has an entry for SY1516. Dyett HS was closed after SY1415 and reopened as Dyett Arts HS (610587) in SY1617, so the school was not active in SY1516 and this erroneous entry can be deleted. Source: https://en.wikipedia.org/wiki/Dyett_High_School

In [8]:
# Copy so that the in-place assignments below do not act on a slice
all_routes = all_routes.query('school_id != 609736').copy()

## Crane HS

Change entries for Crane HS (school_id: 609702) to Crane Medical HS (school_id: 610561). Starting with school year 15/16, Crane HS no longer appears in the schools dataframe or on the official CPS website. The Safe Passage routes found in the routes dataframe from SY1516 onward should therefore be attributed to Crane Medical HS, which resides at the same address.

In [9]:
all_routes.loc[all_routes['school_id'] == 609702, 'school_id'] = 610561
all_routes.loc[all_routes['school_name'] == 'Crane HS', 'school_name'] = (
    'Crane Medical HS')
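
As a small illustration of the reassignment above, the following sketch applies the same idea to a toy dataframe. It is a variant, not the cell above: it assumes that id and name always occur together and therefore reuses one mask for both columns. The third row is made-up filler data.

```python
import pandas as pd

toy = pd.DataFrame({
    'school_id': [609702, 610561, 123456],
    'school_name': ['Crane HS', 'Crane Medical HS', 'Other HS'],
})

# One boolean mask, computed before any value changes, drives both updates
# so that id and name cannot drift apart.
is_crane = toy['school_id'] == 609702
toy.loc[is_crane, 'school_id'] = 610561
toy.loc[is_crane, 'school_name'] = 'Crane Medical HS'

crane_names = set(toy.loc[toy['school_id'] == 610561, 'school_name'])
```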

# Fix geometries for school year 14/15

Use the geometries from school year 15/16 for school year 14/15. In 14/15 the route geometries are polygons, which are not precise enough for the later merge with the blocks: too many blocks would be marked as treated.

Make sure that each route in SY1415 also exists in SY1516

In [10]:
assert all_routes.loc[all_routes['school_year'] == 'SY1415',
                      'route_number'].isin(
                          all_routes.loc[all_routes['school_year'] == 'SY1516',
                                         'route_number']).all()
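
If the assertion above ever fails, it helps to know which routes are missing. The following is a minimal sketch on toy data; `missing_routes` is a hypothetical helper introduced for this example and is not part of the project code.

```python
import pandas as pd


def missing_routes(df, year, reference_year):
    """Route numbers present in `year` but absent from `reference_year`."""
    wanted = set(df.loc[df['school_year'] == year, 'route_number'])
    available = set(
        df.loc[df['school_year'] == reference_year, 'route_number'])
    return sorted(wanted - available)


toy = pd.DataFrame({
    'school_year': ['SY1415', 'SY1415', 'SY1516'],
    'route_number': [1, 3, 1],
})
missing = missing_routes(toy, 'SY1415', 'SY1516')
```

An empty result corresponds to the assertion passing.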

Create new entries for SY1415 using geometries from SY1516

In [11]:
routes_1415 = all_routes.loc[all_routes['school_year'] == 'SY1415',
                             'route_number'].unique()
routes_1415 = all_routes.loc[(all_routes['school_year'] == 'SY1516') & (
    all_routes['route_number'].isin(routes_1415))].copy()
routes_1415['school_year'] = 'SY1415'

Replace SY1415

In [12]:
routes_new = all_routes.query('school_year != "SY1415"').copy()
routes_new = pd.concat([routes_new, routes_1415], ignore_index=True)
del routes_1415
del all_routes
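
The replacement step can be traced on a toy example. This sketch uses a plain pandas dataframe with placeholder strings in the `geometry` column instead of real shapely geometries, so it only demonstrates the row logic, not any spatial operation.

```python
import pandas as pd

toy = pd.DataFrame({
    'school_year': ['SY1415', 'SY1516', 'SY1516', 'SY1314'],
    'route_number': [1, 1, 2, 1],
    # Placeholder strings standing in for geometry objects.
    'geometry': ['POLYGON_1', 'LINE_1', 'LINE_2', 'LINE_OLD_1'],
})

# Routes that exist in SY1415.
in_1415 = toy.loc[toy['school_year'] == 'SY1415', 'route_number'].unique()

# Take the SY1516 rows for those routes and relabel them as SY1415.
replacement = toy.loc[(toy['school_year'] == 'SY1516')
                      & toy['route_number'].isin(in_1415)].copy()
replacement['school_year'] = 'SY1415'

# Drop the original SY1415 rows and append the relabeled ones.
result = pd.concat([toy.query('school_year != "SY1415"'), replacement],
                   ignore_index=True)
sy1415 = result[result['school_year'] == 'SY1415']
```

Route 2 exists only in SY1516 in this toy data, so it is not copied to SY1415.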

# Fix route information of treated schools for SY1314/SY1415

For all treated schools that are named in the data provided through the FOIA request but cannot be matched to the corresponding school year (SY1314 or SY1415), take the route information from SY1516.

Load data on schools with a Safe Passage program provided by FOIA request

In [13]:
with (data_path / 'processed/foia_sp.pkl').open('rb') as f:
    foia_sp = pickle.load(f)
    
sy_dict = {'SY14': 'SY1314', 'SY15': 'SY1415'}

for sy in ['SY14', 'SY15']:
    # Get routes from SY1516 which belong to a school
    # which is treated in year "sy" but does not have an entry
    # yet for that school year (i.e. school year "sy")
    add_routes_temp = routes_new.loc[
        (routes_new['school_year'] == 'SY1516')
        & (routes_new['school_name'].isin(foia_sp[sy][sy])) &
        (~routes_new['school_name'].isin(
            routes_new.loc[routes_new['school_year'] == sy_dict[sy],
                           'school_name']))].copy()
    assert add_routes_temp.shape[0] > 0
    # Add these routes to existing dataset
    add_routes_temp['school_year'] = sy_dict[sy]
    routes_new = pd.concat(
        [routes_new, add_routes_temp], ignore_index=True)
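
The backfill logic of the loop above can be sketched on toy data as follows. The function `backfill` and the series `treated` are hypothetical; in particular, `treated` only stands in for the list of treated school names and does not reproduce the actual structure of `foia_sp`.

```python
import pandas as pd


def backfill(routes, treated_names, target_year, source_year='SY1516'):
    """Copy source-year routes of treated schools lacking a target-year entry."""
    present = routes.loc[routes['school_year'] == target_year, 'school_name']
    missing = routes.loc[
        (routes['school_year'] == source_year)
        & routes['school_name'].isin(treated_names)
        & ~routes['school_name'].isin(present)].copy()
    missing['school_year'] = target_year
    return pd.concat([routes, missing], ignore_index=True)


routes = pd.DataFrame({
    'school_year': ['SY1516', 'SY1516', 'SY1314'],
    'school_name': ['A', 'B', 'A'],
    'route_number': [1, 2, 1],
})
treated = pd.Series(['A', 'B'])  # hypothetical treated schools in SY1314

filled = backfill(routes, treated, 'SY1314')
sy1314 = filled[filled['school_year'] == 'SY1314']
```

School A already has an SY1314 entry and is left alone; only school B receives a copy of its SY1516 route.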

# Save

First, make sure that each school name corresponds to exactly one school id and vice versa.

In [14]:
check_school_name_id_unique(routes_new)
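
The implementation of `check_school_name_id_unique` is not shown in this notebook. A one-to-one check of this kind could look roughly like the sketch below; `is_one_to_one` is a hypothetical stand-in and may differ from what the project function actually does.

```python
import pandas as pd


def is_one_to_one(df, id_col='school_id', name_col='school_name'):
    """True if every id maps to exactly one name and vice versa."""
    pairs = df[[id_col, name_col]].drop_duplicates()
    # After dropping duplicate pairs, neither column may repeat a value.
    return bool(pairs[id_col].is_unique and pairs[name_col].is_unique)


good = pd.DataFrame({
    'school_id': [1, 1, 2],
    'school_name': ['A', 'A', 'B'],
})
bad = pd.DataFrame({
    'school_id': [1, 2],
    'school_name': ['A', 'A'],  # one name, two ids
})

good_result = is_one_to_one(good)
bad_result = is_one_to_one(bad)
```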

Now save all routes

In [15]:
with (data_path / 'processed/routes.pkl').open('wb') as f:
    pickle.dump(routes_new, f)