<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-data" data-toc-modified-id="Load-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Load data</a></span></li><li><span><a href="#Match-schools-and-FOIA-data-from-SY1314-onwards" data-toc-modified-id="Match-schools-and-FOIA-data-from-SY1314-onwards-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Match schools and FOIA data from SY1314 onwards</a></span></li><li><span><a href="#Extend-school-years-back-to-SY0910" data-toc-modified-id="Extend-school-years-back-to-SY0910-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Extend school years back to SY0910</a></span></li><li><span><a href="#Save" data-toc-modified-id="Save-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Save</a></span></li></ul></div>

**Description**: Takes the preprocessed school data, matches it to
FOIA data on SP implementation from SY1314 onwards, extends the school years
back to SY0910, and saves the resulting dataset.

---

In [1]:
import pickle
from pathlib import Path

import pandas as pd
import numpy as np

# Load data

In [2]:
data_path = Path('../../data')

In [3]:
with (data_path / 'interim/schools.pkl').open('rb') as f:
    schools = pickle.load(f)

with (data_path / 'processed/foia_sp.pkl').open('rb') as f:
    foia_sp = pickle.load(f)

# Match schools and FOIA data from SY1314 onwards

Dictionary mapping the school year names used in `foia_sp` (keys) to the
corresponding extended school year names used in this thesis and in the school
data set (values).

In [4]:
sy_dict = {
    'SY16': 'SY1516',
    'SY15': 'SY1415',
    'SY14': 'SY1314',
    'SY13': 'SY1213',
    'SY12': 'SY1112'
}

Match from SY1314 onwards

In [5]:
for sy in ['SY14', 'SY15', 'SY16']:
    schools.loc[(schools['school_year'] == sy_dict[sy]) & (
        schools['school_name'].isin(foia_sp[sy][sy])), 'treated_foia'] = 1

Fill missing values with 0, i.e. not treated

In [6]:
schools['treated_foia'] = schools['treated_foia'].fillna(0)
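The matching and filling steps above can be tried out on a small scale. The following is a minimal sketch on toy data: the school names and the layout of `toy_foia` (a dict of DataFrames with one column named after the short school year) are assumptions inferred from how `foia_sp` is indexed above, not the real data.

```python
import pandas as pd

# Hypothetical toy data; names and structure are assumptions for illustration
toy_schools = pd.DataFrame({
    'school_name': ['A', 'B', 'A', 'B'],
    'school_year': ['SY1314', 'SY1314', 'SY1415', 'SY1415'],
})
toy_foia = {
    'SY14': pd.DataFrame({'SY14': ['A']}),
    'SY15': pd.DataFrame({'SY15': ['A', 'B']}),
}
toy_sy_dict = {'SY14': 'SY1314', 'SY15': 'SY1415'}

for sy in ['SY14', 'SY15']:
    # Rows of the given school year whose school appears in the FOIA list
    mask = (toy_schools['school_year'] == toy_sy_dict[sy]) & (
        toy_schools['school_name'].isin(toy_foia[sy][sy]))
    toy_schools.loc[mask, 'treated_foia'] = 1

# Schools never matched stay missing and are set to 0 (not treated)
toy_schools['treated_foia'] = toy_schools['treated_foia'].fillna(0)
print(toy_schools)
```

In this toy example, school B is untreated in SY1314 and treated in SY1415, while school A is treated in both years.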

# Extend school years back to SY0910
(only for treated schools, as we don't know if others existed back then)


It is assumed that treatment status does not change between SY0910 and SY1112.
This means that the earliest possible change in a school's treatment status
after SY0910 is in SY1213.
Basis for this assumption: McMillen et al. (2017).

Get list of treated schools in SY1112

In [7]:
early_treated_schools = foia_sp['SY12']['SY12'].values.tolist()

Add treated schools from SY1213 to list

In [8]:
early_treated_schools.extend(foia_sp['SY13']['SY13'].values.tolist())

Make school names unique

In [9]:
early_treated_schools = list(set(early_treated_schools))
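Note that `list(set(...))` does not guarantee a particular order of the school names. Where a reproducible order is useful, for example when comparing runs, one of the following variants could be used instead. This is a sketch on hypothetical school names, not a change the notebook requires.

```python
# Hypothetical school names for illustration
sy12_names = ['A', 'B']
sy13_names = ['B', 'C']
combined = sy12_names + sy13_names

# Variant 1: unique names in alphabetical order
unique_sorted = sorted(set(combined))

# Variant 2: unique names in order of first appearance
unique_ordered = list(dict.fromkeys(combined))

print(unique_sorted, unique_ordered)
```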

Make sure that all schools which are treated in SY1112 and/or SY1213
also have an observation in SY1314 (they do not have to be treated in that year).

In [10]:
# Names of all schools with an observation in SY1314
names_sy1314 = schools.loc[schools['school_year'] == 'SY1314',
                           'school_name'].values
assert all(school in names_sy1314 for school in early_treated_schools)
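If this check ever fails, it helps to know which schools are missing. A minimal sketch using a set difference on toy data is shown below; the names `toy_schools` and `toy_early_treated` are hypothetical stand-ins for `schools` and `early_treated_schools`.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy_schools = pd.DataFrame({
    'school_name': ['A', 'B'],
    'school_year': ['SY1314', 'SY1314'],
})
toy_early_treated = ['A', 'B', 'C']  # 'C' has no SY1314 observation

names_sy1314 = set(
    toy_schools.loc[toy_schools['school_year'] == 'SY1314', 'school_name'])

# Early treated schools without an observation in SY1314
missing = sorted(set(toy_early_treated) - names_sy1314)
print(missing)
```

An empty `missing` list corresponds to the assertion passing.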

For each early treated school take the observation from SY1314

In [11]:
early_years = schools.loc[(schools['school_year'] == 'SY1314') & (
    schools['school_name'].isin(early_treated_schools))].drop(
        'treated_foia', axis='columns')
assert not early_years['school_name'].duplicated().any()
assert early_years.shape[0] == len(early_treated_schools)

Duplicate these observations once for each school year prior to SY1314
(i.e. 4 observations per school).

In [12]:
sy_to_add = ['SY0910', 'SY1011', 'SY1112', 'SY1213']

Iteratively add observations for each school year from list above

In [13]:
all_early_years = []
for sy in sy_to_add:
    early_years_temp = early_years.copy()
    early_years_temp['school_year'] = sy
    all_early_years.append(early_years_temp)
all_early_years = pd.concat(all_early_years, ignore_index=True)
assert all_early_years.shape[0] / len(sy_to_add) == early_years.shape[0]
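The same duplication can be written more compactly with `DataFrame.assign`, which returns a copy with the given column overwritten. The sketch below uses toy data; the `enrollment` column is a hypothetical placeholder for the school characteristics carried over from SY1314.

```python
import pandas as pd

# Hypothetical SY1314 observations of two early treated schools
toy_early = pd.DataFrame({
    'school_name': ['A', 'B'],
    'school_year': ['SY1314', 'SY1314'],
    'enrollment': [100, 200],  # hypothetical column
})
years = ['SY0910', 'SY1011', 'SY1112', 'SY1213']

# One copy of the SY1314 rows per earlier school year
toy_all = pd.concat(
    [toy_early.assign(school_year=sy) for sy in years],
    ignore_index=True)

assert len(toy_all) == len(years) * len(toy_early)
```

All other columns keep their SY1314 values in the added years, exactly as in the loop above.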

Add treatment indicator according to FOIA information

In [14]:
for sy_short, sy_long in [('SY12', 'SY0910'), ('SY12', 'SY1011'),
                          ('SY12', 'SY1112'), ('SY13', 'SY1213')]:
    all_early_years.loc[(all_early_years['school_year'] == sy_long) & (
        all_early_years['school_name'].isin(foia_sp[sy_short][sy_short])),
                        'treated_foia'] = 1

Fill missing values for treatment indicator with 0

In [15]:
all_early_years['treated_foia'] = all_early_years['treated_foia'].fillna(0)
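To make the assumption behind this step concrete, the sketch below applies the same year mapping to toy data: the SY12 list determines treatment in SY0910, SY1011 and SY1112, and the SY13 list determines treatment in SY1213. School names and the `toy_foia` structure are assumptions for illustration.

```python
import pandas as pd

# Hypothetical lists of treated schools
toy_foia = {
    'SY12': pd.DataFrame({'SY12': ['A']}),
    'SY13': pd.DataFrame({'SY13': ['A', 'B']}),
}
# FOIA list that determines treatment in each added school year
source_year = {'SY0910': 'SY12', 'SY1011': 'SY12',
               'SY1112': 'SY12', 'SY1213': 'SY13'}

toy_panel = pd.DataFrame(
    [(name, sy) for name in ['A', 'B'] for sy in source_year],
    columns=['school_name', 'school_year'])

toy_panel['treated_foia'] = 0.0
for sy_long, sy_short in source_year.items():
    mask = (toy_panel['school_year'] == sy_long) & (
        toy_panel['school_name'].isin(toy_foia[sy_short][sy_short]))
    toy_panel.loc[mask, 'treated_foia'] = 1.0

# One row per school, one column per school year
toy_wide = toy_panel.pivot(index='school_name', columns='school_year',
                           values='treated_foia')
print(toy_wide)
```

Here school A is treated in all four years, whereas school B only becomes treated in SY1213, the earliest year in which a change is possible under the assumption.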

Set the regular treatment indicator (`treated`) to missing for the added early years

In [16]:
all_early_years['treated'] = np.nan

Add early years to school dataset

In [17]:
schools = pd.concat([schools, all_early_years], ignore_index=True)
schools = schools.sort_values(['school_name',
                               'school_year']).reset_index(drop=True)
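After concatenating, one may want to verify that no school appears twice in the same school year. The helper below is a hypothetical sketch of such a check, shown on toy data. Whether it should hold for the real `schools` data depends on the assumption that the original data contains no observations before SY1314.

```python
import pandas as pd

def has_unique_school_years(df):
    """Return True if no (school_name, school_year) pair occurs twice."""
    return not df.duplicated(subset=['school_name', 'school_year']).any()

# Hypothetical toy panels: one clean, one with a repeated row
toy = pd.DataFrame({
    'school_name': ['A', 'A', 'B'],
    'school_year': ['SY1213', 'SY1314', 'SY1314'],
})
toy_dup = pd.concat([toy, toy.iloc[[0]]], ignore_index=True)

print(has_unique_school_years(toy), has_unique_school_years(toy_dup))
```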

# Save
School data now contains observations from SY0910 to SY1516 as well as the FOIA treatment status.

In [18]:
with (data_path / 'interim/schools_foia.pkl').open('wb') as f:
    pickle.dump(schools, f)
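A quick way to confirm that the saved file can be read back unchanged is a round trip through `pickle`. The sketch below does this for a toy DataFrame in a temporary directory, so it does not touch the project's data folder; the toy values are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the final schools dataset
toy = pd.DataFrame({
    'school_name': ['A'],
    'school_year': ['SY0910'],
    'treated_foia': [1.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'schools_foia.pkl'
    with path.open('wb') as f:
        pickle.dump(toy, f)
    with path.open('rb') as f:
        restored = pickle.load(f)

# Raises an AssertionError if the restored data differs
pd.testing.assert_frame_equal(toy, restored)
```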