# Data Integration for Austin Granular Model for 2018-2019 School Year

In [15]:
import pandas as pd
import geopandas as gpd
import xarray as xr

## Data categories and sources

### Zip code tabulation area population data by age

See `epimodels/notebooks/AustinGranularModel/ZCTA/TX_ZCTA_age_populations.r` for details on data download.

In [18]:
zcta_age_pop_2018 = pd.read_csv('/Users/kpierce/epimodels/notebooks/AustinGranularModel/ZCTA/2018_TX_ZCTA_age_populations.csv')
zcta_age_pop_2019 = pd.read_csv('/Users/kpierce/epimodels/notebooks/AustinGranularModel/ZCTA/2019_TX_ZCTA_age_populations.csv')

In [19]:
zcta_age_pop_2019['age'].unique()

array(['10 to 14 years', '15 to 17 years', '18 and 19 years', '20 years',
       '21 years', '22 to 24 years', '25 to 29 years', '30 to 34 years',
       '35 to 39 years', '40 to 44 years', '45 to 49 years',
       '5 to 9 years', '50 to 54 years', '55 to 59 years',
       '60 and 61 years', '62 to 64 years', '65 and 66 years',
       '67 to 69 years', '70 to 74 years', '75 to 79 years',
       '80 to 84 years', '85 years and over', 'Under 5 years'],
      dtype=object)

### Zip code tabulation area geometries by year

See `epimodels/notebooks/AustinGranularModel/CBG/TRAVISCO_TX_CBG_age_populations.r` for details on data download.

In [20]:
zcta_shp_2018 = gpd.read_file('/Users/kpierce/epimodels/notebooks/AustinGranularModel/ZCTA/2018_TX_zcta.shp')
zcta_shp_2019 = gpd.read_file('/Users/kpierce/epimodels/notebooks/AustinGranularModel/ZCTA/2019_TX_zcta.shp')

### Zip code tabulation area-level private school enrollment

See `epimodels/notebooks/AustinGranularModel/Schools/AISD_enrollment_by_zcta.r` for details on data download.

In [21]:
zcta_private_enroll_2018 = pd.read_csv(
    '/Users/kpierce/epimodels/notebooks/AustinGranularModel/Schools/data/2018_AISD_Private_School_Enrollment_Estimates_by_ZCTA.csv'
)
zcta_private_enroll_2019 = pd.read_csv(
    '/Users/kpierce/epimodels/notebooks/AustinGranularModel/Schools/data/2019_AISD_Private_School_Enrollment_Estimates_by_ZCTA.csv'
)


### AISD school attendance boundaries

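- 2020-21 boundaries were downloaded from the AISD website.
- Data obtained through an open records request provide the following information on attendance areas:
    - No change from 2015 through 2019, so the 2015-16 boundary files are used for 2018-19.
    - Elementary boundaries changed for 2019-2020.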

In [7]:
aisd_201819_elem = gpd.read_file('/Users/kpierce/epimodels/notebooks/AustinGranularModel/Schools/data/ORR - Kelly Pierce/2015-16 Boundaries/1516_Boundaries_Schools/1516_Elementary_Boundaries.shp')
aisd_201819_midd = gpd.read_file('/Users/kpierce/epimodels/notebooks/AustinGranularModel/Schools/data/ORR - Kelly Pierce/2015-16 Boundaries/1516_Boundaries_Schools/Middle_AA.shp')
aisd_201819_high = gpd.read_file('/Users/kpierce/epimodels/notebooks/AustinGranularModel/Schools/data/ORR - Kelly Pierce/2015-16 Boundaries/1516_Boundaries_Schools/1516_High_Boundaries.shp')
                                   

### AISD school calendars

See [Current and Previous AISD Calendars](https://www.austinisd.org/advisory-bodies/calendar-planning) for calendar PDFs. Weekdays that fall on holidays were manually entered into the spreadsheet at `epimodels/notebooks/AustinGranularModel/Schools/data/AISDCalendars/AISD_2018_2021_Calendar.csv`.

In [9]:
aisd_2018_2021_cal = pd.read_csv('/Users/kpierce/epimodels/notebooks/AustinGranularModel/Schools/data/AISDCalendars/AISD_2018_2021_Calendar.csv')

### Baseline contact rates

Citations:

Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, et al. (2008) Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases. PLoS Med 5(3): e74. https://doi.org/10.1371/journal.pmed.0050074

Prem K, Cook AR, Jit M (2017) Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS Comput Biol 13(9): e1005697. https://doi.org/10.1371/journal.pcbi.1005697

Workflow in `/epimodels/notebooks/AustinGranularModel/BaselineContacts/BaselineContactRates.ipynb`

In [16]:
contacts = xr.open_zarr('/Users/kpierce/epimodels/notebooks/AustinGranularModel/BaselineContacts/usa_baseline_contacts.zarr/')

In [17]:
contacts

Output (xarray HTML summary; the same table appears for each of the two arrays shown):

| | Array | Chunk |
|---|---|---|
| Bytes | 392 B | 392 B |
| Shape | (7, 7) | (7, 7) |
| Count | 2 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |

### Mobility patterns

From Safegraph/Kelly Gaither

## Core assumptions

1. Population is uniformly distributed within each ZCTA and each school attendance boundary (commercial and other non-residential areas are not accounted for).
2. The same percentage of children in each ZCTA are enrolled in private school across all age groups (the percent of elementary school aged students enrolled in private elementary schools is the same as the percent of middle school aged students enrolled in private middle schools, etc.).
3. Safegraph travel data for people 13 and older reflect travel patterns across all ages.
4. School attendance transfers have a negligible impact on travel and contact and can be disregarded.

## Integration workflow

### 1. Subtract private school students from each ZCTA's child population

1. Multiply the child population in each ZCTA by the percentage of children enrolled in public school in that ZCTA. Enrollment in private versus public school has only a coarse age breakdown, so the percentage is assumed to be constant across all age groups.
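
The adjustment above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column names (`zcta`, `age`, `population`, `pct_private`) and all numbers are hypothetical, since the schemas of the population and enrollment CSVs are not shown here.

```python
import pandas as pd

# Hypothetical child population by area and age group (made-up numbers).
child_pop = pd.DataFrame({
    "zcta": ["78701", "78701", "78702", "78702"],
    "age": ["5 to 9 years", "10 to 14 years", "5 to 9 years", "10 to 14 years"],
    "population": [200.0, 100.0, 400.0, 300.0],
})

# Hypothetical share of children enrolled in private school, one value per area.
private_share = pd.DataFrame({
    "zcta": ["78701", "78702"],
    "pct_private": [0.25, 0.10],
})


def public_school_population(pop, share):
    """Scale each age group's population by the area's public-school share."""
    merged = pop.merge(share, on="zcta", how="left", validate="many_to_one")
    # Assumption: the private share is identical for every age group in an area.
    merged["public_population"] = merged["population"] * (1.0 - merged["pct_private"])
    return merged


public_pop = public_school_population(child_pop, private_share)
```

Multiplying by the public share is equivalent to subtracting the estimated private school students, so no separate subtraction step is needed.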

### 2. Trim ZCTAs to AISD boundary

1. Intersect the ZCTA shapefile with the AISD boundary shapefile.
2. Calculate the fraction of each ZCTA's area that falls inside the AISD boundary.
3. If the overlap is less than 100% (for ZCTAs on the edge of the district), multiply the ZCTA population by the percentage overlap.
4. Save a trimmed ZCTA shapefile and an adjusted population dataset.
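
A minimal sketch of the trimming steps above, using toy rectangles in place of the real shapefiles. The `district` GeoDataFrame, the column names, and the choice of EPSG:32614 as a projected CRS are all assumptions for illustration; the AISD boundary file itself is not loaded in the cells above.

```python
import geopandas as gpd
from shapely.geometry import box

# Assumption: a projected CRS (here UTM zone 14N) so that areas are meaningful.
CRS = "EPSG:32614"

# Toy ZCTAs: "A" lies fully inside the district, "B" is half inside.
zcta = gpd.GeoDataFrame(
    {"zcta": ["A", "B"], "population": [1000.0, 500.0]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs=CRS,
)
# Toy district boundary, assumed to be a single polygon.
district = gpd.GeoDataFrame(
    {"district": ["AISD"]},
    geometry=[box(0, 0, 15, 10)],
    crs=CRS,
)


def trim_to_district(zcta_gdf, district_gdf):
    """Clip ZCTAs to the district and scale population by the retained area."""
    zcta_gdf = zcta_gdf.copy()
    # Record the full area before clipping.
    zcta_gdf["zcta_area"] = zcta_gdf.geometry.area
    # Keep only the parts of each ZCTA that fall inside the district.
    trimmed = gpd.overlay(zcta_gdf, district_gdf, how="intersection")
    trimmed["pct_overlap"] = trimmed.geometry.area / trimmed["zcta_area"]
    # Uniform-population assumption: population scales with retained area.
    trimmed["adj_population"] = trimmed["population"] * trimmed["pct_overlap"]
    return trimmed


trimmed = trim_to_district(zcta, district)
```

ZCTAs that do not touch the district drop out of the intersection entirely. The result could then be written out with `GeoDataFrame.to_file` for the geometries and `DataFrame.to_csv` for the adjusted populations.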

### 3. Students attending school by home ZCTA (trimmed)

1. Calculate the area of intersection between each ZCTA and all school boundary areas that intersect (typically one elementary school, one middle school, and one high school).
2. Calculate the percentage of each ZCTA's area that is assigned to each school.
3. Stratify ZCTA population data by age.
    - ages 5-9: elementary school
    - ages 10-14: middle school
    - ages 15-17: high school
4. Calculate the number of students from each ZCTA$_{i}$ attending school$_{j}$ at each school level $k$ as

    $$\text{attendance}_{ijk} = \frac{\text{areal overlap}_{ij}}{\text{area ZCTA}_{i}} \times \text{population ZCTA}_{ik}$$
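
The attendance calculation above can be sketched with pandas. The age-to-level mapping comes from the list in this section; the overlap areas, populations, school names, and column names are hypothetical placeholders (in practice the overlap areas would come from intersecting the trimmed ZCTAs with the attendance boundaries).

```python
import pandas as pd

# Age group -> school level, as listed in this section.
AGE_TO_LEVEL = {
    "5 to 9 years": "elementary",
    "10 to 14 years": "middle",
    "15 to 17 years": "high",
}

# Hypothetical trimmed population for one area, by age group.
zcta_pop = pd.DataFrame({
    "zcta": ["A", "A", "A"],
    "age": ["5 to 9 years", "10 to 14 years", "15 to 17 years"],
    "population": [300.0, 200.0, 100.0],
})

# Hypothetical areal overlaps between area i and attendance area j at level k.
overlaps = pd.DataFrame({
    "zcta": ["A", "A", "A", "A"],
    "school": ["Elem 1", "Elem 2", "Middle 1", "High 1"],
    "level": ["elementary", "elementary", "middle", "high"],
    "overlap_area": [75.0, 25.0, 100.0, 100.0],
    "zcta_area": [100.0, 100.0, 100.0, 100.0],
})


def school_attendance(pop, overlap):
    """attendance = overlap area / area of home unit * population at that level."""
    # Keep only school-age groups and label them with a school level.
    pop = pop.assign(level=pop["age"].map(AGE_TO_LEVEL)).dropna(subset=["level"])
    pop_by_level = pop.groupby(["zcta", "level"], as_index=False)["population"].sum()
    out = overlap.merge(pop_by_level, on=["zcta", "level"], how="left")
    out["attendance"] = out["overlap_area"] / out["zcta_area"] * out["population"]
    return out[["zcta", "school", "level", "attendance"]]


attendance = school_attendance(zcta_pop, overlaps)
```

In this toy case the elementary-age population of 300 splits 225/75 between the two elementary attendance areas, in proportion to area.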
    
### 4. Integrate mobility data

1. Un-pivot the ZCTA visits matrix to a long-form table with source and destination columns.
2. Group mobility data by source and calculate the percentage of each source ZCTA traveling to each destination ZCTA.
3. For weekends/non-school-days, multiply these travel percentages by the source ZCTA populations for all age groups.
4. For weekdays/school days, multiply these travel percentages by the source ZCTA populations for ages 18 and up only.
5. For weekdays/school days, append the student attendance data calculated in step 3 to account for school age groups.
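
A minimal sketch of steps 1 through 4, assuming the visits matrix has source ZCTAs as rows and destination ZCTAs as columns. The matrix values, population figures, and column names are hypothetical; the actual layout of the mobility data is not shown here.

```python
import pandas as pd

# Hypothetical visits matrix: rows are sources, columns are destinations.
visits = pd.DataFrame(
    [[80, 20], [30, 70]],
    index=pd.Index(["A", "B"], name="source"),
    columns=["A", "B"],
)

# Hypothetical population per source ZCTA (e.g. ages 18 and up on school days).
adult_pop = pd.Series({"A": 1000.0, "B": 500.0}, name="population")


def travel_flows(visits_matrix, population):
    """Distribute each source population across destinations by visit share."""
    # Step 1: un-pivot to one row per (source, destination) pair.
    long = visits_matrix.reset_index().melt(
        id_vars="source", var_name="destination", value_name="visits"
    )
    # Step 2: share of each source's visits going to each destination.
    totals = long.groupby("source")["visits"].transform("sum")
    long["pct_travel"] = long["visits"] / totals
    # Steps 3-4: multiply the share by the source population.
    long["travelers"] = long["pct_travel"] * long["source"].map(population)
    return long


flows = travel_flows(visits, adult_pop)
```

Passing the all-ages population gives the weekend case, and passing the 18-and-up population gives the school-day case. For step 5, the student attendance rows could be appended with `pd.concat` once their columns are renamed to match.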