Daily smoke PM2.5 predictions
--------------------------------------------------------------------------------

- `tract/tracts/`: Folder containing the 2019 shapefiles for CONUS census tracts, organized by state/territory. Files were downloaded from the US Census Bureau TIGER/Line Shapefiles website (https://www.census.gov/cgi-bin/geo/shapefiles/index.php).

- `tract/smokePM2pt5_predictions_daily_tract_20060101-20201231`: File containing a data frame with the final set of **daily smoke PM2.5 predictions on smoke days** at the tract level from **January 1, 2006 to December 31, 2020** for the contiguous US. Tract-level predictions are aggregated from the smoke PM2.5 predictions at 10 km resolution by averaging, weighted by population and area of intersection.

- The `GEOID` column in this file corresponds to the `GEOID` column in the tract shapefiles.

- All rows in this file are predictions on smoke days. Predictions on non-smoke days are by construction 0 ug/m^3 and are not included in this file. A smoke PM2.5 prediction of 0 in this file means that the tract-day was a smoke day but did not have elevated PM2.5. The full set of smoke PM2.5 predictions on both smoke days and non-smoke days can be obtained by setting the prediction to 0 for every tract-day that is not in this file, across all tracts and all dates from January 1, 2006 to December 31, 2020.
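
The zero-filling described in the last bullet can be sketched as follows. This is a minimal illustration, not part of the dataset's own tooling: the helper name `fill_non_smoke_days` and the toy values are hypothetical, and it assumes the columns are named `GEOID`, `date` (integer `YYYYMMDD`) and `smokePM_pred`, as in the file preview further down.

```python
import pandas as pd

def fill_non_smoke_days(smoke, tracts, start="2006-01-01", end="2020-12-31"):
    """Expand smoke-day rows to a full tract-by-day panel, filling gaps with 0."""
    smoke = smoke.copy()
    # The file stores dates as integers such as 20060101.
    smoke["date"] = pd.to_datetime(smoke["date"].astype(str), format="%Y%m%d")
    # Every combination of tract and calendar day in the requested range.
    full_index = pd.MultiIndex.from_product(
        [sorted(tracts), pd.date_range(start, end, freq="D")],
        names=["GEOID", "date"],
    )
    # Tract-days missing from the file are non-smoke days, i.e. 0 ug/m^3.
    return (
        smoke.set_index(["GEOID", "date"])["smokePM_pred"]
        .reindex(full_index, fill_value=0.0)
        .reset_index()
    )

# Toy data (hypothetical values): one tract with two smoke days, one tract with none.
toy = pd.DataFrame({
    "GEOID": [6001400100, 6001400100],
    "date": [20060101, 20060103],
    "smokePM_pred": [3.2, 0.5],
})
panel = fill_non_smoke_days(
    toy, tracts=[6001400100, 6001400200], start="2006-01-01", end="2006-01-03"
)
print(panel)
```

The full panel has one row per tract per day (5,479 days for the 2006-2020 range), so it can be far larger than the smoke-day file; filling a subset of tracts at a time may be more practical.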

In [1]:
import numpy as np
import pandas as pd

In [75]:
df = pd.read_csv("data/tract/smokePM2pt5_predictions_daily_tract_20060101-20201231.csv")

## Get only California 

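California's state FIPS code is 06.

**Note**: A tract GEOID is STATE + COUNTY + TRACT, i.e. 2 + 3 + 6 = 11 digits. Because `GEOID` is read as an integer here, the leading zero is dropped, so California GEOIDs appear as 10-digit numbers starting with 6.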
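
To make the 2 + 3 + 6 layout concrete, here is a small sketch that splits a GEOID into its parts. The helper `split_geoid` is hypothetical (it is not used elsewhere in this notebook), and it assumes an integer GEOID shorter than 11 digits has only lost leading zeros.

```python
def split_geoid(geoid):
    """Split a tract GEOID into (state, county, tract) code strings."""
    # Pad back to 11 digits to restore a leading zero lost in integer parsing.
    s = str(geoid).zfill(11)
    if len(s) != 11 or not s.isdigit():
        raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
    return s[:2], s[2:5], s[5:]

print(split_geoid(6001400100))     # ('06', '001', '400100')
print(split_geoid("20019964600"))  # ('20', '019', '964600')
```

The same padding can be applied to a whole column with `df["GEOID"].astype(str).str.zfill(11)`.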

In [76]:
df.head()

Unnamed: 0,GEOID,date,smokePM_pred
0,20019964600,20060101,3.278776
1,20035493100,20060101,0.556782
2,20049965100,20060101,0.796605
3,20125950700,20060101,1.467832
4,20125950800,20060101,5.9037


In [77]:
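# GEOID is parsed as an integer, so pad to 11 digits before matching California's state prefix "06"
df = df[df["GEOID"].astype(str).str.zfill(11).str[:2] == "06"]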

In [78]:
df["GEOID"].min()

6001400100

In [79]:
df["GEOID"].max()

6115041100

## Get ESRI crosswalk

In [84]:
crosswalk = pd.read_csv("data/fips_crosswalk_merged_county.csv", usecols=["long_FIPS", "FIPS"])

In [85]:
crosswalk["long_FIPS"].min()

6001400100

In [86]:
crosswalk["long_FIPS"].max()

6115041102

In [87]:
len(crosswalk)

9096

In [88]:
len(df["GEOID"].unique())

8057

## Join smoke PM & FIPS codes

In [89]:
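# Inner join (the pandas default): rows whose GEOID has no match in the crosswalk are dropped
df = df.merge(crosswalk, how="inner", left_on="GEOID", right_on="long_FIPS")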

In [90]:
len(df)

2599492

In [91]:
len(df["GEOID"].unique())

6854
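
The number of unique tracts falls from 8057 before the join to 6854 after it, so some tracts in the smoke data have no match in the crosswalk. The cause is not established in this notebook. A minimal sketch for listing the unmatched tracts, using the `indicator` option of `DataFrame.merge`, is shown below; the helper `unmatched_geoids` and the toy frames are hypothetical.

```python
import pandas as pd

def unmatched_geoids(smoke, crosswalk):
    """Return GEOIDs present in `smoke` but absent from the crosswalk."""
    tracts = smoke[["GEOID"]].drop_duplicates()
    keys = crosswalk[["long_FIPS"]].drop_duplicates()
    # A left join with indicator=True adds a `_merge` column that flags
    # rows found only in the left frame as "left_only".
    checked = tracts.merge(
        keys, how="left", left_on="GEOID", right_on="long_FIPS", indicator=True
    )
    return checked.loc[checked["_merge"] == "left_only", "GEOID"].tolist()

# Toy frames (hypothetical values) standing in for the smoke data and crosswalk.
toy_smoke = pd.DataFrame({"GEOID": [6001400100, 6001400200, 6115041100]})
toy_crosswalk = pd.DataFrame({"long_FIPS": [6001400100, 6115041101, 6115041102]})
missing = unmatched_geoids(toy_smoke, toy_crosswalk)
print(missing)  # [6001400200, 6115041100]
```

Running this on the pre-merge California frame would show which tracts are lost and help decide whether the inner join is acceptable for the analysis.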

In [92]:
df.head()

Unnamed: 0,GEOID,date,smokePM_pred,long_FIPS,FIPS
0,6025010101,20060103,2.294267,6025010101,2677.0
1,6025010101,20060105,1.291195,6025010101,2677.0
2,6025010101,20060106,3.173662,6025010101,2677.0
3,6025010101,20060109,1.147731,6025010101,2677.0
4,6025010101,20060120,1.624987,6025010101,2677.0


In [93]:
# Parse the integer YYYYMMDD dates into datetimes
df["date2"] = pd.to_datetime(df["date"], format="%Y%m%d")

In [94]:
df.head()

Unnamed: 0,GEOID,date,smokePM_pred,long_FIPS,FIPS,date2
0,6025010101,20060103,2.294267,6025010101,2677.0,2006-01-03
1,6025010101,20060105,1.291195,6025010101,2677.0,2006-01-05
2,6025010101,20060106,3.173662,6025010101,2677.0,2006-01-06
3,6025010101,20060109,1.147731,6025010101,2677.0,2006-01-09
4,6025010101,20060120,1.624987,6025010101,2677.0,2006-01-20


In [96]:
df = df.drop(columns=["date", "long_FIPS"])

In [101]:
df = df.rename(columns={"date2":"time"})

In [102]:
df.to_parquet("outputs/smoke_pm25_predicted_with_fips.parquet")