# Create CMAQ-Ready Files from Shapefiles

This notebook uses geopandas and cmaqsatproc (through the `shp2cmaq` module) to create IOAPI-like files for CMAQ. Geopandas provides spatially indexed searches for fast overlap queries and handles projection conversions. The notebook creates I/O API NetCDF masks from **categorical** variables (e.g., **states**, **countries**) or from **quantitative** variables (e.g., **population**, **income**).
* Categorical variables have grid cell values set to the fraction of the grid cell covered by a feature (e.g., state).
* Quantitative variables have grid cell values set to the fraction of the feature (e.g., county) in the grid cell multiplied by the quantitative variable (e.g., population).
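Written as formulas, with notation introduced here only for illustration ($c$ is a grid cell, $f$ a feature polygon, $A(\cdot)$ an area, and $Q_f$ the feature's quantitative value):

$$
x_{c,f} = \frac{A(c \cap f)}{A(c)} \quad \text{(categorical)}, \qquad
q_{c,f} = \frac{A(c \cap f)}{A(f)}\,Q_f \quad \text{(quantitative)}
$$

Under this formula, summing $q_{c,f}$ over all grid cells gives back $Q_f$ only when the feature lies entirely inside the domain; any part of the feature outside the grid is not counted.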

## Install Libraries

- You can try uncommenting and running this cell to install these libraries into your current kernel.
- If this doesn't work, try one of two options:
  - Replace `!python -m pip` with `%pip`.
  - Run `pip install` outside the notebook, in the Python environment your notebook kernel uses.


In [None]:
#!python -m pip install -qq cmaqsatproc geopandas xarray netcdf4 pycno

## Import Required Libraries
- If this cell raises an error (e.g., `ModuleNotFoundError`), your installation was likely unsuccessful,
- or you need to restart the kernel to load the newly installed libraries.
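When an import fails, it can help to check which modules the kernel cannot find before reinstalling. The following is a minimal sketch using the standard library's `importlib.util.find_spec`. The module list mirrors the import cell below. Note that `shp2cmaq` is not part of the pip command above, so it must already be importable from your working directory or environment. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be located in this kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level modules imported by the next cell
required = ['matplotlib', 'shp2cmaq', 'cmaqsatproc', 'geopandas',
            'xarray', 'netCDF4', 'pycno']
missing = missing_modules(required)
if missing:
    print('Not importable (install them or restart the kernel):', ', '.join(missing))
else:
    print('All required modules were found.')
```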

In [None]:
import matplotlib.pyplot as plt
import shp2cmaq
import warnings
import cmaqsatproc as csp
import geopandas as gpd
import xarray
import netCDF4
import pycno
# ignore warnings
warnings.simplefilter('ignore')

## Create a CMAQ-Ready File from a Shapefile with Categorical Data (Part 1)

This section uses shapefiles to create a CMAQ-ready file with variables for each feature (e.g., state or country). Each grid cell's value in a variable expresses the fraction of the grid cell area that is within the feature polygon (e.g., state or country).
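The per-cell fraction can be illustrated without any GIS libraries. The following is a hypothetical sketch, not the `shp2cmaq` implementation. To keep the arithmetic simple, it assumes that the feature and the grid cells are axis-aligned rectangles `(xmin, ymin, xmax, ymax)` in the grid's projected coordinates. The real tool intersects arbitrary polygons using geopandas. Rows are counted upward from `yorig`, the same way I/O API grids count rows from the grid origin.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)

def cell_fractions(feature, xorig, yorig, dx, dy, ncols, nrows):
    """Fraction of each grid cell's area covered by `feature`, as a list of rows."""
    cell_area = dx * dy
    grid = []
    for row in range(nrows):
        y0 = yorig + row * dy
        grid.append([
            overlap_area((xorig + col * dx, y0, xorig + (col + 1) * dx, y0 + dy),
                         feature) / cell_area
            for col in range(ncols)
        ])
    return grid

# A hypothetical rectangular "state" over a 3-column x 2-row grid of 1 x 1 cells
state = (0.5, 0.0, 2.0, 1.5)
for row in cell_fractions(state, 0.0, 0.0, 1.0, 1.0, ncols=3, nrows=2):
    print(row)  # first row: [0.5, 1.0, 0.0]; second row: [0.25, 0.5, 0.0]
```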

### Download Shapefiles for Tutorial

- You can try uncommenting the following `!wget` commands to download the shapefiles.
- If the wget downloads don't work for you, try downloading the files from their respective websites:
  - Natural Earth: https://www.naturalearthdata.com
  - US Census: https://census.gov/
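If `wget` is not available, you can download the same files with Python's standard library. The following is a minimal sketch. The helpers `local_name` and `fetch` are hypothetical. Unlike `wget -N`, this sketch does not compare timestamps. It only skips the download when the destination file already exists.

```python
import os
import urllib.request

def local_name(url):
    """Last path segment of `url`, without any query string."""
    return os.path.basename(url.split('?')[0])

def fetch(url, dest=None):
    """Download `url` to `dest` unless that file already exists; return the path."""
    dest = dest or local_name(url)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

# Uncomment to download the default example shapefile
# shppath = fetch('https://www2.census.gov/geo/tiger/GENZ2022/shp/cb_2022_us_state_500k.zip')
```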

In [None]:
# Download an example shapefile if you don't already have one.
# default examples
#!wget -N https://www2.census.gov/geo/tiger/GENZ2022/shp/cb_2022_us_state_500k.zip # use with attrkey STUSPS
# alternate examples
#!wget -N https://naciscdn.org/naturalearth/10m/cultural/ne_10m_admin_0_countries.zip # use with attrkey ADM0_A3
#!wget -N https://naciscdn.org/naturalearth/10m/cultural/ne_10m_admin_1_states_provinces.zip # use with attrkey iso_3166_2

### Set Configuration
- Change `shppath` to point to your shapefile.
- Set `attrkey` to the attribute column used to group shapes (the available columns are listed in your shapefile's metadata).

In [None]:
# shppath : str
#     Path to a shapefile or zip file containing a shapefile
shppath = 'cb_2022_us_state_500k.zip'

# attrkey : str
#     Column to group shapes by e.g., STUSPS of census (AL, NC, etc)
attrkey = 'STUSPS'

# gdnam : str
#     Name of grid definition within gdpath (e.g., 12US1, 108NHEMI2)
gdnam = '12US1'
gdpath = None # None uses built-in; or specify your own GRIDDESC path

# For more options, run help(shp2cmaq.shp2cmaq)
#?shp2cmaq.shp2cmaq
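
Instead of using the built-in grid definitions, `gdpath` can point to your own GRIDDESC file. GRIDDESC is the I/O API grid-description text file. It contains a coordinate-system segment followed by a grid segment, and `' '` lines delimit the segments. The following is a minimal sketch that builds such a file. The helper `griddesc_text` is hypothetical. The 12US1 values are those distributed with CMAQ, but you should check them against your own GRIDDESC before use.

```python
def griddesc_text(coord_name, proj, gdnam, grid):
    """Build GRIDDESC text with one coordinate system and one grid.

    proj: (GDTYP, P_ALP, P_BET, P_GAM, XCENT, YCENT)
    grid: (XORIG, YORIG, XCELL, YCELL, NCOLS, NROWS, NTHIK)
    """
    gdtyp, *angles = proj
    xorig, yorig, xcell, ycell, ncols, nrows, nthik = grid
    lines = [
        "' '",                                            # coordinate-system segment
        f"'{coord_name}'",
        f"  {gdtyp} " + " ".join(f"{v:.3f}" for v in angles),
        "' '",                                            # grid segment
        f"'{gdnam}'",
        f"'{coord_name}' {xorig:.3f} {yorig:.3f} {xcell:.3f} {ycell:.3f} "
        f"{ncols} {nrows} {nthik}",
        "' '",                                            # end of grids
    ]
    return "\n".join(lines) + "\n"

# 12US1: Lambert conformal, 459 columns x 299 rows of 12-km cells
text = griddesc_text('LamCon_40N_97W', (2, 33.0, 45.0, -97.0, -97.0, 40.0),
                     '12US1', (-2556000.0, -1728000.0, 12000.0, 12000.0, 459, 299, 1))
print(text)
# To use it: open('GRIDDESC', 'w').write(text), then set gdpath = 'GRIDDESC'
```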

### Run Tool

1. Preprocessing:
    * Reads the shapefile in its native projection.
    * Filters the shapes to those within the CMAQ domain.
    * Optionally applies custom extra processing (run `help(shp2cmaq.shp2cmaq)` for options).
2. Area overlap:
    * Intersects grid cells with the shapefile polygons.
    * Aggregates the results to the grid-cell level.
    * Finds the largest area contributor in each cell.
    * Calculates the total cell overlap.
3. Output:
    * Stores the results as variables.
    * Saves them as an IOAPI-like file.
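The aggregation step can be sketched in plain Python. This is a hypothetical illustration, not the `shp2cmaq` implementation. It assumes that the per-feature fraction lists are already computed and that indices are assigned by sorted feature name. It uses -999 for cells with no overlap, which matches the threshold the plotting cell below uses to mask missing values. If two features tie in a cell, the first feature in `fractions` wins.

```python
def dominant_and_total(fractions, missing=-999):
    """Summarize per-feature coverage fractions for each grid cell.

    fractions: dict mapping feature name -> list of per-cell fractions (equal lengths)
    Returns (name2idx, dom, tot): dom is the index of the feature with the largest
    fraction in each cell (or `missing` if nothing overlaps); tot is the summed fraction.
    """
    name2idx = {name: i for i, name in enumerate(sorted(fractions))}
    ncells = len(next(iter(fractions.values())))
    dom, tot = [], []
    for c in range(ncells):
        values = {name: fracs[c] for name, fracs in fractions.items()}
        best = max(values, key=values.get)
        dom.append(name2idx[best] if values[best] > 0 else missing)
        tot.append(sum(values.values()))
    return name2idx, dom, tot

fractions = {'NC': [0.75, 1.0, 0.0, 0.0], 'SC': [0.25, 0.0, 0.25, 0.0]}
name2idx, dom, tot = dominant_and_total(fractions)
print(name2idx)  # {'NC': 0, 'SC': 1}
print(dom)       # [0, 0, 1, -999]
print(tot)       # [1.0, 1.0, 0.25, 0.0]
```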

In [None]:
# Pass the configured gdpath (None uses the built-in grid definitions)
outpath = shp2cmaq.shp2cmaq(shppath, attrkey, gdnam, gdpath=gdpath, verbose=1)

### Plot Result

In [None]:
import ast

igf = csp.open_ioapi(outpath)
domkey = [k for k in list(igf.data_vars) if k.endswith('DOM')][0]
totkey = [k for k in list(igf.data_vars) if k.endswith('TOT')][0]

fig, axx = plt.subplots(1, 2, figsize=(12, 4))
# Left: dominant-feature index (values <= -999 masked as missing); right: total overlap
igf[domkey].where(lambda x: x > -999).plot(ax=axx[0], cmap='nipy_spectral')
igf[totkey].plot(ax=axx[1], cmap='YlOrRd')
_ = igf.csp.cno.drawcountries(ax=axx)
# The DOM description stores a {name: index} dict; parse it safely instead of eval()
name2idx = ast.literal_eval(igf[domkey].description)
print('Dominant Index')
print(str({v: k for k, v in name2idx.items()}))

## Create CMAQ-Ready File from a Shapefile with Quantitative Data (Part 2)

This section uses shapefiles to create a CMAQ-ready file with variables for each feature (e.g., state or county). Each grid cell's value is the portion of a quantitative variable (e.g., population) allocated to that grid cell. This portion equals the feature's total multiplied by the fraction of the feature polygon's area that lies in the cell. In other words, the quantity is assumed to be distributed uniformly by area within each feature.
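The allocation can be sketched in plain Python. This is a hypothetical illustration, not the `shp2cmaq` implementation. It assumes that the overlap areas between each feature and the grid cells are already known, and it applies the uniform-density assumption stated above. If several features overlap the same cell, their contributions are summed.

```python
def allocate(quantity, overlap_areas, feature_area):
    """Split `quantity` across cells in proportion to each cell's share of the feature area."""
    return [quantity * a / feature_area for a in overlap_areas]

def grid_totals(features):
    """features: list of (quantity, per-cell overlap areas, feature area).

    Returns the per-cell sum of the allocations from all features.
    """
    per_feature = [allocate(q, areas, fa) for q, areas, fa in features]
    return [sum(cell_values) for cell_values in zip(*per_feature)]

# Hypothetical county: area 4.0, population 1000, overlapping three cells.
# 0.5 units of its area fall outside the grid, so 125 people are not counted.
print(allocate(1000, [1.0, 2.0, 0.5], 4.0))  # [250.0, 500.0, 125.0]
print(grid_totals([(1000, [1.0, 2.0, 0.5], 4.0),
                   (200, [0.0, 1.0, 1.0], 2.0)]))  # [250.0, 600.0, 225.0]
```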

### Download Shapefiles for Tutorial

- You can try uncommenting the following `!wget` and `!unzip` commands to download and unzip the shapefile.
- If the wget download doesn't work for you, try:
    1. Go to https://censusreporter.org/.
    2. Enter B01003 in the "Explore" dialog.
    3. Enter "states" in the "Show data by" dialog.
    4. Re-enter "United States" in the "In" dialog.
    5. Hover over "Download data".
    6. Choose "Shapefile".
    7. Unzip the downloaded zip file.
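If the `unzip` command is not available, Python's standard-library `zipfile` module can do the same job. The following is a minimal sketch. The helper `extract_shp` is hypothetical. Like running `unzip` in the current directory, it extracts the archive there by default. It returns the path of the first `.shp` file in the archive, which you can then use as `shppath`.

```python
import zipfile
from pathlib import Path

def extract_shp(zip_path, dest='.'):
    """Extract all files in `zip_path` into `dest`; return the first .shp member's path."""
    with zipfile.ZipFile(zip_path) as zf:
        shp_members = [n for n in zf.namelist() if n.lower().endswith('.shp')]
        if not shp_members:
            raise FileNotFoundError(f'no .shp file inside {zip_path}')
        zf.extractall(dest)
    return str(Path(dest) / shp_members[0])

# shppath = extract_shp('acs2022_5yr_B01003_04000US21.zip')
```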

In [None]:
#!wget -O acs2022_5yr_B01003_04000US21.zip 'https://api.censusreporter.org/1.0/data/download/acs2022_5yr?table_ids=B01003&geo_ids=040|01000US&format=shp'
#!unzip acs2022_5yr_B01003_04000US21.zip

### Set Configuration

- Change `shppath` to point to your shapefile **(you must first unzip the downloaded zip file and then point to the .shp file)**.
- Set `attrkey` to the attribute column used to group shapes (the available columns are listed in your shapefile's metadata).

In [None]:
# shppath : str
#     Path to a shapefile or zip file containing a shapefile
shppath = 'acs2022_5yr_B01003_04000US21/acs2022_5yr_B01003_04000US21.shp'

# attrkey : str
#     Column to group shapes by e.g., geoid of censusreporter (04000US01, etc)
attrkey = 'geoid'

# gdnam : str
#     Name of grid definition within gdpath (e.g., 12US1, 108NHEMI2)
gdnam = '36US3'
gdpath = None # None uses built-in; or specify your own GRIDDESC path

# srckey : str
#     Name of quantitative variable (in this demo 'B01003001' is for population)
srckey = 'B01003001' 

# For more options, run help(shp2cmaq.shp2cmaq)

### Run Tool

1. Preprocessing:
    * Reads the shapefile in its native projection.
    * Filters the shapes to those within the CMAQ domain.
    * Optionally applies custom extra processing (run `help(shp2cmaq.shp2cmaq)` for options).
2. Area overlap:
    * Intersects grid cells with the shapefile polygons.
    * Aggregates the results to the grid-cell level.
    * Finds the largest area contributor in each cell.
    * Calculates the total cell overlap.
3. Output:
    * Stores the results as variables.
    * Saves them as an IOAPI-like file.

In [None]:
# Pass the configured gdpath (None uses the built-in grid definitions)
census_outpath = shp2cmaq.shp2cmaq(shppath, attrkey, gdnam, gdpath=gdpath, srckey=srckey)

### Plot Result

In [None]:
import ast

igf = csp.open_ioapi(census_outpath)
domkey = [k for k in list(igf.data_vars) if k.endswith('DOM')][0]
totkey = [k for k in list(igf.data_vars) if k.endswith('TOT')][0]

fig, axx = plt.subplots(1, 2, figsize=(12, 4))
# Left: dominant-feature index (values <= -999 masked as missing); right: TOT variable
igf[domkey].where(lambda x: x > -999).plot(ax=axx[0], cmap='nipy_spectral')
igf[totkey].plot(ax=axx[1], cmap='YlOrRd')
_ = igf.csp.cno.drawcountries(ax=axx)
# The DOM description stores a {name: index} dict; parse it safely instead of eval()
name2idx = ast.literal_eval(igf[domkey].description)
print('Dominant Index')
print(str({v: k for k, v in name2idx.items()}))