<a href="https://colab.research.google.com/github/catherinebirney/tiem-training/blob/main/Notebooks/stewi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Standardized Emission and Waste Inventories (StEWI)

StEWI is a collection of Python modules that provide processed USEPA facility-based emission and waste generation inventory data in standard tabular formats. The standard outputs can be further aggregated or filtered based on given criteria, and can be combined across inventories based on common facilities and flows.
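
As a rough illustration of what filtering, aggregating, and combining can look like, the sketch below uses plain pandas on two small made-up tables. The column names `FacilityID`, `FlowName`, `FlowAmount`, and `Unit` match those used with StEWI outputs later in this notebook; the facility IDs, the amounts, and the assumption that both tables already share facility identifiers are hypothetical. This is not StEWI's own combining logic.

```python
import pandas as pd

# Two tiny, made-up inventories in a flow-by-facility layout (illustrative values only)
inv_a = pd.DataFrame({
    'FacilityID': ['F1', 'F1', 'F2'],
    'FlowName': ['Methane', 'Carbon dioxide', 'Methane'],
    'FlowAmount': [10.0, 500.0, 4.0],
    'Unit': ['kg', 'kg', 'kg'],
})
inv_b = pd.DataFrame({
    'FacilityID': ['F1', 'F3'],
    'FlowName': ['Methane', 'Ammonia'],
    'FlowAmount': [12.0, 7.0],
    'Unit': ['kg', 'kg'],
})

# Filter: keep only the methane records
methane_a = inv_a[inv_a['FlowName'] == 'Methane']

# Aggregate: total amount of each flow across all facilities
totals_a = inv_a.groupby(['FlowName', 'Unit'], as_index=False)['FlowAmount'].sum()

# Combine: keep facility/flow pairs that appear in both inventories
common = inv_a.merge(inv_b, on=['FacilityID', 'FlowName', 'Unit'],
                     suffixes=('_a', '_b'))

print(methane_a)
print(totals_a)
print(common)
```

The inner merge keeps only records present in both tables, which makes it easy to compare how two sources report the same flow at the same facility.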

In [None]:
# Clone the repository from GitHub into the local colab environment
!git clone https://github.com/USEPA/standardizedinventories.git

In [None]:
# Install the locally cloned repository to have access to the root files
%cd standardizedinventories
!pip install -e .

In [14]:
import stewi
import pandas as pd
import numpy as np

## Accessing Data
Attempting to retrieve data for the first time, e.g., via `stewi.getInventory()` or `stewi.getInventoryFacilities()`, causes StEWI to run the appropriate modules to access the public data directly from their source. Alternatively, pass `download_if_missing=True` to obtain the StEWI-processed data that has been validated and published to the [EPA Data Commons](https://dmap-data-commons-ord.s3.amazonaws.com/index.html?prefix=#stewi/).

In [None]:
# Review the available data in StEWI
stewi.getAllInventoriesandYears()

In [None]:
# Download and review the 2017 NEI
NEI = stewi.getInventory('NEI', 2017, download_if_missing=True)
print(NEI)
print(f'Number of unique flows: {len(NEI.FlowName.unique())}')
print(f'Number of unique facilities: {len(NEI.FacilityID.unique())}')

NEI and GHGRP are also available in `FlowByProcess` format, which provides the data at the sub-facility level. For the NEI, this means emissions are reported by facility and [Source Classification Code (SCC)](https://sor-scc-api.epa.gov/sccwebservices/sccsearch/), while the GHGRP is aggregated by [GHGRP Subpart](https://www.epa.gov/ghgreporting/resources-subpart-ghg-reporting).
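
To make the idea of sub-facility data concrete, here is a minimal pandas sketch on made-up records. It assumes a `Process` column alongside `FacilityID`, `FlowName`, `FlowAmount`, and `Unit`; the process labels and amounts are hypothetical. It only shows how process-level rows relate to facility-level totals and is not a description of how StEWI builds its own outputs.

```python
import pandas as pd

# Made-up flow-by-process records: one row per facility, process, and flow
fbp = pd.DataFrame({
    'FacilityID': ['F1', 'F1', 'F1', 'F2'],
    'Process': ['C', 'C', 'W', 'C'],  # hypothetical process labels
    'FlowName': ['Methane', 'Carbon dioxide', 'Methane', 'Methane'],
    'FlowAmount': [1.0, 100.0, 5.0, 2.0],
    'Unit': ['kg', 'kg', 'kg', 'kg'],
})

# Summing over the Process column collapses the data to facility level
by_facility = fbp.groupby(['FacilityID', 'FlowName', 'Unit'],
                          as_index=False)['FlowAmount'].sum()

print(by_facility)
```

Facility `F1` reports methane from two processes, so its facility-level methane row is the sum of both process-level rows.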

In [None]:
# NEI and GHGRP are also available as FlowByProcess
GHGRP = stewi.getInventory('GHGRP', year=2021, stewiformat='flowbyprocess', download_if_missing=True)
print(GHGRP)
print(f'Available GHGRP Subparts: {GHGRP.Process.unique()}')

In [None]:
# Review the totals by Flow
GHGRP.pivot_table(index=['FlowName', 'Unit'], values='FlowAmount', columns='Process',
                  aggfunc='sum', margins=True, margins_name='Total', fill_value=0)

## Challenge Questions
1. Which state reports the most facilities for a given dataset? Does this change over time?
2. What are the top 3 GHGRP Subparts with the most Methane emissions?


## Extreme TIEM Challenge
What are the total characterized impacts (e.g., Global Warming Potential, Smog Formation Potential) of all reported facilities in a dataset?
  - Hint 1: Use [apply_flow_mapping](https://github.com/USEPA/esupy/blob/main/esupy/mapping.py#L11-L12) from `esupy.mapping` to convert the flows from a data source to the FEDEFL.
  - Hint 2: Use [apply_lcia_method](https://github.com/USEPA/LCIAformatter/blob/master/lciafmt/__init__.py#L268) from `lciafmt` to apply your desired LCIA method.
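
Conceptually, characterization multiplies each flow amount by a factor for the chosen impact indicator and sums the results. The sketch below shows that arithmetic with plain pandas on made-up emissions; it does not call `esupy` or `lciafmt`, and the table layout and the `Indicator`, `CharacterizationFactor`, and `Impact` column names are assumptions for illustration. The factors are the commonly cited IPCC AR5 100-year GWP values; replace them with the factors from the method you choose.

```python
import pandas as pd

# Made-up emissions, already in kg (illustrative values only)
flows = pd.DataFrame({
    'FacilityID': ['F1', 'F1', 'F2'],
    'FlowName': ['Carbon dioxide', 'Methane', 'Nitrous oxide'],
    'FlowAmount': [1000.0, 10.0, 1.0],
    'Unit': ['kg', 'kg', 'kg'],
})

# Hypothetical factor table in kg CO2e per kg of flow (IPCC AR5, 100-year)
factors = pd.DataFrame({
    'FlowName': ['Carbon dioxide', 'Methane', 'Nitrous oxide'],
    'Indicator': ['Global warming'] * 3,
    'CharacterizationFactor': [1.0, 28.0, 265.0],
})

# Match each flow to its factor, then multiply amount by factor
characterized = flows.merge(factors, on='FlowName', how='inner')
characterized['Impact'] = (characterized['FlowAmount']
                           * characterized['CharacterizationFactor'])

# Total impact per indicator across all facilities
total_by_indicator = characterized.groupby('Indicator')['Impact'].sum()

print(total_by_indicator)
```

Merging on flow name only works when both tables use the same flow names, which is why Hint 1 maps the inventory flows to the FEDEFL first.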