
FLOWSA Overview

FLOWSA is a Python package designed to streamline attributing environmental, economic, emission, waste, material, and other data to industry and end-use sectors.
FLOWSA pulls data from primary environmental and economic sources (generally government or other publicly available sources) that use a variety of terminology and units, and attributes those data to standardized classifications and units.
Data can be attributed to sectors that produce the data and/or sectors that consume the data.
This way, a single row in a dataset captures the generation of, the consumption of, or the direct flow of environmental/economic data between two sectors.
For example, we can capture water withdrawals consumed by wheat farming (Sector-Consumed-By) or the movement of water from public supply withdrawals (Sector-Produced-By) to domestic use (Sector-Consumed-By).
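
To make the three kinds of rows concrete, here is a minimal sketch in plain Python. It is not FLOWSA code: the row layout is a simplified assumption, and the sector labels and flow amounts are invented for illustration.

```python
# Hypothetical rows: sector labels and amounts are invented for illustration.
rows = [
    # Water withdrawals consumed by wheat farming (consumption only)
    {"Flowable": "Water", "SectorProducedBy": None,
     "SectorConsumedBy": "Wheat farming", "FlowAmount": 10.0},
    # Water moving from public supply to domestic use (flow between two sectors)
    {"Flowable": "Water", "SectorProducedBy": "Public supply",
     "SectorConsumedBy": "Domestic use", "FlowAmount": 4.0},
    # An emission generated by a sector (generation only)
    {"Flowable": "CO2", "SectorProducedBy": "Electric power generation",
     "SectorConsumedBy": None, "FlowAmount": 7.5},
]


def row_kind(row):
    """Classify a row by which of the two sector columns are populated."""
    produced = row["SectorProducedBy"] is not None
    consumed = row["SectorConsumedBy"] is not None
    if produced and consumed:
        return "flow between sectors"
    if consumed:
        return "consumption"
    if produced:
        return "generation"
    return "unattributed"


kinds = [row_kind(r) for r in rows]
print(kinds)
```

Each row needs only the sector columns that apply to it, which is why one table layout can hold all three cases.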

Clone the FLOWSA GitHub repository so we can edit method files.

In [None]:
!git clone https://github.com/USEPA/flowsa.git

Install the FLOWSA package.

In [None]:
!pip install git+https://github.com/USEPA/flowsa.git@pandas

Flow-By-Activity Data

Flow-By-Activity (FBA) datasets are environmental and other data imported from government, peer-reviewed, or proprietary sources and formatted into [standardized tables](https://github.com/USEPA/flowsa/blob/master/format%20specs/FlowByActivity.md).
These data are largely unchanged from the original data source, except for formatting.
FBA datasets retain original source terminology and units.
The defining columns for an FBA dataset are the "ActivityProducedBy" and "ActivityConsumedBy" columns.
These columns contain the "activity" that produces or consumes the environmental/economic data.
FBA tables can include optional columns, but every FBA must contain the same set of required columns.
One such optional column is "Suppressed", which flags rows that contain suppressed data so that source-specific functions can later estimate the suppressed values.
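
As a rough illustration of how the activity columns and a "Suppressed" flag might be used together, the sketch below builds a few FBA-style rows in plain Python. The rows show only a simplified subset of columns, the values are invented, and treating "Suppressed" as a boolean is an assumption made for this example, not a statement about how FLOWSA stores it.

```python
# Hypothetical FBA-style rows (simplified columns, invented values).
# Assumption: "Suppressed" is modeled here as a boolean flag.
fba_rows = [
    {"ActivityProducedBy": None, "ActivityConsumedBy": "Irrigation",
     "FlowAmount": 120.0, "Unit": "Mgal/d", "Suppressed": False},
    {"ActivityProducedBy": None, "ActivityConsumedBy": "Livestock",
     "FlowAmount": 0.0, "Unit": "Mgal/d", "Suppressed": True},
    {"ActivityProducedBy": "Public Supply", "ActivityConsumedBy": "Domestic",
     "FlowAmount": 30.0, "Unit": "Mgal/d", "Suppressed": False},
]

# Separate the rows that would need estimating from those reported directly.
suppressed = [r for r in fba_rows if r["Suppressed"]]
reported = [r for r in fba_rows if not r["Suppressed"]]

# Total of the values that were actually reported by the source.
reported_total = sum(r["FlowAmount"] for r in reported)
print(len(suppressed), reported_total)
```

Keeping the flag as its own column leaves the original values untouched while still letting a later step find the rows to estimate.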

Import FLOWSA and print out the available FBA models.

In [None]:
import flowsa
from flowsa import getFlowByActivity, seeAvailableFlowByModels
from flowsa.settings import fbaoutputpath

# List all FBA data sources and years available in flowsa
seeAvailableFlowByModels('FBA')

Generate an FBA model.

In [None]:
# Load all information for one GHGI Table
fba = getFlowByActivity(datasource="EPA_GHGI_T_5_29", year=2021)

Flow-By-Sector Datasets

Flow-By-Sector (FBS) datasets capture the direct resource generation or consumption by sectors, or the movement of data between sectors.
FBS datasets are standardized tables generated by attributing FBA and/or other FBS data to sectors.
The FBS tables contain standard columns as defined in [format specs/FlowBySector.md](https://github.com/USEPA/flowsa/blob/master/format%20specs/FlowBySector.md).
FBS datasets can be created from a single FBA, multiple FBAs, or a combination of FBA and FBS datasets.
The defining columns for an FBS are the "SectorProducedBy" and "SectorConsumedBy" columns.
These columns contain the _sector_ that produces or consumes the environmental/economic data.
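
The idea of attributing activity-level data to sectors can be sketched with a simple proportional allocation. This is only an illustration under stated assumptions: the crosswalk, its shares, and the sector labels are hypothetical, and FLOWSA's own attribution is configured in method files and is not reproduced here.

```python
# Hypothetical activity-to-sector crosswalk with allocation shares (invented).
# The shares for each activity sum to 1 so that flow totals are conserved.
crosswalk = {
    "Irrigation": [("wheat_farming", 0.25), ("corn_farming", 0.75)],
    "Domestic": [("households", 1.0)],
}

# Hypothetical FBA-style rows keyed by source activity.
fba_rows = [
    {"ActivityConsumedBy": "Irrigation", "FlowAmount": 100.0},
    {"ActivityConsumedBy": "Domestic", "FlowAmount": 30.0},
]


def attribute_to_sectors(rows, mapping):
    """Split each activity's flow across its mapped sectors by share."""
    out = []
    for row in rows:
        for sector, share in mapping[row["ActivityConsumedBy"]]:
            out.append({
                "SectorConsumedBy": sector,
                "FlowAmount": row["FlowAmount"] * share,
            })
    return out


fbs_rows = attribute_to_sectors(fba_rows, crosswalk)
for r in fbs_rows:
    print(r)
```

The output replaces the source's activity names with sector labels, which mirrors the difference between the "Activity" columns of an FBA and the "Sector" columns of an FBS.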