<a href="https://colab.research.google.com/github/catherinebirney/tiem-training/blob/main/Notebooks/flowsa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

FLOWSA Overview

FLOWSA is a Python package designed to streamline attributing environmental, economic, emission, waste, material, and other data to industry and end-use sectors.
FLOWSA pulls data from primary environmental and economic sources (generally government or other publicly available sources) that use a variety of terminology and units, and attributes those data to standardized classifications and units.
Data can be attributed to sectors that produce the data and/or sectors that consume the data.
This way, a single row in a dataset captures the generation of, the consumption of, or the direct flow of environmental/economic data between two sectors.
For example, we can capture water withdrawals consumed by wheat farming (Sector-Consumed-By) or the movement of water from public supply withdrawals (Sector-Produced-By) to domestic use (Sector-Consumed-By).
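The three kinds of rows described above can be sketched with plain Python dictionaries. This is only an illustration: the sector labels, flow names, and amounts are placeholders rather than values from a FLOWSA dataset, and `classify_row` is a hypothetical helper, not part of the FLOWSA API.

```python
# Illustrative rows only: sector labels and amounts are placeholders,
# not values taken from any FLOWSA dataset.
rows = [
    {"Flowable": "Water", "SectorProducedBy": None,
     "SectorConsumedBy": "Wheat farming", "FlowAmount": 10.0},
    {"Flowable": "Water", "SectorProducedBy": "Public supply",
     "SectorConsumedBy": "Domestic use", "FlowAmount": 4.0},
    {"Flowable": "Carbon dioxide", "SectorProducedBy": "Power generation",
     "SectorConsumedBy": None, "FlowAmount": 2.5},
]


def classify_row(row):
    """Hypothetical helper: label a row by which sector columns are filled."""
    produced = row["SectorProducedBy"]
    consumed = row["SectorConsumedBy"]
    if produced and consumed:
        return "flow between sectors"
    if consumed:
        return "consumption"
    if produced:
        return "generation"
    return "unattributed"


for row in rows:
    print(f'{row["Flowable"]}: {classify_row(row)}')
```

A row with only a consuming sector records consumption, a row with only a producing sector records generation, and a row with both records a direct flow between two sectors.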

Clone the FLOWSA GitHub repository so we can edit method files.

In [1]:
!git clone https://github.com/USEPA/flowsa.git

Cloning into 'flowsa'...
remote: Enumerating objects: 35416, done.
remote: Counting objects: 100% (1213/1213), done.
remote: Compressing objects: 100% (371/371), done.
remote: Total 35416 (delta 924), reused 858 (delta 842), pack-reused 34203 (from 3)
Receiving objects: 100% (35416/35416), 34.96 MiB | 5.32 MiB/s, done.
Resolving deltas: 100% (27351/27351), done.


Install the FLOWSA package.

In [2]:
!pip install git+https://github.com/USEPA/flowsa.git@pandas

Collecting git+https://github.com/USEPA/flowsa.git@pandas
  Cloning https://github.com/USEPA/flowsa.git (to revision pandas) to /tmp/pip-req-build-o64zsjou
  Running command git clone --filter=blob:none --quiet https://github.com/USEPA/flowsa.git /tmp/pip-req-build-o64zsjou
  Running command git checkout -b pandas --track origin/pandas
  Switched to a new branch 'pandas'
  Branch 'pandas' set up to track remote branch 'pandas' from 'origin'.
  Resolved https://github.com/USEPA/flowsa.git to commit d3684454cfc24aea495771b98da41c02a93c0475
  Preparing metadata (setup.py) ... done
Collecting fedelemflowlist@ git+https://github.com/USEPA/fedelemflowlist.git@develop#egg=fedelemflowlist (from flowsa==2.0.6)
  Cloning https://github.com/USEPA/fedelemflowlist.git (to revision develop) to /tmp/pip-install-szbrekkl/fedelemflowlist_fa11082fc2474c69ae00c0c02c9fc2f1
  Running command git clone --filter=blob:none --quiet https://github.com/USEPA/fedelemflowlist.git /tmp/pip-install-szbre

Flow-By-Activity Data

Flow-By-Activity (FBA) datasets are environmental and other data imported from government, peer-reviewed, or proprietary sources and formatted into [standardized tables](https://github.com/USEPA/flowsa/blob/master/format%20specs/FlowByActivity.md).
These data are largely unchanged from the original data source, except for formatting.
FBA datasets retain original source terminology and units.
The defining columns for an FBA dataset are the "ActivityProducedBy" and "ActivityConsumedBy" columns.
These columns contain the "activity" that produces or consumes the environmental/economic data.
FBA tables can include optional columns, but every FBA must contain a common set of required columns.
One such optional column is "Suppressed", which flags rows that contain suppressed data so that source-specific functions can later estimate the suppressed values.
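As a rough sketch of how required columns and the optional "Suppressed" flag might be checked, the example below uses plain Python records. The column set in `REQUIRED_COLUMNS` is a hypothetical subset chosen for illustration (the full list is in the format spec linked above), the boolean encoding of "Suppressed" is an assumption, and both helper functions are hypothetical rather than FLOWSA functions.

```python
# Hypothetical subset of required columns, for illustration only;
# see the FlowByActivity format spec for the actual list.
REQUIRED_COLUMNS = {"ActivityProducedBy", "ActivityConsumedBy",
                    "FlowAmount", "Unit"}

# Placeholder records; "Suppressed" is assumed here to be a boolean flag.
records = [
    {"ActivityProducedBy": None, "ActivityConsumedBy": "Irrigation",
     "FlowAmount": 12.0, "Unit": "Mgal/d", "Suppressed": False},
    {"ActivityProducedBy": None, "ActivityConsumedBy": "Mining",
     "FlowAmount": 0.0, "Unit": "Mgal/d", "Suppressed": True},
]


def missing_columns(rows):
    """Return the required columns absent from at least one row."""
    missing = set()
    for row in rows:
        missing |= REQUIRED_COLUMNS - row.keys()
    return missing


def split_suppressed(rows):
    """Separate reported rows from rows flagged as suppressed."""
    reported = [r for r in rows if not r.get("Suppressed", False)]
    suppressed = [r for r in rows if r.get("Suppressed", False)]
    return reported, suppressed


reported, suppressed = split_suppressed(records)
print("Missing columns:", missing_columns(records))
print("Rows needing estimation:", [r["ActivityConsumedBy"] for r in suppressed])
```

Rows without a "Suppressed" key are treated as reported, which matches the column being optional.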

Import FLOWSA and print out the available FBA models.

In [3]:
# FLOW-BY-ACTIVITY
import flowsa
from flowsa import getFlowByActivity, seeAvailableFlowByModels
from flowsa.settings import fbaoutputpath

# see all datasources and years available in flowsa
seeAvailableFlowByModels('FBA')

Install colorama for colored log output


  DROP_COLS = ["Unnamed: 0"] + list(pd.date_range(
  YEARS = list(pd.date_range(start="2010", end="2023", freq='Y').year.astype(str))


{'BEA_Detail_GrossOutput_IO': [2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
                               2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
                               2018, 2019, 2020, 2021, 2022],
 'BEA_Detail_Make_AfterRedef': [2002],
 'BEA_Detail_Make_BeforeRedef': [2012],
 'BEA_Detail_Supply': [2012, 2017],
 'BEA_Detail_Use_PRO_BeforeRedef': [2012],
 'BEA_Detail_Use_SUT': [2012, 2017],
 'BEA_PCE': [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022,
             2023],
 'BEA_Summary_Make_BeforeRedef': [2012, 2013, 2014, 2015, 2016, 2017, 2018,
                                  2019, 2020, 2021],
 'BEA_Summary_Supply': [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
                        2021, 2022],
 'BEA_Summary_Use_PRO_BeforeRedef': [2012, 2013, 2014, 2015, 2016, 2017, 2018,
                                     2019, 2020, 2021],
 'BEA_Summary_Use_SUT': [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
                         202

DTypePromotionError: The DType <class 'numpy.dtypes.StrDType'> could not be promoted by <class 'numpy.dtypes._PyFloatDType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes._PyFloatDType'>)

Generate an FBA model.

In [None]:
# Load all information for one GHGI Table
fba = getFlowByActivity(datasource="EPA_GHGI_T_5_29", year=2021)

Flow-By-Sector Datasets

Flow-By-Sector (FBS) datasets capture the direct resource generation or consumption by sectors, or the movement of data between sectors.
FBS datasets are standardized tables generated by attributing FBA and/or other FBS data to sectors.
The FBS tables contain standard columns as defined in [format specs/FlowBySector.md](https://github.com/USEPA/flowsa/blob/master/format%20specs/FlowBySector.md).
FBS datasets can be created from a single FBA, multiple FBAs, or a combination of FBA and FBS datasets.
The defining columns for an FBS are the "SectorProducedBy" and "SectorConsumedBy" columns.
These columns contain the _sector_ that produces or consumes the environmental/economic data.
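The relationship between the FBA activity columns and the FBS sector columns can be sketched with a simple lookup. This is a minimal illustration that assumes a one-to-one activity-to-sector mapping; the crosswalk, the sector codes (`SECTOR_A` and so on), and `to_sector_rows` are all hypothetical and do not represent FLOWSA's attribution methods.

```python
# Placeholder FBA-style rows keyed by source activity names.
fba_rows = [
    {"ActivityProducedBy": None, "ActivityConsumedBy": "Irrigation",
     "FlowAmount": 100.0},
    {"ActivityProducedBy": "Public Supply", "ActivityConsumedBy": "Domestic",
     "FlowAmount": 40.0},
]

# Hypothetical one-to-one crosswalk from activity names to sector codes.
crosswalk = {
    "Irrigation": "SECTOR_A",
    "Public Supply": "SECTOR_B",
    "Domestic": "SECTOR_C",
}


def to_sector_rows(rows, mapping):
    """Replace activity names with sector codes; unmapped or empty stay None."""
    sector_rows = []
    for row in rows:
        sector_rows.append({
            "SectorProducedBy": mapping.get(row["ActivityProducedBy"]),
            "SectorConsumedBy": mapping.get(row["ActivityConsumedBy"]),
            "FlowAmount": row["FlowAmount"],
        })
    return sector_rows


fbs_rows = to_sector_rows(fba_rows, crosswalk)
for row in fbs_rows:
    print(row)
```

The flow amounts are carried over unchanged here; only the defining columns switch from activities to sectors.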