# Exploring MRIOs with Pymrio

The first step when working with a new MRIO data set is to familiarize yourself with the data.
This notebook shows how to use the `pymrio` package to explore the data.
We use the test data set that is included in the `pymrio` package.
This is a completely artificial, very small MRIO.
It is not meant to be realistic, but it is useful for developing, testing and learning.

First we import the required packages:

In [1]:
import pymrio

We can now load the test data set with the `load_test` function. We can call
the MRIO whatever we want; here we use `mrio`.

In [2]:
mrio = pymrio.load_test()

We can get some first information about the MRIO by printing it.

In [3]:
print(mrio)

IO System with parameters: Z, Y, unit, population, meta, factor_inputs, emissions


This tells us what the MRIO data we just loaded contains.
We find a Z and Y matrix, some unit information and two satellite accounts, factor_inputs and emissions.

To get more specific data we can ask pymrio for regions, sectors, products, etc.

In [4]:
mrio.name

'testmrio'

In [5]:
mrio.get_regions()

Index(['reg1', 'reg2', 'reg3', 'reg4', 'reg5', 'reg6'], dtype='object', name='region')

In [6]:
mrio.get_sectors()

Index(['food', 'mining', 'manufactoring', 'electricity', 'construction',
       'trade', 'transport', 'other'],
      dtype='object', name='sector')

In [7]:
mrio.get_Y_categories()

Index(['Final consumption expenditure by households',
       'Final consumption expenditure by non-profit organisations serving households (NPISH)',
       'Final consumption expenditure by government',
       'Gross fixed capital formation', 'Changes in inventories',
       'Changes in valuables', 'Export'],
      dtype='object', name='category')

The same methods can be used to explore one of the satellite accounts.

In [8]:
print(mrio.emissions)

Extension Emissions with parameters: name, F, F_Y, unit


In [9]:
mrio.emissions.name

'Emissions'

In [10]:
mrio.emissions.get_regions()

Index(['reg1', 'reg2', 'reg3', 'reg4', 'reg5', 'reg6'], dtype='object', name='region')

The satellite accounts also have a special method to get the index (rows) of the accounts.

In [11]:
mrio.emissions.get_rows()

MultiIndex([('emission_type1',   'air'),
            ('emission_type2', 'water')],
           names=['stressor', 'compartment'])

# Searching through the MRIO

Several methods are available to search through the whole MRIO.
These generally accept [regular expressions](https://docs.python.org/3/howto/regex.html) as search terms.

The most general method is 'find'. It gives a quick overview of where a specific term appears in the MRIO.

In [12]:
mrio.find("air")

{'emissions_index': MultiIndex([('emission_type1', 'air')],
            names=['stressor', 'compartment'])}

In [13]:
mrio.find("trade")

{'index': MultiIndex([('reg1', 'trade'),
             ('reg2', 'trade'),
             ('reg3', 'trade'),
             ('reg4', 'trade'),
             ('reg5', 'trade'),
             ('reg6', 'trade')],
            names=['region', 'sector']),
 'sectors': Index(['trade'], dtype='object', name='sector')}
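
To make the shape of this result concrete, here is a rough plain-Python sketch of such a search. It is not pymrio's implementation: `find_sketch` and the `indexes` dictionary are hypothetical names, and only the labels are copied from the outputs above. Indexes without a hit are left out of the result, so a term that matches nothing gives an empty dictionary.

```python
import re

# Hypothetical stand-in for the indexes of an MRIO. The labels come from the
# outputs shown above; the dictionary layout itself is an assumption.
indexes = {
    "sectors": ["food", "mining", "manufactoring", "electricity",
                "construction", "trade", "transport", "other"],
    "regions": ["reg1", "reg2", "reg3", "reg4", "reg5", "reg6"],
    "emissions_index": [("emission_type1", "air"), ("emission_type2", "water")],
}


def find_sketch(term, indexes):
    """Return {index name: matching labels}, skipping indexes without a hit."""
    pattern = re.compile(term)
    hits = {}
    for name, labels in indexes.items():
        found = []
        for label in labels:
            # A label is either a plain string or a tuple of level values.
            parts = label if isinstance(label, tuple) else (label,)
            if any(pattern.search(part) for part in parts):
                found.append(label)
        if found:
            hits[name] = found
    return hits


air_hits = find_sketch("air", indexes)
trade_hits = find_sketch("trade", indexes)
no_hits = find_sketch("Trade", indexes)  # capital T: no label matches

print(air_hits)
print(trade_hits)
print(no_hits)
```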

Note that 'find' (and all other search methods) are case sensitive.
To make a case insensitive search, add the regular expression flag `(?i)` to the search term.

In [14]:
mrio.find("value")

{}

In [15]:
mrio.find("(?i)value")

{'factor_inputs_index': Index(['Value Added'], dtype='object', name='inputtype')}
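
The `(?i)` prefix is ordinary Python regular expression syntax and not specific to pymrio. As a small illustration, the same behaviour can be reproduced with the standard `re` module alone; the variable names below are only for this example.

```python
import re

label = "Value Added"

# Case sensitive by default: lowercase "value" does not match "Value".
case_sensitive = re.search("value", label)

# The inline flag (?i) at the start of the pattern makes the match
# case insensitive.
case_insensitive = re.search("(?i)value", label)

# Equivalent spelling with an explicit flag argument.
with_flag = re.search("value", label, flags=re.IGNORECASE)

print(case_sensitive)            # None
print(case_insensitive.group())  # Value
print(with_flag.group())         # Value
```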

## Specific search methods: contains, match, fullmatch

The MRIO class also contains a set of specific regular expression search methods, mirroring the 'contains', 'match' and 'fullmatch'
methods of the pandas string accessor (`Series.str`). See the pandas documentation for details. In short:

  - 'contains' looks for a match anywhere in the string
  - 'match' looks for a match at the beginning of the string
  - 'fullmatch' looks for a match of the whole string
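
The three modes correspond to `re.search`, `re.match` and `re.fullmatch` in the Python standard library. The following sketch applies them to the sector names of the test MRIO as an analogy; `search_labels` is a hypothetical helper written for this example and not part of pymrio.

```python
import re

sectors = ["food", "mining", "manufactoring", "electricity",
           "construction", "trade", "transport", "other"]


def search_labels(regex_func, pattern, labels):
    """Return the labels for which regex_func(pattern, label) finds a match."""
    return [label for label in labels if regex_func(pattern, label)]


# 'contains': match anywhere in the string
contains_ad = search_labels(re.search, "ad", sectors)

# 'match': match at the beginning of the string
match_ad = search_labels(re.match, "ad", sectors)
match_trad = search_labels(re.match, "trad", sectors)

# 'fullmatch': the whole string must match
fullmatch_trad = search_labels(re.fullmatch, "trad", sectors)
fullmatch_trade = search_labels(re.fullmatch, "trade", sectors)

print(contains_ad)      # ['trade']
print(match_ad)         # []
print(match_trad)       # ['trade']
print(fullmatch_trad)   # []
print(fullmatch_trade)  # ['trade']
```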

These methods are available for all index columns of the MRIO and have a similar signature:

  1. As for 'find', the search term is case sensitive. To make it case insensitive, add the regular expression flag `(?i)` to the search term.
  2. The search term can be passed to the keyword argument 'find_all' or as the first positional argument to search in all index levels.
  3. Alternatively, the search term can be passed to the keyword argument with the level name to search only in that index level.

The following examples show how to use these methods.

In [16]:
mrio.contains(find_all="ad")
mrio.contains("ad")  # find_all is the default argument

MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

In [17]:
mrio.match("ad")

MultiIndex([], names=['region', 'sector'])

In [18]:
mrio.match("trad")

MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

In [19]:
mrio.fullmatch("trad")

MultiIndex([], names=['region', 'sector'])

In [20]:
mrio.fullmatch("trade")

MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

In [21]:
mrio.fullmatch("(?i).*AD.*")

MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

For the rest of the notebook, we will do the examples with the 'contains' method, but the same applies to the other methods.

To search only at one specific level, pass the search term to the keyword argument with the level name.

In [22]:
mrio.contains(region="trade")

MultiIndex([], names=['region', 'sector'])

In [23]:
mrio.contains(sector="trade")

MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

And of course, the methods are also available for the satellite accounts.

In [24]:
mrio.emissions.contains(compartment="air")

MultiIndex([('emission_type1', 'air')],
           names=['stressor', 'compartment'])

A keyword argument that names a non-existing index level is silently ignored.

In [25]:
mrio.factor_inputs.contains(compartment="trade")

Index([], dtype='object')

This makes it possible to search for terms that occur only in some index levels.
Logically, this is an 'or' search.

In [26]:
mrio.factor_inputs.contains(compartment="air", inputtype="Value")

Index(['Value Added'], dtype='object', name='inputtype')

But note that if both levels exist, both must match (so it becomes a logical 'and').

In [27]:
mrio.emissions.contains(stressor="emission", compartment="air")

MultiIndex([('emission_type1', 'air')],
           names=['stressor', 'compartment'])
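
The behaviour shown in the last three cells can be summarised in a short plain-Python sketch. This is an assumption about the observable logic, not pymrio's actual code: `contains_by_level` is a hypothetical helper, and the rows are copied from the outputs above.

```python
import re


def contains_by_level(level_names, rows, **patterns):
    """Keep rows where every pattern for an existing level matches.

    Patterns that name a level missing from level_names are skipped.
    If no pattern refers to an existing level, nothing is returned.
    """
    # Map the position of each existing level to its compiled pattern.
    active = {
        level_names.index(name): re.compile(pattern)
        for name, pattern in patterns.items()
        if name in level_names
    }
    if not active:
        return []
    return [
        row for row in rows
        if all(regex.search(row[pos]) for pos, regex in active.items())
    ]


emission_rows = [("emission_type1", "air"), ("emission_type2", "water")]
factor_input_rows = [("Value Added",)]

# Both levels exist, so both must match ('and').
both_levels = contains_by_level(
    ["stressor", "compartment"], emission_rows,
    stressor="emission", compartment="air",
)

# 'compartment' does not exist here and is skipped; 'inputtype' decides.
one_level = contains_by_level(
    ["inputtype"], factor_input_rows,
    compartment="air", inputtype="Value",
)

# Only a non-existing level is given: empty result.
no_level = contains_by_level(["inputtype"], factor_input_rows, compartment="trade")

print(both_levels)
print(one_level)
print(no_level)
```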

## Search through all extensions

All three search methods are also available to loop through all extensions of the MRIO.

In [28]:
mrio.extension_contains(stressor="emission", compartment="air")

{'Factor Inputs': Index([], dtype='object'),
 'Emissions': MultiIndex([('emission_type1', 'air')],
            names=['stressor', 'compartment'])}

If only a subset of extensions should be searched, pass the extension names to the keyword argument 'extensions'.

## Generic search method for any dataframe index

Internally, the class methods 'contains', 'match' and 'fullmatch' all call the
'index_contains', 'index_match' and 'index_fullmatch' functions of the ioutil module.
These functions can be used to search through the index of any pandas DataFrame.

In [29]:
df = mrio.Y

Depending on whether a dataframe or an index is passed, the return value is either the matching part of the dataframe or the matching part of the index.

In [30]:
pymrio.index_contains(df, "trade")

|  | region | reg1 | reg1 | reg1 | reg1 | reg1 | reg1 | reg1 | reg2 | reg2 | reg2 | ... | reg5 | reg5 | reg5 | reg6 | reg6 | reg6 | reg6 | reg6 | reg6 | reg6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | category | Final consumption expenditure by households | Final consumption expenditure by non-profit organisations serving households (NPISH) | Final consumption expenditure by government | Gross fixed capital formation | Changes in inventories | Changes in valuables | Export | Final consumption expenditure by households | Final consumption expenditure by non-profit organisations serving households (NPISH) | Final consumption expenditure by government | ... | Changes in inventories | Changes in valuables | Export | Final consumption expenditure by households | Final consumption expenditure by non-profit organisations serving households (NPISH) | Final consumption expenditure by government | Gross fixed capital formation | Changes in inventories | Changes in valuables | Export |
| region | sector |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| reg1 | trade | 769535.93 | 16.63892 | 24088070.0 | 67273450.0 | 1230.2182 | 216.21108 | 0 | 8063.5238 | 12.738233 | 163.20538 | ... | 204.38317 | 1.684295 | 0 | 49782414.0 | 0.224933 | 14.44566 | 16739029.0 | 12.145465 | 0.013888 | 0 |
| reg2 | trade | 5678.2674 | 0.075424 | 231.2962 | 633.9521 | 35.607157 | 3.192694 | 0 | 385664.01 | 178.50114 | 8160.8812 | ... | 358.24932 | 22.71962 | 0 | 26592464.0 | 0.139745 | 10.623962 | 11775351.0 | 20.572534 | 0.005433 | 0 |
| reg3 | trade | 2753.8608 | 0.11154 | 1.956911 | 359.8675 | 23.39112 | 0.000455 | 0 | 2072.2489 | 0.044811 | 1.242613 | ... | 309.98427 | 6.283278 | 0 | 114505.89 | 0.630098 | 37.095549 | 31317361.0 | 212.70751 | 0.014929 | 0 |
| reg4 | trade | 373.28393 | 0.009382 | 0.3585011 | 0.02514957 | 0.002016 | 0.000144 | 0 | 192.21539 | 0.019666 | 0.537107 | ... | 73.859706 | 0.07199126 | 0 | 40152651.0 | 0.255523 | 17.253634 | 14011134.0 | 4.052444 | 0.001935 | 0 |
| reg5 | trade | 4287.4067 | 0.038941 | 7.014679 | 195.5479 | 6.675656 | 0.524015 | 0 | 3633.6875 | 2.536312 | 50.624916 | ... | 9177.0818 | 1330591.0 | 0 | 60992225.0 | 0.823823 | 34.208026 | 27870911.0 | 85.191511 | 0.008929 | 0 |
| reg6 | trade | 4772.7575 | 0.113112 | 23.21101 | 241.7571 | 16.267049 | 1.488818 | 0 | 2031.4964 | 1.864492 | 18.787893 | ... | 91.040319 | 2.122217 | 0 | 851864.04 | 23.371306 | 1966.0309 | 131182.13 | 1549.4104 | 0.266033 | 0 |


In [31]:
pymrio.index_contains(df.index, "trade")

MultiIndex([('reg1', 'trade'),
            ('reg2', 'trade'),
            ('reg3', 'trade'),
            ('reg4', 'trade'),
            ('reg5', 'trade'),
            ('reg6', 'trade')],
           names=['region', 'sector'])

In [32]:
pymrio.index_fullmatch(df, region="reg[2,4]", sector="m.*")

|  | region | reg1 | reg1 | reg1 | reg1 | reg1 | reg1 | reg1 | reg2 | reg2 | reg2 | ... | reg5 | reg5 | reg5 | reg6 | reg6 | reg6 | reg6 | reg6 | reg6 | reg6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | category | Final consumption expenditure by households | Final consumption expenditure by non-profit organisations serving households (NPISH) | Final consumption expenditure by government | Gross fixed capital formation | Changes in inventories | Changes in valuables | Export | Final consumption expenditure by households | Final consumption expenditure by non-profit organisations serving households (NPISH) | Final consumption expenditure by government | ... | Changes in inventories | Changes in valuables | Export | Final consumption expenditure by households | Final consumption expenditure by non-profit organisations serving households (NPISH) | Final consumption expenditure by government | Gross fixed capital formation | Changes in inventories | Changes in valuables | Export |
| region | sector |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| reg2 | mining | 165.3997 | 1.817989e-05 | 0.334824 | 32.83238 | 29.10648 | 3.970468e-05 | 0 | 1091.126 | 2.751312 | 15.44777 | ... | 0.040217 | 0.00042 | 0 | 0.6127299 | 0.008212 | 0.013061 | 11.6717 | 166.53958 | 2e-06 | 0 |
| reg2 | manufactoring | 99284590.0 | 4.187143 | 1373.3702 | 42378780.0 | 4415.752 | 163.7658 | 0 | 321031.6 | 125.72911 | 16038330.0 | ... | 951.80921 | 21.64128 | 0 | 10741920.0 | 62.832488 | 6363.1928 | 11704970.0 | 1060.3215 | 0.145903 | 0 |
| reg4 | mining | 107.2728 | 9.421644e-09 | 0.209851 | 1.055704 | 26.97312 | 3.112643e-09 | 0 | 294.0734 | 1e-06 | 0.1061796 | ... | 45.388457 | 1.5e-05 | 0 | 77058.0 | 0.348367 | 2.326858 | 0.09917552 | 68.266815 | 0.000789 | 0 |
| reg4 | manufactoring | 40863520.0 | 1.170611 | 377.32281 | 30326550.0 | 2263532.0 | 46.96135 | 0 | 15104450.0 | 7.59844 | 128.5316 | ... | 1004.0742 | 1.4762 | 0 | 21906040.0 | 94.28291 | 6712.9728 | 31674960.0 | 825.25694 | 0.03651 | 0 |
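
The two patterns in the call above are plain regular expressions, so their effect can be checked with the standard `re` module. The sketch below uses only the region and sector names of the test MRIO. Note that inside a character class the comma is a literal character, so `reg[2,4]` would also accept the string `reg,`; for these labels the shorter `reg[24]` selects the same regions.

```python
import re

regions = ["reg1", "reg2", "reg3", "reg4", "reg5", "reg6"]
sectors = ["food", "mining", "manufactoring", "electricity",
           "construction", "trade", "transport", "other"]

# Character class: "reg" followed by exactly one of '2', ',' or '4'.
matched_regions = [r for r in regions if re.fullmatch("reg[2,4]", r)]

# "m" followed by any number of arbitrary characters.
matched_sectors = [s for s in sectors if re.fullmatch("m.*", s)]

# The comma is part of the character class, not a separator.
comma_matches = re.fullmatch("reg[2,4]", "reg,") is not None

# Without the comma the same regions are selected.
tidy_regions = [r for r in regions if re.fullmatch("reg[24]", r)]

print(matched_regions)  # ['reg2', 'reg4']
print(matched_sectors)  # ['mining', 'manufactoring']
print(comma_matches)    # True
print(tidy_regions)     # ['reg2', 'reg4']
```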


All search methods can easily be combined with the extract methods to extract the data that was found.
For more information on this, see the [extract_data](./extract_data.ipynb) notebook.