# Using the aggregation functionality of pymrio

Pymrio offers various possibilities to aggregate an existing MRIO system.
The following sections present each of them in turn, using the test MRIO system included in pymrio.
The same concepts apply to real-life MRIOs.

Some of the examples rely on the [country converter coco](https://github.com/konstantinstadler/country_converter). The minimum version required is coco >= 0.6.3; install the latest version with:
```
pip install country_converter --upgrade
```
Coco can also be installed from the Anaconda Cloud; see the coco README for further information.

## Loading the test MRIO

First, we load and explore the test MRIO included in pymrio:

```python
import numpy as np
import pymrio
```

```python
io = pymrio.load_test()
io.calc_all()
```

```python
print("Sectors: {sec},\nRegions: {reg}".format(sec=io.get_sectors().tolist(), reg=io.get_regions().tolist()))
```

```
Sectors: ['food', 'mining', 'manufactoring', 'electricity', 'construction', 'trade', 'transport', 'other'],
Regions: ['reg1', 'reg2', 'reg3', 'reg4', 'reg5', 'reg6']
```


## Aggregation using a numerical concordance matrix

This is the standard way to aggregate MRIOs when you work in Matlab.
To do so, we need to set up a concordance matrix in which the columns correspond to the original classification and the rows to the aggregated one.
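As a rough illustration of what such a concordance matrix does (a sketch with random placeholder data, not pymrio's internal implementation), the region and sector concordances used in the following cell can be combined with a Kronecker product into one concordance for the full region-sector index. This assumes that regions form the outer and sectors the inner index level. A flow matrix is then aggregated on both axes as `C @ Z @ C.T` and a vector as `C @ x`:

```python
import numpy as np

# Concordances as in the following cell: rows = aggregated, columns = original classification
sec_agg_matrix = np.array([
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
])
reg_agg_matrix = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
])

# Assumption: regions are the outer and sectors the inner index level,
# so the full concordance is the Kronecker product, shape (2*3, 6*8) = (6, 48)
full_conc = np.kron(reg_agg_matrix, sec_agg_matrix)

# Placeholder data (random), NOT the test MRIO: flows Z and total output x
rng = np.random.default_rng(42)
Z = rng.random((48, 48))
x = Z.sum(axis=1) + rng.random(48)

Z_agg = full_conc @ Z @ full_conc.T   # sums the flows within each aggregated block
x_agg = full_conc @ x                 # sums the output within each aggregate
```

Because every column of the concordance contains exactly one 1, each original entry ends up in exactly one aggregate and totals are preserved (`Z_agg.sum()` equals `Z.sum()`).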

```python
sec_agg_matrix = np.array([
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1]
    ])

reg_agg_matrix = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1]
    ])
```

```python
io.aggregate(region_agg=reg_agg_matrix, sector_agg=sec_agg_matrix)
```

```python
print("Sectors: {sec},\nRegions: {reg}".format(sec=io.get_sectors().tolist(), reg=io.get_regions().tolist()))
```

```
Sectors: ['sec0', 'sec1', 'sec2'],
Regions: ['reg0', 'reg1']
```


```python
io.calc_all()
```

```python
io.emissions.D_cba
```

| stressor | compartment | reg0, sec0 | reg0, sec1 | reg0, sec2 | reg1, sec0 | reg1, sec1 | reg1, sec2 |
|---|---|---|---|---|---|---|---|
| emission_type1 | air | 9041149.0 | 301879100.0 | 152323600.0 | 24694650.0 | 346874200.0 | 245411700.0 |
| emission_type2 | water | 2123543.0 | 48845090.0 | 98897570.0 | 6000239.0 | 45945300.0 | 189273100.0 |


To use custom names for the aggregated sectors or regions, pass a list of names in the order of the rows of the concordance matrix:

```python
io = pymrio.load_test().calc_all().aggregate(region_agg=reg_agg_matrix,
                                             region_names=['World Region A', 'World Region B'],
                                             inplace=False)
```

```python
io.get_regions()
```

```
Index(['World Region A', 'World Region B'], dtype='object', name='region')
```

## Aggregation using a numerical or string vector

Pymrio also accepts the aggregation information as a numerical or string vector.
Each entry in the vector assigns the corresponding sector or region to an aggregation group.
Thus the two aggregation matrices from above (*sec_agg_matrix* and *reg_agg_matrix*) can also be represented as numerical or string vectors/lists:

```python
sec_agg_vec = np.array([0, 1, 1, 1, 1, 2, 2, 2])
reg_agg_vec = ['R1', 'R1', 'R1', 'R2', 'R2', 'R2']
```
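To see why the two representations carry the same information, the following sketch converts an aggregation vector back into a concordance matrix. `vec_to_concordance` is a hypothetical helper written for this illustration, not part of pymrio, and ordering the groups by first appearance is an assumption that may differ from pymrio's own ordering:

```python
import numpy as np

def vec_to_concordance(agg_vec):
    """Turn an aggregation vector into a (groups x original) 0/1 concordance matrix.

    Hypothetical helper for illustration; groups are ordered by first appearance.
    """
    entries = list(agg_vec)
    groups = list(dict.fromkeys(entries))  # unique groups in order of first appearance
    conc = np.array([[1 if entry == group else 0 for entry in entries]
                     for group in groups])
    return groups, conc

sec_groups, sec_conc = vec_to_concordance([0, 1, 1, 1, 1, 2, 2, 2])
reg_groups, reg_conc = vec_to_concordance(['R1', 'R1', 'R1', 'R2', 'R2', 'R2'])
# sec_conc equals sec_agg_matrix and reg_conc equals reg_agg_matrix from above
```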

These vectors can be passed to *aggregate* in the same way as the concordance matrices:

```python
io_vec_agg = pymrio.load_test().calc_all().aggregate(region_agg=reg_agg_vec,
                                                     sector_agg=sec_agg_vec,
                                                     inplace=False)
```

```python
print("Sectors: {sec},\nRegions: {reg}".format(sec=io_vec_agg.get_sectors().tolist(),
                                               reg=io_vec_agg.get_regions().tolist()))
```

```
Sectors: ['sec0', 'sec1', 'sec2'],
Regions: ['R1', 'R2']
```


```python
io_vec_agg.emissions.D_cba_reg
```

| stressor | compartment | R1 | R2 |
|---|---|---|---|
| emission_type1 | air | 669019200.0 | 1686954000.0 |
| emission_type2 | water | 533768200.0 | 590208100.0 |


## Regional aggregation using the country converter coco

The previous examples are best suited if you want to reuse existing aggregation information.
For new or ad hoc aggregations, the most user-friendly solution is to build the concordance with the [country converter coco](https://github.com/konstantinstadler/country_converter) (coco >= 0.6.3, see above). You can either use coco to build independent aggregations (first case below) or use the predefined classifications included in coco (second case, the WIOD example below).

```python
import country_converter as coco
```

### Independent aggregation 

```python
io = pymrio.load_test().calc_all()
```

```python
reg_agg_coco = coco.agg_conc(original_countries=io.get_regions(),
                             aggregates={'reg1': 'World Region A',
                                         'reg2': 'World Region A',
                                         'reg3': 'World Region A'},
                             missing_countries='World Region B')
```

The concordance returned by *coco* can be passed directly to pymrio:

```python
io.aggregate(region_agg=reg_agg_coco)
```

```python
print("Sectors: {sec},\nRegions: {reg}".format(sec=io.get_sectors().tolist(),
                                               reg=io.get_regions().tolist()))
```

```
Sectors: ['food', 'mining', 'manufactoring', 'electricity', 'construction', 'trade', 'transport', 'other'],
Regions: ['World Region A', 'World Region B']
```


The regional footprint accounts of the aggregated system:

```python
io.emissions.D_cba_reg
```

| stressor | compartment | World Region A | World Region B |
|---|---|---|---|
| emission_type1 | air | 669019200.0 | 1686954000.0 |
| emission_type2 | water | 533768200.0 | 590208100.0 |


A pandas DataFrame corresponding to the output from *coco* can also be passed to *sector_agg* for aggregation.
A sector aggregation package similar to the country converter is planned.
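Until such a package exists, one simple way to build an ad hoc sector aggregation is a plain Python mapping that is turned into a string aggregation vector. The group names and the mapping below are hypothetical examples; only the sector names are taken from the test MRIO:

```python
# Sector names of the test MRIO (as printed at the top of this notebook)
sectors = ['food', 'mining', 'manufactoring', 'electricity',
           'construction', 'trade', 'transport', 'other']

# Hypothetical grouping: explicitly listed sectors go to the named group,
# all remaining sectors to a default group (similar in spirit to missing_countries)
sector_groups = {
    'food': 'Agrifood',
    'mining': 'Industry',
    'manufactoring': 'Industry',
    'electricity': 'Industry',
    'construction': 'Industry',
}
default_group = 'Services'

sec_agg_names = [sector_groups.get(sec, default_group) for sec in sectors]
# ['Agrifood', 'Industry', 'Industry', 'Industry', 'Industry', 'Services', 'Services', 'Services']
```

The result is a string aggregation vector as described above, so it could be passed on as `io.aggregate(sector_agg=sec_agg_names)`.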

### Using the built-in classifications (WIOD example)

The country converter is most useful when you work with an MRIO whose country classification is included in coco. In that case, you can simply pass the desired country aggregation to coco, which returns the required aggregation matrix.

For this example, we assume that a raw WIOD download is available at:

```python
wiod_raw = '/tmp/mrios/WIOD2013'
```

We will parse the year 2000 and calculate the results:

```python
wiod_orig = pymrio.parse_wiod(path=wiod_raw, year=2000).calc_all()
```

We then aggregate the database, first grouping the EU countries and then the remaining countries by OECD membership. In the example below, Germany (DEU) is singled out and kept as a separate region:

```python
wiod_agg_DEU_EU_OECD = wiod_orig.aggregate(
    region_agg=coco.agg_conc(original_countries='WIOD',
                             aggregates=[{'DEU': 'DEU'}, 'EU', 'OECD'],
                             missing_countries='Other',
                             merge_multiple_string=None),
    inplace=False)
```

We can then rename the regions to make the membership clearer:

```python
wiod_agg_DEU_EU_OECD.rename_regions({'OECD': 'OECDwoEU',
                                     'EU': 'EUwoGermany'})
```
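The new names reflect how the list passed as *aggregates* is evaluated: judging from the renaming, earlier entries take precedence, so Germany stays separate, *EU* holds only the remaining EU members and *OECD* only the OECD members outside the EU. The pure-Python sketch below mimics this first-match rule with a small, hypothetical and deliberately incomplete membership table; it is not coco's implementation:

```python
# Hypothetical, deliberately incomplete membership lists for illustration only
groups = [
    ('DEU', {'DEU'}),
    ('EU', {'DEU', 'FRA', 'ITA'}),
    ('OECD', {'DEU', 'FRA', 'ITA', 'USA', 'JPN'}),
]
missing = 'Other'

def assign(country):
    """Return the first group (in list order) that contains the country."""
    for name, members in groups:
        if country in members:
            return name
    return missing

countries = ['DEU', 'FRA', 'ITA', 'USA', 'JPN', 'CHN', 'BRA']
assignment = {c: assign(c) for c in countries}
# {'DEU': 'DEU', 'FRA': 'EU', 'ITA': 'EU', 'USA': 'OECD', 'JPN': 'OECD', 'CHN': 'Other', 'BRA': 'Other'}
```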

To see the result for the air emission footprints:

```python
wiod_agg_DEU_EU_OECD.AIR.D_cba_reg
```

| stressor | OECDwoEU | EUwoGermany | Other | DEU |
|---|---|---|---|---|
| CO2 | 9576199.0 | 3840406.0 | 9232742.0 | 1123772.0 |
| CH4 | 60664540.0 | 31347220.0 | 148761500.0 | 7953304.0 |
| N2O | 2103103.0 | 1264400.0 | 6166586.0 | 294148.6 |
| NOX | 37305270.0 | 13858590.0 | 51031330.0 | 3164278.0 |
| SOX | 33620540.0 | 12395620.0 | 51378820.0 | 2045926.0 |
| CO | 191601600.0 | 59512960.0 | 442499200.0 | 12968160.0 |
| NMVOC | 34237130.0 | 15367530.0 | 81869180.0 | 2870176.0 |
| NH3 | 5453330.0 | 3867825.0 | 16748070.0 | 865681.8 |


For further examples of the capabilities of the country converter, see the [coco tutorial notebook](http://nbviewer.jupyter.org/github/konstantinstadler/country_converter/blob/master/doc/country_converter_aggregation_helper.ipynb).

## Aggregation to one total sector / region

Both *region_agg* and *sector_agg* also accept a string as argument. This aggregates the full IO system to one total region or sector, respectively, named after the given string.

```python
pymrio.load_test().calc_all().aggregate(region_agg='global', sector_agg='total').emissions.D_cba
```

| stressor | compartment | global, total |
|---|---|---|
| emission_type1 | air | 1080224000.0 |
| emission_type2 | water | 391084800.0 |
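Conceptually, a single string corresponds to a concordance matrix with one row of ones, which sums over all original regions or sectors. A minimal numpy sketch with placeholder values (an illustration, not pymrio's code):

```python
import numpy as np

n_regions, n_sectors = 6, 8                          # dimensions of the test MRIO
x = np.arange(n_regions * n_sectors, dtype=float)    # placeholder values per region-sector

# One aggregate each ("global" / "total"): a single row of ones per classification
reg_total = np.ones((1, n_regions))
sec_total = np.ones((1, n_sectors))
full_conc = np.kron(reg_total, sec_total)            # shape (1, 48)

x_total = full_conc @ x                              # -> [1128.], i.e. x.sum()
```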


## Pre- vs post-aggregation account calculations

It is generally recommended to calculate MRIO accounts at the highest level of detail possible and to aggregate the results afterwards (post-aggregation; see for example [Steen-Olsen et al 2014](http://dx.doi.org/10.1080/09535314.2014.934325), [Stadler et al 2014](https://zenodo.org/record/1137670#.WlOSOhZG1O8) or [Koning et al 2015](https://doi.org/10.1016/j.ecolecon.2015.05.008)).

Pre-aggregation, i.e. aggregating MRIO sectors and regions before calculating the footprint accounts, might be necessary when dealing with MRIOs on computers with limited RAM. However, be aware that the results can differ from those obtained by post-aggregation.
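Why the results change can be seen in a deliberately tiny, single-region example with invented numbers. This is a sketch of the standard Leontief footprint calculation (stressor multipliers `S @ L` times final demand `y`), not pymrio's code. Sector 1 buys inputs only from sector 2, while sectors 2 and 3 have different emission intensities and are merged:

```python
import numpy as np

def footprints(Z, y, F):
    """Footprint per product: stressor multipliers (S @ L) times final demand y."""
    x = Z.sum(axis=1) + y                   # total output (single region)
    A = Z / x                               # technical coefficients: column j divided by x[j]
    L = np.linalg.inv(np.eye(len(x)) - A)   # Leontief inverse
    S = F / x                               # direct stressor intensities
    return (S @ L) * y

# Invented toy data: rows = selling sector, columns = buying sector
Z = np.array([[0., 0., 0.],
              [5., 0., 0.],                 # sector 2 delivers 5 to sector 1
              [0., 0., 0.]])
y = np.array([10., 10., 10.])               # final demand
F = np.array([0., 15., 30.])                # direct emissions -> intensities 0, 1, 3

C = np.array([[1, 0, 0],
              [0, 1, 1]])                   # merge sectors 2 and 3

post = C @ footprints(Z, y, F)                   # calculate, then aggregate
pre = footprints(C @ Z @ C.T, C @ y, C @ F)      # aggregate, then calculate
# post ≈ [5, 40], pre ≈ [9, 36]; both sum to the total emissions of 45
```

Total emissions are identical, but after pre-aggregation the supply chain of sector 1 is charged the average intensity of the merged sector (1.8) instead of the intensity of sector 2 (1.0), which shifts emissions between products.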

Pymrio can handle both cases and can be used to highlight the differences. To do so, we use the two concordance matrices defined at the beginning (*sec_agg_matrix* and *reg_agg_matrix*) and aggregate the test system before and after calculating the accounts:

```python
io_pre = pymrio.load_test().aggregate(region_agg=reg_agg_matrix, sector_agg=sec_agg_matrix).calc_all()
io_post = pymrio.load_test().calc_all().aggregate(region_agg=reg_agg_matrix, sector_agg=sec_agg_matrix)
```

```python
io_pre.emissions.D_cba
```

| stressor | compartment | reg0, sec0 | reg0, sec1 | reg0, sec2 | reg1, sec0 | reg1, sec1 | reg1, sec2 |
|---|---|---|---|---|---|---|---|
| emission_type1 | air | 7722782.0 | 349441300.0 | 138876400.0 | 26953960.0 | 335459800.0 | 221770300.0 |
| emission_type2 | water | 1862161.0 | 52409500.0 | 158346500.0 | 6399685.0 | 40805090.0 | 131261900.0 |


```python
io_post.emissions.D_cba
```

| stressor | compartment | reg0, sec0 | reg0, sec1 | reg0, sec2 | reg1, sec0 | reg1, sec1 | reg1, sec2 |
|---|---|---|---|---|---|---|---|
| emission_type1 | air | 9041149.0 | 301879100.0 | 152323600.0 | 24694650.0 | 346874200.0 | 245411700.0 |
| emission_type2 | water | 2123543.0 | 48845090.0 | 98897570.0 | 6000239.0 | 45945300.0 | 189273100.0 |


If we reset the accounts of *io_post* and recalculate them from the aggregated system, we obtain the same results as for *io_pre*:

```python
io_post.reset_all_full().calc_all().emissions.D_cba
```

| stressor | compartment | reg0, sec0 | reg0, sec1 | reg0, sec2 | reg1, sec0 | reg1, sec1 | reg1, sec2 |
|---|---|---|---|---|---|---|---|
| emission_type1 | air | 7722782.0 | 349441300.0 | 138876400.0 | 26953960.0 | 335459800.0 | 221770300.0 |
| emission_type2 | water | 1862161.0 | 52409500.0 | 158346500.0 | 6399685.0 | 40805090.0 | 131261900.0 |
