# Extract data from Pymrio

This notebook shows how to extract specific data from the pymrio object for further processing in Python. For exporting/saving the data to another file format see [the notebook on saving/loading/exporting data.](./load_save_export.ipynb)

In [39]:
import pymrio

In [40]:
mrio = pymrio.load_test().calc_all()

## Basic pandas indexing of pymrio tables

Since pymrio is built on top of pandas, we can use standard pandas indexing to extract data from the pymrio object. For example, to access the block of the A matrix that covers region 2 (`reg2`) in both rows and columns, we can use:

In [41]:
A_reg2 = mrio.A.loc["reg2", "reg2"]
A_reg2

| sector | food | mining | manufactoring | electricity | construction | trade | transport | other |
|---|---|---|---|---|---|---|---|---|
| food | 0.000486 | 0.000638 | 0.000194 | 5e-06 | 1.9e-05 | 9.2e-05 | 2.7e-05 | 2.6e-05 |
| mining | 6e-06 | 0.050904 | 4.7e-05 | 0.000218 | 0.000203 | 1.1e-05 | 1e-05 | 1.3e-05 |
| manufactoring | 0.000488 | 0.069862 | 0.001529 | 0.000196 | 0.005915 | 0.001191 | 0.002294 | 0.000844 |
| electricity | 8.9e-05 | 0.050427 | 0.000137 | 0.000604 | 0.000146 | 0.000177 | 0.00028 | 0.000248 |
| construction | 2.5e-05 | 0.007375 | 3.2e-05 | 0.000109 | 0.004615 | 8.8e-05 | 0.000515 | 0.000422 |
| trade | 0.000251 | 0.02877 | 0.000531 | 9.5e-05 | 0.00164 | 0.000772 | 0.001372 | 0.000487 |
| transport | 7.1e-05 | 0.031839 | 0.000212 | 6.9e-05 | 0.000714 | 0.000579 | 0.004747 | 0.000494 |
| other | 0.000171 | 0.064935 | 0.000595 | 0.000291 | 0.002844 | 0.001897 | 0.0038 | 0.003936 |
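
The same block selection can be tried on a small stand-in table without loading pymrio. The sketch below builds a toy coefficient matrix with a region/sector MultiIndex on both axes; the name `toy_A`, the labels, and the random values are illustrative assumptions, not pymrio data. It shows that selecting on the first level of both axes returns the block with the region level dropped.

```python
import numpy as np
import pandas as pd

# Toy stand-in for mrio.A: region/sector MultiIndex on rows and columns.
labels = pd.MultiIndex.from_product(
    [["reg1", "reg2"], ["food", "mining"]], names=["region", "sector"]
)
rng = np.random.default_rng(0)
toy_A = pd.DataFrame(rng.random((4, 4)), index=labels, columns=labels)

# Selecting on the first level of both axes drops that level from the result.
toy_A_reg2 = toy_A.loc["reg2", "reg2"]
print(toy_A_reg2.index.names, toy_A_reg2.shape)
```

The result is a plain sector-by-sector table, which is why the output above shows only sector labels.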


Most tables are indexed via a MultiIndex. In the case of the A matrix, each index entry is a tuple of region and sector.
To access the technical coefficients in the mining columns of all regions, we can use:

In [42]:
A_mining = mrio.A.loc[:, (slice(None), "mining")]
A_mining

| region | sector | (reg1, mining) | (reg2, mining) | (reg3, mining) | (reg4, mining) | (reg5, mining) | (reg6, mining) |
|---|---|---|---|---|---|---|---|
| reg1 | food | 0.001179 | 1e-05 | 3.652734e-09 | 1.626677e-06 | 3.767567e-07 | 1.481621e-05 |
| reg1 | mining | 0.048022 | 0.000268 | 7.486558e-07 | 6.899387e-05 | 3.628651e-05 | 8.801658e-05 |
| reg1 | manufactoring | 0.124366 | 0.017417 | 2.799765e-05 | 0.009161688 | 0.004792741 | 0.002087445 |
| reg1 | electricity | 0.037991 | 0.001099 | 2.170169e-07 | 1.150382e-08 | 1.44466e-05 | 8.892062e-06 |
| reg1 | construction | 0.017324 | 2.2e-05 | 8.88421e-08 | 1.33199e-07 | 1.129186e-05 | 7.64183e-07 |
| reg1 | trade | 0.035429 | 0.001836 | 6.170323e-07 | 0.000383534 | 0.001071471 | 0.0005575273 |
| reg1 | transport | 0.060324 | 0.001544 | 4.060594e-06 | 0.0005820972 | 0.001278089 | 0.0065552 |
| reg1 | other | 0.092059 | 0.005024 | 1.788858e-05 | 0.0003664017 | 0.0007664473 | 0.0003287131 |
| reg2 | food | 8.4e-05 | 0.000638 | 3.772203e-09 | 2.165695e-07 | 9.23743e-08 | 3.105702e-05 |
| reg2 | mining | 0.000523 | 0.050904 | 5.883755e-06 | 5.472492e-05 | 2.212937e-05 | 3.108304e-05 |


For further information on the pandas multiindex see the [pandas documentation on advanced indexing.](https://pandas.pydata.org/docs/user_guide/advanced.html)
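
As a small illustration of the MultiIndex tools described there, the sketch below selects the mining columns of a toy table in three equivalent ways: with a tuple containing `slice(None)`, with `pd.IndexSlice`, and with `DataFrame.xs`. The table `toy_A` and its values are illustrative assumptions, not pymrio data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for mrio.A (illustrative labels and values, not pymrio data).
labels = pd.MultiIndex.from_product(
    [["reg1", "reg2"], ["food", "mining"]], names=["region", "sector"]
)
toy_A = pd.DataFrame(np.arange(16.0).reshape(4, 4), index=labels, columns=labels)

# Three equivalent ways to select the "mining" columns of every region.
via_slice = toy_A.loc[:, (slice(None), "mining")]
via_indexslice = toy_A.loc[:, pd.IndexSlice[:, "mining"]]
via_xs = toy_A.xs("mining", axis=1, level="sector", drop_level=False)

print(via_slice.columns.tolist())
```

`pd.IndexSlice` tends to read more naturally once more than one level is sliced, while `xs` with `drop_level=False` keeps the full column labels.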

## Extracting data across extension tables

Pymrio includes methods for bulk extraction of data across extension tables. These can either work on a specific extension or across all extensions of the system.

### Extracting from a specific extension

Here we use the `extract` method available on the extension object.
It expects a list of rows (index entries) to extract.
As an example, we extract some rows from the emission extension table.
To do so, we first define the rows (index) to extract:

In [43]:
rows_to_extract = [("emission_type1", "air"), ("emission_type2", "water")]

We can now use the `extract` method to extract the data, either as a dictionary of pandas DataFrames:

In [44]:
df_extract = mrio.emissions.extract(rows_to_extract, return_type="dataframe")
df_extract.keys()

dict_keys(['F', 'F_Y', 'S', 'S_Y', 'M', 'M_down', 'D_cba', 'D_pba', 'D_imp', 'D_exp', 'unit', 'D_cba_reg', 'D_pba_reg', 'D_imp_reg', 'D_exp_reg', 'D_cba_cap', 'D_pba_cap', 'D_imp_cap', 'D_exp_cap'])

Or we extract into a new extension object:

In [45]:
ext_extract = mrio.emissions.extract(rows_to_extract, return_type="extension")
str(ext_extract)

'Extension Emissions_extracted with parameters: name, F, F_Y, S, S_Y, M, M_down, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap'

Note that the name of the extension object is now `Emissions_extracted`, based on the name of the original extension object.
To use another name, pass that name as the `return_type` argument.

In [46]:
new_extension = mrio.emissions.extract(rows_to_extract, return_type="new_extension")
str(new_extension)

'Extension new_extension with parameters: name, F, F_Y, S, S_Y, M, M_down, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap'
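
For the DataFrame return type, the row selection can be pictured in plain pandas as picking the same row labels from every account table. The sketch below is only a conceptual analogue under that assumption, not pymrio's implementation; `toy_tables`, the values, and the third row `emission_type3` are hypothetical.

```python
import pandas as pd

# Toy extension tables (illustrative values), indexed by (stressor, compartment).
rows = pd.MultiIndex.from_tuples(
    [
        ("emission_type1", "air"),
        ("emission_type2", "water"),
        ("emission_type3", "land"),  # hypothetical extra row
    ],
    names=["stressor", "compartment"],
)
toy_tables = {
    "F": pd.DataFrame({"reg1": [1.0, 2.0, 3.0], "reg2": [4.0, 5.0, 6.0]}, index=rows),
    "S": pd.DataFrame({"reg1": [0.1, 0.2, 0.3], "reg2": [0.4, 0.5, 0.6]}, index=rows),
}

rows_to_extract = [("emission_type1", "air"), ("emission_type2", "water")]

# Select the same rows from every table, keeping the dict-of-DataFrames layout.
toy_extract = {name: df.loc[rows_to_extract] for name, df in toy_tables.items()}
print(toy_extract["F"])
```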

Extracting to dataframes is also a convenient way to convert an extension object to a dictionary:

In [47]:
df_all = mrio.emissions.extract(mrio.emissions.get_rows(), return_type="dfs")
df_all.keys()


# The method also allows extracting only some of the accounts:
df_some = mrio.emissions.extract(
    mrio.emissions.get_rows(), dataframes=["D_cba", "D_pba"], return_type="dfs"
)
df_some.keys()

dict_keys(['F', 'F_Y', 'S', 'S_Y', 'M', 'M_down', 'D_cba', 'D_pba', 'D_imp', 'D_exp', 'unit', 'D_cba_reg', 'D_pba_reg', 'D_imp_reg', 'D_exp_reg', 'D_cba_cap', 'D_pba_cap', 'D_imp_cap', 'D_exp_cap'])
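
Once the accounts are available as a dictionary of DataFrames, a common next step is to stack them into one table for further processing. The sketch below shows this with `pd.concat` on toy data; `toy_some`, its values, and the level name `account` are illustrative assumptions, not pymrio output.

```python
import pandas as pd

# Toy dictionary of account tables, shaped like an extracted dict of DataFrames.
rows = pd.MultiIndex.from_tuples(
    [("emission_type1", "air"), ("emission_type2", "water")],
    names=["stressor", "compartment"],
)
toy_some = {
    "D_cba": pd.DataFrame({"reg1": [10.0, 20.0], "reg2": [30.0, 40.0]}, index=rows),
    "D_pba": pd.DataFrame({"reg1": [12.0, 18.0], "reg2": [28.0, 42.0]}, index=rows),
}

# The dict keys become a new outer index level named "account".
stacked = pd.concat(toy_some, names=["account"])
print(stacked)
```

Keeping the account name as an index level makes it straightforward to compare accounts with `groupby` or `unstack` afterwards.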

### Extracting from all extensions

We can also extract data from all extensions at once.
This is done using the `extension_extract` method of the pymrio object.
It expects a dict whose keys are extension names and whose values are the rows (index entries) to extract from that extension.

Let's assume we want to extract value added and all emissions.
We first define the rows (index) to extract:

In [None]:
to_extract = {
    "Factor Inputs": "Value Added",
    "Emissions": [("emission_type1", "air"), ("emission_type2", "water")],
}

We can then use the `extension_extract` method to extract the data, either as pandas DataFrames,
in which case it returns a dictionary with the extension names as keys:

In [None]:
df_extract_all = mrio.extension_extract(to_extract, return_type="dataframe")
df_extract_all.keys()

In [None]:
df_extract_all["Factor Inputs"].keys()

We can also extract into a dictionary of extension objects:

In [None]:
ext_extract_all = mrio.extension_extract(to_extract, return_type="extensions")
ext_extract_all.keys()

In [None]:
str(ext_extract_all["Factor Inputs"])

Or merge the extracted data into a new pymrio Extension object (when passing a new name as `return_type`):

In [None]:
ext_new = mrio.extension_extract(to_extract, return_type="new_merged_extension")
str(ext_new)


### Search and extract

The extract methods can also be used in combination with the [search/explore](./explore.ipynb) methods of pymrio.
This allows searching for specific rows and then extracting the data.

For example, to extract all emissions from the air compartment we can use:

In [None]:
match_air = mrio.extension_match(find_all="air")

And then make a new extension object with the extracted data:

In [None]:
air_emissions = mrio.emissions.extract(match_air, return_type="extracted_air_emissions")
print(air_emissions)
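
The search-then-extract pattern can also be pictured in plain pandas. The sketch below is a rough analogue, not pymrio's implementation: it assumes the search term is treated as a regular expression matched against the start of each index level, and it uses a hypothetical toy table `toy_F`.

```python
import pandas as pd

# Toy emission table (illustrative values), indexed by (stressor, compartment).
rows = pd.MultiIndex.from_tuples(
    [("emission_type1", "air"), ("emission_type2", "water")],
    names=["stressor", "compartment"],
)
toy_F = pd.DataFrame({"reg1": [1.0, 2.0], "reg2": [3.0, 4.0]}, index=rows)

# Flag rows where any index level matches the regular expression "air".
levels = toy_F.index.to_frame(index=False)
mask = levels.apply(lambda col: col.astype(str).str.match("air")).any(axis=1)

# Keep only the matching rows.
toy_air = toy_F.loc[mask.to_numpy()]
print(toy_air.index.tolist())
```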

For more information on the search methods see the [explore notebook](./explore.ipynb).