# Practical 5: Introduction to exiobase

In this practical, you will learn how to load and work with exiobase.

You can download exiobase from Zenodo: https://zenodo.org/record/5589597
 
We will work with the file IOT_2019_pxp.zip.

The objectives of the practical are:
- See the data available within exiobase
- Calculate footprints using exiobase
- Make a visualization

## Exercise 1: Load the data

Import the exiobase input-output (IO) data for the year 2019 in product-by-product (pxp) format.

In [None]:
# Import modules
import pandas as pd
import numpy as np

#### 1.1 Import exiobase

Beware: exiobase consists of large datasets, so it may take some time to load and process them.

In [None]:
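A = None  # A matrix (technical coefficients)
Y = None  # Y matrix (final demand)
F = None  # impacts matrix (impacts generated by production)
F_hh = None  # impacts generated directly by final demand (FD)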
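A minimal sketch of the loading step, assuming the exiobase .txt files are tab-separated with two header rows (region, sector) and two index columns, which is the layout pandas itself writes for a DataFrame with MultiIndex rows and columns. To stay self-contained, the sketch writes a tiny made-up matrix to an in-memory buffer and reads it back; with the real data you would pass the path of the corresponding file in your unzipped IOT_2019_pxp folder instead (check the file layout before relying on these arguments).

```python
import io

import numpy as np
import pandas as pd

# Made-up 2-region x 2-sector A matrix with (region, sector) labels on both axes
labels = pd.MultiIndex.from_product(
    [["AT", "NL"], ["Wheat", "Steel"]], names=["region", "sector"]
)
A_toy = pd.DataFrame(
    [[0.10, 0.00, 0.05, 0.00],
     [0.02, 0.20, 0.00, 0.01],
     [0.00, 0.00, 0.15, 0.02],
     [0.01, 0.03, 0.04, 0.25]],
    index=labels,
    columns=labels,
)

# Write it as tab-separated text (assumed to mimic the exiobase .txt layout)
buffer = io.StringIO()
A_toy.to_csv(buffer, sep="\t")
buffer.seek(0)

# Two header rows -> header=[0, 1]; two index columns -> index_col=[0, 1]
A_read = pd.read_csv(buffer, sep="\t", header=[0, 1], index_col=[0, 1])
print(A_read)
```

The same `header`/`index_col` arguments should apply to Y and the impacts files, although files with a single index level (for example an impacts matrix whose rows are indicator names) would need `index_col=0` instead.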

#### 1.2 Look at the available labels in exiobase
You may do this by printing the labels of your imported matrices or by opening the following files in your data folder:
- finaldemands.txt
- products.txt
- impacts/unit.txt

In [None]:
units = None
units

Since there is no file listing all the individual regions, here is a code example of how you can get a list of all the regions in exiobase.

In [None]:
# First we collect all labels from A
A_labels = None
A_labels

In [None]:
# Use .to_frame() to turn the collected labels (a pandas MultiIndex) into a DataFrame
A_labels = None
A_labels

A_labels consists of two columns, "region" and "sector". You can access each column by doing 

> A_labels.region 

or 

> A_labels.sector 

N.b. this is equivalent to doing 

> A_labels.loc[:, "region"]
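A small illustration of the label handling above, using made-up labels in the same (region, sector) layout; in the practical the labels would come from your imported A matrix. Passing `index=False` to `to_frame()` is a choice made here so the resulting DataFrame gets a plain 0..n-1 row index instead of repeating the labels.

```python
import pandas as pd

# Made-up labels in the (region, sector) MultiIndex layout of the exiobase matrices
labels = pd.MultiIndex.from_product(
    [["AT", "BE", "NL"], ["Wheat", "Steel"]], names=["region", "sector"]
)

# to_frame() turns the MultiIndex into a DataFrame with one column per level
A_labels = labels.to_frame(index=False)

# Attribute access and .loc return the same column
by_attribute = A_labels.region
by_loc = A_labels.loc[:, "region"]
print(by_attribute.equals(by_loc))  # True
```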

How do we know how many regions, sectors, or categories are available?

In [None]:
# Then we extract the region column and eliminate any duplicate labels
# We do this because each region label is repeated once for every sector in that region
regions_labels = None

# We print the regional labels so that we can see the regions we have to work with
regions_labels

In [None]:
# Collect the sector labels and keep only the unique values
sectors_labels = None

# Print your labels to analyse them (remember .to_frame)
sectors_labels
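One way to count the unique labels, sketched on the same made-up labels as before: `drop_duplicates()` keeps the first occurrence of each label in its original order, and `len()` gives the count. If your download is EXIOBASE 3 in pxp format, you should expect 49 regions and 200 products, which is a useful sanity check.

```python
import pandas as pd

labels = pd.MultiIndex.from_product(
    [["AT", "BE", "NL"], ["Wheat", "Steel"]], names=["region", "sector"]
)
A_labels = labels.to_frame(index=False)

# Each region label is repeated once per sector, so drop the duplicates
regions_labels = A_labels.region.drop_duplicates()
sectors_labels = A_labels.sector.drop_duplicates()

print(len(regions_labels), len(sectors_labels))  # 3 2

# Equivalent shortcut working directly on the MultiIndex
print(labels.get_level_values("region").nunique())  # 3
```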

## Exercise 2: Calculate the rest of the IO variables (I, L, x)

#### 2.1 First we calculate the Leontief inverse

In [None]:
I = None  # identity matrix; A.shape[0] is the number of rows of A (equal to the number of columns, since A is square)
L = None
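A minimal sketch of the Leontief inverse $\mathbf{L} = (\mathbf{I} - \mathbf{A})^{-1}$ on a made-up 3-sector A matrix. Wrapping the result in a DataFrame keeps the labels of A, and the last line checks that multiplying L by (I - A) gives back the identity. On the full exiobase A matrix the inversion works the same way but takes considerably more time and memory.

```python
import numpy as np
import pandas as pd

# Made-up 3x3 technical coefficient matrix (column sums below 1)
sectors = ["s1", "s2", "s3"]
A = pd.DataFrame(
    [[0.10, 0.20, 0.00],
     [0.05, 0.10, 0.30],
     [0.00, 0.15, 0.05]],
    index=sectors,
    columns=sectors,
)

# Identity matrix of the same size (A is square, so shape[0] == shape[1])
I = np.identity(A.shape[0])

# Leontief inverse L = (I - A)^-1, keeping the labels of A
L = pd.DataFrame(np.linalg.inv(I - A.values), index=A.index, columns=A.columns)

# Sanity check: L (I - A) should give back the identity matrix
print(np.allclose(L.values @ (I - A.values), I))  # True
```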

#### 2.2 We calculate our product output x

In [None]:
x = None

# Check whether the resulting x has the expected shape and plausible values
print(x.shape)
print(x)
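A sketch of the output calculation $\mathbf{x} = \mathbf{L}\mathbf{y}$, where y is the total final demand per product (Y summed over its final demand columns). The data are made up, and the final check uses the accounting identity x = Ax + y: total output equals intermediate use plus final demand.

```python
import numpy as np
import pandas as pd

sectors = ["s1", "s2", "s3"]
A = pd.DataFrame([[0.10, 0.20, 0.00],
                  [0.05, 0.10, 0.30],
                  [0.00, 0.15, 0.05]], index=sectors, columns=sectors)
# Made-up final demand with two final demand categories (columns)
Y = pd.DataFrame([[100.0, 20.0],
                  [50.0, 0.0],
                  [0.0, 10.0]], index=sectors, columns=["households", "government"])

L = pd.DataFrame(np.linalg.inv(np.identity(3) - A.values), index=sectors, columns=sectors)

# Total final demand per product, then total output x = L y
y = Y.sum(axis=1)
x = L @ y

print(x.shape)  # (3,)
# Check: output equals intermediate use plus final demand, x = A x + y
print(np.allclose(x, A @ x + y))  # True
```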

## Exercise 3: Create a matrix of extension intensities 

When working with real data, you will find cases in which the product output vector x contains zeros.

If you try to invert the diagonalized product output, you will get an error telling you that you cannot invert a singular matrix.

A matrix can be singular for various reasons, but in our case it is because some of the values on the diagonal (i.e., some entries of x) are zero.

You may then be tempted to compute 1/x. However, this divides by zero wherever x is 0, so the result will contain inf values, which turn into NaN values in later calculations (e.g. 0 × inf).

One way around this is to divide 1 only by the non-zero values, as shown in the following example:

In [None]:
# we make a copy of our product output vector which we will call x_inv
x_inv = None

# we divide 1 by the values that are non-0
x_inv[x_inv!=0] = 1/x_inv[x_inv!=0]
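The same masking trick on a made-up output vector with one zero entry: copying first leaves x untouched, and the boolean mask ensures 1 is divided only by non-zero values, so the zero stays zero instead of becoming inf.

```python
import numpy as np
import pandas as pd

# Made-up output vector with one product that has zero output
x = pd.Series([200.0, 0.0, 50.0], index=["s1", "s2", "s3"])

# Copy first so that x itself is not modified
x_inv = x.copy()
# Invert only the non-zero entries; zeros stay zero instead of becoming inf
x_inv[x_inv != 0] = 1 / x_inv[x_inv != 0]

print(x_inv.tolist())  # [0.005, 0.0, 0.02]
```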

We then calculate our intensities (i.e., extension coefficients).

In [None]:
# We are essentially dividing the total extension by the product output
# This gives us coefficients of extension by unit of output (e.g., kg/euro)
f = None

f
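A sketch of the intensity calculation $\mathbf{f} = \mathbf{F}\,\mathrm{diag}(\mathbf{x}_{inv})$ on a made-up extension matrix, where each column of F (one sector) is divided by that sector's output. It also shows a column-wise multiplication that gives the same result without building the diagonal matrix; for exiobase's 9800 product-region combinations, a dense 9800 × 9800 float64 diagonal matrix takes roughly 770 MB, so the column-wise version may be worth considering.

```python
import numpy as np
import pandas as pd

sectors = ["s1", "s2", "s3"]
# Made-up extension matrix: rows are stressors, columns are sectors
F = pd.DataFrame([[1000.0, 0.0, 250.0],
                  [40.0, 0.0, 10.0]],
                 index=["GHG (kg)", "Water (m3)"], columns=sectors)
x = pd.Series([200.0, 0.0, 50.0], index=sectors)

x_inv = x.copy()
x_inv[x_inv != 0] = 1 / x_inv[x_inv != 0]

# f = F diag(x_inv): each column of F is divided by that sector's output
f = F @ np.diag(x_inv)
f.columns = F.columns  # the matrix product drops the column labels

# Same result without building the diagonal matrix: multiply column-wise
f_alt = F * x_inv
print(np.allclose(f.values, f_alt.values))  # True
```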

N.b. inverting a matrix is, in general, a more complex operation than dividing 1 by each value in the matrix. 

However, for a diagonal matrix whose diagonal values are all non-zero, inv(diag(x)) and diag(1/x) give the same result. 

If the x vector contains zeros, diag(x) is singular and cannot be inverted.    
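A quick numerical check of both statements with made-up values: with a non-zero diagonal, `np.linalg.inv(np.diag(x))` matches `np.diag(1 / x)`, and with a zero on the diagonal numpy raises a `LinAlgError`.

```python
import numpy as np

x = np.array([200.0, 4.0, 50.0])

# For a diagonal matrix with non-zero diagonal entries,
# the inverse is the diagonal matrix of the reciprocals
print(np.allclose(np.linalg.inv(np.diag(x)), np.diag(1 / x)))  # True

# With a zero on the diagonal, the matrix is singular and numpy refuses to invert it
x_with_zero = np.array([200.0, 0.0, 50.0])
try:
    np.linalg.inv(np.diag(x_with_zero))
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)  # LinAlgError: Singular matrix
```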

## Exercise 4: Total footprint of the Netherlands


- *What is the total carbon footprint of the Netherlands?*


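$\mathbf{F} = \mathbf{f}\mathbf{L}\mathbf{Y} + \mathbf{F}_{hh}$

where $\mathbf{f}$ contains the extension intensities, $\mathbf{L}$ is the Leontief inverse, $\mathbf{Y}$ is the (modified) final demand of the Netherlands, and $\mathbf{F}_{hh}$ contains the impacts generated directly by Dutch final demand.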

#### 4.1 We first create a (modified) final demand matrix

4.1.1 Let's identify the range of the Y columns concerning the Netherlands

In [None]:
# we know NL is at position 20 in the list of countries (Python counts from 0)
# and that there are 7 final demand categories per country, therefore

start_NL = None
end_NL = None

4.1.2 We calculate the modified Y

You can slice Y using the pandas iloc indexer:

> Y.iloc[:,start_NL:end_NL]

In [None]:
Y_mod = None
Y_mod

Or you can use the labels with the pandas loc indexer:

> Y.loc[:, "NL"]

In [None]:
Y_mod = None
Y_mod
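A sketch showing that both slicing approaches select the same columns, on a made-up Y with 3 regions and 2 final demand categories (exiobase has 7). Because the columns are grouped by region, region number i (counting from 0) occupies columns i × n to (i + 1) × n − 1, where n is the number of categories; the slice end is exclusive, as usual in Python. Selecting by label with `.loc` drops the region level from the column labels.

```python
import numpy as np
import pandas as pd

regions = ["AT", "BE", "NL"]
fd_categories = ["Households", "Government"]  # made-up; exiobase has 7 categories
products = ["Wheat", "Steel"]

rows = pd.MultiIndex.from_product([regions, products], names=["region", "sector"])
cols = pd.MultiIndex.from_product([regions, fd_categories], names=["region", "category"])
Y = pd.DataFrame(np.arange(36, dtype=float).reshape(6, 6), index=rows, columns=cols)

# Columns are grouped by region, so region i occupies columns i*n to (i+1)*n - 1
n_fd = len(fd_categories)
start_NL = regions.index("NL") * n_fd  # 2 * 2 = 4
end_NL = start_NL + n_fd               # 6 (exclusive end, as in Python slicing)

Y_mod_iloc = Y.iloc[:, start_NL:end_NL]
Y_mod_loc = Y.loc[:, "NL"]  # selecting by label drops the region level

print(np.allclose(Y_mod_iloc.values, Y_mod_loc.values))  # True
```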

#### 4.2 We isolate the extension we are interested in

In this exercise, we focus only on the carbon footprint:

*"GHG emissions (GWP100) | Problem oriented approach: baseline (CML, 2001) | GWP100 (IPCC, 2007)"* in kg

In [None]:
indicator = ""

In [None]:
# the intensity vector in which we are interested
f_ =  None

f_

In [None]:
# the GHG emissions generated directly by final demand (from F_hh), for the selected indicator

e_hh_ = None

#### 4.3 We calculate the total footprint of the region

In [None]:
# Calculate the total footprint of NL final demand (emissions occurring anywhere in the world)
e_total_reg = None
e_total_reg
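A minimal end-to-end sketch of the footprint formula on made-up data with 2 regions, 2 sectors and a single final demand category. The indicator name is a placeholder, not the exiobase label, and the sketch assumes that F_hh has the same column layout as Y. It selects the indicator row of f and F_hh and the NL columns of Y, then computes f L y_NL plus the direct final demand emissions.

```python
import numpy as np
import pandas as pd

regions = ["AT", "NL"]
rows = pd.MultiIndex.from_product([regions, ["Wheat", "Steel"]], names=["region", "sector"])
fd_cols = pd.MultiIndex.from_product([regions, ["Households"]], names=["region", "category"])
indicator = "GHG (kg)"  # placeholder name, not the exiobase label

A = pd.DataFrame([[0.10, 0.00, 0.05, 0.00],
                  [0.02, 0.20, 0.00, 0.01],
                  [0.00, 0.00, 0.15, 0.02],
                  [0.01, 0.03, 0.04, 0.25]], index=rows, columns=rows)
Y = pd.DataFrame([[10.0, 5.0], [20.0, 0.0], [0.0, 30.0], [5.0, 40.0]],
                 index=rows, columns=fd_cols)
# Made-up intensities f (kg per euro) and direct final demand emissions F_hh (kg)
f = pd.DataFrame([[0.5, 2.0, 0.4, 1.5]], index=[indicator], columns=rows)
F_hh = pd.DataFrame([[3.0, 7.0]], index=[indicator], columns=fd_cols)

L = pd.DataFrame(np.linalg.inv(np.identity(4) - A.values), index=rows, columns=rows)

# Select the indicator row and the NL final demand columns
f_ = f.loc[indicator]
Y_mod = Y.loc[:, "NL"]
e_hh_ = F_hh.loc[:, "NL"].loc[indicator]

# Footprint = f L y_NL + direct emissions of NL final demand
e_total_reg = f_ @ L @ Y_mod.sum(axis=1) + e_hh_.sum()
print(e_total_reg)
```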

## Exercise 5: Which regions emit the most CO2 as a result of final consumption in the Netherlands?

#### 5.1 Let's analyse in which regions the most CO2 is emitted as a result of NL consumption

In [None]:
# In this case we diagonalize the emission intensity vector 
e_breakdown = None

In [None]:
# we apply the sectoral labels
e_breakdown.index = None

e_breakdown

In [None]:
# For now, we are only interested in the impacts by region, so we sum across axis 1
e_prod_reg = None

In [None]:
# and then we aggregate the results by region using the groupby method
e_regional_breakdown = None

In [None]:
# We sort the results from largest to smallest to see the most impacting regions
e_rb_sorted = None

e_rb_sorted
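A sketch of the regional breakdown on made-up data: diagonalizing the intensity vector gives diag(f) L Y_mod, whose rows are the producing sectors (with their region) and whose columns are the NL final demand categories. Summing across axis 1, grouping by the region level and sorting gives the emitting regions. Note that this breakdown covers only the f L Y term, so its total is smaller than the total footprint by the direct final demand emissions.

```python
import numpy as np
import pandas as pd

regions = ["AT", "NL"]
rows = pd.MultiIndex.from_product([regions, ["Wheat", "Steel"]], names=["region", "sector"])
A = pd.DataFrame([[0.10, 0.00, 0.05, 0.00],
                  [0.02, 0.20, 0.00, 0.01],
                  [0.00, 0.00, 0.15, 0.02],
                  [0.01, 0.03, 0.04, 0.25]], index=rows, columns=rows)
L = pd.DataFrame(np.linalg.inv(np.identity(4) - A.values), index=rows, columns=rows)

# Made-up intensity vector for one indicator and NL final demand with 2 categories
f_ = pd.Series([0.5, 2.0, 0.4, 1.5], index=rows)
Y_mod = pd.DataFrame([[5.0, 0.0], [0.0, 0.0], [20.0, 10.0], [25.0, 15.0]],
                     index=rows, columns=["Households", "Government"])

# diag(f) L Y: row = producing sector (and its region), column = final demand category
e_breakdown = pd.DataFrame(np.diag(f_.values) @ L.values @ Y_mod.values, columns=Y_mod.columns)
e_breakdown.index = rows  # apply the sectoral labels

# Sum across axis 1 to get the emissions per producing sector...
e_prod_reg = e_breakdown.sum(axis=1)
# ...then aggregate the producing sectors by region
e_regional_breakdown = e_prod_reg.groupby(level="region").sum()
# Largest emitters first
e_rb_sorted = e_regional_breakdown.sort_values(ascending=False)
print(e_rb_sorted)
```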

## Exercise 6: Let's plot the results for the top 15 emitters 

With pandas, you can make simple visualizations directly from DataFrames and Series.

See more here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

#### 6.1 Totals of the top 15 emitters

In [None]:
# plot your results with plot.bar()
top_15_results = None

top_15_plot = None

# applying bar labels
top_15_plot = None
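A possible way to make the top 15 bar chart, sketched on randomly generated (made-up) regional results. `Series.plot.bar()` returns a matplotlib Axes, and the bar labels use `Axes.bar_label`, which assumes matplotlib 3.4 or newer. The non-interactive Agg backend is selected only so the sketch also runs without a display; in a notebook you can leave that line out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd

# Made-up regional results, sorted from largest to smallest
rng = np.random.default_rng(0)
e_rb_sorted = pd.Series(
    rng.uniform(1e6, 1e9, size=20), index=[f"R{i:02d}" for i in range(20)]
).sort_values(ascending=False)

# Keep the 15 largest emitters and draw a bar chart (returns a matplotlib Axes)
top_15_results = e_rb_sorted.iloc[:15]
top_15_plot = top_15_results.plot.bar(title="Top 15 emitting regions")
top_15_plot.set_ylabel("kg")

# Label each bar with its value (Axes.bar_label needs matplotlib >= 3.4)
bar_labels = top_15_plot.bar_label(top_15_plot.containers[0], fmt="%.2e", fontsize=6)
```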

#### 6.2 Let's normalize results by the total footprint of NL consumption

In [None]:
# Normalize your results
e_rb_sorted_norm = None
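A sketch of the normalization with made-up numbers: dividing each region's emissions by the total footprint of NL consumption gives its share in percent. Dividing by `e_rb_sorted.sum()` instead would give shares of the f L Y part only, because the regional breakdown does not include the direct final demand emissions; which denominator you want depends on the question.

```python
import numpy as np
import pandas as pd

# Made-up regional results (kg) and total footprint (kg), including direct FD emissions
e_rb_sorted = pd.Series([600.0, 300.0, 50.0], index=["NL", "DE", "CN"])
e_total_reg = 1000.0

# Share of the total footprint caused in each region, in percent
e_rb_sorted_norm = e_rb_sorted * 100 / e_total_reg
print(e_rb_sorted_norm.tolist())  # [60.0, 30.0, 5.0]
```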

In [None]:
# Plot top 15 regions
top_15_norm_plot = None

# applying bar labels
top_15_norm_plot = None