# Practical 4: National Environmental Footprints

In this practical, you will calculate the total environmental footprints of countries. 

The objectives of the practical are to:
- Understand the structure of the multi-regional input-output (MR IO) system
- Calculate national footprints
- Gain transferable skills:
    - Master slicing

## Exercise 1: Load the data and labels

Import the IO data. The dataset is constructed from real-world data for the year 2015.

In [1]:
# Import modules
import pandas as pd
import numpy as np

#### 1.1 Import the IO data

In [2]:
# Import Z, Y, V, F and F_y (all tab-separated, without header rows)
Z = pd.read_csv("data/Z.txt", delimiter="\t", header=None)
Y = pd.read_csv('data/Y.txt', delimiter='\t', header=None)
V = pd.read_csv('data/V.txt', delimiter='\t', header=None)
F = pd.read_csv('data/F.txt', delimiter='\t', header=None)
F_y = pd.read_csv('data/F_y.txt', delimiter='\t', header=None)

#### 1.2 Import and organize your labels

In [3]:
# Import the labels.csv file from the data folder 
labels = pd.read_csv("data/labels/labels.csv")
labels

Unnamed: 0,region_code,region_name,sector_code,sector_category,final_demand_code,final_demand_category,value_added_code,value_added_category,extension_code,extension_name
0,R1,OECD,S1,Food,F1,Final consumption expenditure by household,V1,value_added,E1,CO2 emissions (unit: tonnes/year)
1,R2,BRICS,S2,Clothing,F2,Final consumption expenditure by NPISHs,,,E2,Blue water consumption (unit: million m3/year)
2,R3,ROW,S3,Shelter,F3,Final consumption expenditure by government,,,E3,Employment (unit: 1000 people/year)
3,,,S4,Construction,F4,Gross capital formation,,,,
4,,,S5,Manufactured products,,,,,,
5,,,S6,Mobility,,,,,,
6,,,S7,Trade,,,,,,
7,,,S8,Services,,,,,,


In [4]:
# Clean labels for value added and for the extensions
va_lb = labels.value_added_category.dropna()
f_lb = labels.extension_name.dropna()



In [10]:
# Import the multi_reg_sectors.csv file from the data folder 
mr_sec_labels = pd.read_csv("data/labels/multi_reg_sectors.csv", delimiter = ",")
mr_sec_labels

Unnamed: 0,region,sector
0,OECD,Food
1,OECD,Clothing
2,OECD,Shelter
3,OECD,Construction
4,OECD,Manufactured products
5,OECD,Mobility
6,OECD,Trade
7,OECD,Services
8,BRICS,Food
9,BRICS,Clothing


In [11]:
# Import the multi_reg_final_demand.csv file from the data folder 
mr_fd_labels = pd.read_csv("data/labels/multi_reg_final_demand.csv", delimiter = ',')
mr_fd_labels

Unnamed: 0,region,final_demand_category
0,OECD,Final consumption expenditure by household
1,OECD,Final consumption expenditure by NPISHs
2,OECD,Final consumption expenditure by government
3,OECD,Gross capital formation
4,BRICS,Final consumption expenditure by household
5,BRICS,Final consumption expenditure by NPISHs
6,BRICS,Final consumption expenditure by government
7,BRICS,Gross capital formation
8,ROW,Final consumption expenditure by household
9,ROW,Final consumption expenditure by NPISHs


Assign the labels to Z to see how the data is structured.

Tip:

<code>Z.columns = </code>

<code>Z.index = </code>

In [13]:
Z.columns = mr_sec_labels
Z.index = mr_sec_labels
Z

Unnamed: 0,"(OECD, Food)","(OECD, Clothing)","(OECD, Shelter)","(OECD, Construction)","(OECD, Manufactured products)","(OECD, Mobility)","(OECD, Trade)","(OECD, Services)","(BRICS, Food)","(BRICS, Clothing)",...,"(BRICS, Trade)","(BRICS, Services)","(ROW, Food)","(ROW, Clothing)","(ROW, Shelter)","(ROW, Construction)","(ROW, Manufactured products)","(ROW, Mobility)","(ROW, Trade)","(ROW, Services)"
"(OECD, Food)",900389.97,7116.2579,2854.3939,13257.049,46574.18,2892.3325,1886.7033,265499.2,22643.474,3135.7552,...,34.491289,11503.72,54038.675,2686.7443,2841.9367,1994.5645,24452.97,1599.7089,1343.6648,25968.44
"(OECD, Clothing)",1650.2952,93490.1,684.95479,4392.1856,22066.78,255.44197,258.2817,17859.68,30.788548,8461.3499,...,3.345165,849.9656,1028.166,15286.018,623.6473,1056.9763,11540.94,458.48292,1121.1862,4209.014
"(OECD, Shelter)",60010.394,24290.853,389328.34,161156.12,336457.9,195116.86,4368.3769,414675.3,1025.2358,289.11479,...,115.37439,1432.888,1595.5398,1029.6405,8532.8224,5875.0751,10101.61,5412.202,464.24195,5977.449
"(OECD, Construction)",25485.023,3492.0288,58962.171,737722.47,228561.7,24101.183,2808.6036,447409.5,199.00665,24.30559,...,22.811337,1056.922,1216.7844,529.29682,1188.5423,17503.426,13981.68,785.8398,366.26558,5912.369
"(OECD, Manufactured products)",158978.6,44027.016,89108.882,465569.4,3493587.0,70836.613,25772.136,1038022.0,10905.157,7731.6461,...,1117.9153,32919.42,27862.458,4878.9846,25074.623,106727.54,513724.4,29853.137,23686.922,126412.7
"(OECD, Mobility)",162815.51,25646.552,107856.19,156855.74,388692.0,849444.61,44312.307,770904.2,3452.0377,860.47592,...,1335.8464,7508.478,9980.1168,1402.179,10624.89,14149.367,12535.43,50934.944,6208.9413,27325.24
"(OECD, Trade)",254235.17,45172.793,115221.04,152184.42,643237.1,190640.73,107747.42,517820.4,2761.8323,262.28486,...,130.51804,3066.925,3077.7672,1813.1738,2034.8295,3576.2306,10293.15,4559.7601,3690.4482,14926.42
"(OECD, Services)",510136.01,63522.804,283365.99,652310.73,1457110.0,443516.33,87546.196,8445863.0,11293.723,4486.171,...,3796.3813,96717.59,5484.3791,2648.0756,12847.646,14117.682,26177.7,9090.7123,12186.951,129385.8
"(BRICS, Food)",18630.91,392.92859,177.17052,653.47019,2169.41,122.76635,64.429012,6780.606,949321.55,100725.3,...,1472.159,298004.6,31395.95,1436.9998,2506.0942,1799.6444,21794.76,1153.819,819.18283,15590.4
"(BRICS, Clothing)",334.05133,11847.968,211.57147,987.71458,6827.371,139.64478,29.258034,4431.914,3344.6053,516363.72,...,405.11739,76621.17,2313.723,24374.939,933.5234,2666.9804,18990.64,930.11239,3627.5137,9399.195


#### 1.3 Brief intro to multi-indexing
Multi-indexes are hierarchical indexes. They make it easy to group and slice columns and rows that share a common label or feature, such as all sectors of one region.

In [8]:
# First we assign the multi-indexes to Z using the MultiIndex.from_frame method
Z.columns = pd.MultiIndex.from_frame(mr_sec_labels)
Z.index = pd.MultiIndex.from_frame(mr_sec_labels)
Z

# Index identification: select the eighth column by position
Z.iloc[:, 7]

You can slice dataframes with multi-indexes by using 

<code>Z.loc[pd.IndexSlice[first_level, second_level],:]</code>


In [14]:
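# Assign the multi-indexes, then slice the dataframe in three ways
Z.columns = pd.MultiIndex.from_frame(mr_sec_labels)
Z.index = pd.MultiIndex.from_frame(mr_sec_labels)
Z.iloc[:, 7]                             # by position: the eighth column
Z.loc[:, "OECD"]                         # by top-level label: all OECD columns
Z.loc[pd.IndexSlice["OECD", "Food"], :]  # by both levels: the (OECD, Food) row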

region  sector               
OECD    Food                     900389.970000
        Clothing                   7116.257900
        Shelter                    2854.393900
        Construction              13257.049000
        Manufactured products     46574.185000
        Mobility                   2892.332500
        Trade                      1886.703300
        Services                 265499.210000
BRICS   Food                      22643.474000
        Clothing                   3135.755200
        Shelter                    3962.018900
        Construction                523.815020
        Manufactured products      5174.119200
        Mobility                   3632.647000
        Trade                        34.491289
        Services                  11503.715000
ROW     Food                      54038.675000
        Clothing                   2686.744300
        Shelter                    2841.936700
        Construction               1994.564500
        Manufactured products 
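
The slicing patterns used in this section can be tried out on a much smaller table. The sketch below uses hypothetical toy labels (`R1`, `R2`, `Food`, `Trade`) and made-up numbers, not the data of this practical, so that every selected value is easy to recognise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy labels: 2 regions x 2 sectors (not the practical's data)
toy_labels = pd.DataFrame({
    "region": ["R1", "R1", "R2", "R2"],
    "sector": ["Food", "Trade", "Food", "Trade"],
})
toy_index = pd.MultiIndex.from_frame(toy_labels)

# 4x4 matrix filled with 0..15 so that each cell is easy to recognise
toy_Z = pd.DataFrame(np.arange(16.0).reshape(4, 4),
                     index=toy_index, columns=toy_index)

idx = pd.IndexSlice

# One row: region R1, sector Food (both levels given)
row_r1_food = toy_Z.loc[idx["R1", "Food"], :]

# All rows of sector Food, whatever the region (":" on the first level)
rows_food = toy_Z.loc[idx[:, "Food"], :]

# All columns of region R2 (top level only, so no IndexSlice is needed)
cols_r2 = toy_Z.loc[:, "R2"]
```

Selecting on the top level alone drops that level from the result, which is why `cols_r2` only keeps the `sector` labels in its columns.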

## Exercise 2: Calculate your IO system 

#### 2.1 First convert everything to numpy objects
NumPy arrays make the matrix calculations that follow simpler.

In [15]:
Z = Z.values
Y = Y.values
V = V.values
F = F.values
F_y = F_y.values
# Use NumPy for linear algebra; use pandas for quick statistical analysis

#### 2.2 Calculate your total product inputs and outputs and check that they match

In [16]:
X_out = Z.sum(axis=1) + Y.sum(axis=1)
X_in = Z.sum(axis=0) + V


In [17]:
# Check if X_out and X_in are the same 
print(f"The inputs equal the outputs: {np.allclose(X_in, X_out)}")

The inputs equal the outputs: True


#### 2.3 Calculate the Leontief inverse of the quantity model
Calculate the Leontief inverse matrix:

$L=(I-A)^{-1}$

(Note: first calculate the technical coefficient matrix $A = Z\hat{X}^{-1}$, i.e. the Z matrix normalised by total output.)

In [18]:
# Calculate the technical coefficient matrix A
inv_diag_X = np.linalg.inv(np.diag(X_out))
A = Z @ inv_diag_X

# Make an identity matrix of the same order of A
I = np.identity(A.shape[0])
# A.shape[0] is the number of rows of A, so I has the same order as A

# Calculate the Leontief inverse
L = np.linalg.inv(I-A)

#### 2.4 Verify the correctness of the product output

In [19]:
# Calculate the total product output
X = L @ Y.sum(axis=1)

# Check if X is the same as X_out
print(f"The calculated X is right : {np.allclose(X, X_out)}")

The calculated X is right : True
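
The same chain of steps (technical coefficients, Leontief inverse, output check) can be followed by hand on a very small system. The sketch below uses a hypothetical 2-sector economy with made-up numbers. It also shows two optional variations that are not part of the practical: dividing `Z` by the output vector through broadcasting instead of building the inverse of a diagonal matrix, and using `np.linalg.solve` instead of an explicit inverse.

```python
import numpy as np

# Hypothetical 2-sector economy (illustrative numbers only)
Z_toy = np.array([[10.0, 20.0],
                  [30.0, 5.0]])      # inter-industry flows
y_toy = np.array([70.0, 65.0])       # total final demand per sector
x_toy = Z_toy.sum(axis=1) + y_toy    # total output: [100, 100]

# Column j of Z is divided by x_j; broadcasting does this directly
A_toy = Z_toy / x_toy
# Same result with the explicit diagonal matrix used in the practical
A_check = Z_toy @ np.linalg.inv(np.diag(x_toy))

L_toy = np.linalg.inv(np.identity(2) - A_toy)

# x = L y reproduces total output
x_from_L = L_toy @ y_toy
# Solving (I - A) x = y gives the same x without forming the inverse
x_from_solve = np.linalg.solve(np.identity(2) - A_toy, y_toy)

print(np.allclose(x_from_L, x_toy), np.allclose(x_from_solve, x_toy))
```

For a 24x24 system either route is fine; the explicit inverse is kept in the practical because `L` itself is needed later for the footprint calculations.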


## Exercise 3: Carbon footprints

a. Calculate the carbon footprint of the 3 regions.
(Note: EF = fLY + F_y)

b. Trace the OECD's carbon footprint to producing sectors and regions.



<!---(b. Compare them with their territorial CO2 emissions.
(Note: F and Fhh are constructed from territorial, production-based perspective))--->

#### 3.1 Calculate the extension intensities
Calculate the "extension intensities" matrix, which expresses each environmental and non-environmental extension per €1 million of output:
- blue water consumption (million m3, or Mm3)
- CO2 emissions (metric tons, or tonnes)
- employment (1000 people)

$f = F \hat{X}^{-1}$

In [20]:
# Extensions intensity vector
f = F@inv_diag_X
f

array([[1.19591456e+02, 1.05704042e+02, 1.68034108e+03, 1.37560063e+02,
        8.26587942e+01, 3.68559600e+02, 2.21263515e+01, 1.90910156e+01,
        1.27881585e+02, 5.21609038e+01, 3.81620195e+03, 5.83373486e+02,
        2.83042732e+02, 3.95871223e+02, 3.09589244e+01, 5.72265700e+01,
        1.57518807e+02, 6.24633463e+02, 2.28823648e+03, 3.67142312e+02,
        1.53039059e+02, 4.21919961e+02, 5.52783972e+01, 5.01649798e+01],
       [4.70336694e-02, 9.25707436e-04, 2.61659187e-03, 1.45991910e-04,
        5.36745634e-04, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.67945738e-01, 1.46266304e-03, 4.21965848e-03, 3.05770865e-04,
        5.87560904e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.44962429e-01, 8.25264614e-03, 1.65220315e-03, 4.53919319e-04,
        1.51040389e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [6.72098856e-03, 7.15466787e-03, 5.39146304e-03, 7.30658590e-03,
        3.61199748e-03, 3.95822614e-03, 2.04553341e-02, 7.8145

#### 3.2 Prepare your regional final demand vectors

There are two programming approaches:

- Mathematical approach: isolate the consumption of a single country using matrix algebra

- Slicing approach: isolate the consumption of a single country using slicing methods from numpy and pandas

Let's first assign labels to your Y matrix so that it's easier to analyse the results

In [21]:
# Turn your Y into a dataframe with labels to facilitate analysis
Y = pd.DataFrame(Y, index = pd.MultiIndex.from_frame(mr_sec_labels), columns = pd.MultiIndex.from_frame(mr_fd_labels)) 
Y

Unnamed: 0_level_0,region,OECD,OECD,OECD,OECD,BRICS,BRICS,BRICS,BRICS,ROW,ROW,ROW,ROW
Unnamed: 0_level_1,final_demand_category,Final consumption expenditure by household,Final consumption expenditure by NPISHs,Final consumption expenditure by government,Gross capital formation,Final consumption expenditure by household,Final consumption expenditure by NPISHs,Final consumption expenditure by government,Gross capital formation,Final consumption expenditure by household,Final consumption expenditure by NPISHs,Final consumption expenditure by government,Gross capital formation
region,sector,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
OECD,Food,1753360.0,1610.136,5428.022,45566.9,17297.06,1872.0771,491.3126,2303.218,97947.92,88.19524,223.4574,1864.093
OECD,Clothing,288486.9,12.90528,2499.307,12133.65,5679.645,37.996041,29.27748,160.5304,32045.4,209.2432,83.63353,1310.921
OECD,Shelter,662214.6,1939.06,22445.46,117857.0,7094.673,11.587942,59.81397,247.8947,2579.657,25.98399,315.1498,1421.052
OECD,Construction,68488.3,134.9671,2973.342,3064120.0,551.3223,7.777603,37.17852,5940.461,2675.906,0.08931081,7.002039,12952.62
OECD,Manufactured products,1786986.0,4629.975,127687.8,2092678.0,44718.05,355.11304,440.1693,202829.5,208064.0,600.9225,3344.385,304226.8
OECD,Mobility,1157830.0,9812.754,58825.04,27611.03,4606.01,147.93916,1060.299,384.0281,39965.49,37.58608,558.8208,2045.063
OECD,Trade,400433.7,2697.264,5440.243,59792.34,16959.9,1720.6943,1353.788,1899.609,8266.261,200.2523,435.0525,3657.033
OECD,Services,10700680.0,2968987.0,5909440.0,1191428.0,38804.9,7956.0305,3881.043,3957.353,84414.85,3059.385,11674.93,31172.16
BRICS,Food,22234.31,14.16137,149.9334,642.1399,1074379.0,26847.266,22238.21,77952.18,37696.48,13.06551,64.41736,466.3712
BRICS,Clothing,107713.2,0.000248236,37.6694,1020.318,278363.0,3167.5208,3134.867,18381.93,64586.76,136.0595,96.60266,2013.962


In the mathematical approach you first create a vector of zeros, set to 1 the entries of the final demand columns you want to keep, then diagonalize it and post-multiply the final demand matrix by it

$\mathbf{Y}^{reg} = \mathbf{Y}\,\hat{\mathbf{i}}_j^{reg}$

In [22]:
# 1 Make vector of zeros of the same length of the columns of Y using np.zeros
i_j = np.zeros(Y.shape[1])

# 2 Turn into 1's the zeros of the columns you want to analyse
i_j[:4] = 1 

# 3 Multiply Y by the diagonalized i_j vector
Y_r1 = Y@np.diag(i_j)

Y_r1

Unnamed: 0_level_0,Unnamed: 1_level_0,0,1,2,3,4,5,6,7,8,9,10,11
region,sector,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
OECD,Food,1753360.0,1610.136,5428.022,45566.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Clothing,288486.9,12.90528,2499.307,12133.65,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Shelter,662214.6,1939.06,22445.46,117857.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Construction,68488.3,134.9671,2973.342,3064120.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Manufactured products,1786986.0,4629.975,127687.8,2092678.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Mobility,1157830.0,9812.754,58825.04,27611.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Trade,400433.7,2697.264,5440.243,59792.34,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OECD,Services,10700680.0,2968987.0,5909440.0,1191428.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BRICS,Food,22234.31,14.16137,149.9334,642.1399,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BRICS,Clothing,107713.2,0.000248236,37.6694,1020.318,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In the slicing approach you take advantage of the slicing methods built within pandas and numpy

In [23]:
Y_oecd = Y.loc[:,"OECD"]
# Slicing changes the shape of the matrix (fewer columns), which can cause problems in later matrix calculations
Y_oecd

Unnamed: 0_level_0,final_demand_category,Final consumption expenditure by household,Final consumption expenditure by NPISHs,Final consumption expenditure by government,Gross capital formation
region,sector,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
OECD,Food,1753360.0,1610.136,5428.022,45566.9
OECD,Clothing,288486.9,12.90528,2499.307,12133.65
OECD,Shelter,662214.6,1939.06,22445.46,117857.0
OECD,Construction,68488.3,134.9671,2973.342,3064120.0
OECD,Manufactured products,1786986.0,4629.975,127687.8,2092678.0
OECD,Mobility,1157830.0,9812.754,58825.04,27611.03
OECD,Trade,400433.7,2697.264,5440.243,59792.34
OECD,Services,10700680.0,2968987.0,5909440.0,1191428.0
BRICS,Food,22234.31,14.16137,149.9334,642.1399
BRICS,Clothing,107713.2,0.000248236,37.6694,1020.318


N.B. In this case we don't need pd.IndexSlice, because OECD is at the top level of the multi-index hierarchy.

However, to select the "Shelter" sector within the region we would need

<code>Y.loc[pd.IndexSlice["OECD", "Shelter"], :]</code>

because the sector sits at a lower level of the hierarchy.

#### Let's check that the row sums of the two approaches match

In [24]:
pd.concat([Y_r1.sum(axis=1), Y_oecd.sum(axis=1)],axis=1)

Unnamed: 0_level_0,Unnamed: 1_level_0,0,1
region,sector,Unnamed: 2_level_1,Unnamed: 3_level_1
OECD,Food,1805965.0,1805965.0
OECD,Clothing,303132.8,303132.8
OECD,Shelter,804456.1,804456.1
OECD,Construction,3135716.0,3135716.0
OECD,Manufactured products,4011982.0,4011982.0
OECD,Mobility,1254078.0,1254078.0
OECD,Trade,468363.5,468363.5
OECD,Services,20770530.0,20770530.0
BRICS,Food,23040.55,23040.55
BRICS,Clothing,108771.2,108771.2


The mathematical approach is preferable for multiple reasons:

- it maintains the order (i.e., shape) of Y unaltered

- it can be easily combined with the isolation of a specific sector footprint such that: $\mathbf{Y^{reg}_{sec}}  = \mathrm{\hat{i_{i}^{sec}}} \mathbf{Y^{reg}}$

- it relies on the same principle used to model changes in final demand

- it's not affected by typos and inconsistencies in labels
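
A minimal sketch of the first three points, on a hypothetical 3x4 final demand matrix with made-up numbers. The 10 % increase is only an illustrative scenario, assumed here to show what reusing the selector for "final demand changes" could look like; it is not part of the practical.

```python
import numpy as np

# Hypothetical final demand: 3 product rows, 4 columns (2 regions x 2 categories)
Y_toy = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0],
                  [9.0, 10.0, 11.0, 12.0]])

# Selector for region 1: ones on its two columns, zeros elsewhere
select_r1 = np.array([1.0, 1.0, 0.0, 0.0])
Y_r1_toy = Y_toy @ np.diag(select_r1)      # same shape as Y_toy

# Slicing gives the same numbers but a smaller matrix
Y_r1_sliced = Y_toy[:, :2]

# Same mechanism, different weights: raise region 1 demand by 10 %
scale_r1 = np.array([1.1, 1.1, 1.0, 1.0])
Y_scenario = Y_toy @ np.diag(scale_r1)

# Row selector combined with the column selector:
# demand of region 1 for the second product only
select_row = np.array([0.0, 1.0, 0.0])
Y_r1_row = np.diag(select_row) @ Y_r1_toy

print(Y_r1_toy.shape, Y_r1_sliced.shape)
```

Because `Y_r1_toy`, `Y_scenario` and `Y_r1_row` all keep the shape of `Y_toy`, they can be passed to the same downstream matrix products without any re-labelling.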


In [25]:
# 1 Vector of zeros with the same length as the rows of Y
i_i = np.zeros(Y.shape[0])

# 2 Turn into 1's the zeros of the rows (sectors) you want to analyse
i_i[3] = 1  # row 3 is (OECD, Construction)
# 3 Multiply the diagonalized i_i vector by the matrix Y_r1
Y_r1_sec2 = np.diag(i_i) @ Y_r1

Y_r1_sec2

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,68488.298,134.96705,2973.3418,3064119.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### 3.3 Let's combine the two approaches and calculate the footprints of the OECD region

In [26]:
# 1 We remake our vector of zeros of the same length of the columns of Y 
i_j_ = np.zeros(Y.shape[1])

In [27]:
# 2 We make i_j_oecd a one-column DataFrame carrying the multi-index we need
i_j_oecd = pd.DataFrame(i_j_, index = pd.MultiIndex.from_frame(mr_fd_labels))

In [28]:
# 3 We modify i_j_oecd to display 1's where the OECD index is present
i_j_oecd.loc["OECD"] = 1

In [35]:
# 4 Multiply Y by the diagonalized i_j_oecd vector
# (i_j_oecd is a one-column DataFrame, so take column 0 as a 1-D array first)
OECDfp = Y @ np.diag(i_j_oecd[0].to_numpy())

In [None]:
# 5 We sum the final demand over the rows either with .sum(axis=1) 
# or by multiplying Y_oecd by a vector of ones of the length of rows
Y_oecd = None
Y_oecd

Let's calculate the total product output of the OECD region

$\text{X} = \mathbf{L}\text{Y}^{oecd}$

In [None]:
X_oecd = None

Multiply the product output vector by the extension intensities, then add the extensions of final demand

$F = \text{f} \mathbf{L}\text{Y} + F_{y}$
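
One possible way to follow these steps is sketched below on a hypothetical 2-sector, 2-region system in which each region has a single final demand column. All names ending in `_toy` and all numbers are assumptions made for illustration; the cells of the practical still need to be filled in with the real `Y`, `L`, `f` and `F_y`.

```python
import numpy as np

# Hypothetical system: 2 sectors, 2 final demand columns (one per region),
# 2 extensions. Numbers are made up.
A_toy = np.array([[0.1, 0.2],
                  [0.3, 0.1]])
Y_toy = np.array([[50.0, 20.0],
                  [10.0, 60.0]])
f_toy = np.array([[2.0, 1.0],      # extension 1 per unit of output
                  [0.5, 0.0]])     # extension 2 per unit of output
F_y_toy = np.array([[5.0, 3.0],    # direct extensions of final demand,
                    [0.0, 1.0]])   # one column per final demand column

L_toy = np.linalg.inv(np.identity(2) - A_toy)

# Keep only region 1's demand column, then sum over columns
select_r1 = np.array([1.0, 0.0])
y_r1 = (Y_toy @ np.diag(select_r1)).sum(axis=1)

# Output required in all sectors to satisfy region 1's demand
x_r1 = L_toy @ y_r1

# Diagonalizing x keeps the producing-sector detail:
# rows = extensions, columns = producing sectors
F_r1_by_sector = f_toy @ np.diag(x_r1)

# Total footprint = sector contributions + direct final demand extensions
F_r1_total = F_r1_by_sector.sum(axis=1) + F_y_toy @ select_r1
```

Keeping `F_r1_by_sector` as a matrix rather than summing straight away is what allows a footprint to be traced back to producing sectors and regions.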

In [None]:
# In this case we diagonalize X_oecd because f in our case is a matrix with multiple extension intensities
F_oecd_s = None
# Add labels
F_oecd_s = None
F_oecd_s

In [None]:
# F_y doesn't have any labels
F_y

In [None]:
# add labels to the final demand extensions so that we can easily select the regions
F_y =  None
F_y

In [None]:
# We sum the sectoral extensions to the final demand extensions
F_oecd_tot = None
F_oecd_tot

#### 3.4 Calculate the results for all the regions

Copy and paste the steps we followed and modify the names of the variables to be generic
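
As a rough guide to the structure of such a loop, here is a sketch on hypothetical toy data (two regions `R1` and `R2`, two sectors, one extension). The variable names ending in `_toy` are assumptions made for this example; the real loop would use the `Y`, `L`, `f`, `F_y` and label objects built earlier in the practical.

```python
import numpy as np
import pandas as pd

# Hypothetical labels and data: 2 regions, 2 demand categories each
regions = ["R1", "R2"]
fd_labels = pd.MultiIndex.from_product(
    [regions, ["government", "households"]],
    names=["region", "final_demand_category"],
)
Y_toy = np.array([[40.0, 10.0, 15.0, 5.0],
                  [8.0, 2.0, 50.0, 10.0]])
A_toy = np.array([[0.1, 0.2],
                  [0.3, 0.1]])
L_toy = np.linalg.inv(np.identity(2) - A_toy)
f_toy = np.array([[2.0, 1.0]])              # one extension, two sectors
F_y_toy = np.array([[4.0, 1.0, 2.0, 1.0]])  # one extension, four demand columns

regional_footprints_toy = {}
for region in regions:
    # Selector with ones on the columns of the current region
    selector = pd.Series(0.0, index=fd_labels)
    selector.loc[region] = 1.0
    i_j = selector.to_numpy()

    y_reg = (Y_toy @ np.diag(i_j)).sum(axis=1)   # regional final demand
    x_reg = L_toy @ y_reg                        # output needed to supply it
    # Footprint = embodied extensions + direct final demand extensions
    regional_footprints_toy[region] = f_toy @ x_reg + F_y_toy @ i_j
```

A useful sanity check is that the regional footprints add up to the footprint of total final demand, since every final demand column belongs to exactly one region.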

In [None]:
# we first make a dictionary to store our results
regional_footprints = {}

# We create the zero vector
i_j_ = None

# We create a for loop iterating through the available regions
# and applying the discussed procedures to calculate the regional footprints