# DataMatrix for the Calculator

The basic idea behind moving from KNIME to Python is the possibility of using n-D data structures, which we call 'DataMatrix', instead of 2D tables.
In the standard 2D tables, variables were the columns and each 'country/year' combination was a row. Now, by default, we use a 3D matrix with countries along one dimension, years along another and variables along the third. Python already has an object that lets users work in multiple dimensions: the 'numpy array' or 'ndarray'.
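To make the change of layout concrete, here is a minimal sketch that turns a small 2D table into a 3D array with axes (country, year, variable). The variable names `var_a` and `var_b` and all values are made up for illustration; the calculator itself does this step through the DataMatrix class, not with this loop.

```python
import numpy as np

# Toy 2D table: one row per country/year combination, one column per variable.
# All names and values are invented for illustration.
rows = [
    # country,  year, var_a, var_b
    ("Austria", 1990, 1.0, 10.0),
    ("Austria", 1991, 2.0, 20.0),
    ("Belgium", 1990, 3.0, 30.0),
    ("Belgium", 1991, 4.0, 40.0),
]
variables = ["var_a", "var_b"]

# Labels along the first two dimensions
countries = sorted({r[0] for r in rows})
years = sorted({r[1] for r in rows})

# 3D array with axes (country, year, variable), filled with NaN until set
cube = np.full((len(countries), len(years), len(variables)), np.nan)
for country, year, *values in rows:
    cube[countries.index(country), years.index(year), :] = values

print(cube.shape)
print(cube[countries.index("Belgium"), years.index(1991), variables.index("var_b")])
```

Each row of the table becomes one (country, year) cell of the array, and the columns of the table become the third axis.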

## Table of Contents

- [Basic Python](#basic-python)
- [Numpy arrays (ndarray)](#numpy-arrays-ndarray)
- [DataMatrix](#datamatrix)

### Basic Python

#### Function definition
Functions in Python are blocks of reusable code that perform a specific task. In this example, the greet function takes one parameter name and prints a greeting message.

In [13]:
# Function example
def greet(name):
    """A function to greet the user."""
    print("Hello, " + name + "!")

#### Lists
Lists in Python are ordered collections of items. In this example, fruits is a list containing strings representing different fruits. You can access elements in a list using their index (position), starting from 0.

In [11]:
# List example
fruits = ["apple", "banana", "orange", "grape", "kiwi"]
# Accessing elements in a list
print("First fruit:", fruits[0])
print("Second fruit:", fruits[1])

First fruit: apple
Second fruit: banana


#### Dictionaries
Dictionaries in Python are collections of key-value pairs. In this example, student is a dictionary containing information about a student such as their name, age, and major. You can access values in a dictionary using their keys.

In [14]:
# Dictionary example
student = {"name": "Alice", "age": 20, "major": "Computer Science"}
# Accessing values in a dictionary
print("Name:", student["name"])
print("Age:", student["age"])
print("Major:", student["major"])

Name: Alice
Age: 20
Major: Computer Science


#### Function call
To execute a function, call it by its name followed by parentheses containing any arguments the function requires. In this example, we call the greet function with the argument "John", which prints "Hello, John!".

In [None]:
# Function call
greet("John")

### Numpy arrays (ndarray)

In [42]:
import numpy as np

# Creating a NumPy array from a list
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print("NumPy Array:")
print(my_array)
print()

# Accessing elements of a NumPy array
print("Accessing elements of the NumPy array:")
print("First element:", my_array[0])
print("Last element:", my_array[-1])
print()

# Arithmetic operations with NumPy arrays
print("Arithmetic operations with NumPy arrays:")
print("Original array:", my_array)
print("Adding 1 to each element:", my_array + 1)
print("Multiplying each element by 2:", my_array * 2)
print()

# Creating multidimensional arrays
print("Multidimensional arrays:")
my_2d_array = np.array([[1, 2, 3], [4, 5, 6]])
print("2D NumPy array:")
print(my_2d_array)
print()

# Accessing elements of a multidimensional array
print("Accessing elements of a 2D NumPy array:")
print("Element at row 0, column 1:", my_2d_array[0, 1])
print("Entire first row:", my_2d_array[0, :])
print("Entire second column:", my_2d_array[:, 1])

NumPy Array:
[1 2 3 4 5]

Accessing elements of the NumPy array:
First element: 1
Last element: 5

Arithmetic operations with NumPy arrays:
Original array: [1 2 3 4 5]
Adding 1 to each element: [2 3 4 5 6]
Multiplying each element by 2: [ 2  4  6  8 10]

Multidimensional arrays:
2D NumPy array:
[[1 2 3]
 [4 5 6]]

Accessing elements of a 2D NumPy array:
Element at row 0, column 1: 2
Entire first row: [1 2 3]
Entire second column: [2 5]
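The same indexing rules extend to three dimensions, which is the shape used by default in the calculator. The sketch below uses an arbitrary array of shape (2, 3, 2), read here as (countries, years, variables); the values carry no meaning.

```python
import numpy as np

# 3D array with shape (2 countries, 3 years, 2 variables); values are arbitrary
arr = np.arange(12).reshape(2, 3, 2)
print("Shape:", arr.shape)

# Fixing one index removes that dimension from the result
first_country = arr[0, :, :]        # all years and variables of country 0
second_variable = arr[:, :, 1]      # variable 1 for all countries and years
print("First country has shape:", first_country.shape)
print("Second variable has shape:", second_variable.shape)

# A single element needs one index per dimension
print("Country 1, year 2, variable 0:", arr[1, 2, 0])

# Aggregating along an axis, e.g. summing over countries (axis 0)
total_over_countries = arr.sum(axis=0)
print("Sum over countries:")
print(total_over_countries)
```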


### DataMatrix

As an example, take the total tonne-kilometres (tkm) of freight transported, together with the modal share used to distribute the tkm by vehicle type.

In [30]:
from model.common.io_database import read_database  # import the function that reads the database and returns tables (dataframes)
file = 'transport_freight-volume'  # csv file in _database/data/csv
lever = 'freight_tkm'

df_ots, df_fts = read_database(file, lever, level=1)


In [41]:
# Display the table (dataframe)
df_ots

eucalc-name,Country,Years,ots_tra_freight_tkm-total-demand[bn-tkm]
0,Austria,1990,50.836512
1,Austria,1991,50.836512
2,Austria,1992,50.836512
3,Austria,1993,50.836512
4,Austria,1994,50.836512
...,...,...,...
827,Vaud,2011,2.622088
828,Vaud,2012,2.576287
829,Vaud,2013,2.639190
830,Vaud,2014,2.725631


In [32]:
# Convert the table to datamatrix
from model.common.data_matrix_class import DataMatrix  # import the datamatrix class

df_ots.drop(columns=[lever], inplace=True)  # Remove the lever column, we don't need it

datamatrix_ots = DataMatrix.create_from_df(df_ots, num_cat=0)  # this is a function defined in the DataMatrix class

You can find the DataMatrix class and its functions in PathwayCalc/model/common/data_matrix_class.py.
**TIP**: To search for a function definition in PyCharm, press Ctrl + Shift + F (or Cmd + Shift + F) and type 'def _function-name_', e.g. 'def create_from_df'. Then click on the matching line in the list and PyCharm opens the file at the right location.

DataMatrix is the default class used by the calculator.
A DataMatrix object contains the following attributes:
- `array`: numpy array (3D or more)
- `dim_labels`: list \['Country', 'Years', 'Variables', 'Categories1', ..\]
- `col_labels`: dict that associates each dimension with its list of column labels
       dm.col_labels = {
            'Country': ['Austria', 'Belgium', ...],
            'Years': [1990, 1991, ..., 2015, 2020, ..., 2050],
            'Variables': ['tra_passenger_modal-share', 'tra_passenger_occupancy', ...],
            'Categories1': ['LDV', '2W', 'rail', 'aviation', ...]
            }
- `units`: dict that contains the unit corresponding to each variable
       e.g. units['tra_passenger_modal-share'] = '%'
- `idx`: dict that links every label to its index position in the array
       e.g. idx['Austria'] = 0
            idx['Belgium'] = 1
            idx[1990] = 0
       This is used to access the numpy array, e.g.
            dm.array[idx['Austria'], :, idx['tra_passenger_modal-share'], idx['LDV']]
       gives the share of light duty vehicles (cars) in Austria for all years.
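To see how these attributes fit together, here is a minimal sketch of a stand-in class. `ToyDataMatrix` is hypothetical and is not the real DataMatrix implementation; it only assumes that `idx` maps each label to its position along its own dimension, as described above. The data values are arbitrary.

```python
import numpy as np

class ToyDataMatrix:
    """Hypothetical stand-in for illustration, NOT the calculator's DataMatrix."""

    def __init__(self, array, col_labels, units):
        self.array = array
        self.col_labels = col_labels
        self.dim_labels = list(col_labels)  # dimension names, in axis order
        self.units = units
        # Map every label to its position along its own dimension.
        # Assumes labels are unique across dimensions.
        self.idx = {
            label: pos
            for labels in col_labels.values()
            for pos, label in enumerate(labels)
        }

col_labels = {
    "Country": ["Austria", "Belgium"],
    "Years": [1990, 1991, 1992],
    "Variables": ["tra_passenger_modal-share"],
    "Categories1": ["LDV", "2W"],
}
# Arbitrary values with shape (2 countries, 3 years, 1 variable, 2 categories)
values = np.arange(12, dtype=float).reshape(2, 3, 1, 2)
dm_toy = ToyDataMatrix(values, col_labels, {"tra_passenger_modal-share": "%"})

idx = dm_toy.idx
# LDV values in Austria for all years: one number per year
series = dm_toy.array[idx["Austria"], :, idx["tra_passenger_modal-share"], idx["LDV"]]
print(dm_toy.dim_labels)
print(series)
```

Keeping a single `idx` dictionary means the same lookup style works for every dimension, at the cost of requiring that no label appears in two dimensions.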

You can see the content of datamatrix_ots by executing the code below:

In [40]:
import pprint  # a python package to print nicely
print('dim_labels:')
pprint.pprint(datamatrix_ots.dim_labels)
print('col_labels:')
pprint.pprint(datamatrix_ots.col_labels)
print('units:')
pprint.pprint(datamatrix_ots.units)

The data are stored in the `array`, which is a numpy array of dimensions 32 x 26 x 1. These dimensions stand for 32 countries, 26 years and 1 variable.

In [58]:
print('The datamatrix array has dimensions: ', datamatrix_ots.array.shape)
pprint.pprint(datamatrix_ots.array)

The datamatrix array has dimensions:  (32, 26, 1)
array([[[5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.08365124e+01],
        [5.16395178e+01],
        [5.24425232e+01],
        [5.32455287e+01],
        [5.40485341e+01],
        [5.48515395e+01],
        [5.63789176e+01],
        [5.78048924e+01],
        [5.92119043e+01],
        [6.05989890e+01],
        [6.21693138e+01],
        [6.29922787e+01],
        [6.38152437e+01],
        [6.46382087e+01],
        [6.54611736e+01],
        [6.62841386e+01]],

       [[1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        [1.03792890e+02],
        ...

To access the 'ots_tra_freight_tkm-total-demand' of Switzerland in 2015, use `idx`:


In [56]:
i = datamatrix_ots.idx   # extracts datamatrix_ots.idx dictionary and assigns it to the variable i

swiss_tkm_2015 = datamatrix_ots.array[i['Switzerland'], i[2015], :]  # extract the transport tkm of CH in 2015

unit = datamatrix_ots.units['ots_tra_freight_tkm-total-demand']
print(f'Switzerland freight transport in 2015 was: {swiss_tkm_2015[0]} {unit}')

Switzerland freight transport in 2015 was: 29.317963887158736 bn-tkm
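Note that the cell above indexes the result with `[0]`: slicing the last dimension with `:` keeps it as a one-element array, while passing an index returns a single number. A small sketch with made-up values and a hand-written `idx` dictionary (the label `tkm` is a placeholder, not a calculator variable name):

```python
import numpy as np

# Toy array with shape (2 countries, 2 years, 1 variable); values are made up
array = np.array([[[50.8], [66.3]],
                  [[25.0], [29.3]]])
idx = {"Austria": 0, "Switzerland": 1, 2010: 0, 2015: 1, "tkm": 0}

# ':' on the last axis keeps that axis: the result is an array of shape (1,)
as_slice = array[idx["Switzerland"], idx[2015], :]

# An explicit index on every axis returns a single value
as_scalar = array[idx["Switzerland"], idx[2015], idx["tkm"]]

print("Slice shape:", as_slice.shape)
print("Same value:", as_slice[0] == as_scalar)

# ':' on the Years axis gives the whole time series of one country
swiss_series = array[idx["Switzerland"], :, idx["tkm"]]
print("Swiss series:", swiss_series)
```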


Let's recap the essential code so far:

In [68]:
# Import the database reader and the DataMatrix class
from model.common.io_database import read_database
from model.common.data_matrix_class import DataMatrix

# Initialise filename and lever
file = 'transport_freight-volume'
lever = 'freight_tkm'
years_ots = list(range(1990, 2016))      # 1990, 1991, ..., 2015
years_fts = list(range(2020, 2051, 5))   # 2020, 2025, ..., 2050

# Read the database as tables
df_ots, df_fts = read_database(file, lever, level=1)

# Remove lever column
df_ots.drop(columns=[lever], inplace=True) 
df_fts.drop(columns=[lever], inplace=True)

# Keep only calculator years
df_ots = df_ots[df_ots['Years'].isin(years_ots)].copy()
df_fts = df_fts[df_fts['Years'].isin(years_fts)].copy()

# Create datamatrix from table
datamatrix_ots = DataMatrix.create_from_df(df_ots, num_cat=0)
datamatrix_fts = DataMatrix.create_from_df(df_fts, num_cat=0)

# Remove 'ots_' and 'fts_' prefix
datamatrix_ots.rename_col_regex(str1='ots_', str2='', dim='Variables')
datamatrix_fts.rename_col_regex(str1='fts_', str2='', dim='Variables')

# Append the fts datamatrix to the ots datamatrix to have only one table
datamatrix_ots.append(datamatrix_fts, dim = 'Years')

# Rename datamatrix_ots as dm since now it contains ots and fts
dm = datamatrix_ots.copy()
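The renaming and appending steps can be pictured with plain NumPy. The sketch below is an assumption about what happens conceptually, not the code of `rename_col_regex` or `append`: prefixes are stripped with a regular expression (anchored at the start of the label, which is a choice made here) and the two arrays are joined along the Years axis. Labels and values are invented.

```python
import re
import numpy as np

# Toy label dictionaries and arrays; shape is (1 country, 2 years, 1 variable)
ots_labels = {"Years": [2010, 2015], "Variables": ["ots_tkm"]}
fts_labels = {"Years": [2020, 2025], "Variables": ["fts_tkm"]}
ots_array = np.array([[[1.0], [2.0]]])
fts_array = np.array([[[3.0], [4.0]]])

# Strip the prefixes so both sides share the same variable name
ots_labels["Variables"] = [re.sub(r"^ots_", "", v) for v in ots_labels["Variables"]]
fts_labels["Variables"] = [re.sub(r"^fts_", "", v) for v in fts_labels["Variables"]]
assert ots_labels["Variables"] == fts_labels["Variables"]

# Join along the Years axis (axis 1) and extend the year labels to match
array = np.concatenate([ots_array, fts_array], axis=1)
years = ots_labels["Years"] + fts_labels["Years"]

print(array.shape)
print(array[0, years.index(2025), 0])
```

The prefixes have to be removed first: the variable labels must match on both sides for the joined array to describe a single variable over all years.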

In [72]:
i = dm.idx
print('CH, 2015 data:', dm.array[i['Switzerland'], i[2015], i['tra_freight_tkm-total-demand']])
print('CH, 2050 data:', dm.array[i['Switzerland'], i[2050], i['tra_freight_tkm-total-demand']])  # 2050 comes from the appended fts data

CH, 2015 data: 29.317963887158736
CH, 2050 data: ...


In [77]:
# Plot the datamatrix using datamatrix_plot (defined in DataMatrix class)
dm.datamatrix_plot({'Country': 'all', 'Years': 'all', 'Variables': 'all'}, title = 'Freight tkm')