# MoTMo Sensitivity Analysis

The main goal of sensitivity analysis is to determine how the output variance changes with respect to the inputs. In our model, the outputs are the emissions and the mobility choices, and the inputs are the scenarios. We transformed the input space as described in the report and ended up with three input variables, one per category: investment ($X_I$), policies ($X_P$), and events ($X_E$). Each variable is represented by a binary string, which we further map to the unit interval.

We can view the model as a function $f(X)=Y$, where the inputs are the scenarios, so $X=(X_I,X_P,X_E)\in\mathbb{R}^3$, and the output is a real value, which in our case is either the emissions or one of the mobility choices. We first analyze the emissions and then the mobility choices.

The idea is to estimate certain variances, defined below, in order to compute sensitivity indices (there are first- and second-order sensitivity indices). We write
$$Var(Y)=\sum_{i=1}^dV_i+\sum_{i<j}^dV_{ij}+\ldots+V_{1,2, \ldots,d}$$where
$$V_i=Var_{X_i}(E_{X_{\sim i}}(Y|X_i))$$and $X_{\sim i}$ denotes the set of all variables except $X_i$. One condition for computing these indices is that the input variables $X_i$ are uniformly distributed and mutually independent.
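
As a small illustration of this decomposition, the sketch below evaluates a hypothetical toy function (not the MoTMo model) on a full grid of two independent, uniformly weighted inputs. The function `f` and the grids `levels_1` and `levels_2` are assumptions made only for this example.

```python
import itertools
import statistics

# Hypothetical toy model (not MoTMo): two independent inputs on uniform grids.
levels_1 = [0.0, 1.0, 2.0]
levels_2 = [0.0, 1.0]

def f(x1, x2):
    # additive part plus one interaction term
    return x1 + 2 * x2 + x1 * x2

# Total variance of Y over the full grid (every point has the same weight).
outputs = [f(a, b) for a, b in itertools.product(levels_1, levels_2)]
var_y = statistics.pvariance(outputs)

# V_1 = Var over x1 of E[Y | x1]: average over x2 first, then take the variance.
cond_means_1 = [statistics.fmean([f(a, b) for b in levels_2]) for a in levels_1]
v_1 = statistics.pvariance(cond_means_1)

# V_2 = Var over x2 of E[Y | x2].
cond_means_2 = [statistics.fmean([f(a, b) for a in levels_1]) for b in levels_2]
v_2 = statistics.pvariance(cond_means_2)

# With only two inputs, whatever is left is the interaction term V_12.
v_12 = var_y - v_1 - v_2
print(var_y, v_1, v_2, v_12)
```

Here $V_1=1.5$, $V_2=2.25$ and the remainder $V_{12}=1/6$ comes entirely from the product term, so the three pieces add up to $Var(Y)$.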

In [1]:
import numpy as np
import random
import itertools
import pandas as pd

import MoTmo as mo


### Input Space
For the moment, we take each category as an input variable (investment: $X_I$, policies: $X_P$, events: $X_E$). Each category contains a set of options that can be turned ON or OFF, with the condition that at most two options per category are ON. Thus, each input variable ranges over the set of boolean strings of the given length that have no more than two ones. For example, for investment, $(1,0,0)$ represents any scenario in which *Charging infrastructure* is ON while the others, *Public transport subsidy* and *Electric vehicle subsidy*, are OFF.
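
To make the constraint concrete, here is a minimal sketch that enumerates the boolean strings with at most two ones. The helper `valid_bool_strings` is a hypothetical stand-in written for this example; the notebook itself relies on `mo.valid_scenarios_for_category`, whose implementation is not shown here and may differ.

```python
import itertools

def valid_bool_strings(length, max_on=2):
    """Return all 0/1 tuples of the given length with at most `max_on` ones.

    Hypothetical helper for illustration only.
    """
    return [bits for bits in itertools.product((0, 1), repeat=length)
            if sum(bits) <= max_on]

investment_options = valid_bool_strings(3)  # same length for policies
event_options = valid_bool_strings(4)

print(investment_options)
print(len(investment_options), len(event_options))
```

Under this constraint there are 7 admissible strings of length 3 (everything except `(1, 1, 1)`) and 11 of length 4.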

### Outputs
The natural outputs are the total emissions and the mobility choices: public transport (`stock_P`), electric cars (`stock_E`), combustion cars (`stock_C`), non-motorized (`stock_N`) and shared vehicles (`stock_S`).

For the scope of this report, we did not take each of the options inside the categories as an input variable, because they are not mutually independent (we cannot have more than two options ON). This is left for further research, which could describe the model in more detail. Therefore, the sensitivity analysis performed here suggests how sensitive each output is to the input variables ($X_I, X_P, X_E$). For example, as we will see later, the analysis suggests that the total emissions output is most sensitive to changes in policies ($X_P$), and that policies and events ($X_P$ and $X_E$, resp.) form the most influential pair of factors for this output.

A drawback, as mentioned earlier, is that this analysis does not tell us which options inside the categories are the most influential. For example, the options inside the Policies category are *Car weight regulation*, *Bike friendliness* and *Urban combustion restriction*, but from the results given we cannot say much about which of these options is the most "important" for the output of the model. Performing a sensitivity analysis of the same kind for each of these options not only requires a different theoretical framework, but is also more computationally expensive. There are estimators for the case in which the model is too complex, but given the dependency among these options, we would have to modify them accordingly. Thus, for the moment, we focus on the main category inputs rather than the individual options.

In [6]:
# some global variables
num_options_per_category = {
    'investment' : 3,
    'policy' : 3,
    'event' : 4
}

output_vars = ['stock_C','stock_E','stock_N','stock_P','stock_S','total_emissions']

In [7]:
df2 = pd.read_csv('total_sums_all_variables.csv',index_col=0)
df2

## First-order indices
This index is given by $$S_i= \frac{V_i}{Var(Y)}$$where $V_i=Var_{X_i}(E_{X_{\sim i}}(Y|X_i))$. Notice that this form gives us a direct (possible) interpretation: "it is the fractional reduction in the variance of $Y$ which would be obtained on average if $X_i$ could be fixed".

In other words, the expected value $E_{X_{\sim i}}(Y|X_i)$ averages the output over all other inputs while $X_i$ is held fixed. The following function, `get_input_fix_one`, returns the part of the input space obtained by fixing the input of a given category to `boolean_tuple`; `category` takes the value 0, 1 or 2, representing investment, policies or events.

In [26]:
def get_input_fix_one(boolean_tuple, category):
    # returns every scenario whose entry for `category` equals `boolean_tuple`
    # raise instead of printing, so a wrong length cannot silently return None
    if category == 2 and len(boolean_tuple) != 4:
        raise ValueError("the boolean input does not match the Event length, which is 4 (for example, (0,1,0,0))")
    if category in (0, 1) and len(boolean_tuple) != 3:
        raise ValueError("the boolean input length does not match the category. It must be of length 3 (for example, (1,0,0))")
    input_space = mo.generate_input_space_bool(num_options_per_category)
    reduced_input = [x for x in input_space if x[category] == boolean_tuple]
    return reduced_input
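
The filtering step can also be illustrated without the `MoTmo` module. In the sketch below, `build_input_space` and `fix_one` are hypothetical stand-ins for `mo.generate_input_space_bool` and `get_input_fix_one`; they assume that the input space is the Cartesian product of the admissible strings of each category, which may not match the module's actual implementation.

```python
import itertools

def valid_bool_strings(length, max_on=2):
    # all 0/1 tuples of the given length with at most `max_on` ones
    return [b for b in itertools.product((0, 1), repeat=length)
            if sum(b) <= max_on]

def build_input_space(lengths=(3, 3, 4)):
    # hypothetical stand-in: product of the admissible strings per category
    return list(itertools.product(*(valid_bool_strings(n) for n in lengths)))

def fix_one(space, boolean_tuple, category):
    # keep only the scenarios whose `category` entry equals `boolean_tuple`
    return [x for x in space if x[category] == boolean_tuple]

space = build_input_space()
reduced = fix_one(space, (1, 0, 1), category=0)
print(len(space), len(reduced))
print(reduced[:3])
```

Under this assumption the full space has 7 × 7 × 11 = 539 scenarios, and fixing the investment string leaves 7 × 11 = 77 of them.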

In [14]:
# uncomment the following line to see an example
# get_input_fix_one(boolean_tuple=(1,0,1), category=0)

[((1, 0, 1), (0, 0, 0), (0, 0, 0, 0)),
 ((1, 0, 1), (0, 0, 0), (0, 0, 0, 1)),
 ((1, 0, 1), (0, 0, 0), (0, 0, 1, 0)),
 ((1, 0, 1), (0, 0, 0), (0, 0, 1, 1)),
 ((1, 0, 1), (0, 0, 0), (0, 1, 0, 0)),
 ((1, 0, 1), (0, 0, 0), (0, 1, 0, 1)),
 ((1, 0, 1), (0, 0, 0), (0, 1, 1, 0)),
 ((1, 0, 1), (0, 0, 0), (1, 0, 0, 0)),
 ((1, 0, 1), (0, 0, 0), (1, 0, 0, 1)),
 ((1, 0, 1), (0, 0, 0), (1, 0, 1, 0)),
 ((1, 0, 1), (0, 0, 0), (1, 1, 0, 0)),
 ((1, 0, 1), (0, 0, 1), (0, 0, 0, 0)),
 ((1, 0, 1), (0, 0, 1), (0, 0, 0, 1)),
 ((1, 0, 1), (0, 0, 1), (0, 0, 1, 0)),
 ((1, 0, 1), (0, 0, 1), (0, 0, 1, 1)),
 ((1, 0, 1), (0, 0, 1), (0, 1, 0, 0)),
 ((1, 0, 1), (0, 0, 1), (0, 1, 0, 1)),
 ((1, 0, 1), (0, 0, 1), (0, 1, 1, 0)),
 ((1, 0, 1), (0, 0, 1), (1, 0, 0, 0)),
 ((1, 0, 1), (0, 0, 1), (1, 0, 0, 1)),
 ((1, 0, 1), (0, 0, 1), (1, 0, 1, 0)),
 ((1, 0, 1), (0, 0, 1), (1, 1, 0, 0)),
 ((1, 0, 1), (0, 1, 0), (0, 0, 0, 0)),
 ((1, 0, 1), (0, 1, 0), (0, 0, 0, 1)),
 ((1, 0, 1), (0, 1, 0), (0, 0, 1, 0)),
 ((1, 0, 1), (0, 1, 0), (

The following function computes the variance of the conditional means for a given category and output variable. `sum_df` is the dataframe that contains the sums of all variables.

In [92]:
def get_variance_fixed_xi(category,output_variable, sum_df):
    # computes V_xi of the desired output variable
    expectations = []
    if category == 0 or category == 1:
        num_options = 3
    elif category == 2:
        num_options = 4
    input_bool_category = mo.valid_scenarios_for_category(num_options)
    for input_cat in input_bool_category:
        reduced_boolean_input = get_input_fix_one(input_cat,category)
        mask = [mo.get_scenario_string(x) for x in reduced_boolean_input]
        temp_df = sum_df[[output_variable]].loc[mask]
        expected_xi = temp_df[output_variable].mean()
        expectations.append(expected_xi)
    var_xi = np.var(expectations)
    return var_xi

In [93]:
# example for variable of mobility choice of combustion cars, and policies category (1)
get_variance_fixed_xi(category=1,output_variable="stock_C", sum_df=df2)

5554685331202.368

With the above functions, we can compute the $V_i$. Recall that the first-order indices are given by $$S_i= \frac{V_i}{Var(Y)}$$Thus, the following function computes the indices of all input factors (variables) and stores them in a dataframe.

In [20]:
def first_order_indices(sum_df, output_vars=output_vars):
    index_dict = {"S_I":[],"S_P":[],"S_E":[]}
    sobol_names_dict = {"S_I":0,"S_P":1,"S_E":2}
    for out_var in output_vars:
        var_Y = sum_df[out_var].var()
        ind_list = []
        for S,i in sobol_names_dict.items():
            v_xi = get_variance_fixed_xi(i,out_var,sum_df)
            S_i = v_xi / var_Y
            ind_list.append(S_i)
            index_dict[S].append(S_i)
    df = pd.DataFrame.from_dict(index_dict, orient='index',columns=output_vars)
    return df

In [94]:
first_Si = first_order_indices(sum_df=df2)
first_Si

| | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| S_I | 0.144434 | 0.487024 | 0.514822 | 0.534556 | 0.058319 | 0.005789 |
| S_P | 0.426486 | 0.150791 | 0.380571 | 0.160698 | 0.229341 | 0.9399 |
| S_E | 0.411576 | 0.280636 | 0.089221 | 0.288213 | 0.538039 | 0.042712 |
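
A quick sanity check on the table above: for independent inputs, the first-order indices of one output sum to at most 1, and the remainder is the share of variance due to interactions. The sketch below copies the values from the table by hand; the variable names are chosen only for this example.

```python
# First-order indices (S_I, S_P, S_E) per output, copied from the table above.
first_order = {
    "stock_C": (0.144434, 0.426486, 0.411576),
    "stock_E": (0.487024, 0.150791, 0.280636),
    "stock_N": (0.514822, 0.380571, 0.089221),
    "stock_P": (0.534556, 0.160698, 0.288213),
    "stock_S": (0.058319, 0.229341, 0.538039),
    "total_emissions": (0.005789, 0.9399, 0.042712),
}

# Sum of the main effects, and what is left over for interactions.
column_sums = {name: round(sum(vals), 6) for name, vals in first_order.items()}
interaction_share = {name: round(1 - s, 6) for name, s in column_sums.items()}

for name in first_order:
    print(f"{name:16s} sum S_i = {column_sums[name]:.6f}  "
          f"interactions = {interaction_share[name]:.6f}")
```

All six sums stay below 1. The largest remainder belongs to `stock_S`, which suggests that interactions matter most for shared vehicles.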


#### Interpretation
Refer to the report.

## Second-order indices
The second-order indices measure the interaction of pairs of inputs:
$$S_{ij}=\frac{V_{ij}}{Var(Y)}$$where 
$$V_{ij}=Var_{X_{ij}}(E_{X_{\sim ij}}(Y|X_i,X_j))-V_i-V_j$$
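
The definition can be checked on a hypothetical toy function (not the MoTMo model) with three independent binary inputs. In the sketch below, `f`, `grid` and `closed_variance` are assumptions made for this example. Because the grid is a full factorial, every group has the same size, so the unweighted variance of the group means is the variance of the conditional expectation.

```python
import itertools
import statistics
from collections import defaultdict

def f(x):
    # hypothetical toy model: x2 and x3 interact, x1 acts on its own
    x1, x2, x3 = x
    return float(x1 + x2 * x3)

grid = list(itertools.product((0, 1), repeat=3))
var_y = statistics.pvariance([f(x) for x in grid])

def closed_variance(fixed):
    """Variance, over the fixed inputs, of E[Y | fixed inputs]."""
    groups = defaultdict(list)
    for x in grid:
        groups[tuple(x[k] for k in fixed)].append(f(x))
    return statistics.pvariance([statistics.fmean(v) for v in groups.values()])

# First-order variances V_i (positions are 0-based: 0 -> x1, 1 -> x2, 2 -> x3).
V = {i: closed_variance((i,)) for i in range(3)}

# V_ij = Var(E[Y | X_i, X_j]) - V_i - V_j, then normalise by Var(Y).
S_pair = {
    (i, j): (closed_variance((i, j)) - V[i] - V[j]) / var_y
    for i, j in itertools.combinations(range(3), 2)
}
print(S_pair)
```

Only the pair `(1, 2)`, that is $x_2$ and $x_3$, has a nonzero second-order index (1/7), which matches the single product term in `f`.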


In [31]:
def get_input_fix_two(boolean_tuples, categories):
    # 'boolean_tuples' is a pair of tuples (one for each category)
    # For example, ((1,0,0),(0,1,0)) for the investment and policies categories.
    # Returns the input space obtained by fixing the inputs of 2 categories.
    reduced_space1 = get_input_fix_one(boolean_tuples[0],categories[0])
    reduced_space2 = [x for x in reduced_space1 if x[categories[1]]== boolean_tuples[1]]
    return reduced_space2

def get_all_pairs_two_categ(cat1,cat2):
    # gives us all the combinations of boolean strings of two categories.
    # assumes cat1 < cat2, so only cat2 can be the events category (length 4)
    if cat2<2:
        num_ops_dict={0:3,1:3}
    else:
        num_ops_dict={0:3,1:4}
    all_pairs_categ = mo.generate_input_space_bool(num_ops_dict)
    return all_pairs_categ

In [95]:
# example of an input on the categories of Investment (0) and Policies (1)
get_input_fix_two(boolean_tuples=((1,0,0),(0,0,0)), categories=(0,1))

[((1, 0, 0), (0, 0, 0), (0, 0, 0, 0)),
 ((1, 0, 0), (0, 0, 0), (0, 0, 0, 1)),
 ((1, 0, 0), (0, 0, 0), (0, 0, 1, 0)),
 ((1, 0, 0), (0, 0, 0), (0, 0, 1, 1)),
 ((1, 0, 0), (0, 0, 0), (0, 1, 0, 0)),
 ((1, 0, 0), (0, 0, 0), (0, 1, 0, 1)),
 ((1, 0, 0), (0, 0, 0), (0, 1, 1, 0)),
 ((1, 0, 0), (0, 0, 0), (1, 0, 0, 0)),
 ((1, 0, 0), (0, 0, 0), (1, 0, 0, 1)),
 ((1, 0, 0), (0, 0, 0), (1, 0, 1, 0)),
 ((1, 0, 0), (0, 0, 0), (1, 1, 0, 0))]

In [36]:
def get_Exp_fixed_ij(sum_df, output_variable, cat_tuple=(0,1)):
    bool_cat_tuple=get_all_pairs_two_categ(cat_tuple[0],cat_tuple[1])
    expectations_list=[]
    for boolean_input in bool_cat_tuple:
        reduced_input = get_input_fix_two(boolean_input,cat_tuple)
        mask = [mo.get_scenario_string(x) for x in reduced_input]
        temp_df = sum_df[[output_variable]].loc[mask]
        expected_xij = temp_df[output_variable].mean()
        expectations_list.append(expected_xij)
    variance = np.var(expectations_list) # variance of the given variable/category
    return expectations_list,variance

In [41]:
def get_all_Var_fixed_ij(sum_df,cat_tuple, output_variables = output_vars):
    output_dict={}
    for output_var in output_variables:
        variance_out = get_Exp_fixed_ij(sum_df, output_var, cat_tuple)[1]
        output_dict[output_var] = variance_out
    return output_dict

In [42]:
def second_order_indices(sum_df,output_vars = output_vars):
    categories = ["I", "P", "E"]
    cat_pairs = list(itertools.combinations(categories, 2))
    cat_tuples = list(itertools.combinations(range(3), 2))
    cat_dict = dict(zip(cat_pairs,cat_tuples))
    second_ind_dict={}
    # restrict to output_vars so the order matches v_i, v_j and var_cat
    var_Y = list(sum_df[output_vars].var())
    for cat_pair,cat_tuple in cat_dict.items():
        v_i = [get_variance_fixed_xi(cat_tuple[0],x, sum_df) for x in output_vars]
        v_j = [get_variance_fixed_xi(cat_tuple[1],x, sum_df) for x in output_vars]
        var_cat = list(get_all_Var_fixed_ij(sum_df,cat_tuple,output_vars).values())
        np_S_ij = (np.array(var_cat)-np.array(v_i)-np.array(v_j))/np.array(var_Y)
        var_cat = list(np_S_ij)
        second_ind_dict[cat_pair]=var_cat
    second_ind_dict = pd.DataFrame.from_dict(second_ind_dict, orient='index',columns=output_vars)
    return second_ind_dict

In [96]:
second_Sij = second_order_indices(df2)
second_Sij

| | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| (I, P) | 0.000557 | 0.035216 | 0.000989 | 0.003023 | 0.023244 | 0.000508 |
| (I, E) | 0.006729 | 0.037781 | 0.002485 | 0.004842 | 0.012958 | 0.001033 |
| (P, E) | 0.007902 | 0.004284 | 0.009361 | 0.006122 | 0.084962 | 0.007984 |


## Total-order index
The total-order index $S_{T_i}$ gives us a further interpretation of the interactions. It is given by

$$S_{T_i}=1-\frac{Var_{X_{\sim i}}(E_{X_{i}}(Y|X_{\sim i}))}{Var(Y)}=\frac{E_{X_{\sim i}}(Var_{X_{i}}(Y|X_{\sim i}))}{Var(Y)}$$

The fraction subtracted from 1 is the share of the variance explained by all terms, of any order, that do not include the factor $X_i$. Therefore, $S_{T_i}$ quantifies the *total effect* of the factor $X_i$: its first-order effect plus all variance caused by its interactions with the other factors.
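
The two forms of $S_{T_i}$ agree by the law of total variance. This can be verified on a hypothetical toy function (not the MoTMo model) with three independent binary inputs. The names `f`, `grid` and `total_order_index` below are assumptions made for this example.

```python
import itertools
import statistics
from collections import defaultdict

def f(x):
    # hypothetical toy model with one interaction term
    x1, x2, x3 = x
    return float(x1 + x2 * x3)

grid = list(itertools.product((0, 1), repeat=3))
var_y = statistics.pvariance([f(x) for x in grid])

def total_order_index(i):
    """Return both forms of S_Ti for the input at (0-based) position i."""
    # Group the outputs by the values of all inputs except x_i.
    groups = defaultdict(list)
    for x in grid:
        groups[x[:i] + x[i + 1:]].append(f(x))
    # Form 1: 1 - Var(E[Y | X_~i]) / Var(Y)
    var_of_means = statistics.pvariance(
        [statistics.fmean(v) for v in groups.values()])
    # Form 2: E[Var(Y | X_~i)] / Var(Y)
    mean_of_vars = statistics.fmean(
        [statistics.pvariance(v) for v in groups.values()])
    return 1 - var_of_means / var_y, mean_of_vars / var_y

for i in range(3):
    form_1, form_2 = total_order_index(i)
    print(i, round(form_1, 6), round(form_2, 6))
```

In this toy case the total-order index of $x_1$ is 4/7, and those of $x_2$ and $x_3$ are 2/7 each, half of which comes from their interaction.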

In [56]:
# reuses get_input_fix_two to fix the two categories other than `category`
def exp_but_xi(sum_df, category, output_var):
    categories = list(range(3))
    fixed_cats = [x for x in categories if x != category]
    fixed_bools_tuples = get_all_pairs_two_categ(fixed_cats[0],fixed_cats[1])
    # bool_cat_tuple=get_all_pairs_two_categ(cat_tuple[0],cat_tuple[1])
    expectations_list=[]
    for boolean_tuples in fixed_bools_tuples:
        reduced_input = get_input_fix_two(boolean_tuples,fixed_cats)
        mask = [mo.get_scenario_string(x) for x in reduced_input]
        temp_df = sum_df[[output_var]].loc[mask]
        expected_xij = temp_df[output_var].mean()
        expectations_list.append(expected_xij)
    variance = np.var(expectations_list) # variance of the given variable/category
    return variance,expectations_list

In [86]:
def total_order_indices_xi(sum_df, output_vars=output_vars):
    cat_dict = {"ST_I":0, "ST_P":1, "ST_E":2}
    # total_index_dict = {}
    total_index_list = []
    variance_list = []
    for out_var in output_vars:
        var_Y = sum_df[out_var].var()
        variance_lst = [exp_but_xi(sum_df, category, out_var)[0]/var_Y for category in range(3)]
        S_Ti = [1 - v for v in variance_lst]
        total_index_list.append(S_Ti)
    total_index_list = [[item[i] for item in total_index_list] for i in range(3)]
    total_index_dict = dict(zip(list(cat_dict.keys()),total_index_list))
    # df = pd.DataFrame.from_dict(total_index_dict, orient='index',columns=output_vars)
    return total_index_dict

In [97]:
total_index_dct = total_order_indices_xi(sum_df=df2)
total_ST = pd.DataFrame.from_dict(total_index_dct, orient='index',columns=output_vars)
total_ST

| | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| ST_I | 0.154036 | 0.564289 | 0.520846 | 0.544966 | 0.147658 | 0.009404 |
| ST_P | 0.437261 | 0.194559 | 0.393472 | 0.172389 | 0.390685 | 0.950466 |
| ST_E | 0.428523 | 0.326969 | 0.103618 | 0.301723 | 0.689096 | 0.053803 |
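
With three inputs, each total-order index decomposes as $S_{T_i}=S_i+\sum_{j\neq i}S_{ij}+S_{IPE}$, so the three tables in this notebook can be cross-checked against each other. The sketch below copies the published values by hand and computes the residual $S_{T_i}-S_i-\sum_{j\neq i}S_{ij}$ for each category. If the tables are consistent, the three residuals of an output agree up to rounding and can be read as an estimate of the third-order share $S_{IPE}$. The variable names are chosen only for this example.

```python
outputs = ["stock_C", "stock_E", "stock_N", "stock_P", "stock_S", "total_emissions"]

# Values copied from the first-order, second-order and total-order tables.
S = {
    "I": [0.144434, 0.487024, 0.514822, 0.534556, 0.058319, 0.005789],
    "P": [0.426486, 0.150791, 0.380571, 0.160698, 0.229341, 0.9399],
    "E": [0.411576, 0.280636, 0.089221, 0.288213, 0.538039, 0.042712],
}
S2 = {
    ("I", "P"): [0.000557, 0.035216, 0.000989, 0.003023, 0.023244, 0.000508],
    ("I", "E"): [0.006729, 0.037781, 0.002485, 0.004842, 0.012958, 0.001033],
    ("P", "E"): [0.007902, 0.004284, 0.009361, 0.006122, 0.084962, 0.007984],
}
ST = {
    "I": [0.154036, 0.564289, 0.520846, 0.544966, 0.147658, 0.009404],
    "P": [0.437261, 0.194559, 0.393472, 0.172389, 0.390685, 0.950466],
    "E": [0.428523, 0.326969, 0.103618, 0.301723, 0.689096, 0.053803],
}

residuals = {}
for k, name in enumerate(outputs):
    per_category = []
    for cat in "IPE":
        # sum of the second-order indices of every pair that contains `cat`
        pair_sum = sum(vals[k] for pair, vals in S2.items() if cat in pair)
        per_category.append(round(ST[cat][k] - S[cat][k] - pair_sum, 6))
    residuals[name] = per_category
    print(f"{name:16s} {per_category}")
```

For every output the three residuals coincide to within the rounding of the tables, and `stock_S` again shows the largest higher-order share.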
