# Variance-sensitivity analysis: individual options

The previous analysis covered each category as a whole, but its results do not show how relevant the individual options inside each category are. The main reason we did not analyse the options individually is that their input space (10 different ON/OFF variables) is not mutually independent: whether an option can be ON depends on the other options of its category, because no more than two options per category may be ON at once. The indices can still be computed, but strictly speaking we are not "allowed" to do so, since the variance decomposition shown before requires the input variables to be uniformly distributed and mutually independent, which is not our case.

Nevertheless, we decided to compute the indices over the individual options and to propose our own interpretation. We still need to prove that what follows is consistent with the theory; that is left for future work.

Therefore, our input is $X\in\{0,1\}^{10}$.
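To make the dependence concrete, the sketch below enumerates the constrained input space and compares the joint frequency of `CH` and `SP` with the product of their marginal frequencies. It assumes that the constraint is exactly "at most two options ON per category", that all feasible scenarios are equally weighted, and that the options are ordered by the indices 0 to 9 used in this document; `enumerate_constrained_inputs` is a hypothetical helper and not part of MoTMo.

```python
from itertools import product

# Assumed layout: options 0-2 = investment, 3-5 = policy, 6-9 = event.
CATEGORY_SLICES = [slice(0, 3), slice(3, 6), slice(6, 10)]
MAX_ON_PER_CATEGORY = 2  # assumption: the constraint as stated in the text


def enumerate_constrained_inputs():
    """Return every x in {0,1}^10 with at most two options ON per category."""
    return [
        x
        for x in product((0, 1), repeat=10)
        if all(sum(x[s]) <= MAX_ON_PER_CATEGORY for s in CATEGORY_SLICES)
    ]


inputs = enumerate_constrained_inputs()
n = len(inputs)

# Marginal and joint frequencies of CH (index 0) and SP (index 1).
p_ch = sum(x[0] for x in inputs) / n
p_sp = sum(x[1] for x in inputs) / n
p_ch_and_sp = sum(x[0] * x[1] for x in inputs) / n

print(n)  # 539
print(p_ch * p_sp, p_ch_and_sp)  # the two values differ, so CH and SP are dependent
```

Under these assumptions the space has 539 scenarios; `CH` and `SP` are ON together in 1/7 of them, whereas independence would require 9/49.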

## Describing the options
The following table is a dictionary of the options that we now take as our variables. For further details, check out the [data-explainer document](https://github.com/MoniSoto/MoTMo/blob/main/MoTMo-scenarioDataExplainer.pdf).

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-lboi{border-color:inherit;text-align:left;vertical-align:middle}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
.tg .tg-0lax{text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-fymr">Category</th>
    <th class="tg-fymr">Option</th>
    <th class="tg-fymr">Label</th>
    <th class="tg-0lax"><span style="font-weight:bold">Index</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-lboi" rowspan="3"><span style="font-weight:bold">INVESTMENT</span></td>
    <td class="tg-0pky">Charging infrastructure</td>
    <td class="tg-c3ow">CH</td>
    <td class="tg-baqh">0</td>
  </tr>
  <tr>
    <td class="tg-0pky">Public transport subsidy</td>
    <td class="tg-c3ow">SP</td>
    <td class="tg-baqh">1</td>
  </tr>
  <tr>
    <td class="tg-0pky">Electric vehicle subsidy</td>
    <td class="tg-c3ow">SE</td>
    <td class="tg-baqh">2</td>
  </tr>
  <tr>
    <td class="tg-lboi" rowspan="3"><span style="font-weight:bold">POLICY</span></td>
    <td class="tg-0pky">Car weight regulation</td>
    <td class="tg-c3ow">WE</td>
    <td class="tg-baqh">3</td>
  </tr>
  <tr>
    <td class="tg-0pky">Bike friendliness</td>
    <td class="tg-c3ow">BP</td>
    <td class="tg-baqh">4</td>
  </tr>
  <tr>
    <td class="tg-0pky">Urban combustion restriction</td>
    <td class="tg-c3ow">RE</td>
    <td class="tg-baqh">5</td>
  </tr>
  <tr>
    <td class="tg-lboi" rowspan="4"><span style="font-weight:bold">EVENT</span></td>
    <td class="tg-0pky">Higher gas price</td>
    <td class="tg-c3ow">CO</td>
    <td class="tg-baqh">6</td>
  </tr>
  <tr>
    <td class="tg-0pky">Intermodal digitalisation</td>
    <td class="tg-c3ow">DI</td>
    <td class="tg-baqh">7</td>
  </tr>
  <tr>
    <td class="tg-0pky">EV world market tour</td>
    <td class="tg-c3ow">WO</td>
    <td class="tg-baqh">8</td>
  </tr>
  <tr>
    <td class="tg-0pky">Increased car sharing availability</td>
    <td class="tg-c3ow">CS</td>
    <td class="tg-baqh">9</td>
  </tr>
</tbody>
</table>


In [2]:
import numpy as np
import random
import itertools
import pandas as pd

import MoTmo as mo

In [8]:
# some global variables
num_options_per_category = {
    'investment' : 3,
    'policy' : 3,
    'event' : 4
}

output_vars = ['stock_C','stock_E','stock_N','stock_P','stock_S','total_emissions']
categories_dict = {"CH":0,"SP":0,"SE":0,"WE":1,\
                  "BP":1,"RE":1,"CO":2,"DI":2,"WO":2, "CS":2}

df2 = pd.read_csv('total_sums_all_variables.csv',index_col=0)
df2 # total sum of each of the variables of each of the scenarios.

| Scenario | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| CH0SP0SE0WE0BP0RE0CO0DI0WO0CS0 | 110316117.0 | 1580696.0 | 12936938.0 | 13483558.0 | 1231243.0 | 1.409588e+12 |
| CH0SP0SE0WE0BP0RE0CO0DI0WO0CS1 | 109870629.0 | 1617640.0 | 12918444.0 | 13312396.0 | 1829443.0 | 1.409944e+12 |
| CH0SP0SE0WE0BP0RE0CO0DI0WO1CS0 | 109895224.0 | 3173387.0 | 12438540.0 | 13082043.0 | 959358.0 | 1.408022e+12 |
| CH0SP0SE0WE0BP0RE0CO0DI0WO1CS1 | 109450141.0 | 2962202.0 | 12396586.0 | 12792877.0 | 1946746.0 | 1.410312e+12 |
| CH0SP0SE0WE0BP0RE0CO0DI1WO0CS0 | 106228779.0 | 2143874.0 | 11804095.0 | 18080633.0 | 1291171.0 | 1.385397e+12 |
| ... | ... | ... | ... | ... | ... | ... |
| CH1SP1SE0WE1BP1RE0CO0DI1WO1CS0 | 102048760.0 | 4453709.0 | 10047799.0 | 22287564.0 | 710720.0 | 1.217163e+12 |
| CH1SP1SE0WE1BP1RE0CO1DI0WO0CS0 | 105775528.0 | 1929494.0 | 12405545.0 | 18657163.0 | 780822.0 | 1.241862e+12 |
| CH1SP1SE0WE1BP1RE0CO1DI0WO0CS1 | 105652304.0 | 1725845.0 | 12129573.0 | 18616825.0 | 1424005.0 | 1.245044e+12 |
| CH1SP1SE0WE1BP1RE0CO1DI0WO1CS0 | 104806517.0 | 4241219.0 | 11996635.0 | 17904720.0 | 599461.0 | 1.237094e+12 |
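Each row label encodes a scenario as the ten option labels, each followed by its ON/OFF bit. The sketch below shows one way to build and parse such labels from the nested (investment, policy, event) tuples used in the functions of this document. `scenario_string` and `parse_scenario_string` are hypothetical stand-ins; the real `mo.get_scenario_string` is not shown here and may differ.

```python
# Option labels in index order (0..9), as listed in the options table.
LABELS = ("CH", "SP", "SE", "WE", "BP", "RE", "CO", "DI", "WO", "CS")


def scenario_string(nested_input):
    """Turn ((i0,i1,i2),(p0,p1,p2),(e0,e1,e2,e3)) into e.g. 'CH0SP0...CS1'."""
    flat = [bit for category in nested_input for bit in category]
    if len(flat) != len(LABELS):
        raise ValueError("expected exactly 10 ON/OFF values")
    return "".join(f"{label}{bit}" for label, bit in zip(LABELS, flat))


def parse_scenario_string(label):
    """Inverse direction: recover the flat tuple of ten bits from a label."""
    # Every option takes three characters: a two-letter label and one digit.
    return tuple(int(label[i + 2]) for i in range(0, len(label), 3))


example = ((0, 0, 0), (0, 0, 0), (0, 0, 0, 1))
print(scenario_string(example))  # CH0SP0SE0WE0BP0RE0CO0DI0WO0CS1
```

Such labels can then be used directly as a `.loc` mask on the scenario DataFrame.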


## First-order indices
This index is given by $$S_i= \frac{V_i}{Var(Y)}$$ where $V_i=Var_{X_i}(E_{X_{\sim i}}(Y|X_i))$. This form has a direct interpretation: it is the fractional reduction in the variance of $Y$ that would be obtained, on average, if $X_i$ could be fixed.
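As a quick sanity check of the definition, here is a minimal sketch on a toy model that does satisfy the independence assumption. The model $Y = 2X_1 + X_2$ with two uniform, independent binary inputs is purely illustrative and has nothing to do with the MoTMo data.

```python
from itertools import product

import numpy as np


def model(x1, x2):
    # Toy additive model, chosen only for illustration.
    return 2 * x1 + x2


inputs = list(product((0, 1), repeat=2))  # full, unconstrained input space
y = np.array([model(*x) for x in inputs], dtype=float)
var_y = y.var()  # population variance (ddof=0)


def first_order_index(i):
    """S_i = Var over X_i of E[Y | X_i], divided by Var(Y)."""
    cond_means = [
        np.mean([model(*x) for x in inputs if x[i] == value])
        for value in (0, 1)
    ]
    return np.var(cond_means) / var_y


s1 = first_order_index(0)
s2 = first_order_index(1)
print(s1, s2, s1 + s2)
```

For this additive model the two indices are 0.8 and 0.2 and add up to 1: all the variance is explained by first-order effects.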

The inner expectation $E_{X_{\sim i}}(Y|X_i)$ is the mean of the output over all inputs in which $X_i$ is held fixed. To evaluate it we need the part of the input space in which a given option is fixed to ON or OFF, which we build in two steps. First, `get_category_of_option` maps an option index (0 to 9) to its category (0, 1 or 2, representing investment, policy or event) and to its position inside that category.

In [3]:
def get_category_of_option(option):
    # 'option' is any integer in range(10) representing the
    # option index taken. Since our inputs are separated by tuples (categories)
    # this function returns the category to which it belongs and
    # the index within this category.
    if option >= 0 and option <= 2:
        cat = 0
        op = option % 3
    elif option>2 and option<=5:
        cat = 1
        op = option % 3
    elif option > 5 and option <= 9:
        cat = 2
        op = option % 6
    else:
        raise Exception("ERROR: 'option' must be between 0 and 9")
    return cat, op

In [7]:
# Example of the previous function
get_category_of_option(5)

(1, 2)
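The same mapping can also be derived from the category sizes instead of hard-coded ranges. The following is only a sketch of that alternative; `category_and_local_index` is a hypothetical name, and the default sizes (3, 3, 4) are taken from `num_options_per_category`.

```python
def category_and_local_index(option, sizes=(3, 3, 4)):
    """Map a global option index to (category, index within the category)."""
    if not 0 <= option < sum(sizes):
        raise ValueError("'option' must be between 0 and 9")
    offset = 0  # global index of the first option of the current category
    for category, size in enumerate(sizes):
        if option < offset + size:
            return category, option - offset
        offset += size


mapping = [category_and_local_index(o) for o in range(10)]
print(mapping[5])  # (1, 2)
```

This variant keeps working unchanged if a category gains or loses an option, as long as `sizes` is updated.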

Now we need the set of all inputs in which a specific option is fixed to a given value. For example, if we choose option 5 (Urban combustion restriction, `RE`) with a value of 0, we need to retrieve the part of the input space in which this option is turned OFF. This is what the following function does.

In [4]:
def get_input_fix_one_option(is_on, option):
    # 'is_on' is 0 or 1 (OFF or ON);
    # 'option' is any number between 0 and 9 (indices of the options) 
    vals_is_on = [0,1]
    if is_on not in vals_is_on:
        raise Exception("ERROR: 'is_on' must be either 0 or 1!!")
    else:
        input_space = mo.generate_input_space_bool(num_options_per_category)
        category,option = get_category_of_option(option)
        reduced_input = [x for x in input_space if x[category][option]== is_on]
        return reduced_input

In [6]:
# uncomment to see an example of this input space
# get_input_fix_one_option(is_on=0, option=5)
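To see what such a reduced input space looks like, the sketch below rebuilds a nested input space and filters it by one option. It assumes the "at most two ON per category" constraint and the nested-tuple layout suggested by the `x[category][option]` access above; `nested_input_space` is a hypothetical stand-in for `mo.generate_input_space_bool`, whose actual output is not shown here.

```python
from itertools import product

SIZES = (3, 3, 4)  # investment, policy, event


def feasible_category_states(size, max_on=2):
    """All ON/OFF tuples of one category with at most `max_on` options ON."""
    return [t for t in product((0, 1), repeat=size) if sum(t) <= max_on]


def nested_input_space():
    """Cartesian product of the feasible states of the three categories."""
    return list(product(*(feasible_category_states(s) for s in SIZES)))


def fix_option(space, category, local_index, is_on):
    """Keep only the inputs in which the chosen option equals `is_on`."""
    return [x for x in space if x[category][local_index] == is_on]


space = nested_input_space()
re_off = fix_option(space, category=1, local_index=2, is_on=0)  # RE OFF
re_on = fix_option(space, category=1, local_index=2, is_on=1)   # RE ON
print(len(space), len(re_off), len(re_on))  # 539 308 231
```

Under these assumptions the ON and OFF subsets have different sizes, which is another consequence of the constraint on the input space.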

### Variances
The following function computes $V_i$ for a given option $i\in\{0,1,\ldots,9\}$ and a specific output variable (it can be `total_emissions` or any of the mobility choices *public transport* (`stock_P`), *electric cars* (`stock_E`), *combustion cars* (`stock_C`), *non-motorized* (`stock_N`) and *shared vehicles* (`stock_S`)).

Recall: $V_i=Var_{X_i}(E_{X_{\sim i}}(Y|X_i))$

In [9]:
def get_variance_fixed_xi(option,output_variable, sum_df):
    # computes V_xi of the desired output variable
    expectations = []
    
    for is_on in range(0,2):
        reduced_boolean_input = get_input_fix_one_option(is_on,option)
        mask = [mo.get_scenario_string(x) for x in reduced_boolean_input]
        temp_df = sum_df[[output_variable]].loc[mask]
        expected_xi = temp_df[output_variable].mean()
        expectations.append(expected_xi)
    var_xi = np.var(expectations)
    return var_xi

In [10]:
# example: option 0 (CH, charging infrastructure) and the electric-cars output (stock_E)
get_variance_fixed_xi(option=0,output_variable="stock_E", sum_df=df2)

1350077436712.7993

In [13]:
def first_order_indices(sum_df, output_vars=output_vars):
    index_dict = {"S_CH":[],"S_SP":[],"S_SE":[],"S_WE":[],\
                  "S_BP":[],"S_RE":[],"S_CO":[],"S_DI":[],"S_WO":[], "S_CS":[]}
    for out_var in output_vars:
        var_Y = sum_df[out_var].var()
        ind_list = []
        for option in range(0,10):
            v_xi = get_variance_fixed_xi(option,out_var,sum_df)
            S_i = v_xi / var_Y
            ind_list.append(S_i)
            option_string = list(categories_dict.keys())[option]
            index_dict["S_"+option_string].append(S_i)
    df = pd.DataFrame.from_dict(index_dict, orient='index',columns=output_vars)
    return df

In [14]:
first_Si = first_order_indices(sum_df=df2)
first_Si

| Index | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| S_CH | 0.00425 | 0.362661 | 0.004648 | 0.03741 | 0.009164 | 0.000517 |
| S_SP | 0.130453 | 0.164878 | 0.518079 | 0.540048 | 0.040419 | 0.004633 |
| S_SE | 0.00467 | 0.017744 | 0.004741 | 0.016618 | 0.001882 | 0.000585 |
| S_WE | 0.036342 | 0.017142 | 0.006265 | 0.000705 | 0.033877 | 0.909756 |
| S_BP | 0.007951 | 0.016392 | 0.384052 | 0.077115 | 0.040461 | 0.008095 |
| S_RE | 0.427839 | 0.141627 | 0.028125 | 0.113267 | 0.193846 | 0.000825 |
| S_CO | 0.020548 | 0.003635 | 0.005907 | 0.002511 | 0.012938 | 0.002109 |
| S_DI | 0.36955 | 0.002035 | 0.086154 | 0.283067 | 0.006562 | 0.038261 |
| S_WO | 0.006716 | 0.244843 | 0.002047 | 0.037777 | 0.028683 | 0.000907 |
| S_CS | 0.022793 | 0.042634 | 0.003294 | 0.015948 | 0.56625 | 0.005732 |


## Second-order indices
The second-order indices measure the interaction of pairs of inputs:
$$S_{ij}=\frac{V_{ij}}{Var(Y)}$$where 
$$V_{ij}=Var_{X_{ij}}(E_{X_{\sim ij}}(Y|X_i,X_j))-V_i-V_j$$
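A minimal sketch of this definition on a toy model with a pure interaction, $Y = X_1 X_2$, may help. The model and the uniform, independent binary inputs are illustrative assumptions and unrelated to the MoTMo data.

```python
from itertools import product

import numpy as np


def model(x1, x2):
    # Toy model with an interaction term, chosen only for illustration.
    return x1 * x2


inputs = list(product((0, 1), repeat=2))
var_y = np.var([model(*x) for x in inputs])  # population variance (ddof=0)


def var_of_conditional_mean(fixed):
    """Variance, over the fixed inputs, of E[Y | fixed inputs]."""
    means = []
    for values in product((0, 1), repeat=len(fixed)):
        subset = [
            model(*x)
            for x in inputs
            if all(x[i] == v for i, v in zip(fixed, values))
        ]
        means.append(np.mean(subset))
    return np.var(means)


v1 = var_of_conditional_mean((0,))
v2 = var_of_conditional_mean((1,))
v12 = var_of_conditional_mean((0, 1)) - v1 - v2

s1, s2, s12 = v1 / var_y, v2 / var_y, v12 / var_y
print(s1, s2, s12)
```

Here each of the three indices equals 1/3: one third of the variance is attributable to the interaction alone, and the three terms add up to 1 because the inputs are independent.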


In [15]:
def get_input_fix_two(values, options):
    # values is a pair of 0 or 1 (on or off) and
    # options is also a pair such that each value
    # is between 0 and 9 (represent the options)
    if options[0]==options[1]:
        raise Exception('Options must be different!')
    cat1, opt1 = get_category_of_option(options[0])
    cat2, opt2 = get_category_of_option(options[1])
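    # get_input_fix_one_option expects the global option index (0-9), not the index within the category
    reduced_space1 = get_input_fix_one_option(values[0], options[0])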
    reduced_space2 = [x for x in reduced_space1 if x[cat2][opt2]== values[1]]
    return reduced_space2

In [16]:
def get_Exp_fixed_ij(sum_df, output_variable, opt_tuple):
    bool_opt_pairs=list(itertools.product([0,1], repeat=2))
    expectations_list=[]
    for bool_vals in bool_opt_pairs:
        reduced_input = get_input_fix_two(bool_vals,opt_tuple)
        mask = [mo.get_scenario_string(x) for x in reduced_input]
        temp_df = sum_df[[output_variable]].loc[mask]
        expected_xij = temp_df[output_variable].mean()
        expectations_list.append(expected_xij)
    variance = np.var(expectations_list) # variance of the given variable/category
    return expectations_list,variance

In [17]:
def get_all_Var_fixed_ij(sum_df,opt_tuple, output_variables = output_vars):
    output_dict={}
    for output_var in output_variables:
        variance_out = get_Exp_fixed_ij(sum_df, output_var, opt_tuple)[1]
        output_dict[output_var] = variance_out
    return output_dict

In [18]:
def second_order_indices(sum_df,output_vars = output_vars):
    options_str = list(categories_dict.keys())
    op_pairs = list(itertools.combinations(options_str, 2))
    # op_tuples = list(itertools.combinations(range(10), 2))
    # cat_dict = dict(zip(cat_pairs,cat_tuples))
    # print(op_pairs)
    second_ind_dict={}
    var_Y = list(sum_df.var())
    for pair in op_pairs:
        o_i = options_str.index(pair[0])
        o_j = options_str.index(pair[1])
        v_i = [get_variance_fixed_xi(o_i,x, sum_df) for x in output_vars]
        v_j = [get_variance_fixed_xi(o_j,x, sum_df) for x in output_vars]
        var_cat = list(get_all_Var_fixed_ij(sum_df,opt_tuple=(o_i,o_j)).values())
        np_S_ij = (np.array(var_cat)-np.array(v_i)-np.array(v_j))/np.array(var_Y)
        var_cat = list(np_S_ij)
        second_ind_dict[pair]=var_cat
    second_ind_dict = pd.DataFrame.from_dict(second_ind_dict, orient='index',columns=output_vars)
    return second_ind_dict

In [19]:
# this might take a few seconds
second_Sij = second_order_indices(df2)
second_Sij

| Pair | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| (CH, SP) | 0.023871 | -0.123461 | 0.002244 | -0.048716 | 0.013733 | 0.001357 |
| (CH, SE) | 0.004945 | 0.155473 | 0.04442 | 0.086271 | -0.00055 | -3e-06 |
| (CH, WE) | 0.000543 | -0.004311 | 1.7e-05 | -0.000518 | -0.001254 | -0.001163 |
| (CH, BP) | -6e-06 | -0.003964 | -0.000933 | -0.002164 | 2.7e-05 | -1.4e-05 |
| (CH, RE) | 0.003031 | 0.081323 | 0.000268 | 0.000235 | -0.002162 | 4.9e-05 |
| (CH, CO) | 0.00074 | 0.014405 | -0.00014 | 0.000555 | 0.000697 | 3.5e-05 |
| (CH, DI) | -0.004139 | 0.012739 | -0.000298 | -0.00034 | -0.001222 | -0.000612 |
| (CH, WO) | 0.003567 | 0.097189 | -4.1e-05 | 0.001909 | -0.000426 | 0.000328 |
| (CH, CS) | 0.000553 | -0.015041 | 0.000306 | -0.002247 | -0.004195 | 2.8e-05 |
| (SP, SE) | -0.008989 | 0.027183 | 0.002328 | -0.011799 | -0.003648 | -0.000788 |


## Total-order index
The total-order index $S_{T_i}$ gives us a further interpretation of the interactions. It is given by

$$S_{T_i}=1-\frac{Var_{X_{\sim i}}(E_{X_{i}}(Y|X_{\sim i}))}{Var(Y)}=\frac{E_{X_{\sim i}}(Var_{X_{i}}(Y|X_{\sim i}))}{Var(Y)}$$

The variance in the first form, $Var_{X_{\sim i}}(E_{X_{i}}(Y|X_{\sim i}))$, collects the contribution of all terms of any order that do not include the factor $X_i$. Subtracting its share from 1 therefore leaves the *total effect* of $X_i$: its first-order effect plus all the variance caused by its interactions with the other factors.
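The two forms of $S_{T_i}$ can be compared numerically on a small example. The sketch below uses the toy model $Y = X_1 X_2$ with uniform, independent binary inputs; both the model and the independence are illustrative assumptions that do not hold for the MoTMo input space.

```python
from itertools import product

import numpy as np


def model(x1, x2):
    # Toy model with an interaction term, chosen only for illustration.
    return x1 * x2


inputs = list(product((0, 1), repeat=2))
var_y = np.var([model(*x) for x in inputs])  # population variance (ddof=0)


def total_order_index(i):
    """Return S_Ti computed with both forms of the definition."""
    other = 1 - i  # with two inputs, X_~i is just the other input
    # Outputs grouped by the value of X_~i; X_i varies inside each group.
    groups = [[model(*x) for x in inputs if x[other] == v] for v in (0, 1)]
    form_a = 1 - np.var([np.mean(g) for g in groups]) / var_y
    form_b = np.mean([np.var(g) for g in groups]) / var_y
    return form_a, form_b


st1_a, st1_b = total_order_index(0)
print(st1_a, st1_b)
```

Both forms give 2/3 for this model, which is the first-order index of $X_1$ (1/3) plus its interaction with $X_2$ (1/3).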

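The following function, `exp_but_xi`, computes the expectations $E_{X_{i}}(Y|X_{\sim i})$ and returns their variance together with the list of expectations. For every input with option $i$ OFF, it averages the output with that of the same input with option $i$ ON, whenever the latter belongs to the (constrained) input space.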

In [20]:
# use get_input_fix_two!!
def exp_but_xi(sum_df, option, output_var):
    input_space = mo.generate_input_space_bool(num_options_per_category)
    # input_sp0 is the input space when the given option is OFF (0)
    input_sp0 = get_input_fix_one_option(0,option)
    expectations_list=[]
    for inp_bool in input_sp0:
        cat, op = get_category_of_option(option)
        tuple_cat_option = inp_bool[cat]
        invert_tuple_op = [tuple_cat_option[i] if i!=op else 1 \
                           for i in range(len(tuple_cat_option))]
        inp_bool2 = tuple([inp_bool[i] if i!=cat \
                           else tuple(invert_tuple_op) for i in range(3)])
        if inp_bool2 in input_space:
            reduced_input = [inp_bool,inp_bool2]
        else:
            reduced_input = [inp_bool]
        mask = [mo.get_scenario_string(x) for x in reduced_input]
        temp_df = sum_df[[output_var]].loc[mask]
        expected_xij = temp_df[output_var].mean()
        expectations_list.append(expected_xij)
    variance = np.var(expectations_list) # variance of the means of the given variable/option
    return variance,expectations_list

In [21]:
def total_order_indices_xi(sum_df, output_vars=output_vars):
    cat_dict = categories_dict
    # total_index_dict = {}
    total_index_list = []
    variance_list = []
    for out_var in output_vars:
        var_Y = sum_df[out_var].var()
        variance_lst = [exp_but_xi(sum_df, option, out_var)[0]/var_Y for option in range(10)]
        S_Ti = [1 - v for v in variance_lst]
        total_index_list.append(S_Ti)
    total_index_list = [[item[i] for item in total_index_list] for i in range(10)]
    total_index_dict = dict(zip(list(cat_dict.keys()),total_index_list))
    # df = pd.DataFrame.from_dict(total_index_dict, orient='index',columns=output_vars)
    return total_index_dict

In [22]:
# this might take a few seconds
total_index_dct = total_order_indices_xi(sum_df=df2)
total_ST = pd.DataFrame.from_dict(total_index_dct, orient='index',columns=output_vars)
total_ST

| Option | stock_C | stock_E | stock_N | stock_P | stock_S | total_emissions |
|---|---|---|---|---|---|---|
| CH | 0.011368 | 0.370332 | -0.004088 | -0.022949 | 0.064896 | 0.003727 |
| SP | 0.143388 | -0.109842 | 0.462588 | 0.41792 | 0.093288 | 0.006966 |
| SE | -0.027332 | 0.064727 | -0.000282 | 0.002439 | 0.064198 | 0.005993 |
| WE | -0.030094 | -0.029914 | -0.012231 | -0.003406 | -0.010415 | 0.86681 |
| BP | 0.002378 | -0.010919 | 0.310333 | 0.021766 | 0.088021 | -0.040715 |
| RE | 0.347019 | 0.157609 | -0.023314 | 0.080273 | 0.235909 | 0.026722 |
| CO | 0.039716 | -0.004643 | 0.025471 | 0.005306 | 0.013687 | -0.006311 |
| DI | 0.368878 | -0.031948 | 0.072524 | 0.222789 | -0.089141 | 0.047072 |
| WO | -0.047191 | 0.266794 | 0.014516 | -0.045872 | -0.052945 | 0.020869 |
| CS | -0.058555 | -0.127523 | 0.017825 | -0.048027 | 0.539941 | 0.022708 |


## Summary of results
For simplicity, we focus mainly on the first-order and total-order indices, and interpret only some of the output variables:

For the variable **combustion cars** (`stock_C`), both the first-order and the total-order indices indicate that the most influential options are `RE` (Urban combustion restriction) and `DI` (Intermodal digitalisation). The indices do not tell us whether agents choose these vehicles more or less often; empirical results showed that turning ON either of these two options not only decreases this mobility choice, but does so faster than any other option. These results are not surprising, as both options make owning a combustion car "harder".

In the same fashion, for the choice of **electric vehicles** the most influential options are `CH` (Charging infrastructure) and `WO` (Electric vehicle world market tour). These two options make sense if the stakeholders' goal is to increase the use of this type of vehicle.

Finally, let us take a look at **total emissions**: there is a clear winner, `WE` (Car weight regulation). This option is by far the most influential on the total CO$_2$ emissions produced, since a car's weight and size are directly related to its emissions.
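The rankings quoted in this summary can be read off the first-order table programmatically. The sketch below re-enters three columns of that table by hand (values copied from the output above) and picks the two largest indices per output variable; the variable names `first_si_subset` and `top_two` are only illustrative.

```python
import pandas as pd

options = ["CH", "SP", "SE", "WE", "BP", "RE", "CO", "DI", "WO", "CS"]

# First-order indices copied from the table above (three output variables only).
first_si_subset = pd.DataFrame(
    {
        "stock_C": [0.00425, 0.130453, 0.00467, 0.036342, 0.007951,
                    0.427839, 0.020548, 0.36955, 0.006716, 0.022793],
        "stock_E": [0.362661, 0.164878, 0.017744, 0.017142, 0.016392,
                    0.141627, 0.003635, 0.002035, 0.244843, 0.042634],
        "total_emissions": [0.000517, 0.004633, 0.000585, 0.909756, 0.008095,
                            0.000825, 0.002109, 0.038261, 0.000907, 0.005732],
    },
    index=options,
)

# Two most influential options per output variable, by first-order index.
top_two = {
    column: first_si_subset[column].nlargest(2).index.tolist()
    for column in first_si_subset.columns
}
print(top_two)
```

With the full `first_Si` DataFrame, the same `nlargest` call works after stripping the `S_` prefix from the index.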

We have a hypothesis about the negative indices: we believe that they can be read in the same way as the positive ones, but inverted. For example, for `stock_C` the option with the most negative total-order index is `CS` (Increased car sharing availability). While `RE` and `DI` are ON, the combustion choices decline (much) faster than when they are OFF; whereas when `CS` is  
