# Sensitivity Analysis

We use only three variables, $I$, $P$ and $E$. Each is a boolean string containing at most two ones, and together the three strings form a valid scenario.

In [1]:
import pandas as pd
import sensitivity_analysis as sa

## Creating input space
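- $I$ and $P$ are boolean arrays of size 3 with at most two ones each, which gives $2^3-1=7$ options per variable (only $(1,1,1)$ is excluded).
- $E$ is a boolean array of size 4 with at most two ones, which gives $\binom{4}{0}+\binom{4}{1}+\binom{4}{2}=11$ options.

In total, there are $7\cdot 7\cdot 11=539$ possible scenarios.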
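
The count can be checked by brute force. The following is a minimal sketch, assuming the only constraint is the "no more than 2 ones" rule from the introduction; `valid_strings` is a hypothetical helper and not part of `sensitivity_analysis`.

```python
from itertools import product


def valid_strings(length, max_ones=2):
    """Return all 0/1 tuples of the given length with at most `max_ones` ones."""
    return [bits for bits in product((0, 1), repeat=length) if sum(bits) <= max_ones]


investment = valid_strings(3)
policy = valid_strings(3)
event = valid_strings(4)

# A scenario is one choice from each category.
scenarios = list(product(investment, policy, event))
print(len(investment), len(policy), len(event), len(scenarios))  # 7 7 11 539
```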

In [2]:
num_options_per_category = {
    'investment' : 3,
    'policy' : 3,
    'event' : 4
}

In [3]:
input_space = sa.generate_input_space_bool(num_options_per_category)
input_space

[((0, 0, 0), (0, 0, 0), (0, 0, 0, 0)),
 ((0, 0, 0), (0, 0, 0), (0, 0, 0, 1)),
 ((0, 0, 0), (0, 0, 0), (0, 0, 1, 0)),
 ((0, 0, 0), (0, 0, 0), (0, 0, 1, 1)),
 ((0, 0, 0), (0, 0, 0), (0, 1, 0, 0)),
 ((0, 0, 0), (0, 0, 0), (0, 1, 0, 1)),
 ((0, 0, 0), (0, 0, 0), (0, 1, 1, 0)),
 ((0, 0, 0), (0, 0, 0), (1, 0, 0, 0)),
 ((0, 0, 0), (0, 0, 0), (1, 0, 0, 1)),
 ((0, 0, 0), (0, 0, 0), (1, 0, 1, 0)),
 ((0, 0, 0), (0, 0, 0), (1, 1, 0, 0)),
 ((0, 0, 0), (0, 0, 1), (0, 0, 0, 0)),
 ((0, 0, 0), (0, 0, 1), (0, 0, 0, 1)),
 ((0, 0, 0), (0, 0, 1), (0, 0, 1, 0)),
 ((0, 0, 0), (0, 0, 1), (0, 0, 1, 1)),
 ((0, 0, 0), (0, 0, 1), (0, 1, 0, 0)),
 ((0, 0, 0), (0, 0, 1), (0, 1, 0, 1)),
 ((0, 0, 0), (0, 0, 1), (0, 1, 1, 0)),
 ((0, 0, 0), (0, 0, 1), (1, 0, 0, 0)),
 ((0, 0, 0), (0, 0, 1), (1, 0, 0, 1)),
 ((0, 0, 0), (0, 0, 1), (1, 0, 1, 0)),
 ((0, 0, 0), (0, 0, 1), (1, 1, 0, 0)),
 ((0, 0, 0), (0, 1, 0), (0, 0, 0, 0)),
 ((0, 0, 0), (0, 1, 0), (0, 0, 0, 1)),
 ((0, 0, 0), (0, 1, 0), (0, 0, 1, 0)),
 ((0, 0, 0), (0, 1, 0), (

### Input space maps (bool <-> decimal)

In [4]:
bool_to_dec_dicts = sa.input_space_bool_to_decimal_dicts(num_options_per_category)
bool_to_dec_dicts

{'investment': {(0, 0, 0): 0.0,
  (0, 0, 1): 0.16666666666666666,
  (0, 1, 0): 0.3333333333333333,
  (0, 1, 1): 0.5,
  (1, 0, 0): 0.6666666666666666,
  (1, 0, 1): 0.8333333333333334,
  (1, 1, 0): 1.0},
 'policy': {(0, 0, 0): 0.0,
  (0, 0, 1): 0.16666666666666666,
  (0, 1, 0): 0.3333333333333333,
  (0, 1, 1): 0.5,
  (1, 0, 0): 0.6666666666666666,
  (1, 0, 1): 0.8333333333333334,
  (1, 1, 0): 1.0},
 'event': {(0, 0, 0, 0): 0.0,
  (0, 0, 0, 1): 0.1,
  (0, 0, 1, 0): 0.2,
  (0, 0, 1, 1): 0.3,
  (0, 1, 0, 0): 0.4,
  (0, 1, 0, 1): 0.5,
  (0, 1, 1, 0): 0.6,
  (1, 0, 0, 0): 0.7,
  (1, 0, 0, 1): 0.8,
  (1, 0, 1, 0): 0.9,
  (1, 1, 0, 0): 1.0}}

In [5]:
dec_to_bool_dicts = sa.input_space_decimal_to_bool_dicts(num_options_per_category)
# dec_to_bool_dicts[]
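
The printed values are consistent with ranking the valid strings of a category in lexicographic order and dividing each rank by the number of options minus one. The sketch below reproduces the numbers under that assumption. How `sa` actually builds the dictionaries is not shown here, and `bool_to_decimal_map` is a hypothetical name.

```python
from itertools import product


def bool_to_decimal_map(length, max_ones=2):
    """Map each valid boolean tuple to an evenly spaced value in [0, 1]."""
    options = [b for b in product((0, 1), repeat=length) if sum(b) <= max_ones]
    # rank 0 maps to 0.0 and the last rank maps to 1.0
    return {bits: rank / (len(options) - 1) for rank, bits in enumerate(options)}


investment_map = bool_to_decimal_map(3)
event_map = bool_to_decimal_map(4)

# The inverse direction (decimal -> bool) is a plain dictionary inversion.
event_inverse = {value: bits for bits, value in event_map.items()}
```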

### Variance decomposition
We view the model as a function $f(X)=Y$, where the input $X\in\mathbb{R}^d$ encodes a scenario and the output $Y$ is a real value, which in our case can be the emissions or the mobility choices. We first analyze the (overall) emissions; once that is implemented, the same procedure can be applied to the other outputs.

The idea is to estimate certain variances, defined below, and use them to compute sensitivity indices (there are first- and second-order sensitivity indices). We write
$$Var(Y)=\sum_{i=1}^dV_i+\sum_{i<j}^dV_{ij}+\ldots+V_{1,2, \ldots,d}$$
where
$$V_i=Var_{x_i}(E_{X_{\sim i}}(Y|x_i))$$
and the $X_{\sim i}$ notation means the set of all variables except $x_i$.
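
As a worked instance, assume the three categories are encoded as $d=3$ independent inputs labelled $I$, $P$ and $E$ (the labelling is an assumption made for illustration). The decomposition then has seven terms:
$$Var(Y)=V_I+V_P+V_E+V_{IP}+V_{IE}+V_{PE}+V_{IPE}$$
The second-order terms follow the usual pattern $V_{ij}=Var_{x_i,x_j}(E_{X_{\sim ij}}(Y|x_i,x_j))-V_i-V_j$, so each one measures the part of the variance caused by the interaction of $x_i$ and $x_j$ that their individual effects do not already explain.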

### First Order Index
$$S_i= \frac{V_i}{Var(Y)}$$
We can interpret it as follows: "the fractional reduction in the variance of $Y$ which would be obtained on average if $x_i$ could be fixed".
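
When the input space is small enough to enumerate, $S_i$ can be computed exactly from the definition. The sketch below does this for a toy additive function on a grid, assuming every grid point is equally likely. The function and grid are illustrative and are not the model studied here.

```python
import numpy as np


def first_order_indices_exact(y):
    """Exact first-order indices for an output array with one axis per input.

    Assumes all grid points are equally likely (uniform weighting).
    """
    total_var = y.var()
    indices = []
    for i in range(y.ndim):
        other_axes = tuple(a for a in range(y.ndim) if a != i)
        cond_mean = y.mean(axis=other_axes)  # E(Y | x_i) for each value of x_i
        indices.append(cond_mean.var() / total_var)  # V_i / Var(Y)
    return indices


# Toy model: Y = x1 + 2 * x2 on a 7 x 7 grid.
x1 = np.linspace(0, 1, 7)
x2 = np.linspace(0, 1, 7)
y = x1[:, None] + 2 * x2[None, :]

S = first_order_indices_exact(y)
print(S)  # approximately [0.2, 0.8]
```

Because the toy function is additive, there are no interaction terms and the first-order indices sum to one.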


## Estimating
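To compute the variance, we can use the following estimator:
$$Var_{x_i}(E_{X_{\sim i}}(Y|x_i))\approx \frac{1}{N}\sum_{j=1}^Nf(B)_j\left(f(A_B^i)_j-f(A)_j\right)$$
Here $A$ and $B$ are two independent $N\times d$ sample matrices, $A_B^i$ is $A$ with its $i$-th column replaced by the $i$-th column of $B$, and $f(\cdot)_j$ is the model output for row $j$.

First we obtain the model outputs, then we compute the variance.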
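
The estimator can be sketched with NumPy as follows. This is an illustration on a toy function with continuous uniform inputs, not the implementation in `sensitivity_analysis`. The names `estimate_first_order` and `toy_model` are hypothetical, and estimating $Var(Y)$ from the pooled outputs of $A$ and $B$ is one possible choice among several.

```python
import numpy as np


def estimate_first_order(f, d, n, rng):
    """Estimate first-order indices S_i with the A / B / A_B^i sampling scheme."""
    A = rng.random((n, d))  # first independent sample matrix
    B = rng.random((n, d))  # second independent sample matrix
    f_A, f_B = f(A), f(B)
    var_y = np.var(np.concatenate([f_A, f_B]))  # pooled estimate of Var(Y)

    S = np.empty(d)
    for i in range(d):
        A_B_i = A.copy()
        A_B_i[:, i] = B[:, i]  # A with column i taken from B
        V_i = np.mean(f_B * (f(A_B_i) - f_A))
        S[i] = V_i / var_y
    return S


def toy_model(X):
    """Hypothetical additive model: Y = x1 + 2 * x2."""
    return X[:, 0] + 2 * X[:, 1]


rng = np.random.default_rng(0)
S = estimate_first_order(toy_model, d=2, n=200_000, rng=rng)
print(S)  # close to the analytic values 0.2 and 0.8
```

For this toy model the analytic indices are $S_1=0.2$ and $S_2=0.8$, which gives a quick sanity check on the estimator before applying it to real outputs.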

## Run Sensitivity Analysis Experiment
We compute the sensitivity indices for each category over multiple runs, each with a given number of samples.
We then compute the mean and the 95% confidence interval for each category.
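
One common way to summarize repeated runs is the sample mean with a normal-approximation interval for that mean. The sketch below assumes this definition. Whether `sa.sensitivity_analysis_experiment` computes its `S1_conf` value this way is not stated here, and `summarize_runs` is a hypothetical helper.

```python
import numpy as np


def summarize_runs(estimates, z=1.96):
    """Return the mean and the half-width of a 95% normal-approximation interval."""
    estimates = np.asarray(estimates, dtype=float)
    mean = estimates.mean()
    # standard error of the mean, scaled by the normal quantile
    half_width = z * estimates.std(ddof=1) / np.sqrt(len(estimates))
    return mean, half_width


# Placeholder values for illustration only, not results from the model.
mean, half_width = summarize_runs([0.1, 0.2, 0.3])
print(f"{mean:.3f} +/- {half_width:.3f}")
```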

In [6]:
num_samples = 100

In [7]:
num_runs = 1000

### Total Emissions

In [9]:
total_emissions_df = pd.read_csv("total_emissions.csv")
# df = pd.read_csv("all_metrics.csv")

In [None]:
sa_result_emissions = sa.sensitivity_analysis_experiment(num_samples, num_runs, total_emissions_df, 'total_emissions')

In [None]:
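def create_results_dataframe(results):
    """Collect the per-category sensitivity results into a single DataFrame.

    `results` maps each category name to a dict with keys 'S1' and 'S1_conf'.
    """
    df_all_categories = []
    for category_str, stats in results.items():
        # one single-row frame per category
        df_category = pd.DataFrame({'category': [category_str],
                                    'S1' : [stats['S1']],
                                    'S1_conf' : [stats['S1_conf']]})
        df_all_categories.append(df_category)
    # stack the rows and renumber the index
    df_SA = pd.concat(df_all_categories, ignore_index=True)
    return df_SA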

In [None]:
df_SA_ex1 = create_results_dataframe(sa_result_emissions)
df_SA_ex1

### Stock_C

#### TODO: generate CSV that has a column for each metric that we want to use
e.g. 'total_emissions', 'stock_C', 'stock_E', 'stock_N', 'stock_P', & 'stock_S'

In [None]:
df = pd.read_csv("data.csv") # replace with generated CSV (see comment above)
sa_result_stock_C = sa.sensitivity_analysis_experiment(num_runs, df, 'stock_C', num_samples)

### Stock_E

In [None]:
sa_result_stock_E = sa.sensitivity_analysis_experiment(num_runs, df, 'stock_E', num_samples)

### Stock_N

In [None]:
sa_result_stock_N = sa.sensitivity_analysis_experiment(num_runs, df, 'stock_N', num_samples)

### Stock_P

In [None]:
sa_result_stock_P = sa.sensitivity_analysis_experiment(num_runs, df, 'stock_P', num_samples)

### Stock_S

In [None]:
sa_result_stock_S = sa.sensitivity_analysis_experiment(num_runs, df, 'stock_S', num_samples)