# SALib application

The following notebook can be used to replicate the SALib application developed for **The modifiable areal unit problem in geospatial least-cost electrification modelling** with the DBSCAN algorithm. Please refer to **ADD LINK TO MENDELEY** for the final input files used in the analysis. 

The code generates sensitivity measures for the given input parameters with respect to a specific output parameter.

## Theory

SALib<sup>1</sup> is an open-source Python library containing implementations of some of the most common sensitivity analysis methods. For our publication we used the Delta Moment-Independent Measure (DMIM)<sup>2, 3</sup>. 
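
For reference, the delta measure of Borgonovo<sup>2</sup> for an input $X_i$ can be written as below. The notation is chosen here for illustration and is not taken from the notebook: $f_Y$ is the unconditional density of the output $Y$ and $f_{Y \mid X_i}$ is the density of $Y$ conditional on $X_i$.

$$
\delta_i = \frac{1}{2}\, \mathbb{E}_{X_i}\!\left[ \int \left| f_Y(y) - f_{Y \mid X_i}(y) \right| \, \mathrm{d}y \right]
$$

The measure lies between 0 and 1, and it equals 0 when $Y$ is independent of $X_i$.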

## Input

A "summary of summaries" file is needed. For the paper a summary file was generated using the summary files zipped in the **Results_used_in_Salib** folder (the finished input file is also available on this repo at: ).

The summary file includes:<br>
* 6,240 rows representing three countries <br> 
* Columns C through P, which are model inputs <br>
* Columns Q through AF, which are model outputs <br>
* Three country-specific values that affect our model and are usually not subject to sensitivity analysis (current electrification rate, population in 2030 and national population density). They are the same across all scenarios for a given country (2,080 entries per value). These values are taken from the literature and considered "known", but they have a certain effect when all three countries are compared and are therefore included. **Note:** these values are not in the file and are added in the cells below. <br>
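
The layout described above can be sanity-checked before running the analysis. The following is a minimal sketch on a toy dataframe, not the real summary file: the values are made up, and the expected row count of 4 stands in for the 2,080 rows per country of the file used in the paper.

```python
import pandas as pd

# Toy stand-in for the summary file: 3 countries with 4 scenarios each.
# The real file has 2,080 rows per country; the values here are made up.
toy = pd.DataFrame({
    "Country": ["bj"] * 4 + ["na"] * 4 + ["mw"] * 4,
    "EDemand": [1, 2, 3, 4] * 3,
})

# Count the scenarios available for each country code.
rows_per_country = toy["Country"].value_counts().to_dict()
expected_rows = 4  # would be 2080 for the file used in the paper

# Every country should contribute the same number of scenarios.
balanced = all(n == expected_rows for n in rows_per_country.values())
print(rows_per_country, balanced)
```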

## Pre-processing

If you wish to conduct the sensitivity analysis on your own data, you first need to generate the population layers (please refer to DBSCAN.ipynb for information on how to do so). You also need to run OnSSET for the desired scenarios. Information on how to run OnSSET can be found on the official OnSSET website and the various resources linked there (http://www.onsset.org/). The OnSSET code used for this work differs from the official OnSSET code currently available through the OnSSET repository; the code for the paper is available in this repository, with documentation of what has changed. 

## Output
The output includes:

* **delta** - DMIM delta value for each input 
* **S1** - First order Sobol index for each input
* **delta_conf** - The confidence interval for each input's DMIM delta value
* **S1_conf** - The confidence interval for each input's first order Sobol index
____________________________________________________________________________________________________________________________
<sup>1</sup> Herman, J. & Usher, W. SALib: An open-source Python library for Sensitivity Analysis. Journal of Open Source Software 2, 97 (2017). <br>
<sup>2</sup> Borgonovo, E. (2007). “A new uncertainty importance measure.” Reliability Engineering & System Safety, 92(6):771-784, doi:10.1016/j.ress.2006.04.015. <br>
<sup>3</sup> Plischke, E., E. Borgonovo, and C. L. Smith (2013). “Global sensitivity measures from given data.” European Journal of Operational Research, 226(3):536-550, doi:10.1016/j.ejor.2012.11.047. <br>

## Cell 1 - Importing packages
**Do not edit this cell**

In [None]:
import sys

from SALib.analyze import delta
from SALib.util import read_param_file
import numpy as np
import pandas as pd

## Cell 2 - Preprocessing

Please note that this section has to be run and is highly specific to the country and input data. 

Here we add: <br>
* The electrification rate in the start year for the countries studied (by the ISO-2 code included in the summary file)<br>
* The population in the end year (2030) by the ISO-2 code<br>

The values entered here may have to be updated if the country is changed or new data becomes available.

After the new columns have been generated, the columns are rearranged so that all inputs come first and all outputs come last. The dataframe is also shuffled.
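
The same steps can also be sketched with a dictionary lookup per country code instead of chained `np.where` calls. This is an alternative formulation, not the notebook's own code. It assumes that the default values in the cell below (0.42 and 15672000) belong to the code `bj`. The toy dataframe and the fixed `random_state` are illustrative additions.

```python
import pandas as pd

# Toy frame with the three ISO-2 codes used in the notebook.
toy = pd.DataFrame({
    "Country": ["bj", "na", "mw", "na"],
    "PopMG": [100.0, 50.0, 200.0, 25.0],
})

# Values as entered in the preprocessing cell; assigning the defaults
# to 'bj' is an assumption made for this sketch.
elec_rate = {"bj": 0.42, "na": 0.54, "mw": 0.18}
pop_2030 = {"bj": 15672000, "na": 3010873, "mw": 24849000}

# Unknown country codes become NaN instead of silently getting a default.
toy["ElecRate"] = toy["Country"].map(elec_rate)
toy["Pop2030"] = toy["Country"].map(pop_2030)

# Inputs first, outputs last.
toy = toy[["ElecRate", "Pop2030", "PopMG", "Country"]]

# A fixed seed makes the shuffled row order reproducible.
toy_shuffled = toy.sample(frac=1, random_state=42)
print(toy_shuffled)
```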


In [None]:
df = pd.read_csv("summaries_round_2.csv", sep=';')

df['ElecRate'] = 0.42
df["Pop2030"] = 15672000

df['ElecRate'] = np.where(df['Country'] == 'na', 0.54, df['ElecRate'])
df['ElecRate'] = np.where(df['Country'] == 'mw', 0.18, df['ElecRate'])

df['Pop2030'] = np.where(df['Country'] == 'na', 3010873, df['Pop2030'] )
df['Pop2030'] = np.where(df['Country'] == 'mw', 24849000, df['Pop2030'])

cols = ['EDemand',
 'GridGenCost',
 'PVCost',
 'GridCapInvCost',
 'DiscountRate',
 'LVCost',
 'MVCost',
 'GridLosses',
 'Core',
 'Buffer',
 'Admin',
 'Method',
 'Res',
 'ElecRate',
 'NatPopDens',
 'Pop2030',
 'PopGrid',
 'PopMG',
 'PopSA',
 'AddedCapGrid',
 'AddedCapSA',
 'AddedCapMG',
 'GridInv',
 'SAInv',
 'MGInv',
 'TotInv',
 'Country']

df = df[cols]

df["RatioGridPop"] = 100*(df["PopGrid"]/df['Pop2030'])
df["RatioMGPop"] = 100*(df["PopMG"]/df['Pop2030'])
df["RatioSAPop"] = 100*(df["PopSA"]/df['Pop2030'])

df["RatioGridInv"] = 100*(df["GridInv"]/df['TotInv'])
df["RatioSAInv"] = 100*(df["SAInv"]/df['TotInv'])
df["RatioMGInv"] = 100*(df["MGInv"]/df['TotInv'])

df_shuffled = df.sample(frac=1)

## Cell 3 - Splitting the file into national versions


In [None]:
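# Build the masks from df_shuffled itself and copy each slice, so that
# adding the 'id' column does not write to a view of df_shuffled
dfB = df_shuffled.loc[df_shuffled['Country'] == 'bj'].copy()
dfN = df_shuffled.loc[df_shuffled['Country'] == 'na'].copy()
dfM = df_shuffled.loc[df_shuffled['Country'] == 'mw'].copy()
dfB['id'] = np.arange(len(dfB))
dfN['id'] = np.arange(len(dfN))
dfM['id'] = np.arange(len(dfM))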

## Cell 4 - Reading param files
SALib requires a parameter file. 

The parameter files are provided in the **Sample input** folder. Each file has to include one row for each input parameter and three columns: 1) name of input, 2) minimum value and 3) maximum value.

If you use your own data these have to be generated from scratch. 

We read four parameter files: 1) one for the merged data, 2) one for Benin, 3) one for Namibia and 4) one for Malawi.
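
For your own data, a parameter file with this layout could be derived from the input columns. The following is a minimal sketch under stated assumptions: the toy values, the file name `params_example.txt` and the space-separated layout are illustrative, and the minimum and maximum are simply taken from the data.

```python
import csv
import os
import tempfile

import pandas as pd

# Toy input table; in practice this would hold the input columns of the
# summary dataframe. Column names follow the notebook, values are made up.
inputs = pd.DataFrame({
    "EDemand": [1.0, 2.0, 3.0],
    "DiscountRate": [0.08, 0.10, 0.12],
    "GridLosses": [0.05, 0.10, 0.15],
})

# Hypothetical output location for this sketch.
out_dir = tempfile.mkdtemp()
param_path = os.path.join(out_dir, "params_example.txt")

# One row per input: name, minimum value, maximum value.
with open(param_path, "w", newline="") as fh:
    writer = csv.writer(fh, delimiter=" ")
    for name in inputs.columns:
        writer.writerow([name,
                         float(inputs[name].min()),
                         float(inputs[name].max())])

# Read the file back to check its layout.
with open(param_path, newline="") as fh:
    rows = list(csv.reader(fh, delimiter=" "))
print(rows)
```

Inputs that are constant in the data, such as a country-level value within a single-country file, would get identical minimum and maximum values here, so it is worth checking each row before passing the file to SALib.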

In [12]:
problem_merged = read_param_file('params_merged.txt')
problem_bj = read_param_file('params_ben.txt')
problem_na = read_param_file('params_nam.txt')
problem_mw = read_param_file('params_mal.txt')


## Merged

In [None]:
# The first 16 columns are the model inputs of the merged data
param_values = df_shuffled.iloc[:, :16]
np.savetxt(r'model_input_merged.txt', param_values.values)
X_merged = np.loadtxt(r"model_input_merged.txt")

In [None]:
out = df_shuffled["RatioMGPop"]
np.savetxt(r'model_output_merged.txt', out.values)
Y_merged = np.loadtxt(r"model_output_merged.txt")

In [None]:
S_merged = delta.analyze(problem_merged, X_merged, Y_merged, num_resamples=10, conf_level=0.95, print_to_console=True)

## Benin

In [None]:
# The first 14 columns are used as inputs for the national runs
param_values = dfB.iloc[:, :14]
np.savetxt(r'model_input_ben.txt', param_values.values)
X_bj = np.loadtxt(r"model_input_ben.txt")

In [None]:
out = dfB["RatioMGPop"]
np.savetxt(r'model_output_ben.txt', out.values)
Y_bj = np.loadtxt(r"model_output_ben.txt")

In [None]:
Si_ben = delta.analyze(problem_bj, X_bj, Y_bj, num_resamples=10, conf_level=0.95, print_to_console=True)

## Malawi

In [None]:
# The first 14 columns are used as inputs for the national runs
param_values = dfM.iloc[:, :14]
np.savetxt(r'model_input_mal.txt', param_values.values)
X_mw = np.loadtxt(r"model_input_mal.txt")

In [None]:
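# 'LCOE' is not among the columns kept in Cell 2 and would raise a KeyError,
# so the same output as in the merged and Benin runs is used here
out = dfM["RatioMGPop"]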
np.savetxt(r'model_output_mal.txt', out.values)
Y_mw = np.loadtxt(r"model_output_mal.txt")

In [None]:
Si_mal = delta.analyze(problem_mw, X_mw, Y_mw, num_resamples=10, conf_level=0.95, print_to_console=True)

## Namibia

In [None]:
# The first 14 columns are used as inputs for the national runs
param_values = dfN.iloc[:, :14]
np.savetxt(r'model_input_nam.txt', param_values.values)
X_na = np.loadtxt(r"model_input_nam.txt")

In [None]:
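# 'LCOE' is not among the columns kept in Cell 2 and would raise a KeyError,
# so the same output as in the merged and Benin runs is used here
out = dfN["RatioMGPop"]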
np.savetxt(r'model_output_nam.txt', out.values)
Y_na = np.loadtxt(r"model_output_nam.txt")

In [None]:
Si_nam = delta.analyze(problem_na, X_na, Y_na, num_resamples=10, conf_level=0.95, print_to_console=True)