# Construction of Conditional Probability Tables for the BN Model
This notebook constructs the conditional probability tables for the Bayesian Network (BN) class inference model.

In [409]:
import os, glob
import pandas as pd
import numpy as np
from support import data_dir

fileimport = glob.glob(os.path.join(data_dir, 'BN','*.txt'))
data = {}
names = [os.path.splitext(os.path.basename(f))[0] for f in fileimport]  # file name without directory or extension
for n, f in zip(names, fileimport):
    print(n)
    data[n] = pd.read_table(f, index_col=0)

AMPS2013bErvenElectrificationOffset
LSMmakeupHighDetail
LSMmakeupAssumptions
HHtoIncomeByLSM


## Customer class marginal probability distribution
The likelihood that an electrified customer belongs to a given customer class is calculated by multiplying the probability that a household in an LSM category belongs to that class by the probability that an electrified household falls in that LSM category, and summing over all LSM categories. This can be represented as the formula:
```
P(class|electrified household) = sum over LSM of P(class|LSM) x P(LSM|electrified household)
```
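The marginalisation in the formula above can be sketched on a small example. The numbers below are hypothetical (two classes, three LSM categories) and are not taken from the survey data; the names `p_class_given_lsm` and `p_lsm_given_elec` are illustrative only.

```python
import pandas as pd

# Hypothetical toy inputs, not the notebook's data.
# Rows are classes, columns are LSM categories; each column sums to 1.
p_class_given_lsm = pd.DataFrame(
    [[0.6, 0.4, 0.0],
     [0.4, 0.6, 1.0]],
    index=["class_a", "class_b"],
    columns=[1, 2, 3],
)

# Hypothetical P(LSM | electrified household); sums to 1.
p_lsm_given_elec = pd.Series([0.2, 0.3, 0.5], index=[1, 2, 3])

# Sum over LSM of P(class | LSM) x P(LSM | electrified household).
# DataFrame.dot aligns the frame's columns with the series' index.
p_class_given_elec = p_class_given_lsm.dot(p_lsm_given_elec)
print(p_class_given_elec)
```

Because every LSM column of the first table sums to 1, the resulting class probabilities also sum to 1.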

### P ( LSM | electrified household )
The `AMPS2013bErvenElectrificationOffset` table has been obtained from the _Domestic Load Research Process Review 2015_ and is derived from data in the _AMPS 2013b Living Standard Measure survey_.

The Electrification Offset quantifies the likelihood that a household in an LSM category has been electrified. It ranges from 0 to 1, where 0 means no household is electrified and 1 means every household is electrified.

In [410]:
tbl1 = data['AMPS2013bErvenElectrificationOffset']
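# Estimated number of electrified dwellings per LSM category
tbl1['ElectrifiedDwellings'] = tbl1['EstErven'] * tbl1['ElectrificationOffset']
# Normalise so that the column sums to 1 over all LSM categories
tbl1['P_LSM|electrified'] = tbl1['ElectrifiedDwellings'] / tbl1['ElectrifiedDwellings'].sum()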
tbl1

LSM,EstErven,ElectrificationOffset,ElectrifiedDwellings,P_LSM|electrified
1,92741,0.3,27822.3,0.004554
2,195786,0.42,82230.12,0.01346
3,374441,0.73,273341.93,0.044744
4,1106303,0.93,1028861.79,0.168417
5,821354,0.98,804926.92,0.131761
6,765524,0.99,757868.76,0.124058
7,1361603,1.0,1361603.0,0.222884
8,798807,1.0,798807.0,0.130759
9,673994,1.0,673994.0,0.110328
10,299553,1.0,299553.0,0.049035


### P ( class | electrified household )
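The distribution of each LSM category over the DLR customer classes, `P (class | LSM)`, has been approximated from the customer class definitions in Appendix A of the Geo-based Load Forecast. Multiplying it by `P (LSM | electrified household)` and summing over the LSM categories gives `P (class | electrified household)`.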

In [411]:
tbl2 = data['LSMmakeupAssumptions']
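# The first ten columns hold P(class | LSM) for LSM 1 to 10; relabel them to match the index of tbl1
t2 = tbl2.iloc[:, 0:10].copy()
t2.columns = range(1, 11)
# Sum over LSM of P(class | LSM) x P(LSM | electrified household)
tbl2['P_class|electrified'] = t2.dot(tbl1['P_LSM|electrified']).values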
tbl2

class,LSM 1,LSM 2,LSM 3,LSM 4,LSM 5,LSM 6,LSM 7,LSM 8,LSM 9,LSM 10,Assumption,P_class|electrified
rural,0.6,0.4,0,0,0,0,0.0,0.0,0.0,0.0,assume 40% of LSM 1&2 living in rural scattered,0.008117
village,0.4,0.6,0,0,0,0,0.0,0.0,0.0,0.0,assume 60% of LSM 1&2 living in rural scattered,0.009898
informal settlement,0.0,0.0,1,1,0,0,0.0,0.0,0.0,0.0,,0.213161
township,0.0,0.0,0,0,1,1,0.0,0.0,0.0,0.0,,0.255818
urban residential 7,0.0,0.0,0,0,0,0,0.6,0.0,0.0,0.0,assume 60% of LSM 7,0.133731
urban townhouse 7&8,0.0,0.0,0,0,0,0,0.4,0.5,0.0,0.0,assume 40% of LSM 7 & 50% of LSM 8,0.154533
urban residential 8&9,0.0,0.0,0,0,0,0,0.0,0.5,0.5,0.0,assume 50% of LSM 8&9,0.120543
urban townhouse 9&10,0.0,0.0,0,0,0,0,0.0,0.0,0.5,0.5,assume 50% of LSM 9&10,0.079681
urban estate,0.0,0.0,0,0,0,0,0.0,0.0,0.0,0.5,assume 50% LSM 10,0.024517
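As a rough consistency check, the two tables printed above can be re-entered by hand and multiplied. The sketch below copies the rounded values from the outputs, so its results agree with the `P_class|electrified` column only to about six decimal places; the variable names are illustrative.

```python
import pandas as pd

lsm = list(range(1, 11))

# P(LSM | electrified household), copied from the rounded table output
p_lsm = pd.Series(
    [0.004554, 0.01346, 0.044744, 0.168417, 0.131761,
     0.124058, 0.222884, 0.130759, 0.110328, 0.049035],
    index=lsm,
)

# P(class | LSM), copied from the LSMmakeupAssumptions output
p_class_lsm = pd.DataFrame(
    [[0.6, 0.4, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0],
     [0.4, 0.6, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1, 1, 0, 0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0, 0, 1, 1, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0, 0, 0, 0, 0.6, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0, 0, 0, 0, 0.4, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0, 0, 0, 0, 0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0, 0, 0, 0, 0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.5]],
    index=["rural", "village", "informal settlement", "township",
           "urban residential 7", "urban townhouse 7&8",
           "urban residential 8&9", "urban townhouse 9&10", "urban estate"],
    columns=lsm,
)

# Each LSM category should be fully distributed over the classes
column_sums = p_class_lsm.sum(axis=0)

# Sum over LSM of P(class | LSM) x P(LSM | electrified household)
p_class = p_class_lsm.dot(p_lsm)

print(column_sums)
print(p_class.round(6))
print(p_class.sum())
```

Every LSM column sums to 1, which is what makes the class probabilities themselves sum to 1.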


## Derivation of monthly income by customer class conditional probability distribution
The likelihood that a household in a given customer class falls within a certain monthly income bin is calculated by multiplying the probability that a household in an LSM category falls within that income bin by the probability that a household in the class belongs to that LSM category, and summing over all LSM categories. This can be represented as the formula:
```
P(income|class) = sum over LSM of P(income|LSM) x P(LSM|class)
```

### P ( income | LSM )

#### Number of households per income range per LSM
The number of households per income range per LSM has been approximated from Table 3 in the Geo-based Load Forecast, which is based on data from the AMPS 2010b Living Standard Measure Survey.

In [412]:
data['HHtoIncomeByLSM'].head()

min income,max income,lsm7low,lsm7high,lsm8low,lsm8high,lsm9low,lsm9high,lsm10low,lsm10high
0,499,1182,705,645,470,35,477,215,0
500,599,2307,858,0,197,0,0,0,0
600,699,1234,752,0,0,0,0,0,0
700,799,1039,293,611,344,150,352,0,0
800,899,444,482,266,1076,564,0,0,0


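#### Number of households per DLR-compatible income bin per LSM

The source income ranges do not line up with the DLR income bins, so the household counts are spread uniformly over steps of 100 within each source range and then regrouped into the DLR bins.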

In [413]:
tbl3 = data['HHtoIncomeByLSM'].iloc[:,1:9]

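# Fraction of each source income range covered by one step of 100 (last range: 50000 to 240700)
count = [100/(tbl3.index[i+1]-tbl3.index[i]) for i in range(0, len(tbl3)-1)]+[100/(240700 - 50000)]
# Households per step of 100, assuming a uniform spread within each source range
t3 = tbl3.multiply(count, axis=0)

# Forward-fill onto a regular grid of 100, then sum the steps that fall into each DLR income bin
ix = np.arange(0, 240800, 100)
bins = [0, 1800, 3200, 7800, 11600, 19116, 24500, 65500, 240700]
tbl3x = t3.reindex(ix, method='ffill')
tbl3_binned = tbl3x.groupby(pd.cut(tbl3x.index, bins)).sum()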
tbl3_binned

income bin,lsm7low,lsm7high,lsm8low,lsm8high,lsm9low,lsm9high,lsm10low,lsm10high
"(0, 1800]",38858.85,30827.25,13914.25,8510.25,4709.75,5159.1,436.0,864.75
"(1800, 3200]",63866.55,34037.75,25986.95,12697.15,8988.35,6771.5,2264.0,1686.05
"(3200, 7800]",238428.2,225564.1,129699.7,107037.7,87627.2,47350.0,20245.5,6256.7
"(7800, 11600]",191884.3,187032.1,147215.3,126917.3,131591.5,112744.9,52059.6,22336.2
"(11600, 19116]",115805.9,133593.8,126419.8,141928.6,160740.4,146641.3,80955.5,50421.3
"(19116, 24500]",25403.48,42954.04,66228.0,59934.76,71996.76,85446.72,66943.56,51820.0
"(24500, 65500]",15529.337829,32321.574578,31542.57892,74723.937955,114935.653529,143351.479056,146132.109009,187405.798112
"(65500, 240700]",1190.661772,231.517567,1354.194022,5387.377032,13146.890404,19332.635553,37030.867331,57937.271106

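The rebinning step in the cell above can be illustrated on a toy example. The income ranges and counts below are hypothetical. Unlike the cell above, this sketch uses left-closed intervals (`right=False`), so that each grid point, which marks the lower edge of a step of 100, lands in the bin that contains that step; this is a choice made for the illustration, not a description of the notebook's binning.

```python
import numpy as np
import pandas as pd

step = 100

# Hypothetical source ranges [0, 200), [200, 500), [500, 1000) with household counts
lower_edges = [0, 200, 500]
upper_end = 1000
counts = pd.Series([20.0, 60.0, 50.0], index=lower_edges)

# Households per step of 100, assuming a uniform spread within each source range
widths = np.diff(lower_edges + [upper_end])
per_step = counts * step / widths

# Forward-fill the per-step counts onto a regular grid of 100
grid = np.arange(0, upper_end, step)
fine = per_step.reindex(grid, method="ffill")

# Regroup the fine grid into new bins [0, 300) and [300, 1000)
new_edges = [0, 300, 1000]
rebinned = fine.groupby(
    pd.cut(fine.index, new_edges, right=False), observed=False
).sum()

print(rebinned)
print(rebinned.sum(), counts.sum())
```

With left-closed intervals and a grid that stops one step short of the upper end, the rebinned total equals the original total of households.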

In [414]:
tbl3_totals = tbl3_binned.sum(axis=0)
# Divide each column by its total so that every LSM column sums to 1
Pincome_lsm = tbl3_binned/tbl3_totals
Pincome_lsm

income bin,lsm7low,lsm7high,lsm8low,lsm8high,lsm9low,lsm9high,lsm10low,lsm10high
"(0, 1800]",0.056238,0.044901,0.025655,0.015844,0.007932,0.009102,0.001074,0.002283
"(1800, 3200]",0.092431,0.049577,0.047915,0.023639,0.015139,0.011947,0.005575,0.004452
"(3200, 7800]",0.345064,0.328541,0.239139,0.199274,0.147586,0.08354,0.049858,0.01652
"(7800, 11600]",0.277704,0.272418,0.271434,0.236285,0.221633,0.198916,0.128204,0.058977
"(11600, 19116]",0.1676,0.194584,0.233092,0.264232,0.270727,0.258719,0.199365,0.133133
"(19116, 24500]",0.036765,0.062564,0.122111,0.111582,0.12126,0.150753,0.164858,0.136826
"(24500, 65500]",0.022475,0.047077,0.058158,0.139115,0.19358,0.252915,0.359872,0.494829
"(65500, 240700]",0.001723,0.000337,0.002497,0.01003,0.022143,0.034109,0.091194,0.152979


### P ( LSM | class )

In [415]:
tbl4 = data['LSMmakeupHighDetail']
# Normalise each row so that the LSM shares of a class sum to 1
Plsm_class = tbl4.divide(tbl4.sum(axis=1), axis=0)
Plsm_class

class,lsm7low,lsm7high,lsm8low,lsm8high,lsm9low,lsm9high,lsm10low,lsm10high
rural,,,,,,,,
village,,,,,,,,
informal settlement,,,,,,,,
township,,,,,,,,
urban residential 7,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0
urban townhouse 7&8,0.222222,0.222222,0.277778,0.277778,0.0,0.0,0.0,0.0
urban residential 8&9,0.0,0.0,0.25,0.25,0.25,0.25,0.0,0.0
urban townhouse 9&10,0.0,0.0,0.0,0.0,0.25,0.25,0.25,0.25
urban estate,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5
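The row normalisation used for `P (LSM | class)` can be sketched on hypothetical weights. The values below are not the contents of `LSMmakeupHighDetail`; they only illustrate how dividing a row by its own total rescales it, and how a row whose total is zero produces missing values, which is one possible explanation for the empty rows above.

```python
import pandas as pd

# Hypothetical class-by-LSM weights, not the contents of LSMmakeupHighDetail
weights = pd.DataFrame(
    [[0.0, 0.0, 0.0, 0.0],
     [0.3, 0.3, 0.0, 0.0],
     [0.2, 0.2, 0.25, 0.25]],
    index=["class_without_entries", "class_one_lsm", "class_two_lsm"],
    columns=["lsm_a_low", "lsm_a_high", "lsm_b_low", "lsm_b_high"],
)

# Divide every row by its own total so that each row sums to 1.
# A row that sums to 0 becomes 0/0, which pandas reports as NaN.
p_lsm_given_class = weights.divide(weights.sum(axis=1), axis=0)
print(p_lsm_given_class)
```

Rows of missing values carry through any later matrix product, so a class without LSM shares also has no income distribution.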


### P ( income | class )

In [416]:
# Sum over LSM of P(LSM | class) x P(income | LSM); rows are classes, columns are income bins
Pincome_class = Plsm_class.dot(Pincome_lsm.T)
Pincome_class

class,"(0, 1800]","(1800, 3200]","(3200, 7800]","(7800, 11600]","(11600, 19116]","(19116, 24500]","(24500, 65500]","(65500, 240700]"
rural,,,,,,,,
village,,,,,,,,
informal settlement,,,,,,,,
township,,,,,,,,
urban residential 7,0.05057,0.071004,0.336803,0.275061,0.181092,0.049665,0.034776,0.00103
urban townhouse 7&8,0.034003,0.051433,0.271472,0.263282,0.218631,0.086988,0.070254,0.003937
urban residential 8&9,0.014633,0.02466,0.167385,0.232067,0.256692,0.126427,0.160942,0.017194
urban townhouse 9&10,0.005098,0.009278,0.074376,0.151932,0.215486,0.143425,0.325299,0.075106
urban estate,0.001679,0.005014,0.033189,0.093591,0.166249,0.150842,0.427351,0.122086
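One row of this table can be reproduced by hand as a check. The sketch below copies the rounded `lsm7low` and `lsm7high` columns of `P (income | LSM)` from the output further up and weights each by 0.5, the `P (LSM | class)` values shown for `urban residential 7`. Because the inputs are rounded, the result matches the printed row only to about six decimal places; the variable names are illustrative.

```python
import pandas as pd

income_bins = ["(0, 1800]", "(1800, 3200]", "(3200, 7800]", "(7800, 11600]",
               "(11600, 19116]", "(19116, 24500]", "(24500, 65500]",
               "(65500, 240700]"]

# P(income | LSM) for the two LSM 7 sub-categories, copied from the rounded output
p_income_lsm7low = pd.Series(
    [0.056238, 0.092431, 0.345064, 0.277704,
     0.1676, 0.036765, 0.022475, 0.001723], index=income_bins)
p_income_lsm7high = pd.Series(
    [0.044901, 0.049577, 0.328541, 0.272418,
     0.194584, 0.062564, 0.047077, 0.000337], index=income_bins)

# P(LSM | class) for 'urban residential 7' is 0.5 for each sub-category
p_income_ur7 = 0.5 * p_income_lsm7low + 0.5 * p_income_lsm7high

print(p_income_ur7.round(6))
print(p_income_ur7.sum())
```

The weighted sum is itself a probability distribution over the income bins, so it should sum to 1 up to rounding.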
