# Problem Statement

Our raw data consists of:
 * Household load data every 10 minutes for a week
 * Load profiles of PEVs for 10 days, for two scenarios:
  * Level 1 (L1): charges at 1920 W
  * Level 2 (L2): charges at 6600 W
 * A mapping of vehicles to households

Our ultimate goal is to predict, using a smart meter that reports a household's electrical load every 30 minutes,
 1. How many plug-in electric vehicles (PEVs) that household has
 2. How much electricity the PEVs consume when charging
 3. Whether or not each PEV is charging at a given time

# Overview of Method

Presuming that the households sampled do not have PEVs, we simulate what the load data would look like if the households were to have PEVs. This gives us the input we need to train prediction algorithms that will ultimately fulfil the goal above.
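
As a minimal sketch of this idea, assuming the simulation simply adds a PEV charging profile to a household's baseline load (the toy values, the `baseline` and `pev_load` names, and the vehicle assignment are all hypothetical, not taken from `get_input.py`):

```python
import pandas as pd

# Hypothetical toy baseline: household load in watts, sampled every 10 minutes.
index = pd.date_range("2010-01-01", periods=6, freq="10min")
baseline = pd.DataFrame(
    {
        "Household 1": [500.0, 520.0, 510.0, 530.0, 500.0, 490.0],
        "Household 2": [300.0, 310.0, 305.0, 300.0, 295.0, 290.0],
    },
    index=index,
)

# Hypothetical PEV profile: one L1 vehicle (1920 W) assigned to Household 2,
# charging during the third through fifth samples.
pev_load = pd.DataFrame(0.0, index=index, columns=baseline.columns)
pev_load.loc[index[2]:index[4], "Household 2"] = 1920.0

# The simulated smart-meter signal is the baseline plus the PEV load.
combined = baseline + pev_load
print(combined)
```

In this picture, `combined` plays the role of the model input and `pev_load` the role of the target to recover.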

# Input

The `simulate_data` method in the `get_input.py` file simulates datasets for us. It takes six parameters:
 * `vehicles_L1`: The approximate number of vehicles charging at L1 across the dataset
 * `error_L1`: The uncertainty of `vehicles_L1`
 * `vehicles_L2`: The approximate number of vehicles charging at L2 across the dataset
 * `error_L2`: The uncertainty of `vehicles_L2`
 * `timestep`: How many raw data samples to downsample into one simulated data sample
 * `random_seed` (optional): The random seed to be used in all simulation randomness

and returns a dict of four variables:
 * `combined`: The combination of the EV load and the baseline power consumption. The main input to our future prediction algorithms.
 * `load`: The EV load. Output 1 of our prediction algorithms.
 * `households`: The household-to-vehicle map. Output 2 of our prediction algorithms.
 * `params`: A list of parameters containing approximate numbers of L1 and L2 vehicles, uncertainties for these numbers, and the total number of vehicles.

For now, we assume that 75% of all cars are PEVs, that 30% of these charge at L1 and 60% charge at L2, and that we know the number of L1 and L2 vehicles to within 5% uncertainty. We also assume the smart meters report data every 30 minutes, so we set `timestep=3` to combine three 10-minute raw samples into each 30-minute simulated sample.

In [1]:
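import pandas as pd
# Total number of vehicles in the raw data
n_vehicles = len(pd.read_csv("raw_data/vehicles.csv"))
# 75% of vehicles are assumed to be PEVs; 30% of those charge at L1, 60% at L2
n_L1 = n_vehicles * 0.75 * 0.3
n_L2 = n_vehicles * 0.75 * 0.6
# 5% uncertainty on each vehicle count
d_L1 = n_L1 * 0.05
d_L2 = n_L2 * 0.05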
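
The `timestep=3` downsampling can be pictured as collapsing each run of three consecutive 10-minute samples into one 30-minute sample. The sketch below assumes the samples are averaged; whether `simulate_data` uses the mean or another aggregate is an assumption, and the toy series is made up for illustration.

```python
import numpy as np
import pandas as pd

timestep = 3  # number of raw samples combined into one simulated sample

# Hypothetical raw load in watts at 10-minute resolution.
index = pd.date_range("2010-01-01", periods=6, freq="10min")
raw = pd.Series([300.0, 600.0, 900.0, 0.0, 6600.0, 6600.0], index=index)

# Label each sample with the group it falls in (0, 0, 0, 1, 1, 1),
# then average within each group.
groups = np.arange(len(raw)) // timestep
downsampled = raw.groupby(groups).mean()

# Stamp each downsampled value with the time of its first raw sample.
downsampled.index = raw.index[::timestep]
print(downsampled)
```

Under this assumption, a vehicle that charges for only part of a 30-minute window shows up as a fraction of its rated power, which would explain simulated load values that fall between 0 W and the 1920 W or 6600 W charging levels.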

The `get_data` method simulates the data and stores it so that it does not have to be re-simulated each time. We simulate two datasets, one with `random_seed=0` for training and one with `random_seed=1` for testing, even though we will later also use separate groups of households for training and testing. If we used a single dataset, prediction algorithms could circumvent the uncertainty we introduce in the number of L1 and L2 vehicles by counting those vehicles in the training data.
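
The store-and-reuse behaviour could be sketched as follows. This is not the implementation in `get_input.py`: the `get_cached` helper, the pickle-based storage, the file naming, and `fake_simulate` are all hypothetical and only illustrate the pattern of keying a cache on the simulation parameters.

```python
import os
import pickle
import tempfile


def get_cached(simulate, cache_dir, *args):
    """Return simulate(*args), reusing a pickled result if one exists."""
    os.makedirs(cache_dir, exist_ok=True)
    # Build a file name from the parameters so each configuration
    # (including the random seed) gets its own cache entry.
    name = "_".join(str(a) for a in args) + ".pkl"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = simulate(*args)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def fake_simulate(n_L1, d_L1, n_L2, d_L2, timestep, seed):
    # Stand-in for an expensive simulation; records each real invocation.
    calls.append(seed)
    return {"params": [n_L1, d_L1, n_L2, d_L2], "seed": seed}


cache_dir = tempfile.mkdtemp()
first = get_cached(fake_simulate, cache_dir, 78.3, 3.915, 156.6, 7.83, 3, 0)
second = get_cached(fake_simulate, cache_dir, 78.3, 3.915, 156.6, 7.83, 3, 0)
print(len(calls))  # the simulation itself ran only once
```

Because the seed is part of the key, the training (`random_seed=0`) and testing (`random_seed=1`) datasets would be cached separately.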

Here's what the simulated input data looks like:

In [2]:
import get_input
from IPython.display import display
data = get_input.get_data(n_L1, d_L1, n_L2, d_L2, 3, 0)
for key in data:
    print(key)
    display(data[key])

combined


Time,Household 1,Household 2,Household 3,Household 4,Household 5,Household 6,Household 7,Household 8,Household 9,Household 10,...,Household 191,Household 192,Household 193,Household 194,Household 195,Household 196,Household 197,Household 198,Household 199,Household 200
2010-01-01 00:00:00,541.296667,475.540000,1377.033333,440.723333,2043.566667,319.280000,615.260000,869.083333,245.206667,387.316667,...,574.260000,1704.733333,710.270000,513.843333,262.886667,390.353333,275.740000,2126.933333,2109.733333,410.566667
2010-01-01 00:30:00,549.256667,449.853333,1377.030000,321.766667,2556.433333,783.810000,687.570000,845.746667,532.483333,387.313333,...,318.670000,1681.233333,263.600000,632.396667,261.873333,623.746667,511.760000,1937.166667,2228.066667,531.766667
2010-01-01 01:00:00,302.983333,459.343333,886.150000,328.263333,2074.500000,315.943333,679.126667,761.196667,773.606667,625.660000,...,569.483333,1299.666667,266.780000,388.213333,504.043333,359.723333,273.650000,1984.266667,2444.000000,526.040000
2010-01-01 01:30:00,558.503333,905.173333,885.230000,326.140000,1707.500000,299.633333,639.720000,774.286667,516.880000,626.013333,...,560.206667,1227.000000,263.100000,388.000000,261.620000,616.013333,274.553333,2024.000000,2370.533333,528.903333
2010-01-01 02:00:00,303.733333,435.046667,886.140000,555.623333,1784.166667,314.893333,925.186667,520.673333,290.290000,867.693333,...,325.863333,1225.033333,507.666667,388.000000,503.370000,384.013333,268.846667,2003.166667,2397.400000,528.903333
2010-01-01 02:30:00,544.656667,942.670000,886.200000,566.880000,1521.500000,542.340000,685.266667,736.493333,290.286667,620.106667,...,320.730000,1273.500000,505.546667,629.660000,503.876667,609.296667,274.890000,2111.700000,2031.266667,528.900000
2010-01-01 03:00:00,321.823333,448.930000,988.246667,333.833333,1788.100000,320.136667,426.370000,751.940000,292.423333,628.256667,...,332.023333,1371.100000,510.286667,639.706667,264.363333,372.926667,760.936667,2014.666667,4639.872510,537.210000
2010-01-01 03:30:00,551.273333,437.356667,950.093333,333.963333,1952.633333,305.926667,426.953333,538.793333,292.423333,386.756667,...,326.543333,1612.733333,268.286667,398.036667,263.686667,876.486667,720.083333,2211.366667,6019.417260,290.950000
2010-01-01 04:00:00,319.076667,713.966667,1161.986667,811.206667,1700.966667,805.490000,454.670000,561.920000,534.083333,386.943333,...,334.536667,1371.133333,508.276667,639.706667,265.963333,636.420000,945.283333,2249.966667,2848.466667,289.523333
2010-01-01 04:30:00,311.840000,951.700000,1198.683333,325.743333,1492.033333,545.893333,696.330000,1007.460000,775.740000,391.376667,...,807.483333,1580.400000,508.890000,639.686667,503.750000,374.330000,278.110000,2217.866667,2109.166667,1335.255100


load


Time,Household 1,Household 2,Household 3,Household 4,Household 5,Household 6,Household 7,Household 8,Household 9,Household 10,...,Household 191,Household 192,Household 193,Household 194,Household 195,Household 196,Household 197,Household 198,Household 199,Household 200
2010-01-01 00:00:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 00:30:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 01:00:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 01:30:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 02:00:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 02:30:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 03:00:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,2339.472510,0.000000
2010-01-01 03:30:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,3860.283926,0.000000
2010-01-01 04:00:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,0.000000
2010-01-01 04:30:00,0.000000,0,0,0,0,0.000000,0.000000,0,0,0.000000,...,0,0.000000,0.000000,0.000000,0.0,0.000000,0,0,0.000000,804.068433


households


Unnamed: 0,Household,L1,L2
0,Household 1,0,1
1,Household 2,0,0
2,Household 3,0,0
3,Household 4,0,0
4,Household 5,0,0
5,Household 6,0,1
6,Household 7,1,1
7,Household 8,0,0
8,Household 9,0,0
9,Household 10,0,1


params


0
vehicles_L1        78.300
vehicles_L2       156.600
vehicles_total    348.000
error_L1            3.915
error_L2            7.830
Name: 1, dtype: float64
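
To connect these outputs back to the three goals in the problem statement, here is a small sketch using a few values copied from the tables above. The derived quantities (`n_pevs`, `is_charging`, `baseline`) are illustrative names, and treating "load greater than zero" as "charging" is an assumption that only resolves charging at the household level, not per vehicle.

```python
import pandas as pd

# A few rows of the household-to-vehicle map shown above.
households = pd.DataFrame(
    {
        "Household": ["Household 1", "Household 6", "Household 7", "Household 8"],
        "L1": [0, 0, 1, 0],
        "L2": [1, 1, 1, 0],
    }
).set_index("Household")

# Goal 1: number of PEVs per household.
n_pevs = households["L1"] + households["L2"]

# A few samples for Household 199 from the `combined` and `load` tables.
times = pd.to_datetime(
    ["2010-01-01 02:30", "2010-01-01 03:00", "2010-01-01 03:30", "2010-01-01 04:00"]
)
combined = pd.DataFrame(
    {"Household 199": [2031.266667, 4639.872510, 6019.417260, 2848.466667]},
    index=times,
)
load = pd.DataFrame(
    {"Household 199": [0.0, 2339.472510, 3860.283926, 0.0]}, index=times
)

# Goal 2: the PEV consumption is the `load` table itself.
# Goal 3 (household level): charging whenever the PEV load is positive.
is_charging = load > 0

# Subtracting the PEV load recovers the household's baseline consumption.
baseline = combined - load
print(n_pevs, is_charging, baseline, sep="\n\n")
```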

# Predictor Class Design

To move us towards writing prediction algorithms, we create an abstract `Predictor` class:
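
As a rough illustration of how an abstract base class of this kind is typically used, the sketch below defines a stand-in base class and a trivial subclass. It does not use the project's `predictor.Predictor`; `BasePredictor`, `ZeroPredictor`, and the `predict(self, combined)` signature are all hypothetical.

```python
from abc import ABC, abstractmethod


class BasePredictor(ABC):
    """Hypothetical stand-in for an abstract predictor interface."""

    def __init__(self):
        self.trained = False

    @abstractmethod
    def train(self, params, combined, load, households):
        ...

    @abstractmethod
    def predict(self, combined):  # hypothetical signature
        ...


class ZeroPredictor(BasePredictor):
    """Trivial baseline: predicts no PEV load for any household."""

    def train(self, params, combined, load, households):
        # Nothing to learn; record the state on the instance via `self`.
        self.trained = True

    def predict(self, combined):
        if not self.trained:
            raise RuntimeError("call train() before predict()")
        return [[0.0 for _ in row] for row in combined]


model = ZeroPredictor()
model.train(params=None, combined=[[1.0, 2.0]], load=None, households=None)
prediction = model.predict([[541.3, 475.5], [549.3, 449.9]])
print(prediction)
```

A baseline like this is mainly useful as a reference point: any real predictor should beat "no PEV load anywhere". Note that state such as `trained` has to be assigned through `self` to persist on the instance; a bare `trained = True` inside a method only creates a local variable.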

In [4]:
from IPython.core import page
page.page = print

import predictor
%psource predictor.Predictor

class Predictor(ABC):
    trained = False

    @abstractmethod
    def load(self, path):
        trained = True  #May want to modify this behavior

    @abstractmethod
    def train(self, params, combined, load, households):
        trained = True

    @abstractmethod
    def predict