# U.K. Smart Meter Energy Report
## Author: Alexander Van Roijen
## Date: 12/10/2019

## I. Introduction
This notebook and repository are dedicated to a reproducible investigation of a multi-year study on energy consumption in the United Kingdom (U.K.). I will first provide background on the problem the data was collected for, related work, and the motivation behind this analysis. I will then describe the data and the permissions surrounding it. Afterwards, I will state the research questions and the methods used to answer them. Finally, I will present results showing that households under dynamic time of use (ToU) tariffs in the U.K. consume significantly less electricity than their static pricing counterparts, and I will quantify the annual savings per household under these ToU tariffs. I will also present results on using Sparse Identification of Nonlinear Dynamics (SINDy) to predict future demand response. If you are unfamiliar with SINDy, take a look at [this report](https://github.com/bogyshi/AMATH563/blob/master/hw2/hw2Report.pdf).

__Important link__: All code used to generate the tables and figures in this report can be found in [this notebook](https://github.com/bogyshi/Data512/blob/master/finalProject/analyses/smartMeterAnalyses.ipynb).

## II. Background
According to the World Bank, global electrification stood at 88.8% as of 2017, meaning about 88.8% of the world's population has access to electricity $^{[1]}$. At the same time, the global population has been increasing dramatically, and urbanization over the past century has brought the urban share of the population to over 50% $^{[2]}$. This means we need to deliver more power to denser areas, which comes with various challenges. Here are a few terms to keep in mind during this analysis, along with a question: how do our results relate to each of them?

- System balancing (SB): The matching of energy supply to the time at which energy is being requested 
- Constraint management (CM): Managing the limited capacity to transport electricity to all locations (the bottleneck effect) as well as we can.
- Demand response (DR): Altering consumer electric usage by providing financial incentives.

Consequently, many utilities and governments are working together to implement demand response programs to help with system balancing and constraint management. A popular form is dynamic time-of-use (ToU) tariffs, as opposed to standard static pricing. These tariffs discourage wasteful or excessive use of electricity during high demand periods, thus addressing both SB and CM. However, this only works if the tariffs actually produce some demand response. This notebook will explore how well this method worked for the U.K. in 2013. But first, why am I interested in the first place?
### Motivation
I have always been fascinated by energy consumption, starting with the [energy disaggregation task](https://web.stanford.edu/group/peec/cgi-bin/docs/events/2011/becc/presentations/3%20Disaggregation%20The%20Holy%20Grail%20-%20Carrie%20Armel.pdf) that I did some research on during [my undergrad](https://aaai.org/ojs/index.php/AAAI/article/view/3872). It would be interesting to try similar work on this data. However, this notebook focuses on a simple analysis of how ToU tariffs impact DR.
### Related Work
There have already been significant investigations into demand response in various countries across the world, with a particular focus on European countries $^{[2,3,4]}$. Furthermore, despite low usage of this dataset within the first few years of its release, it has since seen a large increase in use for studying subjects such as demand prediction $^{[5]}$. However, despite the study aiming to understand how ToU tariffs impact DR, very little analysis of that question has been done. In particular, I will be building off some of the work done by Gareth Piggott $^{[6]}$.

So what does our data look like?

## III. Data
The data consists of smart meter readings from the United Kingdom over a three-year period (2012 to 2014) at half-hour intervals. This means that for ~5500 households we have data at 30-minute granularity on how much energy each requested from the grid, covering a little over a year. The households were "recruited as a balanced sample representative of the Greater London population"$^{[7]}$. There are two groups: those under dynamic time of use (ToU) pricing and those under static pricing (Std). The main objective of this data collection process was to understand how users of various demographics respond to dynamic pricing compared to static pricing. "The signals given were designed to be representative of the types of signal that may be used in the future to manage both high renewable generation ... and also test the potential to use high price signals to reduce stress on local distribution grids during periods of stress" $^{[7]}$

Of the 5500 households included in the study, 1100 opted into ToU pricing, while the remaining 4400 stayed on Std pricing but agreed to have the smart meter measure their consumption.

### Licensing & Permissions
The terms and conditions from the London Data Store explicitly state that anyone "May use the data contained in this site for any purpose, providing it does not infringe the terms and conditions."$^{[8]}$. However, I will also take this time to note that __"...the Greater London Authority cannot warrant the quality or accuracy of the data"__$^{[8]}$.

As for all my transformations and usage of this data, have at it! Please feel free to use all code and results from this analysis for any other purpose, with proper attribution to the original data source linked and referenced below.


### Data templates

Below I give the schema and structure of all data files generated by the analyses in [this notebook](https://github.com/bogyshi/Data512/blob/master/finalProject/analyses/smartMeterAnalyses.ipynb). If their form doesn't make perfect sense, please review the linked notebook to understand how they were generated and why.

#### raw data located in data/Power-Networks-LCL-June2015(withAcornGps).csv_Pieces/ (168 csvs, each of form ? rows x 6 columns)
These are the raw data files, which you can download directly from [here](https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households) as many blocks or as one large csv. They all have this form:

| Column | type | description |
|--------|------|-------------|
| LCLid  | string  | the house id     |
|   stdorToU      |    string  |     whether the house is a std or ToU pricing customer       |
|   DateTime    |    date-time  |    the date and time at which the corresponding KWH/hh was recorded    |
|   KWH/hh (per half hour)      |    float  | the amount of energy in kilowatt hours (KWH)  consumed in a half-hour |
|   Acorn    |    string  |     What acorn group the house belongs to, like a U.S. census block        |
|   Acorn_grouped     |    string  |     a brief string to describe the type of block, e.g. "Affluent"        |

#### pivoted data located in data/pivotData/ (169 csvs, each of form ? rows x 50 columns)
This is the raw data after pivoting it into rows that each hold a day's worth of data and moving extraneous information into the houseData.csv table. Each block was converted into a file like this:

| Column | type | description |
|--------|------|-------------|
| LCLid  | string  | the house id     |
|   Date      |    string  |     the date when all the following energy measurements were recorded      |
|   00:00:00    |   float |    the amount of energy in kilowatt hours (KWH)  consumed from 00:00:00 to 00:29:59    |
|   ...      |    ...  | ...  |
|   23:30:00    |   float |    the amount of energy in kilowatt hours (KWH)  consumed from 23:30:00 to 23:59:59    |
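
As a rough illustration of the pivot described above, here is a minimal pandas sketch. The house id `H001`, the readings, and the intermediate `Slot` column are made up for illustration; the actual notebook may perform this step differently.

```python
import pandas as pd

# Tiny synthetic stand-in for one raw block (ids and values are made up).
raw = pd.DataFrame({
    "LCLid": ["H001"] * 4,
    "DateTime": pd.to_datetime([
        "2013-01-01 00:00:00", "2013-01-01 00:30:00",
        "2013-01-01 23:30:00", "2013-01-02 00:00:00",
    ]),
    "KWH/hh (per half hour)": [0.20, 0.15, 0.30, 0.25],
})

# Split the timestamp into a date (row key) and a half-hour slot (column key).
raw["Date"] = raw["DateTime"].dt.strftime("%Y-%m-%d")
raw["Slot"] = raw["DateTime"].dt.strftime("%H:%M:%S")

# One row per (house, day), one column per half-hour slot.
pivoted = raw.pivot_table(
    index=["LCLid", "Date"],
    columns="Slot",
    values="KWH/hh (per half hour)",
    aggfunc="first",
).reset_index()
pivoted.columns.name = None

print(pivoted)
```

Slots with no reading on a given day show up as `NaN`, which is one reason the per-slot row counts are tracked separately.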

#### houseData.csv (? rows by 5 columns)
This is a small table we can join with the pivot data on LCLid to get information on demographics and its consumption type.

| Column | type | description |
|--------|------|-------------|
| LCLid  | string  | the house id     |
|   stdorToU      |    string  |     whether the house is a std or ToU pricing customer       |
|   Acorn    |    string  |     What acorn group the house belongs to, like a U.S. census block        |
|   Acorn_grouped     |    string  |     a brief string to describe the type of block, e.g. "Affluent"        |
| count | int | the number of rows in our pivotData files for this particular house id, representing days with consumption data available |

#### countsPerStdAndToU.csv (2 rows by 48 columns)
This is a groupby done on the pivoted data joined with the house data to determine the number of rows available for both std and ToU houses. These counts are needed to determine the statistical significance of differences.

| Column | type | description |
|--------|------|-------------|
| stdorToU  | string  | whether the house is a std or ToU pricing customer     |
|   00:00:00    |  int |    the total number of rows available for either std or ToU data from 00:00:00 to 00:29:59  |
|   ...      |    ...  | ...  |
|   23:30:00    |   int | the total number of rows available for either std or ToU data from 23:30:00 to 23:59:59    |

#### avg_and_stdev.csv (2 rows by 98 columns)
This is a groupby done on the pivoted data joined with the house data to determine the average consumption and standard deviation of each type of house (std or ToU) for each 30 minute interval.
*This table has a multi-index header due to the groupby operation.* The table below is therefore a flattened version for readability; pandas displays it differently when you access it in the analyses Jupyter notebook.

| Column | type | description |
|--------|------|-------------|
| stdorToU  | string  | whether the house is a std or ToU pricing customer     |
|   00:00:00 mean |  float |the average kwh/hh consumption from 00:00:00 to 00:29:59 for either std or ToU households |
|   00:00:00 std |  float |the standard deviation of kwh/hh consumption from 00:00:00 to 00:29:59 for either std or ToU households |
|   ...      |    ...  | ...  |
|   23:30:00 mean |  float |the average kwh/hh consumption from 23:30:00 to 23:59:59 for either std or ToU households |
|   23:30:00 std |  float |the standard deviation of kwh/hh consumption from 23:30:00 to 23:59:59 for either std or ToU households |
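
A minimal sketch of how such a groupby can produce the multi-index header, assuming made-up house ids and readings and only two of the 48 slot columns:

```python
import pandas as pd

# Synthetic pivoted rows and house metadata (ids and values are made up).
pivoted = pd.DataFrame({
    "LCLid": ["H001", "H001", "H002", "H002"],
    "00:00:00": [0.20, 0.22, 0.15, 0.17],
    "00:30:00": [0.18, 0.20, 0.14, 0.12],
})
house = pd.DataFrame({"LCLid": ["H001", "H002"], "stdorToU": ["Std", "ToU"]})

slots = ["00:00:00", "00:30:00"]
merged = pivoted.merge(house, on="LCLid", how="inner")

# Mean and sample standard deviation per group and slot.
# The result has two header levels: (slot, statistic).
stats = merged.groupby("stdorToU")[slots].agg(["mean", "std"])

# Row counts per group and slot, as in countsPerStdAndToU.csv.
counts = merged.groupby("stdorToU")[slots].count()

print(stats)
print(counts)
```

A single cell is then addressed with a tuple, for example `stats.loc["ToU", ("00:00:00", "mean")]`.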


#### block1-1wdsz4INS.npy (? rows by 4 columns)
This is formatted data from the pivot data to create inputs for the SINDy process

| Column | type | description |
|--------|------|-------------|
| $C_{t-4}$ | float | consumption in kwh/hh at a time point 4 time steps, or two hours, before $C_{t}$   |
|   $C_{t-3}$  |  float | consumption in kwh/hh at a time point 3 time steps before $C_{t}$ |
|  $C_{t-2}$  |  float | consumption in kwh/hh at a time point 2 time steps before $C_{t}$ |
|  $C_{t-1}$      |   float  | consumption in kwh/hh at a time point 1 time step before $C_{t}$ |

#### block1-1wdsz4OUTS.npy (? rows by 1 column)
This is formatted data from the pivot data to create outputs for the SINDy process, associated with the INS file shown above.

| Column | type | description |
|--------|------|-------------|
| $C_{t}$ | float | consumption in kwh/hh at the current time point $C_{t}$   |
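
To make the relationship between the INS and OUTS files concrete, here is a small sketch of a sliding-window construction. The function name `make_windows` is hypothetical, and whether the notebook lets windows cross day or house boundaries is not stated here, so treat this as one plausible reading rather than the exact procedure.

```python
import numpy as np

def make_windows(series, window=4):
    """Turn a 1-D consumption series into (inputs, outputs) pairs.

    Row i of the inputs holds C_{t-4}, ..., C_{t-1} and row i of the
    outputs holds C_t, for t = i + window.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window
    ins = np.stack([series[i:i + window] for i in range(n_rows)])
    outs = series[window:].reshape(-1, 1)
    return ins, outs

# Six consecutive half-hour readings (made-up values) give two training rows.
ins, outs = make_windows([1, 2, 3, 4, 5, 6])
print(ins)
print(outs)
```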

With this taken care of, let's dive into what we will be exploring in the rest of this notebook.

## IV. Research Questions

### 1) Are ToU customers responding to this pricing? In particular, do they consume more or less than their static pricing counterparts? Is this relationship significant?

Are we achieving forms of SB or CM using this set of tariffs? Can we see through the noise that this relationship is actually true?

### 2)  If this relationship is significant, what kind of savings do we get when switching to ToU? Where do we see maximum savings? Do we reduce peak load?

Even if the differences are significant, what kind of savings can we expect? In particular, we want to know whether the gains justify the monetary cost of implementing the new system, along with its bureaucratic overhead.

### 3) Can we model future consumption using only past consumption as a signal?

In particular, I will be using Sparse Identification of Nonlinear Dynamics (SINDy)$^{[9]}$. This method will hopefully allow us to find an equation that models consumption over time in an interpretable and intuitive manner.

## V. Methods

For questions 1 and 2, I will 
- First, parse the data into the forms indicated above
- Second, use a groupby operation with pandas to determine average and standard deviation for both std and ToU groups for each thirty minute interval.
- Third, create an error plot to highlight the difference in consumption per half hour for both std and ToU groups.
- Fourth, run an unequal variances t-test (Welch's t-test) for each 30 minute interval to determine significance (or lack thereof).
- Fifth, take the difference between the std and ToU averages for each interval, sum the differences, and multiply by 365 days to get annual savings for the average household in kWh.
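
The unequal variances test in the steps above can be computed directly from the per-interval means, standard deviations, and row counts. The sketch below uses only the standard library; the function name `welch_t` is hypothetical, and the standard deviations and counts in the example call are invented (only the two midnight means come from this report). With SciPy available, `scipy.stats.ttest_ind(..., equal_var=False)` or `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` would also return the p-value.

```python
import math

def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = std_a ** 2 / n_a  # squared standard error of group a
    var_b = std_b ** 2 / n_b  # squared standard error of group b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# Midnight means from this report; the std and n values are placeholders.
t_stat, dof = welch_t(0.201, 0.25, 40000, 0.156, 0.20, 10000)
print(t_stat, dof)
```

With tens of thousands of rows per interval, even a small difference in means produces a large t statistic, which is why the error bars in the plot are barely visible.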

For question 3, I will
- First, create methods to format data to fit the SINDy model.
- Second, create methods to generate a large library of transformations on our formatted data 
- Third, use lasso regression with a low regularization threshold and no cap on the maximum number of iterations to get a set of coefficients
- Fourth, use the fitted coefficients to model and predict future consumption with one set of inputs as my starting point
- Fifth, plot the results and analyze the coefficients for their meaning
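
The library and sparse regression steps above can be sketched as follows. Two caveats: the library contents here (linear, squared, cubed, sine, and cosine terms) are an assumption and omit interaction terms, and the solver is a plain proximal gradient (ISTA) loop written in NumPy as a stand-in for whatever lasso implementation the notebook uses. The names `build_library` and `lasso_ista` are hypothetical.

```python
import numpy as np

def build_library(X):
    """Stack candidate terms for each lag: x, x^2, x^3, sin(x), cos(x)."""
    return np.hstack([X, X ** 2, X ** 3, np.sin(X), np.cos(X)])

def lasso_objective(theta, y, w, lam):
    """Lasso loss: mean squared error / 2 plus an L1 penalty."""
    resid = theta @ w - y
    return resid @ resid / (2 * len(y)) + lam * np.abs(w).sum()

def lasso_ista(theta, y, lam=0.01, n_iter=5000):
    """Minimize the lasso objective with proximal gradient descent (ISTA)."""
    n = theta.shape[0]
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = n / (np.linalg.norm(theta, 2) ** 2)
    w = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        grad = theta.T @ (theta @ w - y) / n
        z = w - step * grad
        # Soft thresholding sets small coefficients exactly to zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

# Synthetic example: the target depends only on the most recent lag.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 0.6 * X[:, 3]

theta = build_library(X)
w = lasso_ista(theta, y, lam=0.01)
print(np.round(w, 3))
```

Because the library columns are strongly correlated (for small values, `x` and `sin(x)` are nearly identical), which terms survive the penalty can depend on the penalty strength and the solver, so the selected terms deserve a cautious interpretation.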

## VI. Results & Discussion

### Research Question 1
Are ToU customers responding to this pricing? In particular, do they consume more or less than their static pricing counterparts? Is this relationship significant?

!["errorplot for std and tou pricing usage over 30 min intervals"](images/stdvstouAll.jpg "errorplot for std and tou pricing usage over 30 min intervals")


The answer is yes: ToU customers are responding, they consume less than their static pricing counterparts, and the difference is significant! If you want to see the technical details on how this graph was generated and how significance was determined, please look at [this notebook](https://github.com/bogyshi/Data512/blob/master/finalProject/analyses/smartMeterAnalyses.ipynb).

### Discussion
Clearly, there is quite a difference across all thirty minute intervals. The error bars are so small, due to the large sample size, that we cannot even see them in this graphic. Given that we have tens of thousands of data points for every thirty minute interval for both types of houses, this should be expected. This prompts our next research question: do these changes make the differences we would hope for?

### Research Question 2:  If this relationship is significant, what kind of savings do we get when switching to ToU? Where do we see maximum savings? Do we reduce peak load?

### From 6 pm to 8 pm, ToU users consume 5.5, 5.7, 6.1, 7.1, and 8.1 percent less during the respective 30 minute intervals
For example, from 6 to 6:29:59 p.m., we are reducing consumption by 5.5% on the average day!

### The greatest reduction in energy usage is at midnight. We see an average 23% decrease in usage, from 0.201 kwh/hh to 0.156 kwh/hh

### Throughout the year, the average ToU household saves a total of 308.6 kwh, which, using the prices highlighted by UK Power Networks in their study, is worth approximately 308.6 kwh * 0.1176 GBP/kwh = 36.3 GBP = \$57
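
The arithmetic behind these headline numbers can be checked in a few lines. All inputs below are the rounded figures quoted in this section; the exchange rate is not stated in the report, so the sketch only derives the rate implied by the \$57 figure.

```python
# Annual savings, using the figures quoted in this section.
annual_kwh_saved = 308.6
price_gbp_per_kwh = 0.1176
savings_gbp = annual_kwh_saved * price_gbp_per_kwh

# Exchange rate implied by the quoted dollar amount (not stated in the report).
implied_usd_per_gbp = 57 / savings_gbp

# Relative reduction at midnight, from the rounded averages.
midnight_std, midnight_tou = 0.201, 0.156
midnight_reduction = (midnight_std - midnight_tou) / midnight_std

print(round(savings_gbp, 2))
print(round(implied_usd_per_gbp, 2))
print(round(100 * midnight_reduction, 1))
```

Note that the rounded midnight averages give a reduction of roughly 22.4%, slightly below the 23% quoted above; the quoted figure was presumably computed from unrounded averages.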

### Discussion

Clearly, we are seeing quite a significant decrease in overall energy usage, regardless of the time of day, with the greatest savings occurring during very low usage periods. This is a great result in that we are able to reduce energy expenditures across the board, which solves a lot of problems, but it indicates that we aren't able to easily shift consumer usage patterns. Or at least, that is what is shown at the overall average scale across all houses. Next steps would be to analyze how users respond to high tariffs at peak consumption times! This is studied somewhat in a paper by J. Schofield $^{[10]}$, and should be followed up in future work on this notebook.

Now let's see how SINDy can do!

### Research Question 3: Can we model future consumption using only past consumption as a signal?

!["SINDy results with lambda = 0.0001 and max iterations with 2 hour window or 4 time steps"](images/4dwindowRes.jpg)


### Parameters to generate the model above

$\lambda = 0.01$

$maxIters = $ None

$startPos$ = 23 , or 11:30 a.m.

$duration$ = 48, or 24 hours

### Model equation:
$C_t \approx 0.004C_{t-4}+0.04C_{t-3}+0.003C_{t-2}+0.57C_{t-1}+0.018C_{t-2}^2+0.005C_{t-1}^2+0.007C_{t-1}^3$, where $C_t$ is consumption at time step $t$
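
To see how this equation behaves when iterated, here is a small rollout sketch. It assumes the expression is the one-step prediction of $C_t$ from the four previous readings, and it starts from a made-up flat window of 0.2 kwh/hh rather than the actual starting point used for the figure.

```python
def step(c4, c3, c2, c1):
    """One-step prediction from C_{t-4}, C_{t-3}, C_{t-2}, C_{t-1}."""
    return (0.004 * c4 + 0.04 * c3 + 0.003 * c2 + 0.57 * c1
            + 0.018 * c2 ** 2 + 0.005 * c1 ** 2 + 0.007 * c1 ** 3)

def rollout(window, n_steps=48):
    """Feed each prediction back in as the newest lag for n_steps steps."""
    history = list(window)  # oldest reading first
    traj = []
    for _ in range(n_steps):
        nxt = step(*history[-4:])
        traj.append(nxt)
        history.append(nxt)
    return traj

# Illustrative starting window (made-up values), rolled out for 24 hours.
traj = rollout([0.2, 0.2, 0.2, 0.2])
print(traj[:4])
print(traj[-1])
```

For consumption values well below 1 kwh/hh, the linear coefficients dominate and sum to about 0.62, so each prediction is smaller than the last and the rollout shrinks toward zero.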

### Discussion
Clearly, our model is unable to predict much further than 4 time steps into the future before settling on zero consumption and staying there. *However*, this is not a discouraging result! We can see that consumption at time points very close to our next time point is a powerful indicator of what is to come. Furthermore, the sinusoidal components of our library were not present whatsoever in this function. This appears to indicate that they are not powerful indicators, despite the seemingly sinusoidal nature of the curves. There is certainly room for further improvement of this model, including trying out other transformations.


## VII. Conclusion

Overall, our results showed that promoting a tariff-based time of use pricing mechanism for residential electric consumption in the United Kingdom, predominantly in the Greater London area, significantly reduces consumption across all times of the day. This does in fact help with our issues of system balancing and constraint management, but doesn't solve them. To see whether aggressive tariffs truly shift demand away from peak load times, another set of analyses would have to be done comparing high tariff periods with average or low tariff periods.

Meanwhile, trying to predict future consumption using only up to 2 hours of previous signal does not appear to be easily done. After about two hours worth of time steps, our model simply converges to average consumption and stays there. However, it does highlight that high order polynomials tend to be indicative of future consumption, rather than interactions or sinusoidal transformations on the very same data.


### Things to keep in mind
- 1) Were those who opted into ToU pricing already mindful of their consumption? This is a potential confounder. We can test this! Look at the starting consumption of those who opted in to see whether they began conscious of their consumption or not.
- 2) Are demographics balanced between std and ToU users in my analyses? The paper on this data claims they are, but we can verify this ourselves!

### Future Work

- 1) As mentioned above, analyzing the impact of high tariff levels on shifting demand throughout the day
- 2) Using demographic information to determine if all groups respond similarly to tariff pricing
- 3) Creating a larger library of transformations for SINDy to create a better predictive model
- 4) Using more data outside of 1 block for SINDy to create a potentially stronger model.

## References
- [1] Access to electricity (% of population) (2019). Retrieved November 22, 2019, from World Bank. https://data.worldbank.org/indicator/eg.elc.accs.zs. CC 4.0 License.
- [2] Department of Economic and Social Affairs. (2019). World urbanization prospects : the 2018 revision. New York: United Nations.
- [3] Torriti, J., Hassan, M. G., & Leach, M. (2010). Demand response experience in Europe: Policies, programmes and implementation. Energy, 35(4), 1575-1583.
- [4] Siano, Pierluigi. "Demand response and smart grids—A survey." Renewable and sustainable energy reviews 30 (2014): 461-478.
- [5] Jordehi, A. R. (2019). Optimisation of demand response in electric power systems, a review. Renewable and Sustainable Energy Reviews, 103, 308-319.
- [6] Piggott, G (June 24th, 2015). Electricity Consumption in a Sample of London Households. Retrieved from https://data.london.gov.uk/blog/electricity-consumption-in-a-sample-of-london-households/
- [7] UK Power Networks. (2015). SmartMeter Energy Consumption Data in London Households [Zip, Data File]. Retrieved from https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households
- [8] Terms and Conditions. (2019). Retrieved November 22, 2019, from London Data Store, London Data Store Terms and Conditions. Website, https://data.london.gov.uk/about/terms-and-conditions/
- [9] Hoffmann, M., Fröhner, C., & Noé, F. (2019). Reactive SINDy: Discovering governing reactions from concentration data. The Journal of Chemical Physics. https://aip.scitation.org/doi/10.1063/1.5066099
- [10] Schofield, J. (2015). Dynamic time-of-use electricity pricing for residential demand response: Design and analysis of the Low Carbon London smart-metering trial.

## Data Cleaning / Supporting Code
Don't forget to look at [this notebook](https://github.com/bogyshi/Data512/blob/master/finalProject/analyses/smartMeterAnalyses.ipynb) for additional information and details on the code used to generate the results shown here.