# Applied Capstone Project Week 4. Demand Forecast For VPP

## Table of Contents

1. [Introduction](#intro)
2. [Business Understanding](#business_understanding)
3. [Data Understanding and Preparation](#data)

## 1. Introduction

<a id="intro"></a>

The electricity sector is changing rapidly, and the traditional one-way relationship between energy producers and consumers is no longer the only option. The drive towards decarbonisation, digitalisation and decentralisation has become the industry's new strategic agenda. Small and medium energy consumers increasingly look for greener and more profitable solutions, such as solar panels and storage systems. These new market agents are called prosumers: consumers that also produce energy.

Because of its physical characteristics, electricity is a unique product, and it is traded on so-called day-ahead markets, for each hour of the next day. Prosumers therefore need to keep an eye on the volumes they consume and produce: renewable generation is intermittent, so from time to time they still have to buy energy from the grid. And what should they do during periods when they produce more energy than they can consume or store?

One possible local solution is to create a Virtual Power Plant (VPP): an aggregator of prosumers that manages their relationships with each other and with the energy system.

This project is inspired by the 2017 course “Qualitative Methods In Energy Economics” by Sergey Syntulsky. 


## 2. Business Understanding

<a id="business_understanding"></a>

Technically, a VPP is an entity that optimises energy flows within the group and with the market, subject to the constraints of the existing distribution system. Energy consumption is uneven, with peaks during working hours and troughs at night. The shape of the load depends on the production processes of each prosumer type. For example, refrigerators in warehouses usually run uniformly throughout the day, whereas office buildings need more energy from 9:00 to 18:00, and private houses before 9:00 and after 18:00. At the same time, the output of solar PV systems is directly tied to the level of solar radiation during the day.
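To make this concrete, here is a toy sketch of how such profiles combine. All numbers are hypothetical and invented for illustration (they are not taken from the Terni data): a flat warehouse load, an office load during working hours, a household load in mornings and evenings, and a simple half-sine PV curve. The VPP effectively sees the sum of the consumption minus local generation.

```python
import numpy as np

hours = np.arange(24)

# Hypothetical hourly profiles in kW, invented for illustration only.
warehouse = np.full(24, 50.0)                                 # refrigeration: flat all day
office = np.where((hours >= 9) & (hours < 18), 80.0, 10.0)    # busy 9:00-18:00
household = np.where((hours < 9) | (hours >= 18), 30.0, 5.0)  # mornings and evenings

# Toy PV output: a half-sine between 6:00 and 20:00, clipped to zero at night.
pv = np.clip(60.0 * np.sin(np.pi * (hours - 6) / 14), 0.0, None)

load = warehouse + office + household  # combined consumption of the group
net = load - pv                        # positive: buy from the grid; negative: surplus

print("consumption peaks at hour", load.argmax())
print("net demand is lowest at hour", net.argmin())
```

Changing the mix (say, more offices and fewer households) changes the shape of `net`, which is why a VPP has to learn its typical profiles from measured data rather than assume them.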

As a result, the demand profile of a group of prosumers varies over the day, the week and the season. A VPP needs to know the typical behaviour of its demand curve; this is the first question its developers should answer once the set of prosumers is fixed. But which algorithm is best for clustering hourly consumption data to obtain the most accurate estimate?

In this project we compare three clustering methods to find the answer:
- K-means;
- Affinity propagation;
- HDBSCAN.
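As a reference point for the first method, here is a minimal NumPy sketch of k-means (Lloyd's algorithm) applied to daily 24-hour load profiles. It is not the project's implementation: the function names, the farthest-first initialisation (scikit-learn's `KMeans` defaults to k-means++ instead) and the synthetic weekday/weekend profiles are all assumptions for illustration. In practice, scikit-learn's `sklearn.cluster` module provides `KMeans` and `AffinityPropagation`, and, since version 1.3, `HDBSCAN`.

```python
import numpy as np


def _sq_dists(X, C):
    # (n, k) matrix of squared Euclidean distances between rows of X and rows of C.
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def farthest_first_init(X, k, rng):
    # Start from a random profile, then repeatedly add the profile farthest
    # from all centroids chosen so far.
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        idx.append(int(_sq_dists(X, X[idx]).min(axis=1).argmax()))
    return X[idx].astype(float)


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's k-means returning (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = farthest_first_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: each daily profile goes to its nearest centroid.
        labels = _sq_dists(X, centroids).argmin(axis=1)
        # Update step: centroid = mean profile of its members (kept if the cluster is empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return labels, centroids


# Synthetic example: 10 "weekday" and 10 "weekend" profiles with noise (invented data).
rng = np.random.default_rng(42)
hours = np.arange(24)
weekday = np.where((hours >= 9) & (hours < 18), 120.0, 60.0)
weekend = np.full(24, 60.0)
X = np.vstack([
    weekday + rng.normal(0.0, 5.0, (10, 24)),
    weekend + rng.normal(0.0, 5.0, (10, 24)),
])

labels, centroids = kmeans(X, k=2)
print(labels)
```

Each centroid is itself a 24-hour profile, so the cluster centres can be read directly as "typical days", which is exactly the output a VPP needs.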

## 3. Data Understanding and Preparation

<a id="data"></a>

For this project we use the [open-source data](https://data.lab.fiware.org//dataset/874ac2ac-1920-4639-a661-fab4864b7647/resource/06a22cae-694c-40a7-aabf-a0ddfe0611e8/download/ternipowerdemandalldata.csv) from the trial site in Terni, Italy. The trial took place in a small network segment that connected prosumers with solar power plants and a hydroelectric power station. The data set provides power demand/supply profiles, in kW, of customers in different energy sectors from 2 April 2014 to 27 July 2015.

```python
# Set up the environment
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from datetime import datetime
```

```python
# Read the data into a pandas DataFrame
url = ("https://data.lab.fiware.org//dataset/"
       "874ac2ac-1920-4639-a661-fab4864b7647/resource/"
       "06a22cae-694c-40a7-aabf-a0ddfe0611e8/download/"
       "ternipowerdemandalldata.csv")

dFrame = pd.read_csv(url, sep=";")  # the file is semicolon-separated
dFrame.columns = ["datetime", "value", "customer"]  # rename columns for convenience

dFrame.head()
```

|   | datetime | value | customer |
|---|---|---|---|
| 0 | 2014-04-02T00:16:04+02:00 | 12.0 | SecondarySubstation |
| 1 | 2014-04-02T00:16:04+02:00 | 187.0 | CustomerLighting |
| 2 | 2014-04-02T00:16:10+02:00 | 10.0 | CustomerCommercial_2 |
| 3 | 2014-04-02T00:16:12+02:00 | 22.0 | CustomerOffice_1 |
| 4 | 2014-04-02T00:16:14+02:00 | 65.0 | CustomerOffice_2 |


```python
# The datetime column also carries the UTC offset (e.g. "+02:00"):
# split it off into a separate 'timeshift' column
dFrame[["datetime", "timeshift"]] = dFrame["datetime"].str.split("+", expand=True)

# Convert the remaining local timestamp strings to datetime64
dFrame["datetime"] = pd.to_datetime(dFrame["datetime"], format="%Y-%m-%dT%H:%M:%S")

dFrame["date"] = dFrame["datetime"].dt.date  # calendar date
dFrame["hour"] = dFrame["datetime"].dt.hour  # hour of day, 0-23

dFrame.head()
```

|   | datetime | value | customer | timeshift | date | hour |
|---|---|---|---|---|---|---|
| 0 | 2014-04-02 00:16:04 | 12.0 | SecondarySubstation | 02:00 | 2014-04-02 | 0 |
| 1 | 2014-04-02 00:16:04 | 187.0 | CustomerLighting | 02:00 | 2014-04-02 | 0 |
| 2 | 2014-04-02 00:16:10 | 10.0 | CustomerCommercial_2 | 02:00 | 2014-04-02 | 0 |
| 3 | 2014-04-02 00:16:12 | 22.0 | CustomerOffice_1 | 02:00 | 2014-04-02 | 0 |
| 4 | 2014-04-02 00:16:14 | 65.0 | CustomerOffice_2 | 02:00 | 2014-04-02 | 0 |
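Dropping the offset keeps local wall-clock hours, which is natural for load profiles. However, the data span the daylight-saving changes of October 2014 and March 2015, so on the autumn day one local hour occurs twice and on the spring day one is missing. The sketch below illustrates this on two hypothetical readings around the change on 26 October 2014 (clocks in Italy went back from +02:00 to +01:00). Parsing with `utc=True` is one option, assumed here rather than used in the original analysis, if unambiguous timestamps are preferred.

```python
import pandas as pd

# Two hypothetical readings: the same wall-clock time before and after the autumn DST change.
raw = pd.Series(["2014-10-26T02:30:00+02:00", "2014-10-26T02:30:00+01:00"])

# Splitting off the offset (as in the cell above) puts both readings in local hour 2.
local = pd.to_datetime(raw.str.split("+", expand=True)[0], format="%Y-%m-%dT%H:%M:%S")
print(local.dt.hour.tolist())

# Parsing the full string with utc=True converts both to distinct UTC timestamps.
utc = pd.to_datetime(raw, utc=True)
print(utc.dt.hour.tolist())
```

With local hours, the hour-2 bucket on 26 October 2014 therefore collects two hours of readings; whether that matters depends on how many such days end up in the clustering input.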


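```python
# Create a new DataFrame with the total value per date and hour.
# Selecting 'value' first keeps the text and datetime columns out of the sum;
# pandas 2.0+ no longer drops them silently and may raise a TypeError instead.
demand = dFrame.groupby(["date", "hour"])["value"].sum().reset_index()

demand.head()
```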

|   | date | hour | value |
|---|---|---|---|
| 0 | 2014-04-02 | 0 | 2642.0 |
| 1 | 2014-04-02 | 1 | 3270.0 |
| 2 | 2014-04-02 | 2 | 3382.0 |
| 3 | 2014-04-02 | 3 | 3170.0 |
| 4 | 2014-04-02 | 4 | 3495.0 |
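The clustering methods compare whole days, so the long `(date, hour, value)` table needs to become a matrix with one row per day and one column per hour. A minimal sketch, assuming we simply discard incomplete days (missing hours from logging gaps or DST changes); the helper name `daily_profiles` and the toy input are hypothetical.

```python
import pandas as pd


def daily_profiles(demand: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape (date, hour, value) rows into a days x 24 matrix."""
    # After the groupby above each (date, hour) pair is unique, so unstack is enough.
    profiles = demand.set_index(["date", "hour"])["value"].unstack("hour")
    # Make sure all 24 hour columns exist, then keep only complete days.
    profiles = profiles.reindex(columns=range(24))
    return profiles.dropna()


# Toy input (invented values): the second day is missing hour 23.
toy = pd.DataFrame({
    "date": ["2014-04-02"] * 24 + ["2014-04-03"] * 23,
    "hour": list(range(24)) + list(range(23)),
    "value": [1.0] * 47,
})

X = daily_profiles(toy)
print(X.shape)
```

Calling `.to_numpy()` on the result gives the array that `KMeans`, `AffinityPropagation` or `HDBSCAN` expect as input.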
