# Introduction

The world is urbanizing more every year. In fact, the United Nations predicts that 70% of the world's population will live in cities by 2050 (source). This is a marvelous feat for humanity, as we continue to lift people out of extreme poverty across the world. However, urbanization is not without its costs, and at the top of the list of new problems we as a society have to solve is the increased demand for energy.

At this time, electricity is mainly generated from fossil fuels. Although there are more environmentally friendly methods of energy production, electricity itself cannot yet be stored economically at large scale. Some storage systems are available, but their capacity is nowhere near enough to power a city.

As time goes on, innovation will give us more flexibility in solving the energy crisis that urbanization will surely bring. Until then, there is plenty of room to optimize energy production with the methods we have today. In this analysis, I will investigate hourly energy data for Spain (01/01/2015 to 12/31/2018) to do just that.

The data contains hourly information about the generation, price, and demand of energy in Spain. Additionally, this data contains predictions for energy demand and prices made by Spain’s transmission system operator (TSO). 
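Because the dataset ships with the TSO's own forecasts, the benchmark any new model has to beat can be scored directly. The sketch below is a minimal illustration, assuming `total_load_forecast` is the TSO's demand forecast for `total_load_actual` and `price_day_ahead` is its price forecast for `price_actual` (column names taken from the data preview). The helper names, the MAE/RMSE/MAPE metrics and the final-3-month holdout are illustrative choices, not necessarily the ones used in this analysis, and the tiny DataFrame is made-up data.

```python
import numpy as np
import pandas as pd

def forecast_errors(actual, forecast):
    """MAE, RMSE and MAPE (%) of a forecast against observed values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = forecast - actual
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mape": float(np.mean(np.abs(err) / np.abs(actual)) * 100),
    }

def tso_benchmark(df, months=3):
    """Hypothetical helper: score the TSO demand and price forecasts
    over the final `months` of the data (the holdout window)."""
    ts = pd.to_datetime(df["date_time"])
    holdout = df[ts > ts.max() - pd.DateOffset(months=months)]
    return {
        "demand": forecast_errors(holdout["total_load_actual"], holdout["total_load_forecast"]),
        "price": forecast_errors(holdout["price_actual"], holdout["price_day_ahead"]),
    }

# Made-up two-row example: only the December row falls in the 3-month holdout
df = pd.DataFrame({
    "date_time": ["2018-01-01 00:00:00", "2018-12-31 23:00:00"],
    "total_load_actual": [100.0, 200.0],
    "total_load_forecast": [150.0, 210.0],
    "price_actual": [50.0, 60.0],
    "price_day_ahead": [40.0, 66.0],
})
scores = tso_benchmark(df)
print(scores["demand"]["mae"], scores["price"]["mae"])
```

Scoring the TSO on the same holdout window that the models are evaluated on keeps the comparison like-for-like.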


## Intent

In this analysis, I will showcase a few models that outperform the TSO's predictions for hourly demand and the price of energy production. With a better method to forecast energy demand and costs, we can help energy providers devise new production strategies that meet the needs of their consumers, reduce excess energy production (saving cost), and explore ways to better integrate alternative green energy production methods. I will also look into patterns of consumer behavior, which can be used to design rebate programs that reduce demand during forecasted peak times of the year.

## Objectives

In this analysis, I will be conducting the following:

Model different consumer behaviors by clustering electricity demand into consumer profiles. For each profile, I will extract the times of day when power use is highest and lowest and try to draw additional insights about the behavior behind them. Better understanding our consumers will allow us to devise rebate programs that incentivize consumers to limit power usage during specific times of the day or year when we forecast demand to be very high, helping energy production companies lower costs.

Create multivariate predictors of energy cost and demand that both outperform the TSO's previous predictions. The models will be able to make forecasts 3 months into the future. Specifically, the models built are a vector autoregression (VAR), an XGBRegressor, and a deep learning network. Being able to forecast demand and cost will allow energy providers to devise production strategies that limit costs, produce an optimal amount of energy, and leverage more green methods of production.
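Of the three models, VAR is the one whose mechanics fit in a few lines: every series is regressed on the previous `p` values of *all* series, and multi-step forecasts are produced by feeding each prediction back in as a lag. The sketch below is a minimal least-squares VAR(p) in NumPy on simulated data, not the implementation used in this analysis; `fit_var` and `forecast_var` are hypothetical names. In practice, `statsmodels.tsa.api.VAR` provides the same model together with lag-order selection via `select_order`.

```python
import numpy as np

def fit_var(data, p):
    """Least-squares fit of y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t.

    data: array of shape (T, k), one column per series.
    Returns B of shape (1 + k*p, k): row 0 is c, the next k rows are A_1
    transposed, the k rows after that are A_2 transposed, and so on.
    """
    data = np.asarray(data, dtype=float)
    T, k = data.shape
    # Regressors for t = p..T-1: [1, y_{t-1}, ..., y_{t-p}]
    X = np.hstack([np.ones((T - p, 1))] + [data[p - i:T - i] for i in range(1, p + 1)])
    Y = data[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

def forecast_var(data, B, p, steps):
    """Recursive multi-step forecast: each prediction becomes the next lag."""
    history = list(np.asarray(data, dtype=float)[-p:])
    preds = []
    for _ in range(steps):
        x = np.concatenate([[1.0]] + [history[-i] for i in range(1, p + 1)])
        y_next = x @ B
        preds.append(y_next)
        history.append(y_next)
    return np.array(preds)

# Simulated two-series VAR(1) standing in for, e.g., demand and price
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.2, 0.3]])
c = np.array([1.0, 2.0])
y = np.zeros((2000, 2))
for t in range(1, 2000):
    y[t] = c + A @ y[t - 1] + rng.normal(size=2)

B = fit_var(y, p=1)
A_hat = B[1:].T                  # estimated lag-1 coefficient matrix
horizon = 24 * 91                # roughly 3 months of hourly steps
preds = forecast_var(y, B, p=1, steps=horizon)
print(np.round(A_hat, 2))
```

For a stable VAR, recursive forecasts converge to the fitted long-run mean, so over a 3-month hourly horizon a plain VAR flattens out; this is one reason seasonal or exogenous features may matter for the longer-range forecasts.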


## Impact

The big-picture impact of this analysis is to develop cost-efficient ways of producing energy while also incorporating green technology. By producing the right amount of energy, we protect the environment from harmful CO2 emissions as well as the bottom line of energy production companies. If Spain is able to produce electricity optimally using the insights gained from this analysis, it stands to reason that other nations can do the same. In this way, we can work step by step toward solving the energy crisis that urbanization will certainly bring in the coming decades.

# Table of Contents

1. Introduction
    - Intent
    - Objectives
    - Impact
2. Table of Contents
3. Data Overview
    - About the Data
    - Preview Data
    - Data Overview - Metrics
4. Methodology
5. Consumer Profile Analysis
    - Results
6. Time Series Analysis
    - Results
7. Conclusion
    - Final Takeaways
    - Thoughts for future research

# Data Overview

## About the Data

The energy data contains hourly information about the generation of energy in Spain. In particular, it records the amount of electricity (in MW) generated by each energy source (fossil gas, fossil hard coal, and wind energy dominate the grid), as well as the total load (energy demand) on the national grid and the price of energy (€/MWh). Note: since generation is reported in MW and each row covers one hour, the value in each cell also equals the energy produced during that hour in MWh.
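The unit note can be made concrete: with one reading per hour, an MW value is numerically the MWh produced in that hour, so daily energy is a plain sum. A minimal pandas sketch; the function name `hourly_mw_to_daily_mwh` is hypothetical, and it assumes a `DatetimeIndex` (for example, built from the `date_time` column) with exactly one reading per hour and no gaps. The input values are made up.

```python
import pandas as pd

def hourly_mw_to_daily_mwh(mw: pd.Series) -> pd.Series:
    """Sum hourly average-power readings (MW) into daily energy (MWh).

    Assumes a DatetimeIndex with exactly one reading per hour, so each
    reading equals the energy (MWh) delivered during that hour.
    """
    hours_per_reading = 1.0  # MW x 1 h = MWh
    return (mw * hours_per_reading).resample("D").sum()

# Made-up example: 48 hourly readings of 0, 10, 20, ... MW
idx = pd.date_range("2015-01-01 00:00:00", periods=48, freq=pd.offsets.Hour())
mw = pd.Series([10.0 * h for h in range(48)], index=idx)
daily = hourly_mw_to_daily_mwh(mw)
print(daily.tolist())  # [2760.0, 8520.0]
```

If the data were sub-hourly (say, 15-minute readings), `hours_per_reading` would become 0.25; with hourly data the conversion factor is 1 and only the unit label changes.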

Data source can be found [here](https://www.kaggle.com/nicholasjhana/energy-consumption-generation-prices-and-weather).

## Preview Data

Below is a preview of the processed data. The data has 35,070 rows and 22 columns.

```python
# Import libraries
import pandas as pd
import numpy as np
from IPython.display import Image

# Notebook display preferences: show up to 101 columns in pandas output
pd.set_option("display.max_columns", 101)
```

```python
from pathlib import Path

# Set path to the cleaned, aggregated data
path = Path(r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Spain Hourly Energy Demand and Weather\Data\02_Cleaned_Data')

# Read in data (pathlib joins the folder and file name portably)
data = pd.read_csv(path / '2020_0620_Weather_Energy.csv')
```

| | date_time | forecast_solar_day_ahead | forecast_wind_onshore_day_ahead | generation_biomass | generation_fossil_brown_coal/lignite | generation_fossil_gas | generation_fossil_hard_coal | generation_fossil_oil | generation_hydro_pumped_storage_consumption | generation_hydro_run-of-river_and_poundage | generation_hydro_water_reservoir | generation_nuclear | generation_other | generation_other_renewable | generation_solar | generation_waste | generation_wind_onshore | price_actual | price_day_ahead | temp | total_load_actual | total_load_forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 00:00:00 | 17.0 | 6436.0 | 447.0 | 329.0 | 4844.0 | 4821.0 | 162.0 | 863.0 | 1051.0 | 1899.0 | 7096.0 | 43.0 | 73.0 | 49.0 | 196.0 | 6378.0 | 65.41 | 50.1 | 30.814633 | 25385.0 | 26118.0 |
| 1 | 2015-01-01 01:00:00 | 16.0 | 5856.0 | 449.0 | 328.0 | 5196.0 | 4755.0 | 158.0 | 920.0 | 1009.0 | 1658.0 | 7096.0 | 43.0 | 71.0 | 50.0 | 195.0 | 5890.0 | 64.92 | 48.1 | 30.85286 | 24382.0 | 24934.0 |
| 2 | 2015-01-01 02:00:00 | 8.0 | 5454.0 | 448.0 | 323.0 | 4857.0 | 4581.0 | 157.0 | 1164.0 | 973.0 | 1371.0 | 7099.0 | 43.0 | 73.0 | 50.0 | 196.0 | 5461.0 | 64.48 | 47.33 | 30.108448 | 22734.0 | 23515.0 |
| 3 | 2015-01-01 03:00:00 | 2.0 | 5151.0 | 438.0 | 254.0 | 4314.0 | 4131.0 | 160.0 | 1503.0 | 949.0 | 779.0 | 7098.0 | 43.0 | 75.0 | 50.0 | 191.0 | 5238.0 | 59.32 | 42.27 | 30.091044 | 21286.0 | 22642.0 |
| 4 | 2015-01-01 04:00:00 | 9.0 | 4861.0 | 428.0 | 187.0 | 4130.0 | 3840.0 | 156.0 | 1826.0 | 953.0 | 720.0 | 7097.0 | 43.0 | 74.0 | 42.0 | 189.0 | 4935.0 | 56.04 | 38.41 | 30.19262 | 20264.0 | 21785.0 |
| 5 | 2015-01-01 05:00:00 | 4.0 | 4617.0 | 410.0 | 178.0 | 4038.0 | 3590.0 | 156.0 | 2109.0 | 952.0 | 743.0 | 7098.0 | 43.0 | 74.0 | 34.0 | 188.0 | 4618.0 | 53.63 | 35.72 | 29.9732 | 19905.0 | 21441.0 |
| 6 | 2015-01-01 06:00:00 | 3.0 | 4276.0 | 401.0 | 172.0 | 4040.0 | 3368.0 | 158.0 | 2108.0 | 961.0 | 848.0 | 7098.0 | 43.0 | 74.0 | 34.0 | 186.0 | 4397.0 | 51.73 | 35.13 | 30.03512 | 20010.0 | 21285.0 |
| 7 | 2015-01-01 07:00:00 | 12.0 | 3994.0 | 408.0 | 172.0 | 4030.0 | 3208.0 | 160.0 | 2031.0 | 983.0 | 1012.0 | 7099.0 | 43.0 | 72.0 | 35.0 | 189.0 | 3992.0 | 51.43 | 36.22 | 30.55388 | 20377.0 | 21545.0 |
| 8 | 2015-01-01 08:00:00 | 39.0 | 3602.0 | 413.0 | 177.0 | 4052.0 | 3335.0 | 161.0 | 2119.0 | 1001.0 | 1015.0 | 7098.0 | 43.0 | 73.0 | 54.0 | 198.0 | 3629.0 | 48.98 | 32.4 | 33.51848 | 20094.0 | 21443.0 |
| 9 | 2015-01-01 09:00:00 | 784.0 | 3212.0 | 419.0 | 177.0 | 4137.0 | 3437.0 | 163.0 | 2170.0 | 1041.0 | 1357.0 | 7097.0 | 43.0 | 74.0 | 743.0 | 198.0 | 3073.0 | 54.2 | 36.6 | 33.88496 | 20637.0 | 21560.0 |



## Data Overview - Metrics

**Data Description**

```python
# Summary statistics for each numeric column (transposed: one row per feature)
data.describe().T
```

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| forecast_solar_day_ahead | 35070.0 | 1438.825121 | 1677.661532 | 0.0 | 69.0 | 576.0 | 2635.0 | 5836.0 |
| forecast_wind_onshore_day_ahead | 35070.0 | 5471.372512 | 3176.148983 | 237.0 | 2979.0 | 4855.5 | 7353.0 | 17430.0 |
| generation_biomass | 35070.0 | 383.536128 | 85.348006 | 0.0 | 333.0 | 367.0 | 433.0 | 592.0 |
| generation_fossil_brown_coal/lignite | 35070.0 | 448.060251 | 354.603125 | 0.0 | 0.0 | 509.0 | 757.0 | 999.0 |
| generation_fossil_gas | 35070.0 | 5622.474309 | 2201.444741 | 0.0 | 4126.0 | 4969.0 | 6428.75 | 20034.0 |
| generation_fossil_hard_coal | 35070.0 | 4256.296179 | 1961.968024 | 0.0 | 2527.0 | 4474.0 | 5839.0 | 8359.0 |
| generation_fossil_oil | 35070.0 | 298.335358 | 52.518153 | 0.0 | 263.0 | 300.0 | 330.0 | 449.0 |
| generation_hydro_pumped_storage_consumption | 35070.0 | 475.867237 | 792.594472 | 0.0 | 0.0 | 68.0 | 617.0 | 4523.0 |
| generation_hydro_run-of-river_and_poundage | 35070.0 | 972.117536 | 400.74052 | 0.0 | 637.0 | 906.0 | 1250.0 | 2000.0 |
| generation_hydro_water_reservoir | 35070.0 | 2605.122241 | 1835.141359 | 0.0 | 1077.25 | 2165.0 | 3757.0 | 9728.0 |


# Methodology

## Modeling Consumer Behavior

I will use a clustering algorithm to model different patterns of consumer behavior in hourly energy demand, following these steps:

1.
2.
3.
4.
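The specific clustering steps are not listed here, so the following is only one possible approach, sketched under stated assumptions. Since the dataset records national load rather than individual meters, it assumes a "consumer profile" is a cluster of similar daily load shapes: hourly `total_load_actual` is reshaped into a days × 24 matrix and clustered with k-means (a swapped-in choice of algorithm). The hours of highest and lowest demand per profile then come from `argmax`/`argmin` of each centroid. `daily_profiles` and `kmeans` are hypothetical helpers, and the demand curves are made up.

```python
import numpy as np

def daily_profiles(hourly_load):
    """Reshape an hourly series that starts at midnight into a (days, 24)
    matrix, dropping any incomplete final day."""
    values = np.asarray(hourly_load, dtype=float)
    n_days = len(values) // 24
    return values[: n_days * 24].reshape(n_days, 24)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.
    Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    # Initialise: first day, then repeatedly the day farthest from all chosen centroids
    centroids = [X[0]]
    for _ in range(1, k):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each day to its closest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its days (keep it if the cluster is empty)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Made-up demand (MW): alternating "evening-peak" and "midday-peak" days plus noise
rng = np.random.default_rng(42)
hours = np.arange(24)
evening = 25000 + 5000 * np.exp(-((hours - 20) ** 2) / 8)
midday = 22000 + 4000 * np.exp(-((hours - 13) ** 2) / 8)
hourly = np.concatenate([evening if d % 2 == 0 else midday for d in range(40)])
hourly = hourly + rng.normal(0, 200, hourly.size)

labels, centroids = kmeans(daily_profiles(hourly), k=2)
peak_hours = centroids.argmax(axis=1)    # hour of highest average demand per profile
trough_hours = centroids.argmin(axis=1)  # hour of lowest average demand per profile
print(sorted(peak_hours.tolist()))  # [13, 20]
```

Clustering whole-day shapes (rather than individual hours) keeps each profile's peak and trough hours directly readable from its centroid, which is what a time-of-day rebate program would target.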

## Multivariate Time-Series Analysis

I will conduct a multivariate time series analysis to create two machine learning models that forecast energy demand and cost more accurately than the TSO's predictions, using the following steps:

1. Use the Granger causality test to confirm that each feature helps predict the others. Features with no such relationship to the other features will be removed.
2. Use a cointegration test to check whether the time series share a statistically significant long-run relationship.
3. Make the data stationary and extract additional features.
4. Develop and deploy models that aim to outperform the TSO forecasts over a 3-month period.
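Step 1 rests on a simple comparison: does adding past values of `x` to a regression of `y` on its own past values reduce the residual error more than chance would allow? The sketch below computes that F statistic with plain NumPy least squares; `granger_f_stat` is a hypothetical name, and the series are simulated. It is the same statistic that statsmodels' `grangercausalitytests` reports as `ssr_ftest`, which also supplies the p-value from an F(p, n - 2p - 1) distribution. The test is normally run on stationary (for example, differenced) series, since trending series can show spurious "causality".

```python
import numpy as np

def granger_f_stat(y, x, p):
    """F statistic for H0: lags 1..p of x add no predictive power for y
    beyond lags 1..p of y itself (Granger causality F-test)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    n = T - p  # usable observations after dropping the first p
    const = np.ones((n, 1))
    y_lags = np.column_stack([y[p - i:T - i] for i in range(1, p + 1)])
    x_lags = np.column_stack([x[p - i:T - i] for i in range(1, p + 1)])
    target = y[p:]

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        return resid @ resid

    rss_restricted = rss(np.hstack([const, y_lags]))            # y on its own lags
    rss_unrestricted = rss(np.hstack([const, y_lags, x_lags]))  # ... plus lags of x
    df_denom = n - (2 * p + 1)
    return ((rss_restricted - rss_unrestricted) / p) / (rss_unrestricted / df_denom)

# Simulated example: y is driven by the previous value of x, not the reverse
rng = np.random.default_rng(0)
x = rng.normal(size=500)
e = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.2 * e[1:]

f_xy = granger_f_stat(y, x, p=2)  # does x Granger-cause y?
f_yx = granger_f_stat(x, y, p=2)  # does y Granger-cause x?
print(round(f_xy, 1), round(f_yx, 2))
```

A large `f_xy` next to a small `f_yx` is the asymmetric pattern that would justify keeping `x` as a predictor of `y`.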

# Consumer Profile Analysis

## Results


# Time Series Analysis

## Predicting Cost of Energy Production

Base model

VAR Results

TensorFlow results

## Predicting Energy Demand

## Results

# Conclusion

## Final Takeaways

## Thoughts for future research