<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Notebook-Introduction" data-toc-modified-id="Notebook-Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Notebook Introduction</a></span></li></ul></div>

# Energy Analysis Project - Notebook 1 : Data Collection / Wrangling

**@Author: Stratos Hadjioannou**

**June 2020**

This project walks through the entire data science pipeline by analysing the energy consumption of various European countries. Most of the data was obtained from the [Open Power System Data](https://open-power-system-data.org/) website, with the exception of the sunrise and sunset dataset, which is taken from [this website](https://www.timeanddate.com/sun/). The project was inspired by a workshop I attended at the virtual DSGO conference, which analysed this data to demonstrate the use of MLflow. You can find out more about it in [this repo](https://github.com/Samreay/DSGoPipeline). This project also uses Cookiecutter for its file structure; you can read more about it in [this article](https://medium.com/@rrfd/cookiecutter-data-science-organize-your-projects-atom-and-jupyter-2be7862f487e).

![energy_image](https://795d665f9fc7b1053e24-4d632937b8453c17306cf8bcb974f77f.ssl.cf3.rackcdn.com/x/1330cm500/images/Sectors/Sustainable_energy/sector-banner_sustainable-energy-banner-fisher-german.jpg)

This project combines various open-source datasets that relate to the power generation of various European countries. It goes through each stage of the data science pipeline outlined below:

1. Data Collection / Wrangling
2. Data Manipulation / Cleaning
3. Exploratory Data Analysis (EDA)
4. Modelling / Validation
5. Deployment of Model

The aim of this project is to learn what affects the energy generation of these countries. The full code for this project can be found here.

## Notebook Introduction

This notebook focuses on data collection and wrangling: the process of collecting all the data you need for a project and transforming it into a usable form. Most of the time, the data you can find is in a form that is hard to analyse. This part looks at different ways to transform that data and make it usable for the rest of the project.

The data sources we will use in this project are:

- Solar / Wind power generation
- Historic weather data
- Energy capacity across different countries
- Sunset / Sunrise data

## Import the libraries

This section imports the libraries needed for this project.

In [3]:
# general python libraries
import os
from datetime import datetime

# data manipulation
import numpy as np
import pandas as pd

# import the local src package for custom functions
# this requires installing the project in editable mode first:
# pip install --editable .
import src
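
Because `import src` only works after the editable install, it can help to check for the package first and print a clearer hint than a bare `ModuleNotFoundError`. The snippet below is a minimal sketch of such a check. The helper name `is_importable` is hypothetical and is not part of the project's `src` package.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # Hypothetical helper for illustration; not part of the project's code.
    return importlib.util.find_spec(package_name) is not None


# "src" is the local package name used in this notebook.
if not is_importable("src"):
    print("Package 'src' not found: run `pip install --editable .` from the project root.")
```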

## Solar / Wind power generation
As mentioned in the introduction of this notebook, the data for this section is taken directly from [this website](https://data.open-power-system-data.org/time_series/). According to the website, the dataset includes load, wind and solar generation, and prices at hourly resolution for various European countries. We use *Version 2019-06-05*.

In [4]:
# read the data from the direct link
df_energy = pd.read_csv("https://data.open-power-system-data.org/time_series/2019-06-05/time_series_60min_singleindex.csv")

In [5]:
df_energy.head()

Unnamed: 0,utc_timestamp,cet_cest_timestamp,AL_load_actual_entsoe_power_statistics,AT_load_actual_entsoe_power_statistics,AT_load_actual_entsoe_transparency,AT_load_forecast_entsoe_transparency,AT_price_day_ahead,AT_solar_generation_actual,AT_wind_onshore_generation_actual,BA_load_actual_entsoe_power_statistics,...,SK_load_forecast_entsoe_transparency,SK_solar_generation_actual,TR_load_actual_entsoe_power_statistics,UA_load_actual_entsoe_transparency,UA_load_forecast_entsoe_transparency,UA_east_load_actual_entsoe_transparency,UA_east_load_forecast_entsoe_transparency,UA_west_load_actual_entsoe_power_statistics,UA_west_load_actual_entsoe_transparency,UA_west_load_forecast_entsoe_transparency
0,2004-12-31T23:00:00Z,2005-01-01T00:00:00+0100,,,,,,,,,...,,,,,,,,,,
1,2005-01-01T00:00:00Z,2005-01-01T01:00:00+0100,,,,,,,,,...,,,,,,,,,,
2,2005-01-01T01:00:00Z,2005-01-01T02:00:00+0100,,,,,,,,,...,,,,,,,,,,
3,2005-01-01T02:00:00Z,2005-01-01T03:00:00+0100,,,,,,,,,...,,,,,,,,,,
4,2005-01-01T03:00:00Z,2005-01-01T04:00:00+0100,,,,,,,,,...,,,,,,,,,,
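
The first rows above contain only missing values, so a useful next step is to parse the timestamps and see where the data actually starts. The snippet below is a rough sketch of that idea on a tiny in-memory sample rather than the full file. The column names are taken from the output above, but the numeric values are made up for illustration, and the variable names (`df_sample`, `missing_share`, `first_valid`) are assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV; the numeric values are invented for illustration.
sample_csv = io.StringIO(
    "utc_timestamp,cet_cest_timestamp,"
    "AT_solar_generation_actual,AT_wind_onshore_generation_actual\n"
    "2004-12-31T23:00:00Z,2005-01-01T00:00:00+0100,,\n"
    "2005-01-01T00:00:00Z,2005-01-01T01:00:00+0100,,\n"
    "2005-01-01T01:00:00Z,2005-01-01T02:00:00+0100,10.0,250.0\n"
)

# Parse the UTC timestamp column and use it as the index.
df_sample = pd.read_csv(
    sample_csv,
    parse_dates=["utc_timestamp"],
    index_col="utc_timestamp",
)

# Fraction of missing values in each column.
missing_share = df_sample.isna().mean()

# Select the columns that belong to one country (Austria, prefix "AT_").
at_columns = [col for col in df_sample.columns if col.startswith("AT_")]

# First timestamp at which at least one of those columns has a value.
first_valid = df_sample[at_columns].dropna(how="all").index.min()

print(missing_share)
print(first_valid)
```

The same steps should carry over to `df_energy` by passing `parse_dates` and `index_col` to the original `pd.read_csv` call, although the full file is much wider and the country prefix would need to be chosen accordingly.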
