# Europe Wind Farm Data Exploration

## Data Description

The EuropeWindFarm data set contains the day-ahead
forecasts of 45 wind farms (offshore and onshore) scattered
over the European continent, as shown in Fig. 6. It contains
hourly averaged wind power generation time series for two
consecutive years and the corresponding day-ahead
meteorological forecasts from the European Centre for
Medium-Range Weather Forecasts (ECMWF) weather model.

The power generation time series are normalized by the
respective nominal capacity of each wind farm, which enables
a scale-free comparison and masks the original
characteristics of the wind farm. Additionally, all weather
variables are normalized to the range [0, 1]. The data set is
pre-filtered to discard any period longer than 24 h in
which no energy was produced, as such a period indicates
a wind farm malfunction.
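The two preprocessing steps described above can be sketched in pandas. This is a minimal sketch, not the data set authors' pipeline: it assumes the nominal capacity of a farm is known, that "normalized to [0, 1]" means per-variable min-max scaling (the exact scaling is not stated), and that the power series has exactly one row per hour. The function names `normalize_by_capacity`, `min_max_scale` and `drop_long_zero_runs` are hypothetical.

```python
import pandas as pd


def normalize_by_capacity(power: pd.Series, nominal_capacity: float) -> pd.Series:
    """Express power output as a fraction of the farm's nominal capacity."""
    return power / nominal_capacity


def min_max_scale(values: pd.Series) -> pd.Series:
    """Rescale a weather variable linearly onto [0, 1].

    Assumption: min-max scaling; the data set description does not say
    which scaling was used.
    """
    vmin, vmax = values.min(), values.max()
    return (values - vmin) / (vmax - vmin)


def drop_long_zero_runs(power: pd.Series, max_hours: int = 24) -> pd.Series:
    """Drop runs of consecutive zero-output rows longer than max_hours.

    Assumes an hourly series with one row per hour, so a run of N rows
    lasts N hours.
    """
    is_zero = power.eq(0)
    # A new run id starts every time the zero / non-zero state flips.
    run_id = is_zero.ne(is_zero.shift(fill_value=False)).cumsum()
    # Length of the run that each row belongs to.
    run_length = is_zero.groupby(run_id).transform("size")
    return power[~(is_zero & (run_length > max_hours))]


# Example: a 25-hour stretch of zero output (dropped) and a 2-hour one (kept).
hours = pd.date_range("2000-01-01", periods=30, freq="h")
power = pd.Series([0.5] + [0.0] * 25 + [0.2, 0.0, 0.0, 0.3], index=hours)
filtered = drop_long_zero_runs(power)
print(len(filtered))  # 5
```

Grouping on a run id built with `cumsum` keeps the filter vectorized; a zero stretch of exactly 24 hours is kept, matching "longer than 24 h".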

The data set contains the following data items:

- Time stamp of the forecast / power measurement

__Forecasts__
- Forecasting Time Step: time between the creation of the forecast and the forecasted point in time
- Wind Speed at 100 m height
- Wind Speed at 10 m height
- Wind Direction (zonal component) at 100 m height
- Wind Direction (meridional component) at 100 m height
- Air Pressure forecast
- Air Temperature forecast
- Humidity

__Power Observations__
- Wind Farm Power Generation observations
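To connect the items above to the files, a hypothetical lookup table can map each CSV column name to its description and report which expected columns a loaded frame lacks. The column names are taken from the `df.info()` output of `wf1.csv` further down, which also includes a `Humidity` column; `COLUMN_DESCRIPTIONS` and `missing_columns` are illustrative names, not part of the data set.

```python
import pandas as pd

# Hypothetical lookup: CSV column name -> data item from the list above.
# Column names follow the df.info() output of wf1.csv.
COLUMN_DESCRIPTIONS = {
    "ForecastingTime": "Forecasting time step",
    "WindSpeed100m": "Wind speed at 100 m height",
    "WindSpeed10m": "Wind speed at 10 m height",
    "WindDirectionZonal": "Wind direction (zonal) at 100 m height",
    "WindDirectionMeridional": "Wind direction (meridional) at 100 m height",
    "AirPressure": "Air pressure forecast",
    "Temperature": "Air temperature forecast",
    "Humidity": "Humidity",
    "PowerGeneration": "Wind farm power generation observation",
}


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that df lacks, in the order of the lookup."""
    return [col for col in COLUMN_DESCRIPTIONS if col not in df.columns]


# Example: a frame that has only two of the expected columns.
partial = pd.DataFrame(columns=["ForecastingTime", "PowerGeneration"])
print(missing_columns(partial))
```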


# Import Libraries and Modules

```python
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns
from IPython.display import display
```

### Algorithm for reading multiple CSV files at once

__0. Set the path to the folder containing the wind farm data.__

__1. Loop over the .csv files. Inside the loop:__
* read the csv file and save it in a DataFrame
* save the number of rows (length)
* save the number of columns
* save the DataFrame

__2. Collect a list (or dict) containing one DataFrame per CSV file.__

__3. Explore the individual DataFrames.__
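The steps above can be condensed into a single function. This is a sketch under assumptions: `load_wind_farms` is a hypothetical helper, it expects the first CSV column to hold the time stamp (as in the wind farm files), and it sorts the file names so the order does not depend on the file system. The demo writes two tiny synthetic CSV files into a temporary folder instead of using the real data.

```python
import glob
import os
import tempfile

import pandas as pd


def load_wind_farms(folder: str) -> dict:
    """Read every .csv file in folder (steps 0 to 2 above).

    Returns {file_name: (DataFrame, (n_rows, n_cols))}.
    """
    farms = {}
    # Step 1: loop over the csv files in a deterministic (sorted) order.
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # The first column holds the time stamp and becomes the index.
        df = pd.read_csv(path, index_col=0)
        farms[os.path.basename(path)] = (df, df.shape)
    return farms


# Demo with two small synthetic files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name, n in [("wf1.csv", 3), ("wf2.csv", 2)]:
        demo = pd.DataFrame({"Time": range(n), "PowerGeneration": [0.5] * n})
        demo.to_csv(os.path.join(tmp, name), index=False)
    farms = load_wind_farms(tmp)

# Step 3: explore, e.g. the shape of each frame.
shapes = {name: shape for name, (_, shape) in farms.items()}
print(shapes)  # {'wf1.csv': (3, 1), 'wf2.csv': (2, 1)}
```

Keying the result by file name rather than by position avoids the parallel `files_names` list; `os.path.join` also removes the need to `chdir` into the data folder.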


```python
# Local folder containing the wind farm CSV files
wf_path = '/Users/andrewnachtigal/Documents/DSR/dsr_project/dsr_project/EuropeWindFarm/wind_forecast_cleanup/wind_forecasting_cleanup/data'
```

```python
# List all csv files in the data folder
import glob
import os

os.chdir(wf_path)
csv_files = glob.glob('*.csv')  # order depends on the file system, not sorted
csv_files
```

```
['wf1.csv', 'wf3.csv', 'wf2.csv']
```

```python
# Read each csv file and store it in a dictionary keyed by file name
import os

dict_files = {}
files_names = []
for file_path in csv_files:
    # The first column (Time) becomes the index; note that it still comes
    # back as strings (dtype object, see df.index below)
    df = pd.read_csv(file_path, index_col=0, parse_dates=True)
    filename = os.path.basename(file_path)
    files_names.append(filename)
    dict_files[filename] = df

# Number of rows and columns of each DataFrame as a list of (nrows, ncols) tuples
shape_df = [dict_files[name].shape for name in files_names]
shape_df
```

```
[(16920, 9), (15927, 9), (11856, 9)]
```

```python
df = dict_files[files_names[0]]  # wf1.csv
df.head()
```

| Time | ForecastingTime | AirPressure | Temperature | Humidity | WindSpeed100m | WindSpeed10m | WindDirectionZonal | WindDirectionMeridional | PowerGeneration |
|---|---|---|---|---|---|---|---|---|---|
| 0000-01-01 01:00:00 | 25 | 0.392238 | 0.406113 | 0.638134 | 0.625094 | 0.548931 | 0.820776 | 0.88354 | 0.50697 |
| 0000-01-01 02:00:00 | 26 | 0.37992 | 0.40519 | 0.614377 | 0.628325 | 0.563099 | 0.812227 | 0.890531 | 0.579394 |
| 0000-01-01 03:00:00 | 27 | 0.367603 | 0.404268 | 0.590621 | 0.631556 | 0.577266 | 0.803525 | 0.897331 | 0.524848 |
| 0000-01-01 04:00:00 | 28 | 0.360061 | 0.401325 | 0.598596 | 0.629573 | 0.582233 | 0.791599 | 0.906165 | 0.543939 |
| 0000-01-01 05:00:00 | 29 | 0.352519 | 0.398381 | 0.606571 | 0.62759 | 0.587199 | 0.779416 | 0.91464 | 0.620606 |


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 16920 entries, 0000-01-01 01:00:00 to 0001-12-30 22:00:00
Data columns (total 9 columns):
ForecastingTime            16920 non-null int64
AirPressure                16920 non-null float64
Temperature                16920 non-null float64
Humidity                   16920 non-null float64
WindSpeed100m              16920 non-null float64
WindSpeed10m               16920 non-null float64
WindDirectionZonal         16920 non-null float64
WindDirectionMeridional    16920 non-null float64
PowerGeneration            16920 non-null float64
dtypes: float64(8), int64(1)
memory usage: 1.3+ MB
```
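`df.info()` confirms there are no missing values. The data description also states that the weather variables are normalized to [0, 1]; a quick sanity check, sketched here with a hypothetical helper `count_out_of_unit_range`, counts the values outside that range in each column. `ForecastingTime` is skipped because it is an integer step (25, 26, ... in the rows above), not a normalized feature. The sample frame uses two rows copied from the `df.head()` output above; on the notebook data the call would simply be `count_out_of_unit_range(df)`.

```python
import pandas as pd


def count_out_of_unit_range(df: pd.DataFrame, skip=("ForecastingTime",)) -> pd.Series:
    """Count values outside [0, 1] in each column, ignoring the columns in skip."""
    values = df.drop(columns=[c for c in skip if c in df.columns])
    return ((values < 0) | (values > 1)).sum()


# Two rows copied from df.head() above (wf1.csv).
sample = pd.DataFrame(
    {
        "ForecastingTime": [25, 26],
        "AirPressure": [0.392238, 0.37992],
        "Temperature": [0.406113, 0.40519],
        "Humidity": [0.638134, 0.614377],
        "WindSpeed100m": [0.625094, 0.628325],
        "WindSpeed10m": [0.548931, 0.563099],
        "WindDirectionZonal": [0.820776, 0.812227],
        "WindDirectionMeridional": [0.88354, 0.890531],
        "PowerGeneration": [0.50697, 0.579394],
    }
)
print(count_out_of_unit_range(sample).sum())  # 0
```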


```python
df.index
```

```
Index(['0000-01-01 01:00:00', '0000-01-01 02:00:00', '0000-01-01 03:00:00',
       '0000-01-01 04:00:00', '0000-01-01 05:00:00', '0000-01-01 06:00:00',
       '0000-01-01 07:00:00', '0000-01-01 08:00:00', '0000-01-01 09:00:00',
       '0000-01-01 10:00:00',
       ...
       '0001-12-30 13:00:00', '0001-12-30 14:00:00', '0001-12-30 15:00:00',
       '0001-12-30 16:00:00', '0001-12-30 17:00:00', '0001-12-30 18:00:00',
       '0001-12-30 19:00:00', '0001-12-30 20:00:00', '0001-12-30 21:00:00',
       '0001-12-30 22:00:00'],
      dtype='object', name='Time', length=16920)
```

```python
# Fix dates: the anonymized years 0000/0001 were not parsed (the index dtype
# is object), so map them to 2000/2001 by replacing the leading '0' with '2'
# and convert the result to datetimes
df.index = pd.to_datetime(['2' + stamp[1:] for stamp in df.index])

df.head()
```

|  | ForecastingTime | AirPressure | Temperature | Humidity | WindSpeed100m | WindSpeed10m | WindDirectionZonal | WindDirectionMeridional | PowerGeneration |
|---|---|---|---|---|---|---|---|---|---|
| 2000-01-01 01:00:00 | 25 | 0.392238 | 0.406113 | 0.638134 | 0.625094 | 0.548931 | 0.820776 | 0.88354 | 0.50697 |
| 2000-01-01 02:00:00 | 26 | 0.37992 | 0.40519 | 0.614377 | 0.628325 | 0.563099 | 0.812227 | 0.890531 | 0.579394 |
| 2000-01-01 03:00:00 | 27 | 0.367603 | 0.404268 | 0.590621 | 0.631556 | 0.577266 | 0.803525 | 0.897331 | 0.524848 |
| 2000-01-01 04:00:00 | 28 | 0.360061 | 0.401325 | 0.598596 | 0.629573 | 0.582233 | 0.791599 | 0.906165 | 0.543939 |
| 2000-01-01 05:00:00 | 29 | 0.352519 | 0.398381 | 0.606571 | 0.62759 | 0.587199 | 0.779416 | 0.91464 | 0.620606 |
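The date fix above converts only `df`, i.e. `wf1.csv`. A reusable version can be sketched with a hypothetical helper `fix_anonymized_years` that applies the same "leading 0 becomes 2" mapping to any frame and keeps the index name `Time`. Mapping the masked years to 2000/2001 is an assumption carried over from the cell above; the real years are not known. The demo frame holds only two raw time stamps (the first and last entries of `df.index` above) and no data columns.

```python
import pandas as pd


def fix_anonymized_years(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the masked years 0000/0001 mapped to 2000/2001.

    Replaces the first character of each time stamp with '2', then parses
    the result into a DatetimeIndex named 'Time'.
    """
    fixed = frame.copy()
    fixed.index = pd.to_datetime(["2" + str(stamp)[1:] for stamp in frame.index])
    fixed.index.name = "Time"
    return fixed


# Demo on the first and last raw stamps of wf1.csv (no data columns).
raw = pd.DataFrame(index=pd.Index(["0000-01-01 01:00:00", "0001-12-30 22:00:00"], name="Time"))
fixed = fix_anonymized_years(raw)
print(fixed.index[0], fixed.index[-1])  # 2000-01-01 01:00:00 2001-12-30 22:00:00

# In the notebook, every wind farm could be fixed at once (sketch):
# dict_files = {name: fix_anonymized_years(frame) for name, frame in dict_files.items()}
```

Because a stamp that already starts with "2" is mapped onto itself, running the helper twice on the same frame leaves the index unchanged.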


```python
# Move the target variable (PowerGeneration) to the first column
df = df[['PowerGeneration',
         'ForecastingTime',
         'AirPressure',
         'Temperature',
         'Humidity',
         'WindSpeed100m',
         'WindSpeed10m',
         'WindDirectionZonal',
         'WindDirectionMeridional']]
```
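Since periods of more than 24 h without production were removed, the hourly index is not necessarily continuous; `wf1.csv` has 16,920 rows, fewer than the roughly 17,500 hours in two years, which suggests gaps. A sketch for locating them, using a hypothetical helper `find_gaps` and assuming a sorted index with a 1-hour sampling step, lists every jump between consecutive time stamps that exceeds that step. It needs a `DatetimeIndex`, so on the notebook data it would run after the date fix, e.g. `find_gaps(df.index)`.

```python
import pandas as pd


def find_gaps(index: pd.DatetimeIndex, step: str = "1h") -> pd.DataFrame:
    """List jumps between consecutive (sorted) time stamps larger than step."""
    stamps = pd.Series(index)  # positional RangeIndex, values are the stamps
    delta = stamps.diff()  # NaT for the first row, so it never counts as a gap
    is_gap = delta > pd.Timedelta(step)
    return pd.DataFrame(
        {
            "gap_start": stamps.shift()[is_gap],  # last stamp before the hole
            "gap_end": stamps[is_gap],  # first stamp after the hole
            "length": delta[is_gap],
        }
    ).reset_index(drop=True)


# Example: a 28-hour hole between 02:00 on day 1 and 06:00 on day 2.
idx = pd.DatetimeIndex(
    ["2000-01-01 01:00", "2000-01-01 02:00", "2000-01-02 06:00", "2000-01-02 07:00"]
)
gaps = find_gaps(idx)
print(gaps)
```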