# Structure of this Notebook
For the analysis, the actual and forecast wind feed-in data were downloaded from the information platform of the four German transmission system operators (TSOs). In Excel, the actual and forecast feed-in were combined into one spreadsheet, with one sheet per TSO. Irrelevant columns, e.g. "time to", were removed.
In sections 2 to 5, the wind data from the four control areas are prepared for the analysis.
In section 6, the four data sets are combined into one data set and processed further.
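
The combination step planned for section 6 could look roughly like the following. This is only a sketch on made-up data: the frames `area_a` and `area_b` and their values are hypothetical stand-ins for the prepared control-area data sets, and `pd.concat` is one possible way to stack them, not necessarily the approach used later in the notebook.

```python
import pandas as pd

# Hypothetical toy frames standing in for two of the four prepared data sets.
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"])
area_a = pd.DataFrame({"pred": [10.0, 12.0], "act": [9.0, 13.0], "control_area": "A"}, index=idx)
area_b = pd.DataFrame({"pred": [5.0, 6.0], "act": [5.5, 5.0], "control_area": "B"}, index=idx)

# Stack the frames row-wise; the control_area column records the origin of each row.
wind_all = pd.concat([area_a, area_b]).sort_index()
print(wind_all)
```

Keeping the control area as a column (long format) makes later grouping by area straightforward.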

# 1. Importing the required libraries

In [142]:
# Importing the required libraries
import pandas as pd
import numpy as np
import seaborn as sns  # visualization
import matplotlib.pyplot as plt  # visualization
from datetime import datetime, timedelta
#%matplotlib inline 
#import missingno as msno
#sns.set(color_codes=True)
#sns.set_style('whitegrid') # white plot background 
#sns.set_palette('Blues_r')

# 2. Control Area: 50 Hertz

## 2.1 Loading the data into the data frame

In [130]:
wind_50Hertz = pd.read_excel("Data/wind_raw_data.xlsx", sheet_name='50Hertz', header=0, parse_dates=[['date', 'time']])

## 2.2 Checking the types of the data and count of observations

In [131]:
wind_50Hertz.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 315648 entries, 0 to 315647
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   date_time  315648 non-null  datetime64[ns]
 1   timezone   315648 non-null  object        
 2   pred       315064 non-null  float64       
 3   act        315585 non-null  float64       
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 9.6+ MB


## 2.3 Changing the data type of date_time

In [132]:
wind_50Hertz['date_time'] = pd.to_datetime(wind_50Hertz['date_time'], format = '%Y-%m-%d %H:%M:%S')

In [None]:
# set_index(..., inplace=True) returns None, so assign the returned frame instead
wind_50Hertz = wind_50Hertz.set_index('date_time')

## Drop irrelevant columns

In [None]:
wind_50Hertz.info()

## 2.4 Remove duplicated rows caused by the time change

In [None]:
#wind_50Hertz.duplicated('date_time', keep = False)
duplicate_rows_wind_50Hertz = wind_50Hertz[wind_50Hertz.index.duplicated()]
duplicate_rows_wind_50Hertz

In [None]:
# date_time is the index at this point, so check the index for duplicates
wind_50Hertz = wind_50Hertz[~wind_50Hertz.index.duplicated(keep='first')]
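
To illustrate what this step does, here is a minimal sketch on toy data. The timestamps and values are invented; they only mimic the repeated hour that appears when local time is set back in autumn.

```python
import pandas as pd

# Toy index in which 02:00 and 02:15 occur twice, as at the autumn time change.
idx = pd.to_datetime([
    "2020-10-25 01:45", "2020-10-25 02:00", "2020-10-25 02:15",
    "2020-10-25 02:00", "2020-10-25 02:15", "2020-10-25 03:00",
])
toy = pd.DataFrame({"act": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# index.duplicated(keep="first") flags every repeat after the first occurrence.
mask = toy.index.duplicated(keep="first")
deduplicated = toy[~mask]
print(deduplicated)
```

Note that `keep="first"` discards the second pass through the repeated hour rather than averaging the two; whether that is acceptable depends on the analysis.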

## 2.5 Check and handle missing values

In [151]:
# Print the number of missing values per column
print(wind_50Hertz.isnull().sum(),"\n")

date_time      0
timezone       0
pred         584
act           63
dtype: int64 



In [None]:
# Fill each missing value with the last valid observation (forward fill)
wind_50Hertz['pred'] = wind_50Hertz['pred'].ffill()
wind_50Hertz['act'] = wind_50Hertz['act'].ffill()
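
Forward filling carries the last valid observation forward into each gap. The sketch below shows the effect on an invented series; the optional `limit` argument, which caps how many consecutive gaps are filled, is shown only as a possible refinement and is not used in this notebook.

```python
import numpy as np
import pandas as pd

# Invented series with gaps of different lengths.
toy = pd.Series([10.0, np.nan, np.nan, 13.0, np.nan])

# Every gap takes the last valid value before it.
filled = toy.ffill()

# With limit=1, only the first missing value of each gap is filled.
filled_limited = toy.ffill(limit=1)

print(filled.tolist())
print(filled_limited.tolist())
```

A value at the very start of a series has no predecessor and would remain missing after a forward fill.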

In [None]:
wind_50Hertz.info()

Resample the data set to hourly resolution

In [None]:
wind_50Hertz = wind_50Hertz.resample('H').sum()
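
The choice of aggregation matters when moving from quarter-hourly to hourly values. The following sketch uses invented numbers to compare `sum()` and `mean()`. Which one is appropriate depends on the unit of the raw data, which this notebook does not state: summing suits quantities that accumulate over the hour, while the mean suits average power levels.

```python
import pandas as pd

# Invented quarter-hourly values covering two full hours.
idx = pd.date_range("2020-01-01 00:00", periods=8, freq="15min")
toy = pd.DataFrame(
    {"act": [100.0, 120.0, 80.0, 100.0, 200.0, 200.0, 200.0, 200.0]},
    index=idx,
)

# Sum of the four quarter-hour values in each hour.
hourly_sum = toy.resample("h").sum()

# Average of the four quarter-hour values in each hour.
hourly_mean = toy.resample("h").mean()

print(hourly_sum)
print(hourly_mean)
```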

## Add features

### Control area

In [None]:
wind_50Hertz['control_area'] = '50Hertz'
wind_50Hertz.info()

### Forecast error

In [None]:
wind_50Hertz['delta'] = wind_50Hertz['pred'] - wind_50Hertz['act']
wind_50Hertz.info()
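
Once the forecast error `delta = pred - act` is available, it can be summarised with a few standard measures. The sketch below is an illustration on invented numbers; the metrics shown (bias, mean absolute error, root mean squared error) are common choices and an assumption here, not measures taken from this notebook.

```python
import numpy as np
import pandas as pd

# Invented forecast and actual values.
toy = pd.DataFrame({"pred": [100.0, 80.0, 60.0], "act": [90.0, 85.0, 60.0]})
toy["delta"] = toy["pred"] - toy["act"]  # positive means over-forecast

bias = toy["delta"].mean()                   # average signed error
mae = toy["delta"].abs().mean()              # mean absolute error
rmse = np.sqrt((toy["delta"] ** 2).mean())   # root mean squared error

print(bias, mae, rmse)
```

With this sign convention, a positive bias indicates that the forecast tends to exceed the actual feed-in.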

In [None]:
wind_50Hertz.info()

# 3. Control Area: Amprion

In [148]:

wind_amprion = pd.read_excel("Data/wind_raw_data.xlsx", sheet_name = 'Amprion', header = 0 )


In [149]:
wind_amprion.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 315648 entries, 0 to 315647
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   date       315648 non-null  datetime64[ns]
 1   time       315648 non-null  object        
 2   timezone   315648 non-null  object        
 3   pred       314864 non-null  float64       
 4   act        315278 non-null  float64       
dtypes: datetime64[ns](1), float64(2), object(2)
memory usage: 12.0+ MB
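
Unlike the 50Hertz sheet, this frame still has separate `date` and `time` columns. One way to merge them into a single `date_time` index is sketched below on invented rows. It assumes that `time` holds strings such as "00:15"; the output above only shows that the column has dtype `object`, so the actual content may differ.

```python
import pandas as pd

# Invented rows mimicking the layout above: datetime64 `date`, object `time`.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "time": ["00:00", "00:15"],  # assumed "HH:MM" strings
    "pred": [100.0, 110.0],
    "act": [95.0, 120.0],
})

# Join the date part and the time string, then parse the result as one timestamp.
df["date_time"] = pd.to_datetime(
    df["date"].dt.strftime("%Y-%m-%d") + " " + df["time"].astype(str)
)

# Drop the two source columns and use the combined timestamp as the index.
df = df.drop(columns=["date", "time"]).set_index("date_time")
print(df)
```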


# 4. Control Area: Tennet

wind_tennet = pd.read_excel("Data/wind_raw_data.xlsx", sheet_name = 'Tennet', header = 0)


# 5. Control Area: TransnetBW

In [None]:
wind_transnetbw =  pd.read_excel("Data/wind_raw_data.xlsx", sheet_name = 'TransnetBW', header = 0)