# Assignment 4: Prediction of renewable energy generation

## Context
A friend recently had to sign a new electricity supply contract. The high prices surprised him very much and he decided to look into the electricity market. Among a lot of other information, he learned that electricity is also traded on an exchange.
He found the so-called "day-ahead" market, where electricity is traded for the next day, the most interesting. He found out that the price is almost completely determined by the supply, because the demand hardly changes at such short notice (no private person turns on less light and no industrial company stops production at such short notice). He also found out that there are already very good forecasting models for this.
The electricity supply, on the other hand, has become much more exciting in recent years. Wind turbines and large photovoltaic plants have hardly any running costs and can therefore undercut any other power plant (in the short term). However, their production depends strongly on the current weather in the area where the respective plant is located. The "conventional" power plants, which then fill the remaining gap to demand, determine the electricity price based on their operating costs. If you know the current oil, coal and gas prices, this is also relatively easy to predict.

By now, he is sure that you could make good money if you had a good forecast of how much electricity wind power and PV will deliver. Since he has heard that you now have some experience with data analysis, he asks you to help him by creating a forecast model that predicts the amount of electricity produced (wind & photovoltaic) based on a weather report.

As he is aware that data is needed for this, he has already obtained data:
 - From "SMARTD" (part of the regulatory authority) the installed production capacity of the different types of power plants ("energy_installed_capacity.csv") and the amount of electricity produced in reality ("energy_produced.csv").
 - Daily records from the DWD (weather service) at many measuring stations. Two files: one with the measured values ("weather.csv") and one with further information (e.g. location) about the measuring stations ("weatherstations.csv").
Both sources (all four data sets) cover the period from 2016 to 2021 inclusive.
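
Because the measurements and the station information live in two separate files, they usually have to be joined before location can be used as a feature. The following is only a minimal sketch on made-up data: the column names `Station_ID`, `Latitude`, `Longitude` and `Wind speed` are assumptions for illustration and may not match the real files.

```python
import pandas as pd

# Hypothetical miniature versions of weather.csv and weatherstations.csv.
# All column names and values are assumptions, not the real schema.
weather = pd.DataFrame({
    "Station_ID": [1, 1, 2, 2],
    "Date": ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02"],
    "Wind speed": [5.0, 7.0, 3.0, 4.0],
})
stations = pd.DataFrame({
    "Station_ID": [1, 2],
    "Latitude": [54.2, 48.1],
    "Longitude": [8.9, 11.6],
})

# A left join keeps every measurement, even if a station had no metadata row.
# validate="many_to_one" raises an error if a station appears twice in the metadata.
merged = weather.merge(stations, on="Station_ID", how="left", validate="many_to_one")

# Parse the dates once so that later grouping by day, month or year is possible.
merged["Date"] = pd.to_datetime(merged["Date"], format="%Y-%m-%d")
print(merged)
```

Checking `merged["Latitude"].isna().sum()` afterwards is a quick way to see whether any measurements belong to stations that are missing from the metadata.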

He is also sure that, if the model is good enough to be worthwhile, he will be able to buy sufficiently good weather forecasts, no matter in which exact form they are needed. So there are no limitations on how the data is grouped or preprocessed.

## Assignment

Develop a forecast model, evaluate it and assess whether it would be useful for the intended purpose.
Deliver a Jupyter notebook (able to run on the server) that includes the following parts:
- Data analysis and exploration, including preparation for the model. This includes (but is not limited to):
  - understanding the data (continuous/categorical, range of values ...)
  - unifying the time base
  - detecting and dealing with missing data points
  - possibly necessary simplifications
- Develop and evaluate a model for the forecast.
- Conclude whether (at least on the basis of the data) a meaningfully usable forecasting model could be achieved. 
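
Two of the preparation steps listed above, unifying the time base and handling missing data points, can be sketched as follows. This is only an illustration on synthetic data: the quarter-hourly resolution, the gap and the choice of interpolation are assumptions, not properties of the provided files or a recommended treatment.

```python
import numpy as np
import pandas as pd

# Synthetic quarter-hourly production for three days (assumed resolution).
idx = pd.date_range("2016-01-01", periods=3 * 96, freq="15min")
produced = pd.DataFrame(
    {"Photovoltaics[MW]": np.arange(len(idx), dtype=float)}, index=idx
)
produced.iloc[100:110] = np.nan  # simulate a gap of ten readings on the second day

# Unify the time base: one value per day, matching daily weather records.
daily = produced.resample("D").mean()

# Detect missing data: count the missing readings per day before treating them.
missing_per_day = produced["Photovoltaics[MW]"].isna().resample("D").sum()

# One possible treatment: interpolate short gaps along the time axis,
# then aggregate again. Dropping the affected days is another option.
daily_filled = produced.interpolate(method="time").resample("D").mean()

print(missing_per_day)
print(daily_filled)
```

Whether interpolating, dropping or flagging is appropriate depends on how long the gaps are and should be justified in the notebook.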

Also leave draft steps in the Jupyter notebook so that we can understand your approach.
For each decision that is relevant to the result, give a brief justification, if not clear from the context. So after a parameter analysis comparable to Task 3 in Assignment 3, no justification would of course be needed for the choice of epochs and learning rate. No justification is necessary in an "exploratory phase" either, as these form the basis for later justifications.     

**Hints:**
 - Do not underestimate the importance of the data preprocessing.
 - Remember what we discussed in the lectures, where we covered different ways to solve different problems. For time reasons we often chose just one, but that one is not necessarily the right one for this assignment.
 - The data is quite "raw", it has some faults and/or is not in the shape you may need it and can include unnecessary information.
 - Use your "common sense" especially during the preprocessing stage.
 - You can of course add an arbitrary number of additional cells.
 - If you want to use additional python libraries, just ask, usually we will be able to provide them.
 - If you have more than one idea to solve a problem, allow yourself to experiment a bit! There is more than one valid solution, but make it very clear at the end which result is your final one.
 - For EDA/Data Preparation and ANN there is a total of 60 points, roughly equally distributed; depending on where you make certain decisions, the split can shift a bit.
 - For EDA/Data preparation you may consider, as an example, selecting the three measured parameters and comparing the distribution of their values in terms of time, location, etc.
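
The last hint, comparing the distribution of a measured parameter over time and location, could look roughly like this. It is a sketch on randomly generated data; the column names `Station`, `Date` and `Wind speed` are hypothetical placeholders for whatever the real files contain.

```python
import numpy as np
import pandas as pd

# Synthetic daily wind speeds for two hypothetical stations in 2016.
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", "2016-12-31", freq="D")
frames = []
for station, base in [("A", 4.0), ("B", 7.0)]:
    frames.append(pd.DataFrame({
        "Station": station,
        "Date": dates,
        "Wind speed": base + rng.normal(0.0, 1.0, len(dates)),
    }))
weather = pd.concat(frames, ignore_index=True)

# Distribution by location: count, mean, std and quartiles per station.
by_station = weather.groupby("Station")["Wind speed"].describe()

# Distribution over time: summary statistics per calendar month.
by_month = weather.groupby(weather["Date"].dt.month)["Wind speed"].agg(
    ["mean", "std", "min", "max"]
)

print(by_station)
print(by_month)
```

The same grouping can feed box plots or histograms if a visual comparison is preferred over tables.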

## Exploratory Data Analysis & Data preparation (~30 Points)


In [313]:
import pandas as pd 
import datetime

# Load the weather measurements and drop the columns assumed to have no influence on the energy produced
# (column names are spelled exactly as in the CSV files)
weather_df = pd.read_csv('weather.csv')
weather_df_droplist = ['Minimum Temperature', 'Average Temperature', 'Maximum Temperature', 'relative humidity', 'average air preassure', 'Rain']
weather_df.drop(columns=weather_df_droplist, inplace=True)

# Load the produced energy; only PV and wind are needed
ep_df = pd.read_csv('energy_produced.csv', sep=';')
ep_df_droplist = ['Water power[MW]', 'Biomass[MW]', 'Nuclear power[MW]', 'Brown coal[MW]', 'Coal[MW]', 'Natural gas[MW]', 'Pump storage[MW]', 'Other conventional[MW]', 'Other renewables[MW]']
ep_df.drop(columns=ep_df_droplist, inplace=True)

# Load the installed capacity; only PV and wind are needed
eic_df = pd.read_csv('energy_installed_capacity.csv', sep=';')
eic_df_droplist = ['Biomass[MW]', 'Water power[MW]', 'Other renewables[MW]', 'Nuclear power[MW]', 'Brown coal[MW]', 'Coal[MW]', 'Natural gas[MW]', 'Pump storage[MW]', 'Other conventional[MW]']
eic_df.drop(columns=eic_df_droplist, inplace=True)

# Load the station metadata; the operator is not needed
ws_df = pd.read_csv('weatherstations.csv')
ws_df_droplist = ['Operator']
ws_df.drop(columns=ws_df_droplist, inplace=True)

# Convert the date column from string to datetime and set it as index.
# Only eic_df is converted for now; ep_df keeps its string dates because the loop below filters on them.
#ep_df['Date'] = pd.to_datetime(ep_df['Date'], format= '%d.%m.%Y')
eic_df['Date'] = pd.to_datetime(eic_df['Date'], format='%d.%m.%Y')
#weather_df['Date'] = pd.to_datetime(weather_df['Date'], format= '%Y-%m-%d')

#ep_df.set_index('Date', inplace = True)
eic_df.set_index('Date', inplace=True)
#weather_df.set_index('Date', inplace = True)

# Calculate the daily utilization rate of the installed capacity:
# mean production of each day divided by the installed capacity of that year.

start_date = datetime.date(2016, 1, 1)
end_date = datetime.date(2016, 2, 1)  # shortened range for testing; the full data ends on 2021-12-31
delta = datetime.timedelta(days=1)
onshore_list = []
offshore_list = []
pv_list = []

while start_date <= end_date:
    # ep_df['Date'] still holds strings, so filter with the same '%d.%m.%Y' format
    filt_ep = (ep_df['Date'] == start_date.strftime('%d.%m.%Y'))
    # Assumes the remaining numeric columns are ordered offshore, onshore, PV
    ep_date_mean = ep_df[filt_ep].mean(numeric_only=True)
    # .loc with a year string selects all capacity rows of that year,
    # so each list entry is a pandas Series rather than a plain float
    year = start_date.strftime('%Y')
    offshore_list.append(ep_date_mean.iloc[0] / eic_df.loc[year, 'Wind Offshore[MW]'])
    onshore_list.append(ep_date_mean.iloc[1] / eic_df.loc[year, 'Wind Onshore[MW]'])
    pv_list.append(ep_date_mean.iloc[2] / eic_df.loc[year, 'Photovoltaics[MW]'])
    start_date += delta
    

#ws_df
#eic_df
#ep_df
#weather_df

In [322]:
print(pv_list[4])
#print(type(eic_df.loc[start_date.strftime('%Y'),'Photovoltaics[MW]']))
#print(pv_list)
#print(offshore_list)
#print(onshore_list)
#onshore_list

Date
2016-01-01    0.003381
Name: Photovoltaics[MW], dtype: float64
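
The output above shows that each list entry is a one-row Series rather than a single number, because the capacity is looked up with a year string. As a possible alternative to the day-by-day loop, the same utilization rate can be computed in one pass with `groupby`. The sketch below uses made-up numbers and assumes one capacity row per year; only the `Date` and `Photovoltaics[MW]` column names are taken from the cell above.

```python
import pandas as pd

# Synthetic stand-ins for ep_df and eic_df; the values are invented.
ep = pd.DataFrame({
    "Date": ["01.01.2016", "01.01.2016", "02.01.2016", "02.01.2016"],
    "Photovoltaics[MW]": [100.0, 300.0, 0.0, 800.0],
})
eic = pd.DataFrame({"Date": ["01.01.2016"], "Photovoltaics[MW]": [40000.0]})

ep["Date"] = pd.to_datetime(ep["Date"], format="%d.%m.%Y")
eic["Date"] = pd.to_datetime(eic["Date"], format="%d.%m.%Y")

# Mean production per day (unifies the intraday readings to a daily time base).
daily_mean = ep.groupby("Date")["Photovoltaics[MW]"].mean()

# Installed capacity indexed by year (assumes exactly one row per year).
capacity_by_year = eic.set_index(eic["Date"].dt.year)["Photovoltaics[MW]"]

# Look up the capacity for each day's year and divide element-wise.
capacity = daily_mean.index.year.map(capacity_by_year)
utilization = daily_mean / capacity.to_numpy()

print(utilization)
```

Here `utilization` is a single Series indexed by date whose values are plain floats, which is easier to join with daily weather features than a list of Series.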


## Development and evaluation of the ANN (~30 Points)


## Summary (10 Points)

Is this model usable for predicting the amount of generated renewable energy based on weather data?