# German Electricity Data :electric_plug: 


- Data Source: [SMARD Web Portal](https://www.smard.de/) 
- Task: forecasting (using the past to predict the future)
- Language: [Julia](https://julialang.org)

## First steps:
- load the data (with appropriate column types)
- split the data into train/val/test sets
- plot the time series (at different temporal resolutions)
- calculate the mean production per month
- is there a seasonal pattern? (what kinds of seasonality?)
- make a prediction for the validation set
- evaluate the quality of the prediction
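The splitting, prediction and evaluation steps above can be sketched with the standard library alone. This is a minimal sketch, not the notebook's own method: the helper names `chrono_split`, `seasonal_naive`, `mae` and `rmse` are hypothetical, the 70/15/15 split fractions are assumptions, and the seasonal naive forecast (repeat the last full season) is just one common baseline. For time series the split is chronological rather than random, so that no future observations leak into training.

```julia
using Statistics

# Chronological split into train/validation/test index ranges.
# The fractions are illustrative assumptions; rows are never shuffled,
# so later observations cannot leak into the training set.
function chrono_split(n::Integer; train_frac = 0.7, val_frac = 0.15)
    n_train = round(Int, train_frac * n)
    n_val   = round(Int, val_frac * n)
    return 1:n_train, (n_train + 1):(n_train + n_val), (n_train + n_val + 1):n
end

# Seasonal naive baseline: repeat the last full season (period `s`)
# of the history for each of the next `h` steps.
function seasonal_naive(history::AbstractVector{<:Real}, h::Integer, s::Integer)
    @assert length(history) >= s "history is shorter than one season"
    last_season = history[end-s+1:end]
    return [last_season[mod1(i, s)] for i in 1:h]
end

# Error metrics comparing observations `y` with a forecast `yhat`
mae(y, yhat)  = mean(abs.(y .- yhat))
rmse(y, yhat) = sqrt(mean((y .- yhat) .^ 2))

# Usage on a synthetic, perfectly daily-periodic hourly series (30 days)
y = repeat(collect(1.0:24.0), 30)
train, val, test = chrono_split(length(y))
yhat = seasonal_naive(y[train], length(val), 24)
println("validation MAE = ", mae(y[val], yhat))
```

For the hourly SMARD data, `s = 24` would encode a daily pattern and `s = 168` a weekly one; `missing` values would have to be handled before such a baseline is applied.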

In [1]:
using Random

In [2]:
Random.seed!(42)
"Jörg Marko Sascha Chaitanya Sonia Robin Andi" |> split |> shuffle |> x -> join(x," -> ")

"Jörg -> Sascha -> Robin -> Andi -> Chaitanya -> Marko -> Sonia"

## 1. Data loading

- load the hourly data from the file `Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv` as a table into Julia
- make sure that the data is interpreted and typed correctly
- clean the header names
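Part of typing the data correctly is having a single time axis: the export splits each interval into a date (`Datum`) and a start time (`Anfang`). A minimal sketch, assuming both columns have already been parsed to `Date` and `Time`: the stdlib constructor `DateTime(::Date, ::Time)` merges them, and two cheap checks flag problems. If the export is in local time (an assumption, not confirmed here), daylight-saving switches could show up as a missing or repeated hour, which these checks would reveal.

```julia
using Dates

# Example values shaped like the parsed `Datum` (Date) and `Anfang` (Time) columns
datum  = fill(Date(2015, 1, 1), 3)
anfang = [Time(0), Time(1), Time(2)]

# One DateTime per hourly interval, usable as a single time axis
stamps = DateTime.(datum, anfang)

# A clean hourly index is sorted and free of duplicates
is_clean = issorted(stamps) && allunique(stamps)

# Hours between consecutive rows; a gap or a repeated hour
# (e.g. around a daylight-saving switch) shows up as a value other than 1
step_hours = Dates.value.(diff(stamps)) .÷ 3_600_000
```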



In [54]:
using CSV, DataFrames, Dates

In [55]:
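function string_parse(strg)
    # Pass `missing` through unchanged ("-" cells are already read as
    # missing via `missingstring` when the CSV is loaded)
    ismissing(strg) && return strg
    # Convert German number format, e.g. "15.687,25" -> "15687.25"
    no_groupmarks = replace(strg, "." => "")   # drop thousands separators
    return replace(no_groupmarks, "," => ".")  # decimal comma -> decimal point
end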

string_parse (generic function with 1 method)

In [56]:
function str_vec_to_float(vec)
    # Parse each cleaned string as Float64, keeping `missing` entries
    parsed = string_parse.(vec)
    return Union{Missing, Float64}[ismissing(x) ? missing : parse(Float64, x) for x in parsed]
end

str_vec_to_float (generic function with 1 method)

In [57]:
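# Read the semicolon-separated SMARD export; "-" marks missing values
df = CSV.read("Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv", DataFrame;
              delim = ';', missingstring = "-")
# Column 1 (Datum) uses the German day.month.year format
df[!, 1] = Date.(df[!, 1], "dd.mm.yyyy")
# Columns 4 onward hold the generation values in German number format
for col in names(df)[4:end]
    df[!, col] = str_vec_to_float(df[!, col])
end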
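The loading cell converts the types but does not yet clean the header names. A minimal sketch, assuming every value column follows the `<Name> [MWh] Berechnete Auflösungen` pattern ("Berechnete Auflösungen" is German for "calculated resolutions"): the hypothetical helper `clean_header` drops the unit and that suffix and replaces spaces with underscores, so that columns can be accessed as `df.Wind_Offshore`.

```julia
# Hypothetical helper: shorten SMARD headers such as
# "Biomasse [MWh] Berechnete Auflösungen" to "Biomasse" and
# replace spaces with underscores ("Wind Offshore" -> "Wind_Offshore").
function clean_header(name::AbstractString)
    short = strip(replace(name, r"\s*\[MWh\].*$" => ""))  # drop unit and suffix
    return replace(short, r"\s+" => "_")                  # spaces -> underscores
end

headers = ["Datum", "Anfang", "Ende", "Biomasse [MWh] Berechnete Auflösungen",
           "Wind Offshore [MWh] Berechnete Auflösungen"]
cleaned = clean_header.(headers)
```

With DataFrames.jl, `rename!(clean_header, df)` should apply the helper to every column name; `Datum`, `Anfang` and `Ende` do not match the pattern and pass through unchanged.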

In [58]:
df

| Row | Datum | Anfang | Ende | Biomasse [MWh] Berechnete Auflösungen | Wasserkraft [MWh] Berechnete Auflösungen | Wind Offshore [MWh] Berechnete Auflösungen | Wind Onshore [MWh] Berechnete Auflösungen | Photovoltaik [MWh] Berechnete Auflösungen | Sonstige Erneuerbare [MWh] Berechnete Auflösungen | Kernenergie [MWh] Berechnete Auflösungen | Braunkohle [MWh] Berechnete Auflösungen | Steinkohle [MWh] Berechnete Auflösungen | Erdgas [MWh] Berechnete Auflösungen | Pumpspeicher [MWh] Berechnete Auflösungen | Sonstige Konventionelle [MWh] Berechnete Auflösungen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Date | Time | Time | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? |
| 1 | 2015-01-01 | 00:00:00 | 01:00:00 | 4024.25 | 1158.25 | 516.5 | 8128.0 | 0.0 | 133.0 | 10710.5 | 15687.2 | 3219.75 | 1226.25 | 1525.75 | 4909.25 |
| 2 | 2015-01-01 | 01:00:00 | 02:00:00 | 3982.75 | 1188.0 | 516.25 | 8297.5 | 0.0 | 122.5 | 11086.2 | 15321.8 | 2351.25 | 870.75 | 1079.25 | 4932.75 |
| 3 | 2015-01-01 | 02:00:00 | 03:00:00 | 4019.5 | 1139.25 | 514.0 | 8540.0 | 0.0 | 93.0 | 11026.2 | 14817.5 | 2227.0 | 809.5 | 787.0 | 5041.75 |
| 4 | 2015-01-01 | 03:00:00 | 04:00:00 | 4040.75 | 1122.5 | 517.75 | 8552.0 | 0.0 | 86.5 | 11027.8 | 14075.0 | 2339.75 | 821.0 | 287.75 | 5084.0 |
| 5 | 2015-01-01 | 04:00:00 | 05:00:00 | 4037.75 | 1112.0 | 519.75 | 8643.5 | 0.0 | 86.5 | 10962.2 | 14115.0 | 2461.5 | 831.25 | 346.75 | 5070.75 |
| 6 | 2015-01-01 | 05:00:00 | 06:00:00 | 4028.25 | 1107.75 | 520.0 | 8711.75 | 0.0 | 86.75 | 10696.0 | 13474.2 | 2217.75 | 851.0 | 765.5 | 5096.75 |
| 7 | 2015-01-01 | 06:00:00 | 07:00:00 | 4013.25 | 1111.75 | 521.5 | 9167.25 | 0.0 | 87.0 | 10299.5 | 12403.8 | 2373.25 | 868.25 | 414.5 | 5153.0 |
| 8 | 2015-01-01 | 07:00:00 | 08:00:00 | 4012.75 | 1113.75 | 520.25 | 9811.0 | 0.0 | 87.0 | 10035.2 | 12062.5 | 2491.0 | 876.0 | 582.5 | 5161.0 |
| 9 | 2015-01-01 | 08:00:00 | 09:00:00 | 3999.75 | 1107.5 | 525.25 | 9683.0 | 53.0 | 87.0 | 10245.8 | 12405.0 | 2530.25 | 888.25 | 750.5 | 5393.5 |
| 10 | 2015-01-01 | 09:00:00 | 10:00:00 | 4016.25 | 1121.0 | 527.0 | 9501.75 | 773.25 | 85.75 | 10060.2 | 12798.8 | 2386.25 | 891.5 | 387.0 | 5884.0 |
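For the "mean production per month" step, and for looking at the series at coarser temporal resolutions, the hourly values can be averaged within groups of timestamps. A minimal stdlib-only sketch: the helper name `group_mean` is hypothetical, `missing` values are skipped, and the result is the average hourly value per group (MWh per hour), not the group total. Passing `yearmonth` as the key gives monthly means, passing `Date` gives daily means.

```julia
using Dates, Statistics

# Hypothetical helper: average `values` within groups defined by `key(timestamp)`,
# e.g. key = yearmonth for monthly means or key = Date for daily means.
# `missing` values are skipped; the result is sorted by group key.
function group_mean(key, timestamps::AbstractVector, values::AbstractVector)
    groups = Dict{Any, Vector{Float64}}()
    for (t, v) in zip(timestamps, values)
        ismissing(v) && continue
        push!(get!(Vector{Float64}, groups, key(t)), v)
    end
    return sort([k => mean(vs) for (k, vs) in groups]; by = first)
end

ts = [DateTime(2015, 1, 1, 0), DateTime(2015, 1, 1, 1),
      DateTime(2015, 2, 1, 0), DateTime(2015, 2, 1, 1)]
vs = [1.0, 3.0, missing, 10.0]
monthly = group_mean(yearmonth, ts, vs)   # (year, month) => mean
daily   = group_mean(Date, ts, vs)        # Date => mean
```

On the loaded table, something like `group_mean(yearmonth, df.Datum, df[!, 5])` would give the monthly mean of the `Biomasse` column, since `yearmonth` also accepts a `Date`.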
