# German Electricity Data :electric_plug: 


- Data Source: [SMARD Web Portal](https://www.smard.de/) 
- Task: forecasting (using the past to predict the future)
- Language: [Julia](https://julialang.org)

## First steps:
- load the data (with appropriate column types)
- split the data into train/val/test sets
- plot the time series (at different temporal resolutions)
- calculate the mean production per month
- check for seasonal patterns (which kinds of seasonality are present?)
- make a prediction for the validation set
- evaluate the quality of the prediction
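
The splitting, prediction, and evaluation steps above can be sketched on a toy series. This is only an illustration: the synthetic sine data, the 70/15/15 proportions, and the seasonal-naive baseline are assumptions chosen for the example, not part of the SMARD data or a recommended model.

```julia
using Statistics

# Synthetic hourly series standing in for one production column (assumption:
# 60 days of data with a clean 24-hour cycle and no gaps).
n = 24 * 60
series = [100 + 20 * sin(2π * t / 24) for t in 1:n]

# Chronological split: a time series is split in order, never shuffled.
# The 70/15/15 proportions are an illustrative choice.
n_train = round(Int, 0.70 * n)
n_val   = round(Int, 0.15 * n)
train = series[1:n_train]
val   = series[n_train+1:n_train+n_val]
test  = series[n_train+n_val+1:end]

# Seasonal-naive baseline: repeat the last 24 observed hours of the training set.
last_day = train[end-23:end]
forecast = [last_day[mod1(h, 24)] for h in 1:length(val)]

# Mean absolute error of the baseline on the validation set.
mae = mean(abs.(forecast .- val))
```

On this perfectly periodic toy series the baseline is essentially exact, so `mae` is close to zero; real production data will behave differently, which is what makes the baseline a useful point of comparison.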

In [1]:
using Random

In [2]:
Random.seed!(530)
"Max Joel Felix Caro Kerstin Mike Markus Chris" |> split |> shuffle |> x -> join(x," -> ")

"Max -> Joel -> Markus -> Chris -> Kerstin -> Felix -> Caro -> Mike"

## 1. Data loading

- load the hourly data from the file `Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv` as a table into Julia
- make sure that the data is parsed correctly and that each column gets the right type
- clean the header names
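
One possible way to clean the header names is sketched below. The helper `clean_name` is hypothetical (it is not part of the notebook), and the target style (lowercase, underscores, no unit suffix) is an assumption.

```julia
# Hypothetical helper: strip the unit suffix and normalise a SMARD column name.
function clean_name(name::AbstractString)
    base = replace(name, r"\s*\[MWh\].*$" => "")  # drop " [MWh] Berechnete Auflösungen"
    base = lowercase(strip(base))
    return replace(base, r"\s+" => "_")           # spaces -> underscores
end

raw_names = ["Datum",
             "Wind Offshore [MWh] Berechnete Auflösungen",
             "Sonstige Erneuerbare [MWh] Berechnete Auflösungen"]
cleaned_names = clean_name.(raw_names)
# → ["datum", "wind_offshore", "sonstige_erneuerbare"]
```

If the table is a DataFrames.jl `DataFrame`, passing the function to `rename!`, as in `rename!(clean_name, data)`, should apply it to every column.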



In [3]:
using DataFrames
using CSV

In [73]:
data = DataFrame(CSV.File("Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv", delim=";", dateformat="dd.mm.yyyy"));


In [74]:
first(data, 6)

| Row | Datum | Anfang | Ende | Biomasse [MWh] Berechnete Auflösungen | Wasserkraft [MWh] Berechnete Auflösungen |
|---|---|---|---|---|---|
| | Date | String | String | String | String |
| 1 | 2015-01-01 | 00:00 | 01:00 | 4.024,25 | 1.158,25 |
| 2 | 2015-01-01 | 01:00 | 02:00 | 3.982,75 | 1.188 |
| 3 | 2015-01-01 | 02:00 | 03:00 | 4.019,5 | 1.139,25 |
| 4 | 2015-01-01 | 03:00 | 04:00 | 4.040,75 | 1.122,5 |
| 5 | 2015-01-01 | 04:00 | 05:00 | 4.037,75 | 1.112 |
| 6 | 2015-01-01 | 05:00 | 06:00 | 4.028,25 | 1.107,75 |


In [11]:
data[!, "Anfang"]

73716-element Vector{Dates.Time}:
 00:00:00
 01:00:00
 02:00:00
 03:00:00
 04:00:00
 05:00:00
 06:00:00
 07:00:00
 08:00:00
 09:00:00
 10:00:00
 11:00:00
 12:00:00
 ⋮
 01:00:00
 02:00:00
 03:00:00
 04:00:00
 05:00:00
 06:00:00
 07:00:00
 08:00:00
 09:00:00
 10:00:00
 11:00:00
 12:00:00

In [13]:
data[!, "Biomasse [MWh] Berechnete Auflösungen"]

73716-element Vector{String}:
 "4.024,25"
 "3.982,75"
 "4.019,5"
 "4.040,75"
 "4.037,75"
 "4.028,25"
 "4.013,25"
 "4.012,75"
 "3.999,75"
 "4.016,25"
 "4.007,75"
 "4.011,75"
 "4.014"
 ⋮
 "4.581,5"
 "4.551"
 "4.503,25"
 "4.454,25"
 "4.406,75"
 "4.413,25"
 "4.468"
 "4.503,5"
 "4.544,5"
 "4.568,25"
 "4.618,75"
 "4.612,25"

In [16]:
parse.(Float64, data[!, "Biomasse [MWh] Berechnete Auflösungen"])

LoadError: ArgumentError: cannot parse "4.024,25" as Float64

In [23]:
# German number format: remove the thousands separator "." and turn the decimal comma into "."
parse.(Float64, replace.(replace.(data[!, "Biomasse [MWh] Berechnete Auflösungen"], "." => ""), "," => "."))

LoadError: ArgumentError: cannot parse "-" as Float64
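
Both parse errors can be handled in one place. The following is a minimal sketch, assuming that `"."` is always a thousands separator, `","` the decimal mark, and a lone `"-"` a value that is not available; the helper name `parse_german` is hypothetical.

```julia
# Hypothetical helper: convert a German-formatted number string to Float64,
# returning `missing` for the "-" placeholder.
function parse_german(s::AbstractString)
    s == "-" && return missing
    normalized = replace(replace(s, "." => ""), "," => ".")
    return parse(Float64, normalized)
end

sample = ["4.024,25", "1.188", "-", "4.014"]
parsed = parse_german.(sample)   # element type is Union{Missing, Float64}
```

Returning `missing` rather than a sentinel number keeps unavailable values visibly distinct from real measurements.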

In [30]:
# A lone "-" marks a value that is not available; replace it with an empty string
replace.(data[!, "Biomasse [MWh] Berechnete Auflösungen"], r"^-$" => "")

73716-element Vector{String}:
 "4.024,25"
 "3.982,75"
 "4.019,5"
 "4.040,75"
 "4.037,75"
 "4.028,25"
 "4.013,25"
 "4.012,75"
 "3.999,75"
 "4.016,25"
 "4.007,75"
 "4.011,75"
 "4.014"
 ⋮
 "4.581,5"
 "4.551"
 "4.503,25"
 "4.454,25"
 "4.406,75"
 "4.413,25"
 "4.468"
 "4.503,5"
 "4.544,5"
 "4.568,25"
 "4.618,75"
 "4.612,25"

In [31]:
names(data)

15-element Vector{String}:
 "Datum"
 "Anfang"
 "Ende"
 "Biomasse [MWh] Berechnete Auflösungen"
 "Wasserkraft [MWh] Berechnete Auflösungen"
 "Wind Offshore [MWh] Berechnete Auflösungen"
 "Wind Onshore [MWh] Berechnete Auflösungen"
 "Photovoltaik [MWh] Berechnete Auflösungen"
 "Sonstige Erneuerbare [MWh] Berechnete Auflösungen"
 "Kernenergie [MWh] Berechnete Auflösungen"
 "Braunkohle [MWh] Berechnete Auflösungen"
 "Steinkohle [MWh] Berechnete Auflösungen"
 "Erdgas [MWh] Berechnete Auflösungen"
 "Pumpspeicher [MWh] Berechnete Auflösungen"
 "Sonstige Konventionelle [MWh] Berechnete Auflösungen"

In [50]:
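for column in names(data)[4:end]
    # mark "-" placeholders with a sentinel so that the column can be parsed
    data[!, column] = replace.(data[!, column], r"^-$" => -999)
    # remove the thousands separator ".", then turn the decimal comma into "."
    data[!, column] = parse.(Float64, replace.(replace.(data[!, column], "." => ""), "," => "."))
    # fails: the dot broadcasts replace over scalar Float64 values
    data[!, column] = replace.(data[!, column], -999 => NaN)
end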

LoadError: MethodError: no method matching similar(::Float64, ::Type{Float64})
Closest candidates are:
  similar(::Union{LinearAlgebra.Adjoint{T, var"#s886"}, LinearAlgebra.Transpose{T, var"#s886"}} where {T, var"#s886"<:(AbstractVector)}, ::Type{T}) where T at /ext/julia/julia-1.8.4/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:207
  similar(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}, ::Type{T}) where T at /ext/julia/julia-1.8.4/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:211
  similar(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}, ::Type{T}, ::Tuple{Vararg{Int64, N}}) where {T, N} at /ext/julia/julia-1.8.4/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:212
  ...
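
The `MethodError` comes from the last step of the loop. By then the column holds `Float64` values, so the broadcast `replace.` calls `replace` on each scalar, and `replace` has no method for a single number. A small sketch on a stand-in vector (the values are illustrative) shows the non-broadcast form, which operates on the collection as a whole:

```julia
v = [4024.25, -999.0, 4014.0]   # -999.0 plays the role of the sentinel

# replace.(v, -999.0 => NaN) would call replace(4024.25, -999.0 => NaN), which fails.
# Without the dot, replace works element by element on the vector itself:
cleaned = replace(v, -999.0 => NaN)
```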

In [55]:
# occursin only accepts strings and patterns, so it cannot search a Float64 column for NaN
occursin.(NaN, data[!, "Biomasse [MWh] Berechnete Auflösungen"])


LoadError: MethodError: no method matching occursin(::Float64, ::Float64)
Closest candidates are:
  occursin(::Any) at strings/search.jl:642

In [64]:
for i in data[!, "Biomasse [MWh] Berechnete Auflösungen"]
    if isnan(i)
        println(i)
    end
end

NaN
NaN
NaN
NaN


In [65]:
NaN < missing

missing

In [66]:
NaN < NaN

false

In [69]:
NaN === NaN

true

In [72]:
0 == 0.0

true

In [67]:
missing < missing

missing
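
The comparisons above show that `NaN` and `missing` follow different rules, which matters when counting or averaging incomplete columns. A short sketch on two small illustrative vectors, using only Base and the `Statistics` standard library:

```julia
using Statistics

x = [1.0, NaN, 3.0]
y = [1.0, missing, 3.0]

NaN == NaN            # false: IEEE 754 comparison
isequal(NaN, NaN)     # true: isequal treats NaN as equal to itself
missing == missing    # missing: comparisons with missing propagate

n_nan     = count(isnan, x)           # number of NaN entries
n_missing = count(ismissing, y)       # number of missing entries
mean_x    = mean(filter(!isnan, x))   # mean over the non-NaN values
mean_y    = mean(skipmissing(y))      # mean over the non-missing values
```

Plain `mean(x)` would return `NaN` and `mean(y)` would return `missing`, so the affected entries have to be excluded explicitly.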