# Data Exploration

Never start modelling before you have had a look at the data. In the case of this dataset - a univariate time series - there isn't much to look at, but we can still gain very helpful insights. We are going to use `Plotly.NET` to look at the data.

In [None]:
#r "nuget: Deedle, 2.3.0"
#r "nuget: Plotly.NET, 2.0.0-beta9"
#r "nuget: Plotly.NET.Interactive, 2.0.0-beta9"

#i "nuget:https://www.myget.org/F/gregs-experimental-packages/api/v3/index.json"
#r "nuget:Deedle.DotNet.Interactive.Extension, 0.1.0-alpha6"

For this and all further steps we will keep returning to the same dataset. We can be fairly sure that everything is in order with it, but it is always good to take a quick cautionary glance.

In [None]:
open System // TimeSpan, DateTime and DayOfWeek are used in later cells
open Deedle
open Plotly.NET

let data =
    Frame.ReadCsv("../data/at_load_hourly_mw.csv", hasHeaders = true, culture = "en-US", inferTypes = true, inferRows = 5_000)
    |> Frame.indexRowsDate "TimeStamp"

data
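
The "cautionary glance" can also be made a little more systematic. The following is a minimal sketch in plain F#, assuming the series is available as a sequence of `(DateTime * float)` pairs (for example via `data?Value |> Series.observations`) and that the data is meant to be strictly hourly. The helper name `checkRegularity` and the sample values are hypothetical and only serve as an illustration.

```fsharp
open System

/// Summarise a timestamped series: number of points, first and last timestamp,
/// timestamps that occur more than once, and gaps larger than the expected step.
let checkRegularity (step: TimeSpan) (observations: seq<DateTime * float>) =
    let sorted = observations |> Seq.sortBy fst |> Seq.toList
    let duplicates =
        sorted
        |> List.countBy fst
        |> List.filter (fun (_, n) -> n > 1)
        |> List.map fst
    let gaps =
        sorted
        |> List.map fst
        |> List.distinct
        |> List.pairwise
        |> List.filter (fun (a, b) -> b - a > step)
    {| Count = List.length sorted
       First = sorted |> List.tryHead |> Option.map fst
       Last = sorted |> List.tryLast |> Option.map fst
       Duplicates = duplicates
       Gaps = gaps |}

// Hand-made sample (hypothetical values): 02:00 is missing and 03:00 appears twice.
let sample =
    [ DateTime(2018, 1, 1, 0, 0, 0), 6000.0
      DateTime(2018, 1, 1, 1, 0, 0), 5900.0
      DateTime(2018, 1, 1, 3, 0, 0), 5800.0
      DateTime(2018, 1, 1, 3, 0, 0), 5800.0 ]

let report = checkRegularity (TimeSpan.FromHours 1.0) sample
printfn "%A" report
```

If the timestamps turn out to be in local time, the daylight saving switches would typically show up here as one gap and one duplicate per year, so such findings are not necessarily data errors.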

With around 50k observations it is a good idea to enable `WebGL` in our charts. Since .NET Interactive runs in the Electron-based VS Code (or in the browser), `WebGL` should usually be available to us.

In [None]:
data?Value
|> Series.observations
|> fun xy -> Chart.Line(xy, UseWebGL = true)

Another way to reduce the number of data points is to resample the data at a coarser granularity (days instead of hours), here by taking the median of each day. The resampled data may also reveal patterns that are hard to spot in the hourly series.

In [None]:
data?Value
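// Split the series into one-day chunks, starting from the first key ...
|> Series.sampleTime (TimeSpan.FromDays(1.)) Direction.Forward
// ... and reduce every chunk to its median
|> Series.mapValues Stats.median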
|> Series.observations
|> fun xy -> Chart.Line(xy, UseWebGL = true)
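
To make explicit what this resampling step computes, here is a rough plain F# equivalent that needs no Deedle. It is only a sketch: it groups by calendar day, whereas `Series.sampleTime` builds its chunks starting from the first key of the series, so the two only line up if the series starts at midnight. The names `median` and `dailyMedian` and the sample values are hypothetical.

```fsharp
open System

/// Median of a non-empty list of floats
/// (mean of the two middle values for an even count).
let median (values: float list) =
    let sorted = List.sort values
    let n = List.length sorted
    if n % 2 = 1 then sorted.[n / 2]
    else (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0

/// Collapse timestamped observations to one median value per calendar day.
let dailyMedian (observations: seq<DateTime * float>) =
    observations
    |> Seq.groupBy (fun (ts, _) -> ts.Date)
    |> Seq.map (fun (day, obs) -> day, obs |> Seq.map snd |> Seq.toList |> median)
    |> Seq.sortBy fst
    |> Seq.toList

// Hypothetical values, only to show the shape of the result.
let hourly =
    [ DateTime(2018, 1, 1, 0, 0, 0), 10.0
      DateTime(2018, 1, 1, 1, 0, 0), 30.0
      DateTime(2018, 1, 1, 2, 0, 0), 20.0
      DateTime(2018, 1, 2, 0, 0, 0), 40.0
      DateTime(2018, 1, 2, 1, 0, 0), 50.0 ]

let daily = dailyMedian hourly
// daily = [ (2018-01-01, 20.0); (2018-01-02, 45.0) ]
```

The median is less sensitive to single extreme hours than the mean, which is one reason to prefer it for a first visual impression.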

Looking at the different years, we see a clear yearly pattern (a so-called seasonality). We can filter the data and take a closer look at the relatively "clean" year 2018.

In [None]:
let data_2018 =
    data
    |> Frame.filterRows (fun idx _ -> idx.Year = 2018)

data_2018?Value
|> Series.observations
|> fun xy -> Chart.Line(xy, UseWebGL = true)
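
The yearly pattern can also be condensed into a handful of numbers by averaging per calendar month. The sketch below works on plain `(DateTime * float)` pairs, such as those returned by `data_2018?Value |> Series.observations`. The function name `monthlyMean` and the sample values are hypothetical.

```fsharp
open System

/// Mean value per calendar month for timestamped observations of a single year.
let monthlyMean (observations: seq<DateTime * float>) =
    observations
    |> Seq.groupBy (fun (ts, _) -> ts.Month)
    |> Seq.map (fun (month, obs) -> month, obs |> Seq.averageBy snd)
    |> Seq.sortBy fst
    |> Seq.toList

// Hypothetical values: two winter and two summer readings.
let readings =
    [ DateTime(2018, 1, 10, 12, 0, 0), 8000.0
      DateTime(2018, 1, 20, 12, 0, 0), 9000.0
      DateTime(2018, 7, 10, 12, 0, 0), 6000.0
      DateTime(2018, 7, 20, 12, 0, 0), 7000.0 ]

let profile = monthlyMean readings
profile |> List.iter (fun (month, mean) -> printfn "%02d: %.1f" month mean)
```

With the real data, comparing the twelve monthly means gives a quick impression of how strong the seasonality is relative to the overall level.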

Looking at this filtered view, I'd expect (also guided by my knowledge of energy consumption) to see some patterns at the week and day level. A look at a couple of consecutive weeks suggests that this initial suspicion might be true.

In [None]:
let predicateWeekOfYear (weeks: int list) (dt: DateTime) =
    // Week 1 starts on 1 January; later weeks start on Monday (this is not ISO 8601 numbering)
    let cal = System.Globalization.CultureInfo.InvariantCulture.Calendar
    let weekOfYear =
        cal.GetWeekOfYear(dt, System.Globalization.CalendarWeekRule.FirstDay, DayOfWeek.Monday)
    List.contains weekOfYear weeks

data_2018?Value
|> Series.observations
|> Seq.filter (fst >> (predicateWeekOfYear [10; 11; 12; 13]))
|> fun xy -> Chart.Line(xy, UseWebGL = true)
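
A note on the week numbers used above: `CalendarWeekRule.FirstDay` starts week 1 on 1 January, which is not the same as ISO 8601 week numbering. Since 1 January 2018 was a Monday, both schemes happen to agree for 2018, but they can differ in other years. The sketch below compares the two, assuming a runtime that provides `System.Globalization.ISOWeek` (.NET Core 3.0 or later). The helper names `firstDayWeek` and `isoWeek` are hypothetical.

```fsharp
open System
open System.Globalization

/// Week number as used in the filter above: week 1 starts on 1 January,
/// later weeks start on Mondays.
let firstDayWeek (dt: DateTime) =
    CultureInfo.InvariantCulture.Calendar.GetWeekOfYear(dt, CalendarWeekRule.FirstDay, DayOfWeek.Monday)

/// ISO 8601 week number (week 1 is the week containing the first Thursday of the year).
let isoWeek (dt: DateTime) = ISOWeek.GetWeekOfYear dt

for dt in [ DateTime(2018, 1, 1); DateTime(2018, 3, 5); DateTime(2021, 1, 1) ] do
    printfn "%s  FirstDay=%d  ISO=%d"
        (dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
        (firstDayWeek dt)
        (isoWeek dt)
```

For 2018, weeks 10 to 13 cover 5 March to 1 April under either scheme. For 1 January 2021 (a Friday), the first rule yields week 1 while ISO 8601 assigns the day to week 53 of 2020, so the choice matters as soon as the predicate is applied to other years.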