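Run the following cell only the first time you run this notebook in a session. It moves the working directory up one level, so running it again would move it up once more: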

In [1]:
import os
os.chdir("..")

import pandas as pd
pd.set_option('display.precision', 2)

# Complete example

Load the JSON settings file, which contains the parameters needed to load the source `DataFrame` (see [README](../README.md#input-parameters-file-explanation)).

In [2]:
from datetime import datetime
import synthDataGen.controller as controller

loader = controller.ESIOSLoader("./synthDataGen/settings/inputParams.json")
# loader = controller.LocalDFLoader("./synthDataGen/settings/inputParams.json")

df = loader.getDataFromSource(initialYear=2018, initDatetime=datetime(2023, 2, 28, 22, 0), hoursAhead=6, include29February=False)
df

datetime,2018,2019,2020,2021,2022
2023-02-28 22:00:00,56.97,59.37,31.77,31.26,283.32
2023-02-28 23:00:00,55.14,58.03,23.4,39.14,272.98
2023-03-01 00:00:00,50.8,56.47,17.97,36.96,264.05
2023-03-01 01:00:00,48.93,54.88,17.13,31.92,246.79
2023-03-01 02:00:00,49.02,53.72,11.28,30.92,230.26
2023-03-01 03:00:00,48.39,53.85,9.84,32.78,227.92
2023-03-01 04:00:00,49.71,56.49,10.38,32.83,236.13

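The frame above has one column per historical year and one row per target hour, and `hoursAhead=6` yields seven rows because both ends of the window are included. A minimal sketch of how such a frame could be assembled with plain pandas is shown here. It is an assumption about the general technique, not the `ESIOSLoader` implementation: the `history` series is synthetic, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly history standing in for the real source data.
# Each value is simply the number of hours elapsed since 2018-01-01 00:00.
history_index = pd.date_range("2018-01-01", "2022-12-31 23:00", freq="60min")
history = pd.Series(np.arange(len(history_index), dtype=float), index=history_index)

# Target window: the initial datetime plus the next 6 hours (7 rows in total).
init_datetime = pd.Timestamp(2023, 2, 28, 22)
hours_ahead = 6
target_index = pd.date_range(
    init_datetime, periods=hours_ahead + 1, freq="60min", name="datetime"
)

# For every historical year, look up the same calendar month, day and hour.
# 2023 is not a leap year, so 29 February is never requested.
columns = {}
for year in range(2018, 2023):
    past_index = pd.DatetimeIndex([ts.replace(year=year) for ts in target_index])
    columns[year] = history.loc[past_index].to_numpy()

window = pd.DataFrame(columns, index=target_index)
print(window)
```

Because the lookup is done by calendar date, the 2020 column jumps from 28 February 23:00 straight to 1 March 00:00, which mirrors the effect of `include29February=False`.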

Adjust the source `DataFrame` with the annual adjustments dictionary, then resample it to the desired resolution:

In [3]:
from synthDataGen.adjustments import FactorByYear

df = FactorByYear.run(df, adjustmentsDict={2018: 1.2, 2019: 2.3, 2020: 1.45, 2021: 3, 2022: 8})
df

Adjusting years: 2020,2021,2022


datetime,2020,2021,2022
2023-02-28 22:00:00,32.23,32.2,305.99
2023-02-28 23:00:00,23.74,40.31,294.82
2023-03-01 00:00:00,18.23,38.07,285.17
2023-03-01 01:00:00,17.38,32.88,266.53
2023-03-01 02:00:00,11.44,31.85,248.68
2023-03-01 03:00:00,9.98,33.76,246.15
2023-03-01 04:00:00,10.53,33.81,255.02

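Comparing the source and adjusted tables suggests that each value in `adjustmentsDict` is applied as a percentage increase rather than as a raw multiplier: for 2020, 31.77 becomes 32.23, which matches 31.77 × (1 + 1.45/100). The sketch below reproduces the first two adjusted rows with plain pandas under that assumption. It is not the `FactorByYear` implementation, and the variable names are hypothetical.

```python
import pandas as pd

# First two rows of the source frame shown earlier (2020-2022 columns only).
index = pd.to_datetime(["2023-02-28 22:00:00", "2023-02-28 23:00:00"])
source = pd.DataFrame(
    {2020: [31.77, 23.40], 2021: [31.26, 39.14], 2022: [283.32, 272.98]},
    index=index,
)

# Assumption: every dictionary value is a percentage increase for that year.
percent_by_year = {2020: 1.45, 2021: 3, 2022: 8}
factors = pd.Series(percent_by_year) / 100 + 1

# Multiply each year column by its own factor (aligned on column labels).
adjusted = source.mul(factors, axis="columns").round(2)
print(adjusted)
```

The printed values agree with the first two rows of the adjusted table: 32.23, 32.20, 305.99 and 23.74, 40.31, 294.82.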

In [4]:
from synthDataGen.adjustments import ChangeResolution

df = ChangeResolution.upsample(df, frequency="20T", method="spline", order=3)
df

datetime,2020,2021,2022
2023-02-28 22:00:00,32.23,32.2,305.99
2023-02-28 22:20:00,29.07,35.99,302.82
2023-02-28 22:40:00,26.57,38.02,299.65
2023-02-28 23:00:00,23.74,40.31,294.82
2023-02-28 23:20:00,22.37,39.27,292.26
2023-02-28 23:40:00,20.61,38.84,288.02
2023-03-01 00:00:00,18.23,38.07,285.17
2023-03-01 00:20:00,17.64,36.79,278.35
2023-03-01 00:40:00,16.38,35.51,272.91
2023-03-01 01:00:00,17.38,32.88,266.53


In [5]:
df = ChangeResolution.downsample(df, frequency="22.73T", aggregationFunc="mean")
df

datetime,2020,2021,2022
2023-02-28 21:58:20.400,30.65,34.09,304.4
2023-02-28 22:21:04.200,26.57,38.02,299.65
2023-02-28 22:43:48.000,23.74,40.31,294.82
2023-02-28 23:06:31.800,22.37,39.27,292.26
2023-02-28 23:29:15.600,20.61,38.84,288.02
2023-02-28 23:51:59.400,18.23,38.07,285.17
2023-03-01 00:14:43.200,17.64,36.79,278.35
2023-03-01 00:37:27.000,16.88,34.19,269.72
2023-03-01 01:00:10.800,14.16,33.33,260.83
2023-03-01 01:22:54.600,13.19,32.6,254.82

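The two resampling steps can be approximated with plain pandas, which may help when checking the numbers above. This is a sketch under assumptions, not the `ChangeResolution` implementation: it swaps the cubic spline for linear interpolation so that SciPy is not required (passing `method="spline", order=3` to `interpolate` would need it), and it uses a 40-minute bin instead of `22.73T`. It also spells the frequency as `"min"`, since newer pandas releases deprecate the `"T"` alias.

```python
import pandas as pd

# First three hourly rows of the adjusted frame (2020 and 2022 columns only).
hourly = pd.DataFrame(
    {2020: [32.23, 23.74, 18.23], 2022: [305.99, 294.82, 285.17]},
    index=pd.date_range("2023-02-28 22:00", periods=3, freq="60min"),
)

# Upsample: create empty 20-minute slots, then fill them by interpolation.
# Linear interpolation is used here in place of the spline.
upsampled = hourly.resample("20min").interpolate(method="linear")

# Downsample: average all rows that fall into each 40-minute bin.
# By default, bin edges are anchored to midnight of the first day.
downsampled = upsampled.resample("40min").mean()
print(downsampled)
```

The midnight anchoring is consistent with the first label in the downsampled output above: 22.73 minutes is 1363.8 seconds, and 58 × 1363.8 s after midnight is 21:58:20.400, the last bin edge before the first sample at 22:00. That first bin holds the 22:00 and 22:20 rows, whose 2020 mean is (32.23 + 29.07) / 2 = 30.65.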

Draw 1000 new samples from the current `DataFrame` using a truncated normal distribution:

In [6]:
from synthDataGen.utils import Sampling

df = Sampling.getSamples(df, 1000, "truncnorm")
df.head()

Unnamed: 0,2023-02-28 21:58:20.400,2023-02-28 22:21:04.200,2023-02-28 22:43:48.000,2023-02-28 23:06:31.800,2023-02-28 23:29:15.600,2023-02-28 23:51:59.400,2023-03-01 00:14:43.200,2023-03-01 00:37:27.000,2023-03-01 01:00:10.800,2023-03-01 01:22:54.600,2023-03-01 01:45:38.400,2023-03-01 02:08:22.200,2023-03-01 02:31:06.000,2023-03-01 02:53:49.800,2023-03-01 03:16:33.600,2023-03-01 03:39:17.400
0,47.01,159.51,189.41,109.77,58.53,194.85,236.27,78.01,24.71,61.14,151.54,185.12,74.67,100.6,134.47,212.46
1,41.07,68.0,124.82,128.8,45.07,42.31,118.73,180.2,106.73,163.99,168.49,45.49,28.16,23.45,59.75,40.3
2,157.73,5.47,102.95,121.83,121.59,93.85,135.87,106.82,100.8,126.67,261.32,287.97,196.48,264.81,45.52,138.22
3,202.03,138.87,6.83,267.97,113.79,9.08,301.45,32.78,181.68,69.94,167.21,177.02,101.86,12.99,25.89,122.9
4,32.25,201.86,83.3,49.21,152.7,174.4,78.97,183.64,137.6,245.43,246.4,148.8,89.3,187.72,227.33,75.8

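The sampled frame has one column per timestamp and one row per sample. The sketch below shows one way to produce a frame of that shape with NumPy and pandas. It rests on several assumptions: the mean and standard deviation are taken across the year columns of each row, the truncation bounds are a hypothetical mean ± 2 standard deviations (the bounds used by `Sampling.getSamples` are not stated in this notebook), and `sample_truncnorm` is an illustrative helper, not part of `synthDataGen`. `scipy.stats.truncnorm` is an alternative to the rejection loop.

```python
import numpy as np
import pandas as pd


def sample_truncnorm(mean, std, low, high, size, rng):
    """Draw `size` values from N(mean, std) restricted to [low, high] by rejection."""
    accepted = np.empty(0)
    while accepted.size < size:
        draws = rng.normal(mean, std, size=size)
        inside = draws[(draws >= low) & (draws <= high)]
        accepted = np.concatenate([accepted, inside])
    return accepted[:size]


# First two rows of the downsampled frame shown earlier.
frame = pd.DataFrame(
    {2020: [30.65, 26.57], 2021: [34.09, 38.02], 2022: [304.40, 299.65]},
    index=pd.to_datetime(["2023-02-28 21:58:20.400", "2023-02-28 22:21:04.200"]),
)

rng = np.random.default_rng(seed=0)
n_samples = 1000

columns = {}
for timestamp, row in frame.iterrows():
    # Mean and sample standard deviation across the year columns.
    mean, std = row.mean(), row.std()
    # Hypothetical truncation bounds: two standard deviations around the mean.
    columns[timestamp] = sample_truncnorm(
        mean, std, mean - 2 * std, mean + 2 * std, n_samples, rng
    )

samples = pd.DataFrame(columns)
print(samples.shape)  # (1000, 2)
```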