### Loading libraries

In [2]:
import sys
import os
sys.path.insert(1, '..')
os.chdir('..')

from data_formatters.iglu import *
from dataset import TSDataset
from conf import Conf

ModuleNotFoundError: No module named 'data_formatters'

### Code walk-through

The major parts of the code that need to be defined for each data set are:
1. config file in `.yaml` format,
2. data formatter script.

For now, you can study the `electricity.yaml` example to get a sense of what a config file should look like. You can skip the hyperparameter definitions and the model parameters; the main focus is on defining the dataset parameters.

We do not interact with the `.yaml` file directly but instead through the `Conf` class, which handles the following:
1. defines some defaults if not specified in `.yaml`,
2. sets save paths,
3. allows for nice colored printing.

Technically, we could do all of this in the `.yaml` file directly. However, every time we re-ran the experiment, we would then have to manually modify the `.yaml` file to reset the save paths and redefine some variables, which would be inconvenient.
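As a rough illustration of the three responsibilities listed above, here is a minimal sketch of a config wrapper. It is not the repository's `Conf` class: the name `MiniConf`, the `DEFAULTS` dictionary, the `log_root` argument, and the save-path layout are all assumptions made for this example, and a plain dictionary stands in for the parsed `.yaml` file.

```python
import copy

# Hypothetical defaults; the real `Conf` class may use different keys and values.
DEFAULTS = {"lr": 0.001, "num_epochs": 20, "batch_size": 64}


class MiniConf:
    """Toy stand-in for a config wrapper (not the repository's `Conf`)."""

    def __init__(self, params, exp_name, seed, log_root="./log"):
        # 1. start from the defaults and let the file's values override them
        self.params = {**DEFAULTS, **copy.deepcopy(params)}
        self.seed = seed
        # 2. derive the save path from the experiment name, so the file stays untouched
        self.exp_log_path = f"{log_root}/{self.params['ds_name']}/{exp_name}"

    def __str__(self):
        # 3. colored printing with ANSI escape codes (blue keys, magenta values)
        lines = [f"\033[34m{k.upper()}\033[0m: \033[35m{v}\033[0m"
                 for k, v in sorted(self.params.items())]
        return "\n".join(lines)


# `file_params` stands in for the dictionary a YAML parser would return.
file_params = {"ds_name": "iglu", "batch_size": 32}
cnf_sketch = MiniConf(file_params, exp_name="test", seed=15)
print(cnf_sketch)
print(cnf_sketch.exp_log_path)
```

Because the experiment name and seed are constructor arguments, re-running with a different `exp_name` changes the save path without editing the file on disk.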


In [None]:
# loading the config file, setting the experiment name, and the seed for random pre-processing parts (like splitting)
cnf = Conf(conf_file_path='./conf/iglu.yaml', seed=15, exp_name="test", log=False)

In [None]:
# let's print out the config file
print(f'\nDefault configuration parameters: \n{cnf}')


Default configuration parameters: 
[34mLR[0m[31m: [0m[35m0.001[0m
[34mEPOCHS[0m[31m: [0m[35m20[0m
[34mN_WORKERS[0m[31m: [0m[35m0[0m
[34mBATCH_SIZE[0m[31m: [0m[35m64[0m
[34mQUANTILES[0m[31m: [0m[35m[0.1, 0.5, 0.9][0m
[34mDS_NAME[0m[31m: [0m[33miglu_urjeet[0m
[34mALL_PARAMS[0m[31m: [0m[35m{'ds_name': 'iglu_urjeet', 'data_csv_path': './raw_data/iglu_example_data_5_subject.csv', 'index_col': -1, 'total_time_steps': 192, 'num_encoder_steps': 168, 'max_samples': 5000, 'batch_size': 64, 'device': 'cuda', 'lr': 0.001, 'num_epochs': 20, 'n_workers': 0, 'model': 'transformer', 'loader': 'base', 'quantiles': [0.1, 0.5, 0.9], 'batch_first': True, 'early_stopping_patience': 5, 'hidden_layer_size': 160, 'stack_size': 1, 'dropout_rate': 0.1, 'max_gradient_norm': 0.01, 'num_heads': 4, 'd_model': 64, 'q': 16, 'v': 16, 'h': 4, 'N': 2, 'attention_size': 0, 'dropout': 0.1, 'pe': 'original', 'chunk_mode': 'None', 'd_input': 5, 'd_output': 3}[0m
[34mEXP_LOG_PATH[

Now let's move on to the data formatter. This is the part that should handle:
1. loading the data and setting types,
2. splitting the data into train / val / test sets,
3. setting scalers and encoders for numerical and categorical variables, respectively.

We are going to leave parts 2-3 for future exploration. For now, let's focus on loading the data and setting its types.

In [None]:
# instantiate the data formatter directly
data_formatter = IGLUFormatter()
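To make "loading the data and setting types" more concrete, here is a minimal sketch using only the standard library. It is not how `IGLUFormatter` is implemented: the `COLUMN_TYPES` mapping, the column names `id`, `time`, and `gl`, the helper `load_and_set_types`, and the three made-up rows are all assumptions chosen for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical column definitions: column name -> function that parses the raw string.
# The real `IGLUFormatter` may declare its columns differently.
COLUMN_TYPES = {
    "id": str,
    "time": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    "gl": float,
}

# Made-up rows, used only to make the example runnable.
RAW_CSV = """id,time,gl
subject_1,2015-06-06 16:50:27,153
subject_1,2015-06-06 17:05:27,137
subject_2,2015-02-24 17:31:29,150
"""


def load_and_set_types(text, column_types):
    """Read CSV text and convert every declared column to its type."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({name: parse(raw[name]) for name, parse in column_types.items()})
    return rows


rows = load_and_set_types(RAW_CSV, COLUMN_TYPES)
print(rows[0])
```

Declaring the types in one table keeps the parsing rules for a data set in a single place, which is the role a per-data-set formatter script plays.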

Finally, let's work with the `TSDataset` class. This is the main part of the code, as it ties together all of our previous steps. In the end, it is `TSDataset` that calls the splitters, scalers, and encoders. **Importantly**, the model only interacts with the data through this class.

In [None]:
# we are going to pass our data formatter and the config file to the TSDataset class
dataset = TSDataset(cnf, data_formatter)

Getting valid sampling locations.
# available segments=12911
Extracting 5000 samples out of 12911
1000 of 5000 samples done...
2000 of 5000 samples done...
3000 of 5000 samples done...
4000 of 5000 samples done...
5000 of 5000 samples done...
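The log above suggests that valid window start positions are enumerated first and a fixed number of them is then drawn. A minimal sketch of that idea follows. It is not the repository's `TSDataset`: the class name `WindowDataset`, its arguments, the use of column 0 as the target, and the toy series are assumptions for this example.

```python
import random


class WindowDataset:
    """Toy sliding-window dataset (not the repository's `TSDataset`)."""

    def __init__(self, series, total_time_steps, max_samples, seed=0):
        # series: dict mapping a segment id to a list of rows (one feature list per row)
        self.series = series
        self.total_time_steps = total_time_steps
        # every (segment, start) pair that leaves room for a full window is valid
        valid = [(sid, start)
                 for sid, rows in series.items()
                 for start in range(len(rows) - total_time_steps + 1)]
        # keep at most `max_samples` of them, chosen reproducibly
        rng = random.Random(seed)
        if len(valid) > max_samples:
            valid = rng.sample(valid, max_samples)
        self.locations = valid

    def __len__(self):
        return len(self.locations)

    def __getitem__(self, i):
        sid, start = self.locations[i]
        window = self.series[sid][start:start + self.total_time_steps]
        return {
            "inputs": window,                         # all feature columns
            "outputs": [[row[0]] for row in window],  # assumption: column 0 is the target
        }


# Two made-up segments with one feature per time step; "b" is too short for a window.
toy_series = {"a": [[float(t)] for t in range(10)], "b": [[float(t)] for t in range(3)]}
toy_dataset = WindowDataset(toy_series, total_time_steps=4, max_samples=5)
print(len(toy_dataset), toy_dataset[0]["inputs"])
```

Segments shorter than one window contribute no start positions, so they are skipped without any special handling.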


In [None]:
# now let's see how we can draw individual examples from our dataset; these can then be batched and passed to the model to train on
for i in range(10):
    # inputs: one window of shape (time steps, input features)
    x = dataset[i]['inputs']
    # outputs: target values for the same window, shape (time steps, targets)
    y = dataset[i]['outputs']
    print(f'Example #{i}: x.shape={x.shape}, y.shape={y.shape}')

Example #0: x.shape=(192, 1), y.shape=(192, 1)
Example #1: x.shape=(192, 1), y.shape=(192, 1)
Example #2: x.shape=(192, 1), y.shape=(192, 1)
Example #3: x.shape=(192, 1), y.shape=(192, 1)
Example #4: x.shape=(192, 1), y.shape=(192, 1)
Example #5: x.shape=(192, 1), y.shape=(192, 1)
Example #6: x.shape=(192, 1), y.shape=(192, 1)
Example #7: x.shape=(192, 1), y.shape=(192, 1)
Example #8: x.shape=(192, 1), y.shape=(192, 1)
Example #9: x.shape=(192, 1), y.shape=(192, 1)
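
The loop above indexes one example at a time. Grouping such examples into minibatches can be sketched as follows; in practice a framework data loader would normally handle this step. The helper `iterate_minibatches` and the toy examples are hypothetical and only illustrate the grouping.

```python
def iterate_minibatches(dataset, batch_size):
    """Yield dicts of lists, grouping consecutive examples into batches."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        examples = [dataset[i] for i in range(start, stop)]
        # collect each field ('inputs', 'outputs') across the examples of this batch
        yield {key: [ex[key] for ex in examples] for key in examples[0]}


# A made-up dataset: 5 examples, each a window of 3 time steps with 1 feature.
toy_examples = [{"inputs": [[float(i)]] * 3, "outputs": [[float(i)]] * 3} for i in range(5)]

batches = list(iterate_minibatches(toy_examples, batch_size=2))
print([len(b["inputs"]) for b in batches])  # prints [2, 2, 1]
```

The last batch is smaller when the number of examples is not a multiple of `batch_size`.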
