# Ray.io optimization framework

Ray is a framework for orchestrating hyperparameter optimization (HPO) searches.

Ray\[Tune\] is a mature optimization framework compatible with both PyTorch and TensorFlow, with a dedicated integration for Lightning.

Building a deep learning estimator requires to gradually converge to a

Setting up HPO involves six components:
* A selection of the hyperparameters (HParams) you wish to optimize, each with its search space and sampling method
* A callback to monitor and automatically report metrics during training
* A trial scheduler to stop unpromising HP sets early
* A search algorithm to explore the HP space
* A logger to push values to a monitoring solution, possibly a remote one
* A runner to sequentially execute the experiments, one per set of HP
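
To make the role of each component concrete, here is a minimal, library-free sketch. It does not use the Ray\[Tune\] API: the search space, the synthetic loss in `train_one_epoch`, and the halving rule used as a scheduler are all assumptions made up for illustration.

```python
import math
import random

# Hypothetical search space: each hyperparameter comes with its sampling method.
SEARCH_SPACE = {
    "lr": lambda rng: 10 ** rng.uniform(-4, -1),                # log-uniform
    "hidden_channels": lambda rng: rng.choice([50, 100, 200]),  # categorical
}


def sample_config(rng):
    """Search algorithm (plain random search here): draw one HP set."""
    return {name: sampler(rng) for name, sampler in SEARCH_SPACE.items()}


def train_one_epoch(config, epoch):
    """Stand-in for a real training epoch, returning a synthetic loss."""
    lr_penalty = abs(math.log10(config["lr"]) + 2.5)
    size_penalty = 10.0 / config["hidden_channels"]
    return (lr_penalty + size_penalty) / (epoch + 1)


def run_experiment(num_trials=8, max_epochs=4, seed=42):
    rng = random.Random(seed)
    trials = [
        {"id": i, "config": sample_config(rng), "history": [], "alive": True}
        for i in range(num_trials)
    ]
    log = []  # logger: a list standing in for a remote monitoring solution

    # Runner: execute the trials sequentially, epoch by epoch.
    for epoch in range(max_epochs):
        alive = [t for t in trials if t["alive"]]
        for trial in alive:
            loss = train_one_epoch(trial["config"], epoch)
            trial["history"].append(loss)           # callback: report the metric
            log.append((trial["id"], epoch, loss))  # logger: record it

        # Scheduler: keep the better half of the running trials, stop the rest.
        alive.sort(key=lambda t: t["history"][-1])
        for trial in alive[max(1, len(alive) // 2):]:
            trial["alive"] = False

    best = min((t for t in trials if t["alive"]), key=lambda t: t["history"][-1])
    return best, trials, log


best, trials, log = run_experiment()
print(best["config"], best["history"][-1])
```

With 8 trials and 4 epochs, only one trial runs to completion; the others are stopped after 1, 2, or 3 epochs, which is where the compute savings of a scheduler come from.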

HPO allows to 

Let's first load all the necessary parameters.

We will then implement each of these components.

## Search Algorithm

The simplest search algorithms are **GridSearch** and **RandomSearch**. More recent research in this direction has led to more sophisticated algorithms, including **BayesOptSearch**, **OptunaSearch**, and **HEBOSearch**.

They come as external packages of Ray\[Tune\], directly integrated into
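
As an illustration of the two simplest strategies, the following sketch enumerates a grid and draws random samples using only the standard library. The space definitions and the helper names `grid_search` and `random_search` are hypothetical and are not part of Ray\[Tune\].

```python
import itertools
import random

# Hypothetical discrete grid for grid search.
GRID = {"lr": [1e-3, 1e-2, 1e-1], "batch_size": [256, 1024]}

# Hypothetical continuous/categorical space for random search:
# lr is sampled log-uniformly between 10**-4 and 10**-1.
RANDOM_SPACE = {"lr_log10_range": (-4, -1), "batch_size": [256, 1024]}


def grid_search(grid):
    """Yield every combination of the listed values (exhaustive)."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def random_search(space, num_samples, seed=0):
    """Yield `num_samples` independent random draws from the space."""
    rng = random.Random(seed)
    low, high = space["lr_log10_range"]
    for _ in range(num_samples):
        yield {
            "lr": 10 ** rng.uniform(low, high),
            "batch_size": rng.choice(space["batch_size"]),
        }


grid_configs = list(grid_search(GRID))
random_configs = list(random_search(RANDOM_SPACE, num_samples=4))
print(len(grid_configs))  # 6 combinations: 3 learning rates x 2 batch sizes
print(random_configs[0])
```

The grid grows multiplicatively with each added hyperparameter, whereas the cost of random search is fixed by `num_samples`, which is one reason it is often preferred in larger spaces.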

In [7]:
from kosmoss import CONFIG, LOGS_PATH, METADATA
from kosmoss.parallel.data import FlattenedDataModule
from kosmoss.parallel.models import LitMLP

In [8]:
import numpy as np
import os
import pytorch_lightning as pl  # needed for pl.seed_everything below
from pytorch_lightning import Trainer
from pytorch_lightning.loggers.tensorboard import TensorBoardLogger

# Ensures this Notebook's reproducibility
pl.seed_everything(42, workers=True)

step = CONFIG['timestep']
params = METADATA[str(step)]['flattened']

Global seed set to 42


## Model and training logic

In [9]:
!cat models.py

# MIT License
# 
# Copyright (c) 2022 alxyok
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, A

In [10]:
x_feats = params['x_shape'][-1]
y_feats = params['y_shape'][-1]

In [11]:
print(f'x number of features: {x_feats}')
print(f'y number of features: {y_feats}')

x number of features: 4128
y number of features: 552


In [12]:
mlp = LitMLP(
    in_channels=x_feats,
    hidden_channels=100,
    out_channels=y_feats
)
mlp

LitMLP(
  (normalization_layer): Normalize()
  (net): Sequential(
    (0): Normalize()
    (1): Linear(in_features=4128, out_features=100, bias=True)
    (2): SiLU()
    (3): Linear(in_features=100, out_features=100, bias=True)
    (4): SiLU()
    (5): Linear(in_features=100, out_features=100, bias=True)
    (6): SiLU()
    (7): Linear(in_features=100, out_features=552, bias=True)
  )
)
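
As a sanity check, the parameter count of this network can be worked out by hand from the `Linear` layers printed above. The sketch below assumes that only the `Linear` layers carry trainable parameters (the model summary printed during training lists `normalization_layer` with 0 parameters) and that weights are stored as 32-bit floats.

```python
# (in_features, out_features) of each Linear layer, read off the printed module.
linear_layers = [(4128, 100), (100, 100), (100, 100), (100, 552)]


def count_linear_params(layers):
    """Each Linear layer holds in * out weights plus out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


total_params = count_linear_params(linear_layers)
size_mb = total_params * 4 / 1e6  # assumption: float32, i.e. 4 bytes per parameter

print(total_params)       # 488852
print(round(size_mb, 3))  # 1.955
```

Both values agree with the 488 K trainable parameters and the 1.955 MB estimated size reported in the Lightning model summary.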

## Dataset creation and data loading mechanics

In [13]:
!cat data.py

# MIT License
# 
# Copyright (c) 2022 alxyok
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, A

* `batch_size` sets the number of elements in a batch of data.
* `num_workers` sets the number of workers the DataLoader can spawn to handle data loading and Dataset batching.
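
The effect of both parameters can be sketched without PyTorch. The dataset size of 10,000 elements and the helper `pick_num_workers` are assumptions for illustration only; the notebook itself simply passes the physical core count.

```python
import os


def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most `batch_size` elements."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def pick_num_workers(total_cores, reserved_for_training=1):
    """Leave some cores to the training loop; 0 means loading in the main process."""
    return max(0, total_cores - reserved_for_training)


samples = list(range(10_000))  # hypothetical dataset of 10,000 elements
batches = list(iter_batches(samples, batch_size=1024))

print(len(batches), len(batches[-1]))  # 10 784: nine full batches and a smaller last one
print(pick_num_workers(os.cpu_count() or 1))
```

A larger `batch_size` means fewer steps per epoch, while `num_workers` only changes how many processes prepare those batches in parallel.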

In [14]:
import psutil
cores = psutil.cpu_count(logical=False)

In [15]:
datamodule = FlattenedDataModule(
    batch_size=1024,
    
    # In a CPU-only setup, make sure enough cores remain for the training itself,
    # not just for data loading; otherwise training becomes the bottleneck
    num_workers=cores
)

## Orchestrating the training

In [16]:
logger = TensorBoardLogger(
    save_dir=LOGS_PATH,
    name='flattened_mlp_logs',
    log_graph=True
)

All the training instrumentation is handled by an object called the `Trainer`. You can set parameters such as:
* `max_epochs`, the maximum number of epochs to run unless early stopping kicks in first
* `accelerator`, the hardware type, and `devices`, the number (or indices) of devices to use

Also worth noting:
* `callbacks` to hook custom logic into the training loop
* `gradient_clip_val` and `gradient_clip_algorithm` to set up gradient clipping
* `logger` to interface with loss and metrics logging
* `resume_from_checkpoint` to resume a previously started training run
* `amp_backend` to switch to the Nvidia Apex framework for Automatic Mixed Precision support
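
Of these, gradient clipping is the easiest to misread. Lightning's `gradient_clip_algorithm` accepts `"norm"` and `"value"`; the plain-Python sketch below shows what each mode does to a gradient vector. It is an illustration of the two techniques, not Lightning's implementation, and the function names are hypothetical.

```python
import math


def clip_by_value(grads, clip_val):
    """Clamp every component independently to [-clip_val, clip_val]."""
    return [max(-clip_val, min(clip_val, g)) for g in grads]


def clip_by_norm(grads, clip_val, eps=1e-6):
    """Rescale the whole vector so its L2 norm does not exceed clip_val."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, clip_val / (total_norm + eps))
    return [g * scale for g in grads]


grads = [3.0, -4.0]  # L2 norm is 5.0

by_value = clip_by_value(grads, clip_val=1.0)
by_norm = clip_by_norm(grads, clip_val=1.0)

print(by_value)  # [1.0, -1.0]: the direction of the gradient changes
print(by_norm)   # approximately [0.6, -0.8]: the direction is preserved
```

Clipping by norm keeps the update direction and only shortens the step, which is why it is the more common default.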

In [17]:
cpu_trainer = Trainer(
    max_epochs=1,
    logger=logger,
    deterministic=True,
)

GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs


  "GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`."


Training on CPU is a one-liner:

In [18]:
cpu_trainer.fit(model=mlp, datamodule=datamodule)

Missing logger folder: /home/jupyter/.kosmoss/logs/flattened_mlp_logs

  | Name                | Type       | Params
---------------------------------------------------
0 | normalization_layer | Normalize  | 0     
1 | net                 | Sequential | 488 K 
---------------------------------------------------
488 K     Trainable params
0         Non-trainable params
488 K     Total params
1.955     Total estimated model params size (MB)




Global seed set to 42                                                 
Epoch 0:  89%|████████▉ | 848/954 [01:13<00:09, 11.52it/s, loss=0.592, v_num=0, train_loss=0.567]
Validating: 0it [00:00, ?it/s]
Validating:   0%|          | 0/106 [00:00<?, ?it/s]
Epoch 0:  89%|████████▉ | 852/954 [01:15<00:08, 11.35it/s, loss=0.592, v_num=0, train_loss=0.567]
Epoch 0:  90%|████████▉ | 857/954 [01:15<00:08, 11.39it/s, loss=0.592, v_num=0, train_loss=0.567]
Epoch 0:  91%|█████████ | 865/954 [01:16<00:07, 11.37it/s, loss=0.592, v_num=0, train_loss=0.567]
Validating:  18%|█▊        | 19/106 [00:02<00:09,  9.46it/s]
Epoch 0:  92%|█████████▏| 873/954 [01:16<00:07, 11.44it/s, loss=0.592, v_num=0, train_loss=0.567]
Epoch 0:  92%|█████████▏| 881/954 [01:17<00:06, 11.42it/s, loss=0.592, v_num=0, train_loss=0.567]
Validating:  31%|███       | 33/106 [00:03<00:06, 10.57it/s]
Epoch 0:  93%|█████████▎| 889/954 [01:17<00:05, 11.48it/s, loss=0.592, v_num=0, train_loss=0.567]
Epoch 0:  94%|█████████▍| 

Never forget to test. A handy property of the `Trainer` is that if training is interrupted, for example by a `KeyboardInterrupt` (Ctrl+C, i.e. a `SIGINT`), Lightning catches the exception, tries to gracefully release resources and terminate training, and a `.test()` call placed after `.fit()` still runs.