# Tutorial 2

In this tutorial, we will see how to use Pyraug's built-in functions to set up our own configurations for the trainer, models and samplers. This follows the section ``Setting up your own configurations`` of the documentation.

## Link between `.json` and `dataclasses`

In Pyraug, the configurations of the models, trainers and samplers are stored and used as `dataclasses.dataclass` instances, and they all inherit from `BaseConfig`. Hence, any configuration class has a `from_json_file` classmethod, inherited from `BaseConfig`, that loads a `.json` file directly into a dataclass, along with a `save_json` method that writes a dataclass to a ``.json`` file.
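
To make this link concrete, here is a minimal sketch of how a dataclass-based configuration can round-trip through JSON using only the standard library. `SketchBaseConfig` and `SketchModelConfig` are hypothetical stand-ins written for illustration, not Pyraug's classes; Pyraug's own `BaseConfig` may validate types or store extra metadata, and returning the path from `save_json` is a convenience of this sketch only.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SketchBaseConfig:
    """Hypothetical stand-in for a JSON-serializable config base class."""

    @classmethod
    def from_json_file(cls, json_path):
        # Read the JSON object and pass its keys as constructor arguments.
        with open(json_path) as f:
            return cls(**json.load(f))

    def save_json(self, dir_path, filename):
        # Write the dataclass fields to <dir_path>/<filename>.json.
        os.makedirs(dir_path, exist_ok=True)
        path = os.path.join(dir_path, f"{filename}.json")
        with open(path, "w") as f:
            json.dump(asdict(self), f)
        return path


@dataclass
class SketchModelConfig(SketchBaseConfig):
    """Hypothetical model config with two illustrative fields."""
    input_dim: Optional[int] = None
    latent_dim: int = 10


with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = SketchModelConfig(latent_dim=11).save_json(tmp_dir, "my_model_config")
    reloaded = SketchModelConfig.from_json_file(saved_path)

print(reloaded)
```

Because dataclasses compare field by field, `reloaded == SketchModelConfig(latent_dim=11)` holds after the round trip.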

### Loading a configuration from a `.json`

Since every `ModelConfig` inherits from the `BaseModelConfig` dataclass, any Pyraug model configuration can be loaded from a `.json` file with the `from_json_file` classmethod. Defining your own `model_config.json` is useful when you use the Pyraug scripts, which take paths to json files as arguments.

**note:** Make sure the keys and types match the ones expected by the `dataclass`, or errors will be raised. Check the documentation for the expected types and keys.
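
As an illustration of the kind of failure the note refers to, the sketch below passes a misspelled key to a plain standard-library dataclass. `SketchModelConfig` is a hypothetical stand-in, and the exact exception type and message raised by Pyraug may differ. Note that a plain dataclass only rejects unknown keys; it does not check value types, so any type checking would have to come from the library itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchModelConfig:
    """Hypothetical stand-in with two of the fields printed in this tutorial."""
    input_dim: Optional[int] = None
    latent_dim: int = 10


good = {"input_dim": 784, "latent_dim": 10}
bad = {"input_dim": 784, "latent_dims": 10}  # misspelled key

# Matching keys build the config as expected.
config = SketchModelConfig(**good)

# An unknown key is rejected by the generated __init__.
try:
    SketchModelConfig(**bad)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(config)
print(error_message)
```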

In [46]:
from pyraug.models.base.base_config import BaseModelConfig
config = BaseModelConfig.from_json_file('_demo_data/configs/model_config.json')
print(config)

BaseModelConfig(input_dim=784, latent_dim=10, uses_default_encoder=True, uses_default_decoder=True)


Let's try with a `RHVAE` model

In [47]:
from pyraug.models.rhvae import RHVAEConfig
config = RHVAEConfig.from_json_file('_demo_data/configs/rhvae_config.json')
print(config)

RHVAEConfig(input_dim=784, latent_dim=10, uses_default_encoder=True, uses_default_decoder=True, n_lf=3, eps_lf=0.0001, beta_zero=0.3, temperature=1.5, regularization=0.01, uses_default_metric=True)
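
The contents of `_demo_data/configs/rhvae_config.json` are not reproduced in this tutorial. As a rough guide for writing your own file, the sketch below serializes the field values printed above into JSON text. This is an assumption about the layout: the actual demo file may omit keys left at their defaults or contain additional keys.

```python
import json

# Values copied from the RHVAEConfig printed above.
rhvae_fields = {
    "input_dim": 784,
    "latent_dim": 10,
    "uses_default_encoder": True,
    "uses_default_decoder": True,
    "n_lf": 3,
    "eps_lf": 0.0001,
    "beta_zero": 0.3,
    "temperature": 1.5,
    "regularization": 0.01,
    "uses_default_metric": True,
}

# json.dumps turns Python True/False into JSON true/false.
json_text = json.dumps(rhvae_fields, indent=4)
print(json_text)
```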


### Saving a configuration to a `.json`

Conversely, you can save a `dataclass` quite easily using the `save_json` method coming from `BaseModelConfig`.

In [48]:
from pyraug.models.base.base_config import BaseModelConfig

my_model_config = BaseModelConfig(latent_dim=11)
print(my_model_config)

BaseModelConfig(input_dim=None, latent_dim=11, uses_default_encoder=True, uses_default_decoder=True)


Save the `.json` file ...

In [49]:
my_model_config.save_json(dir_path='_demo_data/configs', filename='my_model_config')

... and reload it 

In [50]:
BaseModelConfig.from_json_file('_demo_data/configs/my_model_config.json')

BaseModelConfig(input_dim=None, latent_dim=11, uses_default_encoder=True, uses_default_decoder=True)

The same can be done with a `TrainingConfig` or `SamplerConfig`

In [51]:
from pyraug.trainers.training_config import TrainingConfig
my_training_config = TrainingConfig(max_epochs=10, learning_rate=0.1)
print(my_training_config)
my_training_config.save_json(dir_path='_demo_data/configs', filename='my_training_config')
TrainingConfig.from_json_file('_demo_data/configs/my_training_config.json')

TrainingConfig(output_dir=None, batch_size=50, max_epochs=10, learning_rate=0.1, train_early_stopping=50, eval_early_stopping=None, steps_saving=None, seed=8, no_cuda=False, verbose=True)


TrainingConfig(output_dir=None, batch_size=50, max_epochs=10, learning_rate=0.1, train_early_stopping=50, eval_early_stopping=None, steps_saving=None, seed=8, no_cuda=False, verbose=True)

In [52]:
from pyraug.models.base.base_config import BaseSamplerConfig
my_sampler_config = BaseSamplerConfig(batch_size=10, samples_per_save=100)
print(my_sampler_config)
my_sampler_config.save_json(dir_path='_demo_data/configs', filename='my_sampler_config')
BaseSamplerConfig.from_json_file('_demo_data/configs/my_sampler_config.json')

BaseSamplerConfig(output_dir=None, batch_size=10, samples_per_save=100, no_cuda=False)


BaseSamplerConfig(output_dir=None, batch_size=10, samples_per_save=100, no_cuda=False)
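
Since configurations are dataclasses, the standard-library helpers `dataclasses.asdict` and `dataclasses.replace` are a natural way to inspect a config or derive a variant of it. The sketch below uses a hypothetical `SketchSamplerConfig` that mirrors the field names printed above; its default values are placeholders, and it assumes Pyraug's configs behave like standard dataclasses in this respect.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class SketchSamplerConfig:
    """Hypothetical stand-in; default values here are placeholders."""
    output_dir: Optional[str] = None
    batch_size: int = 50
    samples_per_save: int = 500
    no_cuda: bool = False


base = SketchSamplerConfig(batch_size=10, samples_per_save=100)

# replace() returns a new instance and leaves the original untouched.
variant = replace(base, batch_size=20)

print(asdict(base))
print(asdict(variant))
```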

## Setting up configs in `Pipelines`

Let's consider the example of Tutorial 1

In [53]:
import torch
import torchvision.datasets as datasets
import matplotlib.pyplot as plt
import numpy as np

In [54]:
mnist_trainset = datasets.MNIST(root='../data', train=True, download=True, transform=None)
n_samples = 200
dataset_to_augment = mnist_trainset.data[:n_samples] 
dataset_to_augment.shape

torch.Size([200, 28, 28])

### Amending the model parameters

Unlike in Tutorial 1, here we first instantiate the model we want to train, to avoid using the default one. This `Model` instance will then be passed to the `TrainingPipeline` for training.

Let's set up a custom model config and build the model

In [55]:
from pyraug.models.rhvae import RHVAEConfig

model_config = RHVAEConfig(
    input_dim=28*28, # This is needed since we do not provide any encoder, decoder and metric architecture
    latent_dim=9,
    eps_lf=0.0001,
    temperature=0.9
    )

In [56]:
from pyraug.models import RHVAE

model = RHVAE(
    model_config=model_config
)
model.latent_dim, model.eps_lf, model.temperature

(9,
 0.0001,
 Parameter containing:
 tensor([0.9000]))

### Amending training parameters

Likewise, we can amend the training parameters through the `TrainingConfig` instance

In [57]:
from pyraug.trainers.training_config import TrainingConfig

training_config = TrainingConfig(
    output_dir='my_model_with_custom_parameters',
    no_cuda=False,
    learning_rate=1e-3,
    batch_size=200,
    train_early_stopping=100,
    steps_saving=None,
    max_epochs=20)
training_config

TrainingConfig(output_dir='my_model_with_custom_parameters', batch_size=200, max_epochs=20, learning_rate=0.001, train_early_stopping=100, eval_early_stopping=None, steps_saving=None, seed=8, no_cuda=False, verbose=True)

Now we only have to pass the model and the training config to the `TrainingPipeline` to perform training!

In [58]:
from pyraug.pipelines import TrainingPipeline

torch.manual_seed(8)
pipeline = TrainingPipeline(
    data_loader=None,
    data_processor=None,
    model=model,
    optimizer=None,
    training_config=training_config)


In [59]:
pipeline(
    train_data=dataset_to_augment,
    log_output_dir='output_logs'
)

Data normalized using individual_min_max_scaling.
 -> If this is not the desired behavior pass an instance of DataProcess with 'data_normalization_type' attribute set to desired normalization or None
Model passed sanity check !

Created my_model_with_custom_parameters/training_2021-06-29_20-22-52. 
Training config, checkpoints and final model will be saved here.

Successfully launched training !
----------------------------------
Training ended!
Saved final model in my_model_with_custom_parameters/training_2021-06-29_20-22-52/final_model


Now, the model and training parameters are saved in `json` files in `my_model_with_custom_parameters/training_YYYY-MM-DD_hh-mm-ss/final_model` and we can reload any of them.

In [60]:
import os

last_training = sorted(os.listdir('my_model_with_custom_parameters'))[-1]
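
Taking the last element of the sorted listing works because the folder names embed a zero-padded, year-first timestamp, so alphabetical order matches chronological order. The sketch below checks this on a temporary directory with made-up run folders; it assumes the output directory contains nothing but such `training_...` folders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical run folders, named like the one created by the pipeline.
    for stamp in ("2021-06-29_20-22-52", "2021-06-28_09-05-10", "2021-06-30_08-00-00"):
        os.makedirs(os.path.join(root, f"training_{stamp}"))

    # Lexicographic sort puts the most recent timestamp last.
    last_training = sorted(os.listdir(root))[-1]

print(last_training)
```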

Let's get the saved `TrainingConfig` ...

In [61]:
TrainingConfig.from_json_file(os.path.join('my_model_with_custom_parameters', last_training, 'final_model/training_config.json'))

TrainingConfig(output_dir='my_model_with_custom_parameters', batch_size=200, max_epochs=20, learning_rate=0.001, train_early_stopping=100, eval_early_stopping=None, steps_saving=None, seed=8, no_cuda=False, verbose=True)

... and rebuild the model

In [62]:
model_rec = RHVAE.load_from_folder(os.path.join('my_model_with_custom_parameters', last_training, 'final_model'))

In [63]:
model_rec.latent_dim, model_rec.eps_lf, model_rec.temperature

(9,
 0.0001,
 Parameter containing:
 tensor([0.9000]))

### Amending the Sampler parameters

Of course, we can also amend the sampler parameters used within the `GenerationPipeline`. Again, simply build a `ModelSampler` instance and pass it to the `GenerationPipeline`.

In [64]:
from pyraug.models.rhvae import RHVAESamplerConfig

sampler_config = RHVAESamplerConfig(
        output_dir='my_generated_data_with_custom_parameters',
        mcmc_steps_nbr=100,
        batch_size=100,
        n_lf=5,
        eps_lf=0.01
        )

Build the sampler

In [65]:
from pyraug.models.rhvae.rhvae_sampler import RHVAESampler

sampler = RHVAESampler(model=model_rec, sampler_config=sampler_config)

At initialization, the sampler creates the folder where the generated data should be saved in case it does not exist.

Now we only have to pass the model and the sampler to the `GenerationPipeline` to perform generation!

In [66]:
from pyraug.pipelines import GenerationPipeline

generation_pipe = GenerationPipeline(
    model=model_rec,
    sampler=sampler
)

In [67]:
generation_pipe(5)

Created my_generated_data_with_custom_parameters/generation_2021-06-29_20-23-04.Generated data and sampler config will be saved here.

Generation successfully launched !



Now, the sampler parameters are saved in a `json` file in `my_generated_data_with_custom_parameters/generation_YYYY-MM-DD_hh-mm-ss` and we can reload it to check that everything is ok.

In [68]:
last_generation = sorted(os.listdir('my_generated_data_with_custom_parameters'))[-1]

In [69]:
RHVAESamplerConfig.from_json_file(os.path.join('my_generated_data_with_custom_parameters', last_generation, 'sampler_config.json' ))

RHVAESamplerConfig(output_dir='my_generated_data_with_custom_parameters', batch_size=100, samples_per_save=500, no_cuda=False, mcmc_steps_nbr=100, n_lf=5, eps_lf=0.01, beta_zero=1.0)