# Latent Space Generator - Train a model

To generate a latent space for a given dataset, we first need to train a model capable of learning how to represent the data.

This notebook guides you through a step-by-step pipeline that produces a model able to generate embeddings for a specific dataset, MNIST.

We will use TensorFlow as the ML framework and an autoencoder as our model.

<img src="autoencoder_schema.png" /> 

In [3]:
import tensorflow as tf

print('TensorFlow version: ', tf.__version__)
print('Devices', tf.config.list_physical_devices())

TensorFlow version:  2.6.0
Devices [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


The following folder paths will be useful later.

In [4]:
import os

DATA_DIR = os.path.join(os.getcwd(), 'data')
DATA_OUTPUT_DIR = os.path.join(DATA_DIR, 'output')
MODELS_DIR = os.path.join(os.getcwd(), 'models')
LOGS_DIR = os.path.join(os.getcwd(), 'logs')
INFERENCE_DIR = os.path.join(os.getcwd(), 'inference')

### Preparation
This step prepares the dataset by converting the data into npy format.

In CHANNELS_MAP you can define which channels to use in the case of multi-band images; a more detailed example will be provided in a notebook on the EuroSAT dataset.

In [5]:
from utils.preparation import preparation

IMAGE_FORMAT = "image"
CHANNELS_MAP = { "r": 0, "g": 0, "b": 0}
preparation(DATA_DIR, IMAGE_FORMAT, CHANNELS_MAP)


ModuleNotFoundError: No module named 'astropy'
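How `preparation` consumes CHANNELS_MAP is not shown in this notebook, so the following is only a sketch of one plausible reading: each logical channel name points to a band index of the source image, and duplicate indices collapse into a single band (which matches the `len(set(CHANNELS_MAP.values()))` expression used later to count channels). The `select_channels` helper and the band indices in the second call are hypothetical.

```python
import numpy as np


def select_channels(image, channels_map):
    """Keep only the bands named in `channels_map`, in first-seen order."""
    # dict.fromkeys drops duplicate indices while preserving insertion order.
    band_indices = list(dict.fromkeys(channels_map.values()))
    return image[..., band_indices]


# A fake 2x2 image with 4 bands; band b is filled with the value b.
image = np.stack([np.full((2, 2), b) for b in range(4)], axis=-1)

gray = select_channels(image, {"r": 0, "g": 0, "b": 0})   # MNIST-like map
rgb = select_channels(image, {"r": 3, "g": 2, "b": 1})    # made-up band indices
print(gray.shape, rgb.shape)
```

With the MNIST map all three names point to band 0, so a single channel survives.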

### Preprocessing
The next step is data preprocessing, where resizing and normalization are applied.

NORMALIZATION_TYPE defines how the data are normalized: the "default" value scales pixels to the range 0 to 1 by dividing each of them by 255 (the usual case for 8-bit images). You can define your own normalization function in the utils/preprocessing.py file.

IMAGE_DIM is the size of the images used for training.

The actual computation is deferred until the training process starts.

In [6]:
from utils.preprocessing import tf_numpy_load, tf_preprocessing

# Dataset
pattern = os.path.join(DATA_OUTPUT_DIR, '*.npy')
dataset = tf.data.Dataset.list_files(pattern)

dataset = dataset.map(
    lambda file: tf_numpy_load(file),
    num_parallel_calls=tf.data.AUTOTUNE
)

NORMALIZATION_TYPE = "default"
IMAGE_DIM = 28

# Preprocessing
dataset = dataset.map(
    lambda image: tf_preprocessing(
        image,
        tf.constant(IMAGE_DIM, tf.uint16),
        tf.constant(NORMALIZATION_TYPE, tf.string)
    ),
    num_parallel_calls=tf.data.AUTOTUNE
)

length = tf.data.experimental.cardinality(dataset).numpy()
print(f'Dataset cardinality: {length}')

Dataset cardinality: 10000
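To make the normalization options concrete, here is a minimal NumPy sketch of the "default" rule (divide by 255) next to a hypothetical custom rule. The function names, the `NORMALIZATIONS` registry and the "min_max" key are assumptions for illustration. The real functions in utils/preprocessing.py are not shown here and presumably operate on TensorFlow tensors.

```python
import numpy as np


def normalize_default(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0


def normalize_min_max(image: np.ndarray) -> np.ndarray:
    """Hypothetical custom rule: rescale each image by its own min and max."""
    image = image.astype(np.float32)
    low, high = image.min(), image.max()
    if high == low:
        # A constant image carries no contrast; map it to zeros.
        return np.zeros_like(image)
    return (image - low) / (high - low)


# Registry keyed by a NORMALIZATION_TYPE-like string ("min_max" is made up).
NORMALIZATIONS = {"default": normalize_default, "min_max": normalize_min_max}

image = np.array([[0, 51], [102, 255]], dtype=np.uint8)
scaled = NORMALIZATIONS["default"](image)
print(scaled.min(), scaled.max())
```

Looking the function up by name keeps the choice of normalization a plain string, which is convenient when the setting has to be stored in a JSON configuration.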


### Split

Part of the dataset is held out as a test set to check whether the model generalizes or overfits. Change SPLIT_THRESHOLD to set the fraction of the dataset used as the training set.

The training and test sets are further split into batches to fit the available memory and optimize training. The parameter to adjust here is BATCH_SIZE.

In [7]:
SPLIT_THRESHOLD = 0.8
index = round(length * SPLIT_THRESHOLD)
train_set = dataset.take(index)
test_set = dataset.skip(index)  # skip exactly the samples taken for training

print('Training set: {}'.format(
    tf.data.experimental.cardinality(train_set).numpy()))
print('Test set: {}'.format(
    tf.data.experimental.cardinality(test_set).numpy()))

train_set = train_set.cache()
test_set = test_set.cache()

BATCH_SIZE = 256

train_set = train_set.shuffle(
    len(train_set)).batch(BATCH_SIZE)
test_set = test_set.batch(BATCH_SIZE)

Training set: 8000
Test set: 2000
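The arithmetic behind the split can be checked without TensorFlow. `take(n)` keeps the first n elements and `skip(n)` drops the first n, so using the same index for both yields two complementary parts with no sample lost. The sketch below mirrors that with plain integers. `split_sizes` is a hypothetical helper, and the batch counts assume `batch()` keeps the final partial batch, which is its default behaviour.

```python
import math


def split_sizes(length: int, split_threshold: float, batch_size: int):
    """Return sample and batch counts for a take/skip split at one index."""
    index = round(length * split_threshold)
    train_size = index            # dataset.take(index)
    test_size = length - index    # dataset.skip(index)
    # The last batch may be smaller than batch_size, hence the ceiling.
    train_batches = math.ceil(train_size / batch_size)
    test_batches = math.ceil(test_size / batch_size)
    return train_size, test_size, train_batches, test_batches


sizes = split_sizes(10000, 0.8, 256)
print(sizes)  # (8000, 2000, 32, 8)
```

With a batch size of 256, the last training batch holds 8000 - 31 * 256 = 64 samples.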


### Augmentation
The augmentation step transforms the data in realistic ways to increase the variability of the dataset. The operation is random: so that the model also sees the original data, AUGMENTATION_THRESHOLD is the probability with which augmentation as a whole is applied. The other parameters enable specific augmentations, each of which is then executed with a probability of 50%.

In [8]:
from utils.augmentation import tf_augmentation

AUGMENTATION_THRESHOLD = 0.7
AUGMENTATION_FLIP_X = False
AUGMENTATION_FLIP_Y = False
AUGMENTATION_ROTATE = True
AUGMENTATION_ROTATE_DEGREES = 5
AUGMENTATION_SHIFT = True
AUGMENTATION_SHIFT_PERCENTAGE = 10

train_set = train_set.map(
    lambda images:
        tf.cond(
            tf.random.uniform([], 0, 1) < AUGMENTATION_THRESHOLD,
            lambda: tf_augmentation(
                images,
                tf.constant(AUGMENTATION_FLIP_X, tf.bool),
                tf.constant(AUGMENTATION_FLIP_Y, tf.bool),
                tf.constant(AUGMENTATION_ROTATE, tf.bool),
                tf.constant(
                    AUGMENTATION_ROTATE_DEGREES, tf.float32),
                tf.constant(AUGMENTATION_SHIFT, tf.bool),
                tf.constant(
                    AUGMENTATION_SHIFT_PERCENTAGE, tf.float32)
            ),
            lambda: images
        ),
    num_parallel_calls=tf.data.AUTOTUNE
)

train_set = train_set.prefetch(buffer_size=tf.data.AUTOTUNE)
test_set = test_set.prefetch(buffer_size=tf.data.AUTOTUNE)
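The two levels of randomness can be simulated in plain Python to see how often a batch is left untouched. This is a sketch under assumptions: `choose_augmentations` is a hypothetical stand-in, and it assumes each enabled augmentation is decided independently with probability 0.5, as described above. The internals of `tf_augmentation` are not shown in this notebook. With a threshold of 0.7 and two enabled augmentations, the expected untouched fraction is 0.3 + 0.7 × 0.5 × 0.5 = 0.475.

```python
import random


def choose_augmentations(rng, threshold, enabled):
    """Return the names of the augmentations applied to one batch."""
    # Outer gate: with probability 1 - threshold the batch is left untouched.
    if rng.random() >= threshold:
        return []
    # Inner gate: each enabled augmentation fires with probability 0.5.
    return [name for name, on in enabled.items() if on and rng.random() < 0.5]


rng = random.Random(0)
enabled = {"flip_x": False, "flip_y": False, "rotate": True, "shift": True}
trials = 100_000
untouched = sum(
    not choose_augmentations(rng, 0.7, enabled) for _ in range(trials))
print(f"untouched fraction: {untouched / trials:.3f}")
```

So nearly half of the batches reach the model unchanged, even though AUGMENTATION_THRESHOLD is 0.7.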

The model is defined next, in this case a Convolutional AutoEncoder (CAE).

The depth and width of the model are defined by the FILTERS constant: each entry in the array is a convolutional layer with the specified number of kernels/filters. The dimension of the latent vector (the output of the autoencoder that we actually care about) is set by the LATENT_DIM constant.

In [9]:
from architectures.cae import CAE

IMAGE_DIM = 28
CHANNELS_NUM = len(set(CHANNELS_MAP.values()))
LATENT_DIM = 8
FILTERS = [32, 64]

model = CAE(
    image_dim=IMAGE_DIM,
    channels_num=CHANNELS_NUM,
    latent_dim=LATENT_DIM,
    filters=FILTERS
)

OPTIMIZER = "Adam"
LEARNING_RATE = 0.001
LOSS = "MeanSquaredError"
model.compile(
    optimizer=OPTIMIZER,
    learning_rate=LEARNING_RATE,
    loss=LOSS
)

Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
down_sampling_0 (DownSamplin (None, 14, 14, 32)        448       
_________________________________________________________________
down_sampling_1 (DownSamplin (None, 7, 7, 64)          18752     
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 8)                 25096     
Total params: 44,296
Trainable params: 44,104
Non-trainable params: 192
_________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 3136)              28224     
____________________________________
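The shapes and parameter counts in the summary above can be reproduced by hand, which helps when changing FILTERS or LATENT_DIM. The sketch below rests on assumptions inferred from the printed numbers, not on the source of architectures/cae.py: each DownSampling block is taken to be a 3x3 convolution that halves the spatial size, followed by batch normalization.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a square 2D convolution."""
    return kernel * kernel * in_channels * filters + filters


def batch_norm_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (not trainable)."""
    return 4 * channels


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


image_dim, channels, latent_dim, filters = 28, 1, 8, [32, 64]

size, in_channels, encoder_total, non_trainable = image_dim, channels, 0, 0
for n_filters in filters:
    # Assumption: 3x3 kernel, spatial size halved, batch norm after the conv.
    layer = conv_params(3, in_channels, n_filters) + batch_norm_params(n_filters)
    size //= 2
    print(f"down_sampling: ({size}, {size}, {n_filters}) -> {layer} params")
    encoder_total += layer
    non_trainable += 2 * n_filters  # moving mean and variance
    in_channels = n_filters

flat = size * size * in_channels
encoder_total += dense_params(flat, latent_dim)
decoder_dense = dense_params(latent_dim, flat)
print(flat, encoder_total, non_trainable, decoder_dense)
```

Under these assumptions the script yields 448 and 18752 parameters for the two blocks, a flattened size of 3136, 44,296 encoder parameters (192 of them non-trainable) and 28,224 parameters for the first decoder layer, matching the summary.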

An experiment folder is needed to save models and logs.

In [10]:
import datetime

EXPERIMENT_NAME = "MNIST-ConvAutoEncoder-Small"

experiment_dir = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
experiment_dir = "{}-{}".format(experiment_dir, EXPERIMENT_NAME)

# Create model dir
model_dir = os.path.join(MODELS_DIR, experiment_dir)
os.makedirs(model_dir)

# Set logger
log_dir = os.path.join(LOGS_DIR, experiment_dir)
summary_writer = tf.summary.create_file_writer(log_dir)

To record the details of the training process, a metadata file is created (saved below as config.json in the experiment folder). This configuration is useful in the inference phase.

Usually the metadata are read from a configuration file, config.json, but for the sake of clarity we build them manually here.

In [15]:
import json

CONFIG = {}


CONFIG['name'] = EXPERIMENT_NAME
CONFIG['image'] = {}
CONFIG['image']['format'] = IMAGE_FORMAT
CONFIG['image']['dim'] = IMAGE_DIM
CONFIG['image']['channels'] = {}
CONFIG['image']['channels']['map'] = CHANNELS_MAP
CONFIG['image']['channels_map'] = CHANNELS_MAP
CONFIG['dataset'] = {}
CONFIG['dataset']['split_threshold'] = SPLIT_THRESHOLD
CONFIG['preprocessing'] = {}
CONFIG['preprocessing']['normalization_type'] = NORMALIZATION_TYPE
CONFIG['augmentation'] = {}
CONFIG['augmentation']['threshold'] = AUGMENTATION_THRESHOLD
CONFIG['augmentation']['flip_x'] = AUGMENTATION_FLIP_X
CONFIG['augmentation']['flip_y'] = AUGMENTATION_FLIP_Y
CONFIG['augmentation']['rotate'] = {}
CONFIG['augmentation']['rotate']['enabled'] = AUGMENTATION_ROTATE
CONFIG['augmentation']['rotate']['degrees'] = AUGMENTATION_ROTATE_DEGREES
CONFIG['augmentation']['shift'] = {}
CONFIG['augmentation']['shift']['enabled'] = AUGMENTATION_SHIFT
CONFIG['augmentation']['shift']['percentage'] = AUGMENTATION_SHIFT_PERCENTAGE
CONFIG['architecture'] = {}
CONFIG['architecture']['name'] = "cae" # convolutional autoencoder. Useful to run main.py script in order to choose the architecture.
CONFIG['architecture']['filters'] = FILTERS
CONFIG['architecture']['latent_dim'] = LATENT_DIM
CONFIG['training'] = {}
CONFIG['training']['epochs'] = 10 # number of epochs, defined later
CONFIG['training']['batch_size'] = BATCH_SIZE
CONFIG['training']['optimizer'] = {}
CONFIG['training']['optimizer']['name'] = OPTIMIZER
CONFIG['training']['optimizer']['learning_rate'] = LEARNING_RATE
CONFIG['training']['loss'] = LOSS

# Save experiment config
experiment_config = os.path.join(
    MODELS_DIR, experiment_dir, 'config.json')
with open(experiment_config, 'w+') as f:
    json.dump(CONFIG, f, indent=4)
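As an illustration of why the saved file is useful later, the sketch below writes a trimmed configuration to a temporary folder and reads it back, the way an inference script might. The temporary folder and the reduced dictionary are stand-ins. The actual inference code is not part of this notebook.

```python
import json
import os
import tempfile

# A trimmed stand-in for the CONFIG dictionary built above.
config = {
    "name": "MNIST-ConvAutoEncoder-Small",
    "image": {"dim": 28, "channels": {"map": {"r": 0, "g": 0, "b": 0}}},
    "architecture": {"name": "cae", "filters": [32, 64], "latent_dim": 8},
}

with tempfile.TemporaryDirectory() as experiment_dir:
    path = os.path.join(experiment_dir, "config.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=4)

    # Later, e.g. at inference time: reload the settings from disk.
    with open(path) as f:
        loaded = json.load(f)

# Rebuild derived values exactly as the training notebook does.
channels_num = len(set(loaded["image"]["channels"]["map"].values()))
print(loaded["architecture"]["latent_dim"], channels_num)
```

Because the file only holds strings, numbers, lists and dictionaries, the round trip through JSON returns an identical structure.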

Finally, the training step can be executed. Note that the preprocessing defined above is executed before training starts, so there may be a delay at the beginning. Be patient.

A mechanism for saving only the best model is implemented. After the middle of the EPOCHS...

In [9]:
from tqdm import tqdm

EPOCHS = 10

for epoch in tqdm(range(EPOCHS)):

    # Train set
    for batch, train_batch in enumerate(train_set):
        model.train_step(train_batch)

    # Test set
    for batch, test_batch in enumerate(test_set):
        model.test_step(test_batch)

    # Save best model
    if epoch > (EPOCHS / 2):
        model.save_best_model(model_dir)

    # Log
    with summary_writer.as_default():
        model.log(epoch, train_batch, test_batch)

    # Reset losses
    model.reset_losses_state()

 60%|██████    | 6/10 [01:01<00:27,  6.94s/it]

INFO:tensorflow:Assets written to: /home/jovyan/models/20220323-111325-MNIST-ConvAutoEncoder-Small/assets



 70%|███████   | 7/10 [01:09<00:21,  7.32s/it]

INFO:tensorflow:Assets written to: /home/jovyan/models/20220323-111325-MNIST-ConvAutoEncoder-Small/assets



 80%|████████  | 8/10 [01:17<00:15,  7.57s/it]

INFO:tensorflow:Assets written to: /home/jovyan/models/20220323-111325-MNIST-ConvAutoEncoder-Small/assets



 90%|█████████ | 9/10 [01:25<00:07,  7.73s/it]

INFO:tensorflow:Assets written to: /home/jovyan/models/20220323-111325-MNIST-ConvAutoEncoder-Small/assets


100%|██████████| 10/10 [01:33<00:00,  9.37s/it]
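How `save_best_model` decides what counts as "best" is not shown in this notebook. One common way to implement such a mechanism is to remember the lowest test loss seen so far and save only when it improves. The sketch below combines that idea with the `epoch > EPOCHS / 2` gate from the loop above. The function name and the loss values are made up for illustration.

```python
def checkpoint_epochs(test_losses, epochs):
    """Return the epochs at which a checkpoint would be written.

    `test_losses` stands in for the per-epoch test loss; a real loop would
    read it from the model instead.
    """
    best_loss = float("inf")
    saved = []
    for epoch in range(epochs):
        loss = test_losses[epoch]
        # Only the second half of training may save, and only on improvement.
        if epoch > epochs / 2 and loss < best_loss:
            best_loss = loss
            saved.append(epoch)  # a real loop would save the model here
    return saved


losses = [0.9, 0.7, 0.5, 0.4, 0.35, 0.3, 0.28, 0.29, 0.25, 0.26]
saved = checkpoint_epochs(losses, epochs=10)
print(saved)  # [6, 8]
```

Skipping the first half avoids writing many short-lived checkpoints while the loss is still dropping quickly. The trade-off in this sketch is that losses from the first half are never compared against.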
