# Demo - Using CSLearn to Train an Image Domain Learner

This notebook demonstrates how to use the `ImageLearningController` API to train an image domain learner on a locally saved dataset. The model uses a custom CNN architecture.

### Preliminaries - Import and Initialize

First, we'll import the API from the `controllers` module. Then we'll initialize the API, telling it that we intend to train a domain learner.

In [1]:
from cslearn.controllers import ImageLearningController

ctrl = ImageLearningController(learner_type='domain_learner')




### Step 1: Create the Data Loaders

Next, we'll use the `create_data_loaders` method to specify the data used to train the domain learner. We are going to use a locally saved dataset. This data must be saved as four *memory-mapped* .npy files (i.e., files created using `numpy.memmap(filename, mode='w+', ...)`) containing the training data, validation data, training labels, and validation labels. To be compatible with `create_data_loaders`, the data in each file must have `dtype=np.float32`. The labels should be one-hot encoded.
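As a rough illustration of the file format described above, the sketch below writes a small toy dataset to memory-mapped files and reads it back. The array contents, file names, and temporary directory are hypothetical and are not the demo data. Because `numpy.memmap` stores raw values without any header, the shape has to be supplied again when reading.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy data: 8 RGB images (32x32) with integer labels in 0..9.
num_samples, num_classes = 8, 10
rng = np.random.default_rng(0)
images = rng.random((num_samples, 32, 32, 3), dtype=np.float32)
int_labels = rng.integers(0, num_classes, size=num_samples)

# One-hot encode the labels as float32 rows.
onehot = np.eye(num_classes, dtype=np.float32)[int_labels]

# Hypothetical output location and file names.
out_dir = tempfile.mkdtemp()
data_path = os.path.join(out_dir, 'toy_trn_data.npy')
labels_path = os.path.join(out_dir, 'toy_trn_labels.npy')

# Write each array through a writable memory map, then flush it to disk.
for path, array in [(data_path, images), (labels_path, onehot)]:
    mm = np.memmap(path, dtype=np.float32, mode='w+', shape=array.shape)
    mm[:] = array
    mm.flush()
    del mm

# Read the files back; dtype and shape must be given explicitly.
restored = np.memmap(data_path, dtype=np.float32, mode='r', shape=images.shape)
restored_labels = np.memmap(
    labels_path, dtype=np.float32, mode='r', shape=onehot.shape
)
print(restored.shape, restored_labels.shape)
```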

To load the data, we need to pass two dictionaries to `create_data_loaders`: one containing the paths to each data file and one containing the shape of each data array. Examples for some demo data are included below (the demo data is a small subset of the CIFAR10 dataset).

`create_data_loaders` creates `tf.data.Dataset` objects under the hood to handle the data that is passed to the model during training. As a consequence, we need to specify the batch size used when creating the data loaders - this will be the minibatch size used during training of the models.

In [2]:
paths_dict = {
    'train_data_path': 'demo_data/demo_cifar10subset_trn_data.npy',
    'train_labels_path': 'demo_data/demo_cifar10subset_trn_labels.npy',
    'valid_data_path': 'demo_data/demo_cifar10subset_vld_data.npy',
    'valid_labels_path': 'demo_data/demo_cifar10subset_vld_labels.npy',
}

shapes_dict = {
    'train_data_shape': (500, 32, 32, 3),
    'train_labels_shape': (500,10),
    'valid_data_shape': (100, 32, 32, 3),
    'valid_labels_shape': (100,10),
}

ctrl.create_data_loaders(
    dataset='local', 
    batch_size=16,
    paths_dict=paths_dict,
    shapes_dict=shapes_dict
)
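To get a feel for what `batch_size=16` means for this dataset, the following sketch computes the number of minibatches per epoch. It assumes the final partial batch is kept rather than dropped. That is an assumption about the loader, although it is consistent with the `32/32` step counter in the training output in Step 3. The helper name is hypothetical.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Return (number of batches, size of the final batch), keeping the partial batch."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch_size = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch_size


# Sample counts taken from shapes_dict above.
train_batches, train_last = batches_per_epoch(500, 16)
valid_batches, valid_last = batches_per_epoch(100, 16)
print(train_batches, train_last)  # 32 4
print(valid_batches, valid_last)  # 7 4
```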


### Step 2: Create and Compile the Model(s)

The next step is to create the learner that we'll be training. This is done using the `create_learner` method. Note that this method creates multiple models assigned as attributes to our `ctrl` object - for the domain learner model type, it creates `encoder` and `decoder` sub-models and the overall domain learner model that is stored as `model`. You can customize nearly every aspect of the CNN when using `custom_cnn` - we'll use the default parameters here.

We set `latent_dim=2` to indicate that the domain we are learning is two-dimensional. `latent_dim` can be any positive integer.
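One visible effect of `latent_dim` is the size of the encoder's final layer. The sketch below assumes that layer is a standard fully connected layer with a bias term, taking the 32 pooled features shown in the encoder summary further below as input. Under that assumption, `latent_dim=2` gives 66 parameters, which matches the `dense` row of that summary. The helper name is hypothetical.

```python
def dense_param_count(in_features, out_features):
    """Weights plus biases of a standard fully connected layer."""
    return in_features * out_features + out_features


pooled_features = 32  # output width of global_average_pooling2d in the summary
for latent_dim in (2, 3, 8):
    print(latent_dim, dense_param_count(pooled_features, latent_dim))
```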

In [3]:
ctrl.create_learner(latent_dim=2, architecture='custom_cnn')

After creating the learner, we need to compile it. This is done with the `compile_learner` method, which takes arguments such as the loss function to be used and the metrics to record during training.

The domain learner uses a custom loss function, so we do not need to specify the `loss` parameter here. However, we can set the hyperparameters for the loss function `alpha`, `beta`, and `lam` here. See [cite paper] for details on these parameters.

In [4]:
ctrl.compile_learner(
    alpha=1.0,
    beta=1.0,
    lam=0.1
)
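The training log in Step 3 reports three loss terms (`l_r`, `l_c`, `l_d`) and their weighted counterparts (`wl_r`, `wl_c`, `wl_d`). The sketch below checks that the logged total is consistent with a weighted sum using weights 1.0, 1.0, and 0.1. Which of `alpha`, `beta`, and `lam` weights which term is not stated here, so the names `w_r`, `w_c`, and `w_d` are hypothetical placeholders, and this is a reading of the log rather than the library's implementation.

```python
def total_loss(l_r, l_c, l_d, w_r=1.0, w_c=1.0, w_d=0.1):
    """Weighted sum of the three logged loss terms (assumed form)."""
    return w_r * l_r + w_c * l_c + w_d * l_d


# (l_r, l_c, l_d, logged total loss) copied from epochs 1-4 of the training log.
logged = [
    (0.7580, 2.7476, 0.2528, 3.5309),
    (0.7044, 2.1150, 0.0168, 2.8211),
    (0.7290, 2.2321, 0.0164, 2.9627),
    (0.5551, 2.2304, 0.0343, 2.7889),
]

for l_r, l_c, l_d, reported in logged:
    computed = total_loss(l_r, l_c, l_d)
    print(f'computed={computed:.4f} reported={reported:.4f}')
```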

CSLearn comes with a helper method for summarizing the models that you created - simply call `summarize_models` with no arguments.

In [5]:
ctrl.summarize_models()




_________________________________________________________________
                           encoder                          
_________________________________________________________________
 Layer                       Output Shape              Param #   
 input_1                     (None, 32, 32, 3)         0
 convolution_block           (None, 16, 16, 16)        2432
 convolution_block_1         (None, 16, 16, 16)        6480
 convolution_block_2         (None, 8, 8, 32)          4768
 convolution_block_3         (None, 8, 8, 32)          9376
 global_average_pooling2d    (None, 32)                0
 dense                       (None, 2)                 66
Total params: 23122
Trainable params: 22930
Non-trainable params: 192
_________________________________________________________________




_________________________________________________________________
                           decoder                          
___________________________________________________________

### Step 3: Train the Learner

After we have created and compiled the learner, we use the `train_learner` method to start training. This method takes various parameters related to the training algorithm. Here we specify the number of epochs, the verbosity level, and `proto_update_step_size`.

The default verbosity of 1 can get messy for the domain learner, so we'll use `verbose=2`, which prints a single summary line per epoch.

In [6]:
ctrl.train_learner(epochs=10, verbose=2, proto_update_step_size=32)




Epoch 1/10 (warmup)



32/32 - 6s - loss: 3.5309 - wl_r: 0.7580 - wl_c: 2.7476 - wl_d: 0.0253 - l_r: 0.7580 - l_c: 2.7476 - l_d: 0.2528 - accuracy: 0.1250 - val_loss: 2.5888 - val_l_r: 0.3368 - val_l_c: 2.2265 - val_l_d: 0.2552 - val_accuracy: 0.1600 - 6s/epoch - 184ms/step



Epoch 2/10
Epoch 2/2
32/32 - 1s - loss: 2.8211 - wl_r: 0.7044 - wl_c: 2.1150 - wl_d: 0.0017 - l_r: 0.7044 - l_c: 2.1150 - l_d: 0.0168 - accuracy: 0.1602 - val_loss: 3.2984 - val_l_r: 1.1493 - val_l_c: 2.1482 - val_l_d: 0.0082 - val_accuracy: 0.1600 - 798ms/epoch - 25ms/step



Epoch 3/10
Epoch 3/3
32/32 - 1s - loss: 2.9627 - wl_r: 0.7290 - wl_c: 2.2321 - wl_d: 0.0016 - l_r: 0.7290 - l_c: 2.2321 - l_d: 0.0164 - accuracy: 0.2070 - val_loss: 4.2107 - val_l_r: 1.9458 - val_l_c: 2.2648 - val_l_d: 6.4900e-04 - val_accuracy: 0.1200 - 850ms/epoch - 27ms/step



Epoch 4/10
Epoch 4/4
32/32 - 1s - loss: 2.7889 - wl_r: 0.5551 - wl_c: 2.2304 - wl_d: 0.0034 - l_r: 0.5551 - l_c: 2.2304 - l_d: 0.0343 - accuracy: 0.2109 - val_loss: 4.2540 - val_l_r: 1.
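If you want to work with these numbers outside the notebook, the per-epoch summary lines above can be turned into dictionaries with a small regular expression. This is a minimal sketch that only relies on the `name: value` pattern visible in the output. The helper name is hypothetical and not part of CSLearn.

```python
import re

# Matches "name: value" pairs such as "loss: 2.9627" or "val_l_d: 6.4900e-04".
METRIC_PATTERN = re.compile(r'(\w+): ([0-9.eE+-]+)')


def parse_epoch_line(line):
    """Return a dict mapping each metric name in a summary line to a float."""
    return {name: float(value) for name, value in METRIC_PATTERN.findall(line)}


# Epoch 3 summary line copied from the output above.
line = (
    '32/32 - 1s - loss: 2.9627 - wl_r: 0.7290 - wl_c: 2.2321 - wl_d: 0.0016 '
    '- l_r: 0.7290 - l_c: 2.2321 - l_d: 0.0164 - accuracy: 0.2070 '
    '- val_loss: 4.2107 - val_l_r: 1.9458 - val_l_c: 2.2648 '
    '- val_l_d: 6.4900e-04 - val_accuracy: 0.1200 - 850ms/epoch - 27ms/step'
)

metrics = parse_epoch_line(line)
print(metrics['loss'], metrics['val_loss'], metrics['val_l_d'])
```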

### Step 4: Evaluation

After the model has been trained, we can call several methods that evaluate the result. The names of these methods begin with `eval_`, followed by a description of the evaluation performed.

First, we'll call two simple methods that plot the training and validation loss curves and accuracy curves.

In [7]:
ctrl.eval_plot_loss_curves(which='both')
ctrl.eval_plot_accuracy_curves(which='both')

Note that because the dataset we are training on is so small, overfitting is very likely and the results will not be great. The demo data is purely for demonstrating the usage of the API.

Most of the evaluation methods in the API are geared toward the domain learner model. Examples of these methods are given below.

First, create a list defining the legend.

In [8]:
legend = [
    'airplane',
    'automobile',
    'bird',
    'cat',
    'deer',
    'dog',
    'frog',
    'horse',
    'ship',
    'truck'
]

Plot the features of the encoded validation images:

In [9]:
ctrl.eval_plot_scattered_features(
    which='validation',
    legend=legend
)

Computing features...
Done.


The domain learner learns prototype points for each property/class in the learned domain. We can plot these prototypes in the space, as well as visualize the decoded prototype points:

In [10]:
ctrl.eval_plot_scattered_protos()
ctrl.eval_show_decoded_protos()



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

We can visualize a single dimension of the latent space using the decoder, or visualize all dimensions at once:

In [11]:
ctrl.eval_visualize_dimension(dimension=1)
ctrl.eval_visualize_all_dimensions()

Computing features...


Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Computing features...
Fixed: [0.5783857326954603, -0.3200235665170476]
Dimension: 1 - Min: -1.504 - Max: 2.125
Dimension: 2 - Min: -1.867 - Max: 0.431
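The printed `Fixed` point and per-dimension `Min`/`Max` values suggest that one latent coordinate is swept across its observed range while the others are held fixed. How CSLearn builds this sweep internally is not shown here, so the following is only a sketch of that idea with a hypothetical helper. It assumes dimensions are numbered from 1, as in the printed labels.

```python
import numpy as np


def latent_traversal(fixed, dimension, low, high, steps=8):
    """Return `steps` latent points that vary one 1-based dimension from low to high."""
    points = np.tile(np.asarray(fixed, dtype=np.float32), (steps, 1))
    points[:, dimension - 1] = np.linspace(low, high, steps)
    return points


# Values copied from the output above.
fixed = [0.5783857326954603, -0.3200235665170476]
sweep = latent_traversal(fixed, dimension=1, low=-1.504, high=2.125, steps=5)
print(sweep.shape)  # (5, 2)
```

Each row of `sweep` could then be passed through the decoder sub-model to obtain one image per step.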


A semantic similarity function is used to help the domain learner obtain meaningful representations for the different properties in the domain space. We can plot a heatmap to visualize the pairwise similarities between property prototypes, and also plot similarity histograms using the encoded validation samples (the x-axis of each histogram has a range of 0 to 1):

In [12]:
ctrl.eval_plot_similarity_heatmap(legend=legend)
ctrl.eval_plot_similarity_histograms(legend=legend)

Computing features...
0
1
2
3
4
5
6
Plotting similarity histograms...
Finished 100 of 100
Done.
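The exact similarity function used by CSLearn is not given in this notebook. As a stand-in, the sketch below uses a Gaussian function of Euclidean distance, `exp(-c * d**2)`, which maps every pair of points to a value in the range 0 to 1, with 1 for identical points. The function name, the constant `c`, and the prototype coordinates are hypothetical.

```python
import numpy as np


def pairwise_similarity(points, c=1.0):
    """Similarity matrix exp(-c * squared distance) between all pairs of points."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-c * sq_dists)


# Three hypothetical prototypes in a two-dimensional domain.
protos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
sim = pairwise_similarity(protos)
print(np.round(sim, 3))
```

A matrix like `sim` is the kind of input a similarity heatmap would display, with the diagonal fixed at 1.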


Finally, we can compare true images to recovered images to assess the quality of the autoencoder model.

In [13]:
ctrl.eval_compare_true_and_generated()
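A visual comparison can be complemented with a number. The sketch below computes a per-image mean squared error between true and generated images, after clipping the generated images to the range 0 to 1 (the same range the `imshow` clipping messages above refer to for float data). It assumes the images are float arrays of shape `(N, H, W, C)` scaled to that range. The helper and the toy arrays are hypothetical and do not use the trained model.

```python
import numpy as np


def per_image_mse(true_images, generated_images):
    """Mean squared error per image after clipping generated pixels to [0, 1]."""
    generated = np.clip(generated_images, 0.0, 1.0)
    return np.mean((true_images - generated) ** 2, axis=(1, 2, 3))


# Toy stand-ins for true images and noisy reconstructions.
rng = np.random.default_rng(0)
true_images = rng.random((4, 32, 32, 3))
generated_images = true_images + rng.normal(scale=0.1, size=true_images.shape)

errors = per_image_mse(true_images, generated_images)
print(errors.shape)  # (4,)
```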

