## Lab: Convolutional Autoencoders

In this lab, you will use convolution layers to build your autoencoder. This usually leads to better results than dense networks and you will see it in action with the [Fashion MNIST dataset](https://www.tensorflow.org/datasets/catalog/fashion_mnist).

## Imports

In [4]:
%pip install tensorflow tensorflow_datasets

Collecting tensorflow_datasets
  Downloading tensorflow_datasets-4.9.6-py3-none-any.whl.metadata (9.5 kB)
Collecting click (from tensorflow_datasets)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting dm-tree (from tensorflow_datasets)
  Downloading dm_tree-0.1.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.9 kB)
Collecting immutabledict (from tensorflow_datasets)
  Downloading immutabledict-4.2.0-py3-none-any.whl.metadata (3.4 kB)
Collecting promise (from tensorflow_datasets)
  Downloading promise-2.3.tar.gz (19 kB)
  Preparing metadata (setup.py) ... [?25ldone
Collecting pyarrow (from tensorflow_datasets)
  Downloading pyarrow-17.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting simple-parsing (from tensorflow_datasets)
  Downloading simple_parsing-0.1.6-py3-none-any.whl.metadata (7.3 kB)
Collecting tensorflow-metadata (from tensorflow_datasets)
  Downloading tensorflow_metadata-1.15.0-py3-none-any.whl.metadata (2.4

In [5]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
import tensorflow_datasets as tfds

import numpy as np
import matplotlib.pyplot as plt

## Prepare the Dataset

As before, you will load the train and test sets from TFDS. Notice that we don't flatten the image this time. That's because we will be using convolutional layers later that can deal with 2D images.

In [6]:
def map_image(image, label):
  '''Normalizes the image. Returns image as input and label.'''
  image = tf.cast(image, dtype=tf.float32)
  image = image / 255.0

  return image, image

In [7]:
BATCH_SIZE = 128
SHUFFLE_BUFFER_SIZE = 1024

train_dataset = tfds.load('fashion_mnist', as_supervised=True, split="train")
train_dataset = train_dataset.map(map_image)
train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat()

test_dataset = tfds.load('fashion_mnist', as_supervised=True, split="test")
test_dataset = test_dataset.map(map_image)
test_dataset = test_dataset.batch(BATCH_SIZE).repeat()

2024-09-11 04:30:23.620537: W external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".


[1mDownloading and preparing dataset 29.45 MiB (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to /home/codespace/tensorflow_datasets/fashion_mnist/3.0.1...[0m


  from .autonotebook import tqdm as notebook_tqdm
Dl Completed...: 0 url [00:00, ? url/s]
Dl Completed...:   0%|          | 0/1 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/2 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/3 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/4 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/4 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/4 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/4 [00:00<?, ? url/s]
Dl Completed...:   0%|          | 0/4 [00:00<?, ? url/s]
Dl Completed...:  25%|██▌       | 1/4 [00:00<00:01,  2.84 url/s]
Dl Completed...:  25%|██▌       | 1/4 [00:00<00:01,  2.75 url/s]
Dl Completed...:  25%|██▌       | 1/4 [00:00<00:01,  2.72 url/s]
Dl Completed...:  50%|█████     | 2/4 [00:00<00:00,  3.47 url/s]
Dl Completed...:  50%|█████     | 2/4 [00:00<00:00,  3.43 url/s]
Dl Completed...:  50%|█████     | 2/4 [00:00<00:00,  3.40 url/s]
Dl Completed...:  50%|█████     | 2/4 [00:01<00:01,  1.26 url/s]

[1mDataset fashion_mnist downloaded and prepared to /home/codespace/tensorflow_datasets/fashion_mnist/3.0.1. Subsequent calls will reuse this data.[0m




## Define the Model

As mentioned, you will use convolutional layers to build the model. This is composed of three main parts: encoder, bottleneck, and decoder. You will follow the configuration shown in the image below.

<img src="cnnEncoder.png" width="75%" height="75%"/>

The encoder, just like in previous labs, will contract with each additional layer. The features are generated with the Conv2D layers while the max pooling layers reduce the dimensionality.

In [19]:
def encoder(inputs):
  '''Defines the encoder with two Conv2D and max pooling layers.'''
  # START YOUR CODE HERE
  conv_1 = tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')(inputs)
  max_pool_1 = tf.keras.layers.MaxPool2D(2, 2)(conv_1)

  conv_2 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same')(max_pool_1)
  max_pool_2 = tf.keras.layers.MaxPool2D(2, 2)(conv_2)
  #END YOUR CODE HERE
  
  return max_pool_2

A bottleneck layer is used to get more features but without further reducing the dimension afterwards. Another layer is inserted here for visualizing the encoder output.

In [20]:
def bottle_neck(inputs):
  '''Defines the bottleneck.'''
  # START YOUR CODE HERE
  bottle_neck = tf.keras.layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same')(inputs)
  # END YOUR CODE HERE
  encoder_visualization = tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), activation='sigmoid', padding='same')(bottle_neck)

  return bottle_neck, encoder_visualization

The decoder will upsample the bottleneck output back to the original image size.

In [21]:
def decoder(inputs):
  '''Defines the decoder path to upsample back to the original image size.'''
  # START YOUR CODE HERE
  conv_1 = tf.keras.layers.Conv2D(filters=128, kernel_size=(3,3), activation='relu', padding='same')(inputs)
  # END YOUR CODE HERE
  up_sample_1 = tf.keras.layers.UpSampling2D(size=(2,2))(conv_1)

  conv_2 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')(up_sample_1)
  up_sample_2 = tf.keras.layers.UpSampling2D(size=(2,2))(conv_2)

  conv_3 = tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), activation='sigmoid', padding='same')(up_sample_2)

  return conv_3

You can now build the full autoencoder using the functions above.

In [22]:
def convolutional_auto_encoder():
  '''Builds the entire autoencoder model.'''
  inputs = tf.keras.layers.Input(shape=(28, 28, 1,))
  encoder_output = encoder(inputs)
  bottleneck_output, encoder_visualization = bottle_neck(encoder_output)
  decoder_output = decoder(bottleneck_output)

  model = tf.keras.Model(inputs =inputs, outputs=decoder_output)
  encoder_model = tf.keras.Model(inputs=inputs, outputs=encoder_visualization)
  return model, encoder_model


In [23]:
convolutional_model, convolutional_encoder_model = convolutional_auto_encoder()
convolutional_model.summary()

## Compile and Train the model

In [24]:
train_steps = 60000 // BATCH_SIZE
valid_steps = 60000 // BATCH_SIZE

convolutional_model.compile(optimizer=tf.keras.optimizers.Adam(), loss='binary_crossentropy')
conv_model_history = convolutional_model.fit(train_dataset, steps_per_epoch=train_steps, validation_data=test_dataset, validation_steps=valid_steps, epochs=40)

Epoch 1/40


2024-09-11 04:37:30.942853: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 32514048 exceeds 10% of free system memory.


[1m  1/468[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m17:42[0m 2s/step - loss: 0.6949

2024-09-11 04:37:31.471510: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 32514048 exceeds 10% of free system memory.


[1m  2/468[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m4:12[0m 542ms/step - loss: 0.6914

2024-09-11 04:37:32.020144: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 32514048 exceeds 10% of free system memory.


[1m  3/468[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m4:06[0m 529ms/step - loss: 0.6875

2024-09-11 04:37:32.501773: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 32514048 exceeds 10% of free system memory.


[1m  4/468[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m3:56[0m 509ms/step - loss: 0.6824

2024-09-11 04:37:32.979023: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 32514048 exceeds 10% of free system memory.


[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m282s[0m 600ms/step - loss: 0.3263 - val_loss: 0.2701
Epoch 2/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m272s[0m 581ms/step - loss: 0.2660 - val_loss: 0.2620
Epoch 3/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m269s[0m 575ms/step - loss: 0.2590 - val_loss: 0.2585
Epoch 4/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m268s[0m 574ms/step - loss: 0.2557 - val_loss: 0.2561
Epoch 5/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m270s[0m 576ms/step - loss: 0.2536 - val_loss: 0.2555
Epoch 6/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m269s[0m 575ms/step - loss: 0.2522 - val_loss: 0.2538
Epoch 7/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m268s[0m 573ms/step - loss: 0.2516 - val_loss: 0.2537
Epoch 8/40
[1m468/468[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m270s[0m 577ms/step - loss: 0.2505 - val_loss: 0.2526
Epoch 9/40
[1m221/

## Display sample results

As usual, let's see some sample results from the trained model.

In [None]:
def display_one_row(disp_images, offset, shape=(28, 28)):
  '''Display sample outputs in one row.'''
  for idx, test_image in enumerate(disp_images):
    plt.subplot(3, 10, offset + idx + 1)
    plt.xticks([])
    plt.yticks([])
    test_image = np.reshape(test_image, shape)
    plt.imshow(test_image, cmap='gray')


def display_results(disp_input_images, disp_encoded, disp_predicted, enc_shape=(8,4)):
  '''Displays the input, encoded, and decoded output values.'''
  plt.figure(figsize=(15, 5))
  display_one_row(disp_input_images, 0, shape=(28,28,))
  display_one_row(disp_encoded, 10, shape=enc_shape)
  display_one_row(disp_predicted, 20, shape=(28,28,))

In [None]:
# take 1 batch of the dataset
test_dataset = test_dataset.take(1)

# take the input images and put them in a list
output_samples = []
for input_image, image in tfds.as_numpy(test_dataset):
      output_samples = input_image

# pick 10 indices
idxs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# prepare test samples as a batch of 10 images
conv_output_samples = np.array(output_samples[idxs])
conv_output_samples = np.reshape(conv_output_samples, (10, 28, 28, 1))

# get the encoder ouput
encoded = convolutional_encoder_model.predict(conv_output_samples)

# get a prediction for some values in the dataset
predicted = convolutional_model.predict(conv_output_samples)

# display the samples, encodings and decoded values!
display_results(conv_output_samples, encoded, predicted, enc_shape=(7,7))