In [None]:
import numpy as np
from tensorflow.keras import datasets, layers, models

# Load the Fashion MNIST dataset, splitting it into training and testing sets.
# x_train, x_test: Grayscale images (originally 28x28 pixels, 0-255 values).
# y_train, y_test: Integer labels (0-9).
(x_train,y_train),(x_test,y_test) = datasets.fashion_mnist.load_data()

# Pre-process the Fashion MNIST dataset to be easier to work with.
# 1. Normalize pixel values: Convert to float32 and scale from 0-255 to 0.0-1.0.
# 2. Pad images: Add 2 pixels of zero-padding around each 28x28 image,
#    making them 32x32 pixels. 32 halves cleanly three times (32 -> 16 -> 8 -> 4),
#    which suits the stride-2 convolutions used in the encoder.
# 3. Add channel dimension: For grayscale images, add a channel dimension of 1
#    (e.g., from (32, 32) to (32, 32, 1)). Keras Conv2D layers expect a channel axis.
def preprocess_mnist_image(images):
    images = images.astype("float32") / 255.0
    images = np.pad(images, ((0,0), (2,2), (2,2)), constant_values = 0.0)
    images = np.expand_dims(images, -1)
    return images

x_train = preprocess_mnist_image(x_train)
x_test = preprocess_mnist_image(x_test) 
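
As a quick sanity check, the three preprocessing steps can be run on a small synthetic batch instead of the real dataset. This is only a sketch: `preprocess` is a copy of the function above, and `fake_batch` is a hypothetical stand-in for raw images, not part of the original notebook.

```python
import numpy as np

def preprocess(images):
    # Same three steps as preprocess_mnist_image: scale, pad, add channel axis.
    images = images.astype("float32") / 255.0
    images = np.pad(images, ((0, 0), (2, 2), (2, 2)), constant_values=0.0)
    return np.expand_dims(images, -1)

# Hypothetical stand-in for raw data: 4 all-white 28x28 uint8 images.
fake_batch = np.full((4, 28, 28), 255, dtype=np.uint8)
processed = preprocess(fake_batch)

print(processed.shape)  # (4, 32, 32, 1)
print(processed.dtype)  # float32
# The padded border is 0.0; the first original pixel now sits at index (2, 2).
print(processed[0, 0, 0, 0], processed[0, 2, 2, 0])
```

If the shape is not `(N, 32, 32, 1)` at this point, the encoder's `Input(shape=(32,32,1))` will reject the data.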

## Encoders

Encoders can be thought of as feature extractors. They take raw input (in our case, Fashion MNIST images) and compress it into a compact but informative representation in a latent space, also called an embedding space (the space of all possible encoder outputs). For example, an image of pants with pockets may be encoded as a point in a 2D latent space, such as the coordinates (5.5, -6.3). This isn't just about shrinking data; it's about keeping the key underlying features so the representation is useful for downstream tasks.

## Decoders

Decoders are the counterparts to encoders. Given an encoding (also called an embedding or latent representation), they expand it back into an output. Continuing the previous example, a decoder may take the (5.5, -6.3) coordinate and turn it back into an image of pants with pockets.

## Autoencoders

Autoencoders are made up of an encoder and a decoder. An autoencoder takes an image, encodes it into an embedding, and then decodes that embedding into an image similar to the input. In other words, it maps an image to a point in its embedding space (or latent space) and then generates an approximate reconstruction of the original from that point.

## Encoding: Mapping to a Latent Space

To do all that, we will first need to embed images into a latent space using an encoder.

In [None]:
encoder_input = layers.Input(shape=(32,32,1), name="encoder_input")

# Our encoder will progressively extract features and reduce the dimensionality
# of the input image, mapping it to a lower-dimensional latent space.
# Output shape after this layer: (16, 16, 32)
x = layers.Conv2D(32, (3,3), strides=2, activation="relu", padding="same")(encoder_input)
# Output shape after this layer: (8, 8, 64)
x = layers.Conv2D(64, (3,3), strides=2, activation="relu", padding="same")(x)
# Output shape after this layer: (4, 4, 128)
x = layers.Conv2D(128, (3,3), strides=2, activation="relu", padding="same")(x)

# Next, we flatten the 3D output of the last convolutional layer (4x4x128) into
# a 1D vector (4 * 4 * 128 = 2048 elements).
# This is necessary to connect to a fully connected (Dense) layer. Flattening
# discards the explicit 2D grid structure, so the Dense layer no longer knows which
# features were next to each other in the feature maps.
# However, we assume this spatial relationship information has already been effectively captured
# and encoded by the preceding Conv2D layers.
x = layers.Flatten()(x)

# Finally, we create a fully-connected output layer. We specify 2 units, as the dimensionality of
# the latent space representation. Thus, each input image will be compressed into a 2-dimensional vector.
encoder_output = layers.Dense(2, name="encoder_output")(x)

encoder = models.Model(encoder_input, encoder_output)
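
The output shapes quoted in the encoder comments can be checked by hand. With `padding="same"`, a Keras convolution produces `ceil(size / stride)` outputs along each spatial axis, so each stride-2 layer halves the width and height. The sketch below reproduces that arithmetic in plain Python; the helper name `same_conv_output_size` is made up for illustration and is not a Keras function.

```python
import math

def same_conv_output_size(size, stride):
    # With padding="same", the spatial output size is ceil(size / stride).
    return math.ceil(size / stride)

size = 32                          # padded input is 32x32
filters_per_layer = [32, 64, 128]  # filter counts of the three Conv2D layers
shapes = []
for filters in filters_per_layer:
    size = same_conv_output_size(size, stride=2)
    shapes.append((size, size, filters))

# Number of elements after Flatten: height * width * channels of the last layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)     # [(16, 16, 32), (8, 8, 64), (4, 4, 128)]
print(flattened)  # 2048
```

In the notebook itself, calling `encoder.summary()` should report the same shapes, followed by the final 2-unit Dense layer.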

## Decoder

The decoder does the opposite of the encoder. As such, instead of convolutional layers, it uses convolutional transpose layers. These use many of the same principles as standard convolutional layers, but they upsample instead of downsampling. In other words, given a low-dimensional input such as an embedding (e.g. (3, 5) in a latent space), transposed convolutions can reconstruct a higher-resolution output (a picture of clothing).
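
To build some intuition for how a transposed convolution upsamples, here is a minimal 1D sketch in plain Python. It is an illustration of the idea, not how Keras implements `Conv2DTranspose`; the function name `conv_transpose_1d` and the kernel values are assumptions chosen for the example. Each input value is multiplied by the kernel and written into the output at a position that advances by `stride`, so a stride of 2 roughly doubles the length.

```python
def conv_transpose_1d(values, kernel, stride):
    """Un-cropped 1D transposed convolution using plain Python lists."""
    out_len = (len(values) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, v in enumerate(values):
        for j, k in enumerate(kernel):
            # Each input value "stamps" a scaled copy of the kernel into the
            # output; overlapping stamps are summed.
            out[i * stride + j] += v * k
    return out

upsampled = conv_transpose_1d([1.0, 2.0, 3.0], kernel=[1.0, 1.0], stride=2)
print(upsampled)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Here three inputs become six outputs. In a trained decoder the kernel weights are learned rather than fixed, and the same stamping happens along both image axes.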

