First, let's make sure the necessary dependencies are installed.
We'll use conda to install the MXNet library from the conda-forge channel, which provides the deep learning framework used in this notebook.
Run the following command in your terminal or command prompt:
# conda install -c conda-forge mxnet
Once the installation is complete, we can proceed with the U-Net implementation.

In [25]:
conda install -c conda-forge mxnet

Collecting package metadata (current_repodata.json): done
Solving environment: \ 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/osx-arm64::qtconsole==5.4.0=py310hca03da5_0
  - defaults/osx-arm64::nbclassic==0.5.2=py310hca03da5_0
  - defaults/osx-arm64::jupyterlab_server==2.19.0=py310hca03da5_0
  - defaults/osx-arm64::jupyter_server==1.23.4=py310hca03da5_0
  - defaults/osx-arm64::hvplot==0.8.2=py310hca03da5_0
  - defaults/osx-arm64::notebook-shim==0.2.2=py310hca03da5_0
  - defaults/osx-arm64::jupyterlab==3.5.3=py310hca03da5_0
  - defaults/osx-arm64::nbconvert==6.5.4=py310hca03da5_0
  - defaults/osx-arm64::ipykernel==6.19.2=py310h33ce5c2_0
  - defaults/osx-arm64::sphinx==5.0.2=py310hca03da5_0
  - defaults/noarch::jupyterlab_pygments==0.1.2=py_0
  - defaults/osx-arm64::scikit-image==0.19.3=py310h313beb8_1
  - defaults/osx-arm64::holoviews==1.15.4=py310hca03da5_0
  - defaults/osx-arm64::spyder==5

The `conda list | grep mxnet` command checks whether MXNet is installed in the current conda environment. `grep` filters the output of `conda list` down to the lines containing `mxnet`. In the output below, `libmxnet` and `py-mxnet` 1.9.1 come from conda, while `mxnet` 1.6.0 is marked as installed from PyPI, so two MXNet versions are present in this environment.

In [27]:
conda list | grep mxnet

_mutex_mxnet              0.0.50                 openblas  
libmxnet                  1.9.1           openblas_h34268ac_0  
mxnet                     1.6.0                    pypi_0    pypi
py-mxnet                  1.9.1           py310h89c6318_0  

Note: you may need to restart the kernel to use updated packages.
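
The same check can be made from inside the running kernel, which is useful because the notebook may use a different interpreter than the shell. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution names passed to it are assumptions about how the package is registered on your machine.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None for any distribution the current interpreter cannot see.
for name in ("mxnet", "numpy"):
    print(name, installed_version(name))
```

If `mxnet` prints `None` here while `conda list` shows it, the kernel is probably running in a different environment.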


# We start by importing the required libraries for our U-Net implementation.

In [3]:
import os
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss, nn

# Load dataset
We will use the Oxford-IIIT Pet Dataset, the same dataset we used in Jax-UNet.

In [4]:
!curl -OL https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
!curl -OL https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
!tar -xf images.tar.gz
!tar -xf annotations.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   350  100   350    0     0    576      0 --:--:-- --:--:-- --:--:--   623
100   185  100   185    0     0    187      0 --:--:-- --:--:-- --:--:--   187
100  755M  100  755M    0     0  20.6M      0  0:00:36  0:00:36 --:--:-- 23.4M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   355  100   355    0     0   1028      0 --:--:-- --:--:-- --:--:--  1047
100   185  100   185    0     0    258      0 --:--:-- --:--:-- --:--:--  2341
100 18.2M  100 18.2M    0     0  7518k      0  0:00:02  0:00:02 --:--:-- 24.9M


## Data Loading and Preprocessing

This code loads and preprocesses the data for semantic segmentation model training. The main steps include:

1. **File Paths Retrieval**: Retrieve paths for input images and target annotations.

2. **Data Reading and Storage**: Read and store images and annotations.

3. **Data Validation**: Ensure equal number of input and target images.

4. **Data Transformation**: Resize and convert data to tensors.

5. **Dataset Creation**: Combine input and target data to create a dataset.

6. **Data Size Information**: Return the dataset and data lengths for training.

The code prepares data for training a semantic segmentation model in MXNet.
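
Segmentation training only works if every image stays paired with its own annotation. As a minimal sketch of one way to guarantee that, the hypothetical helper below pairs file names by their base name instead of relying on list position. The file names are made up for illustration.

```python
import os


def pair_by_stem(image_names, mask_names):
    """Pair image and mask file names that share a base name (extension ignored)."""
    # Skip hidden files such as the "._*" entries found next to the trimaps.
    masks = {os.path.splitext(m)[0]: m for m in mask_names if not m.startswith(".")}
    pairs = []
    for img in sorted(image_names):
        stem = os.path.splitext(img)[0]
        if stem in masks:
            pairs.append((img, masks[stem]))
    return pairs


images = ["Abyssinian_1.jpg", "Abyssinian_2.jpg", "Bengal_7.jpg"]
masks = ["Abyssinian_1.png", "._Abyssinian_1.png", "Bengal_7.png"]
print(pair_by_stem(images, masks))
```

Here `Abyssinian_2.jpg` has no mask, so it is dropped rather than shifting every later pair by one position.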

In [2]:
# Load and preprocess the data
def get_data():
    """
    This function loads and preprocesses input images and target images for training the U-Net model.

    argument: None

    returning:
        - An ArrayDataset containing input images and target images after applying transformations.
        - The number of input images (length of the ArrayDataset).
        - The number of target images (length of the ArrayDataset).
    """
    # Define directories and image size
    input_dir = "images/"
    target_dir = "annotations/trimaps/"
    img_size = (160, 160)
    num_classes = 3

    # Get file paths for input images
    input_img_paths = sorted(
        [
            os.path.join(input_dir, fname)
            for fname in os.listdir(input_dir)
            if fname.lower().endswith(('.jpg', '.jpeg', '.png'))
        ]
    )

    # Get file paths for target images
    target_img_paths = sorted(
        [
            os.path.join(target_dir, fname)
            for fname in os.listdir(target_dir)
            if fname.lower().endswith('.png') and not fname.startswith(".")
        ]
    )

    # Load each input image together with its target so that a failed read
    # drops the whole pair and the two lists stay aligned. zip() also stops at
    # the shorter list, so both lists always end up with the same length.
    input_img = []
    target_img = []
    for img_path, mask_path in zip(input_img_paths, target_img_paths):
        try:
            img = mx.image.imread(img_path)
        except mx.base.MXNetError:
            print(f"Failed to read image: {img_path}")
            continue
        try:
            mask = mx.image.imread(mask_path, 0)  # flag 0: read as grayscale
        except mx.base.MXNetError:
            print(f"Failed to read target image: {mask_path}")
            continue
        input_img.append(img)
        target_img.append(mask)

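    # Input images: bilinear resize, then ToTensor (HWC uint8 -> CHW float32 in [0, 1])
    input_transform = gdata.vision.transforms.Compose(
        [
            gdata.vision.transforms.Resize(img_size),
            gdata.vision.transforms.ToTensor(),
        ]
    )

    # Target trimaps store class IDs (1, 2, 3), so resize them with
    # nearest-neighbour interpolation to avoid blending labels
    target_resize = gdata.vision.transforms.Resize(img_size, interpolation=0)

    # Apply transformations to input images
    for i in range(len(input_img)):
        input_img[i] = input_transform(input_img[i])

    # Apply transformations to target images: move channels first and shift the
    # labels to 0..num_classes-1 (ToTensor would instead rescale them by 1/255)
    for i in range(len(target_img)):
        mask = target_resize(target_img[i])
        target_img[i] = mask.transpose((2, 0, 1)).astype("float32") - 1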

    # Return the preprocessed data and the number of images
    return (
        gdata.ArrayDataset(input_img, target_img),
        len(input_img),
        len(target_img),
    )
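
A note on resizing label masks: the Oxford-IIIT Pet trimaps store class IDs (1 for foreground, 2 for background, 3 for not classified), not intensities. The pure-Python sketch below, using hypothetical one-dimensional helpers, illustrates why the interpolation mode matters when such masks are resized: linear interpolation can produce a value that was not in the original row.

```python
def resize_nearest(row, new_len):
    """Resize a 1-D row of labels by picking the nearest source element."""
    n = len(row)
    return [row[min(n - 1, int(i * n / new_len))] for i in range(new_len)]


def resize_linear(row, new_len):
    """Resize a 1-D row by linear interpolation between neighbouring elements."""
    n = len(row)
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1) if new_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


row = [1, 1, 3, 3]  # foreground next to "not classified"
print(resize_nearest(row, 3))  # only values present in the original row
print(resize_linear(row, 3))   # the middle value is blended from 1 and 3

# Sparse cross-entropy losses expect labels starting at 0, so shift 1..3 to 0..2.
print([label - 1 for label in resize_nearest(row, 3)])
```

In the linear case the blended value equals 2, which is a valid but wrong class ID (background) at a foreground boundary.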

## UNet Model Architecture

The UNet model is defined as a subclass of `gluon.HybridBlock`. It consists of an encoder, bottleneck, decoder, and output layers.

- **Encoder**: Three stages with 64, 128, and 256 channels. Each stage applies two 3x3 convolutions, each followed by batch normalization and ReLU activation, and ends with 2x2 max pooling for down-sampling.

- **Bottleneck**: Two 3x3 convolutions with 512 channels, each followed by batch normalization and ReLU activation, applied at the lowest spatial resolution.

- **Decoder**: Mirrors the encoder with 256, 128, and 64 channels. Each stage up-samples with a 2x2 transposed convolution of stride 2, then applies two 3x3 convolutions with batch normalization and ReLU activation.

- **Output Layer**: The output layer performs a 1x1 convolution to generate the final segmentation map with `num_classes` channels.

The `hybrid_forward` method implements the forward pass, passing the input tensor through the encoder, bottleneck, decoder, and output layers in sequence. Because the blocks are chained sequentially, this implementation has no skip connections between encoder and decoder stages, which the original U-Net design uses to carry fine spatial detail to the decoder. The output holds one score per class for every pixel.
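
To see how the spatial size changes through this architecture, the sketch below applies the standard output-size formulas for convolution, pooling, and transposed convolution. The helper names are hypothetical, and the layer settings are copied from the model definition (3x3 convolutions with padding 1, 2x2 pooling with stride 2, 2x2 transposed convolutions with stride 2).

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def deconv_out(n, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one dimension."""
    return (n - 1) * stride - 2 * padding + kernel


def trace_sizes(n, levels=3):
    """Return the spatial size after each encoder and decoder stage."""
    sizes = [n]
    for _ in range(levels):
        n = conv_out(n, kernel=3, padding=1)  # 3x3 convs keep the size
        n = conv_out(n, kernel=2, stride=2)   # 2x2 max pooling halves it
        sizes.append(n)
    for _ in range(levels):
        n = deconv_out(n, kernel=2, stride=2)  # transposed conv doubles it
        n = conv_out(n, kernel=3, padding=1)
        sizes.append(n)
    return sizes


print(trace_sizes(160))
print(trace_sizes(150))
```

With a 160-pixel input the size returns to 160 after the decoder. With 150 it does not, because the pooling layers round down, which is one reason to pick an input size divisible by 8 for a three-level network like this one.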

In [13]:
class UNet(gluon.HybridBlock):
    def __init__(self, num_classes, **kwargs):
        super(UNet, self).__init__(**kwargs)
        with self.name_scope():
            # Encoder: three stages of two 3x3 conv + batch norm + ReLU blocks,
            # each stage followed by 2x2 max pooling
            self.encoder = nn.HybridSequential()
            with self.encoder.name_scope():
                self.encoder.add(
                    nn.Conv2D(64, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(64, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.MaxPool2D(pool_size=2, strides=2),
                    nn.Conv2D(128, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(128, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.MaxPool2D(pool_size=2, strides=2),
                    nn.Conv2D(256, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(256, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.MaxPool2D(pool_size=2, strides=2),
                )

            # Bottleneck: two 3x3 conv + batch norm + ReLU blocks with 512 channels
            self.bottleneck = nn.HybridSequential()
            with self.bottleneck.name_scope():
                self.bottleneck.add(
                    nn.Conv2D(512, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(512, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                )

            # Decoder: three stages, each a 2x2 transposed conv (stride 2) for
            # up-sampling followed by two 3x3 conv + batch norm + ReLU blocks
            self.decoder = nn.HybridSequential()
            with self.decoder.name_scope():
                self.decoder.add(
                    nn.Conv2DTranspose(256, kernel_size=2, strides=2),
                    nn.Conv2D(256, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(256, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2DTranspose(128, kernel_size=2, strides=2),
                    nn.Conv2D(128, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(128, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2DTranspose(64, kernel_size=2, strides=2),
                    nn.Conv2D(64, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                    nn.Conv2D(64, kernel_size=3, padding=1, strides=1),
                    nn.BatchNorm(),
                    nn.Activation("relu"),
                )

            # Output layer: 1x1 conv mapping to num_classes channels (per-pixel scores)
            self.output = nn.Conv2D(num_classes, kernel_size=1, strides=1)

    def hybrid_forward(self, F, x):
        """
        Defines the forward pass for the U-Net model.

        argument:
            - F: The framework-specific symbol or tensor manipulation object (e.g., mx.sym or mx.nd in MXNet).
            - x: The input tensor.

        returning:
            - The output tensor after passing through the U-Net model.
        """
        # Encoder
        encoder_out = self.encoder(x)

        # Bottleneck
        bottleneck_out = self.bottleneck(encoder_out)

        # Decoder
        decoder_out = self.decoder(bottleneck_out)

        # Output
        return self.output(decoder_out)

## Training the UNet Model

The `train_unet` function is responsible for training the UNet model on the prepared dataset. Here's an overview of the steps involved:

1. **Get the Data**: The function retrieves the dataset, the number of input images, and the number of target images using the `get_data` function.

2. **DataLoader**: The DataLoader is created to handle batching and shuffling of the data during training.

3. **U-Net Model**: An instance of the `UNet` model is created with `num_classes` specifying the number of classes for segmentation. The model is initialized and hybridized to accelerate computation.

4. **Loss Function and Optimizer**: The loss function used is softmax cross-entropy loss. The Adam optimizer is employed with a learning rate of 0.001.

5. **Training Loop**: The training process iterates over the specified number of epochs. For each epoch, the data is processed in batches, and the model parameters are updated based on the computed loss. The average loss per epoch is calculated and displayed.

6. **Save Model Parameters**: After training is complete, the trained model parameters are saved to "unet.params".

The `train_unet` function returns nothing. Its result is the saved `unet.params` file, and the average loss printed after each epoch indicates whether the model is learning.
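
For intuition about the loss in step 4, the sketch below computes softmax cross-entropy for individual pixels in plain Python. It is a hand-written illustration, not MXNet's implementation; the function name and the example logits are made up.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Two example pixels: (per-class scores, true class)
pixels = [([2.0, 0.5, -1.0], 0), ([0.0, 0.0, 0.0], 2)]
losses = [softmax_cross_entropy(scores, label) for scores, label in pixels]
print(losses)
print(sum(losses) / len(losses))  # mean loss over the pixels
```

The second pixel has equal scores for all three classes, so its loss is log(3). A model that has learned nothing about a 3-class problem starts near that value.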

In [16]:
def train_unet(num_classes, batch_size, epochs):
    """
    Trains the U-Net model.

    arguments:
        - num_classes: The number of classes for the output segmentation.
        - batch_size: The number of samples per batch during training.
        - epochs: The total number of training epochs.

    returning: None
    """
    ctx = mx.cpu()  # Train on the CPU

    # Get the data
    dataset, num_samples, num_targets = get_data()

    # DataLoader
    train_data = gdata.DataLoader(
        dataset, batch_size=batch_size, shuffle=True, last_batch="discard"
    )

    # Create and initialize the U-Net model, then hybridize it
    model = UNet(num_classes)
    model.initialize(ctx=ctx)
    model.hybridize()

    # Loss function and optimizer
    loss_fn = gloss.SoftmaxCrossEntropyLoss(axis=1)
    optimizer = "adam"
    lr = 0.001
    trainer = gluon.Trainer(model.collect_params(), optimizer, {"learning_rate": lr})

    # Training loop
    for epoch in range(epochs):
        epoch_loss = 0
        num_batches = 0

        for data, target in train_data:
            data = data.as_in_context(ctx)
            target = target.as_in_context(ctx)

            with autograd.record():
                output = model(data)
                loss = loss_fn(output, target)
            loss.backward()
            trainer.step(batch_size)
            epoch_loss += nd.mean(loss.as_nd_ndarray()).asscalar()
            num_batches += 1

        print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss/num_batches:.4f}")

    # Save the trained model parameters to a file
    model.save_parameters("unet.params")

## Main Script

The main script initiates the training of the UNet model on the Oxford Pets dataset. Here's an overview of the script:

1. **Num Classes, Batch Size, and Epochs**: The script sets the number of classes (`num_classes = 3`), the batch size (`batch_size = 128`), and the number of training epochs (`epochs = 2`).

2. **Training**: The `train_unet` function is called with the specified parameters to start the training process.

Running the main script trains the UNet model on the Oxford Pets dataset and writes the learned parameters to "unet.params".
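
The batch size also determines how many parameter updates happen per epoch. Since the DataLoader in `train_unet` is created with `last_batch="discard"`, any incomplete final batch is dropped. The sketch below shows the arithmetic; the helper name is hypothetical and the sample count of 1000 is illustrative, not the size of this dataset.

```python
def batches_per_epoch(num_samples, batch_size):
    """Return (full batches, samples dropped) when the last partial batch is discarded."""
    full_batches, leftover = divmod(num_samples, batch_size)
    return full_batches, leftover


print(batches_per_epoch(1000, 128))  # (7, 104)
```

With these illustrative numbers, 104 samples are skipped in each epoch. Because the data is shuffled, a different subset is skipped each time.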

In [None]:
if __name__ == "__main__":
    num_classes = 3
    batch_size = 128
    epochs = 2
    # Train the U-Net model with the parameters above
    train_unet(num_classes, batch_size, epochs)

Failed to read image: images/Abyssinian_34.jpg
Failed to read image: images/Egyptian_Mau_139.jpg
Failed to read image: images/Egyptian_Mau_145.jpg
Failed to read image: images/Egyptian_Mau_167.jpg
Failed to read image: images/Egyptian_Mau_177.jpg
Failed to read image: images/Egyptian_Mau_191.jpg


Corrupt JPEG data: premature end of data segment
Corrupt JPEG data: 245 extraneous bytes before marker 0xd9
