# AI4Mars

A Machine Learning Model for Martian Terrain Image Segmentation

## Introduction

The objective of this project is to take the amazing work done by NASA and train a machine learning model on their [AI4Mars](https://data.nasa.gov/Space-Science/AI4MARS-A-Dataset-for-Terrain-Aware-Autonomous-Dri/cykx-2qix/about_data) dataset. This dataset is the culmination of an incredible effort by experts and the public at large to create a dataset for semantic segmentation of Martian terrain.

The AI4Mars dataset consists of ~326K full-image semantic segmentation labels on 35K images from the Curiosity, Opportunity, and Spirit rovers, collected through crowdsourcing. Each image was labeled by 10 people to ensure greater quality and agreement of the crowdsourced labels. It also includes ~1.5K validation labels annotated by the rover planners and scientists from NASA’s MSL (Mars Science Laboratory) mission, which operates the Curiosity rover, and MER (Mars Exploration Rovers) mission, which operated the Spirit and Opportunity rovers.

As opposed to Earthly applications, training machine learning models for deep space missions is incredibly difficult. This is largely due to the scarcity of available training data and the stringent requirements for safety-critical flight software. As such, this project aims to train a machine learning model under the constraints imposed by a typical deep space mission. This means that efficiency in the size and resource utilization of a machine learning model will be considered a requirement.

## Requirements

| ID          | Description                                                                         | Rationale                                                         | Verification Method |
|-------------|-------------------------------------------------------------------------------------|-------------------------------------------------------------------|---------------------|
| AI4MARS-001 | A machine learning model shall be trained on the AI4Mars v0.1 dataset                    | The AI4Mars dataset is robust and high quality.                   | Inspection          |
| AI4MARS-002 | A machine learning model shall perform semantic segmentation of Mars terrain imagery| The task of the machine learning model is segmentation.           | Testing             |
| AI4MARS-003 | A machine learning model shall execute in under 2GB of RAM                          | The allocated RAM budget for the machine learning model is 4GB.   | Inspection          |
| AI4MARS-004 | A machine learning model shall fit in under 500MB of disk storage                     | The allocated disk budget for the machine learning model is 1GB.  | Inspection          |
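
One rough way to sanity-check AI4MARS-004 before training is to estimate the on-disk size of the weights from the parameter count. The sketch below assumes 32-bit floats and ignores serialization overhead. The helper name `estimate_weights_mb` and the parameter count are illustrative assumptions, not measurements from this project.

```python
def estimate_weights_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of a model's weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


DISK_LIMIT_MB = 500  # from AI4MARS-004

# Hypothetical parameter count, used only to illustrate the arithmetic.
example_params = 30_000_000
size_mb = estimate_weights_mb(example_params)
print(f"{size_mb:.0f} MB, within limit: {size_mb < DISK_LIMIT_MB}")
```

The real check is still an inspection of the exported model file; this estimate only tells us early whether an architecture is plausible.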

## Setup

Let's start by setting up our notebook. We'll need to import our dependencies.

In [3]:
# Import all the modules we will need
from fastai.imports import *
from fastai.vision.all import *
import seaborn as sns
import torch
import PIL

from utils.utils import *

torch.set_printoptions(linewidth=140, sci_mode=False, edgeitems=7)

sns.set_theme()

# ! We need this to get some of the training output to work. This will be fixed
# in a future release of Jupyter for VS Code.
# https://github.com/microsoft/vscode-jupyter/pull/13442#issuecomment-1541584881
from IPython.display import clear_output, display, DisplayHandle, Image


def update_patch(self, obj):
    clear_output(wait=True)
    self.display(obj)
    
# Enable Weights & Biases logging. The environment-variable check is
# currently disabled, so logging is always on.
# g_ENABLE_WANDB = bool(os.getenv("ENABLE_WANDB"))
g_ENABLE_WANDB = True

if g_ENABLE_WANDB:
    os.environ["WANDB_NOTEBOOK_NAME"] = "ai4mars.ipynb"
    import wandb
    from fastai.callback.wandb import *
    wandb.login()
 
DisplayHandle.update = update_patch

g_DEVICE = enable_gpu_if_available()

In [None]:
# start logging a wandb run
if g_ENABLE_WANDB:
    wandb.init(project="AI4Mars")

## Exploratory Data Analysis

### Download the Data

Now that we have all of our packages imported and our notebook set up, we can proceed with downloading our data from Kaggle.

Note that although the data is available directly from NASA, a user-uploaded copy on Kaggle makes downloading it much easier.

In [None]:
dataset_name = "yash92328/ai4mars-terrainaware-autonomous-driving-on-mars"
dataset_path = URLs.path(dataset_name)

Path.BASE_PATH = dataset_path

# Download the dataset to a hidden folder and extract it from kaggle
if not dataset_path.exists():
    import kaggle

    dataset_path.mkdir(parents=True, exist_ok=True)
    kaggle.api.dataset_download_cli(dataset_name, path=dataset_path, unzip=True)

# Let's append the subfolder to our path
dataset_path = Path(dataset_path / dataset_path.ls()[0])

dataset_path.ls()

Set up the paths to our images and labels.

In [None]:
# Paths
IMAGES_PATH = Path(dataset_path / "msl" / "images" / "edr")
MASK_PATH_TRAIN = Path(dataset_path / "msl" / "labels" / "train")
MASK_PATH_TEST = Path(
    dataset_path / "msl" / "labels" / "test" / "masked-gold-min3-100agree"
)

In [None]:
len(IMAGES_PATH.ls()), len(MASK_PATH_TRAIN.ls())

One thing we'll notice is that, for some undocumented reason, there are some images missing their labels. We'll need to make sure all our training images have their labels next.

In [None]:
# images_missing_labels = find_images_missing_labels(IMAGES_PATH, MASK_PATH_TRAIN)
# len(images_missing_labels)
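
The `find_images_missing_labels` helper comes from `utils/utils.py`, which is not shown here. A minimal sketch of how such a check could work, assuming images and labels share a file stem, is given below. The function name `images_without_labels` is hypothetical and is not the project's implementation.

```python
from pathlib import Path


def images_without_labels(images_dir: Path, labels_dir: Path) -> list[Path]:
    """Return image files that have no label file with the same stem."""
    # Collect the stems of all label files once, for fast membership tests.
    label_stems = {p.stem for p in labels_dir.glob("*.png")}
    return sorted(
        p for p in images_dir.iterdir()
        if p.suffix.lower() == ".jpg" and p.stem not in label_stems
    )
```

Matching on the stem mirrors how the image and label names relate in this dataset: same name, different extension.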

In [None]:
# Path(dataset_path / "archive").mkdir(exist_ok=True)

# for img_path in images_missing_labels:
#     shutil.move(img_path, Path(dataset_path / "archive"))

In [None]:
assert len(IMAGES_PATH.ls()) == len(MASK_PATH_TRAIN.ls())

### Visualize

Now that we have the data downloaded, we can proceed with actually visualizing some of the images and masks that we have.

Let's start by seeing what one of these images looks like.

In [None]:
img = PILImage.create(IMAGES_PATH.ls()[2])
img.show(figsize=(5, 5))

Next, to see the corresponding mask associated with our image, we'll need a small function to help us map them on
the fly. According to the AI4Mars info.txt file, the images end with extension `.JPG` while the corresponding label
ends with `.png`.

In [None]:
get_mask_path = lambda file: MASK_PATH_TRAIN / f"{file.stem}.png"

In [None]:
example_mask = PILMask.create(get_mask_path(IMAGES_PATH.ls()[2]))
example_mask.show()

In semantic segmentation, the “labels” are a single-channel mask with the same dimensions as the original picture, where each pixel value encodes a class:

In [None]:
tensor(example_mask)

Very quickly we see an issue!

Because of how the loss gets calculated (and how fastai does things in general), the values of the pixel mask must run from 0 to n-1, where n is the number of classes. If we train with the masks as they are, you’ll hit an error along the lines of “CUDA Segmentation Fault, Index Out of Bounds” (or something similar).

This is because our model predicts a score for each of five classes, indexed 0 to 4, so the labels must also fall in the range 0 to 4. Instead they are 0, 1, 2, 3, and 255, which leads to this issue.

So how do we fix the issue? In NumPy we can simply overwrite every occurrence of a particular value in the array. To generalize this, however, we should also build a dictionary that maps each new value to its original value:

In [None]:
unique_codes = {
    0: 0,
    1: 1,
    2: 2,
    3: 3,
    4: 255
}
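
To see the remapping in isolation, here is a small sketch on a toy array rather than a real mask. It assumes NumPy and uses made-up pixel values; `remap` is an illustrative helper, not part of fastai.

```python
import numpy as np

# Toy 2x3 "mask" using the raw AI4Mars values, including 255 for "no label".
raw = np.array([[0, 1, 255],
                [3, 255, 2]], dtype=np.uint8)

# New value -> original value, same orientation as the dictionary above.
new_to_original = {0: 0, 1: 1, 2: 2, 3: 3, 4: 255}


def remap(mask: np.ndarray, new_to_original: dict) -> np.ndarray:
    """Return a copy of `mask` with each original value replaced by its new value."""
    out = mask.copy()
    for new_value, original_value in new_to_original.items():
        # Compare against the untouched input so earlier writes cannot collide.
        out[mask == original_value] = new_value
    return out


remapped = remap(raw, new_to_original)
print(np.unique(remapped))  # values now form a contiguous range starting at 0
```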

Next we need to create a get_y function. In this case it should take in a filename and our dictionary, open the filename, and return the mask:

In [None]:
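def get_label(filename: Path, unique_codes: dict) -> PILMask:
    """Load the mask for an image and remap its values to contiguous class ids."""
    raw_mask = np.asarray(PIL.Image.open(get_mask_path(filename)))
    mask_array = raw_mask.copy()

    # unique_codes maps each new value to its original value, e.g. 4 -> 255
    for new_value, original_value in unique_codes.items():
        mask_array[raw_mask == original_value] = new_value

    return PILMask.create(mask_array)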

Let's spot-check this now.

In [None]:
tensor(get_label(IMAGES_PATH.ls()[2], unique_codes))

In [None]:
mask: PILMask = get_label(IMAGES_PATH.ls()[2], unique_codes)
mask.show()

Perfect!

The next thing we should do is create a list of our codes that map to the corresponding pixel value. According to the
info.txt, the codes are as follows:

| RGB                  | Key             |
|----------------------|-----------------|
| 0,0,0                | soil            |
| 1,1,1                | bedrock         |
| 2,2,2                | sand            |
| 3,3,3                | big rock        |
| 255,255,255 -> 4,4,4 | NULL (no label) |

In [None]:
codes = np.array(["soil", "bedrock", "sand", "big rock", "null"], dtype=str)

Now everything is in place to create our `DataBlock` object. Jeremy Howard, co-founder of fast.ai, popularized the idea of progressive resizing:

* Train on smaller images first
* Gradually increase the image size
* Keep fine-tuning the same model at each new size, as in a transfer learning loop
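
As a rough illustration of such a schedule (not a fastai API), the sketch below generates image sizes that double until they reach the full resolution. The helper name `progressive_sizes` and the starting size are assumptions made for the example.

```python
def progressive_sizes(full_size: int, start_size: int) -> list[int]:
    """Sizes that double from `start_size` up to and including `full_size`."""
    if start_size <= 0 or start_size > full_size:
        raise ValueError("start_size must be in 1..full_size")
    sizes = []
    size = start_size
    while size < full_size:
        sizes.append(size)
        size *= 2
    sizes.append(full_size)  # always finish at the full resolution
    return sizes


print(progressive_sizes(full_size=1024, start_size=256))
```

In a training loop, each size would be used to rebuild the data loaders before continuing to train the same model.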

In the AI4Mars paper, the authors mention that they resize their images to 512x512 from the original 1024x1024 size.

Since I am training on a laptop GPU, for this first round we will train at half the image size as well.

In [None]:
mask_sizes = mask.shape; mask_sizes

In [None]:
mask_sizes = tuple(int(x / 2) for x in mask_sizes); mask_sizes

### Create Our `DataLoader`

In the AI4Mars paper, the authors mention that the "batch size was chosen to be as large as possible before running into GPU memory issues". We will do the same.

In [None]:
data_loader = DataBlock(
    blocks=(ImageBlock, MaskBlock(codes=codes)), # our input is an image and our output is a mask
    splitter=RandomSplitter(), # randomly split our dataset into 80% train and 20% valid
    get_y=partial(get_label, unique_codes=unique_codes), # load our preprocessed labels
    batch_tfms=[*aug_transforms(size=mask_sizes), Normalize.from_stats(*imagenet_stats)] # apply some standard augs
).dataloaders(get_image_files(IMAGES_PATH), bs=8)

data_loader.show_batch()

### Select Our Model Architecture

Next, we can create our `Learner` object that will wrap our model architecture, hyperparameters, and `DataLoader` into one abstract object.

In the AI4Mars paper, the authors opted for the DeepLabv3+ model architecture with a ResNet-101 backbone pretrained on ImageNet. For our architecture, we will go with a U-Net with a ResNet18 backbone that has been pretrained on ImageNet. We _could_ use a modern vision transformer; however, with spacecraft you want technology that is tried and true. U-Net has a long and rich history that makes it a reliable model architecture for our purposes.

In [None]:
def CamVidAcc(input: torch.Tensor, target: torch.Tensor, axis: int=1) -> torch.Tensor:
    """For segmentation, we want to squeeze all the outputted values to
    have it as a matrix of digits for our segmentation mask. From there,
    we want to match their argmax to the target's mask for each pixel
    and take the average, ignoring pixels labeled as null (class 4).

    Args:
        input (torch.Tensor): The independent variable
        target (torch.Tensor): The dependent variable
        axis (int, optional): The dim to apply argmax to. Defaults to 1.

    Returns:
        torch.Tensor: The mean of the input argmax to the target's mask
    """
    target = target.squeeze(1)
    mask = target != 4
    return (input.argmax(dim=axis)[mask] == target[mask]).float().mean()
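
To make the masking behavior concrete, here is a NumPy analog of the metric on a tiny hand-made example. The arrays are invented for illustration; class 4 plays the role of the null label, as above.

```python
import numpy as np

NULL_CLASS = 4

# Predicted class per pixel (what argmax over the class axis would give).
pred = np.array([[0, 1, 2],
                 [3, 0, 1]])
# Ground truth; two pixels are unlabeled.
target = np.array([[0, 1, 4],
                   [2, 4, 1]])

labeled = target != NULL_CLASS  # ignore unlabeled pixels
accuracy = (pred[labeled] == target[labeled]).mean()
print(accuracy)
```

Only the four labeled pixels count toward the score, so predictions on unlabeled pixels can neither help nor hurt the metric.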

In [None]:
learner = unet_learner(
    data_loader,
    resnet18,
    # Pass the metric function itself; the *Multi metrics average over all classes
    metrics=[CamVidAcc, JaccardCoeffMulti(), DiceMulti()],
    act_cls=Mish, # use a modern activation function
    loss_func=DiceLoss(axis=1),
    self_attention=True,
    opt_func=ranger # use the Ranger optimization function
)
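
For intuition about what a Dice-based loss and metric measure, the sketch below computes the Dice coefficient for a single class on toy binary masks, using 2|A∩B| / (|A| + |B|). It is a plain NumPy illustration of that definition with invented data, not fastai's `DiceLoss` implementation.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2 * np.logical_and(pred, target).sum() / total)


toy_pred = np.array([[1, 1, 0],
                     [0, 1, 0]])
toy_target = np.array([[1, 0, 0],
                       [0, 1, 1]])
print(dice_coefficient(toy_pred, toy_target))
```

A loss built on this quantity is typically one minus the coefficient, so perfect overlap gives a loss of zero.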

This is what our model architecture looks like.

In [None]:
learner.summary()

In [None]:
learning_rates = learner.lr_find(suggest_funcs=(minimum, steep, valley, slide))

learner.lr = learning_rates.valley

f"Learning Rate: {learner.lr}"

Now let's fine-tune our model!

In [None]:
cleanup_gpu_cache()

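# Use the learning rate found above, and only log to W&B when it is enabled
learner.fine_tune(epochs=3, base_lr=learner.lr, cbs=[WandbCallback()] if g_ENABLE_WANDB else None)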

cleanup_gpu_cache()

## References

[1] AI4MARS: A dataset for Terrain-Aware autonomous driving on Mars. (2021, June 1). IEEE Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/9523149

[2] SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. (2021, October 21). https://arxiv.org/abs/2105.15203v3

[3] Mish: A Self Regularized Non-Monotonic Activation Function. (2019, August 13). https://arxiv.org/abs/1908.08681