# Starter code for Cloud Classification Challenge

This code is designed as a starting point for your development. You do not have to use it, but feel free to if you do not know where to start.

The [PyTorch](https://pytorch.org/) collection of packages is used to define and train the model, and this code is adapted from the PyTorch [introductory tutorial](https://pytorch.org/tutorials/beginner/basics/intro.html).

Other machine learning Python packages that you may wish to use include [TensorFlow](https://www.tensorflow.org/overview) and [scikit-learn](https://scikit-learn.org/stable/index.html).

In [1]:
import os
os.environ['SLURM_NTASKS_PER_NODE'] = '1' # set to prevent pytorch_lightning.trainer from breaking

import pandas as pd
import torch
from torch.nn import functional as F
from torchvision.io import read_image
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from torchmetrics.functional.classification import multiclass_accuracy
import mlflow.pytorch
from mlflow import MlflowClient

## Create a custom Dataset for satellite images

A `Dataset` instance takes the image directory and the labels file, and returns each image together with its label.
The `DataLoader` enables simple iteration over these images in batches when training and testing a model.


In [2]:
# Define transforms for label data
def get_label_dict():
    label_dict = {"Fish": 0,
                  "Flower": 1,
                  "Gravel": 2,
                  "Sugar": 3}
    return label_dict


def sat_label_transform(label):
    label_dict = get_label_dict()
    return label_dict[label]


def sat_label_transform_inv(num):
    label_dict = get_label_dict()
    ret_list = [key for key in label_dict.keys() if label_dict[key]==num]
    return ret_list[0]
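
The inverse lookup above scans the dictionary on every call. As a minimal sketch of an alternative, assuming the label-to-index mapping stays one-to-one, the reverse dictionary can be built once. The names `LABEL_TO_INDEX`, `INDEX_TO_LABEL`, `label_to_index` and `index_to_label` are illustrative and not part of the starter code.

```python
# Hypothetical names; the mapping itself is copied from get_label_dict().
LABEL_TO_INDEX = {"Fish": 0, "Flower": 1, "Gravel": 2, "Sugar": 3}

# Build the reverse mapping once instead of searching on every call.
INDEX_TO_LABEL = {index: label for label, index in LABEL_TO_INDEX.items()}


def label_to_index(label):
    """Map a cloud label such as "Fish" to its integer class."""
    return LABEL_TO_INDEX[label]


def index_to_label(index):
    """Map an integer class back to its cloud label."""
    return INDEX_TO_LABEL[index]


# Round trip: every label should map to an index and back to itself.
round_trip = [index_to_label(label_to_index(name)) for name in LABEL_TO_INDEX]
print(round_trip)
```

An unknown label or index raises a `KeyError` here, which may be preferable to the `IndexError` that an empty list lookup would raise.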

In [3]:
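# Define the transform for images.
# ConvertImageDtype converts to float and scales values to the range 0-1.
# Normalize then standardises each channel using the ImageNet mean/std,
# as used by torchvision's pretrained models such as AlexNet.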
img_transform = transforms.Compose([
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
    ])
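
To make the transform concrete, the sketch below reproduces its per-pixel arithmetic in plain Python: `ConvertImageDtype(torch.float)` divides a `uint8` value by 255, and `Normalize` then subtracts the channel mean and divides by the channel standard deviation. The helper name `normalise_pixel` is hypothetical and exists only for this illustration.

```python
# Channel statistics copied from img_transform above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalise_pixel(rgb_uint8):
    """Apply the same arithmetic as img_transform to one (R, G, B) pixel."""
    # Step 1: uint8 (0-255) -> float in the range 0-1.
    scaled = [value / 255 for value in rgb_uint8]
    # Step 2: standardise each channel with its mean and standard deviation.
    return [(s - m) / sd for s, m, sd in zip(scaled, MEAN, STD)]


black = normalise_pixel((0, 0, 0))
white = normalise_pixel((255, 255, 255))
print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalisation the values no longer lie in the range 0-1: with these statistics they span roughly -2.1 to 2.6.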

In [4]:
# Create class for loading the satellite image into a Dataset
class SatImageDataset(Dataset):
    def __init__(self, labels_file, img_dir, transform=img_transform, target_transform=sat_label_transform):
        self.img_labels = pd.read_csv(labels_file)[:1000]  # TODO: remove this limit; only the first 1000 rows are read, for quick testing
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels["Image"].iloc[idx])
        image = read_image(img_path)
        label = self.img_labels["Label"].iloc[idx]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

Load the training and testing data using instances of the SatImageDataset defined above.

In [5]:
# Load the training data.
train_files_dir = "/data/users/meastman/understanding_clouds_kaggle/input/single_labels/224s/train/"
train_files_labels = "/data/users/meastman/understanding_clouds_kaggle/input/single_labels/224s/train/train_labels.csv"

# Create train images dataloader
train_images = SatImageDataset(labels_file=train_files_labels, img_dir=train_files_dir)
train_dataloader = DataLoader(train_images, batch_size=32, shuffle=True)

In [6]:
# Test Data
test_files_dir = "/data/users/meastman/understanding_clouds_kaggle/input/single_labels/224s/test/"
test_files_labels = "/data/users/meastman/understanding_clouds_kaggle/input/single_labels/224s/test/test_labels.csv"

# Create test images dataloader (shuffling is not needed for evaluation)
test_images = SatImageDataset(labels_file=test_files_labels, img_dir=test_files_dir)
test_dataloader = DataLoader(test_images, batch_size=32, shuffle=False)
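
Iterating over a dataloader yields `(images, labels)` batches. The sketch below shows this with random tensors wrapped in a `TensorDataset` as a stand-in for `SatImageDataset`, so it runs without the image files. The sizes (70 samples, 8 x 8 images) are arbitrary assumptions chosen to keep the example small.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data: 70 small "images" (3 channels, 8x8) with labels 0-3.
# The real dataset reads 224x224 images from disk instead.
images = torch.rand(70, 3, 8, 8)
labels = torch.randint(0, 4, (70,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=False)

# Collect the shape of every batch the loader produces.
batch_shapes = []
for image_batch, label_batch in loader:
    batch_shapes.append((tuple(image_batch.shape), tuple(label_batch.shape)))

for shapes in batch_shapes:
    print(shapes)
```

The last batch holds only the leftover samples, so it is smaller than `batch_size` whenever the dataset size is not a multiple of 32.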

## Building a Neural Network

This is a single-layer neural network. For more details on the individual layers, and for further options if you wish to create a different model architecture, see [the tutorial](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).

Note that the input to the layer has size `150528 = 3*224*224`: the input images are 224 x 224 pixels with 3 RGB channels.

The output layer has size 4, which matches the number of cloud categories available.
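
The layer sizes also fix the number of trainable parameters. The short calculation below counts one weight per input-output pair plus one bias per output, and estimates the memory footprint assuming 4 bytes per parameter (32-bit floats, which is an assumption about the precision in use).

```python
channels, height, width = 3, 224, 224
num_classes = 4

# Each flattened image becomes one long input vector.
input_features = channels * height * width

# A linear layer has one weight per (input, output) pair and one bias per output.
weights = input_features * num_classes
biases = num_classes
total_params = weights + biases

# Assumes 4 bytes per parameter (float32).
size_mb = total_params * 4 / 1e6

print(input_features, total_params, round(size_mb, 3))
```

These figures are consistent with the 602 K parameters and 2.408 MB that the model summary reports in the training output.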

In [7]:
class NNCloudClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(3 * 224 * 224, 4)
        self.test_outputs = []
        self.avg_test_acc = None

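    def forward(self, x):
        """
        :param x: Input data, a batch of images of shape (b, 3, 224, 224)

        :return: output - scores for the 4 cloud classes, shape (b, 4)
        """
        batch_size = x.size()[0]

        # Flatten each image: (b, 3, 224, 224) -> (b, 3*224*224)
        x = x.view(batch_size, -1)

        # layer 1 (b, 3*224*224) -> (b, 4)
        x = self.l1(x)
        # NOTE: this ReLU clamps the class scores to be non-negative.
        # cross_entropy expects unbounded logits, so removing it is worth trying.
        x = torch.relu(x)

        return x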

    def training_step(self, batch, batch_nb):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        pred = logits.argmax(dim=1)
        acc = multiclass_accuracy(pred, y, num_classes=4)

        # Log the loss and accuracy with the Lightning logger
        self.log("train_loss", loss, on_epoch=True)
        self.log("acc", acc, on_epoch=True)
        return loss
    
    def test_step(self, test_batch, batch_idx):
        """
        Performs test and computes the accuracy of the model

        :param test_batch: Batch data
        :param batch_idx: Batch indices

        :return: output - Testing accuracy
        """
        x, y = test_batch
        output = self.forward(x)
        _, y_hat = torch.max(output, dim=1)
        test_acc = multiclass_accuracy(y_hat, y, num_classes=4)
        self.test_outputs.append(test_acc)
        return {"test_acc": test_acc}
    
    def on_test_epoch_end(self):
        """
        Computes average test accuracy score
        """
        self.avg_test_acc = torch.stack(self.test_outputs).mean()
        self.log("avg_test_acc", self.avg_test_acc, sync_dist=True)
        self.test_outputs.clear()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)
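
`on_test_epoch_end` averages the per-batch accuracies, so a smaller final batch carries the same weight as a full one. The plain-Python sketch below, using made-up predictions and the hypothetical helper `fraction_correct`, compares that mean of batch accuracies with the accuracy pooled over all samples. It uses simple fraction-correct accuracy for illustration only; `multiclass_accuracy` may aggregate differently depending on its `average` argument.

```python
def fraction_correct(predictions, targets):
    """Fraction of predictions that match their targets."""
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / len(targets)


# Made-up data: (predictions, targets) for a batch of 4 and a final batch of 1.
batches = [
    ([0, 1, 2, 3], [0, 1, 2, 0]),  # 3 of 4 correct
    ([1], [2]),                    # 0 of 1 correct
]

# Approach used in the class above: average the per-batch accuracies.
mean_of_batches = sum(fraction_correct(p, t) for p, t in batches) / len(batches)

# Alternative: pool every sample, then compute one accuracy.
all_predictions = [p for preds, _ in batches for p in preds]
all_targets = [t for _, targs in batches for t in targs]
pooled = fraction_correct(all_predictions, all_targets)

print(mean_of_batches, pooled)
```

If the pooled figure is the one you want, one option is to accumulate counts of correct and total samples across batches rather than stacking batch accuracies.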
    

In [8]:
def print_auto_logged_info(r):
    tags = {k: v for k, v in r.data.tags.items() if not k.startswith("mlflow.")}
    artifacts = [f.path for f in MlflowClient().list_artifacts(r.info.run_id, "model")]
    print("run_id: {}".format(r.info.run_id))
    print("artifacts: {}".format(artifacts))
    print("params: {}".format(r.data.params))
    print("metrics: {}".format(r.data.metrics))
    print("tags: {}".format(tags))

In [9]:
# Initialize our model
classifier = NNCloudClassifier()

# Initialize a trainer
trainer = pl.Trainer(max_epochs=20, devices=1, num_nodes=1)

# Auto log all MLflow entities
mlflow.pytorch.autolog()

# Train the model
with mlflow.start_run() as run:
    trainer.fit(classifier, train_dataloader)
    trainer.test(classifier, test_dataloader)
    mlflow.log_metric('avg_test_acc', float(classifier.avg_test_acc))  # convert the tensor to a plain float

# fetch the auto logged parameters and metrics
print_auto_logged_info(mlflow.get_run(run_id=run.info.run_id))

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 602 K 
--------------------------------
602 K     Trainable params
0         Non-trainable params
602 K     Total params
2.408     Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.
`Trainer.fit` stopped: `max_epochs=20` reached.
SLURM auto-requeueing enabled. Setting signal handlers.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Runningstage.testing metric      DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      avg_test_acc          0.2444826066493988
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
run_id: f58b85a4b8764ad0893caa657c682a16
artifacts: ['model/MLmodel', 'model/conda.yaml', 'model/data', 'model/python_env.yaml', 'model/requirements.txt']
params: {'epochs': '20', 'optimizer_name': 'Adam', 'lr': '0.02', 'betas': '(0.9, 0.999)', 'eps': '1e-08', 'weight_decay': '0', 'amsgrad': 'False', 'maximize': 'False', 'foreach': 'None', 'capturable': 'False', 'differentiable': 'False', 'fused': 'None'}
metrics: {'train_loss': 1.2448924779891968, 'train_loss_step': 1.0397207736968994, 'acc': 0.307343989610672, 'acc_step': 0.666666686534