<a href="https://colab.research.google.com/github/Akramz/vllm-satim-labeling/blob/main/notebooks/starter_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Starter Code

In this notebook, we use an open-source vision-language model (Vision-LLM) to label satellite imagery from Africa. We cover the following steps:

1. Install the required dependencies.
2. Download, extract, and load the dataset.
3. Load the (quantized) model.
4. Create a pipeline to label the images.
5. Compute performance metrics on a sample of the training set.
6. Export a sample submission file for the test set that we can submit to Kaggle.



## Installing Dependencies

In [None]:
!pip install -q -U transformers==4.37.2
!pip install -q bitsandbytes==0.41.3 accelerate==0.25.0
!pip install -q datasets
!pip install -q evaluate
!pip install -q scikit-learn
!pip install -q gdown
!pip install -q seaborn

In [None]:
import re
import random
from pathlib import Path
from tqdm import tqdm

random.seed(1337)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datasets import load_dataset

In [None]:
import torch
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from transformers import BitsAndBytesConfig
from sklearn.metrics import confusion_matrix, f1_score

## Data Preparation


In [None]:
# Download and extract the dataset
!mkdir -p data/
!gdown -O data/dataset.zip "https://drive.google.com/uc?id=1fIAHpLdHvdlgbEhHy-tWKlq_RXCOduwG"
!unzip data/dataset.zip -d data/

In [None]:
# Point to the extracted training split and make sure it exists
root = Path("./data/dataset/train")
assert root.exists()

In [None]:
# Load the training metadata file
train_md = pd.read_csv(root / "metadata.csv")
train_md.head()

In [None]:
# Check the label distribution
_ = train_md["label"].value_counts().plot(kind="bar")

We plot a few examples by label:

In [None]:
def plot(d, label, n=10):
    """Show up to `n` randomly sampled training images with the given label."""
    imgs = d.loc[d["label"] == label, "file_name"]
    n_ = min(n, len(imgs))
    imgs = imgs.sample(n_).tolist()
    # squeeze=False keeps `axes` a 2D array even when only one image is drawn
    fig, axes = plt.subplots(1, n_, figsize=(20, 5), squeeze=False)
    for img, ax in zip(imgs, axes[0]):
        ax.imshow(plt.imread(root / img))
        ax.axis("off")
        ax.set_title(img)
    plt.show()

In [None]:
plot(train_md, "industrial_energy")

In [None]:
plot(train_md, "transportation_infrastructure")

In [None]:
plot(train_md, "agriculture_and_water_bodies")

In [None]:
plot(train_md, "residential_settlements")

Let's load the dataset now:

In [None]:
# Load the test metadata file
test_md = pd.read_csv("./data/dataset/test/metadata.csv")

In [None]:
# Directory created by the unzip step above; it holds the `train/` and `test/` folders
data_path = "./data/dataset"
train = load_dataset("imagefolder", data_dir=data_path, split="train")

## Model Loading

In [None]:
# Set quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
)

# Set the model name
model_id = "llava-hf/llava-1.5-7b-hf"

# Create pipeline
pipe = pipeline(
    "image-to-text",
    model=model_id,
    trust_remote_code=True,
    model_kwargs={"quantization_config": quantization_config},
)
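
As a rough sanity check on why 4-bit loading matters on a small GPU, the weight footprint can be estimated from the parameter count. The sketch below assumes roughly 7 billion parameters (inferred from the "7b" in the model id) and ignores activations, the KV cache, and quantization overhead, so the figures are best read as lower bounds.

```python
# Back-of-the-envelope weight memory for a ~7B-parameter model.
# ASSUMPTION: 7e9 parameters, inferred from the "7b" in the model id.
N_PARAMS = 7e9


def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Return the weight storage in GiB for a given precision."""
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_gib(N_PARAMS, 16)
int4_gib = weight_gib(N_PARAMS, 4)
print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take about a quarter of the half-precision footprint.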

## Prompting

We create a map from indices to labels.

*Note: You may use custom indexes to represent the label strings **BUT remember** to provide the **label strings** in the submission file.*

In [None]:
# Mapping indices to broader categories with sub-category information
idx_to_label_map = {
    0: "residential and human settlements (single-unit residential, multi-unit residential, impoverished settlement)",
    1: "industrial and energy (electric substation, factory or powerplant, wind farm, solar farm, surface mine, storage tank, water treatment facility, dam)",
    2: "transportation and infrastructure (ground transportation station, toll booth, road bridge, interchange, railway bridge, airport, airport hangar, airport terminal, runway, helipad, port, shipyard)",
    3: "recreational facilities (stadium, golf course, race track)",
    4: "agriculture and water bodies (crop field, lake or pond)"
}

# Set the maximum number of model output tokens
max_new_tokens = 10

prompt = f"""
USER: <image>
Given the following broad categories and their sub-categories: {str(idx_to_label_map)}.\nYour task is to analyze the image, identify the primary category that best matches the image content, and return only the class key as an int.\nASSISTANT: Class key:"""
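
The prompt above embeds the raw Python `dict` representation via `str(idx_to_label_map)`. One alternative that may be worth trying is to list each class on its own line. The helper `format_categories` below is hypothetical, the two-entry map is a shortened stand-in for the real one, and whether this layout improves predictions is untested.

```python
def format_categories(idx_to_label: dict) -> str:
    """Render an index-to-label map as one 'key: description' line per class."""
    return "\n".join(f"{idx}: {label}" for idx, label in sorted(idx_to_label.items()))


# Shortened stand-in for the notebook's idx_to_label_map (illustration only).
demo_map = {
    0: "residential and human settlements",
    1: "industrial and energy",
}

demo_prompt = (
    "USER: <image>\n"
    "Given the following broad categories:\n"
    f"{format_categories(demo_map)}\n"
    "Return only the class key as an int.\n"
    "ASSISTANT: Class key:"
)
print(demo_prompt)
```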


Next, we define a post-processing function that extracts the first integer following `ASSISTANT:` in the generated text and returns `-1` when none is found:

In [None]:
def process_result(input_string):
    print(input_string)
    match = re.search(r"ASSISTANT:.*?(\d+)", input_string)
    if match:
        return int(match.group(1))
    return -1
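
To make the parsing rule concrete, the sketch below applies the same regular expression to a few made-up model outputs. `parse_class_key` is a hypothetical variant of `process_result`: it adds `re.DOTALL` (so a newline before the digit does not break the match) and a range check, neither of which is in the original function.

```python
import re


def parse_class_key(text: str, n_classes: int = 5) -> int:
    """Return the first integer after 'ASSISTANT:', or -1 if missing or out of range."""
    match = re.search(r"ASSISTANT:.*?(\d+)", text, flags=re.DOTALL)
    if match is None:
        return -1
    key = int(match.group(1))
    return key if 0 <= key < n_classes else -1


# Made-up strings shaped like the pipeline's `generated_text` field.
samples = [
    "USER: ... ASSISTANT: Class key: 2",
    "USER: ... ASSISTANT: Class key:\n4",
    "USER: ... ASSISTANT: Class key: unsure",
    "USER: ... ASSISTANT: Class key: 9",
]
parsed = [parse_class_key(s) for s in samples]
print(parsed)  # [2, 4, -1, -1]
```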

Let's sample a few data points to run inference and evaluation on:

In [None]:
# Sample 64 elements from the dataset
sample_size = 64
X = train.select(random.sample(range(len(train)), sample_size))
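
A uniform random sample of 64 images can miss rare classes entirely, which makes per-class metrics noisy. A stratified alternative is sketched below on a toy label list. `stratified_indices` is a hypothetical helper, not part of the notebook; its output could be passed to `train.select(...)` in place of the uniform sample.

```python
import random
from collections import defaultdict


def stratified_indices(labels, per_class, seed=1337):
    """Pick up to `per_class` random indices for every distinct label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    chosen = []
    for label in sorted(by_label):
        pool = by_label[label]
        # Classes with fewer than `per_class` samples contribute all they have
        chosen.extend(rng.sample(pool, min(per_class, len(pool))))
    return sorted(chosen)


# Toy, imbalanced label list (illustration only).
toy_labels = ["a"] * 10 + ["b"] * 3 + ["c"] * 1
picked = stratified_indices(toy_labels, per_class=2)
print(picked, [toy_labels[i] for i in picked])
```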

We can use our Vision-LLM to predict the categories:

In [None]:
outputs = list()

prepped_dataset = KeyDataset(X, "image")
for out in tqdm(
    pipe(
        prepped_dataset,
        prompt=prompt,
        generate_kwargs={"max_new_tokens": max_new_tokens},
    ),
    total=len(prepped_dataset),
):
    outputs.append(process_result(out[0]["generated_text"]))

X = X.add_column("y_hat", outputs)

Let's map the predicted indices back to the original labels:

In [None]:
long_2_short_categories = {
    "residential and human settlements (single-unit residential, multi-unit residential, impoverished settlement)": "residential_settlements",
    "industrial and energy (electric substation, factory or powerplant, wind farm, solar farm, surface mine, storage tank, water treatment facility, dam)": "industrial_energy",
    "transportation and infrastructure (ground transportation station, toll booth, road bridge, interchange, railway bridge, airport, airport hangar, airport terminal, runway, helipad, port, shipyard)": "transportation_infrastructure",
    "recreational facilities (stadium, golf course, race track)": "recreational_facilities",
    "agriculture and water bodies (crop field, lake or pond)": "agriculture_and_water_bodies"
}

# `y_hat` holds an integer class key, so look up its long description first,
# then translate that into the short label used in `metadata.csv`.
# Unparseable (-1) or out-of-range keys become "unk".
X = X.map(
    lambda x: {"y_hat": long_2_short_categories.get(idx_to_label_map.get(x["y_hat"]), "unk")}
)

## Evaluation

We compute the confusion matrix, visualize it, and report the macro-averaged per-class accuracy:

In [None]:
# X is a `datasets.Dataset` holding the target (`label`) and prediction (`y_hat`) columns
targets = list(X["label"])
predictions = list(X["y_hat"])

# Fix the label order so the matrix rows/columns and the tick labels agree,
# including labels that appear only in the predictions (e.g. "unk")
labels = sorted(set(targets) | set(predictions))

# Compute the confusion matrix
cm = confusion_matrix(targets, predictions, labels=labels)

# Compute class accuracies, skipping classes with no true samples (empty rows)
row_totals = cm.sum(axis=1)
has_samples = row_totals > 0
class_acc = cm.diagonal()[has_samples] / row_totals[has_samples]
macro_acc = class_acc.mean()

# Create the plot
fig, ax = plt.subplots(figsize=(5, 5))

# Add the class labels to the heatmap
sns.heatmap(cm, annot=True, fmt="d", cmap="viridis", ax=ax,
            xticklabels=labels,
            yticklabels=labels
            )

# Add title and labels
ax.set_xlabel('Predicted Labels')
ax.set_ylabel('True Labels')
fig.suptitle("Confusion Matrix\nMacro Accuracy: {:.2f}".format(macro_acc))

# Show the plot
plt.show()
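
The per-class accuracy used above is the diagonal of the confusion matrix divided by each row total, which is the same quantity as per-class recall. A small hand-checkable example with made-up counts:

```python
# Rows are true classes, columns are predicted classes (made-up counts).
toy_cm = [
    [8, 2, 0],
    [1, 4, 5],
    [0, 0, 10],
]

# Correct predictions for class i sit on the diagonal; the row sum is the
# number of true samples of class i.
toy_class_acc = [row[i] / sum(row) for i, row in enumerate(toy_cm)]
toy_macro_acc = sum(toy_class_acc) / len(toy_class_acc)

print(toy_class_acc)            # [0.8, 0.4, 1.0]
print(round(toy_macro_acc, 3))  # 0.733
```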

We compute the macro F1 score, which is the evaluation metric for this competition:

In [None]:
f1 = f1_score(targets, predictions, average="macro")
f1
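
Macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The pure-Python sketch below illustrates the computation on a toy example. `macro_f1` is a hypothetical helper written for illustration, not the scikit-learn implementation; it scores every label seen in either list, so a stray `"unk"` prediction counts as its own class with an F1 of 0.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


toy_true = ["a", "a", "b", "b", "c"]
toy_pred = ["a", "b", "b", "b", "unk"]
print(round(macro_f1(toy_true, toy_pred), 3))  # 0.367
```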

## Generating a sample submission file

In [None]:
# Random baseline submission file
test = pd.read_csv("./data/dataset/test/metadata.csv")

# Keep only the file name, renamed to the `ID` column of the submission
test = test[["file_name"]].rename(columns={"file_name": "ID"})

# Assign a random label string to every row (the submission needs label strings,
# not indices); the label names are taken from the training metadata
label_names = sorted(train_md["label"].unique())
test["label"] = np.random.choice(label_names, size=len(test))

# Save the submission file
test.to_csv("random_baseline_submission.csv", index=False)
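
Before uploading, it can help to check the file against the format used in this notebook: an `ID` column plus a `label` column holding label strings. `validate_submission` below is a hypothetical checker built on the standard library only; the label set is copied from the short category names used earlier, and the in-memory CSV is a made-up stand-in for the real file.

```python
import csv
import io

# Short label names as used in this notebook's category mapping.
VALID_LABELS = {
    "residential_settlements",
    "industrial_energy",
    "transportation_infrastructure",
    "recreational_facilities",
    "agriculture_and_water_bodies",
}


def validate_submission(handle):
    """Return a list of problems found in a submission CSV (empty if none)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["ID", "label"]:
        return [f"unexpected columns: {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header
    for line_no, row in enumerate(reader, start=2):
        if row["label"] not in VALID_LABELS:
            problems.append(f"line {line_no}: invalid label {row['label']!r}")
    return problems


demo_csv = io.StringIO("ID,label\nimg_0.png,industrial_energy\nimg_1.png,3\n")
print(validate_submission(demo_csv))
```

For the real file, the same function can be called on a handle opened with `open("random_baseline_submission.csv", newline="")`.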

---