<a href="https://colab.research.google.com/github/evalevanto/Indaba-2024-GeoAI-Challenge/blob/main/bootstrap_geoai_challenge_2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Starter Code

In this notebook, we use an open-source vision-language model (Vision-LLM) to label satellite imagery of Africa. We will cover the following steps:

1. Install the required dependencies.
2. Download, extract, and load the dataset.
3. Load the (quantized) model.
4. Create a pipeline to label the images.
5. Export performance metrics on the training set.
6. Export a sample submission file on the test set that we can submit to Kaggle.




# Installing Dependencies


In [None]:
!pip install -q -U transformers==4.37.2
!pip install -q bitsandbytes==0.41.3 accelerate==0.25.0
!pip install -q datasets
!pip install -q evaluate
!pip install -q scikit-learn
!pip install -q gdown
!pip install -q seaborn

In [None]:
import re
import random  # kept as a module: `from random import *` would rebind `random` to a function
from pathlib import Path

from tqdm import tqdm

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from datasets import Image, load_dataset

import torch
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

from sklearn.metrics import confusion_matrix, f1_score

# Data Preparation


In [None]:
# Download and extract the dataset
!mkdir -p data/
!gdown -O data/dataset.zip https://drive.google.com/uc?id=1jw8vz6KvBm5u0RZExTEB4Zk1zjklEvea
!unzip data/dataset.zip -d data/

In [None]:
# Point to the extracted training folder and make sure it exists
root = Path("./data/dataset/train")
assert root.exists()

In [None]:
# Load the training metadata file
train_md = pd.read_csv(root / "metadata.csv")
train_md.head()

In [None]:
# Check the label distribution
_ = train_md['label'].value_counts().plot(kind='bar')

We plot a few examples by label:

In [None]:
def plot(d, label, n=10):
    """Show up to `n` randomly sampled training images for `label`."""
    imgs = d.loc[d["label"] == label, "file_name"]
    n_ = min(n, len(imgs))
    imgs = imgs.sample(n_).tolist()
    # squeeze=False keeps `axes` two-dimensional even when n_ == 1
    fig, axes = plt.subplots(1, n_, figsize=(20, 5), squeeze=False)
    for img, ax in zip(imgs, axes[0]):
        ax.imshow(plt.imread(root / img))
        ax.axis("off")
        ax.set_title(img)
    plt.show()

In [None]:
plot(train_md, "runway")

In [None]:
plot(train_md, "port")

In [None]:
plot(train_md, "multi-unit_residential")

In [None]:
plot(train_md, "impoverished_settlement")

Let's load the dataset now:

In [None]:
# Add a placeholder `label` for the `test` dataset
test_md = pd.read_csv("./data/dataset/test/metadata.csv")
test_md["label"] = "o"
test_md.to_csv("./data/dataset/test/metadata.csv", index=False)

In [None]:
# Root folder holding the extracted `train/` and `test/` directories
data_path = "./data/dataset"
train_ = load_dataset("imagefolder", data_dir=data_path, split="train")

# Model Loading

In [None]:
# Set quantization
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

# Set the model name
model_id = "llava-hf/llava-1.5-7b-hf"

# Create pipeline
pipe = pipeline("image-to-text", model=model_id, trust_remote_code=True, model_kwargs={"quantization_config": quantization_config})

# Prompting


We create a map from integer indices to label strings.

You may use custom indices to represent the label strings, **but remember** to provide the **label strings** in the submission file.




In [None]:
idx_to_label_map = {
    0: "single-unit residential",
    1: "storage tank",
    2: "place of worship",
    3: "ground transportation station",
    4: "airport hangar",
    5: "toll booth",
    6: "dam",
    7: "educational institution",
    8: "surface mine",
    9: "road bridge",
    10: "hospital",
    11: "prison",
    12: "electric substation",
    13: "military facility",
    14: "multi-unit residential",
    15: "airport",
    16: "oil or gas facility",
    17: "helipad",
    18: "police station",
    19: "runway",
    20: "railway bridge",
    21: "impoverished settlement",
    22: "shopping mall",
    23: "port",
    24: "water treatment facility",
    25: "factory or powerplant",
    26: "interchange",
    27: "airport terminal",
    28: "smokestack",
    29: "office building",
    30: "gas station",
    31: "wind farm",
}
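
Since the prompt works with integer keys while the submission needs label strings, it can help to have both directions of the mapping at hand. The following is a minimal sketch, not part of the original pipeline: it uses a shortened three-entry map for illustration, and `to_dataset_label` is a hypothetical helper. The underscore convention is an assumption inferred from the labels used in the plotting cells above (e.g. `multi-unit_residential`).

```python
# Shortened copy of the map for illustration; the notebook's full map has 32 entries.
demo_idx_to_label = {
    14: "multi-unit residential",
    19: "runway",
    21: "impoverished settlement",
}

# Inverse lookup: label string -> integer key.
demo_label_to_idx = {label: idx for idx, label in demo_idx_to_label.items()}


def to_dataset_label(idx, mapping, unknown="unk"):
    """Hypothetical helper: turn a predicted key into a dataset-style label.

    Assumes the dataset spells labels with underscores instead of spaces.
    """
    label = mapping.get(idx)
    if label is None:
        return unknown
    return label.replace(" ", "_")


print(to_dataset_label(14, demo_idx_to_label))  # a known key
print(to_dataset_label(99, demo_idx_to_label))  # an unknown key falls back to "unk"
```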

In [None]:
# Set the maximum number of model output tokens
max_new_tokens = 10

prompt = f"""
USER: <image>
Given the following classes: {str(idx_to_label_map)}.\nYour task is to analyze the image, identify the primary category that best matches the image content, and return only the class key as an int.\nASSISTANT: Class key:"""

Next, we define a post-processing function that extracts the first integer following `ASSISTANT:` in the generated text and returns `-1` if none is found:

In [None]:
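def process_result(input_string):
    """Return the first integer after 'ASSISTANT:', or -1 if there is none."""
    print(input_string)  # debug: show the raw model output
    match = re.search(r'ASSISTANT:.*?(\d+)', input_string)
    if match:
        return int(match.group(1))
    return -1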
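
To see how this parsing behaves, here is a small self-contained sketch. `extract_class_key` is a hypothetical stand-in that uses the same regular expression as `process_result` (without the debug print), and the sample strings are made up for illustration rather than real model outputs.

```python
import re


def extract_class_key(text):
    """Return the first integer after 'ASSISTANT:', or -1 if none is found."""
    match = re.search(r'ASSISTANT:.*?(\d+)', text)
    if match:
        return int(match.group(1))
    return -1


# Digits inside the echoed prompt are ignored because the search starts at 'ASSISTANT:'.
with_prompt = "USER: \nGiven the following classes: {0: 'dam', 1: 'port'}.\nASSISTANT: Class key: 1"
# The model answered with a word instead of a key.
no_key = "USER: \nGiven the following classes: {0: 'dam'}.\nASSISTANT: Class key: port"
# Extra text after the key is fine; only the first integer is used.
with_suffix = "ASSISTANT: Class key: 19 (runway)"

print(extract_class_key(with_prompt))
print(extract_class_key(no_key))
print(extract_class_key(with_suffix))
```

Returning `-1` for unparseable answers means those samples later map to the `unk` label instead of raising an error.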

We draw a random sample of `batch_size` training images to run inference on:

In [None]:
import random
random.seed(1337)
batch_size = 32

curr_batch = train_.select(random.sample(range(len(train_)), batch_size))
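
The seed is what makes the sampled batch reproducible across runs. A minimal sketch of that property, assuming a stand-in population of 1000 indices instead of the real dataset and using a local `random.Random` generator so the global seed is left untouched:

```python
import random

population = range(1000)   # stand-in for range(len(train_))

rng = random.Random(1337)  # local generator with a fixed seed
first = rng.sample(population, 32)

rng = random.Random(1337)  # same seed again
second = rng.sample(population, 32)

print(first == second)         # same seed, same indices
print(len(set(first)) == 32)   # sampling is without replacement
```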

We iterate over the images and label them using the Vision-LLM:

For batched inference, see [this LLaVA test in the transformers repository](https://github.com/huggingface/transformers/blob/a49f4acab3c1eea82907e12f82eafbd4673deb39/tests/models/llava/test_modeling_llava.py#L245).

In [None]:
outputs = []

prepped_dataset = KeyDataset(curr_batch, "image")
for out in tqdm(pipe(prepped_dataset, prompt=prompt, generate_kwargs={"max_new_tokens": max_new_tokens}), total=len(prepped_dataset)):
    outputs.append(process_result(out[0]['generated_text']))

curr_batch = curr_batch.add_column('y_hat', outputs)

Let's map the predicted indices back to the original label strings:

In [None]:
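# Dataset labels use underscores (e.g. "multi-unit_residential"), so convert the spaces back
curr_batch = curr_batch.map(lambda x: {'y_hat': idx_to_label_map.get(x['y_hat'], 'unk').replace(' ', '_')})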


# Evaluating the batch


We compute the confusion matrix, plot it as a heatmap, and report the macro-averaged per-class accuracy:

In [None]:
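targets = curr_batch['label']
predictions = curr_batch['y_hat']

# Fix the class order so the matrix rows and columns can be labelled
labels = sorted(set(targets) | set(predictions))
cm = confusion_matrix(targets, predictions, labels=labels)

# Per-class accuracy; skip classes without true samples (e.g. 'unk') to avoid 0/0
support = cm.sum(axis=1)
class_acc = cm.diagonal()[support > 0] / support[support > 0]
macro_acc = class_acc.mean()

fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='viridis', xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel('Predicted')
ax.set_ylabel('True')
fig.suptitle('Confusion Matrix\n Macro Accuracy: {:.2f}'.format(macro_acc))
plt.show()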
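
Per-class accuracy is the diagonal of the confusion matrix divided by the row sums, i.e. the number of correct predictions for a class divided by the number of true samples of that class. A minimal pure-Python sketch of the same quantity, using made-up toy labels and a hypothetical helper `per_class_accuracy`; only classes that occur in the targets are included, so a prediction-only label such as `unk` never causes a division by zero:

```python
from collections import Counter


def per_class_accuracy(targets, predictions):
    """Fraction of correctly labelled samples for each class present in `targets`."""
    support = Counter(targets)
    correct = Counter(t for t, p in zip(targets, predictions) if t == p)
    return {label: correct[label] / n for label, n in support.items()}


# Toy data for illustration only.
demo_targets = ["runway", "runway", "port", "dam"]
demo_predictions = ["runway", "port", "port", "unk"]

demo_acc = per_class_accuracy(demo_targets, demo_predictions)
demo_macro_acc = sum(demo_acc.values()) / len(demo_acc)

print(demo_acc)
print(demo_macro_acc)
```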


We calculate the macro F1 score, which is the evaluation metric for this competition:

In [None]:
f1 = f1_score(targets, predictions, average='macro')
f1
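
Macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The sketch below computes it by hand on made-up toy labels; `macro_f1` is a hypothetical helper, not the scikit-learn implementation. It averages over every label seen in either the targets or the predictions and scores a class as 0 when its F1 is undefined, which, as far as we know, matches the default behaviour of `f1_score(..., average='macro')` when `labels` is not passed.

```python
def macro_f1(targets, predictions):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(targets) | set(predictions))
    pairs = list(zip(targets, predictions))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); use 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only.
toy_targets = ["runway", "runway", "port", "dam"]
toy_predictions = ["runway", "port", "port", "unk"]

print(macro_f1(toy_targets, toy_predictions))
```

Note that `unk` counts as a class of its own here, so unparseable answers lower the score twice: once for the missed true class and once for the extra `unk` class.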

# Generating submission file

In [None]:
def generate_submission_file(test_dataset, pred_column):
    test_df = test_dataset.to_pandas()
    # Keep only the file name and the prediction, renamed to the submission columns
    test_df = test_df[['file_name', pred_column]].rename(columns={pred_column: 'label', 'file_name': 'ID'})
    assert test_df['label'].dtype == 'O' and not test_df['label'].str.isdigit().any(), 'The label column must contain label strings, not numeric keys.'

    test_df.to_csv('submission.csv', index=False)
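
Before uploading, it can be worth re-checking the written file for numeric or empty labels. A minimal sketch using only the standard library, assuming the `ID` and `label` columns from above; `find_bad_labels` is a hypothetical helper and the file names in the sample are made up:

```python
import csv
import io


def find_bad_labels(csv_text):
    """Hypothetical check: return IDs whose label is empty or purely numeric."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["ID"] for row in reader if not row["label"] or row["label"].isdigit()]


# Illustrative content; in practice, read the text of submission.csv instead.
demo_csv = "ID,label\nimg_0.jpg,runway\nimg_1.jpg,19\nimg_2.jpg,port\n"

print(find_bad_labels(demo_csv))
```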
