![NASA logo](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/NASA_logo.svg/110px-NASA_logo.svg.png) ![IBM Research logo](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSwHxsDwxcOHsQUD2pghQ32j90pzsZLcOujpGCyU1yE&s)

# Geospatial Foundation Model: Burn scar fine-tuning

This is an example of how to fine-tune a model to map burn scars from HLS data using the IBM Geospatial Foundation models as a starting point.  

To run a fine-tuning experiment for burn scar mapping, we will use the MMSegmentation library (https://github.ibm.com/GeoFM-Finetuning/mmsegmentation) to fine-tune a model, starting from the geospatial foundation model trained on HLS data.

This notebook assumes that your project files are placed on the shared volume in the following folder structure:
```
configs                   - folder to place experiment configuration files
fine-tune-checkpoints     - folder where training outputs will be generated
GFM-Models                - folder containing the checkpoint files from the pre-trained GFM
inference                 - folder where we will carry out our inference tasks
training_data             - folder containing the training dataset (including labels and test/train splits etc)
```
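
As a rough illustration, a layout like the one above could be created with the standard library. This is only a sketch: `create_layout` and `PROJECT_SUBFOLDERS` are hypothetical names, the folder names are copied from the listing, and the `geoft.create_project_folders` helper used later in this notebook may behave differently.

```python
from pathlib import Path
import tempfile

# Subfolder names copied from the listing above (assumed, not read from geoft).
PROJECT_SUBFOLDERS = [
    "configs",
    "fine-tune-checkpoints",
    "GFM-Models",
    "inference",
    "training_data",
]


def create_layout(root, project_name):
    """Create the project subfolders under root/project_name and return their paths."""
    project_dir = Path(root) / project_name
    created = []
    for name in PROJECT_SUBFOLDERS:
        folder = project_dir / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created


# Example: build the layout inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_layout(tmp, "burn")
    print([f.name for f in folders])
```

Using `exist_ok=True` means the sketch can be re-run on an existing project without raising an error.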

You then create your configuration script before submitting it to the cluster to run. The notebook will then guide you to:
* monitor and visualise the training,
* run the test tasks,
* use the trained model for local inference.


In [None]:
import json
import pandas as pd
import glob
import matplotlib.pyplot as plt
import os
import subprocess
from pprint import pprint
from dotenv import load_dotenv
import datetime
import string
import sys
import rasterio
from rasterio.plot import show

sys.path.append('../')
import geoft

# Load environment variables
load_dotenv()

# Grab cluster details
login_url, namespace, path_to_shared_volume = geoft.get_cluster_details()

# Create S3 client (for pulling data and model weights)
aws_access_key_id = os.getenv("AWS_ACCESS_KEY_ID")
aws_access_key_secret = os.getenv("AWS_ACCESS_KEY_SECRET")

s3 = geoft.create_s3_client(aws_access_key_id, aws_access_key_secret)

# S3 bucket where data and model weights reside
bucket_name = "nasa-gfm-summer-school"


In [None]:
# import importlib
# importlib.reload(geoft)

In [None]:
#------- Define the project name you wish to use

project_name = "burn"

## Project setup

If we are starting a new fine-tuning project, we can create a new set of folders and download the training data, labels and pre-trained foundation model weights. We create the folder structure described above, then pull the data and weights from an S3 bucket.
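
To make the "pull from S3" step more concrete, here is a minimal sketch of downloading every object under a prefix. It assumes `client` exposes the `list_objects_v2` and `download_file` methods of a boto3 S3 client; `download_prefix` is a hypothetical name and is not the implementation of `geoft.download_s3_dir`. Unlike a full mirror, it flattens any nested keys into a single local folder.

```python
import os


def download_prefix(client, bucket, prefix, dest_dir, max_files=None):
    """Download objects under `prefix` into `dest_dir`; return the local paths.

    `client` is assumed to behave like a boto3 S3 client.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "folder" placeholder keys
                continue
            if max_files is not None and len(downloaded) >= max_files:
                return downloaded
            local_path = os.path.join(dest_dir, os.path.basename(key))
            client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
        if not page.get("IsTruncated"):
            return downloaded
        # Ask for the next page of results.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

With a real client this might be called as `download_prefix(s3, bucket_name, 'burn-scars/training/', some_local_folder, max_files=200)`, where `some_local_folder` is a placeholder for a path on the shared volume.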

In [None]:
#------- Create project folder structure 
geoft.create_project_folders(project_name)

In [None]:
#------- Download the pre-trained model weights
model_name = 'epoch-832-loss-0.0473.pt' # best for burn scar mapping
s3.download_file(bucket_name, 'gfm-models/' + model_name, path_to_shared_volume + project_name + '/gfm-models/' + model_name)


In [None]:
#------- Download the training data
dataset = 'burn-scars'

training_data_path = path_to_shared_volume + project_name + '/training-data/'

# Download training data
subfolder = 'training/'
geoft.download_s3_dir(dataset + '/' + subfolder, training_data_path, bucket_name, client=s3, number_of_files=200)

# Download validation data
subfolder = 'validation/'
geoft.download_s3_dir(dataset + '/' + subfolder, training_data_path, bucket_name, client=s3, number_of_files=50)


## Creating fine-tuning configuration

![Fine-tune architecture](../images/finetune_arch.png)


To configure the fine-tuning, we define a configuration by creating a config file (example below). You can edit the Python scripts in the JupyterLab editor, write them from a cell with `%%writefile`, or edit them locally and upload them.
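
One plausible way to turn a dictionary of settings into a config file is placeholder substitution with the standard library's `string.Template`. The sketch below is an assumption about the general approach, not a description of `geoft.generate_config`; `TEMPLATE_TEXT` and `render_config` are hypothetical, and the real `burn_config.py.template` will contain different fields.

```python
from string import Template

# Hypothetical template text; the real burn_config.py.template is not shown here.
TEMPLATE_TEXT = """\
pretrained_weights = '$gfm_ckpt'
samples_per_gpu = $batch_size
learning_rate = $learning_rate
max_epochs = $num_epochs
"""


def render_config(template_text, values):
    """Fill $placeholders in template_text; raises KeyError if a value is missing."""
    as_strings = {key: str(value) for key, value in values.items()}
    return Template(template_text).substitute(as_strings)


values = {
    "gfm_ckpt": "epoch-832-loss-0.0473.pt",
    "batch_size": "4",
    "learning_rate": "6e-5",
    "num_epochs": 50,
}
print(render_config(TEMPLATE_TEXT, values))
```

`substitute` fails loudly on a missing key, which is usually preferable to silently writing an incomplete config; `safe_substitute` would leave unknown placeholders in place instead.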


In [None]:
conf = {'gfm_ckpt': 'epoch-832-loss-0.0473.pt',
        'loss_function': '''type='DiceLoss', use_sigmoid=False, loss_weight=1''',
        'batch_size': '4',
        'learning_rate': '6e-5',
        'aux_head': 'True',
        'decode_head_conv': '1',        
        'num_epochs': 50,
        'number_training_files': 150,
        'project_name': project_name}


In [None]:
experiment_name, experiment_filepath = geoft.generate_config(project_name, conf, "burn_config.py.template")

In [None]:
geoft.view_config(experiment_filepath)


## Submitting fine-tuning job to run

Before you can submit a training job to the cluster, you need to log in to the cluster. This only needs to be done once every 24 hours.

Run the cell below (`login_url`), and click on the generated URL.

Authenticate, then copy and paste the `oc login` command into the cell below (with `%%sh` at the top). This will log you in to the cluster and allow you to submit and monitor jobs.


In [None]:
login_url

In [None]:
%%sh
oc login --token=sha256~REPLACE_WITH_YOUR_TOKEN --server=https://api.codeflare.xx6d.p1.openshiftapps.com:6443


In [None]:
mcad_id = geoft.submit_tune(project_name,
                namespace,
                experiment_name,
                image='quay.io/bedwards-ibm/mmsegmentation-geo:latest',
                num_gpus=1,
                memory_mb=28000)

print(mcad_id)

## Monitoring training job
Once the job has been submitted to the cluster, we can monitor it using the following commands.
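
As an alternative to building a shell string, the log check can be sketched with `subprocess.run` and an argument list, which avoids shell quoting issues. The helper names below are hypothetical, and the sketch assumes the `torchx` CLI is on the `PATH` and that `torchx log <app handle>` prints the job log, as used elsewhere in this notebook.

```python
import subprocess


def build_log_command(app_handle):
    """Return the `torchx log` command for a job as an argument list."""
    return ["torchx", "log", str(app_handle)]


def tail(text, n=20):
    """Return the last n lines of text."""
    return "\n".join(text.splitlines()[-n:])


def fetch_log_tail(app_handle, n=20):
    """Run `torchx log` and return the last n lines of its output."""
    result = subprocess.run(
        build_log_command(app_handle),
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; return what was printed
    )
    return tail(result.stdout + result.stderr, n)
```

For example, `print(fetch_log_tail(mcad_id))` would show roughly the same output as piping the log through `tail -n20`.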


In [None]:
%%sh
torchx list -s kubernetes_mcad

In [None]:
check_log_cmd = '''torchx log ''' + str(mcad_id) +  ''' | tail -n20'''
os.system(check_log_cmd)


## Viewing the training metrics

Now that we have run (or at least are running) the experiment, we can view the training metrics. To do this, we load the log file and extract the metrics into two dataframes (`train_df` and `val_df`).
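
The sketch below illustrates the general idea of extracting metrics from a training log. It assumes the log is a JSON-lines file in which each record carries a `mode` field of `train` or `val`, which is a common layout for MMCV-style text logs; the actual format read by `geoft.load_tune_metrics` may differ, and `split_metrics` is a hypothetical helper.

```python
import json


def split_metrics(lines):
    """Split JSON-lines log records into (train_records, val_records)."""
    train, val = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        mode = record.get("mode")
        if mode == "train":
            train.append(record)
        elif mode == "val":
            val.append(record)
        # Records without a mode (e.g. an environment header) are ignored.
    return train, val


# Made-up example lines, only to show the expected shape of the input.
sample = [
    '{"env_info": "..."}',
    '{"mode": "train", "epoch": 1, "iter": 10, "loss": 0.8}',
    '{"mode": "val", "epoch": 1, "mIoU": 0.5}',
]
train_records, val_records = split_metrics(sample)
print(len(train_records), len(val_records))
```

Each list of records can then be passed to `pd.DataFrame(...)` to obtain a dataframe suitable for plotting.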

In [None]:
train_df, val_df = geoft.load_tune_metrics(project_name, experiment_name)

In [None]:
plt.figure().set_figwidth(15)
plt.subplot(1, 2, 1)
plt.plot(train_df.index, train_df.loss, '-r');
plt.ylabel('Training Loss');
# plt.yscale('log')

plt.subplot(1, 2, 2)
plt.plot(train_df.index, train_df.loss_val, '-b');
plt.ylabel('Validation Loss');

## Test output model

In [None]:
test_mcad_id = geoft.submit_test(project_name,
                        namespace,
                        experiment_name,
                        checkpoint='latest.pth',
                        num_gpus=1,
                        memory_mb=8000)


In [None]:
test_metrics = geoft.get_test_metrics(project_name, experiment_name)

## Running inference using the trained model

Once we have a trained model, we can use it to run inference on other images.

In [None]:
infer_mcad_id = geoft.submit_inference(project_name,
                namespace,
                experiment_name,
                checkpoint='latest.pth',
                image='quay.io/bedwards-ibm/mmsegmentation-geo:latest',
                num_gpus=1,
                memory_mb=8000)

## Visualizing the predictions
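
The cells in this section overlay rasters on a folium map. rasterio reports bounds as `(left, bottom, right, top)`, whereas folium expects `[[south, west], [north, east]]`, so the values have to be reordered. The helpers below are a small sketch of that conversion; their names are hypothetical, and they assume the raster is in geographic (longitude/latitude) coordinates.

```python
def to_folium_bounds(bounds):
    """Convert (left, bottom, right, top) to [[south, west], [north, east]]."""
    left, bottom, right, top = bounds
    return [[bottom, left], [top, right]]


def centre_of(bounds):
    """Return the [lat, lon] midpoint of (left, bottom, right, top) bounds."""
    left, bottom, right, top = bounds
    return [(bottom + top) / 2, (left + right) / 2]


# Illustrative bounds only, not taken from the dataset.
example = (-120.5, 38.0, -120.0, 38.5)
print(to_folium_bounds(example))
print(centre_of(example))
```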

In [None]:
!pip install rasterio folium

In [None]:
import folium
import folium.plugins as plugins
import numpy as np

def colorize(array, cmax, cmin=0, cmap="rainbow"):
    """Converts a 2D numpy array of values into an RGBA array given a colour map and range.
    Args:
        array (ndarray): 2D array of values to colour
        cmax (float): Max value for colour range (currently unused; the array maximum is used instead)
        cmin (float): Min value for colour range
        cmap (string): Colour map to use (from matplotlib colourmaps)
    Returns:
        rgba_array (ndarray): 3D RGBA array which can be plotted.
    """
    # Scale values relative to cmin and the largest value in the array
    normed_data = (array - cmin) / (array.max() - cmin)
    # plt.get_cmap is used because matplotlib.cm.get_cmap was removed in matplotlib 3.9
    cm = plt.get_cmap(cmap)
    return cm(normed_data)



In [None]:
inference_files = sorted(glob.glob('/opt/app-root/src/data/' + project_name + '/inference/*.tif'))
inference_files

In [None]:
filenum = 0
original_file = inference_files[filenum]
predict_file = inference_files[filenum].replace('/inference','/inference/pred/' + experiment_name).replace('.tif','_pred.tif')

# Load the original image layer
with rasterio.open(original_file) as src:
    redArray = src.read(1)
    greenArray = src.read(2)
    blueArray = src.read(3)
    bounds = src.bounds
    nd = src.nodata
    midLat = (bounds[3] + bounds[1]) / 2
    midLon = (bounds[2] + bounds[0]) / 2
    im_rgb = np.moveaxis(np.array([redArray,greenArray,blueArray]), 0, -1)/2048
    # im_rgb = im_rgb/np.max(im_rgb)


# Create the map
m = folium.Map(location=[midLat, midLon], tiles='openstreetmap', max_zoom=22)

# Load the prediction layer
with rasterio.open(predict_file) as src:
    dataArray = src.read(1)
    bounds = src.bounds
    nd = src.nodata

# cmax = np.max(dataArray)
cmax = 1000
dataArrayMasked = np.ma.masked_where(dataArray == nd, dataArray)
dataArrayMasked = np.ma.masked_where(dataArray == 0, dataArrayMasked)
imc = colorize(dataArrayMasked, cmax, cmin=0, cmap="viridis")

# Add the layers to the map
pred = folium.raster_layers.ImageOverlay(imc, [[bounds[1], bounds[0]], [bounds[3], bounds[2]]], name="Prediction", opacity=0.8)
orig = folium.raster_layers.ImageOverlay(im_rgb, [[bounds[1], bounds[0]], [bounds[3], bounds[2]]], name="Original image", opacity=1.0)

orig.add_to(m)
pred.add_to(m)

folium.LayerControl().add_to(m)
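# fit_bounds expects [[south, west], [north, east]], not rasterio's (left, bottom, right, top)
m.fit_bounds([[bounds[1], bounds[0]], [bounds[3], bounds[2]]])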

m