# HLS Foundation Model Finetuning notebook

This notebook demonstrates the steps to fine-tune the HLS foundation model (also known as Prithvi), which was pre-trained on the HLSL30 and HLSS30 datasets.

Note: This entire notebook is designed to run within the Jülich Supercomputing Centre (JSC) environment.


In [None]:
# Clone HLS Foundation OS from github.
! git clone https://github.com/nasa-impact/hls-foundation-os.git

In [None]:
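# Install required packages.
# Each `!` command runs in its own subshell, so a separate `!cd` would not persist;
# point pip at the requirements file inside the cloned repository instead.
!pip install -r hls-foundation-os/requirements.txt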

# Create directories needed for data, model, and config preparations
# (-p avoids an error if the cell is re-run and the directories already exist)
!mkdir -p datasets models configs

## Dataset preparation

This hands-on session uses the Burn Scars example for fine-tuning. All of the data and pre-trained models are available on Hugging Face. The Hugging Face packages and git are used to download and prepare the datasets and pre-trained models.


Note: Git Large File Storage (Git LFS) is used to download the larger files from Hugging Face.

In [None]:
# Install git lfs
! sudo apt-get install git-lfs; git lfs install

### Download HLS Burn Scars dataset from Hugging Face: https://huggingface.co/datasets/ibm-nasa-geospatial/hls_burn_scars

In [None]:
! cd datasets; git clone https://huggingface.co/datasets/ibm-nasa-geospatial/hls_burn_scars; tar -xvzf hls_burn_scars/hls_burn_scars.tar.gz 
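
Before moving on, it can help to confirm that the archive extracted cleanly. The following is a minimal sketch and not part of the original workshop material: it assumes each scene is stored as an image ending in `_merged.tif` with a matching mask ending in `.mask.tif` (both suffixes are assumptions; adjust them if your files differ), and `find_unpaired` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed suffixes; adjust them if the extracted files are named differently.
IMG_SUFFIX = "_merged.tif"
MASK_SUFFIX = ".mask.tif"


def find_unpaired(root, img_suffix=IMG_SUFFIX, mask_suffix=MASK_SUFFIX):
    """Return (images_without_mask, masks_without_image) as sorted scene ids."""
    root = Path(root)

    def scene_ids(suffix):
        # Scene id = path relative to root with the suffix removed,
        # so files in different subfolders are never paired by accident.
        return {
            p.relative_to(root).as_posix()[: -len(suffix)]
            for p in root.rglob(f"*{suffix}")
        }

    images = scene_ids(img_suffix)
    masks = scene_ids(mask_suffix)
    return sorted(images - masks), sorted(masks - images)
```

Calling `find_unpaired("./datasets")` returns two lists; if both are empty, every image found has a mask next to it and vice versa.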

## Download config and Pre-trained model

The HLS Foundation Model (the pre-trained model) and the configuration for the Burn Scars downstream task are available on Hugging Face. We use the `huggingface_hub` Python package to download the files locally.

In [None]:
# Define constants
BUCKET_NAME = '<your-bucket-name>' # Replace this with the bucket name available from http://smd-ai-workshop-creds-webapp.s3-website-us-east-1.amazonaws.com/ 
CONFIG_PATH = './configs'
DATASET_PATH = './datasets'
MODEL_PATH = './models'

In [None]:
# Download config file from huggingface
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="ibm-nasa-geospatial/Prithvi-100M-burn-scar", filename="burn_scars_Prithvi_100M.py", local_dir=CONFIG_PATH)

Note: The configuration file on Hugging Face contains placeholders and is not directly usable for fine-tuning. The placeholder values need to be updated for each use case.

In [None]:
# Download pre-trained model file from huggingface
hf_hub_download(repo_id="ibm-nasa-geospatial/Prithvi-100M", filename="Prithvi_100M.pt", local_dir=MODEL_PATH)

**Warning:** Before running the remaining cells, please update the configuration file as described below:

1. Update line 13 from `data_root = '<path to data root>'` to `data_root = '/p/project/training2411/<user>'`. Replace `<user>` with your username.
2. Update line 41 from `pretrained_weights_path = '<path to pretrained weights>'` to `pretrained_weights_path = f"{data_root}/models/Prithvi_100M.pt"`. This tells the train script where to find the pre-trained model.
3. Update line 53 from `experiment = '<experiment name>'` to `experiment = 'burn_scars'` or an experiment name of your choice.
4. Update line 54 from `project_dir = '<project directory name>'` to `project_dir = 'v1'` or a project directory name of your choice.
5. Save the config file.
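
The edits listed above can also be applied programmatically. The sketch below is one possible way to do it, assuming the placeholder assignments appear in the config exactly as quoted in the steps; `fill_placeholders` and `update_config` are hypothetical helpers. Matching on the exact text rather than on line numbers is a deliberate choice, so the script fails loudly if the file differs from what the steps describe.

```python
from pathlib import Path

# Placeholder assignments quoted in the steps above, mapped to their replacements.
# Replace <user> in data_root with your username before running.
REPLACEMENTS = {
    "data_root = '<path to data root>'":
        "data_root = '/p/project/training2411/<user>'",
    "pretrained_weights_path = '<path to pretrained weights>'":
        'pretrained_weights_path = f"{data_root}/models/Prithvi_100M.pt"',
    "experiment = '<experiment name>'": "experiment = 'burn_scars'",
    "project_dir = '<project directory name>'": "project_dir = 'v1'",
}


def fill_placeholders(text, replacements=REPLACEMENTS):
    """Return text with each placeholder replaced; raise if one is not found."""
    for old, new in replacements.items():
        if old not in text:
            raise ValueError(f"placeholder not found: {old!r}")
        text = text.replace(old, new)
    return text


def update_config(path, replacements=REPLACEMENTS):
    """Rewrite the config file at path in place."""
    path = Path(path)
    path.write_text(fill_placeholders(path.read_text(), replacements))
```

For example, `update_config("./configs/burn_scars_Prithvi_100M.py")` would rewrite the downloaded config, provided the placeholders match.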

# Submit Training Job
In `train_job.sh`, you can specify the number of nodes to use for training. In this example, you will use 2 nodes.

Check details of the training job:

`cat /p/project/training2411/$USER/HDCRS-school-2024/train_job.sh`

*Note: Please replace `<identifier>` with a proper name for the run, `<user>` with your username, and `<config file path>` with the full path to the configuration file, then save the file.*

The training job can then be submitted using the `sbatch` command: `sbatch train_job.sh`

Once submitted, the job creates two new files: `output.out` and `error.err`. `output.out` contains the output from your processes, and `error.err` collects any warnings, automated messages, and errors from the scripts. Once the files exist, you can follow updates with `tail -f output.out error.err`.

You can judge how well training is progressing by watching the loss values in `output.out`; a loss that decreases over time indicates that the model is learning.
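
To follow the loss trend without reading the whole log, the values can be pulled out with a short script. This is a sketch under an assumption about the log format: it expects lines containing a standalone field such as `loss: 0.1234`, which may not match what your training script prints. `extract_losses` is a hypothetical helper.

```python
import re

# Assumed format: a standalone "loss: <number>" field on each logged line.
# The lookbehind skips longer names such as "aux_loss" or "decode.loss".
LOSS_PATTERN = re.compile(
    r"(?<![\w.])loss:\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def extract_losses(lines):
    """Return the loss values found in an iterable of log lines, in order."""
    losses = []
    for line in lines:
        match = LOSS_PATTERN.search(line)
        if match:
            losses.append(float(match.group(1)))
    return losses
```

Passing an open file works directly, for example `extract_losses(open("output.out"))`, since a file object iterates over its lines.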

# Uploading the Model to a Cloud Environment

After training finishes, the model is stored in the location specified in your config file: `/p/project/training2411/<username>/HDCRS-school-2024/models/<experiment>/best_mIoU_epoch_*.pth`, where `*` is the epoch number of the latest best-performing checkpoint. You will push this model to an S3 bucket using `boto3` and the credentials from the AWS account shared with you.
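
Because the epoch number in the checkpoint name is not known in advance, the file has to be located before it can be uploaded. The following is a minimal sketch, assuming checkpoints follow the `best_mIoU_epoch_*.pth` pattern described above and that the highest epoch number marks the most recent best checkpoint; `latest_best_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_best_checkpoint(model_dir):
    """Return the best_mIoU_epoch_<N>.pth file with the largest N, or None."""
    pattern = re.compile(r"best_mIoU_epoch_(\d+)\.pth$")
    candidates = []
    for path in Path(model_dir).glob("best_mIoU_epoch_*.pth"):
        match = pattern.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare epoch numbers numerically so that epoch 10 ranks above epoch 9.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned `Path` can be converted with `str(...)` wherever an exact file path is needed.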

## Get AWS credentials
Account creation links should have been shared with you. Once the account is set up, you can obtain the credentials required for the upload from the AWS SSO homepage.
Please follow the steps listed below:

1. Navigate to https://nasa-impact.awsapps.com/start
2. Log in
3. Click on `AWS Account`
4. Click on `summerSchool`
5. Click on `Command line or Programmatic access`
6. Copy the `AWS Access Key Id`, `AWS Secret Access Key`, and `AWS session token` from the pop-up
7. Update the following script and run it in a Python shell. (You can start a Python shell by typing `python` in the terminal.)

*Note: Please make sure the virtual environment is active while working with the Python shell.*
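
Pasting credentials directly into a script makes them easy to leak. As an alternative sketch, the values can be exported as environment variables in the terminal and read from there. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` are the standard variable names that boto3 also reads by default, while `load_credentials` is a hypothetical helper introduced here for illustration.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def load_credentials(env=None):
    """Return the three credential values from env, or raise listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned dictionary holds the three values that `boto3.session.Session` accepts as `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token`.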

In [None]:
import boto3 
import os

# Paste the values copied from the SSO login page as quoted strings.
AWS_ACCESS_KEY_ID = '<Copied over from SSO login>'
AWS_SECRET_ACCESS_KEY = '<Copied over from SSO login>'
AWS_SESSION_TOKEN = '<Copied over from SSO login>'

BUCKET_NAME = 'workshop-'

USER = os.environ.get('USER')

def generate_federated_session():
    """
    Create a boto3 session from the temporary SSO credentials, used to
    upload files from the HPC system to the S3 bucket.
    Returns:
        boto3.session.Session configured with the credentials above.
    """
    return boto3.session.Session(
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            aws_session_token=AWS_SESSION_TOKEN
        )

# Replace <username>, <project>, <experiment>, and * (the epoch number) with real values;
# upload_file needs an exact file path and does not expand wildcards.
model_filename = "/p/project/training2411/<username>/HDCRS-school-2024/models/<project>/<experiment>/best_mIoU_epoch_*.pth"
config_filename = "/p/project/training2411/<username>/hls-foundation-os/configs/multi_temporal_crop_classification.py"  # replace this with the path of the config file used for fine-tuning

model_basename = os.path.basename(model_filename)
config_basename = os.path.basename(config_filename)

session = generate_federated_session()
s3_connector = session.client('s3')
# The basenames already include their file extensions, so none is appended here.
s3_connector.upload_file(model_filename, BUCKET_NAME, f"models/{USER}_{model_basename}")
s3_connector.upload_file(config_filename, BUCKET_NAME, f"configs/{USER}_{config_basename}")