# Data Driven AI for Remote Sensing Hackathon - Sample Notebook for experimentation


### Install the required packages (skip if already installed)

In [10]:
!pip install -r requirements.txt


Collecting git+https://github.com/IBM/terratorch.git (from -r requirements.txt (line 17))
  Cloning https://github.com/IBM/terratorch.git to /tmp/pip-req-build-f1z45yi3
  Running command git clone --filter=blob:none --quiet https://github.com/IBM/terratorch.git /tmp/pip-req-build-f1z45yi3
  Resolved https://github.com/IBM/terratorch.git to commit 2683140e3862954f62212df3417208692c6b879a
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone


### Install necessary libraries (skip if already installed)

In [11]:
!sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6  -y

Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease                      
Hit:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libsm6 is already the newest version (2:1.2.3-1build2).
libxext6 is already the newest version (2:1.3.4-1build1).
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.


### Test the pytorch GPU setup

In [12]:
import torch
import time

def test_pytorch_gpu():
    print(f"PyTorch version: {torch.__version__}")
    
    # Check if CUDA is available
    if torch.cuda.is_available():
        print("CUDA is available. GPU can be used.")
        device = torch.device("cuda")
        print(f"Current CUDA device: {torch.cuda.current_device()}")
        print(f"GPU name: {torch.cuda.get_device_name(0)}")
    else:
        print("CUDA is not available. GPU cannot be used.")
        return
    
    # Create a large tensor on GPU
    size = 5000
    x = torch.randn(size, size, device=device)
    y = torch.randn(size, size, device=device)
    
    # Perform matrix multiplication
    start_time = time.time()
    result = torch.matmul(x, y)
    end_time = time.time()
    
    print(f"Matrix multiplication of {size}x{size} tensors took {end_time - start_time:.4f} seconds")
    
    # Verify the result
    print(f"Result shape: {result.shape}")
    print(f"Result sum: {result.sum().item():.4f}")
    
    # Move result back to CPU for further processing if needed
    result_cpu = result.cpu()
    print(f"Result successfully moved back to CPU. Shape: {result_cpu.shape}")


test_pytorch_gpu()

PyTorch version: 2.3.1.post300
CUDA is available. GPU can be used.
Current CUDA device: 0
GPU name: Tesla T4
Matrix multiplication of 5000x5000 tensors took 0.0003 seconds
Result shape: torch.Size([5000, 5000])
Result sum: -232564.9375
Result successfully moved back to CPU. Shape: torch.Size([5000, 5000])


### Create directories needed for data, model, and config preparations

In [13]:
!mkdir datasets
!mkdir models
!mkdir configs

mkdir: cannot create directory ‘datasets’: File exists
mkdir: cannot create directory ‘models’: File exists
mkdir: cannot create directory ‘configs’: File exists


### install git-lfs and clone the datasets

In [14]:
! sudo apt-get install git-lfs; git lfs install

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
Updated git hooks.
Git LFS initialized.


In [15]:
### Clone the dataset. Should use this dataset for training.

In [16]:
! cd datasets; git clone https://huggingface.co/datasets/Muthukumaran/fire_scars_hackathon_dataset

fatal: destination path 'fire_scars_hackathon_dataset' already exists and is not an empty directory.


### Unzip the dataset and move it to the datasets directory. Takes a while to download and unzip.

In [17]:
! cd datasets; ! tar -xvzf fire_scars_hackathon_dataset/fire_scars_train_val.tar.gz

fire_scars_train_val/
fire_scars_train_val/train/
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2018280.v1.4.mask.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2018280.v1.4_merged.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2019305.v1.4.mask.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2019305.v1.4_merged.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2020190.v1.4.mask.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2020190.v1.4_merged.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2020285.v1.4.mask.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEH.2020285.v1.4_merged.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEJ.2018185.v1.4.mask.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEJ.2018185.v1.4_merged.tif
fire_scars_train_val/train/subsetted_512x512_HLS.S30.T10SEJ.2018220.v1.4.mask.tif
fire_scars_train_val/train/subsetted_5

### Modify the model config file. 

# **Note:** You SHOULD change the config file to play with the training parameters. Also, change the paths within `< >` to the correct paths.

## Run the training using terratorch.
#### This will take a while to complete.
#### The training logs will be saved to the EFS mount point.

In [None]:
!terratorch fit --config logs/fire_scars/version_9/config.yaml 

2024-10-24 04:25:08.447146: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-24 04:25:08.523442: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-24 04:25:08.546209: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-24 04:25:08.553565: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-24 04:25:08.576318: I tensorflow/core/platform/cpu_feature_guar

In [None]:
#iou score

In [34]:
import wandb
api = wandb.Api()

sweep = api.sweep("intern_test/rsds-hackathon-24/7b4a6qni")
runs = sorted(sweep.runs,
  key=lambda run: run.summary.get("val_acc", 0), reverse=True)
val_acc = runs[0].summary.get("val_acc", 0)
print(f"Best run {runs[0].name} with {val_acc}% validation accuracy")

runs[0].file("model.h5").download(replace=True)
print("Best model saved to model-best.h5")

CommError: Could not find sweep <Sweep intern_test/rsds-hackathon-24/7b4a6qni (Unknown State)>

In [32]:
import torch
import torchmetrics
import wandb
import lightning.pytorch as pl

class YourModel(pl.LightningModule):
    def _init_(self):
        super()._init_()
        self.model = rsds-hackathon-24/logs/fire_scars/version_17/config.yaml  # Define your model architecture here

        # Use torchmetrics to calculate IoU for 2 classes (binary segmentation)
        self.train_iou = torchmetrics.JaccardIndex(num_classes=2, ignore_index=None)  # Remove ignore_index for proper IoU calculation
        self.val_iou = torchmetrics.JaccardIndex(num_classes=2, ignore_index=None)

    def forward(self, x):
        return self.model(x)  # Ensure this returns model predictions (e.g., logits or sigmoid output)

    def training_step(self, batch, batch_idx):
        images, masks = batch
        preds = self(images)
        
        # IoU calculation for training
        iou_value = self.train_iou(preds, masks)

        # Log IoU to wandb
        self.log("train/iou", iou_value, on_step=True, on_epoch=True, prog_bar=True, logger=True)

        return {"train_iou": iou_value}

    def validation_step(self, batch, batch_idx):
        images, masks = batch
        preds = self(images)
        
        # IoU calculation for validation
        iou_value = self.val_iou(preds, masks)

        # Log IoU to wandb
        self.log("val/iou", iou_value, on_step=False, on_epoch=True, prog_bar=True, logger=True)

        return {"val_iou": iou_value}
    
    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=0.0003)
        lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=5)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": lr_scheduler,
                "monitor": "val/iou"
            }
        }

# wandb initialization
wandb.init(project="fire_scars")

# Instantiate the model
model = YourModel()

# Instantiate the trainer
trainer = pl.Trainer(
    accelerator="auto",
    precision=16,
    logger=pl.loggers.WandbLogger(project="fire_scars"),
    max_epochs=50
)


/opt/conda/lib/python3.11/site-packages/lightning/fabric/connector.py:571: `precision=16` is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead!
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


In [38]:
def calculate_iou(y_true, y_pred):
    intersection = np.logical_and(y_true, y_pred)
    union = np.logical_or(y_true, y_pred)
    iou_score = np.sum(intersection) / np.sum(union)
    return iou_score

In [36]:
# Load your dataset with additional required arguments
data_module = GenericNonGeoSegmentationDataModule(
    train_data_root='rsds-hackathon-24/datasets/fire_scars_train_val/train/',
    val_data_root='rsds-hackathon-24/datasets/fire_scars_train_val/validation/',
    test_data_root='rsds-hackathon-24/datasets/fire_scars_train_val/test/',  # Specify test data root
    img_grep="*.tif",  # Assuming your images are .tif files
    label_grep="*.mask.tif",  # Assuming your label masks are .mask.tif files
    means=[0.485, 0.456, 0.406],  # Example ImageNet means
    stds=[0.229, 0.224, 0.225],  # Example ImageNet stds
    num_classes=2,  # Assuming a binary segmentation task (fire/no fire)
    batch_size=4,
    num_workers=4
)

data_module.prepare_data()

# Use 'fit' to set up both training and validation data
data_module.setup(stage='fit')

# Alternatively, if you only want to set up validation or test, use:
# data_module.setup(stage='validate')
# data_module.setup(stage='test')

def display_ground_truth_and_predictions(model, data_module):
    model.eval()  # Set the model to evaluation mode

    with torch.no_grad():
        for batch in data_module.val_dataloader():
            images, labels = batch['image'], batch['label']
            images = images.to(device)  # Send to GPU if available
            labels = labels.to(device)
            
            # Get model predictions
            predictions = model(images)
            predicted_masks = torch.argmax(predictions, dim=1)  # Assuming softmax output

            # Calculate IoU
            iou = FM.iou(predicted_masks, labels, num_classes=2)

            # Display results
            fig, axs = plt.subplots(3, 4, figsize=(15, 10))
            for i in range(min(4, len(images))):  # Display up to 4 images
                # Original image
                axs[0, i].imshow(images[i].cpu().permute(1, 2, 0).numpy())
                axs[0, i].set_title('Original Image')
                axs[0, i].axis('off')

                # Ground truth
                axs[1, i].imshow(labels[i].cpu().numpy(), cmap='gray')
                axs[1, i].set_title('Ground Truth Mask')
                axs[1, i].axis('off')

                # Predicted mask
                axs[2, i].imshow(predicted_masks[i].cpu().numpy(), cmap='gray')
                axs[2, i].set_title(f'Predicted Mask (IoU: {iou[i]:.2f})')
                axs[2, i].axis('off')

            plt.tight_layout()
            plt.show()
            break  # Remove this if you want to process all validation data

# Assuming 'model' is your trained model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Display ground truth and predictions
display_ground_truth_and_predictions(model, data_module)


MisconfigurationException: GenericNonGeoSegmentationDataModule.val_dataset has length 0.