# Demo Notebook: Flood Detection with Sen1Floods11 Dataset on SharingHub

## Objective

This notebook demonstrates how to organize, train, and use a deep learning model for image segmentation, applied to flood detection on the [**Sen1Floods11 dataset**](https://github.com/cloudtostreet/Sen1Floods11?tab=readme-ov-file), using the [**SharingHub**](https://sharinghub.p2.csgroup.space/#/) platform.

### Key Features of this Demo:
1. **Dataset Management**: Retrieve and version datasets stored on SharingHub, using DVC and GitLab integration.
2. **Experiment Tracking**: Configure and use MLflow for tracking training experiments, directly linked to the GitLab repository.
3. **Model Training**: Train the flood detection model with data streamed from SharingHub’s STAC API or its DVC remote.
4. **Model Inference**: Use the trained model to perform segmentation on unseen data, predicting flooded areas.
5. **Automation**: Streamline operations with Docker containers and CWL workflows for reproducibility.

---

## Workflow Overview

### 1. **Training the Model**
- Store the model in ONNX format.
- Compute metrics to evaluate the model's performance.
- Track the training progress using MLflow, with runs linked to the model’s GitLab repository.
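The metrics computed by `src/train.py` are not shown in this notebook. As an illustration only, intersection over union (IoU) is a common metric for segmentation, and the sketch below computes it for the water class in plain Python. It assumes labels use `1` for water, `0` for non-water, and `-1` for pixels without ground truth; the function name `water_iou` and the `NO_DATA` constant are hypothetical and not taken from the repository.

```python
from typing import Sequence

NO_DATA = -1  # assumed "no data" label; adjust to match your label files


def water_iou(pred: Sequence[Sequence[int]], label: Sequence[Sequence[int]]) -> float:
    """Intersection over union for the water class (1), skipping NO_DATA pixels."""
    intersection = 0
    union = 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            if t == NO_DATA:
                continue  # pixel has no ground truth, so it cannot be scored
            if p == 1 and t == 1:
                intersection += 1
            if p == 1 or t == 1:
                union += 1
    # If neither mask contains water, treat the prediction as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
label = [[1, 0, 0],
         [-1, 1, 1]]
print(water_iou(pred, label))  # 0.5
```

In the example, two pixels are water in both masks and four are water in at least one, so the score is 2 / 4.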

### 2. **Inference**
- Perform segmentation on Sentinel-1 images to detect flooded areas.
- Visualize results to evaluate model performance.
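Before plotting a prediction, a quick text preview can confirm that the mask is not empty or fully flooded. The sketch below is a minimal illustration, assuming the prediction is available as a 2D list where `1` marks water; both helper names are hypothetical and are not part of `src/inference.py`.

```python
def flooded_fraction(mask):
    """Share of pixels predicted as water (value 1) in a 2D mask."""
    total = sum(len(row) for row in mask)
    water = sum(value == 1 for row in mask for value in row)
    return water / total if total else 0.0


def ascii_preview(mask):
    """Render a 2D mask as text: '#' for water, '.' for everything else."""
    return "\n".join("".join("#" if v == 1 else "." for v in row) for row in mask)


mask = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
print(ascii_preview(mask))
print(flooded_fraction(mask))  # 0.375
```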

### 3. **Reproducibility**
- Use Docker containers and CWL workflows for consistent environment setup and execution.

---

# Prerequisites

Before starting the demonstration, follow the tutorial in the [README](../README.md) to make sure your environment and your environment variables are correctly configured.

# 1. **Training the model**

In [None]:
!poetry run python3 src/train.py

# 2. **Inference**

In [None]:
!poetry run python3 src/inference.py checkpoints/Sen1Floods11_0_0.5194225907325745.onnx inference/India_80221_S1Hand.tif

# 3. **Reproducibility**
## **Docker**

### Create the Docker image for training from `Dockerfile.train`

Add your credentials to a `.env` file (passed to the container with `--env-file` when it runs), then build the image:
- `MLFLOW_TRACKING_TOKEN=`
- `LOGNAME=`
- `ACCESS_KEY_ID=`
- `SECRET_ACCESS_KEY=`
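
A missing or empty credential usually surfaces only once the container is running. The sketch below checks `.env`-style text for the four variables listed above before any image is started. It is a simplified parser written for illustration: the helper name `missing_env_keys` is hypothetical, and lines without `=` are skipped rather than resolved from the host environment.

```python
REQUIRED_KEYS = ("MLFLOW_TRACKING_TOKEN", "LOGNAME", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY")


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]


# Placeholder values only; never commit real credentials.
sample = """
# credentials for the training container
MLFLOW_TRACKING_TOKEN=example-token
LOGNAME=
ACCESS_KEY_ID=example-key
"""
print(missing_env_keys(sample))  # ['LOGNAME', 'SECRET_ACCESS_KEY']
```

To check a real file, pass `pathlib.Path(".env").read_text()` as `env_text`.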


In [None]:
!docker build -f Dockerfile.train -t train .

### Streaming Mode in `train`

The *streaming* mode in `train` downloads data on the fly, image by image, during training, and stores each downloaded image in the cache. This removes the need to download the **entire** dataset up front, saving both time and storage space. Data is loaded progressively from remote storage via DVC, which also reduces memory usage.


### No Cache Mode in `train`

If *no cache* mode is enabled in `train` together with *streaming* mode, data is still downloaded on the fly but is not saved in the local cache. This mode suits large datasets when local disk space is limited.
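
The difference between the two modes can be sketched with a small lazy dataset. This is an illustration under stated assumptions, not the implementation used by `train`: the `StreamingDataset` class and the `fake_fetch` function are hypothetical, and `fake_fetch` stands in for a download from the DVC remote.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


class StreamingDataset:
    """Fetch samples lazily, optionally keeping a local copy of each one."""

    def __init__(self, keys: list[str], fetch: Callable[[str], bytes],
                 cache_dir: Optional[Path] = None):
        self.keys = keys
        self.fetch = fetch          # hypothetical stand-in for a remote download
        self.cache_dir = cache_dir  # None means "no cache" mode

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> bytes:
        key = self.keys[index]
        if self.cache_dir is None:
            return self.fetch(key)  # download every time, store nothing
        cached = self.cache_dir / key
        if not cached.exists():
            cached.write_bytes(self.fetch(key))  # first access: download and store
        return cached.read_bytes()


downloads = []


def fake_fetch(key: str) -> bytes:
    """Pretend to download one image and record that a download happened."""
    downloads.append(key)
    return f"pixels of {key}".encode()


keys = ["a.tif", "b.tif"]

# Streaming with cache: the second read of the same image hits the local copy.
with tempfile.TemporaryDirectory() as tmp:
    cached_ds = StreamingDataset(keys, fake_fetch, cache_dir=Path(tmp))
    cached_ds[0]
    cached_ds[0]
    cached_downloads = len(downloads)

# Streaming without cache: every read triggers a new download.
downloads.clear()
uncached_ds = StreamingDataset(keys, fake_fetch)
uncached_ds[0]
uncached_ds[0]
uncached_downloads = len(downloads)

print(cached_downloads, uncached_downloads)  # 1 2
```

The trade-off is visible in the counts: caching costs disk space but avoids repeated downloads across epochs, while no-cache mode keeps disk usage flat at the price of re-downloading each image every time it is read.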

### Run the Docker image for training

In [None]:
!docker run -it --env-file .env train bash

### Create the Docker image for inference from `Dockerfile.inf`

In [None]:
!docker build -f Dockerfile.inf -t inference .

### Run the Docker image for inference

In [None]:
!docker run inference checkpoints/Sen1Floods11_0_0.5194225907325745.onnx inference/India_80221_S1Hand.tif && ls predictions/

## **CWL**

### Run the Docker image using CWL, with custom parameters saved in `run_inference_input.yml`

In [None]:
!cwltool run_inference.cwl run_inference_input.yml 