# Demo Notebook: Flood Detection with the Sen1Floods11 Dataset on SharingHub

## Objective

This notebook demonstrates how to organize, train, and use a deep learning model for image segmentation, applied to flood detection with the [**Sen1Floods11 dataset**](https://github.com/cloudtostreet/Sen1Floods11?tab=readme-ov-file) on the [**SharingHub**](https://sharinghub.develop.eoepca.org) platform.

### Key Features of this Demo:
1. **Dataset Management**: Retrieve datasets stored on SharingHub, using DVC and GitLab integration.
2. **Experiment Tracking**: Configure and use MLflow for tracking training experiments, directly linked to the GitLab repository.
3. **Model Training**: Train the flood detection model with data downloaded from its DVC remote.
4. **Model Inference**: Use the trained model to perform segmentation on unseen data, predicting flooded areas.
5. **Automation**: Streamline operations with Docker containers and CWL workflows for reproducibility.

---

## Workflow Overview

### 1. **Sen1Floods11 Dataset**
- Dataset presentation.
- Dataset access.

### 2. **Training the Model**
- Process data.
- Store the model in ONNX format.
- Compute metrics to evaluate the model's performance.
- Track these metrics in the MLflow UI.

### 3. **Inference**
- Perform segmentation on Sentinel-1 images to detect flooded areas.
- Visualize results to evaluate model performance.

### 4. **Reproducibility**
- Use Docker containers and CWL workflows for consistent environment setup and execution.

---

### **Prerequisite**

Install the project dependencies with Poetry.

In [None]:
!poetry install --no-root

# 1. **Sen1Floods11 Dataset**

### **Presentation**

Sen1Floods11: a georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1 (Example). This data was generated by Cloud to Street, a Public Benefit Corporation: https://www.cloudtostreet.info/. For questions about this dataset or code, please email support@cloudtostreet.info. Please cite this data as:

Bonafilia, D., Tellman, B., Anderson, T., Issenberg, E. 2020. Sen1Floods11: a georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 210-211.

Available open access at: http://openaccess.thecvf.com/content_CVPRW_2020/html/w11/Bonafilia_Sen1Floods11_A_Georeferenced_Dataset_to_Train_and_Test_Deep_Learning_CVPRW_2020_paper.html

### **Dataset Access**
Set up the credentials for the `sharinghub` DVC remote. Replace the placeholder with your access token or your personal GitLab token.

In [None]:
!cd sen1floods11-dataset/ && dvc remote modify --local sharinghub access_key_id "<your_access_token>" && dvc remote modify --local sharinghub secret_access_key none

Then pull the data. With the `dvc[s3]` extra installed, DVC can fetch data stored in an S3 bucket.

In [None]:
!cd sen1floods11-dataset/ && dvc pull
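
After the pull, a quick look at what landed on disk can confirm that the download worked. The helper below is a minimal sketch that only walks the directory tree and counts files by extension; `summarize_files` is a hypothetical name, it does not use any DVC API, and the file types to expect are not stated in this notebook.

```python
from collections import Counter
from pathlib import Path


def summarize_files(root):
    """Count files under `root` by lower-cased extension, skipping hidden entries."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        relative_parts = path.relative_to(root).parts
        # Skip hidden files and anything inside hidden directories such as .dvc or .git.
        if any(part.startswith(".") for part in relative_parts):
            continue
        if path.is_file():
            counts[path.suffix.lower() or "<none>"] += 1
    return dict(counts)
```

Calling `summarize_files("sen1floods11-dataset")` in a cell returns a dictionary mapping each extension to a file count; an empty dictionary suggests the pull did not fetch anything.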

# 2. **Training the Model**
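Set up credentials for _MLflow_, replacing the placeholders with your access token (or your personal GitLab token) and your username. `%env` is used instead of `!export` because each `!` command runs in its own subshell, so an exported variable would not be visible to later cells.

In [None]:
%env MLFLOW_TRACKING_TOKEN=<your_access_token>
%env LOGNAME=<username>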

Run the training session.

In [None]:
!poetry run python3 src/train.py
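
This notebook does not say which metrics `src/train.py` logs. Intersection over union (IoU) is a common choice for segmentation, so the sketch below shows how it can be computed for the water class. The function name `binary_iou` and the use of `-1` as a "no data" label to ignore are assumptions made for illustration, not details taken from the training script.

```python
def binary_iou(pred, target, ignore_value=-1):
    """IoU of the positive class for flat sequences of 0/1 labels.

    Pixels whose target equals `ignore_value` are skipped (assumed "no data").
    """
    intersection = 0
    union = 0
    for p, t in zip(pred, target):
        if t == ignore_value:
            continue
        intersection += int(p == 1 and t == 1)
        union += int(p == 1 or t == 1)
    # Convention chosen here: no water in either mask counts as a perfect match.
    return intersection / union if union else 1.0


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, -1, 0]
print(binary_iou(pred, target))  # 1 overlapping pixel out of 3 in the union
```

The fifth pixel is excluded because its target is `-1`, which leaves one true positive, one false positive, and one false negative.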

# 3. **Inference**

In [None]:
!poetry run python3 src/inference.py checkpoints/Sen1Floods11_0_0.5194225907325745.onnx inference/India_80221_S1Hand.tif
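
What `src/inference.py` does internally is not shown here. As a rough illustration of the last step of a segmentation pipeline, the sketch below turns per-class scores into a flood mask by taking the highest-scoring class at each pixel. The two-class layout (class 0 = not flooded, class 1 = flooded) and both function names are assumptions.

```python
def scores_to_mask(scores):
    """scores[c][y][x] -> mask[y][x] with the index of the highest-scoring class."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def flooded_fraction(mask, flood_class=1):
    """Share of pixels assigned to `flood_class`."""
    pixels = [value for row in mask for value in row]
    return sum(value == flood_class for value in pixels) / len(pixels)


scores = [
    [[2.0, 0.1], [0.3, 1.5]],  # class 0 (assumed: not flooded)
    [[0.5, 1.2], [0.9, 0.2]],  # class 1 (assumed: flooded)
]
mask = scores_to_mask(scores)
print(mask)                    # [[0, 1], [1, 0]]
print(flooded_fraction(mask))  # 0.5
```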

# 4. **Reproducibility**
## **Docker**

### Create the Docker image for training from `Dockerfile.train`

Add your credentials to a `.env` file, which is passed to the container at run time:
- `MLFLOW_TRACKING_TOKEN=`
- `LOGNAME=`
- `ACCESS_KEY_ID=`
- `SECRET_ACCESS_KEY=`
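
A missing or empty entry in `.env` tends to surface only later, as an authentication error inside the container. The sketch below checks the four keys listed above before the container is started. `missing_env_keys` is a hypothetical helper, and it assumes the plain `KEY=value` line format with `#` comments.

```python
REQUIRED_KEYS = ("MLFLOW_TRACKING_TOKEN", "LOGNAME", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY")


def missing_env_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style `text`."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]


example = "MLFLOW_TRACKING_TOKEN=abc\nLOGNAME=alice\nACCESS_KEY_ID=abc\nSECRET_ACCESS_KEY=\n"
print(missing_env_keys(example))  # ['SECRET_ACCESS_KEY']
```

For a real file, pass its contents, for example `missing_env_keys(open(".env").read())`; an empty list means all four keys have a value.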


In [None]:
!docker build -f Dockerfile.train -t train .

### Streaming Mode in `train`

The *streaming* mode in `train` downloads data on the fly, image by image, during training, and stores each downloaded image in the cache. There is no need to download the **entire** dataset up front, which saves both time and storage space. Data is loaded progressively from remote storage via DVC, which reduces memory usage.


### No Cache Mode in `train`

When *no cache* mode is enabled together with *streaming* mode, data is still downloaded on the fly but is not saved to the local cache. This suits datasets too large to keep on local storage.
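
The difference between the two modes can be sketched with a small stand-in. This is not the code used by `train`: `StreamingStore` and `fake_fetch` are hypothetical names, and the "remote" is simulated by a function that records each download.

```python
import tempfile
from pathlib import Path


class StreamingStore:
    """Fetch items on first access; optionally keep a local copy for later accesses."""

    def __init__(self, fetch, cache_dir=None):
        self.fetch = fetch  # callable: name -> bytes, stands in for the remote
        self.cache_dir = Path(cache_dir) if cache_dir else None  # None = no-cache mode

    def get(self, name):
        if self.cache_dir is not None:
            cached = self.cache_dir / name
            if cached.exists():
                return cached.read_bytes()  # cache hit: nothing is downloaded
        data = self.fetch(name)  # download on the fly
        if self.cache_dir is not None:
            self.cache_dir.mkdir(parents=True, exist_ok=True)
            (self.cache_dir / name).write_bytes(data)
        return data


downloads = []


def fake_fetch(name):
    """Simulated remote read that records every download."""
    downloads.append(name)
    return name.encode()


# Streaming with cache: the second access is served from disk.
with tempfile.TemporaryDirectory() as tmp:
    store = StreamingStore(fake_fetch, cache_dir=tmp)
    store.get("tile_a.tif")
    store.get("tile_a.tif")
    cached_downloads = len(downloads)

# Streaming without cache: every access downloads again.
store = StreamingStore(fake_fetch)
store.get("tile_a.tif")
store.get("tile_a.tif")
no_cache_downloads = len(downloads) - cached_downloads

print(cached_downloads, no_cache_downloads)  # 1 2
```

The trade-off is visible in the counts: caching spends disk space to avoid repeated downloads, while no-cache mode keeps the disk empty at the cost of fetching each image every time it is read.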

In [None]:
!docker run -it --env-file .env train bash

### Create the Docker image for inference from `Dockerfile.inf`

In [None]:
!docker build -f Dockerfile.inf -t inference .

### Run the Docker image for inference

In [None]:
!docker run inference checkpoints/Sen1Floods11_0_0.5194225907325745.onnx inference/India_80221_S1Hand.tif && ls predictions/

## **CWL**

### Run the Docker image with CWL, using the custom parameters saved in `run_inference_input.yml`

In [None]:
!cwltool run_inference.cwl run_inference_input.yml 