# Segmentation
- [Balthazar Neveu](https://www.linkedin.com/in/balthazarneveu/)
- TP-5 [Introduction to geosciences](https://www.master-mva.com/cours/introduction-a-lapprentissage-statistique-pour-les-geosciences/) | ENS Paris Saclay - [Master MVA](https://www.master-mva.com/) 2024
- [Web version](https://balthazarneveu.github.io/geosciences) | [Github](https://github.com/balthazarneveu/geoscience)


## Dataset exploration
Pairs of patches and annotations (areas to segment). Patches are 36x36 gray-level images.

- Train set: 7211 patches from 12 wells.
- Validation set: 2463 patches from 3 wells (mostly from well 13)

__Observations__
- At first sight, the regions we're trying to segment look like thin dark lines.
- Sometimes the positive areas are split across the left and right edges of the image (due to the circular nature of the well images).
- Two images containing NaN values, `validation/images/well_15_patch_201.npy` and `well_15_patch_202.npy`, are discarded by the dataloader (sanity check performed before loading the images).
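
A minimal sketch of such a sanity check, assuming each patch is stored as an individual `.npy` file in a directory. `find_valid_patches` is a hypothetical helper name; [data_loader.py](data_loader.py) may implement the check differently.

```python
import glob
import os

import numpy as np


def find_valid_patches(image_dir: str) -> list[str]:
    """Return the sorted .npy patch paths that contain no NaN values."""
    valid = []
    for path in sorted(glob.glob(os.path.join(image_dir, "*.npy"))):
        patch = np.load(path)
        if np.isnan(patch).any():
            # Corrupted patch (e.g. well_15_patch_201.npy): skip it.
            continue
        valid.append(path)
    return valid
```
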

![](figures/dataset_samples.png)

## Dataloader
[data_loader.py](data_loader.py) loads (image, label) pairs.
A list of augmentations is provided in [augmentations.py](augmentations.py):
- Horizontal roll (since the pipelines are circular, the images wrap around horizontally).
- Random vertical / horizontal flips.

Random augmentations (and shuffling) are only applied to the training set; the validation set is frozen.
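
A NumPy sketch of these augmentations, assuming 2D `(H, W)` arrays. The key point is that the image and its label receive exactly the same transform. The function name and the 0.5 flip probability are assumptions, not necessarily what [augmentations.py](augmentations.py) does.

```python
import numpy as np


def augment(image: np.ndarray, label: np.ndarray, rng: np.random.Generator):
    """Apply the same random roll / flips to a (H, W) patch and its label."""
    # Horizontal roll: the well image wraps around, so shifting columns
    # circularly produces another plausible patch.
    shift = int(rng.integers(0, image.shape[1]))
    image = np.roll(image, shift, axis=1)
    label = np.roll(label, shift, axis=1)
    # Random horizontal flip (mirror the columns).
    if rng.random() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]
    # Random vertical flip (mirror the rows).
    if rng.random() < 0.5:
        image, label = image[::-1, :], label[::-1, :]
    # copy() returns contiguous arrays: torch.from_numpy rejects the
    # negative strides produced by [::-1].
    return image.copy(), label.copy()
```
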


## Architectures
Three families of models are coded in [model.py](model.py).
All models are pretty flexible. 
What can be changed:
- convolution sizes
- number of layers
- number of channels (hidden dimensions)
- activation function (ReLU and LeakyReLU were tested)
- number of input channels (ready to apply to other modalities)
- number of output channels (ready for multi-class segmentation).

Since all models inherit from `BaseModel`, the number of parameters and the receptive field can easily be retrieved.
Please note that none of the models include the final sigmoid: they output logits, and the loss function (or further inference code) is in charge of applying the sigmoid to convert them into probabilities.
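
For a chain of convolutions, the receptive field along one axis follows a simple recursion: $r = 1 + \sum_i (k_i - 1) \prod_{j<i} s_j$, with kernel sizes $k_i$ and strides $s_j$. A minimal sketch (the `receptive_field` helper is hypothetical, not the `BaseModel` API), assuming 5 stride-1 layers per single-scale model, reproduces the single-scale values of the table below:

```python
def receptive_field(layers):
    """Receptive field along one axis for a chain of (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance between neighbouring outputs, in input pixels
    return rf


vanilla = [(3, 1)] * 5   # 5 layers of 3x3 convolutions
stack_h = [(3, 1)] * 5   # horizontal part of the separable stack
stack_v = [(5, 1)] * 5   # vertical part of the separable stack
print(receptive_field(vanilla), receptive_field(stack_h), receptive_field(stack_v))
# → 11 11 21
```
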


| Model name | Number of parameters  | Convolution sizes | Number of layers | Activation | Receptive field (H, V) |
| :---:| :---:| :---:|  :---:|  :---:| :---:|
| Vanilla stacked convolutions | 260k | $(3,3)$ | 5 | ReLU | $(11,11)$ |
| Stacked convolution | 1.904M | $3^{\circ} \perp 5$ | 5 | LeakyReLU | $(11,21)$ |
| U-Net | 775k | $(3,3)$ | 3 scales | LeakyReLU | $(27, 27)$ |


#### Stacked convolution (single scale)
- Flexible design with a configurable number of layers.
- The base convolution block is a direction-wise separable convolution ($H \perp V$):
  - The horizontal convolution uses "circular" padding, which deals with the wrap-around nature of well images.
  - The vertical convolution pads by repeating the border gray levels (replicate padding).
  - Hence the notation $3^{\circ} \perp 5$: a horizontal convolution of size 3 with circular padding ($^{\circ}$) combined with a vertical convolution of size 5.
- The input modality convolution block goes from 1 to `h_dim` channels.
- The output modality convolution block goes from `h_dim` channels back to a single channel.
- The last layer outputs an image of the same size as the input. Since we initially use the BCE loss with logits, the network outputs are logits (*not probabilities*): the sigmoid is not included.
- Residual connections can optionally be used when the number of layers is even.
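
A single-channel NumPy sketch of one separable block, for illustration only: the real model uses learned multi-channel convolutions, and the kernel values below are arbitrary. (In PyTorch, the same padding behaviours are available through the `padding_mode="circular"` / `"replicate"` options of `nn.Conv2d`.) Thanks to the wrap-around padding, the block commutes with a horizontal roll of the input, which is consistent with the shift experiment shown in the Visualization section.

```python
import numpy as np


def conv_h_circular(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1D horizontal correlation with circular (wrap-around) padding."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="wrap")
    width = x.shape[1]
    return sum(kernel[i] * xp[:, i:i + width] for i in range(len(kernel)))


def conv_v_replicate(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1D vertical correlation, padding by repeating the border gray levels."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    height = x.shape[0]
    return sum(kernel[i] * xp[i:i + height, :] for i in range(len(kernel)))


def separable_block(x, k_h, k_v):
    """Horizontal (circular, size 3) then vertical (replicate, size 5) pass."""
    return conv_v_replicate(conv_h_circular(x, k_h), k_v)


rng = np.random.default_rng(0)
x = rng.standard_normal((36, 36))
k_h, k_v = np.array([0.25, 0.5, 0.25]), np.ones(5) / 5
y = separable_block(x, k_h, k_v)
y_shifted = separable_block(np.roll(x, 7, axis=1), k_h, k_v)
print(y.shape, np.allclose(y_shifted, np.roll(y, 7, axis=1)))  # → (36, 36) True
```
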

#### Vanilla convolution stack (single scale)

*Remark: I coded the stacked convolution before I re-discovered the slide on the proposed vanilla model ("baseline" model from the slides).*

Provided as a "baseline" model; uses ReLU.
![](figures/vanilla_convolution.png)


#### UNet
- 3 scales.
  - $(36, 36) \rightarrow (18, 18) \rightarrow (9, 9)$
  - Downsample by decimation (keep 1 pixel out of 4, i.e. every other row and column).
  - Upsample with bilinear interpolation.
  - Concatenate skip connections with the upsampled features.
- Large receptive field $(27, 27)$.
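
The tensor shapes through the three scales can be traced with a NumPy sketch. Assumptions: the learned convolutions at each scale are omitted, channel counts are arbitrary, and the bilinear grid uses `align_corners`-style sampling, which may differ from the project's upsampling layer.

```python
import numpy as np


def decimate(x: np.ndarray) -> np.ndarray:
    """Keep 1 pixel out of 4: every other row and every other column."""
    return x[..., ::2, ::2]


def upsample_bilinear(x: np.ndarray) -> np.ndarray:
    """x2 bilinear upsampling of (C, H, W) features, done separably with np.interp."""
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, 2 * h)  # align_corners-style sampling grid
    xs = np.linspace(0, w - 1, 2 * w)
    out = np.empty((c, 2 * h, 2 * w), dtype=float)
    for ch in range(c):
        # Linear interpolation along columns, then along rows (= bilinear).
        rows = np.stack([np.interp(xs, np.arange(w), r) for r in x[ch]])
        out[ch] = np.stack([np.interp(ys, np.arange(h), col) for col in rows.T], axis=1)
    return out


features = np.random.default_rng(0).standard_normal((16, 36, 36))
s1 = decimate(features)                             # (16, 18, 18)
s2 = decimate(s1)                                   # (16, 9, 9)
merged1 = np.concatenate([s1, upsample_bilinear(s2)], axis=0)        # (32, 18, 18)
merged0 = np.concatenate([features, upsample_bilinear(merged1)], 0)  # (48, 36, 36)
print(s1.shape, s2.shape, merged0.shape)  # → (16, 18, 18) (16, 9, 9) (48, 36, 36)
```
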

------
# Training
### Defining experiments
- An experiment is defined by a specific ID (like 300), and its whole configuration is versioned under git in the [experiments.py](experiments.py) file:
    - architecture (model name, number of layers, convolution sizes)
    - augmentations
    - loss
    - hyperparameters
- Tracking is performed using [Weights and Biases](https://wandb.ai/balthazarneveu/geosciences-segmentation/workspace?workspace=user-balthazarneveu)
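
One way to structure such a versioned registry is a plain dictionary keyed by experiment ID. This is a hypothetical sketch (names, keys and values are placeholders), not the actual layout of [experiments.py](experiments.py); the point is that a given ID always maps to the same configuration, which git then versions.

```python
import copy
from typing import Any

# Hypothetical registry: the IDs and values below are placeholders for
# illustration, not the actual contents of experiments.py.
EXPERIMENTS: dict[int, dict[str, Any]] = {
    300: {
        "architecture": {"name": "stacked_convolution", "n_layers": 5, "kernel_sizes": (3, 5)},
        "augmentations": ["roll", "flip"],
        "loss": {"name": "bce_with_logits", "pos_weight": 1.0},
        "hyperparameters": {"lr": 1e-3, "batch_size": 32, "epochs": 50},
    },
}


def get_experiment_config(exp_id: int) -> dict[str, Any]:
    """Return a copy of the versioned configuration of one experiment ID."""
    if exp_id not in EXPERIMENTS:
        raise KeyError(f"Unknown experiment ID: {exp_id}")
    # Deep copy so that training code cannot silently modify the registry.
    return copy.deepcopy(EXPERIMENTS[exp_id])
```
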


### Infrastructure
- Training can run locally on a very small NVIDIA T500 GPU with 4GB of memory.
  - `python TP_5/train.py -e 300 301`
  - `-nowb` disables logging to Weights and Biases for quick prototyping.
  - `-e` specifies a list of experiments.
- The same experiments can be trained on a remote server: `python TP_5/remote_training.py -e 300 301 -u kaggle_username -p`
- To train on Kaggle's remote servers (16GB of RAM), I customized a remote training template that I wrote ([MVA-Pepites](https://github.com/balthazarneveu/mva_pepites)). I hosted the [dataset](https://www.kaggle.com/datasets/balthazarneveu/mva-geosciences-segmentation-dataset-slb) on Kaggle.

Several experiments can be trained in a single run.
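
The command-line interface can be sketched with Python's `argparse`. Only the `-e` and `-nowb` flags come from the commands above; the long option names (`--exp`, `--no-wandb`) are assumptions, and `TP_5/train.py` may be organized differently.

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train segmentation experiments")
    # -e accepts one or several experiment IDs, e.g. -e 300 301
    parser.add_argument("-e", "--exp", nargs="+", type=int, required=True)
    # -nowb disables Weights and Biases logging for quick prototyping
    parser.add_argument("-nowb", "--no-wandb", action="store_true")
    return parser


args = get_parser().parse_args(["-e", "300", "301", "-nowb"])
print(args.exp, args.no_wandb)  # → [300, 301] True
```
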

![remote training](https://github.com/balthazarneveu/mva_pepites/blob/main/illustrations/overview.png?raw=true)


# Monitoring
I implemented a set of [metrics](metrics.py) on the validation set:
- Accuracy (not very meaningful: a network returning all zeros already reaches about 89% accuracy).
- Precision and recall. Recall is especially interesting since it measures how well the positive areas are detected.
- Segmentation-specific metrics: Dice coefficient (equivalent to the F1-score), which balances precision and recall.
- IoU (intersection over union).
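
A minimal NumPy sketch of these metrics, computed from logits (the model outputs) with a 0.5 probability threshold. The helper name and the `eps` guard are assumptions; [metrics.py](metrics.py) may aggregate per batch or per image differently. The toy example reproduces the accuracy pitfall: predicting only background yields 89% accuracy but zero recall.

```python
import numpy as np


def segmentation_metrics(logits: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> dict:
    """Pixel-wise metrics for binary segmentation.

    sigmoid(logit) > 0.5 is equivalent to logit > 0, so no sigmoid is needed.
    """
    pred = logits > 0
    gt = target.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    return {
        "accuracy": (tp + tn) / pred.size,
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),  # equals the F1-score
        "iou": tp / (tp + fp + fn + eps),
    }


target = np.zeros((10, 10))
target[0, :] = 1
target[1, 0] = 1  # 11% foreground
m = segmentation_metrics(np.full((10, 10), -1.0), target)  # all-background prediction
print(round(float(m["accuracy"]), 2), float(m["recall"]), float(m["dice"]))  # → 0.89 0.0 0.0
```
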

We train the network using the BCE loss (with logits). The problem is cast as a per-pixel binary classification (background = 0, foreground = 1). Since the background class is over-represented, the positive class can be weighted a bit more.
The [loss.py](loss.py) file shows the possibilities.
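
With a positive-class weight $w$, logit $x$ and label $y$, the per-pixel loss reads $\ell(x, y) = -\left[w\, y \log \sigma(x) + (1 - y) \log(1 - \sigma(x))\right]$, which is what PyTorch's `torch.nn.BCEWithLogitsLoss` computes through its `pos_weight` argument. A NumPy sketch for illustration (the function name is hypothetical). A common heuristic sets $w$ to the ratio of negative to positive pixels, which would be about 8 here given the 89% background; a smaller value ("a bit more") may behave better in practice.

```python
import numpy as np


def weighted_bce_with_logits(logits, target, pos_weight=1.0):
    """Mean binary cross-entropy computed directly from logits.

    -[w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
    """
    x = np.asarray(logits, dtype=float)
    y = np.asarray(target, dtype=float)
    # Numerically stable forms (no overflow for large |x|):
    # log(sigmoid(x)) = -log(1 + exp(-x)), log(1 - sigmoid(x)) = -log(1 + exp(x)).
    log_p = -np.logaddexp(0.0, -x)
    log_not_p = -np.logaddexp(0.0, x)
    return float(np.mean(-(pos_weight * y * log_p + (1.0 - y) * log_not_p)))
```
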

### Visualization
To visualize results, live inference can be performed to compare several models.

|![](figures/interactive_demo_browse.gif)| ![](figures/interactive_demo_compare_models.gif) | ![](figures/interactive_demo_shift.gif) | ![](figures/interactive_demo_noise.gif) |
|:---:|:---:|:---:|:---:|
| Browse between images with the left / right arrow keys | Page Up / Page Down switches between models | A slider shifts the input horizontally | A slider adds a bit of noise. Inference is performed live on the GPU |


`python TP_5/interactive_inference.py -i "TP_5/data/train/images/well_2*.npy" -e 200 402 300 --gui mpl --preload`


|![](figures/shift_effect.gif) |
|:---:|
| Our stacked convolution network which uses horizontal convolutions with circular wrapping is able to segment corrosion areas which are located at the image boundary | 


# Results analysis

### Color code
- Pixels flagged in black or green: correctly labeled (black=background, green=foreground)
- Pixels flagged in red: predicted background (0), groundtruth = foreground (1) *a.k.a False Negative*
- Pixels flagged in blue: predicted foreground (1), groundtruth = background (0) *a.k.a False Positive*
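
A minimal NumPy sketch of how such an overlay can be produced from a binary prediction and its label; the function name and exact RGB values are assumptions.

```python
import numpy as np

# RGB colors following the color code above.
GREEN, RED, BLUE = (0, 255, 0), (255, 0, 0), (0, 0, 255)


def error_map(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Color-code a binary prediction against the ground truth as an (H, W, 3) image."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    rgb = np.zeros(pred.shape + (3,), dtype=np.uint8)  # true negatives stay black
    rgb[pred & gt] = GREEN   # true positive
    rgb[~pred & gt] = RED    # false negative
    rgb[pred & ~gt] = BLUE   # false positive
    return rgb
```
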

### Labeling relevance

Chasing the best possible accuracy may be in vain: in some cases, the labels appear less relevant than the network prediction.

|![](figures/annotations_accuracy.png) | ![](figures/annotations_accuracy_2.png) |
|:---:| :----: |
| Label is mislocated | Label is too thick: the network prediction is thinner and better located. The corrosion line is 2 pixels wide, not 3 |