*Copyright (c) 2022 Centre National d'Etudes Spatiales (CNES).  
 This file is part of Bulldozer.  
 All rights reserved.*

# Bulldozer pre-processing

This notebook presents the tools available in the pre-processing module of **Bulldozer**:
* [Border nodata detection](#Border-nodata-detection)
* [Disturbance detection](#Disturbance-detection)
* [Full preprocess pipeline](#Full-preprocess-pipeline)

## Border nodata detection

In a *Digital Surface Model* (DSM), we distinguish two types of nodata. *Inner nodata* are the nodata points that mainly come from correlation or occlusion issues during the DSM computation. *Border nodata* are the nodata points along the edges that pad the image to its rectangular shape (for example, the nodata corners that appear when the input DSM is skewed in the TIF file).  
The `build_border_nodata_mask` function extracts the border nodata points and returns the corresponding mask.
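
To make the distinction concrete, here is a minimal NumPy sketch on a toy array. It is not Bulldozer's implementation: the helper `border_nodata_rowwise` is hypothetical, and it only scans rows, flagging the nodata runs that touch the left or right edge of the image.

```python
import numpy as np


def border_nodata_rowwise(dsm: np.ndarray, nodata: float) -> np.ndarray:
    """Hypothetical helper: flag nodata runs that touch the left or right edge of a row."""
    is_nodata = dsm == nodata
    # Cumulative AND from the left: stays True while every pixel so far is nodata.
    from_left = np.logical_and.accumulate(is_nodata, axis=1)
    # Same scan from the right, done on the horizontally flipped array.
    from_right = np.logical_and.accumulate(is_nodata[:, ::-1], axis=1)[:, ::-1]
    return from_left | from_right


# Toy DSM where -1.0 stands for nodata (illustrative value only).
toy = np.array([
    [-1.0, -1.0, 5.0, 6.0],
    [-1.0, 4.0, -1.0, 7.0],
    [3.0, 4.0, 5.0, -1.0],
])
border = border_nodata_rowwise(toy, nodata=-1.0)
print(border)
```

In this toy array, the nodata pixel at row 1, column 2 is surrounded by valid pixels on its row, so the sketch leaves it out of the border mask: it is an inner nodata point.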

### Setup

In [None]:
from bulldozer.preprocessing.dsm_preprocess import build_border_nodata_mask

# Required parameter
dsm_path = '../tests/data/postprocess/dsm_test.tif'

# Optional
nb_max_workers = 16
nodata = -32768.0

### Usage

Basic call (sequential: only one CPU is used):

In [None]:
border_nodata_mask = build_border_nodata_mask(dsm_path=dsm_path)

*(Optional)* Call with optional parameters:

In [None]:
border_nodata_mask = build_border_nodata_mask(dsm_path=dsm_path, nb_max_workers=nb_max_workers, nodata=nodata)

✅ **Done!**  
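If you want to compute the *inner nodata* mask, run the following code, where `dsm` is the DSM raster loaded as a NumPy array:
```python
import numpy as np

# Inner nodata = nodata pixels that are not flagged as border nodata
inner_nodata_mask = np.logical_and(np.logical_not(border_nodata_mask), dsm == nodata)
```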
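
As a worked illustration of this expression, the sketch below applies it to a small hand-written example. The array values and the border mask are made up for the demonstration; in practice the mask would come from `build_border_nodata_mask` and `dsm` from your raster file.

```python
import numpy as np

nodata = -32768.0

# Toy DSM (values are illustrative only).
dsm = np.array([
    [nodata, nodata, 12.0, 13.0],
    [nodata, 11.0, nodata, 14.0],
    [10.0, 11.5, 12.5, nodata],
])

# Hand-written border mask for this toy DSM (assumed, not computed by Bulldozer).
border_nodata_mask = np.array([
    [True, True, False, False],
    [True, False, False, False],
    [False, False, False, True],
])

# Nodata pixels that are not part of the border are inner nodata.
inner_nodata_mask = np.logical_and(np.logical_not(border_nodata_mask), dsm == nodata)

# Row/column indices of the inner nodata pixels.
inner_positions = np.argwhere(inner_nodata_mask)
print(inner_positions)
```

Only the isolated nodata pixel at row 1, column 2 remains in `inner_nodata_mask`.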

## Disturbance detection

This method generates a mask that matches all heavily disturbed areas in the input DSM.  
These areas often correspond to correlation errors from the DSM computation (e.g. water areas).  
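
To give an intuition of what a "disturbed" pixel can look like, here is a minimal NumPy sketch. It rests on an assumption, not on Bulldozer's documented behaviour: a pixel is flagged when the height jumps to both of its neighbours along a row or a column exceed a threshold. The helper `spike_mask` and its `slope_threshold` argument are hypothetical, and nodata handling is ignored.

```python
import numpy as np


def spike_mask(dsm: np.ndarray, slope_threshold: float = 2.0) -> np.ndarray:
    """Hypothetical helper: flag pixels that jump away from both neighbours on an axis."""
    mask = np.zeros(dsm.shape, dtype=bool)

    # Horizontal check: d[:, j] = |dsm[:, j + 1] - dsm[:, j]|
    d = np.abs(np.diff(dsm, axis=1))
    mask[:, 1:-1] |= (d[:, :-1] > slope_threshold) & (d[:, 1:] > slope_threshold)

    # Vertical check: d[i, :] = |dsm[i + 1, :] - dsm[i, :]|
    d = np.abs(np.diff(dsm, axis=0))
    mask[1:-1, :] |= (d[:-1, :] > slope_threshold) & (d[1:, :] > slope_threshold)

    return mask


# Flat terrain with a single 5 m spike in the middle row.
spike_dsm = np.array([
    [10.0, 10.0, 10.0, 10.0],
    [10.0, 10.0, 15.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
])
spikes = spike_mask(spike_dsm)

# A clean step (e.g. a building edge) has only one large jump per pixel.
step_dsm = np.array([[0.0, 0.0, 10.0, 10.0]])
steps = spike_mask(step_dsm)
```

With this rule, the isolated spike is flagged while the clean step is not, because each pixel of the step has a large jump on one side only.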

### Setup

In [6]:
from bulldozer.preprocessing.dsm_preprocess import build_disturbance_mask

# Required parameters
dsm_path = '../tests/data/postprocess/dsm_test.tif'

# Optional parameters
nb_max_worker = 16
slope_treshold = 2.0
is_four_connexity = True
nodata = -32768.0

### Usage

Basic call (sequential: only one CPU is used):

In [None]:
disturbed_areas_mask = build_disturbance_mask(dsm_path=dsm_path)

*(Optional)* Call with optional parameters:

In [None]:
disturbed_areas_mask = build_disturbance_mask(dsm_path=dsm_path, nb_max_worker=nb_max_worker, slope_treshold=slope_treshold, is_four_connexity=is_four_connexity, nodata=nodata)

✅ **Done!**  

## Full preprocess pipeline

The full pre-process pipeline is designed to be used before the **Bulldozer** DTM extraction.  
⚠️ It should not be used on its own, because the pre-processed DSM it produces is only intended as input to the **Bulldozer** DTM extraction.  
This part of the tutorial is for cases where you want to run the **Bulldozer** pipeline step by step (for example, to split it into separate jobs and submit them to a cluster).

### Setup

In [None]:
from bulldozer.preprocessing.dsm_preprocess import preprocess_pipeline

# Required parameters
dsm_path = '../tests/data/postprocess/dsm_test.tif'
output_dir = '../tests/data/preprocess/'

# Optional parameters
nb_max_workers = 16
nodata = -32768.0
slope_threshold = 2.0
four_connexity = True
min_valid_height = -32000.0


### Usage

Basic pipeline call:

In [None]:
preprocessed_dsm_path, quality_mask_path = preprocess_pipeline(dsm_path, output_dir)

*(Optional)* Preprocess pipeline call with all the options:

In [None]:
preprocessed_dsm_path, quality_mask_path = preprocess_pipeline(dsm_path, output_dir, nb_max_workers, nodata, slope_threshold, four_connexity, min_valid_height)

✅ **Done!**  