# Create new ADAF training dataset
---
```
Author: Nejc Čož
Mail: nejc.coz@zrc-sazu.si
Organisation: ZRC SAZU
Ljubljana, 2024
```
---

## Imports

In [8]:
from adaf.create_visualisations import run_visualisations
from adaf.create_patches import create_patches_main

## Create visualisations

The default visualisation for the ADAF model is SLRM (Simple Local Relief Model). The Irish dataset was processed with an SLRM radius of 10 metres.

The SLRM visualisation with default parameters can be prepared using the Python functions included in ADAF.

> The following parameters are used for creating the visualisation:
>
> * Radius for trend assessment: **10 m** *(i.e. 20 px for a 0.5 m image)*
>
> * Min/max normalisation between **-0.5 and 0.5**
>
> * Nodata and NaN values are set to **0 (zero)**
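
As a rough illustration of the parameters above, the sketch below converts the radius to pixels and applies the normalisation to a few sample values. It assumes that the min/max normalisation clips to [-0.5, 0.5] and rescales linearly to [0, 1], and that NaN pixels end up as 0 in the output; both helper names are hypothetical and not part of ADAF.

```python
import math


def radius_in_pixels(radius_m: float, resolution_m: float) -> int:
    """Convert a radius in metres to whole pixels for a given pixel size."""
    return round(radius_m / resolution_m)


def normalise_slrm(value: float, vmin: float = -0.5, vmax: float = 0.5) -> float:
    """Clip to [vmin, vmax] and rescale to [0, 1]; NaN becomes 0 (assumed behaviour)."""
    if math.isnan(value):
        return 0.0
    clipped = min(max(value, vmin), vmax)
    return (clipped - vmin) / (vmax - vmin)


# 10 m radius on a 0.5 m raster
print(radius_in_pixels(10, 0.5))

# A few sample SLRM values, including one outside the range and one NaN
print([normalise_slrm(v) for v in (-2.0, -0.5, 0.0, 0.25, float("nan"))])
```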

In [9]:
# PROCESS INPUTS:
dem_path = r"./test_data/test_patches/ISA-15_Kilkee_dem_05m.vrt"
tile_size = 512
save_dir = r"./test_data/test_patches/slrm_visualisations"
nr_processes = 6

# RUN VISUALISATION
vis_results = run_visualisations(dem_path, tile_size, save_dir, nr_processes)

# We will need this for creating patches
vis_vrt = vis_results["vrt_path"]

# Print output
print(vis_vrt)

test_data\test_patches\slrm_visualisations\ISA-15_Kilkee_dem_05m_slrm.vrt


The same visualisation can be created at your own discretion using external software such as RVT (rvt_py package, RVT plugin for QGIS, RVT desktop app) or other third-party software. To replicate the default visualisation, make sure you use the same input parameters as listed above.


ADAF can also be trained on visualisations other than SLRM. In this case, make sure that the visualisation raster:
* has either 1 or 3 bands
* is normalised between 0 and 1
* contains no NaN values
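
A quick sanity check of these requirements could look like the sketch below. It assumes the raster has already been read into memory as a sequence of bands, each a flat sequence of pixel values (for example with a raster library of your choice). `check_visualisation` is a hypothetical helper, not an ADAF function.

```python
import math
from typing import Sequence


def check_visualisation(bands: Sequence[Sequence[float]]) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len(bands) not in (1, 3):
        problems.append(f"expected 1 or 3 bands, got {len(bands)}")
    for i, band in enumerate(bands, start=1):
        if any(math.isnan(v) for v in band):
            problems.append(f"band {i} contains NaN values")
        elif any(v < 0.0 or v > 1.0 for v in band):
            problems.append(f"band {i} has values outside [0, 1]")
    return problems


# Hypothetical pixel values: one valid single-band raster, one invalid two-band raster
good = [[0.0, 0.5, 1.0]]
bad = [[0.0, float("nan")], [0.2, 1.7]]

print(check_visualisation(good))
print(check_visualisation(bad))
```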


## Create patches

Users can set the size of the patches (i.e. the size of the image tiles in pixels) and the overlap of the tiles. If the “DFM” attribute is included in the vector file, this information is included in the segmentation masks and bounding boxes as well. The DFM value was used in the original vector data to indicate the quality of the archaeological features on the DFM and can be used to filter out data by quality during ML training.

Default values for patch creation parameters cannot be changed and are:

* tile size of 512 pixels
* overlap of tiles by 0.5 tile (i.e. 256 pixels)
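
To illustrate what these defaults imply, the sketch below lists the upper-left pixel offsets of 512 px tiles taken with a 256 px step. It is a simplification that keeps only tiles lying fully inside the raster; how ADAF treats the raster edges is not described here, and `tile_offsets` is a hypothetical helper.

```python
def tile_offsets(width: int, height: int, tile_size: int = 512, overlap: float = 0.5):
    """Yield (col, row) upper-left offsets of tiles that fit fully inside the raster."""
    step = int(tile_size * (1 - overlap))  # 256 px for the default values
    for row in range(0, height - tile_size + 1, step):
        for col in range(0, width - tile_size + 1, step):
            yield col, row


# Example: a hypothetical 1024 x 768 px raster
offsets = list(tile_offsets(1024, 768))
print(len(offsets))
print(offsets)
```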

ADAF is set up to detect three default classes: barrows, ringforts and enclosures. Patch creation follows this format, which means that masks can contain up to 3 different classes (not necessarily using the default label names). At least one vector file must be specified. If only one is given, the segmentation mask still has 3 bands, but only the first band is filled with the labelled data.

> NOTE: When training or retraining the models, only one band from the segmentation mask is used at a time. The user must specify the band ID and the corresponding label name.

INPUTS:

- input_name
- segmentation_masks
- output_dir

```
The dictionary must be in this format:
- At least one label and at most 3 labels (there are no checks)
- Key is the name of the label and value is the path to the vector file.
- Any label name can be used; the example uses the default ADAF names:

segmentation_masks = {
    "barrow": r"../test_data/test_patches/arch/barrow_segmentation_TM75.gpkg",
    "enclosure": r"../test_data/test_patches/arch/enclosure_segmentation_TM75.gpkg",
    "ringfort": r"../test_data/test_patches/arch/ringfort_segmentation_TM75.gpkg"
}
```
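
Since the number of labels is not checked, a small pre-flight check can catch mistakes before patches are created. The sketch below is a hypothetical helper (`validate_masks` is not part of ADAF) that enforces the one-to-three label rule and verifies that each vector file exists.

```python
from pathlib import Path


def validate_masks(segmentation_masks: dict) -> None:
    """Raise ValueError if the dictionary does not follow the expected format."""
    if not 1 <= len(segmentation_masks) <= 3:
        raise ValueError(f"expected 1 to 3 labels, got {len(segmentation_masks)}")
    for label, path in segmentation_masks.items():
        if not isinstance(label, str) or not label:
            raise ValueError(f"label must be a non-empty string, got {label!r}")
        if not Path(path).is_file():
            raise ValueError(f"vector file for '{label}' not found: {path}")
```

Calling `validate_masks(segmentation_masks)` just before `create_patches_main` would then fail early with a readable message instead of partway through processing.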

In [10]:
# Define paths to inputs and outputs
input_image = vis_vrt
output_dir = r"./test_data/test_patches/training_samples"

# Define paths to masks
segmentation_masks = {
    "barrow": r"./test_data/test_patches/arch/barrow_segmentation_TM75.gpkg",
    "enclosure": r"./test_data/test_patches/arch/enclosure_segmentation_TM75.gpkg",
    "ringfort": r"./test_data/test_patches/arch/ringfort_segmentation_TM75.gpkg"
}

# Run create patches
create_patches_main(input_image, segmentation_masks, output_dir)


Start multiproc


Training a completely new class requires only a single vector file with labelled data when creating patches. As only one vector file is provided, only the first band of the segmentation mask file is filled with valid data; the other two bands contain all zeros. When training a new model ([**semantic segmentation**](train_and_evaluate_semantic_segmentation.ipynb) or [**object detection**](train_and_evaluate_object_detection.ipynb)), the user must specify band 1 and give the class a new name. For example, to train the model on a "new_feature" class, follow this template:

In [None]:
# Define paths to inputs and outputs
input_image = vis_vrt
output_dir = r"./test_data/test_patches/training_samples"

# Define paths to masks
segmentation_masks = {
    "new_feature": r"./test_data/test_patches/arch/new_feature_segmentation.gpkg"
}

# Run create patches
create_patches_main(input_image, segmentation_masks, output_dir)

The training data must be divided geographically into training, validation and test subsets. This is not covered by any Python script and must be done at your own discretion. As the split is done geographically, it can be performed before or after processing, i.e.:

* slice the DFM raster into appropriate regions and create patches for each group separately,
* create patches for the entire DFM and split the resulting patches into 3 groups.

It is recommended to split the dataset in an approximate ratio of 60:20:20 (train:validation:test).
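
One possible way to split already created patches is sketched below: patches are sorted by the easting of their origin and cut into three contiguous strips, so that each subset covers a separate area. The `(name, x, y)` tuples and the `split_by_easting` helper are hypothetical, and real datasets may call for a split along different boundaries.

```python
def split_by_easting(patches, ratios=(0.6, 0.2, 0.2)):
    """Split (name, x, y) patches into train/validation/test strips from west to east."""
    ordered = sorted(patches, key=lambda p: p[1])  # sort by x coordinate
    n = len(ordered)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return {
        "train": ordered[:n_train],
        "validation": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],  # whatever remains
    }


# Ten hypothetical patches spaced 256 units apart along the x axis
patches = [(f"patch_{i}", 1000.0 + 256.0 * i, 5000.0) for i in range(10)]
subsets = split_by_easting(patches)
print({name: len(items) for name, items in subsets.items()})
```

This sketch does not handle patches that share the same easting or that overlap across a strip boundary; since tiles overlap by half, patches next to a boundary may need to be dropped to keep the subsets spatially independent.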
