## LARD Dataset export tool
This notebook guides you through exporting the dataset to a few existing object detection formats. It provides the following features:
- `Bounding box` generation based on the corners of the runway, in several possible formats.
- `Multiple` label files (one label file per image) or a `single` label file containing all labels at once.
- `Image crop`, depending on the height of the watermark. Note that the crop only removes the watermark height (typically 300 pixels) from both the top and the bottom of the images, since the initial picture was expanded by 600 pixels in total height to make room for the watermark. The crop also updates the position of the bounding box to match the new image dimensions.
  - ⚠️ *It does not crop around the runway* in the image.
- A few additional options, such as the *bounding box normalisation*, the *separator* used in the label file, and the file extensions.
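
The crop arithmetic described above can be sketched as follows. This is only an illustration, not the tool's actual code: the function name `crop_watermark`, the `(x_min, y_min, x_max, y_max)` box layout, and the image height used in the example are assumptions made for this sketch.

```python
def crop_watermark(image_height, bbox_tlbr, watermark_height=300):
    """Illustrative helper (hypothetical, not part of the LARD API).

    Removes `watermark_height` pixels from the top AND the bottom of the image
    and shifts the bounding box into the cropped frame.
    `bbox_tlbr` is assumed to be (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = bbox_tlbr
    # The image loses one watermark band at the top and one at the bottom.
    new_height = image_height - 2 * watermark_height
    # Only the y coordinates move: they shift up by the removed top band.
    new_bbox = (x_min, y_min - watermark_height, x_max, y_max - watermark_height)
    return new_height, new_bbox


# Example with made-up values: a 2648-pixel-high image and a 300-pixel watermark.
new_height, new_bbox = crop_watermark(2648, (1000, 1500, 1200, 1700))
print(new_height, new_bbox)
```

The x coordinates and the box width and height are unchanged, since the crop only removes horizontal bands.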

In [1]:
from src.dataset.lard_dataset import LardDataset

### Load your LARD dataset
This loader works with either a single folder (for instance `LARD_train_DAAG_DIAP/`), or a folder with multiple subfolders.

We recommend gathering all folders of the **training set** under a single folder. For instance, in our case we have the following folder hierarchy:
```
LARD_train/
| LARD_train_BIRK_LFST/
| LARD_train_DAAG_DIAP/
| ...
```
The same can be done for the test set, or you can point directly to the `LARD_test_synth/` folder if the real images are not needed.

*Please check [Dataset folder structure](#dataset-folder-structure) for a more comprehensive description.*

In [2]:
dataset = LardDataset(train_path="../LARD_dataset/LARD_train/", test_path="../LARD_dataset/LARD_test_synth/")

### Export to a specific format
- Quickstart: exporting to YOLO format

*Please check [Export options](#export-options) for a more comprehensive description.*

In [None]:
dataset.export(output_dir="../newYoloFormat/",
               bbx_format="xywh", # Options are 'tlbr', 'tlwh', 'xywh', 'corners'
               normalized=True,
               label_file="multiple", # 'multiple' produces one label file per image, as expected by YOLO architectures.
               crop=True, # 'True' is recommended to remove the watermark. Be careful not to crop the same picture multiple times.
               sep=' ', # Separator in the label file.
               header=False, # 'False' is recommended for multiple files, 'True' for a single file. It adds a header with column names as the first line of the label file.
               ext="txt")

### Additional information

#### Dataset folder structure

The expected format is the one created by the LARD labeling script or notebook:

- Case 1 - **single dataset**, with `DATASET_PATH` being either of the paths provided to `LardDataset(train_path=, test_path=)`
```
    DATASET_PATH/
    | metadata.csv
    | images/
    | | nameimage1.jpeg
    | | nameimage2.jpeg
    | | ...
```
The same structure applies to `TEST_PATH`. Note that this is the default structure produced by the labeling script when it exports the labels.

- Case 2 - **split dataset**, with `DATASET_PATH` being either of the paths provided to `LardDataset(train_path=, test_path=)`
```
    DATASET_PATH/
    | dataset_1/
    | | metadata.csv
    | | images/
    | | | nameimage1.jpeg
    | | | nameimage2.jpeg
    | | | ...
    | dataset_2/
    | | metadata.csv
    | | images/
    | | | nameimage1.jpeg
    | | | nameimage2.jpeg
    | | | ...
    | ...
```

- Other information:
    - `train_path` and `test_path` can be of the same type, or mixed.
    - There can be more than two datasets in `DATASET_PATH` in case 2.
    - CSV and image names do not matter, only the folder architecture: there should be a single CSV file in each directory specified above. Each CSV is expected to be generated with the LARD labeling script or notebook.
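
As a quick sanity check before loading, the "one CSV per dataset directory" rule above can be verified with a small script. This is a minimal sketch using only the standard library. The helper name `find_dataset_dirs` is hypothetical, and `LardDataset` may perform its own, different checks.

```python
from pathlib import Path


def find_dataset_dirs(dataset_path):
    """Hypothetical helper: list the directories that contain exactly one CSV file.

    Case 1 (single dataset): returns [DATASET_PATH].
    Case 2 (split dataset): returns the matching subfolders of DATASET_PATH.
    """
    root = Path(dataset_path)
    # Candidates are the folder itself plus its direct subfolders.
    candidates = [root] + sorted(p for p in root.iterdir() if p.is_dir())
    valid = []
    for directory in candidates:
        csv_files = list(directory.glob("*.csv"))
        if len(csv_files) == 1:
            valid.append(directory)
    return valid


# Usage (the path is the one used earlier in this notebook):
# for d in find_dataset_dirs("../LARD_dataset/LARD_train/"):
#     print(d)
```

An empty result suggests that the path does not follow either of the two layouts described above.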



#### Export options
A more comprehensive description of the available parameters and export options:
- `output_dir`: directory where the converted dataset will be saved. Expects a `pathlib` `Path` or a string.
- `bbx_format`: string, format of the bbox labels. Options are:
    - "tlbr" (x, y of the top left corner, then x, y of the bottom right corner of the bbox)
    - "tlwh" (x, y of the top left corner, bbox width and height)
    - "xywh" (x, y of the center of the bbox, bbox width and height)
    - "corners" (x, y of each corner)
- `normalized`: boolean, option to normalize the bbox position by the image size.
    - If True, bbox labels are expressed as fractions of the image width and height.
    - If False, positions are left in pixels. This is the default.
- `label_file`: string, options are:
    - "single": all the labels are in a single CSV file, with a column containing the image path.
    - "multiple": one label file per image, saved in `output_dir/labels`.
- `sep`: separator used in the label file(s), default is ";".
- `header`: boolean. If True, a header row with column names is added to each label file.
- `crop`: boolean
    - If True, all images with watermarks are cropped during export and the bbox positions are updated to match the cropped image.
    - If False, the images are copied without modification.
- `ext`: extension of the label files (without the "."). Default is "txt".
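
To make the `bbx_format` and `normalized` options more concrete, here is a minimal sketch of how the four layouts relate to each other. It is not the tool's implementation. The function name `corners_to_bbox`, the `image_size` argument, and the corner ordering used for "corners" (clockwise from the top left of the box) are assumptions made for illustration.

```python
def corners_to_bbox(corners, bbx_format="tlbr", image_size=None):
    """Hypothetical helper: build an axis-aligned bbox from (x, y) corner points.

    `corners` is a list of (x, y) pixel coordinates, e.g. the runway corners.
    If `image_size` = (width, height) is given, the result is normalized.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width, height = x_max - x_min, y_max - y_min

    if bbx_format == "tlbr":
        box = [x_min, y_min, x_max, y_max]
    elif bbx_format == "tlwh":
        box = [x_min, y_min, width, height]
    elif bbx_format == "xywh":
        box = [x_min + width / 2, y_min + height / 2, width, height]
    elif bbx_format == "corners":
        # Assumed order: top left, top right, bottom right, bottom left.
        box = [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]
    else:
        raise ValueError(f"Unknown bbx_format: {bbx_format!r}")

    if image_size is not None:
        img_w, img_h = image_size
        # Even positions hold x-like values (x or width), odd ones y-like values.
        box = [v / img_w if i % 2 == 0 else v / img_h for i, v in enumerate(box)]
    return box


# Example with made-up runway corners, in pixels.
runway = [(100, 400), (300, 400), (350, 800), (50, 800)]
print(corners_to_bbox(runway, "tlbr"))
print(corners_to_bbox(runway, "xywh", image_size=(1000, 2000)))
```

With `normalized=True`, every value falls between 0 and 1 as long as the box lies inside the image, which is what YOLO-style label files expect.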