# DeepSeaAI
#### Pipeline used for the cleaning and analysis of citizen science data with AI.

This notebook explains how to clean your citizen science data and train a Yolov8 model on it. It was developed to handle DeepSeaSpy data (available at https://ocean-spy.ifremer.fr/).

Note : some functions may copy your images (specifically **vision**, **catalog**, **prepare_yolo**). Please be wary of the space available on your computer and act accordingly.

### Contents
You should have 2 python files and a Jupyter notebook file (download here):

```
deep-sea-lab
├── Functions.py            <- Functions used for the cleaning/analysis.
├── DeepSeaLab.ipynb        <- Here
```

**Functions** contains all the functions needed for the cleaning and analysis of citizen science datasets. Detailed explanations of the functions are to be found here. You can modify them and use them as the basis for your work. There are also functions that are useful just for the exploration of your dataset.

**DeepSeaLab** is a ready-to-use file that you can change based on your needs/dataset. There are also examples on how to use the pipeline.

In [None]:
import sys

print("Python version:", sys.version_info)

### Requirements
Import the functions needed for the cleaning/analysis.

In [None]:
import os, csv, json, collections, random, shutil
import pandas as pd
from pathlib import Path
from Functions import polygones2bb, points2bb, lines2bb, convert_yolo, prepare_yolo, vision, SaveCSV, create_yaml
from Functions import unite, catalog, get_df
import matplotlib.pyplot as plt
os.getcwd()

### Access to data/files
3 main paths are to be defined :

```
path_csv
```
Location of your dataset.


```
path_img
```
Path to the folder where your images are stored.
Images should be in .jpg format.

```
images
├── image 1.jpg           
├── image 2.jpg          
├── ... 
```

Then, finally
```
path_save
```
Where to store the cleaned dataset, catalogs, etc...

In [None]:
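# csv access (placeholder, replace with the path to your annotation file) :
path_csv=r'path/to/your/dataset.csv'
# images (placeholder, replace with the folder containing your images) :
path_img=Path(r'path/to/your/images')
# save (placeholder, replace with the folder where outputs should be written) :
path_save=r'path/to/your/save_folder'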

## Conversion of bounding boxes

Our pipeline can convert 3 types of annotations (polygons, lines and points) into regular bounding boxes.
If your data already satisfies the following format, you can skip this part.

|xmin |xmax |ymin |ymax |
|-----|-----|-----|-----|
|972  |982  |549  |559  |


#### Polygons
On DeepSeaSpy, polygons are in the json format :

```
[{\"x\":282,\"y\":115},{\"x\":15,\"y\":538},{\"x\":50,\"y\":679},{\"x\":285,\"y\":497}]
```

The column containing the polygon values can be named :
|polygon_values                                                                         |
|---------------------------------------------------------------------------------------|
|[{\"x\":282,\"y\":115},{\"x\":15,\"y\":538},{\"x\":50,\"y\":679},{\"x\":285,\"y\":497}]|

Our pipeline was therefore built to handle this format. If you need to convert polygons stored differently, you can change how points are read in the **polygones2bb** function.
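
As a rough illustration of what this conversion involves, the sketch below parses one polygon string and takes the extreme coordinates as the bounding box. It is not the code of the function in Functions.py: `polygon_to_bbox` is a hypothetical helper, and the un-escaping step assumes the quotes are stored escaped (`\"`) as in the example above.

```python
import json


def polygon_to_bbox(polygon_values):
    """Return (xmin, xmax, ymin, ymax) for a DeepSeaSpy-style polygon string.

    Hypothetical helper for illustration; not taken from Functions.py.
    """
    # Assumption: quotes may be stored escaped (\") in the CSV export.
    points = json.loads(polygon_values.replace('\\"', '"'))
    xs = [p["x"] for p in points]
    ys = [p["y"] for p in points]
    return min(xs), max(xs), min(ys), max(ys)


raw = r'[{\"x\":282,\"y\":115},{\"x\":15,\"y\":538},{\"x\":50,\"y\":679},{\"x\":285,\"y\":497}]'
print(polygon_to_bbox(raw))  # (15, 285, 115, 679)
```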

#### Lines
Lines are to be in the following format, with two points defined by (x1,y1) and (x2,y2).

|x1 |y1 |x2 |y2 |length|
|---|---|---|---|------|
|761|451|859|364|131   |

Length is used to correct the converted bounding box, depending on the line's angle with the x axis. If the line is nearly vertical or nearly horizontal, the lines2bb function automatically corrects the converted bounding box. By default, the correction is applied when the line is within ±5 degrees of either axis. You can modify this behaviour and find more information in the Functions.py file.
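
To make the idea concrete, here is a minimal sketch of one possible correction. It is an assumption, not the logic of lines2bb: when the line is within the tolerance of an axis, the thin side of the box is stretched to the line length, centred on the line. The name `line_to_bbox` and the parameter `angle_tol` are hypothetical.

```python
import math


def line_to_bbox(x1, y1, x2, y2, length=None, angle_tol=5.0):
    """Return (xmin, xmax, ymin, ymax) for a line, widened when nearly axis-aligned.

    Hypothetical sketch; the correction used by lines2bb may differ.
    """
    xmin, xmax = min(x1, x2), max(x1, x2)
    ymin, ymax = min(y1, y2), max(y1, y2)
    if length is None:
        length = math.hypot(x2 - x1, y2 - y1)
    # Angle between the line and the x axis, folded into [0, 90] degrees.
    angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))
    if angle <= angle_tol:
        # Nearly horizontal: give the box a height equal to the line length.
        cy = (ymin + ymax) / 2
        ymin, ymax = cy - length / 2, cy + length / 2
    elif angle >= 90 - angle_tol:
        # Nearly vertical: give the box a width equal to the line length.
        cx = (xmin + xmax) / 2
        xmin, xmax = cx - length / 2, cx + length / 2
    return xmin, xmax, ymin, ymax


print(line_to_bbox(761, 451, 859, 364))  # diagonal line, no correction
print(line_to_bbox(0, 50, 100, 50))      # horizontal line, height is corrected
```

For the diagonal line of the table above, the box is simply the span of the two points; only the horizontal example triggers the correction.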

#### Points
You can manually set a padding on the x and y axis in the Functions.py file.

|x1 |y1 |
|---|---|
|761|451|

The padding is the same for every point in your dataset. If you wish to use a different padding for different species or uses, we recommend splitting your dataset and running each part with its own padding. You can then concatenate all of your subsets with :

```
pd.concat([polybb,lignesbb,pointsbb])
```
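
The effect of the padding can be sketched in plain Python. The example below is only an illustration of the idea and does not reproduce points2bb: the helper names, the per-species lookup table and the padding values are all hypothetical.

```python
# Hypothetical paddings (in pixels) per species; the values are illustrative only.
PADDING = {"Pycnogonid": (10, 10), "Buccinidae": (25, 20)}
DEFAULT_PADDING = (10, 10)


def point_to_bbox(x, y, pad_x, pad_y):
    """Return (xmin, xmax, ymin, ymax) around a point, clipped at 0."""
    return max(0, x - pad_x), x + pad_x, max(0, y - pad_y), y + pad_y


def points_to_bboxes(rows):
    """rows: list of dicts with keys 'x1', 'y1' and 'name_sp'."""
    boxes = []
    for row in rows:
        pad_x, pad_y = PADDING.get(row["name_sp"], DEFAULT_PADDING)
        xmin, xmax, ymin, ymax = point_to_bbox(row["x1"], row["y1"], pad_x, pad_y)
        boxes.append({"name_sp": row["name_sp"], "xmin": xmin, "xmax": xmax,
                      "ymin": ymin, "ymax": ymax})
    return boxes


rows = [
    {"x1": 761, "y1": 451, "name_sp": "Buccinidae"},
    {"x1": 59, "y1": 34, "name_sp": "Pycnogonid"},
]
for box in points_to_bboxes(rows):
    print(box)
```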

We also recommend renaming your image and species columns as follows, so that our functions can run properly.

|name_img         |name_sp     |
|-----------------|------------|
|'MOMAR_90095.jpg'|'Buccinidae'|

In [None]:
# Import your dataset (sep=None lets pandas detect the delimiter)
data=pd.read_csv(path_csv, sep=None, engine='python')
data.head()

In [None]:
# Rename data from DeepSeaSpy
#data.rename(columns={'pos1x': 'x1', 'pos1y': 'y1','pos2x': 'x2', 'pos2y': 'y2','name_fr':'name_sp','name':'name_img'}, inplace=True)

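# Subset of polygon labels
# (assumption: the annotation type is stored in a 'shapes' column; adapt the column name and values to your export)
poly=data[data['shapes']=='polygons']
# Subset of lines labels
lines=data[data['shapes']=='lines']
# Subset of points labels
points=data[data['shapes']=='point']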

# Polygons
polybb=polygones2bb(poly)
# Lines
lignesbb=lines2bb(lines)
# Points
pointsbb=points2bb(points)

# Concatenate your split dataset into a single one
bb=pd.concat([polybb,lignesbb,pointsbb])

In [None]:
# Save the converted dataset
SaveCSV(bb,path_save,'export_bb')

In [None]:
bb=pd.read_csv(os.path.join(path_save,'export_bb.csv'), sep=None, engine='python')

## Vision

We encourage you to use the vision function, which will allow you to visualize your images with your bounding boxes added onto them.
First, define a dictionary (called `couleurs` in the cell below) that maps each species to its color, coded in BGR.

If you wish to, you can limit the number of saved images by adding 'nb_img' as an argument. 
```
vision(bb,colors,path_img,path_save=None,nb_img=None)
```
When not specified, it saves the images in the parent directory of the path_img.

This function copies the images from path_img, so it may generate a lot of data if you don't specify a number of images.

In [None]:
# Colors are in BGR
couleurs = {
        'Escargot buccinidé': (0, 0, 255),  # Red
        'Couverture de moules': (255, 0, 0),  # Blue
        'Couverture microbienne': (0, 255, 0),  # Green
        'Couverture vers tubicole': (255, 255, 0),  
        'Crabe araignée': (128, 0, 128),  
        'Crabe bythograeidé': (255, 165, 0),  
        'Crevette alvinocarididae': (255, 192, 203),  
        'Ophiure': (139, 69, 19),  
        'Poisson Cataetyx': (0, 255, 255),  
        'Poisson chimère': (0, 0, 0),  
        'Poisson zoarcidé': (192, 192, 192),  
        'Pycnogonide': (255, 215, 0),  
        'Ver polynoidé': (255, 255, 255),  
        'Vers polynoidés': (245, 245, 220),
        'Autre poisson':(245, 245, 220)
    } 

vision(bb, couleurs, path_img, path_save, nb_img=10) #nb_img is optional

## Unification of overlapping bounding boxes

This step ensures that there is no redundancy in your dataset. You can skip this part if you are not dealing with this kind of problem.

By default (iou=None), bounding boxes are unified as soon as they overlap at all.
If you wish, you can restrict the unification to bounding boxes whose overlap reaches a given threshold (iou).

```
unite(dataframe, iou=None, grouper_0=False)
```

iou corresponds to the minimum Intersection over Union (IoU) value between two bounding boxes to consider them overlapping.

Bounding boxes that do not overlap any other box are automatically discarded. If you want to keep them, set grouper_0=True.

The function keeps track of how many bounding boxes the final ones are made of in the column "occurrences".
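
For readers unfamiliar with IoU, the sketch below shows how it is computed and how overlapping boxes could be grouped. It is a simplified illustration, not the implementation of unite: the function names are hypothetical, and taking the enclosing box as the unified box is an assumption.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (xmin, xmax, ymin, ymax)."""
    inter_w = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = inter_w * inter_h
    union = (a[1] - a[0]) * (a[3] - a[2]) + (b[1] - b[0]) * (b[3] - b[2]) - inter
    return inter / union if union > 0 else 0.0


def overlaps(a, b, iou_thresh=None):
    # With no threshold, any non-zero intersection counts as an overlap.
    value = iou(a, b)
    return value > 0 if iou_thresh is None else value >= iou_thresh


def group_boxes(boxes, iou_thresh=None):
    """Group boxes that overlap, directly or through a chain of overlaps."""
    groups = []
    for box in boxes:
        merged, untouched = [box], []
        for group in groups:
            if any(overlaps(box, other, iou_thresh) for other in group):
                merged.extend(group)
            else:
                untouched.append(group)
        groups = untouched + [merged]
    return groups


def summarise(group):
    # Assumption: the unified box is the smallest box enclosing the whole group.
    return (min(b[0] for b in group), max(b[1] for b in group),
            min(b[2] for b in group), max(b[3] for b in group), len(group))


A, B, C = (0, 10, 0, 10), (5, 15, 5, 15), (100, 110, 100, 110)
print(round(iou(A, B), 3))                             # 0.143
print([summarise(g) for g in group_boxes([A, B, C])])  # A and B are grouped, C is alone
```

In this sketch, keeping only groups with more than one box mimics the default behaviour described above, and the last element of each tuple plays the role of the occurrence count.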



In [None]:
# Unification of all overlapping bounding boxes
ubb=unite(bb)

In [None]:
# Keeping bounding boxes only if they are made of at least 3 overlapping ones
ubb=ubb[ubb['occurences']>=3]

In [None]:
# Unification only when the IoU between bounding boxes is at least 0.2 (20%)
ubb=unite(bb,0.2) 

In [None]:
# Unification of all your bounding boxes, without discarding isolated ones
ubb=unite(bb,grouper_0=True) 

In [None]:
# Save your dataframe
SaveCSV(ubb,path_save,'ubb')

In [None]:
ubb=pd.read_csv(os.path.join(path_save,'ubb.csv'), sep=',')


## Catalog

unite only unifies overlapping bounding boxes; it does not verify that the object you want to label is actually inside the bounding box. If you want to check carefully which bounding boxes are kept in your dataset, you can use the following two functions :

```
catalog(df, path_img, path_save=None)
```
When not specified, it saves the images in the parent directory of the path_img.

Creates a catalog of snapshots from all the bounding boxes you have in your dataframe (df). You can then delete the snapshots of bounding boxes you want to discard.

```
get_df(df,path_save)
```

From the path_save (where your remaining snapshots are), and your unified dataframe (df), this function returns a dataframe that lists all the remaining bounding boxes from your own cleaning.

In [None]:
#Create snapshots from images and the dataframe
catalog(ubb, path_img, path_save)

In [None]:
#After discarding snapshots, get the remaining rows
df=get_df(ubb,path_save)
df.head()

## Prepare for Yolo

You can then use prepare_yolo to split your dataset in 3 (train,val,test), and train yolov8 on it.

```
prepare_yolo(df,path_save,path_img,prop=[.8,.1])
```

The **prop** parameter stands for proportion. It takes the sizes of 2 subsets (in order) : **train** and **validation**. The remaining fraction is the size of the **test** subset. The test subset is not mandatory for training a model.
prop=[.8,.1] means that the training subset is 80% of the dataset and the validation subset is 10%. The remaining 10% is the test subset.
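
The way such proportions translate into subsets can be sketched as follows. This is an illustration under stated assumptions, not the code of prepare_yolo: `split_images` is a hypothetical helper, the split is done per image name, and the `seed` parameter is added here only to make the shuffle reproducible.

```python
import random


def split_images(image_names, prop=(0.8, 0.1), seed=0):
    """Split unique image names into train/val/test according to prop."""
    names = sorted(set(image_names))
    random.Random(seed).shuffle(names)
    n_train = int(prop[0] * len(names))
    n_val = int(prop[1] * len(names))
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        # Whatever is left after train and val goes to the test subset.
        "test": names[n_train + n_val:],
    }


splits = split_images([f"img_{i}.jpg" for i in range(10)])
print({name: len(items) for name, items in splits.items()})
# {'train': 8, 'val': 1, 'test': 1}
```

Splitting by image rather than by bounding box keeps all the boxes of one image in the same subset.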

Yolov8 takes bounding boxes in the following format :

```
class x y w h
```

Here x and y are the coordinates of the center of the bounding box, and w and h are its width and height. These 4 values are normalised between 0 and 1 (x and w by the image width, y and h by the image height). prepare_yolo generates 1 txt file for each image, and each line in this file describes 1 bounding box.

example :
```
7 0.5713542 0.6847222 0.0359375 0.0731481
```
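
The conversion from the regular format (xmin, xmax, ymin, ymax) to this normalised format is simple arithmetic. The sketch below shows it for a 1920x1080 image; the helper names are hypothetical and the code is not taken from convert_yolo.

```python
def bbox_to_yolo(xmin, xmax, ymin, ymax, img_w=1920, img_h=1080):
    """Convert pixel corners to normalised (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_line(class_id, bbox, size=(1920, 1080)):
    """Format one label line: class x y w h."""
    x, y, w, h = bbox_to_yolo(*bbox, *size)
    return f"{class_id} {x:.7f} {y:.7f} {w:.7f} {h:.7f}"


# A box covering the top-left quarter of a 1920x1080 image.
print(yolo_line(7, (0, 960, 0, 540)))
# 7 0.2500000 0.2500000 0.5000000 0.5000000
```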


In [None]:
#Size of your images
size=[1920,1080]

convert_yolo(df,size)
prepare_yolo(df,path_save,path_img,prop=[.8,.1])

## Yolo training
Now you can train a yolo model with your dataset.

Your yolo training folder path should look like this :

```
yolo_training
├── images        <- Where your images are
|   ├── train
|   ├── val
|   ├── test
├── labels        <- Where your bounding boxes/labels for each image are
|   ├── train
|   ├── val
|   ├── test
```
Yolo needs a yaml file to understand where the data is, and what the classes are.
You can create the file yourself or use create_yaml.
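
If you create the file yourself, a minimal version could look like what the sketch below generates. It assumes the folder layout shown above and the dataset keys used by Ultralytics (`path`, `train`, `val`, `test`, `names`). `dataset_yaml` is a hypothetical helper, and the file written by create_yaml may differ.

```python
def dataset_yaml(root, class_names):
    """Build the text of a YOLO dataset yaml file for the layout shown above."""
    lines = [
        f"path: {root}",          # root folder of the training data
        "train: images/train",    # paths below are relative to 'path'
        "val: images/val",
        "test: images/test",      # optional
        "names:",
    ]
    # Class indices must match the class ids used in the label files.
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


print(dataset_yaml("path/to/yolo_training", ["Buccinidae", "Bythograeidae"]))
```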

In [None]:
# Creates a yaml file containing all of the information necessary for running Yolov8 on your data
create_yaml(path_save,df,'output')

Everything is ready for Yolov8.


If you installed our requirements correctly, you can train your Yolov8 model in python :

In [None]:
from ultralytics import YOLO

In [None]:
# Load a new YOLO model from scratch
model = YOLO('yolov8n.yaml')  # build a new model from YAML

In [None]:
# Or load a pretrained YOLO model (recommended for training)
model = YOLO('yolov8n.pt')

In [None]:
# Get where the .yaml file is stored
output=str('output'+'.yaml')
yaml_path=os.path.join(path_save,output)

# Train the model
results = model.train(data=yaml_path, epochs=10, imgsz=640)

In [None]:
# Run the model to identify objects
# Load your trained YOLOv8n model
model = YOLO('your/path/to/trained/model/here')
img_to_predict=Path(r'C:\Users\alebeaud\Desktop\images_to_be_predicted')  # Path object, so that .glob() works below

In [None]:
list_img=list(img_to_predict.glob('**/*.jpg'))

# Run inference on your list of images
results=model(list_img)

# Process results list
for i, result in enumerate(results):
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    result.show()  # display to screen
    result.save(filename=f"result_{i}.jpg")  # save to disk, one file per image

You can also launch the training from a command terminal (same command on Windows or Linux). This can be useful if you don't want to open an interactive Python window, or if you are working remotely.

In [None]:
# For the training
!yolo task=detect mode=train model=yolov8n.yaml imgsz=640 data=absolute/path/to/data_yolo.yaml show_labels=False epochs=10 batch=8 name=run1

# Once your model is trained, you can run 'predict' to detect objects
!yolo predict model=path/to/yolo/runs/detect/run1/weights/best.pt source=path/to/data show_labels=False

### Steps to go further

If you wish to tune the hyperparameters of your trained model, you can do so.
Only do this step if you have the resources available, as hyperparameter tuning takes a long time.

In [None]:
model = YOLO('/runs/detect/buccin_cit_07/weights/best.pt')

model.tune(data='buccins_cit_07.yaml',epochs=200, iterations=300, optimizer='AdamW', plots=True, save=True, val=False)

# Deep Species Detection

Provided by Ifremer, iMagine.

[![Build Status](https://jenkins.services.ai4os.eu/buildStatus/icon?job=AI4OS-hub/deep-species-detection/main)](https://jenkins.services.ai4os.eu/job/AI4OS-hub/job/deep-species-detection/job/main/)

# Citizen science and data cleaning
In this repository, you will find a pipeline that cleans citizen science image datasets and automatically trains a YoloV8 model on them.
You may also use this module to run inference with a pre-trained YoloV8 model, specifically on 2 species : Buccinidae and Bythograeidae.
The pipeline converts bounding boxes from Deep Sea Spy format (lines, points, polygons) to regular bounding boxes (xmin, xmax, ymin, ymax). The conversion step is optional. It then unifies overlapping bounding boxes of each species, using the redundancy of citizen identifications as a 

There are 3 ways to use the pipeline :

- DeepSeaLab.ipynb : step by step guide to clean the dataset and launch the Yolov8 training
- Pipeline_txt.py : automatically cleans the dataset and launches the Yolov8 training with the arguments stored in config.txt
- Deepaas API : easily visualize, customize and monitor the Yolov8 training

## Project structure

```
├── Jenkinsfile             <- Describes basic Jenkins CI/CD pipeline
├── LICENSE                 <- License file
├── README.md               <- The top-level README for developers using this project.
├── VERSION                 <- Version file indicating the version of the model
│
├── deep-sea-lab            <- All of the data cleaning files
│   ├── DeepSeaLab.ipynb    <- Notebook pipeline for data cleaning & Yolov8 training
│   ├── Functions.py        <- Data processing file DeepSeaLab draws function from
│   ├── Pipeline_txt.py     <- Automatic pipeline to clean the data & train Yolov8
│   ├── config.txt          <- Configuration file for Pipeline_txt.py, which stores arguments to run the pipeline
│
├── yolov8
│   ├── README.md           <- Instructions on how to integrate your model with DEEPaaS.
│   ├── __init__.py         <- Makes yolov8 a Python module
│   ├── ...                 <- Other source code files
│   └── config.py           <- Module to define CONSTANTS used across the AI-model python package
│
├── api                     <- API subpackage for the integration with DEEP API
│   ├── __init__.py         <- Makes api a Python module, includes API interface methods
│   ├── config.py           <- API module for loading configuration from environment
│   ├── responses.py        <- API module with parsers for method responses
│   ├── schemas.py          <- API module with definition of method arguments
│   └── utils.py            <- API module with utility functions
│
├── data                    <- Data subpackage for the integration with DEEP API
│   ├── external            <- Data from third party sources.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
│
├── docs                   <- A default Sphinx project; see sphinx-doc.org for details
│
├── models                 <- Folder to store your models
│
├── notebooks              <- Jupyter notebooks. Naming convention is a number (for ordering),
│                             the creator's initials (if many user development),
│                             and a short `_` delimited description, e.g.
│                             `1.0-jqp-initial_data_exploration.ipynb`.
│
├── references             <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports                <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures            <- Generated graphics and figures to be used in reporting
│
├── requirements-dev.txt    <- Requirements file to install development tools
├── requirements-test.txt   <- Requirements file to install testing tools
├── requirements.txt        <- Requirements file to run the API and models
│
├── pyproject.toml         <- Makes project pip installable (pip install -e .)
│
├── tests                   <- Scripts to perform code testing
│   ├── configurations      <- Folder to store the configuration files for DEEPaaS server
│   ├── conftest.py         <- Pytest configuration file (Not to be modified in principle)
│   ├── data                <- Folder to store the data for testing
│   ├── models              <- Folder to store the models for testing
│   ├── test_deepaas.py     <- Test file for DEEPaaS API server requirements (Start, etc.)
│   ├── test_metadata       <- Tests folder for model metadata requirements
│   ├── test_predictions    <- Tests folder for model predictions requirements
│   └── test_training       <- Tests folder for model training requirements
│
└── tox.ini                <- tox file with settings for running tox; see tox.testrun.org
```

## Data requirements

The pipeline requires a folder containing your images and a .csv file containing all of your annotations.

If your dataset is incomplete, it is better to remove incomplete rows rather than . Missing images will not cause problems with the pipeline.
For the conversion step, this pipeline converts data from Deep Sea Spy format :

|shapes  |x1 |y1 |x2 |y2 |polygon_values|name_img|species    |
|--------|---|---|---|---|--------------|--------|-----------|
|point   |59 |34 |NaN|NaN|NaN           |4366.jpg|Pycnogonid |
|lines   |761|451|859|364|NaN           |4366.jpg|Buccinidae |
|polygons|NaN|NaN|NaN|NaN|[{"x":282,"y":115},{"x":15,"y":538},{"x":50,"y":679},{"x":285,"y":497}]|4366.jpg|Mussels coverage|

To a regular format :

|xmin |xmax |ymin |ymax |name_img|species    |
|-----|-----|-----|-----|--------|-----------|
|49   |69   |24   |44   |4366.jpg|Pycnogonid |
|761  |859  |364  |451  |4366.jpg|Buccinidae |
|15   |285  |115  |679  |4366.jpg|Mussels coverage|

If your data is already in this format, you can skip the conversion steps (more details in the cleaning sections).
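
Before skipping the conversion, it can help to check that every row really satisfies the regular format. The sketch below is one possible check, assuming rows are available as dictionaries with the column names of the table above and images of 1920x1080; `check_bbox` is a hypothetical helper, not part of the pipeline.

```python
def check_bbox(row, img_w=1920, img_h=1080):
    """Return a list of problems found in one regular-format row (empty if none)."""
    problems = []
    if row["xmin"] > row["xmax"]:
        problems.append("xmin is greater than xmax")
    if row["ymin"] > row["ymax"]:
        problems.append("ymin is greater than ymax")
    if min(row["xmin"], row["xmax"]) < 0 or max(row["xmin"], row["xmax"]) > img_w:
        problems.append("x coordinate outside the image")
    if min(row["ymin"], row["ymax"]) < 0 or max(row["ymin"], row["ymax"]) > img_h:
        problems.append("y coordinate outside the image")
    return problems


good = {"xmin": 761, "xmax": 859, "ymin": 364, "ymax": 451}
bad = {"xmin": 69, "xmax": 49, "ymin": 24, "ymax": 2000}
print(check_bbox(good))  # []
print(check_bbox(bad))   # ['xmin is greater than xmax', 'y coordinate outside the image']
```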

The pipeline expects image resolution of 1920x1080. You can input images with a different size by changing width_images and height_images in the beginning of Functions.py. If you have images of varying resolutions, you can modify the functions so that they take in the type of image you have.


# Cleaning from DeepSeaLab.ipynb
The **Notebook** is a ready-to-use, step-by-step cleaning file that you can change based on your needs/dataset. This option is better for a first use of the module, since the notebook brings more context and guidance to the cleaning steps.
You can skip the conversion steps by skipping the cells that do not apply to your case. Arguments are pre-filled to show you what is expected from the user.
You can launch the notebook by double clicking the file on the left, inside the deep-sea-lab folder.

# Cleaning from Pipeline_txt.py
Pipeline_txt.py is a Python script that automatically runs all the functions needed for the cleaning and analysis of citizen science datasets.
This option is more straightforward than using the notebook. Detailed explanations of the functions can be found in the file itself. You can modify them and use them as the basis for your work.
This file uses the arguments in the config.txt file, which are pre-filled to show you what is expected from the user.
You can modify the arguments based on your needs.
### Arguments and usage
The paths to your data are required as the first arguments in the config.txt file :
```
# Paths
# csv access :
path_csv=/storage/export.csv
# images :
path_imgs=/storage/Image_dsp/
# save
path_save=/storage/save
```
These are the default paths; they assume that you have connected your Nextcloud account to the iMagine platform.
path_imgs should refer to the folder containing all of your images.
In the config.txt file, if your dataset only contains annotated lines, the expected options are :
```
# Dataset options
polygons=false
points=false
lines=True
```
If your dataset only contains annotated polygons and points, the expected options are :
```
# Dataset options
polygons=True
points=True
lines=false
```
If your dataset is already in the regular format described in the data requirements (and therefore does not need the data conversion), you can set everything to false :
```
# Dataset options
polygons=false
points=false
lines=false
```
The pipeline will still create the training dataset from your data, and will train YoloV8 on it.
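
To show how such `key=value` options can be read, here is a minimal parser sketch. It is an assumption about the file format based on the excerpts above (comment lines starting with `#`, booleans written as `True` or `false`); `parse_config` is hypothetical and is not the parser used by Pipeline_txt.py.

```python
def parse_config(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Accept booleans in any letter case, as in the examples above.
        if value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"
        else:
            config[key] = value
    return config


sample = """
# Paths
path_csv=/storage/export.csv
path_imgs=/storage/Image_dsp/
path_save=/storage/save
# Dataset options
polygons=false
points=false
lines=True
"""
print(parse_config(sample))
```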

In the config.txt file, you can change the YoloV8 training parameters. They are set to their default values (from https://github.com/ultralytics/ultralytics).
You may also change the hyperparameters, but it is recommended to do so only if you know what each modified argument does to the training step.
If you wish to tune the hyperparameters of a specific model, you can do so by running the last cell in the DeepSeaLab Notebook.

# Running Deepaas for YoloV8 training and inference

First, install the package :

```
git clone https://github.com/ai4os-hub/deep-species-detection
cd  deep-species-detection
pip install -e .
```

You can launch DeepaaS by using :

```
deepaas-run --listen-ip 0.0.0.0
```

# Adding DeepaaS API into the existing codebase
In this repository, we have integrated a DeepaaS API into Ultralytics YOLOv8, which makes the pipeline easier to access and to interact with.



><span style="color:Blue">**Note:**</span> Before installing the API, please make sure to install the following system packages: `gcc`, `libgl1`, and `libglib2.0-0` as well. These packages are essential for a smooth installation process and proper functioning of the framework.
```
apt update
apt install -y gcc
apt install -y libgl1
apt install -y libglib2.0-0
```


# Environment variables settings
In `./api/config.py` you can configure several environment variables:

- `DATA_PATH`: Path definition for the data folder; the default is './data'.
- `MODELS_PATH`: Path definition for saving trained models; the default is './models'.
- `REMOTE_PATH`: Path to the remote directory containing your trained models. Rclone uses this path for downloading or listing the trained models.
- `YOLOV8_DEFAULT_TASK_TYPE`: Specify the default tasks related to your work among detection (det), segmentation (seg), and classification (cls).
- `YOLOV8_DEFAULT_WEIGHTS`: Define default timestamped weights for your trained models to be used during prediction. If no timestamp is specified by the user during prediction, the first model in YOLOV8_DEFAULT_WEIGHTS will be used. If it is set to None, the Yolov8n trained on coco/imagenet will be used. Format them as timestamp1, timestamp2, timestamp3, ...
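
As an illustration of how these variables can be read with their defaults, consider the sketch below. It only mirrors the description above and is not the content of `./api/config.py`; `load_settings` and the keys of the returned dictionary are hypothetical.

```python
import os


def load_settings(env=None):
    """Read the settings described above from environment variables."""
    env = os.environ if env is None else env
    weights = env.get("YOLOV8_DEFAULT_WEIGHTS")
    return {
        "data_path": env.get("DATA_PATH", "./data"),
        "models_path": env.get("MODELS_PATH", "./models"),
        "remote_path": env.get("REMOTE_PATH"),
        "task_type": env.get("YOLOV8_DEFAULT_TASK_TYPE"),
        # 'timestamp1, timestamp2, ...' becomes a list; None when unset.
        "default_weights": [w.strip() for w in weights.split(",")] if weights else None,
    }


settings = load_settings({"YOLOV8_DEFAULT_WEIGHTS": "timestamp1, timestamp2"})
print(settings["data_path"], settings["default_weights"])
```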

# Track your experiments with MLflow
If you want to use MLflow to track and log your experiments, you should first set the following environment variables:
- `MLFLOW_TRACKING_URI`
- `MLFLOW_TRACKING_USERNAME`
- `MLFLOW_TRACKING_PASSWORD`
- `MLFLOW_EXPERIMENT_NAME` (for the first experiment)

Optional variables:

- `MLFLOW_RUN`
- `MLFLOW_RUN_DESCRIPTION`
- `MLFLOW_AUTHOR`
- `MLFLOW_MODEL_NAME`: This name will be used as the name for your model registered in the MLflow Registry.

Then set the argument `Enable_MLFLOW` to `True` when launching the training.


# Dataset Preparation
- Detection (det), oriented bounding box detection (obb) and segmentation (seg) tasks:

    - To train the yolov8 model, your annotations should be saved in YOLO format (.txt). Please organize your data in the following structure:
```

│
└── my_dataset
    ├──  train
    │    ├── imgs
    │    │   ├── img1.jpg
    │    │   ├── img2.jpg
    │    │   ├── ...
    │    ├── labels
    │    │   ├── img1.txt
    │    │   ├── img2.txt
    │    │   ├── ...
    │    
    ├── val    
    │    ├── imgs
    │    │   ├── img_1.jpg
    │    │   ├── img_2.jpg
    │    │   ├── ...
    │    ├── labels
    │    │   ├── img_1.txt
    │    │   ├── img_2.txt
    │    │   ├── ...
    │    
    ├── test    
    │    ├── imgs
    │    │   ├── img_1.jpg
    │    │   ├── img_2.jpg
    │    │   ├── ...
    │    ├── labels
    │    │   ├── img_1.txt
    │    │   ├── img_2.txt
    │    │   ├── ...
    │    
    └── config.yaml
```
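
A quick way to check this structure is to list the images that have no label file with the same name. The sketch below assumes the `imgs` and `labels` folder names shown above and `.jpg` images; `missing_labels` is a hypothetical helper. An image without a label file is not necessarily an error, since it may be an image with no object to annotate.

```python
from pathlib import Path


def missing_labels(split_dir):
    """Return the names of .jpg images in split_dir/imgs without a .txt label."""
    split_dir = Path(split_dir)
    label_stems = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(
        p.name for p in (split_dir / "imgs").glob("*.jpg")
        if p.stem not in label_stems
    )


# Example usage (the path is a placeholder):
# print(missing_labels("path/to/my_dataset/train"))
```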

The `config.yaml` file contains the following information about the data:

```yaml
# Images and labels directories should be inside the 'deep-species-detection/data' directory.
train: 'path/to/my_dataset/train/imgs'
val: 'path/to/my_dataset/val/imgs'
test: 'path/to/my_dataset/test/imgs' #optional
# Class names.
names:
    0: class1
    1: class2
    # ...

# Number of classes.
nc: n
```
The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively.
`names` is a dictionary of class names. The order of the names should match the order of the object class indices in the YOLO dataset files.

><span style="color:Blue">**Note:**</span> The train and val paths should be either complete paths or paths relative to the data directory, e.g. `root/path/to/mydata/train/images`, or just `mydata/train/images` if the dataset is in `path/to/deep-species-detection/data/raw`.


-  Classification Task (cls):
For the classification task, the dataset format should be as follows:
```
data/
|-- class1/
|   |-- img1.jpg
|   |-- img2.jpg
|   |-- ...
|
|-- class2/
|   |-- img1.jpg
|   |-- img2.jpg
|   |-- ...
|
|-- class3/
|   |-- img1.jpg
|   |-- img2.jpg
|   |-- ...
|
|-- ...
```
><span style="color:Blue">**Note:**</span>  For the classification task, you don't need the config.yaml file. Simply provide the path to the data directory in the data argument for training.

><span style="color:Blue">**Note:**</span>  If you have annotation files in Coco json format or Pascal VOC xml format, you can use the following scripts to convert them to the proper format for yolo.
``` 
deep-species-detection/yolov8/seg_coco_json_to_yolo.py #for segmentation
deep-species-detection/yolov8/preprocess_ann.py #For detection
``` 
# Available Models

The Ultralytics YOLOv8 model can be trained for multiple tasks, including classification, detection, and segmentation.
To train the model for your project, select one of the task_type options in the training arguments, and the corresponding model will be loaded and trained.
For each task, you can select the model argument among the following options:

``` 
"yolov8n.yaml",
"yolov8n.pt",
"yolov8s.yaml",
"yolov8s.pt",
"yolov8m.yaml",
"yolov8m.pt",
"yolov8l.yaml",
"yolov8l.pt",
"yolov8x.yaml",
"yolov8x.pt",
```
`yolov8X.yaml` builds a model from scratch and
`yolov8X.pt` loads a pretrained model (recommended for training).

# Launching the API

To train the model, run:
```
deepaas-run --listen-ip 0.0.0.0
```
Then, open the Swagger interface, change the hyperparameters in the train section, and click on train.

><span style="color:Blue">**Note:**</span>  Please note that the model training process may take some time depending on the size of your dataset and the complexity of your custom backbone. Once the model is trained, you can use the API to perform inference on new images.

><span style="color:Blue">**Note:**</span> Augmentation Settings:
Among the training arguments, there are options related to augmentation, such as flipping, scaling, etc. Some of these options are activated by default during training. If you want to disable augmentation entirely or partially, please review the default values and adjust them to deactivate the augmentations you do not want.

# Inference Methods

You can utilize the Swagger interface to upload your images or videos and obtain the following outputs:

- For images:

    - An annotated image highlighting the object of interest with a bounding box.
    - A JSON string providing the coordinates of the bounding box, the object's name within the box, and the confidence score of the object detection.

- For videos:

    - A video with bounding boxes delineating objects of interest throughout.
    - A JSON string accompanying each frame, supplying bounding box coordinates, object names within the boxes, and confidence scores for the detected objects.

