# Introduction
This notebook demonstrates how to collect images, prepare a dataset, and train a model using the Bplusplus library.

The steps include:
1. Installing the required packages.
2. Importing the necessary modules.
3. Setting up the directories for data storage.
4. Collecting insect images from the Global Biodiversity Information Facility (GBIF).
5. Preparing the collected images for training.
6. Training a YOLO model on the prepared dataset.
7. Validating the trained model.




## Create a virtual environment (recommended)
A virtual environment keeps this notebook's dependencies isolated from your other Python projects and avoids version conflicts.

To create one, open your terminal and run the following commands:

```bash
python3 -m venv bplusplus_env
source bplusplus_env/bin/activate
```

These commands create and activate a virtual environment named `bplusplus_env` on Linux and macOS. On Windows, activate it with `bplusplus_env\Scripts\activate` instead.
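
A notebook kernel does not always use the interpreter that was activated in the terminal, so it can be worth confirming from inside the notebook that the environment is active. The check below is a minimal sketch, not part of Bplusplus; `in_virtualenv` is a hypothetical helper name, and the comparison only detects `venv`-style environments (it will not recognise a conda environment, for example).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtualenv())
print("Interpreter in use:", sys.executable)
```

If this prints `False`, the kernel is probably running on a different interpreter than the one in `bplusplus_env`.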

## Install required packages

In [None]:
!pip install git+https://github.com/Tvenver/Bplusplus.git@collect-prepare-train  # change to bplusplus once the branch is merged

## Import required packages

In [None]:
import bplusplus
import prettytable
from typing import Any
from pathlib import Path

## Set directories

In [None]:
MAIN_DIR = Path("/path/to/main_dir")

GBIF_DATA_DIR = MAIN_DIR / "GBIF_data"
PREPARED_DATA_DIR = MAIN_DIR / "prepared_data"
TRAINED_MODEL_DIR = MAIN_DIR / "trained_model"
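
The notebook does not say whether Bplusplus creates missing output directories itself, so creating them up front is a harmless precaution. The sketch below uses a temporary folder as a stand-in for `MAIN_DIR` (an assumption made only so the example runs anywhere); the subfolder names match the cell above.

```python
import tempfile
from pathlib import Path

# Stand-in root for this sketch only; in the notebook, use your own MAIN_DIR.
main_dir = Path(tempfile.mkdtemp()) / "bplusplus_demo"

data_dirs = {
    "gbif": main_dir / "GBIF_data",
    "prepared": main_dir / "prepared_data",
    "model": main_dir / "trained_model",
}

for directory in data_dirs.values():
    # parents=True also creates main_dir; exist_ok=True makes reruns safe.
    directory.mkdir(parents=True, exist_ok=True)

for name, directory in data_dirs.items():
    print(f"{name}: {directory} (exists: {directory.is_dir()})")
```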

## Collect insect images from GBIF

In [None]:
names = [
    "Nabis rugosus",
    "Forficula auricularia",
    "Calosoma inquisitor"
]

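# GBIF search parameters; here the search is restricted to the species listed above.
search: dict[str, Any] = {
    "scientificName": names
}

# Collect 50 images per group (one group per scientific name) into GBIF_DATA_DIR.
bplusplus.collect(
    group_by_key=bplusplus.Group.scientificName,
    search_parameters=search,
    images_per_group=50,
    output_directory=GBIF_DATA_DIR
)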
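
Before preparing the dataset, it can help to check how many images were actually downloaded for each species. The helper below is a rough sketch that assumes the collected images end up in one subfolder per group inside the output directory; that layout is an assumption, not something this notebook confirms, and `count_images_per_group` is a hypothetical name. The demo builds a tiny fake folder tree so the example runs without downloading anything.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_group(root: Path) -> dict[str, int]:
    """Return {subfolder name: number of image files directly inside it}."""
    counts: dict[str, int] = {}
    for group_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[group_dir.name] = sum(
            1
            for f in group_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Demo on a fake layout (assumed structure: one folder per species).
demo_root = Path(tempfile.mkdtemp())
(demo_root / "Nabis rugosus").mkdir()
(demo_root / "Nabis rugosus" / "0001.jpg").write_bytes(b"")
(demo_root / "Forficula auricularia").mkdir()

for group, n_images in count_images_per_group(demo_root).items():
    print(f"{group}: {n_images} image(s)")
```

In the notebook, calling `count_images_per_group(GBIF_DATA_DIR)` would flag species with few or no images, provided the folder layout matches the assumption above.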


## Prepare the dataset for training (YOLOv8)

In [None]:
bplusplus.prepare(
    input_directory=GBIF_DATA_DIR,
    output_directory=PREPARED_DATA_DIR,
    with_background=True  # Set to False to skip including/downloading background images
)
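
The training step reads `dataset.yaml` from the prepared data directory, so a quick existence check can catch a failed preparation run before training starts. This is a minimal sketch with a hypothetical helper name, `require_dataset_yaml`; it only checks that the file is present, not that its contents are valid. The demo uses a temporary folder in place of `PREPARED_DATA_DIR`.

```python
import tempfile
from pathlib import Path


def require_dataset_yaml(prepared_dir: Path) -> Path:
    """Return the path to dataset.yaml, or raise if it is missing."""
    yaml_path = prepared_dir / "dataset.yaml"
    if not yaml_path.is_file():
        raise FileNotFoundError(
            f"{yaml_path} not found; rerun the preparation step first."
        )
    return yaml_path


# Demo with a temporary stand-in for PREPARED_DATA_DIR.
demo_dir = Path(tempfile.mkdtemp())
try:
    require_dataset_yaml(demo_dir)
except FileNotFoundError as err:
    print("Check failed as expected:", err)

(demo_dir / "dataset.yaml").write_text("# placeholder\n")
print("Found:", require_dataset_yaml(demo_dir))
```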

## Train the model

In [None]:
model = bplusplus.train(
    input_yaml=str(PREPARED_DATA_DIR / "dataset.yaml"),
    output_directory=str(TRAINED_MODEL_DIR),  # Directory to save the trained model (default: ./)
    # Optional inputs, shown with their defaults; uncomment to override:
    # epochs=30,  # Number of epochs to train the model
    # imgsz=640,  # Image size for training
    # batch=16,  # Batch size for training
)

## Validate the model

In [None]:
metrics = bplusplus.validate(model, str(PREPARED_DATA_DIR / "dataset.yaml"))
print(metrics)