# Introduction
In this notebook, we demonstrate how to collect insect images, prepare them as a dataset, and train a model using the Bplusplus library.

The steps include:
1. Installing the required packages.
2. Importing the necessary modules.
3. Setting up the directories for data storage.
4. Collecting insect images from the Global Biodiversity Information Facility (GBIF).
5. Preparing the collected images for training.
6. Training a YOLO model on the prepared dataset.
7. Validating the trained model.




## Create a virtual environment (recommended)
A virtual environment keeps this notebook's dependencies isolated and avoids conflicts with other Python projects.

To create one, open a terminal and run the following commands (the activation command shown is for Linux and macOS shells):
 
```bash
python3 -m venv bplusplus_env
source bplusplus_env/bin/activate
```

This will create and activate a virtual environment named `bplusplus_env`.
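
To confirm that the notebook kernel is actually running inside the environment, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `in_virtual_environment` is made up for illustration, and the check applies to environments created with `venv` (a conda environment, for example, would report `False`).

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtual_environment())
```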

## Install required packages

In [None]:
# Uncomment the next line to install the package from within the notebook:
#!pip install bplusplus

## Import required packages

In [1]:
import bplusplus
from typing import Any
from pathlib import Path

## Set directories

In [2]:
MAIN_DIR = Path("/mnt/nvme1n1p1/datasets/sample")  # Change this to a directory on your machine

GBIF_DATA_DIR = MAIN_DIR / "GBIF_data"
PREPARED_DATA_DIR = MAIN_DIR / "prepared_data"
TRAINED_MODEL_DIR = MAIN_DIR / "trained_model"
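
Whether Bplusplus creates missing output directories on its own is not stated here, so it can be safer to create them up front. The sketch below is one way to do that with `pathlib`; `ensure_directories` is a hypothetical helper, not part of the library, and the demonstration runs in a temporary directory so that it has no side effects.

```python
import tempfile
from pathlib import Path


def ensure_directories(*directories: Path) -> None:
    """Create each directory (and any missing parents) if it does not exist."""
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)


# Demonstration in a throwaway temporary directory. In this notebook you
# would instead call:
#   ensure_directories(GBIF_DATA_DIR, PREPARED_DATA_DIR, TRAINED_MODEL_DIR)
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "sample"
    ensure_directories(
        base / "GBIF_data",
        base / "prepared_data",
        base / "trained_model",
    )
    print(sorted(p.name for p in base.iterdir()))
```

Because `exist_ok=True` is set, calling the helper again on existing directories does nothing, so the cell can be re-run safely.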

## Collect insect images from GBIF

In [4]:
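# Scientific names of the species to collect images for.
names = [
    "Coccinella septempunctata",
    "Apis mellifera",
    "Bombus lapidarius",
    "Bombus terrestris",
]

# GBIF search parameters.
search: dict[str, Any] = {
    "scientificName": names
}

bplusplus.collect(
    group_by_key=bplusplus.Group.scientificName,  # One group (folder) per scientific name
    search_parameters=search,
    images_per_group=50,  # Number of images to download per group
    output_directory=GBIF_DATA_DIR,
    num_threads=3,
)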


Thread 0 starting collection for 1 species.
Creating folders for images...
Thread 1 starting collection for 1 species.
Creating folders for images...
Beginning to collect images from GBIF...
Beginning to collect images from GBIF...
Thread 2 starting collection for 1 species.
Creating folders for images...
Beginning to collect images from GBIF...
Thread 3 starting collection for 1 species.
Creating folders for images...
Beginning to collect images from GBIF...



Downloading 50 images into the Bombus terrestris folder...


Downloading images for Bombus terrestris:  18%|█▊        | 9/50 [00:03<00:16,  2.54image/s]

Downloading 50 images into the Bombus lapidarius folder...


Downloading images for Bombus terrestris:  50%|█████     | 25/50 [00:11<00:08,  3.05image/s]

Downloading 50 images into the Coccinella septempunctata folder...

Downloading images for Bombus terrestris:  60%|██████    | 30/50 [00:16<00:13,  1.44image/s]

Downloading 50 images into the Apis mellifera folder...



Downloading images for Bombus terrestris:  66%|██████▌   | 33/50 [00:16<00:06,  2.43image/s]
Downloading images for Bombus terrestris:  70%|███████   | 35/50 [00:17<00:05,  2.80image/s]
Downloading images for Bombus terrestris:  78%|███████▊  | 39/50 [00:18<00:03,  3.15image/s]
Downloading images for Bombus terrestris:  82%|████████▏ | 41/50 [00:19<00:02,  4.12image/s]
Downloading images for Bombus terrestris:  88%|████████▊ | 44/50 [00:20<00:01,  3.46image/s]
Downloading images for Bombus terrestris:  90%|█████████ | 45/50 [00:20<00:02,  2.44image/s]
Downloading images for Bombus terrestris:  98%|█████████▊| 49/50 [00:22<00:00,  2.46image/s]
Downloading images for Bombus terrestris: 100%|██████████| 50/50 [00:23<00:00,  2.09image/s]


Finished collecting images.
Thread 3 finished collection.



Downloading images for Bombus lapidarius: 100%|██████████| 50/50 [00:30<00:00,  1.65image/s]


Finished collecting images.
Thread 2 finished collection.



Downloading images for Apis mellifera: 100%|██████████| 50/50 [00:58<00:00,  1.17s/image]


Finished collecting images.
Thread 1 finished collection.
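
Since downloads can be interrupted, it may be worth checking how many images actually landed in each folder before moving on. The sketch below assumes, based on the log messages above, that there is one subfolder per species; the helper `count_images_per_folder` and the list of image suffixes are illustrative assumptions, not part of Bplusplus.

```python
from pathlib import Path

# Assumed image file extensions; adjust if the downloads use other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_folder(root: Path) -> dict[str, int]:
    """Count image files in each immediate subfolder of root."""
    counts: dict[str, int] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Hypothetical relative path; in this notebook, use GBIF_DATA_DIR instead.
gbif_dir = Path("GBIF_data")
if gbif_dir.is_dir():
    for species, count in count_images_per_folder(gbif_dir).items():
        print(f"{species}: {count} images")
```

A folder with far fewer than the requested 50 images suggests that collection for that species should be re-run.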


## Prepare the dataset for training (YOLOv8)

In [None]:
bplusplus.prepare(
    input_directory=GBIF_DATA_DIR,
    output_directory=PREPARED_DATA_DIR,
    with_background=True,  # Set to False if you don't want to include/download background images
)
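
The training and validation steps in this notebook read `dataset.yaml` from the prepared data directory, so a quick existence check can catch a failed preparation step early. This is a minimal sketch; `require_dataset_yaml` is a hypothetical helper, and the relative path in the example is a placeholder.

```python
from pathlib import Path


def require_dataset_yaml(prepared_dir: Path) -> Path:
    """Return the path to dataset.yaml, or raise if it is missing."""
    yaml_path = prepared_dir / "dataset.yaml"
    if not yaml_path.is_file():
        raise FileNotFoundError(
            f"{yaml_path} not found; check that the preparation step completed"
        )
    return yaml_path


# Hypothetical relative path; in this notebook, use PREPARED_DATA_DIR instead.
prepared_dir = Path("prepared_data")
try:
    print("Found", require_dataset_yaml(prepared_dir))
except FileNotFoundError as error:
    print(error)
```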

## Train the model

In [None]:
model = bplusplus.train(
    input_yaml=str(PREPARED_DATA_DIR / "dataset.yaml"),
    output_directory=TRAINED_MODEL_DIR,  # Directory to save the trained model (default: ./)
    # Optional arguments, shown with their defaults:
    # epochs=30,  # Number of epochs to train the model
    # imgsz=640,  # Image size for training
    # batch=16,   # Batch size for training
)

## Validate the model

In [None]:
metrics = bplusplus.validate(model, str(PREPARED_DATA_DIR / "dataset.yaml"))
print(metrics)