B++ repository

This project provides an end-to-end pipeline for building a custom insect classification system. The framework is species-agnostic: you can train a detection and classification model for any set of insect species by providing a list of scientific names.

Using the Bplusplus library, this pipeline automates the entire machine learning workflow, from data collection to video inference.

Key Features

Automated Data Collection: Downloads hundreds of images for any species from the GBIF database.
Intelligent Data Preparation: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
Hierarchical Classification: Trains a model to identify insects at three taxonomic levels: family, genus, and species.
Video Inference & Tracking: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.

Pipeline Overview

The process is broken down into six main steps, all detailed in the full_pipeline.ipynb notebook:

Collect Data: Select your target species and fetch raw insect images from the web.
Prepare Data: Filter, clean, and prepare images for training.
Train Model: Train the hierarchical classification model.
Download Weights: Fetch pre-trained weights for the detection model.
Test Model: Evaluate the performance of the trained model.
Run Inference: Run the full pipeline on a video file for real-world application.

How to Use

Prerequisites

Python 3.10+

Setup

Create and activate a virtual environment:

```
python3 -m venv venv
source venv/bin/activate
```

Install the required packages:
```
pip install bplusplus
```
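
Before running the pipeline, it can help to confirm that the environment meets the prerequisites. The following is a minimal sketch, not part of the bplusplus library: `check_environment` is a hypothetical helper that compares the interpreter version against the 3.10 minimum stated above and checks whether a package named `bplusplus` is importable.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version from the Prerequisites section


def check_environment(min_python=MIN_PYTHON):
    """Return a dict describing whether the prerequisites look satisfied.

    Hypothetical helper for illustration; it is not provided by bplusplus.
    """
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when a top-level package cannot be found
        "bplusplus_installed": importlib.util.find_spec("bplusplus") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```

If either check reports `MISSING`, revisit the setup steps above before continuing.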

Running the Pipeline

The pipeline can be run step-by-step using the functions from the bplusplus library. While the full_pipeline.ipynb notebook provides a complete, executable workflow, the core functions are described below.

Step 1: Collect Data

Download images for your target species from the GBIF database. You'll need to provide a list of scientific names.

```python
import bplusplus
from pathlib import Path

# Define species and directories
names = ["Vespa crabro", "Vespula vulgaris", "Dolichovespula media"]
GBIF_DATA_DIR = Path("./GBIF_data")

# Define search parameters
search = {"scientificName": names}

# Run collection
bplusplus.collect(
    group_by_key=bplusplus.Group.scientificName,
    search_parameters=search,
    images_per_group=200,  # Recommended to download more than needed
    output_directory=GBIF_DATA_DIR,
    num_threads=5
)
```
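
After collection, you may want to see how many images actually arrived for each species. The sketch below is an illustration only. It assumes the output directory contains one subfolder per group (here, per scientific name), which this README does not state explicitly, and the helper names `count_images_per_group` and `groups_below_target` are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_group(root):
    """Map each immediate subfolder of `root` to its number of image files.

    Assumes one subfolder per group; adjust if your layout differs.
    """
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for f in folder.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def groups_below_target(counts, expected_names, minimum):
    """Return the expected names with fewer than `minimum` images.

    A name with no folder at all counts as zero images.
    """
    return [name for name in expected_names if counts.get(name, 0) < minimum]


if __name__ == "__main__":
    root = Path("./GBIF_data")
    if root.exists():
        counts = count_images_per_group(root)
        print(counts)
        print("Below target:", groups_below_target(counts, list(counts), minimum=100))
```

The `minimum=100` threshold is arbitrary; choose a value that fits your own training needs.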

Step 2: Prepare Data

Process the raw images to extract, crop, and resize insects. This step uses a pre-trained model to ensure only high-quality images are used for training.

```python
PREPARED_DATA_DIR = Path("./prepared_data")

bplusplus.prepare(
    input_directory=GBIF_DATA_DIR,
    output_directory=PREPARED_DATA_DIR,
    img_size=640  # Target image size for training
)
```

Step 3: Train Model

Train the hierarchical classification model on your prepared data. The model learns to identify family, genus, and species.

```python
TRAINED_MODEL_DIR = Path("./trained_model")

bplusplus.train(
    batch_size=4,
    epochs=30,
    patience=3,
    img_size=640,
    data_dir=PREPARED_DATA_DIR,
    output_dir=TRAINED_MODEL_DIR,
    species_list=names
)
```

Step 4: Download Detection Weights

The inference pipeline uses a separate, pre-trained YOLO model for initial insect detection. You need to download its weights manually.

You can download the weights file from this link.

Place it in the trained_model directory and ensure it is named yolo_weights.pt.
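
Because the weights are placed by hand, a quick check that the expected files are present can catch a misnamed or misplaced file before inference starts. This is a minimal sketch using only the file names mentioned in this README (`yolo_weights.pt` and `best_multitask.pt`); `find_missing_files` is a hypothetical helper, not a bplusplus function.

```python
from pathlib import Path


def find_missing_files(model_dir, required=("yolo_weights.pt",)):
    """Return the required file names that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = find_missing_files(
        "./trained_model",
        required=("yolo_weights.pt", "best_multitask.pt"),
    )
    if missing:
        print("Missing from trained_model/:", ", ".join(missing))
    else:
        print("All expected weight files are in place.")
```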

Step 5: Run Inference on Video

Process a video file to detect, classify, and track insects. The final output is an annotated video and a CSV file with aggregated results for each tracked insect.

```python
VIDEO_INPUT_PATH = Path("my_video.mp4")
VIDEO_OUTPUT_PATH = Path("my_video_annotated.mp4")
HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"
YOLO_WEIGHTS_PATH = TRAINED_MODEL_DIR / "yolo_weights.pt"

bplusplus.inference(
    species_list=names,
    yolo_model_path=YOLO_WEIGHTS_PATH,
    hierarchical_model_path=HIERARCHICAL_MODEL_PATH,
    confidence_threshold=0.35,
    video_path=VIDEO_INPUT_PATH,
    output_path=VIDEO_OUTPUT_PATH,
    tracker_max_frames=60,
    fps=15  # Optional: set processing FPS
)
```
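
To give an intuition for what "aggregated results for each tracked insect" can mean, the sketch below combines per-frame predictions into one label per track. This README does not describe how bplusplus aggregates predictions internally, so this is an assumption for illustration only: the confidence-weighted vote, the `(track_id, label, confidence)` tuple layout, and the name `aggregate_tracks` are all hypothetical.

```python
from collections import defaultdict


def aggregate_tracks(frame_predictions):
    """Pick one label per track by summing confidences across frames.

    `frame_predictions` is an iterable of (track_id, label, confidence)
    tuples. This is an illustrative rule, not the library's method.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for track_id, label, confidence in frame_predictions:
        scores[track_id][label] += confidence
    # For each track, keep the label with the highest total confidence
    return {
        track_id: max(label_scores, key=label_scores.get)
        for track_id, label_scores in scores.items()
    }


if __name__ == "__main__":
    predictions = [
        (1, "Vespa crabro", 0.9),
        (1, "Vespula vulgaris", 0.4),
        (2, "Vespula vulgaris", 0.6),
    ]
    print(aggregate_tracks(predictions))
```

Summing confidences lets a few high-confidence frames outweigh many uncertain ones; a plain majority vote is a simpler alternative.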

Customization

To train the model on your own set of insect species, you only need to change the names list in Step 1. The pipeline will automatically handle the rest.

```python
# To use your own species, change the names in this list
names = [
    "Vespa crabro",
    "Vespula vulgaris",
    "Dolichovespula media",
    # Add your species here
]
```

Handling an "Unknown" Class

To train a model that can recognize an "unknown" class for insects that don't belong to your target species, add "unknown" to your species_list. You must also provide a corresponding unknown folder containing images of various other insects in your data directories (e.g., prepared_data/train/unknown).

```python
# Example with an unknown class
names_with_unknown = [
    "Vespa crabro",
    "Vespula vulgaris",
    "unknown"
]
```
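
Since the unknown folder has to be supplied by hand, it is easy to start training with it missing or empty. The following sketch checks a split directory such as prepared_data/train for one non-empty folder per class. It assumes folder names match the entries of the species list exactly, which this README confirms only for the unknown folder, and `check_class_folders` is a hypothetical helper.

```python
from pathlib import Path


def check_class_folders(split_dir, class_names):
    """Return {class_name: problem} for classes with a missing or empty folder.

    Assumes one folder per class under `split_dir`, named after the class.
    """
    problems = {}
    for name in class_names:
        folder = Path(split_dir) / name
        if not folder.is_dir():
            problems[name] = "missing folder"
        elif not any(p.is_file() for p in folder.iterdir()):
            problems[name] = "no files"
    return problems


if __name__ == "__main__":
    names_with_unknown = ["Vespa crabro", "Vespula vulgaris", "unknown"]
    issues = check_class_folders("./prepared_data/train", names_with_unknown)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
```

An empty result means every class in the list has a folder with at least one file in it.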

Directory Structure

The pipeline will create the following directories to store artifacts:

GBIF_data/: Stores the raw images downloaded from GBIF.
prepared_data/: Contains the cleaned, cropped, and resized images ready for training.
trained_model/: Stores the trained model weights (best_multitask.pt) and the pre-trained detection weights (yolo_weights.pt).

Citation

All information in this GitHub repository is available under the MIT license, provided that credit is given to the authors.

Venverloo, T., Duarte, F., B++: Towards Real-Time Monitoring of Insect Species. MIT Senseable City Laboratory, AMS Institute.
