<img src="https://www.luxonis.com/logo.svg" width="400">

# DataDreamer Tutorial: Generating a dataset for object detection, training a model, and deploying it to the OAK

Install the required dependencies.

In [None]:
!pip install datadreamer


## 🗃️ Generate a dataset with your own classes (might take some time to download all models)

Make sure you are using the GPU runtime type (in Google Colab).

~8 min to generate 100 images

~2 min to annotate them

In [None]:
!datadreamer --save_dir generated_dataset \
             --class_names robot tractor horse car person bear \
             --prompts_number 100 \
             --disable_lm_filter \
             --prompt_generator simple \
             --num_objects_range 2 3 \
             --image_generator sdxl-turbo \
             --use_tta \
             --image_annotator owlv2 \
             --conf_threshold 0.15 \
             --vis_anns \
             --seed 42

### Parameters
- `--save_dir` (required): Path to the directory for saving generated images and annotations.
- `--class_names` (required): Space-separated list of object names for image generation and annotation. Example: `person moon robot`.
- `--prompts_number` (optional): Number of prompts to generate for each object. Defaults to `10`.
- `--annotate_only` (optional): Only annotate existing images without generating new ones; the prompt and image generators are skipped. Defaults to `False`.
- `--task`: Choose between `detection`, `classification` and `instance-segmentation`. Default is `detection`.
- `--dataset_format`: Format of the dataset. Defaults to `raw`. Supported values: `raw`, `yolo`, `coco`, `voc`, `luxonis-dataset`, `cls-single`.
- `--split_ratios`: Split ratios for train, validation, and test sets. Defaults to `[0.8, 0.1, 0.1]`.
- `--num_objects_range`: Minimum and maximum number of objects in a prompt. Default is 1 to 3.
- `--prompt_generator`: Choose between `simple`, `lm` (Mistral-7B), `tiny` (tiny LM), and `qwen2` (Qwen2.5 LM). Default is `qwen2`.
- `--image_generator`: Choose image generator, e.g., `sdxl`, `sdxl-turbo`, `sdxl-lightning` or `shuttle-3`. Default is `sdxl-turbo`.
- `--image_annotator`: Specify the image annotator: `owlv2` for object detection, `aimv2` or `clip` for image classification, and `owlv2-slimsam` or `owlv2-sam2` for instance segmentation. Default is `owlv2`.
- `--conf_threshold`: Confidence threshold for annotation. Default is `0.15`.
- `--annotation_iou_threshold`: Intersection over Union (IoU) threshold for annotation. Default is `0.2`.
- `--prompt_prefix`: Prefix to add to every image generation prompt. Default is `""`.
- `--prompt_suffix`: Suffix to add to every image generation prompt, e.g., for adding details like resolution. Default is `", hd, 8k, highly detailed"`.
- `--negative_prompt`: Negative prompts to guide the generation away from certain features. Default is `"cartoon, blue skin, painting, scrispture, golden, illustration, worst quality, low quality, normal quality:2, unrealistic dream, low resolution,  static, sd character, low quality, low resolution, greyscale, monochrome, nose, cropped, lowres, jpeg artifacts, deformed iris, deformed pupils, bad eyes, semi-realistic worst quality, bad lips, deformed mouth, deformed face, deformed fingers, bad anatomy"`.
- `--use_tta`: Toggle test time augmentation for object detection. Default is `False`.
- `--synonym_generator`: Enhance class names with synonyms. Default is `none`. Other options are `llm`, `wordnet`.
- `--use_image_tester`: Use image tester for image generation. Default is `False`.
- `--image_tester_patience`: Patience level for image tester. Default is `1`.
- `--lm_quantization`: Quantization to use for the Mistral language model. Choose between `none` and `4bit`. Default is `none`.
- `--annotator_size`: Size of the annotator model to use. Choose between `base` and `large`. Default is `base`.
- `--disable_lm_filter`: Use only a bad word list for profanity filtering. Default is `False`.
- `--keep_unlabeled_images`: Whether to keep images without any annotations. Default is `False`.
- `--batch_size_prompt`: Batch size for prompt generation. Default is `64`.
- `--batch_size_annotation`: Batch size for annotation. Default is `1`.
- `--batch_size_image`: Batch size for image generation. Default is `1`.
- `--raw_mask_format`: Format of segmentation masks when saved in the raw dataset format. Default is `rle`.
- `--vis_anns`: Whether to save visualizations of annotations. Default is `False`.
- `--device`: Choose between `cuda` and `cpu`. Default is `cuda`.
- `--seed`: Set a random seed for image and prompt generation. Default is `42`.
- `--config`: A path to an optional `.yaml` config file specifying the pipeline's arguments.
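
When sweeping over class lists or seeds, it can be handy to assemble the command line programmatically. The following is a minimal sketch using only the standard library. The helper `build_datadreamer_command` is hypothetical (it is not part of DataDreamer) and only uses flags listed above.

```python
import shlex


def build_datadreamer_command(save_dir, class_names, **options):
    """Assemble a `datadreamer` argument list from keyword options.

    Hypothetical helper, not part of DataDreamer: True becomes a bare flag,
    False/None is skipped, and lists/tuples expand to space-separated values.
    """
    cmd = ["datadreamer", "--save_dir", str(save_dir), "--class_names", *class_names]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            cmd.append(flag)
        elif value is False or value is None:
            continue
        elif isinstance(value, (list, tuple)):
            cmd.extend([flag, *map(str, value)])
        else:
            cmd.extend([flag, str(value)])
    return cmd


cmd = build_datadreamer_command(
    "generated_dataset",
    ["robot", "tractor"],
    prompts_number=100,
    num_objects_range=(2, 3),
    use_tta=True,
    seed=42,
)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `datadreamer` is installed.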


In [None]:
import os

from IPython.display import Image

Image(filename=os.path.join("generated_dataset/bboxes_visualization", "bbox_0000070.jpg"))
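
The cell above points at one specific file name, which may not exist in every run. As a hedged alternative, the sketch below lists whatever visualizations are present so that one can be picked by index. The helper name `list_visualizations` is hypothetical, and the `*.jpg` pattern is an assumption based on the file name used above.

```python
from pathlib import Path


def list_visualizations(vis_dir, pattern="*.jpg"):
    """Return visualization image paths under vis_dir, sorted by file name."""
    return sorted(Path(vis_dir).glob(pattern))


vis_files = list_visualizations("generated_dataset/bboxes_visualization")
print(f"Found {len(vis_files)} visualization(s)")
```

If the list is not empty, `Image(filename=str(vis_files[0]))` displays the first one.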

## ✍ Convert the dataset to YOLO format

In [None]:
from datadreamer.utils.convert_dataset import convert_dataset

In [None]:
convert_dataset(
    input_dir="generated_dataset",
    output_dir="generated_dataset_yolo",
    dataset_format="yolo",
    split_ratios=[0.8, 0.1, 0.1],
    copy_files=True,
)
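
In the YOLO detection format, each image has a text file with one line per object: the class index followed by the normalized box center x, center y, width, and height. The sketch below converts one such line to pixel corner coordinates. The function name `yolo_line_to_xyxy` is hypothetical and only illustrates the format.

```python
def yolo_line_to_xyxy(line, img_width, img_height):
    """Convert 'class xc yc w h' (normalized) to pixel corner coordinates.

    Returns (class_id, x_min, y_min, x_max, y_max).
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x_min = (xc - w / 2) * img_width
    y_min = (yc - h / 2) * img_height
    x_max = (xc + w / 2) * img_width
    y_max = (yc + h / 2) * img_height
    return class_id, x_min, y_min, x_max, y_max


# A box centered in a 512x512 image, a quarter of the width and half the height.
print(yolo_line_to_xyxy("2 0.5 0.5 0.25 0.5", img_width=512, img_height=512))
# → (2, 192.0, 128.0, 320.0, 384.0)
```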

In [None]:
!ls generated_dataset_yolo
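
As a quick sanity check on the split, one can count how many images ended up under each top-level folder. This is a minimal sketch: the exact folder layout written by `convert_dataset` is not shown in this tutorial, so the hypothetical helper `count_images_per_dir` does not assume one and simply groups image files by their first path component.

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_dir(root):
    """Count image files under root, keyed by the top-level subdirectory."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            rel = path.relative_to(root)
            # Files directly in root are grouped under ".".
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return dict(counts)


print(count_images_per_dir("generated_dataset_yolo"))
```

With `split_ratios=[0.8, 0.1, 0.1]`, the counts should be roughly in an 8:1:1 proportion.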

## 🏋️‍♂️ Train your model (YOLOv8 as an example)

In [None]:
!pip install ultralytics

In [None]:
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # load a pretrained model

In [None]:
results = model.train(data="generated_dataset_yolo/data.yaml", epochs=50)

### 🧠 Show the predictions

In [None]:
Image(filename=os.path.join(results.save_dir, "val_batch0_pred.jpg"))

In [None]:
metrics = model.val()

## 💾 Weights Download

By default, the library saves the best-performing weights to the `best.pt` file inside the corresponding run folder. We'll copy it under a more descriptive name.

In [None]:
!cp runs/detect/train/weights/best.pt yolov8n_trained_datadreamer.pt
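
Ultralytics numbers repeated runs (`train`, `train2`, and so on), so the hard-coded path above can point at an older run if the training cell was executed more than once. The sketch below picks the most recently modified `best.pt` instead. The helper `latest_best_weights` is hypothetical, and the default `runs/detect` root is taken from the path used above.

```python
from pathlib import Path


def latest_best_weights(runs_root="runs/detect"):
    """Return the most recently modified `best.pt` under runs_root, or None."""
    candidates = list(Path(runs_root).glob("*/weights/best.pt"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_best_weights())
```

The returned path can then be copied with `shutil.copy(path, "yolov8n_trained_datadreamer.pt")`. Within this notebook, `results.save_dir` also identifies the folder of the latest run.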

We'll download the weights to convert them in the next step.

In [None]:
from google.colab import files
files.download("yolov8n_trained_datadreamer.pt")

<a name="conversion"></a>

## 🗂️ Conversion

Now that we have successfully trained the model, we aim to deploy it to the Luxonis device. The model's specific format depends on the Luxonis device series you have. We will show you how to use our [`ModelConverter`](https://github.com/luxonis/modelconverter) to convert the model as simply as possible.

We'll start by installing the `ModelConverter`.

In [None]:
%pip install -q modelconv==0.4.0 -U


We will use the `ModelConverter` Python API, which leverages our [`HubAI`](https://hub.luxonis.com) platform to perform model conversion in the background. To get started, you'll need to create an account on `HubAI` and obtain your team’s API key.

In [None]:
HUBAI_API_KEY = "<YOUR_HUBAI_API_KEY>"
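
To avoid storing the key in the notebook itself, it can be read from an environment variable instead. This is a minimal sketch: the helper `read_api_key` is hypothetical, and the variable name `HUBAI_API_KEY` is an assumption chosen to match the Python variable above.

```python
import os


def read_api_key(var_name="HUBAI_API_KEY", placeholder="<YOUR_HUBAI_API_KEY>"):
    """Read the key from the environment; fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key or key == placeholder:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

Usage: `HUBAI_API_KEY = read_api_key()`.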

Model conversion can be done via either the CLI or the Python API — here, we'll use the latter. For more information, see the [online usage section](https://github.com/luxonis/modelconverter?tab=readme-ov-file#online-usage) of the documentation.

The call below creates a new model card within your team on `HubAI`, uploads the model file and metadata, then performs cloud-side conversion to the selected target platform (e.g., [`RVC2`](https://rvc4.docs.luxonis.com/hardware/platform/rvc/rvc2/), [`RVC4`](https://rvc4.docs.luxonis.com/hardware/platform/rvc/rvc4/)). Once completed, the converted model is automatically downloaded to your local machine.

For HubAI-specific conversion parameters, refer to the [online conversion section](https://github.com/luxonis/modelconverter/tree/e6a3478ba47d8f92d4d60217f2aee0f4f468cb14/modelconverter/hub#online-conversion) of the ModelConverter documentation. Platform-specific parameters are also documented there.


In [None]:
from modelconverter import convert

labels = ["robot", "tractor", "horse", "car", "person", "bear"]

converted_model = convert.RVC2(
    api_key=HUBAI_API_KEY,
    path="yolov8n_trained_datadreamer.pt",
    name="YOLOv8n DataDreamer",
    description_short="Detection model trained on a synthetic dataset generated with DataDreamer.",
    yolo_version="yolov8",
    yolo_input_shape="512 288",
    yolo_class_names=labels,
    tasks=["OBJECT_DETECTION"],
    license_type="MIT",
    is_public=False
)
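
YOLOv8 downsamples its input by a factor of up to 32, so input sides are normally chosen as multiples of 32 (here 512 = 16 × 32 and 288 = 9 × 32). The sketch below checks a shape string before it is sent for conversion. The helper `parse_input_shape` is hypothetical, and reading the string as width followed by height is an assumption based on the `(512, 288)` camera output size requested in the DepthAI script later in this tutorial.

```python
def parse_input_shape(shape, stride=32):
    """Parse a 'W H' string and check both sides are positive multiples of stride."""
    width, height = (int(v) for v in shape.split())
    for side_name, side in (("width", width), ("height", height)):
        if side <= 0 or side % stride != 0:
            raise ValueError(f"{side_name}={side} is not a positive multiple of {stride}")
    return width, height


print(parse_input_shape("512 288"))
# → (512, 288)
```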

We have successfully converted our trained model for an RVC2 device, so let's test it! Please copy the path to the downloaded archive with the converted model from the output log of the last code cell; we will use it in the next section.

In [None]:
MODEL_PATH = "<YOUR_DOWNLOADED_MODEL_ARCHIVE_PATH>"
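
Before moving to the device, it can be useful to confirm that the archive carries the expected class names. The sketch below rests on two assumptions that should be checked against your own archive: that it is a tar file containing a JSON configuration named `config.json`, and that the JSON layout mirrors the attribute path `model.heads[0].metadata.classes` read by the DepthAI script in the next section. The helper `read_archive_classes` is hypothetical.

```python
import json
import tarfile


def read_archive_classes(archive_path, config_name="config.json"):
    """Return the class list of the first head from the archive's JSON config."""
    # "r:*" lets tarfile detect the compression (e.g. xz or gzip) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        member = next(
            (m for m in tar.getmembers() if m.name.endswith(config_name)), None
        )
        if member is None:
            raise FileNotFoundError(f"{config_name} not found in {archive_path}")
        config = json.load(tar.extractfile(member))
    return config["model"]["heads"][0]["metadata"]["classes"]
```

Usage: `print(read_archive_classes(MODEL_PATH))` should list the six class names passed to the conversion call.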

<a name="depthai-script"></a>

## 📷 DepthAI Script

To test our model on one of our cameras, we first need to install [`DepthAI`](https://rvc4.docs.luxonis.com/software/) version 3 and [`DepthAI Nodes`](https://rvc4.docs.luxonis.com/software/ai-inference/depthai-nodes/). Note that the script below must run locally and requires a Luxonis device connected to your machine.

In [None]:
%pip install -q depthai==3.0.0rc2 -U
%pip install -q depthai-nodes==0.3.0 -U

To run the model on a DepthAI device using the script below, please note the following:

- You can view the output stream by opening [http://localhost:8082](http://localhost:8082) in your browser.

- If you're running the script from a Jupyter Notebook, the output may not appear directly within the notebook. The script should print a link pointing to [http://localhost:8082](http://localhost:8082) for accessing the stream.

- To stop the video stream, press **`q`** while focused on the visualizer page.

In [None]:
DEVICE = None  # Keep None to use the default device, or set a device IP address as a string

In [None]:
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork, ImgDetectionsBridge

device = dai.Device(dai.DeviceInfo(DEVICE)) if DEVICE else dai.Device()
platform = device.getPlatform()
img_frame_type = dai.ImgFrame.Type.BGR888i if platform.name == "RVC4" else dai.ImgFrame.Type.BGR888p
visualizer = dai.RemoteConnection(httpPort=8082)

with dai.Pipeline(device) as pipeline:
    cam = pipeline.create(dai.node.Camera).build()
    nn_archive = dai.NNArchive(MODEL_PATH)
    # Create the neural network node
    nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
        cam.requestOutput((512, 288), type=img_frame_type, fps=30),
        nn_archive
    )

    # Bridge the detections to the visualizer
    # Map class indices to class names taken from the archive configuration
    label_encoding = dict(enumerate(nn_archive.getConfig().model.heads[0].metadata.classes))
    bridge = pipeline.create(ImgDetectionsBridge).build(nn_with_parser.out)
    bridge.setLabelEncoding(label_encoding)

    # Configure the visualizer node
    visualizer.addTopic("Video", nn_with_parser.passthrough, "images")
    visualizer.addTopic("Detections", bridge.out, "detections")

    pipeline.start()
    visualizer.registerPipeline(pipeline)

    while pipeline.isRunning():
        key = visualizer.waitKey(1)
        if key == ord("q"):
            print("Got q key from the remote connection!")
            break