# Convert BDD100K to YOLOV5 PyTorch / Scaled YOLOV4 / YOLOV4 / YOLOX

The Berkeley Deep Drive (BDD) dataset is one of the largest and most
diverse video datasets for autonomous vehicles.

The BDD100K dataset contains 100,000 video clips collected from more than
50,000 rides covering New York, San Francisco Bay Area, and other regions.
The dataset contains diverse scene types such as city streets, residential
areas, and highways. Furthermore, the videos were recorded in diverse
weather conditions at different times of the day.

The videos are split into training (70K), validation (10K) and testing
(20K) sets. Each video is 40 seconds long with 720p resolution and a frame
rate of 30fps. The frame at the 10th second of each video is annotated for
image classification, detection, and segmentation tasks.

In order to load the BDD100K dataset, you must download the source data
manually. The directory should be organized in the following format::

    source_dir/
        labels/
            bdd100k_labels_images_train.json
            bdd100k_labels_images_val.json
        images/
            100k/
                train/
                test/
                val/

You can register at https://bdd-data.berkeley.edu in order to get links to
download the data.

Example usage::

    import fiftyone as fo
    import fiftyone.zoo as foz

    # The path to the source files that you manually downloaded
    source_dir = "/path/to/dir-with-bdd100k-files"

    dataset = foz.load_zoo_dataset(
        "bdd100k",
        split="validation",
        source_dir=source_dir,
    )

    session = fo.launch_app(dataset)

Dataset size
    7.10 GB

Source
    https://bdd-data.berkeley.edu

Args:
    source_dir (None): the directory containing the manually downloaded
        BDD100K files
    copy_files (True): whether to move (False) or create copies (True) of
        the source files when populating the dataset directory

***** Tags *****
image, multilabel, automotive, manual

***** Supported splits *****
train, validation, test

***** Dataset location *****
Dataset 'bdd100k' is not downloaded


## New Labels or Old Labels?

The BDD100K labels have been updated since the original release. You can use either the old or the new labels, but keep the folder structure the same as shown above.
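Because the loader expects the same folder structure whichever label release you pick, a quick pre-flight check can catch a misplaced file early. The following is a minimal sketch that assumes the layout listed earlier; `missing_bdd100k_paths` and `EXPECTED_PATHS` are hypothetical names introduced here, not part of FiftyOne.

```python
from pathlib import Path

# Relative paths taken from the directory layout listed earlier in this document
EXPECTED_PATHS = [
    "labels/bdd100k_labels_images_train.json",
    "labels/bdd100k_labels_images_val.json",
    "images/100k/train",
    "images/100k/test",
    "images/100k/val",
]


def missing_bdd100k_paths(source_dir):
    """Return the expected files/folders that are absent from source_dir."""
    root = Path(source_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_bdd100k_paths("bdd100k/")` should return an empty list when everything is in place; any entries it returns are the paths to fix before loading.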

## First Step: Import and Analyse the Dataset

In [3]:
import fiftyone as fo
import fiftyone.zoo as foz

# The path to the source files that you manually downloaded
source_dir = "bdd100k/"

dataset = foz.load_zoo_dataset(
    "bdd100k",
    split="validation",
    source_dir=source_dir,
    copy_files=False,
)

session = fo.launch_app(dataset)

Split 'validation' already prepared
Loading existing dataset 'bdd100k-validation'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
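For a rough look at the class balance without opening the App, the label JSON can be counted directly. This sketch assumes each label file is a JSON list of per-image records with a `labels` list whose entries carry a `category` key; check that against the label release you downloaded. Both function names are hypothetical helpers.

```python
import json
from collections import Counter


def count_categories(frames):
    """Count label categories across a list of BDD100K-style image records."""
    counts = Counter()
    for frame in frames:
        # `labels` may be missing or null for images without annotations
        for label in frame.get("labels") or []:
            counts[label["category"]] += 1
    return counts


def count_categories_in_file(labels_path):
    """Load a label JSON file and count its categories."""
    with open(labels_path) as f:
        return count_categories(json.load(f))
```

For example, `count_categories_in_file("bdd100k/labels/bdd100k_labels_images_val.json").most_common()` would list the categories from most to least frequent, assuming the file matches the structure described above.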


In [1]:
#!fiftyone zoo datasets info bdd100k

## Choose and Export the Dataset Type

Uncomment the `fo.types` class for the format you want to convert to:

- YOLOV5: `YOLOv5Dataset`
- Scaled YOLOV4: `YOLOv5Dataset`
- YOLOV4: `YOLOv4Dataset`
- YOLOX: `VOCDetectionDataset` or `COCODetectionDataset`
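The same mapping can be kept as a small lookup table so the export type is chosen by name rather than by editing comments. This is only a sketch: `EXPORT_TYPE_NAMES` and `resolve_export_type` are hypothetical names, and the YOLOX entry arbitrarily picks the COCO option from the two listed above.

```python
# Target framework -> name of the class in `fiftyone.types`, as listed above
EXPORT_TYPE_NAMES = {
    "YOLOV5": "YOLOv5Dataset",
    "Scaled YOLOV4": "YOLOv5Dataset",
    "YOLOV4": "YOLOv4Dataset",
    "YOLOX": "COCODetectionDataset",  # "VOCDetectionDataset" also works
}


def resolve_export_type(target):
    """Return the fiftyone.types class for a target framework name."""
    # Imported lazily so the table itself can be used without FiftyOne installed
    import fiftyone.types as fot

    return getattr(fot, EXPORT_TYPE_NAMES[target])
```

With this in place, `dataset_type = resolve_export_type("YOLOV4")` would stand in for the hand-edited assignment in the export cell.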

In [2]:
import fiftyone as fo

# The Dataset or DatasetView containing the samples you wish to export
dataset_or_view = dataset

# The directory to which to write the exported dataset
export_dir = "bdd_in_YOLOV5_train_newLabels/"


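# The type of dataset to export
# Any subclass of `fiftyone.types.Dataset` is supported

# Uncomment whichever format you wish to convert to

# YOLOV5 / Scaled YOLOV4
dataset_type = fo.types.YOLOv5Dataset

# YOLOV4
# dataset_type = fo.types.YOLOv4Dataset

# YOLOX
# dataset_type = fo.types.COCODetectionDataset  # or fo.types.VOCDetectionDataset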


# Export the dataset
dataset_or_view.export(
    export_dir=export_dir,
    dataset_type=dataset_type,
    # export_media="copy",
    # label_field=label_field,
)

 100% |█████████████| 70000/70000 [16.3m elapsed, 0s remaining, 72.5 samples/s]       
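To sanity-check the exported label files, it helps to know what a row should contain. YOLO-style text labels store one object per line as `class_index x_center y_center width height`, with the four coordinates normalized to the range 0 to 1. The sketch below shows that arithmetic for a single box, assuming BDD100K's pixel-space `box2d` corners (`x1`, `y1`, `x2`, `y2`) and 1280x720 frames (the 720p resolution mentioned earlier). `box2d_to_yolo` is a hypothetical helper for illustration, not the code FiftyOne's exporter runs.

```python
def box2d_to_yolo(box2d, img_w=1280, img_h=720):
    """Convert pixel corner coordinates to normalized YOLO (xc, yc, w, h)."""
    x1, y1, x2, y2 = (box2d[k] for k in ("x1", "y1", "x2", "y2"))

    # Box center and size, each divided by the image dimension on that axis
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return x_center, y_center, width, height
```

Comparing a few hand-converted boxes against the matching lines in the exported `.txt` files is a cheap way to confirm the export used the label field you intended.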
