# Darknet to Polars dataset

Before transforming the dataset, install dvc by running:

```pip install dvc```

In [None]:
! pip install dvc

## Download Dataset

To download the dataset, clone the playingcards repository from GitHub.
The dataset is stored in Darknet format and has two partitions: `train` and `validate`.

In [None]:
! git clone https://github.com/DeepViewML/playingcards.git 
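For readers new to the Darknet format: it conventionally keeps one text file per image, where each line holds `class xc yc w h`, with the box center and size normalized to the range 0 to 1. The sketch below parses such lines, assuming this repository follows that usual layout. `DarknetBox`, `parse_darknet_line` and `parse_darknet_label` are hypothetical helper names for illustration and are not part of deepview-datasets.

```python
from typing import NamedTuple


class DarknetBox(NamedTuple):
    """One annotation: class index plus a normalized, center-based box."""
    class_id: int
    xc: float
    yc: float
    w: float
    h: float


def parse_darknet_line(line: str) -> DarknetBox:
    """Parse a single `class xc yc w h` line from a Darknet label file."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    xc, yc, w, h = (float(v) for v in fields[1:])
    return DarknetBox(class_id, xc, yc, w, h)


def parse_darknet_label(text: str) -> list[DarknetBox]:
    """Parse the full contents of a label file, skipping blank lines."""
    return [parse_darknet_line(ln) for ln in text.splitlines() if ln.strip()]
```

Calling `parse_darknet_label(Path(label_file).read_text())` on any file under `playingcards/dataset/labels/train` is a quick way to inspect annotations before converting them.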

Once the repository is on the local machine, run `dvc pull` to download the dataset files from the S3 bucket.
This can take a few minutes, depending on your internet connection.

In [None]:
! cd playingcards && dvc pull

In [None]:
# Count the files in each partition.
# Plain `ls` is used instead of `ls -l`, whose extra "total" line would inflate each count by one.

! echo "Training Instances:"

! echo "   - images:" `ls playingcards/dataset/images/train | wc -l`
! echo "   - annotations:" `ls playingcards/dataset/labels/train | wc -l`

! echo "Validation Instances:"

! echo "   - images:" `ls playingcards/dataset/images/validate | wc -l`
! echo "   - annotations:" `ls playingcards/dataset/labels/validate | wc -l`

! echo "Quantization Samples:"

! echo "   - images:" `ls playingcards/dataset/images/quant | wc -l`

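The same check can be done in Python, which also makes it easy to spot images that have no matching annotation file. This is a minimal sketch: `count_partition` is a hypothetical helper, and the set of image extensions and the `.txt` label suffix are assumptions that may need adjusting for this dataset.

```python
from pathlib import Path

# Assumption: images use one of these extensions; adjust to the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_partition(images_dir, labels_dir):
    """Return (n_images, n_labels, stems of images lacking a label file)."""
    images = {p.stem for p in Path(images_dir).iterdir()
              if p.suffix.lower() in IMAGE_SUFFIXES}
    labels = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return len(images), len(labels), sorted(images - labels)


root = Path("playingcards/dataset")
for split in ("train", "validate"):
    images_dir = root / "images" / split
    labels_dir = root / "labels" / split
    # Only report partitions that exist on disk.
    if images_dir.is_dir() and labels_dir.is_dir():
        n_img, n_lbl, missing = count_partition(images_dir, labels_dir)
        print(f"{split}: {n_img} images, {n_lbl} annotations, "
              f"{len(missing)} images without labels")
```

An image without a label file is not necessarily an error, since some Darknet datasets use it to mark background-only samples.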

In [None]:
! pip install deepview-datasets

In [None]:
import yaml
from deepview.datasets.readers import DarknetDetectionReader
from deepview.datasets.writers.polars import PolarsDetectionWriter

In [None]:
# Reading Classes from dataset


with open("playingcards/dataset.yaml", 'r') as fp:
    true_order_classes = yaml.safe_load(fp).get('classes')

# Define the readers that load the dataset images and annotations from
# disk and return an iterator for safe reading

train_reader = DarknetDetectionReader(
    images="playingcards/dataset/images/train",
    annotations="playingcards/dataset/labels/train",
    classes=true_order_classes
)

val_reader = DarknetDetectionReader(
    images="playingcards/dataset/images/validate",
    annotations="playingcards/dataset/labels/validate",
    classes=true_order_classes
)

In [None]:
# Define the writers that consume the Darknet readers and write the instances
# to disk in Arrow format

train_writer = PolarsDetectionWriter(
    reader=train_reader,
    output="playingcards-polars/dataset/train",
    override=True,
    max_file_size=2.0 # 2GB file chunk
)

val_writer = PolarsDetectionWriter(
    reader=val_reader,
    output="playingcards-polars/dataset/validate",
    override=True,
    max_file_size=2.0 # 2GB file chunk
)

train_writer.export()
val_writer.export()
train_writer.export_dataset_configuration_file(
    "playingcards-polars/dataset.yaml",
    "dataset/train", # make this path relative to the entire dataset
    "dataset/validate" # make this path relative to the entire dataset
)


In [None]:
# The dataset is saved in the playingcards-polars folder

! tree -a playingcards-polars/dataset/train
! tree -a playingcards-polars/dataset/validate

## Reading Dataset

To read the dataset back, use the Polars reader as shown in the example below.

In [None]:
from deepview.datasets.readers import PolarsDetectionReader

In [None]:
from deepview.datasets.generators import ObjectDetectionGenerator

generator = ObjectDetectionGenerator("playingcards-polars/dataset.yaml")
plreader = generator.get_train_generator()

generator.get_class_distribution()
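As a cross-check on the reported distribution, the per-class box counts can also be computed directly from the original Darknet label files. This is a sketch under the assumption that each label line starts with an integer class index; `class_distribution` is a hypothetical helper, and its return format is not meant to match that of `get_class_distribution()`, which is not shown here.

```python
from collections import Counter
from pathlib import Path


def class_distribution(labels_dir, class_names=None):
    """Count annotated boxes per class across all Darknet `.txt` label files."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1
    if class_names is None:
        # No names given: key the result by class index.
        return dict(counts)
    return {class_names[idx]: n for idx, n in counts.items()}
```

For example, `class_distribution("playingcards/dataset/labels/train", true_order_classes)` would key the counts by the class names loaded earlier from `dataset.yaml`.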

In [None]:
# Visualize samples.
# Make sure opencv-python and matplotlib are installed.
import matplotlib.pyplot as plt
import numpy as np
import cv2
import polars as pl

colors = np.array([
    [180, 0, 0],
    [0, 166, 76],
    [178, 179, 0],
    [2, 1, 181],
    [127, 96, 166],
    [3, 152, 133],
    [121, 121, 121],
    [76, 0, 0],
    [240, 0, 0],
    [107, 123, 61],
    [245, 185, 0],
    [94, 78, 127],
    [202, 2, 202],
    [105, 153, 199],
    [252, 155, 209],
    [53, 76, 32],
    [146, 76, 17],
    [0, 219, 99],
    [142, 206, 70],
    [2, 71, 128]    
], np.uint8)

In [None]:
image, boxes = next(plreader)
H, W, _ = image.shape
image = image.copy()

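for xc, yc, w, h, label in boxes:
    # Boxes are normalized (center x, center y, width, height); scale to pixels.
    x1 = int((xc - w * 0.5) * W)
    x2 = int((xc + w * 0.5) * W)

    y1 = int((yc - h * 0.5) * H)
    y2 = int((yc + h * 0.5) * H)
    # Wrap the index so datasets with more classes than palette entries still get a color.
    color = colors[int(label) % len(colors)].tolist()

    cv2.rectangle(image, (x1, y1), (x2, y2), color, 5)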

plt.imshow(image)
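The loop above converts each normalized, center-based box into pixel corners inline. When the conversion is needed in more than one place, it can be factored into a small function that also clamps the corners to the image bounds. This is a minimal sketch; `to_pixel_corners` is a hypothetical helper, and it truncates with `int()` in the same way as the loop above.

```python
def to_pixel_corners(xc, yc, w, h, img_w, img_h):
    """Convert a normalized center-based box to clamped pixel corners.

    Returns (x1, y1, x2, y2), each clamped to the valid pixel range.
    """
    x1 = int((xc - w / 2) * img_w)
    y1 = int((yc - h / 2) * img_h)
    x2 = int((xc + w / 2) * img_w)
    y2 = int((yc + h / 2) * img_h)

    def clamp(value, upper):
        # Keep the coordinate inside [0, upper].
        return max(0, min(value, upper))

    return (clamp(x1, img_w - 1), clamp(y1, img_h - 1),
            clamp(x2, img_w - 1), clamp(y2, img_h - 1))
```

Clamping matters mostly for boxes that touch the image border, where rounding can push a corner one pixel outside the frame.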
