# Darknet to Polars dataset

Before transforming the dataset, we need to install dvc by running

```pip install dvc```

In [1]:
! pip install dvc

Collecting dvc
  Downloading dvc-3.49.0-py3-none-any.whl.metadata (17 kB)
Collecting attrs>=22.2.0 (from dvc)
  Using cached attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting celery (from dvc)
  Using cached celery-5.3.6-py3-none-any.whl.metadata (21 kB)
Collecting configobj>=5.0.6 (from dvc)
  Downloading configobj-5.0.8-py2.py3-none-any.whl.metadata (3.4 kB)
Collecting distro>=1.3 (from dvc)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting dpath<3,>=2.1.0 (from dvc)
  Using cached dpath-2.1.6-py3-none-any.whl.metadata (15 kB)
Collecting dulwich (from dvc)
  Using cached dulwich-0.21.7-cp311-cp311-win_amd64.whl.metadata (4.4 kB)
Collecting dvc-data<3.16,>=3.15 (from dvc)
  Downloading dvc_data-3.15.1-py3-none-any.whl.metadata (5.0 kB)
Collecting dvc-http>=2.29.0 (from dvc)
  Using cached dvc_http-2.32.0-py3-none-any.whl.metadata (1.3 kB)
Collecting dvc-objects (from dvc)
  Downloading dvc_objects-5.1.0-py3-none-any.whl.metadata (3.7 kB)
Collecting dvc

## Download Dataset

To download the dataset, we clone the playingcards repository from GitHub.
The dataset is stored in Darknet format and has two partitions, `train` and `validate`.
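For readers new to the Darknet format: each image has a matching text file in which every line describes one box as `class_id x_center y_center width height`, with the four coordinates normalized to the image size. The sketch below parses such a file. The function name `parse_darknet_labels` and the sample contents are hypothetical and are not taken from the playingcards dataset or from deepview-datasets.

```python
from typing import List, Tuple

# One Darknet box: (class_id, x_center, y_center, width, height).
# The four coordinates are fractions of the image width and height.
DarknetBox = Tuple[int, float, float, float, float]


def parse_darknet_labels(text: str) -> List[DarknetBox]:
    """Parse the contents of one Darknet label file into a list of boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        class_id = int(parts[0])
        xc, yc, w, h = (float(value) for value in parts[1:])
        boxes.append((class_id, xc, yc, w, h))
    return boxes


# Hypothetical label file contents, used only for illustration.
sample = "0 0.5 0.5 0.25 0.4\n3 0.1 0.2 0.05 0.1\n"
print(parse_darknet_labels(sample))
```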

In [2]:
! git clone https://github.com/DeepViewML/playingcards.git 

Cloning into 'playingcards'...


Once the repository is on our machine, we need to invoke dvc to download the dataset files from the S3 bucket.
This can take a few minutes, depending on your internet connection.

In [3]:
! cd playingcards && dvc pull

A       dataset\
A       out\static\
A       out\best.h5
A       out\last.h5
A       out\metrics.json
A       out\report.md
A       out\labels.txt
A       out\config.json
8 files added and 2952 files fetched


In [4]:
# Count the images and annotations in each partition

! echo "Training Instances:"


! echo "   - images:" `ls -l playingcards/dataset/images/train | wc -l`
! echo "   - annotations:" `ls -l playingcards/dataset/labels/train | wc -l`

! echo "Validation Instances:"

! echo "   - images:" `ls -l playingcards/dataset/images/validate | wc -l`
! echo "   - annotations:" `ls -l playingcards/dataset/labels/validate | wc -l`

! echo "Quantization Samples:"

! echo "   - images:" `ls -l playingcards/dataset/images/quant | wc -l`


"Training Instances:"


wc: unknown option -- `
Try 'wc --help' for more information.
wc: unknown option -- `
Try 'wc --help' for more information.


"Validation Instances:"


wc: unknown option -- `
Try 'wc --help' for more information.
wc: unknown option -- `
Try 'wc --help' for more information.


"Quantization Samples:"


wc: unknown option -- `
Try 'wc --help' for more information.
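The `wc` errors above come from the backtick command substitution, which the Windows shell used here does not expand. In addition, `ls -l | wc -l` typically counts the `total` header line along with the files. A portable alternative is to count the files from Python; the sketch below assumes the same folder layout as the cell above, and `count_files` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Union


def count_files(folder: Union[str, Path]) -> int:
    """Return the number of regular files directly inside folder (0 if it is missing)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.iterdir() if entry.is_file())


# Folder layout assumed from the shell commands above.
root = Path("playingcards/dataset")

print("Training Instances:")
print("   - images:", count_files(root / "images" / "train"))
print("   - annotations:", count_files(root / "labels" / "train"))

print("Validation Instances:")
print("   - images:", count_files(root / "images" / "validate"))
print("   - annotations:", count_files(root / "labels" / "validate"))

print("Quantization Samples:")
print("   - images:", count_files(root / "images" / "quant"))
```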


In [4]:
! pip install deepview-datasets

Processing d:\work\au-zone\tasks\validator\deepview-datasets
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: deepview-datasets
  Building wheel for deepview-datasets (pyproject.toml): started
  Building wheel for deepview-datasets (pyproject.toml): finished with status 'done'
  Created wheel for deepview-datasets: filename=deepview_datasets-0.3.1-py3-none-any.whl size=43901 sha256=f94e1640071f119ada2e1777c1254d4f0c7608665864e5e858942684babf968b
  Stored in directory: c:\users\reinier\appdata\local\pip\cache\wheels\64\51\d7\7a57945c491b475891babec2eace551b4b50c13ba6590263c3
Successfully built deepview-datasets
Installing collected packages: deepview-datasets
  Attempting

In [1]:
import yaml
from deepview.datasets.readers import DarknetDetectionReader
from deepview.datasets.writers.polars import PolarsDetectionWriter

In [2]:
# Reading Classes from dataset


with open("playingcards/dataset.yaml", 'r') as fp:
    true_order_classes = yaml.safe_load(fp).get('classes')

# Define the readers, which load the dataset images and annotations from
# disk and return an iterator for safe reading

train_reader = DarknetDetectionReader(
    images="playingcards/dataset/images/train",
    annotations="playingcards/dataset/labels/train",
    classes=true_order_classes
)

val_reader = DarknetDetectionReader(
    images="playingcards/dataset/images/validate",
    annotations="playingcards/dataset/labels/validate",
    classes=true_order_classes
)

	 [INFO] Reading:   0%|               | 0/1328 [00:00<?, ?it/s]

	 [INFO] Reading: 100%|███████████████| 1328/1328 [00:00<00:00, 46597.14it/s]
	 [INFO] Reading: 100%|███████████████| 148/148 [00:00<00:00, 49348.68it/s]


In [3]:
# Define the writers, which take a Darknet reader and write its instances
# to disk in Arrow format

train_writer = PolarsDetectionWriter(
    reader=train_reader,
    output="playingcards-polars/train",
    override=True,
    max_file_size=2.0 # 2GB file chunk
)

val_writer = PolarsDetectionWriter(
    reader=val_reader,
    output="playingcards-polars/validate",
    override=True,
    max_file_size=2.0 # 2GB file chunk
)

train_writer.export()
val_writer.export()




	 [INFO] Writing: 100%|███████████████| 1328/1328 [00:01<00:00, 704.88it/s]
	 [INFO] Writing: 100%|███████████████| 148/148 [00:00<00:00, 860.62it/s]


In [None]:
# The dataset is saved to the playingcards-polars folder

! tree -a playingcards-polars/train
! tree -a playingcards-polars/validate
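Where the `tree` command is unavailable or behaves differently, the exported chunks can be listed from Python instead. The sketch below assumes the writer produces files ending in `.arrow`, as the `images_*.arrow` and `boxes_*.arrow` patterns used later in this notebook suggest; `list_chunks` is a hypothetical helper name.

```python
from pathlib import Path


def list_chunks(folder, pattern="*.arrow"):
    """Return (file name, size in bytes) for each matching file in folder, sorted by name."""
    return [(p.name, p.stat().st_size) for p in sorted(Path(folder).glob(pattern))]


for split in ("train", "validate"):
    print(split)
    for name, size in list_chunks(f"playingcards-polars/{split}"):
        print(f"   {name}: {size / 1e6:.1f} MB")
```

Listing the sizes is also a quick way to see how the `max_file_size` setting split the data into chunks.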

## Reading Dataset

To read the dataset back, use the Polars reader as in the example below.

In [None]:
from deepview.datasets.readers import PolarsDetectionReader

In [None]:
with open("playingcards/dataset.yaml", 'r') as fp:
    true_order_classes = yaml.safe_load(fp).get('classes')

# `classes` may list the entries of true_order_classes in any order,
# or only a subset of them
plreader = PolarsDetectionReader(
    inputs="playingcards-polars/train/images_*.arrow",
    annotations="playingcards-polars/train/boxes_*.arrow",
    classes=['nine', 'ace']
)
plreader.classes
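Passing the classes in a different order, or passing only a subset, implies that the stored label indices have to be translated to the new list. How `PolarsDetectionReader` does this internally is not shown in this notebook; the sketch below only illustrates the idea. The function `build_class_remap` and the three-class list are hypothetical.

```python
def build_class_remap(true_order, requested):
    """Map each original class index to its index in requested, or None if the class is dropped."""
    position = {name: index for index, name in enumerate(requested)}
    return {index: position.get(name) for index, name in enumerate(true_order)}


# Hypothetical class list; the real one comes from playingcards/dataset.yaml.
true_order = ["ace", "nine", "king"]
requested = ["nine", "ace"]

remap = build_class_remap(true_order, requested)
print(remap)  # {0: 1, 1: 0, 2: None}
```

Under this scheme, boxes whose class maps to `None` would be filtered out, and the remaining labels would index into the requested list.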

In [None]:
# visualize samples.
# To visualize samples make sure opencv-python and matplotlib are installed.
import matplotlib.pyplot as plt
import numpy as np
import cv2
import polars as pl

colors = np.array([
    [180, 0, 0],
    [0, 166, 76],
    [178, 179, 0],
    [2, 1, 181],
    [127, 96, 166],
    [3, 152, 133],
    [121, 121, 121],
    [76, 0, 0],
    [240, 0, 0],
    [107, 123, 61],
    [245, 185, 0],
    [94, 78, 127],
    [202, 2, 202],
    [105, 153, 199],
    [252, 155, 209],
    [53, 76, 32],
    [146, 76, 17],
    [0, 219, 99],
    [142, 206, 70],
    [2, 71, 128]    
], np.uint8)

In [None]:
image, boxes = next(plreader)
H, W, _ = image.shape
image = image.copy()

for xc, yc, w, h, label in boxes:
    # Convert the normalized center/size box to pixel corner coordinates
    x1 = int((xc - w * 0.5) * W)
    x2 = int((xc + w * 0.5) * W)

    y1 = int((yc - h * 0.5) * H)
    y2 = int((yc + h * 0.5) * H)
    color = colors[int(label)].tolist()

    cv2.rectangle(image, (x1, y1), (x2, y2), color, 5)

plt.imshow(image)
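The coordinate conversion in the loop above can be isolated into a small helper, which makes it easy to check on its own. The sketch below uses the same arithmetic and additionally clamps the corners to the image bounds, which the original loop does not do. The name `to_pixel_corners` is hypothetical.

```python
def to_pixel_corners(xc, yc, w, h, img_w, img_h):
    """Convert a normalized (center, size) box to pixel corners (x1, y1, x2, y2)."""
    x1 = int((xc - w * 0.5) * img_w)
    x2 = int((xc + w * 0.5) * img_w)
    y1 = int((yc - h * 0.5) * img_h)
    y2 = int((yc + h * 0.5) * img_h)

    # Clamp to the valid pixel range so boxes touching the border stay drawable.
    x1 = min(max(x1, 0), img_w - 1)
    x2 = min(max(x2, 0), img_w - 1)
    y1 = min(max(y1, 0), img_h - 1)
    y2 = min(max(y2, 0), img_h - 1)
    return x1, y1, x2, y2


# A centered box covering a quarter of the width and half of the height.
print(to_pixel_corners(0.5, 0.5, 0.25, 0.5, 640, 480))  # (240, 120, 400, 360)
```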
