# SARD - Search and rescue image dataset for person detection

Follow this notebook to prepare the SARD dataset.

**Description:**
- From recordings with a total length of about 35 minutes, 1,981 single frames containing people were selected. The persons in the selected images were tagged manually so that the set could be used to train a supervised model. Tagging was done with the LabelImg tool. Each image annotation consists of the position of the bounding box around each object of interest, the size of the bounding box in terms of width and height, and the corresponding class designation (Standing, Walking, Running, Sitting, Lying, Not Defined) for the person.

**Annotations:**
- Stored in .csv format (one file for the whole directory):
```csv
filename,width,height,class,xmin,ymin,xmax,ymax
gss1307.jpg,1920,1080,person,309,666,358,740
gss1307.jpg,1920,1080,person,1798,321,1836,358
gss2100.jpg,1920,1080,person,1196,648,1256,700
...
```
- Bounding boxes are given in `xmin, ymin, xmax, ymax` format, in pixel coordinates.
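
Because each CSV row describes a single box, an image with several people appears on several rows. The following is a minimal sketch of grouping rows by image, using only the standard library and the sample rows from the excerpt above. The helper name `group_boxes_by_image` is hypothetical and is not part of this project.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the annotation excerpt above.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
gss1307.jpg,1920,1080,person,309,666,358,740
gss1307.jpg,1920,1080,person,1798,321,1836,358
gss2100.jpg,1920,1080,person,1196,648,1256,700
"""


def group_boxes_by_image(csv_text):
    """Return {filename: [(xmin, ymin, xmax, ymax), ...]} from annotation CSV text."""
    boxes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = tuple(int(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
        boxes[row["filename"]].append(box)
    return dict(boxes)


boxes_by_image = group_boxes_by_image(SAMPLE_CSV)
print(boxes_by_image["gss1307.jpg"])
```

For a real label file, the text could be read with `open(path).read()` and passed to the same helper.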

**Table of contents:**

0. Init - imports and data download
1. Data annotation cleaning
2. Data transformation
3. Data visualization


## 0. Init - imports and data download
Download the sard.zip files and extract them to the `data/source/Sard` directory. After extraction, the data should look like this:
```
data
└───source
    └───Sard
        ├───train
        └───val
```

In [None]:
# Uncomment the two lines below to reload imported packages (useful when modifying them)
# %load_ext autoreload
# %autoreload 2

# Imports
import os
import random
import numpy as np
import pandas as pd
import shutil
import xmltodict
import json
import cv2
import pybboxes as pbx
from pathlib import Path

from prj_utils.consts import ROOT_DIR
from data_processing.image_processing import plot_xywhn_annotated_image_from_file, get_brightness_stats, copy_annotated_images, get_number_of_objects_stats

# Consts
TRAIN_DIR = f'{ROOT_DIR}/data/source/Sard/train'
VAL_DIR = f'{ROOT_DIR}/data/source/Sard/val'

TRAIN_PROCESSED_DIR = f'{ROOT_DIR}/data/processed/Sard/train'
VAL_PROCESSED_DIR = f'{ROOT_DIR}/data/processed/Sard/validate'
TEST_PROCESSED_DIR = f'{ROOT_DIR}/data/processed/Sard/test'

## 1. Data transformation
- Transform labels from .csv format to yolo .txt files
- Split train data into train and validate datasets

After this step, the processed data directory should look like this:
```
data
└───processed
    └───Sard
        ├───test
        │   ├───images
        │   └───labels
        ├───train
        │   ├───images
        │   └───labels
        └───validate
            ├───images
            └───labels
```
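
Since every image in this layout is expected to have a label file with the same stem, a quick pairing check can catch incomplete processing. The sketch below is an assumption about how such a check might look. `find_unpaired_files` is a hypothetical helper, and the demo runs on a throwaway temporary directory rather than on the real data.

```python
import tempfile
from pathlib import Path


def find_unpaired_files(split_dir):
    """Return (image stems without a label, label stems without an image)."""
    split_dir = Path(split_dir)
    image_stems = {p.stem for p in (split_dir / "images").iterdir() if p.is_file()}
    label_stems = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Demo on a temporary directory that mimics one split (e.g. "train").
with tempfile.TemporaryDirectory() as tmp:
    split = Path(tmp) / "train"
    (split / "images").mkdir(parents=True)
    (split / "labels").mkdir(parents=True)
    (split / "images" / "a.jpg").touch()
    (split / "images" / "b.jpg").touch()
    (split / "labels" / "a.txt").touch()
    missing_labels, missing_images = find_unpaired_files(split)

print(missing_labels, missing_images)
```

On the processed data, the same helper could be called with each of the `train`, `validate` and `test` directories.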

## 1.1 Transform labels from .csv format to yolo .txt files

Yolo format:
- One *.txt file per image (if no objects in image, no *.txt file is required).
- One row per object.
- Each row has the format `class x_center y_center scaled_width scaled_height`, with fields separated by spaces.
- Box coordinates must be normalized to the range 0 to 1. If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
- Bounding boxes in the annotations are in xywhn format.
- Class numbers are zero-indexed (start from 0).
- Files are saved into `data/processed/Sard/train` and `data/processed/Sard/validate`, each with an `images` and a `labels` directory.
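
The normalization rule above can be worked through by hand. The sketch below converts the first sample annotation (`gss1307.jpg`, 1920x1080, box `309,666,358,740`) into a YOLO row. `voc_to_yolo` is a hypothetical helper written for illustration; the notebook itself relies on `pybboxes`, whose output may differ in the last decimals.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert a pixel (xmin, ymin, xmax, ymax) box to normalized (xc, yc, w, h)."""
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return x_center, y_center, width, height


yolo_box = voc_to_yolo(309, 666, 358, 740, 1920, 1080)

# Class 0 is the only class ("person") written by this notebook.
label_line = " ".join(["0"] + [f"{value:.6f}" for value in yolo_box])
print(label_line)  # 0 0.173698 0.650926 0.025521 0.068519
```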


In [None]:
def process_directory(input_directory, output_directory):
    Path(f'{output_directory}/images').mkdir(parents=True, exist_ok=True)
    Path(f'{output_directory}/labels').mkdir(parents=True, exist_ok=True)

    labels_file = [f for f in os.listdir(input_directory) if f.endswith('.csv')][0]
    labels_filepath = os.path.join(input_directory, labels_file)
    labels_df = pd.read_csv(labels_filepath)

    files = [f for f in os.listdir(input_directory) if os.path.isfile(os.path.join(input_directory, f)) and not f.endswith('.csv')]

    for image_file in files:
        image_filename = Path(image_file).stem
        image_filepath = os.path.join(input_directory, image_file)

        if not image_filename[-1].isdigit():
            print(f'Warning: file {image_file} is augmented - skipping file')
            continue

        output_image_filepath = f'{output_directory}/images/{image_file}'
        output_labels_filepath = f'{output_directory}/labels/{image_filename}.txt'

        file_labels_df = labels_df.loc[labels_df['filename'] == image_file]

        if len(file_labels_df) < 1:
            print(f'Warning: file {image_file} does not contain any objects - skipping file')
            continue

        yolo_labels = []

        for _, label in file_labels_df.iterrows():
            if label['class'] == 'person':
                bbox = (int(label['xmin']), int(label['ymin']), int(label['xmax']), int(label['ymax']))
                image_size = (label['width'], label['height'])
                try:
                    yolo_bbox = pbx.convert_bbox(bbox, image_size=image_size, from_type="voc", to_type="yolo")
                except Exception:
                    # Known bad row in "val_labels.csv":
                    # "gss2104.jpg,1920,1080,person,527,464,527,464" (zero-area box).
                    # Skip that label; re-raise for any other file.
                    if image_file != 'gss2104.jpg':
                        raise
                    print(f'Warning: wrong label in file {image_file} - skipping label')
                    continue
                yolo_label = (0,) + yolo_bbox
                yolo_labels.append(yolo_label)
            else:
                print("Warning: unknown object name")

        shutil.copyfile(image_filepath, output_image_filepath)

        with open(output_labels_filepath, 'w') as f:
            for label in yolo_labels:
                line = ' '.join([str(l) for l in label])
                f.write(f'{line}\n')

        #plot_xywhn_annotated_image_from_file(output_image_filepath, output_labels_filepath)

process_directory(TRAIN_DIR, TRAIN_PROCESSED_DIR)
process_directory(VAL_DIR, VAL_PROCESSED_DIR)
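
The cell above handles the one known malformed row by catching the conversion error. As a possible alternative, degenerate boxes (zero or negative width or height) could be listed before conversion. The sketch below is an assumption, not the notebook's method. `find_degenerate_rows` is a hypothetical helper, and the sample text combines one row from the annotation excerpt with the bad row quoted in the warning message.

```python
import csv
import io

SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
gss2100.jpg,1920,1080,person,1196,648,1256,700
gss2104.jpg,1920,1080,person,527,464,527,464
"""


def find_degenerate_rows(csv_text):
    """Return filenames of rows whose box has non-positive width or height."""
    degenerate = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        box_width = int(row["xmax"]) - int(row["xmin"])
        box_height = int(row["ymax"]) - int(row["ymin"])
        if box_width <= 0 or box_height <= 0:
            degenerate.append(row["filename"])
    return degenerate


bad_files = find_degenerate_rows(SAMPLE_CSV)
print(bad_files)  # ['gss2104.jpg']
```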

## 1.2 Change validate set to test set and split train data into train and validate dataset

Rename the directory `data/processed/Sard/validate` to `data/processed/Sard/test`, then move a random 25% of the samples from `data/processed/Sard/train` to `data/processed/Sard/validate`.
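
A 75/25 split of this kind can be sketched with the standard library alone. The version below is an illustration under stated assumptions: `split_filenames` is a hypothetical helper, and it sorts the names before shuffling because directory listing order is not guaranteed to be the same across machines. It will therefore not reproduce the exact split made by the cell that follows.

```python
import random


def split_filenames(filenames, train_fraction=0.75, seed=1):
    """Shuffle deterministically and split into (train, validate) lists."""
    ordered = sorted(filenames)  # fixed starting order, independent of the filesystem
    rng = random.Random(seed)    # local generator, leaves global random state untouched
    rng.shuffle(ordered)
    split = int(train_fraction * len(ordered))
    return ordered[:split], ordered[split:]


names = [f"img{i:03d}.jpg" for i in range(8)]
train_names, val_names = split_filenames(names)
print(len(train_names), len(val_names))  # 6 2
```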

In [None]:
random.seed(1)
np.random.seed(1)

shutil.move(VAL_PROCESSED_DIR, TEST_PROCESSED_DIR)

images_dir = f'{TRAIN_PROCESSED_DIR}/images'
filenames = [f for f in os.listdir(images_dir) if os.path.isfile(os.path.join(images_dir, f))]
split = int(0.75 * len(filenames))

np.random.shuffle(filenames)
train_filenames = filenames[:split]
val_filenames = filenames[split:]

Path(f'{VAL_PROCESSED_DIR}/images').mkdir(parents=True, exist_ok=True)
Path(f'{VAL_PROCESSED_DIR}/labels').mkdir(parents=True, exist_ok=True)

# Move validate files to the data/processed/Sard/validate directory
for file in val_filenames:
    filename = Path(file).stem

    image_filepath = f'{TRAIN_PROCESSED_DIR}/images/{file}'
    label_filepath = f'{TRAIN_PROCESSED_DIR}/labels/{filename}.txt'

    output_image_filepath = f'{VAL_PROCESSED_DIR}/images/{file}'
    output_label_filepath = f'{VAL_PROCESSED_DIR}/labels/{filename}.txt'

    shutil.move(image_filepath, output_image_filepath)
    shutil.move(label_filepath, output_label_filepath)
