# Crowd-YOLO (CYOLO)
- Step 0: Prepare necessary environment
- Step 1: Prepare data
- Step 2: Run experiments

### Step 0: Prepare necessary environment
- Ensure that the "master" data exists:
  - Images (`<data_directory>/images/<image_name>.jpg`)
  - Labels (`<data_directory>/labels/<volunteer_name>/<image_name>.txt`)
    - Experts provide the ground-truth labels
    - Non-experts provide the crowdsourced labels
- From the master data, we prepare multiple datasets for multiple experiments based on the following modes:
  - Size (**sz**) modes: toy (**to**), full (**fu**)
    - In toy mode, we use only a small sample of images (and their annotations). This serves mainly as a quick sanity check that the code runs. The main experiments are conducted on the full data (**fu** mode).
  - Multiplicity (**ml**) modes: single (**si**), all (**all**)
    - Whether to use labels from a single (**si**) labeller or from all (**all**) labellers.
  - Train-Validation-Test modes:
    - Train (**tr**), Validation (**va**), Test (**te**)
      - Which data is used for training, validation, and testing of the learnt model.
    - Experts (**e**), Non-experts (**ne**)
      - Since labellers could either be experts or non-experts, we need to decide whether we want to use labels from experts or from non-experts. See below for examples.
    - Repeated (**r**), Crowdsourced (**c**)
      - For an algorithm that accepts training data in the usual form (i.e., one set of labels per image), we resolve multiple labellers' labels by repeating the (image, labels) combination once per labeller. Concretely, suppose we have the following dataset:
        - $(x_1, (y_{11}, y_{12}, \cdots, y_{1k_1}))$
        - $(x_2, (y_{21}, y_{22}, \cdots, y_{2k_2}))$
        - $\vdots$
        - $(x_n, (y_{n1}, y_{n2}, \cdots, y_{nk_n}))$,
      
        with $n$ images, where the $i^\text{th}$ image $x_i$ has been labelled by $k_i$ labellers who give labels $y_{i1}, y_{i2}, \cdots, y_{ik_i}$. To use YOLO, a standard object detection algorithm, we transform this data via the "repeated" (**r**) mode into $\sum_{i=1}^{n}{k_i}$ (image, label) combinations:
        - $(x_1, y_{11})$
        - $(x_1, y_{12})$
        - $\cdots$
        - $(x_1, y_{1k_1})$,
        - $(x_2, y_{21})$
        - $(x_2, y_{22})$
        - $\cdots$
        - $(x_2, y_{2k_2})$,
        - $\vdots$
        - $(x_n, y_{n1})$
        - $(x_n, y_{n2})$
        - $\cdots$
        - $(x_n, y_{nk_n})$.
        
      - To make full use of the crowdsourced labels, we use Crowd-YOLO; this corresponds to the "crowdsourced" (**c**) mode.
- For example, when running YOLO (without BCC), we have the following mode settings:
  - Size mode: full (**sz=fu**)
  - Multiplicity mode: multiple (**ml=all**)
  - Train-Validation-Test mode:
    - Train on non-expert repeated (iid) data: **tr=ner**
    - Validation on non-expert repeated (iid) data: **va=ner**
    - Test on expert repeated (iid) data: **te=er**
- But when running CYOLO (i.e., Crowd-YOLO), we have the following mode settings:
  - Size mode: full (**sz=fu**)
  - Multiplicity mode: multiple (**ml=all**)
  - Train-Validation-Test mode:
    - Train on non-expert crowdsourced data: **tr=nec**
    - Validation on non-expert repeated (iid) data: **va=ner**
    - Test on expert repeated (iid) data: **te=er**
- The modes that we use:
  - `singletoy-yolo`: **sz_to.ml_si.tr_ner.va_ner.te_er** --> For a quick run on the toy data using labels from a single volunteer for YOLO.
  - `singlefull-yolo`: **sz_fu.ml_si.tr_ner.va_ner.te_er** --> For the full data using labels from a single volunteer for YOLO.
  - `alltoy-yolo`: **sz_to.ml_all.tr_ner.va_ner.te_er** --> For a quick run on the toy data using all volunteers' labels for YOLO.
  - `allfull-yolo`: **sz_fu.ml_all.tr_ner.va_ner.te_er** --> For the full data using all volunteers' labels for YOLO.
  - `singletoy-cyolo`: **sz_to.ml_si.tr_nec.va_ner.te_er** --> For a quick run on the toy data using labels from a single volunteer for CYOLO.
  - `singlefull-cyolo`: **sz_fu.ml_si.tr_nec.va_ner.te_er** --> For the full data using labels from a single volunteer for CYOLO.
  - `multipletoy-cyolo`: **sz_to.ml_all.tr_nec.va_ner.te_er** --> For a quick run on the toy data using all volunteers' labels for CYOLO.
  - `multiplefull-cyolo`: **sz_fu.ml_all.tr_nec.va_ner.te_er** --> For the full data using all volunteers' labels for CYOLO.
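
The mode strings and the "repeated" (**r**) transformation described above can be illustrated with a small sketch. The helper names `parse_mode` and `expand_repeated` are hypothetical and are not taken from `src/data_preparer`; the sketch only assumes that a mode string consists of dot-separated `key_value` fields and that a dataset is a list of `(image, [labels per labeller])` pairs.

```python
from typing import Dict, List, Sequence, Tuple


def parse_mode(mode: str) -> Dict[str, str]:
    """Split a mode string such as 'sz_to.ml_si.tr_ner.va_ner.te_er' into fields."""
    fields: Dict[str, str] = {}
    for part in mode.split("."):
        # Each part looks like '<key>_<value>', e.g. 'sz_to' -> ('sz', 'to').
        key, _, value = part.partition("_")
        fields[key] = value
    return fields


def expand_repeated(
    dataset: Sequence[Tuple[str, Sequence[str]]]
) -> List[Tuple[str, str]]:
    """Turn [(x_i, [y_i1, ..., y_ik_i])] into one (x_i, y_ij) pair per labeller."""
    return [
        (image, labels)
        for image, all_labels in dataset
        for labels in all_labels
    ]


# Hypothetical toy data: image x1 has two labellers, image x2 has one.
toy = [("x1", ["y11", "y12"]), ("x2", ["y21"])]
pairs = expand_repeated(toy)
```

The number of resulting pairs equals the sum of the $k_i$, which matches the count stated for the "repeated" mode.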

### Step 1: Prepare data

In [10]:
import sys
import os

In [11]:
PROJ_PATH = '..'
sys.path.append(PROJ_PATH)

In [12]:
from src.data_preparer import prepare_data

In [13]:
SRC_PATH = os.path.join(PROJ_PATH, 'src')
DATA_PATH = os.path.join(PROJ_PATH, 'data/datasets')

In [5]:
data_modes = ['sty', 'stcy']

In [6]:
train_ratios = (0.7, 0.2, 0.1)
data_path = DATA_PATH
for mode in data_modes:
    prepare_data(mode, train_ratios, data_path)

Exception: Path already exists. Not recreating data
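
The internals of `prepare_data` are not shown here. As a rough illustration only, and assuming that `train_ratios` holds the train, validation, and test fractions in that order, a split of image names could look like the sketch below. The function `split_by_ratios` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratios(
    items: Sequence[str],
    ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items and split them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(ratios[0] * len(shuffled))
    n_val = int(ratios[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical image names, used only to exercise the split.
names = [f"img_{i}" for i in range(10)]
train, val, test = split_by_ratios(names)
```

Giving the remainder to the test split means every image lands in exactly one split, even when the ratios do not divide the dataset size evenly.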

In [19]:
data = '../data/cyolo.yaml'
batch_size = 20 # Change this to the number of training images
epochs = 1
bcc_epoch = 0 # Use BCC from epoch number "bcc_epoch" onwards. Set to -1 for no BCC, 0 for BCC in every epoch.
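
The comment on `bcc_epoch` can be read as the rule sketched below. This is an interpretation of the comment, not code from `train.py`: the helper `bcc_active` is hypothetical, and it assumes that epochs are counted from 0.

```python
def bcc_active(epoch: int, bcc_epoch: int) -> bool:
    """Return True if BCC would be used in the given (0-based) epoch."""
    if bcc_epoch < 0:
        # -1 (or any negative value) means BCC is never used.
        return False
    # Otherwise BCC is used from epoch number `bcc_epoch` onwards.
    return epoch >= bcc_epoch


# With bcc_epoch=0, every epoch uses BCC; with bcc_epoch=2, epochs 0 and 1 do not.
schedule = [bcc_active(epoch, 2) for epoch in range(4)]
```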

In [21]:
!python $SRC_PATH/train.py --data $data \
                           --batch-size $batch_size \
                           --epochs $epochs \
                           --bcc_epoch $bcc_epoch

[34m[1mtrain: [0mbcc_epochs=0, qtfilter_epoch=-1, qt_thres_mode=, qt_thres=0.0, hybrid_entropy_thres=0.0, hybrid_conf_thres=0.0, weights=yolov5s.pt, cfg=, data=../data/cyolo.yaml, hyp=../data/hyps/hyp.scratch.yaml, epochs=1, batch_size=20, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=1100
[34m[1mgithub: [0mskipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
[31m[1mrequirements:[0m /Users/gs0029/repos/dental_disease/src/requirements.txt not found, check failed.
YOLOv5 🚀 e27cbbd torch 1.9.0 CPU

Traceback (most recent call last):
  File "../src/t