# Detect the fishes in the images
This notebook detects the fishes in images. This requires a trained YOLOv3 network.

## Setup GPU
When using Google Colab, set the hardware accelerator to GPU via:

**Edit** > **Notebook settings** > **Hardware accelerator**.

![Change Colab to use GPU](colab_gpu.png)

## Install software
To install the software on your own computer, follow the steps provided in the [readme](https://github.com/Rick-v-E/automatic_discard_registration/blob/master/README.md). If running on Google Colab, clone the Git repository and install its dependencies:

In [None]:
%%shell

# Check if the repository is already available, if not, clone and install
if [ ! -d .git ]
then
  git clone https://github.com/Rick-v-E/automatic_discard_registration.git
  pip install -r automatic_discard_registration/requirements.txt
  pip install -r automatic_discard_registration/detection/yolov3/requirements.txt
  pip install automatic_discard_registration/detection/apex
  pip install gdown
else
  git pull
fi

If you installed the software in the previous step, enter the repository:

In [None]:
%cd automatic_discard_registration
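
After installation, a quick import check can confirm that the main dependencies are available. The sketch below is not part of the repository: `find_missing_modules` is a hypothetical helper, and the module list is taken from the imports used later in this notebook.

```python
from importlib.util import find_spec

# Top-level third-party modules imported later in this notebook.
REQUIRED_MODULES = ["cv2", "torch", "tqdm", "matplotlib"]


def find_missing_modules(module_names):
    """Return the names of modules that the import system cannot find."""
    return [name for name in module_names if find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)}. Re-run the install cell.")
else:
    print("All required modules are available.")
```

The check only looks the modules up without importing them, so it stays fast even for large packages such as `torch`.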

## Setup dataset
The complete dataset can be downloaded from [4TU.ResearchData](https://doi.org/10.4121/16622566.v1). To use this dataset, extract both `fdf_images.zip` and `results.zip` in the [data](https://github.com/Rick-v-E/automatic_discard_registration/tree/master/data) folder.

For use on Google Colab, we have created a smaller subset of the data. This subset contains only part of the images of the complete dataset, but contains all results from the complete dataset.

---
**IMPORTANT**

Execute only one of the three cells below! Each cell contains a method to import the data; if one method fails, use another. If the method succeeds, go to the next section of this notebook.

---

**METHOD 1** Download and extract the sample dataset (this will take around 5-10 minutes):

In [None]:
!gdown --id 1TcyeeX0UjhWldbjhLkCRJIuktDNeAMJJ
!unzip -q fdf_sample_dataset.zip -d data
!rm fdf_sample_dataset.zip

**METHOD 2** Download the [sample dataset](https://drive.google.com/file/d/1TcyeeX0UjhWldbjhLkCRJIuktDNeAMJJ/view?usp=sharing) manually and upload it to the `automatic_discard_registration` folder in Google Colab by opening the Files tab and right-clicking on the folder name:

![Manual upload image](colab_manual_upload.png)

After uploading, extract the dataset:

In [None]:
!unzip -q fdf_sample_dataset.zip -d data
!rm fdf_sample_dataset.zip

**METHOD 3** Download the [sample dataset](https://drive.google.com/file/d/1TcyeeX0UjhWldbjhLkCRJIuktDNeAMJJ/view?usp=sharing) and upload it to your personal Google Drive account. Then mount your Drive in Google Colab and extract the dataset:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

!unzip -q ../drive/MyDrive/fdf_sample_dataset.zip -d data

Check if the dataset is loaded correctly:

In [None]:
from pathlib import Path

DATA_PATH = Path("data")
NEEDED_FOLDERS = ["fdf_images", "results"]

# Check that all required folders exist
if not all((DATA_PATH / f).is_dir() for f in NEEDED_FOLDERS):
    print("Could not find all data folders! Did you extract both fdf_images.zip and results.zip in the data folder?")

To get the same results as in the paper, use the complete dataset. Upload the dataset to your Google Drive and [mount](https://towardsdatascience.com/downloading-datasets-into-google-drive-via-google-colab-bcb1b30b0166) that folder in your Google Colab environment.
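
Beyond checking that the folders exist, it can be useful to see how many files each one contains, for example to tell the sample dataset from the complete one. This is a minimal sketch: `count_files` is a hypothetical helper, not part of the repository, and it simply counts every regular file below a folder.

```python
from pathlib import Path


def count_files(folder):
    """Count all regular files below `folder`, recursing into subfolders."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for path in folder.rglob("*") if path.is_file())


# Folder names as used in the dataset check earlier in this section.
for name in ["fdf_images", "results"]:
    print(f"{name}: {count_files(Path('data') / name)} files")
```

A count of zero means the folder is missing or empty, which usually points to an archive that was extracted into the wrong location.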

## Setup detection notebook
Start by loading the dependencies:

In [None]:
%matplotlib inline

import cv2
import torch
import warnings

from pathlib import Path
from tqdm.notebook import tqdm
from matplotlib import pyplot as plt

from detection import setup_paths
from detection.detect import FDFDetector
from common.io import load_image_file_names, write_detections_to_json
from common.nb_utils import show_random_image_with_detection

warnings.filterwarnings("ignore", category=UserWarning)

Check if we have a GPU available:

In [None]:
if not torch.cuda.is_available():
    print("No GPU device found! If you are working on Google Colab, make sure that you select the GPU hardware accelerator in the notebook settings.")

Next, load the file names of all images in the validation and test image folders:

In [None]:
validation_images_path = Path("data/fdf_images/images/validation")
test_images_path = Path("data/fdf_images/images/test")

assert validation_images_path.is_dir()
assert test_images_path.is_dir()

validation_images = load_image_file_names(validation_images_path)
test_images = load_image_file_names(test_images_path)

print(f"Loaded { len(validation_images) } validation and { len(test_images) } test images...")
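
The detection loops in this notebook iterate over `validation_images.items()` as `(image_name, image_path)` pairs, so `load_image_file_names` evidently returns a mapping from image name to path. Its implementation is not shown here. The sketch below is a hypothetical stand-in (`list_image_files`) that assumes the key is the file name without its extension and that images are `.jpg`, `.jpeg` or `.png` files; the real function may differ on both points.

```python
from pathlib import Path

# Assumed image extensions; the real dataset may use others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_image_files(folder):
    """Map each image's stem to its path, in sorted order for reproducibility."""
    folder = Path(folder)
    return {
        path.stem: path
        for path in sorted(folder.iterdir())
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    }
```

Sorting the directory listing keeps the order of the images stable between runs, which makes progress bars and saved results easier to compare.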

There are four trained weights files available in the dataset:

| Filename                     | Number of synthetic images |
|------------------------------|----------------------------|
| weights_0_synthetic_images   | 0                          |
| weights_50_synthetic_images  | 50                         |
| weights_100_synthetic_images | 100                        |
| weights_200_synthetic_images | 200                        |

For each of the four weights files, we detect the fishes in the validation dataset. Based on the performance on the validation dataset, we choose the best-performing model and use it to detect the fishes in the test dataset.
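
The selection step amounts to taking the model with the highest validation score. A minimal sketch, assuming one scalar score per model where higher is better; `select_best_model` is a hypothetical helper, and the scores in the usage example are placeholders, not results from the paper.

```python
def select_best_model(scores):
    """Return the key (number of synthetic images) with the highest score."""
    if not scores:
        raise ValueError("No validation scores given.")
    return max(scores, key=scores.get)


# Placeholder scores for illustration only; these are NOT measured results.
example_scores = {0: 1.0, 50: 2.0, 100: 3.0, 200: 4.0}
best = select_best_model(example_scores)
print(f"Selected weights_file_{best}")
```

The returned key matches the suffix of the `weights_file_*` entries used for detection, so it can be used directly to look up the weights file.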

Now, create a dictionary with all the files needed for the detection:

In [None]:
path_dict = {
    "weights_file_0": Path("data/results/model_weights/weights_0_synthetic_images.pt"),
    "weights_file_50": Path("data/results/model_weights/weights_50_synthetic_images.pt"),
    "weights_file_100": Path("data/results/model_weights/weights_100_synthetic_images.pt"),
    "weights_file_200": Path("data/results/model_weights/weights_200_synthetic_images.pt"),
    "cfg_file": Path("detection/yolov3-spp-fdf.cfg"),
    "names_file": Path("detection/fish_classes.names")
}

assert all(f.is_file() for f in path_dict.values())
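
A bare `assert` does not say which file is missing. The sketch below lists the offending keys instead; `find_missing_files` is a hypothetical helper that is not part of the repository, and the example dictionary reuses one path from the cell above.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the keys of `paths` whose value is not an existing file."""
    return [key for key, path in paths.items() if not Path(path).is_file()]


# Example with a single entry; pass the full path dictionary in practice.
example_paths = {"cfg_file": Path("detection/yolov3-spp-fdf.cfg")}
missing = find_missing_files(example_paths)
if missing:
    print(f"Missing files for: {', '.join(missing)}")
else:
    print("All files found.")
```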

## Detect fishes in images
Loop over all validation images, detect the fishes and save the results to a `.json` file. Do this for the models with 0, 50, 100 and 200 synthetic images.

In [None]:
for n_synthetic_images in tqdm([0, 50, 100, 200], desc="Models"):
    detector = FDFDetector(path_dict[f"weights_file_{ n_synthetic_images }"], path_dict)
    detection_dict = {}
    for image_name, image_path in tqdm(validation_images.items(), desc="Detecting fishes", leave=False):
        image = cv2.imread(str(image_path))
        detection_dict[image_name] = detector.detect(image, image_path)

    output_file = Path(f"data/validation_detections_{ n_synthetic_images }_synthetic_images.json")
    write_detections_to_json(output_file, detection_dict)

Inspect two random output images from the last model in the loop (200 synthetic images):

In [None]:
f, axarr = plt.subplots(1, 2, figsize=(20,20))
show_random_image_with_detection(validation_images, detection_dict, axarr[0])
show_random_image_with_detection(validation_images, detection_dict, axarr[1])
plt.show()

We know that the model with 200 synthetic images works best (see the [evaluation script](https://github.com/Rick-v-E/automatic_discard_registration/blob/master/evaluate.ipynb)), so we use that model. If you want, you can select one of the other models instead. Now, detect the fishes in the test images:

In [None]:
detector = FDFDetector(path_dict["weights_file_200"], path_dict)
detection_dict = {}
for image_name, image_path in tqdm(test_images.items(), desc="Detecting fishes", leave=False):
    image = cv2.imread(str(image_path))
    detection_dict[image_name] = detector.detect(image, image_path)

output_file = Path("data/test_detections_200_synthethic_images.json")
write_detections_to_json(output_file, detection_dict)

Plot two images with detected fishes:

In [None]:
f, axarr = plt.subplots(1, 2, figsize=(20,20))
show_random_image_with_detection(test_images, detection_dict, axarr[0])
show_random_image_with_detection(test_images, detection_dict, axarr[1])
plt.show()

The detection results for the test dataset (bounding box positions, objectness and class confidences) are saved in `data/test_detections_200_synthethic_images.json`. The other notebooks use our own results file rather than this one.
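
To inspect the saved detections outside this notebook, the file can be read back with the standard `json` module. This is a sketch under the assumption that `write_detections_to_json` stores a JSON object keyed by image name, mirroring `detection_dict`. The layout of each entry is not specified here, so the hypothetical helper `count_detection_entries` only counts the top-level entries.

```python
import json
from pathlib import Path


def count_detection_entries(json_file):
    """Load a detections file and return the number of top-level entries."""
    with Path(json_file).open("r", encoding="utf-8") as handle:
        detections = json.load(handle)
    return len(detections)


results_file = Path("data/test_detections_200_synthethic_images.json")
if results_file.is_file():
    print(f"{results_file.name}: {count_detection_entries(results_file)} entries")
else:
    print(f"{results_file} not found; run the detection cell first.")
```

If the assumption about the file layout holds, the count should equal the number of test images that were loaded.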