# Create synthetic data
This notebook shows how to generate synthetic fish images (color and depth) together with their annotation files.

## Install software
To install the software on your own computer, follow the steps provided in the [readme](https://github.com/Rick-v-E/automatic_discard_registration/blob/master/README.md). If you are running on Google Colab, clone the Git repository and install its dependencies:

In [None]:
%%shell

# Check if the repository is already available, if not, clone and install
if [ ! -d .git ]
then
  git clone --recurse-submodules https://github.com/WUR-ABE/automatic_discard_registration.git
  pip install -r automatic_discard_registration/requirements.txt
  pip install -r automatic_discard_registration/detection/yolov3/requirements.txt
  pip install automatic_discard_registration/detection/apex
else
  git pull
fi

If you cloned the repository in the previous step, change into its directory:

In [None]:
%cd automatic_discard_registration

## Generate synthetic images
The synthetic images can be generated using the [synthetic image generator](https://github.com/Rick-v-E/automatic_discard_registration/tree/master/synthetic_image_generator). First import the dependencies:

In [None]:
%matplotlib inline

import cv2
import matplotlib.pyplot as plt
import numpy as np

from collections import Counter
from tqdm.notebook import tqdm
from datetime import datetime
from pathlib import Path

from common.nb_utils import show_image
from synthetic_image_generator.annotation_generator import AnnotationGenerator
from synthetic_image_generator.image_generator import ImageGenerator

The image generator uses manually segmented fish images from the training data set, which are stored in the [images.zip](https://github.com/Rick-v-E/automatic_discard_registration/blob/master/synthetic_image_generator/images.zip) file. The species of each fish in an image is drawn at random from a categorical distribution over the species. The probability of drawing a species is based on its frequency in the data set. Less frequent species get a higher probability than more frequent ones:

In [None]:
fish_probabilities = {
  "common_sole": 0.05,
  "dab": 0.15,
  "gurnard": 0.15,
  "lemon_sole": 0.10,
  "lesser_spotted_dogfish": 0.10,
  "plaice": 0.05,
  "pouting": 0.15,
  "ray": 0.05,
  "turbot": 0.15,
  "whiting": 0.05,
}

# Make sure that the probabilities add up to 1.0 (with a tolerance for floating-point rounding)
assert np.isclose(sum(fish_probabilities.values()), 1.0)
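
One way such probabilities could be derived is inverse-frequency weighting. The sketch below is an illustration only and is not necessarily how the values above were chosen. The names `species_counts` and `inverse_frequency_probabilities` are hypothetical, and the counts are made up rather than taken from the data set.

```python
import numpy as np

# Hypothetical per-species counts; NOT taken from the actual data set.
species_counts = {"plaice": 400, "dab": 100, "turbot": 50}


def inverse_frequency_probabilities(counts):
    """Return sampling probabilities proportional to 1 / count."""
    names = list(counts)
    weights = 1.0 / np.array([counts[name] for name in names], dtype=float)
    probabilities = weights / weights.sum()  # normalize so the sum is 1.0
    return dict(zip(names, probabilities))


example_probabilities = inverse_frequency_probabilities(species_counts)
print(example_probabilities)
```

With this weighting, the rarest species (`turbot` in the made-up counts) receives the largest probability, which matches the intent described above.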

The number of fish in each image is drawn from a normal distribution with the following mean and standard deviation:

In [None]:
n_fish_mean = 6
n_fish_std = 2
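
A normal distribution yields real numbers, which can occasionally be negative, while the number of fish must be a non-negative integer. The sketch below shows one possible way to turn such draws into valid counts by rounding and clamping at zero. The seeded generator and the names `example_mean`, `example_std`, `raw_draws` and `fish_counts` are assumptions made for this illustration.

```python
import numpy as np

# Seeded only to make this sketch repeatable.
rng = np.random.default_rng(seed=0)

# Same values as n_fish_mean and n_fish_std above.
example_mean, example_std = 6, 2
raw_draws = rng.normal(example_mean, example_std, size=1000)

# Round to whole fish and clamp at zero, since a normal draw can be negative.
fish_counts = np.clip(np.rint(raw_draws), 0, None).astype(int)

print(fish_counts[:10])
print("mean:", fish_counts.mean(), "min:", fish_counts.min())
```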

Define the number of images to generate:

In [None]:
n_images = 50

Create the output folders:

In [None]:
data_folder = Path("data")

color_folder = data_folder / "synthetic/color"
depth_folder = data_folder / "synthetic/data"
annotation_folder = data_folder / "synthetic/annotation"

color_folder.mkdir(exist_ok=True, parents=True)
depth_folder.mkdir(exist_ok=True, parents=True)
annotation_folder.mkdir(exist_ok=True, parents=True)

Create the synthetic images:

In [None]:
# Create generators
annotation_generator = AnnotationGenerator()
image_generator = ImageGenerator(annotation_generator=annotation_generator)

# Get the number of fishes in each image from a distribution
number_distribution = np.random.normal(n_fish_mean, n_fish_std, size=n_images)

for n_fish in tqdm(number_distribution, desc="Generating images"):
    # Choose the fishes randomly
    choice = np.random.choice(
        list(fish_probabilities.keys()),
        size=max(int(round(n_fish)), 0),  # a normal draw can be negative; clamp to a valid size
        replace=True,
        p=list(fish_probabilities.values()),
    )
    counts = Counter(choice)

    # Create image
    image_name = datetime.now().strftime("%Y%m%d_%H%M%S_%f")[:-3]
    color_image_path = color_folder / (image_name + "_RGB.png")
    depth_image_path = depth_folder / (image_name + "_depth.png")
    color_image, depth_image = image_generator.generate_image(counts)

    # Write image file
    cv2.imwrite(str(color_image_path), color_image)
    cv2.imwrite(str(depth_image_path), depth_image)

    # Write annotation file
    annotation_json_path = annotation_folder / (image_name + "_RGB.json")
    annotation_generator.write_annotation_file(annotation_json_path, color_image_path)
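
After generation, it can be useful to check that every color image has a matching depth image and annotation file. The sketch below relies only on the file-name suffixes used in the loop above (`_RGB.png`, `_depth.png`, `_RGB.json`). The helper `find_incomplete_samples` is a hypothetical name and is not part of the repository; the demonstration runs on empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def find_incomplete_samples(color_dir, depth_dir, annotation_dir):
    """Return names of color images lacking a depth image or annotation file."""
    incomplete = []
    for color_path in sorted(Path(color_dir).glob("*_RGB.png")):
        image_name = color_path.name[: -len("_RGB.png")]
        depth_path = Path(depth_dir) / (image_name + "_depth.png")
        annotation_path = Path(annotation_dir) / (image_name + "_RGB.json")
        if not (depth_path.exists() and annotation_path.exists()):
            incomplete.append(image_name)
    return incomplete


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dirs = {name: root / name for name in ("color", "depth", "annotation")}
    for directory in dirs.values():
        directory.mkdir()
    (dirs["color"] / "a_RGB.png").touch()
    (dirs["depth"] / "a_depth.png").touch()
    (dirs["annotation"] / "a_RGB.json").touch()
    (dirs["color"] / "b_RGB.png").touch()  # no depth image or annotation
    demo_result = find_incomplete_samples(
        dirs["color"], dirs["depth"], dirs["annotation"]
    )

print(demo_result)
```

In this notebook, the same check could be run as `find_incomplete_samples(color_folder, depth_folder, annotation_folder)`; an empty list means every sample is complete.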

Show two randomly selected synthetic images:

In [None]:
s_files = list(color_folder.glob("*.png"))
f, axarr = plt.subplots(1, 2, figsize=(20,20))
show_image(cv2.imread(str(s_files[np.random.randint(0, len(s_files))])), axarr[0])
show_image(cv2.imread(str(s_files[np.random.randint(0, len(s_files))])), axarr[1])
plt.show()

To use the generated synthetic images in the next notebooks, download the data from `data/synthetic` and copy it into one of the image folders in `data/fdf_images/images`.