# Iterative Annotation Process with ipyannotator

This tutorial demonstrates how to build an annotated dataset for road damage classification without ever leaving
Jupyter Notebook or JupyterLab. We do this in three steps:

1. Use bounding box annotation to crop the original images.
2. Group the damage types using classification labels.
3. Refine the initial class labels in a supervision step.

These steps can be applied iteratively in practical applications and sped up significantly by integrating the predictions of imperfect machine learning models.
For example, we might train an image classification model on the first annotations, refine its predictions on new data to enlarge the training set, and then repeat the process.

**Install `@jupyter-widgets/jupyterlab-manager`. To do this, go to the puzzle/jigsaw symbol (the extension manager) and install the widget manager manually.**

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
#all_slow

## Get Road Damage Images from BigData Cup 2020

First we need to retrieve some images from which we can build our dataset. Fortunately, the [Global Road Damage Detection Challenge 2020](https://rdd2020.sekilab.global/data/) provides
freely usable images that we can download (please cite [the paper](https://github.com/sekilab/RoadDamageDetector#citation) if you use them in your own work).

### Linux
The following commands will download and prepare the images.

In [None]:
! wget https://mycityreport.s3-ap-northeast-1.amazonaws.com/02_RoadDamageDataset/public_data/IEEE_bigdata_RDD2020/test1.tar.gz

In [None]:
! tar -xf test1.tar.gz

In [None]:
! du -h test1/

In [None]:
! mkdir road_japan

In [None]:
! mv test1/Japan/images road_japan/images

In [None]:
! rm -r test1.tar.gz test1

### Other OS

Please complete the following manual steps.

- download the test1 dataset from https://github.com/sekilab/RoadDamageDetector#dataset-for-global-road-damage-detection-challenge-2020
- unpack the `*.tar.gz` file
- create a new folder `road_japan` next to this notebook
- move the folder `test1/Japan/images` into `road_japan`
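
The manual steps above can also be sketched in plain Python, which should behave the same on any operating system. This is a minimal sketch and not part of ipyannotator: the helper names `prepare_images` and `download_and_prepare` are hypothetical, the URL is the one used in the Linux instructions, and the archive is assumed to unpack into a top-level `test1` folder.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# URL taken from the Linux instructions above.
RDD_URL = ("https://mycityreport.s3-ap-northeast-1.amazonaws.com/02_RoadDamageDataset/"
           "public_data/IEEE_bigdata_RDD2020/test1.tar.gz")


def prepare_images(archive, project_dir="road_japan", member_dir="test1/Japan/images"):
    """Unpack `archive` and move `member_dir` to `<project_dir>/images` (hypothetical helper)."""
    archive = Path(archive).resolve()
    target = Path(project_dir) / "images"
    # Extract next to the archive so the temporary files stay on the same disk.
    with tempfile.TemporaryDirectory(dir=archive.parent) as tmp:
        with tarfile.open(archive) as tar:
            tar.extractall(tmp)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(Path(tmp) / member_dir), str(target))
    return target


def download_and_prepare(url=RDD_URL, archive="test1.tar.gz"):
    """Download the archive (a large file) and prepare the image folder."""
    urlretrieve(url, archive)
    return prepare_images(archive)
```

Calling `download_and_prepare()` from the notebook folder should leave the images in `road_japan/images`. As with `mv`, if that folder already exists the images are moved inside it instead, so it is best to start from a clean project folder.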

## 1) Use bounding box annotation to crop the original images.

We can now use the BBoxAnnotator to quickly inspect the available images.

In [None]:
from ipyannotator.bbox_annotator import BBoxAnnotator

In [None]:
bb = BBoxAnnotator(project_path='road_japan', image_dir='images', canvas_size=(500, 500))

In [None]:
bb

Let's now create an initial set of road damage images by using the mouse to draw a rectangle around
the damage on individual images. Below you see the annotation for a single image.

In [None]:
img_path, bbox = list(bb.to_dict().items())[0]; print(img_path); bbox

We can now use the bounding box annotations to crop the damage from the images and save the crops in a separate folder.
The following small function helps us accomplish this.

In [None]:
from PIL import Image
from pathlib import Path

def crop_bboxs(bbox_annotations, source_dir, target_dir):
    """Crop every annotated image to its bounding box and save the result in target_dir."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for img_file, b in bbox_annotations.items():
        # box=(left, upper, right, lower)
        box_crop = (b['x'], b['y'], b['x'] + b['width'], b['y'] + b['height'])
        # open each image once and close it again after saving the crop
        with Image.open(Path(source_dir)/img_file) as im:
            im.crop(box_crop).save(Path(target_dir)/img_file, quality=95)

In [None]:
crop_bboxs(bbox_annotations=bb.to_dict(), source_dir='road_japan/images', target_dir='road_japan/images_croped')
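
The box arithmetic used for cropping can be checked in isolation. Pillow's `crop` accepts boxes that reach past the image border and fills the outside region (typically with black pixels), so a box drawn over the edge would silently produce a padded crop. The sketch below clamps a box to the image size first. `to_pil_box` is a hypothetical helper, and whether ipyannotator can actually return boxes outside the image is an assumption that is not verified here.

```python
def to_pil_box(b, image_size):
    """Convert an {'x', 'y', 'width', 'height'} dict into a clamped
    (left, upper, right, lower) tuple as expected by PIL's Image.crop."""
    img_w, img_h = image_size
    left = max(0, min(b['x'], img_w))
    upper = max(0, min(b['y'], img_h))
    right = max(left, min(b['x'] + b['width'], img_w))
    lower = max(upper, min(b['y'] + b['height'], img_h))
    return (left, upper, right, lower)


# A box fully inside a 640x480 image is only converted, not changed.
print(to_pil_box({'x': 10, 'y': 20, 'width': 100, 'height': 50}, (640, 480)))
# (10, 20, 110, 70)

# A box that sticks out at the top and on the right is clamped to the image.
print(to_pil_box({'x': 600, 'y': -5, 'width': 100, 'height': 50}, (640, 480)))
# (600, 0, 640, 45)
```

Inside `crop_bboxs`, such a helper could be called with `im.size` before cropping.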

Below you can see the cropping result for the bbox annotation above.

In [None]:
Image.open(Path('road_japan/images_croped')/img_path)

We first need to decide which types of damage we want to classify. If we don't know this up front, we can initially create some
dummy labels.

In [None]:
# create dummy label_images
crop_bboxs(bbox_annotations={k: v for k, v in list(bb.to_dict().items())[:4]}, source_dir='road_japan/images', target_dir='road_japan/class_images')

## 2) Group the damage types using classification labels.

1. We can now use the `Im2ImAnnotator` to quickly explore the cropped images in order to find some typical damage types we are interested in.
   * The competition lists the following types: {D00: Longitudinal Crack, D10: Transverse Crack, D20: Alligator Crack, D40: Pothole}.
   * Hint: Check out https://en.wikipedia.org/wiki/Pavement_cracking to find some typical crack types.
2. Select a representative example for each damage type you are interested in and move the file to `road_japan/class_images`.
   * Remove the existing dummy images first.
   * Give the image an illustrative name such as `alligator_crack.jpg`. The file name is used to create the class labels.
3. Label the images by selecting one or more labels on the right side below "Damage Types".
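
Because the class labels come from the file names in `road_japan/class_images`, it can help to list that folder before starting the annotator and confirm that the dummy images are gone. The helper below is a hypothetical convenience and not part of ipyannotator. The set of image extensions is an assumption.

```python
from pathlib import Path

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}  # assumed set of image extensions


def list_class_images(class_dir):
    """Return the sorted image file names found in class_dir (empty if it does not exist)."""
    class_dir = Path(class_dir)
    if not class_dir.is_dir():
        return []
    return sorted(f.name for f in class_dir.iterdir()
                  if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES)


print(list_class_images('road_japan/class_images'))
```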

In [None]:
from ipyannotator.im2im_annotator import Im2ImAnnotator

In [None]:
# im2im_full = Im2ImAnnotator('../data/projects/im2im1', 'pics', 150,100, 50, 50, 2, 5, question="HelloWorld")

im2im = Im2ImAnnotator('road_japan', 'images_croped', 600, 500, 100, 100, 2, 3, question="Damage Types")
im2im

We can now check the class labels that we have just created and save them to a JSON file.

In [None]:
from IPython import display
display.JSON(im2im.to_dict())

In [None]:
import json
with open('road_japan/classification_labels.json', 'w') as outfile:
    json.dump(im2im.to_dict(), outfile)

## 3) Refine the initial class labels in a supervision step.

Once the data has been labeled initially, supervision is a great way to further improve data quality by reviewing annotations that were created by hand or by a
machine learning model.
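
If the annotations to review come from a model rather than from step 2, they first have to be brought into the same `{image: [labels]}` shape. The following is a minimal sketch under stated assumptions: the model output format (one score per class and image), the helper name `predictions_to_annotations`, and the threshold are all hypothetical, and the labels are class image file names as described in step 2.

```python
def predictions_to_annotations(predictions, threshold=0.5):
    """Keep, for every image, each label whose score is at least `threshold`."""
    return {
        image: [label for label, score in scores.items() if score >= threshold]
        for image, scores in predictions.items()
    }


# Hypothetical model output: image file -> {class label: score}
predictions = {
    'Japan_000342.jpg': {'Japan_010778.jpg': 0.91, 'Japan_003206.jpg': 0.12},
    'Japan_001155.jpg': {'Japan_010778.jpg': 0.55, 'Japan_003206.jpg': 0.61},
}

print(predictions_to_annotations(predictions))
# {'Japan_000342.jpg': ['Japan_010778.jpg'], 'Japan_001155.jpg': ['Japan_010778.jpg', 'Japan_003206.jpg']}
```

A lower threshold sends more candidate images into the supervision step, which means more review effort but fewer missed examples.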

In [None]:
# text example data
# image_annotations = {'Japan_000342.jpg': ['Japan_010778.jpg'],
#  'Japan_000541.jpg': ['Japan_011190.jpg'],
#  'Japan_001155.jpg': ['Japan_003206.jpg', 'Japan_010778.jpg'],
#  'Japan_002337.jpg': ['Japan_001155.jpg'],
#  'Japan_003206.jpg': ['Japan_011190.jpg'],
#  'Japan_005979.jpg': ['Japan_010778.jpg'],
#  'Japan_006775.jpg': ['Japan_003206.jpg'],
#  'Japan_007389.jpg': ['Japan_003206.jpg'],
#  'Japan_010778.jpg': ['Japan_003206.jpg', 'Japan_010778.jpg'],
#  'Japan_011190.jpg': ['Japan_001155.jpg', 'Japan_010778.jpg'],
#  'Japan_012213.jpg': ['Japan_011190.jpg']}

We can now use the previously generated class labels to group the images by class.

In [None]:
with open('road_japan/classification_labels.json') as infile:
    image_annotations = json.load(infile)

In [None]:
from collections import defaultdict

def group_files_by_class(annotations):
    grouped = defaultdict(list)
    for file, labels in annotations.items():
        for class_ in labels:
            grouped[class_].append(file)
    return grouped

We have the following classes.

In [None]:
classes_to_files = group_files_by_class(image_annotations); classes_to_files.keys()
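
To see what the grouping does, here is a small worked example that uses three entries in the same shape as the example data above. The function is repeated so that the snippet runs on its own.

```python
from collections import defaultdict


def group_files_by_class(annotations):
    """Invert {file: [labels]} into {label: [files]}."""
    grouped = defaultdict(list)
    for file, labels in annotations.items():
        for class_ in labels:
            grouped[class_].append(file)
    return grouped


example_annotations = {
    'Japan_000342.jpg': ['Japan_010778.jpg'],
    'Japan_000541.jpg': ['Japan_011190.jpg'],
    'Japan_001155.jpg': ['Japan_003206.jpg', 'Japan_010778.jpg'],
}

for class_, files in group_files_by_class(example_annotations).items():
    print(class_, '->', files)
# Japan_010778.jpg -> ['Japan_000342.jpg', 'Japan_001155.jpg']
# Japan_011190.jpg -> ['Japan_000541.jpg']
# Japan_003206.jpg -> ['Japan_001155.jpg']
```

An image with several labels, such as `Japan_001155.jpg`, appears in every class it was assigned to.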

We can pick any class to start the supervision. Here we simply take the first one.

In [None]:
selected_class = list(classes_to_files.keys())[0]; selected_class

In [None]:
# classes_to_files[selected_class]

In [None]:
from ipyannotator.capture_annotator import CaptureAnnotator

In [None]:
html_question = 'Select images that don\'t belong to class <span style="color: red;">{}</span>'.format(selected_class)

The annotator now shows us a grid of images annotated as belonging to the same class. You can now quickly click through
these batches and select the images that belong to a different class.

In [None]:
ca = CaptureAnnotator('road_japan', 'images_croped', 150, 150, 2, 2,
                      question=html_question,
                      filter_files=classes_to_files[selected_class])
ca

You can repeat this process for each class and then reclassify the wrongly labeled images in a later step. The `CaptureAnnotator` is most useful when you already
have imperfect image classifications, for example from a pretrained model, or when you have many less experienced annotators and a limited number of experts to check their work.

In [None]:
ca.to_dict()
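
Once the images that don't belong to the class have been identified, the class label can be removed from them so they can be relabeled later. The exact structure returned by `ca.to_dict()` is not shown in this tutorial, so the sketch below deliberately takes a plain collection of flagged file names, which you would first extract from that result. `drop_flagged` is a hypothetical helper, and the file and class names are made up for illustration.

```python
def drop_flagged(annotations, class_, flagged):
    """Return a copy of `annotations` with `class_` removed from every flagged image."""
    flagged = set(flagged)
    return {
        image: [label for label in labels
                if not (image in flagged and label == class_)]
        for image, labels in annotations.items()
    }


# Made-up annotations in the {image: [labels]} shape used above.
annotations = {
    'a.jpg': ['crack.jpg'],
    'b.jpg': ['crack.jpg', 'pothole.jpg'],
    'c.jpg': ['crack.jpg'],
}

cleaned = drop_flagged(annotations, 'crack.jpg', flagged=['b.jpg', 'c.jpg'])
print(cleaned)
# {'a.jpg': ['crack.jpg'], 'b.jpg': ['pothole.jpg'], 'c.jpg': []}

# Images left without any label are candidates for another annotation round.
print([image for image, labels in cleaned.items() if not labels])
# ['c.jpg']
```

The flagged names must use the same form as the keys of `annotations`. If one side holds full paths and the other holds bare file names, normalize them first.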

## Conclusion

This short tutorial has demonstrated how the annotation UIs already included in ipyannotator can be used to quickly annotate images.
Clearly, these are very simple examples, and the real power of the ipyannotator concept lies in building project-specific UIs.
Check out the other notebooks for inspiration on how this can be done.