# Image annotation script

This notebook will serve as a prototype to create a script to annotate all the images of Terzani collection

## Installing the client library

If the Google cloud vision library is not installed already, install it.

If you have python environment use

```shell
pip install --upgrade google-cloud-vision
```

If you have conda environment use

```shell
conda install -c conda-forge google-cloud-vision
```

## Installing other libraries

Install `dotenv` to get the environment variables

If you have python environment use

```shell
pip install python-dotenv
```

If you have conda environment use

```shell
conda install -c conda-forge python-dotenvn
```

pip install python-dotenv

## Import the libraries

In [1]:
## Import the standard libraries
import os, io, pickle, random, json
## Import Vison API related libraries
from google.cloud import vision
from google.cloud.vision import types
## Import dotenv library to get environment variables
from dotenv import load_dotenv
# Import urllib to read images
import urllib.request as ur
# Import pymango to inset data into mangodb
import pymongo
from tqdm import tqdm

## Setup the service account credentials to use the API

In [2]:
load_dotenv()

GOOGLE_APPLICATION_CREDENTIALS = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = GOOGLE_APPLICATION_CREDENTIALS

MANGO_CLIENT_URI = os.getenv('MONGO_URI')
os.environ['MANGO_CLIENT_URI'] = MANGO_CLIENT_URI

## Creation of Client to access the API

In [3]:
client = vision.ImageAnnotatorClient()

## Image selection

As this is a prototyping script we shall select 10 images randomly each from the color and monochrome photos (using the already created pickle files).

Generic class to store an image and its IIIF representation

In [4]:
class Terzani_Photo(object):
    def __init__(self, iiif, photo):
        self.iiif = iiif
        self.photo = photo
        
    def get_photo_link(self):
        return self.iiif["images"][0]["resource"]["@id"]

In [5]:
number_of_images = 10

# loading the color photos
color_photos = pickle.load(open("terzani_recto_iiif_color.pickle", "rb" ))
# randomly selecting 10 images
color_photos = random.sample(color_photos, number_of_images)

# loading the monochrome photos
bw_photos = pickle.load(open("terzani_recto_iiif_color.pickle", "rb" ))
# randomly selecting 10 images
bw_photos = random.sample(bw_photos, number_of_images)

all_photos = color_photos + bw_photos

## Calling the Vision API

In [6]:
annotated_images = dict() # We store information about each image in a dictionary to later transform into json.
failed_images = dict() # We store information about images that are failed to be annotated by google api.
for img in tqdm(all_photos):

    # if the image is already not present in the either annotated and failed dictionaries
    if img.iiif["label"] not in annotated_images and img.iiif["label"] not in failed_images:

        img_lbl = img.iiif["label"]
                
        image_data = ur.urlopen(img.get_photo_link()).read()
        image = types.Image(content=image_data)
        
        # call the goole vision api to get the annotations of various types
        response = client.annotate_image({
            'image': image,
            'features': [{'type': vision.enums.Feature.Type.LANDMARK_DETECTION}, 
                         {'type': vision.enums.Feature.Type.LOGO_DETECTION},
                         {'type': vision.enums.Feature.Type.LABEL_DETECTION},
                         {'type': vision.enums.Feature.Type.TEXT_DETECTION},
                         {'type': vision.enums.Feature.Type.OBJECT_LOCALIZATION},
                         {'type': vision.enums.Feature.Type.WEB_DETECTION}],})
        
        # check if there is any error returned by the api
        if response.error.code != 0:
            failed_images[img_lbl] = {}
            failed_images[img_lbl]["error"] = [response.error.code, response.error.message]
        else:
            annotated_images[img_lbl] = {}
        
            # store the iiif description
            annotated_images[img_lbl]["iiif"] = img.iiif
            
            tags = list() # A list to store tags for the image originating from label and web detection

            # We store the labels and webentities in a list called tags

            tags.extend([lbl.description for lbl in response.label_annotations])
            tags.extend([weben.description for weben in response.web_detection.web_entities])

            # store the generated tags into the dictionary.
            # The list is made into set and converedted back into to list to eliminate any duplicate tags 
            annotated_images[img_lbl]["tags"] = list(set(tags))

            obj_boxes = {} # this dictionary will store the information of annotations along with bounding boxes.
            # The key will the the name to identify the annotation and the value be a list of list of tuples with coordinates
            # for the bounding box. It would be a list of list to store coordinates for different boxes for same tag

            for lndmk in response.landmark_annotations:
                if lndmk.description not in obj_boxes:
                    obj_boxes[lndmk.description] = list()
                ulx, uly, box_width, box_height = None, None, None, None
                ulx, uly = lndmk.bounding_poly.vertices[3].x, lndmk.bounding_poly.vertices[3].y
                box_width = abs(lndmk.bounding_poly.vertices[2].x - lndmk.bounding_poly.vertices[3].x)
                box_height = abs(lndmk.bounding_poly.vertices[2].y - lndmk.bounding_poly.vertices[1].y)
                if (ulx and uly and box_width and box_height) is not None:
                    vert = [ulx, uly, box_width, box_height] 
                    obj_boxes[lndmk.description].append(vert)    

            for lgo in response.logo_annotations:
                if lgo.description not in obj_boxes:
                    obj_boxes[lgo.description] = list()
                ulx, uly, box_width, box_height = None, None, None, None
                ulx, uly = lgo.bounding_poly.vertices[3].x, lgo.bounding_poly.vertices[3].y
                box_width = abs(lgo.bounding_poly.vertices[2].x - lgo.bounding_poly.vertices[3].x)
                box_height = abs(lgo.bounding_poly.vertices[2].y - lgo.bounding_poly.vertices[1].y)
                if (ulx and uly and box_width and box_height) is not None:
                    vert = [ulx, uly, box_width, box_height]
                    obj_boxes[lgo.description].append(vert)

            if len(response.localized_object_annotations) > 0:
                img_width, img_height = img.iiif["width"], img.iiif["height"]
            for lobj in response.localized_object_annotations:
                if lobj.name not in obj_boxes:
                    obj_boxes[lobj.name] = list()
                ulx, uly, box_width, box_height = None, None, None, None
                ulx, uly = lobj.bounding_poly.normalized_vertices[3].x * img_width, lobj.bounding_poly.normalized_vertices[3].y * img_height
                box_width = abs(lobj.bounding_poly.normalized_vertices[2].x - lobj.bounding_poly.normalized_vertices[3].x) * img_width
                box_height = abs(lobj.bounding_poly.normalized_vertices[2].y - lobj.bounding_poly.normalized_vertices[1].y) * img_height
                if (ulx and uly and box_width and box_height) is not None:
                    vert = [ulx, uly, box_width, box_height]
                    obj_boxes[lobj.name].append(vert)

            for txt in response.text_annotations:
                modified_text = txt.description.replace(".", "_")
                if modified_text not in obj_boxes:
                    obj_boxes[modified_text] = list()
                ulx, uly, box_width, box_height = None, None, None, None
                ulx, uly = txt.bounding_poly.vertices[3].x, txt.bounding_poly.vertices[3].y
                box_width = abs(txt.bounding_poly.vertices[2].x - txt.bounding_poly.vertices[3].x)
                box_height = abs(txt.bounding_poly.vertices[2].y - txt.bounding_poly.vertices[1].y)
                if (ulx and uly and box_width and box_height) is not None:
                    vert = [ulx, uly, box_width, box_height]
                    obj_boxes[modified_text].append(vert)
            # store the generated object boxes into the dictionary.
            annotated_images[img_lbl]["obj_boxes"] = obj_boxes

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:40<00:00,  5.01s/it]


## Saving the dictionaries to JSON file

In [7]:
with open('annotated_images.json', 'w') as fp:
    json.dump(annotated_images, fp, indent=4)

if len(failed_images) > 0:
    print("There are failed images")
    with open('failed_images.json', 'w') as fp:
        json.dump(failed_images, fp, indent=4)

## Inserting the data into Mangodb

In [8]:
# creating a client to work with mango db
mangoclient = pymongo.MongoClient(MANGO_CLIENT_URI)

In [10]:
# selecting the <terzani_photos> database
mango_db = mangoclient["terzani_photos"]
# creating a new collection named <sample_annotations>
mango_collection = mango_db["sample_annotations"]
# inserting the dictionary into the db
for label, annotations in annotated_images.items():
    x = mango_collection.insert_({label:annotations})

{'iiif': {'@id': 'http://dl.cini.it/oa/items/66816/canvas.json', 'label': 'T4_115_recto', '@type': 'sc:Canvas', 'width': 2838, 'height': 3887, 'images': [{'@id': 'http://dl.cini.it/oa/files/65776/anno.json', 'motivation': 'sc:painting', '@type': 'oa:Annotation', 'resource': {'@id': 'http://dl.cini.it/files/original/da866fb0287bf4b3eb6aa9f4d80ef427.jpg', '@type': 'dctypes:Image', 'format': 'image/jpeg', 'width': 2838, 'height': 3887, 'service': {'@id': 'http://dl.cini.it:8080/digilib/Scaler/IIIF/da866fb0287bf4b3eb6aa9f4d80ef427.jpg', '@context': 'http://iiif.io/api/image/2/context.json', 'profile': 'http://iiif.io/api/image/2/level2.json'}}, 'on': 'http://dl.cini.it/oa/items/66816/canvas.json'}], 'metadata': [{'label': 'Identifier', 'value': '97eb09a1-5410-487b-adb6-f8c704097391'}, {'label': 'notes', 'value': 'T4_115_recto'}, {'label': 'originalFileName', 'value': 'T4_115_recto.jpg'}, {'label': 'filename', 'value': '229.jpg'}, {'label': 'fileMD5', 'value': '16344d3a2ae7b1ce10ee7c707672e