# Convert Visual Object Tagging Tool (VoTT) labels to BRUVNet-COCO format

This notebook converts images annotated and labelled in VoTT to BRUVNet-COCO format, with associated metadata. Requirements:

* Images are stored in Azure Blob containers and connected to VoTT for annotation and labelling using polylines

When you've finished annotating and labelling, export images with the following settings. 

* VoTT export settings: 
  * Provider = VoTT .json file 
  * Asset State = Only Tagged Objects Assets (Include Images)

Once all cells have been run, the converted file, 'VOTT-COCO-Export.json', is written to the Azure Blob container where the original images are stored.

In [None]:
import pandas as pd
import json
import numpy as np
from PIL import Image, ImageDraw
import os, uuid,io
import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

## Enter Azure credentials

Provide the URL of the Azure storage account that holds the images and labels, and a SAS token for that account with read, write and list permissions.

* source_url = blob storage url e.g. https://*mystorageaccount*.blob.core.windows.net
* source_SAS = SAS token for storage account images are in e.g. ?sv=2019-02-02&st=2019-04-29T22%3A18%3A26Z&se=2019-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D


In [None]:
# Storage account URL and SAS token. The SAS needs Read, Write (to write the COCO format file back to the same location) and List permissions on the container and blob.
source_url = ""
source_SAS = ""
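Before connecting, it can help to confirm that the SAS token actually carries the permissions listed above. The sketch below is one way to do that, assuming the permissions sit in the token's `sp` query field as single letters (`r`, `w`, `l`). The helper names `sas_permissions` and `missing_permissions` and the placeholder token are illustrative and not part of the notebook.

```python
from urllib.parse import parse_qs


def sas_permissions(sas_token):
    """Return the set of permission letters found in the `sp` field of a SAS token."""
    query = parse_qs(sas_token.lstrip("?"))
    return set(query.get("sp", [""])[0])


def missing_permissions(sas_token, required="rwl"):
    """Return the required permission letters that the token does not grant."""
    return set(required) - sas_permissions(sas_token)


# Placeholder token for illustration only; the signature is not real.
example_sas = "?sv=2019-02-02&sr=c&sp=rw&sig=PLACEHOLDER"
print("Missing permissions:", missing_permissions(example_sas))
```

An empty result suggests the token grants read, write and list. It does not check the expiry time or the signature.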


### Specify Blob Container Name

Enter the name of the Azure blob container that the images and labels are stored in. 

For example: source_container_client = source_blob_client.get_container_client('BLOB FOLDER NAME HERE')

In [None]:
source_blob_client = BlobServiceClient(account_url=source_url, credential=source_SAS)
source_container_client = source_blob_client.get_container_client('')

### Specify VoTT .json name

When you export from VoTT, a new folder called vott-json-export is created in the Azure blob container. It contains all your labelled images and a .json file. Specify the path to the .json file in the form 'vott-json-export/VOTTPROJECTNAME-export.json', replacing VOTTPROJECTNAME with the name of your VoTT project. You can confirm the exact file name in the blob container.

In [None]:
vott_export_blob = source_container_client.get_blob_client('vott-json-export/VOTTPROJECTNAME-export.json').download_blob()
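If you are unsure of the exact file name, you can filter a list of blob names for the export file. This is a minimal sketch: `find_vott_exports` and the sample names are hypothetical. In the notebook, the names could be gathered with `[b.name for b in source_container_client.list_blobs(name_starts_with="vott-json-export/")]`.

```python
def find_vott_exports(blob_names, folder="vott-json-export/"):
    """Return blob names inside `folder` that end with '-export.json'."""
    return sorted(
        name
        for name in blob_names
        if name.startswith(folder) and name.endswith("-export.json")
    )


# Hypothetical blob names for illustration.
sample_names = [
    "frame_0001.jpg",
    "vott-json-export/frame_0001.jpg",
    "vott-json-export/MyProject-export.json",
]
print(find_vott_exports(sample_names))
```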

In [None]:
vott_data = json.loads(vott_export_blob.readall())
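The rest of the notebook relies on a small part of the VoTT export: each entry under `assets` has an `asset` record (`name`, `path`, `size`) and a list of `regions` (`tags`, `boundingBox`, `points`). The sketch below builds a tiny hand-made export with that shape and summarises it. The sample values and the `summarise_vott` helper are illustrative only, and a real export contains further fields.

```python
def summarise_vott(vott_data):
    """Count assets, regions and regions without a tag in a VoTT export dict."""
    assets = vott_data["assets"].values()
    regions = [region for asset in assets for region in asset["regions"]]
    untagged = sum(1 for region in regions if not region.get("tags"))
    return {
        "assets": len(vott_data["assets"]),
        "regions": len(regions),
        "untagged_regions": untagged,
    }


# Hand-made sample in the shape the conversion cells expect (values are made up).
sample_vott = {
    "assets": {
        "abc123": {
            "asset": {
                "name": "frame_0001.jpg",
                "path": "https://example.invalid/frame_0001.jpg",
                "size": {"width": 1920, "height": 1080},
            },
            "regions": [
                {
                    "tags": ["Ambassis macleayi"],
                    "boundingBox": {"left": 10, "top": 20, "width": 40, "height": 30},
                    "points": [
                        {"x": 10, "y": 20},
                        {"x": 50, "y": 20},
                        {"x": 50, "y": 50},
                        {"x": 10, "y": 50},
                    ],
                }
            ],
        }
    }
}
print(summarise_vott(sample_vott))
```

Running `summarise_vott(vott_data)` on the loaded export is a quick way to spot regions without a tag, which would raise an `IndexError` in the cells that read `annot["tags"][0]`.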

In [None]:
coco_data = []

### Image licensing and dataset information

This adds license and dataset information, in COCO format, for the BRUVNet dataset. A Creative Commons license is used. Leave this section at its default values.

In [None]:
info_dict = {}
info_dict["info"] =  {"description": "BRUVNet 2020 Dataset",
        "url": "https://www.bruvnet.org/",
        "version": "1.0",
        "year": 2020,
        "contributor": "BRUVNet.org",
        "date_created": "2020/08/01"
    }

In [None]:
license_dict = {}
license_dict["licenses"] = [
        {
            "url": "http://creativecommons.org/licenses/by/3.0/au/",
            "id": 1,
            "name": "Attribution 3.0 Australia (CC BY 3.0 AU)"
        }]

## Add BRUVNet Metadata

To allow subsetting and querying of the dataset, metadata is added that corresponds to the labels that you're converting to COCO format. 


* dataset_year = the year the images were collected (not the year they were labelled).
* site_type = the name of the location where the images were collected, e.g. Sydney Harbour, Shaw's Creek, Mudginberri Billabong.
* water_type = the type of water, e.g. Freshwater, Marine or Estuarine.
* habitat_type = the type of habitat the images were collected from, e.g. Lowland Billabong, Coral Reef, River.
* country = the country the images were collected in.
* attribution = the organisation that collected or owns the images.

In [None]:
# Additional info for BRUVNet to enable tagging and search. Not accessed by COCO API tools.

# Note: takes one value for each field, assuming the batch of metadata is from a single source

# 1. Year captured
dataset_year = 2020

# 2. Site
site_type  =  "Georgetown Billabong"

# 3. Water Type
water_type = "Freshwater"

# 4. Habitat
habitat_type = "Lowland Billabong"

# 5. Country
country = "Australia"

# 6. Attribution
attribution = "Supervising Scientist"


In [None]:
year_dict = {}
year_dict["years"] = [
    {
        "name": dataset_year,
        "id":1
    }
]

In [None]:
site_dict = {}
site_dict["sites"] = [
    {
        "name": site_type,
        "id":1
    }
]

In [None]:

watertype_dict = {}
watertype_dict["waters"] = [
    {
        "name": water_type,
        "id": 1
    }]



In [None]:
habitat_dict = {}
habitat_dict["habitats"] = [
    {
        "name": habitat_type,
        "id":1
    }
]

In [None]:
country_dict = {}
country_dict["countries"] = [
    {
        "name": country,
        "id":1
    }
]

In [None]:
attribution_dict = {}
attribution_dict["attributions"] = [
    {
        "name": attribution,
        "id":1
    }
]
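The six metadata cells above all build the same structure: a named list holding one `{"name", "id"}` entry. The sketch below shows how such a table could be built with one helper and how an id stored on an image record (such as `site_id`) might be resolved back to its name when querying the dataset. `make_lookup` and `resolve_name` are hypothetical helpers, not part of the notebook or of any COCO tooling.

```python
def make_lookup(key, name, entry_id=1):
    """Build a single-entry lookup table, e.g. {"sites": [{"name": ..., "id": 1}]}."""
    return {key: [{"name": name, "id": entry_id}]}


def resolve_name(coco, image, id_field, table):
    """Return the name in coco[table] whose id matches image[id_field], or None."""
    wanted = image[id_field]
    for entry in coco[table]:
        if entry["id"] == wanted:
            return entry["name"]
    return None


# Illustrative data using the notebook's default metadata values.
coco = {
    **make_lookup("sites", "Georgetown Billabong"),
    **make_lookup("waters", "Freshwater"),
}
image = {"file_name": "frame_0001.jpg", "site_id": 1, "water_id": 1}

print(resolve_name(coco, image, "site_id", "sites"))
print(resolve_name(coco, image, "water_id", "waters"))
```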

## Confirm Label Names

Run the cell below to read the species names in the dataset being converted. Use this to check the correct spelling and format before conversion. 

Fish species names should be in the following format:

* Genus species e.g. Ambassis macleayi 

If a species name is misspelt, it will conflict with existing labels when uploaded to the BRUVNet Master dataset, and the dataset will not be combined. Make sure the names are correct.

If identification can only be made to genus, record the species as spp. for multiple species or sp. for a single unnamed species. See the labelling taxonomy in the Readme for more information.
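The naming format above can also be checked automatically. The sketch below flags labels that do not look like `Genus species`, `Genus sp.` or `Genus spp.`. The regular expression is an assumption about what counts as well formed (capitalised genus, one space, lowercase epithet), so it will reject legitimate edge cases such as subspecies or hybrid names, and it cannot detect a misspelt name that is otherwise well formed.

```python
import re

# Assumed pattern: capitalised genus, a single space, then either "sp.", "spp."
# or a lowercase species epithet (a bare "sp"/"spp" without the full stop is rejected).
LABEL_PATTERN = re.compile(r"[A-Z][a-z]+ (?:spp?\.|(?!spp?$)[a-z-]+)")


def invalid_labels(labels):
    """Return the labels that do not match the assumed naming pattern."""
    return [label for label in labels if not LABEL_PATTERN.fullmatch(label)]


# Illustrative labels; the last three are deliberately malformed.
labels = [
    "Ambassis macleayi",
    "Ambassis sp.",
    "Ambassis spp.",
    "ambassis macleayi",
    "Ambassis sp",
    "Ambassis  macleayi",
]
print(invalid_labels(labels))
```

In the notebook, the same check could be applied to the `name` values printed by the next cell.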

In [None]:
#Get all the labels in the dataset and create a categories dictionary to reference and append to the COCO format file
categories = []

cat_id = 1 # initialise this with the relevant category id

tags = []
for img in vott_data['assets']:
    for annot in vott_data["assets"][img]["regions"]:
        
        if annot["tags"][0] not in tags:
            
            cat_dict = {}
            tags.append(annot["tags"][0])
            cat_dict["id"] = cat_id
            cat_dict["name"] = annot["tags"][0]
            cat_dict["supercategory"] = "Fish Species"
            cat_id = cat_id + 1
            categories.append(cat_dict)

categories_dict= {}
categories_dict["categories"] = categories
print(categories_dict["categories"])
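Counting how often each label occurs can make typos easier to spot, since a misspelt name usually shows up as a separate label with very few regions. The sketch below counts the first tag of every region, mirroring the cell above. `tag_counts` and the sample data are illustrative.

```python
from collections import Counter


def tag_counts(vott_data):
    """Count regions per label, using the first tag of each region."""
    counts = Counter()
    for asset in vott_data["assets"].values():
        for region in asset["regions"]:
            if region["tags"]:
                counts[region["tags"][0]] += 1
    return counts


# Hand-made sample with one deliberately misspelt label.
sample = {
    "assets": {
        "a": {"regions": [{"tags": ["Ambassis macleayi"]}, {"tags": ["Ambassis macleayi"]}]},
        "b": {"regions": [{"tags": ["Ambasis macleayi"]}]},
    }
}
for name, count in tag_counts(sample).most_common():
    print(count, name)
```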


In [None]:
images = []
annotations = []

img_id = 1 # initialise this with the relevant image id
annot_id = 1 # initialise this with the relevant annotation id

for img in vott_data['assets']:
    img_dict = {}
    
    
    img_dict["id"] = img_id
    img_dict["width"] = vott_data['assets'][img]['asset']['size']['width']
    img_dict["height"] = vott_data['assets'][img]['asset']['size']['height']
    img_dict['file_name'] = vott_data['assets'][img]['asset']['name']
    img_dict['license'] = 1 #handle if multiple licenses
    img_dict["source_bruvnet_url"] = vott_data['assets'][img]['asset']['path']
    img_dict["coco_url"] =  vott_data['assets'][img]['asset']['path']
    img_dict["year_id"] = 1
    img_dict["site_id"] = 1
    img_dict["water_id"] = 1
    img_dict["habitat_id"] = 1
    img_dict["country_id"] = 1
    img_dict["attribution_id"] = 1
    

    

    ## Annotations
    for annot in vott_data["assets"][img]["regions"]:
        annot_dict = {}
        
        annot_dict["id"] = annot_id
        annot_dict["image_id"] = img_dict["id"]
        # Match on the first tag, the same tag used to build the categories list
        annot_dict["category_id"] = next(c["id"] for c in categories_dict["categories"] if c["name"] == annot["tags"][0])
        annot_dict["iscrowd"] = 0
        annot_dict["bbox"] = [annot["boundingBox"]["left"], annot["boundingBox"]["top"],annot["boundingBox"]["width"],annot["boundingBox"]["height"] ]
        annot_dict["area"] = annot["boundingBox"]["height"] * annot["boundingBox"]["width"]

        segmentation = []

        for seg in annot["points"]:
            segmentation.append(seg["x"])
            segmentation.append(seg["y"])

        annot_dict["segmentation"] = [segmentation]

        
        annot_id = annot_id+1
        annotations.append(annot_dict)

        
    
    
    img_id = img_id+1
    images.append(img_dict)

images_dict = {}
images_dict["images"] = images

annotations_dict = {}
annotations_dict["annotations"] = annotations
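The cell above stores the bounding-box area in `area`. COCO tooling commonly treats `area` as the area of the segmentation itself, so for polyline annotations the polygon area may be the more useful value. The sketch below computes it with the shoelace formula from a flat `[x1, y1, x2, y2, ...]` list, the same layout as the `segmentation` entries built above. It is an optional alternative, not what the notebook does, and it assumes a simple (non-self-intersecting) polygon.

```python
def polygon_area(flat_xy):
    """Area of a simple polygon given as a flat [x1, y1, x2, y2, ...] list (shoelace formula)."""
    xs = flat_xy[0::2]
    ys = flat_xy[1::2]
    n = len(xs)
    if n < 3:
        return 0.0  # fewer than three vertices cannot enclose an area
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


rectangle = [0, 0, 4, 0, 4, 3, 0, 3]  # 4 x 3 rectangle
triangle = [0, 0, 4, 0, 0, 3]         # right triangle with legs 4 and 3
print(polygon_area(rectangle), polygon_area(triangle))
```

To use it, `annot_dict["area"]` could be set to `polygon_area(segmentation)` once the `segmentation` list has been filled.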




In [None]:
final_dict = {**info_dict, **images_dict, **annotations_dict, **license_dict, **categories_dict, **year_dict, **country_dict, **site_dict, **watertype_dict, **habitat_dict, **attribution_dict}

In [None]:
final_dict.keys()
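Before uploading, a quick consistency check can catch broken references in the merged dictionary. The sketch below verifies that every annotation points at an existing image and category and that each polygon has at least three points. `check_references` is a hypothetical helper and covers only these checks, not the full COCO specification.

```python
def check_references(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    problems = []
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        # Each polygon is a flat [x1, y1, ...] list, so three points need six numbers.
        if any(len(polygon) < 6 for polygon in ann["segmentation"]):
            problems.append(f"annotation {ann['id']}: polygon with fewer than 3 points")
    return problems


# Small hand-made example; annotation 2 is deliberately broken.
tiny_coco = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "Ambassis macleayi"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "segmentation": [[0, 0, 4, 0, 4, 3]]},
        {"id": 2, "image_id": 9, "category_id": 1, "segmentation": [[0, 0, 4, 0]]},
    ],
}
for problem in check_references(tiny_coco):
    print(problem)
```

Calling `check_references(final_dict)` should return an empty list if the conversion cells ran cleanly.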

In [None]:
upload_blob_client = source_container_client.get_blob_client("VOTT-COCO-Export.json")

In [None]:
upload_blob_client.upload_blob(json.dumps(final_dict), blob_type="BlockBlob", overwrite=True)

In [None]:
print("total images:", len(final_dict["images"]))
print("total annotations:", len(final_dict["annotations"]))
