# Sample ML Project with Labelbox

<b>Overview</b>
* Retrieve Data from Labelbox
* Transform Data
* Pre-Process
* Train Model
* Optional - Upload Predictions to Labelbox Model Diagnostic tool
* Optional - Upload Predictions for MAL

### Usage
- <b>Model Training</b>:
  * Set a project ID containing polygons and segmentation labels
  * Polygons will train the instance segmentation head
  * Segmentation labels will train the semantic segmentation head
- <b>Diagnostics</b>:
  * No additional configuration is necessary. As long as the model has been trained, this will work.
- <b>MAL</b>:
  * Set a dataset ID for the dataset you would like to upload predictions to. A new project will automatically be created.

### Suggested Workflow
* To get the most out of Labelbox, we suggest training a model on a small amount of data, exploring model performance using diagnostics, selecting a dataset using catalog to address model shortcomings, making any model architecture adjustments, and then uploading predictions made on this new dataset via MAL for faster labeling.


## Setup

Install the libraries needed for model building and for working with Labelbox.

In [None]:
!pip install -q torch \
                torchvision \
                tensorflow
!pip install -q "git+https://github.com/Labelbox/labelbox-python@ms/coco#egg=labelbox[data]"

## Retrieve Data from Labelbox

Insert your Labelbox API key below. To generate one, see https://docs.labelbox.com/docs/create-an-api-key

In [None]:
API_KEY = None
# For training:
project_id = ""
# The model will make predictions on the following dataset 
# and upload predictions to a new project for model assisted labeling.
mal_dataset_id = ""

In [None]:
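import json
import os
import uuid
import random
import functools
# Needed by the diagnostics and MAL cells below.
from concurrent.futures import ThreadPoolExecutor, as_completed

import cv2
import numpy as np
from PIL import Image
import torch
import requests
from tqdm import tqdm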

from labelbox.schema.model import Model
from labelbox.data.metrics.group import get_label_pairs
from labelbox import LabelingFrontend, OntologyBuilder, Client
from labelbox.data.metrics.iou import data_row_miou
from labelbox.data.serialization import COCOConverter, NDJsonConverter
from labelbox.data.annotation_types import (
    Point,
    Polygon,
    Mask,
    Label,
    Rectangle,
    LabelList,
    ImageData,
    MaskData,
    ObjectAnnotation
)

# Add any additional imports your model needs here.

In [None]:
client = Client(api_key = API_KEY)

## Optional Config:
* `project_id` - Indicates which project labels should be exported from.
* `mal_dataset_id` - Dataset to use for MAL. We will create a new project in this notebook.
* `image_root` - Where to write images to on disk
* `mask_root` - Where to write masks to on disk
* `seg_masks_root` - Where to write the semantic segmentation masks
* `train_json_instance_path` - Where the train partition of the instance data will be written
* `train_json_panoptic_path` - Where the train partition of the panoptic data will be written
* `test_json_instance_path` - Where the test partition of the instance data will be written
* `test_json_panoptic_path` - Where the test partition of the panoptic data will be written
* `train_test_split` - How much of the data to add to each partition (as fractions that sum to 1, e.g. `[0.8, 0.2]`)
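
As a rough illustration of how a `train_test_split` value such as `[0.8, 0.2]` could be turned into partitions, here is a minimal sketch. The helper name `split_by_fraction` and the fixed shuffle seed are assumptions made for this example; they are not part of the Labelbox SDK or of this notebook.

```python
import random


def split_by_fraction(items, fractions, seed=0):
    """Shuffle `items` and cut them into consecutive partitions.

    `fractions` plays the role of `train_test_split`, e.g. [0.8, 0.2].
    This helper is hypothetical and only illustrates the idea.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    partitions, start = [], 0
    for i, fraction in enumerate(fractions):
        if i == len(fractions) - 1:
            end = len(shuffled)  # the last partition takes the remainder
        else:
            end = start + round(fraction * len(shuffled))
        partitions.append(shuffled[start:end])
        start = end
    return partitions


train_items, test_items = split_by_fraction(range(10), [0.8, 0.2])
print(len(train_items), len(test_items))
```

Letting the last partition absorb the remainder avoids losing items to rounding when the dataset size is not a multiple of the fractions.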

In [None]:
image_root = "<insertPath>"
mask_root = "<insertPath>"
seg_masks_root = "<insertPath>"
train_json_instance_path = "<insertPath>"
train_json_panoptic_path = "<insertPath>"
test_json_instance_path = '<insertPath>'
test_json_panoptic_path = "<insertPath>"
train_test_split = [0.8, 0.2]
train_ds_name = "<insertName>"
test_ds_name = "<insertName>"

model_name = "<insertProjectName>"

proj = client.get_project(project_id)
labels = proj.label_generator().as_list()

In [None]:
# Hold out the first 100 labels as the validation set and train on the rest.
raw_data = labels._data
labels = LabelList(raw_data[100:])
val_labels = LabelList(raw_data[:100]) 

For more information on how to download data, see https://docs.labelbox.com/docs/export-labels

In [None]:
# Insert code here that uses the Labelbox SDK to download any additional data you need.

## Transform Data

This step transforms each image according to a transform vector supplied by the user. The TensorFlow Addons tutorial linked below shows examples of such image operations.

https://colab.research.google.com/github/tensorflow/addons/blob/master/docs/tutorials/image_ops.ipynb#scrollTo=uheQOL-y0Fj3
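
To make the idea of a transform vector concrete, the sketch below applies one in plain NumPy. It assumes the 8-parameter projective convention `[a0, a1, a2, b0, b1, b2, c0, c1]`, where each output pixel `(x, y)` is sampled from the input location `((a0*x + a1*y + a2) / k, (b0*x + b1*y + b2) / k)` with `k = c0*x + c1*y + 1`. The function name `apply_transform` and the nearest-neighbor sampling are choices made for this example, not code from the tutorial or from Labelbox.

```python
import numpy as np


def apply_transform(image, transform, fill=0):
    """Apply an 8-parameter projective transform with nearest-neighbor sampling.

    Hypothetical helper: works on (H, W) or (H, W, C) arrays and fills
    pixels that map outside the input with `fill`.
    """
    a0, a1, a2, b0, b1, b2, c0, c1 = transform
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    k = c0 * xs + c1 * ys + 1.0
    # For every output pixel, compute which input pixel it is sampled from.
    src_x = np.rint((a0 * xs + a1 * ys + a2) / k).astype(int)
    src_y = np.rint((b0 * xs + b1 * ys + b2) / k).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(image, fill)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


image = np.arange(9).reshape(3, 3)
# Horizontal flip for a 3-pixel-wide image: x_in = -x_out + (width - 1).
flipped = apply_transform(image, [-1, 0, 2, 0, 1, 0, 0, 0])
print(flipped)
```

When an image has segmentation masks, the same vector would need to be applied to the masks too. Nearest-neighbor sampling is used here because it never blends class values.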

In [None]:
# Insert the data transformation you need here.

## Preprocessing Data

* Read image
* Resize image 
* Remove noise (denoise)
* Segmentation
* Morphology (smoothing edges)


For more examples, see https://colab.research.google.com/github/Blaizzy/BiSeNet-Implementation/blob/master/Preprocessing.ipynb#scrollTo=pD3e08HvWsF3
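
The steps listed above, apart from reading the image from disk, can be sketched with NumPy alone. Everything in this block is an assumption made for illustration: the function names, the nearest-neighbor resize, the mean filter used for denoising, the fixed threshold used for segmentation, and the 3x3 opening used to smooth edges. In practice you would more likely use `cv2` equivalents; this version only shows what each step does to a 2D grayscale array.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Resize a 2D image by picking the nearest source pixel."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]


def box_blur(img, k=3):
    """Denoise with a k x k mean filter (k odd, edges replicated)."""
    pad = k // 2
    padded = np.pad(img.astype(np.float32), pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def dilate(mask, k=3):
    """A pixel becomes True if any pixel in its k x k window is True."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask, k=3):
    """A pixel stays True only if its whole k x k window is True."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=True)
    h, w = mask.shape
    out = np.ones((h, w), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def preprocess(img, size, threshold):
    """Resize, denoise, threshold, then smooth the mask with an opening."""
    small = resize_nearest(img, *size)
    smooth = box_blur(small)
    mask = smooth > threshold
    return dilate(erode(mask))
```

Opening (erosion followed by dilation) removes isolated foreground pixels while keeping larger regions, which is one simple way to smooth mask edges.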

In [None]:
# Insert the preprocessing code here.

## Train a Model

In [None]:
# This is where the magic happens. Add your training code here, depending on your requirements.

## Test the Model

In [None]:
# Test the model by evaluating the predictions it produces.
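
One common way to test a segmentation model is intersection over union (IoU) between predicted and ground-truth masks. The sketch below is a plain NumPy version, assuming binary masks of equal shape. The names `mask_iou` and `mean_iou` are made up for this example, and treating two empty masks as a perfect match is a convention chosen here. It is not the implementation behind `data_row_miou` in the Labelbox SDK.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption for this sketch: two empty masks agree perfectly.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(pairs):
    """Average IoU over an iterable of (pred, target) mask pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(mask_iou(pred, target))
```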

## Optional - Upload Predictions to Labelbox Model Diagnostic tool 

* For more info: https://docs.labelbox.com/recipes/create-a-dataset
* Additional example: https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/integrations/detectron2/coco_panoptic.ipynb

In [None]:
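# NOTE: `get_label` and `signer` are not defined in this template; define them
# before running this cell. `get_label` is expected to take an ImageData and
# return a Label with the model's predictions; `signer` is expected to upload
# raw bytes and return a URL.
labels_mea = LabelList()
with ThreadPoolExecutor(4) as executor:
    futures = [executor.submit(get_label, label.data) for label in val_labels]
    for future in tqdm(as_completed(futures)):
        labels_mea.append(future.result())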

labels_mea.add_url_to_masks(signer) \
      .add_url_to_data(signer) \
      .assign_feature_schema_ids(OntologyBuilder.from_project(proj))
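
The submit and `as_completed` pattern used in the cell above can be tried in isolation with the standard library. In this sketch, `fake_get_label` is a hypothetical stand-in for the real `get_label` function, which this template leaves to the user, and `run_in_threads` is a helper name invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_get_label(item):
    """Hypothetical stand-in for per-data-row work such as download + inference."""
    return item * item


def run_in_threads(fn, items, max_workers=4):
    """Run fn on every item in a thread pool and collect the results."""
    results = []
    with ThreadPoolExecutor(max_workers) as executor:
        futures = [executor.submit(fn, item) for item in items]
        # as_completed yields futures as they finish, not in submission order.
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results.append(future.result())
    return results


print(sorted(run_in_threads(fake_get_label, range(5))))
```

Because results arrive in completion order, anything that depends on the original order should carry an identifier, such as the data row ID, along with each result.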

In [None]:
# Fetch the model if it already exists; otherwise create it.

model = next(client.get_models(where = Model.name == model_name), None)
if model is None:
    model = client.create_model(model_name, ontology_id=proj.ontology().uid)


# Increment model run version if it exists. Otherwise use the initial 0.0.0
model_run_names = [model_run.name for model_run in model.model_runs()]
if len(model_run_names):
    model_run_names.sort(key=lambda s: [int(u) for u in s.split('.')])
    latest_model_run_name = model_run_names[-1]
    model_run_suffix = int(latest_model_run_name.split('.')[-1]) + 1
    model_run_name = ".".join([*latest_model_run_name.split('.')[:-1], str(model_run_suffix)])
else:
    model_run_name = "0.0.0"

print(f"Model Name: {model.name} | Model Run Version : {model_run_name}")
model_run = model.create_model_run(model_run_name)
model_run.upsert_labels([label.uid for label in val_labels])
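
The version-increment logic in the cell above can be checked on its own without a Labelbox connection. This sketch restates it as a small function; the name `next_run_name` is an assumption for the example, and it expects every existing name to be dot-separated integers such as `0.0.3`.

```python
def next_run_name(existing_names, initial="0.0.0"):
    """Return the next model run name by bumping the last version component."""
    if not existing_names:
        return initial
    # Compare numerically so that "0.0.10" counts as newer than "0.0.9".
    latest = max(existing_names, key=lambda s: [int(part) for part in s.split(".")])
    *head, last = latest.split(".")
    return ".".join([*head, str(int(last) + 1)])


print(next_run_name([]))
print(next_run_name(["0.0.2", "0.0.10", "0.0.9"]))
```

Sorting the names as plain strings would rank `0.0.9` above `0.0.10`, which is why each component is converted to an integer before comparison.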

In [None]:
upload_task = model_run.add_predictions(f'diagnostics-import-{uuid.uuid4()}', NDJsonConverter.serialize(labels_mea))
upload_task.wait_until_done()
print(upload_task.state)
print(upload_task.errors)

In [None]:
for idx, model_run_data_row in enumerate(model_run.model_run_data_rows()):
    if idx == 5:
        break
    print(model_run_data_row.url)

## Optional - Upload Predictions for MAL

In [None]:
# Some additional unlabeled data rows
dataset = client.get_dataset(mal_dataset_id) 


# Use ThreadPoolExecutor to parallelize image downloads.
# This is still a bit slow because of the amount of processing for each data row.
# For larger datasets, consider multiprocessing instead.


labels_mal = LabelList()
with ThreadPoolExecutor(4) as executor:
    data_rows = dataset.data_rows()
    images = [ImageData(url = data_row.row_data, uid = data_row.uid, external_id = data_row.external_id) for data_row in data_rows]
    futures = [executor.submit(get_label, image) for idx, image in enumerate(images) if idx < 25]
    for future in tqdm(as_completed(futures)):
        labels_mal.append(future.result())
        
project = client.create_project(name = "<insertProjectName>")
editor = next(
    client.get_labeling_frontends(where=LabelingFrontend.name == 'editor'))
project.setup(editor, labels_mal.get_ontology().asdict())
project.enable_model_assisted_labeling()
project.datasets.connect(dataset)

labels_mal.add_url_to_masks(signer) \
      .add_url_to_data(signer) \
      .assign_feature_schema_ids(OntologyBuilder.from_project(project))

ndjsons = list(NDJsonConverter.serialize(labels_mal))
upload_task = project.upload_annotations(name=f"upload-job-{uuid.uuid4()}",
                                         annotations=ndjsons,
                                         validate=False)
# Wait for upload to finish
upload_task.wait_until_done()
# Review the upload status
print(upload_task.errors)