# Label Containers
* There are two high-level containers for labels:
    1. [`LabelList`](#LabelList)
    2. [`LabelGenerator`](#LabelGenerator)
* Tools for converting between formats, ETL, and model training all operate on these containers.
* Read the basics notebook first. Its explanations are not repeated here.

In [1]:
!pip install "labelbox[data]"

In [2]:
from labelbox import Client
from labelbox.data.annotation_types import LabelList, LabelGenerator
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import LabelingFrontend
from labelbox.data.annotation_types import (Label, ImageData, MaskData, Mask,
                                            Point, Polygon,
                                            ClassificationAnswer, Radio,
                                            Checklist, ObjectAnnotation,
                                            ClassificationAnnotation)
import requests
import numpy as np
import os

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [4]:
# Add your api key
API_KEY = None
client = Client(api_key=API_KEY)

### Helper Functions
* The following functions are explained in the [basics notebook](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_types/basics.ipynb)
* Skip to the [LabelList](#LabelList) section to continue with this tutorial

In [5]:
def signing_function(obj_bytes: bytes) -> str:
    # Do not use this signer. You will not be able to re-sign these images at a later date.
    url = client.upload_data(content=obj_bytes, sign=True)
    return url

In [6]:
def get_polygon():
    # Given some polygon:
    xy_poly = [
        [60, 161],
        [67, 177],
        [76, 180],
        [77, 222],
        [82, 246],
        [78, 291],
        [72, 300],
        [87, 300],
        [94, 244],
        [103, 243],
        [100, 269],
        [90, 290],
        [95, 296],
        [104, 292],
        [108, 272],
        [111, 300],
        [121, 300],
        [117, 244],
        [128, 236],
        [133, 298],
        [142, 297],
        [137, 250],
        [146, 208],
        [138, 185],
        [120, 180],
        [105, 189],
        [112, 162],
        [93, 156],
        [72, 160],
    ]
    return Polygon(points=[Point(x=x, y=y) for x, y in xy_poly])


def get_labels():
    im_h, im_w = 300, 200
    image_url = "https://picsum.photos/id/1003/200/300"
    nose_color, eye_color = (0, 255, 0), (255, 0, 0)
    nose_mask = Point(x=96, y=194).draw(im_h, im_w, thickness=3)
    eye_masks = [
        Point(x=84, y=182).draw(im_h, im_w, thickness=3),
        Point(x=99, y=181).draw(im_h, im_w, thickness=3),
    ]
    mask_arr = np.max([*eye_masks, nose_mask], axis=0)
    mask = MaskData(arr=mask_arr)
    return [
        Label(data=ImageData(im_bytes=requests.get(image_url).content),
              annotations=[
                  ObjectAnnotation(value=get_polygon(), name="deer"),
                  ObjectAnnotation(name="deer_eyes",
                                   value=Mask(mask=mask, color=eye_color)),
                  ObjectAnnotation(
                      name="deer_nose",
                      value=Mask(mask=mask, color=nose_color),
                      classifications=[
                          ClassificationAnnotation(
                              name="nose_description",
                              value=Radio(answer=ClassificationAnswer(
                                  name="wet")))
                      ]),
                  ClassificationAnnotation(
                      name="image_description",
                      value=Checklist(
                          answer=[ClassificationAnswer(name="bright")]))
              ])
    ]

In [7]:
def show_feature_schema_ids(label):
    for annotation in label.annotations:
        print(f"Object : {annotation.name} - {annotation.feature_schema_id}")
        for classification in getattr(annotation, 'classifications', []):
            print(
                f"--- Subclass : {classification.name} - {classification.feature_schema_id}"
            )
            option = classification.value
            print(
                f"--- --- Options: {option.answer.name} - {option.answer.feature_schema_id}"
            )

        if isinstance(annotation, ClassificationAnnotation):
            for option in annotation.value.answer:
                print(
                    f"--- Options: {option.name} - {option.feature_schema_id}")

In [8]:
def setup_project():
    # These names have to match our object names exactly!!
    ontology_builder = OntologyBuilder(
        tools=[
            Tool(tool=Tool.Type.POLYGON, name="deer"),
            Tool(tool=Tool.Type.SEGMENTATION,
                 name="deer_nose",
                 classifications=[
                     Classification(class_type=Classification.Type.RADIO,
                                    instructions="nose_description",
                                    options=[Option(value="wet")])
                 ]),
            Tool(tool=Tool.Type.SEGMENTATION, name="deer_eyes")
        ],
        classifications=[
            Classification(class_type=Classification.Type.CHECKLIST,
                           instructions="image_description",
                           options=[
                               Option(value="bright"),
                               Option(value="not_blurry"),
                               Option(value="dark")
                           ])
        ])

    editor = next(
        client.get_labeling_frontends(where=LabelingFrontend.name == "Editor"))
    project = client.create_project(name="test_annotation_types")
    project.setup(editor, ontology_builder.asdict())
    dataset = client.create_dataset(name='my_ds')
    project.datasets.connect(dataset)

    ontology = OntologyBuilder.from_project(project)
    return ontology, dataset, project

In [9]:
def print_mask_urls(label):
    for annotation in label.annotations:
        if isinstance(annotation.value, Mask):
            print(annotation.value.mask.url)

In [10]:
def show_references(label):
    print('\n---  schema ids ---\n')
    show_feature_schema_ids(label)
    print("\n--- mask urls ---\n")
    print_mask_urls(label)
    print('\n--- image url ---\n')
    print(label.data.url)
    print('\n--- data row reference ---\n')
    # Use the argument rather than the global `original_label`
    print(label.data.uid)

# LabelList
* This object is essentially a list of `Label`s with a set of helpful utilities
* It is simple and fast at the expense of memory
    * Larger datasets shouldn't use `LabelList` (or will at least require more memory)
* Why use a `LabelList` over a plain list of labels?
    * Multithreaded utilities (faster)
    * Compatible with converter functions (functions useful for translating between formats, ETL, and training)
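
To illustrate why multithreaded utilities help with I/O-bound work such as signing or uploading, here is a minimal sketch using only the standard library. It is not how `LabelList` is implemented; `fake_upload` and `parallel_apply` are hypothetical names, and the URL is a placeholder.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upload(item: str) -> str:
    # Hypothetical stand-in for a slow network call (e.g. signing or uploading).
    time.sleep(0.05)
    return f"https://example.invalid/{item}"


def parallel_apply(items, fn, max_workers=8):
    # Run fn on every item concurrently; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


items = [f"image_{i}" for i in range(8)]
urls = parallel_apply(items, fake_upload)
print(urls[0])
```

Because the calls overlap while waiting on I/O, the total time is closer to one call than to the sum of all calls.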

In [11]:
labels = get_labels()
label_list = LabelList(labels)

# A LabelList can also be built iteratively
label_list = LabelList()
for label in labels:
    label_list.append(label)

## Iterate

In [12]:
# Iterable, behaves like a list
for label in label_list:
    print(type(label))
# Get length
print(len(label_list))
# By index
print(type(label_list[0]))

### Upload segmentation masks

In [13]:
### Add urls to all segmentation masks:
# (in parallel)
for label in label_list:
    print_mask_urls(label)

label_list.add_url_to_masks(signing_function)

for label in label_list:
    print_mask_urls(label)
# Again note that these all share the same segmentation mask
# ( This is determined by the fact that they share the same reference )
# This mask is only uploaded once
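
One way to picture "uploaded once because the reference is shared" is a cache keyed by object identity. The sketch below is an assumption about the general technique, not the library's code; `upload_once` and `fake_upload` are hypothetical helpers.

```python
def upload_once(masks, upload):
    # Cache keyed by object identity: entries that are the same object
    # share a single upload, even if they appear several times.
    cache = {}
    urls = []
    for mask in masks:
        key = id(mask)
        if key not in cache:
            cache[key] = upload(mask)
        urls.append(cache[key])
    return urls


calls = []


def fake_upload(mask):
    # Hypothetical uploader that records every call it receives.
    calls.append(mask)
    return f"https://example.invalid/mask/{len(calls)}"


shared = bytearray(b"mask-bytes")
copy_with_same_content = bytearray(b"mask-bytes")  # equal content, different object
urls = upload_once([shared, shared, copy_with_same_content], fake_upload)
print(urls, len(calls))
```

Two entries that point at the same object get the same URL, while an equal-looking copy is treated as a separate mask and uploaded again.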

### Create signed urls for data

In [14]:
### Add signed urls to the data:
# (in parallel)
print(label_list[0].data.url)
label_list.add_url_to_data(signing_function)
print(label_list[0].data.url)

### Add to labelbox dataset

In [15]:
# For the next two sections we need an ontology and dataset
ontology, dataset, project = setup_project()

In [16]:
print(label_list[0].data.uid)
# Note that this function will assign a uuid as the external id if it isn't provided.
label_list.add_to_dataset(dataset, signing_function)
print(label_list[0].data.uid)

### Add schema ids

In [17]:
for label in label_list:
    show_feature_schema_ids(label)
# Names in the labels must match the names in the ontology exactly (see setup_project).
label_list.assign_feature_schema_ids(ontology)
print('-' * 50)
for label in label_list:
    show_feature_schema_ids(label)

In [18]:
# cleanup:
dataset.delete()
project.delete()

# LabelGenerator
* This object generates labels and provides a set of helpful utilities
* It is more complex and slower than `LabelList`, in exchange for being highly memory efficient
    * Larger datasets should use label generators
* Why use a label generator over a plain generator that yields labels?
    * Parallel I/O operations run in the background to prepare results
    * Compatible with converter functions (functions useful for translating between formats, ETL, and training)
* The first `qsize` elements run serially from when the chained functions are added.
    * After that, iterating gets much faster.
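
The idea of preparing results in the background with a bounded buffer can be sketched with a producer thread and a queue. This is a generic illustration under stated assumptions, not the `LabelGenerator` implementation; `prefetch` and its `qsize` parameter are hypothetical and only borrow the name used above.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(iterable, fn, qsize=4):
    """Yield fn(item) for each item, computing up to qsize results ahead."""
    results = queue.Queue(maxsize=qsize)

    def worker():
        try:
            for item in iterable:
                # Blocks when the buffer is full, which bounds memory use.
                results.put(fn(item))
        finally:
            # Always unblock the consumer. In this sketch an error in fn
            # simply ends the stream early.
            results.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        result = results.get()
        if result is _DONE:
            return
        yield result


squares = list(prefetch(range(10), lambda x: x * x, qsize=3))
print(squares)
```

The bounded queue is the design choice that keeps memory flat: the producer never runs more than `qsize` items ahead of the consumer.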

In [19]:
labels = get_labels()
label_generator = LabelGenerator(labels)
ontology, dataset, project = setup_project()

In [20]:
# We can't show a before and after here because the generator is not repeatable

try:
    label = next(label_generator)
    print("Ran once")
    label = next(label_generator)
    print("Ran twice")
except StopIteration:
    pass

In [21]:
# Does not support indexing (it is a generator)
try:
    label_generator[0]
    print("Can index")
except TypeError:
    print("Unable to index")
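
The same two properties hold for any plain Python generator, which may make the behavior above easier to reason about. In this small sketch `make_stream` is a hypothetical stand-in for a label stream.

```python
from itertools import islice


def make_stream():
    # Hypothetical stand-in for a stream of labels.
    return (f"label_{i}" for i in range(3))


stream = make_stream()
first_pass = list(stream)
second_pass = list(stream)  # the stream is already exhausted

try:
    make_stream()[0]
    indexable = True
except TypeError:
    indexable = False

# islice can reach a position, but it consumes everything before it.
second_item = next(islice(make_stream(), 1, None))
print(first_pass, second_pass, indexable, second_item)
```

When random access or repeated passes are needed, materializing the items into a list is the simpler option, at the cost of memory.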

### Functions to modify results
* We can set functions to run on the result of the generator
* Since these are run in background threads it is a lot faster than applying them on each label individually
* The functions are lazily evaluated
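
Lazy evaluation of chained functions can be sketched as follows. `LazyPipeline`, `double`, and `increment` are hypothetical names for illustration only, and the sketch runs in a single thread rather than in background threads.

```python
class LazyPipeline:
    """Iterator that applies registered functions only when an item is requested."""

    def __init__(self, items):
        self._items = iter(items)
        self._fns = []

    def add(self, fn):
        self._fns.append(fn)
        return self  # returning self allows chaining

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._items)  # raises StopIteration when exhausted
        for fn in self._fns:
            item = fn(item)
        return item


log = []


def double(x):
    log.append(("double", x))
    return x * 2


def increment(x):
    log.append(("increment", x))
    return x + 1


pipeline = LazyPipeline([1, 2, 3]).add(double).add(increment)
calls_before = len(log)  # nothing has run yet
first = next(pipeline)   # runs both functions on the first item only
calls_after = len(log)
rest = list(pipeline)
print(calls_before, first, calls_after, rest)
```

Registering a function costs nothing up front; work happens only as items are pulled from the pipeline.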

In [22]:
# Recreate because we already went through all of the items when we showed that it isn't repeatable
original_label = labels[0]

show_references(original_label)
label_generator = LabelGenerator(labels) \
        .add_url_to_masks(signing_function) \
        .add_to_dataset(dataset, signing_function) \
        .assign_feature_schema_ids(ontology)

In [23]:
show_references(original_label)

In [24]:
label = next(label_generator)
show_references(original_label)

* Note that the first `qsize` elements run serially from when the chained functions are added.
* After that, iterating gets much faster.

In [25]:
# LabelGenerators can be converted to a LabelList
LabelGenerator(labels).as_list()

In [26]:
dataset.delete()
project.delete()