<td>   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a></td>

<td><a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/html.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td>
<td><a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/html.ipynb" target="_blank"><img src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a></td>

# HTML Annotation Import
* This notebook provides examples of each supported annotation type for HTML assets and covers both the MAL and Label Import upload methods.

Supported annotations that can be uploaded through the SDK

* Classification Radio 
* Classification Checklist 
* Classification Free Text 

**Not** supported annotations

* Bounding box
* NER
* Polygon 
* Point
* Polyline 
* Segmentation Mask

MAL and Label Import:

* Model-assisted labeling (MAL) - used to provide pre-annotated data for your labelers, which reduces the total time needed to label your assets. MAL does not submit labels automatically; a labeler must review each pre-label before it is submitted.
* Label Import - used to provide ground truth labels. These can be compared against prediction labels or used as benchmarks to evaluate how your labelers are doing.



* For information on what types of annotations are supported per data type, refer to this documentation:
    * https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended

* Notes:
    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

In [None]:
%pip install -q "labelbox[data]"

# Setup

In [None]:
import labelbox as lb
import uuid
import labelbox.types as lb_types

# Replace with your API key
See the guide on how to [create an API key](https://docs.labelbox.com/docs/create-an-api-key).

In [None]:
# Add your api key
API_KEY = ""
client = lb.Client(api_key=API_KEY)
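
To avoid hardcoding the key in the notebook, it could be read from an environment variable instead. The following is a minimal sketch; the variable name `LABELBOX_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not part of the Labelbox SDK.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty.

    The default variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API key")
    return api_key
```

With the variable set, `API_KEY = load_api_key()` can replace the empty string in the cell above before the client is created.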

## Supported annotations for HTML

In [None]:
##### Classification free text #####

text_annotation = lb_types.ClassificationAnnotation(
    name="text_html",
    value=lb_types.Text(answer="sample text"),
)

text_annotation_ndjson = {
    "name": "text_html",
    "answer": "sample text",
}

In [None]:
##### Checklist Classification #######

checklist_annotation = lb_types.ClassificationAnnotation(
    name="checklist_html",  # must match your ontology feature's name
    value=lb_types.Checklist(answer=[
        lb_types.ClassificationAnswer(name="first_checklist_answer"),
        lb_types.ClassificationAnswer(name="second_checklist_answer"),
    ]),
)

checklist_annotation_ndjson = {
    "name":
        "checklist_html",
    "answers": [
        {
            "name": "first_checklist_answer"
        },
        {
            "name": "second_checklist_answer"
        },
    ],
}

In [None]:
######## Radio Classification ######

radio_annotation = lb_types.ClassificationAnnotation(
    name="radio_html",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
        name="second_radio_answer")),
)

radio_annotation_ndjson = {
    "name": "radio_html",
    "answer": {
        "name": "second_radio_answer"  # same answer as the Python annotation above
    },
}

In [None]:
########## Classification - Radio and Checklist (with subclassifications)  ##########

nested_radio_annotation = lb_types.ClassificationAnnotation(
    name="nested_radio_question",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
        name="first_radio_answer",
        classifications=[
            lb_types.ClassificationAnnotation(
                name="sub_radio_question",
                value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
                    name="first_sub_radio_answer")),
            )
        ],
    )),
)

nested_radio_annotation_ndjson = {
    "name": "nested_radio_question",
    "answer": {
        "name":
            "first_radio_answer",
        "classifications": [{
            "name": "sub_radio_question",
            "answer": {
                "name": "first_sub_radio_answer"
            },
        }],
    },
}

nested_checklist_annotation = lb_types.ClassificationAnnotation(
    name="nested_checklist_question",
    value=lb_types.Checklist(answer=[
        lb_types.ClassificationAnswer(
            name="first_checklist_answer",
            classifications=[
                lb_types.ClassificationAnnotation(
                    name="sub_checklist_question",
                    value=lb_types.Checklist(answer=[
                        lb_types.ClassificationAnswer(
                            name="first_sub_checklist_answer")
                    ]),
                )
            ],
        )
    ]),
)

nested_checklist_annotation_ndjson = {
    "name":
        "nested_checklist_question",
    "answer": [{
        "name":
            "first_checklist_answer",
        "classifications": [{
            "name": "sub_checklist_question",
            "answer": {
                "name": "first_sub_checklist_answer"
            },
        }],
    }],
}

## Upload Annotations - putting it all together 

## Step 1: Import data rows into Catalog

In [None]:
# Create one Labelbox dataset

global_key = "sample_html_1.html"

asset = {
    "row_data":
        "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
    "global_key":
        global_key,
}

dataset = client.create_dataset(
    name="html_annotation_import_demo_dataset",
    iam_integration=
    None,  # Removing this argument will default to the organization's default IAM integration
)
task = dataset.create_data_rows([asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows: ", task.failed_data_rows)
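
When more than one HTML file is imported, the asset dicts could be generated from a list of URLs rather than written by hand. The sketch below follows the convention used above, where the global key is the file name at the end of the URL. The helper `build_assets` is hypothetical, and it only checks that keys are unique within the list it is given.

```python
import posixpath
from urllib.parse import urlparse


def build_assets(urls):
    """Build one data-row dict per URL, using the file name as the global key."""
    assets = []
    seen = set()
    for url in urls:
        # Take the last path segment of the URL, e.g. "sample_html_1.html"
        global_key = posixpath.basename(urlparse(url).path)
        if not global_key or global_key in seen:
            raise ValueError(
                f"Cannot derive a unique global key from {url!r}")
        seen.add(global_key)
        assets.append({"row_data": url, "global_key": global_key})
    return assets


assets = build_assets([
    "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
])
print(assets[0]["global_key"])
```

The resulting list has the same shape as `[asset]` in the cell above, so it could be passed to `dataset.create_data_rows` in the same way.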

## Step 2: Create/select an ontology

Your project's ontology must contain every tool and classification used by your annotations. The tool names and classification names in the ontology must match the `name` fields in your annotations so that each annotation is matched to the correct feature schema.

For example, when we created the text annotation, we set its `name` to `text_html`. When we set up the ontology, the corresponding classification must also be named `text_html`. The same alignment must hold for every other classification we create in the ontology.
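
The name alignment described above can be checked before uploading. The following is a rough sketch that works on NDJSON-style dicts shaped like the ones in this notebook; `collect_names` is a hypothetical helper, and the sample names are a subset of those used here plus one deliberate typo. It checks only question names, not answer option names.

```python
def collect_names(annotation):
    """Yield the name of an NDJSON-style classification and of any nested ones."""
    yield annotation["name"]
    answers = annotation.get("answer", annotation.get("answers", []))
    if isinstance(answers, dict):
        answers = [answers]  # radio answers are a single dict
    if not isinstance(answers, list):
        return  # free-text answers are plain strings with nothing nested
    for answer in answers:
        for nested in answer.get("classifications", []):
            yield from collect_names(nested)


# Names expected to exist in the ontology (subset, for illustration)
ontology_names = {"text_html", "nested_radio_question", "sub_radio_question"}

annotations = [
    {"name": "text_html", "answer": "sample text"},
    {
        "name": "nested_radio_question",
        "answer": {
            "name": "first_radio_answer",
            "classifications": [{
                "name": "sub_radio_question",
                "answer": {"name": "first_sub_radio_answer"},
            }],
        },
    },
    {"name": "text_htm", "answer": "deliberate typo in the name"},
]

unmatched = sorted(name for ann in annotations
                   for name in collect_names(ann)
                   if name not in ontology_names)
print(unmatched)
```

Any name printed here has no counterpart in the set of ontology names and would need to be corrected on one side or the other.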

In [None]:
ontology_builder = lb.OntologyBuilder(classifications=[
    lb.Classification(class_type=lb.Classification.Type.TEXT, name="text_html"),
    lb.Classification(
        class_type=lb.Classification.Type.CHECKLIST,
        name="checklist_html",
        options=[
            lb.Option(value="first_checklist_answer"),
            lb.Option(value="second_checklist_answer"),
        ],
    ),
    lb.Classification(
        class_type=lb.Classification.Type.RADIO,
        name="radio_html",
        options=[
            lb.Option(value="first_radio_answer"),
            lb.Option(value="second_radio_answer"),
        ],
    ),
    lb.Classification(
        class_type=lb.Classification.Type.CHECKLIST,
        name="nested_checklist_question",
        options=[
            lb.Option(
                "first_checklist_answer",
                options=[
                    lb.Classification(
                        class_type=lb.Classification.Type.CHECKLIST,
                        name="sub_checklist_question",
                        options=[lb.Option("first_sub_checklist_answer")],
                    )
                ],
            )
        ],
    ),
    lb.Classification(
        class_type=lb.Classification.Type.RADIO,
        name="nested_radio_question",
        options=[
            lb.Option(
                value="first_radio_answer",
                options=[
                    lb.Classification(
                        class_type=lb.Classification.Type.RADIO,
                        name="sub_radio_question",
                        options=[lb.Option(value="first_sub_radio_answer")],
                    ),
                ],
            )
        ],
    ),
])

ontology = client.create_ontology(
    "Ontology HTML Annotations",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Html,
)


## Step 3: Create a labeling project
Connect the ontology to the labeling project

In [None]:
# Create Labelbox project
project = client.create_project(name="HTML Import Annotation Demo",
                                media_type=lb.MediaType.Html)

# Setup your ontology
project.setup_editor(
    ontology)  # Connect your ontology and editor to your project

## Step 4: Send a batch of data rows to the project

In [None]:
# Create a batch to send data rows to your project
batch = project.create_batch(
    "first-batch-html-demo",  # Each batch in a project must have a unique name
    global_keys=[
        global_key
    ],  # Paginated collection of data row objects, list of data row ids or global keys
    priority=5,  # Priority from 1 (highest) to 5 (lowest)
)

print("Batch: ", batch)

## Step 5: Create the annotations payload
Create the annotations payload using the snippets of code above

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types.

### Python annotation types
Here we create the complete label payload using only the Python annotation types. It contains one entry for each annotation object that we created above.

In [None]:
label = []
label.append(
    lb_types.Label(
        data={"global_key": global_key},
        annotations=[
            text_annotation,
            checklist_annotation,
            radio_annotation,
            nested_checklist_annotation,
            nested_radio_annotation,
        ],
    ))

### NDJSON annotations 
Here we create the complete label payload using only the NDJSON format. It contains one entry for each NDJSON annotation that we created above.

In [None]:
label_ndjson = []
for annotations in [
        text_annotation_ndjson,
        checklist_annotation_ndjson,
        radio_annotation_ndjson,
        nested_radio_annotation_ndjson,
        nested_checklist_annotation_ndjson,
]:
    annotations.update({"dataRow": {"globalKey": global_key}})
    label_ndjson.append(annotations)
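
The loop above calls `update` on each annotation dict, so the original `*_ndjson` dicts are modified in place. If the originals need to stay untouched, a non-mutating variant along the following lines could be used. `attach_data_row` is a hypothetical helper, and the sketch also shows how the payload would look when serialized as NDJSON text, with one JSON object per line.

```python
import json


def attach_data_row(annotations, global_key):
    """Return shallow copies of the annotation dicts with a dataRow reference added."""
    return [{**ann, "dataRow": {"globalKey": global_key}} for ann in annotations]


originals = [{"name": "text_html", "answer": "sample text"}]
payload = attach_data_row(originals, "sample_html_1.html")

# NDJSON layout: one JSON object per line
ndjson_text = "\n".join(json.dumps(row) for row in payload)
print(ndjson_text)
print("dataRow" in originals[0])
```

The copies are shallow, so nested answer dicts are still shared with the originals; only the top-level `dataRow` key is kept separate.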

## Step 6: Upload annotations to a project as pre-labels or complete labels

#### Model Assisted Labeling (MAL)
For the purposes of this tutorial, upload only one of the two payload formats at a time: `label` (Python annotation types) or `label_ndjson` (NDJSON). Delete the previously uploaded labels before uploading with the second format.

In [None]:
# Upload our label using Model-Assisted Labeling
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal_job-{str(uuid.uuid4())}",
    predictions=label,
)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

#### Label Import

In [None]:
# Upload label for this data row in project
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="label_import_job" + str(uuid.uuid4()),
    labels=label,
)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
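
Printing the raw statuses becomes hard to read once a job covers many annotations. The sketch below summarizes them instead. It assumes that each status entry is a dict with a `"status"` key and, for failures, an `"errors"` list; that shape is an assumption to verify against the actual output, and `summarize_statuses` is a hypothetical helper.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Count entries per status and collect the entries that carry errors."""
    counts = Counter(entry.get("status", "UNKNOWN") for entry in statuses)
    failed = [entry for entry in statuses if entry.get("errors")]
    return counts, failed


# Hypothetical sample shaped like the assumed status entries
sample = [
    {"uuid": "a", "status": "SUCCESS"},
    {"uuid": "b", "status": "FAILURE", "errors": [{"message": "example error"}]},
]
counts, failed = summarize_statuses(sample)
print(dict(counts), [entry["uuid"] for entry in failed])
```

If the assumed shape holds, `summarize_statuses(upload_job.statuses)` could be called once `wait_until_done()` has returned.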

### Optional deletions for cleanup 

In [None]:
# project.delete()
# dataset.delete()