<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/pdf.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/pdf.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# PDF Annotation Import
* This notebook provides examples of each supported annotation type for PDF assets. It covers the following:
    * Model-Assisted Labeling (MAL): provides pre-annotated data for your labelers, which reduces the total time needed to label your assets. Model-assisted labeling does not submit labels automatically; a labeler must review and submit them.

* For information on what types of annotations are supported per data type, refer to this documentation:
    * https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended

* Notes:
    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

In [None]:
!pip install -q 'labelbox[data]'

# Imports

In [1]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import Client, LabelingFrontend, MALPredictionImport
from labelbox.data.annotation_types import (
    Label, ImageData, ObjectAnnotation, 
    Rectangle, Point,
    Radio, Checklist, Text,
    ClassificationAnnotation, ClassificationAnswer
)
from labelbox.data.serialization import NDJsonConverter
from labelbox.schema.media_type import MediaType
import uuid
import json

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [3]:
# Add your API key
API_KEY = "YOUR API KEY"
client = Client(api_key=API_KEY)
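
As an alternative to pasting the key into the notebook, the key can be read from an environment variable. The following is a minimal sketch, not part of the original notebook; the variable name `LABELBOX_API_KEY` and the helper `load_api_key` are hypothetical names chosen for illustration.

```python
import os


def load_api_key(env_var="LABELBOX_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is unset or empty.

    `LABELBOX_API_KEY` is an assumed variable name, not one mandated by the SDK.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key"
        )
    return api_key
```

With this helper, the client could be created as `Client(api_key=load_api_key())`, which keeps the key out of the saved notebook.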

---- 
### Steps
1. Make sure the project is set up
2. Collect annotations
3. Upload

### Project setup

First, we create an ontology with all the possible tools and classifications supported for PDF. The official list of supported annotations to import can be found here:
- [Model-Assisted Labeling](https://docs.labelbox.com/docs/model-assisted-labeling) (annotations/labels are not submitted)
- [PDF Annotations](https://docs.labelbox.com/docs/document-annotations)

In [4]:
ontology_builder = OntologyBuilder(
  tools=[ 
    Tool( # Bounding Box tool given the name "box"
      tool=Tool.Type.BBOX, 
      name="box")], 
  classifications=[ 
    Classification( # Text classification given the name "text"
      class_type=Classification.Type.TEXT,
      instructions="text"), 
    Classification( # Checklist classification given the name "checklist" with two options: "first_checklist_answer" and "second_checklist_answer"
      class_type=Classification.Type.CHECKLIST, 
      instructions="checklist", 
      options=[
        Option(value="first_checklist_answer"),
        Option(value="second_checklist_answer")            
      ]
    ), 
    Classification( # Radio classification given the name "radio" with two options: "first_radio_answer" and "second_radio_answer"
      class_type=Classification.Type.RADIO, 
      instructions="radio", 
      options=[
        Option(value="first_radio_answer"),
        Option(value="second_radio_answer")
      ]
    )
  ]
)

In [5]:
ontology_builder

OntologyBuilder(tools=[Tool(tool=<Type.BBOX: 'rectangle'>, name='box', required=False, color=None, classifications=[], schema_id=None, feature_schema_id=None)], classifications=[Classification(class_type=<Type.TEXT: 'text'>, instructions='text', required=False, options=[], schema_id=None, feature_schema_id=None, scope=None), Classification(class_type=<Type.CHECKLIST: 'checklist'>, instructions='checklist', required=False, options=[Option(value='first_checklist_answer', label='first_checklist_answer', schema_id=None, feature_schema_id=None, options=[]), Option(value='second_checklist_answer', label='second_checklist_answer', schema_id=None, feature_schema_id=None, options=[])], schema_id=None, feature_schema_id=None, scope=None), Classification(class_type=<Type.RADIO: 'radio'>, instructions='radio', required=False, options=[Option(value='first_radio_answer', label='first_radio_answer', schema_id=None, feature_schema_id=None, options=[]), Option(value='second_radio_answer', label='second

In [6]:
# Create a Labelbox project
mal_project = client.create_project(name="pdf_mal_project", media_type=MediaType.Document)

# Create one Labelbox dataset
dataset = client.create_dataset(name="pdf_annotation_import_demo_dataset")

# Grab an example PDF and create a Labelbox data row
test_pdf_url = "https://www.buds.com.ua/images/Lorem_ipsum.pdf"
data_row = dataset.create_data_row(row_data=test_pdf_url)

# Set up your ontology / labeling editor
editor = next(client.get_labeling_frontends(where=LabelingFrontend.name == "Editor")) # Unless using a custom editor, do not modify this

mal_project.setup(editor, ontology_builder.asdict()) # Connect your ontology and editor to your MAL project
mal_project.datasets.connect(dataset) # Connect your dataset to your MAL project

### Create Label using Annotation Type Objects
* It is recommended to use the Python SDK's annotation types for importing into Labelbox.

### Object Annotations

In [7]:
box_annotation = ObjectAnnotation(
    name='box', 
    extra={'unit': 'POINTS', 'page': 0}, # Pages are 0-indexed; 0 indicates page 1
    value=Rectangle(
        extra={}, 
        start=Point(x=557,y=898),
        end=Point(x=852,y=1140)
    ))
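
The `Rectangle` above is defined by two corner points, while the serialized NDJSON output later in this notebook describes the same box as `top`, `left`, `height`, and `width`. The sketch below shows that arithmetic in plain Python. The helper `corners_to_bbox` is a hypothetical name used for illustration, not an SDK function, and it assumes the y axis grows downward so that the smaller y value is the top edge.

```python
def corners_to_bbox(start_x, start_y, end_x, end_y):
    """Convert two opposite corners into a top/left/height/width box.

    Illustrative helper only; the SDK performs its own conversion
    when the label is serialized.
    """
    return {
        "top": float(min(start_y, end_y)),
        "left": float(min(start_x, end_x)),
        "height": float(abs(end_y - start_y)),
        "width": float(abs(end_x - start_x)),
    }


# Same corner coordinates as box_annotation above
bbox = corners_to_bbox(557, 898, 852, 1140)
print(bbox)
# {'top': 898.0, 'left': 557.0, 'height': 242.0, 'width': 295.0}
```

These values match the `bbox` entry in the serialized output shown in the Model-Assisted Labeling section.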

In [8]:
text_annotation=ClassificationAnnotation(
    value=Text( # String value for the text annotation
        answer="the answer to the text question" 
    ), 
    name="text" # Name of the classification in your ontology
)

checklist_annotation=ClassificationAnnotation(
    value=Checklist(
        answer=[ # List of the checklist answers in your ontology
            ClassificationAnswer(name="first_checklist_answer"),
            ClassificationAnswer(name="second_checklist_answer")
        ]
    ), 
    name="checklist" # Name of the classification in your ontology
)

radio_annotation=ClassificationAnnotation(
    value=Radio(
        answer=ClassificationAnswer(
            name="second_radio_answer" # Name of the radio answer in your ontology
        )
    ), 
    name="radio" # Name of the classification in your ontology
)
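
Each annotation refers to its ontology feature by name, so a typo in a name or an answer is a likely source of import errors. The following is a small pre-upload check, written as a sketch in plain Python. The `ONTOLOGY` dictionary is a hand-written copy of the names used in this notebook, and `find_name_problems` is a hypothetical helper, not part of the SDK.

```python
# Feature name -> allowed answer names (None means no fixed options).
# Hand-copied from the ontology defined in this notebook.
ONTOLOGY = {
    "box": None,
    "text": None,
    "checklist": {"first_checklist_answer", "second_checklist_answer"},
    "radio": {"first_radio_answer", "second_radio_answer"},
}


def find_name_problems(annotations, ontology=ONTOLOGY):
    """Return a list of human-readable problems.

    `annotations` is a list of (feature_name, answer_names) pairs,
    where answer_names is a list (empty for tools and free text).
    """
    problems = []
    for name, answers in annotations:
        if name not in ontology:
            problems.append(f"unknown feature: {name}")
            continue
        allowed = ontology[name]
        if allowed is None:
            continue  # nothing to validate for tools or free-text answers
        for answer in answers:
            if answer not in allowed:
                problems.append(f"unknown answer for {name}: {answer}")
    return problems


planned = [
    ("box", []),
    ("text", []),
    ("checklist", ["first_checklist_answer", "second_checklist_answer"]),
    ("radio", ["second_radio_answer"]),
]
print(find_name_problems(planned))
```

An empty list means every name and answer used by the annotations exists in the hand-copied ontology.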

### Create a Label object with all of our annotations

In [9]:
# Create a Label object by identifying the applicable data row in Labelbox and providing a list of annotations
label = Label(
    data=ImageData(uid=data_row.uid),
    annotations=[
        box_annotation,
        text_annotation, checklist_annotation, radio_annotation
    ]
)


label.__dict__



{'uid': None,
 'data': ImageData(im_bytes=None,file_path=None,url=None,arr=None),
 'annotations': [ObjectAnnotation(name='box', feature_schema_id=None, extra={'unit': 'POINTS', 'page': 0}, value=Rectangle(extra={}, start=Point(extra={}, x=557.0, y=898.0), end=Point(extra={}, x=852.0, y=1140.0)), classifications=[]),
  ClassificationAnnotation(name='text', feature_schema_id=None, extra={}, value=Text(answer='the answer to the text question')),
  ClassificationAnnotation(name='checklist', feature_schema_id=None, extra={}, value=Checklist(name='checklist', answer=[ClassificationAnswer(name='first_checklist_answer', feature_schema_id=None, extra={}, keyframe=None), ClassificationAnswer(name='second_checklist_answer', feature_schema_id=None, extra={}, keyframe=None)])),
  ClassificationAnnotation(name='radio', feature_schema_id=None, extra={}, value=Radio(answer=ClassificationAnswer(name='second_radio_answer', feature_schema_id=None, extra={}, keyframe=None)))],
 'extra': {}}

### Model-Assisted Labeling

To do model-assisted labeling, we need to convert a Label object into NDJSON.

This is easily done using the NDJsonConverter class.

Notes:
* The NDJsonConverter takes in a list of labels.

In [10]:
# Convert our label from a Labelbox class object to the underlying NDJSON format required for upload - uploads can be directly built in this syntax as well
mal_ndjson = list(NDJsonConverter.serialize([label]))
mal_ndjson

[{'uuid': '4a1ca900-85d7-4643-883e-277170000f9c',
  'dataRow': {'id': 'cl7hrmmi90m5c0y6h5nk00u02'},
  'name': 'box',
  'page': 0,
  'unit': 'POINTS',
  'classifications': [],
  'bbox': {'top': 898.0, 'left': 557.0, 'height': 242.0, 'width': 295.0}},
 {'name': 'text',
  'answer': 'the answer to the text question',
  'uuid': '35588dd3-9050-4471-ac13-a8ed6ac1fd4a',
  'dataRow': {'id': 'cl7hrmmi90m5c0y6h5nk00u02'}},
 {'name': 'checklist',
  'uuid': '38bc716c-d199-4e6e-bd31-0d2128716444',
  'dataRow': {'id': 'cl7hrmmi90m5c0y6h5nk00u02'},
  'answer': [{'name': 'first_checklist_answer'},
   {'name': 'second_checklist_answer'}]},
 {'name': 'radio',
  'answer': {'name': 'second_radio_answer'},
  'uuid': '8a21a248-9669-4465-b0c0-5a8d75903c8f',
  'dataRow': {'id': 'cl7hrmmi90m5c0y6h5nk00u02'}}]
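
As the comment in the cell above notes, uploads can also be built directly in this format. The sketch below assembles one bounding-box row by hand, mirroring the keys in the output above, and writes a list of rows as newline-delimited JSON (one JSON object per line). The helpers `make_bbox_row` and `to_ndjson` are hypothetical names for illustration, and the row layout is inferred from this notebook's output rather than from a formal schema.

```python
import json
import uuid


def make_bbox_row(data_row_id, name, page, top, left, height, width):
    """Build one bounding-box row shaped like the serialized output above."""
    return {
        "uuid": str(uuid.uuid4()),  # each row carries its own unique ID
        "dataRow": {"id": data_row_id},
        "name": name,
        "page": page,  # 0-indexed page number
        "unit": "POINTS",
        "classifications": [],
        "bbox": {"top": top, "left": left, "height": height, "width": width},
    }


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON text."""
    return "\n".join(json.dumps(row) for row in rows)


# Data row ID copied from the example output above
row = make_bbox_row("cl7hrmmi90m5c0y6h5nk00u02", "box", 0, 898.0, 557.0, 242.0, 295.0)
ndjson_text = to_ndjson([row])
print(ndjson_text)
```

The list of dictionaries, rather than the text, is what this notebook passes as `predictions`; the text form is shown only to illustrate what NDJSON looks like on disk.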

In [11]:
# Upload our label using Model-Assisted Labeling
upload_job = MALPredictionImport.create_from_objects(
    client = client, 
    project_id = mal_project.uid, 
    name="mal_job", 
    predictions=mal_ndjson)

In [13]:
# Errors will appear for each annotation that failed.
# An empty list means that there were no errors.
# Errors are reported only once upload_job is complete, so there is no need to rerun this cell.
print("Errors:", upload_job.errors)

Errors: []
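
The note at the top of this notebook recommends waiting for the import job to finish before opening the Editor. A generic way to wait on any long-running job is to poll a completion check with a timeout, sketched below in plain Python. The helper `wait_for` and its parameters are hypothetical and do not call the Labelbox SDK; the SDK's job objects may provide their own waiting mechanism, so check the SDK reference before relying on a hand-rolled loop.

```python
import time


def wait_for(is_done, timeout_s=60.0, interval_s=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll `is_done()` until it returns True or `timeout_s` elapses.

    Returns True if the check succeeded, False on timeout.
    `sleep` and `clock` are injectable to make the loop easy to test.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for a real job status check: reports done on the third poll
polls = iter([False, False, True])
finished = wait_for(lambda: next(polls), interval_s=0)
print("finished:", finished)
# finished: True
```

In practice `is_done` would wrap whatever status check the job object exposes, and a `False` result would signal that the job needs further investigation.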
