<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/drive/1Xg-kn6BaYRLl-F4bMJVVopLmgEyQRTTk#scrollTo=1PnC-9oXvjUV" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelpandas/blob/main/notebooks/labelpandas_demo_notebook.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Introduction to LabelPandas
*The official open-source API to integrate tabular data with Labelbox*

This notebook will provide examples of each supported annotation type for image assets. 

While no code is required, the column names need to be in a specific format to tell Labelbox which columns correspond to which kinds of data.

## ***Uploading Data Rows***

### **Required (and recommended) Columns**
_____________________

**Required:**
- `row_data` - This column must contain URLs that point to the assets to upload

**Recommended:**
- `global_key` 
  - This column must contain unique identifiers for data rows
  - If none is provided, it defaults to the `row_data` column
- `external_id`
  - This column contains non-unique identifiers for data rows
  - If none is provided, it defaults to the `global_key` column
- `dataset_id` 
  - This column indicates which data rows go to which dataset
  - If not provided as a column, it must be provided as an input argument
- `project_id` 
  - This column indicates which data rows get sent to which projects
  - If not provided as a column, it must be provided as an input argument
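
As a rough illustration of the columns above, the sketch below builds a small pandas DataFrame with a `row_data` column and fills in a missing `global_key` column from `row_data`, mirroring the default described above. The URLs are placeholders, and `fill_default_global_key` is a hypothetical helper written for this example, not part of LabelPandas.

```python
import pandas as pd


def fill_default_global_key(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy `row_data` into `global_key` when that column is absent."""
    if "row_data" not in table.columns:
        raise ValueError("a `row_data` column is required")
    result = table.copy()
    if "global_key" not in result.columns:
        result["global_key"] = result["row_data"]
    return result


# Placeholder URLs, for illustration only
table = pd.DataFrame({
    "row_data": [
        "https://example.com/images/cat.jpg",
        "https://example.com/images/dog.jpg",
    ]
})
table = fill_default_global_key(table)
print(table.columns.tolist())
```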

## **Uploading Metadata, Attachments and Annotations**

Columns must be named using a `divider`
  - *In this context, a `divider` is a separator in a given string - the default `divider` value is `"///"`*
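
A minimal sketch of how a column name is assembled and taken apart with a `divider`. Both functions are hypothetical helpers for illustration, not LabelPandas functions, and the `"|||"` divider is only an example of a non-default value; how a custom divider is passed to LabelPandas is not covered here.

```python
DEFAULT_DIVIDER = "///"  # default divider described above


def make_column_name(prefix, type_name, field_name, divider=DEFAULT_DIVIDER):
    """Join the three parts of a column name with the divider."""
    return divider.join([prefix, type_name, field_name])


def split_column_name(column, divider=DEFAULT_DIVIDER):
    """Split a column name back into its parts."""
    return column.split(divider)


default_name = make_column_name("metadata", "string", "sample_metadata_field_name")
custom_name = make_column_name("metadata", "string", "sample_metadata_field_name", divider="|||")

print(default_name)
print(split_column_name(custom_name, divider="|||"))
```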

### **Metadata**
_____________________

For metadata, the column name must be " `metadata` + `divider` + `metadata_type` + `divider` + `metadata_field_name` "
  - Example: `metadata///string///sample_metadata_field_name`
  - `metadata_type` must be one of the following:
    - `string`, `enum`, `datetime`, `number`
  - If the `metadata_field_name` doesn't exist yet in Labelbox, LabelPandas will create it for you


The values for metadata fields must correspond with the metadata type per Labelbox docs
  - More here:
    - [Labelbox definition of metadata](https://docs.labelbox.com/docs/datarow-metadata)
    - [Labelbox docs on creating metadata](https://docs.labelbox.com/docs/createmodify-metadata-schema)    

### **Attachments**
_____________________

For attachments, the column name must be " `attachment` + `divider` + `attachment_type` + `divider` + `column_name` "
  - Example: `attachment///raw_text///sample_column_name`
  - `attachment_type` must be one of the following:
    - `image`, `video`, `raw_text`, `html`, `text_url`


Values for attachments must correspond with the attachment type per Labelbox docs
  - More here: 
    - [Labelbox docs on attachments](https://docs.labelbox.com/docs/asset-attachments)
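
Before uploading, it can help to check that metadata and attachment column names follow the patterns above. The sketch below is one way to do that; `check_column_name` and `ALLOWED_TYPES` are hypothetical names for this example, not part of LabelPandas. The `number` metadata type is included because the example notebook below uses a `metadata///number///...` column.

```python
DIVIDER = "///"

# Types listed in the Metadata and Attachments sections above
ALLOWED_TYPES = {
    "metadata": {"string", "enum", "datetime", "number"},
    "attachment": {"image", "video", "raw_text", "html", "text_url"},
}


def check_column_name(column, divider=DIVIDER):
    """Hypothetical check: return (prefix, type, name) or raise ValueError."""
    parts = column.split(divider)
    if len(parts) != 3:
        raise ValueError(f"expected three parts separated by {divider!r}: {column!r}")
    prefix, type_name, name = parts
    if prefix not in ALLOWED_TYPES:
        raise ValueError(f"unknown prefix {prefix!r} in {column!r}")
    if type_name not in ALLOWED_TYPES[prefix]:
        raise ValueError(f"unsupported {prefix} type {type_name!r} in {column!r}")
    return prefix, type_name, name


print(check_column_name("attachment///raw_text///sample_column_name"))
```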
    

### **Annotations**
_____________________

*Note:*
*There must also be a `project_id` column, or an input argument for `project_id` when using LabelPandas to upload annotations*

*There must also be an `upload_method` provided when using LabelPandas*
  - *`upload_method` must be one of the following:*
    - *`"mal"` (uploads annotations as pre-labels)*
    - *`"import"` (uploads annotations as submitted labels)*


- For annotations, the column name must be `annotation` + `divider` + `annotation_type` + `divider` + `top_level_feature_name`
  - Example: `annotation///bbox///bbox_tool_name`
  - `annotation_type` must be one of the following:
    - `bbox`, `polygon`, `point`, `mask`, `line`, `named-entity`, `radio`, `checklist`, `text`
- Values for annotations must correspond with the following, per annotation type:

_____________________
_____________________

**Row-Level Formats for Tool Annotations**
- `bbox` (this example is two annotations)
```
[
        [[top, left, height, width], [nested_classification_name_paths]], 
        [[top, left, height, width], [nested_classification_name_paths]]
]
```
- `polygon` (this example is two annotations)
```
[
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]], 
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]]
]
```
- `line` (this example is two annotations)
```
[
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]], 
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]]
]
```
- `point` (this example is two annotations)
```
[
        [[x, y], [nested_classification_name_paths]], 
        [[x, y], [nested_classification_name_paths]]
]
```
- `mask` (this example is two annotations)
```
[
        [[URL, colorRGB], [nested_classification_name_paths]], 
        [[URL, colorRGB], [nested_classification_name_paths]]
]
```
- `named-entity` (this example is two annotations)
```
[
        [[start, end], [nested_classification_name_paths]], 
        [[start, end], [nested_classification_name_paths]]
]
```
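
A minimal sketch of how such cell values might be assembled in Python. `bbox_annotation` and `point_annotation` are hypothetical helpers written for this example, not LabelPandas functions. Each returns one annotation in the `[geometry, name_paths]` shape listed above, and a cell is a list of those annotations.

```python
def bbox_annotation(top, left, height, width, name_paths=None):
    """One bbox annotation: [[top, left, height, width], [name paths]]."""
    return [[top, left, height, width], list(name_paths or [])]


def point_annotation(x, y, name_paths=None):
    """One point annotation: [[x, y], [name paths]]."""
    return [[x, y], list(name_paths or [])]


# A cell with two bounding boxes and no nested classifications
bbox_cell = [
    bbox_annotation(75, 101, 26, 18),
    bbox_annotation(74, 160, 24, 14),
]

# A cell with one bounding box that carries a nested classification name path
nested_bbox_cell = [
    bbox_annotation(59, 270, 103, 190, ["sample_tool_sub_text_question///Dog"]),
]

# A cell with a single point
point_cell = [point_annotation(138.971, 285.793)]

print(bbox_cell)
print(nested_bbox_cell)
print(point_cell)
```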

**Row-Level Formats for Classification Annotations**
- `radio`, `checklist` and `text`
```
[[answer_name_paths]]
```
  - Note: the last string in a text name path is the text value itself
_____________________
_____________________  

**Name Path Explanation**
- Name paths can be best explained as the list of features that lead to the leaf feature, merged into a single string, where features are separated by a `divider`
  - It will look like `parent_feature_name` + `divider` + `child_feature_name` and so on
  - LabelPandas can handle any level of depth in name paths
- The `divider` used should be the same `divider` used in your column names
- See the below notebook for examples of name paths
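
As a small sketch of the idea, a name path can be produced by joining feature names with the `divider`. The `name_path` helper is hypothetical (it is not a LabelPandas function); the feature names are the ones used in the example notebook.

```python
DIVIDER = "///"


def name_path(*features, divider=DIVIDER):
    """Join feature names, from the top-level answer down to the leaf, into one string."""
    return divider.join(features)


# Radio answer -> nested radio question -> nested radio answer
nested_radio_path = name_path(
    "sample_branch_radio_answer_1",
    "sample_sub_radio_question",
    "sample_sub_radio_answer_1",
)

# Classification cells wrap the list of name paths in an outer list
radio_cell = [[nested_radio_path]]
checklist_cell = [[name_path("sample_checklist_answer_1"), name_path("sample_checklist_answer_2")]]

print(nested_radio_path)
print(radio_cell)
print(checklist_cell)
```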

# Example Notebook

### Import Labelbox and Labelpandas

In [None]:
## Install labelbox and labelpandas
!pip install labelpandas -q
!pip install labelbox --upgrade -q


In [None]:
## Import your libraries
import labelpandas as lp
import pandas as pd
# Imported to create an example ontology - not required for typical runs of labelpandas
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option

### Configure your notebook

In [None]:
api_key = ""
demo_name = "test-labelpandas-demo" ## Used to create a project, dataset and ontology

In [None]:
client = lp.Client(api_key)

### Create a project, dataset and ontology

In [None]:
project = client.lb_client.create_project(name=demo_name)  # Create a demo project
dataset = client.lb_client.create_dataset(name=demo_name)  # Create a demo dataset to upload data rows to



In [None]:
ontology_builder = OntologyBuilder(
    classifications=[ # List of Classification objects
        Classification( # Radio classification named "sample_radio_question" with two options: "sample_radio_answer_1" and "sample_radio_answer_2"
            class_type=Classification.Type.RADIO, 
            instructions="sample_radio_question", 
            options=[Option(value="sample_radio_answer_1"), Option(value="sample_radio_answer_2")]
        ),
        Classification( # Checklist classification named "sample_checklist_question" with two options: "sample_checklist_answer_1" and "sample_checklist_answer_2"
            class_type=Classification.Type.CHECKLIST, 
            instructions="sample_checklist_question", 
            options=[Option(value="sample_checklist_answer_1"), Option(value="sample_checklist_answer_2")]
        ), 
        Classification( # Free-text classification named "sample_free_text_question"
            class_type=Classification.Type.TEXT,
            instructions="sample_free_text_question"
        ),
        Classification( # Radio classification where one answer has a nested radio classification
            class_type=Classification.Type.RADIO, 
            instructions="sample_nested_radio_question",
            options=[
                Option(value="sample_branch_radio_answer_1",
                    options=[
                        Classification(
                            class_type=Classification.Type.RADIO,
                            instructions="sample_sub_radio_question",
                            options=[Option("sample_sub_radio_answer_1"), Option("sample_sub_radio_answer_2")]
                        )
                    ]
                ),
                Option(value="sample_leaf_radio_answer_2")
            ]
        )
    ],
    tools=[ # List of Tool objects
        Tool( # Bounding Box tool
            tool=Tool.Type.BBOX, 
            name="sample_bounding_box"), 
        Tool( # Bounding Box tool with a nested text classification
            tool=Tool.Type.BBOX, 
            name="sample_nested_bounding_box",
            classifications=[
                Classification(
                    class_type=Classification.Type.TEXT,
                    instructions="sample_tool_sub_text_question"
                ),
            ]
        ), 
        Tool( # Polygon tool
            tool=Tool.Type.POLYGON, 
            name="sample_polygon"
        ),
        Tool( # Polygon tool with a nested text classification
            tool=Tool.Type.POLYGON, 
            name="sample_nested_polygon",
            classifications=[
                Classification(
                    class_type=Classification.Type.TEXT,
                    instructions="sample_tool_sub_radio_question",
                    options=[Option("sample_sub_radio_answer_1"), Option("sample_sub_radio_answer_2")]
                ),
            ]            
        ),        
        Tool( # Segmentation mask tool named "sample_segmentation_mask"
            tool=Tool.Type.SEGMENTATION, 
            name="sample_segmentation_mask"
        ),
        Tool( # Point tool named "sample_point"
            tool=Tool.Type.POINT, 
            name="sample_point"
        ), 
        Tool( # Polyline tool named "sample_polyline"
            tool=Tool.Type.LINE, 
            name="sample_polyline"
        )
    ]
)

ontology = client.lb_client.create_ontology(demo_name, ontology_builder.asdict())

In [None]:
project.setup_editor(ontology)

### Load in your CSV of data - 1 data row example

*This data row includes every metadata type and every tool and classification type except `named-entity`, which applies to text assets (LabelPandas supports those as well)*

In [None]:
row_data_url = "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000523272.jpg"
sample_mask_url = "https://api.labelbox.com/masks/feature/cldr1h24y00113b6jv8npf9nk?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJja2xndGl0cGdkY2tnMDc2MGp0bWVoa2RuIiwib3JnYW5pemF0aW9uSWQiOiJja2xndGl0cDBnaTUwMDczMmRnbWcwcDhsIiwiaWF0IjoxNjc1NTgwODkzLCJleHAiOjE2NzgxNzI4OTN9.pKI1_n_noWLHPIUCfwixI3xzTtmHM_bSTuJQasp_OHU"

sample_data_row = {
    "row_data" : row_data_url,
    "global_key" : "test-demo-data-523272.jpg",
    "attachment///image///image_attachment_column" : [
        "https://assets.labelbox.com/asset/ckv0dyjhq1d5f0zxtezm059t5"
    ],
    "attachment///video///video_attachment_column" : [
        "https://assets.labelbox.com/asset/cl2y8dnjm08en08t0at63au39"
    ],
    "attachment///raw_text///raw_text_attachment_column" : [
        "Sample raw text"
    ],
    "attachment///html///html_attachment_column" : [
        "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html"
    ],
    "attachment///text_url///text_url_attachment_column" : [
        "https://labelbox.com"
    ],
    "metadata///datetime///metadata_datetime_column" : [
        "January 15th, 2022 3:45 PM"
    ],
    "metadata///enum///metadata_enum_column" : [
        "valid"
    ],
    "metadata///string///metadata_string_column" : [
        "Sample string metadata"
    ],
    "metadata///number///metadata_number_column" : [
        "0"
    ],
    "annotation///bbox///sample_bounding_box" : [
        [ # 1 Annotation = [ [top, left, height, width], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[75, 101, 26, 18], []], 
            [[74, 160, 24, 14], []], 
            [[72, 214, 13, 12], []], 
            [[9, 355, 39, 31], []], 
            [[0, 429, 31, 30], []]
        ]
    ],
    "annotation///bbox///sample_nested_bounding_box" : [
        [ # 1 Annotation = [ [top, left, height, width], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[59, 270, 103, 190], ['sample_tool_sub_text_question///Dog']]
        ]
    ],
    "annotation///polygon///sample_polygon" : [
        [ #  1 Annotation = [ [[x,y], [x,y], [x,y], [x,y]], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[[148.789, 166.28], [136.773, 166.28], [130.572, 173.257], [126.114, 222.676], [127.083, 235.661], [132.122, 242.249], [149.176, 242.637]], []],
            [[[437.161, 167.056], [438.324, 240.699], [453.052, 241.087], [457.703, 227.133], [457.316, 208.529], [455.378, 186.823], [453.44, 170.932]], []]
        ]
    ],
    "annotation///polygon///sample_nested_polygon" : [
        [ # 1 Annotation = [ [[x,y], [x,y], [x,y], [x,y]], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[[265.843, 266.668], [265.068, 295.738], [325.533, 296.126], [324.37, 267.056]], ['sample_tool_sub_radio_question///License plate']]
        ]
    ],
    "annotation///mask///sample_segmentation_mask" : [
        [ # 1 Annotation = [ [url, [R,G,B]], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[sample_mask_url, [255, 255, 255]], []]
        ]
    ],
    "annotation///point///sample_point" : [
        [ # 1 Annotation = [ [x,y], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[138.971, 285.793], []], 
            [[445.971, 283.293], []]
        ]
    ],
    "annotation///line///sample_polyline" : [
        [ #  1 Annotation = [ [[x,y], [x,y], [x,y], [x,y]], [list_of_name_paths] ] || 1 Label = [Annotation, Annotation]
            [[[0, 145.792], [58.486, 156.292], [87.478, 160.542], [116.725, 160.667], [131.348, 151.23], [147.659, 148.511]], []]
        ]
    ],
    "annotation///radio///sample_nested_radio_question" : [
        [['sample_branch_radio_answer_1///sample_sub_radio_question///sample_sub_radio_answer_1']]
    ],
    "annotation///checklist///sample_checklist_question" : [
        [['sample_checklist_answer_1', 'sample_checklist_answer_2']]
    ],
    "annotation///text///sample_free_text_question" : [
        [["Free text answer"]]
    ],
    "annotation///radio///sample_radio_question" : [
        [['sample_radio_answer_1']]
    ]
}

In [None]:
sample_df = pd.DataFrame.from_dict(sample_data_row)
sample_df.head()

Unnamed: 0,row_data,global_key,attachment///image///image_attachment_column,attachment///video///video_attachment_column,attachment///raw_text///raw_text_attachment_column,attachment///html///html_attachment_column,attachment///text_url///text_url_attachment_column,metadata///datetime///metadata_datetime_column,metadata///enum///metadata_enum_column,metadata///string///metadata_string_column,...,annotation///bbox///sample_nested_bounding_box,annotation///polygon///sample_polygon,annotation///polygon///sample_nested_polygon,annotation///mask///sample_segmentation_mask,annotation///point///sample_point,annotation///line///sample_polyline,annotation///radio///sample_nested_radio_question,annotation///checklist///sample_checklist_question,annotation///text///sample_free_text_question,annotation///radio///sample_radio_question
0,https://storage.googleapis.com/diagnostics-dem...,test-demo-data-523272.jpg,https://assets.labelbox.com/asset/ckv0dyjhq1d5...,https://assets.labelbox.com/asset/cl2y8dnjm08e...,Sample raw text,https://storage.googleapis.com/labelbox-sample...,https://labelbox.com,"January 15th, 2022 3:45 PM",valid,Sample string metadata,...,"[[[59, 270, 103, 190], [sample_tool_sub_text_q...","[[[[148.789, 166.28], [136.773, 166.28], [130....","[[[[265.843, 266.668], [265.068, 295.738], [32...",[[[https://api.labelbox.com/masks/feature/cldr...,"[[[138.971, 285.793], []], [[445.971, 283.293]...","[[[[0, 145.792], [58.486, 156.292], [87.478, 1...",[[sample_branch_radio_answer_1///sample_sub_ra...,"[[sample_checklist_answer_1, sample_checklist_...",[[Free text answer]],[[sample_radio_answer_1]]
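
The cells above build the table from a Python dict. If a table like this were stored in a CSV file instead, pandas would read the list-valued annotation cells back as plain strings. The sketch below shows one way to turn them into Python lists again with `ast.literal_eval`. Whether LabelPandas needs this step is an assumption not confirmed here; the URL is a placeholder and an in-memory buffer stands in for a file on disk.

```python
import ast
import io

import pandas as pd

DIVIDER = "///"

# A one-row table with a list-valued annotation cell
original = pd.DataFrame({
    "row_data": ["https://example.com/images/sample.jpg"],  # placeholder URL
    "annotation///point///sample_point": [[[[138.971, 285.793], []]]],
})

# Round-trip through CSV text
csv_text = original.to_csv(index=False)
loaded = pd.read_csv(io.StringIO(csv_text))

# Annotation cells come back as strings, so parse them into Python lists
annotation_columns = [c for c in loaded.columns if c.startswith("annotation" + DIVIDER)]
for column in annotation_columns:
    loaded[column] = loaded[column].apply(ast.literal_eval)

print(type(loaded.loc[0, "annotation///point///sample_point"]))
```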


### Execute Upload

In [None]:
results = client.create_data_rows_from_table(table=sample_df, dataset_id=dataset.uid, project_id=project.uid, upload_method="import", verbose=True)

Creating upload list - 1 rows in Pandas DataFrame
Beginning data row upload for dataset ID cldutoa031x3w07z65zu6eq07: uploading 1 data rows
Batch #1: 1 data rows
Success: Upload batch number 1 successful
Upload complete - all data rows uploaded
Sending 1 data rows to project with ID clduto9ri24sb071c3xlchtzj
All data rows have been batched to the specified project(s)
Uploading annotations as submitted labels (Label Import)
Uploading 17 annotations for 1 data rows to project with ID clduto9ri24sb071c3xlchtzj
Success: upload batch number 1 complete


In [None]:
results

{'data_row_upload_results': [],
 'batch_to_project_results': [],
 'annotation_upload_results': []}