<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/drive/1MTLXL32JFGgXV1btgq-1VkGuu7U9Un_n" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelpandas/blob/main/notebooks/full-import.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# _**Creating Data Rows with Metadata, Attachments and Annotations with LabelPandas**_

## _**Documentation**_

### **Data Rows**
_____________________

**Requirements:**

- A `row_data` column - Each value in this column must be a URL that points to the asset to upload

- Either a `dataset_id` column or an input argument for `dataset_id`
  - If uploading to multiple datasets, provide a `dataset_id` column 
  - If uploading to one dataset, provide a `dataset_id` input argument
    - _This can still be a column if it's already in your CSV file_

**Recommended:**
- A `global_key` column
  - This column contains unique identifiers for your data rows
  - If none is provided, it defaults to your `row_data` column
- An `external_id` column
  - This column contains non-unique identifiers for your data rows
  - If none is provided, it defaults to your `global_key` column

**Optional:**
- A `project_id` column or an input argument for `project_id`
  - If batching to multiple projects, provide a `project_id` column
  - If batching to one project, provide a `project_id` input argument
    - _This can still be a column if it's already in your CSV file_
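
The identifier fallbacks described above can be sketched in plain Python. This only illustrates the documented fallback order (`global_key` falls back to `row_data`, `external_id` falls back to `global_key`); it is not LabelPandas' own code, and `fill_identifier_defaults` is a hypothetical helper name.

```python
def fill_identifier_defaults(row: dict) -> dict:
    """Apply the documented fallbacks to one row.

    Hypothetical helper for illustration; not part of LabelPandas.
    """
    if not row.get("row_data"):
        raise ValueError("every data row needs a `row_data` URL")
    filled = dict(row)  # Copy so the caller's row is left untouched
    if not filled.get("global_key"):
        filled["global_key"] = filled["row_data"]
    if not filled.get("external_id"):
        filled["external_id"] = filled["global_key"]
    return filled


row = fill_identifier_defaults({"row_data": "https://example.com/image-1.jpg"})
print(row["global_key"])   # https://example.com/image-1.jpg
print(row["external_id"])  # https://example.com/image-1.jpg
```

Because a `global_key` must be unique, relying on the `row_data` fallback is only safe when no URL appears twice in the table.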

### **Attachments**
_____________________

For attachments, the column name must be " `attachment` + `divider` + `attachment_type` + `divider` + `column_name` "
  - Example: `attachment///raw_text///sample_column_name`
  - `attachment_type` must be one of the following:
    - `image`, `video`, `raw_text`, `html`, `text_url`


Values for attachments must correspond with the attachment type per Labelbox docs
  - More here: 
    - [Labelbox docs on attachments](https://docs.labelbox.com/docs/asset-attachments)
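
As a rough sketch, an attachment column name can be assembled and checked before it is added to a table. The helper `attachment_column` is hypothetical (not a LabelPandas function), and the default divider `///` is assumed from the examples in this notebook.

```python
# Attachment types listed in the documentation above
ATTACHMENT_TYPES = {"image", "video", "raw_text", "html", "text_url"}


def attachment_column(attachment_type: str, column_name: str, divider: str = "///") -> str:
    """Build "attachment + divider + attachment_type + divider + column_name"."""
    if attachment_type not in ATTACHMENT_TYPES:
        raise ValueError(f"unsupported attachment type: {attachment_type!r}")
    return divider.join(["attachment", attachment_type, column_name])


print(attachment_column("raw_text", "sample_column_name"))
# attachment///raw_text///sample_column_name
```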

### **Metadata**
_____________________

For metadata, the column name must be " `metadata` + `divider` + `metadata_type` + `divider` + `metadata_field_name` "
  - Example: `metadata///string///sample_metadata_field_name`
  - `metadata_type` must be one of the following:
    - `string`, `enum`, `datetime`, `number` 
  - If the `metadata_field_name` doesn't exist yet in Labelbox, LabelPandas will create it for you


The values for metadata fields must correspond with the metadata type per Labelbox docs
  - More here:
    - [Labelbox definition of metadata](https://docs.labelbox.com/docs/datarow-metadata)
    - [Labelbox docs on creating metadata](https://docs.labelbox.com/docs/createmodify-metadata-schema)    
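
Going the other way, a column header can be split back into its parts to confirm that it follows the metadata naming rule. This is a minimal sketch; `parse_metadata_column` and `MetadataColumn` are hypothetical names, and the `///` divider is assumed from the examples in this notebook.

```python
from typing import NamedTuple, Optional

# Metadata types listed in the documentation above
METADATA_TYPES = {"string", "enum", "datetime", "number"}


class MetadataColumn(NamedTuple):
    metadata_type: str
    field_name: str


def parse_metadata_column(column: str, divider: str = "///") -> Optional[MetadataColumn]:
    """Return the parsed header, or None if `column` is not a metadata column."""
    parts = column.split(divider, 2)
    if len(parts) != 3 or parts[0] != "metadata":
        return None
    _, metadata_type, field_name = parts
    if metadata_type not in METADATA_TYPES:
        raise ValueError(f"unsupported metadata type: {metadata_type!r}")
    return MetadataColumn(metadata_type, field_name)


print(parse_metadata_column("metadata///enum///LabelPandas-Enum"))
# MetadataColumn(metadata_type='enum', field_name='LabelPandas-Enum')
print(parse_metadata_column("row_data"))  # None
```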

### **Annotations**
_____________________

*Note:*
*There must also be a `project_id` column, or an input argument for `project_id` when using LabelPandas to upload annotations*

*There must also be an `upload_method` provided when using LabelPandas to upload annotations*
  - *`upload_method` must be one of the following:*
    - *`"mal"` (uploads annotations as pre-labels)*
    - *`"import"` (uploads annotations as submitted labels)*
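
The two requirements in this note can be checked before calling LabelPandas. The function below is a hypothetical pre-flight check, not part of the library; it assumes annotation columns start with `annotation` followed by the `///` divider used in this notebook.

```python
UPLOAD_METHODS = {"mal", "import"}


def check_annotation_upload(columns, project_id=None, upload_method=None, divider="///"):
    """Return a list of problems that would block an annotation upload (empty if none)."""
    problems = []
    has_annotations = any(c.startswith("annotation" + divider) for c in columns)
    if not has_annotations:
        return problems  # Nothing to check when the table carries no annotations
    if project_id is None and "project_id" not in columns:
        problems.append("annotations need a `project_id` column or argument")
    if upload_method not in UPLOAD_METHODS:
        problems.append('`upload_method` must be "mal" or "import"')
    return problems


columns = ["row_data", "annotation///bbox///sample_bounding_box"]
print(check_annotation_upload(columns, project_id="<project-id>", upload_method="mal"))  # []
print(len(check_annotation_upload(columns)))  # 2
```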


- For annotations, the column name must be `annotation` + `divider` + `annotation_type` + `divider` + `top_level_feature_name`
  - Example: `annotation///bbox///bbox_tool_name` where, in this case, the bounding box tool name in your Labelbox ontology is "bbox_tool_name"
  - `annotation_type` must be one of the following:
    - `bbox`, `polygon`, `point`, `mask`, `line`, `named-entity`, `radio`, `checklist`, `text`
- Values for annotations must correspond with the following, per annotation type:

_____________________
_____________________

**Row-Level Formats for Tool Annotations**
- `bbox` (this example is two annotations)
```
[
        [[top, left, height, width], [nested_classification_name_paths]], 
        [[top, left, height, width], [nested_classification_name_paths]]
]
```
- `polygon` (this example is two annotations)
```
[
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]], 
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]]
]
```
- `line` (this example is two annotations)
```
[
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]], 
        [[(x, y), (x, y),...(x, y)], [nested_classification_name_paths]]
]
```
- `point` (this example is two annotations)
```
[
        [[x, y], [nested_classification_name_paths]], 
        [[x, y], [nested_classification_name_paths]]
]
```
- `mask` (this example is two annotations)
```
[
        [[URL, colorRGB], [nested_classification_name_paths]], 
        [[URL, colorRGB], [nested_classification_name_paths]]
]
                      OR
[
        [[numpy_array, colorRGB], [nested_classification_name_paths]], 
        [[numpy_array, colorRGB], [nested_classification_name_paths]]
]
                      OR
[
        [[png_bytes, None], [nested_classification_name_paths]], 
        [[png_bytes, None], [nested_classification_name_paths]]
]
```
- `named-entity` (this example is two annotations)
```
[
        [[start, end], [nested_classification_name_paths]], 
        [[start, end], [nested_classification_name_paths]]
]
```
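
When a table is loaded from CSV, these nested lists arrive as strings. To inspect one, a cell can be parsed with the standard library. The sketch below uses a `bbox` value taken from the sample CSV output in this notebook; `parse_bbox_cell` is a hypothetical helper and says nothing about how LabelPandas parses cells internally.

```python
import ast


def parse_bbox_cell(cell: str) -> list:
    """Turn a bbox cell such as "[[[1463, 2155, 125, 200], []]]" into a list of dicts."""
    boxes = []
    # Each annotation is [[top, left, height, width], [nested_classification_name_paths]]
    for (top, left, height, width), name_paths in ast.literal_eval(cell):
        boxes.append({
            "top": top,
            "left": left,
            "height": height,
            "width": width,
            "name_paths": name_paths,
        })
    return boxes


boxes = parse_bbox_cell("[[[1463, 2155, 125, 200], []]]")
print(len(boxes))                          # 1
print(boxes[0]["top"], boxes[0]["width"])  # 1463 200
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for values read from a file.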

**Row-Level Formats for Classification Annotations**
- `radio`, `checklist` and `text`
```
[[answer_name_paths]]
```
  - Note: the last string in a text name path is the text answer value itself
_____________________
_____________________  

## _**Code**_

Install LabelPandas

In [1]:
!pip install labelpandas -q




In [2]:
import labelpandas as lp
import pandas as pd
# Imported to create an example ontology - not required for typical runs of LabelPandas
import labelbox as lb

In [3]:
csv_path = "https://raw.githubusercontent.com/Labelbox/labelpandas/main/datasets/full-import.csv" # Path to your CSV file
api_key = "" # Your Labelbox API key

Load a CSV

In [4]:
df = pd.read_csv(csv_path)
df.head()

Unnamed: 0,row_data,global_key,external_id,annotation///bbox///sample_bounding_box,annotation///bbox///sample_nested_bounding_box,annotation///polygon///sample_polygon,annotation///polygon///sample_nested_polygon,annotation///mask///sample_segmentation_mask,annotation///point///sample_point,annotation///line///sample_polyline,...,annotation///text///sample_free_text_question,attachment///image///sample_col_1,attachment///video///sample_col_2,attachment///text_url///sample_col_3,attachment///raw_text///sample_col_4,attachment///html///sample_col_5,metadata///string///LabelPandas-String,metadata///number///LabelPandas-Number,metadata///enum///LabelPandas-Enum,metadata///datetime///LabelPandas-Datetime
0,https://storage.googleapis.com/labelbox-datase...,labelpandas-test-gCbn5IeZtE92OaUbyl1ZjQ.jpg,gCbn5IeZtE92OaUbyl1ZjQ.jpg,"[[[1853, 191, 213, 304], []], [[1828, 749, 154...","[[[1813, 1066, 259, 285], ['sample_tool_sub_te...","[[[[3363.98, 1180.19], [3205.616, 1349.865], [...","[[[[1341.067, 2550.793], [1412.708, 2545.137],...",[[['iVBORw0KGgoAAAANSUhEUgAAD8AAAAvQCAAAAADlnp...,"[[[1936.818, 2509.317], []], [[732.12, 2473.49...","[[[[1416.479, 1962.584], [2768.229, 2235.95], ...",...,,https://storage.googleapis.com/labelbox-sample...,https://storage.googleapis.com/labelbox-sample...,https://storage.googleapis.com/labelbox-sample...,Sample Raw Text,https://storage.googleapis.com/labelbox-sample...,Raw Text String 1,3999,A,01/30/1926 11:39 PM
1,https://storage.googleapis.com/labelbox-datase...,labelpandas-test-1MnLIosQZmXH3T-iU-4mtQ.jpg,1MnLIosQZmXH3T-iU-4mtQ.jpg,"[[[1463, 2155, 125, 200], []]]","[[[1362, 938, 237, 198], []]]","[[[[19.327, 2805.486], [701.995, 2515.586], [1...","[[[[1902.743, 1460.723], [1897.132, 1543.017],...",[[['iVBORw0KGgoAAAANSUhEUgAAD6AAAAu4CAAAAADQ3h...,"[[[430.798, 1477.556], []], [[254.988, 1451.37...","[[[[9.975, 1548.628], [797.382, 1658.978], [12...",...,[['Sample text answer']],https://storage.googleapis.com/labelbox-sample...,https://storage.googleapis.com/labelbox-sample...,https://storage.googleapis.com/labelbox-sample...,Sample Raw Text,https://storage.googleapis.com/labelbox-sample...,Raw Text String 3,2673,B,09/11/1925 12:09 PM
2,https://storage.googleapis.com/labelbox-datase...,labelpandas-test-qm4W6ktKCGR22n21A3o_0A.jpg,qm4W6ktKCGR22n21A3o_0A.jpg,"[[[1315, 2003, 147, 350], []], [[1340, 913, 12...","[[[999, 25, 146, 350], ['sample_tool_sub_text_...","[[[[2000.574, 856.19], [2023.466, 944.327], [2...","[[[[1775.08, 967.219], [1759.055, 1003.848], [...",[[['iVBORw0KGgoAAAANSUhEUgAADMAAAAcsCAAAAADZjE...,"[[[552.606, 554.005], []], [[2016.599, 358.272...","[[[[19.204, 489.905], [558.329, 811.549], [955...",...,,https://storage.googleapis.com/labelbox-sample...,https://storage.googleapis.com/labelbox-sample...,https://storage.googleapis.com/labelbox-sample...,Sample Raw Text,https://storage.googleapis.com/labelbox-sample...,Raw Text String 5,3409,C,06/28/1960 04:47 AM


Create a project, dataset and ontology

In [5]:
client = lp.Client(lb_api_key=api_key)

In [6]:
project = client.lb_client.create_project(name="LabelPandas-demo", media_type=lb.MediaType.Image)
dataset = client.lb_client.create_dataset(name="LabelPandas-demo-dataset")

Default createProject behavior will soon be adjusted to prefer batch projects. Pass in `queue_mode` parameter explicitly to opt-out for the time being.


In [7]:
ontology_builder = lb.OntologyBuilder(
    classifications=[ 
        lb.Classification( # Radio classification
            class_type=lb.Classification.Type.RADIO, name="sample_radio_question", 
            options=[lb.Option(value="sample_radio_answer_1"), lb.Option(value="sample_radio_answer_2")]
        ),
        lb.Classification( # Checklist classification
            class_type=lb.Classification.Type.CHECKLIST, name="sample_checklist_question", 
            options=[lb.Option(value="sample_checklist_answer_1"), lb.Option(value="sample_checklist_answer_2")]
        ), 
        lb.Classification( # Text classification
            class_type=lb.Classification.Type.TEXT, name="sample_free_text_question"
        ),
        lb.Classification( # Radio classification where one answer has a nested radio classification
            class_type=lb.Classification.Type.RADIO, name="sample_nested_radio_question",
            options=[
                lb.Option(
                    value="sample_branch_radio_answer_1", 
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.RADIO, name="sample_sub_radio_question", 
                            options=[lb.Option("sample_sub_radio_answer_1"), lb.Option("sample_sub_radio_answer_2")]
                        )
                    ]
                ), 
                lb.Option(value="sample_leaf_radio_answer_2")
            ]
        )
    ],
    tools=[ # List of Tool objects
        lb.Tool( # Bounding Box tool
            tool=lb.Tool.Type.BBOX, name="sample_bounding_box"), 
        lb.Tool( # Bounding Box tool with a nested text classification
            tool=lb.Tool.Type.BBOX,  name="sample_nested_bounding_box",
            classifications=[
                lb.Classification(class_type=lb.Classification.Type.TEXT, name="sample_tool_sub_text_question"),]
        ),
        lb.Tool( # Polygon tool
            tool=lb.Tool.Type.POLYGON, name="sample_polygon"
        ),
        lb.Tool( # Polygon tool with a nested radio classification
            tool=lb.Tool.Type.POLYGON, name="sample_nested_polygon",
            classifications=[
                lb.Classification(
                    class_type=lb.Classification.Type.RADIO, name="sample_tool_sub_radio_question",
                    options=[lb.Option("sample_sub_radio_answer_1"), lb.Option("sample_sub_radio_answer_2")]
                ),
            ]            
        ),        
        lb.Tool( # Segmentation mask tool
            tool=lb.Tool.Type.SEGMENTATION, name="sample_segmentation_mask"
        ),
        lb.Tool( # Point tool
            tool=lb.Tool.Type.POINT, name="sample_point"
        ),
        lb.Tool( # Polyline tool
            tool=lb.Tool.Type.LINE, name="sample_polyline"
        )
    ]
)

ontology = client.lb_client.create_ontology("LabelPandas-demo", ontology_builder.asdict())

project.setup_editor(ontology)

Upload to Labelbox

In [8]:
results = client.create_data_rows_from_table(
    table = df,
    dataset_id = dataset.uid,
    project_id = project.uid,
    upload_method = "import", # Must be either "import" or "mal"
    skip_duplicates = False, # If True, will skip data rows where a global key is already in use
    mask_method = "png", # Input masks must be either "png", "url", or "array"
    verbose = True, # If True, prints information about code execution
)

Creating upload list - 3 rows in Pandas DataFrame
Upload generated
Beginning upload for Dataset with ID clh7zttoj054s07018q6n3lsn
Vetting global keys
Global keys vetted
Beginning data row upload for Dataset with ID clh7zttoj054s07018q6n3lsn - uploading 3 data rows
Batch #1: 3 data rows
Success: Upload batch number 1 successful
Upload complete - all data rows uploaded
Sending 3 data rows to project with ID clh7ztt9p05s3072gbxkx4ofu
All data rows have been batched to the specified project(s)
Uploading annotations as submitted labels (Label Import)
Uploading 51 annotations for 3 data rows to project with ID clh7ztt9p05s3072gbxkx4ofu
Success: upload batch number 1 complete
