<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/data_row_metadata.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/master/examples/basics/data_row_metadata.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Data Row Metadata

Metadata helps you better understand the data on the platform, which supports labeling review, model diagnostics, and data selection. Metadata **should not be confused with attachments**. Attachments provide additional context for labelers but are not searchable within Catalog.

## Setup

In [None]:
!pip install -q "labelbox[data]"

In [None]:
import labelbox as lb
from datetime import datetime
from pprint import pprint
from uuid import uuid4

In [None]:
# Add your api key
API_KEY = ""
client = lb.Client(api_key=API_KEY)

## Metadata ontology

We manage metadata with a system similar to the one used for feature schemas. Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within each group of metadata fields (reserved or custom). A DataRow can have a maximum of 5 metadata fields at a time.

### Metadata kinds

* **Enum**: A classification with options; only one option can be selected at a time
* **DateTime**: A UTC ISO 8601 datetime
* **Embedding**: A 128-dimensional float32 vector used for similarity
* **String**: A string of less than 500 characters
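
The constraints in the list above can be checked locally before anything is uploaded. The following is a minimal sketch in plain Python; `check_metadata_value` and its kind names are hypothetical and not part of the Labelbox SDK, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_STRING_LENGTH = 500  # strings must be shorter than this
EMBEDDING_LENGTH = 128   # number of floats in an embedding


def check_metadata_value(kind, value, enum_options=()):
    """Hypothetical local check; returns a list of problems (empty if none)."""
    problems = []
    if kind == "string":
        if not isinstance(value, str):
            problems.append("expected a str")
        elif len(value) >= MAX_STRING_LENGTH:
            problems.append("string must be shorter than 500 characters")
    elif kind == "datetime":
        # Assumption: naive datetimes are taken to be in UTC already.
        if not isinstance(value, datetime):
            problems.append("expected a datetime")
        elif value.tzinfo is not None and value.utcoffset() != timedelta(0):
            problems.append("datetime must be in UTC")
    elif kind == "embedding":
        values = list(value)
        if len(values) != EMBEDDING_LENGTH:
            problems.append("embedding must contain exactly 128 values")
        elif not all(isinstance(x, float) for x in values):
            problems.append("embedding values must be floats")
    elif kind == "enum":
        if value not in enum_options:
            problems.append("value is not one of the enum options")
    else:
        problems.append(f"unknown kind: {kind!r}")
    return problems


print(check_metadata_value("string", "tag_string"))
print(check_metadata_value("embedding", [0.0] * 64))
```

An empty list means the value passed every check for its kind. This only mirrors the documented limits; the platform remains the source of truth for validation.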

### Reserved fields

* **tag**: a free-text field
* **split**: an enum with the options train, valid, and test
* **captureDateTime**: an ISO 8601 datetime field; all times must be in UTC
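
Because `captureDateTime` values must be in UTC, timestamps recorded in another time zone need converting first. The sketch below uses only the Python standard library; the timestamp and the +02:00 offset are made-up sample values.

```python
from datetime import datetime, timedelta, timezone

# Made-up capture time recorded in a UTC+02:00 time zone.
local_zone = timezone(timedelta(hours=2))
captured_local = datetime(2024, 1, 15, 11, 30, 0, tzinfo=local_zone)

# Convert to UTC before using the value as captureDateTime metadata.
captured_utc = captured_local.astimezone(timezone.utc)

print(captured_utc.isoformat())  # 2024-01-15T09:30:00+00:00
```

A timezone-aware value like `captured_utc` makes the UTC requirement explicit, whereas a naive datetime carries no offset information at all.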

### Custom fields

You can create your own fields in the app by navigating to the [metadata schema page](https://app.labelbox.com/schema/metadata).

In [None]:
mdo = client.get_data_row_metadata_ontology()

In [None]:
# List all fields in your metadata ontology as a dictionary keyed by id
metadata_ontologies = mdo.fields_by_id
pprint(metadata_ontologies, indent=2)

In [None]:
# access by name
split_field = mdo.reserved_by_name["split"]
train_field = mdo.reserved_by_name["split"]["train"]

In [None]:
tag_field = mdo.reserved_by_name["tag"]

In [None]:
tag_field

## Construct metadata fields

To construct a metadata field, you must identify the field's schema and provide the value to upload. The examples below identify the schema by its name rather than by its Schema Id. You can either construct a `DataRowMetadataField` object or specify the same information in dictionary format.

Option 1: Specify metadata with a list of `DataRowMetadataField` objects. This is the recommended option since it comes with validation for metadata fields.

In [None]:
# Construct a metadata field of string kind
tag_metadata_field = lb.DataRowMetadataField(
    name="tag",  # specify the schema name
    value="tag_string", # typed inputs
)

# Construct a metadata field of datetime kind
capture_datetime_field = lb.DataRowMetadataField(
    name="captureDateTime",  # specify the schema name
    value=datetime.utcnow(), # typed inputs
)

# Construct a metadata field of enum kind
split_metadta_field = lb.DataRowMetadataField(
    name="split",  # specify the schema name
    value="train", # typed inputs
)

Option 2: Alternatively, you can specify the metadata fields in dictionary format without declaring `DataRowMetadataField` objects.


In [None]:
# Construct a dictionary of string metadata
tag_metadata_field_dict = {
    "name": "tag",
    "value": "tag_string",
}

# Construct a dictionary of datetime metadata
capture_datetime_field_dict = {
    "name": "captureDateTime",
    "value": datetime.utcnow(),
}

# Construct a dictionary of enum metadata
split_metadta_field_dict = {
    "name": "split",
    "value": "train",
}
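
When many fields share the dictionary format of Option 2, the repetition can be folded into a small helper. This is a sketch only: `to_metadata_dicts` is a hypothetical convenience function, not part of the Labelbox SDK, and it produces the same `name`/`value` dictionaries shown above.

```python
def to_metadata_dicts(values):
    """Turn {"tag": "x"} into [{"name": "tag", "value": "x"}] (hypothetical helper)."""
    return [{"name": name, "value": value} for name, value in values.items()]


# Sample values reusing the reserved field names from this notebook.
fields = to_metadata_dicts({"tag": "tag_string", "split": "train"})
print(fields)
```

The resulting list has the same shape as a hand-written list of Option 2 dictionaries.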

## Upload data rows together with metadata

Note: Currently, there is a 30k limit on bulk uploading data rows containing metadata.
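
One way to stay under this limit is to split a large list of data rows into batches before uploading. The sketch below is plain Python; `chunked` is a hypothetical helper, and the 30,000 default simply mirrors the limit stated in the note.

```python
from itertools import islice

MAX_ROWS_PER_UPLOAD = 30_000  # limit stated in the note above


def chunked(rows, size=MAX_ROWS_PER_UPLOAD):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for 70,000 data row dictionaries.
batch_sizes = [len(batch) for batch in chunked(range(70_000))]
print(batch_sizes)  # [30000, 30000, 10000]
```

Each batch could then be passed to `dataset.create_data_rows(batch)`, waiting for the returned task to finish before sending the next one.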

In [None]:
# A simple example of uploading data rows with metadata
dataset = client.create_dataset(name="Simple Data Rows import with metadata example")

data_row = {"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": str(uuid4())}
data_row['metadata_fields'] = [tag_metadata_field, capture_datetime_field, split_metadta_field]
# This also works with a list of dictionaries, as specified in Option 2. Uncomment the line below to try it.
# data_row['metadata_fields'] = [tag_metadata_field_dict, capture_datetime_field_dict, split_metadta_field_dict]

task = dataset.create_data_rows([data_row])
task.wait_till_done()

## Accessing metadata

You can examine an individual data row, including its metadata.

In [None]:
data_row = next(dataset.data_rows())
for metadata_field in data_row.metadata_fields:
  print(metadata_field['name'], ":", metadata_field['value'])
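
For repeated lookups it can be handy to index the fields by name. The sketch below assumes each entry is a dictionary with `name` and `value` keys, as in the loop above; `metadata_as_dict` is a hypothetical helper and `sample_fields` is made-up data standing in for `data_row.metadata_fields`.

```python
def metadata_as_dict(metadata_fields):
    """Map each field name to its value (hypothetical helper)."""
    return {field["name"]: field["value"] for field in metadata_fields}


# Made-up sample standing in for data_row.metadata_fields.
sample_fields = [
    {"name": "tag", "value": "tag_string"},
    {"name": "split", "value": "train"},
]

lookup = metadata_as_dict(sample_fields)
print(lookup["split"])  # train
```

If two entries shared a name, the later one would overwrite the earlier one in this mapping.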

You can also bulk export metadata for a list of data row IDs.

In [None]:
datarows_metadata = mdo.bulk_export([data_row.uid])
len(datarows_metadata)

## Upload/delete/update custom metadata for existing data rows

For a complete walkthrough of how to upload, update, and delete custom metadata, follow the steps in this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb).

