<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/data_row_metadata.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/data_row_metadata.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Data Row Metadata

Metadata helps you better understand the data on the platform, supporting labeling review, model diagnostics, and data selection. It **should not be confused with attachments**: attachments provide additional context for labelers but are not searchable within Catalog.

## Metadata ontology

We manage metadata with a system similar to the one used for feature schemas. Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within each category of metadata (reserved or custom). A DataRow can have a maximum of 5 metadata fields at a time.
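
As a rough illustration of these lookup rules, the sketch below models an ontology in plain Python. The `FieldSchema` class, the `build_indexes` and `check_field_count` helpers, and the ids are hypothetical and not part of the Labelbox SDK; the sketch only assumes that ids are globally unique, that names are unique within the reserved or custom group, and that the 5-field limit applies per data row.

```python
from dataclasses import dataclass

MAX_FIELDS_PER_DATA_ROW = 5  # limit stated in the text above


# Hypothetical stand-in for a metadata schema entry; not the SDK's class.
@dataclass(frozen=True)
class FieldSchema:
    uid: str
    name: str
    reserved: bool


def build_indexes(schemas):
    """Index schemas by id, and by name within the reserved and custom groups."""
    by_id, reserved_by_name, custom_by_name = {}, {}, {}
    for schema in schemas:
        if schema.uid in by_id:
            raise ValueError(f"duplicate id: {schema.uid}")
        by_id[schema.uid] = schema
        group = reserved_by_name if schema.reserved else custom_by_name
        if schema.name in group:
            raise ValueError(f"duplicate name in group: {schema.name}")
        group[schema.name] = schema
    return by_id, reserved_by_name, custom_by_name


def check_field_count(fields):
    """Raise if a data row would carry more fields than the stated limit."""
    if len(fields) > MAX_FIELDS_PER_DATA_ROW:
        raise ValueError(
            f"{len(fields)} fields exceeds the limit of {MAX_FIELDS_PER_DATA_ROW}")


schemas = [
    FieldSchema("id-1", "tag", reserved=True),
    FieldSchema("id-2", "tag", reserved=False),  # same name, different group
]
by_id, reserved_by_name, custom_by_name = build_indexes(schemas)
```

The same name can appear once in each group, which is why the sketch keeps separate name indexes while sharing a single id index.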

### Metadata kinds

* **Enum**: A classification with options; only one option can be selected at a time
* **DateTime**: A UTC ISO 8601 datetime
* **String**: A string of less than 500 characters
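
The constraints in this list can be checked locally before uploading. The following is a minimal sketch, not the SDK's own validation: the `is_valid_value` helper and the lowercase kind names are assumptions made for illustration, and the datetime check assumes naive values are meant as UTC.

```python
from datetime import datetime, timedelta, timezone

MAX_STRING_LENGTH = 500  # strings must be shorter than this, per the list above


def is_valid_value(kind, value, options=()):
    """Illustrative check that `value` fits one of the kinds described above."""
    if kind == "string":
        return isinstance(value, str) and len(value) < MAX_STRING_LENGTH
    if kind == "datetime":
        # Accept naive datetimes (assumed UTC) or aware ones with zero offset.
        if not isinstance(value, datetime):
            return False
        offset = value.utcoffset()
        return offset is None or offset == timedelta(0)
    if kind == "enum":
        # A single value that must match one of the declared options.
        return isinstance(value, str) and value in options
    raise ValueError(f"unknown kind: {kind}")
```

For example, `is_valid_value("enum", "train", options=("train", "valid", "test"))` passes, while a 500-character string fails the string check.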

### Reserved fields

* **tag**: a free text field
* **split**: enum of train-valid-test
* **captureDateTime**: ISO 8601 datetime field. All times must be in UTC
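
Because `captureDateTime` must be in UTC, timestamps recorded in another timezone need converting first. Here is a small sketch using only the standard library; the `to_utc_iso` helper and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(dt):
    """Convert a timezone-aware datetime to a UTC ISO 8601 string."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    return dt.astimezone(timezone.utc).isoformat()


# A capture time recorded at UTC+02:00 is shifted back two hours.
local_capture = datetime(2023, 5, 1, 14, 30, tzinfo=timezone(timedelta(hours=2)))
capture_utc = to_utc_iso(local_capture)
print(capture_utc)  # 2023-05-01T12:30:00+00:00
```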

### Custom fields

* **Embedding**: A 128-dimensional float32 vector used for similarity. To upload custom embeddings, follow this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb)
* Any metadata kind can be customized
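
Before uploading an embedding, it can help to confirm that the vector has the expected shape. The sketch below uses only the standard library and is not taken from the SDK: `as_float32_embedding` is a hypothetical helper, and it assumes the platform's C `float` is 32 bits wide, which holds on common platforms.

```python
from array import array

EMBEDDING_DIM = 128  # vector length stated in the list above


def as_float32_embedding(values):
    """Pack values into a float32 array and check the expected dimension."""
    vector = array("f", values)  # "f" is a C float, typically 32 bits
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")
    return vector


embedding = as_float32_embedding([0.0] * EMBEDDING_DIM)
```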

## Setup

In [None]:
!pip install -q "labelbox[data]"

In [None]:
import labelbox as lb
from datetime import datetime
from pprint import pprint
from labelbox.schema.data_row_metadata import DataRowMetadataKind
from uuid import uuid4

In [None]:
# Add your api key
API_KEY = ""
client = lb.Client(api_key=API_KEY)

### Get the current metadata ontology 

In [None]:
mdo = client.get_data_row_metadata_ontology()

In [None]:
# List all metadata fields in the ontology as a dictionary accessible by id
metadata_ontologies = mdo.fields_by_id
pprint(metadata_ontologies, indent=2)

### Access metadata by name

In [None]:
split_field = mdo.reserved_by_name["split"]
split_field

In [None]:
tag_field = mdo.reserved_by_name["tag"]
tag_field

In [None]:
train_field = mdo.reserved_by_name["split"]["train"]
train_field

## Construct metadata fields for existing metadata schemas

To construct a metadata field you must provide the name for the metadata field and the value that will be uploaded. You can either construct a DataRowMetadataField object or specify the name and value in a dictionary format.





Option 1: Specify metadata with a list of `DataRowMetadataField` objects. This is the recommended option since it comes with validation for metadata fields.

In [None]:
# Construct a metadata field of string kind
tag_metadata_field = lb.DataRowMetadataField(
    name="tag",
    value="tag_string",
)

# Construct a metadata field of datetime kind
capture_datetime_field = lb.DataRowMetadataField(
    name="captureDateTime",
    value=datetime.utcnow(),
)

# Construct a metadata field of enum kind
split_metadata_field = lb.DataRowMetadataField(
    name="split",
    value="train",
)


Option 2: Specify the metadata fields as dictionaries, without declaring `DataRowMetadataField` objects.


In [None]:
# Construct a dictionary of string metadata
tag_metadata_field_dict = {
    "name": "tag",
    "value": "tag_string",
}

# Construct a dictionary of datetime metadata
capture_datetime_field_dict = {
    "name": "captureDateTime",
    "value": datetime.utcnow(),
}

# Construct a dictionary of enum metadata
split_metadata_field_dict = {
    "name": "split",
    "value": "train",
}



## Create custom metadata schemas with their corresponding fields


In [None]:
# Collect the custom metadata fields to attach to a data row
custom_metadata_fields = []

# Create the schema for the metadata
number_schema = mdo.create_schema(
  name="numberMetadataCustom",
  kind=DataRowMetadataKind.number
)

# Add fields to the metadata schema
data_row_metadata_fields_number = lb.DataRowMetadataField(
    name=number_schema.name,
    value=5.0
)

custom_metadata_fields.append(data_row_metadata_fields_number)


In [None]:
# Create the schema for an enum metadata.
# Note: re-initializing the list here discards the number field appended above.
custom_metadata_fields = []

enum_schema = mdo.create_schema(
  name="enumMetadata",
  kind=DataRowMetadataKind.enum,
  options=["option1", "option2"]
)

# Add fields to the metadata schema
data_row_metadata_fields_enum_1 = lb.DataRowMetadataField(
  name=enum_schema.name,
  value="option1"
)
custom_metadata_fields.append(data_row_metadata_fields_enum_1)


data_row_metadata_fields_enum_2 =  lb.DataRowMetadataField(
  name=enum_schema.name,
  value="option2"
)
custom_metadata_fields.append(data_row_metadata_fields_enum_2)



In [None]:
# Inspect the newly created metadata schemas
metadata_ontologies = mdo.fields_by_id
pprint(metadata_ontologies, indent=2)

## Create data rows with metadata

See our [documentation](https://docs.labelbox.com/docs/limits) for information on limits for uploading data rows in a single API operation.
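
When a dataset exceeds the per-operation limit, one common approach is to split the payload into batches. The sketch below is only an illustration: the `chunked` helper, the placeholder URLs, and the batch size of 3 are hypothetical, and the real limit should be taken from the linked documentation.

```python
def chunked(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical payloads; the URLs and the batch size are placeholders.
rows = [
    {"row_data": f"https://example.com/{i}.jpg", "global_key": f"key-{i}"}
    for i in range(7)
]
batches = list(chunked(rows, batch_size=3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

Each batch could then be passed to `dataset.create_data_rows(batch)` in its own call, waiting for each task to finish before starting the next.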

In [None]:
# A simple example of uploading data rows with metadata
dataset = client.create_dataset(name="Simple Data Rows import with metadata example")
global_key = "s_basic.jpg"
data_row = {"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": global_key}
# metadata_fields accepts DataRowMetadataField objects and dictionaries, and the two can be mixed
data_row["metadata_fields"] = custom_metadata_fields + [split_metadata_field, capture_datetime_field_dict, tag_metadata_field]


task = dataset.create_data_rows([data_row])
task.wait_till_done()
result_task = task.result
print(result_task)

## Update data row metadata

In [None]:
# Look up the custom number metadata schema by name
num_schema = mdo.get_by_name("numberMetadataCustom")

# Update the metadata
updated_metadata = lb.DataRowMetadataField(
  schema_id=num_schema.uid,
  value=10.2
)

# Create data row payload
data_row_payload = lb.DataRowMetadata(
  global_key=global_key,
  fields=[updated_metadata]
)

# Upsert the updated number metadata field on the data row
mdo.bulk_upsert([data_row_payload])

## Update metadata schema

In [None]:
# Rename a metadata schema
number_schema = mdo.update_schema(name="numberMetadataCustom", new_name="numberMetadataCustomNew")

# Rename an option of an enum metadata schema (applies only to enum schemas)
enum_schema = mdo.update_enum_option(
  name="enumMetadata",
  option="option1",
  new_option="option3"
)

## Accessing metadata

You can examine an individual data row, including its metadata.

In [None]:
data_row = next(dataset.data_rows())
for metadata_field in data_row.metadata_fields:
  print(metadata_field['name'], ":", metadata_field['value'])

You can bulk export metadata using data row IDs.

In [None]:
data_rows_metadata = mdo.bulk_export([data_row.uid])
len(data_rows_metadata)

## Delete a custom metadata schema

You can delete a custom metadata schema by name. To do so, uncomment the line below and insert the desired name.

In [None]:
# status = mdo.delete_schema(name="<metadata schema name>")