<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/data_rows.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/data_rows.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Data rows

* Data rows are the items that are actually being labeled. We currently support the following:
    * Image
    * Text
    * Video
    * Geospatial / Tiled Imagery
    * Audio
    * Documents (Beta)
    * HTML (Beta)
    * DICOM (Beta)
* A data row is a member of a dataset.
* A data row cannot exist without belonging to a dataset.
* Data rows are added to labeling tasks by first attaching them to datasets and then attaching datasets to projects.

In [1]:
!pip install -q labelbox


In [2]:
from labelbox import DataRow, Client
from labelbox.schema.data_row_metadata import DataRowMetadataField
import uuid
import os

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [4]:
# Add your API key
API_KEY = None
client = Client(api_key=API_KEY)
project = client.get_project(project_id="<insert any project ID for testing>")
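
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable: the variable name `LABELBOX_API_KEY` and the `load_api_key` helper are illustrative choices, not something defined by this notebook.

```python
import os


def load_api_key(env_var="LABELBOX_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise a clear error.

    `env_var` is an assumed name; `environ` can be overridden for testing.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key."
        )
    return key


# Usage (commented out because it needs a real key):
# client = Client(api_key=load_api_key())
```

Failing early with a descriptive message tends to be easier to debug than an authentication error raised later by the first API call.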

### Read

In [None]:
batches = project.batches()  # Get all batches in the project
data_rows = []

# Collect the data rows from every batch in the project
for batch in batches:
    data_rows.extend(list(batch.export_data_rows()))

data_row = data_rows[0]  # Sample one data row

In [None]:
# Example information available in each Data Row
print("Associated dataset", data_row.dataset())
print("Associated label(s)", list(data_row.labels()))
print("External id", data_row.external_id)
print("Global key", data_row.global_key)

### Create Dataset + Data Row
* Create a single data row at a time

In [5]:
dataset = client.create_dataset(name="testing-dataset")
dataset.create_data_row(row_data="https://picsum.photos/200/300")

# Global keys or external IDs are recommended but optional.
# They let you maintain your own references to a data row.
dataset.create_data_row(
    row_data="https://picsum.photos/200/300",
    external_id=str(uuid.uuid4()),
)

# You can also upload metadata along with your data_row
mdo = client.get_data_row_metadata_ontology()
dataset.create_data_row(row_data="https://picsum.photos/200/300",
                        external_id=str(uuid.uuid4()),
                        metadata_fields=[
                            DataRowMetadataField(
                              schema_id=mdo.reserved_by_name["tag"].uid,  # specify the schema id
                              value="tag_string", # typed inputs
                            ),
                        ], 
)

<DataRow ID: cldp1qlcg0ps3071424p61jbl>

* Bulk create data rows (this is much faster than creating them one at a time)

In [None]:
data_rows = [
    {
        "row_data": "https://picsum.photos/id/829/200/300",
        "global_key": str(uuid.uuid4()),
        "external_id": str(uuid.uuid4())
    }
]
bulk_create_data_row_task = dataset.create_data_rows(data_rows)
bulk_create_data_row_task.wait_till_done()  # (Optional) Blocking call, useful for synchronous operations and debugging
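
When the list of assets is long, the payload list can be generated instead of written by hand. The sketch below builds dictionaries with the same `row_data` and `global_key` fields used in the cell above; `build_data_row_payloads` is a hypothetical helper name introduced only for illustration.

```python
import uuid


def build_data_row_payloads(urls):
    """Return one payload dict per URL, each with a fresh random global key."""
    return [
        {"row_data": url, "global_key": str(uuid.uuid4())}
        for url in urls
    ]


payloads = build_data_row_payloads([
    "https://picsum.photos/id/829/200/300",
    "https://picsum.photos/200/300",
])
print(len(payloads))

# The list can then be passed to the bulk call shown above:
# task = dataset.create_data_rows(payloads)
```

Random UUIDs mirror what the notebook does; in practice a key derived from your own asset identifiers is usually easier to look up later.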

In [None]:
# Local paths
local_data_path = '/tmp/test_data_row.txt'
with open(local_data_path, 'w') as file:
    file.write("sample data")

local_file_task = dataset.create_data_rows([local_data_path])
local_file_task.wait_till_done()  # (Optional) Blocking call, useful for synchronous operations and debugging

In [None]:
# You can mix local files with urls
mix_task = dataset.create_data_rows([{
    "row_data": "https://picsum.photos/200/300",
    "global_key": str(uuid.uuid4())
}, local_data_path])

mix_task.wait_till_done()  # (Optional) Blocking call, useful for synchronous operations and debugging

In [None]:
# A bare local path leaves no place to set an external ID or global key.
# To set one, first upload the file to get a URL, then create the data row from that URL.
item_url = client.upload_file(local_data_path)
example_task = dataset.create_data_rows([{
    "row_data": item_url,
    "global_key": str(uuid.uuid4())
}])
example_task.wait_till_done()  # (Optional) Blocking call, useful for synchronous operations and debugging

In [None]:
# You can bulk upload Data Rows with metadata
datarow_metadata = dataset.create_data_rows([{
    "row_data": "https://picsum.photos/200/300",
    "global_key": str(uuid.uuid4()),
    "metadata_fields": [
      DataRowMetadataField(
        schema_id=mdo.reserved_by_name["tag"].uid,  # specify the schema id
        value="tag_string", # typed inputs
      ),
    ], 
}])
datarow_metadata.wait_till_done()  # (Optional) Blocking call, useful for synchronous operations and debugging

In [None]:
# You can also use DataRow fields as dictionary keys instead of plain strings
datarow_attachments = dataset.create_data_rows([{
    DataRow.row_data: "https://picsum.photos/200/300",
    DataRow.external_id: str(uuid.uuid4()),
    "metadata_fields": [
      DataRowMetadataField(
        schema_id=mdo.reserved_by_name["tag"].uid,  # specify the schema id
        value="tag_string", # typed inputs
      ),
    ],
}])
datarow_attachments.wait_till_done()  # (Optional) Blocking call, useful for synchronous operations and debugging

In [None]:
# To see the status or errors after calling wait_till_done()
print(datarow_metadata.status, datarow_metadata.errors)
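
Printing the raw attributes works for a quick look, but a small helper can make the result easier to read across many uploads. The sketch below relies only on the `status` and `errors` attributes used above. The `summarize_task` helper, the stand-in object, and the `"COMPLETE"` value are placeholders for illustration, not confirmed SDK behavior.

```python
from types import SimpleNamespace


def summarize_task(task):
    """Build a one-line summary from a task's `status` and `errors` attributes."""
    errors = task.errors or []
    # Wrap a single error (for example a string or dict) so it is counted uniformly.
    if not isinstance(errors, (list, tuple)):
        errors = [errors]
    if errors:
        return f"status={task.status}, {len(errors)} error(s): {errors}"
    return f"status={task.status}, no errors reported"


# Stand-in object for illustration only; a real task comes from create_data_rows().
fake_task = SimpleNamespace(status="COMPLETE", errors=None)
print(summarize_task(fake_task))
```

With a real task, call it after `wait_till_done()`, for example `print(summarize_task(datarow_metadata))`.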

### Update

In [None]:
# Update a field on an existing data row (the same call is useful for re-signing URLs)
new_id = str(uuid.uuid4())
data_row.update(external_id=new_id)
print(data_row.external_id, new_id)

ff84a2a0-7fd0-4844-bde3-da8daa798b28 ff84a2a0-7fd0-4844-bde3-da8daa798b28


In [None]:
# Create attachments on data rows
# Attachments are visible for all projects connected to the data_row
data_row.create_attachment(attachment_type="TEXT",
                           attachment_value="LABELERS WILL SEE THIS")
# See more information here:
# https://docs.labelbox.com/reference/type-image
# Note that attachment_value must always be a string (url to a video/image or a text value to display)

<AssetAttachment ID: cldm8x4y30wlx07399urs0yg1>

### Delete

In [None]:
data_row.delete()
# This also removes the data row from its dataset

In [None]:
# Bulk delete a list of data rows (in this case, all the ones we just uploaded)
DataRow.bulk_delete(list(dataset.data_rows()))
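
For very large datasets it may be preferable to keep each delete request to a bounded size. The sketch below shows a generic chunking helper. `chunked` and the chunk size of 500 are assumptions made for illustration, and the SDK may already split large requests internally.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


print(list(chunked(range(5), 2)))

# Hypothetical usage with the objects above (500 is an arbitrary chunk size).
# The rows are materialized first so that deleting does not disturb pagination:
# rows = list(dataset.data_rows())
# for chunk in chunked(rows, 500):
#     DataRow.bulk_delete(chunk)
```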