<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/data_rows.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/master/examples/basics/data_rows.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Data rows

* Data rows are the items that are actually being labeled. We currently support the following:
    * Image
    * Text
    * Video
    * Geospatial / Tiled Imagery
    * Audio
    * Documents (Beta)
    * HTML (Beta)
    * DICOM (Beta)
* A data row always belongs to a dataset; it cannot exist without one.
* Data rows are added to labeling tasks by first attaching them to datasets and then attaching the datasets to projects.

In [None]:
!pip install labelbox

In [None]:
import labelbox as lb
import uuid
import os

* Fill in the following cell with your own values before running this notebook

In [None]:
# Pick a project that has a dataset attached, whose data rows have external ids, and that has some labels.
# This notebook modifies the project, so pick a dummy one that you don't care about.
PROJECT_ID = ""

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [None]:
# Add your API key
API_KEY = None
client = lb.Client(api_key=API_KEY)
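Hard-coding a key in a notebook makes it easy to leak. The following is a minimal sketch of reading the key from the environment instead. The helper `resolve_api_key` and the variable name `LABELBOX_API_KEY` are assumptions for illustration; they are not defined anywhere in this notebook.

```python
import os


def resolve_api_key(explicit_key=None, env_var="LABELBOX_API_KEY"):
    """Return the explicit key if given, otherwise the value of `env_var`.

    Both this helper and the default variable name are illustrative
    assumptions, not part of the notebook above.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key given and {env_var} is not set")
    return key
```

With this in place, the client could be created as `lb.Client(api_key=resolve_api_key(API_KEY))`.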

In [None]:
project = client.get_project(PROJECT_ID)
dataset = next(project.datasets())
# This is the same as
# -> dataset = client.get_dataset(dataset_id)

### Read

In [None]:
data_rows = dataset.data_rows()
data_row = next(data_rows)

In [None]:
# Related objects and fields of the data row
print("Associated dataset", data_row.dataset())
print("Associated label(s)", next(data_row.labels()))
print("External id", data_row.external_id)

In [None]:
# External ids can be a reference to your internal datasets
data_row = dataset.data_row_for_external_id(data_row.external_id)
print(data_row)

### Create
* Create a single data row at a time

In [None]:
dataset = client.create_dataset(name="testing-dataset")
dataset.create_data_row(row_data="https://picsum.photos/200/300")

# External ids are optional but recommended.
# They let you maintain a reference back to each data_row.
dataset.create_data_row(row_data="https://picsum.photos/200/300",
                        external_id=str(uuid.uuid4()))

# You can also upload metadata along with your data_row
mdo = client.get_data_row_metadata_ontology()
dataset.create_data_row(
    row_data="https://picsum.photos/200/300",
    external_id=str(uuid.uuid4()),
    metadata_fields=[
        lb.DataRowMetadataField(
            schema_id=mdo.reserved_by_name["tag"].uid,  # specify the schema id
            value="tag_string",  # typed inputs
        ),
    ],
)

* Bulk create data rows (this is much faster than creating data rows one at a time)

In [None]:
task1 = dataset.create_data_rows([{
    lb.DataRow.row_data: "https://picsum.photos/200/300"
}, {
    lb.DataRow.row_data: "https://picsum.photos/200/300"
}])
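For larger uploads it can help to build the payload list programmatically and submit it in batches. The sketch below assumes that the plain string keys `"row_data"` and `"external_id"` are accepted in place of `lb.DataRow.row_data` and `lb.DataRow.external_id`; the helper names `build_payloads` and `batched` are hypothetical.

```python
import uuid


def build_payloads(urls):
    """Build one dict per URL, each with a freshly generated external id.

    Assumes string keys are accepted by create_data_rows; swap in
    lb.DataRow.row_data / lb.DataRow.external_id if they are not.
    """
    return [{"row_data": url, "external_id": str(uuid.uuid4())} for url in urls]


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch could then be passed to `dataset.create_data_rows(batch)`, keeping the returned tasks so they can be waited on later.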

In [None]:
# Local paths
local_data_path = '/tmp/test_data_row.txt'
with open(local_data_path, 'w') as file:
    file.write("sample data")

task2 = dataset.create_data_rows([local_data_path])

In [None]:
# You can mix local files with urls
task3 = dataset.create_data_rows([{
    lb.DataRow.row_data: "https://picsum.photos/200/300"
}, local_data_path])

In [None]:
# Note that you cannot set external_ids at this time when uploading directly from local files.
# To set them, first upload the file to get a url, then create the data row from that url.
item_url = client.upload_file(local_data_path)
task4 = dataset.create_data_rows([{
    lb.DataRow.row_data: item_url,
    lb.DataRow.external_id: str(uuid.uuid4())
}])
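The same upload-then-create pattern extends to several files. This sketch derives each external id from the file name so that it points back to the local file; that naming scheme, the helper name `rows_from_local_files`, and the use of plain string keys are assumptions made for illustration.

```python
import os


def rows_from_local_files(paths, upload_file):
    """Upload each local path and pair the returned URL with an external id.

    `upload_file` is any callable that maps a local path to a URL,
    for example `client.upload_file` from the cell above.
    The external id is the file's base name (an illustrative choice).
    """
    rows = []
    for path in paths:
        rows.append({
            "row_data": upload_file(path),
            "external_id": os.path.basename(path),
        })
    return rows
```

The result could be passed to `dataset.create_data_rows(rows_from_local_files(paths, client.upload_file))`.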

In [None]:
# You can bulk upload Data Rows with metadata
task5 = dataset.create_data_rows([{
    lb.DataRow.row_data: "https://picsum.photos/200/300",
    lb.DataRow.external_id: str(uuid.uuid4()),
    "metadata_fields": [
      lb.DataRowMetadataField(
        schema_id=mdo.reserved_by_name["tag"].uid,  # specify the schema id
        value="tag_string", # typed inputs
      ),
    ], 
}])

In [None]:
# Blocking wait until complete
task1.wait_till_done()
task2.wait_till_done()
task3.wait_till_done()
task4.wait_till_done()
task5.wait_till_done()

print(task1.status, task2.status, task3.status, task4.status, task5.status)
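Rather than reading the printed statuses by eye, the tasks can be checked in a loop. This is a sketch that relies only on the `wait_till_done()` method and `status` attribute used above; the helper name `wait_for_all` is hypothetical, and treating the string `"COMPLETE"` as the success status is an assumption to verify against the SDK version in use.

```python
def wait_for_all(tasks, success_status="COMPLETE"):
    """Block on every task, then return those that did not succeed.

    `success_status` is an assumed value; adjust it if the SDK
    reports a different string for finished tasks.
    """
    for task in tasks:
        task.wait_till_done()
    return [task for task in tasks if task.status != success_status]
```

For example, `unfinished = wait_for_all([task1, task2, task3, task4, task5])` leaves an empty list when every upload succeeded.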

### Update

In [None]:
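# Update fields on an existing data row (here the external id).
# Updating is also useful when signed urls need to be re-signed.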
new_id = str(uuid.uuid4())
data_row.update(external_id=new_id)
print(data_row.external_id, new_id)

In [None]:
# We can also create attachments
# Attachments are visible for all projects connected to the data_row
data_row.create_attachment(attachment_type="TEXT",
                           attachment_value="LABELERS WILL SEE THIS ")
# See more information here:
# https://docs.labelbox.com/reference/type-image
# Note that attachment_value must always be a string (url to a video/image or a text value to display)

<AssetAttachment ID: ckporcvj61dni0y632e6cb217>

### Delete

In [None]:
data_row.delete()
# This also removes the data row from its dataset

In [None]:
# Bulk delete a list of data_rows (here, all the ones we just uploaded)
lb.DataRow.bulk_delete(list(dataset.data_rows()))