# Data rows

* Data rows are the items that are actually being labeled. We currently support the following:
    * Image
    * Text
    * Video
* A data row is a member of a dataset.
* A data row cannot exist without belonging to a dataset.
* DataRows are added to labeling tasks by first attaching them to datasets and then attaching datasets to projects.
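
The relationships listed above can be pictured with a small toy model. This is only a sketch in plain Python, not the Labelbox SDK: the `ToyDataset`, `ToyDataRow`, and `ToyProject` classes are hypothetical names made up for illustration. It shows a row that cannot be constructed without a dataset, and a project that reaches rows only through its attached datasets.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset; not the SDK class."""
    name: str
    rows: List["ToyDataRow"] = field(default_factory=list)


@dataclass
class ToyDataRow:
    """Hypothetical stand-in for a data row; `dataset` has no default."""
    row_data: str
    # Excluded from repr/compare to avoid walking the parent/child cycle.
    dataset: ToyDataset = field(repr=False, compare=False)

    def __post_init__(self) -> None:
        # Creating a row registers it with its parent dataset.
        self.dataset.rows.append(self)


@dataclass
class ToyProject:
    """Hypothetical stand-in for a project with attached datasets."""
    name: str
    datasets: List[ToyDataset] = field(default_factory=list)

    def data_rows(self) -> Iterator[ToyDataRow]:
        # Rows reach a project only through its attached datasets.
        for dataset in self.datasets:
            yield from dataset.rows


toy_dataset = ToyDataset(name="street-scenes")
ToyDataRow(row_data="https://picsum.photos/200/300", dataset=toy_dataset)
toy_project = ToyProject(name="demo", datasets=[toy_dataset])
rows_in_project = [row.row_data for row in toy_project.data_rows()]
```

Omitting the `dataset` argument when building a `ToyDataRow` raises a `TypeError`, which mirrors the rule that a data row always belongs to a dataset.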

In [None]:
!pip install labelbox

In [1]:
from labelbox import DataRow, Client
from getpass import getpass
import uuid
import os

In [2]:
# If you don't want to give Google access to Drive, you can skip this cell
# and manually set `API_KEY` below.

COLAB = "google.colab" in str(get_ipython())
if COLAB:
    !pip install colab-env -qU
    from colab_env import envvar_handler
    envvar_handler.envload()

API_KEY = os.environ.get("LABELBOX_API_KEY")
if not API_KEY:
    API_KEY = getpass("Please enter your Labelbox API key: ")
    if COLAB:
        envvar_handler.add_env("LABELBOX_API_KEY", API_KEY)

* Set the values in the following cell to match your own data before running this notebook.

In [3]:
# Pick a project that has a dataset attached, whose data rows have external ids, and that has some labels.
# This notebook modifies the project, so pick a dummy one that you don't care about.
PROJECT_ID = "ckpnfquwy0kyg0y8t9rwb99cz"
# Only update this if you have an on-prem deployment
ENDPOINT = "https://api.labelbox.com/graphql"

In [4]:
client = Client(api_key=API_KEY, endpoint=ENDPOINT)

In [5]:
project = client.get_project(PROJECT_ID)
dataset = next(project.datasets())
# This is the same as
# -> dataset = client.get_dataset(dataset_id)

### Read

In [6]:
data_rows = dataset.data_rows()
data_row = next(data_rows)

In [7]:
# Related objects and fields of a data row
print("Associated dataset", data_row.dataset())
print("Associated label(s)", next(data_row.labels()))
print("External id", data_row.external_id)

Associated dataset <Dataset {'created_at': datetime.datetime(2021, 6, 8, 2, 40, 10, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'image_mea_dataset', 'uid': 'ckpnfqv6g1rvb0ybt85hjephs', 'updated_at': datetime.datetime(2021, 6, 8, 2, 40, 10, tzinfo=datetime.timezone.utc)}>
Associated label(s) <Label {'agreement': None, 'benchmark_agreement': None, 'created_at': datetime.datetime(2021, 6, 8, 2, 42, 11, tzinfo=datetime.timezone.utc), 'is_benchmark_reference': False, 'label': '{"objects":[{"featureId":"ckpnftdgo00013h693jxji4wa","schemaId":"ckpnfqw600kyt0y8tgwsb01xg","title":"person","value":"person","color":"#ff0000","bbox":{"top":1044,"left":1460,"height":265,"width":118},"instanceURI":"https://api.labelbox.com/masks/feature/ckpnftdgo00013h693jxji4wa"},{"featureId":"ckpo2bsq800013h69mi1w6xz1","schemaId":"ckpnfqw610kyx0y8t4hotc6ld","title":"car","value":"car","color":"#00ffff","instanceURI":"https://api.labelbox.com/masks/feature/ckpo2bsq800013h69mi1w6xz1"}],"classifications"
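
The cell above calls `next(...)` directly on the labels, which raises `StopIteration` when a data row has no labels yet. A minimal sketch of a safer lookup follows. `first_or_none` is a hypothetical helper, not part of the SDK, and the lists below are stand-ins for `data_row.labels()` so the example runs without an API key.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first element of an iterable, or None if it is empty."""
    return next(iter(items), None)


# Stand-ins for `data_row.labels()`: one labeled row, one unlabeled row.
labels_for_labeled_row = ["label-a", "label-b"]
labels_for_unlabeled_row = []

first_label = first_or_none(labels_for_labeled_row)
missing_label = first_or_none(labels_for_unlabeled_row)
```

In the notebook this would be used as `first_or_none(data_row.labels())`, assuming the returned collection is iterable, as its use with `next` suggests.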

In [8]:
# External ids can be a reference to your internal datasets
data_row = dataset.data_row_for_external_id(data_row.external_id)
print(data_row)

<DataRow {'created_at': datetime.datetime(2021, 6, 8, 2, 40, 10, tzinfo=datetime.timezone.utc), 'external_id': '3b983504-bfbd-4c26-8719-8ef2d5a2c14f', 'media_attributes': {'width': 2560, 'height': 1707, 'mimeType': 'image/jpeg'}, 'row_data': 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Kitano_Street_Kobe01s5s4110.jpg/2560px-Kitano_Street_Kobe01s5s4110.jpg', 'uid': 'ckpnfqvcb0t2o0yane73d3whi', 'updated_at': datetime.datetime(2021, 6, 9, 0, 51, tzinfo=datetime.timezone.utc)}>
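
One way to use external ids as references is to key your own records by the same id. The sketch below assumes a hypothetical in-memory catalog; `InternalRecord` and its fields are invented for illustration, and only the external id string is taken from the output above.

```python
from typing import Dict, NamedTuple, Optional


class InternalRecord(NamedTuple):
    """Hypothetical record kept in your own system."""
    local_path: str
    source: str


# Catalog keyed by the same external id that was stored on the data row.
catalog: Dict[str, InternalRecord] = {
    "3b983504-bfbd-4c26-8719-8ef2d5a2c14f": InternalRecord(
        local_path="/data/images/kitano_street.jpg",  # made-up path
        source="wikimedia",
    ),
}


def find_record(external_id: str) -> Optional[InternalRecord]:
    """Look up the internal record for a data row's external id."""
    return catalog.get(external_id)


record = find_record("3b983504-bfbd-4c26-8719-8ef2d5a2c14f")
unknown = find_record("not-a-known-id")
```

Given a data row from the notebook, the lookup would be `find_record(data_row.external_id)`.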


### Create
* Create a single data row at a time

In [9]:
dataset = client.create_dataset(name="testing-dataset")
dataset.create_data_row(row_data="https://picsum.photos/200/300")

# Using external ids is optional but recommended.
# They let you maintain a reference from your own data to a data_row.
dataset.create_data_row(row_data="https://picsum.photos/200/300",
                        external_id=str(uuid.uuid4()))

<DataRow ID: ckporcoee1c7s0z7fha6l5x0d>

* Bulk create data rows (This is much faster than creating individual data rows)

In [10]:
task1 = dataset.create_data_rows([{
    DataRow.row_data: "https://picsum.photos/200/300"
}, {
    DataRow.row_data: "https://picsum.photos/200/300"
}])

In [11]:
# Local paths
local_data_path = '/tmp/test_data_row.txt'
with open(local_data_path, 'w') as file:
    file.write("sample data")

task2 = dataset.create_data_rows([local_data_path])

In [12]:
# You can mix local files with urls
task3 = dataset.create_data_rows([{
    DataRow.row_data: "https://picsum.photos/200/300"
}, local_data_path])

In [13]:
# Note that you cannot set external_ids at this time when uploading from local files.
# To do this, you first have to upload the file and then create the data row from the returned url.
item_url = client.upload_file(local_data_path)
task4 = dataset.create_data_rows([{
    DataRow.row_data: item_url,
    DataRow.external_id: str(uuid.uuid4())
}])
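
When many rows need external ids, the list passed to `create_data_rows` can be built programmatically. The following is a sketch only: `build_payload` is a hypothetical helper, and the default string keys are placeholders so the example runs without the SDK installed. In the notebook you would pass `DataRow.row_data` and `DataRow.external_id` as the keys, as the cells above do.

```python
import uuid
from typing import Any, Dict, Hashable, Iterable, List


def build_payload(
    urls: Iterable[str],
    row_key: Hashable = "row_data",      # placeholder; use DataRow.row_data
    id_key: Hashable = "external_id",    # placeholder; use DataRow.external_id
) -> List[Dict[Hashable, Any]]:
    """Build one dict per url, each with a freshly generated external id."""
    return [{row_key: url, id_key: str(uuid.uuid4())} for url in urls]


payload = build_payload([
    "https://picsum.photos/200/300",
    "https://picsum.photos/200/300",
])
external_ids = [item["external_id"] for item in payload]
```

With the SDK available, the call would look like `dataset.create_data_rows(build_payload(urls, DataRow.row_data, DataRow.external_id))`. Keeping `external_ids` around lets you store them next to your own records.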

In [14]:
# Blocking wait until complete
task1.wait_till_done()
task2.wait_till_done()
task3.wait_till_done()
task4.wait_till_done()

print(task1.status, task2.status, task3.status, task4.status)

COMPLETE COMPLETE COMPLETE COMPLETE
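
Waiting on several tasks and checking their status can be wrapped in a small helper. This is a sketch that relies only on the `wait_till_done()` method and `status` attribute used above. `wait_for_all`, `IncompleteTaskError`, and `FakeTask` are hypothetical names, and the `"IN_PROGRESS"` and `"FAILED"` values are made up for the fake; only `"COMPLETE"` appears in the output above.

```python
from typing import Any, Iterable, List


class IncompleteTaskError(RuntimeError):
    """Raised when a task finishes in an unexpected status."""


def wait_for_all(tasks: Iterable[Any], expected: str = "COMPLETE") -> List[str]:
    """Block on every task in turn, then verify each final status."""
    statuses = []
    for task in tasks:
        task.wait_till_done()
        statuses.append(task.status)
    unexpected = [status for status in statuses if status != expected]
    if unexpected:
        raise IncompleteTaskError(f"Unexpected task statuses: {unexpected}")
    return statuses


class FakeTask:
    """Offline stand-in for an upload task, used only for this demo."""

    def __init__(self, final_status: str) -> None:
        self.status = "IN_PROGRESS"  # made-up initial value
        self._final_status = final_status

    def wait_till_done(self) -> None:
        self.status = self._final_status


statuses = wait_for_all([FakeTask("COMPLETE"), FakeTask("COMPLETE")])
```

In the notebook the equivalent call would be `wait_for_all([task1, task2, task3, task4])`.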


### Update

In [15]:
# update() is useful for things like re-signing urls; here we change the external id
new_id = str(uuid.uuid4())
data_row.update(external_id=new_id)
print(data_row.external_id, new_id)

337e90de-c13c-48be-a87d-94d331b5e9a7 337e90de-c13c-48be-a87d-94d331b5e9a7


In [16]:
# We can also create attachments
# Attachments are visible in all projects connected to the data_row
data_row.create_attachment(attachment_type="TEXT", attachment_value="LABELERS WILL SEE THIS ")
# See more information here:
# https://docs.labelbox.com/reference/type-image
# Note that attachment_value must always be a string (url to a video/image or a text value to display)

<AssetAttachment ID: ckporcvj61dni0y632e6cb217>

### Delete

In [17]:
data_row.delete()
# This removes the data row from its dataset too

In [18]:
# Bulk delete a list of data_rows (in this case, all of the ones we just uploaded)
DataRow.bulk_delete(list(dataset.data_rows()))
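
For large datasets it may be preferable to delete in smaller batches rather than in one call. The sketch below is one way to do that under stated assumptions: `batched` and `delete_in_batches` are hypothetical helpers, the default batch size of 500 is an arbitrary choice rather than a documented limit, and a plain list stands in for the delete call so the example runs offline.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def delete_in_batches(
    rows: Iterable[T],
    delete_fn: Callable[[List[T]], object],
    size: int = 500,  # assumed batch size, not an SDK requirement
) -> int:
    """Call `delete_fn` once per batch and return how many rows were sent."""
    deleted = 0
    for batch in batched(rows, size):
        delete_fn(batch)
        deleted += len(batch)
    return deleted


# Offline demo: record the batches instead of deleting anything.
recorded_batches: List[List[int]] = []
total = delete_in_batches(range(5), recorded_batches.append, size=2)
```

In the notebook this would be `delete_in_batches(list(dataset.data_rows()), DataRow.bulk_delete)`. Materializing the list first, as the cell above does, avoids deleting rows while still paging through them.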