<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width="256"/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/data_rows.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/data_rows.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Data rows

* Data rows are the assets that are being labeled. We currently support the following asset types:
    * Image
    * Text
    * Video
    * Geospatial / Tiled Imagery
    * Audio
    * Documents
    * HTML
    * DICOM
    * Conversational
* A data row cannot exist without belonging to a dataset.
* Data rows are added to labeling tasks by first attaching them to datasets and then creating batches in projects.

In [None]:
!pip install labelbox -q

In [None]:
import labelbox as lb
import uuid
import json

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [None]:
# Add your API key
API_KEY = ""
client = lb.Client(api_key=API_KEY)

### Get data rows from projects

In [None]:
# Pick a project with batches that contain data rows with global keys
PROJECT_ID = "<PROJECT-ID>"
project = client.get_project(PROJECT_ID)
batches = list(project.batches())
print(batches)
# Datasets are fetched the same way:
# -> dataset = client.get_dataset(dataset_id)

### Fetch data rows from project's batches

To fetch the data rows in a project's batches, export the project and pass the batch IDs in the `batch_ids` export parameter. The project must have an ontology attached before you can export from it.

In [None]:
client.enable_experimental = True

batch_ids = [batch.uid for batch in batches]

export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "performance_details": True,
    # Restricts the export to these batches; omit this key to export every data row in the project
    "batch_ids": batch_ids,
}
filters = {}

# project.export() returns a task that reports the export's status, including any errors encountered
export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()
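Because `batch_ids` is optional, it can help to build the export parameters in one place and add the key only when specific batches are requested. The sketch below uses a hypothetical helper, `build_export_params`, that is not part of the Labelbox SDK; it only assembles the dictionary that the cell above passes to `project.export`.

```python
from typing import Any, Dict, List, Optional


def build_export_params(batch_ids: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: return export params, adding "batch_ids" only when given."""
    params: Dict[str, Any] = {
        "attachments": True,
        "metadata_fields": True,
        "data_row_details": True,
        "project_details": True,
        "performance_details": True,
    }
    if batch_ids:
        # Restrict the export to the listed batches
        params["batch_ids"] = list(batch_ids)
    return params


print(sorted(build_export_params(["batch-a"]).keys()))

# Usage against a live project:
# export_task = project.export(params=build_export_params(batch_ids), filters={})
```

Omitting the key instead of passing an empty list keeps the "export everything" case identical to leaving the parameter out entirely.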

In [None]:
data_rows = []

def json_stream_handler(output: lb.JsonConverterOutput):
    # Each output holds one exported data row as a JSON string
    data_row = json.loads(output.json_str)
    data_rows.append(data_row)


if export_task.has_errors():
    export_task.get_stream(
        converter=lb.JsonConverter(),
        stream_type=lb.StreamType.ERRORS,
    ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
    # start() feeds each exported row to the handler, which appends it to data_rows
    export_task.get_stream(
        converter=lb.JsonConverter(),
        stream_type=lb.StreamType.RESULT,
    ).start(stream_handler=json_stream_handler)
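The stream handler only reads `output.json_str`, so its parsing logic can be checked offline without running an export. In this sketch the `SimpleNamespace` objects are hypothetical stand-ins for `lb.JsonConverterOutput`, and the JSON strings are made-up rows that carry only the `data_row` fields used later in this notebook.

```python
import json
from types import SimpleNamespace

collected = []


def collect_rows(output) -> None:
    # Same logic as json_stream_handler: each output carries one JSON-encoded data row
    collected.append(json.loads(output.json_str))


# Hypothetical stand-ins for JsonConverterOutput objects; only .json_str is used
fake_outputs = [
    SimpleNamespace(json_str='{"data_row": {"id": "dr-1", "global_key": "gk-1"}}'),
    SimpleNamespace(json_str='{"data_row": {"id": "dr-2", "global_key": "gk-2"}}'),
]
for out in fake_outputs:
    collect_rows(out)

print([row["data_row"]["global_key"] for row in collected])  # ['gk-1', 'gk-2']
```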

In [None]:
# Get single data row
data_row = data_rows[0]
print(data_row)

### Get labels from the data row

In [None]:
print("Associated label(s)", data_row["projects"][project.uid]["labels"])
print("Global key", data_row["data_row"]["global_key"])
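To collect labels for every exported row instead of just the first, the same lookups can run in a loop. The sketch below uses a hypothetical helper, `labels_by_global_key`, and assumes each row has the `data_row` and `projects[<project id>]["labels"]` structure accessed in the cell above; rows that are missing the project entry map to an empty list. The sample rows are illustrative only.

```python
from typing import Any, Dict, List


def labels_by_global_key(rows: List[Dict[str, Any]], project_id: str) -> Dict[str, List[Any]]:
    """Hypothetical helper: map each exported row's global key to its labels in one project."""
    result: Dict[str, List[Any]] = {}
    for row in rows:
        key = row["data_row"].get("global_key")
        # Rows without an entry for this project (or without labels) get an empty list
        project_entry = row.get("projects", {}).get(project_id, {})
        result[key] = project_entry.get("labels", [])
    return result


# Illustrative rows shaped like the export output used above
sample_rows = [
    {"data_row": {"global_key": "gk-1"}, "projects": {"proj-1": {"labels": [{"id": "lbl-1"}]}}},
    {"data_row": {"global_key": "gk-2"}, "projects": {}},
]
print(labels_by_global_key(sample_rows, "proj-1"))
# {'gk-1': [{'id': 'lbl-1'}], 'gk-2': []}

# In this notebook: labels_by_global_key(data_rows, project.uid)
```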

### Get data row IDs by using global keys

In [None]:
global_key = "<ENTER GLOBAL KEY>"
# Returns a dictionary, not a task; the matching IDs are under "results"
response = client.get_data_row_ids_for_global_keys([global_key])
print(f"Data row ID(s): {response['results']}")
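When looking up several global keys at once, it is convenient to pair each key with its ID. The sketch below uses a hypothetical helper, `ids_by_global_key`, and rests on an assumption this notebook does not confirm: that the `results` list returned by `get_data_row_ids_for_global_keys` is in the same order as the input keys, with an empty string for keys that could not be resolved. Check that against the SDK version you use before relying on it.

```python
from typing import Dict, List


def ids_by_global_key(global_keys: List[str], response: Dict) -> Dict[str, str]:
    """Hypothetical helper: pair each requested global key with its data row ID.

    ASSUMPTION: response["results"] is aligned with global_keys and unresolved
    keys come back as empty strings. Verify this for your SDK version.
    """
    ids = response.get("results", [])
    # Skip keys whose ID is empty (unresolved under the assumption above)
    return {key: data_row_id for key, data_row_id in zip(global_keys, ids) if data_row_id}


# Illustrative response containing only the "results" field printed above
sample_response = {"results": ["dr-123", ""]}
print(ids_by_global_key(["gk-1", "gk-missing"], sample_response))  # {'gk-1': 'dr-123'}
```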

### Create
* Create a single data row with and without metadata

In [None]:
dataset = client.create_dataset(name="data_rows_demo_dataset")

# It is recommended that you add global keys to your data rows.
dataset.create_data_row(
    row_data="https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0002.jpeg",
    global_key=str(uuid.uuid4()),
)

# You can also upload metadata along with your data row
mdo = client.get_data_row_metadata_ontology()
dataset.create_data_row(
    row_data="https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0003.jpeg",
    global_key=str(uuid.uuid4()),
    metadata_fields=[
        lb.DataRowMetadataField(
            schema_id=mdo.reserved_by_name["tag"].uid,  # schema ID of the reserved "tag" field
            value="tag_string",  # the value must match the field's type (a string for "tag")
        ),
    ],
)

### [Recommended] Bulk create data rows (much faster than creating data rows one at a time)

In [None]:
# Create a dataset
dataset = client.create_dataset(name="data_rows_demo_dataset_2")

uploads = []
# Generate data rows
for i in range(1, 9):
    uploads.append({
        "row_data": f"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_000{i}.jpeg",
        "global_key": f"TEST-ID-{uuid.uuid4()}",
        # Add metadata (optional); mdo is the metadata ontology fetched in the previous cell
        "metadata_fields": [
            lb.DataRowMetadataField(
                schema_id=mdo.reserved_by_name["tag"].uid,  # schema ID of the reserved "tag" field
                value="tag_string",  # the value must match the field's type
            ),
        ],
    })

task1 = dataset.create_data_rows(uploads)
task1.wait_till_done()
print("ERRORS:", task1.errors)
print("RESULTS:", task1.result)
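For larger uploads, the same payload can be generated from a list of URLs and split into chunks, with each chunk sent by a separate `create_data_rows` call. The helpers `build_uploads` and `chunks` are hypothetical, and the chunk size of 3 is only for illustration: this notebook does not state a per-call limit for `create_data_rows`, so pick a size based on the Labelbox documentation for your SDK version.

```python
import uuid
from typing import Dict, Iterator, List


def build_uploads(urls: List[str], prefix: str = "TEST-ID") -> List[Dict[str, str]]:
    # One upload dict per URL, each with its own unique global key
    return [{"row_data": url, "global_key": f"{prefix}-{uuid.uuid4()}"} for url in urls]


def chunks(items: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    # Yield consecutive slices of at most `size` items
    for start in range(0, len(items), size):
        yield items[start:start + size]


base = "https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES"
demo_uploads = build_uploads([f"{base}/img_000{i}.jpeg" for i in range(1, 9)])
upload_chunks = list(chunks(demo_uploads, 3))
print([len(c) for c in upload_chunks])  # [3, 3, 2]

# Usage against a live dataset:
# for chunk in upload_chunks:
#     task = dataset.create_data_rows(chunk)
#     task.wait_till_done()
#     print("ERRORS:", task.errors)
```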

### Create data rows with attachments

In [None]:
task2 = dataset.create_data_rows([{
    "row_data": "https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0009.jpeg",
    "global_key": str(uuid.uuid4()),
    "attachments": [
        {
            "type": "IMAGE_OVERLAY",
            "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/disease_attachment.jpeg",
        },
        {
            "type": "RAW_TEXT",
            "value": "IOWA, Zone 2232, June 2022 [Text string]",
        },
        {
            "type": "TEXT_URL",
            "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt",
        },
        {
            "type": "IMAGE",
            "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/disease_attachment.jpeg",
        },
        {
            "type": "VIDEO",
            "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4",
        },
        {
            "type": "HTML",
            "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html",
        },
        {
            "type": "PDF_URL",
            "value": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
        },
    ],
}])
# Wait for the upload to finish before reading its errors and results
task2.wait_till_done()
print("ERRORS:", task2.errors)
print("RESULTS:", task2.result)
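A typo in an attachment type is easy to miss until the upload task reports an error. The sketch below is a hypothetical pre-upload check, `invalid_attachments`, that flags entries whose type is not one of the types used in the cell above or whose value is empty. The known-type set is copied from that cell and may not be the complete list Labelbox supports; the `"PDF"` entry is a deliberately wrong type for illustration.

```python
from typing import Dict, List

# The attachment types used in the cell above; Labelbox may support others
KNOWN_ATTACHMENT_TYPES = {"IMAGE_OVERLAY", "RAW_TEXT", "TEXT_URL", "IMAGE", "VIDEO", "HTML", "PDF_URL"}


def invalid_attachments(attachments: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Flag entries whose type is not in the known set or whose value is empty
    return [
        a for a in attachments
        if a.get("type") not in KNOWN_ATTACHMENT_TYPES or not a.get("value")
    ]


candidates = [
    {"type": "RAW_TEXT", "value": "IOWA, Zone 2232, June 2022 [Text string]"},
    {"type": "PDF", "value": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf"},
    {"type": "IMAGE", "value": ""},
]
print([a["type"] for a in invalid_attachments(candidates)])  # ['PDF', 'IMAGE']
```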

### Create data rows using data in your local path

In [None]:
# Local paths
local_data_path = "/tmp/test_data_row.txt"
with open(local_data_path, "w") as file:
    file.write("sample data")

task3 = dataset.create_data_rows([local_data_path])
task3.wait_till_done()
print("ERRORS:", task3.errors)
print("RESULTS:", task3.result)

In [None]:
# You can mix local files with URLs when creating data rows
task4 = dataset.create_data_rows([
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0003.jpeg",
        "global_key": str(uuid.uuid4()),
    },
    {
        "row_data": local_data_path,
        "global_key": str(uuid.uuid4()),
    },
])
task4.wait_till_done()
print("ERRORS:", task4.errors)
print("RESULTS:", task4.result)
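When URLs and local paths are mixed, a mistyped local path only surfaces once the upload runs. This hypothetical check, `missing_local_files`, treats anything without an `http` or `https` scheme as a local path and reports the paths that do not point to an existing file. The scheme rule is a simplifying assumption for this sketch, not a statement of how Labelbox itself classifies `row_data`.

```python
import os
import tempfile
import uuid
from typing import List
from urllib.parse import urlparse


def is_url(row_data: str) -> bool:
    # Only http(s) URLs count as remote here; everything else is assumed to be a local path
    return urlparse(row_data).scheme in ("http", "https")


def missing_local_files(row_data_values: List[str]) -> List[str]:
    # Return local paths that do not point to an existing file
    return [v for v in row_data_values if not is_url(v) and not os.path.isfile(v)]


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("sample data")
existing_path = tmp.name
missing_path = os.path.join(tempfile.gettempdir(), f"missing-{uuid.uuid4()}.txt")

values = [
    "https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0003.jpeg",
    existing_path,
    missing_path,
]
missing = missing_local_files(values)
print(missing)
os.remove(existing_path)  # clean up the demo file
```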

### Update
Only two fields can be updated after a data row is created:
1. Global keys
2. Row data


In [None]:
data_row = client.get_data_row("<data_row_id_to_update>")
new_id = str(uuid.uuid4())
data_row.update(global_key=new_id, row_data="https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0005.jpeg")
print(data_row)

### Create a single attachment on an existing data row

In [None]:
# You can only create one attachment at a time.
data_row.create_attachment(attachment_type="RAW_TEXT",
                           attachment_value="LABELERS WILL SEE THIS ")
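Since each `create_attachment` call adds a single attachment, several attachments can be added with a loop. In the sketch below, `add_attachments` is a hypothetical helper that uses the same `attachment_type` and `attachment_value` keyword arguments as the cell above. `RecordingDataRow` is a made-up stand-in that records calls so the loop can run offline; with a real data row you would pass the object returned by `client.get_data_row`.

```python
from typing import Any, Dict, List


def add_attachments(data_row: Any, attachments: List[Dict[str, str]]) -> int:
    """Hypothetical helper: one create_attachment call per attachment; returns the count."""
    count = 0
    for attachment in attachments:
        data_row.create_attachment(
            attachment_type=attachment["type"],
            attachment_value=attachment["value"],
        )
        count += 1
    return count


class RecordingDataRow:
    """Hypothetical stand-in for a DataRow, used only to demonstrate the loop offline."""

    def __init__(self) -> None:
        self.calls = []

    def create_attachment(self, attachment_type: str, attachment_value: str) -> None:
        self.calls.append((attachment_type, attachment_value))


demo_row = RecordingDataRow()
created = add_attachments(demo_row, [
    {"type": "RAW_TEXT", "value": "LABELERS WILL SEE THIS"},
    {"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"},
])
print(created)  # 2

# With a live data row: add_attachments(data_row, [...])
```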

### Delete

* Delete a single data row

In [None]:
data_row = client.get_data_row("<data_row_id_to_delete>")
data_row.delete()

* Bulk delete data row objects

In [None]:
# Bulk delete a list of data rows (limit: 4K data rows per call)
lb.DataRow.bulk_delete(list(dataset.data_rows()))
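Because `bulk_delete` is limited per call, a dataset with more data rows than the limit has to be deleted in slices. The sketch below, `delete_in_chunks`, is a hypothetical helper that passes consecutive slices to a delete function. It assumes "4K" means 4,000 data rows, and it takes the delete function as a parameter so the chunking can be demonstrated offline with a recording lambda.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def delete_in_chunks(items: Sequence[T], delete_fn: Callable[[List[T]], None], limit: int = 4000) -> int:
    """Hypothetical helper: call delete_fn on slices of at most `limit` items; return the call count."""
    calls = 0
    for start in range(0, len(items), limit):
        delete_fn(list(items[start:start + limit]))
        calls += 1
    return calls


# Offline demo: record chunk sizes instead of deleting anything
sizes = []
n_calls = delete_in_chunks(list(range(9001)), lambda chunk: sizes.append(len(chunk)))
print(n_calls, sizes)  # 3 [4000, 4000, 1001]

# Usage against a live dataset:
# delete_in_chunks(list(dataset.data_rows()), lb.DataRow.bulk_delete)
```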