<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=190/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/batches.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/blob/master/examples/basics/batches.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

## Batches
https://docs.labelbox.com/docs/batches

* A batch is a collection of data rows selected from your datasets.
* A data row cannot be part of more than one batch in a project.
* Batches work for all data types, but each batch should contain only one data type.
* Batches cannot be shared between projects.
* Batches may contain data rows from multiple datasets.
* Data rows can only be attached to a project as part of a single batch.
* Currently, only the benchmark quality setting is supported in batch projects.
* You can set a priority for each batch.
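The one-batch-per-data-row rule can be checked on the client side before anything is sent to Labelbox. The sketch below uses a hypothetical helper, `find_rows_in_multiple_batches` (not part of the Labelbox SDK). It takes a mapping of batch names to the data row IDs you plan to put in each batch of a single project, and reports every ID that would land in more than one batch.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_rows_in_multiple_batches(planned: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each data row ID planned into 2+ batches to the sorted batch names.

    Hypothetical helper: `planned` maps batch name -> data row IDs for ONE project.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for batch_name, row_ids in planned.items():
        for row_id in row_ids:
            owners[row_id].add(batch_name)
    # Keep only IDs claimed by more than one batch; sort names for stable output
    return {rid: sorted(names) for rid, names in owners.items() if len(names) > 1}


planned = {
    "batch-a": ["row-1", "row-2"],
    "batch-b": ["row-2", "row-3"],
}
print(find_rows_in_multiple_batches(planned))  # {'row-2': ['batch-a', 'batch-b']}
```

An empty result means the plan respects the rule for that project.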

In [None]:
!pip install "labelbox[data]"

In [None]:
import labelbox as lb
import random
import uuid

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [None]:
# Add your API key
API_KEY = None
client = lb.Client(api_key=API_KEY)
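Hard-coding a key in a notebook makes it easy to leak. When `api_key` is `None`, the Labelbox Python client reads the key from the `LABELBOX_API_KEY` environment variable instead. The minimal sketch below, built around a hypothetical `resolve_api_key` helper, makes that fallback explicit so a missing key fails with a clear message before any other cell runs.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None, env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the explicit key if given, otherwise read it from the environment.

    Hypothetical helper; raises instead of silently passing None along.
    """
    if explicit:
        return explicit
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key provided and {env_var} is not set")
    return key


# Usage in this notebook (requires a real key):
# client = lb.Client(api_key=resolve_api_key(API_KEY))
```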

In [None]:
# Create a dataset
dataset = client.create_dataset(name="Demo-Batches-Colab")

# Generate data row payloads, each with a unique global key
uploads = []
for i in range(1, 9):
    uploads.append({
        "row_data": f"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_000{i}.jpeg",
        "global_key": f"TEST-ID-{uuid.uuid1()}",
    })

# create_data_rows runs asynchronously and returns a task to wait on
task = dataset.create_data_rows(uploads)
task.wait_till_done()
print("ERRORS: ", task.errors)
print("RESULT URL: ", task.result_url)



ERRORS:  []
RESULT URL:  https://storage.labelbox.com/cl3ahv73w1891087qbwzs3edd%2Fdata-row-imports-results%2Fcl94vbi4g4ijw07y07shadc7k_cl94vbjcv1dh707y2f2g4cwh4.json?Expires=1665619363366&KeyName=labelbox-assets-key-3&Signature=VJOqZZUjnnT4s45on3zzYdcagOs
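The `img_000{i}` pattern above only pads correctly for single-digit indices. A sketch of the same payload construction using `f"{i:04d}"` keeps four digits for any index and checks that the generated global keys are unique before upload. `build_uploads` is a hypothetical helper, it uses `uuid4` (random) rather than `uuid1`, and whether images beyond `img_0008` exist in this bucket is not established here.

```python
import uuid
from typing import Dict, List

BASE_URL = (
    "https://storage.googleapis.com/labelbox-datasets/"
    "People_Clothing_Segmentation/jpeg_images/IMAGES"
)


def build_uploads(start: int, stop: int, key_prefix: str = "TEST-ID") -> List[Dict[str, str]]:
    """Build payloads for img_{start:04d}.jpeg up to (not including) img_{stop:04d}.jpeg."""
    uploads = []
    for i in range(start, stop):
        uploads.append({
            "row_data": f"{BASE_URL}/img_{i:04d}.jpeg",
            # uuid4 gives a random, practically collision-free suffix
            "global_key": f"{key_prefix}-{uuid.uuid4()}",
        })
    # Global keys are meant to be unique, so fail fast on a duplicate
    keys = [u["global_key"] for u in uploads]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate global keys generated")
    return uploads


uploads = build_uploads(1, 9)
print(uploads[0]["row_data"])
```

The resulting list can be passed to `dataset.create_data_rows(uploads)` exactly like the one built in the cell above.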


# Set Up a Batch Project

In [None]:
# If queue_mode is not provided, the project defaults to batch mode with benchmark quality settings.
# The queue_mode argument will be deprecated once dataset mode is deprecated.

# Create a batch project with benchmark quality control (consensus is currently not supported with batches)
project = client.create_project(
    name="Demo-Batches-Project",
    queue_mode=lb.QueueMode.Batch,
    auto_audit_percentage=1,
    auto_audit_number_of_labels=1,
    media_type=lb.MediaType.Image,
)
print("Project Name:", project.name, " Project Id:", project.uid)

Project Name: Demo-Batches-Project  Project Id: cl94vbpr849gg08ytd6rd423x


### Collect the IDs of all data rows in the dataset created earlier


In [None]:
data_row_ids = [dr.uid for dr in dataset.export_data_rows()]
print("Number of data row ids:", len(data_row_ids))

Number of data row ids: 8


## Select a random sample
Random sampling is useful when you have a large dataset and only want to work with a handful of data rows.

In [None]:
sample = random.sample(data_row_ids, 4)
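`random.sample` draws a different subset on every run. If the batch contents should be reproducible, a seeded generator can be used instead. The sketch below wraps this in a hypothetical `sample_ids` helper that uses a private `random.Random(seed)` instance, so the notebook's global random state is left untouched.

```python
import random
from typing import List, Optional, Sequence


def sample_ids(ids: Sequence[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return k distinct IDs; passing the same seed reproduces the same sample."""
    if not 0 <= k <= len(ids):
        raise ValueError(f"Cannot sample {k} rows from {len(ids)}")
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return rng.sample(list(ids), k)


ids = [f"row-{n}" for n in range(8)]
print(sample_ids(ids, 4, seed=42) == sample_ids(ids, 4, seed=42))  # True
```

In this notebook the call would be `sample = sample_ids(data_row_ids, 4, seed=42)`. The sample is only reproducible if `data_row_ids` comes back in the same order each time; sorting it first (`sorted(data_row_ids)`) removes that dependency.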

# Batch Manipulation

### Create a Batch


In [None]:
batch = project.create_batch(
    "Demo-First-Batch",  # Each batch in a project must have a unique name
    sample,  # A list of data rows or data row IDs
    priority=5,  # Priority from 1 (highest) to 5 (lowest)
)
# Number of data rows in the batch
print("Number of data rows in batch: ", batch.size)

Number of data rows in batch:  4
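For larger datasets, the data row IDs can be split into several non-overlapping chunks, one batch each. Giving every chunk its own numbered name satisfies the unique-name rule, and because no ID appears in two chunks, the one-batch-per-data-row rule holds as well. `plan_batches` is a hypothetical helper; the chunk size is an arbitrary illustration (not a documented limit), and the priority check mirrors the 1 (highest) to 5 (lowest) range given in the comment above.

```python
from typing import List, Sequence, Tuple


def plan_batches(
    row_ids: Sequence[str], chunk_size: int, prefix: str, priority: int = 5
) -> List[Tuple[str, List[str], int]]:
    """Return (batch_name, row_ids_chunk, priority) tuples for non-overlapping chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 (highest) and 5 (lowest)")
    unique_ids = list(dict.fromkeys(row_ids))  # drop duplicate IDs, keep order
    return [
        (f"{prefix}-{n}", unique_ids[start:start + chunk_size], priority)
        for n, start in enumerate(range(0, len(unique_ids), chunk_size), start=1)
    ]


# With the real project, rows not already in "Demo-First-Batch" could be batched like this:
# remaining = [i for i in data_row_ids if i not in sample]
# for name, chunk, prio in plan_batches(remaining, chunk_size=2, prefix="Demo-Batch"):
#     project.create_batch(name, chunk, prio)
print(plan_batches(["a", "b", "c", "d", "e"], chunk_size=2, prefix="Demo"))
# [('Demo-1', ['a', 'b'], 5), ('Demo-2', ['c', 'd'], 5), ('Demo-3', ['e'], 5)]
```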


### Manage Batches
Note: You can view the data rows in a batch through the *Data Rows* tab.

In [None]:
# Export the data rows in the batch
data_rows = list(batch.export_data_rows())
print("Data Rows in Batch: ", data_rows)

# List the batches in your project (a separate loop variable keeps `batch` pointing at the batch created above)
for b in project.batches():
    print("Batch Name: ", b.name, "  Batch ID:", b.uid)


Data Rows in Batch:  [<DataRow ID: cl94vbjjn0wb8075i74pcb54v>, <DataRow ID: cl94vbjjn0wb0075i9i542qtp>, <DataRow ID: cl94vbjjn0waw075i11rser6b>, <DataRow ID: cl94vbjjn0was075igz3789ff>]
Batch Name:  Demo-First-Batch   Batch ID: 39f3fb00-49c1-11ed-ad8c-4b0085ccfe8b
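Because each batch in a project must have a unique name, a name for the next batch can be derived from the names already in use, such as the `name` values printed by the loop above. The sketch below uses a hypothetical `unique_batch_name` helper that returns the base name if it is free and otherwise appends the smallest unused numeric suffix.

```python
from typing import Iterable


def unique_batch_name(base: str, existing: Iterable[str]) -> str:
    """Return `base` if unused, else `base-2`, `base-3`, ... (first free suffix)."""
    taken = set(existing)
    if base not in taken:
        return base
    n = 2
    while f"{base}-{n}" in taken:
        n += 1
    return f"{base}-{n}"


# With the real project: existing = [b.name for b in project.batches()]
print(unique_batch_name("Demo-First-Batch", ["Demo-First-Batch"]))  # Demo-First-Batch-2
```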


# Archive Batch

In [None]:
# Archiving the batch removes all of its queued data rows from the project
batch.remove_queued_data_rows()

## Clean up
Uncomment and run the cell below to delete the batch, dataset, and/or project created in this demo.

In [None]:
# Delete batch
# batch.delete()

# Delete project
# project.delete()

# Delete dataset
# dataset.delete()