## Overview of Labelbox

Labelbox is designed to streamline the process of creating and managing training data for intelligent applications. It provides tools and workflows for annotating, managing, and iterating over data, enabling data scientists and machine learning engineers to build high-quality training datasets efficiently.

Here's an overview of the main components and features of the Labelbox application:

1. **Data Import**: Labelbox allows users to import various types of data, including images, videos, text, and conversational data, from different sources such as local files, cloud storage (e.g., Google Cloud Storage, Amazon S3), and public URLs.

2. **Data Annotation**: Users can create and customize labeling projects for different types of data. Labeling tools include bounding boxes, polygons, keypoints, and segmentation masks, among others. Users can define custom label schemas and workflows tailored to their specific use cases.

3. **Automation and Model Integration**: Labelbox provides functionality to automate parts of the annotation process, which can significantly speed up labeling workflows. It provides access to a variety of pre-built labeling models, such as those for object detection and text classification (partial functionality is available in the UI). Moreover, Labelbox lets users assess the performance of their custom models, enabling them to fine-tune and optimize their machine learning pipelines.

Labelbox also provides an SDK for customizing workflows and automating tasks.




---



**Instructions:** Pick one of the Labelbox components and write a short demo demonstrating a specific functionality using the SDK. You have freedom to choose any functionality within each of the components.

**Deliverables**
   - Use a copy of this Colab notebook to document the functionality. The demo should be an end-to-end runnable example.
   - Please upload this notebook to a GitHub repository and share the notebook's GitHub link with the team.



---



## Labelbox package setup

In [None]:
!pip install -q "labelbox[data]"

In [None]:
import labelbox as lb

In [None]:
# Add your API key
API_KEY = ""
# To get your API key go to your Labelbox account --> Workspace settings -> API -> Create API Key
client = lb.Client(api_key=API_KEY)
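
Hardcoding the key in a notebook that will be pushed to GitHub risks leaking it. As a minimal sketch, the key could instead be read from an environment variable. The variable name `LABELBOX_API_KEY` and the helper `load_api_key` are assumptions made for this example; they are not part of the Labelbox SDK.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is a convention chosen for this sketch only.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key."
        )
    return api_key
```

With this helper in place, the client cell would become `client = lb.Client(api_key=load_api_key())`.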

# Batch SDK Tutorial

After you have imported your dataset into Labelbox Catalog and created a project in Labelbox Annotate, you need to send data rows from Catalog to Annotate. The best way to do that is to create a batch of data rows and send that batch to a labeling project.

In [None]:
import base64
from IPython.display import Image, display

def mm(graph):
    """Render a Mermaid diagram inline via the mermaid.ink image service."""
    graphbytes = graph.encode("utf8")
    # URL-safe Base64 avoids "+" and "/" characters that would corrupt the URL path
    base64_bytes = base64.urlsafe_b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))

mm("""
flowchart TB;
    A[Dataset 1] -->|batch| P(((Batch 1))) -->|add data rows| B(Label Queue)

    C[Dataset 2]
    B(Label queue) --> F(Editor) --> G(Workflow tasks)  --> H>Done] -->|export annotations| I[(Customer Database)]

    J(Ontology A) -->|connect ontology| Project_A

    O(Ontology B)

    K(Label queue) --> L(Editor) --> M(Workflow tasks)  --> N(Done)

    P(((Batch 1))) ~~~ C(Label Queue)

    subgraph fa:fa-layer-group Catalog

    A[(Dataset 1)]
    C[(Dataset 2)]


    end

    subgraph fa:fa-tags Annotate

    Project_A
    Project_B
    J{{Ontology A}}
    O{{Ontology B}} --> Project_B

    end

    subgraph Project_A


    B[[Label queue]]
    F{Editor}
    G((Workflow tasks))
    H>Done]

    end

    subgraph Project_B

    K[[Label queue]]
    L{Editor}
    M((Workflow tasks))
    N>Done]

    end



""")

In this tutorial, you'll learn how to:


**Batch Creation**
1.   Create a batch
2.   Create multiple batches
3.   Create batches from a dataset

**Batch Management**
1.   Get a batch
2.   Export the data rows
3.   Remove queued data rows
4.   Delete the labels
5.   Delete a batch
6.   Batch attributes

all via the Python SDK.


# **Batch Creation**

# 1. Create a batch

In this section, you'll create a batch in Catalog so it can be sent to a labeling project in Annotate. There are two ways to do this: by global key or by data row ID.

### Global Key Method

The first method of creating a batch is by `global_keys`:

```python
project.create_batch(
  name="<unique_batch_name>",
  global_keys=["<key1>", "<key2>", "<key3>"],
  priority=5,  # integer from 1 (highest) to 5 (lowest), not a string
)
```

Replace all `<xyz>` placeholders with appropriate values:

*   unique_batch_name: Your batch name
*   global_keys: The global keys of the data rows you would like to include in the batch
*   priority (optional): The order in which the batch should be labeled, as an integer between 1 (highest) and 5 (lowest). If no value is provided, the batch is assigned the lowest priority.
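
The priority rule described above can be checked locally before calling `create_batch`. The following is a minimal sketch; `normalize_priority` is a hypothetical helper written for this tutorial, not an SDK function, and the SDK may perform its own validation.

```python
from typing import Optional

LOWEST_PRIORITY = 5  # per the description above: 1 is highest, 5 is lowest


def normalize_priority(priority: Optional[int] = None) -> int:
    """Return a valid batch priority, defaulting to the lowest one."""
    if priority is None:
        return LOWEST_PRIORITY
    # bool is a subclass of int, so reject it explicitly
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise TypeError(f"priority must be an integer, got {type(priority).__name__}")
    if not 1 <= priority <= LOWEST_PRIORITY:
        raise ValueError(f"priority must be between 1 and {LOWEST_PRIORITY}, got {priority}")
    return priority
```

Passing the result as `priority=normalize_priority(user_value)` catches a string such as `"5"` before the request is sent.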





### Data Row Method

The second method of creating a batch is by `data_rows`:

```python
project.create_batch(
  name="<unique_batch_name>",
  data_rows=["<data_row_id>", "<data_row_id>"],
  priority=1,
  consensus_settings={"number_of_labels": 3, "coverage_percentage": 0.1}
)
```

Replace all `<xyz>` placeholders with appropriate values:

*   unique_batch_name: Your batch name
*   data_row_id: The ID of each data row you would like to include in the batch
*   priority (optional): The order in which the batch should be labeled, as an integer between 1 (highest) and 5 (lowest). If no value is provided, the batch is assigned the lowest priority.
*   consensus_settings: If your project uses consensus, you can apply it to the batch by specifying the coverage (`coverage_percentage`) and the number of votes (`number_of_labels`)
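
To get a feel for what the consensus settings imply for labeling effort, here is a rough arithmetic sketch. It assumes that `coverage_percentage` is the fraction of data rows in the batch that receive `number_of_labels` labels each, and that the remaining rows receive one label; the rounding rule and the helper name `estimate_consensus_labels` are also assumptions made for illustration.

```python
def estimate_consensus_labels(
    num_data_rows: int, number_of_labels: int, coverage_percentage: float
) -> dict:
    """Estimate how many labels a batch needs under the stated assumptions."""
    if not 0 <= coverage_percentage <= 1:
        raise ValueError("coverage_percentage must be between 0 and 1")
    if number_of_labels < 1:
        raise ValueError("number_of_labels must be at least 1")
    # Rows that are labeled several times (rounding is an assumption)
    consensus_rows = round(num_data_rows * coverage_percentage)
    single_rows = num_data_rows - consensus_rows
    total_labels = single_rows + consensus_rows * number_of_labels
    return {"consensus_rows": consensus_rows, "total_labels": total_labels}


estimate = estimate_consensus_labels(1000, number_of_labels=3, coverage_percentage=0.1)
print(estimate)
```

Under these assumptions, a batch of 1,000 data rows with the settings shown above would have 100 consensus rows and need 1,200 labels in total.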


# 2. Create multiple batches

The `create_batches` method accepts up to one million data rows. Because a single batch can hold at most one hundred thousand data rows, any call with more than one hundred thousand data rows creates multiple batches.

As with single-batch creation, there are two ways to create multiple batches.
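
The limits above translate into simple arithmetic. The sketch below estimates how many batches a call would produce and how large each would be, assuming batches are filled sequentially up to the maximum size; the actual split performed by Labelbox may differ, and `expected_batch_sizes` is a hypothetical helper.

```python
MAX_BATCH_SIZE = 100_000       # maximum data rows per batch, as stated above
MAX_ROWS_PER_CALL = 1_000_000  # maximum data rows accepted per call, as stated above


def expected_batch_sizes(num_data_rows: int) -> list:
    """Return the assumed size of each batch for a given number of data rows."""
    if not 0 < num_data_rows <= MAX_ROWS_PER_CALL:
        raise ValueError(f"num_data_rows must be between 1 and {MAX_ROWS_PER_CALL}")
    full_batches, remainder = divmod(num_data_rows, MAX_BATCH_SIZE)
    sizes = [MAX_BATCH_SIZE] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes


print(expected_batch_sizes(250_000))
```

For 250,000 data rows this yields two full batches and one batch of 50,000 rows.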

### Global Keys Method

The first method of creating multiple batches is by `global_keys`:

```python
task = project.create_batches(
  name_prefix="demo-create-batches-",
  global_keys=global_keys,
  priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())
```

Set the arguments as follows:

*   name_prefix: The batch name prefix that will precede a sequential three-digit number, e.g., batches-001, batches-002, batches-003, etc.
*   global_keys: The global keys of the data rows you would like to include in the batches
*   priority (optional): The order in which the batches should be labeled, as an integer between 1 (highest) and 5 (lowest). If no value is provided, the batches are assigned the lowest priority.

### Data Rows Method

The second method of creating multiple batches is by `data_rows`:

```python
task = project.create_batches(
  name_prefix="demo-create-batches-",
  data_rows=data_rows,
  priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())
```

Set the arguments as follows:

*   name_prefix: The batch name prefix that will precede a sequential three-digit number, e.g., batches-001, batches-002, batches-003, etc.
*   data_rows: The data rows you would like to include in the batches
*   priority (optional): The order in which the batches should be labeled, as an integer between 1 (highest) and 5 (lowest). If no value is provided, the batches are assigned the lowest priority.
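
The naming scheme described for `name_prefix` can be previewed locally. This sketch assumes numbering starts at 001 and that the suffix is appended directly to the prefix; `expected_batch_names` is a hypothetical helper, and the names Labelbox actually assigns should be checked on the created batches.

```python
def expected_batch_names(name_prefix: str, batch_count: int, start: int = 1) -> list:
    """Return the batch names implied by a prefix and a zero-padded counter."""
    return [f"{name_prefix}{index:03d}" for index in range(start, start + batch_count)]


names = expected_batch_names("demo-create-batches-", 3)
print(names)
```

Choosing a prefix that ends in a separator, such as the trailing hyphen here, keeps the generated names readable.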

# 3. Create batch(es) from dataset

If you wish to create batches in a project from all the data rows of a dataset, you can use the `create_batches_from_dataset` method. If the dataset contains more than one hundred thousand data rows, multiple batches will be created.



```python
dataset = client.get_dataset("<dataset_id>")

task = project.create_batches_from_dataset(
    name_prefix="demo-dataset-",
    dataset_id=dataset.uid,
    priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())

```

Replace all placeholders with appropriate values:

*    name_prefix: The batch name prefix that will precede a sequential three-digit number, e.g., batches-001, batches-002, batches-003, etc.
*    dataset_id: The UID of the dataset whose data rows you would like to include in the batches
*    priority (optional): The order in which the batches should be labeled, as an integer between 1 (highest) and 5 (lowest). If no value is provided, the batches are assigned the lowest priority.
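
The `task.errors()` and `task.result()` calls used in the examples above can be wrapped in a small helper so that later cells stop when batch creation fails. This is a sketch that assumes `errors()` returns a falsy value when nothing went wrong. `report_batch_task` and `FakeTask` are hypothetical names; `FakeTask` only stands in for the real task object so the example runs without an API key, and its result value is a placeholder.

```python
def report_batch_task(task) -> bool:
    """Print the outcome of a batch-creation task and return True on success."""
    errors = task.errors()
    if errors:
        print("Errors: ", errors)
        return False
    print("Result: ", task.result())
    return True


class FakeTask:
    """Offline stand-in exposing the same two methods as the task object."""

    def __init__(self, errors=None, result=None):
        self._errors = errors
        self._result = result

    def errors(self):
        return self._errors

    def result(self):
        return self._result


succeeded = report_batch_task(FakeTask(result=["placeholder result"]))
```

In the notebook, `report_batch_task(task)` would replace the two `print` calls that follow `create_batches_from_dataset`.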


# **Batch Management**

# 1. Get a batch

Once you create batches, they are accessible through the `Project` object.

There are several ways to retrieve your batches, depending on the scope of your labeling needs.

### Get a Project

This method retrieves a project by its ID:

```python
project = client.get_project("<project_id>")
```

### Get all the batches

This method retrieves all batches in the project:

[Note: This will return a paginated collection of Batch objects]

```python
batches = project.batches()
```

After you have retrieved the paginated collection, you can convert it to a list for easier use:

```python
list(batches)
```

### Get one batch

This method retrieves a single batch from the collection:

```python
batch = next(batches)
```

### Inspect all batches

This loop prints every batch:

```python
for batch in batches:
  print(batch)
```
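
When a project contains many batches, it is often more convenient to look one up by name than to take the first item. The sketch below relies only on the `name` attribute of each batch; `find_batch_by_name` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs offline.

```python
from types import SimpleNamespace


def find_batch_by_name(batches, name):
    """Return the first batch whose name matches, or None if there is none."""
    for batch in batches:
        if batch.name == name:
            return batch
    return None


# Stand-ins for Batch objects; in the notebook, pass project.batches() instead
fake_batches = [SimpleNamespace(name="demo-001"), SimpleNamespace(name="demo-002")]
match = find_batch_by_name(fake_batches, "demo-002")
print(match.name)
```

The function accepts any iterable, so it works with either the paginated collection or a list built from it.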





# 2. Export data rows

This method returns a generator that produces all data rows currently in the batch:


```python
data_rows = batch.export_data_rows(include_metadata=False)
```

To inspect the data rows:

```python
for data_row in data_rows:
  print(data_row)
```
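
For large batches, printing every data row can flood the notebook output. A minimal sketch for previewing only the first few items of a generator is shown below; `peek` is a hypothetical helper built on the standard library, and the range-based generator stands in for the exported data rows.

```python
from itertools import islice


def peek(rows, limit=5):
    """Return up to `limit` items from an iterable without reading the rest."""
    return list(islice(rows, limit))


# Stand-in generator; in the notebook, pass the exported data rows instead
fake_rows = (f"data-row-{index}" for index in range(10))
preview = peek(fake_rows, limit=3)
print(preview)
```

Note that the previewed items are consumed from the generator, so a later loop over the same generator continues from the next item.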





# 3. Remove queued data rows

This method removes queued data rows from the batch and consequently the labeling queue of the project:

```python
batch.remove_queued_data_rows()
```



# 4. Delete labels

This method deletes the labels made on the data rows in the batch and re-queues the data rows for labeling:

```python
batch.delete_labels()
```

Alternatively, you can re-queue the data with labels as templates:

```python
batch.delete_labels(set_labels_as_template=True)
```




# 5. Delete batch

This method deletes the batch:

[Note: Before a batch can be deleted, all labels made on the data rows in the batch must be deleted]

```python
batch.delete()
```
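
Because labels must be deleted before the batch itself, the two calls can be combined in the required order. The sketch below assumes both calls complete before returning; `delete_batch_and_labels` and `RecordingBatch` are hypothetical names, and `RecordingBatch` only records calls so the example runs without touching real data.

```python
def delete_batch_and_labels(batch, confirm=False):
    """Delete a batch's labels first, then the batch. Both steps are destructive."""
    if not confirm:
        raise ValueError("Refusing to delete: pass confirm=True to proceed.")
    batch.delete_labels()
    batch.delete()


class RecordingBatch:
    """Offline stand-in that records which methods were called, in order."""

    def __init__(self):
        self.calls = []

    def delete_labels(self):
        self.calls.append("delete_labels")

    def delete(self):
        self.calls.append("delete")


demo_batch = RecordingBatch()
delete_batch_and_labels(demo_batch, confirm=True)
print(demo_batch.calls)
```

The explicit `confirm` flag is a design choice for notebooks, where re-running a cell by accident is easy.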



# 6. Batch attributes

Each batch exposes several attributes that you can read, depending on your needs.

### Get batch name



```python
batch.name
```

### When batch was created

```python
batch.created_at
```

### When batch was last updated

```python
batch.updated_at
```

### Batch size (number of data rows in the batch, as an integer)

```python
batch.size
```

### The relationship to the Project object



```python
project = batch.project()
```
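
The attributes listed above can be gathered into a single dictionary for logging or display. This is a minimal sketch; `summarize_batch` is a hypothetical helper, and the `SimpleNamespace` object with made-up values stands in for a real batch so the example runs offline.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def summarize_batch(batch) -> dict:
    """Collect the basic attributes of a batch into a dictionary."""
    return {
        "name": batch.name,
        "size": batch.size,
        "created_at": batch.created_at,
        "updated_at": batch.updated_at,
    }


# Stand-in with illustrative values; in the notebook, pass a real batch instead
fake_batch = SimpleNamespace(
    name="demo-001",
    size=42,
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    updated_at=datetime(2024, 1, 2, tzinfo=timezone.utc),
)
summary = summarize_batch(fake_batch)
print(summary["name"], summary["size"])
```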






