# Basic project setup

This notebook covers the basics of the Python SDK: what a database (db) object is and how to interact with it.



In [None]:
!pip install labelbox

In [2]:
from labelbox import Project, Dataset, Client, DataRow
from labelbox.schema.queue_mode import QueueMode
from labelbox.schema.media_type import MediaType
import random
import uuid
import os

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [3]:
# Add your API key below.
# To create one, go to: Workspace settings -> API -> Create API Key
API_KEY = None
client = Client(api_key=API_KEY)
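
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The helper name `load_api_key` and the variable name `LABELBOX_API_KEY` are assumptions made for this example, not part of the notebook.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical helper).

    Raises RuntimeError if the variable is unset or empty, so a missing key
    fails here rather than later during the first API call.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key")
    return key
```

With this helper, the client cell above would become `client = Client(api_key=load_api_key())`.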

In [4]:
# For the purpose of this demo, get a single project ID and dataset ID from your organization

# Get a single project ID (convert the collection to a list once and reuse it)
projects = list(client.get_projects())
project_id = projects[0].uid
project_name = projects[0].name
print("Project ID: ", project_id)
print("Project Name:", project_name)
print("Number of projects in your org:", len(projects))

print("-" * 40)

# Get a single dataset ID
datasets = list(client.get_datasets())
dataset_id = datasets[0].uid
dataset_name = datasets[0].name
print("Dataset ID: ", dataset_id)
print("Dataset Name:", dataset_name)
print("Number of datasets in your org:", len(datasets))

Project ID:  cl9rmkr5a4hiy07v5ey34ahtk
Project Name: label_import_project_demo
Number of projects in your org: 88
----------------------------------------
Dataset ID:  cl9rmksvo3wv207y6h158giqo
Dataset Name: annotation_import_demo_dataset
Number of datasets in your org: 78


In [5]:
# Fetch the project and dataset by using the IDs fetched in the previous cell
project = client.get_project(project_id)
dataset = client.get_dataset(dataset_id)

In [6]:
print("Project: ", project)
print("Dataset: ", dataset)

Project:  <Project {'auto_audit_number_of_labels': 1, 'auto_audit_percentage': 1, 'created_at': datetime.datetime(2022, 10, 27, 22, 16, 33, tzinfo=datetime.timezone.utc), 'description': '', 'last_activity_time': datetime.datetime(2022, 10, 27, 22, 16, 46, tzinfo=datetime.timezone.utc), 'media_type': <MediaType.Image: 'IMAGE'>, 'name': 'label_import_project_demo', 'queue_mode': <QueueMode.Batch: 'BATCH'>, 'setup_complete': datetime.datetime(2022, 10, 27, 22, 16, 45, tzinfo=datetime.timezone.utc), 'uid': 'cl9rmkr5a4hiy07v5ey34ahtk', 'updated_at': datetime.datetime(2022, 10, 27, 22, 16, 46, tzinfo=datetime.timezone.utc)}>
Dataset:  <Dataset {'created_at': datetime.datetime(2022, 10, 27, 22, 16, 36, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'annotation_import_demo_dataset', 'row_count': 5, 'uid': 'cl9rmksvo3wv207y6h158giqo', 'updated_at': datetime.datetime(2022, 10, 27, 22, 16, 37, tzinfo=datetime.timezone.utc)}>


### Fields
* All db objects have fields. To see which fields an object has, look at its source code, for example https://github.com/Labelbox/labelbox-python/blob/develop/labelbox/schema/project.py for `Project`.
* These fields are attributes of the object

In [7]:
print(project.name)
print(dataset.name)

label_import_project_demo
annotation_import_demo_dataset


* Fields can be updated. The change is applied server side, so you will also see it in Labelbox.

In [8]:
project.update(description="new description field")
print(project.description)

new description field


### Pagination
* Queries that return a list of database objects return them as a `PaginatedCollection`
* Pagination limits how much data is returned at once, which improves performance

In [9]:
labels_paginated_collection = project.labels()
print("Type of collection: ", type(labels_paginated_collection))

# A paginated collection can be converted to a list with list(), which fetches every result.
# Avoid list(paginated...) for queries that could return more than a dozen results.
print("Number of labels :", len(list(labels_paginated_collection)))

Type of collection:  <class 'labelbox.pagination.PaginatedCollection'>
Number of labels : 0


In [10]:
# Iterate over the paginated collection one item at a time with next().
# Note that if you selected a `project_id` without any labels, next() will raise `StopIteration`.
try: 
  single_label = next(project.labels())
  print(single_label)
except StopIteration: 
  print("Project has no labels !")

Project has no labels !
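
When only the first few results are needed, `itertools.islice` can stand in for `list()` so that iteration stops early. The sketch below assumes the collection fetches results lazily as it is iterated. `take` and `fake_labels` are hypothetical names, and a plain generator stands in for `project.labels()` so the example runs without an API key.

```python
from itertools import islice


def take(collection, n):
    """Return at most the first n items without consuming the rest."""
    return list(islice(collection, n))


def fake_labels():
    """Stand-in for a paginated collection; yields items one at a time."""
    for i in range(1000):
        yield f"label-{i}"


first_three = take(fake_labels(), 3)
print(first_three)
```

In the notebook, the equivalent call would be `take(project.labels(), 3)`.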


### Query parameters
* Filters are written against object fields using the following convention:
    * `DbObject.Field`, for example `Dataset.name == dataset_name`

In [11]:
datasets = client.get_datasets(where=Dataset.name == dataset_name)

projects = client.get_projects(
    where=((Project.name == project_name) &
           (Project.description == "new description field")))

# The two queries above return PaginatedCollections because the filter parameters aren't guaranteed to be unique.
# So even if only one element is returned, it is wrapped in a PaginatedCollection.
print(projects)
print(next(projects, None))
print(next(projects, None))
print(next(projects, None))
# The second and third calls return None, so there is only one matching project.

<labelbox.pagination.PaginatedCollection object at 0x7f463f4e70d0>
<Project {'auto_audit_number_of_labels': 1, 'auto_audit_percentage': 1, 'created_at': datetime.datetime(2022, 10, 27, 22, 16, 33, tzinfo=datetime.timezone.utc), 'description': 'new description field', 'last_activity_time': datetime.datetime(2022, 10, 28, 13, 20, 59, tzinfo=datetime.timezone.utc), 'media_type': <MediaType.Image: 'IMAGE'>, 'name': 'label_import_project_demo', 'queue_mode': <QueueMode.Batch: 'BATCH'>, 'setup_complete': datetime.datetime(2022, 10, 27, 22, 16, 45, tzinfo=datetime.timezone.utc), 'uid': 'cl9rmkr5a4hiy07v5ey34ahtk', 'updated_at': datetime.datetime(2022, 10, 28, 13, 20, 59, tzinfo=datetime.timezone.utc)}>
None
None
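
Because a filtered query can match zero, one, or many objects, it can help to state the expectation of a single match explicitly. The helper below is a minimal sketch and not part of the SDK. The name `exactly_one` is hypothetical, and it works on any iterable.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel, so that None items are not mistaken for "no item"


def exactly_one(results: Iterable[T]) -> T:
    """Return the only item in `results`; raise LookupError otherwise."""
    iterator = iter(results)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        raise LookupError("query returned no results")
    if next(iterator, _MISSING) is not _MISSING:
        raise LookupError("query returned more than one result")
    return first


print(exactly_one(["only-project"]))
```

Applied to the query above, this would read `project = exactly_one(client.get_projects(where=Project.name == project_name))`.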


### Querying Limitations
* The DbObject used in the query must be the same as the DbObject returned by the querying function.
* The query below is not valid, since `get_projects` returns projects, not datasets:
>  `>>> projects = client.get_projects(where = Dataset.name == "dataset_name")`


# Relationships between projects and batches/datasets



In [21]:
# Since the project we selected is in batch mode, it only has batches, so we can't query for datasets.
# sample_project_datasets = project.datasets()  # Run this instead if the project is in dataset mode
sample_project_batches = project.batches()

list(sample_project_batches)

for b in sample_project_batches:
  print(f" Name of project : {b.project().name}")
  print(f" Name of batches in project: {b.name}")

 Name of project : label_import_project_demo
 Name of batches in project: first-batch-LI-demo
