<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=190/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/basics.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/basics.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Basic project/datasets overview

This notebook covers the basics of the Python SDK, such as what a db object is and how to interact with it.



In [None]:
!pip install labelbox

In [9]:
from labelbox import Project, Dataset, Client, DataRow
import random

# API Key and Client
Provide a valid API key below to connect to the Labelbox Client.

In [10]:
# Add your API key
# To create one, go to: Workspace settings -> API -> Create API Key
API_KEY = None
client = Client(api_key=API_KEY)
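
To avoid pasting a key into the notebook, the key can be read from an environment variable instead. The sketch below is one way to do that; the helper name `load_api_key` is hypothetical, and the variable name `LABELBOX_API_KEY` is an assumption you can change to whatever your environment uses.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    `load_api_key` is a hypothetical helper for this notebook, not part of the SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API key.")
    return key


# Usage, once the variable is set in your shell or Colab session:
# API_KEY = load_api_key()
# client = Client(api_key=API_KEY)
```

Failing early with a readable message is usually easier to debug than an authentication error raised later by the first query.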

In [13]:
# For the purpose of this demo get a single project/dataset id from your organization

# Get a single Project id
# get_projects returns a PaginatedCollection object, which is iterable. 
project = next(client.get_projects())
project_id=project.uid
project_name=project.name
print("Project ID: ", project_id)
print("Project Name:", project_name)

print("-" * 40)

# Get a single dataset id
# get_datasets returns a PaginatedCollection object, which is iterable. 
dataset = next(client.get_datasets())
dataset_id = dataset.uid
dataset_name = dataset.name
print("Dataset ID: ", dataset_id)
print("Dataset Name:" , dataset_name)

Project ID:  cl9smiqo23hk307y27k42cajv
Project Name: html-editor
----------------------------------------
Dataset ID:  cl9sywtkj2gsv07vk2isaeadj
Dataset Name: text_test.json


In [14]:
# Fetch the project and dataset by using the IDs fetched in the previous cell
project = client.get_project(project_id)
dataset = client.get_dataset(dataset_id)

In [15]:
print("Project: ", project)
print("Dataset: ", dataset)

Project:  <Project {'auto_audit_number_of_labels': 1, 'auto_audit_percentage': 1, 'created_at': datetime.datetime(2022, 10, 28, 15, 2, 45, tzinfo=datetime.timezone.utc), 'description': '', 'last_activity_time': datetime.datetime(2022, 10, 28, 15, 47, 41, tzinfo=datetime.timezone.utc), 'media_type': <MediaType.Image: 'IMAGE'>, 'name': 'html-editor', 'queue_mode': <QueueMode.Batch: 'BATCH'>, 'setup_complete': None, 'uid': 'cl9smiqo23hk307y27k42cajv', 'updated_at': datetime.datetime(2022, 10, 28, 15, 47, 41, tzinfo=datetime.timezone.utc)}>
Dataset:  <Dataset {'created_at': datetime.datetime(2022, 10, 28, 20, 49, 38, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'text_test.json', 'row_count': 3, 'uid': 'cl9sywtkj2gsv07vk2isaeadj', 'updated_at': datetime.datetime(2022, 10, 28, 20, 49, 40, tzinfo=datetime.timezone.utc)}>


### Fields
* All db objects have fields (see the source code for the full list, e.g. https://github.com/Labelbox/labelbox-python/blob/develop/labelbox/schema/project.py)
* These fields are attributes of the object

In [16]:
print(project.name)
print(dataset.name)

html-editor
text_test.json


* Fields can be updated. The change is applied server side, so you will also see it in Labelbox.

In [17]:
project.update(description="new description field")
print(project.description)

new description field


### Pagination
* Queries that return a list of database objects are returned as a PaginatedCollection
* A PaginatedCollection fetches results in pages as you iterate, which limits the data returned at once and improves performance

In [18]:
labels_paginated_collection = project.labels()
print("Type of collection: ", type(labels_paginated_collection))

# A paginated collection can be turned into a list with list()
# Avoid list(paginated...) for queries that could return more than a dozen results, since it fetches every page
print("Number of labels :", len(list(labels_paginated_collection)))

Type of collection:  <class 'labelbox.pagination.PaginatedCollection'>
Number of labels : 0


In [19]:
# Iterate over the paginated collection with next()
# If the selected project has no labels, next() raises `StopIteration`, which is handled below
try: 
  single_label = next(project.labels())
  print(single_label)
except StopIteration: 
  print("Project has no labels !")

Project has no labels !
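
When only the first few results are needed, `itertools.islice` takes them without consuming the whole iterable, and `next(iterable, None)` avoids the `try`/`except` around `StopIteration`. The sketch below uses a hypothetical generator, `fake_paginated_labels`, as a stand-in for a paginated query so that it runs without an API key; it is not the SDK's pagination code.

```python
from itertools import islice
from typing import Iterator


def fake_paginated_labels(total: int, page_size: int = 3) -> Iterator[str]:
    """Hypothetical stand-in for a paginated query: yields items page by page."""
    for start in range(0, total, page_size):
        # In a real client, one network request per page would happen here.
        page = [f"label-{i}" for i in range(start, min(start + page_size, total))]
        yield from page


# Take only the first five items; pages beyond the ones needed are never built.
first_five = list(islice(fake_paginated_labels(total=1000), 5))
print(first_five)

# next() with a default returns None instead of raising StopIteration.
first_or_none = next(fake_paginated_labels(total=0), None)
print(first_or_none)
```

The same two patterns should apply to any iterable, so they can be tried on `project.labels()` in place of the stand-in.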


### Query parameters
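* Build query filters by comparing a field, written as `DbObject.Field`, with a value, and pass the result as `where`
    * For example, `Dataset.name == dataset_name`; combine conditions with `&`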

In [20]:
datasets = client.get_datasets(where=Dataset.name == dataset_name)

projects = client.get_projects(
    where=((Project.name == project_name) &
           (Project.description == "new description field")))

# The above two queries return PaginatedCollections because the filter parameters aren't guaranteed to be unique.
# So even if only one element is returned, it is wrapped in a PaginatedCollection.
print(projects)
print(next(projects, None))
print(next(projects, None))
print(next(projects, None))
# We can see there is only one.

<labelbox.pagination.PaginatedCollection object at 0x7fe3c7a49e90>
<Project {'auto_audit_number_of_labels': 1, 'auto_audit_percentage': 1, 'created_at': datetime.datetime(2022, 10, 28, 15, 2, 45, tzinfo=datetime.timezone.utc), 'description': 'new description field', 'last_activity_time': datetime.datetime(2022, 11, 1, 19, 18, 21, tzinfo=datetime.timezone.utc), 'media_type': <MediaType.Image: 'IMAGE'>, 'name': 'html-editor', 'queue_mode': <QueueMode.Batch: 'BATCH'>, 'setup_complete': None, 'uid': 'cl9smiqo23hk307y27k42cajv', 'updated_at': datetime.datetime(2022, 11, 1, 19, 18, 21, tzinfo=datetime.timezone.utc)}>
None
None
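
An expression such as `Project.name == project_name` builds a filter object rather than evaluating to `True` or `False`, which is why it can be passed as `where`. The sketch below illustrates the general operator-overloading technique with hypothetical classes (`Field`, `Comparison`, `LogicalAnd`) and a hypothetical helper (`entities`); it is not the SDK's implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Comparison:
    """One condition, e.g. Project.name == "html-editor" (hypothetical class)."""
    entity: str  # name of the object type the field belongs to
    field: str
    op: str
    value: Any

    def __and__(self, other):
        return LogicalAnd(self, other)


@dataclass(frozen=True)
class LogicalAnd:
    """Two conditions that must both hold (hypothetical class)."""
    left: object
    right: object

    def __and__(self, other):
        return LogicalAnd(self, other)


class Field:
    """A field of a db object; `==` returns a Comparison, not a bool."""

    def __init__(self, entity: str, name: str):
        self.entity = entity
        self.name = name

    def __eq__(self, value):
        return Comparison(self.entity, self.name, "==", value)


def entities(expr) -> set:
    """Collect the object types that a filter expression refers to."""
    if isinstance(expr, Comparison):
        return {expr.entity}
    return entities(expr.left) | entities(expr.right)


project_name = Field("Project", "name")
project_description = Field("Project", "description")

# Parentheses are required: in Python, `&` binds more tightly than `==`.
where = (project_name == "html-editor") & (
    project_description == "new description field")
print(entities(where) == {"Project"})
```

Because each condition records which object type it belongs to, a query function written this way could check that a filter only refers to the type it returns.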


### Querying Limitations
* The DbObject used for the query must be the same as the DbObject returned by the querying function.
* The query below is not valid because `get_projects` returns projects, not datasets:
>  `>>> projects = client.get_projects(where = Dataset.name == "dataset_name")`


# Relationships between projects and batches/datasets



In [21]:
# Since the project we fetched is in batch mode (queue_mode is BATCH), it has batches rather than datasets.
# sample_project_datasets = project.datasets() --> Run this instead if the project is in dataset mode
sample_project_batches = project.batches()

list(sample_project_batches)

for b in sample_project_batches:
  print(f" Name of project : {b.project().name}")
  print(f" Name of batches in project: {b.name}")

 Name of project : html-editor
 Name of batches in project: testsss
