# Datasets

* A dataset is a collection of data rows (the images, videos, or text to be labeled).
* Datasets define units of work.
    * Attaching a dataset to a project adds every data row in the dataset to the project's labeling queue.
* A dataset does not have a fixed size; you can add data rows at any time.
    * When you add data rows to a dataset, every project the dataset is attached to adds the new data rows to its queue.
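
The attach-and-propagate behavior described above can be pictured with a small toy model. This is only an illustrative sketch: `ToyDataset`, `ToyProject`, `attach_to`, and `add_data_row` are hypothetical names invented for this example and are not part of the Labelbox SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProject:
    """Hypothetical stand-in for a project; it only holds a labeling queue."""
    name: str
    queue: list = field(default_factory=list)


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset; not a Labelbox SDK class."""
    name: str
    data_rows: list = field(default_factory=list)
    projects: list = field(default_factory=list)

    def attach_to(self, project):
        # Attaching queues every data row already in the dataset.
        self.projects.append(project)
        project.queue.extend(self.data_rows)

    def add_data_row(self, row):
        # A row added later is queued in every attached project.
        self.data_rows.append(row)
        for project in self.projects:
            project.queue.append(row)


toy_dataset = ToyDataset("animals", data_rows=["cat.jpg", "dog.jpg"])
project_a = ToyProject("boxes")
project_b = ToyProject("segmentation")
toy_dataset.attach_to(project_a)
toy_dataset.attach_to(project_b)

# Adding a row after attachment reaches both projects.
toy_dataset.add_data_row("bird.jpg")
print(project_a.queue)
print(project_b.queue)
```

The practical consequence is that adding rows to a shared dataset is never local to one project, so it is worth checking which projects a dataset is attached to before growing it.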

In [1]:
!pip install labelbox

In [2]:
from labelbox import Client
from getpass import getpass
import uuid
import os

In [3]:
# If you don't want to give Google access to your Drive, you can skip this cell
# and manually set `API_KEY` below.

COLAB = "google.colab" in str(get_ipython())
if COLAB:
    !pip install colab-env -qU
    from colab_env import envvar_handler
    envvar_handler.envload()

# Read the key from the environment once; prompt only if it is missing
API_KEY = os.environ.get("LABELBOX_API_KEY")
if not API_KEY:
    API_KEY = getpass("Please enter your Labelbox API key: ")
    if COLAB:
        envvar_handler.add_env("LABELBOX_API_KEY", API_KEY)

* Set the values in the following cell to match your own data before running the rest of this notebook

In [4]:
# ID of a dataset that already contains data rows
DATASET_ID = "ckm4xyfua04cf0z7a3wz58kgj"
# Only update this if you have an on-prem deployment
ENDPOINT = "https://api.labelbox.com/graphql"

In [5]:
client = Client(api_key=API_KEY, endpoint=ENDPOINT)

### Read

In [6]:
# A dataset can be fetched by name (using a query; see the basics notebook) or directly by ID
dataset = client.get_dataset(DATASET_ID)
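
When only the name is known, one option is to scan the datasets the client returns and compare names on the client side. The sketch below assumes `client.get_datasets()` yields objects with a `name` attribute; `find_dataset_by_name` is a hypothetical helper, and the stand-in client and its `uid` values are made up so the example runs without an API key. The query-based lookup mentioned in the basics notebook filters on the server and is likely the better choice for large accounts.

```python
from types import SimpleNamespace


def find_dataset_by_name(client, name):
    """Return the first dataset whose name matches, or None if there is none."""
    return next((ds for ds in client.get_datasets() if ds.name == name), None)


# Stand-in client with made-up datasets (not real Labelbox objects).
fake_client = SimpleNamespace(
    get_datasets=lambda: iter([
        SimpleNamespace(name="animal_demo_ds", uid="id-1"),
        SimpleNamespace(name="other_ds", uid="id-2"),
    ])
)

match = find_dataset_by_name(fake_client, "other_ds")
print(match.uid)

missing = find_dataset_by_name(fake_client, "no_such_ds")
print(missing)
```

With a real client the call would be `find_dataset_by_name(client, "animal_demo_ds")`.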

In [7]:
print(dataset)

<Dataset {'created_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'animal_demo_ds', 'uid': 'ckm4xyfua04cf0z7a3wz58kgj', 'updated_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc)}>


In [8]:
# We can see the data rows associated with a dataset
data_rows = dataset.data_rows()
next(data_rows)  # Print first one

<DataRow ID: ckm4y6s531rnq0rb6bobqa6j7>
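
Because `next(...)` works on the result of `dataset.data_rows()`, it appears to behave as a lazy iterator, so a short preview does not have to load every data row. The sketch below shows one way to take the first few items; `first_n` is a hypothetical helper, and a plain generator stands in for the real collection so the example runs offline.

```python
from itertools import islice


def first_n(rows, n):
    """Return up to n items from any iterable without consuming the rest."""
    return list(islice(rows, n))


# Stand-in generator so the sketch runs without an API key.
fake_rows = (f"data_row_{i}" for i in range(1000))
preview = first_n(fake_rows, 3)
print(preview)
```

With a real dataset this would be `first_n(dataset.data_rows(), 3)`. Note that a bare `next(data_rows)` raises `StopIteration` on an empty dataset, whereas `first_n` simply returns an empty list.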

In [9]:
# Attached projects
print("Projects with this dataset attached :", list(dataset.projects()))
print("Dataset name :", dataset.name)

Projects with this dataset attached : [<Project ID: ckm4xyfncfgja0760vpfdxoro>]
Dataset name : animal_demo_ds


In [10]:
# Data rows are listed through their dataset; keep the first one for later use
data_row = next(dataset.data_rows())

### Create

In [11]:
new_dataset = client.create_dataset(name="my_new_dataset")
print(new_dataset)

<Dataset {'created_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'my_new_dataset', 'uid': 'ckmdcg8lf04px0y9ge67bbxa5', 'updated_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc)}>
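
Rerunning the cell above creates another dataset with the same name, assuming names are not required to be unique. A random suffix keeps test datasets distinguishable. This is a minimal sketch: `unique_dataset_name` is a hypothetical helper built on the standard-library `uuid` module that the notebook already imports.

```python
import uuid


def unique_dataset_name(prefix="my_new_dataset"):
    """Append a short random suffix so repeated runs get distinct names."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


name = unique_dataset_name()
print(name)
```

The result can be passed straight to the call shown above, for example `client.create_dataset(name=unique_dataset_name())`.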


* Add data rows
* See the [data rows](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/data_rows.ipynb#scrollTo=successful-patch) notebook for more about adding data rows

In [None]:
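# Add the data row to the dataset created above, not to the existing `dataset`,
# whose new rows would be queued in its attached project
new_dataset.create_data_row(row_data="https://picsum.photos/200/300")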

### Update

In [12]:
new_dataset.update(name="new_name")

* See the `Create` section of the data rows notebook for how to add data rows to a dataset.

### Delete

In [13]:
new_dataset.delete()