<a href="https://colab.research.google.com/github/Hirundo-io/hirundo-client/blob/clnt-9-add-jupyter-notebooks-to-github/notebooks/Hirundo_Dataset_Optimization_S3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to use Hirundo's Dataset Optimization (S3)

---



Let's start with a simple example using a dataset we've prepared and uploaded to an AWS S3 bucket.

## AWS S3 bucket example

1. We install the `hirundo` package, then import `os` and `google.colab`'s `userdata` to read our secrets and assign them to environment variables.

In [None]:
%pip install hirundo
import os

from google.colab import userdata

os.environ["AWS_ACCESS_KEY"] = userdata.get("AWS_ACCESS_KEY")
os.environ["AWS_SECRET_ACCESS_KEY"] = userdata.get("AWS_SECRET_ACCESS_KEY")
os.environ["API_HOST"] = userdata.get("API_HOST")
os.environ["API_KEY"] = userdata.get("API_KEY")
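
Before moving on, it can help to confirm that every secret actually made it into the environment. The following is a minimal sketch, not part of the Hirundo client: the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical, and the variable names are simply the ones used in the cell above.

```python
import os

# Names taken from the cell above; adjust if your secrets are named differently.
REQUIRED_VARS = ("AWS_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY", "API_HOST", "API_KEY")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Hypothetical check: report any secret that is absent or blank.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing secrets:", ", ".join(missing))
```

Running this right after the assignments surfaces a missing or blank secret early, instead of as an authentication failure later in the notebook.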

2. We import the `OptimizationDataset` class, the `LabelingType` enum, the `StorageIntegration` class (which indicates where the dataset files are stored), the `StorageTypes` enum, and the `StorageS3` storage class.

In [None]:
from hirundo import (
    LabelingType,
    OptimizationDataset,
    StorageIntegration,
    StorageS3,
    StorageTypes,
)

3. First, we create the `OptimizationDataset` object, pointing it at the S3 bucket that holds the dataset.

In [None]:
test_dataset = OptimizationDataset(
    name="AWS-test-OD-BDD-validation-dataset",
    labeling_type=LabelingType.ObjectDetection,
    storage_integration=StorageIntegration(
        name="AWS-open-source-datasets",
        type=StorageTypes.S3,
        s3=StorageS3(
            bucket_url="s3://hirundo-open-source-datasets",
            region_name="il-central-1",
            access_key_id=os.environ["AWS_ACCESS_KEY"],
            secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        ),
    ),
    root="/bdd100k_val_hirundo.zip/bdd100k",
    dataset_metadata_path="bdd100k.csv",
)
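
If you substitute your own bucket, a quick format check on the URL can catch typos before the dataset object is created. This is a minimal sketch using only the standard library; `check_s3_bucket_url` is a hypothetical helper, not a Hirundo function, and it assumes only that the URL should have the `s3://<bucket>` shape shown above.

```python
from urllib.parse import urlparse


def check_s3_bucket_url(bucket_url: str) -> str:
    """Return the bucket name if bucket_url looks like s3://<bucket>, else raise ValueError."""
    parsed = urlparse(bucket_url)
    # The scheme must be "s3" and the bucket name (netloc) must be non-empty.
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected a URL like s3://<bucket>, got {bucket_url!r}")
    return parsed.netloc


print(check_s3_bucket_url("s3://hirundo-open-source-datasets"))
# prints: hirundo-open-source-datasets
```

The check validates only the URL's shape; it does not confirm that the bucket exists or that the credentials can read it.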

4. Now that we have created our dataset, we can launch a dataset optimization run.

In [None]:
run_id = test_dataset.run_optimization()
print("Running optimization. Run ID is", run_id)
test_dataset.check_run()