<a href="https://colab.research.google.com/github/Hirundo-io/hirundo-python-sdk/blob/clnt-9-add-jupyter-notebooks-to-github/notebooks/Hirundo_Dataset_QA_HuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to use Hirundo's Dataset QA (HuggingFace)

---



Let's start with a simple example using a dataset we've prepared and hosted on HuggingFace.

## HuggingFace `datasets` example

1. We install the `hirundo` package, then import `os` and `google.colab`'s `userdata` to read our secrets and assign them to environment variables. Outside Google Colab, set `API_HOST` and `API_KEY` manually.

In [None]:
%pip install hirundo
import os

try:
    from google.colab import userdata  # type: ignore  # Only importable inside Google Colab

    os.environ["API_HOST"] = userdata.get("API_HOST")
    os.environ["API_KEY"] = userdata.get("API_KEY")
except ModuleNotFoundError:
    print(
        "You are not in Google Colab. Please set the API_HOST and API_KEY environment variables manually."
    )
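
If you are running outside Google Colab, one way to supply the two variables is to prompt for any that are missing. The following is only a minimal sketch using the Python standard library: the helper names `missing_env_vars` and `prompt_for_missing` are hypothetical and are not part of the `hirundo` package.

```python
import os
from getpass import getpass


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


def prompt_for_missing(names, environ=os.environ, ask=getpass):
    """Ask for each missing variable and store the answer in `environ`.

    `ask` defaults to `getpass`, so typed values are not echoed to the screen.
    """
    for name in missing_env_vars(names, environ):
        environ[name] = ask(f"{name}: ")
```

Calling `prompt_for_missing(["API_HOST", "API_KEY"])` in a local session would then ask only for the values that are not already set, leaving existing environment variables untouched.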

2. We import the `GitRepo` class, the `QADataset` class, the `LabelingType` enum, the `StorageConfig` class (to indicate where the dataset files are saved), the `StorageTypes` enum, the `StorageGit` storage class, and the `HirundoCSV` class (to point at the CSV file that holds the labeling information).

In [None]:
from hirundo import (
    GitRepo,
    HirundoCSV,
    LabelingType,
    QADataset,
    StorageConfig,
    StorageGit,
    StorageTypes,
)

3. We create a `StorageGit` object pointing at the HuggingFace repository, then use it to build the `QADataset` object.

In [None]:
git_storage = StorageGit(
    repo=GitRepo(
        name="BDD-100k-validation dataset",
        repository_url="https://huggingface.co/datasets/hirundo-io/bdd100k-validation-only",
    ),
    branch="main",
)
test_dataset = QADataset(
    name="HuggingFace-test-OD-BDD-validation dataset",
    labeling_type=LabelingType.OBJECT_DETECTION,
    storage_config=StorageConfig(
        name="BDD-100k-validation-dataset",
        type=StorageTypes.GIT,
        git=git_storage,
    ),
    labeling_info=HirundoCSV(
        csv_url=git_storage.get_url("/bdd100k_val_hirundo.zip/bdd100k/bdd100k.csv"),
        #  csv_url="https://huggingface.co/datasets/hirundo-io/bdd100k-validation-only/bdd100k.csv",
    ),
    data_root_url=git_storage.get_url("/bdd100k_val_hirundo.zip/bdd100k"),
    #  data_root_url="https://huggingface.co/datasets/hirundo-io/bdd100k-validation-only/bdd100k_val_hirundo.zip/bdd100k",
)

4. Now that we have created our dataset, we can launch a dataset QA run, print its run ID, and check on the run.

In [None]:
run_id = test_dataset.run_qa()
print("Running dataset QA. Run ID is", run_id)
test_dataset.check_run()