# Load PandaSet

This notebook shows how to load the PandaSet dataset into a 3LC Table. This involves preparing point clouds, cuboids, and semantic segmentation data, then writing them using the [bulk data pattern](https://docs.3lc.ai/3lc/latest/tutorials/geometry/bulk_data.html#bulk-data-tutorial). For details on the ingestion process, see the [loading script](./load_pandaset.py).

Running this notebook requires the [PandaSet DevKit](https://github.com/scaleapi/pandaset-devkit/blob/master/README.md), which can be installed as an extra to this repo: 

```bash
pip install -e ".[pandaset]"
```

The dataset can be downloaded from [HuggingFace](https://huggingface.co/datasets/georghess/pandaset). If you have already downloaded `pandaset.zip`, ensure the dataset root below points to the unzipped pandaset directory.

If the dataset root does not exist, the notebook downloads `pandaset.zip` into the parent of the dataset root and extracts it there. This requires authentication with HuggingFace, for example by setting the `HF_TOKEN` environment variable.
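
Since a missing token only surfaces once the download request is made, it can help to check for credentials up front. The following is a minimal sketch and not part of the notebook or the loading script: the helper name `hf_token_is_set` is hypothetical, and it only inspects the `HF_TOKEN` environment variable, so a token saved by an earlier HuggingFace login would not be detected.

```python
import os


def hf_token_is_set(env=None):
    """Return True if HF_TOKEN is present and non-empty in the given mapping.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if not hf_token_is_set():
    # Only a hint: a token stored by a previous login may still work.
    print("HF_TOKEN is not set; the download may fail unless you are logged in another way.")
```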

> ⚠️ Storage requirements
>
> The unzipped dataset is ~42 GB, and ingesting all sequences into 3LC will
> require another 50 GB of disk space. Ensure you have enough free space before
> running the notebook.
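
The free-space check can be done programmatically before starting. The sketch below uses `shutil.disk_usage` from the standard library; the helper names are hypothetical, and it conservatively compares a single location against the combined figure from the note above (42 GB + 50 GB). If the dataset root and the 3LC data path live on different volumes, check each one separately. The size of `pandaset.zip` itself is not included.

```python
import shutil
from pathlib import Path

GB = 10**9  # decimal gigabytes, matching the figures in the note


def nearest_existing_parent(path):
    """Walk up from `path` until an existing directory or file is found."""
    path = Path(path).absolute()
    while not path.exists():
        if path.parent == path:  # reached a root that does not exist (e.g. a missing drive)
            raise FileNotFoundError(f"No existing parent found for {path}")
        path = path.parent
    return path


def free_space_gb(path):
    """Free space, in GB, on the volume that holds (or will hold) `path`."""
    return shutil.disk_usage(nearest_existing_parent(path)).free / GB


def has_enough_space(path, required_gb):
    """Return True if at least `required_gb` GB are free at `path`."""
    return free_space_gb(path) >= required_gb


check_path = Path(".")  # replace with the dataset root you intend to use
required_gb = 42 + 50  # unzipped dataset + ingested data, per the note above
if not has_enough_space(check_path, required_gb):
    print(f"Only {free_space_gb(check_path):.0f} GB free; about {required_gb} GB recommended.")
```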


## Project Setup

In [None]:
PROJECT_NAME = "3LC Tutorials"
DATASET_NAME = "pandaset"
TABLE_NAME = "pandaset"
DATA_PATH = "../../../data"
DATASET_ROOT = "D:/Data/pandaset"

## Imports

In [None]:
from pathlib import Path

from load_pandaset import load_pandaset

## Prepare Dataset

In [None]:
DATASET_ROOT = Path(DATASET_ROOT)

if not DATASET_ROOT.exists():
    import zipfile

    from huggingface_hub import hf_hub_download

    print("Downloading dataset from HuggingFace")
    hf_hub_download(
        repo_id="georghess/pandaset",
        repo_type="dataset",
        filename="pandaset.zip",
        local_dir=DATASET_ROOT.parent.absolute().as_posix(),
    )

    with zipfile.ZipFile(DATASET_ROOT.parent / "pandaset.zip", "r") as zip_ref:
        zip_ref.extractall(DATASET_ROOT.parent)
else:
    print(f"Dataset root {DATASET_ROOT} already exists")
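
Before creating the table, it may be worth confirming that the dataset root actually contains sequence folders. The sketch below is an illustration only: the helper name `find_sequences` is hypothetical, and it assumes the PandaSet layout in which each sequence is a subdirectory of the dataset root that contains a `lidar` folder. Adjust the check if your copy is organized differently.

```python
from pathlib import Path


def find_sequences(dataset_root):
    """Return sorted names of subdirectories that contain a 'lidar' folder.

    Assumes one folder per sequence under the dataset root (an assumption
    about the on-disk layout, not something this notebook verifies).
    """
    root = Path(dataset_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "lidar").is_dir())


dataset_root = Path("D:/Data/pandaset")  # same value as DATASET_ROOT in the notebook
sequences = find_sequences(dataset_root)
print(f"Found {len(sequences)} sequence folder(s) under {dataset_root}")
```

An empty result usually means the dataset root points at the wrong directory or the archive was extracted somewhere else.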

## Create Table

In [None]:
table = load_pandaset(
    dataset_root=DATASET_ROOT,
    table_name=TABLE_NAME,
    dataset_name=DATASET_NAME,
    project_name=PROJECT_NAME,
    data_path=DATA_PATH,
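    max_frames=None,  # None ingests everything; set an int to limit the number of frames
    max_sequences=None,  # None ingests everything; set an int to limit the number of sequences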
)