<a href="https://colab.research.google.com/github/Hirundo-io/hirundo-client/blob/clnt-9-add-jupyter-notebooks-to-github/notebooks/Create_cifar100_dataset_GCP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to create a Hirundo dataset (GCP Storage Bucket)

---

0. Install `torchvision` and `pandas`, set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable from the `GCP_CREDENTIALS_RW` Colab secret, and set `bucket_name`.

In [None]:
%pip install torchvision pandas
import os

from google.colab import userdata

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = userdata.get("GCP_CREDENTIALS_RW")

bucket_name = "cifar100bucket"  # @param {type:"string"}
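`GOOGLE_APPLICATION_CREDENTIALS` is expected to hold the path to a service-account key file, not the key itself. The notebook does not say whether the Colab secret stores a path or the JSON key contents. Assuming it may store the JSON contents, a minimal sketch could write them to a temporary file first. The helper name `credentials_path_from_secret` is hypothetical.

```python
import json
import os
import tempfile


def credentials_path_from_secret(secret_value: str) -> str:
    """Return a file path suitable for GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper: if the secret parses as a JSON object, it is written
    to a temporary file; otherwise it is assumed to already be a file path.
    """
    try:
        parsed = json.loads(secret_value)
    except json.JSONDecodeError:
        return secret_value  # not JSON, so assume it is already a path
    if not isinstance(parsed, dict):
        return secret_value  # JSON scalar or list, not a key file
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as key_file:
        json.dump(parsed, key_file)
    return path
```

In the notebook this might be used as `os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_path_from_secret(userdata.get("GCP_CREDENTIALS_RW"))`.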

1. Import `tempfile` to create a temporary directory and `CIFAR100` from `torchvision.datasets` to download the dataset into it.

In [None]:
import tempfile

from torchvision.datasets import CIFAR100

temp_dir = tempfile.TemporaryDirectory()
temp_dir_name = temp_dir.name
cifar100 = CIFAR100(temp_dir_name, download=True)

2. Import `Path` from `pathlib` and import `pandas` to build the `DataFrame`.

In [None]:
from pathlib import Path

import pandas as pd

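3. Save each image as a PNG and create a `DataFrame` that maps each image path to its class name, then write it to `cifar100.csv`.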

In [None]:
temp_dir_dataset_path = Path(temp_dir_name) / "dataset"
temp_dir_dataset_path.mkdir()
img_dir = temp_dir_dataset_path / "images"
img_dir.mkdir()

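# Collect one record per image, then build the DataFrame in a single step
# (faster than assigning rows one at a time with .loc).
records = []
for i, (image, target) in enumerate(cifar100):
    image_path = img_dir / f"{i}.png"
    image.save(image_path)
    records.append(
        {
            # Path relative to the dataset directory, e.g. "images/0.png"
            "image_path": image_path.relative_to(temp_dir_dataset_path).as_posix(),
            "class_name": cifar100.classes[target],
        }
    )

csv = pd.DataFrame(records, columns=["image_path", "class_name"])
csv.to_csv(temp_dir_dataset_path / "cifar100.csv")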
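Before uploading, it can be worth checking that every path listed in the CSV points at a file that was actually written. The following is a minimal sketch, not part of the original notebook; `find_missing_images` is a hypothetical helper, and it uses the standard-library `csv` module under an alias because the notebook already uses `csv` as a variable name.

```python
import csv as csv_module
from pathlib import Path


def find_missing_images(dataset_dir: Path, csv_name: str = "cifar100.csv") -> list[str]:
    """Return the image paths listed in the CSV that do not exist under dataset_dir."""
    with open(dataset_dir / csv_name, newline="") as csv_file:
        reader = csv_module.DictReader(csv_file)
        # Each image_path is relative to dataset_dir, e.g. "images/0.png".
        return [
            row["image_path"]
            for row in reader
            if not (dataset_dir / row["image_path"]).is_file()
        ]
```

Calling `find_missing_images(temp_dir_dataset_path)` should return an empty list if every image was saved.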

4. Upload the CSV and images to the GCP Storage bucket.

In [None]:
# Upload to images/ in the bucket so the relative paths in the CSV (images/<i>.png) resolve.
!gsutil -m rsync $img_dir gs://$bucket_name/images
!gsutil cp $temp_dir_dataset_path/cifar100.csv gs://$bucket_name/cifar100.csv
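The `image_path` values in the CSV are relative to the dataset directory. Assuming they are meant to be resolved relative to the bucket root, where `cifar100.csv` is uploaded, the object URI for a given row can be sketched as follows. `gcs_uri` is a hypothetical helper for illustration only.

```python
from pathlib import PurePosixPath


def gcs_uri(bucket_name: str, relative_path: str) -> str:
    """Build the gs:// URI for a path that is relative to the bucket root."""
    rel = PurePosixPath(relative_path)
    if rel.is_absolute():
        # An absolute local path (such as a temporary directory) would not
        # match the relative paths stored in the CSV.
        raise ValueError(f"expected a relative path, got {relative_path!r}")
    return f"gs://{bucket_name}/{rel}"


print(gcs_uri("cifar100bucket", "images/0.png"))  # gs://cifar100bucket/images/0.png
```

Listing the bucket with `gsutil ls` and comparing against a few such URIs is one way to confirm the layout.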

5. Get the `cifar100` class list.

In [None]:
'"' + '", "'.join(cifar100.classes) + '"'
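The expression above produces a comma-separated string of double-quoted class names. As a small sketch on a short sample list (the function name `quoted_class_list` is hypothetical), the same string can be built with a generator expression, and `json.dumps` gives a bracketed variant if a JSON array is more convenient:

```python
import json


def quoted_class_list(classes):
    """Join class names as a comma-separated list of double-quoted strings."""
    return ", ".join(f'"{name}"' for name in classes)


sample = ["apple", "aquarium_fish", "baby"]
print(quoted_class_list(sample))  # "apple", "aquarium_fish", "baby"
print(json.dumps(sample))         # ["apple", "aquarium_fish", "baby"]
```

In the notebook, `quoted_class_list(cifar100.classes)` would be the equivalent call.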

6. Clean up the temporary folder.

In [None]:
temp_dir.cleanup()
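The explicit `cleanup()` call is needed here because the temporary directory has to survive across several notebook cells. When the whole workflow runs in a single cell or script, `tempfile.TemporaryDirectory` can instead be used as a context manager, which removes the directory automatically. A minimal sketch with placeholder contents:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "dataset"
    dataset_dir.mkdir()
    # Placeholder file standing in for the images and CSV.
    (dataset_dir / "example.txt").write_text("placeholder")
    exists_inside = dataset_dir.exists()

# The directory and everything in it are deleted when the block exits.
exists_after = dataset_dir.exists()
```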