## understanding the task

users upload new images on a daily basis

we need to ingest them and prepare them for further use and processing

let's assume it's a dish classification use case

points to consider:
* we expect to have an existing main image pool, which we plan to extend with new images

* we ingest images daily
    * on a schedule, or via some other trigger?

* data preparation
    * rescale so that the longest side is at most 512 pixels

* historization
    * unlikely to be needed

* where to store images and labels

* workflow
    * ingest and store as raw
    * process and store as rescaled
    * run through the ML model and store as labeled
    * manually validate?
    * extend the main image pool
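
The data preparation step in the list above could be sketched as follows. This is a minimal sketch using Pillow, not a settled implementation: the helper name `rescale_max_side`, the choice of Lanczos resampling, and the decision never to upscale images that are already small enough are all assumptions.

```python
from PIL import Image

MAX_SIDE = 512  # longest allowed side, taken from the data preparation note


def rescale_max_side(img: Image.Image, max_side: int = MAX_SIDE) -> Image.Image:
    """Return a copy of img whose longest side is at most max_side pixels.

    The aspect ratio is preserved. Images that already fit are copied
    unchanged (assumption: we never upscale).
    """
    width, height = img.size
    longest = max(width, height)
    if longest <= max_side:
        return img.copy()
    scale = max_side / longest
    new_size = (max(1, round(width * scale)), max(1, round(height * scale)))
    return img.resize(new_size, Image.LANCZOS)


# Example with a synthetic image, so no files are needed
landscape = Image.new("RGB", (1024, 768))
print(rescale_max_side(landscape).size)  # (512, 384)
```

Pillow's `Image.thumbnail` does something similar in place. A separate function that returns a new image keeps the raw image untouched, which fits the raw / rescaled split in the workflow.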



In [1]:
```python
import os

from datasets import load_dataset

# --- Settings ---
CLASSES = ["pizza", "sushi"]   # the two classes to extract
N = 10                         # number of images to save per class
OUT_DIR = "../data/raw"
# ----------------

# Load the full train split; the loop below stops as soon as
# N images per class have been saved
ds = load_dataset("ethz/food101", split="train")

# Map label IDs to names
label_names = ds.features["label"].names

# Ensure output folders exist
for cls in CLASSES:
    os.makedirs(os.path.join(OUT_DIR, cls), exist_ok=True)

# Save images
saved_counts = {cls: 0 for cls in CLASSES}

for ex in ds:
    label = label_names[ex["label"]]
    if label in CLASSES and saved_counts[label] < N:
        idx = saved_counts[label] + 1
        fname = f"{idx:02d}_{label}.jpg"
        path = os.path.join(OUT_DIR, label, fname)
        # JPEG cannot store alpha or palette modes, so normalize to RGB first
        ex["image"].convert("RGB").save(path)
        saved_counts[label] += 1
        print("Saved", path)
    # Stop once every class has N images
    if all(saved_counts[c] >= N for c in CLASSES):
        break

print("Done! Saved:", saved_counts)
```

```text
Saved ../data/raw/pizza/01_pizza.jpg
Saved ../data/raw/pizza/02_pizza.jpg
Saved ../data/raw/pizza/03_pizza.jpg
Saved ../data/raw/pizza/04_pizza.jpg
Saved ../data/raw/pizza/05_pizza.jpg
Saved ../data/raw/pizza/06_pizza.jpg
Saved ../data/raw/pizza/07_pizza.jpg
Saved ../data/raw/pizza/08_pizza.jpg
Saved ../data/raw/pizza/09_pizza.jpg
Saved ../data/raw/pizza/10_pizza.jpg
Saved ../data/raw/sushi/01_sushi.jpg
Saved ../data/raw/sushi/02_sushi.jpg
Saved ../data/raw/sushi/03_sushi.jpg
Saved ../data/raw/sushi/04_sushi.jpg
Saved ../data/raw/sushi/05_sushi.jpg
Saved ../data/raw/sushi/06_sushi.jpg
Saved ../data/raw/sushi/07_sushi.jpg
Saved ../data/raw/sushi/08_sushi.jpg
Saved ../data/raw/sushi/09_sushi.jpg
Saved ../data/raw/sushi/10_sushi.jpg
Done! Saved: {'pizza': 10, 'sushi': 10}
```
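
With the raw images laid out as `<class>/<file>.jpg`, one possible answer to the "where to store images and labels" question is a small CSV manifest next to the image folders. The sketch below is only an illustration: the helper name `build_manifest`, the column names `path` and `label`, and the idea of deriving the label from the parent folder name are assumptions, not decisions made in these notes.

```python
import csv
from pathlib import Path


def build_manifest(raw_dir, manifest_path):
    """Write one CSV row per image under raw_dir and return the rows.

    Each row holds the image path relative to raw_dir and a label taken
    from the parent folder name (assumed layout: raw_dir/<label>/<file>.jpg).
    """
    raw_dir = Path(raw_dir)
    rows = [
        {"path": p.relative_to(raw_dir).as_posix(), "label": p.parent.name}
        for p in sorted(raw_dir.glob("*/*.jpg"))
    ]
    with open(manifest_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For the folders created above, a call such as `build_manifest("../data/raw", "../data/raw_manifest.csv")` (the manifest filename is hypothetical) would list the 20 saved images with their `pizza` or `sushi` label. Keeping labels in a separate file means the later labeled stage can rewrite them without moving image files around.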
