# AutoChef

## Prepare environment and process data

Data Source: https://www.kaggle.com/datasets/irkaal/foodcom-recipes-and-reviews?resource=download

Data Source v2: https://app.roboflow.com/bens-workspace-3xdyh/fridge-detection-aymme/browse?queryText=&pageSize=50&startingIndex=0&browseQuery=true

In [1]:
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification
import kagglehub
import pandas as pd
import numpy
import re
import PIL.Image
from ultralytics import YOLO
import torch

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.3.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\traitlets\config\application.py", line 1075, in launch_instance
    app.start()
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\ipyke

In [None]:
!pip install torch torchvision torchaudio

In [None]:


processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = AutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32")

In [None]:


path = kagglehub.dataset_download("irkaal/foodcom-recipes-and-reviews")

print("Path to dataset files:", path)

In [None]:
recipes = pd.read_csv(path + "/recipes.csv")

In [None]:
print(recipes.shape)
display(recipes.head())
print(recipes.columns)

In [None]:

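# Convert R-style vector strings, e.g. c("a", "b"), to Python lists for the 'RecipeIngredientParts' column
def r_vector_to_list(s):
    # Missing values arrive as NaN (a float), so return an empty list for non-strings
    if not isinstance(s, str):
        return []
    # Remove the leading c( and the trailing )
    s = re.sub(r'^c\(|\)$', '', s.strip())
    # Prefer double-quoted items so a comma inside an item does not split it
    quoted = re.findall(r'"([^"]*)"', s)
    if quoted:
        return [item.strip() for item in quoted]
    # Otherwise split by comma, strip quotes and whitespace
    return [item.strip().strip('"').strip("'") for item in s.split(',') if item.strip()]

recipes['RecipeIngredientParts'] = recipes['RecipeIngredientParts'].apply(r_vector_to_list)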

## Core functionality

In [None]:
# Drop NaN entries (produced by empty ingredient lists) so only strings reach CLIP
all_ingredients = recipes['RecipeIngredientParts'].explode().dropna().unique().tolist()
print(len(all_ingredients))

Use CLIP to match images to ingredients

In [None]:
test_image = PIL.Image.open("fridge_test.jpg").convert("RGB")

batch_size = 100
ingredient_scores = []

# Score the image against the ingredient names in batches to limit memory use
for i in range(0, len(all_ingredients), batch_size):
    batch_ingredients = all_ingredients[i:i + batch_size]
    inputs = processor(text=batch_ingredients, images=test_image, return_tensors="pt", padding=True, truncation=True)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    scores = outputs.logits_per_image[0].cpu().numpy()
    ingredient_scores.extend(zip(batch_ingredients, scores))

In [None]:
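# Get the ingredients likely present in the image, sorted by score
# The cutoff is a hand-picked value on CLIP's image-text logits, not a calibrated probability
SCORE_THRESHOLD = 20
ingredient_scores.sort(key=lambda x: x[1], reverse=True)
top_ingredients = [(ingredient, score) for ingredient, score in ingredient_scores if score > SCORE_THRESHOLD]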
print("Top ingredients in the image:")
for ingredient, score in top_ingredients:
    print(f"{ingredient}: {score:.4f}")



Use the dataset to find recipes that match a set of ingredients
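
The notebook does not yet implement this step. As a minimal sketch, recipes could be ranked by the share of their ingredients that are already on hand. The function name `rank_recipes`, the `min_coverage` cutoff, and the toy data are all hypothetical; with the real dataset the pairs might come from something like `zip(recipes['Name'], recipes['RecipeIngredientParts'])`, assuming the recipe title column is called `Name`.

```python
def rank_recipes(recipes, available, min_coverage=0.5):
    """Rank (name, ingredients) pairs by the share of ingredients already on hand."""
    have = {item.strip().lower() for item in available}
    ranked = []
    for name, ingredients in recipes:
        needed = {item.strip().lower() for item in ingredients if isinstance(item, str)}
        if not needed:
            continue  # skip recipes with no parsed ingredients
        coverage = len(needed & have) / len(needed)
        if coverage >= min_coverage:
            ranked.append((name, coverage, sorted(needed - have)))
    # Highest coverage first; ties broken by fewest missing ingredients
    ranked.sort(key=lambda row: (-row[1], len(row[2])))
    return ranked


# Toy data standing in for the real recipe table (hypothetical values)
toy_recipes = [
    ("Omelette", ["eggs", "butter", "cheese"]),
    ("Pancakes", ["flour", "eggs", "milk", "butter"]),
    ("Salsa", ["tomato", "onion", "cilantro"]),
]
matches = rank_recipes(toy_recipes, ["Eggs", "butter", "milk"])
for name, coverage, missing in matches:
    print(f"{name}: {coverage:.0%} covered, missing {missing}")
```

Matching here is exact after lowercasing, which should be adequate when the detected ingredient names are drawn from the dataset's own vocabulary, as they are in the CLIP step. Labels from a different source would likely need some normalization first.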

## Take 2 - use YOLO for a simplified approach

In [11]:

print("CUDA Available:", torch.cuda.is_available())
print("Device Name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")

print(torch.__version__)
print(torch.version.cuda)


CUDA Available: True
Device Name: NVIDIA GeForce GTX 1660
2.3.1+cu118
11.8
Torch: 2.3.1+cu118
Torchvision: 0.18.1+cu118
CUDA available: True
tensor([0], device='cuda:0')


In [2]:
model = YOLO("yolov8n.pt")
model.train(data="Fridge detection.v1i.yolov8/data.yaml", epochs=50, imgsz=640)

Ultralytics 8.3.174 Python-3.11.9 torch-2.3.1+cu118 CUDA:0 (NVIDIA GeForce GTX 1660, 6144MiB)
engine\trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=Fridge detection.v1i.yolov8/data.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=50, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train9, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0

train: Scanning C:\Users\bps78\Documents\GitHub\AutoDJ\Fridge detection.v1i.yolov8\train\labels.cache... 2232 images, 0 backgrounds, 0 corrupt: 100%|██████████| 2232/2232 [00:00<?, ?it/s]

val: Fast image access (ping: 0.1±0.1 ms, read: 178.7±28.8 MB/s, size: 34.8 KB)

val: Scanning C:\Users\bps78\Documents\GitHub\AutoDJ\Fridge detection.v1i.yolov8\valid\labels.cache... 103 images, 0 backgrounds, 0 corrupt: 100%|██████████| 103/103 [00:00<?, ?it/s]

Plotting labels to C:\Users\bps78\Documents\GitHub\shot-tracer\runs\detect\train9\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000294, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to C:\Users\bps78\Documents\GitHub\shot-tracer\runs\detect\train9
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


  0%|          | 0/140 [00:00<?, ?it/s]


RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\ultralytics\data\base.py", line 379, in __getitem__
    return self.transforms(self.get_image_and_label(index))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\ultralytics\data\augment.py", line 202, in __call__
    data = t(data)
           ^^^^^^^
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\ultralytics\data\augment.py", line 2192, in __call__
    labels["img"] = self._format_img(img)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bps78\Documents\GitHub\AutoDJ\.venv\Lib\site-packages\ultralytics\data\augment.py", line 2243, in _format_img
    img = torch.from_numpy(img)
          ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Numpy is not available
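
The `RuntimeError: Numpy is not available` above is consistent with the warning printed after the first cell: NumPy 2.3.2 is installed, while at least one compiled module in the environment was built against NumPy 1.x. That warning itself suggests downgrading to `numpy<2`. The following is a small sketch for checking the installed version before retrying the training cell; the helper name `needs_numpy_downgrade` is hypothetical, and the check assumes the 1.x-built module is the cause.

```python
import importlib.metadata


def needs_numpy_downgrade(numpy_version: str) -> bool:
    """Return True when the NumPy major version is 2 or higher."""
    major = int(numpy_version.split(".")[0])
    return major >= 2


try:
    installed = importlib.metadata.version("numpy")
except importlib.metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("NumPy is not installed in this environment")
elif needs_numpy_downgrade(installed):
    # Follows the suggestion in the warning text; restart the kernel afterwards
    print(f'NumPy {installed} found; consider: pip install "numpy<2", then restart the kernel')
else:
    print(f"NumPy {installed} should work with modules compiled against NumPy 1.x")
```

Upgrading the affected module to a build that supports NumPy 2 is the alternative the warning mentions; which route is preferable depends on what else in the environment requires NumPy 2.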
