# Embedding demo (GPU/CUDA)

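This notebook mirrors the CPU embedding flow but runs the truncated SVD step on the GPU with CuPy/cuML. Co-occurrence counting and the cosine-similarity recommendations still run on the CPU. It loads the supermarket CSV directly (no SQLite).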

Prerequisites in the runtime:
- GPU available (e.g., Colab runtime set to GPU)
- `cupy` (built for the installed CUDA version) and `cuml` installed

If the CuPy/cuML imports fail, the notebook raises an ImportError. If no CUDA GPU is detected, it raises a RuntimeError.

In [1]:
!nvcc --version
!nvidia-smi

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
Sun Dec 14 10:24:05 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   76C    P8             14W /   70W |       0MiB /  15360MiB |      0%      Default |
|                       

In [2]:
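# Clone the project repo. On a re-run the directory already exists, so git
# prints the "fatal" message below; the existing checkout is then used as-is.
!git clone https://github.com/JacobRhys/shopping-data-procesor.git /content/shopping-data-procesor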

fatal: destination path '/content/shopping-data-procesor' already exists and is not an empty directory.


In [3]:
# %pip install cupy-cuda12x

In [4]:
# %pip install cuml-cu12 --extra-index-url=https://pypi.nvidia.com

In [5]:
import sys
from pathlib import Path

import pandas as pd
from google.colab import files, drive  # Colab-only helpers

# Fail fast with a clear message if the GPU libraries are missing.
try:
    import cupy as cp
    from cuml.decomposition import TruncatedSVD
except Exception as exc:
    raise ImportError(
        "CUDA/CuPy/cuML required. Install cupy & cuml for your CUDA version."
    ) from exc

# Make the cloned repo importable.
ROOT = Path("/content/shopping-data-procesor")
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

from app import CoOccurrenceStore

In [6]:
# Check GPU
try:
    _ = cp.cuda.Device(0).compute_capability
except Exception as exc:
    raise RuntimeError("No CUDA GPU detected.") from exc

In [7]:
!ls "/content/"

sample_data  shopping-data-procesor  Supermarket_dataset_PAI.csv


In [8]:
df = pd.read_csv("/content/shopping-data-procesor/data/Supermarket_dataset_PAI.csv")
df.head()

Unnamed: 0,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk


In [10]:
# Build co-occurrence store from CSV (CPU-side aggregation).
# Each (Member_number, Date) pair is treated as one basket.
store = CoOccurrenceStore()
for _, group in df.groupby(["Member_number", "Date"]):
    basket_items = pd.unique(group["itemDescription"])  # de-duplicate within a basket
    store.add_transaction(basket_items)
items = store.items()
len(items)

167
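
The aggregation above depends on `CoOccurrenceStore`, whose internals are not shown in this notebook. As a rough illustration of the idea only, the sketch below counts how often each unordered pair of items shares a basket. The function name `count_cooccurrences` and the toy baskets are hypothetical and are not part of the repo.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        unique_items = sorted(set(basket))  # ignore duplicates within a basket
        for a, b in combinations(unique_items, 2):
            pair_counts[(a, b)] += 1  # keys are alphabetically ordered pairs
    return pair_counts


# Toy baskets (made up for illustration, not taken from the dataset).
toy_baskets = [
    ["whole milk", "yogurt"],
    ["whole milk", "yogurt", "tropical fruit"],
    ["whole milk", "tropical fruit"],
]
toy_pair_counts = count_cooccurrences(toy_baskets)
```

Sorting each basket before pairing keeps every pair under a single key, so `("whole milk", "yogurt")` and `("yogurt", "whole milk")` are never counted separately.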

In [11]:
# Build dense matrix on CPU, then move to GPU
from app.embedding_cpu import build_dense_matrix  # reuse builder

mat_cpu, items = build_dense_matrix(store)
mat_gpu = cp.asarray(mat_cpu, dtype=cp.float32)
mat_gpu.shape

(167, 167)
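
`build_dense_matrix` comes from the repo and its exact behaviour (diagonal handling, any normalisation) is not visible here. The sketch below shows one plausible way to turn pair counts into a symmetric item-by-item matrix. The helper `dense_from_pairs` and the toy inputs are assumptions made for illustration.

```python
import numpy as np


def dense_from_pairs(pair_counts, items):
    """Build a symmetric item-by-item count matrix (hypothetical helper)."""
    index = {name: i for i, name in enumerate(items)}
    mat = np.zeros((len(items), len(items)), dtype=np.float32)
    for (a, b), count in pair_counts.items():
        mat[index[a], index[b]] = count
        mat[index[b], index[a]] = count  # co-occurrence is symmetric
    return mat


# Toy inputs (made up for illustration).
toy_items = ["tropical fruit", "whole milk", "yogurt"]
toy_pairs = {
    ("tropical fruit", "whole milk"): 2,
    ("whole milk", "yogurt"): 2,
    ("tropical fruit", "yogurt"): 1,
}
toy_mat = dense_from_pairs(toy_pairs, toy_items)
```

In the notebook, the equivalent NumPy array is then copied to device memory with `cp.asarray(..., dtype=cp.float32)`.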

In [12]:
# Truncated SVD on GPU
k = min(20, mat_gpu.shape[0])
svd = TruncatedSVD(n_components=k, random_state=42)
emb_gpu = svd.fit_transform(mat_gpu)
emb_gpu.shape

(167, 20)
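
For intuition about what the GPU step produces, here is a CPU-only sketch of a rank-k embedding computed from a full SVD in NumPy. It is not cuML's implementation: cuML may use a different solver, and individual columns can differ in sign. The function name `truncated_svd_embed` and the random test matrix are assumptions.

```python
import numpy as np


def truncated_svd_embed(mat, k):
    """Return rank-k row embeddings U_k * S_k computed from a full SVD."""
    u, s, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :k] * s[:k]  # scale each kept left singular vector by its singular value


# Small symmetric matrix standing in for a co-occurrence matrix.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
sym = (a + a.T).astype(np.float32)
emb_toy = truncated_svd_embed(sym, k=2)
```

Each row of `emb_toy` is one item's k-dimensional vector, which is the same shape convention as `emb_gpu` above (items by components).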

In [13]:
# Move embeddings to CPU numpy for cosine calculations
import numpy as np

emb = cp.asnumpy(emb_gpu)
emb.shape

(167, 20)

In [14]:
# Recommendations using existing CPU helpers (cosine on numpy)
from app.embedding_cpu import recommend_for_item, recommend_for_basket

query_item = "whole milk"
basket = ["whole milk", "yogurt"]

item_recs = recommend_for_item(items, emb, query_item, top_k=5)
basket_recs = recommend_for_basket(items, emb, basket, top_k=5)
item_recs, basket_recs

([('mayonnaise', 0.9400085210800171),
  ('red/blush wine', 0.9302767515182495),
  ('salty snack', 0.9254679083824158),
  ('misc. beverages', 0.92173832654953),
  ('specialty chocolate', 0.9116060137748718)],
 [('red/blush wine', 0.9352493286132812),
  ('dessert', 0.9329813718795776),
  ('salty snack', 0.9329189658164978),
  ('dishes', 0.9302870631217957),
  ('hygiene articles', 0.9267566800117493)])
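
The scores above are cosine similarities between embedding rows. The repo's `recommend_for_item` is not shown, so the following is only a minimal sketch of that kind of ranking. The name `top_k_similar` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_similar(items, emb, query, top_k=5):
    """Rank items by cosine similarity to the query item, excluding the query itself."""
    idx = items.index(query)
    norms = np.linalg.norm(emb, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = emb / norms[:, None]
    sims = unit @ unit[idx]  # cosine similarity of every row to the query row
    order = [i for i in np.argsort(-sims) if i != idx][:top_k]
    return [(items[i], float(sims[i])) for i in order]


# Toy embedding (made up for illustration).
toy_names = ["a", "b", "c"]
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
toy_recs = top_k_similar(toy_names, toy_emb, "a", top_k=2)
```

A basket query could be handled in a similar way, for example by averaging the basket's item vectors before ranking. Whether `recommend_for_basket` does this is an assumption.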

In [15]:
# Sanity checks
assert item_recs, "No item recommendations returned"
assert basket_recs, "No basket recommendations returned"
assert query_item not in {name for name, _ in item_recs}
assert not ({*basket} & {name for name, _ in basket_recs})
print("GPU embedding pipeline sanity checks passed.")

GPU embedding pipeline sanity checks passed.
