# Serve Recommendations Notebook

This notebook mirrors `src/inference/serve_recommendations.py` and lets you:

- Load the trained two-tower SBERT model and product corpus.
- Run ad-hoc queries to get product recommendations. Retrieval is embedding-based: the same encoder embeds queries and products, and the top-k products are ranked by **cosine similarity** (no text generation).
- Optionally pull queries from `processed/.../eval_queries.json` by ID.

**Embedding index:** Product embeddings are cached on disk under `<corpus_parent>/.embedding_index/` (per model + corpus) so they are not recomputed every run. Use `use_index=False` in `load_recommender()` to disable loading/saving the cache.
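The on-disk cache can be pictured as load-or-compute keyed by a hash. This is a sketch under assumptions: the 16-hex-character directory name in the log below (e.g. `8f5c32386f084da2`) suggests a hashed key, but the fields hashed here (resolved model dir, corpus path, corpus mtime and size), the `embeddings.npy` file name and both function names are hypothetical, not taken from `serve_recommendations.py`.

```python
import hashlib
from pathlib import Path
from typing import Callable

import numpy as np


def index_key(model_dir: Path, corpus_path: Path) -> str:
    """Hypothetical cache key: short hash of model dir, corpus path and corpus mtime/size."""
    stat = corpus_path.stat()
    raw = f"{model_dir.resolve()}|{corpus_path.resolve()}|{stat.st_mtime_ns}|{stat.st_size}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def load_or_build_index(
    model_dir: Path,
    corpus_path: Path,
    encode: Callable[[], np.ndarray],
    use_index: bool = True,
) -> np.ndarray:
    """Return cached product embeddings if present, else encode and (optionally) save them."""
    if not use_index:
        return encode()  # cache disabled: always recompute
    index_dir = corpus_path.parent / ".embedding_index" / index_key(model_dir, corpus_path)
    emb_file = index_dir / "embeddings.npy"
    if emb_file.exists():
        return np.load(emb_file)
    embs = encode()
    index_dir.mkdir(parents=True, exist_ok=True)
    np.save(emb_file, embs)
    return embs
```

Hashing the corpus mtime and size gives an edited corpus a fresh index. Keying the model only by its path would not notice a retrained model saved to the same directory, so also hashing the weights file would be safer.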

Adjust paths as needed if you move the project or models.

In [6]:
from pathlib import Path
import json
import logging

from src.constants import DEFAULT_CORPUS_PATH, DEFAULT_MODEL_DIR, DEFAULT_PROCESSED_DIR, PROJECT_ROOT
from src.inference.serve_recommendations import load_recommender
from src.utils import resolve_processed_dir

logging.basicConfig(level=logging.INFO, format="%(message)s")

# Data prep writes into a parameter-specific subdir (e.g. processed/p5_mp20_ef0.1/); resolve it so corpus_path points at an existing file.
processed_dir, _ = resolve_processed_dir(DEFAULT_PROCESSED_DIR, DEFAULT_PROCESSED_DIR)
corpus_path = processed_dir / "eval_corpus.json"

PROJECT_ROOT, DEFAULT_MODEL_DIR, corpus_path

(PosixPath('/Users/chen_bowen/AI & ML/Projects/Instacart_Personalization'),
 PosixPath('/Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/models/two_tower_sbert/final'),
 PosixPath('/Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/processed/p5_mp20_ef0.1/eval_corpus.json'))
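`resolve_processed_dir` is the project's own helper, and its exact rules are not shown in this notebook. As a hypothetical illustration of the idea in the cell comment above, `find_param_subdir` looks for `eval_corpus.json` directly in the base dir, then in its immediate subdirectories, and refuses to guess when several parameter subdirs match:

```python
from pathlib import Path


def find_param_subdir(base: Path, filename: str = "eval_corpus.json") -> Path:
    """Hypothetical resolver: return the dir under `base` that holds `filename`."""
    if (base / filename).exists():
        return base  # data was written directly into the base dir
    candidates = sorted(p for p in base.iterdir() if p.is_dir() and (p / filename).exists())
    if not candidates:
        raise FileNotFoundError(f"No {filename} in {base} or its immediate subdirectories")
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise ValueError(f"Multiple param subdirs contain {filename}: {names}; pick one explicitly")
    return candidates[0]
```

Raising on ambiguity, rather than picking the newest subdir, keeps a stale run from being evaluated silently.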

In [7]:
# Model directory; corpus_path was set in the previous cell from the resolved processed subdir.
model_dir = DEFAULT_MODEL_DIR

print("Model dir:", model_dir)
print("Corpus path:", corpus_path)

# use_index=True (default): load/save product embeddings in .embedding_index/ to avoid recomputing each run
rec = load_recommender(
    model_dir=model_dir,
    corpus_path=corpus_path,
    use_index=True,  # set False to recompute product embeddings every time
)
rec

Use pytorch device_name: mps
Load pretrained SentenceTransformer: /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/models/two_tower_sbert/final


Model dir: /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/models/two_tower_sbert/final
Corpus path: /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/processed/p5_mp20_ef0.1/eval_corpus.json



Saved embedding index to /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/processed/p5_mp20_ef0.1/.embedding_index/8f5c32386f084da2 (49688 products)
Loaded model from /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/models/two_tower_sbert/final, corpus 49688 products from /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/processed/p5_mp20_ef0.1/eval_corpus.json


<src.inference.serve_recommendations.Recommender at 0x13540f320>
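Each product in the loaded corpus is embedded as a short text. Judging from the recommendations printed in this notebook, the template looks like `Product: <name>. Aisle: <aisle>. Department: <department>.`, and the sketch below reproduces that string. The row field names (`product_id`, `product_name`, `aisle`, `department`) are assumptions modelled on the Instacart CSV columns after joining products with the aisle and department tables; they are not the project's data-prep code.

```python
def product_text(name: str, aisle: str, department: str) -> str:
    """Assumed corpus template, inferred from the printed recommendations."""
    return f"Product: {name}. Aisle: {aisle}. Department: {department}."


def build_pid_to_text(rows: list[dict]) -> dict[str, str]:
    """Map product_id (as str) to its corpus text; rows are hypothetical joined records."""
    return {
        str(r["product_id"]): product_text(r["product_name"], r["aisle"], r["department"])
        for r in rows
    }
```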

In [10]:
def recommend_notebook(
    query: str,
    top_k: int = 10,
    exclude_product_ids: set[str] | None = None,
    max_chars: int | None = None,
):
    """Get and pretty-print recommendations (top-k by cosine similarity) from `rec`.

    Product texts longer than `max_chars` are truncated with "..."; None prints them in full.
    """
    results = rec.recommend(query=query, top_k=top_k, exclude_product_ids=exclude_product_ids)
    print(f"Top-{top_k} recommendations:")
    for i, (pid, score) in enumerate(results, 1):
        text = rec.pid_to_text[pid]
        if max_chars is not None and len(text) > max_chars:
            text = text[:max_chars] + "..."
        print(f"  {i}. product_id={pid} (score={score:.4f}) {text}")
    return results

# Example demo query (same as in src.inference.serve_recommendations.main)
demo_query = "[+7d w4h14] Organic Milk, Whole Wheat Bread."
results = recommend_notebook(demo_query, top_k=10)

Top-10 recommendations:
  1. product_id=13517 (score=0.7639) Product: Whole Wheat Bread. Aisle: bread. Department: bakery....
  2. product_id=34479 (score=0.7101) Product: Whole Wheat Walnut Bread. Aisle: bread. Department: bakery....
  3. product_id=48628 (score=0.7062) Product: Organic Whole Wheat Bread. Aisle: bread. Department: bakery....
  4. product_id=1463 (score=0.6928) Product: Organic Milk. Aisle: milk. Department: dairy eggs....
  5. product_id=16490 (score=0.6510) Product: Old Fashioned Whole Wheat Bread. Aisle: bread. Department: bakery....
  6. product_id=6454 (score=0.6351) Product: Whole Wheat Bread Loaf. Aisle: bread. Department: bakery....
  7. product_id=16611 (score=0.6227) Product: Milk. Aisle: milk. Department: dairy eggs....
  8. product_id=44103 (score=0.6177) Product: Honey Whole Wheat Bread. Aisle: bread. Department: bakery....
  9. product_id=14682 (score=0.6075) Product: Whole Grain Bread. Aisle: bread. Department: bakery....
  10. product_id=46988 (score=0.
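Ad-hoc queries follow the bracketed format of `demo_query`: a header such as `[+7d w4h14]` followed by comma-separated product names, with several orders joined by `; ` and a final period (compare the eval query printed further down). Reading `+7d` as days since the prior order, `w4` as weekday and `h14` as hour of day is an assumption; the hypothetical `Order` and `format_query` below only reproduce the string shape.

```python
from dataclasses import dataclass


@dataclass
class Order:
    days_since_prior: int  # assumed meaning of "+Nd"
    weekday: int           # assumed meaning of "wN"
    hour: int              # assumed meaning of "hN"
    products: list[str]


def format_query(orders: list[Order]) -> str:
    """Build a query string in the bracketed multi-order format (hypothetical helper)."""
    segments = [
        f"[+{o.days_since_prior}d w{o.weekday}h{o.hour}] " + ", ".join(o.products)
        for o in orders
    ]
    # Orders are joined with "; " and the whole query ends with a period.
    return "; ".join(segments) + "."
```

For example, `format_query([Order(7, 4, 14, ["Organic Milk", "Whole Wheat Bread"])])` yields the same string as `demo_query`, so it can be passed straight to `recommend_notebook`.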

In [9]:
# Optional: look up a query by ID in eval_queries.json (same processed subdir as the corpus)
eval_queries_path = corpus_path.parent / "eval_queries.json"
print("Eval queries path:", eval_queries_path)

if eval_queries_path.exists():
    with open(eval_queries_path, "r") as f:
        eval_queries = json.load(f)
    print(f"Loaded {len(eval_queries)} eval queries")

    # Inspect the first eval query ID (replace sample_id with any ID of interest)
    sample_id = next(iter(eval_queries.keys()))
    query = eval_queries[sample_id]
    print(f"Sample eval_query_id={sample_id}\n{query}\n")

    _ = recommend_notebook(query, top_k=10)
else:
    print("No eval_queries.json found next to the corpus; skipping the eval-query example.")

Eval queries path: /Users/chen_bowen/AI & ML/Projects/Instacart_Personalization/processed/p5_mp20_ef0.1/eval_queries.json
Loaded 13120 eval queries
Sample eval_query_id=3178496
[+30d w5h23] Chicken Base, Organic, Lemonade, Black Bean Garlic Sauce; [+8d w6h1] Wild Sardines in Extra Virgin Olive Oil, Pulp Free Orange Juice, Organic Multigrain with Flax English Muffins, Organic Baby Spinach, Organic Strawberries, Organic Lemon, Organic Edamame, Organic Red Grape Tomato, Organic Dill Weed; [+6d w5h21] Organic Baby Spinach, Organic Strawberries, Organic Lemon, Pulp Free Orange Juice, Organic Dill Weed, Country Dijon Mustard, Panko Bread Crumbs, Duck Liver Mousse w/ Cognac.



Top-10 recommendations:
  1. product_id=22170 (score=0.6053) Product: Organic Multigrain with Flax English Muffins. Aisle...
  2. product_id=34044 (score=0.5992) Product: Organic Orange Juice with Pulp. Aisle: refrigerated...
  3. product_id=39108 (score=0.5956) Product: Pulp Free Orange Juice. Aisle: refrigerated. Depart...
  4. product_id=47731 (score=0.5768) Product: Pulp Free Orange Juice With Tangerine. Aisle: refri...
  5. product_id=23034 (score=0.5758) Product: Juice, Original + Honey, Exposed, Pulp Free. Aisle:...
  6. product_id=17326 (score=0.5720) Product: Natural Wild Caught Brisling Sardines in Extra Virg...
  7. product_id=21683 (score=0.5675) Product: Organic Multigrain English Muffins. Aisle: buns rol...
  8. product_id=12331 (score=0.5669) Product: Wild Sardines in Extra Virgin Olive Oil with Lemon....
  9. product_id=45079 (score=0.5666) Product: Blood Orange Lemonade. Aisle: juice nectars. Depart...
  10. product_id=3585 (score=0.5653) Product: Exposed Pulp and Juic
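Going the other way, a multi-order eval query like the one above can be split back into per-order segments. A hypothetical `parse_query` sketch using the standard-library `re` module; the meaning of the three header numbers is again assumed, and each order's products are kept as raw text because names such as `Chicken Base, Organic` themselves contain commas.

```python
import re

# One order segment: header like "[+8d w6h1]" followed by its products (raw text).
SEGMENT_RE = re.compile(r"\[\+(\d+)d w(\d+)h(\d+)\] (.*)", re.DOTALL)
# Split only on "; " that is followed by another header (non-capturing lookahead).
BOUNDARY_RE = re.compile(r"; (?=\[\+\d+d w\d+h\d+\] )")


def parse_query(query: str) -> list[tuple[int, int, int, str]]:
    """Split a query into (days, weekday, hour, products_text) tuples (hypothetical helper)."""
    parsed = []
    for seg in BOUNDARY_RE.split(query.removesuffix(".")):
        m = SEGMENT_RE.fullmatch(seg)
        if m is None:
            raise ValueError(f"Unrecognised segment: {seg!r}")
        days, weekday, hour, products = m.groups()
        parsed.append((int(days), int(weekday), int(hour), products))
    return parsed
```

Splitting only at `; ` boundaries that precede a new header avoids breaking orders apart at semicolons or commas inside product names.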