# Vibe Matcher Prototype

This notebook demonstrates a small recommender that matches a user's **vibe query** to fashion products using embeddings + cosine similarity.

It supports using OpenAI's `text-embedding-ada-002`


## 1) Data Preparation
We'll create a small Pandas DataFrame with mock products (5-10 items).

In [24]:
import pandas as pd

products = [
    {"name": "Boho Dress", "desc": "Flowy, earthy tones perfect for music festivals and beach vibes.", "vibes": ["boho","festival","earthy"]},
    {"name": "Urban Bomber Jacket", "desc": "Cropped bomber with reflective details — energetic urban streetwear.", "vibes": ["urban","street","energetic"]},
    {"name": "Cozy Knit Sweater", "desc": "Soft oversized knit for relaxed, cozy days at home or cafes.", "vibes": ["cozy","casual","comfort"]},
    {"name": "Minimal Slip Dress", "desc": "Sleek minimal slip dress for elegant evenings and modern formalwear.", "vibes": ["minimal","elegant","formal"]},
    {"name": "Sporty Runner Sneakers", "desc": "Lightweight sneakers built for energetic city runs and athleisure.", "vibes": ["sporty","energetic","athleisure"]},
    {"name": "Glam Sequin Top", "desc": "Sparkling sequin top designed for party nights and statement looks.", "vibes": ["party","glam","sparkle"]},
    {"name": "Vintage Denim Jacket", "desc": "Worn-in denim jacket with patched details, casual vintage charm.", "vibes": ["vintage","casual","retro"]}
]
df = pd.DataFrame(products)
df


Unnamed: 0,name,desc,vibes
0,Boho Dress,"Flowy, earthy tones perfect for music festival...","[boho, festival, earthy]"
1,Urban Bomber Jacket,Cropped bomber with reflective details — energ...,"[urban, street, energetic]"
2,Cozy Knit Sweater,"Soft oversized knit for relaxed, cozy days at ...","[cozy, casual, comfort]"
3,Minimal Slip Dress,Sleek minimal slip dress for elegant evenings ...,"[minimal, elegant, formal]"
4,Sporty Runner Sneakers,Lightweight sneakers built for energetic city ...,"[sporty, energetic, athleisure]"
5,Glam Sequin Top,Sparkling sequin top designed for party nights...,"[party, glam, sparkle]"
6,Vintage Denim Jacket,"Worn-in denim jacket with patched details, cas...","[vintage, casual, retro]"


## 2) Embeddings
This section shows two paths:
1. **OpenAI** embeddings (if you set `OPENAI_API_KEY`) using `text-embedding-ada-002`.
2. **Fallback TF-IDF** embeddings so the notebook runs offline.

The code below will try OpenAI first, and fall back automatically.


In [25]:
import os
os.environ['OPENAI_API_KEY'] = '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'

In [26]:
import os
import time
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

OPENAI_KEY = os.environ.get('OPENAI_API_KEY')  # or set manually
use_openai = False
embeddings = None

descs = df['desc'].tolist()
if OPENAI_KEY:
    try:
        import openai
        openai.api_key = OPENAI_KEY
        # Example: batch embeddings request (uncomment to use live OpenAI)
        # resp = openai.Embedding.create(model='text-embedding-ada-002', input=descs)
        # embeddings = np.array([r['embedding'] for r in resp['data']])
        use_openai = True
    except Exception as e:
        print('OpenAI import/call failed — falling back to TF-IDF. Error:', e)

if not use_openai:
    print('Using TF-IDF fallback embeddings (no OpenAI API key provided).')
    vect = TfidfVectorizer()
    embeddings = vect.fit_transform(descs)

print('Embeddings ready. Shape (if dense):', getattr(embeddings, 'shape', 'sparse'))


Embeddings ready. Shape (if dense): sparse


## 3) Vector Search — Cosine Similarity
Compute cosine similarity between a query and all product embeddings, then return top-3 matches.

In [27]:
def match_query(query, embeddings, top_k=3, threshold=0.4, use_vect=None):
    """
    If embeddings is a numpy array or scipy sparse matrix.
    If use_vect is provided (TfidfVectorizer), transform the query.
    """
    if use_vect is not None:
        qv = use_vect.transform([query])
    else:
        # assume embeddings were created from OpenAI and we need to call API for query
        # in that case, raise to indicate user should compute query embedding with OpenAI
        raise RuntimeError('No vectorizer provided for local query embedding. Provide OPENAI_API_KEY to use remote embeddings or run with TF-IDF fallback.')

    sims = cosine_similarity(qv, embeddings)[0]
    idxs = sims.argsort()[::-1]
    results = []
    for i in idxs[:top_k]:
        results.append({'name': df.iloc[i]['name'], 'desc': df.iloc[i]['desc'], 'score': float(sims[i])})
    # Fallback handling
    if results[0]['score'] < threshold:
        fallback_msg = 'No strong match found (top score < 0.40). Consider broadening your query.'
    else:
        fallback_msg = None
    return results, fallback_msg

# Example usage (uses the TF-IDF vectorizer if we're in fallback mode)
if 'vect' in globals():
    queries = ['energetic urban chic', 'cozy loungewear', 'party-ready sparkle']
    for q in queries:
        res, fb = match_query(q, embeddings, top_k=3, threshold=0.4, use_vect=vect)
        print('Query:', q)
        for r in res:
            print('  -', r['name'], f"(score={r['score']:.4f})")
        if fb:
            print('  Fallback:', fb)
        print()


Query: energetic urban chic
  - Boho Dress (score=1.0000)

Query: cozy loungewear
  - Boho Dress (score=1.0000)

Query: party-ready sparkle
  - Boho Dress (score=1.0000)



## 4) Testing, Evaluation & Latency
Run multiple queries, log similarity scores, and compute simple metrics (e.g., # of queries with top score > 0.7). We'll also measure latency with `time.perf_counter()`.

In [28]:
import time
queries = ['energetic urban chic', 'cozy loungewear', 'party-ready sparkle']
records = []
for q in queries:
    t0 = time.perf_counter()
    res, fb = match_query(q, embeddings, top_k=3, threshold=0.4, use_vect=vect)
    t1 = time.perf_counter()
    records.append({'query': q, 'top_score': res[0]['score'], 'latency_ms': (t1-t0)*1000})

import pandas as pd
pd.DataFrame(records)


Unnamed: 0,query,top_score,latency_ms
0,energetic urban chic,1.0,1.565959
1,cozy loungewear,1.0,1.237905
2,party-ready sparkle,1.0,1.189907


## 5) Reflection
Notes and possible improvements:
- Use OpenAI embeddings (higher semantic quality) and store them in a vector DB (Pinecone, FAISS) for speed & scale.
- Add product images + CLIP/multi-modal embeddings for richer matching.
- Personalization: include user history and re-rank results.
- Edge cases: very short queries, unseen slang — handle with query expansion or spell-correction.


----
### Notes
- This notebook is meant to be drop-in for Colab. If you want to use real OpenAI embeddings, set `OPENAI_API_KEY` in Colab by running `import os; os.environ['OPENAI_API_KEY']='sk-...'` before the embeddings cell.
