In [1]:
import json
import os

import faiss
import numpy as np
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

IDX_DIR = Path("index")
META_PATH = IDX_DIR / "meta.jsonl"
INDEX_PATH = IDX_DIR / "faiss.index"

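# One JSON object per line; skip blank lines so a trailing newline cannot break json.loads.
docs = [
    json.loads(line)
    for line in META_PATH.read_text(encoding="utf-8").splitlines()
    if line.strip()
]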
index = faiss.read_index(str(INDEX_PATH))

EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

In [2]:
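def embed_query(q: str):
    """Embed a query string and return it as a unit-length float32 vector."""
    e = client.embeddings.create(model=EMBED_MODEL, input=[q]).data[0].embedding
    v = np.array(e, dtype="float32")
    # Unit-normalize so that, on an inner-product index, scores are cosine similarities.
    return v / np.linalg.norm(v)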

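def search(query: str, k=6, brand=None, min_score=0.3):
    """Return up to k chunks for the query, optionally restricted to one brand."""
    v = embed_query(query)
    # Over-retrieve (2k) so the brand and min_score filters still leave enough hits.
    D, I = index.search(np.array([v]), k*2)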
    hits = []
    for score, idx in zip(D[0], I[0]):
        if idx < 0:
            continue
        d = docs[idx]
        if brand and d.get("brand") != brand:
            continue
        if score < min_score:
            continue
        d = d.copy()
        d["score"] = float(score)
        hits.append(d)
        if len(hits) >= k:
            break
    return hits
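
The over-retrieve-then-filter logic in `search` can be tried without an API key or a FAISS index. The following is a minimal sketch, not the notebook's actual retrieval path: it assumes the index holds unit-normalized vectors scored by inner product (which the normalization in `embed_query` and the `min_score` cutoff suggest, but the notebook does not state), and it replaces `index.search` with a brute-force NumPy dot product. The names `normalize`, `toy_search`, `toy_meta`, and `toy_vecs` are hypothetical and exist only for this illustration.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    return v / np.linalg.norm(v)

def toy_search(query_vec, doc_vecs, meta, k=2, brand=None, min_score=0.3):
    """Brute-force stand-in for index.search, followed by the same filters as search()."""
    scores = doc_vecs @ query_vec            # inner product against every row
    order = np.argsort(-scores)[: k * 2]     # over-retrieve, best score first
    hits = []
    for idx in order:
        d = meta[idx]
        if brand and d.get("brand") != brand:
            continue                         # wrong brand: drop
        if scores[idx] < min_score:
            continue                         # too dissimilar: drop
        hits.append({**d, "score": float(scores[idx])})
        if len(hits) >= k:
            break
    return hits

# Hypothetical toy data: three 2-D "embeddings" with brand metadata.
toy_meta = [{"brand": "A"}, {"brand": "B"}, {"brand": "A"}]
toy_vecs = np.stack([
    normalize(np.array(v, dtype="float32")) for v in ([1, 0], [1, 1], [0, 1])
])
toy_query = normalize(np.array([1.0, 0.2], dtype="float32"))
toy_hits = toy_search(toy_query, toy_vecs, toy_meta, k=2, brand="A")
print(toy_hits)
```

With `brand="A"`, the second vector is dropped by the brand filter and the third by `min_score`, so only one hit comes back even though `k=2`. The same thing can happen in `search`: a strict brand filter or a high `min_score` may return fewer than `k` chunks.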

In [3]:
SYSTEM = (
    "You are a careful assistant answering questions about consumer shampoos. "
    "ONLY use the provided context; if the answer isn't covered, say you don't know. "
    "Be concise, compare products when relevant, and include caveats (e.g., sensitive scalp)."
)

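def build_context(ctx_docs):
    """Format retrieved chunks as numbered blocks; parameter renamed to avoid shadowing the global docs list."""
    lines = []
    for i, d in enumerate(ctx_docs, start=1):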
        brand = d.get("brand", "?")
        src = d.get("source", "?")
        lines.append(f"[{i}] {src.upper()} | Brand: {brand} | score={d['score']:.3f}\n{d['content']}")
    return "\n\n".join(lines)

def answer(query: str, k=6, brand=None):
    ctx_docs = search(query, k=k, brand=brand)
    context = build_context(ctx_docs) if ctx_docs else "No context."
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}\n\nIf you cite, reference chunks by [number]."}
    ]
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        messages=messages,
        temperature=0.2,
        max_tokens=350
    )
    return resp.choices[0].message.content, ctx_docs
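
Because the prompt asks the model to cite chunks by `[number]`, it can be useful to check that every cited number points at a chunk that was actually retrieved. The sketch below is one possible check, not part of the original notebook; `cited_indices` and `invalid_citations` are hypothetical helper names, and the check only validates the number range, not whether the cited chunk supports the claim.

```python
import re

def cited_indices(answer_text: str) -> list[int]:
    """Return the sorted, de-duplicated chunk numbers cited as [n] in the answer."""
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer_text)})

def invalid_citations(answer_text: str, n_chunks: int) -> list[int]:
    """Return cited numbers that fall outside 1..n_chunks."""
    return [i for i in cited_indices(answer_text) if not 1 <= i <= n_chunks]

# Example with a made-up answer string and six retrieved chunks.
sample = "Product X is gentle [1][3]. Product Y targets flakes [2][7]."
print(cited_indices(sample))         # numbers the answer cites
print(invalid_citations(sample, 6))  # numbers with no matching chunk
```

In the notebook this could be called as `invalid_citations(ans, len(ctx))` after `answer(...)` returns.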

In [4]:
q1 = "Which shampoo seems best for dandruff and a sensitive scalp?"
ans, ctx = answer(q1, k=6)
print(ans)
print("\n— Sources —")
for i, d in enumerate(ctx, start=1):
    print(f"[{i}] {d['source']} | {d.get('brand','?')} | score={d['score']:.3f}")

For dandruff and a sensitive scalp, **Head & Shoulders Classic Clean Anti-Dandruff Shampoo** is specifically designed to combat flakes, itch, and dryness associated with dandruff, making it a strong choice for those dealing with these issues [2]. It is also gentle enough for daily use and safe for color-treated hair.

On the other hand, **CeraVe Hydrating Shampoo** is highly recommended for sensitive scalps due to its fragrance-free formula and lack of harsh surfactants. Users have reported it effectively keeps their scalp itch-free and cleanses without leaving residue [1][3][4]. However, it is not specifically marketed as an anti-dandruff shampoo.

If dandruff is the primary concern, Head & Shoulders may be more effective, while CeraVe is better suited for those with a sensitive scalp who want to avoid irritants.

— Sources —
[1] review | CeraVe | score=0.647
[2] description | Head & Shoulders | score=0.639
[3] review | CeraVe | score=0.630
[4] review | CeraVe | score=0.616
[5] review

In [5]:
q2 = "Does any product cause dryness or buildup according to reviews?"
ans, ctx = answer(q2, k=6)
print(ans)
for i, d in enumerate(ctx, start=1):
    print(f"[{i}] {d['source']} | {d.get('brand','?')} | score={d['score']:.3f}")

Yes, both Head & Shoulders and Dove's Dryness & Itch Relief dandruff shampoo are noted to cause dryness and buildup. The reviewer mentioned that these products left a "gross residue" on their hair, making it feel both dry and greasy, and they experienced itchiness by the next day [1]. Additionally, the Pantene shampoo was mentioned to potentially cause mild buildup on finer sections of hair, despite its benefits [5].
[1] review | CeraVe | score=0.471
[2] review | Dove | score=0.471
[3] review | CeraVe | score=0.462
[4] review | CeraVe | score=0.431
[5] review | Pantene | score=0.425
[6] review | CeraVe | score=0.425


In [6]:
q3 = "Compare Head & Shoulders vs CeraVe for oil control and color-treated hair."
ans, ctx = answer(q3, k=6)
print(ans)
for i, d in enumerate(ctx, start=1):
    print(f"[{i}] {d['source']} | {d.get('brand','?')} | score={d['score']:.3f}")

When comparing Head & Shoulders and CeraVe for oil control and color-treated hair, CeraVe appears to be the more favorable option, especially for those with sensitive scalps.

**Oil Control:**
- **CeraVe Gentle Hydrating Shampoo**: Users report that it effectively balances oil production, allowing them to go longer between washes without looking greasy. It has been noted to help with oiliness while maintaining scalp health, particularly for those who typically need to wash their hair daily due to oil buildup [1][2].
- **Head & Shoulders Classic Clean**: While it offers protection against oil and dryness, its primary focus is on dandruff treatment rather than oil control specifically. It is designed to combat flakes and itchiness associated with dandruff, which may not directly address oiliness as effectively as CeraVe [4].

**Color-Treated Hair:**
- **CeraVe**: This shampoo is explicitly stated to be safe for color-treated hair, with users confirming that it does not cause fading or du