# RAG Notebook: Tavily (via `TAVIL_API_KEY`) + Groq (`GROQ_API_KEY`)

This notebook:
- Loads **`TAVIL_API_KEY`** (falling back to `TAVILY_API_KEY`) and **`GROQ_API_KEY`** (plus an optional xAI `GROK_API_KEY`) from your environment.
- Runs a web search with Tavily.
- Normalizes the results, builds a context string, and asks **Groq** (or xAI Grok as a fallback) to generate a concise answer **with citations**.

## 1) Install dependencies (if needed)

In [None]:
# If running locally, uncomment to install:
# %pip install requests python-dotenv groq   # groq is only needed for the optional SDK cell

## 2) Load environment & set config

In [18]:
import os, requests, json
from dotenv import load_dotenv

load_dotenv()  # Reads .env if present

# Tavily key: prefer TAVIL_API_KEY, fall back to TAVILY_API_KEY
TAVIL_API_KEY = os.getenv("TAVIL_API_KEY") or os.getenv("TAVILY_API_KEY", "")

# LLM keys (set whichever you have)
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
GROK_API_KEY = os.getenv("GROK_API_KEY", "")  # xAI

# Model names; the Groq default is an assumed example, override it via GROQ_MODEL
GROQ_MODEL = os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
GROK_MODEL = os.getenv("GROK_MODEL", "grok-2-latest")

def have(provider: str) -> bool:
    """Return True if an API key is configured for the given provider."""
    return (provider == "groq" and bool(GROQ_API_KEY)) or (provider == "grok" and bool(GROK_API_KEY))

print("Groq key found? ", bool(GROQ_API_KEY))
print("Grok (xAI) key found? ", bool(GROK_API_KEY))

Groq key found?  True
Grok (xAI) key found?  True
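The `TAVIL_API_KEY` to `TAVILY_API_KEY` fallback is an instance of "first non-empty variable wins". A minimal sketch, assuming a hypothetical helper `first_env` that takes candidate names in priority order and an optional mapping, so the lookup can be tested without touching `os.environ`:

```python
import os
from typing import Mapping, Optional


def first_env(*names: str, env: Optional[Mapping[str, str]] = None, default: str = "") -> str:
    """Return the first non-empty value among `names` (hypothetical helper).

    `env` defaults to os.environ; passing a plain dict makes the lookup testable.
    """
    source = os.environ if env is None else env
    for name in names:
        value = source.get(name, "")
        if value.strip():  # treat blank or whitespace-only values as unset
            return value.strip()
    return default


# Usage: prefer TAVIL_API_KEY, fall back to TAVILY_API_KEY (sample values are made up)
sample_env = {"TAVIL_API_KEY": "", "TAVILY_API_KEY": "tvly-example"}
print(first_env("TAVIL_API_KEY", "TAVILY_API_KEY", env=sample_env))
```

Stripping whitespace guards against a trailing space or newline accidentally copied into `.env`.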


## 3) Helpers: HTTP wrappers and JSON pretty-printing

In [19]:
def pjson(obj):
    print(json.dumps(obj, indent=2, ensure_ascii=False))

def safe_post(url, **kwargs):
    try:
        r = requests.post(url, timeout=45, **kwargs)
        r.raise_for_status()
        return r
    except Exception as e:
        print(f"[POST ERROR] {url} -> {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(e.response.text)
        return None

def safe_get(url, **kwargs):
    try:
        r = requests.get(url, timeout=45, **kwargs)
        r.raise_for_status()
        return r
    except Exception as e:
        print(f"[GET ERROR] {url} -> {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(e.response.text)
        return None

## 4) Tavily search (uses `TAVIL_API_KEY`)

In [20]:
def tavily_search(query: str, max_results: int = 5):
    if not TAVIL_API_KEY:
        raise RuntimeError("Missing TAVIL_API_KEY (or TAVILY_API_KEY). Add it to your environment or .env.")
    url = "https://api.tavily.com/search"
    headers = {"Authorization": f"Bearer {TAVIL_API_KEY}"}
    payload = {"query": query, "max_results": max_results}
    r = safe_post(url, headers=headers, json=payload)
    return r.json() if r else {}

# Quick test (change the query as you like)
query = "today's La Liga matches"
tav = tavily_search(query, max_results=5)
pjson(tav)

RuntimeError: Missing TAVIL_API_KEY (or TAVILY_API_KEY). Add it to your environment or .env.
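To see exactly what `tavily_search` sends, the same request can be built with the standard library and inspected without any network call. This is a sketch: `build_tavily_request` is a hypothetical helper that mirrors only what this notebook uses (the `/search` endpoint, Bearer auth, `query`, `max_results`); check Tavily's API reference for other request fields.

```python
import json
import urllib.request

TAVILY_SEARCH_URL = "https://api.tavily.com/search"  # same endpoint as tavily_search


def build_tavily_request(query: str, api_key: str, max_results: int = 5) -> urllib.request.Request:
    """Build (but do not send) the POST request that tavily_search issues."""
    if not api_key:
        raise ValueError("api_key is empty")
    body = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        TAVILY_SEARCH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # requests adds this header automatically when json= is used
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tavily_request("today's La Liga matches", api_key="tvly-example")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```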

## 5) Normalize results & build a compact context for RAG

In [6]:
from typing import List, Dict

def normalize_tavily(tav: Dict) -> List[Dict]:
    items = []
    if isinstance(tav, dict):
        for r in tav.get("results", []):
            items.append({
                "source": "tavily",
                "title": r.get("title"),
                "url": r.get("url"),
                "snippet": r.get("content") or "",
                "score": r.get("score")
            })
    # Deduplicate by URL
    uniq, seen = [], set()
    for it in items:
        u = it.get("url")
        if u and u not in seen:
            uniq.append(it)
            seen.add(u)
    return uniq

def build_context_from_items(items: List[Dict], top_k: int = 5) -> str:
    blocks = []
    for r in items[:top_k]:
        title = r.get("title") or ""
        url = r.get("url") or ""
        snip = r.get("snippet") or ""
        blocks.append(f"{title} {snip} Source: {url}")
    return "\n\n---\n\n".join(blocks)

norm_items = normalize_tavily(tav)
context = build_context_from_items(norm_items, top_k=5)
print(context[:1000] + ("..." if len(context) > 1000 else ""))

La Liga fixtures | Football La Liga · 08:00 EST. Levante. v. Celta Vigo · 10:15 EST. Alaves. v. Espanyol · 12:00 EST. Real Betis. v. Mallorca · 12:30 EST. Barcelona. v. Elche Source: https://www.theguardian.com/football/laligafootball/fixtures

---

Spanish La Liga - Scores & Fixtures - Football Sunday 26th October · Mallorca versus Levante kick off 13:00. MallorcaMallorcaMallorca. 13:00 13:00 · Real Madrid versus Barcelona kick off 15:15. Real Madrid Source: https://www.bbc.com/sport/football/spanish-la-liga/scores-fixtures

---

Spanish La Liga Scores & Fixtures Sunday 19th October · View fixture. Elche are scheduled to play Athletic Bilbao . · View fixture. Celta Vigo are scheduled to play Real Sociedad . · View fixture. Source: https://www.skysports.com/la-liga-scores-fixtures

---

Spain LaLiga Live Scores | Football LiveScore provides you with all the latest football scores from today's LaLiga matches. Real time live football scores and fixtures from Spain LaLiga. Keep Source: ht
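`normalize_tavily` deduplicates on the exact URL string and keeps Tavily's order, and `build_context_from_items` joins blocks without numbering them. A possible refinement, sketched under assumptions: rank by Tavily's `score` first so the highest-scoring copy of a page survives, treat URLs differing only in host case, fragment, or trailing slash as duplicates, and label each block `[n]` under a character budget so the prompt stays bounded and the model can cite by number. `canonical_url`, `rank_and_dedupe`, and `numbered_context` are hypothetical helpers, and the sample items are made up for illustration.

```python
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and trailing slash (assumed dedup key)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def rank_and_dedupe(items: List[Dict]) -> List[Dict]:
    """Sort by score (highest first), then keep the first item per canonical URL."""
    ranked = sorted(items, key=lambda it: it.get("score") or 0.0, reverse=True)
    seen, out = set(), []
    for it in ranked:
        url = it.get("url")
        if not url:
            continue
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            out.append(it)
    return out


def numbered_context(items: List[Dict], top_k: int = 5, max_chars: int = 4000) -> str:
    """Label blocks [1], [2], ... and stop before exceeding max_chars (separators not counted)."""
    blocks, used = [], 0
    for n, it in enumerate(items[:top_k], start=1):
        block = f"[{n}] {it.get('title') or ''}\n{it.get('snippet') or ''}\nSource: {it.get('url') or ''}"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "\n\n---\n\n".join(blocks)


# Usage with made-up items shaped like normalize_tavily's output
sample = [
    {"title": "A", "url": "https://example.com/a/", "snippet": "first", "score": 0.4},
    {"title": "A (dup)", "url": "https://EXAMPLE.com/a#top", "snippet": "dup", "score": 0.9},
    {"title": "B", "url": "https://example.com/b", "snippet": "second", "score": 0.7},
]
items = rank_and_dedupe(sample)
print([it["title"] for it in items])
print(numbered_context(items))
```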

## 6) Ask the LLM with RAG context (Groq via `GROQ_API_KEY`, or xAI Grok via `GROK_API_KEY`)

In [21]:
# --- UNIFIED LLM CALLER ---
def ask_llm(question: str, context: str, provider: str = "groq", temperature: float = 0.2) -> str:
    """
    provider: 'groq' or 'grok'
    Uses OpenAI-compatible /v1/chat/completions for both vendors.
    """
    system_msg = (
        "You are a concise sports assistant. Use ONLY the provided context for factual claims. "
        "Include a short citations list with exact source URLs used. "
        "If the context lacks the answer, say you do not have enough information."
    )
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"}
    ]

    if provider == "groq":
        if not GROQ_API_KEY:
            return "[ERR] GROQ_API_KEY is missing."
        url = "https://api.groq.com/openai/v1/chat/completions"
        headers = {"Authorization": f"Bearer {GROQ_API_KEY}", "Content-Type": "application/json"}
        payload = {"model": GROQ_MODEL, "messages": messages, "temperature": temperature, "stream": False}

    elif provider == "grok":  # xAI
        if not GROK_API_KEY:
            return "[ERR] GROK_API_KEY is missing."
        url = "https://api.x.ai/v1/chat/completions"
        headers = {"Authorization": f"Bearer {GROK_API_KEY}", "Content-Type": "application/json"}
        payload = {"model": GROK_MODEL, "messages": messages, "temperature": temperature, "stream": False}

    else:
        return "[ERR] provider must be 'groq' or 'grok'."

    try:
        r = requests.post(url, headers=headers, json=payload, timeout=60)
        # Helpful error surface
        if r.status_code != 200:
            return f"[ERR {r.status_code}] {r.text}"
        data = r.json()
        # Both APIs return OpenAI-like shape
        return data["choices"][0]["message"]["content"]
    except Exception as e:
        return f"[ERR] {e}"
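Because both vendors accept the same OpenAI-compatible request and return the same `choices[0].message.content` shape, the payload construction and response parsing can be factored out and tested offline. A sketch with two hypothetical helpers, `build_chat_payload` and `extract_answer`; the fake response dicts stand in for real API replies.

```python
from typing import Any, Dict, List


def build_chat_payload(model: str, system_msg: str, context: str, question: str,
                       temperature: float = 0.2) -> Dict[str, Any]:
    """Request body in the OpenAI-compatible chat-completions shape that ask_llm sends."""
    messages: List[Dict[str, str]] = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"},
    ]
    return {"model": model, "messages": messages, "temperature": temperature, "stream": False}


def extract_answer(data: Dict[str, Any]) -> str:
    """Return choices[0].message.content, or an [ERR] string instead of raising."""
    try:
        return data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # Error bodies typically carry an "error" object instead of "choices"
        return f"[ERR] unexpected response shape: {str(data)[:200]}"


payload = build_chat_payload("my-model", "Be concise.", "ctx", "q?")
fake = {"choices": [{"message": {"role": "assistant", "content": "Hello"}}]}
print(extract_answer(fake))
print(extract_answer({"error": {"message": "bad key"}}))
```

With this split, a 200 response with an unexpected body yields a message that shows the body, rather than the terse `[ERR] 'choices'` that `ask_llm` produces when the `KeyError` is caught.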

In [10]:
# --- OPTIONAL: Groq SDK path (matches your summarizer.py style) ---
try:
    from groq import Groq
    _groq_ok = True
except Exception:
    _groq_ok = False
    print("[INFO] groq SDK not installed. You can pip install groq or stick to the HTTP version.")

def ask_groq_sdk(question: str, context: str, model: str = None, temperature: float = 0.2) -> str:
    if not _groq_ok:
        return "[ERR] groq SDK not installed."
    if not GROQ_API_KEY:
        return "[ERR] GROQ_API_KEY is missing."
    if model is None:
        model = GROQ_MODEL
    client = Groq(api_key=GROQ_API_KEY)

    system_msg = (
        "You are a concise sports assistant. Use ONLY the provided context for factual claims. "
        "Include a short citations list with exact source URLs used. "
        "If the context lacks the answer, say you do not have enough information."
    )
    try:
        chat = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"}
            ],
            temperature=temperature,
            stream=False
        )
        return chat.choices[0].message.content
    except Exception as e:
        return f"[ERR] {e}"

In [22]:
question = "Summarize today's La Liga matches and provide links to sources."

# Pick the provider you actually have a key for:
provider = "groq" if have("groq") else "grok"

print("Using provider:", provider)
ans = ask_llm(question, context=context, provider=provider)
print(ans)
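The one-line choice above falls back to `"grok"` even when neither key is set, which then surfaces as `[ERR] GROK_API_KEY is missing.`. A small sketch that makes the "no key at all" case explicit; `pick_provider` is a hypothetical helper, and in the notebook you would pass the existing `have` function as `has_key`.

```python
from typing import Callable, Iterable, Optional


def pick_provider(has_key: Callable[[str], bool],
                  preference: Iterable[str] = ("groq", "grok")) -> Optional[str]:
    """Return the first provider in `preference` with a configured key, else None."""
    for name in preference:
        if has_key(name):
            return name
    return None


# Usage with a stand-in for have(); here only Grok has a key
configured = {"groq": False, "grok": True}
print(pick_provider(lambda p: configured.get(p, False)))
```

In the notebook, `provider = pick_provider(have)` followed by a check for `None` lets the cell stop early with a clear message instead of calling an API without a key.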

Using provider: groq
Today's La Liga matches include:
- Levante v. Celta Vigo 
- Alaves v. Espanyol 
- Real Betis v. Mallorca 
- Barcelona v. Elche 

Citations:
1. https://www.theguardian.com/football/laligafootball/fixtures
2. https://www.bbc.com/sport/football/spanish-la-liga/scores-fixtures
3. https://www.skysports.com/la-liga-scores-fixtures
4. https://www.livescore.com/en/football/spain/laliga/
5. https://www.fotmob.com/leagues/87/overview/laliga
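The answer cites five URLs, while the preview printed in step 5 is cut at 1,000 characters, so the citations are easier to audit programmatically against the full `context` string actually sent to the model. A rough sketch, assuming citations appear as plain http(s) URLs in the answer; `extract_urls` and `unsupported_citations` are hypothetical helpers, and the strings below are made up.

```python
import re
from typing import List, Set

# Stop at whitespace, closing brackets, or quotes
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text: str) -> List[str]:
    """Find http(s) URLs, trimming trailing punctuation such as '.' or ','."""
    return [u.rstrip(".,;:") for u in URL_RE.findall(text)]


def unsupported_citations(answer: str, context: str) -> Set[str]:
    """URLs cited in the answer that never appear in the retrieved context."""
    allowed = {u.rstrip("/") for u in extract_urls(context)}
    return {u for u in extract_urls(answer) if u.rstrip("/") not in allowed}


ctx = "Match list Source: https://example.com/fixtures\n\n---\n\nMore Source: https://example.org/scores"
ans = "Summary\nCitations:\n1. https://example.com/fixtures\n2. https://example.net/other."
print(unsupported_citations(ans, ctx))
```

In the notebook this would be `unsupported_citations(ans, context)`. An empty set means every cited URL came from the retrieved results, though it says nothing about whether the summarized claims match those pages.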
