# Minimal RAG: Tavily + Groq (Clean Notebook)

This notebook takes a question, runs a **Tavily web search**, builds a short **context** from the results, and asks **Groq** to answer from that context.  
**Requirements:** set `TAVIL_API_KEY` (or `TAVILY_API_KEY`) and `GROQ_API_KEY` in your environment or a local `.env` file. The Groq model name is read from `GROQ_MODEL`.

## 1) (Optional) Install dependencies

In [None]:
# If running locally, uncomment:
# %pip install requests python-dotenv

## 2) Load environment + basic settings

In [3]:
import os, json, requests
from typing import List, Dict
from dotenv import load_dotenv

load_dotenv()

# Keys (TAVIL_API_KEY preferred; fallback to TAVILY_API_KEY)
TAVIL_API_KEY = os.getenv("TAVIL_API_KEY", "") or os.getenv("TAVILY_API_KEY", "")
GROQ_API_KEY  = os.getenv("GROQ_API_KEY", "")

# Model: override via GROQ_MODEL; the fallback matches the model shown in the output below
GROQ_MODEL = os.getenv("GROQ_MODEL", "") or "llama-3.3-70b-versatile"

print("TAVIL_API_KEY present? ", bool(TAVIL_API_KEY))
print("GROQ_API_KEY present?  ", bool(GROQ_API_KEY))
print("Groq model:            ", GROQ_MODEL)

TAVIL_API_KEY present?  True
GROQ_API_KEY present?   True
Groq model:             llama-3.3-70b-versatile
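
The "preferred key, then fallback" lookup above can be generalized. The following is a minimal sketch, assuming a hypothetical helper `first_env` (not part of the original notebook) and demo variable names chosen so that real keys are left untouched:

```python
import os

def first_env(*names: str) -> str:
    """Return the first non-empty value among the named environment variables, else ''."""
    for name in names:
        value = os.getenv(name, "")
        if value:
            return value
    return ""

# Demo with hypothetical variable names; these are not keys the notebook uses
os.environ.pop("DEMO_PRIMARY_KEY", None)
os.environ["DEMO_FALLBACK_KEY"] = "abc123"
print(first_env("DEMO_PRIMARY_KEY", "DEMO_FALLBACK_KEY"))
```

With such a helper, the Tavily lookup would read `first_env("TAVIL_API_KEY", "TAVILY_API_KEY")`, which behaves the same as the `or` chain in the cell above.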


## 3) Helpers (HTTP + pretty JSON)

In [4]:
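def pjson(obj):
    """Pretty-print a JSON-serializable object, keeping non-ASCII text readable."""
    print(json.dumps(obj, indent=2, ensure_ascii=False))

def safe_post(url, **kwargs):
    """POST with a 45-second timeout; return the Response, or None on a request error."""
    try:
        r = requests.post(url, timeout=45, **kwargs)
        r.raise_for_status()
        return r
    except requests.RequestException as e:
        # Covers connection errors, timeouts, and non-2xx statuses
        print(f"[POST ERROR] {url} -> {e}")
        if e.response is not None:
            print(e.response.text)
        return None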

## 4) Tavily search

In [5]:
def tavily_search(query: str, max_results: int = 5):
    if not TAVIL_API_KEY:
        raise RuntimeError("Missing TAVIL_API_KEY (or TAVILY_API_KEY). Put it in your environment or .env")
    url = "https://api.tavily.com/search"
    headers = {"Authorization": f"Bearer {TAVIL_API_KEY}"}
    payload = {"query": query, "max_results": max_results}
    r = safe_post(url, headers=headers, json=payload)
    # safe_post returns None on failure; fall back to an empty dict
    return r.json() if r is not None else {}

## 5) Normalize results + build compact context

In [6]:
from typing import List, Dict

def normalize_tavily(tav: Dict) -> List[Dict]:
    items = []
    if isinstance(tav, dict):
        for r in tav.get("results", []):
            items.append({
                "title": r.get("title"),
                "url": r.get("url"),
                "snippet": r.get("content") or "",
                "score": r.get("score")
            })
    uniq, seen = [], set()
    for it in items:
        u = it.get("url")
        if u and u not in seen:
            uniq.append(it)
            seen.add(u)
    return uniq

def build_context(items: List[Dict], top_k: int = 5) -> str:
    blocks = []
    for r in items[:top_k]:
        title = r.get("title") or ""
        url = r.get("url") or ""
        snip = r.get("snippet") or ""
        blocks.append(f"{title}\n{snip}\nSource: {url}")
    return "\n\n---\n\n".join(blocks)
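
Search snippets can be long, so the joined context may grow larger than is comfortable to send in one prompt. The following is a minimal sketch, assuming a hypothetical helper `trim_snippets` (not part of the original notebook) and an arbitrary 500-character cap, that shortens each normalized item's `snippet` before it is passed to `build_context`:

```python
from typing import Dict, List

def trim_snippets(items: List[Dict], max_chars: int = 500) -> List[Dict]:
    """Return copies of items with long 'snippet' values cut near max_chars.

    Hypothetical helper: the cap is an illustrative assumption,
    not a Tavily or Groq limit. Truncated snippets get a ' ...' suffix.
    """
    trimmed = []
    for it in items:
        snip = it.get("snippet") or ""
        if len(snip) > max_chars:
            # Cut at the last space inside the cap so words are not split
            cut = snip[:max_chars].rsplit(" ", 1)[0]
            snip = cut + " ..."
        # Copy the item so the caller's list is left unchanged
        trimmed.append({**it, "snippet": snip})
    return trimmed

sample = [{"title": "Example", "url": "https://example.com", "snippet": "goal " * 200}]
short = trim_snippets(sample, max_chars=50)
print(short[0]["snippet"])
```

Copying each item rather than editing it in place keeps the full snippets available for the citations list.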

## 6) Ask Groq with context (HTTP API)

In [None]:
def ask_groq(question: str, context: str, temperature: float = 0.2, model: str = None) -> str:
    if not GROQ_API_KEY:
        raise RuntimeError("Missing GROQ_API_KEY. Put it in your environment or .env")
    if model is None:
        model = GROQ_MODEL
    url = "https://api.groq.com/openai/v1/chat/completions"
    headers = {"Authorization": f"Bearer {GROQ_API_KEY}", "Content-Type": "application/json"}
    system_msg = (
        "You are a concise sports assistant. Use ONLY the provided context for factual claims. "
        "Include a short citations list with exact source URLs used. "
        "If the context lacks the answer, say you do not have enough information. "
        "Do not provide any answers for content not related to sports."
    )
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"},
    ]
    payload = {"model": model, "messages": messages, "temperature": temperature, "stream": False}
    r = safe_post(url, headers=headers, json=payload)
    if not r:
        return "[ERR] Groq request failed."
    try:
        data = r.json()
        return data["choices"][0]["message"]["content"]
    except Exception as e:
        return f"[ERR] {e}"
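
The lookup `data["choices"][0]["message"]["content"]` relies on the OpenAI-style chat-completions response shape. The sketch below uses a hypothetical helper `extract_answer` and a hand-written sample payload (not real Groq output) to show the same lookup with explicit checks, so a malformed body produces a readable error string:

```python
from typing import Any, Dict

def extract_answer(data: Dict[str, Any]) -> str:
    """Pull the assistant text out of a chat-completions style response dict."""
    choices = data.get("choices") or []
    if not choices:
        return "[ERR] response has no choices"
    message = choices[0].get("message") or {}
    content = message.get("content")
    if not isinstance(content, str):
        return "[ERR] response has no message content"
    return content.strip()

# Hand-written sample in the shape ask_groq indexes into; not real API output
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": " Example answer. "}}]
}
print(extract_answer(sample_response))
print(extract_answer({}))
```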

## 7) One-call helper: search → context → Groq answer

In [8]:
def ask_with_web_search(user_query: str, top_k: int = 5) -> dict:
    tav = tavily_search(user_query, max_results=max(5, top_k))
    items = normalize_tavily(tav)
    ctx = build_context(items, top_k=top_k)
    ans = ask_groq(user_query, context=ctx)
    return {"answer": ans, "citations": items[:top_k]}
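
When the search fails, `tavily_search` returns `{}`, the context is empty, and the model is still called. One possible guard is sketched below with a hypothetical wrapper `answer_or_skip` that takes the search and answer steps as parameters. The stubs stand in for the Tavily and Groq calls so the example runs offline:

```python
from typing import Callable, Dict, List

def answer_or_skip(
    query: str,
    search_fn: Callable[[str], List[Dict]],
    answer_fn: Callable[[str, str], str],
    top_k: int = 5,
) -> Dict:
    """Run search then answer, but skip the model call when nothing was found."""
    items = search_fn(query)[:top_k]
    if not items:
        return {"answer": "No search results; nothing to answer from.", "citations": []}
    # Same block layout as build_context: title, snippet, then source URL
    ctx = "\n\n---\n\n".join(
        f"{it.get('title') or ''}\n{it.get('snippet') or ''}\nSource: {it.get('url') or ''}"
        for it in items
    )
    return {"answer": answer_fn(query, ctx), "citations": items}

# Offline stubs (assumptions for illustration only)
def empty_search(q: str) -> List[Dict]:
    return []

def fake_search(q: str) -> List[Dict]:
    return [{"title": "T", "url": "https://example.com", "snippet": "S"}]

def fake_answer(q: str, ctx: str) -> str:
    return f"answered from {len(ctx)} chars"

print(answer_or_skip("q", empty_search, fake_answer))
print(answer_or_skip("q", fake_search, fake_answer)["answer"])
```

Passing the two steps in as functions is a design choice for this sketch: it lets the guard be checked without network access or API keys.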

## 8) Ask your own question

In [9]:
user_query = "Who are the top goal scorers in Premier League this week?"
res = ask_with_web_search(user_query, top_k=5)
print(res["answer"])

print("\nCitations:")
for c in res["citations"]:
    print(f"- {c['title']} -> {c['url']}")

According to the provided sources, the top goal scorers in the Premier League are:

1. Erling Haaland with 5-9 goals (sources: Transfermarkt, SuperSport, Goal)
2. Other top scorers include Viktor Gyökeres, Antoine Semenyo, A. Semenyo, J. Anthony, and others with 3-6 goals (sources: SuperSport, Goal)

Please note that the exact number of goals for each player might vary slightly depending on the source.

Citations:
- https://www.transfermarkt.com/premier-league/torschuetzenliste/wettbewerb/GB1
- https://supersport.com/football/premier-league/top-scorers
- https://www.goal.com/en-us/premier-league/top-players/2kwbbcootiqqgmrzs6o5inle5

Citations:
- List of goalscorers Premier League 25/26 - Transfermarkt -> https://www.transfermarkt.com/premier-league/torschuetzenliste/wettbewerb/GB1
- Premier League Top Scorers | SuperSport -> https://supersport.com/football/premier-league/top-scorers
- Premier League Top Players - Goals, Assists, Tackles, Clean Sheets ... -> https://www.goal.com/en-us/