# 🏆 World Cup Squad Builder – Reasoning Pipeline

**A fully grounded, multi-step LangChain pipeline to build your dream 23-man World Cup squad.**

### Pipeline Architecture
| Stage | Tool Name | Description |
|---|---|---|
| 1 | `dataset_discovery_tool` | Scans local datasets & lists all external data sources with URLs |
| 2 | `data_ingestion_tool` | Loads CSV stats + scrapes the FIFA WC 2026 Wikipedia page (BeautifulSoup4) + Football API via RapidAPI (cached) |
| 3 | `retrieval_or_filter_tool` | FAISS semantic search + position/stat filters |
| 4 | `reasoning_or_aggregation_tool` | Constraint engine: max 23, exactly 3 GKs, position slots, budget cap |
| 5 | `llm_synthesis_tool` | LLM-generated per-player justifications |
| 6 | `report_generation_tool` | Formatted squad + interactive Plotly visualisations |

**Memory** – persists user `criteria` and `budget` across pipeline runs.

## 0 · Setup & Imports

In [26]:
import os, json, glob, hashlib, time, warnings
import requests
import pandas as pd
from pathlib import Path
from datetime import datetime
from bs4 import BeautifulSoup
from dotenv import load_dotenv
import plotly.graph_objects as go
import plotly.express as px
from IPython.display import display, Markdown

warnings.filterwarnings("ignore")
load_dotenv()

# ── LangChain ──────────────────────────────────────────────────────
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.tools import tool
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain.chains import LLMChain, SequentialChain
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, AgentType

print("✅ All packages imported successfully.")

✅ All packages imported successfully.


## ⚙️ Configuration
**Edit these values before running the pipeline.**

In [27]:
# ── API Keys ───────────────────────────────────────────────────────
OPENAI_API_KEY    = os.getenv("OPENAI_API_KEY", "your-openai-api-key-here")
RAPIDAPI_KEY      = os.getenv("RAPIDAPI_KEY", "")      # Optional – Football API via RapidAPI
LLM_MODEL         = "gpt-4o-mini"                      # or "gpt-4o"

# ── User Preferences (persisted in memory) ─────────────────────────
USER_CRITERIA = "fast defenders with high interceptions, clinical strikers with xG > 5, creative midfielders with high assist rates and progressive passes"
BUDGET        = None          # €M cap, e.g. 500.0; None = unlimited

# ── Constants ──────────────────────────────────────────────────────
DATA_PATH   = "datasets/players_data-2024_2025.csv"
CACHE_DIR   = Path("cache")
FAISS_INDEX = "faiss_index"
MAX_SQUAD   = 23
# (min, max) players per position. The maxima sum to 24, so MAX_SQUAD is the binding cap.
POSITION_SLOTS = {"GK": (3, 3), "DF": (5, 7), "MF": (5, 8), "FW": (4, 6)}

# ── External Data Sources (cited) ──────────────────────────────────
DATA_SOURCES = {
    "local_csv":   {"path": DATA_PATH, "description": "FBRef 2024-25 season player stats (5 major leagues)"},
    "fifa_wiki":   {"url": "https://en.wikipedia.org/wiki/2026_FIFA_World_Cup",
                   "description": "FIFA World Cup 2026 – host info, format, groups"},
    "api_football":{"url": "https://rapidapi.com/api-sports/api/api-football",
                   "description": "API-Football (RapidAPI) – live & historical match data"},
}

CACHE_DIR.mkdir(exist_ok=True)
print(f"🎯 Criteria : {USER_CRITERIA}")
print(f"💰 Budget   : {'€{:.0f}M'.format(BUDGET) if BUDGET is not None else 'No limit'}")
print(f"📁 Cache dir: {CACHE_DIR.resolve()}")

🎯 Criteria : fast defenders with high interceptions, clinical strikers with xG > 5, creative midfielders with high assist rates and progressive passes
💰 Budget   : No limit
📁 Cache dir: /Users/GenAI/cache

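The slot ranges interact with `MAX_SQUAD`: the minima must fit inside 23, while the maxima may exceed it. A quick arithmetic check makes that explicit. The `slot_bounds` helper is hypothetical and not part of the pipeline; it simply copies the two constants defined above.

```python
# Hypothetical helper: sanity-check POSITION_SLOTS against the squad cap.
POSITION_SLOTS = {"GK": (3, 3), "DF": (5, 7), "MF": (5, 8), "FW": (4, 6)}
MAX_SQUAD = 23


def slot_bounds(slots: dict[str, tuple[int, int]], max_squad: int) -> tuple[int, int]:
    """Return (sum of minima, sum of maxima); fail if the minima cannot fit in the squad."""
    lo = sum(mn for mn, _ in slots.values())
    hi = sum(mx for _, mx in slots.values())
    if lo > max_squad:
        raise ValueError(f"minimum slots ({lo}) exceed the squad cap ({max_squad})")
    return lo, hi


lo, hi = slot_bounds(POSITION_SLOTS, MAX_SQUAD)
print(f"min {lo}, max {hi}, cap {MAX_SQUAD}")  # min 17, max 24, cap 23
```

Because the maxima sum to 24, filling every position to its maximum overshoots 23 by one, so the selection step has to respect the overall cap and not only the per-position ranges.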

## 🧠 Memory – Persisting User Preferences
A lightweight JSON store (`cache/user_preferences.json`) remembers `criteria` and `budget` across pipeline runs and keeps a short history of built squads.

In [28]:
class SquadBuilderMemory:
    """
    Persists at least two user preferences across pipeline runs:
      - criteria_history : rolling log of squad criteria strings
      - budget_preference: last used budget cap (or None)
    Also wraps a LangChain ConversationBufferMemory for the agent.
    """
    _prefs_file = CACHE_DIR / "user_preferences.json"

    def __init__(self):
        self.langchain_memory = ConversationBufferMemory(
            memory_key="chat_history", return_messages=True
        )
        self._prefs = self._load()

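    def _load(self) -> dict:
        defaults = {"criteria_history": [], "budget_preference": None, "squad_history": []}
        if self._prefs_file.exists():
            # Merge over defaults so files written by older versions still have every key
            return {**defaults, **json.loads(self._prefs_file.read_text())}
        return defaults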

    def _save(self):
        self._prefs_file.write_text(json.dumps(self._prefs, indent=2))

    def update(self, criteria: str, budget):
        self._prefs["criteria_history"].append(
            {"criteria": criteria, "timestamp": datetime.now().isoformat()}
        )
        self._prefs["criteria_history"] = self._prefs["criteria_history"][-5:]  # keep last 5
        self._prefs["budget_preference"] = budget
        self._save()

    def save_squad(self, squad_summary: str):
        self._prefs["squad_history"].append(
            {"summary": squad_summary[:200], "timestamp": datetime.now().isoformat()}
        )
        self._prefs["squad_history"] = self._prefs["squad_history"][-3:]
        self._save()

    @property
    def last_criteria(self) -> str:
        hist = self._prefs["criteria_history"]
        return hist[-1]["criteria"] if hist else ""

    @property
    def last_budget(self):
        return self._prefs["budget_preference"]

    def summary(self) -> str:
        return (
            f"üìã Memory | Last criteria: '{self.last_criteria}' | "
            f"Last budget: {self.last_budget} | "
            f"Squads built: {len(self._prefs['squad_history'])}"
        )


memory = SquadBuilderMemory()
memory.update(USER_CRITERIA, BUDGET)
print(memory.summary())

📋 Memory | Last criteria: 'fast defenders with high interceptions, clinical strikers with xG > 5, creative midfielders with high assist rates and progressive passes' | Last budget: None | Squads built: 2

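At its core the memory class is a JSON file with capped histories. A minimal, self-contained sketch of that round trip, using a temporary file instead of `cache/user_preferences.json`. The `PrefsStore` name is hypothetical, and the LangChain buffer and squad history are left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class PrefsStore:
    """Hypothetical stripped-down SquadBuilderMemory: a JSON file with a rolling history."""

    def __init__(self, path: Path, keep: int = 5):
        self.path, self.keep = path, keep
        defaults = {"criteria_history": [], "budget_preference": None}
        stored = json.loads(path.read_text()) if path.exists() else {}
        self.prefs = {**defaults, **stored}  # tolerate files that lack newer keys

    def update(self, criteria: str, budget: Optional[float]) -> None:
        hist = self.prefs["criteria_history"] + [criteria]
        self.prefs["criteria_history"] = hist[-self.keep:]  # keep only the newest entries
        self.prefs["budget_preference"] = budget
        self.path.write_text(json.dumps(self.prefs))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prefs.json"
    store = PrefsStore(path)
    for i in range(7):
        store.update(f"criteria {i}", budget=500.0)
    reloaded = PrefsStore(path)  # a fresh instance reads what the first one saved
    print(reloaded.prefs["criteria_history"])
```

Writing on every update keeps the file authoritative, so a kernel restart loses nothing beyond the in-memory LangChain buffer.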

## 🗄️ Caching Layer
All HTTP responses are cached to disk as JSON, keyed by the MD5 hash of the URL, so re-running the notebook within the TTL avoids repeated API calls. Failed requests are not cached.

In [29]:
def cache_key(url: str) -> str:
    return hashlib.md5(url.encode()).hexdigest() + ".json"


def cached_get(url: str, headers: dict | None = None, ttl_hours: int = 24) -> dict | None:
    """
    HTTP GET with local disk cache (JSON). Respects TTL.
    Returns parsed JSON or {"html": ...} for HTML responses.
    """
    path = CACHE_DIR / cache_key(url)

    # ── Serve from cache if fresh ───────────────────────────────────
    if path.exists():
        cached = json.loads(path.read_text())
        age_h = (time.time() - cached["cached_at"]) / 3600
        if age_h < ttl_hours:
            print(f"   💾 Cache hit  ({age_h:.1f}h old): {url[:60]}...")
            return cached["data"]

    # ── Fetch live ──────────────────────────────────────────────────
    print(f"   🌐 Fetching: {url[:80]}")
    try:
        resp = requests.get(url, headers=headers or {}, timeout=15)
        resp.raise_for_status()
        ct = resp.headers.get("Content-Type", "")
        data = resp.json() if "json" in ct else {"html": resp.text, "url": url}
    except Exception as e:
        # Failures return None without writing a cache file, so the next run retries.
        print(f"   ⚠️  Request failed: {e}")
        return None

    path.write_text(json.dumps({"cached_at": time.time(), "data": data}))
    return data


print("✅ Caching layer ready.")

✅ Caching layer ready.

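The cache logic can be exercised offline by injecting the fetch function and the clock. This sketch mirrors `cached_get` but is a hypothetical, testable variant: `cached_call`, `fake_fetch` and the example.com URL are illustrative, and no real HTTP request is made.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path
from typing import Callable


def cached_call(url: str, fetch: Callable[[str], dict], cache_dir: Path,
                ttl_hours: float = 24, now: Callable[[], float] = time.time) -> dict:
    """Serve `url` from an MD5-keyed JSON file if younger than ttl_hours, else fetch and store."""
    path = cache_dir / (hashlib.md5(url.encode()).hexdigest() + ".json")
    if path.exists():
        cached = json.loads(path.read_text())
        if (now() - cached["cached_at"]) / 3600 < ttl_hours:
            return cached["data"]  # cache hit: fetch is not called
    data = fetch(url)
    path.write_text(json.dumps({"cached_at": now(), "data": data}))
    return data


calls = []


def fake_fetch(url: str) -> dict:
    calls.append(url)  # records every simulated network call
    return {"url": url, "n": len(calls)}


clock = [1_000_000.0]  # mutable fake clock, in seconds
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    first = cached_call("https://example.com/a", fake_fetch, d, now=lambda: clock[0])
    second = cached_call("https://example.com/a", fake_fetch, d, now=lambda: clock[0])
    clock[0] += 25 * 3600  # jump past the 24 h TTL
    third = cached_call("https://example.com/a", fake_fetch, d, now=lambda: clock[0])

print(len(calls), first == second, third["n"])  # 2 True 2
```

Injecting `now` keeps TTL expiry testable without sleeping; the notebook's version reads `time.time()` directly.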

---
## Tool 1 – `dataset_discovery_tool`
Discovers all available local datasets and external data sources with cited URLs.

In [30]:
@tool
def dataset_discovery_tool(query: str = "") -> str:
    """
    Discovers available data sources for World Cup squad building.
    Returns local file inventory and external API/web source catalogue with URLs.
    Input: optional filter keyword (e.g. 'player', 'fixtures'). Currently not applied: every source is returned.
    """
    # ── Local files ────────────────────────────────────────────────
    local_files = []
    for f in glob.glob("datasets/**/*", recursive=True):
        p = Path(f)
        if p.is_file():
            local_files.append({
                "file":        str(p),
                "size_kb":     round(p.stat().st_size / 1024, 1),
                "description": DATA_SOURCES.get("local_csv", {}).get("description", "Player statistics"),
            })

    # ── External sources ───────────────────────────────────────────
    external = [
        {
            "name":   "FBRef 2024-25 Player Stats (CSV)",
            "source": "local",
            "path":   DATA_PATH,
        },
        {
            "name":        "FIFA World Cup 2026 – Wikipedia",
            "url":         DATA_SOURCES["fifa_wiki"]["url"],
            "description": DATA_SOURCES["fifa_wiki"]["description"],
            "method":      "BeautifulSoup4 HTML parsing",
        },
        {
            "name":        "API-Football (RapidAPI)",
            "url":         DATA_SOURCES["api_football"]["url"],
            "description": DATA_SOURCES["api_football"]["description"],
            "method":      "REST API – requires RAPIDAPI_KEY header",
            "endpoints":   [
                "/players?league=1&season=2024  → World Cup player stats",
                "/teams?league=1               → Participating teams",
                "/fixtures?league=1&season=2026 → Match fixtures",
            ],
        },
    ]

    result = {
        "local_datasets":  local_files,
        "external_sources": external,
        "total_local_files": len(local_files),
    }
    output = json.dumps(result, indent=2)
    print("üìÇ dataset_discovery_tool ‚Üí")
    print(f"   {len(local_files)} local file(s) | {len(external)} external source(s)")
    return output


# ── Run Tool 1 ─────────────────────────────────────────────────────
discovery_result = json.loads(dataset_discovery_tool.invoke("player stats"))
for src in discovery_result["external_sources"]:
    print(f"  • {src['name']}")
    if 'url' in src:
        print(f"    URL: {src['url']}")

📂 dataset_discovery_tool →
   1 local file(s) | 3 external source(s)
  • FBRef 2024-25 Player Stats (CSV)
  • FIFA World Cup 2026 – Wikipedia
    URL: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
  • API-Football (RapidAPI)
    URL: https://rapidapi.com/api-sports/api/api-football

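As written, `dataset_discovery_tool` accepts a `query` but returns every source. If filtering is wanted, a keyword match over each entry's fields would be enough. The sketch below is one hypothetical way to do it (`filter_sources`, `SOURCES` and the trimmed catalogue entries are illustrative), not what the tool currently does.

```python
# Hypothetical: how the unused `query` argument could filter the source catalogue.
SOURCES = [
    {"name": "FBRef 2024-25 Player Stats (CSV)", "source": "local"},
    {"name": "FIFA World Cup 2026 – Wikipedia", "description": "host info, format, groups"},
    {"name": "API-Football (RapidAPI)", "description": "live & historical match data"},
]


def _source_text(src: dict) -> str:
    """All field values of one catalogue entry, lower-cased and joined."""
    return " ".join(str(v) for v in src.values()).lower()


def filter_sources(sources: list[dict], query: str = "") -> list[dict]:
    """Keep sources whose fields contain every word of `query` (case-insensitive)."""
    words = query.lower().split()
    if not words:
        return sources  # empty query: full catalogue, matching the tool's current behaviour
    return [s for s in sources if all(w in _source_text(s) for w in words)]


print([s["name"] for s in filter_sources(SOURCES, "player stats")])
print(len(filter_sources(SOURCES)))
```

Requiring every word keeps multi-word queries such as "player stats" from matching on "player" alone.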

---
## Tool 2 – `data_ingestion_tool`
Loads the local CSV, scrapes FIFA WC 2026 context from Wikipedia via BeautifulSoup4, and queries the free-tier Football API on RapidAPI, with a disk cache and a graceful fallback when no key is set.

In [31]:
# ── Helpers used inside the tool ───────────────────────────────────

def load_player_csv(path: str) -> pd.DataFrame:
    """Load CSV, select key columns, deduplicate, add synthetic market value."""
    df = pd.read_csv(path, low_memory=False)
    keep = [
        "Player","Nation","Pos","Squad","Comp","Age",
        "Gls","Ast","G+A","xG","xAG",
        "PrgC","PrgP","PrgR",
        "SCA","GCA","Tkl","Int","Clr",
        "CrdY","CrdR","MP","Starts","Min",
        "GA90","Save%","CS%",
    ]
    keep = [c for c in keep if c in df.columns]
    df = df[keep].copy()
    for col in [c for c in keep if c not in ("Player","Nation","Pos","Squad","Comp")]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df = df.sort_values("Min", ascending=False).drop_duplicates(subset="Player").reset_index(drop=True)
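    df["PrimaryPos"] = df["Pos"].str.split(",").str[0]
    # Heuristic market value in €M (not real transfer data): rewards goals, assists, xG,
    # shot-creating actions, tackles and minutes; subtracts 1.2 per year over 24
    # (adds for younger players, clipped to ±5 years); +5 base, floored at 1.
    df["MarketValue"] = (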
        (df["Gls"].fillna(0)*2.5)+(df["Ast"].fillna(0)*1.5)+
        (df["xG"].fillna(0)*1.0)+(df["SCA"].fillna(0)*0.05)+
        (df["Tkl"].fillna(0)*0.1)-
        (df["Age"].fillna(27)-24).clip(-5,5)*1.2+
        (df["Min"].fillna(0)/90)*0.3+5
    ).clip(lower=1).round(1)
    return df


def scrape_wc2026_wikipedia() -> dict:
    """Scrape FIFA WC 2026 Wikipedia page for tournament context."""
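    url = DATA_SOURCES["fifa_wiki"]["url"]
    # Wikimedia blocks many requests without a descriptive User-Agent, the likely
    # cause of the 403 in the run below; send an identifying one.
    headers = {"User-Agent": "WorldCupSquadBuilder/1.0 (educational notebook)"}
    data = cached_get(url, headers=headers, ttl_hours=72)
    if not data or "html" not in data:
        return {"error": "Wikipedia scrape failed", "source_url": url}

    soup = BeautifulSoup(data["html"], "lxml")  # requires the lxml parser package
    result = {"source_url": url, "source": "Wikipedia – scraped via BeautifulSoup4"}
    h1 = soup.find("h1")
    result["title"] = h1.get_text(strip=True) if h1 else "2026 FIFA World Cup"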

    infobox = soup.find("table", class_="infobox")
    if infobox:
        info = {}
        for row in infobox.find_all("tr"):
            th, td = row.find("th"), row.find("td")
            if th and td:
                info[th.get_text(strip=True)] = td.get_text(" ", strip=True)[:120]
        result["infobox"] = info

    paras = soup.select("#mw-content-text p")
    intro = next((p.get_text(strip=True) for p in paras if len(p.get_text(strip=True)) > 80), "")
    result["intro"] = intro[:400]
    return result


# ── API constants (confirmed working endpoints, free tier) ─────────
_API_HOST    = "free-api-live-football-data.p.rapidapi.com"
_API_BASE    = f"https://{_API_HOST}"


def _API_HEADERS(key: str) -> dict:
    """RapidAPI authentication headers for the free football API."""
    return {"x-rapidapi-host": _API_HOST, "x-rapidapi-key": key}

# Confirmed working endpoints on this free tier:
#   /football-players-search?search=<name>  → player search & current team
#   /football-get-all-leagues               → full league catalogue (WC = id 77)
#   /football-current-live                  → live match scores
_API_ENDPOINTS = {
    "player_search":  "/football-players-search",
    "all_leagues":    "/football-get-all-leagues",
    "live_scores":    "/football-current-live",
}


def fetch_football_api(endpoint_key: str, params: dict | None = None) -> dict | None:
    """
    Call the free-tier Football API via RapidAPI.
    Endpoints are disk-cached; falls back gracefully if key not set.
    Host  : free-api-live-football-data.p.rapidapi.com
    Source: https://rapidapi.com/heisenbug/api/free-api-live-football-data
    """
    from urllib.parse import urlencode  # stdlib; percent-escapes spaces and accents

    if not RAPIDAPI_KEY:
        return {"error": "RAPIDAPI_KEY not set – skipping live API",
                "source_url": "https://rapidapi.com/heisenbug/api/free-api-live-football-data"}

    path = _API_ENDPOINTS.get(endpoint_key, endpoint_key)
    # urlencode escapes names such as "García" that a plain "&".join would leave raw
    qs   = ("?" + urlencode(params)) if params else ""
    url  = f"{_API_BASE}{path}{qs}"
    return cached_get(url, headers=_API_HEADERS(RAPIDAPI_KEY), ttl_hours=6)


def enrich_players_from_api(df: pd.DataFrame, sample_n: int = 30) -> dict:
    """
    Use football-players-search to look up the current club and API ID of the
    top `sample_n` players by MarketValue.
    Returns a dict mapping player name → {"api_team": ..., "api_id": ...}.
    """
    enriched = {}
    top_players = df.nlargest(sample_n, "MarketValue")["Player"].tolist()
    for name in top_players:
        search_term = name.split()[-1]  # last name
        result = fetch_football_api("player_search", {"search": search_term})
        if result and "response" in result:
            suggestions = result["response"].get("suggestions", [])
            # Loose match: first suggestion containing ANY token of the CSV name
            for s in suggestions:
                if any(part.lower() in s.get("name","").lower() for part in name.split()):
                    enriched[name] = {"api_team": s.get("teamName",""), "api_id": s.get("id","")}
                    break
    return enriched


# ── Global state populated by Tool 2 ───────────────────────────────
_players_df: pd.DataFrame | None = None
_wc_context: dict = {}
_api_data:   dict = {}
_api_enrichment: dict = {}


@tool
def data_ingestion_tool(source: str = "all") -> str:
    """
    Ingests player data from multiple sources with disk caching:
    1. Local CSV  – FBRef 2024-25 season stats (5 major leagues)
    2. Wikipedia  – FIFA WC 2026 page scraped via BeautifulSoup4
    3. Football API (RapidAPI) – live enrichment via football-players-search,
       football-get-all-leagues (WC league id=77), football-current-live
    Input: 'all' | 'csv' | 'web' | 'api'
    """
    global _players_df, _wc_context, _api_data, _api_enrichment
    summary = {}

    if source in ("all", "csv"):
        print("   üìä Loading local CSV...")
        _players_df = load_player_csv(DATA_PATH)
        summary["csv"] = {
            "rows":      len(_players_df),
            "positions": _players_df["PrimaryPos"].value_counts().to_dict(),
            "source":    DATA_PATH,
            "citation":  "FBRef.com ‚Äî 2024-25 season stats across 5 top European leagues",
        }

    if source in ("all", "web"):
        print("   üåê Scraping FIFA WC 2026 (Wikipedia)...")
        _wc_context = scrape_wc2026_wikipedia()
        summary["web"] = {
            "title":      _wc_context.get("title", "N/A"),
            "source_url": _wc_context.get("source_url"),
            "status":     "ok" if "intro" in _wc_context else "partial",
        }

    if source in ("all", "api"):
        print("   üîå Querying Football API (RapidAPI)...")

        # 1. Get all leagues – confirm WC 2026 entry (id=77)
        leagues_data = fetch_football_api("all_leagues") or {}
        leagues      = leagues_data.get("response", {}).get("leagues", [])
        wc_league    = next((lg for lg in leagues if lg.get("id") == 77), {})

        # 2. Get live scores for context
        live_data    = fetch_football_api("live_scores") or {}
        live_matches = live_data.get("response", {}).get("live", [])

        # 3. Enrich top players with current club data
        if _players_df is not None:
            print("      Enriching top 20 players via player search API...")
            _api_enrichment = enrich_players_from_api(_players_df, sample_n=20)

        _api_data = {
            "wc_league":      wc_league,
            "live_match_count": len(live_matches),
            "enriched_players": len(_api_enrichment),
        }

        summary["api"] = {
            "host":             _API_HOST,
            "source_url":       f"https://rapidapi.com/heisenbug/api/free-api-live-football-data",
            "wc_league":        wc_league.get("name", "Not found"),
            "wc_league_id":     wc_league.get("id", "N/A"),
            "live_matches":     len(live_matches),
            "enriched_players": len(_api_enrichment),
            "endpoints_used":   list(_API_ENDPOINTS.values()),
            "status":           "skipped" if not RAPIDAPI_KEY else ("ok" if wc_league else "partial"),
            "note":             "RAPIDAPI_KEY not set" if not RAPIDAPI_KEY else "Live data fetched",
        }

    print(f"   ✅ Ingestion complete – {summary.get('csv',{}).get('rows',0)} players loaded")
    return json.dumps(summary, indent=2)


# ── Run Tool 2 ─────────────────────────────────────────────────────
ingestion_result = json.loads(data_ingestion_tool.invoke("all"))
print(f"\nCSV : {ingestion_result['csv']['rows']} players")
print(f"Web : {ingestion_result['web']['title']} ({ingestion_result['web']['status']})")
api  = ingestion_result.get("api", {})
print(f"API : {api.get('note','')} | WC league: {api.get('wc_league','N/A')} (id={api.get('wc_league_id','N/A')})")
print(f"     Live matches: {api.get('live_matches',0)} | Enriched players: {api.get('enriched_players',0)}")


   📊 Loading local CSV...
   🌐 Scraping FIFA WC 2026 (Wikipedia)...
   🌐 Fetching: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
   ⚠️  Request failed: 403 Client Error: Forbidden for url: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
   🔌 Querying Football API (RapidAPI)...
   💾 Cache hit  (0.4h old): https://free-api-live-football-data.p.rapidapi.com/football-...
   💾 Cache hit  (0.4h old): https://free-api-live-football-data.p.rapidapi.com/football-...
      Enriching top 20 players via player search API...
   💾 Cache hit  (0.2h old): https://free-api-live-football-data.p.rapidapi.com/football-...
   💾 Cache hit  (0.2h old): https://free-api-live-football-data.p.rapidapi.com/football-...
   💾 Cache hit  (0.2h old): https://free-api-live-football-data.p.rapidapi.com/football-...
   💾 Cache hit  (0.2h old): https://free-api-live-football-data.p.rapidapi.com/football-...
   💾 Cache hit  (0.2h old): https://free-api-live-football-data.p.rapid

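`enrich_players_from_api` searches by last name and accepts the first suggestion containing any token of the CSV name, so a shared surname is enough to match the wrong player. The sketch below reproduces that rule next to a stricter, hypothetical alternative. The suggestions are made up (placeholder teams and IDs), shaped like the `suggestions` list the tool reads.

```python
from typing import Optional

# Hypothetical search results for the last name "Martinez"
suggestions = [
    {"name": "Lautaro Martinez", "teamName": "Team A", "id": 1},
    {"name": "Josep Martinez", "teamName": "Team B", "id": 2},
]


def loose_match(csv_name: str, suggestions: list[dict]) -> Optional[dict]:
    """Same rule as enrich_players_from_api: first suggestion containing ANY name token."""
    for s in suggestions:
        if any(part.lower() in s.get("name", "").lower() for part in csv_name.split()):
            return s
    return None


def strict_match(csv_name: str, suggestions: list[dict]) -> Optional[dict]:
    """Hypothetical stricter rule: every name token must appear in the suggestion."""
    tokens = [t.lower() for t in csv_name.split()]
    for s in suggestions:
        candidate = s.get("name", "").lower()
        if all(t in candidate for t in tokens):
            return s
    return None


print(loose_match("Josep Martinez", suggestions)["name"])   # Lautaro Martinez
print(strict_match("Josep Martinez", suggestions)["name"])  # Josep Martinez
```

Both rules compare raw strings, so accent differences such as "Martínez" vs "Martinez" would still need normalising before matching.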
In [32]:
# ── Preview loaded data ────────────────────────────────────────────
_wc_src = _wc_context.get("source_url") or DATA_SOURCES["fifa_wiki"]["url"]  # fall back if scrape failed
display(Markdown(f"### WC 2026 Context\n> {_wc_context.get('intro', 'N/A')}"))
display(Markdown(f"**Source:** [{_wc_src}]({_wc_src})"))

_players_df[["Player","Nation","Pos","Squad","Age","Gls","Ast","xG","MarketValue"]].head(8)

### WC 2026 Context
> N/A

**Source:** []()

|  | Player | Nation | Pos | Squad | Age | Gls | Ast | xG | MarketValue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | David Raya | es ESP | GK | Arsenal | 28.0 | 0 | 0 | 0.0 | 11.9 |
| 1 | Obite N'Dicka | ci CIV | DF | Roma | 24.0 | 0 | 1 | 1.0 | 23.8 |
| 2 | Wladimiro Falcone | it ITA | GK | Lecce | 29.0 | 0 | 0 | 0.0 | 11.0 |
| 3 | David Soria | es ESP | GK | Getafe | 31.0 | 0 | 0 | 0.0 | 11.2 |
| 4 | Joan García | es ESP | GK | Espanyol | 23.0 | 0 | 0 | 0.1 | 18.2 |
| 5 | Federico Baschirotto | it ITA | DF | Lecce | 27.0 | 2 | 0 | 1.2 | 23.6 |
| 6 | Bernd Leno | de GER | GK | Fulham | 32.0 | 0 | 1 | 0.0 | 12.4 |
| 7 | Dean Henderson | eng ENG | GK | Crystal Palace | 27.0 | 0 | 0 | 0.0 | 13.2 |

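To see how the `MarketValue` heuristic in `load_player_csv` weighs each stat, here it is applied to one made-up player. The input numbers are hypothetical; the weights are copied from the function above.

```python
import pandas as pd

# Hypothetical player; column names follow the FBRef CSV used above.
row = pd.DataFrame([{"Gls": 10, "Ast": 5, "xG": 8.0, "SCA": 60, "Tkl": 20,
                     "Age": 26, "Min": 2700}])

# Same weights as load_player_csv (a rough €M proxy, not real transfer data).
value = (
    row["Gls"] * 2.5 + row["Ast"] * 1.5 + row["xG"] * 1.0 + row["SCA"] * 0.05
    + row["Tkl"] * 0.1
    - (row["Age"] - 24).clip(-5, 5) * 1.2
    + (row["Min"] / 90) * 0.3 + 5
).clip(lower=1).round(1)

# By hand: 25 + 7.5 + 8 + 3 + 2 - 2.4 + 9 + 5 = 57.1
print(value.iloc[0])
```

The minutes term (0.3 per 90 minutes) plus the +5 base accounts for much of the value of everyday goalkeepers in the preview above, who score roughly €11–13M with no goals or assists.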

---
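## Tool 3 – `retrieval_or_filter_tool`
Builds a FAISS vector store from natural-language player descriptions, then runs a semantic search per position. The position filter is applied to the top-`k` hits after retrieval, so a pool can hold far fewer than `k` players.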

In [33]:
def _num(row: pd.Series, key: str) -> float:
    """Numeric stat with missing columns and NaN mapped to 0 (`nan or 0` stays NaN)."""
    val = row.get(key, 0)
    return 0.0 if pd.isna(val) else float(val)


def player_description(row: pd.Series) -> str:
    """Rich natural-language description for embedding."""
    pos, squad, age = row.get("Pos", "?"), row.get("Squad", "?"), row.get("Age", "?")
    desc = f"{row['Player']} is a {pos} for {squad}, age {age}, "
    desc += f"{int(_num(row, 'MP'))} matches ({int(_num(row, 'Min'))} mins). "

    if row.get("PrimaryPos") == "GK":
        ga90, sv, cs = row.get("GA90"), row.get("Save%"), row.get("CS%")
        if all(pd.notna(x) for x in [ga90, sv, cs]):
            desc += f"Goalkeeper: GA/90={ga90:.2f}, Save%={sv:.1f}%, CS%={cs:.1f}%."
        else:
            desc += "Goalkeeper."
    else:
        g, a, xg, xag = (_num(row, k) for k in ["Gls", "Ast", "xG", "xAG"])
        tk, it, sca   = (_num(row, k) for k in ["Tkl", "Int", "SCA"])
        pc, pp        = (_num(row, k) for k in ["PrgC", "PrgP"])
        desc += (f"Goals={int(g)}, Assists={int(a)}, xG={xg:.1f}, xAG={xag:.1f}. "
                 f"Tackles={int(tk)}, Interceptions={int(it)}, SCA={int(sca)}. "
                 f"ProgCarries={int(pc)}, ProgPasses={int(pp)}.")

    desc += f" Market value ~€{_num(row, 'MarketValue'):.1f}M."
    return desc


def build_or_load_vector_store(df: pd.DataFrame, emb: OpenAIEmbeddings) -> FAISS:
    # The index is cached on disk: delete FAISS_INDEX after changing the data or descriptions.
    if os.path.exists(FAISS_INDEX):
        print("   📂 Loading existing FAISS index...")
        # Deserialisation uses pickle; acceptable only because this notebook built the index.
        return FAISS.load_local(FAISS_INDEX, emb, allow_dangerous_deserialization=True)

    print("   🔨 Building FAISS index (~1-2 min)...")
    stat_cols = {"age": "Age", "goals": "Gls", "assists": "Ast", "xg": "xG", "xag": "xAG",
                 "tackles": "Tkl", "interceptions": "Int", "sca": "SCA", "prgp": "PrgP",
                 "market_value": "MarketValue"}
    docs = []
    for _, row in df.iterrows():
        desc = player_description(row)  # computed once, reused as content and metadata
        docs.append(Document(
            page_content=desc,
            metadata={
                "player":      row["Player"],
                "pos":         row.get("Pos", ""),
                "primary_pos": row.get("PrimaryPos", ""),
                "squad":       row.get("Squad", ""),
                "nation":      row.get("Nation", ""),
                **{name: _num(row, col) for name, col in stat_cols.items()},
                "description": desc,
            },
        ))
    vs = FAISS.from_documents(docs, emb)
    vs.save_local(FAISS_INDEX)
    print("   ✅ FAISS index saved.")
    return vs


_embeddings    = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
_vector_store  = build_or_load_vector_store(_players_df, _embeddings)


@tool
def retrieval_or_filter_tool(query_json: str) -> str:
    """
    Retrieves relevant players using FAISS semantic search.
    Input JSON: {"criteria": "...", "pos_filter": "GK|DF|MF|FW", "k": 80}
    Returns list of top candidate players with stats.
    """
    params = json.loads(query_json) if isinstance(query_json, str) else query_json
    criteria   = params.get("criteria", USER_CRITERIA)
    pos_filter = params.get("pos_filter")
    k          = params.get("k", 80)

    pos_prefix = {"GK": "goalkeeper", "DF": "defender", "MF": "midfielder", "FW": "forward striker"}
    query = f"{pos_prefix.get(pos_filter, '')} {criteria}".strip()

    results = _vector_store.similarity_search(query, k=k)
    candidates = [r.metadata for r in results]

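    if pos_filter:
        # Post-filter on the raw "Pos" string: multi-position players such as "DF,MF"
        # land in several pools, and a pool can be much smaller than k.
        candidates = [c for c in candidates if pos_filter.upper() in c.get("pos","").upper()]

    print(f"   🔍 retrieval_or_filter_tool → pos={pos_filter} → {len(candidates)} candidates")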
    return json.dumps(candidates)


# ── Run Tool 3 – retrieve for all positions ────────────────────────
_candidates_by_pos = {}
for pos in ["GK","DF","MF","FW"]:
    raw = retrieval_or_filter_tool.invoke(
        json.dumps({"criteria": USER_CRITERIA, "pos_filter": pos, "k": 80})
    )
    _candidates_by_pos[pos] = json.loads(raw)

print("\nTop 3 per position:")
for pos, cands in _candidates_by_pos.items():
    print(f"  {pos}: {[c['player'] for c in cands[:3]]}")

   📂 Loading existing FAISS index...
   🔍 retrieval_or_filter_tool → pos=GK → 7 candidates
   🔍 retrieval_or_filter_tool → pos=DF → 48 candidates
   🔍 retrieval_or_filter_tool → pos=MF → 41 candidates
   🔍 retrieval_or_filter_tool → pos=FW → 46 candidates

Top 3 per position:
  GK: ['Paulo Gazzaniga', 'Benjamin Siegrist', 'Josep Martinez']
  DF: ['Christian Günter', 'Carl Starfelt', 'Finley Stevens']
  MF: ['Kristjan Asllani', 'Andreas Pereira', 'Diego Moreira']
  FW: ['Niclas Füllkrug', 'Nikola Krstović', 'Mario Götze']

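Because the position filter runs after the top-`k` search, a query dominated by outfield criteria can leave very few goalkeepers (7 of 80 in the run above). The toy example below uses hand-made 2-D vectors in place of OpenAI embeddings (all data hypothetical) to contrast retrieve-then-filter with filter-then-retrieve.

```python
import numpy as np


def top_k(query: np.ndarray, vectors: np.ndarray, k: int) -> list[int]:
    """Indices of the k rows with the highest cosine similarity to `query`."""
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return [int(i) for i in np.argsort(-sims)[:k]]


# Toy embeddings: axis 0 ~ "defending", axis 1 ~ "goalkeeping".
positions = ["DF", "DF", "DF", "MF", "GK", "GK"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.0, 1.0]])
query = np.array([1.0, 0.0])  # a query dominated by outfield criteria

# Retrieve-then-filter, as retrieval_or_filter_tool does
post = [i for i in top_k(query, vectors, k=3) if positions[i] == "GK"]

# Filter-then-retrieve: restrict to GK rows first, then rank within them
gk_rows = [i for i, p in enumerate(positions) if p == "GK"]
pre = [gk_rows[j] for j in top_k(query, vectors[gk_rows], k=3)]

print(post, pre)
```

Filtering first (for example, one index per position, or a larger `k` for goalkeepers) guarantees a full pool; the original pipeline retrieves first and filters afterwards.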

---
## Tool 4 – `reasoning_or_aggregation_tool`
Applies squad constraints (exactly 3 GKs, per-position slot ranges, at most 23 players, optional budget) and ranks candidates within each position by composite score.
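A worked example of the weights in `composite_score` (defined in the cell below), on two hypothetical players. The function body is copied so the snippet runs on its own.

```python
def composite_score(p: dict) -> float:
    """Copy of the pipeline's weighted ranking score, for a standalone check."""
    return (
        p.get("goals", 0) * 2.5 + p.get("assists", 0) * 1.8
        + p.get("xg", 0) * 1.2 + p.get("xag", 0) * 1.0
        + p.get("sca", 0) * 0.12 + p.get("prgp", 0) * 0.08
        + p.get("tackles", 0) * 0.25 + p.get("interceptions", 0) * 0.25
    )


# Hypothetical attacker
fw = {"goals": 10, "assists": 5, "xg": 8.0, "xag": 4.0, "sca": 60, "prgp": 50,
      "tackles": 20, "interceptions": 10}
# Hypothetical goalkeeper with no outfield output
gk = {"goals": 0, "assists": 0, "xg": 0.0, "xag": 0.0, "sca": 0, "prgp": 0,
      "tackles": 0, "interceptions": 0}

# By hand: 25 + 9 + 9.6 + 4 + 7.2 + 4 + 5 + 2.5 = 66.3
print(round(composite_score(fw), 2), composite_score(gk))
```

Goalkeeping stats (GA90, Save%, CS%) do not enter the score, so keepers are ranked almost entirely by incidental outfield numbers, which matches the scores of about 1 in the squad preview below.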

In [34]:
def composite_score(p: dict) -> float:
    """Weighted multi-stat score used for candidate ranking within each position.
    Goalkeeping stats (GA90, Save%, CS%) are not included."""
    return (
        p.get("goals",0)*2.5 + p.get("assists",0)*1.8 +
        p.get("xg",0)*1.2   + p.get("xag",0)*1.0 +
        p.get("sca",0)*0.12  + p.get("prgp",0)*0.08 +
        p.get("tackles",0)*0.25 + p.get("interceptions",0)*0.25
    )


@tool
def reasoning_or_aggregation_tool(constraint_json: str) -> str:
    """
    Selects the final 23-player squad by applying World Cup roster rules:
    - Exactly 3 GKs; position slot ranges for DF/MF/FW; total ≤ 23
    - Optional budget cap (sum of MarketValue)
    - Ranks by composite score; flags constraint violations
    Input JSON: {"budget": null|float}
    Returns: squad list + constraint report.
    """
    params = json.loads(constraint_json) if isinstance(constraint_json, str) else constraint_json
    budget = params.get("budget")

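    squad, constraint_report, chosen = [], [], set()
    slots = list(POSITION_SLOTS.items())

    for i, (pos, (min_cnt, max_cnt)) in enumerate(slots):
        # Reserve room for the minimums of later positions so the slot maxima
        # (which sum to 24) can never push the squad past MAX_SQUAD.
        reserved = sum(rng[0] for _, rng in slots[i + 1:])
        cap = min(max_cnt, MAX_SQUAD - len(squad) - reserved)
        pool = sorted(_candidates_by_pos.get(pos, []), key=composite_score, reverse=True)
        selected = 0

        for player in pool:
            if selected >= cap:
                break
            if player["player"] in chosen:  # multi-position players appear in several pools
                continue
            if budget is not None:
                spent = sum(p["market_value"] for p in squad)
                if spent + player.get("market_value",0) > budget:
                    continue
            squad.append({**player, "slot_pos": pos, "score": round(composite_score(player),2)})
            chosen.add(player["player"])
            selected += 1

        status = "✅" if selected >= min_cnt else "⚠️"
        constraint_report.append(
            f"{status} {pos}: {selected}/{min_cnt}-{max_cnt} required"
        )

    total_val = round(sum(p["market_value"] for p in squad), 1)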

    result = {
        "squad":            squad,
        "squad_size":       len(squad),
        "total_value_eur_m": total_val,
        "budget_used":      f"€{total_val}M" + (f" / €{budget}M cap" if budget is not None else ""),
        "constraint_report": constraint_report,
        "limitations": [
            "Stats sourced from 5 European leagues only – WC-specific form not captured",
            "Market value is heuristic-based, not official transfer market data",
            "International tournament performance not factored in",
        ],
    }

    print("   ⚙️  reasoning_or_aggregation_tool →")
    for line in constraint_report:
        print(f"      {line}")
    print(f"      Total squad: {len(squad)} players | €{total_val}M")

    return json.dumps(result)


# ── Run Tool 4 ─────────────────────────────────────────────────────
_constraint_raw = reasoning_or_aggregation_tool.invoke(
    json.dumps({"budget": BUDGET})
)
_constraint_result = json.loads(_constraint_raw)
_squad = _constraint_result["squad"]

# Preview
sq_df = pd.DataFrame([{
    "Pos":    p["slot_pos"], "Player": p["player"], "Nation": p["nation"],
    "Club":   p["squad"],   "Age":    p["age"],   "Goals":  p["goals"],
    "Ast":    p["assists"], "xG":     round(p["xg"],1), "Score": p["score"],
    "€M":     p["market_value"]
} for p in _squad])
sq_df

   ⚙️  reasoning_or_aggregation_tool →
      ✅ GK: 3/3-3 required
      ✅ DF: 7/5-7 required
      ✅ MF: 8/5-8 required
      ✅ FW: 6/4-6 required
      Total squad: 23 players | €1158.1M


Unnamed: 0,Pos,Player,Nation,Club,Age,Goals,Ast,xG,Score,‚Ç¨M
0,GK,Marko Dmitroviƒá,rs SRB,Legan√©s,32.0,0.0,0.0,0.0,1.25,9.0
1,GK,Michael Zetterer,de GER,Werder Bremen,29.0,0.0,0.0,0.0,1.09,9.4
2,GK,Dominik Greif,sk SVK,Mallorca,27.0,0.0,0.0,0.0,0.6,11.0
3,DF,Maximilian Mittelst√§dt,de GER,Stuttgart,27.0,1.0,7.0,0.9,78.7,36.3
4,DF,Rasmus Kristensen,dk DEN,Eint Frankfurt,27.0,5.0,3.0,2.5,64.71,38.2
5,DF,Diego Moreira,pt POR,Strasbourg,19.0,2.0,7.0,2.0,62.05,44.8
6,DF,Robin Gosens,de GER,Fiorentina,30.0,5.0,5.0,1.8,60.64,37.8
7,DF,Dimitris Giannoulis,gr GRE,Augsburg,28.0,1.0,4.0,1.1,56.16,27.1
8,DF,Daley Blind,nl NED,Girona,34.0,0.0,2.0,0.5,53.45,19.0
9,DF,Iglesias,es ESP,Getafe,26.0,0.0,1.0,1.6,52.47,26.4
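
The constraint report above checks per-position counts against ranges (GK 3-3, DF 5-7, MF 5-8, FW 4-6). Those maxima allow up to 24 players, so the 23-man cap decides the last place, and the printed counts (3 + 7 + 8 + 6 = 24) do not add up to the reported total of 23. The generated narrative later in this notebook also lists Cole Palmer and Christian Pulisic under both MIDFIELDERS and FORWARDS. The body of `reasoning_or_aggregation_tool` is not shown in this section, so the following is only a minimal sketch of a greedy two-pass fill that records who has already been picked. The `Candidate` type, `fill_squad`, and the hard-coded ranges are hypothetical; they mirror the printed report and are not the author's implementation.

```python
from dataclasses import dataclass

# Hypothetical mirror of the printed constraint ranges: position -> (min, max).
POSITION_RANGES = {"GK": (3, 3), "DF": (5, 7), "MF": (5, 8), "FW": (4, 6)}
MAX_SQUAD = 23


@dataclass(frozen=True)
class Candidate:
    name: str
    pos: str      # the position pool this candidate came from
    score: float  # higher is better, e.g. a retrieval/ranking score


def fill_squad(cands_by_pos, ranges=POSITION_RANGES, max_size=MAX_SQUAD):
    """Greedy two-pass fill that never selects the same player twice."""
    chosen, picked_names = [], set()
    counts = {pos: 0 for pos in ranges}

    def try_add(c):
        if c.name in picked_names:  # already selected from another position pool
            return False
        chosen.append(c)
        picked_names.add(c.name)
        counts[c.pos] += 1
        return True

    # Pass 1: meet every position's minimum with its best-scoring candidates.
    for pos, (lo, _hi) in ranges.items():
        for c in sorted(cands_by_pos.get(pos, []), key=lambda x: x.score, reverse=True):
            if counts[pos] >= lo:
                break
            try_add(c)

    # Pass 2: fill the remaining places by overall score, respecting each maximum.
    pool = sorted((c for cs in cands_by_pos.values() for c in cs),
                  key=lambda x: x.score, reverse=True)
    for c in pool:
        if len(chosen) >= max_size:
            break
        if c.pos in ranges and counts[c.pos] < ranges[c.pos][1]:
            try_add(c)

    return chosen, counts
```

Pass 1 meets each minimum only when the pool is deep enough; if a pool runs short, its count stays below the minimum, so check `counts` before handing the squad to Tool 5. Deduplicating by name assumes player names are unique in the dataset.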


---
## Tool 5 — `llm_synthesis_tool`
Sends the 23-player squad to an LLM chain with a structured prompt to generate per-player justifications and a tactical summary.

In [35]:
_llm = ChatOpenAI(model=LLM_MODEL, temperature=0.3, openai_api_key=OPENAI_API_KEY)

_SYNTHESIS_PROMPT = PromptTemplate(
    input_variables=["criteria", "wc_context", "squad_json", "budget_note", "limitations"],
    template="""
You are a world-class football analytics expert building a FIFA World Cup 2026 squad.

TOURNAMENT CONTEXT:
{wc_context}

USER CRITERIA: {criteria}
BUDGET NOTE: {budget_note}

SELECTED 23-PLAYER SQUAD (JSON with stats):
{squad_json}

INSTRUCTIONS:
1. Group by position: GK → DF → MF → FW.
2. For EACH player: a one-sentence justification grounded in their stats and the user criteria.
3. Write a 3-sentence TACTICAL SUMMARY covering shape, strengths, and style.
4. Close with a LIMITATIONS section acknowledging: {limitations}

FORMAT:
=== GOALKEEPERS ===
- [Name] ([Nation] | [Club], Age [Age]): [Justification citing specific stat(s)]

=== DEFENDERS ===
...

=== MIDFIELDERS ===
...

=== FORWARDS ===
...

=== TACTICAL SUMMARY ===
[3 sentences]

=== LIMITATIONS & DATA SOURCES ===
[Bullet points]
""",
)

_synthesis_chain = LLMChain(llm=_llm, prompt=_SYNTHESIS_PROMPT)


@tool
def llm_synthesis_tool(squad_json: str) -> str:
    """
    Generates LLM-powered, evidence-backed justifications for each player selection.
    Input: JSON string of the 23-player squad (a list of player dicts), or a JSON object
           {"squad": [...], "criteria": "...", "budget": ...} to override USER_CRITERIA / BUDGET.
    Output: Formatted squad narrative with per-player justifications and tactical summary.
    """
    params = json.loads(squad_json) if isinstance(squad_json, str) else squad_json
    if isinstance(params, dict):
        # Run-specific criteria/budget; fall back to the config-cell globals
        slim_squad = params["squad"]
        criteria   = params.get("criteria", USER_CRITERIA)
        budget     = params.get("budget", BUDGET)
    else:
        slim_squad, criteria, budget = params, USER_CRITERIA, BUDGET

    wc_ctx = (
        f"{_wc_context.get('title','FIFA WC 2026')}: {_wc_context.get('intro','')[:300]}"
        f" (Source: {_wc_context.get('source_url','')})"
    )
    budget_note = f"€{budget:.0f}M total cap" if budget else "No budget constraint."
    limitations = " | ".join(_constraint_result.get("limitations", []))

    # Slim payload to avoid token overflow; stats missing from a player dict become null
    payload = [
        {k: p.get(k) for k in
         ["player","nation","squad","slot_pos","age","goals","assists","xg","tackles","sca","market_value"]}
        for p in slim_squad
    ]

    print("   🧠 llm_synthesis_tool — calling LLM...")
    result = _synthesis_chain.invoke({
        "criteria":    criteria,
        "wc_context":  wc_ctx,
        "squad_json":  json.dumps(payload, indent=2),
        "budget_note": budget_note,
        "limitations": limitations,
    })
    output = result["text"] if isinstance(result, dict) else str(result)
    print("   ✅ Justifications generated.")
    return output


# ── Run Tool 5 ──────────────────────────────────────────────────────
_justified_squad = llm_synthesis_tool.invoke(json.dumps(_squad))
print(_justified_squad)

   🧠 llm_synthesis_tool — calling LLM...
   ✅ Justifications generated.
=== GOALKEEPERS ===
- Marko Dmitrović (SRB | Leganés, Age 32): A reliable shot-stopper with solid experience, although his stats show no goals or assists, indicating a traditional goalkeeper role.
- Michael Zetterer (GER | Werder Bremen, Age 29): While he has no goals or assists, his presence in goal is backed by a decent number of shot-stopping opportunities, contributing to the team's defensive stability.
- Dominik Greif (SVK | Mallorca, Age 27): Like his counterparts, Greif has not contributed offensively, but his role as a goalkeeper is primarily defensive, where he has shown competence.

=== DEFENDERS ===
- Maximilian Mittelstädt (GER | Stuttgart, Age 27): With 1 goal and 7 assists, Mittelstädt showcases his ability to contribute offensively, complemented by a high number of tackles (79) that underline his defensive prowess.
- Rasmus Kristensen (DEN | Eintracht Frankfurt, Age 27): His 5 goals and 3
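
The synthesis prompt pins down a FORMAT: fixed `=== SECTION ===` headers and one `- [Name] (...)` bullet per player. That makes the LLM output checkable before it reaches the report. A minimal sketch, assuming the model reproduces the headers and bullet shape exactly; `check_narrative` and the two regexes are hypothetical helpers, not part of LangChain.

```python
import re

# Section headers requested by the FORMAT block of _SYNTHESIS_PROMPT.
EXPECTED_SECTIONS = ["GOALKEEPERS", "DEFENDERS", "MIDFIELDERS", "FORWARDS",
                     "TACTICAL SUMMARY", "LIMITATIONS & DATA SOURCES"]
HEADER_RE = re.compile(r"^=== (.+?) ===[ \t]*$", re.MULTILINE)
# Player bullets follow "- [Name] ([Nation] | [Club], Age [Age]): ..."
PLAYER_RE = re.compile(r"^- (.+?) \(", re.MULTILINE)


def check_narrative(text, expected_players):
    """Return human-readable problems; an empty list means the format checks pass."""
    problems = []

    headers = HEADER_RE.findall(text)
    missing = [s for s in EXPECTED_SECTIONS if s not in headers]
    if missing:
        problems.append(f"missing sections: {missing}")

    seen, repeated = set(), set()
    for name in PLAYER_RE.findall(text):
        if name in seen:
            repeated.add(name)
        seen.add(name)
    if repeated:
        problems.append(f"listed more than once: {sorted(repeated)}")

    absent = sorted(set(expected_players) - seen)
    if absent:
        problems.append(f"not mentioned: {absent}")
    return problems
```

In this notebook it could be called as `check_narrative(_justified_squad, [p["player"] for p in _squad])`. A limitation bullet that happens to contain ` (` would be misread as a player, so treat the result as a warning rather than a hard failure.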

---
## Tool 6 ‚Äî `report_generation_tool`
Produces a formatted report card and interactive Plotly visualisations.

In [36]:
@tool
def report_generation_tool(report_json: str) -> str:
    """
    Generates the final formatted squad report with metadata and interactive visualisations.
    Input JSON: {"squad": [...], "narrative": "...", "criteria": "...", "budget": ...}
    """
    params    = json.loads(report_json) if isinstance(report_json, str) else report_json
    squad     = params["squad"]
    narrative = params.get("narrative", "")
    criteria  = params.get("criteria", USER_CRITERIA)
    budget    = params.get("budget")

    total_val = sum(p["market_value"] for p in squad)

    # ── Header ──────────────────────────────────────────────────────────
    header = (
        f"\n{'════════'*8}\n"
        f"  🏆  WORLD CUP 2026 — DREAM SQUAD REPORT\n"
        f"{'════════'*8}\n"
        f"  Criteria  : {criteria}\n"
        f"  Budget    : {'€{:.0f}M cap'.format(budget) if budget else 'Unlimited'}\n"
        f"  Squad size: {len(squad)} players | Total value: €{total_val:.1f}M\n"
        f"  Generated : {datetime.now().strftime('%Y-%m-%d %H:%M')}\n"
        f"  Sources   : FBRef CSV | Wikipedia (WC 2026) | API-Football (RapidAPI)\n"
        f"{'════════'*8}\n"
    )
    print(header)
    display(Markdown(narrative))

    # ── Chart 1: Market Value by Player (Bar, coloured by position) ─────
    sq_df = pd.DataFrame(squad)
    fig1  = px.bar(
        sq_df.sort_values("market_value", ascending=False),
        x="player", y="market_value", color="slot_pos",
        title="Squad Market Value by Player (‚Ç¨M)",
        labels={"player": "Player", "market_value": "Value (‚Ç¨M)", "slot_pos": "Position"},
        height=420,
    )
    fig1.update_layout(xaxis_tickangle=-45)
    fig1.show()

    # ── Chart 2: Position Breakdown (Pie) ───────────────────────────────
    pos_val = sq_df.groupby("slot_pos")["market_value"].sum().reset_index()
    fig2    = px.pie(
        pos_val, names="slot_pos", values="market_value",
        title="Budget Split by Position",
        color_discrete_sequence=px.colors.qualitative.Set2,
    )
    fig2.show()

    # ── Chart 3: Radar — Average Attacking vs Defensive Metrics ─────────
    cats = ["Goals","Assists","xG","SCA","Tackles","Interceptions"]
    fws  = sq_df[sq_df["slot_pos"]=="FW"]
    dfs  = sq_df[sq_df["slot_pos"]=="DF"]
    avgs = {
        "Forwards":  [fws["goals"].mean(), fws["assists"].mean(), fws["xg"].mean(),
                      fws["sca"].mean(), fws["tackles"].mean(), fws["interceptions"].mean()],
        "Defenders": [dfs["goals"].mean(), dfs["assists"].mean(), dfs["xg"].mean(),
                      dfs["sca"].mean(), dfs["tackles"].mean(), dfs["interceptions"].mean()],
    }
    fig3 = go.Figure()
    colors = {"Forwards": "firebrick", "Defenders": "steelblue"}
    for label, vals in avgs.items():
        fig3.add_trace(go.Scatterpolar(
            r=vals + [vals[0]], theta=cats + [cats[0]],
            fill="toself", name=label, line_color=colors[label]
        ))
    fig3.update_layout(
        polar=dict(radialaxis=dict(visible=True)),
        title="Avg Stat Profile: Forwards vs Defenders",
        showlegend=True,
    )
    fig3.show()

    # ── Chart 4: Age Distribution ───────────────────────────────────────
    fig4 = px.histogram(
        sq_df, x="age", color="slot_pos", nbins=10,
        title="Squad Age Distribution",
        labels={"age": "Age", "slot_pos": "Position"},
    )
    fig4.show()

    return f"✅ Report generated for {len(squad)}-player squad."


# ── Run Tool 6 ──────────────────────────────────────────────────────
_ = report_generation_tool.invoke(json.dumps({
    "squad":     _squad,
    "narrative": _justified_squad,
    "criteria":  USER_CRITERIA,
    "budget":    BUDGET,
}))


════════════════════════════════════════════════════════════════
  🏆  WORLD CUP 2026 — DREAM SQUAD REPORT
════════════════════════════════════════════════════════════════
  Criteria  : fast defenders with high interceptions, clinical strikers with xG > 5, creative midfielders with high assist rates and progressive passes
  Budget    : Unlimited
  Squad size: 23 players | Total value: €1158.1M
  Generated : 2026-02-25 21:07
  Sources   : FBRef CSV | Wikipedia (WC 2026) | API-Football (RapidAPI)
════════════════════════════════════════════════════════════════



=== GOALKEEPERS ===
- Marko Dmitrović (SRB | Leganés, Age 32): A reliable shot-stopper with solid experience, although his stats show no goals or assists, indicating a traditional goalkeeper role.
- Michael Zetterer (GER | Werder Bremen, Age 29): While he has no goals or assists, his presence in goal is backed by a decent number of shot-stopping opportunities, contributing to the team's defensive stability.
- Dominik Greif (SVK | Mallorca, Age 27): Like his counterparts, Greif has not contributed offensively, but his role as a goalkeeper is primarily defensive, where he has shown competence.

=== DEFENDERS ===
- Maximilian Mittelstädt (GER | Stuttgart, Age 27): With 1 goal and 7 assists, Mittelstädt showcases his ability to contribute offensively, complemented by a high number of tackles (79) that underline his defensive prowess.
- Rasmus Kristensen (DEN | Eintracht Frankfurt, Age 27): His 5 goals and 3 assists highlight his attacking threat from the back, while 59 tackles indicate his defensive reliability.
- Diego Moreira (POR | Strasbourg, Age 19): A young talent with 2 goals and 7 assists, Moreira combines creativity with solid defensive contributions (44 tackles) at a young age.
- Robin Gosens (GER | Fiorentina, Age 30): Gosens has 5 goals and 5 assists, demonstrating his dual-threat capability as a defender, alongside 56 tackles that reinforce his defensive skills.
- Dimitris Giannoulis (GRE | Augsburg, Age 28): With 1 goal and 4 assists, Giannoulis adds creativity to the defense, supported by 47 tackles that show his commitment to defensive duties.
- Daley Blind (NED | Girona, Age 34): Although he has no goals or assists, Blind's experience and tactical awareness make him a valuable asset in defensive organization, with 49 tackles to his name.
- Iglesias (ESP | Getafe, Age 26): With 0 goals but 1 assist and a high number of tackles (81), Iglesias is a defensive stalwart who excels in breaking up opposition attacks.

=== MIDFIELDERS ===
- Cole Palmer (ENG | Chelsea, Age 22): A standout performer with 15 goals and 8 assists, Palmer's offensive output is complemented by his creative play, making him a key player in midfield.
- Christian Pulisic (USA | Milan, Age 25): Pulisic's 11 goals and 9 assists showcase his ability to influence games, making him a critical creative force in the midfield.
- Pedri (ESP | Barcelona, Age 21): With 4 goals and 5 assists, Pedri combines creativity and vision, supported by 61 tackles that highlight his work rate in midfield.
- Alexis Mac Allister (ARG | Liverpool, Age 25): His 5 goals and 5 assists indicate a balanced contribution to the attack, while 95 tackles reflect his defensive capabilities.
- Nadiem Amiri (GER | Mainz 05, Age 27): Amiri's 7 goals and 5 assists, along with an xG of 5.5, show his clinical nature in front of goal, making him a valuable asset in midfield.
- Dominik Szoboszlai (HUN | Liverpool, Age 23): With 6 goals and 6 assists, Szoboszlai's creative influence is significant, supported by his 38 tackles that demonstrate his defensive contributions.
- Granit Xhaka (SUI | Leverkusen, Age 31): Xhaka's 2 goals and 7 assists, along with 48 tackles, highlight his ability to control the midfield and contribute offensively.
- Cristian Cáceres Jr. (VEN | Toulouse, Age 24): With 1 goal and 3 assists, Cáceres Jr. adds depth to the midfield, supported by a strong defensive work rate (96 tackles).

=== FORWARDS ===
- Cole Palmer (ENG | Chelsea, Age 22): With 15 goals and 8 assists, Palmer is a clinical forward who can change games with his attacking prowess.
- Hugo Ekitike (FRA | Eintracht Frankfurt, Age 22): Ekitike's 15 goals and 8 assists, along with an xG of 21.6, demonstrate his finishing ability and threat in front of goal.
- Serhou Guirassy (GUI | Dortmund, Age 28): Guirassy's impressive 21 goals and 4 assists, with an xG of 22.7, make him a standout striker with a proven track record of scoring.
- Christian Pulisic (USA | Milan, Age 25): As a forward, Pulisic's 11 goals and 9 assists further emphasize his versatility and ability to impact games from various positions.
- Jonathan David (CAN | Lille, Age 24): With 16 goals and 5 assists, David's clinical finishing and ability to create chances make him a key attacking option.

=== TACTICAL SUMMARY ===
The squad is structured in a balanced formation that emphasizes both defensive solidity and attacking creativity. Fast defenders with high interception rates complement clinical strikers capable of converting chances, while creative midfielders facilitate fluid ball movement and assist creation. This tactical setup aims to exploit opposition weaknesses through pace and technical skill, ensuring a dynamic style of play.

=== LIMITATIONS & DATA SOURCES ===
- Stats sourced from 5 European leagues only — WC-specific form not captured.
- Market value is heuristic-based, not official transfer market data.
- International tournament performance not factored in.
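
Chart 3 averages forwards and defenders with `fws[...].mean()` and `dfs[...].mean()`. If the constraint step returns no forwards, those means are NaN (pandas returns NaN for the mean of an empty column) and the Forwards trace has nothing meaningful to draw. The sketch below, in plain Python with hypothetical helper names, builds the same closed-polygon radius list and returns `None` for an empty position so the caller can skip that trace.

```python
import math


def safe_mean(values):
    """Mean of the non-missing values, or None when nothing is left to average."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    return sum(clean) / len(clean) if clean else None


def radar_series(players, pos, stat_keys):
    """Average each stat for one position and close the polygon, as Chart 3 does.

    Returns None when the position has no players, so the caller can skip
    the trace instead of plotting NaN.
    """
    group = [p for p in players if p.get("slot_pos") == pos]
    if not group:
        return None
    # A stat missing for the whole group is drawn at 0.0
    r = [safe_mean([p.get(k) for p in group]) or 0.0 for k in stat_keys]
    return r + r[:1]  # repeat the first point so the outline closes
```

In Tool 6 this would stand in for the `avgs` dict: compute `r = radar_series(squad, "FW", keys)` and call `fig3.add_trace(...)` only when `r` is not `None`; `theta` still needs `cats + [cats[0]]`.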

---
## 🤖 LangChain Agent — Full Orchestration
Wires all 6 tools into a single `initialize_agent` call with conversation memory.

In [37]:
ALL_TOOLS = [
    dataset_discovery_tool,
    data_ingestion_tool,
    retrieval_or_filter_tool,
    reasoning_or_aggregation_tool,
    llm_synthesis_tool,
    report_generation_tool,
]

agent = initialize_agent(
    tools       = ALL_TOOLS,
    llm         = _llm,
    agent       = AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    memory      = memory.langchain_memory,
    verbose     = True,
    handle_parsing_errors = True,
    max_iterations = 8,
)

print("✅ Agent initialised with", len(ALL_TOOLS), "tools and conversation memory.")
print("   Tools:", [t.name for t in ALL_TOOLS])

✅ Agent initialised with 6 tools and conversation memory.
   Tools: ['dataset_discovery_tool', 'data_ingestion_tool', 'retrieval_or_filter_tool', 'reasoning_or_aggregation_tool', 'llm_synthesis_tool', 'report_generation_tool']


---
## 🚀 Run Full Pipeline
Execute the end-to-end squad-building pipeline. Adjust `USER_CRITERIA` and `BUDGET` in the config cell and re-run.

In [38]:
def run_pipeline(criteria: str, budget=None):
    """
    End-to-end World Cup Squad Builder pipeline:
    Tool1 → Tool2 → Tool3 → Tool4 → Tool5 → Tool6
    """
    global _candidates_by_pos, _squad, _constraint_result, _justified_squad

    print(f"\n{'════════'*8}")
    print(f"  🏆 PIPELINE START")
    print(f"  Criteria : {criteria}")
    print(f"  Budget   : {'€{:.0f}M'.format(budget) if budget else 'None'}")
    print(f"{'════════'*8}\n")

    # Update memory with new preferences
    memory.update(criteria, budget)

    # Tool 1 ‚Äî Discover
    print("[1/6] Dataset Discovery")
    dataset_discovery_tool.invoke("player stats fixtures")

    # Tool 2 ‚Äî Ingest
    print("\n[2/6] Data Ingestion")
    data_ingestion_tool.invoke("all")

    # Tool 3 ‚Äî Retrieve
    print("\n[3/6] Retrieval & Filtering")
    _candidates_by_pos = {}
    for pos in ["GK","DF","MF","FW"]:
        raw = retrieval_or_filter_tool.invoke(
            json.dumps({"criteria": criteria, "pos_filter": pos, "k": 80})
        )
        _candidates_by_pos[pos] = json.loads(raw)

    # Tool 4 ‚Äî Constraints
    print("\n[4/6] Reasoning & Aggregation")
    _constraint_result = json.loads(
        reasoning_or_aggregation_tool.invoke(json.dumps({"budget": budget}))
    )
    _squad = _constraint_result["squad"]

    # Tool 5 ‚Äî LLM Synthesis
    print("\n[5/6] LLM Synthesis")
    _justified_squad = llm_synthesis_tool.invoke(json.dumps(_squad))

    # Tool 6 ‚Äî Report
    print("\n[6/6] Report Generation")
    report_generation_tool.invoke(json.dumps({
        "squad":     _squad,
        "narrative": _justified_squad,
        "criteria":  criteria,
        "budget":    budget,
    }))

    memory.save_squad(f"{len(_squad)} players | {criteria[:80]}")
    print(f"\n{memory.summary()}")
    return _squad


# ── Execute ─────────────────────────────────────────────────────────
final_squad = run_pipeline(criteria=USER_CRITERIA, budget=BUDGET)


════════════════════════════════════════════════════════════════
  🏆 PIPELINE START
  Criteria : fast defenders with high interceptions, clinical strikers with xG > 5, creative midfielders with high assist rates and progressive passes
  Budget   : None
════════════════════════════════════════════════════════════════

[1/6] Dataset Discovery
📂 dataset_discovery_tool →
   1 local file(s) | 3 external source(s)

[2/6] Data Ingestion
   📊 Loading local CSV...
   🌐 Scraping FIFA WC 2026 (Wikipedia)...
   🌐 Fetching: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
   ⚠️  Request failed: 403 Client Error: Forbidden for url: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
   🔌 Querying Football API (RapidAPI)...
 

=== GOALKEEPERS ===
- Marko Dmitrović (SRB | Leganés, Age 32): A reliable shot-stopper with solid experience, though lacking in offensive contributions with 0 goals and assists.
- Michael Zetterer (GER | Werder Bremen, Age 29): While he has not contributed offensively, his presence in goal is backed by a decent number of shot-stopping actions.
- Dominik Greif (SVK | Mallorca, Age 27): Similar to his counterparts, Greif's stats show no offensive contributions, but he brings a steady presence in goal.

=== DEFENDERS ===
- Maximilian Mittelstädt (GER | Stuttgart, Age 27): With 1 goal and 7 assists, Mittelstädt showcases his attacking prowess and creativity from the back, alongside 79 tackles.
- Rasmus Kristensen (DEN | Eintracht Frankfurt, Age 27): He has contributed 5 goals and 3 assists, demonstrating his ability to support the attack while also making 59 tackles.
- Diego Moreira (POR | Strasbourg, Age 19): A young talent with 2 goals and 7 assists, Moreira combines creativity with defensive solidity, making 44 tackles.
- Robin Gosens (GER | Fiorentina, Age 30): Gosens has been effective in both defense and attack, contributing 5 goals and 5 assists, while also making 56 tackles.
- Dimitris Giannoulis (GRE | Augsburg, Age 28): With 1 goal and 4 assists, Giannoulis adds creativity from the back, complemented by 47 tackles.
- Daley Blind (NED | Girona, Age 34): While he has not scored, Blind's experience and 2 assists highlight his playmaking ability from defense, along with 49 tackles.
- Iglesias (ESP | Getafe, Age 26): A strong defensive presence with 81 tackles, Iglesias also contributes creatively with 1 assist.

=== MIDFIELDERS ===
- Cole Palmer (ENG | Chelsea, Age 22): Palmer stands out with 15 goals and 8 assists, showcasing his clinical finishing and creativity in midfield.
- Christian Pulisic (USA | Milan, Age 25): Pulisic's 11 goals and 9 assists highlight his ability to impact games significantly from midfield.
- Pedri (ESP | Barcelona, Age 21): With 4 goals and 5 assists, Pedri combines creativity and playmaking with solid defensive contributions.
- Alexis Mac Allister (ARG | Liverpool, Age 25): Mac Allister's 5 goals and 5 assists demonstrate his dual-threat capability in midfield.
- Nadiem Amiri (GER | Mainz 05, Age 27): With 7 goals and 5 assists, Amiri has been effective in contributing to the attack while also providing defensive support.
- Dominik Szoboszlai (HUN | Liverpool, Age 23): Szoboszlai's 6 goals and 6 assists reflect his creative play and ability to score from midfield.
- Granit Xhaka (SUI | Leverkusen, Age 31): Xhaka's 2 goals and 7 assists, along with 48 tackles, show his ability to contribute offensively while maintaining defensive duties.
- Cristian Cáceres Jr. (VEN | Toulouse, Age 24): With 1 goal and 3 assists, Cáceres adds depth to the midfield, complemented by strong defensive contributions.

=== FORWARDS ===
- Cole Palmer (ENG | Chelsea, Age 22): A versatile forward with 15 goals and 8 assists, Palmer is a clinical finisher and creative threat.
- Hugo Ekitike (FRA | Eintracht Frankfurt, Age 22): Ekitike's 15 goals and 8 assists highlight his scoring ability and contribution to the team's offensive play.
- Serhou Guirassy (GUI | Dortmund, Age 28): With an impressive 21 goals and 4 assists, Guirassy is a clinical striker with a strong xG of 22.7.
- Christian Pulisic (USA | Milan, Age 25): Pulisic's versatility allows him to play as a forward, where he has scored 11 goals and provided 9 assists.
- Jonathan David (CAN | Lille, Age 24): David's 16 goals and 5 assists showcase his clinical finishing and ability to create scoring opportunities.

=== TACTICAL SUMMARY ===
The squad is structured in a balanced 4-3-3 formation, allowing for both defensive stability and attacking fluidity. The defenders are fast and capable of high interceptions, while the midfielders are creative playmakers with a strong assist rate. Up front, clinical strikers with high xG ensure that goal-scoring opportunities are converted effectively.

=== LIMITATIONS & DATA SOURCES ===
- Stats sourced from 5 European leagues only — WC-specific form not captured.
- Market value is heuristic-based, not official transfer market data.
- International tournament performance not factored in.


📋 Memory | Last criteria: 'fast defenders with high interceptions, clinical strikers with xG > 5, creative midfielders with high assist rates and progressive passes' | Last budget: None | Squads built: 3
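
The ingestion log above shows Wikipedia answering `403 Forbidden`, while the report header still lists "Wikipedia (WC 2026)" as a source because that line is hard-coded in `report_generation_tool`. The scraping code is not part of this section, so the cause is an assumption: Wikimedia's User-Agent policy asks automated clients to send a descriptive `User-Agent` with contact details, and requests with a generic or missing one may be refused. A minimal sketch using the standard library's `urllib.request` so it runs without extra packages; `build_wiki_request`, `fetch_html`, the app name, version, and contact string are placeholders.

```python
import urllib.request

WC_URL = "https://en.wikipedia.org/wiki/2026_FIFA_World_Cup"


def build_wiki_request(url, contact):
    """GET request that identifies the client; name/version are hypothetical placeholders."""
    user_agent = f"WorldCupSquadBuilder/0.1 (contact: {contact})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, contact, timeout=10):
    """Fetch a page and decode it as UTF-8 (requires network access)."""
    req = build_wiki_request(url, contact)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With `requests`, the equivalent is passing the same header via `requests.get(url, headers={"User-Agent": ...}, timeout=10)`.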


---
## 🎨 Innovation: Multi-Scenario Comparison
Build three squads with contrasting tactical philosophies and compare them side by side.

In [39]:
SCENARIOS = {
    "⚔️  Attacking Blitz": {
        "criteria": "clinical strikers with highest goals and xG, pacey wingers with high SCA, attacking midfielders",
        "budget":   None,
    },
    "🛡️  Defensive Fortress": {
        "criteria": "dominant defenders with most tackles and interceptions, disciplined midfielders, sweeper keepers with high save%",
        "budget":   None,
    },
    "💰  Value XI (Budget)": {
        "criteria": "best value players with high G+A per minute, young talent under 26, efficient performers",
        "budget":   150.0,
    },
}

scenario_squads = {}
_original_candidates = _candidates_by_pos  # keep the main run's pool so it can be restored

for name, cfg in SCENARIOS.items():
    print(f"\n{'─'*50}")
    print(f"Scenario: {name}")
    print(f"{'─'*50}")

    # Update global candidates for this scenario
    cands = {}
    for pos in ["GK","DF","MF","FW"]:
        raw  = retrieval_or_filter_tool.invoke(
            json.dumps({"criteria": cfg["criteria"], "pos_filter": pos, "k": 80})
        )
        cands[pos] = json.loads(raw)
    _candidates_by_pos = cands

    cr = json.loads(
        reasoning_or_aggregation_tool.invoke(json.dumps({"budget": cfg["budget"]}))
    )
    scenario_squads[name] = cr["squad"]
    print(f"  Squad: {cr['squad_size']} players | {cr['budget_used']}")

# Restore the candidates from the main pipeline run
_candidates_by_pos = _original_candidates


──────────────────────────────────────────────────
Scenario: ⚔️  Attacking Blitz
──────────────────────────────────────────────────
   🔍 retrieval_or_filter_tool → pos=GK → 21 candidates
   🔍 retrieval_or_filter_tool → pos=DF → 38 candidates
   🔍 retrieval_or_filter_tool → pos=MF → 47 candidates
   🔍 retrieval_or_filter_tool → pos=FW → 56 candidates
   ⚙️  reasoning_or_aggregation_tool →
      ✅ GK: 3/3-3 required
      ✅ DF: 7/5-7 required
      ✅ MF: 8/5-8 required
      ✅ FW: 6/4-6 required
      Total squad: 23 players | €1179.9M
  Squad: 23 players | €1179.9M

──────────────────────────────────────────────────
Scenario: 🛡️  Defensi

In [40]:
# ── Comparison Chart ────────────────────────────────────────────────
comparison_rows = []
for scenario_name, squad in scenario_squads.items():
    df_s = pd.DataFrame(squad)
    comparison_rows.append({
        "Scenario":    scenario_name,
        "Avg Goals":   round(df_s["goals"].mean(), 2),
        "Avg Assists": round(df_s["assists"].mean(), 2),
        "Avg xG":      round(df_s["xg"].mean(), 2),
        "Avg Tackles": round(df_s["tackles"].mean(), 2),
        "Avg SCA":     round(df_s["sca"].mean(), 2),
        "Total €M":    round(df_s["market_value"].sum(), 1),
        "Avg Age":     round(df_s["age"].mean(), 1),
    })

cmp_df = pd.DataFrame(comparison_rows)
display(cmp_df)

# Grouped bar chart
metrics = ["Avg Goals","Avg Assists","Avg xG","Avg Tackles","Avg SCA"]
fig_cmp = go.Figure()
colors  = ["firebrick","steelblue","seagreen"]
for i, row in cmp_df.iterrows():
    fig_cmp.add_trace(go.Bar(
        name=row["Scenario"],
        x=metrics,
        y=[row[m] for m in metrics],
        marker_color=colors[i % len(colors)],
    ))

fig_cmp.update_layout(
    barmode="group",
    title="Multi-Scenario Squad Comparison — Average Stats per Player",
    xaxis_title="Metric",
    yaxis_title="Average Value",
    height=450,
)
fig_cmp.show()

|   | Scenario | Avg Goals | Avg Assists | Avg xG | Avg Tackles | Avg SCA | Total €M | Avg Age |
|---|---|---|---|---|---|---|---|---|
| 0 | ⚔️ Attacking Blitz | 7.0 | 4.83 | 6.73 | 33.43 | 81.35 | 1179.9 | 25.4 |
| 1 | 🛡️ Defensive Fortress | 4.13 | 3.52 | 3.61 | 34.74 | 68.48 | 859.7 | 26.4 |
| 2 | 💰 Value XI (Budget) | 0.88 | 1.62 | 1.21 | 23.38 | 32.62 | 149.6 | 28.4 |
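
The grouped bar chart puts all five averages on one y-axis, so Avg SCA (32.62 to 81.35 in the table) and Avg Tackles (23.38 to 34.74) dwarf Avg Goals (0.88 to 7.0) and Avg xG. One option is to min-max scale each metric across scenarios before plotting. A minimal sketch; `minmax_by_metric` is a hypothetical helper, and the input values are copied from the table above.

```python
def minmax_by_metric(rows, metrics):
    """Rescale each metric to [0, 1] across scenarios; constant metrics map to 0.5."""
    bounds = {m: (min(r[m] for r in rows), max(r[m] for r in rows)) for m in metrics}
    scaled_rows = []
    for r in rows:
        scaled = {"Scenario": r["Scenario"]}
        for m in metrics:
            lo, hi = bounds[m]
            scaled[m] = 0.5 if hi == lo else (r[m] - lo) / (hi - lo)
        scaled_rows.append(scaled)
    return scaled_rows


# Two of the metrics, copied from the comparison table above.
rows = [
    {"Scenario": "Attacking Blitz",    "Avg Goals": 7.0,  "Avg SCA": 81.35},
    {"Scenario": "Defensive Fortress", "Avg Goals": 4.13, "Avg SCA": 68.48},
    {"Scenario": "Value XI (Budget)",  "Avg Goals": 0.88, "Avg SCA": 32.62},
]
scaled = minmax_by_metric(rows, ["Avg Goals", "Avg SCA"])
```

With only three scenarios, min-max scaling always pins one scenario at 0 and another at 1, so the scaled chart shows relative order rather than absolute size; keep the raw table next to it.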


---
## 🔍 Innovation: Explainability Overlay
For a selected player (below: the top-scoring forward and defender in `final_squad`), show **why** they were chosen: each stat's percentile rank vs. the full player pool.

In [41]:
def explainability_overlay(player_name: str, squad: list[dict], df_all: pd.DataFrame):
    """Plot a player's percentile rank on key metrics vs the entire dataset."""
    player_row = next((p for p in squad if p["player"] == player_name), None)
    if not player_row:
        print(f"{player_name} not in squad")
        return

    # dataset column -> (display label, key in the squad dict)
    metrics = {
        "Gls":  ("Goals", "goals"),       "Ast": ("Assists", "assists"),
        "xG":   ("xG", "xg"),             "SCA": ("SCA", "sca"),
        "Tkl":  ("Tackles", "tackles"),   "Int": ("Interceptions", "interceptions"),
        "PrgP": ("Prog Passes", "prgp"),
    }

    percentiles, labels = [], []
    for col, (label, key) in metrics.items():
        if col not in df_all.columns:
            continue
        series = df_all[col].dropna()
        val    = player_row.get(key) or 0                 # missing/None stats count as 0
        pct    = round((series < val).mean() * 100, 1)    # share of players strictly below
        percentiles.append(pct)
        labels.append(f"{label}<br>({val:.0f})")          # Plotly breaks label lines on <br>, not \n

    fig = go.Figure(go.Bar(
        x=percentiles, y=labels, orientation="h",
        marker_color=["#e74c3c" if p < 50 else "#2ecc71" for p in percentiles],
        text=[f"{p}th pct" for p in percentiles], textposition="outside",
    ))
    fig.update_layout(
        title=f"Explainability: {player_name} — Percentile Rank vs All Players",
        xaxis=dict(title="Percentile (%)", range=[0, 110]),
        height=350,
    )
    fig.show()


# Show top FW and top DF from the final squad
top_fw = sorted([p for p in final_squad if p["slot_pos"]=="FW"], key=lambda x: x["score"], reverse=True)
top_df_ = sorted([p for p in final_squad if p["slot_pos"]=="DF"], key=lambda x: x["score"], reverse=True)

if top_fw:
    explainability_overlay(top_fw[0]["player"], final_squad, _players_df)
if top_df_:
    explainability_overlay(top_df_[0]["player"], final_squad, _players_df)
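
The overlay's `(series < val).mean()` is the share of the pool strictly below the player. Two side effects follow: a value shared by many players (for example, 0 goals) ranks at the 0th percentile however common it is, and even the single best value stays below 100. A common alternative counts half of the ties. A minimal pure-Python sketch of both definitions using `bisect`; `percentile_rank` and the toy pool are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(pool, value, kind="strict"):
    """Percentile of `value` within `pool`; None and NaN entries are ignored.

    kind="strict": percent of the pool strictly below `value`
                   (what `(series < val).mean() * 100` computes).
    kind="mean":   strictly below plus half of the ties.
    """
    data = sorted(v for v in pool if v is not None and v == v)  # v == v drops NaN
    if not data:
        return float("nan")
    below = bisect_left(data, value)          # number of values < value
    ties = bisect_right(data, value) - below  # number of values == value
    if kind == "strict":
        return 100.0 * below / len(data)
    if kind == "mean":
        return 100.0 * (below + 0.5 * ties) / len(data)
    raise ValueError(f"unknown kind: {kind!r}")


goals_pool = [0, 0, 0, 0, 1, 2, 5, 21]  # toy pool, not real data
```

On this toy pool, a player with 0 goals ranks at 0.0 under the strict definition and 25.0 under the mean definition, while the 21-goal player ranks at 87.5 and 93.75. Note also that `df_all` mixes all positions, so a defender's goals are ranked against forwards; filtering the pool to the player's own position is another option.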

---
## 🔄 Try Your Own Criteria
Change the inputs below and re-run `run_pipeline()`; memory automatically persists your preferences.

In [42]:
# ── Customise & re-run ──────────────────────────────────────────────
MY_CRITERIA = "best free-kick takers, aerial threat in set pieces, high pressing forwards, composed ball-playing defenders"
MY_BUDGET   = 400.0   # €M — set to None for no cap

run_pipeline(criteria=MY_CRITERIA, budget=MY_BUDGET)


════════════════════════════════════════════════════════════════
  🏆 PIPELINE START
  Criteria : best free-kick takers, aerial threat in set pieces, high pressing forwards, composed ball-playing defenders
  Budget   : €400M
════════════════════════════════════════════════════════════════

[1/6] Dataset Discovery
📂 dataset_discovery_tool →
   1 local file(s) | 3 external source(s)

[2/6] Data Ingestion
   📊 Loading local CSV...
   🌐 Scraping FIFA WC 2026 (Wikipedia)...
   🌐 Fetching: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
   ⚠️  Request failed: 403 Client Error: Forbidden for url: https://en.wikipedia.org/wiki/2026_FIFA_World_Cup
   🔌 Querying Football API (RapidAPI)...
   💾 Cache hit  (0.4h old

=== GOALKEEPERS ===
- Donovan Léon (GUF | Auxerre, Age 31): Léon has shown reliability with 2 assists and a solid 12 SCA, contributing to his team's build-up play.
- Bernd Leno (GER | Fulham, Age 32): Leno's experience is highlighted by his 2 tackles and 6 SCA, indicating his capability to contribute defensively and in playmaking.
- Diego Conde (ESP | Villarreal, Age 25): Conde has demonstrated his potential with 1 assist and 3 SCA, showcasing his ability to support attacking plays from the back.

=== DEFENDERS ===
- Jules Koundé (FRA | Barcelona, Age 25): Koundé's 63 tackles and 59 SCA underline his defensive prowess and ability to initiate attacks, making him a key asset.
- Matías Soulé (ARG | Roma, Age 21): With 5 goals and 5 assists, Soulé's offensive contributions from defense are significant, complemented by his 90 SCA.
- Ansgar Knauff (GER | Eint Frankfurt, Age 22): Knauff's 4 goals and 5 assists, alongside 41 tackles, indicate his dual threat as a defender and an attacking option.
- Piero Hincapié (ECU | Leverkusen, Age 22): Hincapié's 52 tackles and 2 assists show his defensive solidity and ability to contribute to the attack.
- Alexis Saelemaekers (BEL | Roma, Age 25): Saelemaekers brings versatility with 7 goals and 3 assists, along with 33 tackles, enhancing both defense and offense.
- Valentin Rosier (FRA | Legan√©s, Age 27): Rosier's 72 tackles demonstrate his defensive capabilities, while his 2 assists indicate potential in supporting forward plays.
- Keane Lewis-Potter (ENG | Brentford, Age 23): Lewis-Potter's 1 goal and 3 assists, combined with 49 tackles, highlight his effectiveness in both defensive duties and attacking support.

=== MIDFIELDERS ===
- Cole Palmer (ENG | Chelsea, Age 22): Palmer's impressive 15 goals and 8 assists, along with 202 SCA, make him a creative force in midfield with a high output.
- Tom Cairney (SCO | Fulham, Age 33): While Cairney has only 2 goals and no assists, his experience and tactical awareness can still provide stability in midfield.

=== FORWARDS ===
*Note: No forwards were listed in the provided squad data.*

=== TACTICAL SUMMARY ===
This squad is structured to utilize a dynamic 4-3-3 formation, emphasizing a strong defensive line complemented by creative midfielders who can transition quickly into attack. The defenders are fast and capable of high interceptions, while the midfield boasts players with high assist rates and progressive passing, ensuring fluidity in attack. The attacking strategy will rely on clinical finishing from the forwards, supported by the midfield's creativity and the defenders' ability to initiate plays.

=== LIMITATIONS & DATA SOURCES ===
- Stats sourced from 5 European leagues only; WC-specific form not captured.
- Market value is heuristic-based, not official transfer market data.
- International tournament performance not factored in.


📋 Memory | Last criteria: 'best free-kick takers, aerial threat in set pieces, high pressing forwards, composed ball-playing defenders' | Last budget: 400.0 | Squads built: 3

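The printed report lists 3 goalkeepers, 7 defenders, 2 midfielders and no forwards (12 players), although stage 4 is described as enforcing at most 23 players, at least 3 GKs and a budget cap. A quick post-hoc check on the returned records can surface such gaps. This is a minimal sketch, assuming each record carries the `slot_pos` and `market_value` (€M) keys seen in the repr below; `check_squad` is a hypothetical helper, and the outfield slot codes `DF`, `MF` and `FW` are assumptions (only `GK` is visible in this output).

```python
from collections import Counter


def check_squad(squad, budget=None, max_size=23, min_gk=3,
                required_slots=("GK", "DF", "MF", "FW")):
    """Return a list of constraint violations (an empty list means valid).

    Hypothetical helper: assumes each player dict has 'slot_pos' and
    'market_value' (in €M); the outfield slot codes are assumptions.
    """
    problems = []
    counts = Counter(p["slot_pos"] for p in squad)  # missing slots count as 0

    # Squad-size cap
    if len(squad) > max_size:
        problems.append(f"squad has {len(squad)} players (max {max_size})")

    # Minimum number of goalkeepers
    if counts["GK"] < min_gk:
        problems.append(f"only {counts['GK']} GK(s) (min {min_gk})")

    # Every line of the team should have at least one player
    for slot in required_slots:
        if counts[slot] == 0:
            problems.append(f"no players in slot {slot}")

    # Budget cap; None disables it, mirroring MY_BUDGET
    total = sum(p["market_value"] for p in squad)
    if budget is not None and total > budget:
        problems.append(f"total value €{total:.1f}M exceeds budget €{budget:.1f}M")

    return problems


# Usage with two goalkeeper records from the output below
example = [
    {"player": "Donovan Léon", "slot_pos": "GK", "market_value": 12.2},
    {"player": "Bernd Leno", "slot_pos": "GK", "market_value": 12.4},
]
print(check_squad(example, budget=400.0))
# → ['only 2 GK(s) (min 3)', 'no players in slot DF', 'no players in slot MF', 'no players in slot FW']
```

If `run_pipeline` returns the list shown below, as the cell output suggests, the same check could be applied directly as `check_squad(run_pipeline(criteria=MY_CRITERIA, budget=MY_BUDGET), budget=MY_BUDGET)`.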

[{'player': 'Donovan Léon',
  'pos': 'GK',
  'primary_pos': 'GK',
  'squad': 'Auxerre',
  'nation': 'gf GUF',
  'age': 31.0,
  'goals': 0.0,
  'assists': 2.0,
  'xg': 0.0,
  'xag': 0.4,
  'tackles': 0.0,
  'interceptions': 0.0,
  'sca': 12.0,
  'prgp': 1.0,
  'market_value': 12.2,
  'description': 'Donovan Léon is a GK for Auxerre, age 31.0, 32 matches (2880 mins). Goalkeeper: GA/90=1.50, Save%=73.8%, CS%=28.1%. Market value ~€12.2M.',
  'slot_pos': 'GK',
  'score': 5.52},
 {'player': 'Bernd Leno',
  'pos': 'GK',
  'primary_pos': 'GK',
  'squad': 'Fulham',
  'nation': 'de GER',
  'age': 32.0,
  'goals': 0.0,
  'assists': 1.0,
  'xg': 0.0,
  'xag': 0.1,
  'tackles': 2.0,
  'interceptions': 0.0,
  'sca': 6.0,
  'prgp': 0.0,
  'market_value': 12.4,
  'description': 'Bernd Leno is a GK for Fulham, age 32.0, 38 matches (3420 mins). Goalkeeper: GA/90=1.42, Save%=67.9%, CS%=13.2%. Market value ~€12.4M.',
  'slot_pos': 'GK',
  'score': 3.12},
 {'player': 'Diego Conde',
  'pos': 'GK',
  '