# World Cup Squad Builder Pipeline

**Team Members:**
- Person A: Data ingestion, retrieval, prompt engineering
- Person B: Reasoning pipeline, constraint logic, report generation
- Person C: Agent orchestration, tools, UI, notebook assembly, submission

**Track:** World Cup Squad Builder with Reasoning Pipeline (IE5374, Northeastern University)


## Dependencies and Environment Setup

_This cell will contain `pip install` commands for all required libraries (langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters, faiss-cpu, openai, pandas, numpy, matplotlib, gradio, python-dotenv). It will also note any version pins and how to run the notebook reproducibly._
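
One way to keep the install step reproducible is to build the `pip` command from a single package list and run it with the notebook's own interpreter. The sketch below only assembles and prints the command; `build_pip_command` and its `pins` argument are hypothetical helpers, and no versions are pinned because the outline does not state any.

```python
import sys

# Package names are taken from the list above; versions are intentionally left unpinned.
REQUIRED_PACKAGES = [
    "langchain",
    "langchain-openai",
    "langchain-community",
    "langchain-core",
    "langchain-text-splitters",
    "faiss-cpu",
    "openai",
    "pandas",
    "numpy",
    "matplotlib",
    "gradio",
    "python-dotenv",
]


def build_pip_command(packages, pins=None):
    """Return a `pip install` argument list for the current interpreter.

    `pins` is an optional {package: version} mapping (hypothetical argument).
    """
    pins = pins or {}
    specs = [f"{name}=={pins[name]}" if name in pins else name for name in packages]
    return [sys.executable, "-m", "pip", "install", "--quiet", *specs]


command = build_pip_command(REQUIRED_PACKAGES)
print(" ".join(command[1:]))
# To install for real: import subprocess; subprocess.check_call(command)
```

Using `sys.executable -m pip` installs into the same environment the kernel runs in, which avoids the common mismatch between the shell's `pip` and the notebook's Python.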


## Imports and API Key Configuration

_This cell will import the required modules (`pandas`, `langchain`, `langchain_openai`, `FAISS`, `gradio`, etc.) and load the OpenAI API key from `.env` using `python-dotenv`. It will also set global configuration such as model names and temperature._
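
A minimal sketch of the key-loading and configuration step might look like the following. The `PipelineConfig` class, its field names, and every default value (including the model names) are assumptions for illustration, not settings stated in this outline.

```python
import os
from dataclasses import dataclass

try:
    # python-dotenv: load_dotenv() copies key=value pairs from a .env file into os.environ.
    from dotenv import load_dotenv
except ImportError:  # keeps the sketch importable where python-dotenv is not installed
    load_dotenv = None


@dataclass(frozen=True)
class PipelineConfig:
    """Global settings; every default here is an illustrative assumption."""
    chat_model: str = "gpt-4o-mini"  # assumed model name
    embedding_model: str = "text-embedding-3-small"  # assumed model name
    temperature: float = 0.0  # assumed default
    top_k: int = 10  # assumed number of players to retrieve per query


def load_api_key(env_var="OPENAI_API_KEY"):
    """Load .env when possible and return the key, or None if it is not set."""
    if load_dotenv is not None:
        load_dotenv(dotenv_path=".env")
    return os.getenv(env_var)


CONFIG = PipelineConfig()
api_key = load_api_key()
print("API key found:", api_key is not None)
```

Printing only whether the key was found, rather than the key itself, keeps the secret out of the saved notebook output.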


## Stage 1 — Data Ingestion

_This cell will demonstrate the use of `src.ingestion.load_raw_data`, `clean_data`, `cache_processed_data`, and `dataframe_to_documents`. It will display a sample of the cleaned DataFrame and one or two example `Document` objects to illustrate the natural-language descriptions and metadata._
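
To illustrate what "natural-language descriptions and metadata" could mean in practice, the sketch below converts one cleaned row into a text description with attached metadata. It is not the actual `dataframe_to_documents` implementation: the `Document` dataclass is a stand-in for the LangChain class of the same name, the column names are assumed, and the sample rows are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def row_to_document(row):
    """Turn one cleaned player row (a dict with assumed keys) into a Document."""
    text = (
        f"{row['name']} is a {row['age']}-year-old {row['position']} "
        f"from {row['nationality']} with overall rating {row['overall']} "
        f"and pace {row['pace']}."
    )
    # Keep structured fields in metadata so later stages can filter without parsing text.
    metadata = {key: row[key] for key in ("name", "position", "overall", "wage_eur")}
    return Document(page_content=text, metadata=metadata)


# Invented rows for illustration only; they are not taken from the real dataset.
sample_rows = [
    {"name": "Player One", "age": 24, "position": "CB", "nationality": "Exampleland",
     "overall": 82, "pace": 79, "wage_eur": 60000},
    {"name": "Player Two", "age": 19, "position": "CM", "nationality": "Samplestan",
     "overall": 74, "pace": 71, "wage_eur": 15000},
]

documents = [row_to_document(row) for row in sample_rows]
print(documents[0].page_content)
print(documents[0].metadata)
```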


## Stage 2 — Retrieval

_This cell will construct the FAISS vector store and retriever using `src.retrieval.create_vector_store` and `get_retriever`. It will then show example retrievals for queries such as "fast defenders", "best free kick takers", and "young high-potential midfielders", printing out the top retrieved players for each query._
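
The real cell depends on FAISS and an embedding model, which need installed packages and an API key. As a dependency-free illustration of the underlying idea (rank documents by similarity to the query and keep the top `k`), the sketch below swaps in word-count vectors and cosine similarity. The `embed`, `cosine`, and `retrieve` helpers are hypothetical, and the descriptions are invented.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().replace(",", " ").replace(".", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, descriptions, k=2):
    """Return up to k descriptions with non-zero similarity, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), text) for text in descriptions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:k] if score > 0]


# Invented descriptions for illustration only.
descriptions = [
    "Player One is a fast defender with strong tackling.",
    "Player Two is a midfielder known for free kick accuracy.",
    "Player Three is a young midfielder with high potential.",
]

for query in ("fast defender", "free kick", "young high potential midfielder"):
    print(query, "->", retrieve(query, descriptions, k=1))
```

Unlike a learned embedding, this toy version matches only exact words, so a query such as "fast defenders" would miss "defender". That gap is one reason to use semantic embeddings in the actual pipeline.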


## Stage 3 — Reasoning and Constraint Solving

_This cell will demonstrate building a squad from a retrieved shortlist using `src.reasoning.build_squad`. It will show how constraints (max 23 players, positional minimums, optional budget) are defined and passed in, and will print the structured squad output (selected players, excluded players, total wage, formation notes)._
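
One simple way to honour these constraints is a two-pass greedy selection: fill each positional minimum first, then add the best remaining players while the squad-size and budget limits allow. The sketch below is an assumption about how such logic could look, not the actual `src.reasoning.build_squad`. The function name, the player field names, and the sample shortlist are all hypothetical.

```python
def build_squad_sketch(shortlist, max_players=23, position_minimums=None, budget=None):
    """Greedy squad selection under size, position, and optional budget constraints.

    Players are dicts with 'name', 'position', 'overall', and 'wage' keys (assumed).
    budget=None disables the budget check.
    """
    position_minimums = position_minimums or {}
    ranked = sorted(shortlist, key=lambda p: p["overall"], reverse=True)
    selected, chosen_ids, total_wage = [], set(), 0

    def try_add(player):
        nonlocal total_wage
        if id(player) in chosen_ids or len(selected) >= max_players:
            return False
        if budget is not None and total_wage + player["wage"] > budget:
            return False
        selected.append(player)
        chosen_ids.add(id(player))
        total_wage += player["wage"]
        return True

    # Pass 1: satisfy each positional minimum with the best affordable players.
    for position, minimum in position_minimums.items():
        added = 0
        for player in ranked:
            if added >= minimum:
                break
            if player["position"] == position and try_add(player):
                added += 1

    # Pass 2: fill the remaining slots by overall rating.
    for player in ranked:
        try_add(player)

    excluded = [p for p in ranked if id(p) not in chosen_ids]
    return {"selected": selected, "excluded": excluded, "total_wage": total_wage}


# Invented shortlist for illustration only.
shortlist = [
    {"name": "Player A", "position": "GK", "overall": 80, "wage": 50},
    {"name": "Player B", "position": "DF", "overall": 85, "wage": 80},
    {"name": "Player C", "position": "MF", "overall": 90, "wage": 100},
    {"name": "Player D", "position": "FW", "overall": 88, "wage": 90},
    {"name": "Player E", "position": "DF", "overall": 78, "wage": 30},
    {"name": "Player F", "position": "GK", "overall": 75, "wage": 20},
]

squad = build_squad_sketch(shortlist, max_players=4,
                           position_minimums={"GK": 1, "DF": 1}, budget=300)
print([p["name"] for p in squad["selected"]], squad["total_wage"])
```

Greedy selection is easy to explain in a report but is not guaranteed to be optimal, and it can leave a positional minimum unmet when the budget is tight. A production version would need to detect and report that case.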


## Stage 4 — Synthesis and Report Generation

_This cell will use `src.synthesis.generate_report` to produce a formatted natural-language squad report from the structured squad dictionary. It will display the resulting report, including the squad table, philosophy summary, budget summary, notable exclusions, limitations disclaimer, and data source citation._
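
The deterministic parts of such a report (the squad table, budget summary, exclusions, and disclaimer) can be rendered directly from the structured dictionary, as sketched below. This is an illustrative assumption rather than the real `src.synthesis.generate_report`: the function name, the dictionary keys, and the sample squad are hypothetical, and the philosophy summary is omitted because it would presumably come from the language model.

```python
def generate_report_sketch(squad, source="FIFA video game ratings"):
    """Render a Markdown squad report from a structured squad dictionary."""
    lines = [
        "## Squad Report",
        "",
        "| Player | Position | Overall | Wage |",
        "| --- | --- | --- | --- |",
    ]
    for player in squad["selected"]:
        lines.append(
            f"| {player['name']} | {player['position']} | "
            f"{player['overall']} | {player['wage']:,} |"
        )
    lines += ["", f"**Total wage:** {squad['total_wage']:,}"]
    if squad["excluded"]:
        names = ", ".join(player["name"] for player in squad["excluded"])
        lines.append(f"**Notable exclusions:** {names}")
    lines.append(
        f"**Limitations:** ratings come from {source}; real-world performance may differ."
    )
    return "\n".join(lines)


# Invented squad dictionary for illustration only.
example_squad = {
    "selected": [
        {"name": "Player A", "position": "GK", "overall": 80, "wage": 50000},
        {"name": "Player B", "position": "DF", "overall": 85, "wage": 80000},
    ],
    "excluded": [{"name": "Player D", "position": "FW", "overall": 88, "wage": 90000}],
    "total_wage": 130000,
}

report = generate_report_sketch(example_squad)
print(report)
```

In a notebook, passing the string to `IPython.display.Markdown` would render the table instead of printing raw Markdown.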


## Agent Demo with Memory

_This cell will showcase the end-to-end LangChain agent created in `src.agent`. It will run a multi-turn conversation in which the user first states a preference (e.g., a pace-focused squad) and then adjusts a constraint (e.g., "now make it cheaper"). The demo should show that the agent retains at least two user preferences via `ConversationBufferMemory` while using tools to rebuild and re-explain the squad._
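
The idea behind buffer-style memory is that every earlier turn is stored verbatim and replayed in the next prompt, so a preference stated in turn one is still visible in turn two. The sketch below is a plain-Python stand-in for that behaviour, not LangChain's `ConversationBufferMemory` itself. `BufferMemorySketch` and `build_prompt` are hypothetical names, and the agent reply is invented.

```python
class BufferMemorySketch:
    """Stores every turn verbatim and replays it as prompt context."""

    def __init__(self):
        self.turns = []  # list of (user_message, agent_reply) pairs

    def save_turn(self, user_message, agent_reply):
        self.turns.append((user_message, agent_reply))

    def as_context(self):
        return "\n".join(f"User: {user}\nAgent: {agent}" for user, agent in self.turns)


def build_prompt(memory, new_message):
    """Prepend the stored history so earlier preferences stay visible to the model."""
    history = memory.as_context()
    return f"{history}\nUser: {new_message}" if history else f"User: {new_message}"


memory = BufferMemorySketch()
memory.save_turn("Build a pace-focused squad.", "Here is a squad that prioritises pace.")

# The second prompt carries both the pace preference and the new cost constraint.
second_prompt = build_prompt(memory, "Now make it cheaper.")
print(second_prompt)
```

Because the whole history is replayed, prompt length grows with every turn. That is acceptable for a short demo but worth noting as a limitation for longer conversations.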


## Responsible AI Disclaimer

_This cell will state the responsible AI considerations: outputs are educational and are not professional sports analytics advice; they are based on FIFA video game ratings; and real-world performance may differ significantly. It will also cover data source licensing and the limitations of using video game data for serious decision-making._
