# Final Capstone: AI Research Team - Multi-Agent System

**Use case chosen (auto-picked):** *AI Research Team for timely literature & web discovery, summarization, critique, and drafting.*

**Why this is judge-friendly:**
- Demonstrates multi-agent orchestration, tool integration (search), context engineering (sessions + memory), MCP-style long-running ops (human approval), observability (logs/traces/metrics), A2A messaging, and deployment guidance.
- Clear metrics for evaluation (relevance, factuality, completeness) and an LLM-as-a-Judge evaluation flow.

---

**How to run:**
- This notebook runs locally or on Kaggle. In dry-run (mock) mode (`USE_MOCK=True`) it makes no external API calls. To run with real APIs (Gemini, Google Search), set `USE_MOCK=False` and configure secrets as described in the README. The setup cell below is shown with `USE_MOCK=False`.
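
If editing the notebook to switch modes is inconvenient, the flag could instead be read from the environment. The following is a minimal sketch under that assumption: the helper `env_flag` and the variable name `CAPSTONE_USE_MOCK` are hypothetical and not part of the project.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Interpret an environment variable as a boolean flag."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable not set: fall back to the caller's default.
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical variable name; when it is unset, mock mode stays on.
use_mock_from_env = env_flag("CAPSTONE_USE_MOCK", default=True)
print("mock mode:", use_mock_from_env)
```

Defaulting to mock mode keeps an unconfigured environment from making external calls by accident.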


In [None]:
# Setup - imports and path fixes
import os, sys, json
from getpass import getpass
ROOT = r'C:\Users\HEALTHY MACHINES\OneDrive\Desktop\capstone_project' # Adjust this path as needed
if ROOT not in sys.path:
    sys.path.insert(0, ROOT)
    sys.path.insert(0, os.path.join(ROOT, 'src'))

print('Project root:', ROOT)
USE_MOCK = False  # False: real services (credentials are requested below). True: mock mode, no external API calls.

if not USE_MOCK:
    print("🔐 Real API mode ON: Please provide your credentials.")

    # 1️⃣ Gemini API Key
    os.environ["GOOGLE_API_KEY"] = getpass("Enter your GOOGLE_API_KEY (hidden): ")

    # 2️⃣ Optional: Google Cloud Service Account (for Vertex / Agent Engine)
    # If you have a JSON credential file uploaded to Kaggle:
    # /kaggle/input/<your-credential-folder>/service-account.json
    service_account_path = getpass(
        "Enter path to GOOGLE_APPLICATION_CREDENTIALS (or leave blank): "
    )
    if service_account_path.strip():
        # Store the stripped path so stray whitespace does not break the lookup.
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = service_account_path.strip()
    else:
        print("⚠️ GOOGLE_APPLICATION_CREDENTIALS not set.")

    print("✅ Environment variables successfully configured.")
else:
    print("🟣 USE_MOCK=True: Running in mock mode (no external API calls).")

Project root: C:\Users\HEALTHY MACHINES\OneDrive\Desktop\capstone_project_package
🔐 Real API mode ON: Please provide your credentials.


Enter your GOOGLE_API_KEY (hidden):  ········
Enter path to GOOGLE_APPLICATION_CREDENTIALS (or leave blank):  ········


✅ Environment variables successfully configured.


In [2]:
# Configuration
PROJECT_NAME = 'AI_Research_Team_Capstone'
SEED = 42
DEMO_QUERY = 'Recent breakthroughs in quantum computing and impact on AI (2024-2025)'


## Architecture (brief)

1. **ResearchAgent**: uses the search tool to retrieve web content.
2. **SummarizerAgent**: condenses findings into concise bullets.
3. **CriticAgent**: evaluates summaries for factuality and gaps (LLM-as-a-Judge pattern).
4. **WriterAgent**: drafts a short technical brief.

Agents communicate via an **Orchestrator**, which uses an A2A bus for messages. Observability emits logs, traces, and metrics at each stage. Session management and the MemoryStore persist notable findings across runs.
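
The hand-off pattern described above can be sketched in a few lines. This is a toy illustration, not the project's `Orchestrator` or `A2ABus`: `Message`, `ToyBus`, and `run_toy_pipeline` are hypothetical names, and each agent is reduced to a plain function from text to text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str
    recipient: str
    payload: str


@dataclass
class ToyBus:
    """In-memory stand-in for an A2A bus: it only records messages."""
    log: List[Message] = field(default_factory=list)

    def send(self, sender: str, recipient: str, payload: str) -> None:
        self.log.append(Message(sender, recipient, payload))


def run_toy_pipeline(query: str,
                     stages: Dict[str, Callable[[str], str]],
                     bus: ToyBus) -> str:
    """Pass the query through each stage in order, logging every hand-off."""
    payload, sender = query, "user"
    for name, stage in stages.items():  # dicts preserve insertion order
        bus.send(sender, name, payload)
        payload = stage(payload)
        sender = name
    bus.send(sender, "user", payload)  # final result goes back to the user
    return payload


# Hypothetical stand-ins for the four agents.
toy_stages = {
    "ResearchAgent": lambda q: f"findings for: {q}",
    "SummarizerAgent": lambda text: f"summary of [{text}]",
    "CriticAgent": lambda text: f"{text} | critique: add citations",
    "WriterAgent": lambda text: f"Draft Brief: {text}",
}
toy_bus = ToyBus()
toy_draft = run_toy_pipeline("quantum computing and AI", toy_stages, toy_bus)
print(toy_draft)
for m in toy_bus.log:
    print(f"{m.sender} -> {m.recipient}")
```

Routing every hand-off through the bus gives a single place to attach logging or tracing, which is presumably why the real pipeline is organized around one.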


In [3]:
# Import the helper modules created in src/
from agents import ResearchAgent, SummarizerAgent, CriticAgent, WriterAgent
from orchestrator import Orchestrator
from a2a_simulator import A2ABus
from tool_adapter import Tool, simple_search
from memory import MemoryStore, Session
from observability import log_event, trace_span, emit_metric

print('Helpers loaded.')


Helpers loaded.


In [4]:
# Build agents and orchestrator (USE_MOCK selects mock or real mode)
# Tools
search_tool = Tool('web_search', simple_search)

# Agents
research = ResearchAgent('ResearchAgent', tools={'search': search_tool}, use_mock=USE_MOCK)
summarizer = SummarizerAgent('SummarizerAgent', use_mock=USE_MOCK)
critic = CriticAgent('CriticAgent', use_mock=USE_MOCK)
writer = WriterAgent('WriterAgent', use_mock=USE_MOCK)

# A2A bus & orchestrator
bus = A2ABus()
orch = Orchestrator(agents=[research, summarizer, critic, writer], bus=bus, memory_path='data/processed/memory_store.json', use_mock=USE_MOCK)

print('Orchestrator ready with agents:', [a.name for a in orch.agents])


Orchestrator ready with agents: ['ResearchAgent', 'SummarizerAgent', 'CriticAgent', 'WriterAgent']


In [5]:
# Run a demo research pipeline
session = Session('demo-session-1')
session.add_turn('user', DEMO_QUERY)

result = orch.run_pipeline(session_id=session.session_id, user_query=DEMO_QUERY)

print('\n--- Final Draft Output (excerpt) ---')
print(result.get('final_draft', '')[:1000])  # default to '' so a missing draft does not raise

# Display recorded logs/metrics (if any)
print('\nMetrics snapshot:')
try:
    # json is already imported in the setup cell
    with open('data/processed/agent_metrics.json') as f:
        print(json.load(f))
except (OSError, json.JSONDecodeError) as e:
    print('No metrics emitted or file not present:', e)



--- Final Draft Output (excerpt) ---
Draft Brief:

Summary: Quantum advances 2024 - New technique stabilizes qubits.
AI and quantum - Researchers explore hybrid models.

Critique: check factual claims and add citations where missing.

(End of draft)

Metrics snapshot:
{'pipeline_success': {'value': 1.0, 'labels': {}, 'ts': 1763863520.0123823}, 'eval_relevance': {'value': 1.0, 'labels': {}, 'ts': 1763861362.3291554}, 'eval_completeness': {'value': 0.265, 'labels': {}, 'ts': 1763861362.3305264}, 'eval_factuality': {'value': 0.8, 'labels': {}, 'ts': 1763861362.3318698}}
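
The snapshot above maps each metric name to a `value`, `labels`, and `ts` entry. One way such a file could be maintained is a per-name upsert, sketched below. This is an assumption about the approach, since the real `emit_metric` in `src/observability` is not shown here; `emit_metric_sketch` and `demo_path` are hypothetical names, and the sketch writes to a temporary directory rather than `data/processed/`.

```python
import json
import os
import tempfile
import time


def emit_metric_sketch(path, name, value, labels=None):
    """Upsert one metric into a JSON file, keeping the latest reading per name."""
    metrics = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            metrics = json.load(f)
    # Overwrite only this metric; entries written earlier keep their own timestamp.
    metrics[name] = {"value": float(value), "labels": labels or {}, "ts": time.time()}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metrics, f, indent=2)
    return metrics


demo_path = os.path.join(tempfile.mkdtemp(), "agent_metrics.json")
emit_metric_sketch(demo_path, "pipeline_success", 1.0)
snapshot = emit_metric_sketch(demo_path, "eval_factuality", 0.8)
print(sorted(snapshot))
```

A per-name upsert like this would also be consistent with entries in one snapshot carrying different timestamps, because each metric records the time of its own last write.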


## Evaluation: LLM-as-a-Judge Simulation

We run a simulated automatic judge (or a real LLM judge if configured) to score the final draft on relevance, factuality, and completeness.

In [6]:
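# Simple automatic scoring (mock): keyword and length heuristics, no LLM call.
def mock_score(draft: str):
    scores = {}
    # Relevance: full credit only if both topic keywords appear (substring match).
    scores['relevance'] = 1.0 if 'quantum' in draft.lower() and 'ai' in draft.lower() else 0.5
    # Completeness: draft length relative to an 800-character target, capped at 1.0.
    scores['completeness'] = min(1.0, len(draft)/800)
    # Factuality: fixed placeholder; mock mode does not verify any claims.
    scores['factuality'] = 0.8
    return scores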

scores = mock_score(result.get('final_draft',''))
print('Mock evaluation scores:', scores)
emit_metric('eval_relevance', scores['relevance'])
emit_metric('eval_completeness', scores['completeness'])
emit_metric('eval_factuality', scores['factuality'])


Mock evaluation scores: {'relevance': 1.0, 'completeness': 0.265, 'factuality': 0.8}
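
For the real-judge path, the main moving parts are a grading prompt and a tolerant parser for the model's reply. The sketch below assumes the judge is asked to answer with a JSON object. `build_judge_prompt`, `parse_judge_reply`, `judge_draft`, and `fake_llm` are hypothetical names, and `fake_llm` stands in for whatever client would call the model.

```python
import json
import re
from typing import Callable, Dict

CRITERIA = ("relevance", "factuality", "completeness")


def build_judge_prompt(query: str, draft: str) -> str:
    """Ask the judge for one score per criterion, as JSON only."""
    return (
        "You are grading a research brief.\n"
        f"Query: {query}\n"
        f"Draft: {draft}\n"
        "Reply with only a JSON object with the keys relevance, factuality, "
        "and completeness, each a number between 0 and 1."
    )


def parse_judge_reply(reply: str) -> Dict[str, float]:
    """Extract the JSON object from the reply and clamp each score to [0, 1]."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("judge reply contained no JSON object")
    raw = json.loads(match.group(0))
    return {k: min(1.0, max(0.0, float(raw[k]))) for k in CRITERIA}


def judge_draft(query: str, draft: str, call_llm: Callable[[str], str]) -> Dict[str, float]:
    """Build the prompt, call the supplied model function, and parse its reply."""
    return parse_judge_reply(call_llm(build_judge_prompt(query, draft)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns surrounding text and an out-of-range score.
    return 'Scores: {"relevance": 0.9, "factuality": 1.3, "completeness": 0.4}'


judge_scores = judge_draft("quantum computing and AI", "Draft Brief: ...", fake_llm)
print(judge_scores)
```

Passing the model call in as a function keeps the parsing logic testable without network access, and the clamp guards against a judge that returns values outside the expected range.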


---

## Next steps & Deployment

- To run in production with Gemini + Vertex AI Agent Engine: see `src/deployment.py` for autogenerated deployment instructions. 
- Add a real search adapter (replace `simple_search` with a real search API client) and set `USE_MOCK=False`.
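
When swapping in a real search client, it may help to wrap it so the pipeline still runs if the call fails. The following is a sketch under stated assumptions: a search function takes a query string and returns a list of dicts with `title`, `url`, and `snippet` keys. The actual return shape of `simple_search` is not shown in this notebook, and `make_search_with_fallback`, `canned_search`, and `failing_search` are hypothetical names.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[Dict[str, str]]]


def make_search_with_fallback(primary: SearchFn,
                              fallback: SearchFn,
                              max_results: int = 5) -> SearchFn:
    """Return a search function that tries `primary` and falls back on any error."""
    def search(query: str) -> List[Dict[str, str]]:
        try:
            results = primary(query)
        except Exception as exc:  # broad on purpose: any client error triggers the fallback
            print(f"primary search failed ({exc}); using fallback")
            results = fallback(query)
        # Normalize to the assumed result shape and cap the number of hits.
        cleaned = [
            {"title": r.get("title", ""), "url": r.get("url", ""), "snippet": r.get("snippet", "")}
            for r in results
        ]
        return cleaned[:max_results]
    return search


def canned_search(query: str) -> List[Dict[str, str]]:
    # Hypothetical mock results, standing in for the project's mock search.
    return [
        {"title": f"Result A for {query}", "url": "https://example.com/a", "snippet": "..."},
        {"title": f"Result B for {query}", "url": "https://example.com/b"},
    ]


def failing_search(query: str) -> List[Dict[str, str]]:
    # Simulates a real client that cannot reach the network.
    raise ConnectionError("no network in this sketch")


robust_search = make_search_with_fallback(failing_search, canned_search, max_results=1)
hits = robust_search("quantum computing")
print(hits)
```

If the project's `Tool` accepts any callable, as `Tool('web_search', simple_search)` suggests, a function built this way could be passed in place of `simple_search`.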

## Files produced by this project
- `src/` modules (agents, orchestrator, a2a_simulator, tool_adapter, memory, observability, mcp_simulator, deployment)
- `notebooks/03_Final_Project.ipynb` (this notebook)
- `reports/evaluation_report.md`
- `slides/presentation.md` (slide deck outline)

Thank you. Export this notebook as your Kaggle submission and include the README and slides.
