# AlzKB Design Notebook

This notebook drives the design process for the Alzheimer's Knowledge Base (AlzKB).
It uses a multi-agent system built on Google Gemini to simulate design meetings between a Principal Investigator (PI), a Scientific Critic, and a team of specialist agents.

In [None]:
# Setup and Imports
import sys
import os
import json
from datetime import datetime

# Ensure src is in pythonpath
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../src")))

from alzkb.meeting import run_meeting
from alzkb.constants import (
    PRINCIPAL_INVESTIGATOR, SCIENTIFIC_CRITIC, 
    KG_ENGINEER, ONTOLOGIST, VALIDATION_SCIENTIST,
    MODEL_FLASH, MODEL_PRO, MODEL_IMAGE, 
    BACKGROUND_PROMPT, TEAM_MEMBERS, CODE_GENERATION_RULES
)
from alzkb.agents import Agent

## 1. Team Selection

**Objective**: Select 3 specialized agents to join the AlzKB implementation team.
**Participants**: Principal Investigator (Lead) & Scientific Critic.

In [None]:
# Define the Agenda for Team Selection
team_selection_agenda = f"""{BACKGROUND_PROMPT}
TASK: Define 3 distinct Agents to form the AlzKB Implementation Team.

PROCESS:
1. PROPOSAL: The PI proposes 3 Agents with their specific system prompts.
2. CRITIQUE: The Scientific Critic reviews the proposal for gaps, redundancy, or scientific validity.
3. FINALIZATION: In the meeting summary, the PI MUST output the **Final Revised Python Code** for the 3 agents, incorporating the Critic's feedback.

OUTPUT FORMAT: Python `Agent()` objects ONLY. No conversational filler for the code blocks.
Each agent must have:
- `title`: A descriptive role title.
- `system_prompt`: A detailed persona description including roles and responsibilities.

Do not include yourself (PI or Critic). 
Select roles that cover key technical and scientific needs (e.g., Knowledge Graph Engineering, Ontology, Data Science).
"""

print("Agenda defined.")

In [None]:
# Run the Meeting
print("Starting Team Selection Meeting...")

chat_session = run_meeting(
    meeting_type="individual",
    agenda=team_selection_agenda,
    topic="Team Selection",
    team_member=PRINCIPAL_INVESTIGATOR,
    num_rounds=1,
    model_name=MODEL_FLASH
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir = os.path.join("../discussions", "team_selection")
os.makedirs(discussion_dir, exist_ok=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"team_selection_{timestamp}.md"
filepath = os.path.join(discussion_dir, filename)

# Extract history from the chat session.
# get_history() returns a list of Content objects, each with a role and a list of parts.
history = chat_session.get_history()

with open(filepath, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Team Selection\n")
    f.write(f"**Date**: {timestamp}\n\n")

    for turn in history:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath}")
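Every meeting in this notebook repeats the same save logic. The helper below is a minimal sketch of a reusable version, assuming only what the cell above already relies on: the session exposes `get_history()`, and each turn has a `role` and a `parts` list whose items carry an optional `text`. The name `save_discussion` and its parameters are hypothetical and not part of `alzkb`.

```python
import os
from datetime import datetime


def save_discussion(session, subdir, prefix, title, base_dir="../discussions"):
    """Write a chat session's history to a timestamped Markdown file and return its path.

    Hypothetical helper: assumes session.get_history() returns turns with a `role`
    attribute and a `parts` list whose items have an optional `text` attribute.
    """
    out_dir = os.path.join(base_dir, subdir)
    os.makedirs(out_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filepath = os.path.join(out_dir, f"{prefix}_{timestamp}.md")

    with open(filepath, "w", encoding="utf-8") as f:
        f.write(f"# Discussion Log: {title}\n")
        f.write(f"**Date**: {timestamp}\n\n")
        for turn in session.get_history():
            # Keep only the first part's text, mirroring the cell above.
            text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
            f.write(f"### {turn.role}\n{text}\n\n")
    return filepath
```

With this in place, a later save cell could shrink to a single call such as `save_discussion(chat_session_impl, "implementation_plan", "implementation_plan", "Implementation Plan")`.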

## 2. Implementation Plan

**Objective**: Create a step-by-step Implementation Plan for the entire AlzKB project.
**Participants**: The Whole Team (PI, Critic, Data Engineer, Ontologist, Validation Scientist).

In [None]:
# Define the Agenda for Implementation Plan
impl_plan_agenda = f"""{BACKGROUND_PROMPT}
TASK: Develop a comprehensive, step-by-step Implementation Plan for the entire AlzKB project.

PROCESS:
1. BREAKDOWN: Divide the project into clearly defined Phases (e.g., Phase I: Schema & Ingestion, Phase II: Validation).
2. ASSIGNMENT: Assign specific tasks to each agent (Data Engineer, Ontologist, Validator).
3. RISKS: Identify potential bottlenecks (e.g., discordant data formats, lack of ground truth) and mitigation strategies.

The PI will lead the discussion. The Critic will challenge assumptions. The Team Members will provide technical details for their respective domains.

OUTPUT GOAL: A finalized, phased roadmap with milestones.
"""

print("Implementation Plan Agenda defined.")

In [None]:
# Run the Meeting
print("Starting Implementation Plan Meeting...")

chat_session_impl = run_meeting(
    meeting_type="team",
    agenda=impl_plan_agenda,
    topic="Implementation Plan",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_impl = os.path.join("../discussions", "implementation_plan")
os.makedirs(discussion_dir_impl, exist_ok=True)

timestamp_impl = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_impl = f"implementation_plan_{timestamp_impl}.md"
filepath_impl = os.path.join(discussion_dir_impl, filename_impl)

history_impl = chat_session_impl.get_history()

with open(filepath_impl, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Implementation Plan\n")
    f.write(f"**Date**: {timestamp_impl}\n\n")

    for turn in history_impl:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_impl}")

## 3. Phase I: Semantic Foundation & Infrastructure

**Objective**: Detail the execution of Phase I, focusing on Schema, Named Graphs, and Ontology Deployment.
**Participants**: The Whole Team.

In [None]:
# Context from Previous Phase (Hardcoded from 'Implementation Plan' outcomes)
phase_1_context = """
--- PREVIOUS DECISIONS & ROADMAP ---
1. Dual-Layer Storage: Store Raw Values (Layer A) and Cohort-Normalized Values (Layer B) side-by-side with PROV metadata.
2. Named Graph Partitioning: 'Core' (Trusted), 'Exploratory' (Tier 2 GWAS), 'Quarantine' (Failed Logic).
3. Materialized RAG Views: Post-ingestion trigger limits ontology pollution; generates flattened Vector Index.
4. Phase I Goal: Deploy Schema and Named Graph infrastructure. Deploy alzkb-ontology-v1.owl and SHACL constraints.
------------------------------------
"""

# Define Agenda for Phase I
phase_1_agenda = f"""{BACKGROUND_PROMPT}
{phase_1_context}
TASK: Discuss EXACTLY how to achieve the Phase I goals defined above.

FOCUS AREAS:
1. ONTOLOGY: Ontologist, define the specific classes for 'CohortStatisticalProfile' and the SHACL shapes for the Dual-Layer pattern.
2. INFRASTRUCTURE: Data Engineer, describe the setup of the Quad Store to support the defined Named Graphs (Core/Expl/Quarantine).
3. VALIDATION: Validator, define the 'Quarantine' triggers. What specific data error moves a record to the Quarantine graph?

OUTPUT GOAL: Concrete specifications for the Ontology file and the Database setup scripts.
"""

print("Phase 1 Agenda defined.")
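For the infrastructure focus area, the three named graphs can be declared with standard SPARQL 1.1 Update `CREATE GRAPH` requests. This is a sketch only, assuming the graph IRIs that the Phase I deployment specification fixes for Core, Exploratory, and Quarantine; submitting the requests to a Virtuoso or GraphDB endpoint is left out, and `named_graph_setup` is a hypothetical name. Some quad stores also create a graph implicitly on first insert, so this step mainly makes the partitioning explicit.

```python
NAMED_GRAPHS = {
    "core": "http://alzkb.org/graph/core",
    "exploratory": "http://alzkb.org/graph/exploratory",
    "quarantine": "http://alzkb.org/graph/quarantine",
}


def named_graph_setup(graphs=NAMED_GRAPHS):
    """Return one SPARQL 1.1 Update request per named graph.

    SILENT makes each request succeed even if the graph already exists,
    so the setup can be re-run safely.
    """
    return [f"CREATE SILENT GRAPH <{iri}>" for iri in graphs.values()]


for statement in named_graph_setup():
    print(statement)
```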

In [None]:
# Run the Meeting
print("Starting Phase I Meeting...")

chat_session_phase1 = run_meeting(
    meeting_type="team",
    agenda=phase_1_agenda,
    topic="Phase I: Semantic Foundation",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p1 = os.path.join("../discussions", "phase_1")
os.makedirs(discussion_dir_p1, exist_ok=True)

timestamp_p1 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p1 = f"phase_1_{timestamp_p1}.md"
filepath_p1 = os.path.join(discussion_dir_p1, filename_p1)

history_p1 = chat_session_phase1.get_history()

with open(filepath_p1, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase I - Semantic Foundation\n")
    f.write(f"**Date**: {timestamp_p1}\n\n")

    for turn in history_p1:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p1}")

## 4. Phase I Execution: Code Generation

**Objective**: Generate the actual code artifacts (Ontology, Schema Specifications, Validation Logic) based on Phase I decisions.
**Participants**: Individual meetings for each Specialist.

In [None]:
# Official Phase I Deployment Specification
deployment_spec = """
--- OFFICIAL PHASE I DEPLOYMENT SPECIFICATION ---
1. ARCHITECTURE:
   - Hybrid Validation: Arithmetic/Logic in Python ETL. Structural Integrity/Entity Resolution in Quad Store (SHACL).
   - Retrieval: Ephemeral microservice for RAG indexing; no search metadata in the DB.
   - Scoring: Evidence Confidence Score (Float 0.0-1.0) = (0.6 * Significance) + (0.4 * Power), with Significance and Power each normalized to 0.0-1.0.

2. INFRASTRUCTURE & NAMED GRAPHS:
   - Engine: Quad Store (Virtuoso/GraphDB).
   - Namespace: alzkb: -> http://alzkb.org/ontology/v1#
   - Graphs:
     - Core: http://alzkb.org/graph/core
     - Exploratory: http://alzkb.org/graph/exploratory
     - Quarantine: http://alzkb.org/graph/quarantine

3. ONTOLOGY (alzkb-ontology-v1.owl):
   - Class: CohortStatisticalProfile (subClassOf obo:STATO_0000039).
   - Props: hasMeanValue (double), hasPValue (double), hasSampleSize (int), computedConfidenceScore (double).
   - Individual: APOE_e4 (owl:sameAs http://alzkb.org/data/allele/APOE_e4).

4. VALIDATION (Hybrid):
   - Python ETL: Check temporal causality (diagnosis > birth) and range (0 <= MMSE <= 30).
   - SHACL: NormalizedValueShape -> minCount 1 prov:wasDerivedFrom. GenomicVariantShape -> Pattern match canonical URI.
--------------------------------------------------
"""

# Common Code Generation Agenda
codegen_agenda_template = f"""{BACKGROUND_PROMPT}
{deployment_spec}
{CODE_GENERATION_RULES}

TASK: Generate the specific code/specifications for your domain based on the Official Spec above.
"""

print("Code Generation Agenda defined.")
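The scoring rule in the specification is a weighted sum whose weights add up to 1, so the score stays in 0.0-1.0 only if both inputs are already normalized to that range. The sketch below makes that explicit. The p-value normalization (scaling -log10(p) so that the conventional genome-wide threshold of 5e-8 maps to 1.0) is an assumption for illustration, as is treating Power as statistical power supplied in [0, 1]; it is not necessarily what the team's `calculate_confidence` does.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional GWAS genome-wide significance threshold


def significance_from_p(p_value, cap_p=GENOME_WIDE_P):
    """Hypothetical normalization: scale -log10(p) so that p <= cap_p maps to 1.0."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError(f"p_value must be in (0, 1], got {p_value}")
    score = math.log10(p_value) / math.log10(cap_p)
    return min(max(0.0, score), 1.0)


def calculate_confidence(significance, power):
    """Evidence Confidence Score = 0.6 * Significance + 0.4 * Power.

    Both inputs must already lie in [0, 1]; because the weights sum to 1,
    the score then also lies in [0, 1].
    """
    for name, value in (("significance", significance), ("power", power)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 0.6 * significance + 0.4 * power
```

A typical call would be `calculate_confidence(significance_from_p(1e-6), power=0.8)`. Rejecting out-of-range inputs, rather than clipping them, surfaces upstream normalization bugs early.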

In [None]:
# 1. ONTOLOGIST: Generate Ontology File (TTL/OWL)
print("Starting Code Gen: Ontology...")

agenda_onto = codegen_agenda_template + "\nFOCUS: Generate the full 'alzkb-ontology-v1.owl' file in Turtle syntax (.ttl). Include all classes, properties, and the SHACL shapes defined in the spec."

session_onto = run_meeting(
    meeting_type="individual",
    agenda=agenda_onto,
    topic="Code Gen: Ontology",
    team_member=ONTOLOGIST,
    num_rounds=1,
    model_name=MODEL_PRO
)

# 2. KG ENGINEER: Generate Infrastructure Specs (SPARQL/Python)
print("\nStarting Code Gen: Infrastructure...")

agenda_infra = codegen_agenda_template + "\nFOCUS: Generate the Python code for the 'Evidence Confidence Score' calculation (the 'calculate_confidence' function) and the Quad Store setup commands (SPARQL text to create graphs)."

session_infra = run_meeting(
    meeting_type="individual",
    agenda=agenda_infra,
    topic="Code Gen: Infrastructure",
    team_member=KG_ENGINEER,
    num_rounds=1,
    model_name=MODEL_PRO
)

# 3. VALIDATION SCIENTIST: Generate Validation Logic (Python)
print("\nStarting Code Gen: Validation...")

agenda_val = codegen_agenda_template + "\nFOCUS: Generate the Python 'validate_row' function for the ETL layer (checking temporal causality and ranges) and the SPARQL query for the 'ATN Stress Test'."

session_val = run_meeting(
    meeting_type="individual",
    agenda=agenda_val,
    topic="Code Gen: Validation",
    team_member=VALIDATION_SCIENTIST,
    num_rounds=1,
    model_name=MODEL_PRO
)
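The two ETL checks named in the deployment specification (diagnosis after birth, 0 <= MMSE <= 30) can be sketched as a row-level gate that routes any failing record to the Quarantine graph. This is a minimal sketch: the column names `birth_date`, `diagnosis_date`, and `mmse` and the ISO date format are assumptions, not ADNI field names, and routing to the Exploratory graph is not modeled.

```python
from datetime import date

CORE_GRAPH = "http://alzkb.org/graph/core"
QUARANTINE_GRAPH = "http://alzkb.org/graph/quarantine"


def validate_row(row):
    """Return (target_graph, errors) for one ETL row (hypothetical column names).

    Expects 'birth_date' and 'diagnosis_date' as ISO dates (YYYY-MM-DD) and an
    optional 'mmse' value; a missing MMSE is allowed, an invalid one is not.
    """
    errors = []

    # Temporal causality: diagnosis must come strictly after birth.
    try:
        birth = date.fromisoformat(row["birth_date"])
        diagnosis = date.fromisoformat(row["diagnosis_date"])
        if diagnosis <= birth:
            errors.append("diagnosis date is not after birth date")
    except (KeyError, TypeError, ValueError) as exc:
        errors.append(f"unparseable or missing date: {exc!r}")

    # Range check: the MMSE total score must lie in [0, 30].
    mmse = row.get("mmse")
    if mmse not in (None, ""):
        try:
            score = float(mmse)
        except (TypeError, ValueError):
            errors.append(f"non-numeric MMSE: {mmse!r}")
        else:
            if not 0 <= score <= 30:
                errors.append(f"MMSE out of range [0, 30]: {score}")

    return (QUARANTINE_GRAPH if errors else CORE_GRAPH), errors
```

Returning the error list alongside the target graph lets the loader attach the failure reason to the quarantined record instead of discarding it.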

In [None]:
# Save All Code Gen Discussions
discussion_dir_code = os.path.join("../discussions", "phase_1_execution")
os.makedirs(discussion_dir_code, exist_ok=True)

timestamp_code = datetime.now().strftime("%Y%m%d_%H%M%S")

# Map agents to sessions
sessions = [
    ("ontology", session_onto),
    ("infrastructure", session_infra),
    ("validation", session_val)
]

for name, session in sessions:
    filename = f"{name}_{timestamp_code}.md"
    filepath = os.path.join(discussion_dir_code, filename)
    history = session.get_history()
    
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(f"# Code Gen Log: {name.upper()}\n")
        f.write(f"**Date**: {timestamp_code}\n\n")
        for turn in history:
            role = turn.role
            # Use the first part's text; fall back if the turn has no parts or no text.
            text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
            f.write(f"### {role}\n{text}\n\n")
    print(f"Saved {name} discussion to: {filepath}")

## 5. Phase II: Context-Aware Ingestion

**Objective**: Design the Data Ingestion Pipelines for ADNI and AMP-AD using the Dual-Layer strategy.
**Participants**: The Whole Team.

In [None]:
# Define Context for Phase II
phase_2_context = f"""
--- PHASE I STATUS: COMPLETE ---
1. Ontology: 'alzkb-ontology-v1.owl' deployed (Includes 'CohortStatisticalProfile', 'hasExposure', 'hasOutcome').
2. Infrastructure: Named Graphs (Core/Expl/Quarantine) initialized.
3. Validation: 'QualityEngine' (Python) and 'ATN Stress Test' (SPARQL) verified.

--- PHASE II GOALS ---
1. INGESTION: Ingest ADNI and AMP-AD datasets.
2. STRATEGY: Use Dual-Layer (Raw + Cohort-Normalized) pattern.
3. QUALITY CONSTRAINT: The 'Quarantine' graph must contain < 5% of records. High failure rates indicate ETL flaws.
"""

# Define Agenda for Phase II
phase_2_agenda = f"""{BACKGROUND_PROMPT}
{phase_2_context}
TASK: Design the concrete Data Ingestion Pipeline for Phase II.

FOCUS AREAS:
1. DATA MAPPING (Ontologist): How do we map ADNI-specific CSV columns (e.g., 'PTGENDER', 'AB42_raw') to our Ontology classes?
2. ETL LOGIC (KG Engineer): Define the Python pseudo-code that transforms a CSV row into the Dual-Layer RDF pattern (creating both the Instance Node and the Metadata Node).
3. QUARANTINE GATES (Validation Scientist): Define the precise thresholds for the 'Quarantine' logic. When exactly do we reject a record?

OUTPUT GOAL: A finalized ETL specification and Python pseudo-code for the ingestion script.
"""

print("Phase 2 Agenda defined.")
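The ETL focus area asks for a transform from one CSV row to the Dual-Layer pattern decided in Phase I (raw values in Layer A, cohort-normalized values in Layer B, linked by PROV metadata). The sketch below emits plain (subject, predicate, object) tuples instead of using an RDF library. Using a cohort z-score as the normalization is an assumption, and the class names, property names, and IRI layout are hypothetical placeholders; only the `rdf:type` and `prov:wasDerivedFrom` IRIs are standard. The `prov:wasDerivedFrom` link is what the specification's NormalizedValueShape requires.

```python
ALZKB = "http://alzkb.org/ontology/v1#"
DATA = "http://alzkb.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"


def row_to_dual_layer(row, measure, cohort_mean, cohort_sd, cohort_id):
    """Emit (subject, predicate, object) triples for one measurement.

    Layer A keeps the raw value as reported; Layer B holds a cohort z-score and
    links back to Layer A via prov:wasDerivedFrom. Class names, property names
    and the IRI layout are hypothetical placeholders.
    """
    if cohort_sd <= 0:
        raise ValueError("cohort_sd must be positive")
    rid = row["RID"]
    raw_value = float(row[measure])
    base = f"{DATA}measurement/{cohort_id}/{rid}/{measure}"
    raw_node, norm_node = f"{base}/raw", f"{base}/normalized"
    z_score = (raw_value - cohort_mean) / cohort_sd

    return [
        # Layer A: the raw value, untouched
        (raw_node, RDF_TYPE, ALZKB + "RawMeasurement"),
        (raw_node, ALZKB + "hasRawValue", raw_value),
        # Layer B: cohort-normalized value with provenance back to Layer A
        (norm_node, RDF_TYPE, ALZKB + "NormalizedMeasurement"),
        (norm_node, ALZKB + "hasNormalizedValue", z_score),
        (norm_node, ALZKB + "normalizedAgainst", f"{DATA}cohort/{cohort_id}"),
        (norm_node, PROV_WAS_DERIVED_FROM, raw_node),
    ]
```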

In [None]:
# Run the Meeting
print("Starting Phase II Meeting...")

chat_session_phase2 = run_meeting(
    meeting_type="team",
    agenda=phase_2_agenda,
    topic="Phase II: Context-Aware Ingestion",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p2 = os.path.join("../discussions", "phase_2")
os.makedirs(discussion_dir_p2, exist_ok=True)

timestamp_p2 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p2 = f"phase_2_{timestamp_p2}.md"
filepath_p2 = os.path.join(discussion_dir_p2, filename_p2)

history_p2 = chat_session_phase2.get_history()

with open(filepath_p2, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase II - Context-Aware Ingestion\n")
    f.write(f"**Date**: {timestamp_p2}\n\n")

    for turn in history_p2:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p2}")
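The Phase II quality constraint (the Quarantine graph must hold fewer than 5% of records) translates into a batch-level gate. This sketch assumes the per-record routing results are available as a list of target graph IRIs; the function name and the choice to reject the whole batch for review are assumptions.

```python
QUARANTINE_GRAPH = "http://alzkb.org/graph/quarantine"
MAX_QUARANTINE_FRACTION = 0.05  # Phase II constraint: fewer than 5% of records


def check_quarantine_rate(routed_graphs, limit=MAX_QUARANTINE_FRACTION):
    """Return the quarantine fraction, raising if it reaches the limit.

    A high rate is read as an ETL defect (e.g. a wrong column mapping)
    rather than bad source data, so the batch is stopped for review.
    """
    total = len(routed_graphs)
    if total == 0:
        raise ValueError("empty batch")
    fraction = sum(g == QUARANTINE_GRAPH for g in routed_graphs) / total
    if fraction >= limit:
        raise RuntimeError(f"Quarantine rate {fraction:.1%} >= {limit:.0%}; inspect the ETL mapping")
    return fraction
```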

## 6. Phase III: Validation & Entity Resolution

**Objective**: Validate biological plausibility and resolve entities into a unified graph.
**Participants**: The Whole Team.

In [None]:
# Define Context for Phase III
phase_3_context = f"""
--- PHASE II STATUS: COMPLETE ---
1. Ingestion: 'ingest_csf.py' verified. Handles ADNI CSF data with Dual-Layer pattern.
2. Inference: 'dynamic_inference.sparql' verified; it returns 'High Confidence'.
3. Quality: Filters correctly reject invalid matrices (Plasma) and negative values.

--- PHASE III GOALS ---
1. ENTITY RESOLUTION: Map diverse identifiers (Uniprot, dbSNP, ADNI_RID) to single canonical URIs.
2. BIOLOGICAL VALIDATION: Run the 'Cumulative ATN Stress Test'. The Core graph MUST show a positive correlation between p-tau and amyloid burden (equivalently, a negative correlation between CSF p-tau and CSF Aβ42, since CSF Aβ42 falls as amyloid accumulates).
3. TIERING: Promote 'Tier 2' GWAS signals to 'Tier 1' candidates if they have multi-omic support.
"""

# Define Agenda for Phase III
phase_3_agenda = f"""{BACKGROUND_PROMPT}
{phase_3_context}
TASK: Design the Validation & Resolution Layer (Phase III).

FOCUS AREAS:
1. RESOLUTION RULES (Ontologist): Define the 'owl:sameAs' logic. How do we treat an ADNI patient who appears in another dataset?
2. MASTER VALIDATION (Validator): We have the unit test. Now design the 'Cumulative Stress Test'. How do we validate 1 million nodes for biological plausibility without crashing the DB?
3. TIER PROMOTION (KG Engineer): Write the logic to promote a 'Tier 2' gene to 'Tier 1' based on graph topology (e.g., if it connects to >3 confirmed biomarkers).

OUTPUT GOAL: A set of SPARQL/Python rules for Entity Resolution and Graph Tiering.
"""

print("Phase 3 Agenda defined.")
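The tier-promotion rule in the agenda (promote a Tier 2 gene that connects to more than three confirmed biomarkers) can be prototyped on an in-memory edge list before it is written in SPARQL. This is a sketch: the input structures, and treating "confirmed biomarker" as a precomputed set of node IDs, are assumptions.

```python
def promote_tier2_genes(tiers, edges, confirmed_biomarkers, min_links=4):
    """Return the sorted list of genes to promote from Tier 2 to Tier 1.

    tiers: dict gene -> tier (1 or 2)
    edges: iterable of (gene, neighbor) pairs, gene first
    confirmed_biomarkers: set of biomarker node IDs already validated in Core
    min_links: 4 implements the agenda's '>3 confirmed biomarkers' threshold
    """
    links = {}
    for gene, neighbor in edges:
        if tiers.get(gene) == 2 and neighbor in confirmed_biomarkers:
            # A set counts each biomarker once, even if several edges point to it.
            links.setdefault(gene, set()).add(neighbor)
    return sorted(g for g, biomarkers in links.items() if len(biomarkers) >= min_links)
```

Counting distinct biomarkers guards against duplicate edges (for example, the same association ingested from two sources) inflating a gene's support.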

In [None]:
# Run the Meeting
print("Starting Phase III Meeting...")

chat_session_phase3 = run_meeting(
    meeting_type="team",
    agenda=phase_3_agenda,
    topic="Phase III: Validation & Entity Resolution",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p3 = os.path.join("../discussions", "phase_3")
os.makedirs(discussion_dir_p3, exist_ok=True)

timestamp_p3 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p3 = f"phase_3_{timestamp_p3}.md"
filepath_p3 = os.path.join(discussion_dir_p3, filename_p3)

history_p3 = chat_session_phase3.get_history()

with open(filepath_p3, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase III - Validation & Entity Resolution\n")
    f.write(f"**Date**: {timestamp_p3}\n\n")

    for turn in history_p3:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p3}")
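For the Cumulative Stress Test at the scale of a million nodes, one common way to keep each request bounded is to page through results in fixed-size batches instead of issuing one large query. The sketch below only builds the paged SPARQL SELECT strings; the predicates `alzkb:hasPTau` and `alzkb:hasAmyloid` are hypothetical. `ORDER BY` keeps LIMIT/OFFSET pages deterministic, but deep offsets can still be expensive on some stores, so this bounds response size rather than total work.

```python
CORE_GRAPH = "http://alzkb.org/graph/core"

QUERY_TEMPLATE = """PREFIX alzkb: <http://alzkb.org/ontology/v1#>
SELECT ?subject ?ptau ?amyloid
FROM <{graph}>
WHERE {{
  ?subject alzkb:hasPTau ?ptau ;
           alzkb:hasAmyloid ?amyloid .
}}
ORDER BY ?subject
LIMIT {limit}
OFFSET {offset}"""


def paged_queries(total_rows, batch_size=10_000, graph=CORE_GRAPH):
    """Yield one SPARQL query per batch so each response stays bounded in size."""
    for offset in range(0, total_rows, batch_size):
        yield QUERY_TEMPLATE.format(graph=graph, limit=batch_size, offset=offset)
```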

## 7. Phase III Execution: Haplotype Computer

**Objective**: Generate the Python logic for the Haplotype Computer (Action Item 2 from Phase III).
**Participants**: Individual Meeting with KG Engineer.

In [None]:
# Haplotype Computer Context
haplotype_agenda = f"""{BACKGROUND_PROMPT}
CONTEXT: In Phase III, we agreed to implement a 'Haplotype Computer' in the ETL layer.
GOAL: Materialize specific Genotype nodes (e.g., alzkb:APOE_e4e4) based on raw SNPs (rs429358, rs7412).

TASK: Write the Python function `compute_haplotype(rs429358_val, rs7412_val)`.

RULES:
1. INPUTS: Raw allele strings (e.g., 'C/C', 'C/T', 'T/T') for the two SNPs.
2. LOGIC:
   - rs429358 (T encodes Cys112, C encodes Arg112) + rs7412 (T encodes Cys158, C encodes Arg158) -> Determine e2, e3, e4.
   - Standard APOE mapping:
     - e2: rs429358(T) + rs7412(T)
     - e3: rs429358(T) + rs7412(C)
     - e4: rs429358(C) + rs7412(C)
   - (Verify these biological mappings in your response).
3. OUTPUT: The correct Ontology URI (e.g., alzkb:APOE_e4e4) and risk profile.

{CODE_GENERATION_RULES}
"""

print("Haplotype Computer Agenda defined.")
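The allele rules in the agenda can be turned into a diplotype call for unphased genotypes. Each e4 allele carries C at rs429358 and each e2 allele carries T at rs7412, so counting those alleles gives the diplotype, provided the very rare e1 allele (C at rs429358 together with T at rs7412 on one chromosome) is assumed absent. Under that assumption the double heterozygote (C/T at both SNPs) is called e2/e4, although it is phase-ambiguous with e1/e3. This is a sketch: genotypes are assumed to be reported on the forward (plus) strand, and the returned dictionary with an `e4_dosage` field is a hypothetical stand-in for the requested risk profile.

```python
from typing import Optional


def _parse_genotype(value: str) -> tuple:
    """Split an unphased genotype such as 'C/T' into two uppercase alleles."""
    alleles = value.replace("|", "/").upper().split("/")
    if len(alleles) != 2 or any(a not in {"C", "T"} for a in alleles):
        raise ValueError(f"Unexpected genotype: {value!r}")
    return alleles[0], alleles[1]


def compute_haplotype(rs429358_val: str, rs7412_val: str) -> Optional[dict]:
    """Call the APOE diplotype from two unphased plus-strand genotypes.

    Returns None when the counts would require an e1 allele.
    """
    e4_count = _parse_genotype(rs429358_val).count("C")  # C -> Arg112, defines e4
    e2_count = _parse_genotype(rs7412_val).count("T")    # T -> Cys158, defines e2
    e3_count = 2 - e4_count - e2_count
    if e3_count < 0:
        return None  # e.g. C/C with T/T: not explainable by e2/e3/e4 alone
    alleles = ["e2"] * e2_count + ["e3"] * e3_count + ["e4"] * e4_count
    return {"uri": f"alzkb:APOE_{''.join(alleles)}", "e4_dosage": e4_count}
```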

In [None]:
# Run the Meeting
print("Starting Haplotype Computer Code Gen...")

session_haplo = run_meeting(
    meeting_type="individual",
    agenda=haplotype_agenda,
    topic="Code Gen: Haplotype Computer",
    team_member=KG_ENGINEER,
    num_rounds=1,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_haplo = os.path.join("../discussions", "phase_3_execution")
os.makedirs(discussion_dir_haplo, exist_ok=True)

timestamp_haplo = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_haplo = f"haplotype_computer_{timestamp_haplo}.md"
filepath_haplo = os.path.join(discussion_dir_haplo, filename_haplo)

history_haplo = session_haplo.get_history()

with open(filepath_haplo, "w", encoding="utf-8") as f:
    f.write("# Code Gen Log: Haplotype Computer\n")
    f.write(f"**Date**: {timestamp_haplo}\n\n")

    for turn in history_haplo:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_haplo}")

## 8. Phase IV: Retrieval Optimization

**Objective**: Enable RAG Capabilities and Vector Indexing using Narrative Templates.
**Participants**: The Whole Team.

In [None]:
# Define Context for Phase IV
phase_4_context = f"""
--- PHASE III STATUS: COMPLETE ---
1. Validation: 'test_atn_biology.py' verified (Mann-Whitney U passed).
2. Resolution: 'haplotype_computer.py' verified (Proper strand flipping and confidence).
3. Tiering: 'tier_promotion.sparql' deployed.

--- PHASE IV GOALS ---
1. RAG ENABLEMENT: Create the 'Narrative Document' template to convert RDF subgraphs into text for embedding.
2. INDEXING STRATEGY: Define the Trigger (Post-Ingestion) and Schema (Flattened Vector Index).
3. USER MODES: Implement 'Standard' (Core Graph Only) vs 'Exploratory' (Include Tier 2 + Warnings) retrieval modes.
"""

# Define Agenda for Phase IV
phase_4_agenda = f"""{BACKGROUND_PROMPT}
{phase_4_context}
TASK: Design the Retrieval & RAG System (Phase IV).

FOCUS AREAS:
1. VECTOR WRAPPER (Ontologist/Engineer): We don't want raw triples in the LLM context. Define a Python function `graph_to_text(subject_uri)` that creates a coherent paragraph (Narrative Document) for embedding.
2. EXPLORATORY MODE (Critic): How do we safely expose 'Exploratory' graph data? Define the exact prompt injection or warning label (e.g., "[CAUTION: LOW CONFIDENCE]") that accompanies these chunks.
3. SYSTEM ARCHITECTURE (Engineer): Finalize the 'Lambda Architecture'. How does the Vector Store sync with the Quad Store? (e.g., Re-index nightly vs. Event-driven).

OUTPUT GOAL: A specification for the `graph_to_text` function and the RAG System Prompt.
"""

print("Phase 4 Agenda defined.")
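The `graph_to_text` idea in focus area 1 can be sketched as a template that turns one subject's facts into a single paragraph, with focus area 2's example label `[CAUTION: LOW CONFIDENCE]` stamped on chunks from the Exploratory graph. Unlike the agenda's `graph_to_text(subject_uri)`, this hypothetical signature receives the facts as already-resolved (predicate label, object label) pairs, so the sketch needs no triple store.

```python
CORE_GRAPH = "http://alzkb.org/graph/core"
EXPLORATORY_GRAPH = "http://alzkb.org/graph/exploratory"
CAUTION_LABEL = "[CAUTION: LOW CONFIDENCE]"


def graph_to_text(subject_uri, subject_label, facts, source_graph):
    """Render one subject's facts as a short Narrative Document for embedding.

    facts: list of (predicate_label, object_label) pairs resolved from URIs.
    Chunks from the Exploratory graph are prefixed with the caution label.
    """
    if not facts:
        return ""
    clauses = "; it ".join(f"{predicate} {obj}" for predicate, obj in facts)
    text = f"{subject_label} {clauses}. (Source: {subject_uri})"
    if source_graph == EXPLORATORY_GRAPH:
        text = f"{CAUTION_LABEL} {text}"
    return text
```

Putting the label at the start of the chunk, rather than in metadata, keeps the warning attached to the text even after it has been copied into the LLM context.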

In [None]:
# Run the Meeting
print("Starting Phase IV Meeting...")

chat_session_phase4 = run_meeting(
    meeting_type="team",
    agenda=phase_4_agenda,
    topic="Phase IV: Retrieval Optimization",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p4 = os.path.join("../discussions", "phase_4")
os.makedirs(discussion_dir_p4, exist_ok=True)

timestamp_p4 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p4 = f"phase_4_{timestamp_p4}.md"
filepath_p4 = os.path.join(discussion_dir_p4, filename_p4)

history_p4 = chat_session_phase4.get_history()

with open(filepath_p4, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase IV - Retrieval Optimization\n")
    f.write(f"**Date**: {timestamp_p4}\n\n")

    for turn in history_p4:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p4}")

## 9. Phase V: UI & Visualization

**Objective**: Design the User Interface and Visualization Layer for AlzKB.
**Participants**: The Whole Team.

In [None]:
# Define Context for Phase V
phase_5_context = f"""
--- PHASE IV STATUS: COMPLETE ---
1. RAG: 'graph_to_text' implements Safety Stamping and Semantic Restraint.
2. Search: 'hybrid_search' implements RRF with Keyword Boosting.
3. Ontology: 'last_updated' functionality verified.

--- PHASE V GOALS ---
1. DASHBOARD: Provide a User Interface to query the Knowledge Base.
2. VISUALIZATION: Visualize the Graph network (nodes and edges).
3. EVIDENCE VIEWER: Display the RAG-generated Narratives with citations.
"""

# Define Agenda for Phase V
phase_5_agenda = f"""{BACKGROUND_PROMPT}
{phase_5_context}
TASK: Design the UI and Visualization Layer (Phase V).

FOCUS AREAS:
1. DASHBOARD (Engineer): How do we toggle between 'Core' and 'Exploratory' modes in the UI? What is the tech stack (Streamlit/React)?
2. VISUALIZATION (Critic): How do we prevent 'hairballs' when visualizing >100 nodes? We need a force-directed layout strategy that highlights Tier 1 validity.
3. CITATION UX (Ontologist): How do we display the URIs? They must be clickable and resolve to the metadata node.

OUTPUT GOAL: A set of requirements and pseudo-code for the UI application.
"""

print("Phase 5 Agenda defined.")
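The Phase IV status in the context above notes that `hybrid_search` uses Reciprocal Rank Fusion (RRF). Standard RRF scores each document as the sum of 1 / (k + rank) over the input rankings, commonly with k = 60. The sketch covers only that fusion step. The keyword boosting mentioned above is not modeled, and the function name is hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs (best first) into one ranking.

    score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a vector ranking `["a", "b", "c"]` with a keyword ranking `["b", "c", "a"]` puts `b` first, because it ranks near the top of both lists.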

In [None]:
# Run the Meeting
print("Starting Phase V Meeting...")

chat_session_phase5 = run_meeting(
    meeting_type="team",
    agenda=phase_5_agenda,
    topic="Phase V: UI & Visualization",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p5 = os.path.join("../discussions", "phase_5")
os.makedirs(discussion_dir_p5, exist_ok=True)

timestamp_p5 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p5 = f"phase_5_{timestamp_p5}.md"
filepath_p5 = os.path.join(discussion_dir_p5, filename_p5)

history_p5 = chat_session_phase5.get_history()

with open(filepath_p5, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase V - UI & Visualization\n")
    f.write(f"**Date**: {timestamp_p5}\n\n")

    for turn in history_p5:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p5}")
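For the visualization focus area (avoiding hairballs above roughly 100 nodes), one simple option is to cap the rendered subgraph before it reaches the force-directed layout. This sketch keeps Tier 1 nodes first, then fills the remaining slots by degree. The 100-node cap comes from the agenda. The tier-then-degree priority is one possible choice, not the team's decision.

```python
from collections import Counter


def prune_for_display(edges, tiers, max_nodes=100):
    """Select at most max_nodes nodes and the edges among them.

    Priority: Tier 1 nodes first, then higher degree, then node ID.
    edges: list of (u, v) pairs; tiers: dict node -> tier (missing = untiered).
    Only nodes that appear in at least one edge are considered.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda n: (tiers.get(n) != 1, -degree[n], n))
    keep = set(ranked[:max_nodes])
    return keep, [(u, v) for u, v in edges if u in keep and v in keep]
```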

## 10. Phase VI: Data Sourcing & Real Integration

**Objective**: Replace mock data with real scientific datasets (ADNI/GWAS Catalog) and define real ingestion strategies.
**Participants**: The Whole Team.

In [None]:
# Define Context for Phase VI
phase_6_context = f"""
--- PHASE V STATUS: COMPLETE ---
1. UI: Streamlit Dashboard deployed and verified using mock stubs.
2. Logic: Pruning, RAG, and Safety protocols are verified.

--- PHASE VI GOALS ---
1. REAL DATA: Transition from 'MockNode' to real datasets. 
2. DATASETS: Identify the accessible public datasets (e.g., GWAS Catalog, ADNI Public Subset).
3. IMPLEMENTATION: Define the specific loaders in `alzkb.ingestion` that actually read these files.
"""

# Define Agenda for Phase VI
phase_6_agenda = f"""{BACKGROUND_PROMPT}
{phase_6_context}
TASK: Design the Real Data Integration Strategy (Phase VI).

FOCUS AREAS:
1. DATA SOURCES (Critic): Which public datasets are safe to use without a complex Data Use Agreement (DUA) blocking us right now? (Suggest: GWAS Catalog, ClinVar).
2. INGESTION MAP (Data Engineer): How do we replace `GraphDriver` stubs with a real NetworkX or RDFLib graph populated from these CSVs?
3. VALIDATION (Validator): How do we ensure the 'Real Data' doesn't break our strict Safety Stamps?

OUTPUT GOAL: A list of target datasets and a plan to replace the Mock Backend.
"""

print("Phase 6 Agenda defined.")
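A GWAS Catalog loader can start from the catalog's tab-separated associations download. This stdlib-only sketch turns rows into (gene, SNP, trait, p-value) records ready to be added to a NetworkX or RDFLib graph. The column names `DISEASE/TRAIT`, `SNPS`, `MAPPED_GENE`, and `P-VALUE` are assumed from the associations file and should be checked against the downloaded release. The way `MAPPED_GENE` is split and the trait keyword filter are also assumptions.

```python
import csv


def load_gwas_associations(handle, trait_keyword="alzheimer", max_p=5e-8):
    """Yield (gene, SNP, trait, p-value) records from a GWAS Catalog associations TSV.

    Assumed columns: DISEASE/TRAIT, SNPS, MAPPED_GENE, P-VALUE. Rows with an
    unparseable p-value are skipped rather than guessed.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        trait = row.get("DISEASE/TRAIT") or ""
        if trait_keyword not in trait.lower():
            continue
        try:
            p_value = float(row.get("P-VALUE") or "")
        except ValueError:
            continue
        if not p_value <= max_p:  # also rejects NaN
            continue
        # Assumption: MAPPED_GENE lists genes as "A, B" or intergenic pairs as "A - B".
        genes = (row.get("MAPPED_GENE") or "").replace(" - ", ", ").split(", ")
        for gene in filter(None, (g.strip() for g in genes)):
            yield {"gene": gene, "snp": row.get("SNPS") or "", "trait": trait, "p_value": p_value}
```

In use, the handle would come from something like `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded file.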

In [None]:
# Run the Meeting
print("Starting Phase VI Meeting...")

chat_session_phase6 = run_meeting(
    meeting_type="team",
    agenda=phase_6_agenda,
    topic="Phase VI: Data Resources",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p6 = os.path.join("../discussions", "phase_6")
os.makedirs(discussion_dir_p6, exist_ok=True)

timestamp_p6 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p6 = f"phase_6_{timestamp_p6}.md"
filepath_p6 = os.path.join(discussion_dir_p6, filename_p6)

history_p6 = chat_session_phase6.get_history()

with open(filepath_p6, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase VI - Data Resources\n")
    f.write(f"**Date**: {timestamp_p6}\n\n")

    for turn in history_p6:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p6}")

## 11. Phase VII: Persistent Storage & RDF Export

**Objective**: Define how to save the In-Memory Graph to persistent files and connect the UI.
**Participants**: The Whole Team.

In [None]:
# Define Context for Phase VII
phase_7_context = f"""
--- PHASE VI STATUS: COMPLETE ---
1. Data: Successfully downloaded GWAS Catalog and ClinVar.
2. Ingestion: `ingest_gwas.py` works and passes validation (APOE signal found).
3. Current State: The graph handles >1000 nodes but only lives in memory during the script run.

--- PHASE VII GOALS ---
1. PERSISTENCE: Save the NetworkX graph to disk. Options: GraphML (easier for Python) or Turtle/RDF (the standard for knowledge graphs).
2. BACKEND INTEGRATION: Update `alzkb.backend.GraphDriver` to load this file instead of generating mock stubs.
3. EXPORT: Ensure the data is portable for the UI.
"""

# Define Agenda for Phase VII
phase_7_agenda = f"""{BACKGROUND_PROMPT}
{phase_7_context}
TASK: Design the Persistent Storage Layer (Phase VII).

FOCUS AREAS:
1. FILE FORMAT (Ontologist): We built this with the Biolink Model. Should we export as Turtle (.ttl) to be true to the semantic vision?
2. LOADING STRATEGY (Engineer): How do we make the Streamlit app load this 1000+ node graph efficiently on startup? Should we cache it?
3. END-TO-END FLOW (Lead): Confirm the path from `run_real_ingestion.py` -> `graph.graphml` -> `app.py`.

OUTPUT GOAL: Specification for the export function and the updated Backend Driver.
"""

print("Phase 7 Agenda defined.")
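On the file-format question, N-Triples is a line-based subset of Turtle, so writing it keeps the export valid Turtle while remaining trivial to generate and stream. This stdlib-only sketch assumes edges arrive as (subject, predicate, object) tuples with absolute IRIs. As a simplification, every non-IRI object is written as a plain string literal with no datatype. For the Streamlit side, wrapping the loader in `st.cache_resource` is one way to load the graph once per server process rather than on every rerun.

```python
def _escape_literal(value):
    """Escape a value for an N-Triples string literal (backslash, quote, newlines)."""
    return (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples to N-Triples text.

    Subjects and predicates must be absolute IRIs. An object is written as an
    IRI if it is a string starting with http:// or https://, otherwise as a
    plain (untyped) literal.
    """
    lines = []
    for s, p, o in triples:
        if isinstance(o, str) and o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            obj = f'"{_escape_literal(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"
```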

In [None]:
# Run the Meeting
print("Starting Phase VII Meeting...")

chat_session_phase7 = run_meeting(
    meeting_type="team",
    agenda=phase_7_agenda,
    topic="Phase VII: Persistent Storage",
    team_lead=PRINCIPAL_INVESTIGATOR,
    team_members=TEAM_MEMBERS,
    num_rounds=2,
    model_name=MODEL_PRO
)

print("Meeting Complete.")

In [None]:
# Save Discussion
discussion_dir_p7 = os.path.join("../discussions", "phase_7")
os.makedirs(discussion_dir_p7, exist_ok=True)

timestamp_p7 = datetime.now().strftime("%Y%m%d_%H%M%S")
filename_p7 = f"phase_7_{timestamp_p7}.md"
filepath_p7 = os.path.join(discussion_dir_p7, filename_p7)

history_p7 = chat_session_phase7.get_history()

with open(filepath_p7, "w", encoding="utf-8") as f:
    f.write("# Discussion Log: Phase VII - Persistent Storage\n")
    f.write(f"**Date**: {timestamp_p7}\n\n")

    for turn in history_p7:
        role = turn.role
        # Use the first part's text; fall back if the turn has no parts or no text.
        text = (turn.parts[0].text if turn.parts else None) or "[No Content]"
        f.write(f"### {role}\n{text}\n\n")

print(f"Discussion saved to: {filepath_p7}")