# Judge's Familiar – Project Introduction

**Judge's Familiar** is an AI assistant designed to be a go-to rules companion for *Magic: The Gathering* players. More than a simple search tool, it works as a **real-time expert consultant** that interprets questions asked in natural language and explains complex interactions based strictly on the official documentation.

The core of the system uses a **RAG (Retrieval-Augmented Generation)** architecture that combines:

* The official Magic **Comprehensive Rules (CR)**, including the Glossary.
* The **Oracle text** from the card database (via MTGJSON).

Unlike generic LLMs, which can "hallucinate" rules, *Judge's Familiar* retrieves the exact rule text and builds a reasoned answer on top of it, so every answer is **traceable** to the rules it was derived from.

## Goal of Version 1 (V1)

The first version is defined as a **"Pocket Companion"** focused on resolving technical rules questions.

* **Multimodal Input:**
    * Text (direct queries).
    * Voice (automatic transcription with the **Whisper** model).
* **Transparent Output:**
    * A pedagogical explanation of the interaction ("*why* this happens").
    * **Mandatory explicit citations** (e.g. `[702.19b]`, `[510.1c]`).
    * Cross-references between Glossary definitions and numbered rules.
* **Design Philosophy:**
    * **Objectivity:** The system explains the mechanics; it does not judge player conduct.
    * **Rigor:** If the information is not present in the retrieved rules, the system says so instead of inventing an answer.
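
The citation and rigor requirements above could be checked after generation with something like the following. This is a minimal sketch, not part of the project code: the helper names `extract_citations` and `unsupported_citations` are hypothetical, and the bracket pattern assumes citations always look like `[702.19b]`. It flags any cited rule number that was not among the retrieved rules.

```python
import re
from typing import List, Set

# Assumed citation format: a rule number in square brackets, e.g. [702.19b]
CITATION_RE = re.compile(r"\[(\d{3}\.\d+[a-z]?)\]")

def extract_citations(answer: str) -> List[str]:
    """Returns cited rule numbers in order of first appearance, without duplicates."""
    seen: Set[str] = set()
    ordered: List[str] = []
    for rule_id in CITATION_RE.findall(answer):
        if rule_id not in seen:
            seen.add(rule_id)
            ordered.append(rule_id)
    return ordered

def unsupported_citations(answer: str, retrieved_ids: Set[str]) -> List[str]:
    """Returns citations that do not match any retrieved rule (hypothetical check)."""
    return [rid for rid in extract_citations(answer) if rid not in retrieved_ids]

# Illustrative answer text, not real model output
answer = "Deathtouch makes 1 damage lethal [702.2c]; trample assigns the rest [702.19b]."
print(extract_citations(answer))
print(unsupported_citations(answer, {"702.2c", "510.1c"}))
```

A non-empty result from `unsupported_citations` would be a signal to warn the user or regenerate the answer, since the model cited a rule it was never shown.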

*Judge's Familiar* does not aim to replace the human judge on tournament policy or conduct disputes; its mission is to **democratize access to the rules**, enabling smoother and fairer games in both casual and competitive settings.

## Technical Architecture

The system is structured as a four-layer pipeline:

1.  **Data Ingestion (ETL)**
    * Atomic parsing of the *Comprehensive Rules* and the *Glossary* (1 rule = 1 node).
    * Indexing of cards and Oracle text.
2.  **Semantic Indexes**
    * A vector database optimized for similarity search over embeddings.
3.  **RAG Engine + LLM**
    * Hybrid retrieval of rules and definitions.
    * Answer generation with strict *Prompt Engineering* (Level 3 Assistant role).
    * An efficient model (`gpt-4o-mini`) run at temperature 0.0 to keep answers as faithful to the retrieved rules as possible.
4.  **User Interface**
    * Responsive web chat (desktop/mobile).
    * A clear display of the answer, separated from the technical sources.
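
The similarity search in layer 2 can be illustrated with a toy example. The three-dimensional vectors below are made up for the illustration (real embeddings come from an embedding model and have far more dimensions), and `top_k` is a hypothetical helper rather than the vector store's API.

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Scores every stored vector against the query and keeps the k best."""
    scored = [(rule_id, cosine(query, vec)) for rule_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up vectors keyed by rule number (illustrative only)
toy_index = {
    "702.2b": [0.9, 0.1, 0.0],
    "702.19b": [0.7, 0.7, 0.0],
    "100.1": [0.0, 0.1, 0.9],
}

for rule_id, score in top_k([1.0, 0.2, 0.0], toy_index, k=2):
    print(rule_id, round(score, 3))
```

A brute-force scan like this is fine for a few thousand rule nodes; a dedicated vector index mainly matters once the card database is added.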

## Scalability (Roadmap)

The modular architecture allows future extensions without rewriting the core:

* **Visual Recognition:** Identifying physical cards through the camera (OCR/image recognition).
* **Semantic Search by Art:** Finding cards from visual descriptions (*"green beast with three heads"*).
* **Decision Agents:** An optional upper layer for simulating complex, chained rulings.

## Interactive Experience

As part of the presentation, physical copies of the *Judge's Familiar* card, modified with a **dynamic QR code**, will be handed out. The audience and the jury can scan the card and **try the system live on their own devices**, closing the gap between physical (tabletop) play and digital assistance.

In [None]:
# ---------------------------------------------------------
# JUDGE'S FAMILIAR - PRODUCTION V1
# ---------------------------------------------------------

!pip install -q python-dotenv jedi llama_index llama-index-embeddings-openai llama-index-llms-openai

import os
import re
import json
import unicodedata
import requests
import time
from pathlib import Path
from typing import List, Dict, Any, Generator, Tuple

from dotenv import load_dotenv

from llama_index.core import (
    Document,
    VectorStoreIndex,
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage, MessageRole

# --- CONFIGURATION ---
LLM_MODEL_NAME = "gpt-4o-mini"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"

# --- 1. SETUP & AUTH ---
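def load_openai_key() -> None:
    """Loads OPENAI_API_KEY from Colab secrets if available, else from a local .env file."""
    try:
        from google.colab import userdata
        key = userdata.get("OPENAI_API_KEY")
        if key:
            os.environ["OPENAI_API_KEY"] = key
            return
    except ImportError:
        pass  # Not running in Colab; fall through to .env
    load_dotenv()
    # Warn early instead of failing later with an opaque authentication error
    if not os.environ.get("OPENAI_API_KEY"):
        print("[WARN] OPENAI_API_KEY is not set; OpenAI calls will fail until it is provided.")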

load_openai_key()

# --- 2. DATA CONSTANTS & PARSING ---
BASE_DIR = Path(".").resolve()
DATA_DIR = BASE_DIR / "data"
DATA_DIR.mkdir(exist_ok=True)
PERSIST_DIR = DATA_DIR / "storage"

CR_TXT_PATH = DATA_DIR / "magic_comprehensive_rules.txt"
CR_RULES_JSONL = DATA_DIR / "magic_comprehensive_rules_rules.jsonl"
CR_URL = "https://media.wizards.com/2025/downloads/MagicCompRules%2020251114.txt"

# Metadata Mapping
CHAPTERS = {
    "1": "Game Concepts",
    "2": "Parts of a Card",
    "3": "Card Types",
    "4": "Zones",
    "5": "Turn Structure",
    "6": "Spells, Abilities, and Effects",
    "7": "Additional Rules",
    "8": "Multiplayer Rules",
    "9": "Casual Variants",
}

def clean_unicode(s: str) -> str:
    """Normalizes text to handle special characters consistently."""
    s = unicodedata.normalize("NFC", s)
    s = (s.replace("\u00A0", " ").replace("\u2019", "'").replace("\u2018", "'")
           .replace("\u201C", '"').replace("\u201D", '"')
           .replace("\u2013", "-").replace("\u2014", "-").replace("\u2026", "..."))
    return "".join(ch for ch in s if (unicodedata.category(ch) not in {"Cf", "Cc", "Cs", "Co", "Cn"} or ch in ("\n", "\t")))

def parse_complete_rules(txt_path: Path) -> List[Dict[str, Any]]:
    """
    Parses Numbered Rules line-by-line.
    Parses Glossary by isolating the block and splitting by double-newlines.
    """
    text = txt_path.read_text(encoding="utf-8")
    lines = text.splitlines()

    # Regex Patterns
    chapter_pattern = re.compile(r"^([1-9])\.\s+(.*)$")        # "1. Game Concepts"
    section_pattern = re.compile(r"^(\d{3})\.\s+(.*)$")        # "100. General"
    rule_pattern = re.compile(r"^(\d{3}\.\d+[a-z]?)\.?\s+(.*)$") # "100.1. Text..."

    nodes = []

    # Context Trackers
    current_chapter_id = "1"
    current_chapter_title = "Game Concepts"
    current_section_id = "100"
    current_section_title = "General"

    current_rule_id = None
    current_rule_lines = []

    # Logic Flags
    glossary_mode = False
    glossary_lines = []
    rules_parsed_count = 0

    print("[LOG] Starting Parse Process...")

    for line in lines:
        stripped = line.strip()
        if not stripped:
            # If we are in glossary mode, keep blank lines to allow splitting later
            if glossary_mode:
                glossary_lines.append(line)
            continue

        # --- A. DETECT GLOSSARY START ---
        if stripped == "Glossary":
            # Avoid TOC: Only switch if we have parsed many rules
            if rules_parsed_count > 100:
                # Flush the last numbered rule
                if current_rule_id:
                    nodes.append({
                        "rule_id": current_rule_id,
                        "chapter_id": current_chapter_id,
                        "chapter_title": current_chapter_title,
                        "section_id": current_section_id,
                        "section_title": current_section_title,
                        "text": "\n".join(current_rule_lines).strip()
                    })
                    current_rule_id = None

                glossary_mode = True
                print("[LOG] Real Glossary section detected. Switching mode...")
                continue
            else:
                continue # TOC Glossary

        # --- B. DETECT CREDITS (STOP) ---
        if stripped == "Credits":
            if glossary_mode:
                print("[LOG] Credits detected. Stopping parse.")
                break
            continue

        # --- C. PARSE NUMBERED RULES ---
        if not glossary_mode:
            # Chapter Header
            m_chap = chapter_pattern.match(line)
            if m_chap:
                current_chapter_id = m_chap.group(1)
                current_chapter_title = m_chap.group(2).strip()
                continue

            # Section Header
            m_sec = section_pattern.match(line)
            if m_sec:
                current_section_id = m_sec.group(1)
                current_section_title = m_sec.group(2).strip()
                continue

            # Rule Line
            m_rule = rule_pattern.match(line)
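            if m_rule:
                if current_rule_id:
                    # Flush the previous rule with the headers captured when it started
                    nodes.append({
                        **current_rule_meta,
                        "text": "\n".join(current_rule_lines).strip()
                    })

                current_rule_id = m_rule.group(1)

                # Fallbacks for metadata
                if not current_section_id:
                    current_section_id = current_rule_id.split('.')[0]
                if not current_chapter_id:
                    current_chapter_id = current_section_id[0]
                    current_chapter_title = CHAPTERS.get(current_chapter_id, "Unknown")

                # Snapshot the headers now. Without this, a section or chapter header
                # read before the next rule line would relabel the last rule of a
                # section with the following section's metadata when it is flushed.
                current_rule_meta = {
                    "rule_id": current_rule_id,
                    "chapter_id": current_chapter_id,
                    "chapter_title": current_chapter_title,
                    "section_id": current_section_id,
                    "section_title": current_section_title,
                }

                current_rule_lines = [m_rule.group(2).strip()]
                rules_parsed_count += 1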

            elif current_rule_id:
                current_rule_lines.append(line)

        # --- D. COLLECT GLOSSARY RAW TEXT ---
        else:
            glossary_lines.append(line)

    # --- E. PROCESS GLOSSARY BLOCK ---
    if glossary_lines:
        print("[LOG] Processing Glossary Block...")
        full_glossary_text = "\n".join(glossary_lines)

        # Split by double newlines (Standard separation in this file)
        entries = re.split(r'\n\s*\n', full_glossary_text.strip())

        for entry in entries:
            clean_entry = entry.strip()
            if not clean_entry: continue

            # Split into Term (Line 1) and Definition (Rest)
            parts = clean_entry.split('\n', 1)
            term = parts[0].strip()

            if len(parts) > 1:
                definition = parts[1].strip()
            else:
                definition = term # Fallback if single line

            # Create Clean Glossary Node
            nodes.append({
                "rule_id": term,
                "chapter_id": "G",       # ID is "G"
                "chapter_title": "Glossary", # Title is "Glossary"
                "section_id": None,      # Explicit None (will show as null in JSON)
                "section_title": None,
                "text": definition
            })

    return nodes

# --- 3. INDEX FACTORY ---
def get_or_build_index():
    if not CR_TXT_PATH.exists():
        print("[LOG] Downloading Rules...")
        resp = requests.get(CR_URL, timeout=60)
        # Fail early rather than caching an HTTP error page as the rules file
        resp.raise_for_status()
        CR_TXT_PATH.write_text(clean_unicode(resp.content.decode("utf-8-sig")), encoding="utf-8")

    Settings.embed_model = OpenAIEmbedding(model=EMBEDDING_MODEL_NAME)
    Settings.llm = OpenAI(model=LLM_MODEL_NAME, temperature=0.0)

    print("[LOG] Parsing raw TXT file...")
    rules_data = parse_complete_rules(CR_TXT_PATH)

    # Save to JSONL
    with CR_RULES_JSONL.open("w", encoding="utf-8") as f:
        for r in rules_data: f.write(json.dumps(r, ensure_ascii=False) + "\n")

    # --- SANITY CHECKS ---
    rule_nodes = [r for r in rules_data if r['chapter_id'] != 'G']
    glossary_nodes = [r for r in rules_data if r['chapter_id'] == 'G']

    print("\n" + "="*50)
    print("DATASET STATISTICS")
    print(f"   Total Nodes:      {len(rules_data)}")
    print(f"   Numbered Rules:   {len(rule_nodes)}")
    print(f"   Glossary Terms:   {len(glossary_nodes)}")
    print("="*50)

    print("\n[SAMPLE RULE NODE]")
    # Find 100.1
    sample_rule = next((r for r in rule_nodes if r['rule_id'] == "100.1"), rule_nodes[0] if rule_nodes else None)
    print(json.dumps(sample_rule, indent=2))

    print("\n[SAMPLE GLOSSARY NODE]")
    # Find "Abandon"
    sample_glossary = next((r for r in glossary_nodes if r['rule_id'] == "Abandon"), glossary_nodes[0] if glossary_nodes else None)
    print(json.dumps(sample_glossary, indent=2))
    print("="*50 + "\n")
    # ---------------------

    documents = []
    for r in rules_data:
        doc = Document(
            text=r["text"],
            metadata={
                "rule_id": r["rule_id"],
                "chapter_id": r["chapter_id"],
                "chapter_title": r["chapter_title"],
                "section_id": r["section_id"],
                "section_title": r["section_title"]
            },
            excluded_embed_metadata_keys=["chapter_id", "section_id", "chapter_title", "section_title"],
            excluded_llm_metadata_keys=["chapter_id", "section_id"]
        )
        documents.append(doc)

    if PERSIST_DIR.exists() and any(PERSIST_DIR.iterdir()):
        print("[LOG] Loading vector index from storage...")
        return load_index_from_storage(StorageContext.from_defaults(persist_dir=str(PERSIST_DIR)))

    print("[LOG] Building new Vector Index (First Run)...")
    index = VectorStoreIndex.from_documents(documents, show_progress=True)
    PERSIST_DIR.mkdir(exist_ok=True)
    index.storage_context.persist(persist_dir=str(PERSIST_DIR))
    return index

# Force rebuild to ensure schema update
if PERSIST_DIR.exists():
    import shutil
    shutil.rmtree(str(PERSIST_DIR))

rules_index = get_or_build_index()

# --- 4. ENGINE ---
class MagicJudgeEngine:
    def __init__(self, index: VectorStoreIndex, k: int = 8):
        self.retriever = VectorIndexRetriever(index=index, similarity_top_k=k)
        self.llm = Settings.llm

    def query(self, user_question: str) -> Tuple[Any, List[Dict]]:
        # 1. Retrieve
        nodes = self.retriever.retrieve(user_question)

        # 2. Context Construction
        context_parts = []
        for n in nodes:
            rid = n.metadata['rule_id']
            # If Glossary (Chapter G), format specially
            if n.metadata.get('chapter_id') == 'G':
                context_parts.append(f"[Glossary: {rid}] {n.text}")
            else:
                context_parts.append(f"[Rule {rid}] {n.text}")

        context_str = "\n\n".join(context_parts)

        # 3. Prompt
        system_msg = (
            "You are 'Judge's Familiar', a helpful Magic: The Gathering rules assistant.\n"
            "Your goal is to explain interactions clearly using the provided Comprehensive Rules and Glossary.\n\n"
            "GUIDELINES:\n"
            "1. Primary Source: Base your answer on the Numbered Rules (e.g. 702.2) whenever possible.\n"
            "2. Secondary Source: Use Glossary definitions to explain complex terms simply.\n"
            "3. Citations: You MUST cite specific rule numbers (e.g. [702.2b]) used in your answer.\n"
            "4. Objective: Explain the mechanics objectively. Do not judge player behavior.\n"
            "5. Output: End with 'Sources: [Rule IDs]'."
        )

        user_msg = (
            f"Reference Rules & Glossary:\n{context_str}\n\n"
            f"User Question: {user_question}\n\n"
            "Explanation:"
        )

        stream_response = self.llm.stream_chat([
            ChatMessage(role=MessageRole.SYSTEM, content=system_msg),
            ChatMessage(role=MessageRole.USER, content=user_msg)
        ])

        # 4. Extract Sources
        sources = []
        for n in nodes:
            sources.append({
                "rule_id": n.metadata['rule_id'],
                "chapter": n.metadata.get('chapter_title'),
                "section": n.metadata.get('section_title'),
                "score": n.score
            })

        return stream_response, sources

# --- 5. EXECUTION ---
judge = MagicJudgeEngine(rules_index)

def ask_judge(question):
    print(f"\nQuestion: {question}")

    t_start = time.time()
    stream_obj, sources = judge.query(question)
    latency = time.time() - t_start

    print("Familiar's Answer: ", end="")
    try:
        response_gen = stream_obj.response_gen if hasattr(stream_obj, "response_gen") else stream_obj
        for token in response_gen:
            content = token.delta if hasattr(token, "delta") else str(token)
            print(content, end="", flush=True)
    except Exception as e:
        print(f"\nError: {e}")

    # Logs
    print(f"\n\n[LOG] Retrieval Time: {latency:.3f}s")
    print("[LOG] Sources Used:")
    for s in sources:
        # Handle None for Glossary Sections gracefully in logs
        sec = s['section'] if s['section'] else "N/A"
        chap = s['chapter'] if s['chapter'] else "N/A"
        print(f" - [{s['rule_id']}] {sec} (Chapter: {chap})")


Parsing nodes:   0%|          | 0/3834 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1786 [00:00<?, ?it/s]
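
To see how the parser's three regular expressions tell line types apart, the sketch below copies the patterns from `parse_complete_rules` and applies them to a few sample lines. The `classify` helper and the placeholder sample lines are assumptions made for illustration; only the patterns come from the code above.

```python
import re
from typing import Optional, Tuple

# Patterns copied from parse_complete_rules
chapter_pattern = re.compile(r"^([1-9])\.\s+(.*)$")
section_pattern = re.compile(r"^(\d{3})\.\s+(.*)$")
rule_pattern = re.compile(r"^(\d{3}\.\d+[a-z]?)\.?\s+(.*)$")

def classify(line: str) -> Tuple[str, Optional[str]]:
    """Returns (kind, identifier), trying patterns in the same order as the parser."""
    for kind, pattern in (("chapter", chapter_pattern),
                          ("section", section_pattern),
                          ("rule", rule_pattern)):
        m = pattern.match(line)
        if m:
            return kind, m.group(1)
    # Anything else is treated as a continuation of the current rule
    return "continuation", None

# Placeholder lines in the shape of the rules file (not verbatim rule text)
samples = [
    "1. Game Concepts",
    "100. General",
    "100.1. Placeholder text for a top-level rule.",
    "100.1a Placeholder text for a subrule.",
    "Example: placeholder text that belongs to the previous rule.",
]
for line in samples:
    print(classify(line))
```

Top-level rules carry a trailing period after the number (`100.1.`) while lettered subrules do not (`100.1a`), which is why `rule_pattern` makes that period optional.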

In [None]:
ask_judge("What are the steps of the combat phase?")


Question: What are the steps of the combat phase?
Familiar's Answer: The combat phase in Magic: The Gathering consists of five steps, which occur in the following order:

1. **Beginning of Combat Step**: This is the first step of the combat phase where players can cast spells or activate abilities before attackers are declared. [Glossary: Beginning of Combat Step]

2. **Declare Attackers Step**: In this step, the active player declares which creatures they are attacking with. This is the point where creatures are declared as attackers.

3. **Declare Blockers Step**: The defending player then has the opportunity to declare blockers for the attacking creatures. This step is skipped if no creatures were declared as attackers. [Rule 506.1]

4. **Combat Damage Step**: In this step, combat damage is assigned and dealt simultaneously by all attacking and blocking creatures. If any creatures have first strike or double strike, there will be an additional combat damage step for those creatures

In [None]:
ask_judge("If I have a creature with Deathtouch and Trample, how much damage do I need to assign to the blocker?")


Question: If I have a creature with Deathtouch and Trample, how much damage do I need to assign to the blocker?
Familiar's Answer: When you have a creature with both deathtouch and trample attacking, the amount of damage you need to assign to a blocking creature is determined by the rules governing these abilities.

1. **Deathtouch**: Any nonzero amount of damage assigned to a creature by a source with deathtouch is considered lethal damage. This means that even 1 point of damage is sufficient to satisfy the requirement for lethal damage for that creature [702.2c].

2. **Trample**: When a creature with trample is blocked, it must assign enough damage to the blocking creature(s) to meet the lethal damage requirement before it can assign any excess damage to the defending player or planeswalker. In the case of a creature with deathtouch, you only need to assign 1 damage to the blocker to satisfy this requirement [702.19b].

Therefore, if your creature with deathtouch and trample is bloc