# 2.1 Graph Pattern Construction - Ungraph

This notebook covers the **TRANSFORM** phase of the ETI pattern: how to build graph structures using predefined and custom patterns.

## Objectives

1. **Predefined patterns** - FILE_PAGE_CHUNK (default), SIMPLE_CHUNK
2. **Create custom patterns** - Define your own graph structure
3. **Validate patterns** - Check that the patterns are valid
4. **Generate Cypher queries** - Inspect the automatically generated queries
5. **Use patterns during ingestion** - Apply patterns when ingesting documents

**References:**
- [Graph Patterns](../../docs/concepts/graph-patterns.md)
- [Custom Patterns](../../docs/guides/custom-patterns.md)


In [None]:
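def add_src_to_path(path_folder: str):
    """Walk up from the working directory and add the first matching folder to sys.path."""
    import sys
    from pathlib import Path
    base_path = Path().resolve()
    # Search the current directory first, then each of its ancestors.
    for parent in [base_path] + list(base_path.parents):
        candidate = parent / path_folder
        if candidate.exists():
            # Make both the folder's parent and the folder itself importable.
            parent_dir = candidate.parent
            if str(parent_dir) not in sys.path:
                sys.path.insert(0, str(parent_dir))
            if str(candidate) not in sys.path:
                sys.path.append(str(candidate))
            return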

add_src_to_path(path_folder="src")
add_src_to_path(path_folder="src/utils")
add_src_to_path(path_folder="src/data")

try:
    import ungraph
except ImportError:
    import src
    ungraph = src

from domain.value_objects.predefined_patterns import FILE_PAGE_CHUNK_PATTERN
from domain.value_objects.graph_pattern import GraphPattern, NodeDefinition, RelationshipDefinition
from infrastructure.services.neo4j_pattern_service import Neo4jPatternService
from src.utils.handlers import find_in_project

print(f"📦 Ungraph version: {ungraph.__version__}")


## Part 1: Predefined Patterns

Ungraph ships with predefined patterns that are ready to use. The default pattern is `FILE_PAGE_CHUNK`.


In [None]:
# Inspect the predefined FILE_PAGE_CHUNK pattern
print("📋 Predefined Pattern: FILE_PAGE_CHUNK")
print("=" * 80)
print(f"Name: {FILE_PAGE_CHUNK_PATTERN.name}")
print(f"Description: {FILE_PAGE_CHUNK_PATTERN.description}")
print(f"\nNodes ({len(FILE_PAGE_CHUNK_PATTERN.node_definitions)}):")
for node_def in FILE_PAGE_CHUNK_PATTERN.node_definitions:
    print(f"  - {node_def.label}")
    print(f"    Required properties: {list(node_def.required_properties.keys())}")
    if node_def.optional_properties:
        print(f"    Optional properties: {list(node_def.optional_properties.keys())}")

print(f"\nRelationships ({len(FILE_PAGE_CHUNK_PATTERN.relationship_definitions)}):")
for rel_def in FILE_PAGE_CHUNK_PATTERN.relationship_definitions:
    print(f"  - {rel_def.from_node} --[{rel_def.relationship_type}]--> {rel_def.to_node}")

print(f"\nSupported search patterns: {FILE_PAGE_CHUNK_PATTERN.search_patterns}")


## Part 2: Creating Custom Patterns

Let's create custom patterns tailored to our specific needs.


### 2.1 Simple Pattern: Chunks Only

A minimalist pattern without the hierarchical File-Page structure.
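
Node definitions in this notebook describe properties as mappings from a property name to a Python type, for example `{"chunk_id": str}`. The following is a minimal sketch of how a record could be checked against such mappings. The helper `check_properties` is hypothetical and not part of the Ungraph API; how Ungraph itself enforces these mappings is not shown here.

```python
from typing import Any, Dict, List, Optional


def check_properties(
    record: Dict[str, Any],
    required: Dict[str, type],
    optional: Optional[Dict[str, type]] = None,
) -> List[str]:
    """Return a list of problems found in `record`; empty if it conforms."""
    problems = []
    for name, expected in required.items():
        if name not in record:
            problems.append(f"missing required property: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # Optional properties may be absent, but must have the right type if present.
    for name, expected in (optional or {}).items():
        if name in record and not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems


# Same property names as the Chunk node used in this notebook.
required = {
    "chunk_id": str,
    "page_content": str,
    "embeddings": list,
    "embeddings_dimensions": int,
}
optional = {"chunk_id_consecutive": int, "source_file": str}

# This record lacks `embeddings_dimensions`, so one problem is reported.
record = {"chunk_id": "c0", "page_content": "text", "embeddings": [0.1, 0.2]}
problems = check_properties(record, required, optional)
print(problems)
```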


In [None]:
# Create the SIMPLE_CHUNK pattern
simple_chunk_node = NodeDefinition(
    label="Chunk",
    required_properties={
        "chunk_id": str,
        "page_content": str,
        "embeddings": list,
        "embeddings_dimensions": int
    },
    optional_properties={
        "chunk_id_consecutive": int,
        "source_file": str
    },
    indexes=["chunk_id"]
)

SIMPLE_CHUNK_PATTERN = GraphPattern(
    name="SIMPLE_CHUNK",
    description="Chunks only, with no File-Page structure. Useful for simple documents.",
    node_definitions=[simple_chunk_node],
    relationship_definitions=[],
    search_patterns=["basic", "hybrid"]
)

print("✅ SIMPLE_CHUNK pattern created:")
print(f"   Nodes: {[n.label for n in SIMPLE_CHUNK_PATTERN.node_definitions]}")
print(f"   Relationships: {len(SIMPLE_CHUNK_PATTERN.relationship_definitions)}")


### 2.2 Pattern with Sequential Relationships

Chunks connected by `NEXT_CHUNK` relationships to preserve their order.
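
To make the idea concrete, here is a minimal sketch, independent of Ungraph's internals, of how consecutive chunks could be paired into `NEXT_CHUNK` edges. The function `next_chunk_edges` is a hypothetical helper, not part of the Ungraph API. It assumes each chunk carries a `chunk_id` and a `chunk_id_consecutive` property, matching the `Chunk` property names used in this notebook.

```python
from typing import Dict, List, Tuple


def next_chunk_edges(chunks: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return (from_id, relationship_type, to_id) triples linking consecutive chunks."""
    # Sort by position so that the edges follow the reading order of the document.
    ordered = sorted(chunks, key=lambda c: c["chunk_id_consecutive"])
    # Pair each chunk with its successor: (c0, c1), (c1, c2), ...
    return [
        (a["chunk_id"], "NEXT_CHUNK", b["chunk_id"])
        for a, b in zip(ordered, ordered[1:])
    ]


chunks = [
    {"chunk_id": "c2", "chunk_id_consecutive": 2},
    {"chunk_id": "c0", "chunk_id_consecutive": 0},
    {"chunk_id": "c1", "chunk_id_consecutive": 1},
]
edges = next_chunk_edges(chunks)
print(edges)
```

With `n` chunks this produces `n - 1` edges, so a single-chunk document yields no `NEXT_CHUNK` relationship at all.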


In [None]:
# Create the SEQUENTIAL_CHUNKS pattern
chunk_node = NodeDefinition(
    label="Chunk",
    required_properties={
        "chunk_id": str,
        "page_content": str,
        "embeddings": list,
        "embeddings_dimensions": int
    },
    optional_properties={
        "chunk_id_consecutive": int
    },
    indexes=["chunk_id"]
)

next_chunk_rel = RelationshipDefinition(
    from_node="Chunk",
    to_node="Chunk",
    relationship_type="NEXT_CHUNK",
    direction="OUTGOING"
)

SEQUENTIAL_CHUNKS_PATTERN = GraphPattern(
    name="SEQUENTIAL_CHUNKS",
    description="Chunks with NEXT_CHUNK relationships between consecutive chunks.",
    node_definitions=[chunk_node],
    relationship_definitions=[next_chunk_rel],
    search_patterns=["basic", "hybrid"]
)

print("✅ SEQUENTIAL_CHUNKS pattern created:")
print(f"   Relationships: {[r.relationship_type for r in SEQUENTIAL_CHUNKS_PATTERN.relationship_definitions]}")


### 2.3 Lexical Pattern: Chunks and Entities

A pattern that includes extracted entities and their relationships to chunks.
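
As an illustration of what such a graph contains, the sketch below derives `MENTIONS` edges from chunk text using a naive, case-insensitive substring match. This is only a stand-in for real entity extraction; `mention_edges` and the sample data are hypothetical and not part of Ungraph.

```python
from typing import Dict, List, Tuple


def mention_edges(
    chunks: Dict[str, str], entities: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (chunk_id, "MENTIONS", entity_id) for every entity name found in a chunk."""
    edges = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        for entity_id, name in entities.items():
            # Naive match: the entity name appears somewhere in the chunk text.
            if name.lower() in lowered:
                edges.append((chunk_id, "MENTIONS", entity_id))
    return edges


chunks = {
    "c0": "Neo4j stores graphs.",
    "c1": "Cypher is the query language of Neo4j.",
}
entities = {"e1": "Neo4j", "e2": "Cypher"}
edges = mention_edges(chunks, entities)
print(edges)
```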


In [None]:
# Create the LEXICAL_GRAPH pattern with entities
entity_node = NodeDefinition(
    label="Entity",
    required_properties={
        "id": str,
        "name": str,
        "type": str
    },
    optional_properties={
        "mentions": list
    },
    indexes=["id", "name", "type"]
)

chunk_node_lexical = NodeDefinition(
    label="Chunk",
    required_properties={
        "chunk_id": str,
        "page_content": str,
        "embeddings": list,
        "embeddings_dimensions": int
    },
    indexes=["chunk_id"]
)

# Relationship: a Chunk mentions an Entity
mentions_rel = RelationshipDefinition(
    from_node="Chunk",
    to_node="Entity",
    relationship_type="MENTIONS",
    direction="OUTGOING"
)

# Relationship: an Entity is related to another Entity
related_rel = RelationshipDefinition(
    from_node="Entity",
    to_node="Entity",
    relationship_type="RELATED_TO",
    direction="OUTGOING"
)

LEXICAL_GRAPH_PATTERN = GraphPattern(
    name="LEXICAL_GRAPH",
    description="Lexical graph with entities and chunks. Useful for knowledge extraction.",
    node_definitions=[chunk_node_lexical, entity_node],
    relationship_definitions=[mentions_rel, related_rel],
    search_patterns=["basic", "hybrid", "graph_enhanced_vector"]
)

print("✅ LEXICAL_GRAPH pattern created:")
print(f"   Nodes: {[n.label for n in LEXICAL_GRAPH_PATTERN.node_definitions]}")
print(f"   Relationships: {[r.relationship_type for r in LEXICAL_GRAPH_PATTERN.relationship_definitions]}")


### 2.4 Hierarchical Pattern: Document → Section → Paragraph

A hierarchical structure for documents with well-defined sections.
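
The sketch below shows how a nested document could be flattened into nodes and relationships that fit this hierarchy. The input layout and the helper `flatten_document` are hypothetical, and chaining `NEXT_PARAGRAPH` across section boundaries is an assumption made here for simplicity, not confirmed Ungraph behavior.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]        # (label, id)
Edge = Tuple[str, str, str]   # (from_id, relationship_type, to_id)


def flatten_document(doc: Dict) -> Tuple[List[Node], List[Edge]]:
    """Turn a nested document dict into flat node and edge lists."""
    nodes: List[Node] = [("Document", doc["doc_id"])]
    edges: List[Edge] = []
    previous_para = None
    for section in doc["sections"]:
        nodes.append(("Section", section["section_id"]))
        edges.append((doc["doc_id"], "HAS_SECTION", section["section_id"]))
        for para in section["paragraphs"]:
            nodes.append(("Paragraph", para["para_id"]))
            edges.append((section["section_id"], "HAS_PARAGRAPH", para["para_id"]))
            # Link each paragraph to the one that precedes it in reading order.
            if previous_para is not None:
                edges.append((previous_para, "NEXT_PARAGRAPH", para["para_id"]))
            previous_para = para["para_id"]
    return nodes, edges


doc = {
    "doc_id": "d1",
    "sections": [
        {"section_id": "s1", "paragraphs": [{"para_id": "p1"}, {"para_id": "p2"}]},
        {"section_id": "s2", "paragraphs": [{"para_id": "p3"}]},
    ],
}
nodes, edges = flatten_document(doc)
print(len(nodes), "nodes,", len(edges), "edges")
```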


In [None]:
# Create the hierarchical DOCUMENT_SECTION_PARAGRAPH pattern
document_node = NodeDefinition(
    label="Document",
    required_properties={"doc_id": str, "title": str},
    indexes=["doc_id"]
)

section_node = NodeDefinition(
    label="Section",
    required_properties={"section_id": str, "title": str},
    indexes=["section_id"]
)

paragraph_node = NodeDefinition(
    label="Paragraph",
    required_properties={"para_id": str, "content": str},
    indexes=["para_id"]
)

# Hierarchical relationships
has_section = RelationshipDefinition(
    from_node="Document",
    to_node="Section",
    relationship_type="HAS_SECTION",
    direction="OUTGOING"
)

has_paragraph = RelationshipDefinition(
    from_node="Section",
    to_node="Paragraph",
    relationship_type="HAS_PARAGRAPH",
    direction="OUTGOING"
)

next_paragraph = RelationshipDefinition(
    from_node="Paragraph",
    to_node="Paragraph",
    relationship_type="NEXT_PARAGRAPH",
    direction="OUTGOING"
)

DOCUMENT_SECTION_PARAGRAPH_PATTERN = GraphPattern(
    name="DOCUMENT_SECTION_PARAGRAPH",
    description="Hierarchical structure: Document → Section → Paragraph",
    node_definitions=[document_node, section_node, paragraph_node],
    relationship_definitions=[has_section, has_paragraph, next_paragraph],
    search_patterns=["basic", "parent_child"]
)

print("✅ DOCUMENT_SECTION_PARAGRAPH pattern created:")
print("   Structure: Document → Section → Paragraph")


## Part 3: Validating Patterns

Before using a pattern, we should validate it to make sure it is well formed.


In [None]:
# Create the pattern service
pattern_service = Neo4jPatternService()

# Validate every pattern created so far
patterns_to_validate = [
    ("FILE_PAGE_CHUNK", FILE_PAGE_CHUNK_PATTERN),
    ("SIMPLE_CHUNK", SIMPLE_CHUNK_PATTERN),
    ("SEQUENTIAL_CHUNKS", SEQUENTIAL_CHUNKS_PATTERN),
    ("LEXICAL_GRAPH", LEXICAL_GRAPH_PATTERN),
    ("DOCUMENT_SECTION_PARAGRAPH", DOCUMENT_SECTION_PARAGRAPH_PATTERN)
]

print("🔍 Validating patterns:\n")
for name, pattern in patterns_to_validate:
    is_valid = pattern_service.validate_pattern(pattern)
    status = "✅ VALID" if is_valid else "❌ INVALID"
    print(f"{status}: {name}")


## Part 4: Generating Cypher Queries

We can inspect the Cypher queries that are generated automatically for each pattern.
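
To give a feel for what a generated query might look like, here is a minimal sketch that builds a parameterized `MERGE` statement for a single node label. The helper `node_merge_cypher` is hypothetical; the actual output of `generate_cypher` may be structured quite differently.

```python
from typing import List


def node_merge_cypher(label: str, key: str, properties: List[str]) -> str:
    """Build a parameterized MERGE statement for one node label."""
    # MERGE on the key property keeps the operation idempotent.
    query = f"MERGE (n:{label} {{{key}: ${key}}})"
    # The remaining properties are assigned with SET, using query parameters.
    assignments = ", ".join(f"n.{p} = ${p}" for p in properties if p != key)
    if assignments:
        query += f"\nSET {assignments}"
    return query


query = node_merge_cypher("Chunk", "chunk_id", ["chunk_id", "page_content", "embeddings"])
print(query)
```

Using `$name` parameters instead of interpolating values into the query string avoids quoting problems and lets the database reuse the query plan.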


In [None]:
# Generate the Cypher query for FILE_PAGE_CHUNK
cypher_query = pattern_service.generate_cypher(FILE_PAGE_CHUNK_PATTERN, "create")

print("📝 Cypher query generated for FILE_PAGE_CHUNK:")
print("=" * 80)
# Show at most the first 500 characters of the query
print((cypher_query[:500] + "...") if len(cypher_query) > 500 else cypher_query)
print("=" * 80)


## Part 5: Using Patterns During Ingestion

We apply custom patterns when ingesting documents.
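
The ingestion call in the next cell takes `chunk_size` and `chunk_overlap` arguments. As a rough illustration of how these two numbers interact, the sketch below splits text with a character-based sliding window. This is an assumption for illustration only; Ungraph's actual splitter may work differently (for example, by splitting on separators), and `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts `chunk_size - chunk_overlap` characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "x" * 2600
chunks = split_with_overlap(text, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])
```

With `chunk_size=1000` and `chunk_overlap=200`, windows start every 800 characters, so each chunk repeats the last 200 characters of the previous one.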


In [None]:
# Ingest a document with a custom pattern
data_path = find_in_project("data", "folder", None)
markdown_file = data_path / "110225.md"

if markdown_file.exists():
    print(f"📥 Ingesting with the SIMPLE_CHUNK pattern: {markdown_file.name}")
    
    chunks = ungraph.ingest_document(
        markdown_file,
        chunk_size=1000,
        chunk_overlap=200,
        pattern=SIMPLE_CHUNK_PATTERN
    )
    
    print("✅ Document ingested with the custom pattern!")
    print(f"   Chunks created: {len(chunks)}")
else:
    print(f"⚠️  File not found: {markdown_file}")


## Summary and Pattern Comparison

### Available Patterns

| Pattern | Structure | Recommended Use |
|---------|-----------|-----------------|
| **FILE_PAGE_CHUNK** | File → Page → Chunk | Default; documents with pages |
| **SIMPLE_CHUNK** | Chunk only | Simple documents without structure |
| **SEQUENTIAL_CHUNKS** | Chunk → Chunk (`NEXT_CHUNK`) | Preserving sequential order |
| **LEXICAL_GRAPH** | Chunk → Entity | Knowledge extraction |
| **DOCUMENT_SECTION_PARAGRAPH** | Document → Section → Paragraph | Documents with sections |

### Validation Rules

- **Node labels**: Must start with an uppercase letter (e.g. `Chunk`, `File`)
- **Relationship types**: Uppercase letters and underscores only (e.g. `NEXT_CHUNK`, `HAS_CHUNK`)
- **Properties**: Valid Python names with basic types (str, int, list, etc.)
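
The rules above can be sketched as a small standalone checker. This is an illustrative approximation, not the logic inside `validate_pattern`: the function `check_pattern`, the regular expressions, and the exact set of allowed "basic" types are assumptions.

```python
import re
from typing import Dict, List

LABEL_RE = re.compile(r"[A-Z][A-Za-z0-9_]*")   # starts with an uppercase letter
REL_TYPE_RE = re.compile(r"[A-Z_]+")           # uppercase letters and underscores only
ALLOWED_TYPES = (str, int, float, bool, list, dict)  # assumed set of basic types


def check_pattern(
    labels: List[str], rel_types: List[str], properties: Dict[str, type]
) -> List[str]:
    """Return a list of rule violations; an empty list means the input passes."""
    errors = []
    for label in labels:
        if not LABEL_RE.fullmatch(label):
            errors.append(f"invalid node label: {label}")
    for rel_type in rel_types:
        if not REL_TYPE_RE.fullmatch(rel_type):
            errors.append(f"invalid relationship type: {rel_type}")
    for name, prop_type in properties.items():
        # Property names must be valid Python identifiers with an allowed type.
        if not name.isidentifier() or prop_type not in ALLOWED_TYPES:
            errors.append(f"invalid property: {name}")
    return errors


print(check_pattern(["Chunk"], ["NEXT_CHUNK"], {"chunk_id": str}))
errors = check_pattern(["chunk"], ["next-chunk"], {"2bad": str})
print(errors)
```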

### Best Practices

1. **Start with FILE_PAGE_CHUNK**: It is the most complete and best-tested pattern
2. **Create custom patterns**: Only when you need specific structures
3. **Always validate**: Call `validate_pattern()` before using a pattern
4. **Review the queries**: Generate and read the Cypher queries to understand what gets created

### Next Step

Once you have built your graph with patterns, continue with:
- **2.2 Smart Chunking Strategies** - Optimize how documents are split
- **3.1 Entity Extraction & Facts** - Extract knowledge from the graph

## References

- [Graph Patterns](../../docs/concepts/graph-patterns.md)
- [Custom Patterns](../../docs/guides/custom-patterns.md)
- [GraphPattern API](../../src/domain/value_objects/graph_pattern.py)
