# Notebook 1.1: Default & Simple Pattern

This notebook tests the **FILE_PAGE_CHUNK** pattern (the default) and the **SIMPLE_CHUNK** pattern against real data.

## Objectives

1. **Clean the graph** before starting
2. **Ingest 3 documents** with each pattern
3. **Explore the resulting graph**
4. **Run searches** with the implemented GraphRAG patterns

## Patterns Under Test

- ✅ **FILE_PAGE_CHUNK**: Default pattern with a File → Page → Chunk structure
- ✅ **SIMPLE_CHUNK**: Chunks only, with no hierarchical structure


In [11]:
def add_src_to_path(path_folder: str):
    '''
    Add the "path_folder" directory and its parent to sys.path so that
    imports work from both notebooks and scripts.
    '''
    import sys
    from pathlib import Path

    base_path = Path().resolve()
    for parent in [base_path] + list(base_path.parents):
        candidate = parent / path_folder
        if candidate.exists():
            parent_dir = candidate.parent
            if str(parent_dir) not in sys.path:
                sys.path.insert(0, str(parent_dir))
                print(f"Path Folder parent added: {parent_dir}")
            if str(candidate) not in sys.path:
                sys.path.append(str(candidate))
                print(f"Path Folder {path_folder} added: {candidate}")
            return
    print(f"Folder '{path_folder}' not found in the directory hierarchy")

# Add the required folders to the path
add_src_to_path(path_folder="src")
add_src_to_path(path_folder="src/utils")
add_src_to_path(path_folder="src/data")

In [12]:
# Import required libraries
import sys
from pathlib import Path
from typing import List, Dict, Any

# Import handlers
from src.utils.handlers import find_in_project

# Import ungraph
try:
    import ungraph
    print("✅ Ungraph imported as an installed package")
except ImportError:
    import src
    ungraph = src
    print("✅ Ungraph imported from src/ (development mode)")

# Import the service used for cleanup
from infrastructure.services.neo4j_index_service import Neo4jIndexService

# Import patterns
from domain.value_objects.predefined_patterns import FILE_PAGE_CHUNK_PATTERN
from domain.value_objects.graph_pattern import GraphPattern, NodeDefinition

print(f"📦 Ungraph version: {ungraph.__version__}")

✅ Ungraph imported from src/ (development mode)
📦 Ungraph version: 0.1.0


## Part 1: Configuration and Cleanup

We configure Neo4j and clean the graph before starting.

In [13]:
# Configure Neo4j
ungraph.configure(
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="Ungraph22",  # ⚠️ CHANGE: use your real password
    neo4j_database="neo4j",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"
)

print("✅ Configuration complete")

✅ Configuration complete


In [14]:
# Clean the graph before starting
print("🧹 Cleaning graph...")
print("=" * 80)

index_service = Neo4jIndexService()

# Delete all nodes and relationships
try:
    index_service.clean_graph()
    print("✅ Graph cleaned (all nodes and relationships deleted)")
except Exception as e:
    print(f"⚠️  Error cleaning graph: {e}")

# Drop all indexes
try:
    index_service.drop_all_indexes()
    print("✅ Indexes dropped")
except Exception as e:
    print(f"⚠️  Error dropping indexes: {e}")

print("\n✅ Cleanup complete. Ready for ingestion.")

🧹 Cleaning graph...

Error cleaning graph: The result is out of scope. The associated transaction has been closed. Results can only be used while the transaction is open.

⚠️  Error cleaning graph: The result is out of scope. The associated transaction has been closed. Results can only be used while the transaction is open.
✅ Indexes dropped

✅ Cleanup complete. Ready for ingestion.


## Part 2: Prepare Documents

We locate the 3 test documents.

In [15]:
# Locate the data folder
data_path = find_in_project(
    target="data",
    search_type="folder",
    project_root=None
)

if data_path:
    print(f"✅ Data folder found: {data_path}")

    # Select the 3 test documents
    test_files = [
        data_path / "110225.md",
        data_path / "AnnyLetter.txt",
        data_path / "Usar símboles de silencio de corchea.docx"
    ]

    # Check that they exist
    available_files = [f for f in test_files if f.exists()]
    print(f"\n📄 Available files ({len(available_files)}/{len(test_files)}):")
    for f in available_files:
        print(f"   ✅ {f.name}")

    for f in test_files:
        if not f.exists():
            print(f"   ⚠️  Not found: {f.name}")
else:
    print("❌ Data folder not found")
    available_files = []

✅ Data folder found: D:\projects\Ungraph\src\data

📄 Available files (3/3):
   ✅ 110225.md
   ✅ AnnyLetter.txt
   ✅ Usar símboles de silencio de corchea.docx


## Part 3: Ingestion with the FILE_PAGE_CHUNK Pattern (Default)

We ingest the documents using the default pattern.

In [16]:
# Ingestion with the FILE_PAGE_CHUNK pattern (default)
print("📥 INGESTION WITH FILE_PAGE_CHUNK PATTERN (DEFAULT)")
print("=" * 80)

all_chunks_default = []

for file_path in available_files:
    print(f"\n📄 Processing: {file_path.name}")
    try:
        chunks = ungraph.ingest_document(
            file_path,
            pattern=FILE_PAGE_CHUNK_PATTERN,  # Default pattern
            chunk_size=1000,
            chunk_overlap=200,
            clean_text=True
        )
        all_chunks_default.extend(chunks)
        print(f"   ✅ {len(chunks)} chunks created")
    except Exception as e:
        print(f"   ❌ Error: {e}")

print(f"\n✅ Total chunks with FILE_PAGE_CHUNK pattern: {len(all_chunks_default)}")

📥 INGESTION WITH FILE_PAGE_CHUNK PATTERN (DEFAULT)

📄 Processing: 110225.md
Chunk relationships created successfully
   ✅ 9 chunks created

📄 Processing: AnnyLetter.txt
Chunk relationships created successfully
   ✅ 23 chunks created

📄 Processing: Usar símboles de silencio de corchea.docx
Chunk relationships created successfully
   ✅ 27 chunks created

✅ Total chunks with FILE_PAGE_CHUNK pattern: 59
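
The `chunk_size=1000` and `chunk_overlap=200` arguments control how each document is split before it is written to the graph. The splitter that ungraph actually uses is not shown in this notebook, so the following is only a minimal sketch of fixed-size, character-based chunking with overlap. The function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper; not part of ungraph.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


sample = "x" * 2600
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])
```

With these settings a new window starts every 800 characters. A splitter that respects sentence or paragraph boundaries would produce different chunk counts than this sketch.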


## Part 4: Explore the FILE_PAGE_CHUNK Graph

We explore the structure of the resulting graph.

In [17]:
# Explore the graph structure
from src.utils.graph_operations import graph_session

driver = graph_session()
with driver.session() as session:
    # Count nodes by label
    result = session.run("""
        MATCH (n)
        RETURN labels(n)[0] as label, count(n) as count
        ORDER BY count DESC
    """)

    print("📊 GRAPH STRUCTURE (FILE_PAGE_CHUNK):")
    print("=" * 80)
    for record in result:
        print(f"   {record['label']}: {record['count']} nodes")

    # Count relationships by type
    result = session.run("""
        MATCH ()-[r]->()
        RETURN type(r) as rel_type, count(r) as count
        ORDER BY count DESC
    """)

    print("\n🔗 RELATIONSHIPS:")
    for record in result:
        print(f"   {record['rel_type']}: {record['count']} relationships")

driver.close()

📊 GRAPH STRUCTURE (FILE_PAGE_CHUNK):
   Chunk: 59 nodes
   File: 3 nodes
   Page: 3 nodes

🔗 RELATIONSHIPS:
   NEXT_CHUNK: 135 relationships
   HAS_CHUNK: 59 relationships
   CONTAINS: 3 relationships
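
The per-file chunk counts reported during ingestion (9, 23 and 27) give a quick sanity check on the relationship totals. Assuming that `NEXT_CHUNK` only links each chunk to its immediate successor within the same file (an assumption; the pattern definition is not shown here), a file with n chunks would contribute n - 1 such relationships:

```python
# Chunk counts per file, taken from the ingestion output above
chunks_per_file = {
    "110225.md": 9,
    "AnnyLetter.txt": 23,
    "Usar símboles de silencio de corchea.docx": 27,
}

total_chunks = sum(chunks_per_file.values())

# Assumption: NEXT_CHUNK forms one simple chain per file (n - 1 links for n chunks)
expected_next_chunk = sum(n - 1 for n in chunks_per_file.values())

print(f"Total chunks: {total_chunks}")
print(f"Expected NEXT_CHUNK under the chain assumption: {expected_next_chunk}")
```

Under that assumption the total would be 56, whereas the graph reports 135. The `HAS_CHUNK` and `CONTAINS` counts do match the node counts, so the difference suggests either that `NEXT_CHUNK` is not a simple chain or that some relationships were created more than once. The output alone does not say which.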


## Part 4.1: Visualize the FILE_PAGE_CHUNK Graph

We visualize the graph using yFiles for Jupyter.

In [18]:
# Import visualization functions
from src.notebooks.graph_visualization import visualize_file_page_chunk_pattern
from src.utils.graph_operations import graph_session

print("🎨 VISUALIZING FILE_PAGE_CHUNK PATTERN")
print("=" * 80)

driver = graph_session()
try:
    # Get the first available filename
    with driver.session() as session:
        result = session.run("MATCH (f:File) RETURN f.filename as filename LIMIT 1")
        record = result.single()
        if record:
            filename = record["filename"]
            print(f"Visualizing structure of: {filename}")
            visualize_file_page_chunk_pattern(driver, limit=15, filename=filename)
        else:
            print("Visualizing general structure (no filter)")
            visualize_file_page_chunk_pattern(driver, limit=15)
except Exception as e:
    print(f"⚠️  Error while visualizing: {e}")
    print("💡 Make sure yfiles_jupyter_graphs_for_neo4j is installed")
    print("   Install with: pip install yfiles-jupyter-graphs-for-neo4j")
finally:
    driver.close()

🎨 VISUALIZING FILE_PAGE_CHUNK PATTERN
Visualizing structure of: Usar símboles de silencio de corchea.docx
⚠️  Error while visualizing: Neo4jGraphWidget.__init__() got an unexpected keyword argument 'parameters'
💡 Make sure yfiles_jupyter_graphs_for_neo4j is installed
   Install with: pip install yfiles-jupyter-graphs-for-neo4j
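
The error above is a signature mismatch rather than a missing package: the installed `Neo4jGraphWidget` does not accept a `parameters` keyword. One way to diagnose this kind of problem is to inspect which keyword arguments a callable accepts before calling it. The sketch below uses only the standard library; `accepted_kwargs` and `FakeWidget` are hypothetical names used for illustration and are not part of yFiles or ungraph.

```python
import inspect
from typing import Any, Callable, Dict


def accepted_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    allowed = {name for name, p in params.items() if p.kind in keyword_kinds}
    return {k: v for k, v in kwargs.items() if k in allowed}


class FakeWidget:
    """Stand-in for a widget whose constructor has no 'parameters' argument."""

    def __init__(self, driver=None, layout="hierarchic"):
        self.driver = driver
        self.layout = layout


requested = {"driver": "my-driver", "parameters": {"filename": "a.md"}}
usable = accepted_kwargs(FakeWidget, requested)
dropped = sorted(set(requested) - set(usable))
widget = FakeWidget(**usable)
print(f"Dropped unsupported arguments: {dropped}")
```

Silently dropping an argument can hide a real bug (here the filename filter would be lost), so this approach is better suited to diagnosis than to a permanent workaround.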


## Part 5: Run Searches with FILE_PAGE_CHUNK

We test the implemented GraphRAG search patterns.

In [19]:
# Run searches
test_query = "test"
print(f"🔍 RUNNING SEARCHES WITH QUERY: '{test_query}'")
print("=" * 80)

# 1. Basic Retriever
print("\n1. Basic Retriever:")
try:
    results = ungraph.search_with_pattern(
        test_query,
        pattern_type="basic",
        limit=3
    )
    print(f"   ✅ {len(results)} results")
    if results:
        print(f"   Average score: {sum(r.score for r in results) / len(results):.3f}")
except Exception as e:
    print(f"   ❌ Error: {e}")

# 2. Metadata Filtering
print("\n2. Metadata Filtering:")
try:
    # Get a filename from the graph
    driver = graph_session()
    with driver.session() as session:
        result = session.run("MATCH (f:File) RETURN f.filename as filename LIMIT 1")
        record = result.single()
        if record:
            filename = record["filename"]
            results = ungraph.search_with_pattern(
                test_query,
                pattern_type="metadata_filtering",
                metadata_filters={"filename": filename},
                limit=3
            )
            print(f"   ✅ {len(results)} results (filtered by '{filename}')")
        else:
            print("   ⚠️  No files in the graph")
    driver.close()
except Exception as e:
    print(f"   ❌ Error: {e}")

# 3. Parent-Child Retriever
print("\n3. Parent-Child Retriever:")
try:
    results = ungraph.search_with_pattern(
        test_query,
        pattern_type="parent_child",
        parent_label="Page",
        child_label="Chunk",
        relationship_type="HAS_CHUNK",
        limit=3
    )
    print(f"   ✅ {len(results)} results")
except Exception as e:
    print(f"   ❌ Error: {e}")

🔍 RUNNING SEARCHES WITH QUERY: 'test'

1. Basic Retriever:
   ✅ 0 results

2. Metadata Filtering:
   ✅ 0 results (filtered by 'Usar símboles de silencio de corchea.docx')

3. Parent-Child Retriever:


Error in search_with_pattern (parent_child): {code: Neo.ClientError.Statement.SyntaxError} {message: In a WITH/RETURN with DISTINCT or an aggregation, it is not possible to access variables declared before the WITH/RETURN: parent_score (line 16, column 18 (offset: 574))
"        ORDER BY parent_score DESC"
                  ^}
neo4j.exceptions.GqlError: {gql_status: 42N44} {gql_status_description: error: syntax error or access rule violation - inaccessible variable. It is not possible to access the variable `parent_score` declared before the RETURN clause when using `DISTINCT` or an aggregation.} {message: 42N44: It is not possible to access the variable `parent_score` declared before the RETURN clause when using `DISTINCT` or an aggregation.} {diagnostic_record: {'_classification': 'CLIENT_ERROR', '_position': {'offset': 574, 'column': 18, 'line': 16}, 'OPERATION': '', 'OPERATION_CODE': '0', 'CURRENT_SCHEMA': '/'}} {raw_classification: CLIENT_ERROR}

The above exception was the direct

   ❌ Error: {code: Neo.ClientError.Statement.SyntaxError} {message: In a WITH/RETURN with DISTINCT or an aggregation, it is not possible to access variables declared before the WITH/RETURN: parent_score (line 16, column 18 (offset: 574))
"        ORDER BY parent_score DESC"
                  ^}
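
The error arises because Cypher does not allow an `ORDER BY` that follows an aggregating or `DISTINCT` projection to reference a variable that is not part of that projection. The query inside `search_with_pattern` is not shown in this notebook, so the query below is a hypothetical reconstruction (including the index name `chunk_content` and the property names) meant only to illustrate a common fix: carry the score through a `WITH` clause as an aliased aggregate, then order by that alias.

```python
# Hypothetical parent-child query; the index name, labels and properties are
# assumptions and may not match ungraph's actual schema.
PARENT_CHILD_QUERY = """
CALL db.index.fulltext.queryNodes('chunk_content', $query_text)
YIELD node AS child, score
MATCH (parent:Page)-[:HAS_CHUNK]->(child:Chunk)
// Aggregate per parent and give the aggregated score an alias ...
WITH parent, max(score) AS parent_score, collect(child.content) AS children
// ... so that ORDER BY refers to a variable that is still in scope.
RETURN parent, parent_score, children
ORDER BY parent_score DESC
LIMIT $limit
"""


def run_parent_child(session, query_text: str, limit: int = 3):
    """Run the query on an open neo4j session and materialize the records."""
    result = session.run(PARENT_CHILD_QUERY, query_text=query_text, limit=limit)
    # Read every record while the session is still open
    return [record.data() for record in result]
```

Because `parent_score` is produced by the `WITH` projection itself, the final `RETURN ... ORDER BY` no longer reaches back to a variable declared before an aggregation.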


## Part 6: Clean Up and Test the SIMPLE_CHUNK Pattern

Now we test the SIMPLE_CHUNK pattern (chunks only).

In [20]:
# Create the SIMPLE_CHUNK pattern
print("📝 CREATING SIMPLE_CHUNK PATTERN")
print("=" * 80)

simple_chunk_node = NodeDefinition(
    label="Chunk",
    required_properties={
        "chunk_id": str,
        "content": str,
        "embeddings": list,
        "embeddings_dimensions": int
    },
    optional_properties={
        "chunk_id_consecutive": int,
        "source_file": str
    },
    indexes=["chunk_id", "chunk_id_consecutive"]
)

SIMPLE_CHUNK_PATTERN = GraphPattern(
    name="SIMPLE_CHUNK",
    description="Chunks only, without the File-Page structure. Useful for simple documents.",
    node_definitions=[simple_chunk_node],
    relationship_definitions=[],
    search_patterns=["basic", "hybrid"]
)

print(f"✅ Pattern created: {SIMPLE_CHUNK_PATTERN.name}")
print(f"   Nodes: {[n.label for n in SIMPLE_CHUNK_PATTERN.node_definitions]}")
print(f"   Relationships: {len(SIMPLE_CHUNK_PATTERN.relationship_definitions)}")

📝 CREATING SIMPLE_CHUNK PATTERN
✅ Pattern created: SIMPLE_CHUNK
   Nodes: ['Chunk']
   Relationships: 0
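
`required_properties` maps each property name to the Python type it should hold. How ungraph enforces this mapping is not shown here; as an illustration only, a check of a candidate chunk against such a mapping could look like the following sketch, where `missing_or_mistyped` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Same mapping as required_properties in the SIMPLE_CHUNK node definition
REQUIRED = {
    "chunk_id": str,
    "content": str,
    "embeddings": list,
    "embeddings_dimensions": int,
}


def missing_or_mistyped(props: Dict[str, Any], required: Dict[str, type]) -> List[str]:
    """Return the names of required properties that are absent or of the wrong type."""
    problems = []
    for name, expected_type in required.items():
        if name not in props or not isinstance(props[name], expected_type):
            problems.append(name)
    return problems


# Example chunk that lacks the embeddings_dimensions property
chunk = {
    "chunk_id": "doc-1-0",
    "content": "Some text",
    "embeddings": [0.1, 0.2, 0.3],
}
print(missing_or_mistyped(chunk, REQUIRED))
```

Running a check like this before writing to Neo4j surfaces schema problems at ingestion time instead of at query time.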


In [21]:
# Clean the graph before testing SIMPLE_CHUNK
print("🧹 Cleaning graph to test SIMPLE_CHUNK...")
index_service.clean_graph()
index_service.drop_all_indexes()
print("✅ Graph cleaned")

Error cleaning graph: The result is out of scope. The associated transaction has been closed. Results can only be used while the transaction is open.

🧹 Cleaning graph to test SIMPLE_CHUNK...

ResultConsumedError: The result is out of scope. The associated transaction has been closed. Results can only be used while the transaction is open.
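
`ResultConsumedError` is raised by the Neo4j Python driver when a result is read after the transaction that produced it has been closed. The implementation of `clean_graph` is not shown in this notebook, so the following is only a sketch of a common fix, assuming the method runs a `DETACH DELETE` query: read the counters inside the transaction function and return plain values instead of the result object. The function names here are hypothetical.

```python
def _delete_all(tx):
    """Transaction function: delete every node and return the deleted count."""
    result = tx.run("MATCH (n) DETACH DELETE n")
    # consume() must be called while the transaction is still open
    summary = result.consume()
    return summary.counters.nodes_deleted


def clean_graph(driver, database: str = "neo4j") -> int:
    """Delete all nodes and relationships; return how many nodes were removed."""
    with driver.session(database=database) as session:
        # execute_write commits on success and returns the function's value
        return session.execute_write(_delete_all)
```

Since this cell calls `clean_graph()` without a `try`/`except`, the exception stops execution here, which is why the remaining cells show `In [None]` and have no output.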

In [None]:
# Ingestion with the SIMPLE_CHUNK pattern
print("📥 INGESTION WITH SIMPLE_CHUNK PATTERN")
print("=" * 80)

all_chunks_simple = []

for file_path in available_files:
    print(f"\n📄 Processing: {file_path.name}")
    try:
        chunks = ungraph.ingest_document(
            file_path,
            pattern=SIMPLE_CHUNK_PATTERN,
            chunk_size=1000,
            chunk_overlap=200,
            clean_text=True
        )
        all_chunks_simple.extend(chunks)
        print(f"   ✅ {len(chunks)} chunks created")
    except Exception as e:
        print(f"   ❌ Error: {e}")

print(f"\n✅ Total chunks with SIMPLE_CHUNK pattern: {len(all_chunks_simple)}")

In [None]:
# Explore the structure of the SIMPLE_CHUNK graph
driver = graph_session()
with driver.session() as session:
    result = session.run("""
        MATCH (n)
        RETURN labels(n)[0] as label, count(n) as count
        ORDER BY count DESC
    """)

    print("📊 GRAPH STRUCTURE (SIMPLE_CHUNK):")
    print("=" * 80)
    for record in result:
        print(f"   {record['label']}: {record['count']} nodes")

driver.close()

## Part 6.1: Visualize the SIMPLE_CHUNK Graph

We visualize the graph using yFiles for Jupyter.

In [None]:
# Import visualization functions
from src.notebooks.graph_visualization import visualize_simple_chunk_pattern
from src.utils.graph_operations import graph_session

print("🎨 VISUALIZING SIMPLE_CHUNK PATTERN")
print("=" * 80)

driver = graph_session()
try:
    visualize_simple_chunk_pattern(driver, limit=25)
except Exception as e:
    print(f"⚠️  Error while visualizing: {e}")
    print("💡 Make sure yfiles_jupyter_graphs_for_neo4j is installed")
    print("   Install with: pip install yfiles-jupyter-graphs-for-neo4j")
finally:
    driver.close()

## Part 7: Comparative Summary

We compare the two patterns.

In [None]:
print("📊 COMPARATIVE SUMMARY")
print("=" * 80)

comparison = {
    "FILE_PAGE_CHUNK": {
        "chunks": len(all_chunks_default),
        "structure": "File → Page → Chunk",
        "relationships": "CONTAINS, HAS_CHUNK, NEXT_CHUNK",
        "usage": "Documents with a hierarchical structure"
    },
    "SIMPLE_CHUNK": {
        "chunks": len(all_chunks_simple),
        "structure": "Chunk only",
        "relationships": "None",
        "usage": "Simple documents without hierarchy"
    }
}

for pattern_name, info in comparison.items():
    print(f"\n{pattern_name}:")
    print(f"   Chunks created: {info['chunks']}")
    print(f"   Structure: {info['structure']}")
    print(f"   Relationships: {info['relationships']}")
    print(f"   Recommended use: {info['usage']}")

print("\n✅ Notebook completed successfully")