---
# 7. GraphRAG

---
## 7.1. Introduction

### From RAG to GraphRAG

Retrieval-augmented generation (RAG) is an established approach for answering user questions over private document collections. However, RAG systems are designed for situations where the answer is contained locally within regions of text, that is, where the information needed to answer a question sits in specific, well-delimited passages rather than being scattered across the corpus or requiring a synthesis of many parts. Once those relevant passages have been identified and retrieved, they contain enough information for the model to generate an adequate answer. The approach has an obvious weakness: it performs poorly on global questions such as *"What are the main themes of this document?"*

Over the last few years, natural language processing (NLP) has changed significantly, particularly in text summarization. Traditionally, clear distinctions were drawn between types of summary:

- abstractive vs. extractive
- generic vs. query-focused
- single-document vs. multi-document

These categories have since lost much of their relevance, mainly because of advances in language models. The *transformer* architecture first brought substantial improvements across all of these summarization tasks, but the real qualitative leap came with LLMs such as GPT, Llama, and Gemini.

#### Advances in Language Models

These LLMs have greatly simplified summarization through a technique known as "in-context learning": they adapt their understanding and generation to the context supplied in the prompt, with no additional training. In practice, this means they can produce any kind of summary, whether abstractive or extractive, generic or query-focused, single-document or multi-document. All they require is that the content to be summarized fits within their "context window", that is, the amount of text they can process at once.

#### Problems in Summarizing Large Corpora

Despite this progress, important problems remain. One of the most notable is query-focused abstractive summarization (QFS) over very large text corpora. Three factors make this kind of summarization difficult:

1. **Context limits in LLMs:** LLMs can only process a limited amount of text at once, the "context window". When summarizing large volumes of text, such as entire document corpora, this limit becomes a hard constraint.

2. **The "lost in the middle" problem:** Even with a larger context window, information can be "lost in the middle" of longer contexts. The model may struggle to maintain coherence and relevance across very long inputs.

3. **Insufficiency of traditional RAG:** Basic RAG, which retrieves text chunks directly, is not suited to large-scale QFS, because the relevant information may be scattered across the whole corpus rather than concentrated in a few passages.

#### GraphRAG: An Innovative Solution

GraphRAG (graph-based retrieval-augmented generation) addresses these problems by building global summaries from a knowledge graph generated by an LLM.

Knowledge graphs are a natural choice for two main reasons:

- Their modularity, that is, the tendency of graphs to organize naturally into groups or modules.
- The availability of community detection algorithms such as Louvain or Leiden, which partition the graph into communities of closely related nodes. Each community represents a group of interconnected information.

#### Map-Reduce Strategy

Once the communities have been found, an LLM generates a summary of each one from the descriptions of its elements. To produce query-focused summaries over the entire corpus, a *map-reduce* strategy is then applied:

1. **Map:** Each community summary is used to answer the query independently and in parallel.

2. **Reduce:** All relevant partial answers are synthesized into a final global answer.

This method makes it possible to summarize large volumes of text efficiently, working around the context limits of LLMs and exploiting the natural structure of the information when it is represented as a graph.
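
The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `answer_from_summary` is a hypothetical stand-in for the LLM call (it does simple keyword matching), and the reduce step just concatenates the relevant partial answers instead of asking an LLM to synthesize them.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_from_summary(summary: str, query: str) -> str:
    """Hypothetical stand-in for an LLM call.

    Returns the summary as a partial answer when it mentions any query word,
    and an empty string when it does not.
    """
    text = summary.lower()
    return summary if any(word in text for word in query.lower().split()) else ""


def map_reduce_answer(summaries: list[str], query: str) -> str:
    # Map: answer the query from each community summary, independently and in parallel.
    with ThreadPoolExecutor() as pool:
        partial_answers = list(pool.map(lambda s: answer_from_summary(s, query), summaries))
    # Reduce: drop empty (irrelevant) answers and merge the rest into one response.
    return " ".join(answer for answer in partial_answers if answer)


summaries = [
    "Scrooge is a miser who despises Christmas.",
    "Project Gutenberg distributes public-domain eBooks.",
    "Three spirits visit Scrooge during the night.",
]
print(map_reduce_answer(summaries, "scrooge"))
```

Because each map call depends only on its own summary, the calls can run concurrently; only the reduce step needs to see all partial answers together.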


### The GraphRAG Pipeline
<div style="text-align: justify;">

We now break down the data flow of the **GraphRAG** approach described in [@graphrag2024].

### Source Documents → *Text Chunks*

A fundamental design decision is the granularity at which the text extracted from the source documents is split into chunks for processing. This decision directly affects cost and quality: longer chunks require fewer LLM calls but can lose precision, whereas shorter chunks capture more detail but are more expensive to process.
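
The trade-off can be made concrete with a small chunking sketch. It is not the splitter used by GraphRAG: whitespace-separated words stand in for real tokens, and the `overlap` parameter is an assumption added for illustration.

```python
def chunk_tokens(tokens, chunk_size, overlap=0):
    """Split a token list into chunks of at most chunk_size tokens.

    Consecutive chunks share `overlap` tokens (an assumption of this sketch).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


text = "Marley was dead to begin with There is no doubt whatever about that"
tokens = text.split()  # whitespace tokens stand in for a real tokenizer
for size in (4, 8):
    # One LLM extraction call is needed per chunk, so fewer chunks means fewer calls.
    print(size, len(chunk_tokens(tokens, size)))
```

Doubling the chunk size roughly halves the number of chunks, and therefore the number of extraction calls, at the price of asking the model to cover more text per call.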

### *Text Chunks* → Element Instances

In this stage, the goal is to identify and extract graph nodes and edges from each text chunk. This is done with a multipart LLM prompt that identifies entities (with name, type, and description) and the relationships between them.

To improve both efficiency and quality, extraction is carried out over several rounds of "gleanings": the LLM checks whether all possible entities have been extracted and, if any are missing, an additional extraction round is run. This makes it possible to use larger text chunks without sacrificing extraction quality.

Additional covariates can also be extracted, such as claims linked to the detected entities, including details such as subject, object, type, description, and relevant dates.
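
The gleaning loop can be sketched as follows. The control flow mirrors the description above, but everything else is hypothetical: `toy_extract` and `toy_has_missing` are stand-ins for the LLM prompts (the toy extractor treats capitalized words as entities and deliberately returns at most two per call), and `max_gleanings` is an illustrative parameter name.

```python
def extract_with_gleaning(chunk, extract, has_missing, max_gleanings=1):
    """Run an initial extraction plus up to max_gleanings follow-up rounds."""
    entities = list(extract(chunk, []))
    for _ in range(max_gleanings):
        # Ask whether anything was missed; stop as soon as the answer is "no".
        if not has_missing(chunk, entities):
            break
        for entity in extract(chunk, entities):
            if entity not in entities:
                entities.append(entity)
    return entities


def toy_extract(chunk, already_found):
    """Hypothetical extractor: capitalized words, at most two new ones per call."""
    candidates = [word.strip(".,") for word in chunk.split() if word[0].isupper()]
    new = [c for c in dict.fromkeys(candidates) if c not in already_found]
    return new[:2]


def toy_has_missing(chunk, found):
    """Hypothetical check: True if the extractor could still find something."""
    return bool(toy_extract(chunk, found))


chunk = "Scrooge met Marley, and later Cratchit visited Fred."
print(extract_with_gleaning(chunk, toy_extract, toy_has_missing, max_gleanings=0))
print(extract_with_gleaning(chunk, toy_extract, toy_has_missing, max_gleanings=1))
```

With no gleaning round the toy extractor misses half of the entities; one extra round recovers them, which is the effect the multi-round scheme is designed to achieve on larger chunks.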

### Element Instances → Element Summaries

In this phase an LLM is used to extract descriptions of the entities, relationships, and claims represented in the documents. This is already a form of abstractive summarization, since the LLM produces meaningful summaries of concepts that may be implied but not explicitly stated in the text.

To convert these instance-level summaries into a single block of descriptive text for each graph element (that is, each entity node and relationship edge), a further round of LLM summarization is needed over matching groups of instances, meaning those that refer to the same element or concept in the source text.

### Element Summaries → *Graph Communities*

In this stage, the element summaries obtained in the previous phase are turned into a graph structure: the index built so far is modeled as a homogeneous, undirected, weighted graph in which:
- **Nodes** represent entities.
- **Edges** represent relationships between entities, weighted by the normalized frequency of the detected relationships.

Community detection algorithms are then applied to partition the graph into groups of nodes that are more strongly connected to one another than to the rest of the graph, which reveals meaningful structures and patterns in the data. Specifically, the **Leiden** algorithm is used to obtain a hierarchy of communities in which each level provides a partition that covers all nodes of the graph in a mutually exclusive way.

This hierarchical structure enables a "divide and conquer" approach to global summarization, making it possible to process and synthesize information from large corpora more efficiently.
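
A minimal sketch of the graph model, under stated assumptions: edge weights are obtained by counting relationship instances per unordered pair of entities and dividing by the largest count (the exact normalization is an assumption here), and `is_partition` checks the "covers all nodes, mutually exclusive" property of one hierarchy level. Community detection itself (Leiden) is not implemented.

```python
from collections import Counter


def build_weighted_edges(relationship_instances):
    """Map each undirected entity pair to a weight in (0, 1]."""
    # Sorting each pair makes (A, B) and (B, A) count as the same undirected edge.
    counts = Counter(tuple(sorted(pair)) for pair in relationship_instances)
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: n / top for pair, n in counts.items()}


def is_partition(communities, nodes):
    """True if every node appears in exactly one community."""
    members = [node for community in communities for node in community]
    return len(members) == len(set(members)) and set(members) == set(nodes)


instances = [("SCROOGE", "MARLEY"), ("MARLEY", "SCROOGE"), ("SCROOGE", "FRED")]
edges = build_weighted_edges(instances)
print(edges)
print(is_partition([{"SCROOGE", "MARLEY"}, {"FRED"}], {"SCROOGE", "MARLEY", "FRED"}))
```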

### *Graph Communities* → Community Summaries

The next step in the GraphRAG process is to create a structured summary for each community in the Leiden hierarchy. This method is designed to scale to very large datasets and makes it possible to:

- Gain a global understanding of the structure and semantics of the dataset.
- Explore the corpus without having to formulate specific questions.
- Navigate by topic, starting from general themes and drilling down into subtopics of interest.

### Generating Community Summaries

How a summary is generated depends on the community's level in the hierarchy.

**Leaf-Level Communities**

For communities at the lowest level of the hierarchy:
- *i.* The element summaries of the community (nodes, edges, and covariates) are prioritized as follows:
  - The community's edges are sorted in decreasing order of the sum of the degrees of their source and target nodes.
  - For each edge, the descriptions of the source node, the target node, any linked covariates, and the edge itself are added.
- *ii.* These summaries are added iteratively to the LLM context window until the token limit is reached.
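
The leaf-level procedure can be sketched as below. Several details are assumptions made for the example: tokens are counted as whitespace-separated words, covariates are omitted, and a description that has already been added is not repeated.

```python
def build_leaf_context(edges, node_degree, descriptions, token_limit):
    """Collect element descriptions for one community until the token limit is hit.

    edges:        list of (source, target) pairs
    node_degree:  dict mapping node name to its degree
    descriptions: dict mapping node names and (source, target) pairs to text
    """
    # Most prominent edges first: highest combined degree of source and target.
    ranked = sorted(edges, key=lambda e: node_degree[e[0]] + node_degree[e[1]], reverse=True)
    context, used, seen = [], 0, set()
    for source, target in ranked:
        for key in (source, target, (source, target)):
            if key in seen:
                continue  # assumption: each description is added only once
            text = descriptions[key]
            cost = len(text.split())  # whitespace words stand in for tokens
            if used + cost > token_limit:
                return context
            context.append(text)
            used += cost
            seen.add(key)
    return context


node_degree = {"A": 3, "B": 2, "C": 1}
edges = [("B", "C"), ("A", "B")]
descriptions = {
    "A": "a a", "B": "b", "C": "c c c",
    ("A", "B"): "ab", ("B", "C"): "bc",
}
print(build_leaf_context(edges, node_degree, descriptions, token_limit=5))
```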

**Higher-Level Communities**

For communities at higher levels of the hierarchy:
- If all element summaries fit within the context window, proceed as for leaf-level communities.
- Otherwise:
  - Sub-communities are ranked in decreasing order of the number of tokens in their element summaries.
  - The longer element summaries are iteratively replaced by the shorter sub-community summaries until everything fits within the context window.
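
The substitution step can be sketched as follows. The data layout is hypothetical (each sub-community is a dict with its `element_summaries` and its own `summary`), and tokens are again approximated by whitespace-separated words.

```python
def count_tokens(texts):
    """Approximate token count: whitespace-separated words."""
    return sum(len(text.split()) for text in texts)


def build_parent_context(subcommunities, token_limit):
    """Swap element summaries for sub-community summaries until the context fits."""
    chosen = [list(sc["element_summaries"]) for sc in subcommunities]

    def total():
        return sum(count_tokens(texts) for texts in chosen)

    # Largest sub-communities (by element-summary tokens) are substituted first.
    order = sorted(
        range(len(subcommunities)),
        key=lambda i: count_tokens(subcommunities[i]["element_summaries"]),
        reverse=True,
    )
    for i in order:
        if total() <= token_limit:
            break
        chosen[i] = [subcommunities[i]["summary"]]
    return [text for texts in chosen for text in texts]


subs = [
    {"element_summaries": ["a b c d", "e f g"], "summary": "s1"},
    {"element_summaries": ["h i"], "summary": "s2"},
]
print(build_parent_context(subs, token_limit=9))
print(build_parent_context(subs, token_limit=4))
```

If the limit is smaller than the combined sub-community summaries, this sketch returns all of them anyway; a real implementation would need a further truncation rule.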

### Community Summaries → Community Answers → Global Answer

The final phase of the GraphRAG pipeline generates a global answer from the community summaries. It consists of three main stages:

1. **Preparing the Community Summaries**  
   The community summaries are shuffled randomly and split into *chunks* of a predetermined size. This approach:
   - Ensures that relevant information is spread evenly across the chunks.
   - Prevents crucial information from being concentrated in a single context window, where important data could be lost.

2. **Generating Intermediate Answers**  
   For each *chunk*, intermediate answers are generated in parallel by the LLM, which:
   - Produces an answer based on each *chunk*.
   - Assigns each answer a score from 0 to 100 indicating how relevant it is to the user's question.
   - Filters out answers with a score of 0, which are considered irrelevant.

3. **Synthesizing the Global Answer**  
   Finally, the global answer is built through a reduce step:
   - The intermediate answers are sorted in decreasing order of relevance score.
   - They are added to a new context window until the token limit is reached.
   - This final context is used to generate the global answer presented to the user.
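
The three stages can be sketched as below, under several assumptions: the chunk size is expressed as a number of summaries rather than tokens, `toy_score` is a hypothetical stand-in for the LLM call that produces and scores an intermediate answer, tokens are whitespace-separated words, and the final LLM call is left out, so the function returns the context that would be passed to it.

```python
import random


def build_global_context(summaries, query, score_chunk, chunk_size, token_limit, seed=0):
    # Stage 1: shuffle the community summaries and split them into chunks.
    shuffled = list(summaries)
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]
    # Stage 2: one (answer, score) pair per chunk; answers scored 0 are dropped.
    scored = [score_chunk(chunk, query) for chunk in chunks]
    kept = sorted((pair for pair in scored if pair[1] > 0), key=lambda pair: pair[1], reverse=True)
    # Stage 3: add answers by decreasing score until the token limit is reached.
    context, used = [], 0
    for answer, _score in kept:
        cost = len(answer.split())
        if used + cost > token_limit:
            break
        context.append(answer)
        used += cost
    return context


def toy_score(chunk, query):
    """Hypothetical scorer: longer matching text gets a higher score (capped at 100)."""
    text = " ".join(chunk)
    if query.lower() not in text.lower():
        return text, 0
    return text, min(100, 10 * len(text.split()))


summaries = [
    "Scrooge is a miser",
    "Gutenberg license terms",
    "Scrooge changes after the visits of three ghosts",
]
print(build_global_context(summaries, "scrooge", toy_score, chunk_size=1, token_limit=100))
```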

This multi-stage approach allows GraphRAG to provide more complete and contextualized answers by drawing on the knowledge structure built in the earlier phases of the pipeline. [Figure 1](#fig-pipeline) gives a graphical summary of the process.

![Graphical representation of the pipeline used in the GraphRAG approach](../Figures/pipeline.png)

</div>


To index the documents, an initialization and configuration process was followed to get GraphRAG up and running. First, the **graphrag** Python library was installed with:

```bash
pip install graphrag
```
Next, the environment was prepared by creating a working directory (for example, `./ragtest/input`) and downloading the book *A Christmas Carol* by Charles Dickens. The workspace was then initialized by running:

```bash
graphrag init --root ./ragtest
```

This command creates two important files in the `./ragtest` directory:
- **.env**: contains the environment variables needed to run the pipeline, including `GRAPHRAG_API_KEY`, which must be replaced with your OpenAI or Azure OpenAI API key.
- **settings.yaml**: defines the pipeline configuration, where parameters such as the LLM model, the text chunk size, and other processing options can be adjusted.

Once the environment is configured, the indexing pipeline is run with:

```bash
graphrag index --root ./ragtest
```
This process indexes the document and writes a series of Parquet files to `./ragtest/output` containing the structured, enriched information extracted from the text. With this data it is possible to query the indexed content, either to answer specific questions or to extract insights at a global level.

In short, this procedure turns *A Christmas Carol* into a structured, accessible knowledge base, ready to be queried through the GraphRAG search engine.


---
## 7.2. Importing the Indexing Output

In [2]:
import os

import pandas as pd
import tiktoken
import numpy as np

from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore


from typing import cast


<div style="text-align: justify;">

The next code cell sets the basic parameters that determine how data flows through the system. First, the input and output directories are defined through `INPUT_DIR` and `OUTPUT_DIR`. Centralizing these paths keeps data files and results organized, which is key to an orderly, reproducible workflow.

Next, `LANCEDB_URI` is derived from the output directory and points to the location where the transformed data and the generated index are stored. This URI is what connects the notebook to LanceDB, the database used to manage and query the data efficiently.

The remaining variables name the tables created during indexing and knowledge-model generation. Each one (`COMMUNITY_REPORT_TABLE`, `ENTITY_TABLE`, `ENTITY_EMBEDDING_TABLE`, `RELATIONSHIP_TABLE`, `COVARIATE_TABLE`, and `TEXT_UNIT_TABLE`) identifies a specific collection of data. For example, `COMMUNITY_REPORT_TABLE` holds the final summaries of the detected communities, while `ENTITY_TABLE` and `ENTITY_EMBEDDING_TABLE` hold the final nodes and their vector representations, respectively. These definitions make it easier to align the transformed data with the knowledge model being implemented.

Finally, `COMMUNITY_LEVEL` specifies which level of the community hierarchy is used for summarization and analysis. This setting matters because it determines the granularity at which the data is grouped and summarized.

In short, this initial block establishes the directory structure and naming conventions used throughout the pipeline. It lets us load the indexing outputs (for example, from Parquet files into DataFrames) and then convert those DataFrames into collections of data objects aligned with the knowledge model, which lays the groundwork for consistent, scalable data processing across the project.

</div>

### Loading the Data into DataFrames

In [3]:
INPUT_DIR = "input"
OUTPUT_DIR = "output"

LANCEDB_URI = f"{OUTPUT_DIR}/lancedb"

COMMUNITY_REPORT_TABLE = "create_final_community_reports"
ENTITY_TABLE = "create_final_nodes"
ENTITY_EMBEDDING_TABLE = "create_final_entities"
RELATIONSHIP_TABLE = "create_final_relationships"
COVARIATE_TABLE = "create_final_covariates"
TEXT_UNIT_TABLE = "create_final_text_units"
COMMUNITY_LEVEL = 2

#### Reading Entities

<div style="text-align: justify;">


Next, the nodes (entities) table is read from a Parquet file with `pd.read_parquet`. This table holds information about each node, such as the community it belongs to and its degree, which is crucial for the structural analysis of the graph.

Once the table has been read, `NaN` values are replaced with `None` using the pandas `replace` method, with `np.nan` as the value to match. This unifies the treatment of missing values and simplifies later steps of the pipeline.

A second table, containing the entity embeddings, is then read from another Parquet file. It provides a vector representation of each node, which is the basis for computing similarities and for clustering operations on the graph.

Note that the second assignment again uses `replace` to substitute `None` for `NaN`. As written, however, it is called on `entity_df`, so `entity_embedding_df` ends up holding a cleaned copy of the nodes table rather than the embeddings table read just before.

In short, this block loads the entity tables and normalizes their missing values.

The same is then done for the relationships.

</div>

In [5]:
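import pandas as pd
import numpy as np
# read nodes table to get community and degree data
entity_df = pd.read_parquet(f"{OUTPUT_DIR}/{ENTITY_TABLE}.parquet")
entity_df = entity_df.replace({np.nan: None})

# read the entity embeddings table
entity_embedding_df = pd.read_parquet(f"{OUTPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")
# NOTE: this reassigns from entity_df, so entity_embedding_df becomes a cleaned copy
# of the nodes table; the column listings printed later in the notebook reflect this
entity_embedding_df = entity_df.replace({np.nan: None})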


In [6]:

#pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
#pd.set_option('display.width', None)
#pd.set_option('display.max_colwidth', None)
entity_df

| | id | human_readable_id | title | community | level | degree | x | y |
|---|---|---|---|---|---|---|---|---|
| 0 | 4f370d7f0d734f92b6118d88e79d886a | 0 | PROJECT GUTENBERG | 8 | 0 | 7 | -1.930967 | 17.19829 |
| 1 | 8e5be5b1e63343b8a2f0af6baba70337 | 1 | UNITED STATES | 8 | 0 | 2 | 19.662577 | 8.599833 |
| 2 | 73b0e0a551dc454690bad3d6756912be | 2 | A CHRISTMAS CAROL | 0 | 0 | 8 | -11.451687 | 2.69545 |
| 7 | cf70df771d6c464e93b7ff99d2ea8142 | 3 | CHARLES DICKENS | 0 | 0 | 1 | -12.506288 | 1.64068 |
| 12 | 3dfdd81803b74b3a8d3244af70df4f83 | 4 | ARTHUR RACKHAM | 0 | 0 | 1 | -11.712614 | 2.434599 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1681 | 9efdb8d321d5445e9dfea1fa29a1b173 | 209 | SCROOGE AND MARLEY'S | 38 | 4 | 2 | 14.64785 | 1.287843 |
| 1682 | c4e8ca318c6046a58ed833be1aa86732 | 210 | THE PORTLY GENTLEMAN | 38 | 4 | 2 | 16.623621 | 2.794902 |
| 1683 | ea84881a96d644c5884e6675ee1ff591 | 211 | THE BOY | 37 | 4 | 1 | 8.075386 | -3.995178 |
| 1684 | 3ce8510deb354206b7dbcbd0f7b0d8b8 | 212 | OFFICE | 37 | 4 | 2 | 14.349295 | 1.01974 |


#### Reading Relationships

In [7]:
relationship_df = pd.read_parquet(f"{OUTPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")

# Replace NaN with None in relationship_df
relationship_df = relationship_df.replace({np.nan: None})

titles = entity_df['title'].tolist()

# Keep only the relationships whose source and target are both known entities
relationship_df = relationship_df[
    relationship_df['source'].isin(titles) & 
    relationship_df['target'].isin(titles)
]
relationship_df




| | id | human_readable_id | source | target | description | weight | combined_degree | text_unit_ids |
|---|---|---|---|---|---|---|---|---|
| 0 | 27f628f3ac9e44cfb4fba4efb6104d88 | 0 | PROJECT GUTENBERG | A CHRISTMAS CAROL | Project Gutenberg released the eBook version o... | 1.0 | 15 | [d6583840046247f428a9f02738842a7c] |
| 1 | 40c306c8012c4f499359a608dcb020e7 | 1 | PROJECT GUTENBERG | UNITED STATES | Project Gutenberg, a collection of individual ... | 2.0 | 9 | [2b5ecb7fba1301d1f3d307e194a6c435, aa8d2310a20... |
| 2 | 74294f1a9ee947d8879a245e1c19d5e8 | 2 | PROJECT GUTENBERG | PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION | The Project Gutenberg Literary Archive Foundat... | 19.0 | 15 | [aa8d2310a206001404282ddb3fd645aa, cd4234ed6ca... |
| 3 | e74897d5b27a48cdbc346676663a269a | 3 | PROJECT GUTENBERG | WWW.GUTENBERG.ORG | Project Gutenberg's works are posted and can b... | 1.0 | 8 | [aa8d2310a206001404282ddb3fd645aa] |
| 4 | 348a412fd97a40aea938af6e5f34d0d9 | 4 | PROJECT GUTENBERG | MICHAEL S. HART | Michael S. Hart was the originator of the Proj... | 1.0 | 8 | [cd4234ed6caba8f15d09a2e3ee604b2a] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 313 | e5446a8a991640f1bfb33a18489be572 | 313 | PROJECT GUTENBERG™ | INDEMNITY | Project Gutenberg™ has an Indemnity policy for... | 8.0 | 7 | [0ddc17ea5e566006c000b4013f2181a5] |
| 314 | a7b558f7b329426dade7b8cd6d471da0 | 314 | INTERNAL REVENUE SERVICE | FOUNDATION | The Foundation has been granted tax exempt sta... | 7.0 | 7 | [cd4234ed6caba8f15d09a2e3ee604b2a] |
| 315 | a27982d1672a40fa9d27e61af39c9119 | 315 | MISSISSIPPI | FOUNDATION | The Foundation is organized under the laws of ... | 6.0 | 7 | [cd4234ed6caba8f15d09a2e3ee604b2a] |
| 316 | a60835b9d9eb45d3a6d0fbecd1176718 | 316 | SALT LAKE CITY | FOUNDATION | The business office of the Foundation is locat... | 5.0 | 7 | [cd4234ed6caba8f15d09a2e3ee604b2a] |


In [8]:
pd.set_option('display.max_colwidth', None)

relationship_df["description"].iloc[10]

'J. B. Lippincott Company is the original publisher of A Christmas Carol'

In [9]:
# Same content as relationship_df, converted into a list of Relationship objects
relationships = read_indexer_relationships(relationship_df)
relationships

[Relationship(id='27f628f3ac9e44cfb4fba4efb6104d88', short_id='0', source='PROJECT GUTENBERG', target='A CHRISTMAS CAROL', weight=1.0, description='Project Gutenberg released the eBook version of A Christmas Carol', description_embedding=None, text_unit_ids=['d6583840046247f428a9f02738842a7c'], rank=15, attributes=None),
 Relationship(id='40c306c8012c4f499359a608dcb020e7', short_id='1', source='PROJECT GUTENBERG', target='UNITED STATES', weight=2.0, description='Project Gutenberg, a collection of individual works, is predominantly composed of materials that are in the public domain in the United States. This means that the works within Project Gutenberg are not protected by copyright law in the United States.', description_embedding=None, text_unit_ids=['2b5ecb7fba1301d1f3d307e194a6c435', 'aa8d2310a206001404282ddb3fd645aa'], rank=9, attributes=None),
 Relationship(id='74294f1a9ee947d8879a245e1c19d5e8', short_id='2', source='PROJECT GUTENBERG', target='PROJECT GUTENBERG LITERARY ARCHIVE

---
## 7.3. Visualizing Nodes and Relationships



<div style="text-align: justify;">

`yfiles-jupyter-graphs` is a graph visualization extension that provides interactive, customizable representations of structured node and relationship data.

Here it is used to display the knowledge graph interactively, by passing it lists of nodes and relationships converted from the Parquet files. The input data must provide an `id` attribute for each node and `start`/`end` properties for each relationship, whose values must match node identifiers. Additional attributes can be added in the `properties` field of each node or relationship dictionary.

</div>
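
The input requirement described above can be checked before the lists are handed to the widget. The helper below is not part of `yfiles-jupyter-graphs`; it is a small hypothetical check that returns every relationship whose `start` or `end` does not match a node `id`.

```python
def find_dangling_edges(nodes, edges):
    """Return the edges whose start or end is not the id of any node."""
    node_ids = {node["id"] for node in nodes}
    return [
        edge for edge in edges
        if edge["start"] not in node_ids or edge["end"] not in node_ids
    ]


nodes = [
    {"id": "PROJECT GUTENBERG", "properties": {"community": 8}},
    {"id": "A CHRISTMAS CAROL", "properties": {"community": 0}},
]
edges = [
    {"start": "PROJECT GUTENBERG", "end": "A CHRISTMAS CAROL", "properties": {}},
    {"start": "PROJECT GUTENBERG", "end": "UNKNOWN ENTITY", "properties": {}},
]
print(find_dangling_edges(nodes, edges))
```

An empty result means every relationship can be attached to two existing nodes.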

In [10]:
%pip install yfiles_jupyter_graphs --quiet
from yfiles_jupyter_graphs import GraphWidget


# converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs (see the cell below)
def convert_entities_to_dicts(df):
    """Convert the entities dataframe to a list of dicts for yfiles-jupyter-graphs."""
    nodes_dict = {}
    for _, row in df.iterrows():
        # Create a dictionary for each row and collect unique nodes
        node_id = row["title"]
        if node_id not in nodes_dict:
            nodes_dict[node_id] = {
                "id": node_id,
                "properties": row.to_dict(),
            }
    return list(nodes_dict.values())


# converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
def convert_relationships_to_dicts(df):
    """Convert the relationships dataframe to a list of dicts for yfiles-jupyter-graphs."""
    relationships = []
    for _, row in df.iterrows():
        # Create a dictionary for each row
        relationships.append({
            "start": row["source"],
            "end": row["target"],
            "properties": row.to_dict(),
        })
    return relationships


w = GraphWidget()
w.directed = True
# The graph nodes are the entities
w.nodes = convert_entities_to_dicts(entity_df)
# The graph edges are the relationships
w.edges = convert_relationships_to_dicts(relationship_df)



In [11]:
# Convert the entities to a list of dicts
entity_dict = convert_entities_to_dicts(entity_df)
entity_dict

[{'id': 'PROJECT GUTENBERG',
  'properties': {'id': '4f370d7f0d734f92b6118d88e79d886a',
   'human_readable_id': 0,
   'title': 'PROJECT GUTENBERG',
   'community': 8,
   'level': 0,
   'degree': 7,
   'x': -1.9309667348861694,
   'y': 17.19828987121582}},
 {'id': 'UNITED STATES',
  'properties': {'id': '8e5be5b1e63343b8a2f0af6baba70337',
   'human_readable_id': 1,
   'title': 'UNITED STATES',
   'community': 8,
   'level': 0,
   'degree': 2,
   'x': 19.66257667541504,
   'y': 8.599832534790039}},
 {'id': 'A CHRISTMAS CAROL',
  'properties': {'id': '73b0e0a551dc454690bad3d6756912be',
   'human_readable_id': 2,
   'title': 'A CHRISTMAS CAROL',
   'community': 0,
   'level': 0,
   'degree': 8,
   'x': -11.45168685913086,
   'y': 2.6954503059387207}},
 {'id': 'CHARLES DICKENS',
  'properties': {'id': 'cf70df771d6c464e93b7ff99d2ea8142',
   'human_readable_id': 3,
   'title': 'CHARLES DICKENS',
   'community': 0,
   'level': 0,
   'degree': 1,
   'x': -12.506287574768066,
   'y': 1.640679597

In [12]:
# Do the same for the relationships
relationships_dict = convert_relationships_to_dicts(relationship_df)
relationships_dict

[{'start': 'PROJECT GUTENBERG',
  'end': 'A CHRISTMAS CAROL',
  'properties': {'id': '27f628f3ac9e44cfb4fba4efb6104d88',
   'human_readable_id': 0,
   'source': 'PROJECT GUTENBERG',
   'target': 'A CHRISTMAS CAROL',
   'description': 'Project Gutenberg released the eBook version of A Christmas Carol',
   'weight': 1.0,
   'combined_degree': 15,
   'text_unit_ids': array(['d6583840046247f428a9f02738842a7c'], dtype=object)}},
 {'start': 'PROJECT GUTENBERG',
  'end': 'UNITED STATES',
  'properties': {'id': '40c306c8012c4f499359a608dcb020e7',
   'human_readable_id': 1,
   'source': 'PROJECT GUTENBERG',
   'target': 'UNITED STATES',
   'description': 'Project Gutenberg, a collection of individual works, is predominantly composed of materials that are in the public domain in the United States. This means that the works within Project Gutenberg are not protected by copyright law in the United States.',
   'weight': 2.0,
   'combined_degree': 9,
   'text_unit_ids': array(['2b5ecb7fba1301d1f3d3


<div style="text-align: justify;">

The next block configures a data-driven visualization that renders the knowledge graph interactively.

First, the function `community_to_color` assigns a color to each community. It picks from a predefined list of colors according to the community's numeric value; because the list is finite, colors repeat once the number of communities exceeds its length. A node that belongs to no community is colored "lightgray".

Next, `edge_to_source_community` returns the community of an edge's source node. It searches the node list for the node whose title matches the edge's start identifier and returns the associated community, so that edges can also reflect community membership through the color of their source node.

With these functions defined, the mappings that configure the visualization are applied:
- `w.node_color_mapping` assigns each node a color based on the community it belongs to.
- `w.edge_color_mapping` does the same for edges, based on the community of the source node.

Finally, the cell contains commented-out lines that could be used to scale node size by degree (with a scale factor) or to set edge thickness from an associated weight. These extra settings allow the visualization to be tailored further to different use cases.

</div>

In [13]:
# show title on the node
w.node_label_mapping = "title"


# map community to a color
def community_to_color(community):
    """Map a community to a color."""
    colors = [
        "crimson",
        "darkorange",
        "indigo",
        "cornflowerblue",
        "cyan",
        "teal",
        "green",
    ]
    return (
        colors[int(community) % len(colors)] if community is not None else "lightgray"
    )


def edge_to_source_community(edge):
    """Get the community of the source node of an edge."""
    source_node = next(
        (entry for entry in w.nodes if entry["properties"]["title"] == edge["start"]),
        None,
    )
    # No matching node: return None, which community_to_color maps to "lightgray"
    if source_node is None:
        return None
    return source_node["properties"]["community"]

# Color nodes and edges by the community they belong to
w.node_color_mapping = lambda node: community_to_color(node["properties"]["community"])
w.edge_color_mapping = lambda edge: community_to_color(edge_to_source_community(edge))
# map size data to a reasonable factor
#w.node_scale_factor_mapping = lambda node: 0.5 + node["properties"]["degree"] * 1.5 / 20
# use weight for edge thickness
#w.edge_thickness_factor_mapping = "weight"
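
`edge_to_source_community` scans the whole node list once per edge. For larger graphs, one possible alternative is to build a title-to-community dictionary once and look edges up in it. The sketch below uses plain lists of dicts in the same shape as `w.nodes` and `w.edges`; the helper names are hypothetical.

```python
def build_community_index(nodes):
    """Map each node title to its community (None when the node has none)."""
    return {
        node["properties"]["title"]: node["properties"].get("community")
        for node in nodes
    }


def edge_community(edge, index):
    """Community of the edge's source node, or None if the node is unknown."""
    return index.get(edge["start"])


nodes = [
    {"id": "SCROOGE", "properties": {"title": "SCROOGE", "community": 3}},
    {"id": "MARLEY", "properties": {"title": "MARLEY", "community": None}},
]
index = build_community_index(nodes)
print(edge_community({"start": "SCROOGE", "end": "MARLEY"}, index))
```

The index is built in one pass over the nodes, after which each edge lookup is a single dictionary access.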

<div style="text-align: justify;">

The widget offers several automatic layouts suited to different purposes and visual styles: `Circular`, `Hierarchic`, `Organic` (interactive or static), `Orthogonal`, `Radial`, `Tree`, and `Geo-spatial`. Each arranges the nodes and edges of the graph differently, highlighting different structural and relational aspects of the data.

For the knowledge graph, this example uses the `Circular` layout. It distributes the nodes uniformly around a circle, which makes communities and connection patterns easier to identify in a clear, symmetric way. Layouts such as `Hierarchic` or `Organic` may also be suitable.


</div>

In [14]:
def custom_node_label_mapping(index, node):
    """Label each node with its title, or 'no title' when it has none."""
    properties = node.get("properties", {})
    return properties.get("title", "no title")

w.node_label_mapping = custom_node_label_mapping

<div style="color: red; font-weight: bold;">
WARNING: To see the graph you must run the cells, because the rendered widget is not saved once the notebook is closed.
</div>

In [15]:
w.set_sidebar(start_with='Data')
display(w)

GraphWidget(layout=Layout(height='800px', width='100%'))

In [16]:
# Use the circular layout for this visualization. For larger graphs, the default organic layout is often preferable.
w.circular_layout()
display(w)

GraphWidget(layout=Layout(height='800px', width='100%'))

---
## 7.4. Querying GraphRAG


### DO NOT RUN FROM HERE ON
From this point on, the notebook contains cells that display the context behind the results of queries made with GraphRAG. These cells should not be run directly in this environment, because they need an OpenAI API key to work. For security reasons, that key is not included in this code.

To run these cells, store the API key in an environment file (`.env`) located in the `graphrag` folder. The key can then be read securely without exposing it in the source code.
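
As an illustration of that setup, here is a minimal sketch that reads `KEY=VALUE` pairs from a `.env` file using only the standard library. The variable name `GRAPHRAG_API_KEY`, the helper `load_env_file`, and the placeholder value are assumptions made for this example; use whatever name your GraphRAG settings reference. The `python-dotenv` package provides a more complete loader.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a throwaway file; in practice the path would be graphrag/.env.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / ".env"
    env_path.write_text(
        "# local secrets\nGRAPHRAG_API_KEY=placeholder-not-a-real-key\n",
        encoding="utf-8",
    )
    settings = load_env_file(env_path)

# Expose the key to the current process without hard-coding it in the notebook.
os.environ.setdefault("GRAPHRAG_API_KEY", settings["GRAPHRAG_API_KEY"])
api_key_is_set = bool(os.environ.get("GRAPHRAG_API_KEY"))
```

Keeping the `.env` file out of version control (for example through `.gitignore`) is what actually prevents the key from leaking.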

In [31]:
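# Fill the 'type' and 'description' columns that read_entities maps onto Entity,
# reusing 'community' as the type and 'title' as the description.
entity_df['type'] = entity_df['community']
entity_embedding_df['type'] = entity_embedding_df['community']

entity_df['description'] = entity_df['title']
entity_embedding_df['description'] = entity_embedding_df['title']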

In [32]:
# Build the column order with 'degree' moved to the end
columns = [col for col in entity_df.columns if col != 'degree'] + ['degree']

# Reorder the columns in both DataFrames
entity_df = entity_df[columns]
entity_embedding_df = entity_embedding_df[columns]

# Check the new column order
print(entity_df.columns)
print(entity_embedding_df.columns)

Index(['id', 'human_readable_id', 'title', 'community', 'level', 'x', 'y',
       'type', 'description', 'degree'],
      dtype='object')
Index(['id', 'human_readable_id', 'title', 'community', 'level', 'x', 'y',
       'type', 'description', 'degree'],
      dtype='object')


In GraphRAG, the "level" of a community is its position in the hierarchy of communities detected in the knowledge graph. Communities are groups of densely connected nodes that represent topics or subtopics in the data.

The levels are structured as follows:

- Level 0 communities: represent the broadest topics in the dataset. They cover general concepts and give an overview of the graph.

- Level 1 communities: show more detailed topics within each top-level community. They break the broad topics down into more specific subtopics, which allows a deeper analysis.
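
To make the effect of a level cutoff concrete, here is a plain-Python sketch of the same comparison that the pandas filter `df[df.level <= community_level]` performs in the cells below. The rows and the helper name `filter_under_level` are toy values invented for illustration, not output of the indexing pipeline.

```python
# Toy rows: one entry per (entity, community) assignment, as in the node table.
toy_rows = [
    {"title": "SCROOGE", "community": 1, "level": 0},
    {"title": "SCROOGE", "community": 12, "level": 1},
    {"title": "SCROOGE", "community": 29, "level": 2},
    {"title": "MARLEY", "community": 8, "level": 0},
    {"title": "MARLEY", "community": 28, "level": 1},
]


def filter_under_level(rows, community_level):
    """Keep rows whose level is at or below the requested level."""
    return [row for row in rows if row["level"] <= community_level]


# Level 0 keeps only the broadest communities; level 1 adds their subtopics.
broad = filter_under_level(toy_rows, 0)
detailed = filter_under_level(toy_rows, 1)

print([row["community"] for row in broad])
print([row["community"] for row in detailed])
```

A higher cutoff therefore never removes rows; it only admits communities from deeper levels of the hierarchy.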

In [34]:
from typing import cast

import pandas as pd
from graphrag.model.entity import Entity


# read_indexer_entities takes the raw indexing outputs and turns them into a list of
# Entity objects, aligning node and entity data with the knowledge model.
def read_indexer_entities(final_nodes: pd.DataFrame,
                          final_entities: pd.DataFrame,
                          community_level: int | None) -> list[Entity]:
    # Work on local references to the node and entity DataFrames.
    nodes_df = final_nodes
    entities_df = final_entities

    # If a community level is given, keep only the nodes whose level is
    # less than or equal to it.
    if community_level is not None:
        nodes_df = _filter_under_community_level(nodes_df, community_level)

    # Keep only the relevant columns: 'id', 'degree' and 'community'.
    nodes_df = cast("pd.DataFrame", nodes_df[["id", "degree", "community"]])

    # Group by 'id' and 'degree', collecting the communities into a set to drop
    # duplicates, then convert each set into a list of strings.
    nodes_df = nodes_df.groupby(["id", "degree"]).agg({"community": set}).reset_index()
    nodes_df["community"] = nodes_df["community"].apply(lambda x: [str(i) for i in x])

    # Inner-join nodes and entities on 'id' and drop duplicate ids. Both frames carry
    # 'degree' and 'community', so pandas suffixes them with _x (nodes) and _y (entities).
    final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(subset=["id"])

    # Convert the merged DataFrame into Entity objects with read_entities,
    # mapping DataFrame columns to the corresponding model attributes.
    # NOTE: after the merge the grouped community list is named 'community_x', so
    # community_col="community" matches no column here; this is consistent with
    # community_ids=None in the output of the call shown later.
    return read_entities(
        df=final_df,
        id_col="id",
        title_col="title",
        type_col="type",
        short_id_col="human_readable_id",
        description_col="description",
        community_col="community",
        rank_col="degree_x",
        name_embedding_col=None,
        description_embedding_col="description_embedding",
        text_unit_ids_col="text_unit_ids",
    )


# Keep only the rows whose level is less than or equal to the selected community level
def _filter_under_community_level(
    df: pd.DataFrame, community_level: int
) -> pd.DataFrame:
    return cast(
        "pd.DataFrame",
        df[df.level <= community_level],
    )


In [35]:
community_level = 2
nodes_df = entity_df
entities_df = entity_embedding_df

if community_level is not None:
    nodes_df = _filter_under_community_level(nodes_df, community_level)

nodes_df.dtypes

id                   object
human_readable_id     int64
title                object
community             int32
level                 int64
x                    object
y                    object
type                  int32
description          object
degree                int64
dtype: object

In [36]:
# Keep only the columns we are interested in
nodes_df = cast("pd.DataFrame", nodes_df[["id", "degree", "community"]])
nodes_df

Unnamed: 0,id,degree,community
0,4f370d7f0d734f92b6118d88e79d886a,7,8
1,8e5be5b1e63343b8a2f0af6baba70337,2,8
2,73b0e0a551dc454690bad3d6756912be,8,0
7,cf70df771d6c464e93b7ff99d2ea8142,1,0
12,3dfdd81803b74b3a8d3244af70df4f83,1,0
...,...,...,...
1001,9efdb8d321d5445e9dfea1fa29a1b173,2,29
1002,c4e8ca318c6046a58ed833be1aa86732,2,29
1003,ea84881a96d644c5884e6675ee1ff591,1,29
1004,3ce8510deb354206b7dbcbd0f7b0d8b8,2,29


In [37]:
# Next we consolidate the indexing output so that it fits our knowledge model.
# This cell performs the grouping step: rows are grouped by 'id' and 'degree', and the
# communities of each entity are collected into a set (removing redundancy) and then
# converted into a list of strings. The merge with the entity data follows in the next cell.
nodes_df = nodes_df.groupby(["id", "degree"]).agg({"community": set}).reset_index()
nodes_df["community"] = nodes_df["community"].apply(lambda x: [str(i) for i in x])
nodes_df


Unnamed: 0,id,degree,community
0,004f7d801f964160b441c6e7060c2479,1,[-1]
1,00aea98437c042d6b606e7f8d402822e,1,"[1, 12, 29]"
2,01505a7c5e3e4b1c978016e6a2020f27,4,[6]
3,026adfbca92540759d2c7a2c8d242baa,1,"[1, 12, 29]"
4,05a8b2da0f4d469f8e3ef8b9305ec577,1,[0]
...,...,...,...
223,fc18d39cb306425f97dbc98e67477b71,5,"[8, 28]"
224,fc4e0665f6374672a2fe1951bdd3f829,3,"[18, 4]"
225,fdee948a65da4ee5b6a3e25793dfab52,1,"[1, 12, 29]"
226,fedf3c73a3274d3db77d65c5398a8508,1,"[1, 12, 29]"
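
The grouping step can be pictured without pandas. The sketch below consolidates toy `(id, degree, community)` rows into one list of community strings per entity, which is what `groupby(...).agg({"community": set})` followed by the `str` conversion produces. The rows are invented for illustration. Sorting the set members is an addition of this sketch: sets are unordered, which is why a list such as `[18, 4]` appears unsorted in the output above.

```python
from collections import defaultdict

# Toy (id, degree, community) rows; an entity appears once per community level.
toy_rows = [
    ("a1", 1, 1),
    ("a1", 1, 12),
    ("a1", 1, 29),
    ("b2", 4, 6),
    ("b2", 4, 6),  # a repeated membership collapses inside the set
]

communities_by_node = defaultdict(set)
for node_id, degree, community in toy_rows:
    communities_by_node[(node_id, degree)].add(community)

# Sort before converting to strings so the result is reproducible.
consolidated = {
    key: [str(c) for c in sorted(members)]
    for key, members in communities_by_node.items()
}
print(consolidated)
```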


In [38]:
# This cell inner-joins the node DataFrame (nodes_df) with the entity DataFrame (entities_df)
# on the "id" column. It then drops duplicate records by "id", so that each entity
# appears only once in the final DataFrame (final_df).

final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
        subset=["id"]
    )
final_df

Unnamed: 0,id,degree_x,community_x,human_readable_id,title,community_y,level,x,y,type,description,degree_y
0,004f7d801f964160b441c6e7060c2479,1,[-1],121,DAUGHTER,-1,0,,,-1,DAUGHTER,1
1,00aea98437c042d6b606e7f8d402822e,1,"[1, 12, 29]",92,WINTER DAY,1,0,17.794003,-3.476077,1,WINTER DAY,1
6,01505a7c5e3e4b1c978016e6a2020f27,4,[6],202,DEATH,6,0,0.946703,15.462376,6,DEATH,4
7,026adfbca92540759d2c7a2c8d242baa,1,"[1, 12, 29]",137,GROCERS',1,0,8.608845,-3.767654,1,GROCERS',1
12,05a8b2da0f4d469f8e3ef8b9305ec577,1,[0],7,J. B. LIPPINCOTT COMPANY,0,0,-11.86204,2.28514,0,J. B. LIPPINCOTT COMPANY,1
...,...,...,...,...,...,...,...,...,...,...,...,...
630,fc18d39cb306425f97dbc98e67477b71,5,"[8, 28]",216,PROJECT GUTENBERG™,8,0,-17.540855,-16.454866,8,PROJECT GUTENBERG™,5
632,fc4e0665f6374672a2fe1951bdd3f829,3,"[18, 4]",197,UNDERTAKER'S MAN,4,0,1.462716,-0.495093,4,UNDERTAKER'S MAN,3
634,fdee948a65da4ee5b6a3e25793dfab52,1,"[1, 12, 29]",15,GHOST OF CHRISTMAS PAST,1,0,16.049393,5.321182,1,GHOST OF CHRISTMAS PAST,1
639,fedf3c73a3274d3db77d65c5398a8508,1,"[1, 12, 29]",67,THE CORPORATION,1,0,10.352509,-4.813623,1,THE CORPORATION,1



The `read_entities` function defined below takes a DataFrame and maps each row to an `Entity` object, which is part of our knowledge model. To do so, it extracts and converts the data in each column with helper functions (such as `to_str`, `to_optional_str`, etc.) that ensure the values are converted to the expected data type.

The function parameters (for example, `id_col`, `title_col`, etc.) give the names of the columns to use for each attribute of the object. On each iteration of the loop, an `Entity` object is created with:

- **Required attributes:** such as `id`, `short_id` and `title`.
- **Optional attributes:** such as `type`, `description`, embeddings (for name and description), communities (`community_ids`), text unit ids and the rank (`rank`).
- **Additional attributes:** if a list of extra columns is passed through the `attributes_cols` parameter.

Finally, the function appends each `Entity` object to a list and returns it.
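
The same row-to-object pattern can be sketched in a few lines of standard Python. `ToyEntity`, `optional_str` and `read_toy_entities` are hypothetical simplifications written for this example; they are not the `graphrag` classes or helpers, and they work on plain dictionaries instead of DataFrame rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEntity:
    """Simplified stand-in for the Entity model (hypothetical)."""
    id: str
    short_id: str
    title: str
    type: Optional[str] = None
    rank: Optional[int] = None


def optional_str(row, column):
    """Return the column as a string, or None if the column is unset or absent."""
    if column is None or row.get(column) is None:
        return None
    return str(row[column])


def read_toy_entities(rows, id_col="id", short_id_col="human_readable_id",
                      title_col="title", type_col="type", rank_col="degree"):
    """Map each row (a dict) to a ToyEntity, using the given column names."""
    entities = []
    for idx, row in enumerate(rows):
        rank = row.get(rank_col) if rank_col else None
        entities.append(
            ToyEntity(
                id=str(row[id_col]),        # required: a missing key raises KeyError
                # In this sketch the row index is the fallback short id.
                short_id=optional_str(row, short_id_col) or str(idx),
                title=str(row[title_col]),  # required
                type=optional_str(row, type_col),
                rank=int(rank) if rank is not None else None,
            )
        )
    return entities


toy_rows = [
    {"id": "x1", "human_readable_id": 121, "title": "DAUGHTER", "type": -1, "degree": 1},
    {"id": "x2", "title": "DEATH", "degree": 4},
]
toy_entities = read_toy_entities(toy_rows)
print(toy_entities)
```

Passing column names as parameters is what lets one loader serve tables with different schemas: only the mapping changes, not the conversion logic.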

In [40]:
from graphrag.query.input.loaders.utils import (
    to_optional_dict,
    to_optional_float,
    to_optional_int,
    to_optional_list,
    to_optional_str,
    to_str
)


def read_entities(
    df: pd.DataFrame,
    id_col: str = "id",
    short_id_col: str | None = "human_readable_id",
    title_col: str = "title",
    type_col: str | None = "type",
    description_col: str | None = "description",
    name_embedding_col: str | None = "name_embedding",
    description_embedding_col: str | None = "description_embedding",
    community_col: str | None = "community_ids",
    text_unit_ids_col: str | None = "text_unit_ids",
    rank_col: str | None = "degree",
    attributes_cols: list[str] | None = None,
) -> list[Entity]:
    """Read entities from a dataframe."""
    entities = []
    for idx, row in df.iterrows():
        entity = Entity(
            id=to_str(row, id_col),
            short_id=to_optional_str(row, short_id_col) if short_id_col else str(idx),
            title=to_str(row, title_col),
            type=to_optional_str(row, type_col),
            description=to_optional_str(row, description_col),
            name_embedding=to_optional_list(row, name_embedding_col, item_type=float),
            description_embedding=to_optional_list(
                row, description_embedding_col, item_type=float
            ),
            community_ids=to_optional_list(row, community_col, item_type=str),
            text_unit_ids=to_optional_list(row, text_unit_ids_col),
            rank=to_optional_int(row, rank_col),
            attributes=(
                {col: row.get(col) for col in attributes_cols}
                if attributes_cols
                else None
            ),
        )
        entities.append(entity)
    return entities

In [41]:
entities = read_indexer_entities(entity_df, entity_embedding_df, COMMUNITY_LEVEL)
entities

[Entity(id='004f7d801f964160b441c6e7060c2479', short_id='121', title='DAUGHTER', type='-1', description='DAUGHTER', description_embedding=None, name_embedding=None, community_ids=None, text_unit_ids=None, rank=1, attributes=None),
 Entity(id='00aea98437c042d6b606e7f8d402822e', short_id='92', title='WINTER DAY', type='1', description='WINTER DAY', description_embedding=None, name_embedding=None, community_ids=None, text_unit_ids=None, rank=1, attributes=None),
 Entity(id='01505a7c5e3e4b1c978016e6a2020f27', short_id='202', title='DEATH', type='6', description='DEATH', description_embedding=None, name_embedding=None, community_ids=None, text_unit_ids=None, rank=4, attributes=None),
 Entity(id='026adfbca92540759d2c7a2c8d242baa', short_id='137', title="GROCERS'", type='1', description="GROCERS'", description_embedding=None, name_embedding=None, community_ids=None, text_unit_ids=None, rank=1, attributes=None),
 Entity(id='05a8b2da0f4d469f8e3ef8b9305ec577', short_id='7', title='J. B. LIPPINCO

Next we load the community reports. It is worth reading how the LLM has produced summaries of abstract entities that, in principle, do not appear explicitly in the book.
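
Each report row carries, among other columns, a `full_content_json` field holding the title, summary, rating and findings of the community. The following is a minimal sketch of reading that structure with the standard `json` module. The payload is an abbreviated, hand-written example in the same shape, not real pipeline output.

```python
import json

# Abbreviated, hand-written payload in the shape of `full_content_json`.
toy_report_json = """
{
  "title": "Scrooge and the Ghosts",
  "summary": "The community revolves around Scrooge and the three Ghosts.",
  "rating": 7.0,
  "rating_explanation": "High impact due to Scrooge's transformation.",
  "findings": [
    {"summary": "Scrooge's transformation", "explanation": "Abbreviated for this sketch."},
    {"summary": "Role of the Ghosts", "explanation": "Abbreviated for this sketch."}
  ]
}
"""

report = json.loads(toy_report_json)

# Each finding pairs a short summary with a longer explanation.
finding_titles = [finding["summary"] for finding in report["findings"]]

print(f'{report["title"]} (rating {report["rating"]})')
for title in finding_titles:
    print(f"- {title}")
```

Applied to the real table, the same parsing could be run on one cell of the `full_content_json` column at a time.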

In [42]:

report_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
report_df.head(4)

Unnamed: 0,id,human_readable_id,community,level,title,summary,full_content,rank,rank_explanation,findings,full_content_json,period,size
0,4040ff80-3c0a-4698-b513-0d0670425656,37,37,4,Scrooge's Transformation and Interactions with Ghosts,"The community revolves around the central character, Scrooge, and his interactions with various entities, most notably the Ghosts of Christmas Past, Present, and Yet to Come. Scrooge's relationships with these entities, as well as his past business partner Marley, significantly influence his character development throughout the story.","# Scrooge's Transformation and Interactions with Ghosts\n\nThe community revolves around the central character, Scrooge, and his interactions with various entities, most notably the Ghosts of Christmas Past, Present, and Yet to Come. Scrooge's relationships with these entities, as well as his past business partner Marley, significantly influence his character development throughout the story.\n\n## Scrooge's Character and Transformation\n\nScrooge is initially depicted as a cold, unfeeling, and miserly man who despises Christmas and all things which engender happiness. However, throughout the story, Scrooge undergoes a significant transformation. After being visited by various spirits, including the ghost of Jacob Marley, his former business partner, and the Ghost of Christmas Past, Scrooge changes his ways. He is seen celebrating Christmas Day, buying a turkey for Bob Cratchit, and visiting his nephew's house. He also promises to raise Bob Cratchit's salary and assist his family, becoming a better person in the process [Data: Entities (33); Relationships (144, 158, 132, 134)].\n\n## Role of Ghosts in Scrooge's Transformation\n\nThe Ghosts of Christmas Past, Present, and Yet to Come play a significant role in Scrooge's transformation. These spectral entities visit Scrooge, showing him scenes from his past, present, and potential future. 
The purpose of these visits is to provide Scrooge with a comprehensive view of his life, allowing him to reflect on his past actions and decisions, and to show him the potential consequences of his current lifestyle. These interactions significantly influence Scrooge's character development [Data: Entities (93, 15, 16, 17); Relationships (144, 158, 27, 28, 29)].\n\n## Scrooge's Relationship with Marley\n\nScrooge's relationship with his deceased business partner, Marley, is another significant aspect of the story. Despite being dead, Marley visits Scrooge as a ghost, warning him about his life choices and the consequences of his actions. Marley's spectral presence and his concern for Scrooge's fate indicate a deep connection between the two, possibly as past associates or friends [Data: Entities (81, 34); Relationships (132, 100)].\n\n## Scrooge's Workplace and Employee\n\nScrooge's workplace, the counting-house, and his employee, the clerk, are also important entities in the community. The clerk is threatened with dismissal by Scrooge for applauding his nephew's speech about Christmas. This interaction provides insight into Scrooge's initial character and his attitude towards Christmas [Data: Entities (212, 49); Relationships (113, 208)].\n\n## Scrooge's Interaction with Other Entities\n\nScrooge interacts with various other entities throughout the story, including the portly gentlemen, the boy in Sunday clothes, and the Poulterer's. 
These interactions further illustrate Scrooge's transformation, as he changes from a man who refuses to donate to charity to a man who buys a turkey for his employee's family on Christmas Day [Data: Entities (50, 205, 202); Relationships (203, 201, 202)].",7.0,The impact severity rating is high due to the profound transformation of Scrooge's character and the potential societal implications of his changed behavior.,"[{'explanation': 'Scrooge is initially depicted as a cold, unfeeling, and miserly man who despises Christmas and all things which engender happiness. However, throughout the story, Scrooge undergoes a significant transformation. After being visited by various spirits, including the ghost of Jacob Marley, his former business partner, and the Ghost of Christmas Past, Scrooge changes his ways. He is seen celebrating Christmas Day, buying a turkey for Bob Cratchit, and visiting his nephew's house. He also promises to raise Bob Cratchit's salary and assist his family, becoming a better person in the process [Data: Entities (33); Relationships (144, 158, 132, 134)].', 'summary': 'Scrooge's Character and Transformation'}, {'explanation': 'The Ghosts of Christmas Past, Present, and Yet to Come play a significant role in Scrooge's transformation. These spectral entities visit Scrooge, showing him scenes from his past, present, and potential future. The purpose of these visits is to provide Scrooge with a comprehensive view of his life, allowing him to reflect on his past actions and decisions, and to show him the potential consequences of his current lifestyle. These interactions significantly influence Scrooge's character development [Data: Entities (93, 15, 16, 17); Relationships (144, 158, 27, 28, 29)].', 'summary': 'Role of Ghosts in Scrooge's Transformation'}, {'explanation': 'Scrooge's relationship with his deceased business partner, Marley, is another significant aspect of the story. 
Despite being dead, Marley visits Scrooge as a ghost, warning him about his life choices and the consequences of his actions. Marley's spectral presence and his concern for Scrooge's fate indicate a deep connection between the two, possibly as past associates or friends [Data: Entities (81, 34); Relationships (132, 100)].', 'summary': 'Scrooge's Relationship with Marley'}, {'explanation': 'Scrooge's workplace, the counting-house, and his employee, the clerk, are also important entities in the community. The clerk is threatened with dismissal by Scrooge for applauding his nephew's speech about Christmas. This interaction provides insight into Scrooge's initial character and his attitude towards Christmas [Data: Entities (212, 49); Relationships (113, 208)].', 'summary': 'Scrooge's Workplace and Employee'}, {'explanation': 'Scrooge interacts with various other entities throughout the story, including the portly gentlemen, the boy in Sunday clothes, and the Poulterer's. These interactions further illustrate Scrooge's transformation, as he changes from a man who refuses to donate to charity to a man who buys a turkey for his employee's family on Christmas Day [Data: Entities (50, 205, 202); Relationships (203, 201, 202)].', 'summary': 'Scrooge's Interaction with Other Entities'}]","{\n ""title"": ""Scrooge's Transformation and Interactions with Ghosts"",\n ""summary"": ""The community revolves around the central character, Scrooge, and his interactions with various entities, most notably the Ghosts of Christmas Past, Present, and Yet to Come. 
Scrooge's relationships with these entities, as well as his past business partner Marley, significantly influence his character development throughout the story."",\n ""rating"": 7.0,\n ""rating_explanation"": ""The impact severity rating is high due to the profound transformation of Scrooge's character and the potential societal implications of his changed behavior."",\n ""findings"": [\n {\n ""summary"": ""Scrooge's Character and Transformation"",\n ""explanation"": ""Scrooge is initially depicted as a cold, unfeeling, and miserly man who despises Christmas and all things which engender happiness. However, throughout the story, Scrooge undergoes a significant transformation. After being visited by various spirits, including the ghost of Jacob Marley, his former business partner, and the Ghost of Christmas Past, Scrooge changes his ways. He is seen celebrating Christmas Day, buying a turkey for Bob Cratchit, and visiting his nephew's house. He also promises to raise Bob Cratchit's salary and assist his family, becoming a better person in the process [Data: Entities (33); Relationships (144, 158, 132, 134)].""\n },\n {\n ""summary"": ""Role of Ghosts in Scrooge's Transformation"",\n ""explanation"": ""The Ghosts of Christmas Past, Present, and Yet to Come play a significant role in Scrooge's transformation. These spectral entities visit Scrooge, showing him scenes from his past, present, and potential future. The purpose of these visits is to provide Scrooge with a comprehensive view of his life, allowing him to reflect on his past actions and decisions, and to show him the potential consequences of his current lifestyle. These interactions significantly influence Scrooge's character development [Data: Entities (93, 15, 16, 17); Relationships (144, 158, 27, 28, 29)].""\n },\n {\n ""summary"": ""Scrooge's Relationship with Marley"",\n ""explanation"": ""Scrooge's relationship with his deceased business partner, Marley, is another significant aspect of the story. 
Despite being dead, Marley visits Scrooge as a ghost, warning him about his life choices and the consequences of his actions. Marley's spectral presence and his concern for Scrooge's fate indicate a deep connection between the two, possibly as past associates or friends [Data: Entities (81, 34); Relationships (132, 100)].""\n },\n {\n ""summary"": ""Scrooge's Workplace and Employee"",\n ""explanation"": ""Scrooge's workplace, the counting-house, and his employee, the clerk, are also important entities in the community. The clerk is threatened with dismissal by Scrooge for applauding his nephew's speech about Christmas. This interaction provides insight into Scrooge's initial character and his attitude towards Christmas [Data: Entities (212, 49); Relationships (113, 208)].""\n },\n {\n ""summary"": ""Scrooge's Interaction with Other Entities"",\n ""explanation"": ""Scrooge interacts with various other entities throughout the story, including the portly gentlemen, the boy in Sunday clothes, and the Poulterer's. These interactions further illustrate Scrooge's transformation, as he changes from a man who refuses to donate to charity to a man who buys a turkey for his employee's family on Christmas Day [Data: Entities (50, 205, 202); Relationships (203, 201, 202)].""\n }\n ]\n}",2025-01-13,72.0
1,323af97c-7b5f-45a7-91ad-1a7a5bc6d494,38,38,4,"Scrooge, Scrooge and Marley's, and The Portly Gentleman","The community revolves around three key entities: Scrooge, Scrooge and Marley's counting-house, and The Portly Gentleman. Scrooge is associated with both Scrooge and Marley's counting-house and The Portly Gentleman, while The Portly Gentleman has a direct relationship with Scrooge and Marley's counting-house.","# Scrooge, Scrooge and Marley's, and The Portly Gentleman\n\nThe community revolves around three key entities: Scrooge, Scrooge and Marley's counting-house, and The Portly Gentleman. Scrooge is associated with both Scrooge and Marley's counting-house and The Portly Gentleman, while The Portly Gentleman has a direct relationship with Scrooge and Marley's counting-house.\n\n## Scrooge's central role\n\nScrooge is a central figure in this community, having direct relationships with both Scrooge and Marley's counting-house and The Portly Gentleman. His association with the counting-house suggests a professional role, while his conversation with The Portly Gentleman indicates personal interactions. The nature of these relationships could provide insights into Scrooge's character and his influence within the community. [Data: Relationships (204, 203)]\n\n## Significance of Scrooge and Marley's counting-house\n\nScrooge and Marley's counting-house is another key entity in this community. It is associated with both Scrooge and The Portly Gentleman, suggesting its importance as a location where significant interactions occur. The counting-house's role could be crucial in understanding the dynamics of this community. [Data: Entities (209), Relationships (204, 302)]\n\n## Role of The Portly Gentleman\n\nThe Portly Gentleman is a character who interacts with both Scrooge and the counting-house. His presence and interactions could provide insights into the social dynamics within the community. 
The nature of his conversation with Scrooge and his visit to the counting-house could be significant in understanding the community's narrative. [Data: Entities (210), Relationships (203, 302)]",3.0,The impact severity rating is low due to the limited number of entities and their interactions within the community.,"[{'explanation': 'Scrooge is a central figure in this community, having direct relationships with both Scrooge and Marley's counting-house and The Portly Gentleman. His association with the counting-house suggests a professional role, while his conversation with The Portly Gentleman indicates personal interactions. The nature of these relationships could provide insights into Scrooge's character and his influence within the community. [Data: Relationships (204, 203)]', 'summary': 'Scrooge's central role'}, {'explanation': 'Scrooge and Marley's counting-house is another key entity in this community. It is associated with both Scrooge and The Portly Gentleman, suggesting its importance as a location where significant interactions occur. The counting-house's role could be crucial in understanding the dynamics of this community. [Data: Entities (209), Relationships (204, 302)]', 'summary': 'Significance of Scrooge and Marley's counting-house'}, {'explanation': 'The Portly Gentleman is a character who interacts with both Scrooge and the counting-house. His presence and interactions could provide insights into the social dynamics within the community. The nature of his conversation with Scrooge and his visit to the counting-house could be significant in understanding the community's narrative. [Data: Entities (210), Relationships (203, 302)]', 'summary': 'Role of The Portly Gentleman'}]","{\n ""title"": ""Scrooge, Scrooge and Marley's, and The Portly Gentleman"",\n ""summary"": ""The community revolves around three key entities: Scrooge, Scrooge and Marley's counting-house, and The Portly Gentleman. 
Scrooge is associated with both Scrooge and Marley's counting-house and The Portly Gentleman, while The Portly Gentleman has a direct relationship with Scrooge and Marley's counting-house."",\n ""rating"": 3.0,\n ""rating_explanation"": ""The impact severity rating is low due to the limited number of entities and their interactions within the community."",\n ""findings"": [\n {\n ""summary"": ""Scrooge's central role"",\n ""explanation"": ""Scrooge is a central figure in this community, having direct relationships with both Scrooge and Marley's counting-house and The Portly Gentleman. His association with the counting-house suggests a professional role, while his conversation with The Portly Gentleman indicates personal interactions. The nature of these relationships could provide insights into Scrooge's character and his influence within the community. [Data: Relationships (204, 203)]""\n },\n {\n ""summary"": ""Significance of Scrooge and Marley's counting-house"",\n ""explanation"": ""Scrooge and Marley's counting-house is another key entity in this community. It is associated with both Scrooge and The Portly Gentleman, suggesting its importance as a location where significant interactions occur. The counting-house's role could be crucial in understanding the dynamics of this community. [Data: Entities (209), Relationships (204, 302)]""\n },\n {\n ""summary"": ""Role of The Portly Gentleman"",\n ""explanation"": ""The Portly Gentleman is a character who interacts with both Scrooge and the counting-house. His presence and interactions could provide insights into the social dynamics within the community. The nature of his conversation with Scrooge and his visit to the counting-house could be significant in understanding the community's narrative. [Data: Entities (210), Relationships (203, 302)]""\n }\n ]\n}",2025-01-13,2.0
2,62b7bd68-bfd9-4c62-a271-26a492f7010a,34,34,3,Scrooge and the Ghosts: A Christmas Carol Community,"The community revolves around the character Scrooge, a miserly businessman, and his interactions with various entities, most notably the Ghosts of Christmas Past, Present, and Yet to Come. These entities, along with others like Jacob Marley, Scrooge's deceased business partner, and the clerk, play significant roles in Scrooge's transformation throughout the story.","# Scrooge and the Ghosts: A Christmas Carol Community\n\nThe community revolves around the character Scrooge, a miserly businessman, and his interactions with various entities, most notably the Ghosts of Christmas Past, Present, and Yet to Come. These entities, along with others like Jacob Marley, Scrooge's deceased business partner, and the clerk, play significant roles in Scrooge's transformation throughout the story.\n\n## Scrooge's transformation\n\nScrooge, initially depicted as a cold, unfeeling, and miserly man, undergoes a significant transformation throughout the story. This transformation is largely facilitated by his interactions with various spirits, including the ghost of Jacob Marley, his former business partner, and the Ghosts of Christmas Past, Present, and Yet to Come. These spirits show him scenes from his past, present, and future, leading to a change of heart and a promise to assist his clerk's family [Data: Entities (33, 93, 81, 49); Relationships (144, 158, 132, 113)].\n\n## Role of the Ghosts\n\nThe Ghosts of Christmas Past, Present, and Yet to Come play a crucial role in Scrooge's transformation. They guide Scrooge through various scenes from his past, present, and potential future, allowing him to reflect on his actions and their consequences. 
The Ghosts' interactions with Scrooge are instrumental in his eventual change of heart and his decision to assist his clerk's family [Data: Entities (93, 15, 16, 17); Relationships (144, 27, 28, 29)].\n\n## Jacob Marley's influence\n\nJacob Marley, Scrooge's deceased business partner, also plays a significant role in Scrooge's transformation. Despite being dead, Marley visits Scrooge as a ghost, warning him about his life choices and the consequences of his actions. Marley's spectral presence and his warnings set the stage for Scrooge's encounters with the Ghosts of Christmas [Data: Entities (81, 34); Relationships (132, 100)].\n\n## Scrooge's relationship with his clerk\n\nScrooge's relationship with his clerk is another key aspect of the community. Initially, Scrooge is depicted as a harsh employer, threatening to fire his clerk for applauding his nephew's speech. However, following his transformation, Scrooge promises to assist his clerk's family, indicating a significant change in their relationship [Data: Entities (33, 49); Relationships (113)].\n\n## Scrooge's interactions with other entities\n\nScrooge's interactions with other entities, such as the Portly Gentleman and the boy in Sunday clothes, further illustrate his transformation. These interactions, which occur after his encounters with the Ghosts, demonstrate Scrooge's newfound generosity and kindness, contrasting with his initial miserly demeanor [Data: Entities (33, 210, 205); Relationships (203, 206)].",7.0,"The impact severity rating is high due to the profound transformation of Scrooge, a central character, and the potential societal implications of his changed behavior.","[{'explanation': 'Scrooge, initially depicted as a cold, unfeeling, and miserly man, undergoes a significant transformation throughout the story. 
This transformation is largely facilitated by his interactions with various spirits, including the ghost of Jacob Marley, his former business partner, and the Ghosts of Christmas Past, Present, and Yet to Come. These spirits show him scenes from his past, present, and future, leading to a change of heart and a promise to assist his clerk's family [Data: Entities (33, 93, 81, 49); Relationships (144, 158, 132, 113)].', 'summary': 'Scrooge's transformation'}, {'explanation': 'The Ghosts of Christmas Past, Present, and Yet to Come play a crucial role in Scrooge's transformation. They guide Scrooge through various scenes from his past, present, and potential future, allowing him to reflect on his actions and their consequences. The Ghosts' interactions with Scrooge are instrumental in his eventual change of heart and his decision to assist his clerk's family [Data: Entities (93, 15, 16, 17); Relationships (144, 27, 28, 29)].', 'summary': 'Role of the Ghosts'}, {'explanation': 'Jacob Marley, Scrooge's deceased business partner, also plays a significant role in Scrooge's transformation. Despite being dead, Marley visits Scrooge as a ghost, warning him about his life choices and the consequences of his actions. Marley's spectral presence and his warnings set the stage for Scrooge's encounters with the Ghosts of Christmas [Data: Entities (81, 34); Relationships (132, 100)].', 'summary': 'Jacob Marley's influence'}, {'explanation': 'Scrooge's relationship with his clerk is another key aspect of the community. Initially, Scrooge is depicted as a harsh employer, threatening to fire his clerk for applauding his nephew's speech. 
However, following his transformation, Scrooge promises to assist his clerk's family, indicating a significant change in their relationship [Data: Entities (33, 49); Relationships (113)].', 'summary': 'Scrooge's relationship with his clerk'}, {'explanation': 'Scrooge's interactions with other entities, such as the Portly Gentleman and the boy in Sunday clothes, further illustrate his transformation. These interactions, which occur after his encounters with the Ghosts, demonstrate Scrooge's newfound generosity and kindness, contrasting with his initial miserly demeanor [Data: Entities (33, 210, 205); Relationships (203, 206)].', 'summary': 'Scrooge's interactions with other entities'}]","{\n ""title"": ""Scrooge and the Ghosts: A Christmas Carol Community"",\n ""summary"": ""The community revolves around the character Scrooge, a miserly businessman, and his interactions with various entities, most notably the Ghosts of Christmas Past, Present, and Yet to Come. These entities, along with others like Jacob Marley, Scrooge's deceased business partner, and the clerk, play significant roles in Scrooge's transformation throughout the story."",\n ""rating"": 7.0,\n ""rating_explanation"": ""The impact severity rating is high due to the profound transformation of Scrooge, a central character, and the potential societal implications of his changed behavior."",\n ""findings"": [\n {\n ""summary"": ""Scrooge's transformation"",\n ""explanation"": ""Scrooge, initially depicted as a cold, unfeeling, and miserly man, undergoes a significant transformation throughout the story. This transformation is largely facilitated by his interactions with various spirits, including the ghost of Jacob Marley, his former business partner, and the Ghosts of Christmas Past, Present, and Yet to Come. 
These spirits show him scenes from his past, present, and future, leading to a change of heart and a promise to assist his clerk's family [Data: Entities (33, 93, 81, 49); Relationships (144, 158, 132, 113)].""\n },\n {\n ""summary"": ""Role of the Ghosts"",\n ""explanation"": ""The Ghosts of Christmas Past, Present, and Yet to Come play a crucial role in Scrooge's transformation. They guide Scrooge through various scenes from his past, present, and potential future, allowing him to reflect on his actions and their consequences. The Ghosts' interactions with Scrooge are instrumental in his eventual change of heart and his decision to assist his clerk's family [Data: Entities (93, 15, 16, 17); Relationships (144, 27, 28, 29)].""\n },\n {\n ""summary"": ""Jacob Marley's influence"",\n ""explanation"": ""Jacob Marley, Scrooge's deceased business partner, also plays a significant role in Scrooge's transformation. Despite being dead, Marley visits Scrooge as a ghost, warning him about his life choices and the consequences of his actions. Marley's spectral presence and his warnings set the stage for Scrooge's encounters with the Ghosts of Christmas [Data: Entities (81, 34); Relationships (132, 100)].""\n },\n {\n ""summary"": ""Scrooge's relationship with his clerk"",\n ""explanation"": ""Scrooge's relationship with his clerk is another key aspect of the community. Initially, Scrooge is depicted as a harsh employer, threatening to fire his clerk for applauding his nephew's speech. However, following his transformation, Scrooge promises to assist his clerk's family, indicating a significant change in their relationship [Data: Entities (33, 49); Relationships (113)].""\n },\n {\n ""summary"": ""Scrooge's interactions with other entities"",\n ""explanation"": ""Scrooge's interactions with other entities, such as the Portly Gentleman and the boy in Sunday clothes, further illustrate his transformation. 
These interactions, which occur after his encounters with the Ghosts, demonstrate Scrooge's newfound generosity and kindness, contrasting with his initial miserly demeanor [Data: Entities (33, 210, 205); Relationships (203, 206)].""\n }\n ]\n}",2025-01-13,74.0
3,f8c3132c-0691-4711-88e6-2130fad157d6,35,35,3,"Scrooge, The Clerk, and Christmas Eve","The community revolves around the character Scrooge, his employee The Clerk, and the event of Christmas Eve. Scrooge shows disdain for Christmas Eve, while The Clerk celebrates it. The Clerk also has a professional relationship with Scrooge, having signed the register of Marley's burial.","# Scrooge, The Clerk, and Christmas Eve\n\nThe community revolves around the character Scrooge, his employee The Clerk, and the event of Christmas Eve. Scrooge shows disdain for Christmas Eve, while The Clerk celebrates it. The Clerk also has a professional relationship with Scrooge, having signed the register of Marley's burial.\n\n## Scrooge's disdain for Christmas Eve\n\nScrooge, a central character in the story, shows a strong disdain for Christmas Eve. This attitude is a significant aspect of his character and plays a crucial role in the narrative. His disdain for the holiday contrasts sharply with other characters, such as The Clerk, who celebrate it. This contrast could be indicative of Scrooge's isolation and his strained relationships with other characters. [Data: Entities (65), Relationships (122)]\n\n## The Clerk's celebration of Christmas Eve\n\nThe Clerk, another character in the story, celebrates Christmas Eve. This celebration is marked by sliding down Cornhill and running home to Camden Town. The Clerk's celebration of the holiday contrasts with Scrooge's disdain for it, highlighting the different attitudes towards the holiday within the community. This contrast could be indicative of the different social positions and attitudes of the characters. [Data: Entities (65), Relationships (210)]\n\n## The professional relationship between Scrooge and The Clerk\n\nScrooge is the employer of The Clerk, indicating a professional relationship between the two. The Clerk also signed the register of Marley's burial, further demonstrating their professional connection. 
This relationship is significant as it provides context for the interactions between Scrooge and The Clerk, and it could be indicative of the power dynamics within the community. [Data: Entities (38), Relationships (103)]",3.0,The impact severity rating is low as the community is based on a fictional story with no real-world implications.,"[{'explanation': 'Scrooge, a central character in the story, shows a strong disdain for Christmas Eve. This attitude is a significant aspect of his character and plays a crucial role in the narrative. His disdain for the holiday contrasts sharply with other characters, such as The Clerk, who celebrate it. This contrast could be indicative of Scrooge's isolation and his strained relationships with other characters. [Data: Entities (65), Relationships (122)]', 'summary': 'Scrooge's disdain for Christmas Eve'}, {'explanation': 'The Clerk, another character in the story, celebrates Christmas Eve. This celebration is marked by sliding down Cornhill and running home to Camden Town. The Clerk's celebration of the holiday contrasts with Scrooge's disdain for it, highlighting the different attitudes towards the holiday within the community. This contrast could be indicative of the different social positions and attitudes of the characters. [Data: Entities (65), Relationships (210)]', 'summary': 'The Clerk's celebration of Christmas Eve'}, {'explanation': 'Scrooge is the employer of The Clerk, indicating a professional relationship between the two. The Clerk also signed the register of Marley's burial, further demonstrating their professional connection. This relationship is significant as it provides context for the interactions between Scrooge and The Clerk, and it could be indicative of the power dynamics within the community. 
[Data: Entities (38), Relationships (103)]', 'summary': 'The professional relationship between Scrooge and The Clerk'}]","{\n ""title"": ""Scrooge, The Clerk, and Christmas Eve"",\n ""summary"": ""The community revolves around the character Scrooge, his employee The Clerk, and the event of Christmas Eve. Scrooge shows disdain for Christmas Eve, while The Clerk celebrates it. The Clerk also has a professional relationship with Scrooge, having signed the register of Marley's burial."",\n ""rating"": 3.0,\n ""rating_explanation"": ""The impact severity rating is low as the community is based on a fictional story with no real-world implications."",\n ""findings"": [\n {\n ""summary"": ""Scrooge's disdain for Christmas Eve"",\n ""explanation"": ""Scrooge, a central character in the story, shows a strong disdain for Christmas Eve. This attitude is a significant aspect of his character and plays a crucial role in the narrative. His disdain for the holiday contrasts sharply with other characters, such as The Clerk, who celebrate it. This contrast could be indicative of Scrooge's isolation and his strained relationships with other characters. [Data: Entities (65), Relationships (122)]""\n },\n {\n ""summary"": ""The Clerk's celebration of Christmas Eve"",\n ""explanation"": ""The Clerk, another character in the story, celebrates Christmas Eve. This celebration is marked by sliding down Cornhill and running home to Camden Town. The Clerk's celebration of the holiday contrasts with Scrooge's disdain for it, highlighting the different attitudes towards the holiday within the community. This contrast could be indicative of the different social positions and attitudes of the characters. [Data: Entities (65), Relationships (210)]""\n },\n {\n ""summary"": ""The professional relationship between Scrooge and The Clerk"",\n ""explanation"": ""Scrooge is the employer of The Clerk, indicating a professional relationship between the two. 
The Clerk also signed the register of Marley's burial, further demonstrating their professional connection. This relationship is significant as it provides context for the interactions between Scrooge and The Clerk, and it could be indicative of the power dynamics within the community. [Data: Entities (38), Relationships (103)]""\n }\n ]\n}",2025-01-13,2.0
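
Each row above appears to carry the full report a second time as a JSON string, with `title`, `summary`, `rating`, `rating_explanation` and a list of `findings`. As a minimal sketch of how such a string could be unpacked for a quick overview, assuming it is valid JSON once read back from the table: the sample string below is a hypothetical, heavily trimmed stand-in for one cell, and `summarize_report` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical, heavily trimmed stand-in for one JSON cell of the table above;
# the real strings hold the full explanations.
sample_report_json = """
{
  "title": "Scrooge, The Clerk, and Christmas Eve",
  "rating": 3.0,
  "findings": [
    {"summary": "Scrooge's disdain for Christmas Eve", "explanation": "..."},
    {"summary": "The Clerk's celebration of Christmas Eve", "explanation": "..."}
  ]
}
"""


def summarize_report(raw_json: str) -> dict:
    """Return the title, rating and finding summaries of one report (illustrative helper)."""
    report = json.loads(raw_json)
    return {
        "title": report["title"],
        "rating": report["rating"],
        # A report without findings yields an empty list instead of an error.
        "finding_summaries": [f["summary"] for f in report.get("findings", [])],
    }


overview = summarize_report(sample_report_json)
print(overview["title"], overview["rating"])
for finding_summary in overview["finding_summaries"]:
    print("-", finding_summary)
```

Listing only the finding summaries is a convenient way to skim what a community is about before reading the full explanations.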


In [43]:
# Convert the report table into CommunityReport objects, restricted by COMMUNITY_LEVEL
reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
reports

 CommunityReport(id='402c6cab-a227-4ac9-b5f2-14106612d619', short_id='30', title="Scrooge's City and the Business Men", community_id='30', summary="The community revolves around Scrooge's city, which is the home and workplace of Scrooge. The city is also the location of the 'Change, a place where merchants, referred to as the Business Men, gather. Scrooge observes these Business Men in his visions of the future, discussing the death of an unknown man.", full_content="# Scrooge's City and the Business Men\n\nThe community revolves around Scrooge's city, which is the home and workplace of Scrooge. The city is also the location of the 'Change, a place where merchants, referred to as the Business Men, gather. Scrooge observes these Business Men in his visions of the future, discussing the death of an unknown man.\n\n## Scrooge's relationship with the city\n\nScrooge is a resident of the city, where he both lives and works. He has business associates also located within the city. At one poi

Here we do the same with the text units.

In [44]:
# Load the text-unit table written by the indexing run and preview its first four rows
text_unit_df = pd.read_parquet(f"{OUTPUT_DIR}/{TEXT_UNIT_TABLE}.parquet")
text_unit_df.head(4)

Unnamed: 0,id,human_readable_id,text,n_tokens,document_ids,entity_ids,relationship_ids
0,d6583840046247f428a9f02738842a7c,1,"﻿The Project Gutenberg eBook of A Christmas Carol\n \nThis ebook is for the use of anyone anywhere in the United States and\nmost other parts of the world at no cost and with almost no restrictions\nwhatsoever. You may copy it, give it away or re-use it under the terms\nof the Project Gutenberg License included with this ebook or online\nat www.gutenberg.org. If you are not located in the United States,\nyou will have to check the laws of the country where you are located\nbefore using this eBook.\n\nTitle: A Christmas Carol\n\nAuthor: Charles Dickens\n\nIllustrator: Arthur Rackham\n\nRelease date: December 24, 2007 [eBook #24022]\n\nLanguage: English\n\nOriginal publication: Philadelphia and New York: J. B. Lippincott Company,, 1915\n\nCredits: Produced by Suzanne Shell, Janet Blenkinship and the Online\n Distributed Proofreading Team at http://www.pgdp.net\n\n\n*** START OF THE PROJECT GUTENBERG EBOOK A CHRISTMAS CAROL ***\n\n\n\n\nProduced by Suzanne Shell, Janet Blenkinship and the Online\nDistributed Proofreading Team at http://www.pgdp.net\n\n\n\n\n\n\n\n\n\n\n\n A CHRISTMAS CAROL\n\n [Illustration: _""How now?"" said Scrooge, caustic and cold as ever.\n ""What do you want with me?""_]\n\n\n A CHRISTMAS CAROL\n\n [Illustration]\n\n BY\n\n CHARLES DICKENS\n\n [Illustration]\n\n ILLUSTRATED BY ARTHUR RACKHAM\n\n [Illustration]\n\n J. B. LIPPINCOTT COMPANY PHILADELPHIA AND NEW YORK\n\n FIRST PUBLISHED 1915\n\n REPRINTED 1923, 1927, 1932, 1933, 1934, 1935, 1947, 1948, 1952, 1958,\n 1962, 1964, 1966, 1967, 1969, 1971, 1972, 1973\n\n ISBN: 0-397-00033-2\n\n PRINTED IN GREAT BRITAIN\n\n\n\n\n PREFACE\n\n I have endeavoured in this Ghostly little book to raise the Ghost of an\n Idea which shall not put my readers out of humour with themselves, with\n each other, with the season, or with me. May it haunt their house\n pleasantly, and no one wish to lay it.\n\n Their faithful Friend and Servant,\n\n C. 
D.\n\n _December, 1843._\n\n\n\n\n CHARACTERS\n\n Bob Cratchit, clerk to Ebenezer Scrooge.\n Peter Cratchit, a son of the preceding.\n Tim Cratchit (""Tiny Tim""), a cripple, youngest son of Bob Cratchit.\n Mr. Fezziwig, a kind-hearted, jovial old merchant.\n Fred, Scrooge's nephew.\n Ghost of Christmas Past, a phantom showing things past.\n Ghost of Christmas Present, a spirit of a kind, generous,\n and hearty nature.\n Ghost of Christmas Yet to Come, an apparition showing the shadows\n of things which yet may happen.\n Ghost of Jacob Marley, a spectre of Scrooge's former partner in business.\n Joe, a marine-store dealer and receiver of stolen goods.\n Ebenezer Scrooge, a grasping, covetous old man, the surviving partner\n of the firm of Scrooge and Marley.\n Mr. Topper, a bachelor.\n Dick Wilkins, a fellow apprentice of Scrooge's.\n\n Belle, a comely matron, an old sweetheart of Scrooge's.\n Caroline, wife of one of Scrooge's debtors.\n Mrs. Cratchit, wife of Bob Cratchit.\n Belinda and Martha Cratchit, daughters of the preceding.\n\n Mrs. Dilber, a laundress.\n Fan, the sister of Scrooge.\n Mrs. Fezziwig, the worthy partner of Mr. Fezziwig.\n\n\n\n\n CONTENTS\n\n STAVE ONE--MARLEY'S GHOST 3\n STAVE TWO--THE FIRST OF THE THREE SPIRITS 37\n STAVE THREE--THE SECOND OF THE THREE SPIRITS 69\n STAVE FOUR--THE LAST OF THE SPIRITS 111\n STAVE FIVE--THE END OF IT 137\n\n\n LIST OF ILLUSTRATIONS\n\n _IN COLOUR_\n\n\n ""How now?"" said Scrooge, caustic\n and cold as ever. ""What do you\n want with me?"" _Frontispiece_\n\n Bob Cratchit went down a slide on\n Cornhill, at the end of a lane of\n boys, twenty times, in honour of\n its being Christmas Eve 16\n\n Nobody under the bed; nobody in\n the closet; nobody in his dressing-gown,\n which was hanging up\n in a suspicious attitude against\n the wall 20\n\n The air was filled with phantoms,\n wandering hither and thither in\n restless haste and moaning as\n they went 32\n\n Then old Fezziwig stood out to\n dance with Mrs. 
Fezziwig 54\n\n A flushed and boisterous group 62\n\n Laden with Christmas toys and\n presents 64\n\n The way he went after that plump\n sister in the lace tucker! 100\n\n ""How are you?"" said one.\n",1200,[c305886e4aa2f6efcf64b57762777055],"[4f370d7f0d734f92b6118d88e79d886a, 8e5be5b1e63343b8a2f0af6baba70337, 73b0e0a551dc454690bad3d6756912be, cf70df771d6c464e93b7ff99d2ea8142, 3dfdd81803b74b3a8d3244af70df4f83, e478fddb40f044169a81153ee5f8bb3b, 4531def00cd0409a88e0ad3ca51bfd37, 05a8b2da0f4d469f8e3ef8b9305ec577, 6a99fcd063fa4975978c2891b9b6a4d0, 3582a02e66254a76b79d23799edd6fd5, 728d49a565b04cbdb630e64ef05ee631, a9ebcb82b5694ebf90e11747729afde1, 9208f851f93148b8aa4f5760351fe935, 79c9743b91c34addb4078147b4481c0b, ce62fb5eacf74c18a8d6b7417a028b47, fdee948a65da4ee5b6a3e25793dfab52, 267d767526244db6bdbabfd305b64665, 0c4eabfc3e684d2f8ede442fbdacdb7e, 6041ce29e51e4ee3888db60fa054e349, 87845a23037848d9b58e0be1539a1ab3, c1e8f6a1790a42cba70228fa79dbf313, 952b08ea7e8d48bd8ccc79ad78f3835c, 76d55ff1d0b94c24be81664f63c5fd16, 586193437d6d4bc2b156e5d8f073d825, 558b3c526384498786816333a727b9b6, 7a5df0bc0a71469dab1e1c922bafc911, b6b645896dc545a5920fc2c4a00357b9, 072fbc692fe34e069bea21d95e2b4121, 31e3b8674621489cb3bd0689a4630f60, 4bc5ea10dc1342c497eac5a7aef14f7a, 7f32030293ab434eb8c5730ed22c41c8, 256c91e05ad44aff89255d345553a6ec]","[27f628f3ac9e44cfb4fba4efb6104d88, 2a238fb19702455ea908bc7f6cb646ba, 7ce4b0c4ced84a5cbfb5d689b76dd463, 6b62957820284e01b5dddab42da62c14, f04cb909b8264316956780a5aff3e7ab, 520e5b61d354470da4057726eb3c2e55, c1e5881730ca4132a03d6ca9be155fbd, 86eecbc0e64d44d8a9f161e6c7d9e528, 7661c462c78b48be9f9d9f6a4b94147e, c115f823d8cb4060a09352bd9d912bae, cc2cdab573904a9cb0efb55df3d93eef, a71e256151f64f588f6d0b073a531c5b, 5690b7ae4b5c4ad9b8a403eb458de263, 3528c2ba352847be9b9b06c1756bb7a1, fcf4d46f529e4e47a3a4af65761fd197, 293333ff826345d4829269755c461c6c, 9114fce3a9e945aa8e7ded44b868d392, 00b1ff2d90bf428ba03987884cef56f4, bd8580b4c0d84b11ac071bd87f5a6167, 
f04f995f4ce54013a018d5cf7fd5e6bb]"
1,10730234d6ccc7cee08f3cfc58d8a9a1,2,"and thither in\n restless haste and moaning as\n they went 32\n\n Then old Fezziwig stood out to\n dance with Mrs. Fezziwig 54\n\n A flushed and boisterous group 62\n\n Laden with Christmas toys and\n presents 64\n\n The way he went after that plump\n sister in the lace tucker! 100\n\n ""How are you?"" said one.\n ""How are you?"" returned the other.\n ""Well!"" said the first. ""Old\n Scratch has got his own at last,\n hey?"" 114\n\n ""What do you call this?"" said Joe.\n ""Bed-curtains!"" ""Ah!"" returned\n the woman, laughing....\n ""Bed-curtains!""\n\n ""You don't mean to say you took\n 'em down, rings and all, with him\n lying there?"" said Joe.\n\n ""Yes, I do,"" replied the woman.\n ""Why not?"" 120\n\n ""It's I, your uncle Scrooge. I have\n come to dinner. Will you let\n me in, Fred?"" 144\n\n ""Now, I'll tell you what, my friend,""\n said Scrooge. ""I am not going\n to stand this sort of thing any\n longer."" 146\n\n[Illustration]\n\n_IN BLACK AND WHITE_\n\n\n Tailpiece vi\n Tailpiece to List of Coloured Illustrations x\n Tailpiece to List of Black and White Illustrations xi\n Heading to Stave One 3\n They were portly gentlemen, pleasant to behold 12\n On the wings of the wind 28-29\n Tailpiece to Stave One 34\n Heading to Stave Two 37\n He produced a decanter of curiously\n light wine and a block of curiously heavy cake 50\n She left him, and they parted 60\n Tailpiece to Stave Two 65\n Heading to Stave Three 69\n There was nothing very cheerful in the climate 75\n He had been Tim's blood-horse all the way from church 84-85\n With the pudding 88\n Heading to Stave Four 111\n Heading to Stave Five 137\n Tailpiece to Stave Five 147\n\n[Illustration]\n\n\nSTAVE ONE\n\n\n[Illustration]\n\n\n\n\nMARLEY'S GHOST\n\n\nMarley was dead, to begin with. There is no doubt whatever about that.\nThe register of his burial was signed by the clergyman, the clerk, the\nundertaker, and the chief mourner. Scrooge signed it. 
And Scrooge's name\nwas good upon 'Change for anything he chose to put his hand to. Old\nMarley was as dead as a door-nail.\n\nMind! I don't mean to say that I know of my own knowledge, what there is\nparticularly dead about a door-nail. I might have been inclined, myself,\nto regard a coffin-nail as the deadest piece of ironmongery in the\ntrade. But the wisdom of our ancestors is in the simile; and my\nunhallowed hands shall not disturb it, or the country's done for. You\nwill, therefore, permit me to repeat, emphatically, that Marley was as\ndead as a door-nail.\n\nScrooge knew he was dead? Of course he did. How could it be otherwise?\nScrooge and he were partners for I don't know how many years. Scrooge\nwas his sole executor, his sole administrator, his sole assign, his sole\nresiduary legatee, his sole friend, and sole mourner. And even Scrooge\nwas not so dreadfully cut up by the sad event but that he was an\nexcellent man of business on the very day of the funeral, and solemnised\nit with an undoubted bargain.\n\nThe mention of Marley's funeral brings me back to the point I started\nfrom. There is no doubt that Marley was dead. This must be distinctly\nunderstood, or nothing wonderful can come of the story I am going to\nrelate. If we were not perfectly convinced that Hamlet's father died\nbefore the play began, there would be nothing more remarkable in his\ntaking a stroll at night, in an easterly wind, upon his own ramparts,\nthan there would be in any other middle-aged gentleman rashly turning\nout after dark in a breezy spot--say St. Paul's Churchyard, for\ninstance--literally to astonish his son's weak mind.\n\nScrooge never painted out Old Marley's name. There it stood, years\nafterwards, above the warehouse door: Scrooge and Marley. The firm was\nknown as Scrooge and Marley. Sometimes people new to the business called\nScrooge Scrooge, and sometimes Marley, but he answered to both names. It\nwas all the same to him.\n\nOh! 
but he was a tight-fisted hand at the grindstone, Scrooge! a\nsqueezing, wrenching, grasping, scraping, clutching, covetous old\nsinner! Hard and sharp as flint, from which no steel had ever struck out\ngenerous fire; secret, and self-contained, and solitary as an oyster.\nThe cold within him froze his old features, nipped his pointed nose,\nshrivelled his cheek, stiffened his gait; made his",1200,[c305886e4aa2f6efcf64b57762777055],"[ce62fb5eacf74c18a8d6b7417a028b47, 87845a23037848d9b58e0be1539a1ab3, 7f32030293ab434eb8c5730ed22c41c8, d43a4bf0218041a6a0d39fa269891322, b7fde61247a54514a9482ae2a9b4efae, 07983dc4ae694ee6beade2959a611354, 27d3bbc17fb6423b933d0263b178ec1a, bd6d54a7a8d94de1bded3c5807a933ad, d9f4a9704b20480db8b8c4e3899a3751, e9829f782df6461691c81a1b206ef95a, 5c6faac584694fc094a9552e25452149, a576fece771e4165b738d532bce5b66a, d99b325cde304d31bcdb16ba4cd3abbf, 1fbdeede6a0449fe8b295ed5cfba6cd3]","[bbd1bb9e7bca472e80b37960694a882f, 7943536f667640b79b7aec77eca9748a, 7e15bcf26eee414289be20ff13434bea, 8b8abfd8da2340b6a1fb4253419c81c8, 8e7b10cd2540402c99a3dc5cbb07b61b, 3b183b6002c54888af6712e4daaa2fe3, 82ffa2ba82964352b71f9eef693a5d62, a0d5844ea436452cac765c27db8e86ed, 47aa79ed2d5c4bbba126f9d115acf905, 74c9bb75e92d416a974d6716ebb9a038]"
2,980594a50d68db06e6ca257bdb9ae95e,3,"-fisted hand at the grindstone, Scrooge! a\nsqueezing, wrenching, grasping, scraping, clutching, covetous old\nsinner! Hard and sharp as flint, from which no steel had ever struck out\ngenerous fire; secret, and self-contained, and solitary as an oyster.\nThe cold within him froze his old features, nipped his pointed nose,\nshrivelled his cheek, stiffened his gait; made his eyes red, his thin\nlips blue; and spoke out shrewdly in his grating voice. A frosty rime\nwas on his head, and on his eyebrows, and his wiry chin. He carried his\nown low temperature always about with him; he iced his office in the\ndog-days, and didn't thaw it one degree at Christmas.\n\nExternal heat and cold had little influence on Scrooge. No warmth could\nwarm, no wintry weather chill him. No wind that blew was bitterer than\nhe, no falling snow was more intent upon its purpose, no pelting rain\nless open to entreaty. Foul weather didn't know where to have him. The\nheaviest rain, and snow, and hail, and sleet could boast of the\nadvantage over him in only one respect. They often 'came down'\nhandsomely, and Scrooge never did.\n\nNobody ever stopped him in the street to say, with gladsome looks, 'My\ndear Scrooge, how are you? When will you come to see me?' No beggars\nimplored him to bestow a trifle, no children asked him what it was\no'clock, no man or woman ever once in all his life inquired the way to\nsuch and such a place, of Scrooge. Even the blind men's dogs appeared to\nknow him; and, when they saw him coming on, would tug their owners into\ndoorways and up courts; and then would wag their tails as though they\nsaid, 'No eye at all is better than an evil eye, dark master!'\n\nBut what did Scrooge care? It was the very thing he liked. 
To edge his\nway along the crowded paths of life, warning all human sympathy to keep\nits distance, was what the knowing ones call 'nuts' to Scrooge.\n\nOnce upon a time--of all the good days in the year, on Christmas\nEve--old Scrooge sat busy in his counting-house. It was cold, bleak,\nbiting weather; foggy withal; and he could hear the people in the court\noutside go wheezing up and down, beating their hands upon their breasts,\nand stamping their feet upon the pavement stones to warm them. The City\nclocks had only just gone three, but it was quite dark already--it had\nnot been light all day--and candles were flaring in the windows of the\nneighbouring offices, like ruddy smears upon the palpable brown air. The\nfog came pouring in at every chink and keyhole, and was so dense\nwithout, that, although the court was of the narrowest, the houses\nopposite were mere phantoms. To see the dingy cloud come drooping down,\nobscuring everything, one might have thought that nature lived hard by,\nand was brewing on a large scale.\n\nThe door of Scrooge's counting-house was open, that he might keep his\neye upon his clerk, who in a dismal little cell beyond, a sort of tank,\nwas copying letters. Scrooge had a very small fire, but the clerk's fire\nwas so very much smaller that it looked like one coal. But he couldn't\nreplenish it, for Scrooge kept the coal-box in his own room; and so\nsurely as the clerk came in with the shovel, the master predicted that\nit would be necessary for them to part. Wherefore the clerk put on his\nwhite comforter, and tried to warm himself at the candle; in which\neffort, not being a man of strong imagination, he failed.\n\n'A merry Christmas, uncle! God save you!' cried a cheerful voice. It was\nthe voice of Scrooge's nephew, who came upon him so quickly that this\nwas the first intimation he had of his approach.\n\n'Bah!' said Scrooge. 
'Humbug!'\n\nHe had so heated himself with rapid walking in the fog and frost, this\nnephew of Scrooge's, that he was all in a glow; his face was ruddy and\nhandsome; his eyes sparkled, and his breath smoked again.\n\n'Christmas a humbug, uncle!' said Scrooge's nephew. 'You don't mean\nthat, I am sure?'\n\n'I do,' said Scrooge. 'Merry Christmas! What right have you to be merry?\nWhat reason have you to be merry? You're poor enough.'\n\n'Come, then,' returned the nephew gaily. 'What right have you to be\ndismal? What reason have you to be morose? You're rich enough.'\n\nScrooge, having no better answer ready on the spur of the moment, said,\n'Bah!' again; and followed it up with 'Humbug!'\n\n'Don't be cross, uncle!' said the nephew.\n\n'What else can I be,' returned the uncle, 'when I live in such a world\nof fools as this? Merry Christmas! Out upon merry Christmas! What's\nChristmas-time to you but a time for paying bills without money; a time\nfor finding yourself a year older, and not an hour richer; a time for\nbalancing your books",1200,[c305886e4aa2f6efcf64b57762777055],"[b7fde61247a54514a9482ae2a9b4efae, 5b932f4d5c544b648d35af136acfd1af, 555a6f653f454ff4bb07e48ae79a540d, 7f2701241d404a9699b1c0587d07c03e, a736c92e282d41aaaa1e390030317dd3, 570173a4496b4b46a5e81f5ee799733f]","[e1cff2a7aec64a7b90c8d90a6ad6e0fe, f49e640cff9746d0a0bc348acc6ed12b, 2959223763ce4a8f8713bed4444afb0d, e3762833f4094197b62132870ca5c68a, 3dc12ba290824592b809481bf0b21903, 97e033ade5ed4330a51f8873db799e63]"
3,080d8e696ff38c653ca90fa086415e74,4,"'Bah!' again; and followed it up with 'Humbug!'\n\n'Don't be cross, uncle!' said the nephew.\n\n'What else can I be,' returned the uncle, 'when I live in such a world\nof fools as this? Merry Christmas! Out upon merry Christmas! What's\nChristmas-time to you but a time for paying bills without money; a time\nfor finding yourself a year older, and not an hour richer; a time for\nbalancing your books, and having every item in 'em through a round dozen\nof months presented dead against you? If I could work my will,' said\nScrooge indignantly, 'every idiot who goes about with ""Merry Christmas""\non his lips should be boiled with his own pudding, and buried with a\nstake of holly through his heart. He should!'\n\n'Uncle!' pleaded the nephew.\n\n'Nephew!' returned the uncle sternly, 'keep Christmas in your own way,\nand let me keep it in mine.'\n\n'Keep it!' repeated Scrooge's nephew. 'But you don't keep it.'\n\n'Let me leave it alone, then,' said Scrooge. 'Much good may it do you!\nMuch good it has ever done you!'\n\n'There are many things from which I might have derived good, by which I\nhave not profited, I dare say,' returned the nephew; 'Christmas among\nthe rest. But I am sure I have always thought of Christmas-time, when\nit has come round--apart from the veneration due to its sacred name and\norigin, if anything belonging to it can be apart from that--as a good\ntime; a kind, forgiving, charitable, pleasant time; the only time I know\nof, in the long calendar of the year, when men and women seem by one\nconsent to open their shut-up hearts freely, and to think of people\nbelow them as if they really were fellow-passengers to the grave, and\nnot another race of creatures bound on other journeys. And therefore,\nuncle, though it has never put a scrap of gold or silver in my pocket, I\nbelieve that it _has_ done me good and _will_ do me good; and I say, God\nbless it!'\n\nThe clerk in the tank involuntarily applauded. 
Becoming immediately\nsensible of the impropriety, he poked the fire, and extinguished the\nlast frail spark for ever.\n\n'Let me hear another sound from _you_,' said Scrooge, 'and you'll keep\nyour Christmas by losing your situation! You're quite a powerful\nspeaker, sir,' he added, turning to his nephew. 'I wonder you don't go\ninto Parliament.'\n\n'Don't be angry, uncle. Come! Dine with us to-morrow.'\n\nScrooge said that he would see him----Yes, indeed he did. He went the\nwhole length of the expression, and said that he would see him in that\nextremity first.\n\n'But why?' cried Scrooge's nephew. 'Why?'\n\n'Why did you get married?' said Scrooge.\n\n'Because I fell in love.'\n\n'Because you fell in love!' growled Scrooge, as if that were the only\none thing in the world more ridiculous than a merry Christmas. 'Good\nafternoon!'\n\n'Nay, uncle, but you never came to see me before that happened. Why give\nit as a reason for not coming now?'\n\n'Good afternoon,' said Scrooge.\n\n'I want nothing from you; I ask nothing of you; why cannot we be\nfriends?'\n\n'Good afternoon!' said Scrooge.\n\n'I am sorry, with all my heart, to find you so resolute. We have never\nhad any quarrel to which I have been a party. But I have made the trial\nin homage to Christmas, and I'll keep my Christmas humour to the last.\nSo A Merry Christmas, uncle!'\n\n'Good afternoon,' said Scrooge.\n\n'And A Happy New Year!'\n\n'Good afternoon!' said Scrooge.\n\nHis nephew left the room without an angry word, notwithstanding. He\nstopped at the outer door to bestow the greetings of the season on the\nclerk, who, cold as he was, was warmer than Scrooge; for he returned\nthem cordially.\n\n'There's another fellow,' muttered Scrooge, who overheard him: 'my\nclerk, with fifteen shillings a week, and a wife and family, talking\nabout a merry Christmas. I'll retire to Bedlam.'\n\nThis lunatic, in letting Scrooge's nephew out, had let two other people\nin. 
They were portly gentlemen, pleasant to behold, and now stood, with\ntheir hats off, in Scrooge's office. They had books and papers in their\nhands, and bowed to him.\n\n'Scrooge and Marley's, I believe,' said one of the gentlemen, referring\nto his list. 'Have I the pleasure of addressing Mr. Scrooge, or Mr.\nMarley?'\n\n'Mr. Marley has been dead these seven years,' Scrooge replied. 'He died\nseven years ago, this very night.'\n\n'We have no doubt his liberality is well represented by his surviving\npartner,' said the gentleman, presenting his credentials.\n\n[Illustration: THEY WERE PORTLY GENTLEMEN, PLEASANT TO BEHOLD]\n\nIt certainly was; for they had been two kindred spirits. At the ominous\nword 'liberality' Scrooge frowned, and shook his head, and handed the\ncredentials back.\n\n'At this festive season of the year, Mr. Scro",1200,[c305886e4aa2f6efcf64b57762777055],"[b7fde61247a54514a9482ae2a9b4efae, 07983dc4ae694ee6beade2959a611354, 570173a4496b4b46a5e81f5ee799733f, c03c1461548c4e20a594dd399578cb73, 6d509d0e429848779faa5d7b308aa638, 25da083f02e8473ab00632d2c39b72ca, 2ee5c5ca9ecf461da9706e2f023b4653]","[7e15bcf26eee414289be20ff13434bea, 3dc12ba290824592b809481bf0b21903, a40b5cdd5ed84072a1b4fcf320cfa206, f872580ff75a4a4e99f69bff5d03b143, 9e4184f850a4494596a452942189f487, f8b67c70b90a4151a69dcf187788c9bb, f2a75055521540fb8be35fd8fc5d72a3, 4366a5945f5b43fe8e614afaa02cda62]"


In [45]:
text_units = read_indexer_text_units(text_unit_df)
text_units

[TextUnit(id='d6583840046247f428a9f02738842a7c', short_id='0', text='\ufeffThe Project Gutenberg eBook of A Christmas Carol\n    \nThis ebook is for the use of anyone anywhere in the United States and\nmost other parts of the world at no cost and with almost no restrictions\nwhatsoever. You may copy it, give it away or re-use it under the terms\nof the Project Gutenberg License included with this ebook or online\nat www.gutenberg.org. If you are not located in the United States,\nyou will have to check the laws of the country where you are located\nbefore using this eBook.\n\nTitle: A Christmas Carol\n\nAuthor: Charles Dickens\n\nIllustrator: Arthur Rackham\n\nRelease date: December 24, 2007 [eBook #24022]\n\nLanguage: English\n\nOriginal publication: Philadelphia and New York: J. B. Lippincott Company,, 1915\n\nCredits: Produced by Suzanne Shell, Janet Blenkinship and the Online\n        Distributed Proofreading Team at http://www.pgdp.net\n\n\n*** START OF THE PROJECT GUTENBERG EBOOK

<div style="text-align: justify;">

This code block sets up everything needed to run queries with the GraphRAG approach. First, it reads and processes the indexing outputs: entities, embeddings, community reports, and text units are loaded from Parquet files and converted into data structures aligned with our knowledge model.

Next, it connects to a vector database (LanceDB) through an object named `description_embedding_store`, which holds the vector representations of the entity descriptions.

The code then configures the interface to the OpenAI models, which in this example are served through an Azure OpenAI endpoint: it defines the API key (left blank here for security reasons), the LLM (GPT-4-32k), and the embedding model (text-embedding-ada-002). It also creates a token encoder with the `tiktoken` library, which is needed to keep text lengths within the model's token limit.

With these pieces in place, it builds a `LocalSearchMixedContext` object, which assembles the context for each query. This object combines the community reports, text units, entities, and relationships (and optionally the covariates) with the embedding store, so the relevant context can be composed from several sources of information.

Finally, two sets of parameters are defined: one for context construction (`local_context_params`), which includes the share each source contributes and the token limits, and one for the LLM (`llm_params`), with settings such as the maximum number of tokens and the generation temperature. With these parameters, the local search engine (`LocalSearch`) is instantiated; this is the component that generates answers from the assembled context and returns them as multiple paragraphs.

In short, this code ties together the indexed data, the vector database connection, the OpenAI model configuration, and the construction of the search context, and ends by creating a search engine that can answer queries.

</div>
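The proportions in `local_context_params` can be read as a split of a single token budget. The following is a minimal sketch of that arithmetic, assuming that whatever is not assigned to text units or community reports is left for entities and relationships. The function name `split_token_budget` and the dictionary keys are hypothetical and are not part of the GraphRAG library; the input values are the ones used in the setup cell.

```python
def split_token_budget(
    max_tokens: int, text_unit_prop: float, community_prop: float
) -> dict[str, int]:
    """Split a context token budget across the three context sources.

    Assumption: the share not given to text units or community reports
    is what remains for entities and relationships.
    """
    if text_unit_prop < 0 or community_prop < 0:
        raise ValueError("proportions must be non-negative")
    if text_unit_prop + community_prop > 1:
        raise ValueError("text_unit_prop + community_prop must not exceed 1")

    text_unit_tokens = int(max_tokens * text_unit_prop)
    community_tokens = int(max_tokens * community_prop)
    # The remainder goes to entities and relationships (assumed behaviour).
    local_tokens = max_tokens - text_unit_tokens - community_tokens
    return {
        "text_units": text_unit_tokens,
        "community_reports": community_tokens,
        "entities_and_relationships": local_tokens,
    }


# Values taken from the notebook's local_context_params.
budget = split_token_budget(max_tokens=1_000, text_unit_prop=0.5, community_prop=0.1)
print(budget)
```

Under this assumption, a context budget of 1,000 tokens leaves 500 tokens for text units, 100 for community reports, and 400 for entities and relationships.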

In [46]:
# setup (see also ../../local_search.ipynb)
entities = read_indexer_entities(entity_df, entity_embedding_df, COMMUNITY_LEVEL)

description_embedding_store = LanceDBVectorStore(
    collection_name="default-entity-description",
)
description_embedding_store.connect(db_uri=LANCEDB_URI)
#covariate_df = pd.read_parquet(f"{OUTPUT_DIR}/{COVARIATE_TABLE}.parquet")
#claims = read_indexer_covariates(covariate_df)
#covariates = {"claims": claims}
report_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
text_unit_df = pd.read_parquet(f"{OUTPUT_DIR}/{TEXT_UNIT_TABLE}.parquet")
text_units = read_indexer_text_units(text_unit_df)

api_key = " "  # left blank on purpose; supply your own key before running
llm_model = "gpt-4-32k"

embedding_model = "text-embedding-ada-002"

llm = ChatOpenAI(
    api_key=api_key,
    model=llm_model,
    api_version="2024-02-15-preview",
    api_base="https://azureopenaimark3.openai.azure.com/",
    deployment_name="gpt-4-32k",
    api_type=OpenaiApiType.AzureOpenAI,
    max_retries=20,
)

token_encoder = tiktoken.get_encoding("cl100k_base")

text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base="https://azureopenaimark3.openai.azure.com/",
    api_version="2024-02-15-preview",
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name="embedding-ada-v2",
    max_retries=20,
)

context_builder = LocalSearchMixedContext(
    community_reports=reports,
    text_units=text_units,
    entities=entities,
    relationships=relationships,
    #covariates=covariates,
    entity_text_embeddings=description_embedding_store,
    embedding_vectorstore_key=EntityVectorStoreKey.ID,  # if the vectorstore uses entity title as ids, set this to EntityVectorStoreKey.TITLE
    text_embedder=text_embedder,
    token_encoder=token_encoder,
)

local_context_params = {
    "text_unit_prop": 0.5,  # share of the context budget given to text units
    "community_prop": 0.1,  # share of the context budget given to community reports
    "conversation_history_max_turns": 5,
    "conversation_history_user_turns_only": True,
    "top_k_mapped_entities": 10,
    "top_k_relationships": 10,
    "include_entity_rank": True,
    "include_relationship_weight": True,
    "include_community_rank": False,
    "return_candidate_context": False,
    "embedding_vectorstore_key": EntityVectorStoreKey.ID,  # set this to EntityVectorStoreKey.TITLE if the vectorstore uses entity title as ids
    "max_tokens": 1_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
    "requests_per_minute": 1_000,
}

llm_params = {
    "max_tokens": 1_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500)
    "temperature": 0.0,
}

search_engine = LocalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    llm_params=llm_params,
    context_builder_params=local_context_params,
    response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
)

We tried this query to evaluate the system's performance. We would have liked to experiment with the parameters, such as the share each source contributes to the context or the token limit, in order to tune the configuration. However, because every query to the OpenAI API has a cost, extensive testing with different parameter combinations was not possible.
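One way to keep such an experiment within a fixed budget is to enumerate the candidate settings first and bound the work before sending anything to the API. The sketch below is only an illustration: the candidate values, the helper name `candidate_settings`, and the per-query bound (context `max_tokens` plus response `max_tokens`, ignoring the system prompt and the question itself) are assumptions, not measurements.

```python
from itertools import product


def candidate_settings(text_unit_props, community_props, max_tokens_options):
    """Yield context settings that leave room for entities and relationships."""
    for text_unit_prop, community_prop, max_tokens in product(
        text_unit_props, community_props, max_tokens_options
    ):
        # Skip combinations where the two shares use up the whole budget.
        if text_unit_prop + community_prop < 1.0:
            yield {
                "text_unit_prop": text_unit_prop,
                "community_prop": community_prop,
                "max_tokens": max_tokens,
            }


# Hypothetical grid; the values are chosen only for illustration.
settings = list(
    candidate_settings(
        text_unit_props=[0.3, 0.5, 0.7],
        community_props=[0.1, 0.2],
        max_tokens_options=[1_000, 2_000],
    )
)

response_max_tokens = 1_000  # same value as llm_params["max_tokens"] in the setup cell

# Rough upper bound: one query per setting, context budget plus response budget.
token_upper_bound = sum(s["max_tokens"] + response_max_tokens for s in settings)
print(f"{len(settings)} queries, at most about {token_upper_bound} tokens in total")
```

Each yielded dictionary could be merged over `local_context_params` to build the `context_builder_params` for one run, so the size of the sweep is known before any paid request is made.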

In [47]:
result = await search_engine.asearch("Tell me about Scrooge")
print(result.response)

Scrooge is a prominent entity in the data set, appearing in various forms and with a significant number of relationships. The entity "Scrooge" has the highest number of relationships, with a total of 123 [Data: Entities (33)]. This suggests that Scrooge is a central figure or character in the context of the data.

There are also other entities that seem to be related to Scrooge. "Ebenezer Scrooge" has 12 relationships [Data: Entities (20)], "Master Scrooge" has 4 relationships [Data: Entities (105)], and "Uncle Scrooge" has 2 relationships [Data: Entities (174)]. This could indicate that these are different titles or roles that Scrooge has, or they could be different characters entirely.

There are also entities that seem to be related to Scrooge in a familial or professional context. "Mr. Scrooge's Nephew" has 3 relationships [Data: Entities (203)], and "Scrooge's Nephew" has 4 relationships [Data: Entities (43)]. This suggests that Scrooge has a nephew who is also a significant chara