## Notes on Graph RAGs

#### What is Graph RAG?
A Graph RAG augments the standard Retrieval-Augmented Generation (RAG) framework by introducing a knowledge graph into the retrieval step. This graph provides semantic, logical, or structural relationships between documents or concepts—something a vanilla vector search might miss.

#### Core Architecture Pattern

1. Initial Dense Retrieval (Vector Search)
     - We perform a semantic vector search using the query to retrieve a small, high-relevance core set of documents (D_core)
     - These documents are typically nodes in a graph, representing either the documents themselves or entities within the documents
     - This stage ensures precision in matching query intent with text
  
2. Graph traversal for context expansion
     - Use the knowledge graph to expand the context by exploring neighbouring nodes of D_core

3. Contextual re-ranking and filtering
     - After traversal, we may have hundreds or thousands of candidate documents
     - Use ranking mechanisms such as PageRank, betweenness centrality, or cross-encoders (e.g., BERT-based) to score and filter them

4. Context packing and prompt construction
     - Curate the final subset (say, top 10-30 documents) for inclusion in LLM prompt

5. LLM Generation
     - The LLM (e.g., GPT, Claude, etc.) takes the curated, context-rich documents and generates answers
     - The increased context density (via graph expansion) improves factual grounding and reasoning
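
Steps 1 to 4 above can be sketched in plain Python. This is a minimal toy illustration, not how any particular library implements Graph RAG: the embeddings, the graph, and every function name (`dense_retrieve`, `expand`, `rerank`, `build_prompt`) are hypothetical. For re-ranking it uses a simple stand-in (cosine similarity discounted by hop distance) rather than PageRank, betweenness, or a cross-encoder, and the LLM call in step 5 is left out.

```python
import math
from collections import deque

# Toy corpus: document id -> made-up embedding vector (assumption for illustration).
EMBEDDINGS = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.8, 0.2, 0.1],
    "doc_c": [0.1, 0.9, 0.0],
    "doc_d": [0.0, 0.2, 0.9],
    "doc_e": [0.5, 0.5, 0.1],
}

# Toy knowledge graph: undirected adjacency list between documents.
GRAPH = {
    "doc_a": ["doc_c", "doc_e"],
    "doc_b": ["doc_e"],
    "doc_c": ["doc_a", "doc_d"],
    "doc_d": ["doc_c"],
    "doc_e": ["doc_a", "doc_b"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def dense_retrieve(query_vec, k=2):
    """Step 1: return the k documents most similar to the query (D_core)."""
    ranked = sorted(
        EMBEDDINGS, key=lambda d: cosine(query_vec, EMBEDDINGS[d]), reverse=True
    )
    return ranked[:k]


def expand(core, hops=1):
    """Step 2: breadth-first expansion; returns {doc_id: hop distance from D_core}."""
    dist = {d: 0 for d in core}
    queue = deque(core)
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not walk past the hop limit
        for neighbour in GRAPH.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def rerank(query_vec, dist, decay=0.5):
    """Step 3 (stand-in): similarity to the query, discounted per hop."""
    scores = {
        d: cosine(query_vec, EMBEDDINGS[d]) * decay**h for d, h in dist.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def build_prompt(question, doc_ids, top_n=3):
    """Step 4: pack the top_n documents (ids as placeholders for text) into a prompt."""
    context = "\n".join(f"[{d}]" for d in doc_ids[:top_n])
    return f"Context:\n{context}\n\nQuestion: {question}"


query_vec = [1.0, 0.0, 0.0]  # pretend this is the embedded user query
core = dense_retrieve(query_vec, k=2)
candidates = expand(core, hops=1)
ranked = rerank(query_vec, candidates)
prompt = build_prompt("What is Graph RAG?", ranked, top_n=3)
print(prompt)
```

In this toy run, `doc_e` and `doc_c` enter the candidate set only through graph edges from the core documents, which is the context a pure vector search with `k=2` would have missed. In a real system the hop-decay score would typically be replaced by one of the ranking mechanisms listed in step 3.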

#### Benefits of Graph RAG in combination with a vector search
1. Accuracy - improved by almost 3x, because the context provided to the LLM is enriched with the connections to surrounding documents and with the ranking of those documents
2. Easier development - if a knowledge graph already exists
3. Explainability and governance (see diagram below)

![image.png](attachment:2d4be40c-8cda-46f8-99a2-88d0b281cb37.png)

## dlt
dlt is an open-source Python library that loads data from various, often messy data sources into well-structured, live datasets.

In the workshop, dlt is used in the first demo to load a dataset into a local DuckDB destination, and in the second demo to load data from a REST API source into the local file system.


### Cognee
Cognee is a tool for turning data into a queryable knowledge graph that acts as memory.

It lets you:
- Add structured or unstructured data
- Automatically build a knowledge graph from it
- Ask natural-language questions and get grounded, context-rich results