4 changes: 4 additions & 0 deletions experimental/README.md
@@ -43,6 +43,10 @@ Experimental examples are sample code and deployments for RAG pipelines that are

This example is able to ingest PDFs, PowerPoint slides, Word and other documents with complex data formats including text, images, slides and tables. It allows users to ask questions through a text interface and optionally with an image query, and it can respond with text and reference images, slides and tables in its response, along with source links and downloads.

* [NVIDIA Knowledge Graph RAG](./knowledge_graph_rag)

This example implements a GPU-accelerated pipeline for creating and querying knowledge graphs using Retrieval-Augmented Generation (RAG). The approach leverages NVIDIA's AI technologies and the RAPIDS ecosystem to process large-scale datasets efficiently. It allows users to interact through a chat interface, visualize the corresponding knowledge graph, and perform evaluations against synthetic data generated with NVIDIA's Nemotron-4 340B model.

* [Run RAG-LLM in Azure Machine Learning](./AzureML)

This example shows the configuration changes to using Docker containers and local GPUs that are required
170 changes: 170 additions & 0 deletions experimental/knowledge_graph_rag/README.md
@@ -0,0 +1,170 @@
# Knowledge Graphs for RAG with NVIDIA AI Foundation Models and Endpoints

This repository implements a GPU-accelerated pipeline for creating and querying knowledge graphs using Retrieval-Augmented Generation (RAG). Our approach leverages NVIDIA's AI technologies and the RAPIDS ecosystem to process large-scale datasets efficiently.

## Overview

This project demonstrates:
- Creation of knowledge graphs from various document sources
- A simple script to download research papers from arXiv for a given topic
- GPU-accelerated graph processing and analysis using [cuGraph](https://github.com/rapidsai/cugraph), NVIDIA's RAPIDS graph analytics library
- Hybrid semantic search combining keyword and dense vector approaches
- Integration of knowledge graphs into RAG workflows
- Visualization of the knowledge graph through [Gephi-Lite](https://github.com/gephi/gephi-lite), an open-source web app for visualization of large graphs
- Comprehensive evaluation metrics using NVIDIA's Nemotron-4 340B model for synthetic data generation and reward scoring

## Technologies Used

- **Frontend**: Streamlit
- **Graph Representation and Optimization**: cuGraph (RAPIDS), NetworkX
- **Vector Database**: Milvus
- **LLMs**:
  - NVIDIA AI Playground hosted models for graph creation and querying, providing numerous instruct-fine-tuned options
  - NVIDIA AI Playground hosted Nemotron-4 340B model for synthetic data generation and evaluation reward scoring

## Architecture Diagram

The ingestion system is designed around a high-throughput hosted LLM deployment that can process multiple document chunks in parallel, as shown below. The LLM can optionally be fine-tuned for triple extraction, which allows a shorter prompt and enables greater accuracy and optimized inference.

```mermaid
graph TD
A[Document Collection] --> B{Document Splitter}
B --> |Chunk 1| C1[LLM Stream 1]
B --> |Chunk 2| C2[LLM Stream 2]
B --> |Chunk 3| C3[LLM Stream 3]
B --> |...| C4[...]
B --> |Chunk N| C5[LLM Stream N]
C1 --> D[Response Parser<br>and Aggregator]
C2 --> D
C3 --> D
C4 --> D
C5 --> D
D --> E[GraphML Generator]
E --> F[Single GraphML File]
```
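
To make the fan-out concrete, here is a minimal Python sketch of the pattern using `langchain-nvidia-ai-endpoints`; the prompt format and helper names are illustrative assumptions, not the pipeline's actual implementation (which lives in `utils/lc_graph.py`):

```python
from concurrent.futures import ThreadPoolExecutor

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Illustrative prompt; the shipped pipeline builds its own prompt internally.
TRIPLE_PROMPT = (
    "Extract (subject, relation, object) triples from the text below. "
    "Return one triple per line as: subject | relation | object\n\n{chunk}"
)

llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")

def extract_triples_from_chunk(chunk):
    """Send one chunk to the hosted LLM and parse pipe-delimited triples."""
    response = llm.invoke(TRIPLE_PROMPT.format(chunk=chunk))
    triples = []
    for line in response.content.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

def extract_all(chunks, max_workers=8):
    """Fan chunks out to parallel LLM calls, then aggregate the parsed triples."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = pool.map(extract_triples_from_chunk, chunks)
    return [t for chunk_triples in per_chunk for t in chunk_triples]
```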

Here's how the inference system is designed, incorporating hybrid search (dense vector plus sparse keyword), reranking, and a knowledge graph for multi-hop search:

```mermaid
graph LR
E(User Query) --> A(FRONTEND<br/>Chat UI<br/>Streamlit)
A --Dense-Sparse<br>Retrieval--> C(Milvus Vector DB)
A --Multi-hop<br>Search--> F(Knowledge Graph <br> with cuGraph)
C --Hybrid<br>Chunks--> X(Reranker)
X -- Augmented<br/>Prompt--> B((Hosted LLM API<br/>NVIDIA AI Playground))
F -- Graph Context<br>Triples--> B
B --> D(Streaming<br/>Chat Response)
```

This architecture shows how the user query is processed through both the Milvus Vector DB for traditional retrieval and the Knowledge Graph with cuGraph for multi-hop search. The results from both are then used to augment the prompt sent to the NVIDIA AI Playground backend.
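
As a rough sketch of the prompt-augmentation step, the snippet below merges reranked chunks from the vector search with relation triples from the graph neighborhood of the query's entities; the helper signatures, hop depth, and prompt wording are assumptions rather than this repository's exact API:

```python
import networkx as nx
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
G = nx.read_graphml("knowledge_graph.graphml")  # produced by the ingestion app

def graph_context(entities, hops=2):
    """Collect relation triples within `hops` of each query entity (multi-hop search)."""
    triples = []
    for entity in entities:
        if entity not in G:
            continue
        reachable = nx.single_source_shortest_path_length(G, entity, cutoff=hops)
        for u, v, data in G.edges(data=True):
            if u in reachable:
                triples.append(f"({u}, {data.get('relation', '?')}, {v})")
    return triples

def answer(query, entities, reranked_chunks):
    """Augment the prompt with both retrieval sources before calling the LLM."""
    context = "\n".join(reranked_chunks + graph_context(entities))
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt).content
```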

## Setup Steps

Follow these steps to get the chatbot up and running in less than 5 minutes:

### 1. Clone this repository to a Linux machine

```bash
git clone https://github.com/NVIDIA/GenerativeAIExamples/ && cd GenerativeAIExamples/experimental/knowledge_graph_rag
```

### 2. Get an NVIDIA AI Playground API Key

```bash
export NVIDIA_API_KEY="nvapi-*******************"
```

If you don't have an API key, follow [these instructions](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/api-catalog.md#get-an-api-key-for-the-accessing-models-on-the-api-catalog) to sign up for an NVIDIA AI Foundation developer account and obtain access.

### 3. Create a Python virtual environment and activate it

```bash
pip install virtualenv
python3 -m virtualenv venv
source venv/bin/activate
```

### 4. Install the required packages

```bash
pip install -r requirements.txt
```

### 5. Set up a hosted Milvus vector database

Follow the instructions [here](https://milvus.io/docs/install_standalone-docker.md) to deploy a hosted Milvus instance for the vector database backend. Note that it must be Milvus 2.4 or later to support [hybrid search](https://milvus.io/docs/multi-vector-search.md); disabling this feature for earlier versions of Milvus is not currently supported.
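
For reference, a client-level hybrid query in pymilvus 2.4 looks roughly like the sketch below; the vector field names and stand-in embeddings are assumptions (the app encapsulates all of this in its `SearchHandler`):

```python
from pymilvus import AnnSearchRequest, Collection, RRFRanker, connections

connections.connect(uri="http://localhost:19530")
collection = Collection("hybrid_demo3")  # collection name used by the app

# Stand-in query embeddings; in practice these come from your embedding model (e.g. BGE-M3).
query_dense_vec = [0.0] * 1024
query_sparse_vec = {1: 0.5, 42: 0.3}

# One request per vector field: sparse (keyword-style) and dense (semantic).
sparse_req = AnnSearchRequest(
    data=[query_sparse_vec], anns_field="sparse_vector",
    param={"metric_type": "IP"}, limit=10,
)
dense_req = AnnSearchRequest(
    data=[query_dense_vec], anns_field="dense_vector",
    param={"metric_type": "IP"}, limit=10,
)

# Reciprocal Rank Fusion merges the sparse and dense rankings into one list.
hits = collection.hybrid_search(
    [sparse_req, dense_req], rerank=RRFRanker(), limit=5, output_fields=["text"],
)
```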

### 6. Launch the Streamlit frontend

```bash
streamlit run app.py
```

Open the URL in your browser to access the UI and chatbot!

### 7. Upload documents and create the knowledge graph

Upload your own documents to a folder, or use an existing folder for the knowledge graph creation. Note that the implementation currently focuses on text from PDFs only. It can be extended to other text file formats using the Unstructured.io data loader in LangChain.
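
As a sketch of that extension, a LangChain Unstructured loader can pull text from many other formats; the file path below is illustrative, and the `unstructured` package (with format-specific extras) must be installed:

```python
from langchain_community.document_loaders import UnstructuredFileLoader

# Handles many text-bearing formats (.docx, .pptx, .html, ...) when the
# corresponding `unstructured` extras are installed.
loader = UnstructuredFileLoader("notes/meeting_notes.docx")  # illustrative path
docs = loader.load()
print(docs[0].page_content[:200])
```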

## Pipeline Components

1. **Data Ingestion**:
   - arXiv paper downloader
   - Arbitrary document folder ingestion
2. **Knowledge Graph Creation**:
   - Uses the API Catalog models through the LangChain NVIDIA AI Endpoints interface
3. **Graph Representation**: cuGraph + RAPIDS + NetworkX (see the GPU sketch after this list)
4. **Semantic Search**: Milvus 2.4.x for hybrid (keyword + dense vector) search
5. **RAG Integration**: Custom workflow incorporating knowledge graph retrieval
6. **Evaluation**: Comparison of different RAG approaches using the Nemotron-4 340B model
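
As a sketch of the GPU-accelerated graph side, assuming a working RAPIDS install, the triples CSV written by `save_triples_to_csvs` can be loaded straight into cuGraph; the PageRank call is just an example analysis, not a step the app performs:

```python
import cudf
import cugraph

# Load the triples edge list directly into GPU memory.
edges = cudf.read_csv("triples.csv")

G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source="entity_id_1", destination="entity_id_2")

# Example: GPU-accelerated PageRank over the knowledge graph.
ranks = cugraph.pagerank(G)
print(ranks.sort_values("pagerank", ascending=False).head())
```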

## Evaluation Metrics

We've implemented comprehensive evaluation metrics using NVIDIA's Nemotron-4 340B model, which is designed for synthetic data generation and reward scoring. Our evaluation compares different RAG approaches across five key attributes (a scoring sketch follows the list):

1. **Helpfulness**: Overall helpfulness of the response to the prompt.
2. **Correctness**: Inclusion of all pertinent facts without errors.
3. **Coherence**: Consistency and clarity of expression.
4. **Complexity**: Intellectual depth required to write the response.
5. **Verbosity**: Amount of detail included in the response, relative to what is asked for in the prompt.
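
A rough sketch of scoring one question/answer pair against the hosted reward model follows; parsing the response as attribute/score pairs in the `logprobs` payload is an assumption, so consult the model card for the exact response format:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

# The reward model scores the assistant turn rather than generating text.
completion = client.chat.completions.create(
    model="nvidia/nemotron-4-340b-reward",
    messages=[
        {"role": "user", "content": "What does cuGraph accelerate?"},
        {"role": "assistant", "content": "cuGraph runs graph analytics on NVIDIA GPUs."},
    ],
)

# Assumption: attribute names and scores arrive as token/logprob pairs.
scores = {t.token: t.logprob for t in completion.choices[0].logprobs.content}
print(scores)  # e.g. {"helpfulness": 2.1, "correctness": 1.9, ...}
```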

## Evaluation Results

We compared four RAG approaches on a small representative dataset using the Nemotron-4 340B reward model:

![Evaluation Results](viz.png)

Key takeaways:
- Graph RAG significantly outperforms traditional Text RAG.
- Combined Text and Graph RAG shows promise but doesn't consistently beat the ground truth yet; this may be due to how we structure the augmented prompt for the LLM and needs more experimentation.
- Our approach improves on verbosity and coherence compared to ground truth.

While we're not beating long-context ground truth across the board, these results show the potential of integrating knowledge graphs into RAG systems. We're particularly excited about the improvements in verbosity and coherence. Next steps include refining how we combine text and graph retrieval to get the best of both worlds.

## Component Swapping

All components are designed to be swappable. Here are some options:

- **Frontend**: The current Streamlit implementation can be replaced with other web frameworks.
- **Retrieval**: The embedding and reranker models used for semantic search can be swapped for higher-performance alternatives. The number of entities retrieved prior to reranking and the document chunk size can also be changed.
- **Vector DB**: While we use Milvus, it can be replaced with options like ChromaDB, Pinecone, FAISS, etc. Milvus is designed to be highly performant and scale on GPU infrastructure.
- **Backend**:
  - Cloud Hosted: Currently uses NVIDIA AI Playground APIs, but can be deployed in a private DGX Cloud or on AWS/Azure/GCP with NVIDIA GPUs and LLMs.
  - On-Prem/Locally Hosted: Smaller models like Llama2-7B or Mistral-7B can be run locally with appropriate hardware, and a model can be fine-tuned specifically for triple extraction for a given use case (see the sketch after this list).
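
For example, swapping the cloud backend for a locally hosted, OpenAI-compatible endpoint can be as small as changing the connector's base URL; the URL and model name below are placeholders:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Hosted API Catalog backend (default base URL).
cloud_llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")

# Locally hosted OpenAI-compatible endpoint, e.g. a NIM container on this machine.
local_llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # placeholder URL
    model="meta/llama-2-7b-chat",         # placeholder model name
)

print(local_llm.invoke("Extract triples from: cuGraph accelerates NetworkX.").content)
```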

## Future Work

- Dynamic information incorporation into knowledge graphs (continuous update of knowledge graphs)
- Further refinement of evaluation metrics and combined semantic-graphRAG pipeline
- Investigating the impact of different graph structures and queries on RAG performance (single/multi-hop retrieval, BFS/DFS, etc.)
- Expanding support for various document types and formats (multimodal RAG with knowledge graphs)
- Fine-tuning the Nemotron-4 340B model for domain-specific evaluations

## Contributing

Please open a pull request to this repository; our team appreciates any and all contributions that add features! We will review it and get back to you as soon as possible.

## Acknowledgements

This project utilizes NVIDIA's AI technologies, including the Nemotron-4 340B model, and the RAPIDS ecosystem. We thank the open-source community for their invaluable contributions to the tools and libraries used in this project.
138 changes: 138 additions & 0 deletions experimental/knowledge_graph_rag/app.py
@@ -0,0 +1,138 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import networkx as nx
import pandas as pd
import streamlit as st
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from llama_index.core import SimpleDirectoryReader

from utils.lc_graph import process_documents, save_triples_to_csvs
from vectorstore.search import SearchHandler

def load_data(input_dir, num_workers):
    reader = SimpleDirectoryReader(input_dir=input_dir)
    documents = reader.load_data(num_workers=num_workers)
    return documents

def has_pdf_files(directory):
    for file in os.listdir(directory):
        if file.endswith(".pdf"):
            return True
    return False

st.title("Knowledge Graph RAG")

st.subheader("Load Data from Files")

# Variable for documents
if 'documents' not in st.session_state:
st.session_state['documents'] = None

models = ChatNVIDIA.get_available_models()
available_models = [model.id for model in models if model.model_type == "chat" and "instruct" in model.id]
with st.sidebar:
    llm = st.selectbox("Choose an LLM", available_models, index=available_models.index("mistralai/mixtral-8x7b-instruct-v0.1"))
    st.write("You selected: ", llm)
    llm = ChatNVIDIA(model=llm)

def app():
    # Get the current working directory
    cwd = os.getcwd()

    # Get a list of visible directories in the current working directory
    directories = [d for d in os.listdir(cwd) if os.path.isdir(os.path.join(cwd, d)) and not d.startswith('.') and '__' not in d]

    # Create a dropdown menu for directory selection
    selected_dir = st.selectbox("Select a directory:", directories, index=0)

    # Construct the full path of the selected directory
    directory = os.path.join(cwd, selected_dir)

    if st.button("Process Documents"):
        # Check if the selected directory has PDF files
        res = has_pdf_files(directory)
        if not res:
            st.error("No PDF files found in directory! Only PDF files and text extraction are supported for now.")
            st.stop()
        # Extract triples from the documents and index the chunks for hybrid search
        documents, results = process_documents(directory, llm)
        st.write(documents)
        search_handler = SearchHandler("hybrid_demo3", use_bge_m3=True, use_reranker=True)
        search_handler.insert_data(documents)
        st.write(f"Processing complete. Total triples extracted: {len(results)}")

with st.spinner("Saving triples to CSV files with Pandas..."):
# write the resulting entities to a CSV, relations to a CSV and all triples with IDs to a CSV
save_triples_to_csvs(results)

with st.spinner("Loading the CSVs into dataframes..."):
# Load the triples from the CSV file
triples_df = pd.read_csv("triples.csv")
# Load the entities and relations DataFrames
entities_df = pd.read_csv("entities.csv")
relations_df = pd.read_csv("relations.csv")

        # Create the knowledge graph from these triples:
        # map IDs to entity names and relation names
        entity_name_map = entities_df.set_index("entity_id")["entity_name"].to_dict()
        relation_name_map = relations_df.set_index("relation_id")["relation_name"].to_dict()

        # Create the graph from the triples DataFrame
        G = nx.from_pandas_edgelist(
            triples_df,
            source="entity_id_1",
            target="entity_id_2",
            edge_attr="relation_id",
            create_using=nx.DiGraph,
        )

with st.spinner("Relabeling node integers to strings for future retrieval..."):
# Relabel the nodes with the actual entity names
G = nx.relabel_nodes(G, entity_name_map)

# Relabel the edges with the actual relation names
edge_attributes = nx.get_edge_attributes(G, "relation_id")

# Update the edges with the new relation names
new_edge_attributes = {
(u, v): relation_name_map[edge_attributes[(u, v)]]
for u, v in G.edges()
if edge_attributes[(u, v)] in relation_name_map
}

nx.set_edge_attributes(G, new_edge_attributes, "relation")

with st.spinner("Saving the graph to a GraphML file for further visualization and retrieval..."):
try:
nx.write_graphml(G, "knowledge_graph.graphml")

# Verify by reading it back
G_loaded = nx.read_graphml("knowledge_graph.graphml")
if nx.is_directed(G_loaded):
st.success("GraphML file is valid and successfully loaded.")
else:
st.error("GraphML file is invalid.")
except Exception as e:
st.error(f"Error saving or loading GraphML file: {e}")
return

st.success("Done!")

if __name__ == "__main__":
    app()