# BYOKG RAG using Neptune Analytics with Cypher Queries
This notebook demonstrates a Retrieval Augmented Generation (RAG) system built on top of a knowledge graph. It shows how the BYOKG framework operates on a Neptune Analytics graph by generating executable openCypher queries, so that natural language questions can be answered with information retrieved from the graph.

The system has three main components:

1. **Graph Store**: Neptune Analytics endpoint for the graph structure and for storing embeddings based on the graph
2. **Cypher Linker**: Links natural language questions to the graph by generating openCypher queries
3. **Query Engine**: Orchestrates all components to answer questions

#### Setup
If you haven't already, install the toolkit and its dependencies by following the instructions in [README.md](../../byokg-rag/README.md).
Let's check that the package is installed correctly.
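
If you prefer a check that reports a missing package instead of raising an `ImportError`, a minimal sketch using only the standard library is shown below. The helper name `is_importable` is made up for this example; `graphrag_toolkit` is the top-level module name taken from the import in the next cell.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the toolkit's top-level package is visible to this kernel.
print("graphrag_toolkit found:", is_importable("graphrag_toolkit"))
```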

In [None]:
# !pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v3.8.1.zip#subdirectory=byokg-rag

In [None]:
from graphrag_toolkit.byokg_rag.graphstore import NeptuneAnalyticsGraphStore

### Graph Store
The `NeptuneAnalyticsGraphStore` class provides an interface to work with the Neptune Analytics graph.
If you already have a Neptune Analytics graph you want to use, simply change the cell below to set `graph_identifier` to your graph ID.

If you don't already have a Neptune Analytics graph, you can create one by running the command below from an environment that has the AWS CLI configured with appropriate permissions. Please refer to the documentation for more details about [creating a graph](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/create-graph-using-console.html) and [loading data into the graph](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/batch-load.html).

```
aws neptune-graph create-graph --graph-name 'test-kg-with-embedding' --provisioned-memory 128 --public-connectivity --replica-count 0 --vector-search-configuration '{"dimension": 1024}'
```

After running the command, you should receive a response that includes the graph ID. Change the cell below to set `graph_identifier` to that ID.
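
The same request can also be sketched from Python with boto3's `neptune-graph` client. The helper names `build_create_graph_request` and `create_graph` are invented for this example, and the keyword arguments are my mapping of the CLI flags above onto the `create_graph` call, so verify them against the boto3 documentation before relying on this.

```python
def build_create_graph_request(graph_name="test-kg-with-embedding",
                               provisioned_memory=128,
                               vector_dimension=1024):
    """Return keyword arguments that mirror the CLI flags shown above."""
    return {
        "graphName": graph_name,
        "provisionedMemory": provisioned_memory,
        "publicConnectivity": True,
        "replicaCount": 0,
        "vectorSearchConfiguration": {"dimension": vector_dimension},
    }


def create_graph(region, **overrides):
    """Create the graph and return its ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("neptune-graph", region_name=region)
    response = client.create_graph(**build_create_graph_request(**overrides))
    return response["id"]


# Inspect the request without calling AWS.
print(build_create_graph_request())
```

Calling `create_graph("us-west-2")` would send the request; the returned ID is the value to assign to `graph_identifier`.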

To run the rest of the notebook, the environment needs IAM permissions to interact with your Neptune Analytics graph endpoint, specifically `neptune-graph:ReadDataViaQuery` and `neptune-graph:GetGraph`. You also need S3 read permissions so that `graph_store.read_from_csv` can access data from `s3://aws-neptune-customer-samples-*/*` and, optionally, S3 read and write permissions on your own bucket so that embeddings can be saved to and loaded from your chosen S3 location.
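
As an illustration, the permissions above could be expressed as an IAM policy document along the following lines. This is a sketch, not an official policy: the helper name `build_notebook_policy`, the placeholder account ID, the choice of `s3:GetObject` and `s3:ListBucket` as the "read" actions, and the ARN layouts are all assumptions. The optional statement for your own embeddings bucket is left out.

```python
import json


def build_notebook_policy(region, account_id, graph_identifier):
    """Build a policy dict covering graph queries and sample-data reads."""
    # Assumed ARN layout for a Neptune Analytics graph.
    graph_arn = (
        f"arn:aws:neptune-graph:{region}:{account_id}:graph/{graph_identifier}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "QueryGraph",
                "Effect": "Allow",
                "Action": [
                    "neptune-graph:ReadDataViaQuery",
                    "neptune-graph:GetGraph",
                ],
                "Resource": graph_arn,
            },
            {
                "Sid": "ReadSampleData",
                "Effect": "Allow",
                # Assumed set of S3 read actions.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::aws-neptune-customer-samples-*",
                    "arn:aws:s3:::aws-neptune-customer-samples-*/*",
                ],
            },
        ],
    }


# 123456789012 is a placeholder account ID.
policy = build_notebook_policy("us-west-2", "123456789012", "g-yrii2u1wf0")
print(json.dumps(policy, indent=2))
```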

In the rest of the notebook, we:
1. Initialize the BYOKG graph store to use a Neptune Analytics graph
2. Optionally, load example data from a CSV file into a new graph and get basic statistics
3. Demonstrate how `CypherKGLinker` links questions to the graph and retrieves from it with openCypher queries
4. Finally, combine all the steps using `NeptuneAnalyticsGraphStore` and `ByoKGQueryEngine` into a RAG pipeline and answer a sample question

In [None]:
region = "us-west-2"  # replace with your AWS region
graph_identifier = "g-yrii2u1wf0"  # replace with your graph ID

In [None]:
graph_store = NeptuneAnalyticsGraphStore(graph_identifier=graph_identifier,
                                         region=region)

#### Loading Data

If you ran the command to create a new graph, then uncomment the code cell below to load the new graph with some data. The data we are loading is a KG with information about AWS blog posts on Neptune and Neptune Analytics.

In [None]:
#graph_store.read_from_csv(s3_path=f"s3://aws-neptune-customer-samples-{region}/sample-datasets/gremlin/KG/")

In [None]:
# Print graph statistics
number_of_nodes = len(graph_store.nodes())
number_of_edges = len(graph_store.edges())
print(f"The graph has {number_of_nodes} nodes and {number_of_edges} edges.")

In [None]:
# Print graph schema
import json

schema = graph_store.get_schema()
print(json.dumps(schema, indent=4))

### Cypher KG Linker
The `CypherKGLinker` uses an LLM (Claude 3.5 Sonnet) to:
1. Generate openCypher queries for linking question entities to KG nodes
2. Generate openCypher queries for retrieving KG answers
3. Generate initial responses based on its knowledge
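
To make the two kinds of generated queries concrete, here is a hand-written sketch of what a linking query and a retrieval query might look like for the sample question used below. The labels `Author` and `Blog`, the relationship `AUTHORED`, the properties `name` and `title`, and the helper `build_linking_query` are all hypothetical; the real names come from the schema printed above, and the LLM's actual queries will differ.

```python
def build_linking_query(property_name="name", limit=5):
    """Return an openCypher query that matches nodes by a text mention."""
    # Property names cannot be passed as query parameters, so validate them.
    if not property_name.isidentifier():
        raise ValueError(f"invalid property name: {property_name!r}")
    return (
        "MATCH (n) "
        f"WHERE toLower(n.{property_name}) CONTAINS toLower($mention) "
        f"RETURN n LIMIT {int(limit)}"
    )


# Hypothetical retrieval query for "blogs authored by <author>".
RETRIEVAL_QUERY = (
    "MATCH (a:Author)-[:AUTHORED]->(b:Blog) "
    "WHERE a.name = $author "
    "RETURN b.title AS title"
)

linking_query = build_linking_query()
linking_params = {"mention": "Dave Bechberger"}
retrieval_params = {"author": "Dave Bechberger"}

print(linking_query, linking_params)
print(RETRIEVAL_QUERY, retrieval_params)
```

Passing values such as the author name as query parameters, instead of formatting them into the query string, avoids quoting problems in entity names.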

In [None]:
from graphrag_toolkit.byokg_rag.graph_connectors import CypherKGLinker
from graphrag_toolkit.byokg_rag.llm import BedrockGenerator

# Initialize llm
llm_generator = BedrockGenerator(
                model_name='us.anthropic.claude-3-5-sonnet-20240620-v1:0',
                region_name='us-west-2')

cypher_linker = CypherKGLinker(graph_store=graph_store, llm_generator=llm_generator)


In [None]:
question = "What are some blogs authored by Dave Bechberger?"
response = cypher_linker.generate_response(
                question=question,
                schema=schema,
                graph_context="Not provided. Use the above schema to understand the graph."
            )
response

In [None]:
artifacts = cypher_linker.parse_response(response)
artifacts
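
The exact response format that `parse_response` expects is not shown in this notebook. As a rough illustration of the idea, the sketch below extracts tagged sections from an LLM response with a regular expression. The tag names `linking-query` and `answer-query`, the sample response, and the function `parse_tagged_response` are hypothetical and are not the toolkit's implementation.

```python
import re


def parse_tagged_response(response, tags):
    """Collect the stripped contents of each <tag>...</tag> section."""
    artifacts = {}
    for tag in tags:
        escaped = re.escape(tag)
        pattern = rf"<{escaped}>(.*?)</{escaped}>"
        matches = re.findall(pattern, response, flags=re.DOTALL)
        artifacts[tag] = [m.strip() for m in matches if m.strip()]
    return artifacts


# Hypothetical LLM output with two tagged sections.
sample_response = """
<linking-query>
MATCH (a:Author) WHERE a.name = 'Dave Bechberger' RETURN a
</linking-query>
<answer-query>
MATCH (a:Author)-[:AUTHORED]->(b:Blog) RETURN b.title
</answer-query>
"""

parsed = parse_tagged_response(sample_response, ["linking-query", "answer-query"])
print(parsed)
```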

### BYOKG RAG Pipeline for QA with Neptune Analytics and OpenCypher Execution

Now let's use `ByoKGQueryEngine` to generate openCypher queries and execute them over the graph store to retrieve context. For more details about the different graph retrievers in the `ByoKGQueryEngine` base class, see the `byokg_rag_neptune_analytics_demo.ipynb` notebook.

In [None]:
# create query engine
from graphrag_toolkit.byokg_rag.byokg_query_engine import ByoKGQueryEngine

byokg_query_engine = ByoKGQueryEngine(
    graph_store=graph_store,
    cypher_kg_linker=cypher_linker,
    kg_linker=None, # deactivate multi-strategy retrieval
)

In [None]:
# set a question to test queries
question = "What are some blogs authored by Dave Bechberger?"

cypher_iterations = 3 # set 3 iterations for iterative improvements

# We use the `cypher_query` functionality to generate and execute Cypher queries
retrieved_context = byokg_query_engine.query(question, cypher_iterations=cypher_iterations)
answers, response = byokg_query_engine.generate_response(question, "\n".join(retrieved_context))

print(retrieved_context)
print(answers)
print(response)
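
How the engine uses `cypher_iterations` internally is not shown here. One plausible reading of "iterative improvements" is a loop that feeds execution errors or empty results back to the query generator, which the generic sketch below illustrates. The function `iterative_cypher_retrieval` and the stub generator and executor are hypothetical stand-ins, not the toolkit's code.

```python
def iterative_cypher_retrieval(question, generate_query, execute_query,
                               max_iterations=3):
    """Retry query generation with feedback until rows come back."""
    feedback = None
    context = []
    for _ in range(max_iterations):
        query = generate_query(question, feedback)
        try:
            rows = execute_query(query)
        except Exception as error:
            # Pass the failure back so the next attempt can correct it.
            feedback = f"Query failed: {error}"
            continue
        if rows:
            context.extend(str(row) for row in rows)
            break
        feedback = f"Query returned no rows: {query}"
    return context


def fake_generate(question, feedback):
    # The first attempt uses a wrong label; the retry fixes it after feedback.
    if feedback:
        return "MATCH (b:Blog) RETURN b.title"
    return "MATCH (b:Post) RETURN b.title"


def fake_execute(query):
    if ":Post" in query:
        raise ValueError("unknown label Post")
    return ["Example blog title"]


context = iterative_cypher_retrieval(
    "What are some blogs authored by Dave Bechberger?",
    fake_generate,
    fake_execute,
)
print(context)
```

With a single iteration the stub never gets the chance to correct its first query, which is why a small budget of retries can help.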

### Full BYOKG RAG Pipeline for QA with Neptune Analytics 


In [None]:
# create query engine
from graphrag_toolkit.byokg_rag.byokg_query_engine import ByoKGQueryEngine
# add KGLinker to the pipeline alongside CypherKGLinker
from graphrag_toolkit.byokg_rag.graph_connectors import KGLinker
kg_linker = KGLinker(graph_store=graph_store, llm_generator=llm_generator)

byokg_query_engine = ByoKGQueryEngine(
    graph_store=graph_store,
    cypher_kg_linker=cypher_linker,
    kg_linker=kg_linker, # activate multi-strategy retrieval
)

In [None]:
# set a question to test queries
question = "What are some blogs authored by Dave Bechberger?"

iterations = 2
cypher_iterations = 2 

# We use the `cypher_query` functionality to generate and execute Cypher queries
retrieved_context = byokg_query_engine.query(question, iterations=iterations, cypher_iterations=cypher_iterations)
answers, response = byokg_query_engine.generate_response(question, "\n".join(retrieved_context))

print(retrieved_context)
print(answers)
print(response)
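
When several retrieval strategies run together, the same fact may be retrieved more than once. Whether `query` already removes duplicates is not shown here, so the sketch below is an optional post-processing step you could apply before joining the context with `"\n".join(...)`. The helper `merge_contexts` and the sample strings are made up for this example.

```python
def merge_contexts(*context_lists):
    """Merge context lists, dropping blanks and duplicates, keeping order."""
    seen = set()
    merged = []
    for contexts in context_lists:
        for item in contexts:
            key = item.strip()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical outputs from two retrieval strategies.
cypher_context = ["Blog A by Dave Bechberger", "Blog B by Dave Bechberger"]
linker_context = ["Blog B by Dave Bechberger ", "", "Blog C by Dave Bechberger"]

merged_context = merge_contexts(cypher_context, linker_context)
print("\n".join(merged_context))
```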