# Natural Language Querying using Amazon Neptune and LlamaIndex

## Introduction

In this notebook we demonstrate how you can use [LlamaIndex](https://www.llamaindex.ai/), and specifically its [Property Graph Index](https://docs.llamaindex.ai/en/stable/examples/property_graph/property_graph_basic/) feature, to perform natural language querying with Amazon Neptune. Let's start by looking at the two main components: LlamaIndex and natural language querying.

LlamaIndex is a data framework for building applications that connect large language models (LLMs) to your own data. It provides tooling to ingest, index, and query data, including text held in vector indexes and, through the Property Graph Index, data held in graph databases. This lets an application retrieve relevant information from large data sets and hand it to an LLM.

Natural language querying is the ability to interact with computer systems using human language, rather than structured query languages or complex programming commands. It allows users to ask questions or provide instructions in their native language, and the system processes this input to understand the intent and provide relevant information or perform the requested action. In the example here we will use LlamaIndex to translate a natural language question into a structured graph query, specifically openCypher, which is then executed on data in your Amazon Neptune database.
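
To make that flow concrete, here is a minimal sketch of the text-to-query idea in plain Python. It is not how LlamaIndex implements it: the prompt template, the `build_prompt` and `text_to_cypher` helpers, and the `fake_llm` and `fake_graph` stand-ins are all hypothetical, chosen only to show the three steps (build a prompt from the schema and question, ask a model for a query, run the query).

```python
from typing import Callable, List, Tuple

# Hypothetical prompt template; the real prompt used by a library will differ.
PROMPT_TEMPLATE = (
    "Graph schema:\n{schema}\n\n"
    "Write an openCypher query that answers the question.\n"
    "Question: {question}\n"
    "Query:"
)


def build_prompt(question: str, schema: str) -> str:
    """Combine the graph schema and the user's question into one prompt."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


def text_to_cypher(
    question: str,
    schema: str,
    generate: Callable[[str], str],
    execute: Callable[[str], List[dict]],
) -> Tuple[str, List[dict]]:
    """Ask `generate` for a query, then run it with `execute`."""
    query = generate(build_prompt(question, schema)).strip()
    return query, execute(query)


# Stand-ins so the sketch runs without an LLM or a database.
def fake_llm(prompt: str) -> str:
    return (
        "MATCH (p:person {first_name: 'Dave'})-[:friends]->(f:person)\n"
        "RETURN f.first_name\n"
    )


def fake_graph(query: str) -> List[dict]:
    return [{"f.first_name": "Kelly"}]


query, rows = text_to_cypher(
    "Who are Dave's friends?",
    "(:person)-[:friends]->(:person)",
    fake_llm,
    fake_graph,
)
print(query)
print(rows)
```

In a real setup, `generate` would call the LLM and `execute` would send the query to the graph database; the structure of the three steps stays the same.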

### Prerequisites

For this notebook we use Amazon Neptune Database as our data store, so you must have a Neptune Database cluster configured. The methodology presented here also works with Neptune Analytics, and we will call out where the code differs. This notebook also requires permissions to invoke Amazon Bedrock models, specifically `Claude v3 Sonnet` and `Titan Embedding v1`.

### Installing our dependencies

Run the next cell to install the core LlamaIndex packages as well as the integration packages for Amazon Neptune and Amazon Bedrock.

In [1]:
%pip install -q llama-index llama-index-graph-stores-neptune llama-index-llms-bedrock llama-index-embeddings-bedrock 

Note: you may need to restart the kernel to use updated packages.


### Loading your Data

The data we will use in this notebook is based on the data in *[Graph Databases in Action](https://www.manning.com/books/graph-databases-in-action?a_aid=bechberger)* by Manning Publications. The book uses the most common graph access patterns to build a fictitious application, DiningByFriends, that uses friends and ratings to provide personalized restaurant recommendations.

In this notebook, we demonstrate how to use LlamaIndex and Amazon Neptune to build the queries for the DiningByFriends app using natural language questions instead of explicitly writing the queries.


In [2]:
%seed --model property_graph --language gremlin --dataset dining_by_friends --run


## Setting up our LLM

In [3]:
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding


# Set up LlamaIndex to use Claude V3 Sonnet as the LLM
llm = Bedrock(model="anthropic.claude-3-sonnet-20240229-v1:0")

# Create the embedding model, which the Property Graph Index requires
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")

In [4]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

## Setting up our GraphStore

In [5]:
from llama_index.graph_stores.neptune import NeptuneDatabasePropertyGraphStore
import graph_notebook as gn

# Retrieve the configuration of the notebook to get the current host
config = gn.configuration.get_config.get_config()

graph_store = NeptuneDatabasePropertyGraphStore(host=config.host)
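
If you are using Neptune Analytics instead of Neptune Database, the graph store is the part that changes: the `llama-index-graph-stores-neptune` package also provides `NeptuneAnalyticsPropertyGraphStore`, which takes a graph identifier rather than a host. The sketch below wraps that choice in two helpers, `neptune_store_spec` and `make_graph_store`. Both helpers are hypothetical names introduced here for illustration, and the graph identifier is something you would need to supply yourself.

```python
from typing import Optional, Tuple


def neptune_store_spec(
    host: Optional[str] = None,
    graph_identifier: Optional[str] = None,
) -> Tuple[str, dict]:
    """Pick the store class name and constructor arguments for one Neptune target."""
    if (host is None) == (graph_identifier is None):
        raise ValueError("Provide exactly one of host or graph_identifier")
    if graph_identifier is not None:
        # Neptune Analytics graphs are addressed by graph identifier.
        return "NeptuneAnalyticsPropertyGraphStore", {"graph_identifier": graph_identifier}
    # Neptune Database clusters are addressed by endpoint host.
    return "NeptuneDatabasePropertyGraphStore", {"host": host}


def make_graph_store(
    host: Optional[str] = None,
    graph_identifier: Optional[str] = None,
):
    """Create the matching LlamaIndex graph store (needs the package and AWS access)."""
    # Imported lazily so neptune_store_spec can be used without the package installed.
    from llama_index.graph_stores import neptune

    class_name, kwargs = neptune_store_spec(host, graph_identifier)
    return getattr(neptune, class_name)(**kwargs)
```

With these helpers, `make_graph_store(host=config.host)` would build the same kind of store as the cell above, while `make_graph_store(graph_identifier=...)` with your own graph identifier would target Neptune Analytics. The rest of the notebook should not need to change, although we have only run it against Neptune Database here.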

## Setting up our Index

In [6]:
from llama_index.core import PropertyGraphIndex
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    embed_model=embed_model,
    llm=llm
)

## Setting up our Retriever

In [7]:
from llama_index.core.indices.property_graph import TextToCypherRetriever

retriever = TextToCypherRetriever(index.property_graph_store)

## Querying our graph

In [8]:
nodes = retriever.retrieve("Who are Dave's Friends?")

for node in nodes:
    print(node.text)



Generated Cypher query:
MATCH (p:person {first_name: 'Dave'})-[:friends]->(f:person)
RETURN f.first_name, f.last_name

Cypher Response:
[{'f.first_name': 'Kelly', 'f.last_name': 'Gorman'}, {'f.first_name': 'Jim', 'f.last_name': 'Miller'}, {'f.first_name': 'Josh', 'f.last_name': 'Perry'}, {'f.first_name': 'Hank', 'f.last_name': 'Erin'}]
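
The retriever returns the generated query and the raw response together as one block of text. If you want the rows as Python objects, one option is to split that text on its two headings. The sketch below assumes the layout printed above and that the response is a valid Python literal; `parse_retrieved_text` is a hypothetical helper, not part of LlamaIndex, and the layout may differ if the retriever is configured with a different response template.

```python
import ast
from typing import List, Tuple

QUERY_MARKER = "Generated Cypher query:"
RESPONSE_MARKER = "Cypher Response:"


def parse_retrieved_text(text: str) -> Tuple[str, List[dict]]:
    """Split the retriever's text into the Cypher query and the response rows."""
    query_part, _, response_part = text.partition(RESPONSE_MARKER)
    query = query_part.replace(QUERY_MARKER, "", 1).strip()
    # literal_eval only accepts Python literals, so it is safer than eval here.
    rows = ast.literal_eval(response_part.strip())
    return query, rows


# Sample text in the same layout as the output above (shortened to two rows).
sample = """

Generated Cypher query:
MATCH (p:person {first_name: 'Dave'})-[:friends]->(f:person)
RETURN f.first_name, f.last_name

Cypher Response:
[{'f.first_name': 'Kelly', 'f.last_name': 'Gorman'}, {'f.first_name': 'Jim', 'f.last_name': 'Miller'}]
"""

query, rows = parse_retrieved_text(sample)
friend_names = [f"{row['f.first_name']} {row['f.last_name']}" for row in rows]
print(friend_names)
```

In the notebook you would pass `node.text` to the helper instead of the `sample` string.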


In [11]:
nodes = retriever.retrieve("Are Dave and Denise connected?")

for node in nodes:
    print(node.text)

Generated Cypher query:
MATCH (p1:person {first_name: 'Dave'})-[:friends*..6]-(p2:person {first_name: 'Denise'})
RETURN p1, p2

Cypher Response:
[{'p1': {'~id': '10', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 1, 'last_name': 'Bech', 'first_name': 'Dave'}}, 'p2': {'~id': '45', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 8, 'last_name': 'Mande', 'first_name': 'Denise'}}}, {'p1': {'~id': '10', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 1, 'last_name': 'Bech', 'first_name': 'Dave'}}, 'p2': {'~id': '45', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 8, 'last_name': 'Mande', 'first_name': 'Denise'}}}, {'p1': {'~id': '10', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 1, 'last_name': 'Bech', 'first_name': 'Dave'}}, 'p2': {'~id': '45', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 8, 'last_name': 'Mande', 'first_n
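
The response repeats the same pair of nodes many times. A variable-length pattern returns one row per matching path, and because the query returns only the two endpoints, every path between Dave and Denise produces an identical row. If all you need is a yes or no answer, the rows can be collapsed as in the sketch below. `summarize_connection` is a hypothetical helper, and the sample rows copy the node structure shown in the output above.

```python
from typing import List


def summarize_connection(rows: List[dict], start_key: str = "p1", end_key: str = "p2") -> dict:
    """Reduce repeated endpoint rows to a connected flag and a path count."""
    # Identify each endpoint pair by the Neptune node ids.
    endpoints = {(row[start_key]["~id"], row[end_key]["~id"]) for row in rows}
    return {
        "connected": bool(rows),
        "matched_paths": len(rows),
        "endpoint_pairs": sorted(endpoints),
    }


dave = {
    "~id": "10",
    "~entityType": "node",
    "~labels": ["person"],
    "~properties": {"person_id": 1, "last_name": "Bech", "first_name": "Dave"},
}
denise = {
    "~id": "45",
    "~entityType": "node",
    "~labels": ["person"],
    "~properties": {"person_id": 8, "last_name": "Mande", "first_name": "Denise"},
}

# Three identical rows stand in for the longer response above.
rows = [{"p1": dave, "p2": denise} for _ in range(3)]
summary = summarize_connection(rows)
print(summary)
```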

In [12]:
nodes = retriever.retrieve("Are Dave and Denise connected? Return the path LIMIT 1 ")

for node in nodes:
    print(node.text)

Generated Cypher query:
MATCH (p1:person {first_name: 'Dave'})-[:friends*..]->(p2:person {first_name: 'Denise'})
RETURN p1, p2
LIMIT 1

Cypher Response:
[{'p1': {'~id': '10', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 1, 'last_name': 'Bech', 'first_name': 'Dave'}}, 'p2': {'~id': '45', '~entityType': 'node', '~labels': ['person'], '~properties': {'person_id': 8, 'last_name': 'Mande', 'first_name': 'Denise'}}}]


In [13]:
nodes = retriever.retrieve("What restaurants near Dave with a diner or bar cuisine is the highest rated? Return the name and rating ordered by name")

for node in nodes:
    print(node.text)

Generated Cypher query:
MATCH (p:person {first_name: 'Dave'})-[:lives]->(c:city)<-[:within]-(r:restaurant)-[:serves]->(cuisine:cuisine)
WHERE cuisine.name IN ['diner', 'bar']
MATCH (r)<-[a:about]-(rev:review)
WITH r, max(rev.rating) AS max_rating
MATCH (r)<-[a:about]-(rev:review)
WHERE rev.rating = max_rating
RETURN r.name, rev.rating
ORDER BY r.name

Cypher Response:
[{'r.name': 'All Night Long', 'rev.rating': 4}, {'r.name': 'Northern Quench', 'rev.rating': 4}, {'r.name': 'Northern Quench', 'rev.rating': 4}, {'r.name': 'Satiated', 'rev.rating': 4}, {'r.name': 'Satiated', 'rev.rating': 4}, {'r.name': 'Without Chaser', 'rev.rating': 4}]
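
Some restaurants appear twice in this response, most likely because the final `MATCH` returns one row for every review whose rating equals the maximum. One way to tidy the result on the client side is to drop exact duplicate rows while keeping the order, as sketched below. `unique_rows` is a hypothetical helper and assumes flat rows with hashable values, which holds for the response above.

```python
from typing import List


def unique_rows(rows: List[dict]) -> List[dict]:
    """Drop exact duplicate rows while preserving the original order."""
    seen = set()
    result = []
    for row in rows:
        # A sorted tuple of items gives a hashable, order-independent key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"r.name": "All Night Long", "rev.rating": 4},
    {"r.name": "Northern Quench", "rev.rating": 4},
    {"r.name": "Northern Quench", "rev.rating": 4},
    {"r.name": "Satiated", "rev.rating": 4},
    {"r.name": "Satiated", "rev.rating": 4},
    {"r.name": "Without Chaser", "rev.rating": 4},
]

deduplicated = unique_rows(rows)
names = [row["r.name"] for row in deduplicated]
print(names)
```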
