Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0

# Getting Started Guide to using Strands Agents and Neptune MCP

In this notebook, we demonstrate how to use Amazon GenAI tools alongside Amazon Neptune, Amazon's graph database, to build a knowledge graph containing data about Amazon Neptune, with entities extracted directly from a conversation.

In this demo we will be using the following tools and services:

- [Amazon Bedrock](https://aws.amazon.com/bedrock)
- [Amazon Neptune Analytics](https://aws.amazon.com/neptune)
- [Strands Agent SDK](https://strandsagents.com/latest/)
- [Neptune MCP Server](https://github.com/awslabs/mcp/tree/main/src/amazon-neptune-mcp-server)

## Setting up the Neptune MCP Server
In order to use the [Neptune MCP server](https://github.com/awslabs/mcp/tree/main/src/amazon-neptune-mcp-server), we must first download it from the AWS Labs GitHub repository and build it using Docker. We can do this using the following process. 

Navigate to the [folder view](../../tree), open a new terminal window, and execute each of the following commands in turn:

```
    cd SageMaker/
    git clone https://github.com/awslabs/mcp.git
    cd mcp/
    cd src/
    cd amazon-neptune-mcp-server/
    docker build -t awslabs/amazon-neptune-mcp-server .
    docker images
    docker run awslabs/amazon-neptune-mcp-server
```

## Environment Setup

Now that we have the Neptune MCP Server running, we need to install the Python packages required to run it and to use the [Strands SDK](https://strandsagents.com/0.1.x/user-guide/concepts/tools/mcp-tools/) library. Running the following commands will install these packages locally.

In [None]:
!pip install strands-agents strands-agents-tools uv -q

In [None]:
!uv python install 3.10

To extract and format information from the Neptune public documentation, we also need to install the Beautiful Soup library. We'll be using this later on.

In [None]:
!pip install beautifulsoup4 -q

In [None]:
import os

from mcp import stdio_client, StdioServerParameters
from strands import Agent
from strands.tools.mcp import MCPClient
from strands.models import BedrockModel

model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0"

MEMORY_PROMPT = """
                For all identified entities and connections, save them to the Amazon Neptune graph.
                Do not create duplicate entities. If you identify an entity already exists, use that instead.
            """

QUERY_PROMPT = """
                You are an agent that interacts with an Amazon Neptune database to run graph queries. 
                Whenever you write queries you should first fetch the schema to ensure that you understand 
                the correct labels and property names as well as the appropriate casing of those names and values.
            """

USE_NEPTUNE_ANALYTICS = False

In [None]:
#GRAPH_ID is the ID of your Neptune Analytics graph, and will be in the format "g-abc123". This is not used if you're working with Neptune Database.
GRAPH_ID = "<UPDATE WITH THE NEPTUNE GRAPH ID>"
#GRAPH_ENDPOINT is the primary endpoint of your Neptune Analytics graph or Neptune Database cluster
GRAPH_ENDPOINT = "<UPDATE WITH THE NEPTUNE CLUSTER/GRAPH ENDPOINT>"
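Because the two values above start out as placeholders, it can help to check them before any agent call is made. The following is a minimal sketch, not part of the original notebook: the helper name `find_setting_problems` is hypothetical, and the regular expression assumes that graph IDs consist of `g-` followed by lowercase letters and digits, based only on the `g-abc123` example format.

```python
import re

# Assumption: graph IDs look like "g-" followed by lowercase letters and digits.
GRAPH_ID_PATTERN = re.compile(r"^g-[a-z0-9]+$")


def find_setting_problems(use_neptune_analytics, graph_id, graph_endpoint):
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    if use_neptune_analytics:
        # Neptune Analytics is addressed by graph ID
        if not GRAPH_ID_PATTERN.match(graph_id):
            problems.append(f"GRAPH_ID '{graph_id}' does not look like 'g-abc123'.")
    else:
        # Neptune Database is addressed by endpoint; an unedited
        # placeholder starts with "<", and an empty string is unusable too
        if not graph_endpoint or graph_endpoint.startswith("<"):
            problems.append("GRAPH_ENDPOINT still contains the placeholder value.")
    return problems
```

Calling `find_setting_problems(USE_NEPTUNE_ANALYTICS, GRAPH_ID, GRAPH_ENDPOINT)` right after this cell would surface an unedited placeholder before the MCP server is started.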

In [None]:
def get_graph_endpoint(is_neptune_analytics):
    # build the endpoint URI for either Neptune Analytics or Neptune Database,
    # based on the argument rather than the global flag
    if is_neptune_analytics:
        graph_endpoint = f"neptune-graph://{GRAPH_ID}" #for Neptune Analytics
    else:
        graph_endpoint = f"neptune-db://{GRAPH_ENDPOINT}" #for Neptune Database

    print(f"Using '{graph_endpoint}' as the graph endpoint.")
    return graph_endpoint

# creates and returns a new instance of the MCPClient
def create_mcp_client():
    endpoint = get_graph_endpoint(USE_NEPTUNE_ANALYTICS)

    return MCPClient(lambda: stdio_client(StdioServerParameters(
        command="uvx", 
        args=["awslabs.amazon-neptune-mcp-server@latest"],
        env={"NEPTUNE_ENDPOINT": endpoint}
    )))

# executes a read query on the graph
def run_agent_read_query(question, return_response=False):
    memory_mcp_client = create_mcp_client()

    with memory_mcp_client:
        tools = memory_mcp_client.list_tools_sync()
        agent = Agent(tools=tools, 
                      model=BedrockModel(model_id=model_id),
                      system_prompt=QUERY_PROMPT
                )
        r = agent(question)
        if return_response:
            return r

# asks the agent to find duplicate nodes and soft-delete them with a "Duplicate" label
def run_agent_clean_up_query():
    memory_mcp_client = create_mcp_client()
    
    with memory_mcp_client:
        tools = memory_mcp_client.list_tools_sync()
        agent = Agent(tools=tools, 
                      model=BedrockModel(model_id=model_id),
                      system_prompt=MEMORY_PROMPT
                )
        
        question = """
            Retrieve information about all the nodes in the graph. Where you find duplicate nodes with the same or similar 
            name, description, label and properties choose one of the duplicates to be the primary node. 
            Copy all the relationships between all the other matching duplicate nodes and object nodes, and recreate 
            them connecting to the primary node. 
            Only do this where the same relationship between the primary and object node doesn't already exist. 
            Once all the relationships have been recreated where necessary, add the "Duplicate" label to all the other 
            similar duplicates, excluding the selected primary node.
        """
        
        agent(question)
    
# executes a mutation/write query on the graph
def run_agent_write_query(query):
    memory_mcp_client = create_mcp_client()

    with memory_mcp_client:
        tools = memory_mcp_client.list_tools_sync()
        agent = Agent(tools=tools, 
                      model=BedrockModel(model_id=model_id),
                      system_prompt=MEMORY_PROMPT
                )
        
        question = f"""
            From the following data, create new nodes based on the entities you identify. 
            Only create new entities if required, or use existing ones if they're already in the graph. 
            Capture as much detail as possible to create a highly connected graph.
            
            ##DATA##
            {query}
        """
        
        agent(question)

## Integrating Neptune with GenAI

### Loading data from web pages

The following code reads several web pages from the Neptune public documentation and sends their contents to the agent, which uses the Neptune MCP server to store the extracted entities in our graph. We'll initialise the graph with this content to build up some information about the service in our knowledge graph.

In [None]:
import requests
from bs4 import BeautifulSoup

urls = [
    "https://docs.aws.amazon.com/neptune/latest/userguide/intro.html",
    "https://docs.aws.amazon.com/neptune/latest/userguide/graph-get-started.html",
    "https://docs.aws.amazon.com/neptune/latest/userguide/neptune-setup.html",
    "https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html#neptune-analytics-vs-neptune-database"
]

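for u in urls:
    # fetch the page and stop early on HTTP errors
    c = requests.get(u, timeout=30)
    c.raise_for_status()
    soup = BeautifulSoup(c.text, 'html.parser')

    # send the page body to the agent for entity extraction
    message = f"""
        Here is information about the Amazon Neptune service.
        {soup.body}
    """

    run_agent_write_query(message)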
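The loop above passes the page body to the agent as raw HTML, markup included. One option for reducing the amount of text sent to the model is to strip the markup first. The sketch below is an illustration only, not the notebook's method: it uses the standard-library `html.parser` so that it has no dependencies, and the names `TextExtractor` and `html_to_text` are hypothetical. Beautiful Soup's `get_text()` is an alternative for the same job.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # keep only non-empty text that is outside skipped tags
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, with fragments joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Under these assumptions, `html_to_text(c.text)` could replace `{soup.body}` in the message, at the cost of losing structural hints such as headings and links that the model might otherwise use.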

#### Cleaning up the graph model

Due to the way LLMs process the data, the graph can become populated with entities that represent the same real-life thing. For example, `Amazon Neptune`, `Amazon Neptune Database` and `Neptune Database` all represent the same `Amazon Neptune Database` service entity. 

As a result, we can run a further query to direct the agent to identify these duplicate entities, re-route the connected relationships, and then perform either a hard-delete, i.e. remove the duplicates from the graph, or a soft-delete, i.e. apply a marker to the duplicate entries. In this example, we're performing a soft-delete by applying a `Duplicate` label using the following prompt:

```
    Retrieve information about all the nodes in the graph. Where you find duplicate nodes with the same or similar 
    name, description, label and properties choose one of the duplicates to be the primary node. 
    Copy all the relationships between all the other matching duplicate nodes and object nodes, and recreate 
    them connecting to the primary node. 
    Only do this where the same relationship between the primary and object node doesn't already exist. 
    Once all the relationships have been recreated where necessary, add the "Duplicate" label to all the other 
    similar duplicates, excluding the selected primary node.
```

In [None]:
run_agent_clean_up_query()
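To make the intent of the clean-up prompt concrete, here is a minimal in-memory sketch of the same soft-delete steps. It is an illustration under stated assumptions, not what the agent executes: the agent issues its own graph queries, whereas this sketch assumes the duplicate groups have already been identified, represents nodes as a dictionary and relationships as a set of `(source, type, target)` tuples, and uses the hypothetical function name `soft_delete_duplicates`.

```python
def soft_delete_duplicates(nodes, edges, groups):
    """
    nodes:  {node_id: {"labels": set_of_labels}}, updated in place
    edges:  set of (source_id, relationship_type, target_id) tuples
    groups: list of lists of node ids judged to be duplicates;
            the first id in each list is treated as the primary node
    Returns a new edge set that includes the re-routed relationships.
    """
    new_edges = set(edges)
    for group in groups:
        primary, *duplicates = group
        for dup in duplicates:
            for src, rel, dst in edges:
                if src == dup:
                    candidate = (primary, rel, dst)
                elif dst == dup:
                    candidate = (src, rel, primary)
                else:
                    continue
                # the set ignores relationships that already exist;
                # self-loops on the primary node are not created
                if candidate[0] != candidate[2]:
                    new_edges.add(candidate)
            # soft-delete: mark the duplicate rather than removing it
            nodes[dup]["labels"].add("Duplicate")
    return new_edges
```

The original relationships on the duplicate nodes are left in place, which matches the soft-delete approach: nothing is removed, and queries can exclude nodes carrying the `Duplicate` label.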

### Exploring our graph

As we'll be running graph queries over our data set, let's first ensure the notebook is pointing to the correct graph. The `%graph_notebook_host` [line magic](https://docs.aws.amazon.com/neptune/latest/userguide/notebooks-magics.html#notebooks-line-magics-graph-notebook-host) command sets the graph that the current notebook connects to.

In [None]:
%graph_notebook_host {get_graph_endpoint(USE_NEPTUNE_ANALYTICS)}

Let's now take a look at what's been saved to our knowledge graph. We can do this using openCypher, an open-standard query language that both Neptune Database and Neptune Analytics support. The Strands agent and LLM create nodes with a label based on their identified type, e.g. Service, ServiceComponent, QueryLanguage, Feature, etc. They also create properties for each entity based on the information the LLM extracts.

An example of this is shown below:

- `~labels`: ServiceFeature
- `name`: Cluster Volume
- `description`: Neptune data is stored in the cluster volume, designed for reliability and high availability

In [None]:
%%oc
MATCH p = ()-[*1..3]->()
RETURN p
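For a node shaped like the `Cluster Volume` example above, a write that avoids creating duplicates could be expressed with openCypher's `MERGE` clause. The sketch below only builds the query text and its parameters in Python; it does not run anything against Neptune, and it is not necessarily the query the agent generates. The function name `build_merge_query` is hypothetical, and matching on `name` alone is an assumption about what makes an entity unique.

```python
import re

# Labels cannot be passed as query parameters in openCypher, so the label is
# validated before being placed into the query text.
LABEL_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge_query(label, name, description):
    """Return (query, params) for an idempotent node write keyed on name."""
    if not LABEL_PATTERN.match(label):
        raise ValueError(f"Unsupported label: {label!r}")
    query = (
        f"MERGE (n:{label} {{name: $name}}) "
        "SET n.description = $description "
        "RETURN n"
    )
    params = {"name": name, "description": description}
    return query, params
```

Running the same query twice with the same `name` would match the existing node on the second run instead of creating a new one, which is the behaviour the `MEMORY_PROMPT` asks the agent to achieve.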

### Asking natural language-based questions

The people who interact with the graph often don't know how to write openCypher, or how the graph is connected. In this case, we can use Strands and the Neptune MCP server to run read queries that answer a given natural language question.

In [None]:
run_agent_read_query(
    """
        What are the differences between Neptune Database and Neptune Analytics?
    """
)

The above query produces a list of performed actions as well as the final response, which contains a summary of all the source nodes and relationships found in the graph. It should look similar to the following:

```
To answer your question about the differences between Neptune Database and Neptune Analytics, I need to understand what kind of information we have available in the Neptune graph. Let me first check the graph status and schema to determine what information we can retrieve.
Tool #1: get_graph_status

Tool #2: get_graph_schema
Now, let me look for information about Neptune Database and Neptune Analytics in the graph:
Tool #3: run_opencypher_query
Let's get more detailed information about the features of each service:

... // removed for brevity

Based on the database information, I can provide a comparison between Amazon Neptune Database and Amazon Neptune Analytics:

# Differences Between Neptune Database and Neptune Analytics

## Amazon Neptune Database
- **Primary Purpose**: A managed graph database service for storing and querying highly connected data
- **Key Features**:
  - Fully managed graph database service
  - High availability (>99.99% availability)
  - Security features including VPC isolation and encryption at rest
  - Various instance and storage types available
  - Data loading capabilities including bulk loading and streaming
  - Provides connectivity, cluster creation, and monitoring tools

## Amazon Neptune Analytics
- **Primary Purpose**: A memory-optimized graph database engine specifically designed for analytics
- **Key Features**:
  - Memory optimization for storing large graph datasets in memory
  - Graph analytic algorithms library
  - Low-latency graph queries for fast data processing
  - Vector search capabilities within graph traversals
  - Designed to quickly analyze large amounts of graph data in seconds

## Relationship
- The two services **complement** each other, with Neptune Analytics enhancing the capabilities of Neptune Database
- Neptune Database focuses on operational graph database needs (storage, querying, management)
- Neptune Analytics focuses on analytical processing and deriving insights from large graph datasets

## Use Case Differences
- **Neptune Database**: Best for transactional workloads, real-time querying, and storing graph data
- **Neptune Analytics**: Best for analytical workloads, pattern detection, finding trends, and complex graph algorithms over large datasets

In summary, Neptune Database is designed for storing and managing graph data with ACID compliance in a transactional environment, while Neptune Analytics is optimized for performing fast analytical operations on large graph datasets. They work together as complementary services, with analytics extending the capabilities of the base database service.
```

## Conclusion

The combination of Neptune, Bedrock, Strands Agents and MCP provides a powerful mechanism for ingesting data into, and reading data from, a knowledge graph. This helps to identify common patterns, trends and information that would otherwise be difficult to navigate or locate using traditional database and RAG methods.

By using simple functions to read the contents of a web page, conversation or file, we can send those contents to an agent to perform entity identification, extraction and relationship generation. This process greatly simplifies the production of highly connected knowledge graphs.