## Query Neptune Graph with SPARQL in Natural Language Using Function Calling

This notebook walks you through creating a tool that queries an Amazon Neptune database via SPARQL using function calling in the Amazon Bedrock Converse API. The tool is invoked when a user asks for object literal information stored in the Neptune database for a specific object ID. The large language model (LLM) extracts the required input parameters from the user's question and passes them to the function *get_objectLiteral*, which executes the SPARQL query to fetch the results. Finally, the LLM gives the user a response that includes the query results.



### Please Complete the Prerequisites Before You Start!






## Prerequisites
**1- First, create the Neptune database cluster. Use *us-east-1* as the region. You can create the cluster and all the required additional service configurations via:**

* The AWS CloudFormation template from [this link](https://docs.aws.amazon.com/neptune/latest/userguide/get-started-cfn-create.html). Before you use the AWS CloudFormation template, make sure that you have the permissions described [here](https://docs.aws.amazon.com/neptune/latest/userguide/get-started-prereqs.html).

* After you deploy the CloudFormation template, it creates:

    - The necessary IAM role for the Neptune cluster
    
    - A new VPC, security group, subnets, route table, and S3 gateway endpoint
    
    - A serverless Neptune database


**2- Creating the IAM role for the SageMaker Notebook**

* Go to Identity and Access Management (IAM) in the AWS Console

* Go to Neptune -> Clusters -> the cluster created via AWS CloudFormation

* Click Role -> Create Role -> AWS Account, then select the policies below to attach to the role

  - *AmazonBedrockFullAccess*
  
  - *AWSCloudFormationFullAccess*
  
  - *NeptuneFullAccess*
  
  - *AmazonS3FullAccess*
  
* Click Trust Relationships and paste the JSON below into the trust relationships field


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "NeptuneSagemakerNotebookAccess",
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

```


* Create the role and save the role ARN; you will use it in the next step.
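
The console steps above could also be scripted. The following is a minimal sketch, not part of the original notebook: it assumes an IAM client is passed in (for example `boto3.client('iam')`), the helper names `sagemaker_trust_policy` and `create_notebook_role` are hypothetical, and the managed policy ARNs are assumed to follow the standard `arn:aws:iam::aws:policy/<name>` form.

```python
import json

# Managed policies listed in the step above (ARN form is an assumption).
MANAGED_POLICIES = [
    "AmazonBedrockFullAccess",
    "AWSCloudFormationFullAccess",
    "NeptuneFullAccess",
    "AmazonS3FullAccess",
]


def sagemaker_trust_policy():
    """Return the trust policy shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "NeptuneSagemakerNotebookAccess",
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_notebook_role(iam, role_name):
    """Create the role, attach the managed policies, and return the role ARN.

    `iam` is assumed to be a boto3 IAM client, or any object exposing the
    same create_role / attach_role_policy methods.
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    for name in MANAGED_POLICIES:
        iam.attach_role_policy(
            RoleName=role_name,
            PolicyArn=f"arn:aws:iam::aws:policy/{name}",
        )
    return response["Role"]["Arn"]
```

Passing the client in as an argument keeps the helper easy to dry-run with a stub before touching a real account.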


**3- Creating the SageMaker Notebook**

* Go to Amazon SageMaker in the AWS Console

* Click Notebooks -> Create Notebook instance

  - Name the notebook instance
  
  - Choose *ml.t3.medium* as the instance type
  
  - Under **Permission and Encryption** choose *Enter a custom role ARN* and paste the ARN of the role that you created in the previous step.
  
  - Under **Network** choose
      - **VPC** -> 'neptune-test'
      - **Subnet** -> Choose the subnet where your Neptune cluster resides (you can check it in the Neptune console by clicking your Amazon Neptune database writer instance under your cluster -> Connectivity and Security -> Availability Zone)
      - **Security Group** -> Choose the security group starting with 'Neptune'
  - Click -> Create Notebook Instance
  - Upload this notebook to the JupyterLab environment that opens when you click your notebook instance, and pick **conda_python3** as the kernel


### After you complete all the steps, you should have
* Amazon Neptune database 
* S3 VPC endpoint
* IAM role for Neptune DB instance 
* IAM role for the notebook instance
* VPC and network setup
* SageMaker notebook



## Running the Notebook

### It is important to use boto3 version 1.34.139 or later! Please make sure you execute the cell below.


In [None]:
#install the required version of the boto3 library
!pip3 install boto3==1.34.139

In [None]:
import requests
import boto3
import json, sys
from datetime import datetime

print('Running boto3 version:', boto3.__version__)
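
To check the version requirement programmatically rather than by eye, a small helper can compare versions numerically. This is a minimal sketch with hypothetical helper names (`version_tuple`, `meets_minimum`); it assumes the version string is purely numeric and dotted, as in `1.34.139`.

```python
def version_tuple(version):
    """Convert a dotted version string such as '1.34.139' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def meets_minimum(installed, minimum="1.34.139"):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Comparing the raw strings would be misleading: '1.34.99' sorts after
# '1.34.139' as text, although it is the older release.
print(meets_minimum("1.34.139"))
print(meets_minimum("1.34.99"))
```

In the notebook this could be used as `assert meets_minimum(boto3.__version__)` right after the import cell.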

### Configuring Amazon Bedrock for Model Access



In [None]:
modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
#modelId = 'anthropic.claude-3-haiku-20240307-v1:0'
#modelId = 'cohere.command-r-plus-v1:0'
#modelId = 'cohere.command-r-v1:0'
#modelId = 'mistral.mistral-large-2402-v1:0'
print(f'Using modelId: {modelId}')

region = 'us-east-1'
print('Using region: ', region)

bedrock = boto3.client(
    service_name = 'bedrock-runtime',
    region_name = region,
    )

### Download the Human Disease Ontology Data

In the cells below:

1- We create an S3 bucket and download the Human Disease Ontology data into that bucket

2- Replace bucket_name with your own Amazon S3 bucket name

#### The dataset is publicly available under the CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ 

In [None]:
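# Create an S3 client
s3 = boto3.client('s3')

# Define the bucket name (replace with your own bucket name)
bucket_name = 'BUCKET_NAME'

# Create the S3 bucket; boto3 raises an exception if the request is rejected
response = s3.create_bucket(Bucket=bucket_name)

# create_bucket returns a dict, so read the HTTP status from ResponseMetadata
status_code = response['ResponseMetadata']['HTTPStatusCode']
if status_code == 200:
    print(f'Bucket {bucket_name} created successfully!')
else:
    print(f"Failed to create the bucket. Error: {status_code}")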
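
The `BUCKET_NAME` placeholder has to be replaced before the cell runs, because S3 rejects names containing uppercase letters or underscores. The sketch below checks a candidate name against a subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no consecutive dots). The helper name is hypothetical and the check is not exhaustive.

```python
import re

# Subset of the S3 bucket naming rules; not a complete validator.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True when the name passes the basic S3 naming checks."""
    return bool(_BUCKET_NAME_RE.fullmatch(name)) and ".." not in name


print(looks_like_valid_bucket_name("BUCKET_NAME"))        # placeholder from the cell above
print(looks_like_valid_bucket_name("my-doid-data-123"))   # hypothetical example name
```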


### Download the dataset into the S3 bucket

In [None]:
# URL of the OWL file to download
owl_file_url = "http://purl.obolibrary.org/obo/doid.owl"

# Download the OWL file
response = requests.get(owl_file_url)

# Check if the download was successful
if response.status_code == 200:
    # Create an S3 client
    s3_client = boto3.client('s3')

    # S3 object key (filename in the bucket)
    object_key = 'data/doid.owl'

    # Upload the file to S3
    s3_client.put_object(Body=response.content, Bucket=bucket_name, Key=object_key)
    
    s3_uri=f"s3://{bucket_name}/{object_key}"
    
    print(f"File '{object_key}' uploaded to S3 bucket '{bucket_name}' successfully!")
else:
    print(f"Failed to download the file. Error: {response.status_code}")

### Now let's fetch our endpoints and the IAM role ARN from the Neptune database.
**Insert your *'DBClusterIdentifier'* value in the parameters. You can find it in the Neptune console.**

In [None]:
# Create a Neptune client
neptune_client = boto3.client('neptune')

# Define the parameters for the describe_db_clusters() operation
params = {
    'DBClusterIdentifier': 'YOUR_DATABASE_CLUSTER_NAME'
}

# Call the describe_db_clusters() operation
response = neptune_client.describe_db_clusters(**params)

# Extract the reader and writer endpoints from the response
db_clusters = response['DBClusters']

if db_clusters:
    cluster = db_clusters[0]
    reader_endpoint = cluster['ReaderEndpoint']
    writer_endpoint = cluster['Endpoint']
    arn_of_neptune_cluster_IAM_role=cluster['AssociatedRoles'][0]['RoleArn']
    # Pick the member flagged as the writer instead of assuming it is listed first
    db_writer_instance_identifier = next(
        member['DBInstanceIdentifier']
        for member in cluster['DBClusterMembers']
        if member['IsClusterWriter']
    )

    print(f'Neptune database reader endpoint: {reader_endpoint}')
    print(f'Neptune database writer endpoint: {writer_endpoint}')
    print(f'IAM Role ARN of the Neptune cluster: {arn_of_neptune_cluster_IAM_role}')
    print(f'Neptune database writer instance identifier: {db_writer_instance_identifier}')

    # Define the parameters for the describe_db_instances() operation
    params2 = {
        'DBInstanceIdentifier': db_writer_instance_identifier
    }

    # Look up the endpoint address of the writer instance
    response_instance_information = neptune_client.describe_db_instances(**params2)

    db_writer_instance_endpoint=response_instance_information['DBInstances'][0]['Endpoint']['Address']
    print('endpoint:',db_writer_instance_endpoint)
else:
    print('No Neptune database clusters found.')

### Check connectivity to the writer endpoint of your database. If it is not healthy, please check the prerequisites again! You can find the database endpoints in the Amazon Neptune console.



In [None]:
#checking database connectivity
!curl https://{writer_endpoint}:8182/status
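
The status endpoint returns a JSON document, and a healthy instance reports `"status": "healthy"`. The sketch below turns that response body into a boolean; the helper name `is_healthy` is hypothetical and the sample body is illustrative, not captured from a real cluster.

```python
import json


def is_healthy(status_body):
    """Return True when the /status response body reports a healthy instance."""
    try:
        return json.loads(status_body).get("status") == "healthy"
    except (ValueError, TypeError, AttributeError):
        # Not JSON, not a string, or not a JSON object: treat as unhealthy
        return False


sample_body = '{"status": "healthy", "role": "writer"}'  # illustrative sample only
print(is_healthy(sample_body))
```

In the notebook, the body could come from `requests.get(f"https://{writer_endpoint}:8182/status").text`.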


### Replace the parameters in this command with your own values for s3_uri and the IAM role of the Neptune cluster that allows S3 access

    curl -X POST \
        -H 'Content-Type: application/json' \
        https://your-neptune-endpoint:port/loader -d '
        {
          "source" : "s3://bucket-name/object-key-name",
          "format" : "format",
          "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
          "region" : "region",
          "failOnError" : "FALSE",
          "parallelism" : "MEDIUM",
          "updateSingleCardinalityProperties" : "FALSE",
          "queueRequest" : "TRUE",
          "dependencies" : ["load_A_id", "load_B_id"]
        }'

In [None]:
print('s3_uri:',s3_uri)
print('iamRoleArn:',arn_of_neptune_cluster_IAM_role)

### Copy the s3_uri and iamRoleArn values above and paste them into the source and iamRoleArn fields below

In [None]:
#Load the data from S3 to writer endpoint of your database instance
!curl -X POST -H 'Content-Type: application/json' https://{db_writer_instance_endpoint}:8182/loader -d '{{"source" :"s3_uri","format" : "rdfxml","iamRoleArn" : "IAM_ROLE","region" : "us-east-1","failOnError" : "FALSE","parallelism" : "MEDIUM","updateSingleCardinalityProperties" : "FALSE","queueRequest" : "TRUE"}}'
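
As an alternative to pasting values into the curl command by hand, the request body can be built from the variables already defined in the notebook. This is a minimal sketch; `build_loader_payload` and `loader_url` are hypothetical helper names, and the field values mirror the curl command above.

```python
import json


def build_loader_payload(s3_uri, iam_role_arn, region="us-east-1", data_format="rdfxml"):
    """Return the bulk loader request body as a dict."""
    return {
        "source": s3_uri,
        "format": data_format,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
        "updateSingleCardinalityProperties": "FALSE",
        "queueRequest": "TRUE",
    }


def loader_url(endpoint, port=8182):
    """Return the loader URL for a Neptune instance endpoint."""
    return f"https://{endpoint}:{port}/loader"


# Placeholder values for illustration only
payload = build_loader_payload(
    "s3://bucket-name/data/doid.owl",
    "arn:aws:iam::account-id:role/role-name",
)
print(json.dumps(payload, indent=2))
```

In the notebook, the request could then be sent with `requests.post(loader_url(db_writer_instance_endpoint), json=build_loader_payload(s3_uri, arn_of_neptune_cluster_IAM_role))`.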


In [None]:
#get neptune query url
neptune_port='8182'
url=f"https://{reader_endpoint}:{neptune_port}/sparql"
print('neptune_query_url: ',url)

### Check the example query below before we define our tool

In [None]:
# Define the test SPARQL query
query = """
    PREFIX : <http://purl.obolibrary.org/obo/>
    SELECT ?objectLiteral
    WHERE {
        :DOID_9884 ?p ?objectLiteral .
        FILTER(isLiteral(?objectLiteral))
        FILTER(STRSTARTS(STR(?p), 'http://purl.obolibrary.org/obo/'))
    }
"""

# Send the SPARQL query to the Neptune endpoint via POST request
response = requests.post(url, data={"query":query})

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code} - {response.text}")
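
The response follows the standard SPARQL JSON results layout, where each row is a binding that maps a variable name to an object with `type` and `value` keys. The sketch below pulls the plain values out of such a response; `extract_values` is a hypothetical helper and the sample data is made up for illustration.

```python
def extract_values(sparql_json, variable="objectLiteral"):
    """Return the plain values bound to `variable` in a SPARQL JSON result."""
    bindings = sparql_json.get("results", {}).get("bindings", [])
    return [row[variable]["value"] for row in bindings if variable in row]


# Illustrative sample shaped like a SPARQL JSON result (values are made up)
sample = {
    "head": {"vars": ["objectLiteral"]},
    "results": {
        "bindings": [
            {"objectLiteral": {"type": "literal", "value": "example definition text"}},
            {"objectLiteral": {"type": "literal", "value": "example identifier"}},
        ]
    },
}

print(extract_values(sample))
```

In the notebook this could be applied as `extract_values(response.json())`.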

### Defining our ToolsList with *get_objectLiteral* function which will execute the SPARQL query for the Amazon Neptune database

In [None]:
class ToolsList:
    #Define our get_objectLiteral tool function...
    def get_objectLiteral(self, id, predicate):
        print(id, predicate)
        query_url = url
        query = f"""
            PREFIX : <{predicate}> 
            SELECT ?objectLiteral 
            WHERE {{:{id} ?p ?objectLiteral . 
            FILTER(isLiteral(?objectLiteral)) 
            FILTER(STRSTARTS(STR(?p), '{predicate}'))
            }}"""
        print(query)
        response = requests.post(
            f"{query_url}",
            data={"query": query},
        )
        result = f'Value in {id}, {predicate} is ' + response.text
        print(f'Tool result: {result}')
        return result
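
The tool interpolates the model-supplied `id` and `predicate` directly into the query string, so a malformed value could change the shape of the query. One possible safeguard, sketched below under the assumption that IDs look like `DOID_9884` and predicates are plain http(s) IRIs, is to validate both values before building the query. The function name and the validation rules are illustrative, not part of the original notebook.

```python
import re

# Assumption: object IDs look like DOID_9884 (letters/digits, underscore, digits)
_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*_[0-9]+")
# Characters that must not appear inside the IRI or the quoted string below
_FORBIDDEN_RE = re.compile(r"""[<>"'{}|^`\\\s]""")


def build_object_literal_query(object_id, predicate):
    """Validate the inputs and return the SPARQL query used by the tool."""
    if not _ID_RE.fullmatch(object_id):
        raise ValueError(f"Unexpected object id: {object_id!r}")
    if not predicate.startswith(("http://", "https://")) or _FORBIDDEN_RE.search(predicate):
        raise ValueError(f"Unexpected predicate IRI: {predicate!r}")
    return f"""
        PREFIX : <{predicate}>
        SELECT ?objectLiteral
        WHERE {{
            :{object_id} ?p ?objectLiteral .
            FILTER(isLiteral(?objectLiteral))
            FILTER(STRSTARTS(STR(?p), '{predicate}'))
        }}"""


print(build_object_literal_query("DOID_9884", "http://purl.obolibrary.org/obo/"))
```

Raising `ValueError` lets the caller report the problem back to the model as a failed tool result instead of sending a malformed query to the database.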

In [None]:
#Define the configuration for our tool...
toolConfig = {'tools': [],
'toolChoice': {
    'auto': {},
    #'any': {},
    #'tool': {
    #    'name': 'get_objectLiteral'
    #}
    }
}

toolConfig['tools'].append({
        'toolSpec': {
            'name': 'get_objectLiteral',
            'description': 'Get objectLiteral value of a given ID.',
            'inputSchema': {
                'json': {
                    'type': 'object',
                    'properties': {
                        'id': {
                            'type': 'string',
                            'description': 'ID of the object'
                        },
                        'predicate': {
                            'type': 'string',
                            'description': 'predicate'
                        }
                    },
                    'required': ['id', 'predicate']
                }
            }
        }
    })
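
The `required` list in the input schema tells the model which arguments it must supply, but the notebook code itself does not check them. The sketch below shows one way to verify a tool call against the schema before running the function; `missing_required` is a hypothetical helper, and the spec is a trimmed copy of the one defined above.

```python
def missing_required(tool_spec, tool_input):
    """Return the required argument names that are absent from tool_input."""
    schema = tool_spec["inputSchema"]["json"]
    return [name for name in schema.get("required", []) if name not in tool_input]


# Trimmed copy of the toolSpec defined above
tool_spec = {
    "name": "get_objectLiteral",
    "description": "Get objectLiteral value of a given ID.",
    "inputSchema": {
        "json": {
            "type": "object",
            "properties": {
                "id": {"type": "string", "description": "ID of the object"},
                "predicate": {"type": "string", "description": "predicate"},
            },
            "required": ["id", "predicate"],
        }
    },
}

print(missing_required(tool_spec, {"id": "DOID_9884"}))
```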

In [None]:
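#Function for calling the Bedrock Converse API...
def converse_with_tools(messages, system=None, toolConfig=toolConfig):
    #'system' must be a list of content blocks, e.g. [{"text": "..."}]; omit it when empty
    kwargs = {
        'modelId': modelId,
        'messages': messages,
        'toolConfig': toolConfig
    }
    if system:
        kwargs['system'] = system
    response = bedrock.converse(**kwargs)
    return response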

In [None]:
#Function for orchestrating the conversation flow...
def converse(prompt, system=''):
    #Add the initial prompt:
    messages = []
    messages.append(
        {
            "role": "user",
            "content": [
                {
                    "text": prompt
                }
            ]
        }
    )
    print(f"\n{datetime.now().strftime('%H:%M:%S')} - Initial prompt:\n{json.dumps(messages, indent=2)}")

    #Invoke the model the first time:
    output = converse_with_tools(messages, system)
    print(f"\n{datetime.now().strftime('%H:%M:%S')} - Output so far:\n{json.dumps(output['output'], indent=2, ensure_ascii=False)}")

    #Add the intermediate output to the prompt:
    messages.append(output['output']['message'])

    function_calling = next((c['toolUse'] for c in output['output']['message']['content'] if 'toolUse' in c), None)

    #Check if function calling is triggered:
    if function_calling:
        #Get the tool name and arguments:
        tool_name = function_calling['name']
        tool_args = function_calling['input'] or {}
        
        #Run the tool:
        print(f"\n{datetime.now().strftime('%H:%M:%S')} - Running ({tool_name}) tool...")
        tool_response = getattr(ToolsList(), tool_name)(**tool_args) or ""
        if tool_response:
            tool_status = 'success'
        else:
            tool_status = 'error'

        #Add the tool result to the prompt:
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        'toolResult': {
                            'toolUseId':function_calling['toolUseId'],
                            'content': [
                                {
                                    "text": tool_response
                                }
                            ],
                            'status': tool_status
                        }
                    }
                ]
            }
        )
        #print(f"\n{datetime.now().strftime('%H:%M:%S')} - Messages so far:\n{json.dumps(messages, indent=2)}")

        #Invoke the model one more time:
        output = converse_with_tools(messages, system)
        print(f"\n{datetime.now().strftime('%H:%M:%S')} - Final output:\n{json.dumps(output['output'], indent=2, ensure_ascii=False)}\n")
    return
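
The function above handles a single tool call. A model may request a tool more than once before answering, which the Converse API signals with a `stopReason` of `tool_use`. The sketch below generalizes the flow into a loop; it is an illustrative variant rather than the notebook's method, and `run_tool_loop`, `call_model`, and `run_tool` are hypothetical names. The model call and the tool runner are passed in as callables so the loop can be exercised without AWS access.

```python
def run_tool_loop(call_model, run_tool, messages, max_rounds=5):
    """Call the model until it stops requesting tools; return the final message."""
    for _ in range(max_rounds):
        response = call_model(messages)
        message = response["output"]["message"]
        messages.append(message)

        # Any stop reason other than 'tool_use' means the model has answered
        if response.get("stopReason") != "tool_use":
            return message

        # Run every requested tool and collect one toolResult per request
        results = []
        for block in message["content"]:
            if "toolUse" not in block:
                continue
            tool_use = block["toolUse"]
            try:
                text = run_tool(tool_use["name"], tool_use["input"] or {})
                status = "success"
            except Exception as exc:
                text, status = f"Tool failed: {exc}", "error"
            results.append({
                "toolResult": {
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"text": str(text)}],
                    "status": status,
                }
            })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Model kept requesting tools after max_rounds")
```

In the notebook, `call_model` could be `lambda m: converse_with_tools(m, system)` and `run_tool` could be `lambda name, args: getattr(ToolsList(), name)(**args)`.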


In [None]:
prompts = [
    "What is the object literal for id as 'DOID_9884' and predicate as 'http://purl.obolibrary.org/obo/' ",
    ]

for prompt in prompts:
    converse(
        system = [{"text": "You're provided with a few tools; \
            only use the tool if required. Don't make reference to the tools in your final answer."}],
        prompt = prompt
)
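
The `converse` function prints the raw Converse output as JSON. To show only the answer text, the text blocks of the final assistant message can be joined. This is a minimal sketch; `final_text` is a hypothetical helper and the sample message is illustrative.

```python
def final_text(output_message):
    """Join the text blocks of an assistant message into one string."""
    return "".join(
        block["text"] for block in output_message.get("content", []) if "text" in block
    )


# Illustrative message shaped like output['output']['message']
sample_message = {
    "role": "assistant",
    "content": [{"text": "The object literal value is "}, {"text": "an example."}],
}
print(final_text(sample_message))
```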



## Conclusion


In this notebook we learned how to create a tool that queries an Amazon Neptune database via SPARQL using function calling in the Amazon Bedrock Converse API.

The tool helps you get an object literal value from the Amazon Neptune database for a specific object ID using natural language. The user asks for the object literal of a given ID, and the LLM (in this case Claude 3 Sonnet) passes the required input parameters from the user's input to the function get_objectLiteral, which executes the SPARQL query to fetch the results. 

In our example, after a successful execution you should see output text such as:

*"The object literal value for DOID_9884 with the predicate http://purl.obolibrary.org/obo/ is "A myopathy is characterized by progressive skeletal muscle weakness degeneration.""*

#### Important: this notebook uses full-access permissions for S3 and Neptune for test purposes. If you want to use this code example in production, make sure you restrict permissions by specifying your bucket name and database ID in the IAM policies.


## Clean Up


Let's clean up the resources that were created. You can delete the Neptune cluster and the VPC and network resources by running the cell below.

- For 'stack_name', insert the name of the CloudFormation stack that was created at the beginning of this lab. You can find it under CloudFormation -> Stacks.

In [None]:
# Create a CloudFormation client
cloudformation = boto3.client('cloudformation')

stack_name = 'CLOUD_FORMATION_STACK_NAME'

try:
    cloudformation.delete_stack(StackName=stack_name)
    print(f'Stack {stack_name} deletion initiated.')
except Exception as e:
    print(f'Error deleting stack {stack_name}: {str(e)}')
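
`delete_stack` only starts the deletion. To block until the stack is actually gone, boto3 provides a `stack_delete_complete` waiter on the CloudFormation client. The sketch below wraps both calls; `delete_stack_and_wait` is a hypothetical helper, and the client is passed in as an argument (for example `boto3.client('cloudformation')`).

```python
def delete_stack_and_wait(cloudformation, stack_name):
    """Start deleting the stack and block until the deletion completes."""
    cloudformation.delete_stack(StackName=stack_name)
    # The waiter polls the stack status and raises if the deletion fails
    waiter = cloudformation.get_waiter("stack_delete_complete")
    waiter.wait(StackName=stack_name)
    return stack_name
```

In the notebook this could be called as `delete_stack_and_wait(cloudformation, stack_name)` inside the existing `try` block.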

**(Optional) You can also delete your S3 bucket. Go to S3 -> choose your bucket -> Empty, then Delete (a bucket must be empty before it can be deleted).**