## Query Neptune Graph with SPARQL in Natural Language Using Function Calling

This notebook walks you through building a tool that queries an Amazon Neptune database via SPARQL, using function calling in the Amazon Bedrock Converse API. The tool is invoked when a user asks for the object literal information stored in Amazon Neptune for a specific object ID. The large language model (LLM) extracts the required input parameters from the user's question and passes them to the function *get_objectLiteral*, which executes the SPARQL query and fetches the results. Finally, the LLM gives the user a response that includes the query results.
We will use the Human Disease Ontology as the dataset to load into Amazon Neptune: https://www.ebi.ac.uk/ols4/ontologies/doid



### Please Complete the Prerequisites Before You Start!






## Prerequisites
1. Make sure that you complete all the prerequisites explained in the [Neptune bulk load documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html).

2. In the security group of the Neptune database, allow inbound TCP connections on port 8182 (the default port of the instance) from the subnet CIDR range where the database instance resides.

3. This notebook runs as a Neptune notebook. When you configure the database cluster, allow it to create a notebook.

4. After the notebook has been created, you can find it in the SageMaker console under Notebooks. By default, the IAM role of the notebook instance has list and read permissions for S3 in its policy. Edit the policy to add "s3:PutObject" to the actions and your S3 bucket ARN to the resources. To access Amazon Bedrock, attach the *AmazonBedrockFullAccess* policy to the same role.

After you complete all the steps, you should have:
* An Amazon Neptune database instance (follow [creating a new Neptune DB cluster](https://docs.aws.amazon.com/neptune/latest/userguide/get-started-create-cluster.html) or create one manually from the AWS Console). The cluster must be set up within a VPC.
* An S3 bucket to store the sample ontology data
* An S3 VPC endpoint
* An IAM role, used with the Neptune DB instance and the notebook, that allows access to the data files in your S3 bucket and to Amazon Bedrock
* A security group attached to the Neptune database that allows inbound TCP connections on port 8182 (the default port of the instance) from the subnet CIDR range where the database instance resides
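
The policy change described in step 4 can be sketched as a plain Python dictionary. This is only an illustration under assumptions: the helper name `s3_put_object_statement` is hypothetical (it is not part of any AWS SDK), and `s3bucketname` is the same placeholder bucket name used later in this notebook. The statement lists both the bucket ARN and the objects under it, because object-level actions such as `s3:PutObject` are evaluated against object ARNs.

```python
import json

def s3_put_object_statement(bucket_name):
    """Build an IAM policy statement that allows uploads to one S3 bucket.

    Hypothetical helper for illustration; merge the result into your own policy.
    """
    return {
        "Effect": "Allow",
        "Action": ["s3:PutObject"],
        "Resource": [
            f"arn:aws:s3:::{bucket_name}",    # the bucket itself
            f"arn:aws:s3:::{bucket_name}/*",  # every object inside the bucket
        ],
    }

# 's3bucketname' is a placeholder; replace it with your own bucket name.
statement = s3_put_object_statement("s3bucketname")
print(json.dumps(statement, indent=2))
```

The printed JSON can be pasted into the `Statement` list of the notebook role's policy in the IAM console.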
    

## Install and Import the required modules


In [None]:
!pip3 install -qU boto3


### It is important to use boto3 version 1.34.139 or later!


In [None]:
import requests
import boto3
import json, sys
from datetime import datetime

print('Running boto3 version:', boto3.__version__)
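
The version requirement stated above can also be checked in code rather than by eye. The sketch below is a minimal, assumption-laden example: `version_tuple` and `meets_minimum` are hypothetical helper names, and the comparison only looks at the leading digits of each dotted component.

```python
import re

def version_tuple(version):
    """Convert a dotted version string such as '1.34.139' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        # Pieces without leading digits (for example a 'dev' suffix) count as 0.
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def meets_minimum(installed, minimum="1.34.139"):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)

# Try a few example version strings against the minimum.
for candidate in ("1.34.139", "1.35.0", "1.34.99"):
    print(candidate, meets_minimum(candidate))
```

In the notebook, calling `meets_minimum(boto3.__version__)` after the import cell would tell you whether the installed SDK is recent enough.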

In [None]:
modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
#modelId = 'anthropic.claude-3-haiku-20240307-v1:0'
#modelId = 'cohere.command-r-plus-v1:0'
#modelId = 'cohere.command-r-v1:0'
#modelId = 'mistral.mistral-large-2402-v1:0'
print(f'Using modelId: {modelId}')

region = 'us-east-1'
print('Using region: ', region)

bedrock = boto3.client(
    service_name = 'bedrock-runtime',
    region_name = region,
    )

### After you create your S3 bucket, add a folder named 'data'. The cell below downloads the Human Disease Ontology data and uploads it into that folder.

In [None]:
# URL of the OWL file to download
owl_file_url = "http://purl.obolibrary.org/obo/doid.owl"

# Download the OWL file
response = requests.get(owl_file_url)

# Check if the download was successful
if response.status_code == 200:
    # Create an S3 client
    s3_client = boto3.client('s3')

    # S3 bucket name
    bucket_name = 's3bucketname'

    # S3 object key (filename in the bucket)
    object_key = 'data/doid.owl'

    # Upload the file to S3
    s3_client.put_object(Body=response.content, Bucket=bucket_name, Key=object_key)
    
    print(f"File '{object_key}' uploaded to S3 bucket '{bucket_name}' successfully!")
else:
    print(f"Failed to download the file. Error: {response.status_code}")

### Insert the writer endpoint of your database instance to check connectivity. If the status is not healthy, check the prerequisites again. You can find the database endpoints in the Amazon Neptune console.



In [None]:
# Check database connectivity (8182 is the default Neptune port)

!curl https://{your_writer_endpoint}:8182/status
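
If you prefer to evaluate the status response in Python rather than reading the curl output, a small parser is enough. This sketch assumes the response body is JSON with a `status` field equal to `healthy` for a healthy instance. The helper name `is_healthy` and the sample body are illustrative and are not real output from a cluster.

```python
import json

def is_healthy(status_body):
    """Return True when a /status response body reports a healthy instance."""
    try:
        payload = json.loads(status_body)
    except json.JSONDecodeError:
        # Anything that is not JSON (for example an HTML error page) is unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"

# Made-up sample body, trimmed to the fields this check cares about.
sample_body = '{"status": "healthy", "role": "writer"}'
print(is_healthy(sample_body))
```

With `requests`, the body would come from something like `requests.get(status_url).text`, where `status_url` is the same address used in the curl command.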

### Replace the parameters in this command with your own values for the Neptune endpoint, region, and IAM role

curl -X POST \
    -H 'Content-Type: application/json' \
    https://your-neptune-endpoint:port/loader -d '
    {
      "source" : "s3://bucket-name/object-key-name",
      "format" : "format",
      "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
      "region" : "region",
      "failOnError" : "FALSE",
      "parallelism" : "MEDIUM",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "TRUE",
      "dependencies" : ["load_A_id", "load_B_id"]
    }'

In [None]:
#Load the data from S3 to writer endpoint of your database instance
!curl -X POST -H 'Content-Type: application/json' https://{writer-endpoint}:8182/loader -d '{"source" : "{s3_url_of_data}","format" : "rdfxml","iamRoleArn" : "{arn_of_the_IAMrole}","region" : "us-east-1","failOnError" : "FALSE","parallelism" : "MEDIUM","updateSingleCardinalityProperties" : "FALSE","queueRequest" : "TRUE"}'
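
The long one-line JSON body in the curl command is easy to mistype. As an alternative sketch, the same payload can be assembled in Python and serialized with `json.dumps`. The function name `build_loader_payload` is hypothetical, and the account ID, role name, and bucket name below are placeholders, not real resources.

```python
import json

def build_loader_payload(source, iam_role_arn, region, data_format="rdfxml"):
    """Assemble the JSON body for a Neptune bulk load request.

    The keys mirror the curl command in this notebook; values are passed in.
    """
    return {
        "source": source,
        "format": data_format,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
        "updateSingleCardinalityProperties": "FALSE",
        "queueRequest": "TRUE",
    }

payload = build_loader_payload(
    source="s3://s3bucketname/data/doid.owl",                  # placeholder bucket
    iam_role_arn="arn:aws:iam::123456789012:role/role-name",   # placeholder ARN
    region="us-east-1",
)
body = json.dumps(payload)
print(body)
```

The resulting `body` string could then be sent with `requests.post(loader_url, data=body, headers={"Content-Type": "application/json"})`, where `loader_url` is the writer endpoint address ending in `/loader`.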

In [None]:
# Get the Neptune reader endpoint
neptune_endpoint='neptune_reader_endpoint'
neptune_port='8182'
url=f"https://{neptune_endpoint}:{neptune_port}/sparql"
print('neptune_query_url: ',url)

In [None]:
# Define the test SPARQL query
query = """
    PREFIX : <http://purl.obolibrary.org/obo/>
    SELECT ?objectLiteral
    WHERE {
        :DOID_9884 ?p ?objectLiteral .
        FILTER(isLiteral(?objectLiteral))
        FILTER(STRSTARTS(STR(?p), 'http://purl.obolibrary.org/obo/'))
    }
"""

# Send the SPARQL query to the Neptune endpoint via POST request
response = requests.post(url, data={"query":query})

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code} - {response.text}")
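
The test cell prints the raw JSON response. To pull out only the literal values, you can walk the `results.bindings` list defined by the SPARQL 1.1 Query Results JSON format. The helper `extract_literals` is a hypothetical name, and the sample response below is invented for illustration; it is not actual data from the ontology.

```python
def extract_literals(result_json, variable="objectLiteral"):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    bindings = result_json.get("results", {}).get("bindings", [])
    # Each binding maps a variable name to {"type": ..., "value": ...}.
    return [row[variable]["value"] for row in bindings if variable in row]

# Invented sample in the SPARQL JSON results layout.
sample = {
    "head": {"vars": ["objectLiteral"]},
    "results": {
        "bindings": [
            {"objectLiteral": {"type": "literal", "value": "example literal one"}},
            {"objectLiteral": {"type": "literal", "value": "example literal two"}},
        ]
    },
}
print(extract_literals(sample))
```

In the notebook this would be called as `extract_literals(response.json())` after a successful query.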

### Insert your Neptune query URL as the 'url' value.

In [None]:
class ToolsList:
    #Define our get_objectLiteral tool function...
    def get_objectLiteral(self, id, predicate):
        print(id, predicate)
        url = '{insert_neptune_query_url}'
        query = f"""
            PREFIX : <{predicate}> 
            SELECT ?objectLiteral 
            WHERE {{:{id} ?p ?objectLiteral . 
            FILTER(isLiteral(?objectLiteral)) 
            FILTER(STRSTARTS(STR(?p), '{predicate}'))
            }}"""
        print(query)
        response = requests.post(
            f"{url}",
            data={"query": query},
        )
        result = f'Value in {id}, {predicate} is ' + response.text
        print(f'Tool result: {result}')
        return result
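
Because the tool interpolates model-supplied strings directly into the SPARQL text, it may be worth validating them first. The sketch below is one possible approach, not the notebook's own method: `build_object_literal_query` is a hypothetical helper that rejects IDs that are not simple names and predicates containing characters that are not allowed inside a SPARQL IRI reference or that would close the quoted string.

```python
import re

# Simple local names only, e.g. DOID_9884.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Characters that may not appear inside <...> in SPARQL, plus whitespace.
IRI_FORBIDDEN = re.compile(r'[<>"{}|^`\\\s]')

def build_object_literal_query(object_id, predicate):
    """Return the SPARQL query text after validating both inputs."""
    if not ID_PATTERN.fullmatch(object_id):
        raise ValueError(f"Unexpected object id: {object_id!r}")
    if IRI_FORBIDDEN.search(predicate) or "'" in predicate:
        raise ValueError(f"Unexpected predicate: {predicate!r}")
    return (
        f"PREFIX : <{predicate}>\n"
        "SELECT ?objectLiteral\n"
        "WHERE {\n"
        f"    :{object_id} ?p ?objectLiteral .\n"
        "    FILTER(isLiteral(?objectLiteral))\n"
        f"    FILTER(STRSTARTS(STR(?p), '{predicate}'))\n"
        "}"
    )

query_text = build_object_literal_query("DOID_9884", "http://purl.obolibrary.org/obo/")
print(query_text)
```

A `ValueError` raised here could be caught in the orchestration code and returned to the model as a tool result with an error status.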

In [None]:
#Define the configuration for our tool...
toolConfig = {'tools': [],
'toolChoice': {
    'auto': {},
    #'any': {},
    #'tool': {
    #    'name': 'get_objectLiteral'
    #}
    }
}

toolConfig['tools'].append({
        'toolSpec': {
            'name': 'get_objectLiteral',
            'description': 'Get objectLiteral value of a given ID.',
            'inputSchema': {
                'json': {
                    'type': 'object',
                    'properties': {
                        'id': {
                            'type': 'string',
                            'description': 'ID of the object'
                        },
                        'predicate': {
                            'type': 'string',
                            'description': 'predicate'
                        }
                    },
                    'required': ['id', 'predicate']
                }
            }
        }
    })
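
The `required` list in the input schema tells the model which arguments to supply, but the model's output is not guaranteed to include them. A small check before running the tool can catch that. This is a sketch under assumptions: `missing_required_inputs` is a hypothetical helper, and `example_spec` is a trimmed copy of the tool specification above.

```python
def missing_required_inputs(tool_spec, tool_input):
    """List the required schema fields that are absent from the tool input."""
    schema = tool_spec["inputSchema"]["json"]
    return [name for name in schema.get("required", []) if name not in tool_input]

# Trimmed copy of the get_objectLiteral tool specification.
example_spec = {
    "name": "get_objectLiteral",
    "inputSchema": {
        "json": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "predicate": {"type": "string"},
            },
            "required": ["id", "predicate"],
        }
    },
}

# The model supplied only the id, so 'predicate' is reported as missing.
print(missing_required_inputs(example_spec, {"id": "DOID_9884"}))
```

An empty list means every required argument is present and the tool can be called.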

In [None]:
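#Function for calling the Bedrock Converse API...
def converse_with_tools(messages, system=None, toolConfig=toolConfig):
    #The Converse API expects 'system' to be a list of content blocks,
    #so leave the parameter out entirely when no system prompt is given.
    kwargs = {'modelId': modelId, 'messages': messages, 'toolConfig': toolConfig}
    if system:
        kwargs['system'] = system
    response = bedrock.converse(**kwargs)
    return response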

In [None]:
#Function for orchestrating the conversation flow...
def converse(prompt, system=''):
    #Add the initial prompt:
    messages = []
    messages.append(
        {
            "role": "user",
            "content": [
                {
                    "text": prompt
                }
            ]
        }
    )
    print(f"\n{datetime.now().strftime('%H:%M:%S')} - Initial prompt:\n{json.dumps(messages, indent=2)}")

    #Invoke the model the first time:
    output = converse_with_tools(messages, system)
    print(f"\n{datetime.now().strftime('%H:%M:%S')} - Output so far:\n{json.dumps(output['output'], indent=2, ensure_ascii=False)}")

    #Add the intermediate output to the prompt:
    messages.append(output['output']['message'])

    function_calling = next((c['toolUse'] for c in output['output']['message']['content'] if 'toolUse' in c), None)

    #Check if function calling is triggered:
    if function_calling:
        #Get the tool name and arguments:
        tool_name = function_calling['name']
        tool_args = function_calling['input'] or {}
        
        #Run the tool:
        print(f"\n{datetime.now().strftime('%H:%M:%S')} - Running ({tool_name}) tool...")
        tool_response = getattr(ToolsList(), tool_name)(**tool_args) or ""
        if tool_response:
            tool_status = 'success'
        else:
            tool_status = 'error'

        #Add the tool result to the prompt:
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        'toolResult': {
                            'toolUseId':function_calling['toolUseId'],
                            'content': [
                                {
                                    "text": tool_response
                                }
                            ],
                            'status': tool_status
                        }
                    }
                ]
            }
        )
        #print(f"\n{datetime.now().strftime('%H:%M:%S')} - Messages so far:\n{json.dumps(messages, indent=2)}")

        #Invoke the model one more time:
        output = converse_with_tools(messages, system)
        print(f"\n{datetime.now().strftime('%H:%M:%S')} - Final output:\n{json.dumps(output['output'], indent=2, ensure_ascii=False)}\n")
    return
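
The message handling inside `converse` can also be factored into small pure functions, which makes it easier to test without calling Bedrock and to handle a response that contains more than one tool request. The helper names below (`find_tool_uses`, `tool_result_message`, `final_text`) are hypothetical, and `sample_message` is an invented assistant message that follows the shape used in the orchestration code above.

```python
def find_tool_uses(message):
    """Return every toolUse block in an assistant message, in order."""
    return [block["toolUse"] for block in message.get("content", []) if "toolUse" in block]

def tool_result_message(tool_use_id, text, ok=True):
    """Build the user message that reports a tool result back to the model."""
    return {
        "role": "user",
        "content": [
            {
                "toolResult": {
                    "toolUseId": tool_use_id,
                    "content": [{"text": text}],
                    "status": "success" if ok else "error",
                }
            }
        ],
    }

def final_text(message):
    """Join the plain text blocks of a message, skipping tool blocks."""
    return "".join(block["text"] for block in message.get("content", []) if "text" in block)

# Invented assistant message for illustration.
sample_message = {
    "role": "assistant",
    "content": [
        {"text": "Let me look that up."},
        {"toolUse": {
            "toolUseId": "tooluse_example",
            "name": "get_objectLiteral",
            "input": {"id": "DOID_9884", "predicate": "http://purl.obolibrary.org/obo/"},
        }},
    ],
}

for tool_use in find_tool_uses(sample_message):
    print(tool_use["name"], tool_use["input"])
print(final_text(sample_message))
```

Looping over `find_tool_uses` and appending one `tool_result_message` per request would let the flow cope with several tool calls in a single turn.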


In [None]:
prompts = [
    "What is the object literal for id as 'DOID_9884' and predicate as 'http://purl.obolibrary.org/obo/' ",
    ]

for prompt in prompts:
    converse(
        system = [{"text": "You're provided with a few tools; \
            only use the tool if required. Don't make reference to the tools in your final answer."}],
        prompt = prompt
)

