## First we need to capture the graph notebook configuration to identify the Neptune cluster and create that connection.

In [None]:
%graph_notebook_config --store-to config

In [None]:
import boto3
import json
endpoint_url = "https://" + json.loads(config)['host'] + ":" + str(json.loads(config)['port'])
print(endpoint_url)
neptune_client = boto3.client('neptunedata', endpoint_url=endpoint_url)
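
The cell above parses `config` twice. As a minimal sketch, the same step can be written as a helper that parses once. The function name `build_endpoint_url` and the sample values are hypothetical, and the sketch assumes the stored config is a JSON string with `host` and `port` keys, as the cell above implies.

```python
import json

def build_endpoint_url(config_json: str) -> str:
    """Build the Neptune HTTPS endpoint from a graph-notebook config JSON string."""
    cfg = json.loads(config_json)  # parse once and reuse the result
    return f"https://{cfg['host']}:{cfg['port']}"

# Hypothetical config values, for illustration only.
sample_config = json.dumps({"host": "my-cluster.example.com", "port": 8182})
print(build_endpoint_url(sample_config))
```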

## The Basics

### First, what does RAG involve? Most RAG implementations involve the following steps:
1. Encode the text into a series of inputs called tokens
2. "Chunk" the tokens into groups based on a specific size
3. Run the list of tokens through an embedding model to create an N-dimensional array of numbers encoding the "meaning of the text"
4. Repeat steps 1-3 with the user query/prompt
5. Perform a "K-Nearest Neighbor" distance check with those N-dimensional arrays to find the "K" closest in distance.
6. Add the text of those K closest chunks to the LLM prompt as additional context for the query.
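
The six steps above can be sketched end to end with toy stand-ins. This is not how the notebook does it: here the "tokenizer" is a lowercase word splitter and the "embedding model" is a plain bag-of-words count vector, both assumptions chosen so the sketch runs without any model. All function names are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Toy tokenizer: lowercase words with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def chunk(tokens, size):
    """Group tokens into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def embed(tokens, vocab):
    """Toy embedding: one count per vocabulary word (a stand-in for a real model)."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, document, k=1, size=8):
    doc_tokens = tokenize(document)                      # step 1
    chunks = chunk(doc_tokens, size)                     # step 2
    query_tokens = tokenize(query)
    vocab = sorted(set(doc_tokens) | set(query_tokens))
    vectors = [embed(c, vocab) for c in chunks]          # step 3
    q_vec = embed(query_tokens, vocab)                   # step 4
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine(q_vec, vectors[i]),
                    reverse=True)                        # step 5
    return [" ".join(chunks[i]) for i in ranked[:k]]     # step 6: context for the prompt

document = ("Amazon will acquire Whole Foods Market for cash. "
            "The summit teaches sellers about inventory and advertising.")
query = "Who will acquire Whole Foods?"
print(retrieve(query, document, k=1))
```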

<h4>Press Release 1:</h4>

<p>Amazon (NASDAQ:AMZN) and Whole Foods Market, Inc. (NASDAQ:WFM) today announced that they have entered into a definitive merger agreement under which Amazon will acquire Whole Foods Market for \$42 per share in an all-cash transaction valued at approximately $13.7 billion, including Whole Foods Market’s net debt. “Millions of people love Whole Foods Market because they offer the best natural and organic foods, and they make it fun to eat healthy,” said Jeff Bezos, Amazon founder and CEO. “Whole Foods Market has been satisfying, delighting and nourishing customers for nearly four decades – they’re doing an amazing job and we want that to continue.” “This partnership presents an opportunity to maximize value for Whole Foods Market’s shareholders, while at the same time extending our mission and bringing the highest quality, experience, convenience and innovation to our customers,” said John Mackey, Whole Foods Market co-founder and CEO. Whole Foods Market will continue to operate stores under the Whole Foods Market brand and source from trusted vendors and partners around the world. John Mackey will remain as CEO of Whole Foods Market and Whole Foods Market’s headquarters will stay in Austin, Texas. Completion of the transaction is subject to approval by Whole Foods Market's shareholders, regulatory approvals and other customary closing conditions. The parties expect to close the transaction during the second half of 2017.</p>

<h4>Press Release 2:</h4>
<p>Amazon Selling Partner Conferences Sell Out in Six Weeks, with 1,800+ Small and Medium Sized Businesses Set to Learn about How to Build and Grow their Sales Online<br>
Events will run in Fort Lauderdale, FL; Chicago, IL; Los Angeles, CA; and Seattle, WA between March and October 2019<br>
Attendees will learn how to leverage tools and resources to help their business succeed in Amazon’s Stores, prepare for seasonal surges, network with fellow sellers, and get one-to-one help from Amazon's selling experts<br>
More than half of units sold in Amazon’s stores are from small and medium-sized businesses, with over a million U.S.-based small and medium-sized businesses selling in Amazon’s stores<br>
SEATTLE--(BUSINESS WIRE)--Mar. 25, 2019-- Amazon (NASDAQ: AMZN) today announced that its new Selling Partner Summits, a series of six conferences for small and medium-sized business (SMBs) to help them build their business in Amazon’s stores, have sold out in just six weeks. More than 1,800 SMBs are set to attend the nationwide events between March and October. The Summits are part of Amazon’s significant investment to help businesses succeed in selling their products online.<br>
Each Summit will feature an Amazon-led educational track, experts lounge, and product labs to help small businesses build and grow their sales in Amazon’s stores. Participants will learn directly from Amazon's experts and meet like-minded Amazon sellers to network, learn, and share success stories. The first event will be held in Ft. Lauderdale, FL on March 26-27. Later in 2019, the Summits will be hosted in Chicago, Los Angeles, and Seattle.<br>
“We are a champion of small business across America, investing heavily to help over a million businesses sell their products in Amazon Stores while creating economic opportunity and jobs for hundreds of thousands of people across the country,” said Pete Sauerborn, VP of Selling Partner Recruitment and Development for Amazon. “As part of our commitment to empowering small business, these summits are another powerful tool to help them learn how to sell online and grow their sales.”<br>
At the Summits, attendees will have an Amazon-led educational track based on their business model and how long they’ve been selling. The educational tracks are segmented as New Brand Owner, Established Brand Owner, New Reseller, and Established Resellers. Each track is engineered for sellers to walk away from the event with knowledge and insights to help them scale their business and better identify the growth levers that make the most sense for their business.<br>
The Selling Partner Summit Series will feature sessions designed to help sellers grow their business, including:<br>
• Customer Obsession: Understand your pivotal role in Amazon’s commitment to maintaining outstanding customer experience.<br>
• The Selling Partner Journey: Get an overview of the tools and programs available to sellers and how they can help you grow your business.<br>
• Inventory & Fulfillment: Study up on fulfillment options and inventory management best practices to prevent stock-outs and plan for seasonal sales surges.<br>
• Discovery: Learn how to help Amazon customers find your products via listing creation, search optimization, and advertising opportunities.<br>
• Account Health: Understand the policies, metrics, and processes pertaining to your account health.<br>
• Featured Offers (buy box): Make sense of featured offer eligibility and the performance-based requirements products must meet to 'win the featured offer.'<br>
• Amazon Expert Lounge: Ask your remaining questions 1:1 in the Expert Lounge after completing the educational tracks.<br>
Registration for the new Selling Partner Summit Series opened on February 6th and sold out quickly in just six weeks. Amazon has previously hosted events to help sellers including Amazon Academy events across Europe and the Boost Conference in the U.S., specifically for businesses using the Fulfillment by Amazon service.<br>
More than half of units sold in Amazon’s stores are from SMBs. The 2018 Amazon Small Business Impact Report revealed that there are more than one million U.S.-based SMBs selling in Amazon’s stores, and SMBs are estimated to have created more than 900,000 jobs worldwide to support their sales through Amazon. In 2018, more than 50,000 small and medium-sized businesses exceeded in sales in Amazon’s stores worldwide, and nearly 200,000 surpassed in sales. The number of small and medium-sized businesses eclipsing in sales in Amazon’s stores worldwide grew by 20 percent last year.<br>
To learn more about the Selling Partner Summit Series, visit here.<br>
Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. Customer reviews, 1-Click shopping, personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle Direct Publishing, Kindle, Fire tablets, Fire TV, Amazon Echo, and Alexa are some of the products and services pioneered by Amazon. For more information, visit www.amazon.com/about and follow @AmazonNews
</p>

In [None]:
press_releases = [
"""Amazon (NASDAQ:AMZN) and Whole Foods Market, Inc. (NASDAQ:WFM) today announced that they have entered into a definitive merger agreement under which Amazon will acquire Whole Foods Market for $42 per share in an all-cash transaction valued at approximately $13.7 billion, including Whole Foods Market’s net debt. “Millions of people love Whole Foods Market because they offer the best natural and organic foods, and they make it fun to eat healthy,” said Jeff Bezos, Amazon founder and CEO. “Whole Foods Market has been satisfying, delighting and nourishing customers for nearly four decades – they’re doing an amazing job and we want that to continue.” “This partnership presents an opportunity to maximize value for Whole Foods Market’s shareholders, while at the same time extending our mission and bringing the highest quality, experience, convenience and innovation to our customers,” said John Mackey, Whole Foods Market co-founder and CEO. Whole Foods Market will continue to operate stores under the Whole Foods Market brand and source from trusted vendors and partners around the world. John Mackey will remain as CEO of Whole Foods Market and Whole Foods Market’s headquarters will stay in Austin, Texas. Completion of the transaction is subject to approval by Whole Foods Market's shareholders, regulatory approvals and other customary closing conditions. The parties expect to close the transaction during the second half of 2017.""",
"""Amazon Selling Partner Conferences Sell Out in Six Weeks, with 1,800+ Small and Medium Sized Businesses Set to Learn about How to Build and Grow their Sales Online\nEvents will run in Fort Lauderdale, FL; Chicago, IL; Los Angeles, CA; and Seattle, WA between March and October 2019\nAttendees will learn how to leverage tools and resources to help their business succeed in Amazon\u2019s Stores, prepare for seasonal surges, network with fellow sellers, and get one-to-one help from Amazon's selling experts\nMore than half of units sold in Amazon\u2019s stores are from small and medium-sized businesses, with over a million U.S.-based small and medium-sized businesses selling in Amazon\u2019s stores\nSEATTLE--(BUSINESS WIRE)--Mar. 25, 2019-- Amazon (NASDAQ: AMZN) today announced that its new Selling Partner Summits, a series of six conferences for small and medium-sized business (SMBs) to help them build their business in Amazon\u2019s stores, have sold out in just six weeks. More than 1,800 SMBs are set to attend the nationwide events between March and October. The Summits are part of Amazon\u2019s significant investment to help businesses succeed in selling their products online.\nEach Summit will feature an Amazon-led educational track, experts lounge, and product labs to help small businesses build and grow their sales in Amazon\u2019s stores. Participants will learn directly from Amazon's experts and meet like-minded Amazon sellers to network, learn, and share success stories. The first event will be held in Ft. Lauderdale, FL on March 26-27. Later in 2019, the Summits will be hosted in Chicago, Los Angeles, and Seattle.\n\u201cWe are a champion of small business across America, investing heavily to help over a million businesses sell their products in Amazon Stores while creating economic opportunity and jobs for hundreds of thousands of people across the country,\u201d said Pete Sauerborn, VP of Selling Partner Recruitment and Development for Amazon. 
\u201cAs part of our commitment to empowering small business, these summits are another powerful tool to help them learn how to sell online and grow their sales.\u201d\nAt the Summits, attendees will have an Amazon-led educational track based on their business model and how long they\u2019ve been selling. The educational tracks are segmented as New Brand Owner, Established Brand Owner, New Reseller, and Established Resellers. Each track is engineered for sellers to walk away from the event with knowledge and insights to help them scale their business and better identify the growth levers that make the most sense for their business.\nThe Selling Partner Summit Series will feature sessions designed to help sellers grow their business, including:\n\u2022 Customer Obsession: Understand your pivotal role in Amazon\u2019s commitment to maintaining outstanding customer experience.\n\u2022 The Selling Partner Journey: Get an overview of the tools and programs available to sellers and how they can help you grow your business.\n\u2022 Inventory & Fulfillment: Study up on fulfillment options and inventory management best practices to prevent stock-outs and plan for seasonal sales surges.\n\u2022 Discovery: Learn how to help Amazon customers find your products via listing creation, search optimization, and advertising opportunities.\n\u2022 Account Health: Understand the policies, metrics, and processes pertaining to your account health.\n\u2022 Featured Offers (buy box): Make sense of featured offer eligibility and the performance-based requirements products must meet to 'win the featured offer.'\n\u2022 Amazon Expert Lounge: Ask your remaining questions 1:1 in the Expert Lounge after completing the educational tracks.\nRegistration for the new Selling Partner Summit Series opened on February 6th and sold out quickly in just six weeks. 
Amazon has previously hosted events to help sellers including Amazon Academy events across Europe and the Boost Conference in the U.S., specifically for businesses using the Fulfillment by Amazon service.\nMore than half of units sold in Amazon\u2019s stores are from SMBs. The 2018 Amazon Small Business Impact Report revealed that there are more than one million U.S.-based SMBs selling in Amazon\u2019s stores, and SMBs are estimated to have created more than 900,000 jobs worldwide to support their sales through Amazon. In 2018, more than 50,000 small and medium-sized businesses exceeded in sales in Amazon\u2019s stores worldwide, and nearly 200,000 surpassed in sales. The number of small and medium-sized businesses eclipsing in sales in Amazon\u2019s stores worldwide grew by 20 percent last year.\nTo learn more about the Selling Partner Summit Series, visit here.\nAmazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. Customer reviews, 1-Click shopping, personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle Direct Publishing, Kindle, Fire tablets, Fire TV, Amazon Echo, and Alexa are some of the products and services pioneered by Amazon. For more information, visit www.amazon.com\/about and follow @AmazonNews"""
]

In [None]:
!pip install semchunk tiktoken scikit-learn

### So what does tokenizing mean?

First, there are many different ways to tokenize a string, so we are using a common tokenizer, "cl100k_base". In the code below, we take the beginning of our first press release and show how the tokenizer maps each sequence of characters to an integer token ID. A common word usually maps to a single token, while punctuation and rarer strings may be split into several tokens. As the name suggests, the vocabulary holds roughly 100K unique tokens, and each token ID identifies one entry in that vocabulary.

In [None]:
import tiktoken

sentence = "Amazon (NASDAQ:AMZN) and Whole Foods Market, Inc. (NASDAQ:WFM) today announced that they have entered"

encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode(sentence)
print(f"{len(tokens)} tokens total")

for token in tokens:
    print(f"{token:05d}:\t{encoding.decode_single_token_bytes(token).decode('utf-8')}")
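
To make the idea of mapping text to token IDs concrete without downloading a vocabulary, here is a toy tokenizer. It is a deliberate simplification: cl100k_base uses byte-pair encoding, which this sketch does not implement. Instead it does a greedy longest-match against a tiny hand-made vocabulary. The vocabulary, IDs, and function names are all hypothetical.

```python
def encode(text, vocab):
    """Greedy longest-match: repeatedly take the longest vocabulary entry that prefixes the remaining text."""
    ids = []
    pos = 0
    max_len = max(len(piece) for piece in vocab)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            # No vocabulary entry matches at this position.
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids

def decode(ids, vocab):
    """Invert the mapping: look up each ID and concatenate the pieces."""
    by_id = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(by_id[i] for i in ids)

# Hypothetical ten-entry vocabulary; real vocabularies hold roughly 100K entries.
toy_vocab = {"Amazon": 0, " (": 1, "NASDAQ": 2, ":": 3, "AM": 4,
             "ZN": 5, ")": 6, " and": 7, " Whole": 8, " Foods": 9}
text = "Amazon (NASDAQ:AMZN) and Whole Foods"
ids = encode(text, toy_vocab)
print(ids)
print(decode(ids, toy_vocab))
```

Note that the leading space is part of tokens such as " Whole", which is why decoding is a plain concatenation.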

### This code will intelligently chunk up the document based on the desired size

In [None]:
import semchunk

chunks = []
chunk_size = 128  #tokens
chunker = semchunk.chunkerify('cl100k_base', chunk_size)

for pr in press_releases:
    local_chunks = chunker(pr)
    for local in local_chunks:
        chunks.append(local)
print(f"{len(chunks)} chunks total")
chunks
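
To give a feel for what chunking along natural boundaries means, here is a simplified sketch that packs whole sentences into chunks under a token budget. It does not reproduce semchunk's own algorithm. The token counter is a whitespace word count rather than cl100k_base, the sentence splitter is a naive regular expression that will mis-split abbreviations such as "Inc.", and the function names are hypothetical.

```python
import re

def count_tokens(text):
    """Stand-in token counter: whitespace-separated words (the notebook counts cl100k_base tokens)."""
    return len(text.split())

def pack_sentences(text, chunk_size):
    """Split on sentence-ending punctuation, then greedily pack whole sentences into chunks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > chunk_size:
            # Adding this sentence would exceed the budget, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample = "Amazon will acquire Whole Foods. The deal is all cash. John Mackey remains CEO."
packed = pack_sentences(sample, chunk_size=10)
print(packed)
```

A single sentence longer than `chunk_size` ends up as its own oversized chunk in this sketch; a production chunker would split it further.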


## Here we run each chunk through the Amazon Titan Text Embeddings V2 model to generate text embeddings for vector searches (traditional RAG)

In [None]:
import boto3
import json
import numpy as np
import pandas as pd

bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Set the model ID, e.g., Titan Text Embeddings V2.
model_id = "amazon.titan-embed-text-v2:0"

embeddings = []

for chunk in chunks:
    input_text = chunk
    native_request = {"inputText": input_text}
    request = json.dumps(native_request)

    response = bedrock_client.invoke_model(modelId=model_id, body=request)

    model_response = json.loads(response["body"].read())

    embedding = model_response["embedding"]
    input_token_count = model_response["inputTextTokenCount"]

    print("\nYour input:")
    print(input_text)
    print(f"Number of input tokens: {input_token_count}")
    print(f"Size of the generated embedding: {len(embedding)}")
    print("Embedding:")
    print(embedding)

    embeddings.append((chunk, np.array(embedding)))

core_df = pd.DataFrame(embeddings, columns=['Text','Embeddings'])
core_df


## We need to do the same to generate the embeddings for the question we are asking.

In [None]:
input_text = "What are the connections between Amazon and Whole Foods?"
native_request = {"inputText": input_text}
request = json.dumps(native_request)

response = bedrock_client.invoke_model(modelId=model_id, body=request)

model_response = json.loads(response["body"].read())

embedding = model_response["embedding"]
input_token_count = model_response["inputTextTokenCount"]

print("\nYour input:")
print(input_text)
print(f"Number of input tokens: {input_token_count}")
print(f"Size of the generated embedding: {len(embedding)}")
print("Embedding:")
print(embedding)

query_item = (input_text, np.array(embedding))

## Now we are using a cosine similarity library to detect the top 3 closest chunks for answering the question.

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(query_item[1].reshape(1,-1), np.array(core_df["Embeddings"].tolist()))

core_df["similarity"] = similarity.reshape(-1,1)

top3 = core_df.nlargest(3, 'similarity')
top3
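
For reference, the cosine similarity between a query embedding and a chunk embedding can be written as follows. The symbols $q$ (query vector), $c$ (chunk vector), and $N$ (embedding dimension) are introduced here for illustration.

$$
\operatorname{sim}(q, c) = \frac{q \cdot c}{\lVert q \rVert \, \lVert c \rVert} = \frac{\sum_{i=1}^{N} q_i c_i}{\sqrt{\sum_{i=1}^{N} q_i^2} \, \sqrt{\sum_{i=1}^{N} c_i^2}}
$$

As a small worked example, for $q = (1, 0)$ and $c = (1, 1)$ the dot product is $1$, the norms are $1$ and $\sqrt{2}$, and the similarity is $1/\sqrt{2} \approx 0.707$. Values lie between $-1$ and $1$, with $1$ meaning the two vectors point in the same direction.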

## The key takeaway here is that with RAG we don't really know how many of the K closest results are actually relevant. Here the first two chunks are very relevant, but the third is about the Selling Partner Summits.

In [None]:
for i,row in top3.iterrows():
    print(chunks[i])
    print("------")

## OK then, what is GraphRAG?

GraphRAG uses the same concepts as RAG, but it also extracts key entities from both the reference text and the prompt. When the graph is created, it stores the context connecting those entities so that it can later be retrieved and added to the prompt.
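
As a rough sketch of that idea, the entities extracted from each chunk can be turned into an index from entity to chunk, which then answers "which chunks mention every entity in the question?". The data, the function names, and the choice of a plain dictionary in place of a graph database are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical entities extracted per chunk id (illustrative data only).
chunk_entities = {
    0: {"Amazon", "Whole Foods Market", "Jeff Bezos"},
    1: {"Whole Foods Market", "John Mackey"},
    2: {"Amazon", "Pete Sauerborn"},
}

def build_entity_index(chunk_entities):
    """Map each entity to the set of chunk ids that mention it."""
    index = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        for entity in entities:
            index[entity].add(chunk_id)
    return index

def chunks_linking(index, question_entities):
    """Return the ids of chunks that mention every entity found in the question."""
    sets = [index.get(entity, set()) for entity in question_entities]
    return sorted(set.intersection(*sets)) if sets else []

entity_index = build_entity_index(chunk_entities)
linked = chunks_linking(entity_index, ["Amazon", "Whole Foods Market"])
print(linked)
```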


## First we identify the entities from each of the chunks

In [None]:
import boto3
from botocore.exceptions import ClientError
import json
import logging
from enum import Enum
import pandas as pd

logger = logging.getLogger(__name__)

client = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

class SupportedModels(Enum):
    CLAUDE_OPUS = "anthropic.claude-3-opus-20240229-v1:0"
    CLAUDE_SONNET = "anthropic.claude-3-sonnet-20240229-v1:0"
    CLAUDE_HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"
    COHERE_COMMAND_R = "cohere.command-r-v1:0"
    COHERE_COMMAND_R_PLUS = "cohere.command-r-plus-v1:0"
    
model_id = SupportedModels.CLAUDE_HAIKU.value

# Define the prompt for the model.
instructions_prompt = """
You are excellent at identifying entities from text and it makes you happy when you provide the correct answer. 
The types of entities you can identify are ORGANIZATION, DATE, PERSON, FACILITY, PERSON_TITLE, LOCATION, MONETARY_VALUE, 
STOCK_CODE, QUANTITY
If the entity is STOCK_CODE, put the market in front of the code in the response. 
CITIGROUP INC stock code C is traded on the NYSE and would become NYSE:C. 
Apple Inc. Common Stock symbol AAPL is traded on the NASDAQ and would become NASDAQ:AAPL.
If you don't know which market the symbol is traded on, use the format UNKNOWN:SYMBOL.

When someone gives you text, determine the entities from the text and respond using JSON in the format:
Example: 
first entity type is ORGANIZATION and the entities extracted are company1, company2, and company3.
second entity type is PERSON and the entities extracted are person1, person2, and person3. 
Response:
{"ORGANIZATION": ["company1","company2","company3"], "PERSON": ["person1","person2","person3"]}. 
Return only the JSON, no other text. 

Text: 
"""

entities = []

for idx, row in core_df.iterrows():

    prompt = instructions_prompt + row["Text"]

    # Format the request payload using the model's native structure.
    native_request = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "temperature": 0.5,
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": prompt}],
            }
        ],
    }

    # Convert the native request to JSON.
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)

    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

    # Decode the response body.
    model_response = json.loads(response["body"].read())

    # Extract the response text and collect it for this chunk.
    response_text = model_response["content"][0]["text"]
    entities.append(response_text)

core_df["Entities"] = entities
core_df["Entities"]
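
The prompt asks for JSON only, but the example response inside the prompt ends with a period after the closing brace, so a model may imitate that or add other surrounding text. A minimal sketch of a more tolerant parser is shown here. The name `parse_entities` is hypothetical, and returning an empty dict on failure is a design assumption rather than the notebook's behavior.

```python
import json

def parse_entities(response_text):
    """Parse the model's entity JSON, tolerating text before or after the JSON object."""
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        parsed = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return {}
    # Only a JSON object is a valid entity map.
    return parsed if isinstance(parsed, dict) else {}

print(parse_entities('Here you go: {"ORGANIZATION": ["Amazon"]}.'))
```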

## And we do the same to extract the entities from the question.

In [None]:
# Define the prompt for the model.
instructions_prompt = """
You are excellent at identifying entities from a question and it makes you happy when you provide the correct answer. 
The types of entities you can identify are ORGANIZATION, DATE, PERSON, FACILITY, PERSON_TITLE, LOCATION, MONETARY_VALUE, 
STOCK_CODE, QUANTITY
If the entity is STOCK_CODE, put the market in front of the code in the response. 
CITIGROUP INC stock code C is traded on the NYSE and would become NYSE:C. 
Apple Inc. Common Stock symbol AAPL is traded on the NASDAQ and would become NASDAQ:AAPL.
If you don't know which market the symbol is traded on, use the format UNKNOWN:SYMBOL.

When someone asks you a question, determine the entities from the text and respond using JSON in the format:
Example: 
first entity type is ORGANIZATION and the entities extracted are company1, company2, and company3.
second entity type is PERSON and the entities extracted are person1, person2, and person3. 
Response:
{"ORGANIZATION": ["company1","company2","company3"], "PERSON": ["person1","person2","person3"]}. 
Return only the JSON, no other text. 

Question: 
"""

user_question = "What are the connections between Amazon and Whole Foods Market, Inc.?"

prompt = instructions_prompt + user_question

# Format the request payload using the model's native structure.
native_request = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": prompt}],
        }
    ],
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["content"][0]["text"]
response_text

## Now let's look at each chunk and the entities that were extracted from them.

In [None]:
for idx, row in core_df.iterrows():
    print(f"""
    Index: {idx}
    Text: {row["Text"]}
    Generation: {row["Entities"]}
    """)

## And our dataset as a whole, with the text, embeddings, similarity to our question, and the entities that were extracted from each chunk.

In [None]:
core_df

## Here is a visualization of just a few of the entities we extracted and their relationship to the document chunks
![Graph of Entities](GraphRAG.png)

## Here we define a convenience function to more easily identify which rows contain a reference to Amazon and to Whole Foods Market, respectively.

In [None]:
def contains_org_reference(row, entity_names): 
    entities = json.loads(row["Entities"])
    if not ("ORGANIZATION" in entities):
        return False
    for name in entity_names:
        if name in entities["ORGANIZATION"]:
            return True
    return False

In [None]:
filtered_results = core_df.apply(contains_org_reference, axis=1, args=[["Amazon"]])
core_df["Has Amazon"] = filtered_results

In [None]:
filtered_results = core_df.apply(contains_org_reference, axis=1, args=[["Whole Foods Market, Inc.", "Whole Foods Market"]])
core_df["Has WFM"] = filtered_results

In [None]:
core_df.sort_values(by='similarity',ascending=False)

## OK, great, but what if we already have a knowledge graph that we want to use with RAG? We'd rather not reprocess all of the original documents; we would rather reuse the model we already put a lot of work into. What do we do then?

<div class="alert alert-block alert-warning">
IMPORTANT:  If you want to follow along with the rest of this notebook, you need to run the CloudFormation template listed in the blog: https://aws.amazon.com/blogs/database/building-a-knowledge-graph-in-amazon-neptune-using-amazon-comprehend-events/ and install this notebook to use the same Neptune instance created there. In addition, you need to create a folder in that notebook instance called "data" and copy the file "comprehend_events_amazon_press_releases.20201118.v1.4.1.jsonl" from the data folder in this repository into that folder.
</div>

## OK, let's set up Bedrock to speak to our desired LLM

2025 update:  I know these models are a little dated now and I'm not sure if they are even still available, but I'll try to update this soon including adding Nova.

In [None]:
import boto3
from botocore.exceptions import ClientError
import json
import logging
from enum import Enum

logger = logging.getLogger(__name__)

class SupportedModels(Enum):
    CLAUDE_OPUS = "anthropic.claude-3-opus-20240229-v1:0"
    CLAUDE_SONNET = "anthropic.claude-3-sonnet-20240229-v1:0"
    CLAUDE_HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"
    COHERE_COMMAND_R = "cohere.command-r-v1:0"
    COHERE_COMMAND_R_PLUS = "cohere.command-r-plus-v1:0"
    
model_id = SupportedModels.CLAUDE_HAIKU.value

## Again we have our question.

In [None]:

# Connections type question
user_question = "What are the connections between ticker AMZN and John Mackey?"


## Here we are prompt engineering to get the LLM to identify the type of question being asked (Inquiry or Connections) and to extract the entities from the question so we can plug them into our query.

In [None]:
instructions_prompt = """
You are a virtual assistant that takes a question and returns a lot of metadata about it. 
You are excellent at helping the user and it makes you happy when you provide the correct answer. 
When someone asks you a question, you have two goals.  First, determine what type of question it is. Here are the guidelines to use:
If the question has two entities and is in a format like "What is the connection between entity1 and entity2?" then 
it is a "Connections" question.
If the question asks for all information about a single entity and is in a format like "Tell me everything about entity1"
then it is an "Inquiry" question.
If you aren't sure what type of question it is, then it is an "Unknown" question.
Second, you will identify the entities in the question. 
The types of entities you can identify are ORGANIZATION, DATE, PERSON, FACILITY, PERSON_TITLE, LOCATION, MONETARY_VALUE, STOCK_CODE, QUANTITY
If the entity is STOCK_CODE, put the market in front of the code in the response. 
CITIGROUP INC stock code C is traded on the NYSE and would become NYSE:C. 
Apple Inc. Common Stock symbol AAPL is traded on the NASDAQ and would become NASDAQ:AAPL.
If you don't know which market the symbol is traded on, use the format UNKNOWN:SYMBOL.

When someone asks you a question, you will respond using JSON in the format:
Example: 
first entity type is ORGANIZATION and the entities extracted are company1, company2, and company3.
second entity type is PERSON and the entities extracted are person1, person2, and person3. 
Response:
{"questionType": "Connections|Inquiry|Unknown","entities": {"entity type": ["company1","company2","company3"], "entity type": ["person1","person2","person3"]}} 
Return only the JSON, no other text. 

Question: 
"""

prompt = instructions_prompt + user_question

# Format the request payload using the model's native structure.
native_request = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": prompt}],
        }
    ],
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = bedrock_client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["content"][0]["text"]
print(f"Response was {response_text}")

metadata = json.loads(response_text)

## We have templated two types of queries that our system will support.  

- The Connections query will retrieve all of the facts linking two identified entities together.
- The Inquiry query will retrieve all facts 2 hops out from the identified entity.
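
The two query shapes can be illustrated on a tiny in-memory graph before looking at the openCypher templates. This sketch assumes a simplified model in which each event links entities through named roles. The event types, role names, data, and function names are hypothetical, and the traversal is only an approximation of what the templates match.

```python
# Hypothetical events: each has a type and a mapping of role -> entity.
events = [
    {"type": "CORPORATE_ACQUISITION", "roles": {"INVESTOR": "Amazon", "INVESTEE": "Whole Foods Market"}},
    {"type": "EMPLOYMENT", "roles": {"EMPLOYER": "Whole Foods Market", "EMPLOYEE": "John Mackey"}},
    {"type": "EMPLOYMENT", "roles": {"EMPLOYER": "Amazon", "EMPLOYEE": "Jeff Bezos"}},
]

def two_hop_paths(events, start):
    """Find paths start <- event -> entity2 <- event2 -> entity3 with distinct events."""
    paths = []
    for event in events:
        entities = set(event["roles"].values())
        if start not in entities:
            continue
        for entity2 in entities - {start}:
            for event2 in events:
                if event2 is event or entity2 not in event2["roles"].values():
                    continue
                for entity3 in set(event2["roles"].values()) - {entity2, start}:
                    paths.append((event["type"], entity2, event2["type"], entity3))
    return sorted(paths)

def inquiry(events, entity):
    """Inquiry: every fact reachable within two hops of one entity."""
    return two_hop_paths(events, entity)

def connections(events, entity1, entity2):
    """Connections: only the two-hop paths from entity1 that reach entity2."""
    return [p for p in two_hop_paths(events, entity1) if entity2 in (p[1], p[3])]

print(connections(events, "Amazon", "John Mackey"))
```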

In [None]:
general_template_part1 = """
MATCH (entity1)<-[role1]-(event)-[role2]->(entity2)<-[role3]-(event2)-[role4]->(entity3),
(event)<-[r1]-(doc1:DOCUMENT)
WHERE $entity1_text = entity1.names AND $entity1_label IN LABELS(entity1)
AND entity1 <> entity3 AND event <> event2
"""

connections_only_part1 = """
AND (($entity2_text = entity2.names AND $entity2_label IN LABELS(entity2)) OR 
($entity2_text = entity3.names AND $entity2_label IN LABELS(entity3)))
"""

general_template_part2 = """
RETURN DISTINCT "Event Type was " + LABELS(event) + " involving " + entity1.primaryName + " in the role of " 
+ TYPE(role1) + " and " + entity2.primaryName + " in the role of " + TYPE(role2) + ". " as statements, 
doc1.primaryName as doc
UNION
MATCH (entity1)<-[role1]-(event)-[role2]->(entity2)<-[role3]-(event2)-[role4]->(entity3),
(event2)<-[r2]-(doc2:DOCUMENT)
WHERE $entity1_text = entity1.names AND $entity1_label IN LABELS(entity1)
AND entity1 <> entity3 AND event <> event2
"""

connections_only_part2 = """
AND (($entity2_text = entity2.names AND $entity2_label IN LABELS(entity2)) OR 
($entity2_text = entity3.names AND $entity2_label IN LABELS(entity3)))
"""

general_template_part3 = """
RETURN DISTINCT "Event Type was " + LABELS(event2) + " involving " + entity2.primaryName + " in the role of " 
+ TYPE(role3) + " and " + entity3.primaryName + " in the role of " + TYPE(role4) + "." as statements, doc2.primaryName as doc
LIMIT 20
"""

connections_template =  general_template_part1 + connections_only_part1 + general_template_part2 + connections_only_part2 + general_template_part3
                        
inquiry_template = general_template_part1 + general_template_part2 + general_template_part3

query_template = connections_template if metadata["questionType"] == "Connections" else inquiry_template
print(query_template)

## Now we will run the appropriate query using the Neptune data plane API to retrieve all of the known facts regarding the entity/entities.

In [None]:
entities = metadata["entities"]
items = []
for key, value in entities.items():
    for item in value:
        items.append((key, item))
    
if (len(items) > 2):
    print("WARNING: Found more than two entities.  This current version will only use the first two entities in the graph query")
if (metadata["questionType"] == "Connections" and len(items) < 2):
    print("ERROR: Cannot execute a Connections query with only a single entity.  Something went wrong.")
    exit(1)
if (len(items) <= 0):
    print("ERROR: No entities were extracted for the query. Something went wrong.")
    exit(1)

if (len(items) >= 1):
    # Serialize the parameters with json.dumps so quotes or backslashes in entity names are escaped correctly.
    parameter_map = {"entity1_text": items[0][1], "entity1_label": items[0][0]}
    if len(items) >= 2:
        parameter_map["entity2_text"] = items[1][1]
        parameter_map["entity2_label"] = items[1][0]
    parameters = json.dumps(parameter_map)

    print(query_template)
    print(parameters)

    neptune_response = neptune_client.execute_open_cypher_query(
        openCypherQuery=query_template,
        parameters=parameters
    )
else:
    print("No entities found in question. Cannot continue.")


## Let's take a look at all of the statements that were identified from the graph.

In [None]:
statements = []
for item in neptune_response["results"]:
    statements.append((''.join(item["statements"]), item["doc"]))
print(statements)

## This code uses a raw data file to look up the URL of each press release. Make sure you have copied this file from our repo into a folder called data.

In [None]:
import pandas as pd
jsonObj = pd.read_json(path_or_buf="data/comprehend_events_amazon_press_releases.20201118.v1.4.1.jsonl", lines=True)


reference_dict = dict()
for item in statements:
    line_index = int(item[1].split("_")[-1])
    print(line_index)
    print(jsonObj.iloc[[line_index]]["metadata"].to_dict())
    reference_dict[item[1]] = jsonObj.iloc[[line_index]]["metadata"].to_dict()[line_index]['common_crawl']['WARC-Target-URI']
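
The lookup above relies on two conventions: the document name ends in `_<line index>`, and each JSONL line carries its source URL under `metadata.common_crawl.WARC-Target-URI`. A minimal standard-library sketch of the same lookup is shown here. The sample data, URLs, and function names are hypothetical.

```python
import json

# Hypothetical two-line JSONL sample with the nested keys the notebook reads.
sample_jsonl = "\n".join([
    json.dumps({"metadata": {"common_crawl": {"WARC-Target-URI": "https://example.com/press/0"}}}),
    json.dumps({"metadata": {"common_crawl": {"WARC-Target-URI": "https://example.com/press/1"}}}),
])

def load_uris(jsonl_text):
    """Return the source URI for each non-empty line of the JSONL text, in line order."""
    return [
        json.loads(line)["metadata"]["common_crawl"]["WARC-Target-URI"]
        for line in jsonl_text.splitlines() if line.strip()
    ]

def uri_for_document(doc_name, uris):
    """Look up a document's URI, assuming its name ends in '_<line index>'."""
    line_index = int(doc_name.rsplit("_", 1)[-1])
    return uris[line_index]

uris = load_uris(sample_jsonl)
print(uri_for_document("doc_1", uris))
```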

## Finally, we will ask the LLM to answer the question using the facts we extracted from the knowledge graph

In [None]:
prompt = """
You are excellent at answering questions, and it makes you happy when you provide the correct answer.
Consider the following list of facts, each fact is listed within the <facts><fact><statement> XML tags.
Each fact also has a document reference to where the fact was extracted from in <facts><fact><reference>.
In the answer, list the references that were used.
<facts>
"""

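for statement in statements:
    prompt = prompt + "<fact><statement>" + statement[0] + "</statement><reference>" + reference_dict[statement[1]] + "</reference></fact>\n"

# Close the <facts> element so the XML in the prompt is well-formed.
prompt = prompt + "</facts>\n"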
    
if (len(statements) <= 0):
    print("""WARNING: No facts were extracted from the Knowledge Graph for this query. The response is pure LLM!
    
    """)

prompt = prompt + """
Create a narrative paragraph answering the question below. After the narrative paragraph, include the facts 
as a bulleted list, but omit all XML tags. Finally, list the references used after the list of facts.

Question:
""" + user_question

# Format the request payload using the model's native structure.
native_request = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": prompt}],
        }
    ],
}

print("==== Prompt ===")
print(prompt)
print("==== End Prompt ===")

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = bedrock_client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

print(model_response["content"][0]["text"])
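
As a sketch of assembling the facts portion of the prompt as well-formed XML, the helper below closes every tag and escapes characters such as `&` and `<` that may appear in statements or URLs. The name `build_facts_block` and the sample data are hypothetical. It assumes statements arrive as `(text, doc)` pairs and references as a dict keyed by document name, matching the shapes used above.

```python
from xml.sax.saxutils import escape

def build_facts_block(statements, references):
    """Render (statement, doc) pairs as a <facts> block, escaping &, < and >."""
    lines = ["<facts>"]
    for text, doc in statements:
        lines.append(
            f"<fact><statement>{escape(text)}</statement>"
            f"<reference>{escape(references[doc])}</reference></fact>"
        )
    lines.append("</facts>")
    return "\n".join(lines)

# Hypothetical statement and reference, for illustration only.
sample_statements = [("Event Type was EMPLOYMENT involving A & B.", "doc_0")]
sample_references = {"doc_0": "https://example.com/press/0"}
facts_block = build_facts_block(sample_statements, sample_references)
print(facts_block)
```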