# Building a knowledge graph with an LLM

This notebook shows how to build up a knowledge base from unstructured data using a large language model (LLM). This approach is useful if you have a lot of unstructured data like meeting notes or short articles, and you want to automatically see the relationships between different concepts.

Our approach starts by using Anthropic's Claude 3 Sonnet model on Amazon Bedrock to extract a list of nodes (entities) and edges (relationships) from each document. We then store the resulting nodes and edges in Amazon Neptune, a graph database, where we can explore the data with standard graph visualizations and queries.

In [None]:
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

## Load local configuration

Create the file `config.yml` and then add settings for your Neptune writer endpoint and AWS region. For example:

    aws:
        region: us-east-1
    neptune:
        endpoint: your_neptune_writer.your_region.neptune.amazonaws.com

You should not include `config.yml` in your version control. If you use Git, add it to your `.gitignore` file.

In [1]:
import yaml
with open("config.yml") as f:
    config = yaml.safe_load(f)

## Install dependencies and load data

First we'll install the dependencies, including the `neo4j` library that we use to interact with Neptune programmatically. Then we'll load a [sample set](https://github.com/applicaai/kleister-nda/tree/master) of NDA documents.

In [2]:
%pip install --upgrade --quiet boto3 botocore langchain datasets neo4j python-xz

Note: you may need to restart the kernel to use updated packages.


In [None]:
!git clone https://github.com/applicaai/kleister-nda.git

In [3]:
import lzma

In [9]:
lines = []
with lzma.open('/home/ec2-user/SageMaker/kleister-nda/train/in.tsv.xz', mode='rt', encoding='utf-8') as fid:
    for line in fid:
        fields = line.split('\t')
        lines.append(fields[2])

In [10]:
len(lines)

254

In [11]:
lines[0]

'EX-10.23 5 dex1023.htm COVENANT NOT TO COMPETE AND NON-DISCLOSURE AGREEMENT\\nExhibit 10.23\\nCOVENANT NOT TO COMPETE\\nAND NON-DISCLOSURE AGREEMENT\\nPARTIES:\\nEric Dean Sprunk (“EMPLOYEE”)\\nand\\nNIKE, Inc., divisions, subsidiaries\\nand affiliates. (“NIKE”):\\nRECITALS:\\nA. This Covenant Not to Compete and Non-Disclosure Agreement is executed upon initial employment or upon the EMPLOYEE’s\\nadvancement with NIKE and is a condition of such employment or advancement.\\nB. Over the course of EMPLOYEE’s employment with NIKE, EMPLOYEE will be or has been exposed to and/or is in a position to\\ndevelop confidential information peculiar to NIKE’s business and not generally known to the public as defined below (“Protected Information”). It is\\nanticipated that EMPLOYEE will continue to be exposed to Protected Information of greater sensitivity as EMPLOYEE advances in the company.\\nC. The nature of NIKE’s business is highly competitive and disclosure of any Protected Information would 
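The displayed text contains literal `\n` sequences (a backslash followed by `n`) rather than real line breaks, which suggests the TSV stores newlines as escapes. The sketch below is one way to read a column from the compressed file and decode those escapes before sending the text to a model. `load_column` and `unescape_text` are hypothetical helper names, and the column index is whatever your copy of the dataset uses (the cell above assumes index 2).

```python
import lzma
import os
import tempfile


def load_column(path, column):
    """Read one tab-separated column from an xz-compressed TSV file."""
    values = []
    with lzma.open(path, mode="rt", encoding="utf-8") as fid:
        for line in fid:
            # Drop the trailing newline so the last column is clean too
            fields = line.rstrip("\n").split("\t")
            values.append(fields[column])
    return values


def unescape_text(text):
    """Turn literal backslash escapes (backslash + n, backslash + t) into real newlines and tabs."""
    return text.replace("\\n", "\n").replace("\\t", "\t")


# Usage on a tiny throwaway file shaped like the dataset (the file name is made up)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "in.tsv.xz")
    with lzma.open(sample_path, mode="wt", encoding="utf-8") as out:
        out.write("doc.pdf\tkeys\tLine one\\nLine two\n")
    sample_rows = load_column(sample_path, 2)

print(unescape_text(sample_rows[0]))
```

Decoding is optional; the model can read the escaped text, but real line breaks keep headings like `PARTIES:` and `RECITALS:` visually separate in the prompt.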

## Bedrock setup

Here we'll define helper methods to use both Claude and Meta's Llama 2 model. These include methods that invoke the models for regular chat, and methods with prompts designed for node and edge extraction.

In [12]:
import boto3
import json

In [13]:
llamaModelId = 'meta.llama2-70b-chat-v1' 
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime', 
    region_name=config['aws']['region']
)

def call_llama(query):

    prompt = f"[INST]{query}[/INST]"
    llamaPayload = json.dumps({ 
    	'prompt': prompt,
        'max_gen_len': 512,
    	'top_p': 0.9,
    	'temperature': 0.2
    })

    response = bedrock_runtime.invoke_model(
        body=llamaPayload, 
        modelId=llamaModelId, 
        accept='application/json', 
        contentType='application/json'
    )

    body = response.get('body').read().decode('utf-8')
    response_body = json.loads(body)
    return response_body['generation'].strip()
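`call_llama` wraps the query in bare `[INST]` tags. Meta's Llama 2 chat format also allows a system prompt inside `<<SYS>>` markers, which is a natural home for fixed instructions such as the extraction rules in `call_llama_kg`. A minimal sketch, assuming the Bedrock Llama 2 chat endpoint accepts this template and adds the `<s>` start token itself; `build_llama2_prompt` is a hypothetical helper.

```python
def build_llama2_prompt(user_message, system_prompt=None):
    """Wrap a user message in the Llama 2 chat template, optionally with a system prompt."""
    if system_prompt:
        # System instructions go inside <<SYS>> markers at the start of the first [INST] block
        return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
    return f"[INST] {user_message} [/INST]"


print(build_llama2_prompt("Tell me a story about Mars", "Answer in two sentences."))
```

Keeping the instructions in the system block and the document in the user turn makes it easier to swap documents without touching the prompt text.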

In [15]:
call_llama("Tell me a story about Mars")



In [20]:
claudeModelId = 'anthropic.claude-3-sonnet-20240229-v1:0' 

def call_claude(query):

    claudePayload = json.dumps({ 
        "anthropic_version": "bedrock-2023-05-31",
        'max_tokens': 2048,
    	"messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": query
              }
            ]
          }
        ]
    })
    

    response = bedrock_runtime.invoke_model(
        body=claudePayload, 
        modelId=claudeModelId, 
        accept='application/json', 
        contentType='application/json'
    )

    body = response.get('body').read().decode('utf-8')

    response_body = json.loads(body)
    return response_body['content'][0]['text']
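`call_claude` and the other Claude helpers in this notebook repeat the same Messages API request body. A sketch of factoring that out; `build_claude_body` and `extract_claude_text` are hypothetical names, and the defaults mirror the values used in this notebook.

```python
import json


def build_claude_body(prompt, max_tokens=2048, anthropic_version="bedrock-2023-05-31"):
    """Build the JSON request body for a single-turn Claude Messages API call on Bedrock."""
    return json.dumps({
        "anthropic_version": anthropic_version,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    })


def extract_claude_text(response_body):
    """Concatenate every text block in a parsed Claude Messages API response."""
    return "".join(
        block["text"] for block in response_body["content"] if block.get("type") == "text"
    )
```

With these, a helper reduces to `bedrock_runtime.invoke_model(body=build_claude_body(prompt), modelId=claudeModelId, accept='application/json', contentType='application/json')` followed by `extract_claude_text(json.loads(response['body'].read()))`. Joining all text blocks, rather than reading only `content[0]`, is a defensive choice in case a response contains more than one block.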

In [21]:
call_claude("Tell me a story about Mars")

"Here is a story about Mars:\n\nThe Year is 2085. Humans have finally set foot on the Red Planet after decades of planning and preparation. The first crewed mission to Mars, consisting of six brave astronauts from around the world, has landed safely in Acidalia Planitia - a smooth northern plain ideal for establishing the first outpost.\n\nAs the astronauts exit their lander and take their first steps on the rusty Martian soil, they can't help but feel a sense of awe and history. They are the first humans to walk on another planet in the solar system. Gazing up at the rust-colored sky, they see the two small moons Phobos and Deimos in the distance.\n\nOver the next several weeks, the crew constructs the first human habitat using materials brought from Earth as well as resources manufactured from the Martian soil itself. They explore the nearby area with rovers, conduct scientific experiments, and send video messages back to the fascinated people on Earth.\n\nOne day, while analyzing ro

In [22]:
def call_llama_kg(query):

    prompt = """[INST]You are a robot that extracts information from financial news to build a knowledge graph. You only output JSON. Nodes represent entities, like a company.  Edges represent the relationships between nodes, like the fact that a person is the CEO of a company. When extracting nodes, it's vital to ensure consistency. If a node, such as "Acme Corp", is mentioned multiple times in the text but is referred to by different names (e.g., "Acme"), always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "Acme Corp" as the node ID. 

Example input: "John Doe was recently named the CEO of Acme Corp."
Example output: 

{
"nodes": [
   {
        "label": "person",
        "id": "John Doe",
        "firstName": "john",
        "lastName": "doe"
    },
    {
        "label": "company",
        "id": "Acme Corp"
    }
],
"edges": [
    {
        "label": "executive",
        "id": "e-john-doe-acme-corp",
        "node1": "John Doe",
        "node2": "Acme Corp"
    }
]
}

Use the given format to extract information from the following input, responding only with JSON and no extra text:
"""
    
    llamaPayload = json.dumps({ 
    	'prompt': prompt + query + "[/INST]",
        'max_gen_len': 2048,
    	'top_p': 0.9,
    	'temperature': 0.2
    })

    response = bedrock_runtime.invoke_model(
        body=llamaPayload, 
        modelId=llamaModelId, 
        accept='application/json', 
        contentType='application/json'
    )

    body = response.get('body').read().decode('utf-8')
    response_body = json.loads(body)
    return response_body['generation'].strip()

In [None]:
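def format_llama_kg(j):
    # Strip newlines/tabs, then parse from the first '{' to the last '}' so any
    # text Llama adds before or after the JSON object is ignored
    c = j.replace("\n", "").replace("\t", "")
    idx1 = c.find('{')
    idx2 = c.rfind('}')
    return json.loads(c[idx1:idx2 + 1])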
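The parsers in this notebook locate the JSON by searching for delimiters. As an alternative, Python's `json.JSONDecoder.raw_decode` parses one JSON value starting at a given index and reports where it ended, so it can pull the first complete object out of a response that has prose before or after it. A sketch, with `extract_first_json_object` as a hypothetical helper; `strict=False` tolerates raw newlines inside string values, which removes the need to strip them first.

```python
import json


def extract_first_json_object(text):
    """Return the first complete JSON object embedded in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder(strict=False)
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one
            start = text.find("{", start + 1)
    raise ValueError("No JSON object found in model output")


print(extract_first_json_object('Here is the graph: {"nodes": [], "edges": []} Hope this helps!'))
```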

In [23]:
def call_claude_kg(query):
    
    prompt_template = """

Below is an article from a financial news source. Your job is to extract nodes and edges to build a knowledge graph. A node is an entity like a company. An edge is a relationship between two nodes, like "John Smith is the CEO of Acme Corp". When extracting nodes, it's vital to ensure consistency. If a node, such as "Acme Corp", is mentioned multiple times in the text but is referred to by different names (e.g., "Acme"), always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "Acme Corp" as the node name. Use snake case for node IDs, like "acme_corp" instead of "Acme Corp". If you find additional information, add it as a property on the node or edge. For example, if Acme Corp is a mining company, you can add a property "industry" set to "mining".

Each node should have at least an `id` field and a `type` field. The `id` is the unique identifier, and the `type` is the type of entity, like 'company' or 'executive'. You can include other properties if you find them.

Example output:

<json>
{
  "nodes": [
      {
          "id": "acme_corp",
          "type": "company",
          "name": "Acme Corp",
          "industry": "chemicals"
      },
      {
          "id": "john_doe",
          "type": "executive",
          "name": "John Doe"
      }
  ],
  "edges": [
      {
          "source": "acme_corp",
          "target": "john_doe",
          "type": "employee",
          "employee_type": "CEO"
      }
  ]
}
</json>

<article>
ARTICLE_HERE
</article>

You must output only valid JSON. Be concise - do not provide any extra text before or after the JSON.
"""

    prompt = prompt_template.replace("ARTICLE_HERE", query)
    
    claudePayload = json.dumps({ 
        "anthropic_version": "bedrock-2023-05-31",
        'max_tokens': 2048,
    	"messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": prompt
              }
            ]
          }
        ]
    })
    
    
    response = bedrock_runtime.invoke_model(
        body=claudePayload, 
        modelId=claudeModelId, 
        accept='application/json', 
        contentType='application/json'
    )

    body = response.get('body').read().decode('utf-8')

    response_body = json.loads(body)
    return response_body['content'][0]['text']
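The prompt asks Claude for snake_case node IDs, but nothing enforces that, so the same entity could come back with differently formatted IDs from different documents. One option is to normalize IDs deterministically in Python before inserting them. A sketch, assuming ASCII-only IDs are acceptable; `to_node_id` and `normalize_graph_ids` are hypothetical helpers.

```python
import re
import unicodedata


def to_node_id(name):
    """Normalize a display name into a snake_case node id, e.g. 'NIKE, Inc.' -> 'nike_inc'."""
    # Decompose accented characters, then drop anything that isn't ASCII
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single underscore
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def normalize_graph_ids(kg):
    """Apply to_node_id to node ids and edge endpoints, leaving other fields untouched."""
    nodes = [{**n, "id": to_node_id(n["id"])} for n in kg["nodes"]]
    edges = [
        {**e, "source": to_node_id(e["source"]), "target": to_node_id(e["target"])}
        for e in kg["edges"]
    ]
    return {"nodes": nodes, "edges": edges}
```

This only canonicalizes formatting: `"Albitar Oncology Consulting, LLC"` becomes `albitar_oncology_consulting_llc`, whereas Claude chose `albitar_oncology_consulting`, so merging true aliases still relies on the prompt or a separate alias table.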

In [24]:
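def format_claude_kg(j):
    # Claude usually wraps its answer in <json> tags (as the prompt's example does),
    # but sometimes uses a ```json fence or returns bare JSON instead
    if '<json>' in j:
        idx1 = j.find('<json>') + len('<json>')
        idx2 = j.find('</json>')
        return json.loads(j[idx1:idx2])
    elif '```json' in j:
        idx1 = j.find('```json') + len('```json')
        idx2 = j.rfind('```')
        return json.loads(j[idx1:idx2])
    elif j.strip().startswith('{'):
        return json.loads(j)
    else:
        raise ValueError(f"Unknown Claude response format: {j[:100]}")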
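Because `process_article` looks up each edge's endpoints among the nodes it just inserted, an edge that references an ID Claude never declared as a node is skipped with an "Unable to process edge" message. A hedged sketch of checking the parsed output before inserting it; `validate_graph` is a hypothetical helper that checks for the `id`/`type` and `source`/`target`/`type` fields the prompt asks for.

```python
def validate_graph(kg):
    """Return a list of problems found in an extracted {'nodes': [...], 'edges': [...]} dict."""
    problems = []
    node_ids = set()
    for i, node in enumerate(kg.get("nodes", [])):
        missing = [k for k in ("id", "type") if k not in node]
        if missing:
            problems.append(f"node {i} missing {missing}")
        else:
            node_ids.add(node["id"])
    for i, edge in enumerate(kg.get("edges", [])):
        missing = [k for k in ("source", "target", "type") if k not in edge]
        if missing:
            problems.append(f"edge {i} missing {missing}")
            continue
        # Every edge endpoint should refer to a node declared in the same answer
        for end in ("source", "target"):
            if edge[end] not in node_ids:
                problems.append(f"edge {i} {end} '{edge[end]}' is not a known node id")
    return problems
```

An empty list means the answer is structurally safe to insert; otherwise you might log the problems or re-prompt Claude with them.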

## Node and edge extraction

Let's look at a single article and test our extraction methods.

In [25]:
text = lines[0]

In [26]:
text

'EX-10.23 5 dex1023.htm COVENANT NOT TO COMPETE AND NON-DISCLOSURE AGREEMENT\\nExhibit 10.23\\nCOVENANT NOT TO COMPETE\\nAND NON-DISCLOSURE AGREEMENT\\nPARTIES:\\nEric Dean Sprunk (“EMPLOYEE”)\\nand\\nNIKE, Inc., divisions, subsidiaries\\nand affiliates. (“NIKE”):\\nRECITALS:\\nA. This Covenant Not to Compete and Non-Disclosure Agreement is executed upon initial employment or upon the EMPLOYEE’s\\nadvancement with NIKE and is a condition of such employment or advancement.\\nB. Over the course of EMPLOYEE’s employment with NIKE, EMPLOYEE will be or has been exposed to and/or is in a position to\\ndevelop confidential information peculiar to NIKE’s business and not generally known to the public as defined below (“Protected Information”). It is\\nanticipated that EMPLOYEE will continue to be exposed to Protected Information of greater sensitivity as EMPLOYEE advances in the company.\\nC. The nature of NIKE’s business is highly competitive and disclosure of any Protected Information would 

In [27]:
j = call_llama_kg(text)

In [28]:
j

'{\n"nodes": [\n{\n"label": "person",\n"id": "Eric Dean Sprunk",\n"firstName": "Eric",\n"lastName": "Sprunk"\n},\n{\n"label": "company",\n"id": "NIKE, Inc.",\n"name": "NIKE"\n}\n],\n"edges": [\n{\n"label": "executive",\n"id": "e-Eric-Sprunk-NIKE",\n"node1": "Eric Dean Sprunk",\n"node2": "NIKE, Inc."\n}\n]\n}'

In [30]:
j = call_claude_kg(text)

In [31]:
j

'<json>\n{\n  "nodes": [\n    {\n      "id": "nike_inc",\n      "type": "company",\n      "name": "NIKE, Inc.",\n      "industry": "athletic footwear, athletic apparel, sports equipment and accessories"\n    },\n    {\n      "id": "eric_dean_sprunk",\n      "type": "employee",\n      "name": "Eric Dean Sprunk"\n    },\n    {\n      "id": "jeffrey_m_cava",\n      "type": "executive",\n      "name": "Jeffrey M. Cava",\n      "title": "Vice President, Global Human Resources"\n    }\n  ],\n  "edges": [\n    {\n      "source": "eric_dean_sprunk",\n      "target": "nike_inc",\n      "type": "employment",\n      "employment_type": "employee"\n    },\n    {\n      "source": "jeffrey_m_cava",\n      "target": "nike_inc",\n      "type": "employment",\n      "employment_type": "executive"\n    }\n  ]\n}\n</json>'

In [32]:
print(format_claude_kg(j))

{'nodes': [{'id': 'nike_inc', 'type': 'company', 'name': 'NIKE, Inc.', 'industry': 'athletic footwear, athletic apparel, sports equipment and accessories'}, {'id': 'eric_dean_sprunk', 'type': 'employee', 'name': 'Eric Dean Sprunk'}, {'id': 'jeffrey_m_cava', 'type': 'executive', 'name': 'Jeffrey M. Cava', 'title': 'Vice President, Global Human Resources'}], 'edges': [{'source': 'eric_dean_sprunk', 'target': 'nike_inc', 'type': 'employment', 'employment_type': 'employee'}, {'source': 'jeffrey_m_cava', 'target': 'nike_inc', 'type': 'employment', 'employment_type': 'executive'}]}


## Neptune

Let's check connectivity to the cluster with the `%status` magic, then connect to it over Bolt so we can run openCypher queries with the `neo4j` driver.

In [33]:
%status

{'status': 'healthy',
 'startTime': 'Tue Mar 26 17:46:51 UTC 2024',
 'dbEngineVersion': '1.3.1.0.R1',
 'role': 'writer',
 'dfeQueryEngine': 'viaQueryHint',
 'gremlin': {'version': 'tinkerpop-3.6.4'},
 'sparql': {'version': 'sparql-1.1'},
 'opencypher': {'version': 'Neptune-9.0.20190305-1.0'},
 'labMode': {'ObjectIndex': 'disabled',
  'ReadWriteConflictDetection': 'enabled'},
 'features': {'SlowQueryLogs': 'disabled',
  'ResultCache': {'status': 'disabled'},
  'IAMAuthentication': 'disabled',
  'Streams': 'disabled',
  'AuditLog': 'disabled'},
 'settings': {'clusterQueryTimeoutInMs': '120000',
  'SlowQueryLogsThreshold': '5000'},
 'serverlessConfiguration': {'minCapacity': '1.0', 'maxCapacity': '128.0'}}

This configuration secures the cluster with security groups only, so the Bolt username and password are placeholder values that are not used. This is for demonstration purposes only. For a production environment, you should use [IAM authentication](https://docs.aws.amazon.com/neptune/latest/userguide/get-started-security.html#get-started-security-iam-auth).

In [34]:
from neo4j import GraphDatabase
uri = f"bolt://{config['neptune']['endpoint']}:8182"
driver = GraphDatabase.driver(uri, auth=("username", "password"), encrypted=True)

### Process a few articles

Here we'll process the first five articles in the dataset.

In [37]:
article_indices = [0,1,2,3,4]

In [38]:
max_articles = len(lines)
max_articles

254

In [39]:
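def cypher_str(value):
    # Quote a value as an openCypher string literal, escaping backslashes and
    # single quotes so LLM-generated text (e.g. "O'Brien") can't break the query
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

def cypher_name(name):
    # Backtick-quote labels and property keys so values such as "law firm" stay valid
    return "`" + str(name).replace("`", "") + "`"

def insert_node(nid, nlabel, nprops, gdriver):
    propstr = [f"{cypher_name(p)}: {cypher_str(v)}" for p, v in nprops.items()]
    q = "MERGE (:" + cypher_name(nlabel) + " {" + ",".join(propstr) + "})"
    print(f"Query: {q}")
    gdriver.execute_query(q)

def insert_edge(elabel, en1, en2, et1, et2, eprops, gdriver):
    # Copy rather than mutate the caller's dict, and store the edge type as `name`
    eprops = {**eprops, 'name': elabel}
    propstr = [f"{cypher_name(p)}: {cypher_str(v)}" for p, v in eprops.items()]
    print(f"eprops: {json.dumps(eprops)}")
    # Use fixed variable names: node ids such as '99c_only_stores' start with a
    # digit and are not valid Cypher identifiers
    q = ("MATCH (a:" + cypher_name(et1) + " {name: " + cypher_str(en1) + "}), "
         "(b:" + cypher_name(et2) + " {name: " + cypher_str(en2) + "}) "
         "CREATE (a)-[:" + cypher_name(elabel) + " {" + ",".join(propstr) + "}]->(b)")
    print(f"Query: {q}")
    gdriver.execute_query(q)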

def process_article(a):
    n = a['nodes']
    e = a['edges']
    n_types = []
    e_types = []
    id_label_map = {}
    with GraphDatabase.driver(uri, auth=("username", "password"), encrypted=True) as gdriver:
        print(f"Processing nodes: {len(n)}")
        for node in n:
            try:
                nid = node['id']
                nlabel = node['type']
                n_types.append(nlabel)
                
                nprops = {}
                nprops['name'] = nid
                for k in node.keys():
                    if k in ['id', 'type', 'name']:
                        continue
                    else:
                        nprops[k] = node[k]
                if 'name' in node:
                    nprops['nname'] = node['name']
                    
                insert_node(nid, nlabel, nprops, gdriver)
                id_label_map[nid] = nlabel
            except Exception as ee: 
                print(f"Unable to process node {node} - {ee}")
        print(f"Processing edges: {len(e)}")
        for edge in e:
            try:
                elabel = edge['type']
                e_types.append(elabel)
                en1 = edge['source']
                en2 = edge['target']
                et1 = id_label_map[en1]
                et2 = id_label_map[en2]
                
                eprops = {}
                for k in edge.keys():
                    if k in ['source', 'type', 'target']:
                        continue
                    else:
                        eprops[k] = edge[k]
                
                insert_edge(elabel, en1, en2, et1, et2, eprops, gdriver)
            except Exception as ee: 
                print(f"Unable to process edge {edge} - {ee}")
          
    return n_types, e_types
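`insert_node` builds its query by string concatenation and `MERGE`s on every property, so if the same entity comes back from another document with a different `nname` or an extra property, `MERGE` creates a second node. A sketch of an alternative that keys the `MERGE` on `name` alone and passes the remaining properties as query parameters. It assumes your Neptune engine version supports openCypher parameters and `SET n += $map` (check the Neptune openCypher documentation); `build_node_merge` and `to_property_value` are hypothetical helpers.

```python
import json


def to_property_value(value):
    """Keep scalar values as-is; serialize lists and dicts to JSON strings."""
    if isinstance(value, (str, int, float, bool)):
        return value
    return json.dumps(value)


def build_node_merge(label, node_id, props):
    """Build a parameterized openCypher MERGE for one node, keyed on its `name` property."""
    # Labels cannot be query parameters, so backtick-quote the label instead
    safe_label = "`" + label.replace("`", "") + "`"
    params = {
        "name": node_id,
        # Skip `name` (it is the merge key) and None values
        "props": {k: to_property_value(v) for k, v in props.items() if k != "name" and v is not None},
    }
    query = f"MERGE (n:{safe_label} {{name: $name}}) SET n += $props"
    return query, params
```

Inside `process_article` this would replace the `insert_node` call with `q, p = build_node_merge(nlabel, nid, nprops)` followed by `gdriver.execute_query(q, p)`; the `neo4j` driver's `execute_query` accepts a parameters dictionary as its second argument.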
            

In [40]:
for adx in article_indices:
    print(f"Article number {adx}")
    text = lines[adx]
    raw = call_claude_kg(text)
    print(f"Got Claude answer: {raw}")
    answer = format_claude_kg(raw)
    print(f"Claude JSON: {json.dumps(answer)}")
    process_article(answer)

Article number 0
Got Claude answer: <json>
{
  "nodes": [
    {
      "id": "nike_inc",
      "type": "company",
      "name": "NIKE, Inc."
    },
    {
      "id": "eric_dean_sprunk",
      "type": "executive",
      "name": "Eric Dean Sprunk"
    },
    {
      "id": "jeffrey_m_cava",
      "type": "executive",
      "name": "Jeffrey M. Cava",
      "title": "Vice President, Global Human Resources"
    }
  ],
  "edges": [
    {
      "source": "nike_inc",
      "target": "eric_dean_sprunk",
      "type": "employment",
      "employment_type": "EMPLOYEE"
    },
    {
      "source": "nike_inc",
      "target": "jeffrey_m_cava",
      "type": "employment",
      "employment_type": "Vice President, Global Human Resources"
    },
    {
      "source": "eric_dean_sprunk",
      "target": "nike_inc",
      "type": "employment_agreement",
      "agreement_type": "Covenant Not to Compete and Non-Disclosure Agreement"
    }
  ]
}
</json>
Claude JSON: {"nodes": [{"id": "nike_inc", "type": "com
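`process_article` returns the node and edge types it saw, but the loop above discards them. Collecting them is a quick way to see how consistently Claude labels entities across documents: in the output above Eric Dean Sprunk is typed `executive`, while the earlier single-article test typed him `employee`. A sketch, assuming you append each `process_article(answer)` result to a list named `results`; `summarize_types` is a hypothetical helper.

```python
from collections import Counter


def summarize_types(results):
    """Tally node and edge types across (n_types, e_types) pairs returned by process_article."""
    node_counts, edge_counts = Counter(), Counter()
    for n_types, e_types in results:
        node_counts.update(n_types)
        edge_counts.update(e_types)
    return node_counts, edge_counts


example_results = [
    (["company", "executive"], ["employment"]),
    (["company", "employee"], ["employment", "agreement"]),
]
node_counts, edge_counts = summarize_types(example_results)
print(node_counts.most_common())
print(edge_counts.most_common())
```

A long tail of near-duplicate labels (`employee`, `executive`, `person`) is a sign the prompt should list the allowed node and edge types explicitly.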

### Optional reset

You can use this to clear everything out of the database if necessary. 

<div class="alert alert-block alert-info">
Replace the token in the second cell with the token created by the first cell.
</div>

In [None]:
%db_reset --generate-token

In [None]:
%db_reset --token 30c6e45f-2def-ea7f-fa50-8cceb92db088

## Explore the data

Now we can use regular Neptune queries to visualize the data. For example, let's look at the company `albitar_oncology_consulting`. First we'll make sure this company is in the graph.

In [47]:
%%oc

MATCH (a:company {name: 'albitar_oncology_consulting'}) RETURN a

Next we can run a Cypher query to show this company and all its relationships.

In [48]:
%%oc

MATCH (n:company {name: 'albitar_oncology_consulting'}) 
MATCH (n)-[r]-(m)
RETURN n,r, m

This Gremlin query returns the same neighborhood, but uses `.by('name')` to label each node and edge in the visualization with its `name` property.

In [49]:
%%gremlin -p v,oute,inv
g.V().has("name", "albitar_oncology_consulting").bothE().bothV().path().by('name')

## Query the graph with natural language

In [50]:
from langchain_community.graphs import NeptuneGraph

host = config['neptune']['endpoint']
port = 8182
use_https = True

graph = NeptuneGraph(host=host, port=port, use_https=use_https)

In [51]:
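from langchain_community.llms import Bedrock
from langchain.chains import NeptuneOpenCypherQAChain

# This completion-style LLM wrapper uses Claude 2.1; Claude 3 models
# require the Messages API used by the helper functions above
modelId = 'anthropic.claude-v2:1'
model_kwargs = {
    "max_tokens_to_sample": 512,
    "temperature": 0,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman:"]
}

llm = Bedrock(
    model_id=modelId,
    region_name=config['aws']['region'],
    model_kwargs=model_kwargs
)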


In [52]:
chain = NeptuneOpenCypherQAChain.from_llm(llm=llm, graph=graph, verbose=True)

In [53]:
chain.run("name the companies in the data")

> Entering new NeptuneOpenCypherQAChain chain...
Generated Cypher:
MATCH (c:company) 
RETURN c.name

Full Context:
{'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'transfer-encoding': 'chunked', 'content-type': 'application/json;charset=UTF-8'}, 'RetryAttempts': 0}, 'results': [{'c.name': 'nike_inc'}, {'c.name': 'kite_pharma_inc'}, {'c.name': 'neogenomics_laboratories'}, {'c.name': 'gilead_sciences_inc'}, {'c.name': 'health_discovery_corporation'}, {'c.name': 'albitar_oncology_consulting'}, {'c.name': 'high_speed_net_solutions'}, {'c.name': 'r_j_seifert_enterprises'}, {'c.name': '99c_only_stores'}, {'c.name': 'leonard_green_and_partners'}, {'c.name': 'lazard'}]}

> Finished chain.


' Based on the provided information, the companies are:\n\nnike_inc, kite_pharma_inc, neogenomics_laboratories, gilead_sciences_inc, health_discovery_corporation, albitar_oncology_consulting, high_speed_net_solutions, r_j_seifert_enterprises, 99c_only_stores, leonard_green_and_partners, lazard'

## Graph RAG

A more sophisticated way to use the graph is to follow this process:

* First, identify the concepts the question is asking about.
* Second, query the graph for nodes related to those concepts.
* Third, extract a subgraph that includes the related nodes out to a certain depth.
* Finally, include the subgraph as context when generating the response.
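The third step can also be done client-side. A sketch of extracting the edges within a given number of hops of the seed nodes, using breadth-first search over the `source`/`target` edge dictionaries produced by the extraction prompt and ignoring edge direction. In Neptune you could instead push this into the query with a variable-length pattern such as `MATCH p = (n {name: $name})-[*1..2]-(m) RETURN p`, assuming your engine version supports it; `extract_subgraph` is a hypothetical helper.

```python
from collections import deque


def extract_subgraph(edges, seeds, depth):
    """Return the edges reachable within `depth` hops of the seed node ids, ignoring direction."""
    # Map each node id to the indices of the edges that touch it
    adjacency = {}
    for idx, edge in enumerate(edges):
        adjacency.setdefault(edge["source"], []).append(idx)
        adjacency.setdefault(edge["target"], []).append(idx)

    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    selected = []
    seen_edges = set()
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # Don't expand beyond the requested depth
        for idx in adjacency.get(node, []):
            edge = edges[idx]
            if idx not in seen_edges:
                seen_edges.add(idx)
                selected.append(edge)
            neighbor = edge["target"] if edge["source"] == node else edge["source"]
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return selected


chain_edges = [
    {"source": "a", "target": "b", "type": "r"},
    {"source": "b", "target": "c", "type": "r"},
    {"source": "c", "target": "d", "type": "r"},
]
print(extract_subgraph(chain_edges, ["a"], 2))
```

Depth trades context for noise: one hop answers "who is connected to X", while two hops start to pull in the neighbors' other relationships.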

In [54]:
def call_claude_get_concepts(query):

    prompt_template = """

Below is a question asked by a person. Identify the key concepts or ideas contained in the question, so that we can find more information about these concepts from other data sources.

Example question and output:

Can you tell me about Acme Corp?

<json>
{
  "nodes": [
      {
          "id": "acme_corp",
          "name": "Acme Corp"
      }
  ]
}
</json>

<question>
QUESTION_HERE
</question>

You must output only valid JSON. Be concise - do not provide any extra text before or after the JSON.
"""
    
    prompt = prompt_template.replace("QUESTION_HERE", query)
    
    claudePayload = json.dumps({ 
        "anthropic_version": "bedrock-2023-05-31",
        'max_tokens': 2048,
    	"messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": prompt
              }
            ]
          }
        ]
    })
    
    
    response = bedrock_runtime.invoke_model(
        body=claudePayload, 
        modelId=claudeModelId, 
        accept='application/json', 
        contentType='application/json'
    )

    body = response.get('body').read().decode('utf-8')

    response_body = json.loads(body)
    return response_body['content'][0]['text']

In [55]:
format_claude_kg(call_claude_get_concepts("Tell me about graph databases"))

{'nodes': [{'id': 'graph_databases', 'name': 'Graph Databases'}]}
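The concept IDs only help if they match the `name` values stored in the graph, which `process_article` sets from the extracted node IDs. A sketch of turning the concept output into one parameterized neighborhood query per concept; `concept_lookup_queries` is a hypothetical helper, and because the lookup is an exact match it returns nothing when Claude's concept ID differs from the stored ID (for instance, a hypothetical `albitar_oncology` versus the stored `albitar_oncology_consulting`).

```python
NEIGHBORHOOD_QUERY = "MATCH (src {name: $name})-[rel]-(tgt) RETURN src, rel, tgt"


def concept_lookup_queries(concepts):
    """Turn extracted concepts into (query, parameters) pairs, one per concept id."""
    return [(NEIGHBORHOOD_QUERY, {"name": node["id"]}) for node in concepts.get("nodes", [])]


print(concept_lookup_queries({"nodes": [{"id": "graph_databases", "name": "Graph Databases"}]}))
```

Each pair can be run with `driver.execute_query(query, params)`. The query omits a node label because the concept step does not say what type each concept is.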

In [56]:
records, summary, keys = driver.execute_query(
    "MATCH (src:company {name: 'albitar_oncology_consulting'}) MATCH (src)-[rel]-(tgt) RETURN src,rel,tgt"
)

# Print each record as a dict
for record in records:
    print(record.data())

# Keep the first record to use as example context for the Graph RAG prompt below
c_record = records[0].data()
c_record

{'src': {'name': 'albitar_oncology_consulting', 'nname': 'Albitar Oncology Consulting, LLC'}, 'rel': ({'name': 'neogenomics_laboratories', 'nname': 'NeoGenomics Laboratories, Inc.'}, 'agreement', {'name': 'albitar_oncology_consulting', 'nname': 'Albitar Oncology Consulting, LLC'}), 'tgt': {'name': 'neogenomics_laboratories', 'nname': 'NeoGenomics Laboratories, Inc.'}}
{'src': {'name': 'albitar_oncology_consulting', 'nname': 'Albitar Oncology Consulting, LLC'}, 'rel': ({'name': 'albitar_oncology_consulting', 'nname': 'Albitar Oncology Consulting, LLC'}, 'ownership', {'name': 'maher_albitar', 'nname': 'Maher Albitar, M.D.'}), 'tgt': {'name': 'maher_albitar', 'nname': 'Maher Albitar, M.D.'}}


{'src': {'name': 'albitar_oncology_consulting',
  'nname': 'Albitar Oncology Consulting, LLC'},
 'rel': ({'name': 'neogenomics_laboratories',
   'nname': 'NeoGenomics Laboratories, Inc.'},
  'agreement',
  {'name': 'albitar_oncology_consulting',
   'nname': 'Albitar Oncology Consulting, LLC'}),
 'tgt': {'name': 'neogenomics_laboratories',
  'nname': 'NeoGenomics Laboratories, Inc.'}}
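Serialized as-is, each record repeats its endpoint nodes twice. In the output above, `record.data()` represents the relationship as a `(start node, type, end node)` tuple, so a sketch like this can compress each record to a single line before adding it to a prompt; `record_to_triple` is a hypothetical helper that assumes that tuple shape.

```python
def record_to_triple(record):
    """Render one src/rel/tgt record as 'start -[type]-> end' using node `name` values."""
    start, rel_type, end = record["rel"]
    return f"{start['name']} -[{rel_type}]-> {end['name']}"


sample_record = {
    "src": {"name": "albitar_oncology_consulting", "nname": "Albitar Oncology Consulting, LLC"},
    "rel": (
        {"name": "neogenomics_laboratories", "nname": "NeoGenomics Laboratories, Inc."},
        "agreement",
        {"name": "albitar_oncology_consulting", "nname": "Albitar Oncology Consulting, LLC"},
    ),
    "tgt": {"name": "neogenomics_laboratories", "nname": "NeoGenomics Laboratories, Inc."},
}
print(record_to_triple(sample_record))
```

Using the tuple rather than `src`/`tgt` preserves the edge direction stored in Neptune, which `src`/`tgt` do not (the query matches in both directions).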

In [57]:
def call_claude_graph_rag(query, relationships):

    prompt_template = """

Below is a question asked by a person. To help you answer, we include relationship information about the concepts in the question, extracted from a knowledge graph. Use the information from the knowledge graph to answer the question.

Here's an example.

<example_question>
Can you tell me about Acme Corp?
</example_question>

<example_relationships>
{'src': {'name': 'acme_corp'}, 'rel': ({'name': 'acme_corp'}, 'leadership', {'name': 'john_doe'}), 'tgt': {'name': 'john_doe'}}
</example_relationships>

<example_output>
Acme Corp employs John Doe as a senior leader.
</example_output>

<question>
QUESTION_HERE
</question>

<relationships>
RELS_HERE
</relationships>

Be concise.
"""
    if isinstance(relationships, list):
        rel_str =  "\n".join([json.dumps(x) for x in relationships])
        prompt = prompt_template.replace("QUESTION_HERE", query).replace("RELS_HERE", rel_str)
    else:
        prompt = prompt_template.replace("QUESTION_HERE", query).replace("RELS_HERE", json.dumps(relationships))
    claudePayload = json.dumps({ 
        "anthropic_version": "bedrock-2023-05-31",
        'max_tokens': 2048,
    	"messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": prompt
              }
            ]
          }
        ]
    })
    
    
    response = bedrock_runtime.invoke_model(
        body=claudePayload, 
        modelId=claudeModelId, 
        accept='application/json', 
        contentType='application/json'
    )

    body = response.get('body').read().decode('utf-8')

    response_body = json.loads(body)
    return response_body['content'][0]['text']

In [58]:
call_claude_graph_rag("Tell me about Albitar Oncology", c_record)

'Albitar Oncology Consulting, LLC has an agreement with NeoGenomics Laboratories, Inc.'
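The answer above mentions only the NeoGenomics agreement because `c_record` holds just `records[0]`; the ownership relationship with Maher Albitar is in the second record. A hedged sketch of wiring all three steps together and passing every matching record; `graph_rag` is a hypothetical helper, and the concept, lookup, and answer functions are passed in so it can be tested without AWS.

```python
def graph_rag(question, get_concepts, fetch_records, answer):
    """Hypothetical glue: concepts -> graph lookups -> answer, using every matching record."""
    concepts = get_concepts(question)
    records = []
    for node in concepts.get("nodes", []):
        # Gather the neighborhood of each concept rather than a single record
        records.extend(fetch_records(node["id"]))
    return answer(question, records)


# Stub functions stand in for Claude and Neptune in this example
stub_answer = graph_rag(
    "Tell me about Albitar Oncology",
    lambda q: {"nodes": [{"id": "albitar_oncology_consulting"}]},
    lambda name: [f"{name} -[agreement]-> neogenomics_laboratories",
                  f"{name} -[ownership]-> maher_albitar"],
    lambda q, recs: recs,
)
print(stub_answer)
```

In this notebook you might pass `lambda q: format_claude_kg(call_claude_get_concepts(q))` for the concepts, a function that runs the neighborhood query with `driver.execute_query` and returns `[r.data() for r in records]`, and `call_claude_graph_rag` for the answer, which already accepts a list of records.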