## Step 3: Conduct a Vector Similarity Search with the Prompt and Send the Final Prompt to the LLM

In this notebook, we first run a similarity search in ChromaDB for the texts with the smallest distance from the prompt created in step 02. The 20 closest pieces of text are retrieved; this number is set by `n_results` in the query and can be changed as desired.

Subsequently, the final prompt (which includes both the original prompt and the retrieved results) is sent to an LLM on AWS Bedrock.
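
To make "least distance" concrete, here is a minimal brute-force sketch of a nearest-neighbour lookup. It is not how ChromaDB works internally (Chroma uses an approximate index), and the distance metric of the `renalworks` collection is not stated in this notebook; squared Euclidean distance is assumed here purely for illustration. The function names `squared_l2` and `nearest_texts` are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_texts(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    texts: Sequence[str],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k (distance, text) pairs closest to the query, nearest first."""
    scored = [(squared_l2(query, emb), text) for emb, text in zip(embeddings, texts)]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:k]


# Toy example with 2-dimensional "embeddings"
toy_hits = nearest_texts(
    query=[0.0, 0.0],
    embeddings=[[1.0, 0.0], [0.0, 0.5], [3.0, 4.0]],
    texts=["note A", "note B", "note C"],
    k=2,
)
print(toy_hits)
```

In the toy example, "note B" is returned first because its vector lies closest to the query, followed by "note A".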

> Credit for parts of this notebook: https://github.com/Farzad-R/Advanced-QA-and-RAG-Series/commits?author=Farzad-R

In [7]:
import chromadb
from pyprojroot import here

# Load ChromaDB
chroma_client = chromadb.PersistentClient(path=str(here("data/chroma")))
vectordb = chroma_client.get_collection(name="renalworks")
vectordb.count()

165

In [8]:
import os
from src.utils.bedrock_caller import BedrockCaller

bedrock_caller = BedrockCaller()

# Specify the path of the prompt
prompt_file_name = 'clinical-notes-01-prompt-specific-02.txt'
prompt_file_path = os.path.join('prompts', prompt_file_name)

# Load prompt
with open(prompt_file_path, 'r') as f:
    prompt = f.read()

# Obtain prompt embedding
prompt_embedding = bedrock_caller.get_embedding(prompt)
print(prompt_embedding)

[-0.007839507, -0.01812603, 0.035889536, -0.00892707, 0.0059815897, -0.001076233, -0.01966674, -0.04259617, 0.026101481, 0.0255577, -0.023382578, 0.000101250866, 0.03715836, -0.04894028, -0.054740608, 0.03244559, -0.028276606, -0.058365814, -0.0150446035, 0.080842085, 0.026645262, 0.04513381, 0.04567759, -0.017310357, -0.10005568, -0.014047672, -0.045858853, -0.01839792, 0.038608443, -0.0109209325, 0.04259617, 0.054378085, -0.004101014, -0.02501392, 0.013413262, 0.045496333, -0.009788056, -0.010377152, 0.021841865, -0.061628498, 0.028820386, -0.027551563, 0.00070238364, -0.034620713, -0.046402633, 0.0027642194, 0.00046731168, 0.041871127, -0.0126882205, 0.023110686, 0.05619069, 0.03933348, -0.04459003, 0.0059815897, 0.07069151, 0.048577756, -0.027551563, -0.05002784, -0.022204386, -0.049302798, 0.017763508, -0.01250696, -0.0757668, -0.030089207, 0.014047672, -6.832663e-05, 0.042958688, -0.07177907, -0.050752882, -0.03298937, 0.05039036, -0.023563838, 0.03715836, 0.029364167, -0.0009402
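
`BedrockCaller.get_embedding` is a project helper whose source is not shown in this notebook. As a rough sketch of what such a helper might do, the function below calls `invoke_model` on a `bedrock-runtime` client. The model ID (`amazon.titan-embed-text-v1`) and the function name `get_embedding_sketch` are assumptions; the embedding model actually used by `BedrockCaller` may differ.

```python
import json
from typing import Any, List, Optional

# Assumption: the notebook does not say which embedding model BedrockCaller uses.
ASSUMED_EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v1"


def get_embedding_sketch(text: str, client: Optional[Any] = None) -> List[float]:
    """Return an embedding for `text` using a Titan text embedding model on Bedrock."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("bedrock-runtime")

    response = client.invoke_model(
        modelId=ASSUMED_EMBEDDING_MODEL_ID,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing a JSON document.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

The query embedding must come from the same model that was used to embed the documents stored in the collection; otherwise the distances are not meaningful.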

In [9]:
# Conduct a similarity search in the ChromaDB
similarity_search_results = vectordb.query(
    query_embeddings=prompt_embedding,
    n_results=20,  # Adjust the number of results to obtain as required
)['documents']

print(similarity_search_results)

[["id: 5546.0,\nnote: - Pt alert\n- Stable\n- No c/o\n- On Linear UF profiling start 1.06L/h - stop 1.06L/h.\n- Reduce dialysate temperature at 36.5'C.,\ntype: HD,\ndetails: Pre-dialysis,\nfhir_id: 387111.0,\nid_patient: 1951.0,\ncreated_at: 10:13.8,\nmodified_at: nan,\ncreated_by: 66.0,\nmodified_by: nan,\nid_tenant: 100.0,\nid_branch: 10000.0,\nid_dialysis: 4529.0,\nmanually_created: False,\nhd_reading_id: nan,\nlab_order_id: nan,\n", 'id: 11620.0,\nnote: Patient afebrile. No sign and symptoms of Covid 19 seen\nOn UF Profiling linear:start:1.07 L/h-1.07 L/h\nReduced dialysate temperature to 36.5 0C\nHD 3 hour today due to water disruption start tomorrow,\ntype: HD,\ndetails: Pre-dialysis,\nfhir_id: 677517.0,\nid_patient: 1951.0,\ncreated_at: 01:13.7,\nmodified_at: nan,\ncreated_by: 231.0,\nmodified_by: nan,\nid_tenant: 100.0,\nid_branch: 10000.0,\nid_dialysis: 8753.0,\nmanually_created: False,\nhd_reading_id: nan,\nlab_order_id: nan,\n', 'id: 10499.0,\nnote: Patient afebrile. No sign
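
The output above starts with `[[` because `query` returns one inner list of documents per query embedding. When this nested list is later interpolated into an f-string, Python inserts its `repr`, so the model sees brackets, quotes, and literal `\n` escapes. A minimal sketch of an alternative is to flatten the results into numbered, separated passages first. The helper name `format_search_results` and the separator are hypothetical and not part of the original notebook.

```python
from typing import Sequence


def format_search_results(
    documents: Sequence[Sequence[str]], separator: str = "\n---\n"
) -> str:
    """Flatten Chroma-style nested documents into one readable text block."""
    # One inner list per query embedding; flatten them in order.
    flat = [doc.strip() for per_query in documents for doc in per_query]
    numbered = [f"[{i}] {doc}" for i, doc in enumerate(flat, start=1)]
    return separator.join(numbered)


# Toy input shaped like the `documents` field of a query result
example_context = format_search_results([["id: 1,\nnote: Pt alert\n", "id: 2,\nnote: Stable"]])
print(example_context)
```

Whether this formatting improves the model's answers is not something this notebook measures; it mainly makes the final prompt easier to read and debug.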

### Construction of Final Prompt & Response Generation

The final prompt is constructed from the prompt from step 02 and the similarity search results obtained above. It is then sent to Anthropic's Claude 3 Sonnet for response generation.

In [10]:
sample_file_name = 'clinical-notes-01.txt'
sample_file_path = os.path.join('sample_responses', sample_file_name)

# Load sample
with open(sample_file_path, 'r') as f:
    sample = f.read()

system_role = "You are a highly knowledgeable assistant specialized in creating clinical documents and answering questions. You will receive the user's question along with the search results of that question over a database. Please use the search results to generate the proper answer. "
final_prompt = f"User's question: {prompt} \n\n Search results:\n {similarity_search_results}"

messages_API_body = {
    "anthropic_version": "bedrock-2023-05-31",
    # 666 tokens; presumably a budget of ~500 words at roughly 0.75 words per token
    "max_tokens": int(500 / 0.75),
    "system": system_role,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": final_prompt
                }
            ]
        }
    ]
}

In [11]:
response_body = bedrock_caller.call_claude3sonnet(messages_API_body)
llm_output = response_body['content'][0]['text']

print(llm_output)

Based on the provided clinical data, here is the report for the patient:

1. The dates and days that the patient is unwell and the nurses indicates that the patient is unstable:
The clinical notes do not mention any days or dates when the patient was unwell or unstable. All notes indicate that the patient was stable and alert during the dialysis sessions.

2. Itemize the dates and time that the patient receive any form of injections or medication and the person who administer the injection / medication:
- 2023-05-07 01:51:44 AM - IV Erysaa 4k (injection) given by SN Noraini

3. Did the patient missed or omit medication / injection during dialysis - list the dates:
Yes, the patient omitted the following medication on this date:
- 2023-01-09 09:03:05 PM - Ranofer (Iron Sucrose) 100mg/5mls Inj

4. Highlight the type of profiling that is used by the nursing team:
Based on the clinical notes, the nursing team used Linear Ultrafiltration (UF) Profiling for this patient during dialysis sessio
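
`call_claude3sonnet` is a project helper whose source is not shown here. The sketch below illustrates one way such a helper could be written with `invoke_model` on a `bedrock-runtime` client; the function name `call_claude3sonnet_sketch` is hypothetical, and the real helper may handle regions, retries, or errors differently.

```python
import json
from typing import Any, Dict, Optional

CLAUDE_3_SONNET_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"


def call_claude3sonnet_sketch(
    body: Dict[str, Any], client: Optional[Any] = None
) -> Dict[str, Any]:
    """Send a Messages API body to Claude 3 Sonnet on Bedrock and return the parsed response."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("bedrock-runtime")

    response = client.invoke_model(
        modelId=CLAUDE_3_SONNET_MODEL_ID,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing a JSON document.
    return json.loads(response["body"].read())
```

The parsed response includes a `stop_reason` field. If it equals `"max_tokens"`, the answer was cut off by the token limit, and raising `max_tokens` in the request body is one way to obtain a complete report.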

The LLM output is then saved into the `results` folder. 

In [12]:
import re
import os

sample_file_name = 'clinical-notes-01.txt'
save_dir = os.path.join('results', os.path.splitext(sample_file_name)[0])

# Make sure the output directory exists before listing it
os.makedirs(save_dir, exist_ok=True)

# Regular expression to match files named exactly 'attempt-x.txt'
pattern = re.compile(r'attempt-(\d+)\.txt')

# Find the highest existing attempt number (0 if there are none)
max_num = 0
for filename in os.listdir(save_dir):
    match = pattern.fullmatch(filename)
    if match:
        max_num = max(max_num, int(match.group(1)))
next_num = max_num + 1

# Create the new filename
new_filename = f'attempt-{next_num}.txt'
new_filepath = os.path.join(save_dir, new_filename)

# Save the data to the new file
with open(new_filepath, 'w') as file:
    file.write(llm_output)