### Data Description

## Installing and Importing Necessary Libraries and Dependencies

In [None]:
# Installation for GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q



In [None]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub==0.25.0 pandas==2.2.2 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.3.0 langchain-community==0.3.0 langchain-text-splitters==0.3.6 chromadb==0.5.5 sentence-transformers==3.2.0 numpy==1.26.0 -q

In [None]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [None]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"  # Hugging Face repo that hosts the quantized Mistral 7B Instruct model
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"  # File name of the 6-bit (Q6_K) quantized weights inside that repo

In [None]:
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)  # Downloads the model file (or reuses the cached copy) and stores its local path so the model can be loaded later.

In [None]:
# Load the model; this cell assumes the runtime is connected to a GPU.
llm = Llama(
    model_path=model_path,
    n_ctx=5000,
    n_gpu_layers=38,
    n_batch=512
)  # Loads the downloaded Mistral model and sets the context window, the number of layers offloaded to the GPU, and the batch size.

#### Response

In [None]:
def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
model_response = response(user_input)
print(model_response)


**Query 1:** The response was cut off by the token limit. Because `max_tokens` is limited to 128, it covers only one step of the protocol. It does provide information on what sepsis is.
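
A cut-off like this can also be checked in code rather than by reading the output. The sketch below assumes the completion dictionary returned by `llm(...)` carries a `finish_reason` field in each choice, with the value `"length"` when generation stopped at `max_tokens`; the helper name `was_truncated` and the example dictionary are hypothetical.

```python
def was_truncated(model_output: dict) -> bool:
    """Return True if the first choice stopped because it hit max_tokens."""
    choice = model_output["choices"][0]
    # Assumption: "length" marks a token-limit cut-off, "stop" a natural end
    return choice.get("finish_reason") == "length"


# Hypothetical output shaped like a llama-cpp-python completion
example_output = {
    "choices": [{"text": "Sepsis is a life-threatening ...", "finish_reason": "length"}]
}
print(was_truncated(example_output))
```

If the check returns `True`, raising `max_tokens` is the first thing to try.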


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input1 ="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
model_response1 = response(user_input1)
print(model_response1)

**Query 2:** As with the first query, the response was cut off by the token limit, so it does not answer the question. It does a good job of describing the abdominal pain.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
model_response= response(user_input)
print(model_response)

**Query 3:** The response ignores the first part of the question (the treatments) because the two parts of the query were not separated. Instead, it answers only the second part, the possible causes.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
model_response= response(user_input)
print(model_response)

**Query 4:** The response is incomplete because of the token limit. It provides two pieces of information, on emergency care and medications, but it also gives an explanation of TBI, which was not asked for.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
model_response= response(user_input)
print(model_response)

**Query 5:** The response is cut off by the token limit. It does answer the question up to the point where it stops.

## Question Answering using LLM with Prompt Engineering

In [None]:
def response_2(query,max_tokens=1024,temperature=0,top_p=0.95,top_k=50):  # max_tokens was increased because the earlier responses were cut off

    # A system message is provided to better define the expected response and information.
    system_message = """
    [INST]<<SYS>>


    You are an AI assistant designed to support a licensed medical professional. Your core functions are to assist with synthesizing information based on available records
    and prompts asked by the users. Your persona is that of a helpful, accurate, and ethical tool. You are not a human clinician.

    **Primary Directives:**
    1.  **Safety First:** You must not provide medical advice, diagnoses, or medication recommendations. When generating text, strictly adhere to the information provided by the human user.
    2.  **Source Reliability:** Use only peer-reviewed medical sources or explicitly provided patient data for generating responses.
    3.  **No Hallucinations:** You must prevent the generation of inaccurate or unsupported information. If information is ambiguous, state this and ask for clarification.
    4.  **Bias Mitigation:** Be mindful of and explicitly exclude non-clinically relevant demographic descriptors to avoid introducing bias.

 <</SYS>>[/INST]
    """

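    # Prepend the system message so the model receives the directives together with the query
    model_output = llm(
      prompt=system_message + "\n" + query,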
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']
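
For reference, Mistral-7B-Instruct models are prompted with an `[INST] ... [/INST]` wrapper and, as far as I know, have no dedicated system-prompt tag (the `<<SYS>>` markers come from the Llama 2 chat format). One common workaround is to place the system instructions at the start of the same `[INST]` block as the user query. The helper below is a minimal sketch of that idea; the name `build_instruct_prompt` is hypothetical and is not part of llama-cpp-python.

```python
def build_instruct_prompt(system_message: str, query: str) -> str:
    """Wrap system instructions and the user query in a single [INST] block."""
    return f"[INST] {system_message.strip()}\n\n{query.strip()} [/INST]"


prompt = build_instruct_prompt(
    "You are an AI assistant supporting a licensed medical professional.",
    "List the protocols for managing sepsis in a critical care unit.",
)
print(prompt)
```

The resulting string can be passed as `prompt=` to `llm(...)` in the same way as the plain query.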

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "Provide details on the protocols needed for managing sepsis in a critical care unit. The output should be in an instructional format with bullets. Each bullet should be followed with a verb describing the key action."  # The prompt was reworded to ask for clear information in a format that is easier to read
model_response = response_2(user_input)
print(model_response)

**Query 1:** The prompt was changed from a question to an instruction asking the model to describe the treatment of sepsis in a specific format. With this change, it gave more detailed instructions and clear steps on the protocols for treating sepsis.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = '''

Provide the common symptoms for appendicitis.

Stop providing the common symptoms for appendicitis.

State whether appendicitis can be cured by medication or whether surgery will be needed. If surgery is needed, describe which surgical procedures should be followed to treat appendicitis.
'''  # The query was updated to separate the two questions and give clearer instructions on what is being asked.
model_response = response_2(user_input)
print(model_response)

**Query 2:** By separating the query into two parts, I got an answer to each: the response covers the symptoms and how appendicitis can be treated.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = '''

Explain the top 5 possible causes for men and women developing patchy hair loss, also known as bald spots. Denote which causes are for women, men, or both.
Stop explaining the possible causes for men and women.


Provide effective treatments and solutions to address sudden patchy hair loss, also known as bald spots, in women and men. If they are different, state which treatment is more effective for women or men.
'''
model_response = response_2(user_input)
print(model_response)

**Query 3:** The prompt was rewritten to ask for more specific details by stating how many causes to list and by asking whether causes and treatments differ between women and men. This seemed to work, as the response gave more detail on both causes and treatments.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = '''

Below I have described a recommended treatment for someone who has sustained a muscle injury:


Initial treatment for a muscle injury, such as a strain, often follows the RICE or PRICE method, focusing on pain and swelling control. This involves:
- **Rest**: Avoiding activities that worsen pain or discomfort. It's generally recommended to avoid complete inactivity for too long, and gentle movement may be introduced as pain allows.
- **Ice**: Applying cold packs to the injured area to help reduce swelling and pain. It is important to avoid applying ice directly to the skin and to use it for limited durations.
- **Compression**: Using a bandage to wrap the injured area can help limit swelling. Care should be taken to ensure the bandage is not too tight, which could restrict circulation.
- **Elevation**: Keeping the injured body part raised above heart level can assist with reducing swelling.
- **Pain Management**: Managing pain can also involve over-the-counter options, but it is important to understand which ones are appropriate and when to use them, as some may interfere with healing in the very early stages of injury.


Use the above description to generate what treatments can be used for a physical injury to brain tissue that might result in temporary or permanent impairment of brain function. '''  # Provide an example so the output follows a similar structure
model_response = response_2(user_input)
print(model_response)

**Query 4:** In this query, I provided an example of the kind of information I was looking for. This led the response to give details on treatments for brain injury. It also added a recommendation to seek professional medical assistance, which is in line with the system message.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = ''' What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?  Ask me any relevant questions before answering.'''  # Ask the model for clarifying questions first
model_response = response_2(user_input)
print(model_response)

In [None]:
user_input = ''' What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

Below are some important circumstances and information to consider in the response:
1. The leg was fractured below the knee.
2. There was swelling and bruising at the time of the injury.
3. There was no bleeding or open wound on the leg after the injury occurred.
4. The person was not able to bear weight on the leg after the injury occurred.
5. The injured person has a vitamin D deficiency but does not have any other underlying medical conditions.
6. There are two people that can provide assistance and support during the initial care and transport.
7. The only resource is a car, which is about a mile away.

Provide the response in a bullet format, listing precautions first and then treatment steps. '''


model_response = response_2(user_input)
print(model_response)

**Query 5:** In this query, I first asked the model which questions it would need answered to give a good response to the initial question. I then made up answers to those questions, and the model returned a comprehensive treatment list.

## Data Preparation for RAG

### Loading the Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')  # Mount Google Drive so the PDF can be read

In [None]:
md_pdf = '/content/drive/MyDrive/medical_diagnosis_manual.pdf'  # Path to the PDF on the mounted drive

In [None]:
pdf_loader = PyMuPDFLoader(md_pdf) #loading the data

In [None]:
md = pdf_loader.load()  # Read the PDF into a list of documents, one per page

### Data Overview

#### Checking the first 5 pages

In [None]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(md[i].page_content,end="\n") #This for loop checks the first five pages of the PDF.

The first five pages are just the table of contents.

#### Checking the number of pages

In [None]:
len(md) #This checks the number of pages

There are 4,114 pages in the book.

### Data Chunking

In [None]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap= 40
)  # Splits the medical manual into chunks of at most 512 tokens with a 40-token overlap. (An initial overlap of 20 tokens repeated only about one word between chunks, so it was raised to 40.)

In [None]:
document_chunks = pdf_loader.load_and_split(text_splitter)  # Loads the PDF and splits it into chunks using the text splitter

In [None]:
len(document_chunks)  # Number of chunks; having more chunks than pages is expected

In [None]:
document_chunks[500].page_content  # Inspect an arbitrary chunk

In [None]:
document_chunks[501].page_content  # Inspect the next chunk

In [None]:
document_chunks[502].page_content  # Inspect the chunk after that

#### **Conclusion**

The chunking worked: there are 8,594 chunks for about 4,000 pages, so on average each page was split into roughly two chunks.

I increased the overlap to 40 tokens so that the overlap captures more of a sentence rather than a single word. The three consecutive chunks inspected above all overlap with their neighbors.
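
The overlap between neighboring chunks can also be measured instead of inspected by eye. The sketch below finds the longest text that ends one chunk and starts the next. The helper name `shared_overlap` and the two sample strings are hypothetical. The splitter may trim whitespace at chunk boundaries, so an exact match on real chunks is an assumption.

```python
def shared_overlap(prev_chunk: str, next_chunk: str) -> str:
    """Return the longest suffix of prev_chunk that is also a prefix of next_chunk."""
    max_len = min(len(prev_chunk), len(next_chunk))
    # Try the longest candidate first and shrink until a match is found
    for size in range(max_len, 0, -1):
        if prev_chunk.endswith(next_chunk[:size]):
            return next_chunk[:size]
    return ""


# Made-up strings standing in for two consecutive chunks
a = "Sepsis requires prompt fluid resuscitation and antibiotics."
b = "fluid resuscitation and antibiotics. Vasopressors may follow."
print(repr(shared_overlap(a, b)))
```

In the notebook, the same function could be called on `document_chunks[500].page_content` and `document_chunks[501].page_content`.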



### Embedding

In [None]:
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')  # This model was chosen because its maximum input length (512 tokens) matches the chunk size we are using

In [None]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)  # Embed the first two chunks to check the embeddings
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [None]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)  # Confirms both embeddings have the same dimension

#### **Conclusion**

The data was embedded, and I verified that the two embedding vectors have the same dimension.


### Vector Database

In [None]:
out_dir = 'med_db'  # Directory where the vector database will be persisted

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [None]:
vectorstore = Chroma.from_documents(
    document_chunks,
    embedding_model,
    persist_directory=out_dir
)  # Creates the vector database from the document chunks and the embedding model, and persists it to out_dir

In [None]:
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model) #This loads the database

In [None]:
vectorstore.embeddings #This tells me which model is being used internally.  This confirms the right embedding model is being used

In [None]:
vectorstore.similarity_search("Cancer throat radiation",k=3) #This is a search to see how the embedding is finding the top 3 results.

#### **Conclusion**

The vector database worked as intended: it returned the top results related to the query "Cancer throat radiation".
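
To make the idea behind `similarity_search` concrete, the sketch below ranks a few toy vectors against a query vector and returns the indices of the `k` closest ones. Cosine similarity is used here purely for illustration; the index and distance metric that Chroma actually uses may differ, and the helper names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy 2-dimensional "embeddings"; real gte-large vectors are much longer
docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], docs, k=2))
```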

### Retriever

In [None]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2}
)  # Retriever over the vector store: uses similarity search and returns the top 2 most similar chunks

In [None]:
rel_docs = retriever.get_relevant_documents("what is the prognosis of someone with throat cancer?")  # Test the retriever by fetching the chunks most relevant to a question
rel_docs

#### **Conclusion**

The response does provide details on the survival rate, but it is difficult to read because it includes source metadata and other irrelevant information (e.g., my email address).
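
One way to make the retriever output easier to read is to print only a short snippet and the page number of each result. The sketch below assumes each returned document exposes `page_content` and a `metadata` dictionary, and that the loader stores the page number under a `page` key. The helper name `summarize_docs` and the stand-in document are hypothetical.

```python
from types import SimpleNamespace


def summarize_docs(docs, max_chars=200):
    """Return one short line per document: page number plus a text snippet."""
    lines = []
    for doc in docs:
        page = doc.metadata.get("page", "?")  # assumed metadata key
        snippet = " ".join(doc.page_content.split())[:max_chars]  # collapse whitespace
        lines.append(f"[page {page}] {snippet}")
    return lines


# Stand-in object that mimics the shape of a retrieved document
fake_docs = [
    SimpleNamespace(
        page_content="Prognosis\n depends on stage.",
        metadata={"page": 12, "source": "manual.pdf"},
    )
]
for line in summarize_docs(fake_docs):
    print(line)
```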

### System and User Prompt Template

In [None]:
qna_system_message = """

You are a medical assistant helping medical professionals synthesize vast volumes of medical data and information to create accurate descriptions, diagnoses, and treatment plans. Your core functions are to assist with synthesizing information based on available records and provide clear and concise information. Your persona is that of a helpful, accurate, and ethical tool. You are not a human clinician.

**Primary Directives:**
    1.  **Source Reliability:** Use only peer-reviewed medical sources or explicitly provided patient data for generating responses.
    2.  **No Hallucinations:** You must prevent the generation of inaccurate or unsupported information. If information is ambiguous, state this and ask for clarification.
    3.  **Bias Mitigation:** Be mindful of and explicitly exclude non-clinically relevant demographic descriptors to avoid introducing bias.

User questions will begin with the token: ###Question.


Remember that the answer to ###Question might not always be directly present in the information provided in the ###Context.
the answer can be indirectly derived from the information in ###Context.

If the answer is not found in the context, respond "I don't know"
"""

In [None]:
qna_user_message_template = """
Consider the following ###Context and ###Question
###Context
{context}

###Question
{question}
"""

#### Analysis

I created a system prompt that outlines the requirements for the AI and what its role is. I also researched which directives to include so that hallucinations and biases are kept to a minimum and the model tells the user "I don't know" when it does not have an answer. Then, I created the user message template using a standard context-and-question format.

The goal of the system message is to help the user obtain information, diagnoses, and treatment plans without the model making any decisions itself.

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=1024,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the model's response text
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response
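
Because the prompt now contains the retrieved chunks, it is worth checking that prompt plus answer still fit in the context window set by `n_ctx=5000`. The arithmetic below is a rough sketch. The 300-token figure for the system and user templates is a hypothetical estimate, and chunk sizes were measured with the `cl100k_base` encoding, so the model's own tokenizer may count somewhat differently.

```python
def remaining_context(n_ctx: int, prompt_tokens: int, max_tokens: int) -> int:
    """Tokens left in the context window after the prompt and the reserved answer."""
    return n_ctx - prompt_tokens - max_tokens


n_ctx = 5000
chunk_tokens = 3 * 512   # k=3 chunks of at most 512 tokens each
template_tokens = 300    # hypothetical estimate for the system and user templates
headroom = remaining_context(n_ctx, chunk_tokens + template_tokens, max_tokens=1024)
print(headroom)
```

A negative result would mean that `k`, the chunk size, or `max_tokens` has to be reduced, or `n_ctx` increased.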

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What are the protocols for managing sepsis in a critical care unit? The output should be in an instructional format with bullets. Each bullet should be followed with a verb describing the key action."
model_response = generate_rag_response(user_input)
print(model_response)

#### **Conclusion**

The response is clear and states that it does not have sufficient context to answer the question directly, because no specific context was given. However, it still responds with good information on how to approach managing sepsis.



### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = '''What are the common symptoms for appendicitis? State whether appendicitis can be cured by medication or whether surgery will be needed. If surgery is needed, describe which surgical procedures should be followed to treat appendicitis.'''
model_response = generate_rag_response(user_input)
print(model_response)

#### **Conclusion**

The response is quite good because the question spells out what is needed. It includes the common symptoms, describes the surgical procedures, and explains whether medication can cure appendicitis and how it can be used.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = '''
What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp?

What could be the possible causes behind it?'''

model_response = generate_rag_response(user_input)
print(model_response)

#### **Conclusion**

The response gives a generic answer on the possible causes and effective treatments. Because no additional patient context is provided (e.g., a certain history or particular symptoms), the response notes that the most effective treatment depends on the underlying cause and severity, which is in line with the system prompt.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = '''What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?'''
model_response = generate_rag_response(user_input)
print(model_response)

#### **Conclusion**

The response is clear and provides good detail on the treatment. It is clearer and more detailed than the previous two responses to this question because it uses RAG to ground the answer. Since I did not provide any additional context, it also explains that there is no single specific treatment.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = '''What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?'''
model_response = generate_rag_response(user_input)
print(model_response)

#### **Conclusion**

The response gives good detail on the precautions and treatments that need to be followed. Even without the inputs I supplied in the prompt engineering section, it still covers the medications needed, the treatment options, and things to consider. It does repeat itself, listing antibiotics twice, so the system message may need to state that information should not be repeated.


### Fine-tuning

**Background/Choice Selection**

For the fine-tuning, I selected one question and asked it several times while changing the generation parameters: the temperature, the maximum number of tokens, top_p, and top_k. The model weights themselves were not changed.

In [None]:
user_input = '''
What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp?

What could be the possible causes behind it?'''

model_response = generate_rag_response(user_input,max_tokens=100) # I limited the number of tokens to 100 to see what difference it would make in the response
print(model_response)

**Conclusion**

By limiting the number of tokens, the response is shorter and cut off. However, it does provide the details on possible causes and some potential treatment options. It does not give details on the need to get more information or understanding the cause.

In [None]:
user_input = '''
What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp?

What could be the possible causes behind it?'''

model_response = generate_rag_response(user_input,max_tokens=700, temperature=.5)  # Here I raised the temperature to 0.5 to allow more varied answers and set max_tokens to 700, which is less than the default of 1024 but more than the 100 used in the previous query.
print(model_response)

**Conclusion**

In this instance, I increased the temperature and changed the token limit. The higher temperature led the model to add extra questions and answers about the differences between scarring and non-scarring forms of alopecia, which were not asked for. It also focuses on alopecia areata as the only cause of hair loss, whereas previous answers had given other causes.

A temperature of 0.5 may be too high for medical data.

The response was complete.
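
To illustrate why a higher temperature produces more varied output, the sketch below applies temperature scaling to a few toy logits before the softmax. A temperature of 0 is treated here as greedy decoding (all probability on the top token), which is an assumption about how the sampler behaves. The function name and the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]
for t in (0, 0.5, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

Lower temperatures concentrate probability on the top token, while higher temperatures spread it across the alternatives.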

In [None]:
user_input = '''
What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp?

What could be the possible causes behind it?'''

model_response = generate_rag_response(user_input,max_tokens=700,top_p=0.98, top_k=20)  # Here I changed top_p and top_k to adjust the diversity of the response without changing the temperature, and kept max_tokens at 700.
print(model_response)

**Conclusion**

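For this response, I changed top_p to 0.98 and top_k to 20. Raising top_p widens the pool of candidate tokens slightly, while lowering top_k narrows it. Because the temperature is still 0 here, decoding is expected to be close to greedy, so these two settings may have little effect on their own.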

This response is more focused and directly answers the questions about causes and treatments based on the retrieved context. It doesn't introduce additional questions or delve into differentiating scarring and non-scarring alopecia as the previous response did when temperature was set higher.

The response is complete.
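
The interaction between top_k and top_p can be sketched on a toy distribution: top_k keeps only the k most likely tokens, and top_p then keeps the smallest set of those whose cumulative probability reaches p. The function name, the toy vocabulary, and the order in which the two filters are applied are assumptions for illustration; the actual sampler may apply further steps.

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k tokens, then the smallest prefix reaching top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities
toy = {"minoxidil": 0.5, "steroids": 0.3, "biotin": 0.15, "other": 0.05}
print(filter_top_k_top_p(toy, top_k=3, top_p=0.7))
```

In this toy case, only two tokens survive both filters, and their probabilities are rescaled to sum to 1.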

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters - retrieval and generation. We illustrate this evaluation based on the answers generated for the questions from the previous section.

- We use the same Mistral model for evaluation, so here the LLM is effectively rating how well it has performed the task itself.

In [None]:
groundedness_rater_system_message  = '''You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.'''

In [None]:
relevance_rater_system_message = '''You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the context as per the metric.
2. Give a step-by-step explanation if the context adheres to the metric considering the question as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the context using the evaluation criteria and assign a score.'''

In [None]:
user_message_template = '''
###Question
{question}

###Context
{context}

###Answer
{answer}'''

In [None]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=1024,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=600)

print(ground,end="\n\n")
print(rel)

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=600)

print(ground,end="\n\n")
print(rel)

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=600)

print(ground,end="\n\n")
print(rel)

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=600)
print(ground,end="\n\n")
print(rel)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=600)
print(ground,end="\n\n")
print(rel)

#### **Conclusion**

The evaluation says the groundedness metric was completely followed, but relevance is closer to a 4 for all queries except the first one.

The first question is the most specific of the five, as it asks for a protocol for managing sepsis, while the others ask multiple questions and/or lack enough detail to retrieve specific and relevant context (e.g., the hiker question).
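
To compare queries side by side, the numeric rating can be pulled out of the judge's free-text output. The sketch below assumes the judge writes something like "score: 4" or "rating of 5" near the end of its explanation; that phrasing is an assumption, the helper name `extract_score` is hypothetical, and other phrasings (for example, restating the "1-5" scale next to the word "score") would need a different pattern.

```python
import re


def extract_score(judge_text):
    """Return the last 1-5 number that follows 'score' or 'rating', or None."""
    pattern = r"(?:score|rating)\D{0,20}([1-5])\b"
    matches = re.findall(pattern, judge_text, flags=re.IGNORECASE)
    return int(matches[-1]) if matches else None


# Made-up judge outputs for illustration
print(extract_score("Step 1: compare answer and context. Overall score: 4"))
print(extract_score("I would give this answer a rating of 5 out of 5."))
print(extract_score("The answer follows the context closely."))
```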

## Actionable Insights and Business Recommendations

To improve the model and ensure accuracy of information, I would recommend the following:

*   Train the model further on internal procedures and processes. Providing more detail on internal processes, such as how the organization deals with sepsis, may help produce more relevant information.
*   Keep the temperature at 0 but consider changing top_k. When the temperature was raised, more irrelevant information was produced. As the evaluation shows that not all information is relevant, it may be worth reducing top_k to around 20 to see whether that reduces the amount of irrelevant information.
* Train the users of the model to provide context when asking a question so that the answer is relevant and precise.
* Review the system message. It may need further refinement to ensure that answers are relevant and that the model can prompt the user for additional information.
