### RAG

Retrieval-augmented generation (RAG) enhances an LLM by combining it with external data sources, so that the model can look up relevant facts from this "library" when responding to questions.
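As a rough illustration of that flow (retrieve relevant passages, then place them in the prompt), here is a minimal, dependency-free sketch. It is not the pipeline used in this notebook: the word-overlap scoring, the two-passage `library`, and the helper names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, library, top_k=1):
    """Rank passages by how many words they share with the query (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(library, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, passages):
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

# hypothetical two-passage "library", invented for illustration
library = [
    "Sadness is a normal reaction to loss and usually passes with time.",
    "Regular exercise can improve sleep quality.",
]
query = "Is sadness a normal reaction?"
prompt = build_prompt(query, retrieve(query, library))
print(prompt)
```

A real system replaces the word-overlap score with embedding similarity, which is what the rest of this notebook sets up.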

In [3]:
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install peft
!pip install auto-gptq
!pip install optimum
!pip install bitsandbytes


In [17]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core import Document
from datasets import load_dataset

### Load the Models from HuggingFace

In [5]:
# load the base model
model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=False)

# load the fine-tuned model using Peft
config = PeftConfig.from_pretrained("ayanahye/DocGPT-ft")
model = PeftModel.from_pretrained(model, "ayanahye/DocGPT-ft")

# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)


### Test the Model Without RAG

In [7]:
# use model without context / rag
model.eval()

patient_query = "What's the difference between sadness and depression?"

instructions_string = """You are DocGPT, an experienced medical professional. Provide clear and concise advice to the patient based on the provided information. Maintain a calm and empathetic tone, do not repeat yourself, and end your response with your signature '-DocGPT'. Respond to the following patient query:"""

prompt = f"[INST] {instructions_string} Please respond to the following patient query: \n{patient_query} \n[/INST]"

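# tokenize the prompt and move the tensors to the GPU
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# generate the response tokens; passing the attention mask and a pad token id
# avoids the "attention mask and the pad token id were not set" warning
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id,
)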

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)



[INST] You are DocGPT, an experienced medical professional. Provide clear and concise advice to the patient based on the provided information. Maintain a calm and empathetic tone, do not repeat yourself, and end your response with your signature '-DocGPT'. Respond to the following patient query: Please respond to the following patient query: 
What's the difference between sadness and depression? 
[/INST] Hello, I'm glad you've reached out to me with your question. I'd be happy to help answer your question.

Sadness is a normal human emotion that arises in response to a loss or disappointment. It's a natural response to a difficult situation. Sadness is usually temporary and goes away as the situation improves.

Depression, on the other hand, is a mental illness that affects mood, thoughts, and behavior. It's characterized by persistent feelings of sadness, hopelessness, and worthlessness. Depression can interfere with daily activities and relationships. It can also affect physical heal
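The printed response begins with the full `[INST] ... [/INST]` prompt because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of keeping only the reply, assuming the Mistral-style `[/INST]` marker used in the prompt above; `extract_reply` is a hypothetical helper, not part of any library.

```python
def extract_reply(decoded_text, end_marker="[/INST]"):
    """Return the text after the last end_marker, or the whole text if it is absent."""
    _, found, reply = decoded_text.rpartition(end_marker)
    return reply.strip() if found else decoded_text.strip()

# shortened, made-up example of a decoded generation
decoded = "[INST] Respond to the patient query: \nHow are you? \n[/INST] Hello, I'm glad you reached out."
print(extract_reply(decoded))
```

An alternative that avoids string matching is to slice off the prompt tokens before decoding, for example `outputs[0][inputs["input_ids"].shape[1]:]`.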

### Settings for the LlamaIndex

In [8]:
# set the embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Settings.llm = None  # won't use LlamaIndex to set up the LLM; the model is called directly
Settings.chunk_size = 256
Settings.chunk_overlap = 25


LLM is explicitly disabled. Using MockLLM.
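To make `chunk_size` and `chunk_overlap` concrete, the sketch below splits a token list into fixed-size windows where each window repeats the last few tokens of the previous one. This is a simplification: LlamaIndex's own splitter also tries to respect sentence boundaries, so treat `chunk_tokens` as a hypothetical helper that illustrates the two parameters rather than the library's actual algorithm.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

# 600 placeholder tokens, using the same settings as the cell above
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_tokens(tokens, chunk_size=256, chunk_overlap=25)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk, at the cost of some duplicated text in the index.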


### Load the Data and Convert to Documents

In [18]:
dataset = load_dataset("tolu07/Mental_Health_FAQ")

print(dataset['train'][0])

# set up each document with a doc_id and text
documents = [
    Document(
        text=f"Question: {item['Questions']} Answer: {item['Answers']}",
        doc_id=f"doc_{i}"
    ) for i, item in enumerate(dataset['train'])
]

{'Question_ID': 1590140, 'Questions': 'What does it mean to have a mental illness?', 'Answers': 'Mental illnesses are health conditions that disrupt a person’s thoughts, emotions, relationships, and daily functioning. They are associated with distress and diminished capacity to engage in the ordinary activities of daily life.\nMental illnesses fall along a continuum of severity: some are fairly mild and only interfere with some aspects of life, such as certain phobias. On the other end of the spectrum lie serious mental illnesses, which result in major functional impairment and interference with daily life. These include such disorders as major depression, schizophrenia, and bipolar disorder, and may require that the person receives care in a hospital.\nIt is important to know that mental illnesses are medical conditions that have nothing to do with a person’s character, intelligence, or willpower. Just as diabetes is a disorder of the pancreas, mental illness is a medical conditio

### Create the Vector Database and Assemble Query Engine

In [19]:
# build the index, i.e. the vector database used for retrieval;
# documents are chunked and embedded according to the Settings above
index = VectorStoreIndex.from_documents(documents)

In [20]:
# set up the retriever
# number of chunks to retrieve from the vector database
top_k = 3

# configure the retriever: pass the vector database and the number of chunks to return from the search
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)
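Conceptually, the retriever embeds the query, scores every stored chunk against it, and keeps the `top_k` best. The sketch below shows that ranking step in plain Python, assuming cosine similarity as the scoring function; the 2-dimensional vectors and the helper names are invented for illustration and are much smaller than real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_by_similarity(query_vec, doc_vecs, k):
    """Return (doc_id, score) pairs for the k most similar documents."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# made-up embeddings keyed by document id
doc_vecs = {
    "doc_0": [1.0, 0.0],
    "doc_1": [0.6, 0.8],
    "doc_2": [0.0, 1.0],
    "doc_3": [-1.0, 0.0],
}
results = top_k_by_similarity([1.0, 0.0], doc_vecs, k=3)
print([doc_id for doc_id, _ in results])
```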

In [22]:
# assemble the query engine, which returns the relevant context for a query
# RetrieverQueryEngine retrieves information from the index based on the query
# retriever is the VectorIndexRetriever instance responsible for searching through the index
# SimilarityPostprocessor filters the retrieved results by their similarity to the query;
# no similarity cutoff is passed here, so in practice the retrieved chunks are kept as-is
query_engine = RetrieverQueryEngine(retriever=retriever, node_postprocessors=[SimilarityPostprocessor()])
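The idea behind similarity-based postprocessing is to drop retrieved chunks whose score falls below a threshold, so that weak matches do not end up in the prompt. A minimal sketch of that filter, with made-up chunks and scores and an arbitrary cutoff of 0.5; `filter_by_cutoff` is a hypothetical helper, not the LlamaIndex implementation.

```python
def filter_by_cutoff(scored_nodes, similarity_cutoff):
    """Keep only (text, score) pairs whose score is at least similarity_cutoff."""
    return [(text, score) for text, score in scored_nodes if score >= similarity_cutoff]

# invented retrieval results: (chunk text, similarity score)
scored_nodes = [("chunk A", 0.82), ("chunk B", 0.55), ("chunk C", 0.31)]
kept = filter_by_cutoff(scored_nodes, similarity_cutoff=0.5)
print(kept)
```

In LlamaIndex the corresponding knob is the `similarity_cutoff` argument of `SimilarityPostprocessor`; a suitable value depends on the embedding model and is best chosen by inspecting real scores.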

### Use the RAG-Enhanced Model

In [26]:
def generate_response_with_context(patient_query):
  # get the context
  retrieved_context = query_engine.query(patient_query)

  instructions_string = """You are DocGPT, an experienced medical professional. Provide clear and concise advice to the patient based on the provided information. Maintain a calm and empathetic tone, do not repeat yourself, and end your response with your signature '-DocGPT'. Respond to the following patient query:"""

  prompt_w_context = f"""[INST] {instructions_string} \n{retrieved_context} Use the provided context if helpful to respond to the patient: \n{patient_query} \n[/INST]"""

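  # tokenize the prompt and move the tensors to the GPU
  inputs = tokenizer(prompt_w_context, return_tensors="pt").to("cuda")
  # generate the response tokens; passing the attention mask and a pad token id
  # avoids the "attention mask and the pad token id were not set" warning
  outputs = model.generate(
      input_ids=inputs["input_ids"],
      attention_mask=inputs["attention_mask"],
      max_new_tokens=300,
      pad_token_id=tokenizer.eos_token_id,
  )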

  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  return response

patient_query = "What's the difference between sadness and depression?"
response = generate_response_with_context(patient_query)
print(response)



[INST] You are DocGPT, an experienced medical professional. Provide clear and concise advice to the patient based on the provided information. Maintain a calm and empathetic tone, do not repeat yourself, and end your response with your signature '-DocGPT'. Respond to the following patient query: 
Context information is below.
---------------------
Question: What's the difference between sadness and depression? Answer: Sadness is a normal reaction to a loss, disappointment, problems, or other difficult situations. Feeling sad from time to time is just another part of being human. In these cases, feelings of sadness go away quickly and you can go about your daily life. 
 Other ways to talk about sadness might be ‘feeling low,’ ‘feeling down,’ or ‘feeling blue.’ A person may say they are feeling ‘depressed,’ but if it goes away on its own and doesn’t impact life in a big way, it probably isn’t the illness of depression. 
 Depression is a mental illness that affects your mood, the way you 