In [1]:
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install accelerate
!pip install auto-gptq
!pip install optimum

Collecting llama-index
  Downloading llama_index-0.10.45-py3-none-any.whl (6.8 kB)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index)
  Downloading llama_index_agent_openai-0.2.7-py3-none-any.whl (12 kB)
Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_cli-0.1.12-py3-none-any.whl (26 kB)
Collecting llama-index-core==0.10.44 (from llama-index)
  Downloading llama_index_core-0.10.44-py3-none-any.whl (15.4 MB)
Collecting llama-index-embeddings-openai<0.2.0,>=0.1.5 (from llama-index)
  Downloading llama_index_embeddings_openai-0.1.10-py3-none-any.whl (6.2 kB)
Collecting llama-index-indices-managed-llama-cloud<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.1.6-py3-none-any.whl (6.7 kB)
Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama-index)
  Downloading llama_i

In [2]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [29]:
# import any embedding model on HF hub (https://huggingface.co/spaces/mteb/leaderboard)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-large") # alternative model

Settings.llm = None
Settings.chunk_size = 128
Settings.chunk_overlap = 25



LLM is explicitly disabled. Using MockLLM.
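
The `chunk_size` and `chunk_overlap` settings above control how each document is split before embedding. As a rough illustration only, the sketch below cuts a token list into windows of 128 tokens, with neighbouring windows sharing 25 tokens. The helper `chunk_tokens` and the toy token list are hypothetical; LlamaIndex's own splitter is sentence-aware and counts tokens with a real tokenizer, so its chunk boundaries will differ.

```python
def chunk_tokens(tokens, chunk_size=128, chunk_overlap=25):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

# toy input: 300 placeholder tokens (made up for illustration)
tokens = [f"tok{i}" for i in range(300)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [128, 128, 94]
```

Small chunks like these keep each retrieved passage focused, while the overlap reduces the chance that a sentence relevant to the query is cut in half at a boundary.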


In [39]:
documents = SimpleDirectoryReader("../data/Medical_Guidelines/PDF").load_data()

In [40]:
print(documents[0])

Doc ID: 7590ea95-88a4-4b46-b851-379296248b20
Text: Guidelines ESCMID/EUCIC clinical practice guidelines on
perioperative antibiotic prophylaxis in patients colonized by
multidrug-resistant Gram-negative bacteria before surgery Elda
Righi1,x, Nico T. Mutters2,x, Xavier Guirao3, Maria Dolores del
Toro4,5,6, Christian Eckmann7, Alex W. Friedrich8,9, Maddalena
Giannella10,11, Jan Kluytmans12, Elisab...


In [11]:
# store docs into vector DB
index = VectorStoreIndex.from_documents(documents)

In [12]:
# set number of docs to retrieve
top_k = 3

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)
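
Conceptually, the retriever embeds the query and returns the `top_k` chunks whose embeddings score highest against it. The sketch below shows that idea in plain Python, assuming cosine similarity as the scoring function. `cosine_similarity`, `retrieve_top_k`, and the 2-D vectors are hypothetical stand-ins, not LlamaIndex internals.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# toy 2-D "embeddings" (made up for illustration)
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
top = retrieve_top_k([1.0, 0.1], chunk_vecs, k=3)
print([i for i, _ in top])  # [0, 2, 1]
```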

In [13]:
# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)
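
The `similarity_cutoff=0.5` postprocessor drops retrieved nodes that score below the threshold, so a query can come back with fewer than `top_k` source nodes, or none at all. A minimal sketch of that filtering step, using a hypothetical `apply_similarity_cutoff` helper and made-up scores:

```python
def apply_similarity_cutoff(scored_chunks, cutoff=0.5):
    """Keep only (text, score) pairs whose score is at least cutoff."""
    return [(text, score) for text, score in scored_chunks if score >= cutoff]

# made-up retrieval results, already sorted by score
scored_chunks = [("chunk A", 0.82), ("chunk B", 0.55), ("chunk C", 0.31)]
kept = apply_similarity_cutoff(scored_chunks, cutoff=0.5)
print(len(kept))  # 2
```

Any code that later indexes into the results should therefore rely on the number of nodes actually returned rather than on `top_k`.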

In [21]:
# query documents
query = "What is MDR-GNB?"
response = query_engine.query(query)

In [22]:
# reformat response
# the similarity cutoff can leave fewer than top_k nodes, so iterate over what was returned
context = "Context:\n"
for node in response.source_nodes[:top_k]:
    context = context + node.text + "\n\n"

print(context)

Context:
Sampling techniques and microbiological practices were not
reviewed or discussed because they were beyond the scope of these
guidelines.
The main research questions addressed by the guidelines include:
1. Should screening for MDR-GNB be recommended before sur-
gery and when?
2. Which PAP have been evaluated for patients colonized with the
target MDR-GNB?
3.

The search started from January 2010 because the existing
guidelines reported no evidence to support the screening for
MDR-GNB and targeted PAP up to 2010 and 2015 [ 3,5]. A
focused search for any recently p ublished, relevant study was
also performed from January until April 30, 2022, using Medlineand Google Scholar.

To address the bene ﬁts of presurgical screening for
MDR-GNB to inform targeted PAP in carriers before surgery, the
articles reporting the rates of postoperative infections in MDR-GNB
carriers vs. noncarriers were reviewed.
Sampling techniques and microbiological practices were not
reviewed or discussed beca

In [16]:
# load model
model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Some weights of the model checkpoint at TheBloke/Mistral-7B-Instruct-v0.2-GPTQ were not used when initializing MistralForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11

In [17]:
# prompt (no context)
instructions_string = """MedGPT, functioning as a virtual data science consultant on clinical questions, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to questions aptly and ends responses with its signature '–MedGPT'. \
MedGPT will tailor the length of its responses to match the client's question, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.

Please respond to the following question.
"""
prompt_template = lambda question: f'''[INST] {instructions_string} \n{question} \n[/INST]'''

In [18]:
question = "What is MDR-GNB?"

prompt = prompt_template(question)
print(prompt)

[INST] MedGPT, functioning as a virtual data science consultant on clinical questions, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to questions aptly and ends responses with its signature '–MedGPT'. MedGPT will tailor the length of its responses to match the client's question, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.

Please respond to the following question.
 
What is MDR-GNB? 
[/INST]


In [19]:
model.eval()

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)

print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
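
The warning appears because only `input_ids` is passed to `generate`. The tokenizer also returns `inputs["attention_mask"]`, and passing it together with `pad_token_id=tokenizer.eos_token_id` should silence both messages. To show what the mask encodes, here is a toy sketch with a hypothetical `pad_batch` helper and made-up token ids. It right-pads for simplicity, whereas decoder-only models are usually left-padded for batched generation.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token-id lists to equal length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

# two made-up prompts of different lengths, padded with id 2
ids, mask = pad_batch([[1, 733, 16289], [1, 733]], pad_id=2)
print(ids)   # [[1, 733, 16289], [1, 733, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

With a single unpadded prompt, as in this notebook, the mask is all ones, so the generated text is probably unaffected. The mask matters once several prompts of different lengths are batched together.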


<s> [INST] MedGPT, functioning as a virtual data science consultant on clinical questions, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to questions aptly and ends responses with its signature '–MedGPT'. MedGPT will tailor the length of its responses to match the client's question, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.

Please respond to the following question.
 
What is MDR-GNB? 
[/INST] MDR-GNB stands for Multidrug-Resistant Gram-Negative bacteria. These are types of bacteria that have developed resistance to multiple antibiotics, making infections caused by them harder to treat. Gram-negative bacteria have a unique cell wall structure that makes it more difficult for antibiotics to penetrate and kill the bacteria compared to Gram-positive bacteria. The multidrug resistance in MDR-GNB makes these infections a significant public health conc
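
Because `batch_decode` returns the prompt followed by the completion, the printed text repeats the full instruction block. One way to keep only the model's answer is to split on the closing `[/INST]` tag. The sketch below uses a hypothetical `extract_answer` helper and a made-up decoded string, and assumes `</s>` is the end-of-sequence token.

```python
def extract_answer(decoded, end_marker="[/INST]", eos_token="</s>"):
    """Return the text after the last end_marker, without a trailing EOS token."""
    # rpartition returns the whole string as the tail when the marker is absent
    answer = decoded.rpartition(end_marker)[2].strip()
    if answer.endswith(eos_token):
        answer = answer[: -len(eos_token)].rstrip()
    return answer

# made-up decoded output in the same layout as the cell above
decoded = "<s> [INST] some instructions \nsome question \n[/INST] some answer text</s>"
print(extract_answer(decoded))  # some answer text
```

An alternative is to decode only the newly generated token ids by slicing `outputs[0]` from the prompt length onward, which avoids relying on the marker text.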

In [20]:
# prompt (with context)
prompt_template_w_context = lambda context, question: f"""[INST]MedGPT, functioning as a virtual data science consultant on clinical questions, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to questions aptly and ends responses with its signature '–MedGPT'. \
MedGPT will tailor the length of its responses to match the client's question, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.

{context}
Please respond to the following question. Use the context above if it is helpful.

{question}
[/INST]
"""

In [23]:
prompt = prompt_template_w_context(context, question)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)

print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST]MedGPT, functioning as a virtual data science consultant on clinical questions, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to questions aptly and ends responses with its signature '–MedGPT'. MedGPT will tailor the length of its responses to match the client's question, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.

Context:
Sampling techniques and microbiological practices were not
reviewed or discussed because they were beyond the scope of these
guidelines.
The main research questions addressed by the guidelines include:
1. Should screening for MDR-GNB be recommended before sur-
gery and when?
2. Which PAP have been evaluated for patients colonized with the
target MDR-GNB?
3.

The search started from January 2010 because the existing
guidelines reported no evidence to support the screening for
MDR-GNB and targeted PAP up to 2010 and 201