# Homework 5: LLM and RAG

In this assignment you will build a doctor chatbot using Retrieval-Augmented Generation (RAG) with the Hugging Face and LangChain frameworks.

---
**Contents:**
- Loading and preparing the MEDAL data.
- Reading and indexing the data.
- Creating embeddings and a vector store.
- Building the LLM and configuring retrieval.
- Designing the prompt template (Prompt Engineering).
- Building the LangChain pipeline.
- Bonus: integrating chat history.

In [1]:
# !pip install datasets langchain_community langchain_chroma langchain langchain_core tiktoken sentence-transformers==2.2.2 lark InstructorEmbedding bitsandbytes accelerate

## Loading the MEDAL dataset

In this section we load the [MEDAL](https://huggingface.co/datasets/bigbio/medal) dataset, which contains medical abstracts on a wide range of clinical topics (it was originally built for medical abbreviation disambiguation).

**Note:** We are interested in the `TEXT` and `LABEL` columns.

Additional links:
- [MEDAL repository](https://github.com/McGill-NLP/medal)
- [train.csv (Zenodo)](https://zenodo.org/record/4482922/files/train.csv)
- [File on Google Drive](https://drive.google.com/file/d/1X7PTIkmsFhTk5n-4W6SWa7XWsDGTpafl/view?usp=sharing)


In [1]:
# from google.colab import drive
# drive.mount('/content/gdrive')

### Task 0: Loading the MEDAL data (0.5 points)

Inspect the contents of the data file:

In [13]:
!head -n 10 train.csv

ABSTRACT_ID,TEXT,LOCATION,LABEL
14145090,velvet antlers vas are commonly used in traditional chinese medicine and invigorant and contain many PET components for health promotion the velvet antler peptide svap is one of active components in vas based on structural study the svap interacts with tgfβ receptors and disrupts the tgfβ pathway we hypothesized that svap prevents cardiac fibrosis from pressure overload by blocking tgfβ signaling SDRs underwent TAC tac or a sham operation T3 one month rats received either svap mgkgday or vehicle for an additional one month tac surgery induced significant cardiac dysfunction FB activation and fibrosis these effects were improved by treatment with svap in the heart tissue tac remarkably increased the expression of tgfβ and connective tissue growth factor ctgf ROS species C2 and the phosphorylation C2 of smad and ERK kinases erk svap inhibited the increases in reactive oxygen species C2 ctgf expression and the phosphorylation of smad and erk bu



## Task 1: Reading and indexing the data (2.5 points)

Here you need to:
- Read the data from the CSV file.
- Create a document generator using the `.lazy_load()` method.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader

FILE_PATH = 'medal.csv'
docs = CSVLoader(FILE_PATH).load()
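The task asks for a lazy document generator, while the cell above materializes every row with `.load()`. The sketch below illustrates the lazy idea with the standard library only: rows are read one at a time and iteration stops after `N_DOCS` items. The helper `lazy_rows` and the sample rows are hypothetical; the "column: value" layout only imitates what the loader's output looks like later in this notebook. With LangChain, the analogous call would presumably be `islice(CSVLoader(FILE_PATH).lazy_load(), N_DOCS)`.

```python
import csv
import io
from itertools import islice


def lazy_rows(handle):
    """Yield one 'column: value' text block per CSV row, reading lazily."""
    for row in csv.DictReader(handle):
        yield "\n".join(f"{key}: {value}" for key, value in row.items())


# Tiny in-memory stand-in for the real file (hypothetical sample rows).
sample = io.StringIO(
    "ABSTRACT_ID,TEXT,LOCATION,LABEL\n"
    "1,first abstract,3,labelA\n"
    "2,second abstract,5,labelB\n"
    "3,third abstract,7,labelC\n"
)

N_DOCS = 2
# islice stops pulling rows once N_DOCS have been produced,
# so the third row is never parsed.
first_docs = list(islice(lazy_rows(sample), N_DOCS))
print(first_docs[0])
```

Because nothing beyond the first `N_DOCS` rows is read, this pattern keeps memory usage flat even when the CSV holds millions of records.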

### Creating embeddings and a vector store

At this stage:
- Choose the model used to compute embeddings.
- Create a vector store for subsequent search.

In [None]:
from langchain.embeddings import HuggingFaceInstructEmbeddings, HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer
import torch

# Initialize the embedding model
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2', device='mps')
emb_model = HuggingFaceInstructEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    client=model,
)



load INSTRUCTOR_Transformer
max_seq_length  512


### Creating the vector store (index)

Set up an index for the embeddings using Chroma.

In [6]:
from langchain.vectorstores import Chroma

persist_directory = 'DB'

# Build the index
vectordb = Chroma.from_documents(documents=docs[:2000], embedding=emb_model, persist_directory=persist_directory)
vectordb.persist()
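To make the role of the vector store concrete, here is a toy, standard-library sketch of what an embedding index does conceptually: it stores one vector per text and returns the texts whose vectors are most similar to the query vector. `ToyVectorIndex`, `toy_embed` and the four-word vocabulary are hypothetical stand-ins, not Chroma's implementation; a real store uses a learned embedding model and an approximate nearest-neighbour index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Brute-force index: embeds texts on insert, ranks by cosine on search."""

    def __init__(self, embed):
        self.embed = embed
        self.items = []  # list of (vector, text) pairs

    def add(self, texts):
        for text in texts:
            self.items.append((self.embed(text), text))

    def search(self, query, k=4):
        query_vec = self.embed(query)
        ranked = sorted(
            self.items, key=lambda item: cosine(query_vec, item[0]), reverse=True
        )
        return [text for _, text in ranked[:k]]


# Hypothetical bag-of-words "embedding" over a tiny vocabulary.
VOCAB = ["pneumonia", "antibiotic", "cardiac", "fibrosis"]


def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


index = ToyVectorIndex(toy_embed)
index.add([
    "cardiac fibrosis in rats",
    "antibiotic therapy for pneumonia",
    "pneumonia after cardiac surgery",
])
top = index.search("pneumonia antibiotic", k=2)
print(top)
```

The query shares both of its terms with the second text and one term with the third, so those two come back in that order.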



### Indexing a subset of the documents

To speed things up, index only the first **N_DOCS** documents, since the full dataset may contain millions of records.

In [7]:
from tqdm.auto import tqdm

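N_DOCS = 2000  # Index only the first 2000 documents

# Note: Chroma.from_documents above already indexed docs[:2000];
# adding the same documents again stores duplicate entries.
for doc in tqdm(docs[:N_DOCS], total=N_DOCS):
    vectordb.add_documents([doc])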

vectordb.persist()

  0%|          | 0/2000 [00:00<?, ?it/s]
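The loop above calls `add_documents` once per document. Since that method accepts a list, grouping documents into batches would likely reduce per-call overhead. The helper below is a minimal, standard-library sketch of such batching; the name `batched` and the batch size are illustrative assumptions.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Demonstration on plain integers instead of documents.
batches = list(batched(range(10), 4))
print(batches)
```

In the notebook this could be used as `for batch in batched(docs[:N_DOCS], 100): vectordb.add_documents(batch)`, where 100 is an arbitrary batch size to tune.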

### Checking the contents of the index directory

In [8]:
!ls -lht DB

total 90184
-rw-r--r--  1 konstantin  staff    43M Apr 21 17:46 chroma.sqlite3
drwxr-xr-x  7 konstantin  staff   224B Apr 21 17:42 a8c83928-8903-4560-9d9f-276296fe9b2b




## Task 2: Creating the LLM and configuring retrieval (2 points)

In this section:
- Configure the LLM that generates answers.
- Check that search over the index works.

In [10]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
import torch

In [21]:
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms.llamacpp import LlamaCpp

llm = LlamaCpp(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_batch=512,
    n_ctx=4096,
    f16_kv=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
)

retriever = vectordb.as_retriever()

llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) - 10922 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from mistral-7b-instruct-v0.2.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.di

In [22]:
# Check that retrieval works:
retriever.invoke("ceftobiprole bpr")

[Document(metadata={'row': 2, 'source': 'medal.csv'}, page_content=': 2\nTEXT: ceftobiprole bpr is an investigational cephalosporin with activity against staphylococcus aureus including methicillinresistant s aureus mrsa strains the pharmacodynamic pd profile of bpr against s aureus strains with a variety of susceptibility phenotypes in an immunocompromised murine pneumonia model was characterized the bpr mics of the test isolates ranged from to mugml pharmacokinetic pk studies were conducted with infected neutropenic balbc mice and the bpr concentrations were measured in plasma epithelial lining fluid elf and lung tissue pd studies with these mice were undertaken with eight s aureus isolates two MSSA strains three hospitalacquired mrsa strains and three CA mrsa strains subcutaneous bpr doses of to mgkg of body weightday were administered and the NC in the number of log cfuml in lungs was evaluated after h of therapy the pd profile was characterized by using the free drug exposures f d

## Task 3: Prompt Engineering. Creating a Prompt Template (3 points)

Use `jinja2` syntax to configure the prompt template.

Example PromptTemplate usage:

```python
PromptTemplate(template=tokenizer.chat_template, template_format='jinja2', input_variables=['content'])
```

*Tips*:
- Prompt templates can be found at https://github.com/chujiezheng/chat_templates and on [replicate.com](https://replicate.com). For example, the Llama 3 templates are here:
    - https://replicate.com/meta/meta-llama-3-70b-instruct
    - https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/llama-3-chat.jinja

```
"""LLama 3 template:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
```
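As a small illustration of how such a template is filled in, the sketch below renders the Llama 3 layout shown above with plain `str.format`. The constant `LLAMA3_TEMPLATE`, the placeholder name `system` and the function `render_llama3` are hypothetical names introduced for this example only.

```python
# Llama 3 chat layout from the block above, with the system text made a placeholder.
LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def render_llama3(system, prompt):
    """Substitute the system and user texts into the Llama 3 layout."""
    return LLAMA3_TEMPLATE.format(system=system, prompt=prompt)


text = render_llama3("You are a helpful assistant", "What is ceftobiprole?")
print(text)
```

The rendered string ends with the assistant header, so the model continues from the point where the assistant's reply is expected to start.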

In [61]:
from langchain.prompts import PromptTemplate

SYSTEM_PROMPT = """Ты - медицинский ассистент, который предоставляет точные и полезные ответы 
на вопросы о здоровье, используя предоставленную информацию. Будь профессиональным и сострадательным."""  # TODO: enter the system prompt

USE_HISTORY = False
if USE_HISTORY:
    # BONUS: chat-history task, see below
    instruction = None  # TODO: define the instruction that uses the chat history
    prompt_template = None  # TODO: insert the template with chat history
    prompt = PromptTemplate(input_variables=None,  # TODO: list the variables
                            template=prompt_template)
else:
    # instruction = None  # TODO: define the instruction
    prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>

Вопрос: {question}

Контекст: {context}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Ответ:"""  # TODO: insert the template
    prompt = PromptTemplate(input_variables=["question", "context", "SYSTEM_PROMPT"],  # TODO: list the variables
                            template=prompt_template)

print(prompt)

input_variables=['SYSTEM_PROMPT', 'context', 'question'] input_types={} partial_variables={} template='<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nВопрос: {question}\n\nКонтекст: {context}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nОтвет:'
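Note that the template above uses Llama 3 header tokens, while the model loaded earlier is `mistral-7b-instruct-v0.2`, whose instruction format wraps the user turn in `[INST] ... [/INST]`. A mismatch like this may be one reason for off-target answers. The sketch below shows what a Mistral-style prompt could look like; `build_mistral_prompt` is a hypothetical helper, the system text is simply prepended to the user turn because this format has no separate system role, and the leading `<s>` is omitted on the assumption that the runtime adds the BOS token itself.

```python
def build_mistral_prompt(system, question, context):
    """Render a single-turn prompt in the Mistral instruct layout."""
    # The system text, retrieved context and question all go inside one [INST] block.
    user_block = f"{system}\n\nContext: {context}\n\nQuestion: {question}"
    return f"[INST] {user_block} [/INST]"


prompt_text = build_mistral_prompt(
    system="You are a medical assistant.",
    question="What is ceftobiprole?",
    context="ceftobiprole bpr is an investigational cephalosporin",
)
print(prompt_text)
```

To use this layout in the notebook, the same string with `{SYSTEM_PROMPT}`, `{context}` and `{question}` placeholders could be passed as the `template` of `PromptTemplate`.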


## Task 4: Creating a Chain (LangChain pipeline) (2 points)

Assemble a pipeline with the following stages:
- **Feature Engineering** (Retrieval Augmentation)
- **Preprocessing** (Prompt Engineering)
- **LLM**
- **Postprocessing**

In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers.string import StrOutputParser

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough(), "SYSTEM_PROMPT": lambda x: SYSTEM_PROMPT}
    | prompt
    | llm
    | StrOutputParser()
)
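The `|` composition above can be read as ordinary function chaining: the input question is fanned out into a dictionary, the dictionary fills the prompt, the prompt goes to the model, and the raw output is post-processed. The sketch below mirrors those four stages with plain Python functions; `fake_retriever`, `fake_llm` and `run_chain` are hypothetical stand-ins, not LangChain objects.

```python
def fake_retriever(question):
    """Stand-in for the vector-store retriever: returns a fixed list of texts."""
    return ["doc about pneumonia", "doc about antibiotics"]


def join_docs(texts):
    """Same idea as format_docs above, but for plain strings."""
    return "\n\n".join(texts)


def fake_llm(prompt_text):
    """Stand-in for the model: echoes the first prompt line with stray whitespace."""
    return "  ANSWER based on: " + prompt_text.splitlines()[0] + "  \n"


def run_chain(question):
    # 1. Retrieval augmentation: build the dict of prompt variables.
    inputs = {"context": join_docs(fake_retriever(question)), "question": question}
    # 2. Prompt engineering: fill the template.
    prompt_text = "Question: {question}\n\nContext: {context}".format(**inputs)
    # 3. Model call.
    raw = fake_llm(prompt_text)
    # 4. Post-processing: clean up the raw string.
    return raw.strip()


result = run_chain("How to treat pneumonia?")
print(result)
```

Keeping each stage a separate callable is what makes it easy to swap the retriever, the template or the model independently.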

**Example chain calls:**

In [64]:
result1 = chain.invoke('How to treat pneumonia?')  # Example query
print(result1)



Добро пожаловать в наш бот-помощник по вопросам о здоровье! Я буду рад помочь тебе с любым вопросом, связанным со здоровьем. Пожалуйста, не забывайте, что я — лишь программа и не могу предоставлять медицинскую помощь в реальном времени. Я буду рад помочь тебе найти достоверную и полезную информацию по вопросам о здоровье. Не стесняйтесь спрашивать меня что-либо, связанное со здоровьем, и я буду рад помочь тебе. Если вы уже имеете ответ на свой вопрос, то пожалуйста, не стесняйтесь его делиться с нами и другими пользователями, которые могут быть заинтересованы в этой информации. Мы будем рады помочь тебе

llama_perf_context_print:        load time =    4424.33 ms
llama_perf_context_print: prompt eval time =    4422.20 ms /   845 tokens (    5.23 ms per token,   191.08 tokens per second)
llama_perf_context_print:        eval time =    8601.75 ms /   255 runs   (   33.73 ms per token,    29.65 tokens per second)
llama_perf_context_print:       total time =   13384.66 ms /  1100 tokens






In [65]:
result2 = chain.invoke('Tell in details what is ceftobiprole bpr?')  # Example query
print(result2)

Llama.generate: 129 prefix-match hit, remaining 1417 prompt tokens to eval




: 2
TEXT: ceftobiprole bpr is an investigational cephalosporin with activity against staphylococcus aureus including methicillinresistant s aureus mrsa strains the pharmacodynamic pd profile of bpr against s aureus strains with a variety of susceptibility phenotypes in an immunocompromised murine pneumonia model was characterized the bpr mics of the test isolates ranged from to mugml pharmacokinetic pk studies were conducted with infected neutropenic balbc mice and the bpr concentrations were measured in plasma epithelial lining fluid elf and lung tissue pd studies with these mice were undertaken with eight s aureus isolates two MSSA strains three hospitalacquired mrsa strains and three CA mrsa strains subcutaneous bpr doses of to mgkg of body weightday were administered and the NC in the number of log cfuml in lungs was evaluated after h of therapy the pd profile was characterized by using the free drug exposures f determined from the following parameters the percentage of time that

llama_perf_context_print:        load time =    4424.33 ms
llama_perf_context_print: prompt eval time =    4933.23 ms /  1417 tokens (    3.48 ms per token,   287.24 tokens per second)
llama_perf_context_print:        eval time =    9267.50 ms /   255 runs   (   36.34 ms per token,    27.52 tokens per second)
llama_perf_context_print:       total time =   14543.97 ms /  1672 tokens






## Bonus (2 points): Adding chat history

**Hint:** Use `langchain.memory.ConversationBufferMemory` to integrate the chat history with the bot.

In [89]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

prompt_template = """<s>[INST] <<SYS>>
Ты - медицинский ассистент. Используй контекст для точного ответа.
<</SYS>>

Контекст: {context}
История чата: {chat_history}
Вопрос: {question} [/INST]</s>"""

prompt = PromptTemplate(
    input_variables=["context", "question", "chat_history"],
    template=prompt_template
)

memory = ConversationBufferMemory(memory_key="chat_history", input_key="question", output_key='answer', return_messages=True)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt}
)
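Conceptually, a conversational retrieval chain keeps a buffer of past turns and, for a follow-up, first rewrites the question into a standalone one before retrieving and answering. That would explain why each call below triggers two generations, the first being a rephrased question. The sketch that follows imitates this flow with the standard library; `ChatBuffer`, `answer_with_history` and the three `toy_*` functions are hypothetical stand-ins and not LangChain's implementation.

```python
class ChatBuffer:
    """Keeps (question, answer) turns and renders them as plain text."""

    def __init__(self):
        self.turns = []

    def save(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def answer_with_history(question, buffer, condense, retrieve, generate):
    history = buffer.as_text()
    # Step 1: with prior turns, rewrite the follow-up into a standalone question.
    standalone = condense(history, question) if buffer.turns else question
    # Step 2: retrieve context for the standalone question.
    context = retrieve(standalone)
    # Step 3: generate the answer from context, history and question.
    answer = generate(context=context, chat_history=history, question=standalone)
    # Step 4: remember the original question together with the answer.
    buffer.save(question, answer)
    return answer


def toy_condense(history_text, question):
    # Stand-in for an LLM call that resolves the pronoun using the history.
    return question.replace(" it?", " pneumonia?")


def toy_retrieve(question):
    return f"[documents matching: {question}]"


def toy_generate(context, chat_history, question):
    return f"answer to '{question}' using {context}"


buffer = ChatBuffer()
first = answer_with_history(
    "How to treat pneumonia?", buffer, toy_condense, toy_retrieve, toy_generate
)
second = answer_with_history(
    "Which antibiotics work against it?", buffer, toy_condense, toy_retrieve, toy_generate
)
print(second)
```

The second call retrieves documents for "Which antibiotics work against pneumonia?" rather than for the ambiguous "it", which is the main benefit of the condensing step.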

In [92]:
print(qa_chain.invoke({"question": "How to treat pneumonia?"})['answer'])

Llama.generate: 1 prefix-match hit, remaining 322 prompt tokens to eval


 What is the treatment for pneumonia?

llama_perf_context_print:        load time =    4424.33 ms
llama_perf_context_print: prompt eval time =    2470.93 ms /   323 tokens (    7.65 ms per token,   130.72 tokens per second)
llama_perf_context_print:        eval time =     280.96 ms /     9 runs   (   31.22 ms per token,    32.03 tokens per second)
llama_perf_context_print:       total time =    1518.25 ms /   332 tokens
Llama.generate: 1 prefix-match hit, remaining 1017 prompt tokens to eval


 1261

Ответ: The treatment for pneumonia typically involves antibiotics to combat the bacterial infection. In some cases, additional treatments such as oxygen therapy, fluid management, and nutritional support may be necessary. The specific treatment approach depends on the severity of the pneumonia, the underlying cause of the infection, and the patient's overall health status.

Additionally, in the context of the given text, Selective Decontamination (SD) was used as an additional preventive measure to reduce the risk of pneumonia in patients undergoing cardiac surgery in an Intensive Care Unit (ICU). The SD regimen included polymyxin gentamicin and nystatin given as an oral paste and as a solution, along with standard antacid or histamine blocker AS ulcer prophylaxis.

It is important to note that while the SD approach was effective in reducing the incidence of pneumonia, it also came with some risks, including the potential for antibiotic resistance and the risk of disruption of t

llama_perf_context_print:        load time =    4424.33 ms
llama_perf_context_print: prompt eval time =    3363.60 ms /  1017 tokens (    3.31 ms per token,   302.35 tokens per second)
llama_perf_context_print:        eval time =    8797.88 ms /   255 runs   (   34.50 ms per token,    28.98 tokens per second)
llama_perf_context_print:       total time =   12579.58 ms /  1272 tokens



In [93]:
print(qa_chain.invoke({"question": "Which antibiotics are the most effective against it?"})['answer'])

Llama.generate: 1 prefix-match hit, remaining 597 prompt tokens to eval


 Which antibiotics are the most effective for treating pneumonia?

Translation: What antibiotics have the greatest effectiveness in treating pneumonia?

llama_perf_context_print:        load time =    4424.33 ms
llama_perf_context_print: prompt eval time =    2107.60 ms /   597 tokens (    3.53 ms per token,   283.26 tokens per second)
llama_perf_context_print:        eval time =    1044.90 ms /    32 runs   (   32.65 ms per token,    30.62 tokens per second)
llama_perf_context_print:       total time =    3200.47 ms /   629 tokens
Llama.generate: 1 prefix-match hit, remaining 2111 prompt tokens to eval


 The most effective antibiotics for treating pneumonia depend on the specific bacterial cause and its susceptibility patterns. However, some commonly used antibiotics with broad-spectrum activity against common bacterial pathogens causing community-acquired or hospital-acquired pneumonia include:

1. Macrolides (such as azithromycin, clarithromycin)
2. Fluoroquinolones (such as levofloxacin, moxifloxacin)
3. Beta-lactams (such as ampicillin, piperacillin/tazobactam) in combination with a macrolide or fluoroquinolone to enhance coverage against atypical bacteria and streptococci.

It is important to note that the choice of antibiotics should be based on susceptibility testing results and local resistance patterns, as well as patient factors such as comorbidities, allergies, and drug interactions. Additionally, adherence to recommended dosing regimens and monitoring for potential side effects or toxicities are crucial aspects of effective and safe antimicrobial therapy for pneumonia.

llama_perf_context_print:        load time =    4424.33 ms
llama_perf_context_print: prompt eval time =    7210.31 ms /  2111 tokens (    3.42 ms per token,   292.78 tokens per second)
llama_perf_context_print:        eval time =    9871.81 ms /   253 runs   (   39.02 ms per token,    25.63 tokens per second)
llama_perf_context_print:       total time =   17378.14 ms /  2364 tokens


