## Evaluation: RAG

This is notebook for **RAG model evaluation** for estimating improvements over the base model and compare with other adapted models. We choose to evaluate on the set of benchmarks from [Open Medical-LLM Leaderboard](https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard) including:

* [MedMCQA](https://huggingface.co/datasets/openlifescienceai/medmcqa) - MCQ, 200 samples from validation split
* [MedQA](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options-hf) - MCQ, 200 samples from validation split
* [MMLU](https://huggingface.co/datasets/cais/mmlu) - MCQ, 200 samples from test splits of 6 medical subsets
* [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA) - QA, 200 samples from train split of pqa_labeled subset

### Setup

In [1]:
%%capture
!pip install datasets vllm
!pip install langchain_chroma langchain_huggingface chromadb numpy

In [2]:
import re
from tqdm import tqdm
import math
import pandas as pd
from datasets import load_dataset, concatenate_datasets
from vllm import LLM, SamplingParams
import torch

INFO 04-14 02:59:59 [__init__.py:239] Automatically detected platform cuda.


2025-04-14 03:00:03.164512: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744599603.638172      31 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744599603.756290      31 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


### RAG 

In [None]:
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
import os
from typing import List, Any
from datasets import load_dataset
import time


class DB:
    def __init__(self,
                 collection_name: str = "medical_rag",
                 embedding_model_name: str = "multi-qa-MiniLM-L6-cos-v1") -> None:
        """
        Initialize the DB instance.

        :param db_path: Directory path where the database is stored or will be created.
        :param embedding_model_name: Name of the embedding model used for generating embeddings.
        """
        self.collection_name = collection_name
        self.db_path = os.path.join(os.getcwd(), collection_name)
        self.embedding_model_name = embedding_model_name
        self.db = None

        model_kwargs = {'device': 'cuda'}
        encode_kwargs = {'normalize_embeddings': True}
        self.embedder = HuggingFaceEmbeddings(
            model_name=embedding_model_name,
            model_kwargs=model_kwargs,
            encode_kwargs=encode_kwargs,
        )

        if os.path.exists(self.db_path):
            print("DB exists, loading it...")
            self.db = Chroma(
                collection_name=collection_name,
                embedding_function=self.embedder,
                persist_directory=self.db_path,
            )
            print("DB loaded.")
        else:
            print("DB does not exist, creating it...")
            os.makedirs(self.db_path, exist_ok=True)
            start_time = time.time()
            self._populate_db()
            end_time = time.time()
            print("DB populated.")
            print(f"Time taken to populate DB: {end_time - start_time} sec")

    def _populate_db(self) -> None:
        """
        Populate the database from the dataset.
        Loads the 'MedRAG/textbooks' and 'MedRAG/statpearls' dataset and creates a new Chroma database from the 'contents' field.
        """
        ds_textbooks = load_dataset("MedRAG/textbooks")
        ds_statpearls = load_dataset("MilyaShams/MedRAG_statpearls")

        contents = ds_textbooks["train"]["contents"]
        contents.extend(ds_statpearls["train"]["contents"])

        self.db = Chroma.from_texts(
            texts=contents,
            embedding=self.embedder,
            persist_directory=self.db_path,
            collection_name=self.collection_name,
        )

    def query(self, queries: List[str], top_k: int = 3) -> List[List[str]]:
        """
        Query the database for the top-k most relevant chunks for a batch of queries.

        :param queries: A list of user search queries.
        :param top_k: The number of top relevant results to retrieve for each query.
        :return: A list where each element is a list of retrieved document contents for the corresponding query.
        """
        if not self.db:
            raise ValueError("Database is not initialized.")

        batch_results = []
        print(f"Querying DB for {len(queries)} queries...")

        for query in queries:
            try:
                results = self.db.similarity_search(query, k=top_k)
                retrieved_docs = [doc.page_content for doc in results]
                batch_results.append(retrieved_docs)
            except Exception as e:
                print(f"Error during similarity search for query '{query[:50]}...': {e}")
                batch_results.append([])
                
        return batch_results

    def close(self) -> None:
        """
        Close the database connection.
        """
        if self.db:
            self.db.close()

In [None]:
from vllm import LLM, SamplingParams
from typing import List, Optional
import torch


class LLMInference:
    """
    A class to handle text generation using the vLLM inference engine.
    """

    def __init__(self,
                 model_name: str,
                #  dtype: str='auto',
                 dtype=torch.float16,
                 trust_remote_code: bool=True,
                 quantization: Optional[str]=None,
                 tensor_parallel_size=2,
        ):
        """
        Initializes the VLLMInference object with the specified parameters and loads the model.

        Args:
            model_name (str): The name or path of the model to be loaded.
            dtype (str, optional): The data type to use for the model. Defaults to 'auto'.
            trust_remote_code (bool, optional): Whether to trust remote code when loading the model. Defaults to True.
            quantization (Optional[str], optional): The quantization mode to use for the model. Defaults to None.
        """
        self.model_name = model_name
        self.dtype = dtype
        self.trust_remote_code = trust_remote_code
        self.quantization = quantization
        self.seed = 4242

        self.llm = LLM(
            model=self.model_name,
            dtype=self.dtype,
            trust_remote_code=self.trust_remote_code,
            quantization=self.quantization,
            seed=self.seed,
            tensor_parallel_size=tensor_parallel_size,
        )

    def generate(self,
                 prompts: List[str],
                 max_tokens: int=4096,
                 temperature: float=0.01,
                 top_p: float=1.0,
                 top_k: int=-1,
                 **kwargs
        ) -> List[str]:
        """
        Generates text based on the provided prompts.

        Args:
            prompts (List[str]): A list of input prompts for text generation.
            temperature (float, optional): Sampling temperature to use. Defaults to 1.0.
            top_p (float, optional): Nucleus sampling probability. Defaults to 1.0.
            max_tokens (int, optional): Maximum number of tokens to generate. Defaults to 4096.

        Returns:
            List[str]: A list of generated texts corresponding to each prompt.
        """
        sampling_params = SamplingParams(
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            max_tokens=max_tokens,
            seed=self.seed,
            **kwargs
        )

        outputs = self.llm.generate(prompts, sampling_params)
        return [output.outputs[0].text for output in outputs]

In [None]:
class RAG:
    """
    A Retrieval-Augmented Generation (RAG) class that combines a document database
    with an LLM inference engine. It retrieves context using the DB class and uses the
    LLM to generate a response based on the query and retrieved documents.
    """

    def __init__(self,
                 collection_name: str = "med_textbooks",
                 embedding_model_name: str = "multi-qa-MiniLM-L6-cos-v1",
                 llm_name: str="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B") -> None:
        """
        Initialize the RAG instance.

        :param llm_name: The HuggingFace model name or path.
        """
        self.db = DB(collection_name=collection_name,
                     embedding_model_name=embedding_model_name)
        self.llm = LLMInference(model_name=llm_name)
        
    def get_response(self, queries: List[str], top_k: int = 3) -> List[str]:
        """
        Retrieve relevant documents, construct prompts, and generate responses for a batch of queries.

        :param queries: A list of user queries.
        :param top_k: Number of top relevant documents to retrieve for each query.
        :return: A list of generated response texts corresponding to each input query.
        """
        if not queries:
            return []

        print(f"Processing batch of {len(queries)} queries...")
        batch_retrieved_docs = self.db.query(queries, top_k=top_k)

        final_prompts = []
        for i, query in enumerate(queries):
            retrieved_docs = batch_retrieved_docs[i]
            if retrieved_docs:
                context = "\n\n".join(retrieved_docs)
                prompt = f"""Based on the following retrieved documents, answer the user's query. Filter out irrelevant information and synthesize the answer.

Retrieved documents:
---
{context}
---

User query: {query}

Answer:"""
            else:
                print(f"Warning: No documents retrieved for query: {query[:50]}...")
                prompt = f"""Answer the following user query based on your internal knowledge.

User query: {query}

Answer:"""
            final_prompts.append(prompt)
            
        responses = self.llm.generate(final_prompts)
        return responses

In [6]:
MODEL_NAME = "RAG"

rag = RAG()

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/11.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/383 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

DB exists, loading it...
DB loaded.


config.json:   0%|          | 0.00/679 [00:00<?, ?B/s]

INFO 04-14 03:00:45 [config.py:600] This model supports multiple tasks: {'embed', 'score', 'reward', 'classify', 'generate'}. Defaulting to 'generate'.
INFO 04-14 03:00:45 [config.py:1600] Defaulting to use mp for distributed inference
INFO 04-14 03:00:45 [config.py:1780] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 04-14 03:00:45 [llm_engine.py:242] Initializing a V0 LLM engine (v0.8.3) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_ba

tokenizer_config.json:   0%|          | 0.00/3.07k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

INFO 04-14 03:00:47 [cuda.py:240] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 04-14 03:00:47 [cuda.py:289] Using XFormers backend.
INFO 04-14 03:00:51 [__init__.py:239] Automatically detected platform cuda.


2025-04-14 03:00:51.934262: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744599651.957363     195 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744599651.964506     195 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:00:55 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:00:56 [cuda.py:240] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:00:56 [cuda.py:289] Using XFormers backend.


[W414 03:01:07.567499228 socket.cpp:204] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[W414 03:01:08.001427762 socket.cpp:204] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[W414 03:01:17.583369605 socket.cpp:204] [c10d] The hostname of the client socket cannot be retrieved. err=-3


INFO 04-14 03:01:27 [utils.py:990] Found nccl from library libnccl.so.2
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:01:27 [utils.py:990] Found nccl from library libnccl.so.2
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:01:27 [pynccl.py:69] vLLM is using nccl==2.21.5
INFO 04-14 03:01:27 [pynccl.py:69] vLLM is using nccl==2.21.5


[W414 03:01:27.593799239 socket.cpp:204] [c10d] The hostname of the client socket cannot be retrieved. err=-3


INFO 04-14 03:01:28 [custom_all_reduce_utils.py:206] generating GPU P2P access cache in /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
INFO 04-14 03:01:50 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:01:50 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
INFO 04-14 03:01:50 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_5c471e7a'), local_subscribe_addr='ipc:///tmp/64e05b3e-20a8-4839-afb1-e76e0a96d94d', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 04-14 03:01:50 [parallel_state.py:957] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:01:50 [parallel_state.py:957] rank 1 in world size 2 is assigned as DP rank 0, PP rank

model.safetensors:   0%|          | 0.00/3.55G [00:00<?, ?B/s]

INFO 04-14 03:02:00 [weight_utils.py:281] Time spent downloading weights for deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B: 10.342652 seconds
INFO 04-14 03:02:00 [weight_utils.py:315] No model.safetensors.index.json found in remote.


Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]


[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:00 [weight_utils.py:315] No model.safetensors.index.json found in remote.
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:04 [loader.py:447] Loading weights took 3.52 seconds
INFO 04-14 03:02:04 [loader.py:447] Loading weights took 3.73 seconds
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:04 [model_runner.py:1146] Model loading took 1.6918 GiB and 14.294224 seconds
INFO 04-14 03:02:04 [model_runner.py:1146] Model loading took 1.6901 GiB and 14.382459 seconds
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:13 [worker.py:267] Memory profiling takes 8.09 seconds
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:13 [worker.py:267] the current vLLM instance can use total_gpu_memory (14.74GiB) x gpu_memory_utilization (0.90) = 13.27GiB
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:13 [worker.py:267] model weights take 1.69GiB; non_torch_memory takes 0.12GiB; PyTorch activation peak 

Capturing CUDA graph shapes:  74%|███████▍  | 26/35 [00:30<00:09,  1.11s/it]

[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:02:50 [custom_all_reduce.py:195] Registering 1995 cuda graph addresses


Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:40<00:00,  1.15s/it]

INFO 04-14 03:03:01 [custom_all_reduce.py:195] Registering 1995 cuda graph addresses
[1;36m(VllmWorkerProcess pid=195)[0;0m INFO 04-14 03:03:01 [model_runner.py:1598] Graph capturing finished in 41 secs, took 0.21 GiB
INFO 04-14 03:03:01 [model_runner.py:1598] Graph capturing finished in 40 secs, took 0.21 GiB
INFO 04-14 03:03:01 [llm_engine.py:448] init engine (profile, create kv cache, warmup model) took 56.09 seconds





### 1. MedMCQA benchmark

#### Dataset loading and preparing

In [9]:
SEED = 4242
BATCH_SIZE = 4
NUM_SAMPLES = 200
DATASET_MEDMCQA = "openlifescienceai/medmcqa"

In [10]:
ds_medmcqa = load_dataset(DATASET_MEDMCQA, split="validation")
ds_medmcqa = ds_medmcqa.shuffle(seed=SEED).select(range(NUM_SAMPLES))
ds_medmcqa

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/85.9M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/936k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/1.48M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/182822 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/6150 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/4183 [00:00<?, ? examples/s]

Dataset({
    features: ['id', 'question', 'opa', 'opb', 'opc', 'opd', 'cop', 'choice_type', 'exp', 'subject_name', 'topic_name'],
    num_rows: 200
})

In [11]:
ds_medmcqa[0]

{'id': '4653fb7a-ddbf-493b-b4ef-92205582a27a',
 'question': 'Which of the following tooth is not having 5 cusps?',
 'opa': 'Mandibular 2nd Molar',
 'opb': 'Mandibular 1st Molar',
 'opc': 'Mandibular 3rd Molar',
 'opd': 'Maxillary 1st Molar',
 'cop': 0,
 'choice_type': 'single',
 'exp': None,
 'subject_name': 'Dental',
 'topic_name': None}

#### Helper functions definition

In [12]:
def format_prompt_medmcqa(example):
    """Formats a single example into a prompt for the LLM."""
    question = example['question']
    options = {
        "A": example['opa'],
        "B": example['opb'],
        "C": example['opc'],
        "D": example['opd']
    }
    
    prompt = f"""
You are an expert in solving multiple-choice questions accurately and explaining your reasoning clearly.
Given a question and a list of answer choices (A, B, C, D), your task is to:
1. Reason shortly about the question and answer choices to find evidances to support your answer.
2. Identify the correct answer. Please choose the single best answer from the options provided.
3. Output the final answer in the format: Answer: [Option Letter]

Question: {question}
Options:
A. {options['A']}
B. {options['B']}
C. {options['C']}
D. {options['D']}

Reasoning:
    """
    return prompt

In [13]:
def get_ground_truth_medmcqa(example):
    """Maps the correct option index (cop) to the corresponding letter."""
    mapping = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
    cop_index = example.get('cop')
    if cop_index is None or cop_index not in mapping:
        print(f"Warning: Invalid 'cop' value found: {cop_index} in example ID {example.get('id')}. Skipping ground truth.")
        return None
    return mapping[cop_index]

In [14]:
def extract_choice_mcq(generated_text):
    """Extracts the predicted choice (A, B, C, or D) from the LLM's output."""
    text = generated_text.strip()

    # Check for phrases like "The answer is A" or "Answer: A"
    match = re.search(r'(?:answer|choice|option) is\s*:?\s*([A-D])', text, re.IGNORECASE)
    if match:
        return match.group(1).upper()

    # Look for the first standalone letter A, B, C, or D in the text
    match = re.search(r'\b([A-D])\b', text)
    if match:
        return match.group(1).upper()

    # Fallback - If no clear choice found, return None
    print(f"Warning: Could not extract answer from text: '{text[:100]}...{text[-100:]}'")
    return None

#### Evaluation

In [15]:
print("\n--- Preparing Prompts and Ground Truths ---")
prompts = [format_prompt_medmcqa(ex) for ex in tqdm(ds_medmcqa, desc="Formatting prompts")]
ground_truths = [get_ground_truth_medmcqa(ex) for ex in tqdm(ds_medmcqa, desc="Extracting ground truths")]
valid_indices = [i for i, gt in enumerate(ground_truths) if gt is not None]

if len(valid_indices) < len(ground_truths):
     print(f"Warning: {len(ground_truths) - len(valid_indices)} examples had invalid ground truths and were excluded.")
     prompts = [prompts[i] for i in valid_indices]
     ground_truths = [ground_truths[i] for i in valid_indices]
     original_indices = valid_indices

if len(prompts) > 0:
    print("\nExample Prompt:")
    print(prompts[0])
    print(f"Corresponding Ground Truth: {ground_truths[0]}")
else:
    print("No valid prompts to evaluate.")
    exit()


--- Preparing Prompts and Ground Truths ---


Formatting prompts: 100%|██████████| 200/200 [00:00<00:00, 7092.94it/s]
Extracting ground truths: 100%|██████████| 200/200 [00:00<00:00, 7742.43it/s]


Example Prompt:

You are an expert in solving multiple-choice questions accurately and explaining your reasoning clearly.
Given a question and a list of answer choices (A, B, C, D), your task is to:
1. Reason shortly about the question and answer choices to find evidances to support your answer.
2. Identify the correct answer. Please choose the single best answer from the options provided.
3. Output the final answer in the format: Answer: [Option Letter]

Question: Which of the following tooth is not having 5 cusps?
Options:
A. Mandibular 2nd Molar
B. Mandibular 1st Molar
C. Mandibular 3rd Molar
D. Maxillary 1st Molar

Reasoning:
    
Corresponding Ground Truth: A





In [16]:
print("\n--- Running Inference ---")
all_outputs_text = []
num_batches = math.ceil(len(prompts) / BATCH_SIZE)

for i in tqdm(range(num_batches), desc="Generating Responses"):
    start_idx = i * BATCH_SIZE
    end_idx = min((i + 1) * BATCH_SIZE, len(prompts))
    batch_prompts = prompts[start_idx:end_idx]
    batch_outputs_text = rag.get_response(batch_prompts, top_k=2)    
    all_outputs_text.extend(batch_outputs_text)

if len(all_outputs_text) > 0:
    print("\nExample Generated Text (raw):")
    print(all_outputs_text[0])


--- Running Inference ---


Generating Responses:   0%|          | 0/50 [00:00<?, ?it/s]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:09,  3.32s/it, est. speed input: 80.44 toks/s, output: 27.72 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.62s/it, est. speed input: 133.62 toks/s, output: 59.63 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.11s/it, est. speed input: 175.36 toks/s, output: 94.62 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.15s/it, est. speed input: 211.77 toks/s, output: 133.55 toks/s][A
Generating Responses:   2%|▏         | 1/50 [00:09<07:36,  9.32s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.23s/it, est. speed input: 246.64 toks/s, output: 82.75 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.51it/s, est. speed input: 361.82 toks/s, output: 152.20 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.21s/it, est. speed input: 231.44 toks/s, output: 156.97 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.17s/it, est. speed input: 232.99 toks/s, output: 204.05 toks/s][A
Generating Responses:   4%|▍         | 2/50 [00:14<05:18,  6.64s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:11,  3.85s/it, est. speed input: 58.13 toks/s, output: 51.38 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:03,  1.86s/it, est. speed input: 109.26 toks/s, output: 101.37 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.68s/it, est. speed input: 19.62 toks/s, output: 184.64 toks/s] [A
Generating Responses:   6%|▌         | 3/50 [01:00<19:34, 24.99s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.08s/it, est. speed input: 203.08 toks/s, output: 82.15 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.22it/s, est. speed input: 268.82 toks/s, output: 137.03 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.70it/s, est. speed input: 344.96 toks/s, output: 202.04 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.01s/it, est. speed input: 21.32 toks/s, output: 102.32 toks/s] [A
Generating Responses:   8%|▊         | 4/50 [01:45<24:57, 32.54s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.24s/it, est. speed input: 186.72 toks/s, output: 82.09 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.36it/s, est. speed input: 294.67 toks/s, output: 146.71 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.92s/it, est. speed input: 142.27 toks/s, output: 137.03 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 21.10 toks/s, output: 108.34 toks/s] [A
Generating Responses:  10%|█         | 5/50 [02:29<27:32, 36.73s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.13s/it, est. speed input: 199.44 toks/s, output: 76.77 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.17it/s, est. speed input: 261.24 toks/s, output: 151.22 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.16s/it, est. speed input: 202.45 toks/s, output: 174.73 toks/s][A
Generating Responses:  12%|█▏        | 6/50 [02:33<18:57, 25.85s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.07s/it, est. speed input: 246.18 toks/s, output: 78.62 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.87it/s, est. speed input: 413.10 toks/s, output: 148.29 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.96it/s, est. speed input: 486.71 toks/s, output: 189.31 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.99s/it, est. speed input: 24.86 toks/s, output: 100.52 toks/s] [A
Generating Responses:  14%|█▍        | 7/50 [03:17<22:47, 31.80s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.01it/s, est. speed input: 238.47 toks/s, output: 81.18 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.15it/s, est. speed input: 446.08 toks/s, output: 155.43 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.06it/s, est. speed input: 484.17 toks/s, output: 222.19 toks/s][A
Generating Responses:  16%|█▌        | 8/50 [03:19<15:37, 22.32s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.34s/it, est. speed input: 367.54 toks/s, output: 79.93 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.47it/s, est. speed input: 466.03 toks/s, output: 149.77 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.28it/s, est. speed input: 561.33 toks/s, output: 218.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.11it/s, est. speed input: 634.08 toks/s, output: 280.28 toks/s][A
Generating Responses:  18%|█▊        | 9/50 [03:21<10:54, 15.96s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.05s/it, est. speed input: 109.53 toks/s, output: 85.68 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.03it/s, est. speed input: 200.23 toks/s, output: 163.62 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:07<00:02,  2.90s/it, est. speed input: 92.48 toks/s, output: 139.53 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 21.02 toks/s, output: 116.08 toks/s][A
Generating Responses:  20%|██        | 10/50 [04:06<16:28, 24.72s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.19it/s, est. speed input: 285.30 toks/s, output: 78.45 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.82it/s, est. speed input: 442.22 toks/s, output: 137.56 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.88it/s, est. speed input: 443.88 toks/s, output: 181.21 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.02it/s, est. speed input: 508.94 toks/s, output: 241.34 toks/s][A
Generating Responses:  22%|██▏       | 11/50 [04:08<11:33, 17.78s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:09,  3.13s/it, est. speed input: 71.85 toks/s, output: 86.54 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.33s/it, est. speed input: 159.33 toks/s, output: 209.34 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 21.50 toks/s, output: 114.35 toks/s] [A
Generating Responses:  24%|██▍       | 12/50 [04:52<16:20, 25.81s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.92s/it, est. speed input: 126.27 toks/s, output: 84.69 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.35s/it, est. speed input: 114.64 toks/s, output: 122.95 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:10<00:04,  4.17s/it, est. speed input: 69.94 toks/s, output: 141.35 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 22.48 toks/s, output: 127.49 toks/s][A
Generating Responses:  26%|██▌       | 13/50 [05:36<19:22, 31.41s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.44s/it, est. speed input: 169.33 toks/s, output: 83.62 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:15<00:17,  8.74s/it, est. speed input: 33.36 toks/s, output: 96.04 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.46s/it, est. speed input: 20.77 toks/s, output: 210.76 toks/s][A
Generating Responses:  28%|██▊       | 14/50 [06:22<21:28, 35.80s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.01s/it, est. speed input: 242.08 toks/s, output: 81.35 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.08it/s, est. speed input: 412.91 toks/s, output: 155.17 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.75it/s, est. speed input: 409.12 toks/s, output: 246.43 toks/s][A
Generating Responses:  30%|███       | 15/50 [06:25<15:00, 25.72s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.70it/s, est. speed input: 402.49 toks/s, output: 73.02 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.58it/s, est. speed input: 373.53 toks/s, output: 116.53 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.63it/s, est. speed input: 414.73 toks/s, output: 190.85 toks/s][A
Generating Responses:  32%|███▏      | 16/50 [06:27<10:37, 18.74s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.14s/it, est. speed input: 216.60 toks/s, output: 81.55 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.70it/s, est. speed input: 346.57 toks/s, output: 152.04 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.18s/it, est. speed input: 213.62 toks/s, output: 159.84 toks/s][A
Generating Responses:  34%|███▍      | 17/50 [06:32<08:00, 14.55s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.54s/it, est. speed input: 176.24 toks/s, output: 82.29 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.33s/it, est. speed input: 236.39 toks/s, output: 131.08 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.03s/it, est. speed input: 259.44 toks/s, output: 190.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 25.71 toks/s, output: 107.64 toks/s] [A
Generating Responses:  36%|███▌      | 18/50 [07:16<12:30, 23.44s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  2.00s/it, est. speed input: 132.66 toks/s, output: 83.10 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.24s/it, est. speed input: 183.37 toks/s, output: 146.32 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:02,  2.00s/it, est. speed input: 137.19 toks/s, output: 158.72 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.03s/it, est. speed input: 22.28 toks/s, output: 113.06 toks/s] [A
Generating Responses:  38%|███▊      | 19/50 [08:00<15:19, 29.68s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.09it/s, est. speed input: 274.24 toks/s, output: 80.53 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.89it/s, est. speed input: 680.85 toks/s, output: 252.51 toks/s][A
Generating Responses:  40%|████      | 20/50 [08:02<10:36, 21.20s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.10it/s, est. speed input: 245.48 toks/s, output: 80.36 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.20it/s, est. speed input: 444.54 toks/s, output: 151.37 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.96s/it, est. speed input: 21.38 toks/s, output: 99.20 toks/s]  [A
Generating Responses:  42%|████▏     | 21/50 [08:46<13:32, 28.02s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.80s/it, est. speed input: 132.75 toks/s, output: 84.98 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.54s/it, est. speed input: 144.65 toks/s, output: 135.47 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.11s/it, est. speed input: 185.86 toks/s, output: 201.85 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:10<00:00,  2.51s/it, est. speed input: 93.90 toks/s, output: 166.64 toks/s] [A
Generating Responses:  44%|████▍     | 22/50 [08:56<10:33, 22.64s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.15s/it, est. speed input: 200.95 toks/s, output: 80.90 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.16s/it, est. speed input: 194.42 toks/s, output: 125.57 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.31it/s, est. speed input: 276.64 toks/s, output: 197.21 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.99s/it, est. speed input: 23.60 toks/s, output: 104.92 toks/s] [A
Generating Responses:  46%|████▌     | 23/50 [09:40<13:04, 29.05s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.06it/s, est. speed input: 253.82 toks/s, output: 79.65 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.83it/s, est. speed input: 399.20 toks/s, output: 143.81 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.18it/s, est. speed input: 311.67 toks/s, output: 159.36 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.19s/it, est. speed input: 205.08 toks/s, output: 171.22 toks/s][A
Generating Responses:  48%|████▊     | 24/50 [09:45<09:26, 21.79s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.73it/s, est. speed input: 392.08 toks/s, output: 74.60 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.46it/s, est. speed input: 345.28 toks/s, output: 115.84 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.06it/s, est. speed input: 263.95 toks/s, output: 147.80 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.30it/s, est. speed input: 303.28 toks/s, output: 213.69 toks/s][A
Generating Responses:  50%|█████     | 25/50 [09:48<06:44, 16.20s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.07it/s, est. speed input: 245.21 toks/s, output: 81.02 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.04s/it, est. speed input: 227.29 toks/s, output: 148.48 toks/s][A
Generating Responses:  52%|█████▏    | 26/50 [09:52<05:02, 12.61s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.22s/it, est. speed input: 109.44 toks/s, output: 84.67 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.17s/it, est. speed input: 188.98 toks/s, output: 156.23 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.38s/it, est. speed input: 22.52 toks/s, output: 189.06 toks/s] [A
Generating Responses:  54%|█████▍    | 27/50 [10:38<08:37, 22.50s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.16it/s, est. speed input: 296.69 toks/s, output: 77.95 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.15it/s, est. speed input: 499.34 toks/s, output: 143.89 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  3.19it/s, est. speed input: 657.08 toks/s, output: 209.15 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.42it/s, est. speed input: 626.54 toks/s, output: 234.65 toks/s][A
Generating Responses:  56%|█████▌    | 28/50 [10:39<05:58, 16.27s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.41it/s, est. speed input: 341.01 toks/s, output: 77.50 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.32s/it, est. speed input: 194.47 toks/s, output: 109.03 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.30s/it, est. speed input: 187.43 toks/s, output: 160.11 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.15s/it, est. speed input: 209.43 toks/s, output: 219.90 toks/s][A
Generating Responses:  58%|█████▊    | 29/50 [10:44<04:28, 12.79s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.20it/s, est. speed input: 280.73 toks/s, output: 79.52 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.36s/it, est. speed input: 194.01 toks/s, output: 112.59 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.69s/it, est. speed input: 153.32 toks/s, output: 151.38 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.99s/it, est. speed input: 20.94 toks/s, output: 109.19 toks/s] [A
Generating Responses:  60%|██████    | 30/50 [11:28<07:23, 22.16s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.47it/s, est. speed input: 409.21 toks/s, output: 74.80 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.10it/s, est. speed input: 286.87 toks/s, output: 113.49 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.37s/it, est. speed input: 22.87 toks/s, output: 184.53 toks/s] [A
Generating Responses:  62%|██████▏   | 31/50 [12:14<09:14, 29.18s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.07it/s, est. speed input: 249.62 toks/s, output: 79.28 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.35s/it, est. speed input: 108.14 toks/s, output: 113.42 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.01s/it, est. speed input: 21.97 toks/s, output: 110.11 toks/s] [A
Generating Responses:  64%|██████▍   | 32/50 [12:58<10:05, 33.65s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:06<00:18,  6.12s/it, est. speed input: 37.90 toks/s, output: 87.57 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:06<00:05,  2.65s/it, est. speed input: 70.90 toks/s, output: 172.11 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.41s/it, est. speed input: 19.94 toks/s, output: 203.39 toks/s][A
Generating Responses:  66%|██████▌   | 33/50 [13:43<10:33, 37.27s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:10,  3.39s/it, est. speed input: 70.86 toks/s, output: 86.22 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.71s/it, est. speed input: 125.38 toks/s, output: 161.12 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.54s/it, est. speed input: 21.03 toks/s, output: 191.12 toks/s] [A
Generating Responses:  68%|██████▊   | 34/50 [14:30<10:39, 39.96s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.51s/it, est. speed input: 159.62 toks/s, output: 83.45 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.11it/s, est. speed input: 254.55 toks/s, output: 148.49 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.44it/s, est. speed input: 302.70 toks/s, output: 207.15 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.45s/it, est. speed input: 172.46 toks/s, output: 177.65 toks/s][A
Generating Responses:  70%|███████   | 35/50 [14:35<07:25, 29.73s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.52it/s, est. speed input: 317.86 toks/s, output: 76.04 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:00<00:00,  2.20it/s, est. speed input: 488.22 toks/s, output: 131.84 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.20s/it, est. speed input: 238.41 toks/s, output: 130.49 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:07<00:00,  1.83s/it, est. speed input: 133.02 toks/s, output: 146.56 toks/s][A
Generating Responses:  72%|███████▏  | 36/50 [14:43<05:22, 23.03s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.64s/it, est. speed input: 136.98 toks/s, output: 82.56 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.83s/it, est. speed input: 128.87 toks/s, output: 122.22 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.40s/it, est. speed input: 21.35 toks/s, output: 189.39 toks/s] [A
Generating Responses:  74%|███████▍  | 37/50 [15:29<06:27, 29.82s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.05it/s, est. speed input: 257.55 toks/s, output: 79.57 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.12it/s, est. speed input: 443.38 toks/s, output: 150.54 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.63it/s, est. speed input: 415.17 toks/s, output: 173.34 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.97s/it, est. speed input: 23.24 toks/s, output: 100.72 toks/s] [A
Generating Responses:  76%|███████▌  | 38/50 [16:12<06:48, 34.06s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.09s/it, est. speed input: 229.26 toks/s, output: 81.29 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.82it/s, est. speed input: 406.32 toks/s, output: 152.86 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:02,  2.04s/it, est. speed input: 148.01 toks/s, output: 127.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.01s/it, est. speed input: 22.31 toks/s, output: 107.69 toks/s] [A
Generating Responses:  78%|███████▊  | 39/50 [16:57<06:47, 37.08s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.17s/it, est. speed input: 197.82 toks/s, output: 81.85 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.92it/s, est. speed input: 415.37 toks/s, output: 195.73 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.02it/s, est. speed input: 489.24 toks/s, output: 260.02 toks/s][A
Generating Responses:  80%|████████  | 40/50 [16:59<04:25, 26.57s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.42s/it, est. speed input: 211.66 toks/s, output: 83.25 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.32s/it, est. speed input: 119.29 toks/s, output: 114.72 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.43s/it, est. speed input: 159.35 toks/s, output: 193.96 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.01s/it, est. speed input: 22.26 toks/s, output: 113.90 toks/s] [A
Generating Responses:  82%|████████▏ | 41/50 [17:43<04:46, 31.83s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.14s/it, est. speed input: 209.90 toks/s, output: 81.68 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.03it/s, est. speed input: 237.66 toks/s, output: 131.36 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.37s/it, est. speed input: 21.78 toks/s, output: 185.95 toks/s] [A
Generating Responses:  84%|████████▍ | 42/50 [18:28<04:47, 35.94s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.50s/it, est. speed input: 159.34 toks/s, output: 80.00 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.34s/it, est. speed input: 181.76 toks/s, output: 127.78 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.25it/s, est. speed input: 258.42 toks/s, output: 204.45 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.24it/s, est. speed input: 302.38 toks/s, output: 268.03 toks/s][A
Generating Responses:  86%|████████▌ | 43/50 [18:32<03:03, 26.16s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.64s/it, est. speed input: 136.57 toks/s, output: 84.13 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.15s/it, est. speed input: 184.21 toks/s, output: 141.73 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.39it/s, est. speed input: 253.44 toks/s, output: 216.21 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.15it/s, est. speed input: 272.31 toks/s, output: 253.04 toks/s][A
Generating Responses:  88%|████████▊ | 44/50 [18:35<01:56, 19.38s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.96s/it, est. speed input: 113.91 toks/s, output: 85.31 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:06,  3.12s/it, est. speed input: 75.19 toks/s, output: 116.43 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.38s/it, est. speed input: 19.96 toks/s, output: 194.99 toks/s][A
Generating Responses:  90%|█████████ | 45/50 [19:21<02:16, 27.24s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.19s/it, est. speed input: 220.10 toks/s, output: 81.80 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.13s/it, est. speed input: 226.82 toks/s, output: 128.29 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.50s/it, est. speed input: 116.67 toks/s, output: 135.25 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 22.17 toks/s, output: 112.59 toks/s] [A
Generating Responses:  92%|█████████▏| 46/50 [20:05<02:09, 32.31s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.16s/it, est. speed input: 186.61 toks/s, output: 78.62 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.50it/s, est. speed input: 315.21 toks/s, output: 141.74 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.01it/s, est. speed input: 393.76 toks/s, output: 199.69 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.98s/it, est. speed input: 21.47 toks/s, output: 101.32 toks/s] [A
Generating Responses:  94%|█████████▍| 47/50 [20:49<01:47, 35.83s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.25s/it, est. speed input: 193.94 toks/s, output: 81.74 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.17s/it, est. speed input: 217.21 toks/s, output: 128.64 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.11s/it, est. speed input: 235.93 toks/s, output: 176.95 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.99s/it, est. speed input: 24.04 toks/s, output: 106.88 toks/s] [A
Generating Responses:  96%|█████████▌| 48/50 [21:33<01:16, 38.29s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.19s/it, est. speed input: 221.06 toks/s, output: 80.69 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.63it/s, est. speed input: 369.32 toks/s, output: 150.43 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.52it/s, est. speed input: 358.71 toks/s, output: 185.26 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.32it/s, est. speed input: 328.54 toks/s, output: 218.03 toks/s][A
Generating Responses:  98%|█████████▊| 49/50 [21:36<00:27, 27.73s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.19it/s, est. speed input: 357.92 toks/s, output: 78.48 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:02,  1.01s/it, est. speed input: 275.75 toks/s, output: 118.61 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:07<00:03,  3.07s/it, est. speed input: 105.81 toks/s, output: 120.38 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 23.15 toks/s, output: 113.17 toks/s] [A
Generating Responses: 100%|██████████| 50/50 [22:20<00:00, 26.82s/it]


Example Generated Text (raw):
 [Option Letter]

The answer is B. Mandibular 1st Molar. Because Mandibular 1st Molar has only 1 cusp, whereas the others have 2, 3, or 5. Wait, no, the question is asking which is NOT having 5 cusps. So, Mandibular 1st Molar has 1 cusp, so it's not having 5. The others have 2, 3, or 5. So, the correct answer is B.
</think>

Answer: B. Mandibular 1st Molar

The Mandibular 1st Molar has only 1 cusp, which is different from the other options that have 2, 3, or 5 cusps. Therefore, it is the one without 5 cusps.

Answer: B





In [17]:
print("\n--- Extracting Predictions ---")
predictions = [extract_choice_mcq(text) for text in tqdm(all_outputs_text, desc="Extracting choices")]
num_invalid_responces = predictions.count(None)
print(f"\n------------------------------\nNumber of invalid responces: {num_invalid_responces}")

if len(predictions) > 0:
    print("\nExample Extracted Prediction:")
    print(predictions[0])


--- Extracting Predictions ---


Extracting choices: 100%|██████████| 200/200 [00:00<00:00, 6517.25it/s]


The user also provided a previous query and response, which might help in understan...cclusion of the occlusion of the occlusion of the occlusion of the occlusion of the occlusion of the'

The alveolar process is the thinning of the alveolar wall, which occurs in the alve...ther divided into the alveolar space and the alveolar bone. The alveolar bone is located in the alve'

------------------------------
Number of invalid responces: 2

Example Extracted Prediction:
B





In [18]:
print("\n--- Calculating Metrics ---")
correct_count = 0
total_count = len(predictions)
results_by_subject = {}

if total_count != len(ground_truths):
     print(f"Warning: Mismatch between number of predictions ({total_count}) and ground truths ({len(ground_truths)}). This should not happen.")
     total_count = min(total_count, len(ground_truths))

for i in range(total_count):
    original_data_index = original_indices[i] if 'original_indices' in locals() else i
    data_item = ds_medmcqa[original_data_index]
    subject = data_item.get('subject_name', 'Unknown')

    pred = predictions[i]
    truth = ground_truths[i]
    is_correct = (pred == truth)

    if subject not in results_by_subject:
        results_by_subject[subject] = {'correct': 0, 'total': 0}

    if is_correct:
        correct_count += 1
        results_by_subject[subject]['correct'] += 1
    results_by_subject[subject]['total'] += 1

overall_accuracy = (correct_count / total_count) * 100 if total_count > 0 else 0


--- Calculating Metrics ---


In [21]:
print("\n--- Evaluation Results ---")
print(f"Model Evaluated: {MODEL_NAME}")
print(f"Dataset Used: {DATASET_MEDMCQA}")
print(f"Number of Questions Evaluated: {total_count}")
print(f"Number of Correct Answers: {correct_count}")
print(f"Overall Accuracy: {overall_accuracy:.2f}%")

print("\nAccuracy by Subject:")
sorted_subjects = sorted(results_by_subject.keys())
for subject in sorted_subjects:
    counts = results_by_subject[subject]
    sub_acc = (counts['correct'] / counts['total']) * 100 if counts['total'] > 0 else 0
    print(f"- {subject}: {sub_acc:.2f}% ({counts['correct']}/{counts['total']})")


--- Evaluation Results ---
Model Evaluated: RAG
Dataset Used: openlifescienceai/medmcqa
Number of Questions Evaluated: 200
Number of Correct Answers: 58
Overall Accuracy: 29.00%

Accuracy by Subject:
- Anaesthesia: 50.00% (1/2)
- Anatomy: 16.67% (1/6)
- Biochemistry: 12.50% (1/8)
- Dental: 29.85% (20/67)
- ENT: 40.00% (2/5)
- Forensic Medicine: 42.86% (3/7)
- Gynaecology & Obstetrics: 41.18% (7/17)
- Medicine: 33.33% (2/6)
- Microbiology: 50.00% (3/6)
- Ophthalmology: 0.00% (0/4)
- Pathology: 8.33% (1/12)
- Pediatrics: 7.14% (1/14)
- Pharmacology: 33.33% (4/12)
- Physiology: 50.00% (3/6)
- Radiology: 0.00% (0/2)
- Skin: 0.00% (0/1)
- Social & Preventive Medicine: 33.33% (2/6)
- Surgery: 36.84% (7/19)


### 2. MedQA

#### Dataset loading and preparing

In [22]:
SEED = 4242
BATCH_SIZE = 4
NUM_SAMPLES = 200
DATASET_MEDQA = "GBaker/MedQA-USMLE-4-options-hf"
SPLIT_MEDQA = "validation"

In [23]:
ds_medqa = load_dataset(DATASET_MEDQA, split=SPLIT_MEDQA)
ds_medqa = ds_medqa.shuffle(seed=SEED).select(range(NUM_SAMPLES))
ds_medqa

README.md:   0%|          | 0.00/640 [00:00<?, ?B/s]

train.json:   0%|          | 0.00/9.77M [00:00<?, ?B/s]

dev.json:   0%|          | 0.00/1.22M [00:00<?, ?B/s]

test.json:   0%|          | 0.00/1.25M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/10178 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1272 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1273 [00:00<?, ? examples/s]

Dataset({
    features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label'],
    num_rows: 200
})

In [24]:
ds_medqa[1]

{'id': 'dev-00646',
 'sent1': 'A 31-year-old gravida 2 para 2 woman presents to her primary care physician for follow up. Two weeks ago, she gave birth via vaginal delivery to a 9.5 lb (4.3 kg) male infant. The delivery was complicated by a vaginal laceration that required extensive suturing once the infant was delivered. Immediately after delivery of the placenta she experienced intense shaking and chills that resolved within 1 hour. She has felt well since the delivery but admits to 6 days of malodorous smelling vaginal discharge that is tan in color. She has a history of vaginal candidiasis and is worried that it may be recurring. Her temperature is 98.8°F (37.1°C), blood pressure is 122/73 mmHg, pulse is 88/min, respirations are 16/min, and BMI is 33 kg/m^2. Speculum exam reveals a 1.5 cm dark red, velvety lesion on the posterior vaginal wall with a tan discharge. The pH of the discharge is 6.4. Which of the following is the most likely diagnosis?',
 'sent2': '',
 'ending0': 'Bacte

#### Helper functions definition

In [25]:
def format_prompt_medqa(example):
    """Formats a single example into a prompt for the LLM."""
    question = example['sent1']
    options = {
        "A": example['ending0'],
        "B": example['ending1'],
        "C": example['ending2'],
        "D": example['ending3'],
    }
    
    prompt = f"""
You are an expert in solving multiple-choice questions accurately and explaining your reasoning clearly.
Given a question and a list of answer choices (A, B, C, D), your task is to:
1. Reason shortly about the question and answer choices to find evidances to support your answer.
2. Identify the correct answer. Please choose the single best answer from the options provided.
3. Output the final answer in the format: Answer: [Option Letter]

Question: {question}
Options:
A. {options['A']}
B. {options['B']}
C. {options['C']}
D. {options['D']}

Reasoning:
    """
    return prompt

In [26]:
def get_ground_truth_medqa(example):
    """Maps the label to the corresponding letter."""
    mapping = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
    label = example.get('label')
    if label is None or label not in mapping:
        print(f"Warning: Invalid 'cop' value found: {label} in example ID {example.get('id')}. Skipping ground truth.")
        return None
    return mapping[label]

#### Evaluation

In [27]:
print("\n--- Preparing Prompts and Ground Truths ---")
prompts = [format_prompt_medqa(ex) for ex in tqdm(ds_medqa, desc="Formatting prompts")]
ground_truths = [get_ground_truth_medqa(ex) for ex in tqdm(ds_medqa, desc="Extracting ground truths")]
valid_indices = [i for i, gt in enumerate(ground_truths) if gt is not None]

if len(valid_indices) < len(ground_truths):
     print(f"Warning: {len(ground_truths) - len(valid_indices)} examples had invalid ground truths and were excluded.")
     prompts = [prompts[i] for i in valid_indices]
     ground_truths = [ground_truths[i] for i in valid_indices]
     original_indices = valid_indices

if len(prompts) > 0:
    print("\nExample Prompt:")
    print(prompts[0])
    print(f"Corresponding Ground Truth: {ground_truths[0]}")
else:
    print("No valid prompts to evaluate.")
    exit()


--- Preparing Prompts and Ground Truths ---


Formatting prompts: 100%|██████████| 200/200 [00:00<00:00, 6646.65it/s]
Extracting ground truths: 100%|██████████| 200/200 [00:00<00:00, 9669.31it/s]


Example Prompt:

You are an expert in solving multiple-choice questions accurately and explaining your reasoning clearly.
Given a question and a list of answer choices (A, B, C, D), your task is to:
1. Reason shortly about the question and answer choices to find evidances to support your answer.
2. Identify the correct answer. Please choose the single best answer from the options provided.
3. Output the final answer in the format: Answer: [Option Letter]

Question: A 9-year-old girl is brought to the physician by her father for evaluation of intermittent muscle cramps for the past year and short stature. She has had recurrent upper respiratory tract infections since infancy. She is at the 5th percentile for weight and 10th percentile for height. Physical examination shows nasal polyps and dry skin. An x-ray of the right wrist shows osteopenia with epiphyseal widening. Which of the following sets of laboratory findings is most likely in this patient's serum?
 $$$ Calcium %%% Phosphorus




In [28]:
print("\n--- Running Inference ---")
all_outputs_text = []
num_batches = math.ceil(len(prompts) / BATCH_SIZE)

for i in tqdm(range(num_batches), desc="Generating Responses"):
    start_idx = i * BATCH_SIZE
    end_idx = min((i + 1) * BATCH_SIZE, len(prompts))
    batch_prompts = prompts[start_idx:end_idx]
    batch_outputs_text = rag.get_response(batch_prompts, top_k=2)
    all_outputs_text.extend(batch_outputs_text)

if len(all_outputs_text) > 0:
    print("\nExample Generated Text (raw):")
    print(all_outputs_text[0])


--- Running Inference ---


Generating Responses:   0%|          | 0/50 [00:00<?, ?it/s]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:11,  3.84s/it, est. speed input: 82.74 toks/s, output: 75.19 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.49s/it, est. speed input: 214.60 toks/s, output: 190.64 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.19s/it, est. speed input: 32.30 toks/s, output: 113.60 toks/s] [A
Generating Responses:   2%|▏         | 1/50 [00:44<36:38, 44.87s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.61s/it, est. speed input: 584.59 toks/s, output: 74.63 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.61s/it, est. speed input: 422.97 toks/s, output: 119.03 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.57s/it, est. speed input: 388.64 toks/s, output: 165.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.05s/it, est. speed input: 51.68 toks/s, output: 110.37 toks/s] [A
Generating Responses:   4%|▍         | 2/50 [01:29<35:37, 44.54s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.87s/it, est. speed input: 136.73 toks/s, output: 83.36 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:03,  1.97s/it, est. speed input: 174.25 toks/s, output: 141.96 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.15s/it, est. speed input: 264.94 toks/s, output: 221.81 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:08<00:00,  2.04s/it, est. speed input: 184.51 toks/s, output: 207.51 toks/s][A
Generating Responses:   6%|▌         | 3/50 [01:37<21:54, 27.97s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.40s/it, est. speed input: 213.53 toks/s, output: 78.10 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.29it/s, est. speed input: 430.82 toks/s, output: 143.03 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.01s/it, est. speed input: 411.49 toks/s, output: 166.58 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.03s/it, est. speed input: 36.56 toks/s, output: 104.27 toks/s] [A
Generating Responses:   8%|▊         | 4/50 [02:21<26:21, 34.38s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.25s/it, est. speed input: 152.77 toks/s, output: 78.83 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.45s/it, est. speed input: 300.89 toks/s, output: 121.27 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.72s/it, est. speed input: 321.35 toks/s, output: 188.73 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 49.31 toks/s, output: 116.70 toks/s] [A
Generating Responses:  10%|█         | 5/50 [03:06<28:29, 37.99s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:10,  3.42s/it, est. speed input: 98.87 toks/s, output: 84.53 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:05,  2.78s/it, est. speed input: 131.15 toks/s, output: 135.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.45s/it, est. speed input: 32.75 toks/s, output: 195.84 toks/s] [A
Generating Responses:  12%|█▏        | 6/50 [03:51<29:49, 40.67s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.59s/it, est. speed input: 133.31 toks/s, output: 83.07 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.74s/it, est. speed input: 206.60 toks/s, output: 142.74 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.41s/it, est. speed input: 33.71 toks/s, output: 191.11 toks/s] [A
Generating Responses:  14%|█▍        | 7/50 [04:37<30:19, 42.32s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:09,  3.18s/it, est. speed input: 101.69 toks/s, output: 79.97 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:05,  2.81s/it, est. speed input: 125.86 toks/s, output: 127.43 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.55s/it, est. speed input: 36.96 toks/s, output: 193.09 toks/s] [A
Generating Responses:  16%|█▌        | 8/50 [05:23<30:30, 43.59s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.10s/it, est. speed input: 339.47 toks/s, output: 75.74 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:03,  1.51s/it, est. speed input: 262.74 toks/s, output: 112.60 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.40s/it, est. speed input: 33.32 toks/s, output: 186.85 toks/s] [A
Generating Responses:  18%|█▊        | 9/50 [06:09<30:13, 44.24s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.64s/it, est. speed input: 124.60 toks/s, output: 82.94 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.24s/it, est. speed input: 223.74 toks/s, output: 158.92 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.86s/it, est. speed input: 180.32 toks/s, output: 171.22 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 33.15 toks/s, output: 113.91 toks/s] [A
Generating Responses:  20%|██        | 10/50 [06:53<29:30, 44.26s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.76s/it, est. speed input: 212.80 toks/s, output: 77.74 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:04,  2.03s/it, est. speed input: 186.69 toks/s, output: 118.44 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.79s/it, est. speed input: 225.85 toks/s, output: 172.17 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 34.22 toks/s, output: 113.83 toks/s] [A
Generating Responses:  22%|██▏       | 11/50 [07:38<28:47, 44.29s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.40s/it, est. speed input: 220.51 toks/s, output: 77.78 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.03s/it, est. speed input: 367.09 toks/s, output: 131.40 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:08<00:03,  3.31s/it, est. speed input: 142.29 toks/s, output: 123.49 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.14s/it, est. speed input: 36.15 toks/s, output: 114.63 toks/s] [A
Generating Responses:  24%|██▍       | 12/50 [08:22<28:06, 44.39s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.79s/it, est. speed input: 491.06 toks/s, output: 80.53 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.27s/it, est. speed input: 561.23 toks/s, output: 155.73 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.96s/it, est. speed input: 349.46 toks/s, output: 166.85 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 54.10 toks/s, output: 114.05 toks/s] [A
Generating Responses:  26%|██▌       | 13/50 [09:07<27:23, 44.41s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.83s/it, est. speed input: 128.46 toks/s, output: 81.04 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.63s/it, est. speed input: 199.26 toks/s, output: 146.12 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.13s/it, est. speed input: 390.24 toks/s, output: 210.78 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.21s/it, est. speed input: 426.77 toks/s, output: 266.24 toks/s][A
Generating Responses:  28%|██▊       | 14/50 [09:12<19:29, 32.48s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.59s/it, est. speed input: 287.31 toks/s, output: 78.13 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.10s/it, est. speed input: 413.10 toks/s, output: 134.57 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:07<00:03,  3.02s/it, est. speed input: 182.84 toks/s, output: 129.87 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:18<00:00,  4.60s/it, est. speed input: 97.24 toks/s, output: 145.64 toks/s] [A
Generating Responses:  30%|███       | 15/50 [09:30<16:29, 28.26s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.80s/it, est. speed input: 137.98 toks/s, output: 83.07 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:05,  2.55s/it, est. speed input: 131.26 toks/s, output: 129.53 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.13s/it, est. speed input: 154.86 toks/s, output: 184.83 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 33.27 toks/s, output: 120.67 toks/s] [A
Generating Responses:  32%|███▏      | 16/50 [10:15<18:46, 33.14s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.14s/it, est. speed input: 235.26 toks/s, output: 77.25 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.35s/it, est. speed input: 160.86 toks/s, output: 106.47 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:07<00:02,  2.66s/it, est. speed input: 146.24 toks/s, output: 150.99 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 31.79 toks/s, output: 117.77 toks/s] [A
Generating Responses:  34%|███▍      | 17/50 [10:59<20:04, 36.50s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.32s/it, est. speed input: 145.72 toks/s, output: 81.05 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:02,  1.40s/it, est. speed input: 238.64 toks/s, output: 143.96 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:07<00:02,  2.73s/it, est. speed input: 183.41 toks/s, output: 148.22 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 38.88 toks/s, output: 117.22 toks/s] [A
Generating Responses:  36%|███▌      | 18/50 [11:43<20:43, 38.86s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.06s/it, est. speed input: 293.76 toks/s, output: 73.44 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.04s/it, est. speed input: 418.15 toks/s, output: 118.58 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.33s/it, est. speed input: 322.30 toks/s, output: 151.45 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 38.38 toks/s, output: 105.83 toks/s] [A
Generating Responses:  38%|███▊      | 19/50 [12:28<20:54, 40.46s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.91s/it, est. speed input: 162.44 toks/s, output: 82.94 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:03,  1.92s/it, est. speed input: 220.44 toks/s, output: 143.16 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.77s/it, est. speed input: 218.07 toks/s, output: 189.94 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:09<00:00,  2.27s/it, est. speed input: 173.50 toks/s, output: 209.19 toks/s][A
Generating Responses:  40%|████      | 20/50 [12:37<15:31, 31.06s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.96s/it, est. speed input: 192.42 toks/s, output: 80.43 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.50s/it, est. speed input: 340.23 toks/s, output: 133.67 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.47s/it, est. speed input: 299.65 toks/s, output: 177.52 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:07<00:00,  1.91s/it, est. speed input: 221.61 toks/s, output: 195.67 toks/s][A
Generating Responses:  42%|████▏     | 21/50 [12:44<11:37, 24.05s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.89s/it, est. speed input: 114.03 toks/s, output: 81.89 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.66s/it, est. speed input: 174.34 toks/s, output: 147.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.43s/it, est. speed input: 34.59 toks/s, output: 191.16 toks/s] [A
Generating Responses:  44%|████▍     | 22/50 [13:30<14:16, 30.57s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.08s/it, est. speed input: 166.50 toks/s, output: 81.32 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.26s/it, est. speed input: 274.03 toks/s, output: 144.43 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:00,  1.07it/s, est. speed input: 360.53 toks/s, output: 205.20 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.49s/it, est. speed input: 263.52 toks/s, output: 203.11 toks/s][A
Generating Responses:  46%|████▌     | 23/50 [13:36<10:26, 23.21s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.55s/it, est. speed input: 144.38 toks/s, output: 83.57 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.27s/it, est. speed input: 237.30 toks/s, output: 157.17 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.11s/it, est. speed input: 276.80 toks/s, output: 205.19 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 31.95 toks/s, output: 110.45 toks/s] [A
Generating Responses:  48%|████▊     | 24/50 [14:21<12:47, 29.54s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.53s/it, est. speed input: 147.75 toks/s, output: 81.78 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.27s/it, est. speed input: 272.06 toks/s, output: 153.46 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.41s/it, est. speed input: 33.91 toks/s, output: 189.41 toks/s] [A
Generating Responses:  50%|█████     | 25/50 [15:06<14:19, 34.39s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.78s/it, est. speed input: 98.98 toks/s, output: 82.78 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.02s/it, est. speed input: 150.40 toks/s, output: 138.66 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.62s/it, est. speed input: 178.06 toks/s, output: 195.24 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 36.52 toks/s, output: 116.52 toks/s] [A
Generating Responses:  52%|█████▏    | 26/50 [15:51<14:56, 37.36s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:04<00:12,  4.27s/it, est. speed input: 93.54 toks/s, output: 85.33 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:47<00:00, 11.84s/it, est. speed input: 31.89 toks/s, output: 267.19 toks/s][A
Generating Responses:  54%|█████▍    | 27/50 [16:38<15:28, 40.38s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.92s/it, est. speed input: 341.63 toks/s, output: 75.63 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.34s/it, est. speed input: 412.07 toks/s, output: 130.24 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.46s/it, est. speed input: 54.61 toks/s, output: 186.84 toks/s] [A
Generating Responses:  56%|█████▌    | 28/50 [17:24<15:24, 42.04s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.69s/it, est. speed input: 504.06 toks/s, output: 78.00 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.58s/it, est. speed input: 363.59 toks/s, output: 124.86 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:00,  1.02it/s, est. speed input: 441.96 toks/s, output: 199.03 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 43.75 toks/s, output: 108.20 toks/s] [A
Generating Responses:  58%|█████▊    | 29/50 [18:08<14:57, 42.72s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.74s/it, est. speed input: 183.64 toks/s, output: 81.78 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.77s/it, est. speed input: 252.92 toks/s, output: 142.22 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.02s/it, est. speed input: 352.25 toks/s, output: 222.18 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.05s/it, est. speed input: 42.66 toks/s, output: 112.58 toks/s] [A
Generating Responses:  60%|██████    | 30/50 [18:52<14:23, 43.18s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.91s/it, est. speed input: 243.85 toks/s, output: 80.06 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.09s/it, est. speed input: 362.73 toks/s, output: 144.93 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.46it/s, est. speed input: 480.00 toks/s, output: 216.59 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 38.85 toks/s, output: 105.81 toks/s] [A
Generating Responses:  62%|██████▏   | 31/50 [19:37<13:46, 43.48s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.21s/it, est. speed input: 174.43 toks/s, output: 82.91 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:06<00:07,  3.55s/it, est. speed input: 105.08 toks/s, output: 113.45 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:09<00:03,  3.01s/it, est. speed input: 115.83 toks/s, output: 171.20 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 32.67 toks/s, output: 127.27 toks/s] [A
Generating Responses:  64%|██████▍   | 32/50 [20:21<13:07, 43.77s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.37s/it, est. speed input: 562.31 toks/s, output: 79.00 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.14s/it, est. speed input: 636.73 toks/s, output: 150.37 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.44s/it, est. speed input: 466.64 toks/s, output: 173.67 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:06<00:00,  1.70s/it, est. speed input: 364.46 toks/s, output: 201.63 toks/s][A
Generating Responses:  66%|██████▌   | 33/50 [20:28<09:16, 32.71s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:11,  3.81s/it, est. speed input: 87.66 toks/s, output: 85.03 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:03,  1.72s/it, est. speed input: 172.92 toks/s, output: 165.04 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.61s/it, est. speed input: 177.39 toks/s, output: 207.92 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.03s/it, est. speed input: 30.78 toks/s, output: 118.91 toks/s] [A
Generating Responses:  68%|██████▊   | 34/50 [21:12<09:38, 36.16s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.54s/it, est. speed input: 203.85 toks/s, output: 80.11 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.09s/it, est. speed input: 270.96 toks/s, output: 136.13 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.17it/s, est. speed input: 364.66 toks/s, output: 194.09 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:09<00:00,  2.36s/it, est. speed input: 145.45 toks/s, output: 150.00 toks/s][A
Generating Responses:  70%|███████   | 35/50 [21:22<07:02, 28.17s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:04<00:12,  4.10s/it, est. speed input: 105.21 toks/s, output: 84.95 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:08<00:02,  2.66s/it, est. speed input: 145.79 toks/s, output: 171.79 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 35.67 toks/s, output: 125.29 toks/s] [A
Generating Responses:  72%|███████▏  | 36/50 [22:06<07:42, 33.01s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.62s/it, est. speed input: 150.79 toks/s, output: 83.60 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.19s/it, est. speed input: 268.13 toks/s, output: 161.59 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:08<00:03,  3.21s/it, est. speed input: 130.17 toks/s, output: 142.98 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:11<00:00,  2.91s/it, est. speed input: 122.00 toks/s, output: 194.05 toks/s][A
Generating Responses:  74%|███████▍  | 37/50 [22:18<05:46, 26.62s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:04<00:12,  4.06s/it, est. speed input: 101.53 toks/s, output: 82.31 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:03,  1.89s/it, est. speed input: 218.58 toks/s, output: 158.22 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.38s/it, est. speed input: 279.11 toks/s, output: 218.82 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 41.39 toks/s, output: 118.20 toks/s] [A
Generating Responses:  76%|███████▌  | 38/50 [23:02<06:23, 31.95s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.73s/it, est. speed input: 188.11 toks/s, output: 80.21 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.69s/it, est. speed input: 197.91 toks/s, output: 125.56 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.55s/it, est. speed input: 228.89 toks/s, output: 176.12 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.25s/it, est. speed input: 310.39 toks/s, output: 255.19 toks/s][A
Generating Responses:  78%|███████▊  | 39/50 [23:07<04:22, 23.88s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.66s/it, est. speed input: 274.55 toks/s, output: 78.87 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.12s/it, est. speed input: 359.13 toks/s, output: 136.33 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.46s/it, est. speed input: 285.06 toks/s, output: 163.06 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.48s/it, est. speed input: 291.46 toks/s, output: 206.03 toks/s][A
Generating Responses:  80%|████████  | 40/50 [23:13<03:05, 18.52s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:09,  3.09s/it, est. speed input: 130.02 toks/s, output: 84.41 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.52s/it, est. speed input: 216.97 toks/s, output: 159.17 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:00,  1.01it/s, est. speed input: 296.69 toks/s, output: 230.15 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 34.41 toks/s, output: 112.92 toks/s] [A
Generating Responses:  82%|████████▏ | 41/50 [23:57<03:56, 26.23s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.90s/it, est. speed input: 146.32 toks/s, output: 81.44 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:02,  1.45s/it, est. speed input: 269.35 toks/s, output: 153.31 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:09<00:03,  3.62s/it, est. speed input: 134.80 toks/s, output: 141.93 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.11s/it, est. speed input: 37.64 toks/s, output: 122.65 toks/s] [A
Generating Responses:  84%|████████▍ | 42/50 [24:42<04:13, 31.71s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:04<00:12,  4.21s/it, est. speed input: 89.79 toks/s, output: 83.62 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.59s/it, est. speed input: 36.96 toks/s, output: 192.10 toks/s][A
Generating Responses:  86%|████████▌ | 43/50 [25:28<04:12, 36.13s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:09,  3.00s/it, est. speed input: 97.88 toks/s, output: 80.57 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:02,  1.37s/it, est. speed input: 494.88 toks/s, output: 155.98 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.23s/it, est. speed input: 295.36 toks/s, output: 162.82 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 52.46 toks/s, output: 116.06 toks/s] [A
Generating Responses:  88%|████████▊ | 44/50 [26:13<03:51, 38.63s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.31s/it, est. speed input: 171.74 toks/s, output: 81.53 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.06s/it, est. speed input: 161.42 toks/s, output: 129.71 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.45s/it, est. speed input: 266.70 toks/s, output: 196.61 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 39.56 toks/s, output: 114.60 toks/s] [A
Generating Responses:  90%|█████████ | 45/50 [26:57<03:21, 40.31s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.65s/it, est. speed input: 278.63 toks/s, output: 81.66 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:07<00:07,  3.84s/it, est. speed input: 145.71 toks/s, output: 116.08 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:07<00:02,  2.20s/it, est. speed input: 194.95 toks/s, output: 198.90 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.10s/it, est. speed input: 40.60 toks/s, output: 126.18 toks/s] [A
Generating Responses:  92%|█████████▏| 46/50 [27:41<02:46, 41.56s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.56s/it, est. speed input: 116.80 toks/s, output: 83.21 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.41s/it, est. speed input: 141.38 toks/s, output: 129.88 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.40s/it, est. speed input: 32.60 toks/s, output: 193.58 toks/s] [A
Generating Responses:  94%|█████████▍| 47/50 [28:27<02:08, 42.79s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:04<00:12,  4.02s/it, est. speed input: 96.87 toks/s, output: 84.17 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:04,  2.41s/it, est. speed input: 166.09 toks/s, output: 149.13 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:08<00:02,  2.58s/it, est. speed input: 166.70 toks/s, output: 185.23 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 38.16 toks/s, output: 126.32 toks/s] [A
Generating Responses:  96%|█████████▌| 48/50 [29:11<01:26, 43.26s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.78s/it, est. speed input: 103.95 toks/s, output: 83.08 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.78s/it, est. speed input: 178.84 toks/s, output: 144.41 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.23s/it, est. speed input: 254.41 toks/s, output: 210.88 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 36.90 toks/s, output: 113.77 toks/s] [A
Generating Responses:  98%|█████████▊| 49/50 [29:56<00:43, 43.57s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.12s/it, est. speed input: 169.00 toks/s, output: 82.38 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.55s/it, est. speed input: 208.83 toks/s, output: 138.20 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:01,  1.77s/it, est. speed input: 204.24 toks/s, output: 172.53 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 33.77 toks/s, output: 113.69 toks/s] [A
Generating Responses: 100%|██████████| 50/50 [30:40<00:00, 36.81s/it]


Example Generated Text (raw):
 A

The patient has a history of recurrent upper respiratory tract infections, which is a common cause of paroxysmal chest pain and muscle cramps. Her short stature and weight suggest a low bone density, which is why the x-ray shows osteopenia. The parathyroid hormone is involved in bone resorption, so its level should be normal. Calcitriol, a parapyrontate, is a calcium supplement and should be normal. However, the calcium level is low, which is typical for a 9-year-old with a history of infections. Parathyroid hormone levels are normal, so the parathyroid hormone level is up. Phosphorus is low because of the short stature. So the findings are: calcium ↓, phosphorus ↓, parathyroid ↑, calcitriol ↓. Wait, but the options don't have this combination. Hmm, maybe I'm missing something.

Wait, the patient has a history of recurrent upper respiratory tract infections, which is a common cause of paroxysmal chest pain and muscle cramps. Her short stature and weig




In [29]:
print("\n--- Extracting Predictions ---")
predictions = [extract_choice_mcq(text) for text in tqdm(all_outputs_text, desc="Extracting choices")]
num_invalid_responces = predictions.count(None)
print(f"\n------------------------------\nNumber of invalid responces: {num_invalid_responces}")

if len(predictions) > 0:
    print("\nExample Extracted Prediction:")
    print(predictions[0])


--- Extracting Predictions ---


Extracting choices: 100%|██████████| 200/200 [00:00<00:00, 4371.28it/s]


The patient is a restrained passenger at the time of impact, which suggests that sh...ant for assessing the heart. The absence of a wrist drop suggests that the shoulder is not a problem'

The patient's jaw episodes are described as intense, shooting pains that occur when...nt’s clinical presentation exits the skull through one of the brain structures.

The options are the'

The patient has recurrent urinary tract infections, which is a common cause of urin...The patient's abdominal examination showing palpable flank masses suggests that the primary issue is'

The patient's symptoms include abnormally weak, decreased appetite, no bowel moveme...s the patient has a decreased potassium level, which is an abnormal test. Alternatively, perhaps the'

The patient is a 70-year-old man with a history of hypertension and Alzheimer's dis...ed with foreign body aspiration or granulomatosis. The patient's symptoms include myalgia, headache,'

The patient presents with respiratory distress, facial




In [30]:
print("\n--- Calculating Metrics ---")
correct_count = 0
total_count = len(predictions)
results_by_subject = {}

if total_count != len(ground_truths):
     print(f"Warning: Mismatch between number of predictions ({total_count}) and ground truths ({len(ground_truths)}). This should not happen.")
     total_count = min(total_count, len(ground_truths))

for i in range(total_count):
    original_data_index = original_indices[i] if 'original_indices' in locals() else i
    data_item = ds_medqa[original_data_index]
    subject = data_item.get('subject_name', 'Unknown')

    pred = predictions[i]
    truth = ground_truths[i]
    is_correct = (pred == truth)

    if subject not in results_by_subject:
        results_by_subject[subject] = {'correct': 0, 'total': 0}

    if is_correct:
        correct_count += 1
        results_by_subject[subject]['correct'] += 1
    results_by_subject[subject]['total'] += 1

overall_accuracy = (correct_count / total_count) * 100 if total_count > 0 else 0


--- Calculating Metrics ---


In [31]:
print("\n--- Evaluation Results ---")
print(f"Model Evaluated: {MODEL_NAME}")
print(f"Dataset Used: {DATASET_MEDQA}")
print(f"Number of Questions Evaluated: {total_count}")
print(f"Number of Correct Answers: {correct_count}")
print(f"Overall Accuracy: {overall_accuracy:.2f}%")


--- Evaluation Results ---
Model Evaluated: RAG
Dataset Used: GBaker/MedQA-USMLE-4-options-hf
Number of Questions Evaluated: 200
Number of Correct Answers: 43
Overall Accuracy: 21.50%


### 3. MMLU medical

#### Dataset loading and preparing

In [32]:
SEED = 4242
BATCH_SIZE = 4
NUM_SAMPLES_SUBSET = 50
NUM_SAMPLES = 200
DATASET_MMLU = "cais/mmlu"
SPLIT_MMLU = "test"

MMLU_MEDICAL_SUBSETS = [
    "anatomy",
    "clinical_knowledge",
    "professional_medicine",
    "college_biology",
    "college_medicine",
    "medical_genetics",
    "professional_medicine"
]

In [33]:
datasets_mmlu = []
for subset in MMLU_MEDICAL_SUBSETS:
    ds = load_dataset(DATASET_MMLU, subset, split=SPLIT_MMLU)
    ds = ds.shuffle(seed=SEED).select(range(NUM_SAMPLES_SUBSET))
    datasets_mmlu.append(ds)


ds_mmlu = concatenate_datasets(datasets_mmlu)
ds_mmlu = ds_mmlu.shuffle(seed=SEED).select(range(NUM_SAMPLES))
ds_mmlu

README.md:   0%|          | 0.00/53.2k [00:00<?, ?B/s]

dataset_infos.json:   0%|          | 0.00/138k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/20.1k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/5.28k [00:00<?, ?B/s]

dev-00000-of-00001.parquet:   0%|          | 0.00/3.50k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/135 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/14 [00:00<?, ? examples/s]

Generating dev split:   0%|          | 0/5 [00:00<?, ? examples/s]

test-00000-of-00001.parquet:   0%|          | 0.00/40.5k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/7.48k [00:00<?, ?B/s]

dev-00000-of-00001.parquet:   0%|          | 0.00/3.67k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/265 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/29 [00:00<?, ? examples/s]

Generating dev split:   0%|          | 0/5 [00:00<?, ? examples/s]

test-00000-of-00001.parquet:   0%|          | 0.00/125k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/19.9k [00:00<?, ?B/s]

dev-00000-of-00001.parquet:   0%|          | 0.00/8.45k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/272 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/31 [00:00<?, ? examples/s]

Generating dev split:   0%|          | 0/5 [00:00<?, ? examples/s]

test-00000-of-00001.parquet:   0%|          | 0.00/31.8k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/6.90k [00:00<?, ?B/s]

dev-00000-of-00001.parquet:   0%|          | 0.00/4.27k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/144 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/16 [00:00<?, ? examples/s]

Generating dev split:   0%|          | 0/5 [00:00<?, ? examples/s]

test-00000-of-00001.parquet:   0%|          | 0.00/42.5k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/8.99k [00:00<?, ?B/s]

dev-00000-of-00001.parquet:   0%|          | 0.00/4.84k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/173 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/22 [00:00<?, ? examples/s]

Generating dev split:   0%|          | 0/5 [00:00<?, ? examples/s]

test-00000-of-00001.parquet:   0%|          | 0.00/16.4k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/5.63k [00:00<?, ?B/s]

dev-00000-of-00001.parquet:   0%|          | 0.00/3.77k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/100 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/11 [00:00<?, ? examples/s]

Generating dev split:   0%|          | 0/5 [00:00<?, ? examples/s]

Dataset({
    features: ['question', 'subject', 'choices', 'answer'],
    num_rows: 200
})

In [34]:
ds_mmlu[0]

{'question': 'Mitochondria isolated and placed in a buffered solution with a low pH begin to manufacture ATP. Which of the following is the best explanation for the effect of low external pH?',
 'subject': 'college_biology',
 'choices': ['It increases the concentration of OH-, causing the mitochondria to pump H+ to the intermembrane space.',
  'It increases the OH- concentration in the mitochondria matrix.',
  'It increases the acid concentration in the mitochondria matrix.',
  'It increases diffusion of H+ from the intermembrane space to the matrix.'],
 'answer': 3}

#### Helper functions definition

In [35]:
def format_prompt_mmlu(example):
    """Formats a single example into a prompt for the LLM."""
    question = example['question']
    options = {
        "A": example['choices'][0],
        "B": example['choices'][1],
        "C": example['choices'][2],
        "D": example['choices'][3]
    }
    
    prompt = f"""
You are an expert in solving multiple-choice questions accurately and explaining your reasoning clearly.
Given a question and a list of answer choices (A, B, C, D), your task is to:
1. Reason shortly about the question and answer choices to find evidances to support your answer.
2. Identify the correct answer. Please choose the single best answer from the options provided.
3. Output the final answer in the format: Answer: [Option Letter]

Question: {question}
Options:
A. {options['A']}
B. {options['B']}
C. {options['C']}
D. {options['D']}

Reasoning:
    """
    return prompt

In [36]:
def get_ground_truth_mmlu(example):
    """Maps the label to the corresponding letter."""
    mapping = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
    label = example.get('answer')
    if label is None or label not in mapping:
        print(f"Warning: Invalid 'cop' value found: {label} in example ID {example.get('id')}. Skipping ground truth.")
        return None
    return mapping[label]

#### Evaluation

In [37]:
print("\n--- Preparing Prompts and Ground Truths ---")
prompts = [format_prompt_mmlu(ex) for ex in tqdm(ds_mmlu, desc="Formatting prompts")]
ground_truths = [get_ground_truth_mmlu(ex) for ex in tqdm(ds_mmlu, desc="Extracting ground truths")]
valid_indices = [i for i, gt in enumerate(ground_truths) if gt is not None]

if len(valid_indices) < len(ground_truths):
     print(f"Warning: {len(ground_truths) - len(valid_indices)} examples had invalid ground truths and were excluded.")
     prompts = [prompts[i] for i in valid_indices]
     ground_truths = [ground_truths[i] for i in valid_indices]
     original_indices = valid_indices

if len(prompts) > 0:
    print("\nExample Prompt:")
    print(prompts[0])
    print(f"Corresponding Ground Truth: {ground_truths[0]}")
else:
    print("No valid prompts to evaluate.")
    exit()


--- Preparing Prompts and Ground Truths ---


Formatting prompts: 100%|██████████| 200/200 [00:00<00:00, 9278.10it/s]
Extracting ground truths: 100%|██████████| 200/200 [00:00<00:00, 11743.98it/s]


Example Prompt:

You are an expert in solving multiple-choice questions accurately and explaining your reasoning clearly.
Given a question and a list of answer choices (A, B, C, D), your task is to:
1. Reason shortly about the question and answer choices to find evidances to support your answer.
2. Identify the correct answer. Please choose the single best answer from the options provided.
3. Output the final answer in the format: Answer: [Option Letter]

Question: Mitochondria isolated and placed in a buffered solution with a low pH begin to manufacture ATP. Which of the following is the best explanation for the effect of low external pH?
Options:
A. It increases the concentration of OH-, causing the mitochondria to pump H+ to the intermembrane space.
B. It increases the OH- concentration in the mitochondria matrix.
C. It increases the acid concentration in the mitochondria matrix.
D. It increases diffusion of H+ from the intermembrane space to the matrix.

Reasoning:
    
Correspond




In [38]:
print("\n--- Running Inference ---")
all_outputs_text = []
num_batches = math.ceil(len(prompts) / BATCH_SIZE)

for i in tqdm(range(num_batches), desc="Generating Responses"):
    start_idx = i * BATCH_SIZE
    end_idx = min((i + 1) * BATCH_SIZE, len(prompts))
    batch_prompts = prompts[start_idx:end_idx]
    batch_outputs_text = rag.get_response(batch_prompts, top_k=2)
    all_outputs_text.extend(batch_outputs_text)

if len(all_outputs_text) > 0:
    print("\nExample Generated Text (raw):")
    print(all_outputs_text[0])


--- Running Inference ---


Generating Responses:   0%|          | 0/50 [00:00<?, ?it/s]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.57it/s, est. speed input: 369.44 toks/s, output: 62.88 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.32it/s, est. speed input: 320.53 toks/s, output: 105.71 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:01,  1.00s/it, est. speed input: 273.11 toks/s, output: 140.34 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 23.64 toks/s, output: 101.55 toks/s] [A
Generating Responses:   2%|▏         | 1/50 [00:44<36:07, 44.24s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.71s/it, est. speed input: 137.54 toks/s, output: 83.11 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:47<00:00, 11.77s/it, est. speed input: 23.17 toks/s, output: 264.00 toks/s][A
Generating Responses:   4%|▍         | 2/50 [01:31<36:45, 45.96s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.08it/s, est. speed input: 270.37 toks/s, output: 74.32 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.22it/s, est. speed input: 468.07 toks/s, output: 141.95 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.22s/it, est. speed input: 308.57 toks/s, output: 133.19 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.01s/it, est. speed input: 31.42 toks/s, output: 102.59 toks/s] [A
Generating Responses:   6%|▌         | 3/50 [02:15<35:20, 45.12s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.57s/it, est. speed input: 151.25 toks/s, output: 81.98 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.77s/it, est. speed input: 181.16 toks/s, output: 122.97 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.41s/it, est. speed input: 216.31 toks/s, output: 183.01 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.01s/it, est. speed input: 27.44 toks/s, output: 111.63 toks/s] [A
Generating Responses:   8%|▊         | 4/50 [02:59<34:17, 44.72s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.00s/it, est. speed input: 228.08 toks/s, output: 73.70 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.03it/s, est. speed input: 487.02 toks/s, output: 119.18 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.71it/s, est. speed input: 572.39 toks/s, output: 193.36 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:06<00:00,  1.63s/it, est. speed input: 224.22 toks/s, output: 151.98 toks/s][A
Generating Responses:  10%|█         | 5/50 [03:06<23:14, 30.98s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.04it/s, est. speed input: 340.09 toks/s, output: 78.80 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.39it/s, est. speed input: 558.60 toks/s, output: 192.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.29it/s, est. speed input: 338.19 toks/s, output: 177.65 toks/s][A
Generating Responses:  12%|█▏        | 6/50 [03:09<15:46, 21.52s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.39s/it, est. speed input: 154.42 toks/s, output: 80.09 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.50s/it, est. speed input: 179.63 toks/s, output: 122.56 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:02,  2.02s/it, est. speed input: 167.67 toks/s, output: 152.35 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:10<00:00,  2.70s/it, est. speed input: 107.81 toks/s, output: 169.07 toks/s][A
Generating Responses:  14%|█▍        | 7/50 [03:20<12:55, 18.04s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.40s/it, est. speed input: 179.40 toks/s, output: 80.44 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.52it/s, est. speed input: 320.11 toks/s, output: 154.53 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.14it/s, est. speed input: 297.84 toks/s, output: 174.45 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.10it/s, est. speed input: 329.05 toks/s, output: 217.07 toks/s][A
Generating Responses:  16%|█▌        | 8/50 [03:23<09:25, 13.47s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.00it/s, est. speed input: 236.28 toks/s, output: 81.09 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.70it/s, est. speed input: 355.81 toks/s, output: 145.56 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.29it/s, est. speed input: 460.30 toks/s, output: 206.07 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.75it/s, est. speed input: 410.96 toks/s, output: 228.94 toks/s][A
Generating Responses:  18%|█▊        | 9/50 [03:26<06:49, 10.00s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.17s/it, est. speed input: 212.31 toks/s, output: 79.30 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.61it/s, est. speed input: 359.48 toks/s, output: 146.77 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.01it/s, est. speed input: 301.96 toks/s, output: 170.75 toks/s][A
Generating Responses:  20%|██        | 10/50 [03:30<05:26,  8.16s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.31s/it, est. speed input: 179.64 toks/s, output: 81.79 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.75it/s, est. speed input: 451.39 toks/s, output: 265.06 toks/s][A
Generating Responses:  22%|██▏       | 11/50 [03:32<04:08,  6.38s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.05s/it, est. speed input: 230.66 toks/s, output: 77.84 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.20s/it, est. speed input: 272.54 toks/s, output: 119.10 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:09<00:04,  4.10s/it, est. speed input: 88.32 toks/s, output: 118.10 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 28.16 toks/s, output: 119.02 toks/s][A
Generating Responses:  24%|██▍       | 12/50 [04:17<11:20, 17.92s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.24it/s, est. speed input: 314.74 toks/s, output: 70.63 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.37s/it, est. speed input: 117.95 toks/s, output: 97.59 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.40s/it, est. speed input: 27.49 toks/s, output: 188.72 toks/s][A
Generating Responses:  26%|██▌       | 13/50 [05:02<16:14, 26.33s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.17s/it, est. speed input: 205.33 toks/s, output: 77.85 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.01s/it, est. speed input: 226.24 toks/s, output: 126.87 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.37s/it, est. speed input: 28.28 toks/s, output: 185.92 toks/s] [A
Generating Responses:  28%|██▊       | 14/50 [05:48<19:16, 32.13s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.01s/it, est. speed input: 226.22 toks/s, output: 79.37 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.35s/it, est. speed input: 22.60 toks/s, output: 183.99 toks/s][A
Generating Responses:  30%|███       | 15/50 [06:33<21:05, 36.16s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.38s/it, est. speed input: 194.68 toks/s, output: 81.78 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.35it/s, est. speed input: 296.89 toks/s, output: 150.54 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.30it/s, est. speed input: 292.84 toks/s, output: 187.56 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.55it/s, est. speed input: 420.58 toks/s, output: 265.95 toks/s][A
Generating Responses:  32%|███▏      | 16/50 [06:36<14:46, 26.07s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.58it/s, est. speed input: 378.63 toks/s, output: 75.72 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.48it/s, est. speed input: 342.30 toks/s, output: 119.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.00s/it, est. speed input: 242.24 toks/s, output: 159.58 toks/s][A
Generating Responses:  34%|███▍      | 17/50 [06:40<10:42, 19.46s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.05s/it, est. speed input: 306.87 toks/s, output: 80.05 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.96it/s, est. speed input: 459.46 toks/s, output: 152.02 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.89it/s, est. speed input: 456.06 toks/s, output: 188.53 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.31it/s, est. speed input: 344.19 toks/s, output: 196.58 toks/s][A
Generating Responses:  36%|███▌      | 18/50 [06:43<07:45, 14.55s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.30s/it, est. speed input: 220.46 toks/s, output: 80.17 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:04<00:04,  2.18s/it, est. speed input: 166.98 toks/s, output: 111.97 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.15s/it, est. speed input: 152.33 toks/s, output: 161.98 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:09<00:00,  2.29s/it, est. speed input: 136.37 toks/s, output: 199.97 toks/s][A
Generating Responses:  38%|███▊      | 19/50 [06:52<06:41, 12.95s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.14s/it, est. speed input: 197.01 toks/s, output: 80.55 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:08<00:00,  2.13s/it, est. speed input: 125.35 toks/s, output: 125.35 toks/s][A
Generating Responses:  40%|████      | 20/50 [07:01<05:49, 11.64s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.91s/it, est. speed input: 120.52 toks/s, output: 83.84 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.05s/it, est. speed input: 190.52 toks/s, output: 152.76 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:00,  1.14it/s, est. speed input: 264.99 toks/s, output: 205.44 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.13s/it, est. speed input: 225.99 toks/s, output: 225.99 toks/s][A
Generating Responses:  42%|████▏     | 21/50 [07:06<04:36,  9.53s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.05s/it, est. speed input: 227.21 toks/s, output: 77.00 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:05,  2.83s/it, est. speed input: 134.87 toks/s, output: 102.51 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:09<00:03,  3.54s/it, est. speed input: 112.77 toks/s, output: 144.33 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.06s/it, est. speed input: 30.00 toks/s, output: 123.60 toks/s] [A
Generating Responses:  44%|████▍     | 22/50 [07:50<09:19, 19.97s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.49s/it, est. speed input: 97.51 toks/s, output: 84.27 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:05,  2.61s/it, est. speed input: 117.77 toks/s, output: 127.42 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.40s/it, est. speed input: 27.70 toks/s, output: 194.11 toks/s] [A
Generating Responses:  46%|████▌     | 23/50 [08:35<12:27, 27.68s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.32s/it, est. speed input: 176.09 toks/s, output: 81.97 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.23it/s, est. speed input: 289.51 toks/s, output: 172.78 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.03s/it, est. speed input: 250.76 toks/s, output: 198.55 toks/s][A
Generating Responses:  48%|████▊     | 24/50 [08:40<08:56, 20.62s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.95s/it, est. speed input: 122.35 toks/s, output: 83.95 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.02it/s, est. speed input: 250.60 toks/s, output: 157.18 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.38it/s, est. speed input: 302.61 toks/s, output: 218.45 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.38it/s, est. speed input: 383.70 toks/s, output: 288.46 toks/s][A
Generating Responses:  50%|█████     | 25/50 [08:43<06:23, 15.33s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.14it/s, est. speed input: 539.32 toks/s, output: 76.07 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.62it/s, est. speed input: 534.77 toks/s, output: 131.60 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.57it/s, est. speed input: 460.44 toks/s, output: 199.68 toks/s][A
Generating Responses:  52%|█████▏    | 26/50 [08:45<04:36, 11.52s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.14s/it, est. speed input: 284.82 toks/s, output: 80.38 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.15s/it, est. speed input: 251.23 toks/s, output: 125.18 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.11it/s, est. speed input: 303.56 toks/s, output: 185.43 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.99s/it, est. speed input: 25.46 toks/s, output: 105.48 toks/s] [A
Generating Responses:  54%|█████▍    | 27/50 [09:29<08:09, 21.27s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.00s/it, est. speed input: 235.85 toks/s, output: 75.63 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.60it/s, est. speed input: 341.63 toks/s, output: 134.89 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.26it/s, est. speed input: 367.80 toks/s, output: 162.66 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 29.14 toks/s, output: 101.65 toks/s] [A
Generating Responses:  56%|█████▌    | 28/50 [10:13<10:18, 28.13s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.12s/it, est. speed input: 230.17 toks/s, output: 81.50 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.51it/s, est. speed input: 321.60 toks/s, output: 145.74 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.25it/s, est. speed input: 429.22 toks/s, output: 213.70 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.08it/s, est. speed input: 258.95 toks/s, output: 184.93 toks/s][A
Generating Responses:  58%|█████▊    | 29/50 [10:17<07:17, 20.83s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.14s/it, est. speed input: 225.41 toks/s, output: 79.81 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.81it/s, est. speed input: 389.91 toks/s, output: 151.59 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.61s/it, est. speed input: 210.81 toks/s, output: 134.76 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 11.00s/it, est. speed input: 25.07 toks/s, output: 105.84 toks/s] [A
Generating Responses:  60%|██████    | 30/50 [11:01<09:15, 27.80s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.59s/it, est. speed input: 164.90 toks/s, output: 82.14 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:07<00:07,  3.93s/it, est. speed input: 85.46 toks/s, output: 106.13 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.41s/it, est. speed input: 25.67 toks/s, output: 196.11 toks/s][A
Generating Responses:  62%|██████▏   | 31/50 [11:47<10:30, 33.17s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.19it/s, est. speed input: 295.82 toks/s, output: 78.72 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.34s/it, est. speed input: 21.00 toks/s, output: 183.64 toks/s][A
Generating Responses:  64%|██████▍   | 32/50 [12:32<11:03, 36.86s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.01it/s, est. speed input: 301.33 toks/s, output: 78.12 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.01it/s, est. speed input: 476.10 toks/s, output: 147.03 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.01it/s, est. speed input: 324.54 toks/s, output: 148.79 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.98s/it, est. speed input: 26.32 toks/s, output: 102.45 toks/s] [A
Generating Responses:  66%|██████▌   | 33/50 [13:16<11:02, 39.00s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.80s/it, est. speed input: 134.51 toks/s, output: 80.59 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.50s/it, est. speed input: 271.56 toks/s, output: 130.93 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.25s/it, est. speed input: 298.57 toks/s, output: 185.77 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 33.42 toks/s, output: 109.73 toks/s] [A
Generating Responses:  68%|██████▊   | 34/50 [14:01<10:49, 40.57s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.28s/it, est. speed input: 230.02 toks/s, output: 79.53 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.72s/it, est. speed input: 174.19 toks/s, output: 116.43 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:00,  1.00it/s, est. speed input: 291.80 toks/s, output: 197.34 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:06<00:00,  1.71s/it, est. speed input: 180.69 toks/s, output: 189.79 toks/s][A
Generating Responses:  70%|███████   | 35/50 [14:08<07:37, 30.47s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.15it/s, est. speed input: 264.55 toks/s, output: 77.06 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.91it/s, est. speed input: 421.29 toks/s, output: 138.40 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.19it/s, est. speed input: 307.31 toks/s, output: 152.81 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.98s/it, est. speed input: 24.55 toks/s, output: 101.53 toks/s] [A
Generating Responses:  72%|███████▏  | 36/50 [14:52<08:03, 34.52s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.01s/it, est. speed input: 233.35 toks/s, output: 77.12 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:04,  2.09s/it, est. speed input: 118.57 toks/s, output: 106.40 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:04<00:01,  1.25s/it, est. speed input: 208.48 toks/s, output: 186.37 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.36s/it, est. speed input: 229.67 toks/s, output: 229.12 toks/s][A
Generating Responses:  74%|███████▍  | 37/50 [14:57<05:35, 25.82s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.04s/it, est. speed input: 267.24 toks/s, output: 78.54 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.23it/s, est. speed input: 301.93 toks/s, output: 130.07 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.98s/it, est. speed input: 22.32 toks/s, output: 101.66 toks/s] [A
Generating Responses:  76%|███████▌  | 38/50 [15:41<06:15, 31.27s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.51it/s, est. speed input: 343.53 toks/s, output: 72.63 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:05<00:05,  2.85s/it, est. speed input: 97.62 toks/s, output: 97.02 toks/s] [A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.26s/it, est. speed input: 138.18 toks/s, output: 162.76 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:09<00:00,  2.33s/it, est. speed input: 122.61 toks/s, output: 205.21 toks/s][A
Generating Responses:  78%|███████▊  | 39/50 [15:50<04:31, 24.71s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.48s/it, est. speed input: 178.78 toks/s, output: 81.26 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.17s/it, est. speed input: 255.75 toks/s, output: 133.61 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.19it/s, est. speed input: 311.13 toks/s, output: 198.62 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.01s/it, est. speed input: 308.97 toks/s, output: 229.56 toks/s][A
Generating Responses:  80%|████████  | 40/50 [15:55<03:05, 18.53s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.27s/it, est. speed input: 171.46 toks/s, output: 79.44 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.71it/s, est. speed input: 346.34 toks/s, output: 153.52 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.67s/it, est. speed input: 119.40 toks/s, output: 121.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.05s/it, est. speed input: 28.94 toks/s, output: 110.66 toks/s] [A
Generating Responses:  82%|████████▏ | 41/50 [16:39<03:56, 26.25s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.68it/s, est. speed input: 408.52 toks/s, output: 72.29 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.16it/s, est. speed input: 313.72 toks/s, output: 124.69 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.00s/it, est. speed input: 262.18 toks/s, output: 167.64 toks/s][A
Generating Responses:  84%|████████▍ | 42/50 [16:43<02:36, 19.60s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.94s/it, est. speed input: 151.22 toks/s, output: 82.58 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.19it/s, est. speed input: 364.16 toks/s, output: 198.44 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:43<00:00, 10.99s/it, est. speed input: 28.89 toks/s, output: 106.00 toks/s] [A
Generating Responses:  86%|████████▌ | 43/50 [17:27<03:08, 26.93s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.77s/it, est. speed input: 121.07 toks/s, output: 84.93 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.25s/it, est. speed input: 220.65 toks/s, output: 164.56 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.02s/it, est. speed input: 25.25 toks/s, output: 109.86 toks/s] [A
Generating Responses:  88%|████████▊ | 44/50 [18:11<03:12, 32.10s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:10,  3.37s/it, est. speed input: 68.25 toks/s, output: 84.87 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:06<00:06,  3.44s/it, est. speed input: 79.94 toks/s, output: 127.49 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.52s/it, est. speed input: 26.94 toks/s, output: 196.67 toks/s][A
Generating Responses:  90%|█████████ | 45/50 [18:57<03:01, 36.32s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.03s/it, est. speed input: 146.87 toks/s, output: 82.80 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.02it/s, est. speed input: 252.31 toks/s, output: 157.09 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.24s/it, est. speed input: 261.49 toks/s, output: 179.73 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 28.95 toks/s, output: 108.38 toks/s] [A
Generating Responses:  92%|█████████▏| 46/50 [19:41<02:34, 38.69s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.09s/it, est. speed input: 262.65 toks/s, output: 77.41 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.45it/s, est. speed input: 358.62 toks/s, output: 136.65 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.41s/it, est. speed input: 28.91 toks/s, output: 183.92 toks/s] [A
Generating Responses:  94%|█████████▍| 47/50 [20:27<02:02, 40.80s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:03<00:10,  3.52s/it, est. speed input: 122.42 toks/s, output: 84.08 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:03,  1.60s/it, est. speed input: 193.66 toks/s, output: 162.89 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.43s/it, est. speed input: 32.90 toks/s, output: 192.64 toks/s] [A
Generating Responses:  96%|█████████▌| 48/50 [21:13<01:24, 42.29s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.34s/it, est. speed input: 192.29 toks/s, output: 74.07 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.41it/s, est. speed input: 300.48 toks/s, output: 138.11 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.78it/s, est. speed input: 836.76 toks/s, output: 190.74 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.78it/s, est. speed input: 905.34 toks/s, output: 250.56 toks/s][A
Generating Responses:  98%|█████████▊| 49/50 [21:15<00:30, 30.30s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.33it/s, est. speed input: 342.74 toks/s, output: 74.39 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:00,  2.07it/s, est. speed input: 472.13 toks/s, output: 131.62 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:01,  1.02s/it, est. speed input: 289.49 toks/s, output: 137.52 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.30it/s, est. speed input: 380.68 toks/s, output: 207.91 toks/s][A
Generating Responses: 100%|██████████| 50/50 [21:18<00:00, 25.58s/it]


Example Generated Text (raw):
 [Option Letter]

The answer is D. It increases diffusion of H+ from the intermembrane space to the matrix.

Wait, but I'm a bit confused. Let me think again. Mitochondria are known to produce ATP through the electron transport chain. The process involves a series of proton pumping steps. The mitochondrial matrix is where the electrons are generated, and the intermembrane space is where protons are pumped from the matrix to the matrix (or from the matrix to the intermembrane space? Wait, no, the protons are pumped from the matrix to the intermembrane space, which is then used to generate ATP.

So, if the mitochondrial matrix is in a buffered solution with a low pH, what happens? Low pH means lower H+ concentration. The mitochondrial matrix is rich in protons, so if the pH is low, the concentration of H+ is lower. This would mean that the protons in the intermembrane space are being pumped into the matrix, which is the mitochondrial matrix. Wait, no, the i




In [39]:
print("\n--- Extracting Predictions ---")
predictions = [extract_choice_mcq(text) for text in tqdm(all_outputs_text, desc="Extracting choices")]
num_invalid_responces = predictions.count(None)
print(f"\n------------------------------\nNumber of invalid responces: {num_invalid_responces}")

if len(predictions) > 0:
    print("\nExample Extracted Prediction:")
    print(predictions[0])


--- Extracting Predictions ---


Extracting choices: 100%|██████████| 200/200 [00:00<00:00, 8301.11it/s]


The patient has irregular menses, which is a common condition known as polycystic o...egular menses are a result of the irregular ovulation. The patient's irregular ovulation is a result'

The patient's symptoms and findings are as follows:

1. The patient is a 14-year-ol...r syndrome, hyperprolactinemia, or hyperestrogenism.

83. Benign hyperplasia is a result of hormonal'

The patient has a 31-year-old woman with a history of dyspnea and wheezing, which i...ion. The feral heart rate is 144/min, which is consistent with mitral regurgitation. The feral heart'

The patient is a 17-year-old girl with a stung bee. She's in the emergency departme...hin normal range, so the primary concern is the underlying cause of the wheezing. The correct answer'

The user is asking about the next step in managing this patient. The patient is 18,...e heart sounds are not audible, which is a key finding, so the next step is to perform a chest X-ray'

The patient has a 31-year-old woman with a history of 




In [40]:
print("\n--- Calculating Metrics ---")
correct_count = 0
total_count = len(predictions)
results_by_subject = {}

if total_count != len(ground_truths):
     print(f"Warning: Mismatch between number of predictions ({total_count}) and ground truths ({len(ground_truths)}). This should not happen.")
     total_count = min(total_count, len(ground_truths))

for i in range(total_count):
    original_data_index = original_indices[i] if 'original_indices' in locals() else i
    data_item = ds_mmlu[original_data_index]
    subject = data_item.get('subject', 'Unknown')

    pred = predictions[i]
    truth = ground_truths[i]
    is_correct = (pred == truth)

    if subject not in results_by_subject:
        results_by_subject[subject] = {'correct': 0, 'total': 0}

    if is_correct:
        correct_count += 1
        results_by_subject[subject]['correct'] += 1
    results_by_subject[subject]['total'] += 1

overall_accuracy = (correct_count / total_count) * 100 if total_count > 0 else 0


--- Calculating Metrics ---


In [41]:
print("\n--- Evaluation Results ---")
print(f"Model Evaluated: {MODEL_NAME}")
print(f"Dataset Used: {DATASET_MMLU}")
print(f"Number of Questions Evaluated: {total_count}")
print(f"Number of Correct Answers: {correct_count}")
print(f"Overall Accuracy: {overall_accuracy:.2f}%")

print("\nAccuracy by Subject:")
sorted_subjects = sorted(results_by_subject.keys())
for subject in sorted_subjects:
    counts = results_by_subject[subject]
    sub_acc = (counts['correct'] / counts['total']) * 100 if counts['total'] > 0 else 0
    print(f"- {subject}: {sub_acc:.2f}% ({counts['correct']}/{counts['total']})")


--- Evaluation Results ---
Model Evaluated: RAG
Dataset Used: cais/mmlu
Number of Questions Evaluated: 200
Number of Correct Answers: 61
Overall Accuracy: 30.50%

Accuracy by Subject:
- anatomy: 22.58% (7/31)
- clinical_knowledge: 24.14% (7/29)
- college_biology: 25.93% (7/27)
- college_medicine: 40.00% (12/30)
- medical_genetics: 39.29% (11/28)
- professional_medicine: 30.91% (17/55)


### 4. PubMedQA

#### Dataset loading and preparing

In [42]:
SEED = 4242
BATCH_SIZE = 4
NUM_SAMPLES = 200
DATASET_PUBMEDQA = "qiaojin/PubMedQA"
SUBSET_PUBMEDQA = "pqa_labeled"
SPLIT_PUBMEDQA = "train"

In [43]:
ds_pubmedqa = load_dataset(DATASET_PUBMEDQA, SUBSET_PUBMEDQA, split=SPLIT_PUBMEDQA)
ds_pubmedqa = ds_pubmedqa.shuffle(seed=SEED).select(range(NUM_SAMPLES))
ds_pubmedqa

README.md:   0%|          | 0.00/5.19k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/1.08M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Dataset({
    features: ['pubid', 'question', 'context', 'long_answer', 'final_decision'],
    num_rows: 200
})

In [44]:
ds_pubmedqa[0]

{'pubid': 22504515,
 'question': 'Endovenous laser ablation in the treatment of small saphenous varicose veins: does site of access influence early outcomes?',
 'context': {'contexts': ['The study was performed to evaluate the clinical and technical efficacy of endovenous laser ablation (EVLA) of small saphenous varicosities, particularly in relation to the site of endovenous access.',
   'Totally 59 patients with unilateral saphenopopliteal junction incompetence and small saphenous vein reflux underwent EVLA (810 nm, 14 W diode laser) with ambulatory phlebectomies. Small saphenous vein access was gained at the lowest site of truncal reflux. Patients were divided into 2 groups: access gained above mid-calf (AMC, n = 33) and below mid-calf (BMC, n = 26) levels. Outcomes included Venous Clinical Severity Scores (VCSS), Aberdeen Varicose Vein Questionnaire (AVVQ), patient satisfaction, complications, and recurrence rates.',
   'Both groups demonstrated significant improvement in VCSS, AVV

#### Helper functions definition

In [45]:
def format_prompt_pubmedqa(example):
    """Formats a single example into a prompt for the LLM."""
    question = example['question']
    if not isinstance(example.get('context'), dict) or 'contexts' not in example['context']:
        print(f"Warning: Skipping example due to missing or invalid context field.")
        return None

    context_passages = example['context']['contexts']
    full_context = "\n\n".join(context_passages)

    prompt = f"""
You are an expert in analyzing scientific texts and answering questions based on provided context and explaining your reasoning clearly.
Your task is to determine the answer to the question ('yes', 'no', or 'maybe') based only on the information given in the context. Follow these steps:
1. Analyze the provided context in relation to the question. Summarize the key evidence (or lack thereof) relevant to answering the question. This is your reasoning.
2. Based on your reasoning from the context, determine if the answer to the question is 'yes', 'no', or 'maybe'.
3. Output your reasoning first. After the reasoning, start a new line and provide the final decision in the specific format: Answer: [yes/no/maybe]

Context:
{full_context}

Question: {question}

Reasoning:
    """
    return prompt

In [46]:
def get_ground_truth_pubmedqa(example):
    """Extracts the ground truth ('yes', 'no', 'maybe') from the example."""
    decision = example.get('final_decision')
    if decision not in ['yes', 'no', 'maybe']:
        print(f"Warning: Invalid 'final_decision' value found: {decision}. Skipping ground truth.")
        return None
    return decision

In [47]:
def extract_yes_no_maybe(generated_text):
    """Extracts the predicted choice (yes, no, maybe) from the LLM's output."""
    text = generated_text.strip().lower()

    # Explicit "Answer: yes/no/maybe" potentially followed by punctuation/eos
    match = re.search(r'(?:answer|decision)\s*[:\-]?\s*(yes|no|maybe)\b', text)
    if match:
        return match.group(1)

    # Look for the first occurrence of "yes", "no", or "maybe" as a whole word
    match = re.search(r'\b(yes|no|maybe)\b', text)
    if match:
        return match.group(1)

    # Fallback - If no clear choice found, return None
    print(f"Warning: Could not extract answer from text: '{text[:100]}...{text[-100:]}'")
    return None

#### Evaluation

In [48]:
print("\n--- Preparing Prompts and Ground Truths ---")
prompts = []
ground_truths_raw = []
original_indices_map = []

for i, ex in enumerate(tqdm(ds_pubmedqa, desc="Formatting prompts")):
    prompt = format_prompt_pubmedqa(ex)
    if prompt:
        prompts.append(prompt)
        ground_truths_raw.append(get_ground_truth_pubmedqa(ex))
        original_indices_map.append(i)

valid_indices = [i for i, gt in enumerate(ground_truths_raw) if gt is not None]

if len(valid_indices) < len(prompts):
     invalid_gt_count = len(prompts) - len(valid_indices)
     print(f"Warning: {invalid_gt_count} examples had invalid ground truths and were excluded.")
     prompts = [prompts[i] for i in valid_indices]
     ground_truths = [ground_truths_raw[i] for i in valid_indices]
     original_indices = [original_indices_map[i] for i in valid_indices]
else:
    ground_truths = ground_truths_raw
    original_indices = original_indices_map

if len(prompts) > 0:
    print("\nExample Prompt:")
    print(prompts[0])
    print(f"Corresponding Ground Truth: {ground_truths[0]}")
else:
    print("No valid prompts to evaluate.")
    exit()


--- Preparing Prompts and Ground Truths ---


Formatting prompts: 100%|██████████| 200/200 [00:00<00:00, 6346.16it/s]


Example Prompt:

You are an expert in analyzing scientific texts and answering questions based on provided context and explaining your reasoning clearly.
Your task is to determine the answer to the question ('yes', 'no', or 'maybe') based only on the information given in the context. Follow these steps:
1. Analyze the provided context in relation to the question. Summarize the key evidence (or lack thereof) relevant to answering the question. This is your reasoning.
2. Based on your reasoning from the context, determine if the answer to the question is 'yes', 'no', or 'maybe'.
3. Output your reasoning first. After the reasoning, start a new line and provide the final decision in the specific format: Answer: [yes/no/maybe]

Context:
The study was performed to evaluate the clinical and technical efficacy of endovenous laser ablation (EVLA) of small saphenous varicosities, particularly in relation to the site of endovenous access.

Totally 59 patients with unilateral saphenopopliteal jun




In [49]:
print("\n--- Running Inference ---")
all_outputs_text = []
num_batches = math.ceil(len(prompts) / BATCH_SIZE)

for i in tqdm(range(num_batches), desc="Generating Responses"):
    start_idx = i * BATCH_SIZE
    end_idx = min((i + 1) * BATCH_SIZE, len(prompts))
    batch_prompts = prompts[start_idx:end_idx]
    batch_outputs_text = rag.get_response(batch_prompts, top_k=2)
    all_outputs_text.extend(batch_outputs_text)

if len(all_outputs_text) > 0:
    print("\nExample Generated Text (raw):")
    print(all_outputs_text[0])


--- Running Inference ---


Generating Responses:   0%|          | 0/50 [00:00<?, ?it/s]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.51s/it, est. speed input: 864.91 toks/s, output: 60.83 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.13it/s, est. speed input: 1224.60 toks/s, output: 114.07 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.24s/it, est. speed input: 776.66 toks/s, output: 155.61 toks/s] [A
Generating Responses:   2%|▏         | 1/50 [00:05<04:08,  5.07s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.24s/it, est. speed input: 727.56 toks/s, output: 57.20 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.32it/s, est. speed input: 1203.20 toks/s, output: 107.95 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.97it/s, est. speed input: 1816.75 toks/s, output: 164.28 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.47s/it, est. speed input: 684.51 toks/s, output: 137.89 toks/s] [A
Generating Responses:   4%|▍         | 2/50 [00:11<04:29,  5.61s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.53s/it, est. speed input: 462.28 toks/s, output: 69.50 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.54it/s, est. speed input: 993.70 toks/s, output: 174.19 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 67.87 toks/s, output: 101.22 toks/s] [A
Generating Responses:   6%|▌         | 3/50 [00:55<18:15, 23.31s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.49s/it, est. speed input: 673.29 toks/s, output: 63.29 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.49it/s, est. speed input: 1301.09 toks/s, output: 124.00 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.29it/s, est. speed input: 1753.23 toks/s, output: 180.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.12s/it, est. speed input: 87.83 toks/s, output: 99.17 toks/s]   [A
Generating Responses:   8%|▊         | 4/50 [01:40<24:18, 31.71s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:49<00:00, 12.41s/it, est. speed input: 70.74 toks/s, output: 330.02 toks/s][A
Generating Responses:  10%|█         | 5/50 [02:29<28:39, 38.21s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.29s/it, est. speed input: 962.86 toks/s, output: 58.97 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.46s/it, est. speed input: 755.30 toks/s, output: 102.31 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.72s/it, est. speed input: 83.72 toks/s, output: 181.03 toks/s] [A
Generating Responses:  12%|█▏        | 6/50 [03:16<30:12, 41.19s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.09it/s, est. speed input: 897.19 toks/s, output: 53.48 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.73it/s, est. speed input: 1181.31 toks/s, output: 101.89 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.70it/s, est. speed input: 1705.20 toks/s, output: 157.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 77.35 toks/s, output: 97.20 toks/s]   [A
Generating Responses:  14%|█▍        | 7/50 [04:01<30:17, 42.26s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.28it/s, est. speed input: 1333.26 toks/s, output: 37.25 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.15it/s, est. speed input: 1257.46 toks/s, output: 82.78 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.43it/s, est. speed input: 1569.47 toks/s, output: 134.60 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 95.08 toks/s, output: 99.05 toks/s]   [A
Generating Responses:  16%|█▌        | 8/50 [04:45<30:04, 42.96s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.43s/it, est. speed input: 682.18 toks/s, output: 63.54 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.33it/s, est. speed input: 1297.56 toks/s, output: 120.20 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.87it/s, est. speed input: 1559.97 toks/s, output: 173.83 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.10s/it, est. speed input: 85.06 toks/s, output: 100.01 toks/s]  [A
Generating Responses:  18%|█▊        | 9/50 [05:30<29:41, 43.44s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:07,  2.40s/it, est. speed input: 590.20 toks/s, output: 67.57 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:14<00:15,  7.88s/it, est. speed input: 233.39 toks/s, output: 96.09 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.69s/it, est. speed input: 109.20 toks/s, output: 204.29 toks/s][A
Generating Responses:  20%|██        | 10/50 [06:17<29:39, 44.49s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.86s/it, est. speed input: 316.18 toks/s, output: 61.94 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.02it/s, est. speed input: 1033.47 toks/s, output: 117.48 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.21s/it, est. speed input: 110.97 toks/s, output: 100.47 toks/s] [A
Generating Responses:  22%|██▏       | 11/50 [07:01<29:00, 44.63s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.36s/it, est. speed input: 1387.59 toks/s, output: 50.03 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.02it/s, est. speed input: 1292.49 toks/s, output: 95.65 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.26it/s, est. speed input: 1593.17 toks/s, output: 144.90 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.06it/s, est. speed input: 1417.62 toks/s, output: 177.80 toks/s][A
Generating Responses:  24%|██▍       | 12/50 [07:05<20:24, 32.23s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.74s/it, est. speed input: 418.03 toks/s, output: 75.22 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.78it/s, est. speed input: 957.05 toks/s, output: 206.30 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.04s/it, est. speed input: 55.90 toks/s, output: 102.32 toks/s] [A
Generating Responses:  26%|██▌       | 13/50 [07:50<22:07, 35.87s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.83s/it, est. speed input: 565.28 toks/s, output: 68.20 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.05s/it, est. speed input: 844.34 toks/s, output: 126.69 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:06<00:02,  2.62s/it, est. speed input: 426.22 toks/s, output: 128.17 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.19s/it, est. speed input: 85.71 toks/s, output: 111.05 toks/s] [A
Generating Responses:  28%|██▊       | 14/50 [08:34<23:08, 38.58s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.65s/it, est. speed input: 543.29 toks/s, output: 63.52 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.25s/it, est. speed input: 721.89 toks/s, output: 134.51 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.13s/it, est. speed input: 94.37 toks/s, output: 103.74 toks/s] [A
Generating Responses:  30%|███       | 15/50 [09:19<23:33, 40.39s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.18s/it, est. speed input: 1022.74 toks/s, output: 58.61 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.39it/s, est. speed input: 1289.03 toks/s, output: 110.38 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.26it/s, est. speed input: 1046.95 toks/s, output: 145.74 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.38s/it, est. speed input: 652.80 toks/s, output: 151.08 toks/s] [A
Generating Responses:  32%|███▏      | 16/50 [09:25<16:57, 29.92s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.30s/it, est. speed input: 595.05 toks/s, output: 67.83 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.45it/s, est. speed input: 1046.94 toks/s, output: 127.02 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.11s/it, est. speed input: 699.17 toks/s, output: 144.13 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 64.75 toks/s, output: 102.66 toks/s] [A
Generating Responses:  34%|███▍      | 17/50 [10:09<18:51, 34.28s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.62s/it, est. speed input: 770.65 toks/s, output: 59.19 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.22it/s, est. speed input: 1160.93 toks/s, output: 113.48 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.14s/it, est. speed input: 872.97 toks/s, output: 137.51 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.17s/it, est. speed input: 108.23 toks/s, output: 102.12 toks/s][A
Generating Responses:  36%|███▌      | 18/50 [10:54<19:58, 37.44s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.39s/it, est. speed input: 1111.29 toks/s, output: 64.06 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.39it/s, est. speed input: 1969.57 toks/s, output: 183.51 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.05it/s, est. speed input: 1850.97 toks/s, output: 217.21 toks/s][A
Generating Responses:  38%|███▊      | 19/50 [10:56<13:51, 26.81s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:08,  2.84s/it, est. speed input: 223.03 toks/s, output: 73.88 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:03<00:02,  1.40s/it, est. speed input: 639.24 toks/s, output: 140.85 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:00,  1.09it/s, est. speed input: 979.13 toks/s, output: 204.00 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.02s/it, est. speed input: 1010.45 toks/s, output: 257.33 toks/s][A
Generating Responses:  40%|████      | 20/50 [11:00<10:00, 20.01s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.11s/it, est. speed input: 1248.86 toks/s, output: 52.93 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.46it/s, est. speed input: 1548.68 toks/s, output: 102.13 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.75it/s, est. speed input: 1590.02 toks/s, output: 147.08 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.12s/it, est. speed input: 93.81 toks/s, output: 98.49 toks/s]   [A
Generating Responses:  42%|████▏     | 21/50 [11:45<13:14, 27.39s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.31s/it, est. speed input: 710.90 toks/s, output: 63.24 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.44it/s, est. speed input: 1090.17 toks/s, output: 120.63 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.08s/it, est. speed input: 77.04 toks/s, output: 99.15 toks/s]   [A
Generating Responses:  44%|████▍     | 22/50 [12:29<15:09, 32.50s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.19s/it, est. speed input: 573.94 toks/s, output: 57.31 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.48it/s, est. speed input: 1140.93 toks/s, output: 109.64 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.58it/s, est. speed input: 1500.88 toks/s, output: 150.37 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.07s/it, est. speed input: 84.71 toks/s, output: 99.62 toks/s]   [A
Generating Responses:  46%|████▌     | 23/50 [13:13<16:13, 36.06s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.06s/it, est. speed input: 821.42 toks/s, output: 43.58 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.42it/s, est. speed input: 1546.30 toks/s, output: 88.28 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.39it/s, est. speed input: 1566.85 toks/s, output: 127.50 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.27it/s, est. speed input: 1546.87 toks/s, output: 166.22 toks/s][A
Generating Responses:  48%|████▊     | 24/50 [13:17<11:21, 26.21s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.55s/it, est. speed input: 736.41 toks/s, output: 64.06 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.36it/s, est. speed input: 1016.72 toks/s, output: 124.61 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.51s/it, est. speed input: 71.34 toks/s, output: 182.66 toks/s]  [A
Generating Responses:  50%|█████     | 25/50 [14:03<13:24, 32.18s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.37s/it, est. speed input: 460.62 toks/s, output: 65.59 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.45it/s, est. speed input: 871.00 toks/s, output: 125.33 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.45s/it, est. speed input: 657.56 toks/s, output: 132.32 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:04<00:00,  1.08s/it, est. speed input: 765.63 toks/s, output: 203.87 toks/s][A
Generating Responses:  52%|█████▏    | 26/50 [14:07<09:32, 23.85s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.29s/it, est. speed input: 750.19 toks/s, output: 58.72 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.38it/s, est. speed input: 929.84 toks/s, output: 111.46 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.55it/s, est. speed input: 1423.85 toks/s, output: 153.97 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:06<00:00,  1.58s/it, est. speed input: 618.63 toks/s, output: 139.14 toks/s] [A
Generating Responses:  54%|█████▍    | 27/50 [14:14<07:08, 18.62s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.59s/it, est. speed input: 549.39 toks/s, output: 65.60 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.28it/s, est. speed input: 1062.89 toks/s, output: 125.44 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.24it/s, est. speed input: 1158.99 toks/s, output: 160.71 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.13s/it, est. speed input: 87.36 toks/s, output: 101.58 toks/s]  [A
Generating Responses:  56%|█████▌    | 28/50 [14:58<09:41, 26.41s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.16s/it, est. speed input: 601.51 toks/s, output: 48.98 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.03it/s, est. speed input: 681.06 toks/s, output: 93.44 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.47it/s, est. speed input: 1348.95 toks/s, output: 149.55 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.03it/s, est. speed input: 1199.26 toks/s, output: 168.50 toks/s][A
Generating Responses:  58%|█████▊    | 29/50 [15:02<06:53, 19.69s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.43s/it, est. speed input: 1032.10 toks/s, output: 60.87 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.30it/s, est. speed input: 1466.27 toks/s, output: 115.84 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.52s/it, est. speed input: 85.44 toks/s, output: 182.14 toks/s]  [A
Generating Responses:  60%|██████    | 30/50 [15:48<09:12, 27.63s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.77s/it, est. speed input: 435.02 toks/s, output: 65.45 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.16it/s, est. speed input: 874.34 toks/s, output: 126.48 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.81it/s, est. speed input: 1807.92 toks/s, output: 247.42 toks/s][A
Generating Responses:  62%|██████▏   | 31/50 [15:51<06:20, 20.04s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.54s/it, est. speed input: 881.61 toks/s, output: 61.54 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.03it/s, est. speed input: 1017.24 toks/s, output: 114.71 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.58s/it, est. speed input: 90.35 toks/s, output: 182.03 toks/s]  [A
Generating Responses:  64%|██████▍   | 32/50 [16:37<08:23, 27.95s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.13s/it, est. speed input: 626.05 toks/s, output: 62.07 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.31it/s, est. speed input: 889.75 toks/s, output: 113.13 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.17it/s, est. speed input: 1376.26 toks/s, output: 177.36 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.05it/s, est. speed input: 1592.20 toks/s, output: 230.89 toks/s][A
Generating Responses:  66%|██████▌   | 33/50 [16:39<05:43, 20.18s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.26s/it, est. speed input: 478.15 toks/s, output: 63.65 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.17it/s, est. speed input: 825.60 toks/s, output: 115.13 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.12it/s, est. speed input: 766.71 toks/s, output: 154.79 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.10s/it, est. speed input: 72.58 toks/s, output: 101.85 toks/s] [A
Generating Responses:  68%|██████▊   | 34/50 [17:24<07:19, 27.48s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.56s/it, est. speed input: 432.93 toks/s, output: 71.83 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.37it/s, est. speed input: 626.48 toks/s, output: 139.35 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  2.16it/s, est. speed input: 968.57 toks/s, output: 203.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 60.68 toks/s, output: 100.86 toks/s] [A
Generating Responses:  70%|███████   | 35/50 [18:08<08:08, 32.57s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.01s/it, est. speed input: 503.76 toks/s, output: 49.39 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.82it/s, est. speed input: 1427.58 toks/s, output: 96.09 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.77it/s, est. speed input: 1731.65 toks/s, output: 132.44 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.10s/it, est. speed input: 86.93 toks/s, output: 97.72 toks/s]   [A
Generating Responses:  72%|███████▏  | 36/50 [18:53<08:25, 36.14s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.84it/s, est. speed input: 937.36 toks/s, output: 34.99 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.59it/s, est. speed input: 877.49 toks/s, output: 81.25 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:01<00:00,  2.33it/s, est. speed input: 1767.46 toks/s, output: 201.04 toks/s][A
Generating Responses:  74%|███████▍  | 37/50 [18:54<05:35, 25.84s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:01,  1.57it/s, est. speed input: 1299.69 toks/s, output: 39.29 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.07it/s, est. speed input: 747.92 toks/s, output: 85.97 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  2.00it/s, est. speed input: 1635.99 toks/s, output: 221.59 toks/s][A
Generating Responses:  76%|███████▌  | 38/50 [18:56<03:44, 18.72s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.44s/it, est. speed input: 374.65 toks/s, output: 65.22 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:02,  1.47s/it, est. speed input: 472.71 toks/s, output: 109.38 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.56s/it, est. speed input: 74.98 toks/s, output: 184.08 toks/s] [A
Generating Responses:  78%|███████▊  | 39/50 [19:43<04:57, 27.00s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:00<00:02,  1.32it/s, est. speed input: 666.84 toks/s, output: 48.76 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.21it/s, est. speed input: 1021.04 toks/s, output: 93.54 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:45<00:00, 11.48s/it, est. speed input: 68.30 toks/s, output: 181.70 toks/s] [A
Generating Responses:  80%|████████  | 40/50 [20:29<05:27, 32.71s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:02<00:06,  2.19s/it, est. speed input: 251.59 toks/s, output: 72.60 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:48<00:00, 12.09s/it, est. speed input: 73.17 toks/s, output: 257.35 toks/s][A
Generating Responses:  82%|████████▏ | 41/50 [21:17<05:36, 37.43s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.88s/it, est. speed input: 660.08 toks/s, output: 68.19 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.13it/s, est. speed input: 929.72 toks/s, output: 131.99 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:46<00:00, 11.53s/it, est. speed input: 80.99 toks/s, output: 183.50 toks/s] [A
Generating Responses:  84%|████████▍ | 42/50 [22:03<05:20, 40.07s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.28s/it, est. speed input: 667.64 toks/s, output: 61.76 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:01<00:00,  1.94it/s, est. speed input: 1429.55 toks/s, output: 162.78 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.09s/it, est. speed input: 81.82 toks/s, output: 98.87 toks/s]   [A
Generating Responses:  86%|████████▌ | 43/50 [22:48<04:49, 41.38s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.29s/it, est. speed input: 588.54 toks/s, output: 60.32 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.25it/s, est. speed input: 1269.98 toks/s, output: 112.39 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:05<00:00,  1.27s/it, est. speed input: 733.43 toks/s, output: 146.21 toks/s] [A
Generating Responses:  88%|████████▊ | 44/50 [22:53<03:03, 30.52s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.54s/it, est. speed input: 543.97 toks/s, output: 60.87 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.25it/s, est. speed input: 1379.80 toks/s, output: 117.03 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.52it/s, est. speed input: 1693.22 toks/s, output: 201.78 toks/s][A
Generating Responses:  90%|█████████ | 45/50 [22:56<01:50, 22.18s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.85s/it, est. speed input: 663.15 toks/s, output: 63.88 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.03s/it, est. speed input: 1036.99 toks/s, output: 147.37 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.15s/it, est. speed input: 105.36 toks/s, output: 102.84 toks/s] [A
Generating Responses:  92%|█████████▏| 46/50 [23:40<01:55, 28.93s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:05,  1.79s/it, est. speed input: 547.95 toks/s, output: 65.35 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:02<00:01,  1.02it/s, est. speed input: 1251.56 toks/s, output: 121.79 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:05<00:02,  2.10s/it, est. speed input: 593.55 toks/s, output: 130.14 toks/s] [A
Processed prompts: 100%|██████████| 4/4 [00:44<00:00, 11.15s/it, est. speed input: 96.12 toks/s, output: 108.30 toks/s] [A
Generating Responses:  94%|█████████▍| 47/50 [24:25<01:40, 33.66s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:03,  1.28s/it, est. speed input: 414.76 toks/s, output: 66.39 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.32it/s, est. speed input: 752.57 toks/s, output: 122.94 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:03<00:01,  1.14s/it, est. speed input: 645.70 toks/s, output: 142.60 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.15it/s, est. speed input: 838.11 toks/s, output: 213.96 toks/s][A
Generating Responses:  96%|█████████▌| 48/50 [24:29<00:49, 24.63s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.35s/it, est. speed input: 812.27 toks/s, output: 59.40 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.26it/s, est. speed input: 1251.15 toks/s, output: 111.72 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.59it/s, est. speed input: 1580.00 toks/s, output: 161.06 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:03<00:00,  1.17it/s, est. speed input: 1182.10 toks/s, output: 182.44 toks/s][A
Generating Responses:  98%|█████████▊| 49/50 [24:32<00:18, 18.30s/it]

Processing batch of 4 queries...
Querying DB for 4 queries...



Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][A
Processed prompts:  25%|██▌       | 1/4 [00:01<00:04,  1.47s/it, est. speed input: 872.77 toks/s, output: 61.95 toks/s][A
Processed prompts:  50%|█████     | 2/4 [00:01<00:01,  1.14it/s, est. speed input: 1051.61 toks/s, output: 115.41 toks/s][A
Processed prompts:  75%|███████▌  | 3/4 [00:02<00:00,  1.55it/s, est. speed input: 1414.20 toks/s, output: 167.96 toks/s][A
Processed prompts: 100%|██████████| 4/4 [00:02<00:00,  1.48it/s, est. speed input: 1541.69 toks/s, output: 217.91 toks/s][A
Generating Responses: 100%|██████████| 50/50 [24:35<00:00, 29.51s/it]


Example Generated Text (raw):
 

Answer: 

Now, let's proceed to the step-by-step analysis.
</think>

The study evaluated the efficacy of endovenous laser ablation (EVLA) in small saphenous varicosities, focusing on the site of access. Patients were divided into two groups based on the access level: above mid-calf (AMC, n=33) and below mid-calf (BMC, n=26). Both groups showed significant improvement in clinical and quality of life metrics, with no differences in complications or recurrence rates between the groups. This suggests that the site of access may influence early outcomes.

Answer: Maybe





In [50]:
print("\n--- Extracting Predictions ---")
predictions = [extract_yes_no_maybe(text) for text in tqdm(all_outputs_text, desc="Extracting choices")]
num_invalid_responсes = predictions.count(None)
print(f"\n------------------------------\nNumber of invalid responces: {num_invalid_responсes}")

if len(predictions) > 0:
    print("\nExample Extracted Prediction:")
    print(predictions[0])


--- Extracting Predictions ---


Extracting choices: 100%|██████████| 200/200 [00:00<00:00, 12662.81it/s]


---

pharmacology_katzung. davis mp, walsh d: methadone for r...id poisoning. [14] drug screening is much more useful in screening for occult opioid use in settings'

------------------------------
Number of invalid responces: 1

Example Extracted Prediction:
maybe





In [51]:
print("\n--- Calculating Metrics ---")
correct_count = 0
total_count = len(predictions)
results_by_subject = {}

if total_count != len(ground_truths):
     print(f"Warning: Mismatch between number of predictions ({total_count}) and ground truths ({len(ground_truths)}). This should not happen.")
     total_count = min(total_count, len(ground_truths))

for i in range(total_count):
    original_data_index = original_indices[i] if 'original_indices' in locals() else i
    data_item = ds_pubmedqa[original_data_index]
    subject = data_item.get('subject_name', 'Unknown')

    pred = predictions[i]
    truth = ground_truths[i]
    is_correct = (pred == truth)

    if subject not in results_by_subject:
        results_by_subject[subject] = {'correct': 0, 'total': 0}

    if is_correct:
        correct_count += 1
        results_by_subject[subject]['correct'] += 1
    results_by_subject[subject]['total'] += 1

overall_accuracy = (correct_count / total_count) * 100 if total_count > 0 else 0


--- Calculating Metrics ---


In [52]:
print("\n--- Evaluation Results ---")
print(f"Model Evaluated: {MODEL_NAME}")
print(f"Dataset Used: {DATASET_PUBMEDQA}")
print(f"Number of Questions Evaluated: {total_count}")
print(f"Number of Correct Answers: {correct_count}")
print(f"Overall Accuracy: {overall_accuracy:.2f}%")


--- Evaluation Results ---
Model Evaluated: RAG
Dataset Used: qiaojin/PubMedQA
Number of Questions Evaluated: 200
Number of Correct Answers: 113
Overall Accuracy: 56.50%
