# INTRO 👋

2023 marked the rise of the GenAI buzz, and many companies worldwide are working hard to take advantage of its capabilities to solve advanced problems. The [Malawi Public Health Systems LLM Challenge](https://zindi.africa/competitions/malawi-public-health-systems-llm-challenge) is one of the first GenAI competitions on Zindi. If you are new to Generative AI or perhaps just trying to get a sense of how to solve this challenge, then this notebook can get you started. This notebook is focused on RAG (Retrieval Augmented Generation), which leverages existing LLMs to perform Q&A using a provided context. 

# The Data 📊

When examining the data tab on Zindi, we find three files. Let's break it down:

- **Train.csv:** This file contains 748 rows × 6 columns, which can be used to train a model.
- **Test.csv:** This file contains 499 rows, which are the test questions.
- **SampleSubmission.csv:** This CSV file is the sample submission format that Zindi expects.

### Extras 📁:
- **MWTGBookletsExcel** This folder contains six Excel spreadsheets. In this competition, I renamed the spreadsheets as the original names were too long. I've manually shortened the names to keep things simple. Please ensure you do this via code if you'd like to do the same.

Original Filenames:

1. TG Booklet 1 Introduction Module Booklet 1TG_final_04112021.xlsx
2. TG Booklet 2 Sections 1,2,3_final_04112021.xlsx
3. TG Booklet 3 Section 4,5,6,7_final_04112021.xlsx
4. TG Booklet 4 Sections 8, 9_final_04112021.xlsx
5. TG Booklet 5 Section 10_final_04112021.xlsx
6. TG Booklet 6_Section 11_final_04112021.xlsx

Renamed:

1. TG Booklet 1.xlsx
2. TG Booklet 2.xlsx
3. TG Booklet 3.xlsx
4. TG Booklet 4.xlsx
5. TG Booklet 5.xlsx
6. TG Booklet 6.xlsx


## Requirements 🛠️

Just a basic setup: please use a GPU-enabled setup for your inference, but don't spend the whole day on it. I'm using Kaggle since they offer free GPUs; you can also use any other free platform you like... I guess.. 😬. Of course, if you don't have access to GPUs for some reason, you can also run it on CPUs.

###### Last Thing Before You Start! 🚀

Read the description before you start coding, so you can have some insight into the challenge. This notebook does not cover training/fine-tuning a model. Below is a simple workflow.

## Setting up Huggingface. 🤖

There are several pretrained models for text-to-text generation. For this demo, we'll be using the Llama-2-70b-chat-hf. It is available on the hub and is open source, however it is gated. This means you will have to request access to the model via the hub. This is due to the safety and ethics principles Meta aims to uphold. It's still completely free and open source. Note that request approvals may take up to a few hours to a few days. If you can't wait, feel free to switch to other models that are not gated. ⏳


Replace `YOUR_HF_TOKEN` 🔑 with your Huggingface token. Hugging Face needs to be sure you have access to the model.

In [1]:
!python -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token ('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')"

## Installing Libraries. 

##### Langchain

The major library we'll be using among others is Langchain. LangChain is a framework for developing applications powered by language models. It enables applications that:

1. Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.) 🧠
2. Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.) 🤔

Read more about LangChain at [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction) 📚


##### chromadb

Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine locally. According to their website, a hosted version is coming soon! Read more about ChromaDB here at [ChromaDB Documentation](https://docs.trychroma.com/getting-started) 📦

###### bitsandbytes

The bitsandbytes library is a lightweight Python wrapper around CUDA custom functions, particularly 8-bit optimizers, matrix multiplication (`LLM.int8()`), and 8 + 4-bit quantization functions. Learn more about bitsandbytes [here](https://huggingface.co/docs/bitsandbytes/main/en/index) 🧊🔢

In [2]:
import pandas as pd
import numpy as np

## Taking a Look at the Data 👀

The Train and Test files come as a CSV file. For convenience and to check out the data, we can load it into a dataframe using pandas. 🐼

If you are getting an error, it means you haven't uploaded the dataset into Kaggle, or your path is incorrect. 🚨


###### more details

- **ID:** The question ID
- **Question Text:** Essentially the text of the questions.
- **Question Answer:** The Answer to the Question Text
- **Reference Document:** This is where the Answer is in the textbook (Remember there are 6 excel sheets where the textbooks are)
- **Paragraph(s) Number:** This is Paragraph in the Reference Document where the answers are
- **Keywords:** The contextual Keywords 📝


In [3]:
path = "/kaggle/input/malawi-public-health-dataset/strengthening-health-systems-llm-challenge-for-integrated-disease-surveillance-and-response-in-malawi20240125-12750-1x85c8a"
train = pd.read_csv(f"{path}/Train.csv")
train

Unnamed: 0,ID,Question Text,Question Answer,Reference Document,Paragraph(s) Number,Keywords
0,Q829,Compare the laboratory confirmation methods fo...,Chikungunya is confirmed using serological tes...,TG Booklet 6,"154, 166",Laboratory Confirmation For Chikungunya Vs. Di...
1,Q721,When should specimens be collected for Anthrax...,Specimens should be collected during the vesic...,TG Booklet 6,140,"Anthrax Specimen Collection: Timing, Preparati..."
2,Q464,Which key information should be recorded durin...,"During a register review, key information abou...",TG Booklet 3,439-440,"Register Review, Key Information, Suspected Ca..."
3,Q449,Why is the District log of suspected outbreaks...,The log includes information about response ac...,TG Booklet 3,412,"District Log, Response Activities, Steps Taken..."
4,Q6,What do Community based surveillance strategie...,Community-based surveillance strategies focus ...,TG Booklet 1,86,"Community-based Surveillance Strategies, Ident..."
...,...,...,...,...,...,...
743,Q413,Which section of the guidelines provides a des...,Section 11.0 of these 3rd Edition Malawi IDSR ...,TG Booklet 3,376,"Control Measures Description, Priority Disease..."
744,Q626,"Does MEF stand for an abbreviation in the TG, ...",Medical Teams International,TG Booklet 6,106,Medical Teams International
745,Q1141,In what ways do the verification and documenta...,"In emergency contexts, verification and docume...",TG Booklet 5,105-106,"Verification, Documentation, Early Warning, Em..."
746,Q331,What role does the examination of burial cerem...,Examining burial ceremonies helps identify pot...,TG Booklet 3,287,"Burial Ceremonies Examination, Exposure, Trans..."


In [4]:
test = pd.read_csv(f"{path}/Test.csv")
test

Unnamed: 0,ID,Question Text
0,Q4,"What is the definition of ""unusual event"""
1,Q5,What is Community Based Surveillance (CBS)?
2,Q9,What kind of training should members of VHC re...
3,Q10,What is indicator based surveillance (IBS)?
4,Q13,What is Case based surveillance?
...,...,...
494,Q1229,Where should completeness be evaluated in the ...
495,Q1230,Which dimensions of completeness are crucial i...
496,Q1236,How can the completeness of case reporting be ...
497,Q1239,Where should completeness and timeliness of re...


###### Submission Format 

Everything has to be passed as a separate index 📝


In [5]:
ss = pd.read_csv(f"{path}/SampleSubmission.csv")
ss

Unnamed: 0,ID,Target
0,Q1000_keywords,
1,Q1000_paragraph(s)_number,
2,Q1000_question_answer,
3,Q1000_reference_document,
4,Q1002_keywords,
...,...,...
1991,Q999_reference_document,
1992,Q9_keywords,
1993,Q9_paragraph(s)_number,
1994,Q9_question_answer,


######  Make the necessary imports 📚


In [6]:
!pip install unstructured -q
!pip install chromadb -q
!pip install langchain -q
!pip install sentence_transformers -q
!pip install bitsandbytes -q
!pip install langchain_community -q

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 23.8.0 requires cubinlinker, which is not installed.
cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 23.8.0 requires ptxcompiler, which is not installed.
cuml 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 23.8.0 requires cuda-python<12.0a0,>=11.7.1, but you have cuda-python 12.3.0 which is incompatible.
cudf 23.8.0 requires pandas<1.6.0dev0,>=1.3, but you have pandas 2.1.4 which is incompatible.
cudf 23.8.0 requires protobuf<5,>=4.21, but you have protobuf 3.20.3 which is incompatible.
cuml 23.8.0 requires dask==2023.7.1, but you have dask 2024.1.0 which is incompatible.
cuml 23.8.0 requires distributed==2023.7.1, but you have distributed 2024.1.0 which is incompatible.
dask-cud

In [7]:
from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer
from time import time
#import chromadb
#from chromadb.config import Settings
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA,ConversationalRetrievalChain
from langchain.vectorstores import Chroma

import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

In [8]:
# Checking if GPU is available
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(torch.cuda.current_device())
    total_memory = torch.cuda.get_device_properties(0).total_memory
    total_memory_gb = total_memory / (1024**3) # Converting memory to Gb
    print("GPU is available. \nUsing GPU")
    print("\nGPU Name:", gpu_name)
    print(f"Total GPU Memory: {total_memory_gb:.2f} GB")
    
    device = torch.device('cuda')
else:
    print("GPU is not available. \nUsing CPU")
    device = torch.device('cpu')

GPU is available. 
Using GPU

GPU Name: Tesla P100-PCIE-16GB
Total GPU Memory: 15.89 GB


## Large Language  (LLMs) 

LLMs, which stand for Large Language Models, are foundational language models that are very large as the name implies. They can understand and generate human language text. 

They are trained by analyzing massive datasets of text and learning the statistical relationships between words and phrases. This allows them to perform a variety of tasks, such as:

- Answering your questions in an informative way, even if they are open-ended, challenging, or strange.
- Generating different creative text formats, like poems, code, scripts, musical pieces, email, letters, etc.
- Translating languages
- Writing different kinds of creative content
- Summarizing factual topics

In this tutorial, we're focused on using them for question answering. There are a couple of Open Source LLMs, but for this tutorial, we're using Meta's Llama model. There are various sizes on the hub, but here we'll use the "Llama-2-13b-chat-hf" 🦙


Let's see what a sample question looks like from the test set. 🕵️‍♂️

In [9]:
question = test["Question Text"][0]
question

'What is the definition of "unusual event"'

###### Use Bits and Bytes library to create a config.

LLMs can be very large, but a lot of awesome work has been done in loading large language models on low resource. I'll encourage you to read papers on QLORA, PEFT, GPTQ, etc. Very interesting ideas have been created thanks to the research community. We'll load this large model using the Normal Float 4 precision. 🛠️🔢

In [10]:
# Configuring BitsAndBytesConfig for loading model in an optimal way
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit = True,
                                        bnb_4bit_quant_type = 'nf4',
                                        bnb_4bit_use_double_quant = True,
                                        bnb_4bit_compute_dtype = bfloat16)

###### Using Transformers to load in the model

The transformers library is still one of the best resources to load transformer-based models. Everything is automatically set up; we pass in the model configuration and the configuration for Bits and Bytes. 🤖📦

###### Transformers Pipeline 🚀

In [11]:
llm = HuggingFacePipeline.from_model_id(model_id='/kaggle/input/mistral/pytorch/7b-instruct-v0.1-hf/1',
                                       task = 'text-generation',
                                       model_kwargs={'do_sample': True,
                                                   'temperature': .3,
                                                    'max_length': 2048,
                                                    'quantization_config': quantization_config},
                                       device_map = "auto")
# checking again that everything is working fine
llm(prompt=question)

2024-02-27 15:56:51.446731: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-27 15:56:51.446845: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-27 15:56:51.575196: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


' in the context of a random variable?\nAnswer: An event that occurs with a probability significantly different from the expected probability.'

## RAG 📚

Retrieval Augmented Generation, or RAG, is a technique used to improve the accuracy and reliability of Large Language Models (LLMs). As you know, LLMs are trained on massive amounts of text data, but they can still struggle with factual consistency and sometimes generate incorrect or misleading information. RAG helps address this by incorporating external/relevant knowledge sources into the generation process.

Here's how it works:

- Retrieval: When you ask an LLM a question or give it a prompt, RAG first retrieves relevant information from the Public Health Vector Database created with the 6 textbooks. 📖

- Augmentation: This retrieved information is then combined with the original prompt to provide the LLM with additional context and factual grounding. 

- Generation: Finally, the LLM uses this augmented prompt to generate its response. This response would be more accurate and reliable because it's based on both the LLM's internal knowledge and the retrieved factual information. 🎯

### Questions ❓

- Why not pass the entire textbook to the LLM? The entire textbook is a lot. LLMs have a "context length" which is the maximum amount of input text they can take in and understand. Also, the more text, the more compute required. It makes more sense to first look for the relevant parts then pass it to the LLM. 🤔


###### Load in the Textbooks 📚


In [12]:
import os
books_path = "/kaggle/input/malawi-public-health-dataset/strengthening-health-systems-llm-challenge-for-integrated-disease-surveillance-and-response-in-malawi20240125-12750-1x85c8a/MWTGBookletsExcel"
booklets = os.listdir(books_path)

There are several dataloaders that have been created by the LangChain community. One of such is the ```UnstructuredExcelLoader```, which loads in an Excel spreadsheet as an unstructured data format. 📊📄

In [13]:
from langchain_community.document_loaders import UnstructuredExcelLoader

###### Load all the 6 textbooks

Here we load it and extend it in a list called docs 📚


In [14]:
loaders = [UnstructuredExcelLoader(f"{books_path}/{i}") for i in booklets]
docs = []
for loader in loaders:
    docs.extend(loader.load())

###### Splitting the data.

LangChain has several splitting methods, a basic one is the "RecursiveCharacterTextSplitter" which splits the data (text) by chunks of n characters, and includes an overlap of k characters called the chunk overlap. 📝🔀


In [15]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
all_splits = text_splitter.split_documents(docs)

###### Tokenizing the data

NLP took a huge leap during the discovery of embeddings. Sentence Embeddings or Sentence Vectors are numeric vector inputs that represent a sentence in a lower-dimensional space. It allows sentences with similar meanings to have a similar representation. You can read more about embeddings [here](https://huggingface.co/blog/getting-started-with-embeddings)

For this tutorial, we'll be making use of `sentence-transformers/all-MiniLM-L6-v2`. It is a model that maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. It's available and open source on the Hub. 🌐🔠


In [16]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

##### Vector Databases

Vector databases provide the ability to store and retrieve vectors as high-dimensional points. They add additional capabilities for efficient and fast lookup of nearest-neighbors in the N-dimensional space. 🗄️🔍

In the code below, we create a vector database with Chroma. We then pass in all the splits (chunks of the entire textbook) and the embedding model to convert them into embeddings. 💡📚


In [17]:
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

In [18]:
vectordb

<langchain_community.vectorstores.chroma.Chroma at 0x7d8cb4a8f640>

###### Langchain's Retrieval QA

LangChain has the ability to create or arrange chains, one of such popular chains is the RetrievalQA which can take in the retriever (the function that retrieves the relevant chunk) and the LLM (that will answer the question).

For the Retriever, you'll find that I've set k=3. This is the maximum number of splits I want returned from the vector database. In other words, the vector database finds the correct split and gives us the best 3 to work with. We then combine this best 3 chunks with the question (prompt) and get a RAG-based answer. 🔄🔍

In [19]:
retriever = vectordb.as_retriever(k=5)

In [20]:
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7d8cb4a8f640>)

In [21]:
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True,
)

###### Using the RAG for the Test Set

In [22]:
def test_rag(qa, query):
    #print(f"Query: {query}\n")
    time_1 = time()
    result = qa.run(query)
    time_2 = time()
    time_taken = round(time_2-time_1, 3)
    #print(f"Inference time: {round(time_2-time_1, 3)} sec.")
    #print("\nResult: ", result)
    return result,time_taken

I wanted to keep track of some data aside from the answer:

1. One that is important is the source, i.e., which of the Excel spreadsheets did it find the relevant information?
2. The other one is the time it took to answer the question through the RAG.

As you'll remember, the sample submission requires us to submit the "Answer", "Textbook Source", "Paragraph", and even "Keywords". For now, we'll deal with the "Answer and Textbook Source" and later on use simpler methods to extract the relevant paragraph and the Keywords. 📊⏱️


In [23]:
from tqdm import tqdm

In [24]:
times = []
results = []
sources = []
for question in tqdm(test["Question Text"]):
    try:
        result,time_taken = test_rag(qa, question)
        docs = vectordb.similarity_search(result)
        source = docs[0].metadata['source'].split("/")[-1]

        times.append(time_taken)
        results.append(result)
        sources.append(source)
    except:
        
        times.append("Error")
        results.append("Error")
        sources.append("Error")   

  0%|          | 0/499 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




[1m> Entering new RetrievalQA chain...[0m


  0%|          | 1/499 [00:10<1:30:26, 10.90s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  0%|          | 2/499 [00:46<3:30:24, 25.40s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  1%|          | 3/499 [01:02<2:53:00, 20.93s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  1%|          | 4/499 [01:08<2:04:25, 15.08s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  1%|          | 5/499 [01:15<1:42:28, 12.45s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  1%|          | 6/499 [01:28<1:43:42, 12.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  1%|▏         | 7/499 [01:49<2:05:46, 15.34s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  2%|▏         | 8/499 [02:03<2:00:00, 14.66s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  2%|▏         | 9/499 [02:14<1:52:35, 13.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  2%|▏         | 10/499 [02:31<1:58:29, 14.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  2%|▏         | 11/499 [02:36<1:36:27, 11.86s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  2%|▏         | 12/499 [02:47<1:32:26, 11.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  3%|▎         | 13/499 [03:18<2:20:17, 17.32s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  3%|▎         | 14/499 [03:28<2:02:14, 15.12s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  3%|▎         | 15/499 [03:31<1:33:33, 11.60s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  3%|▎         | 16/499 [03:35<1:14:04,  9.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  3%|▎         | 17/499 [03:39<1:01:26,  7.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  4%|▎         | 18/499 [03:51<1:11:37,  8.93s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  4%|▍         | 19/499 [04:04<1:22:25, 10.30s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  4%|▍         | 20/499 [04:13<1:19:06,  9.91s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  4%|▍         | 21/499 [04:24<1:20:24, 10.09s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  4%|▍         | 22/499 [04:29<1:08:07,  8.57s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  5%|▍         | 23/499 [04:34<59:57,  7.56s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  5%|▍         | 24/499 [04:46<1:09:53,  8.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  5%|▌         | 25/499 [04:51<1:00:44,  7.69s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  5%|▌         | 26/499 [05:01<1:05:24,  8.30s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  5%|▌         | 27/499 [07:12<5:54:51, 45.11s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  6%|▌         | 28/499 [07:24<4:36:34, 35.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  6%|▌         | 29/499 [07:41<3:54:01, 29.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  6%|▌         | 30/499 [07:53<3:11:30, 24.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  6%|▌         | 31/499 [07:58<2:24:23, 18.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  6%|▋         | 32/499 [08:04<1:55:13, 14.80s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  7%|▋         | 33/499 [08:11<1:37:07, 12.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  7%|▋         | 34/499 [08:18<1:23:26, 10.77s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  7%|▋         | 35/499 [08:24<1:12:34,  9.38s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  7%|▋         | 36/499 [08:35<1:16:12,  9.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  7%|▋         | 37/499 [08:58<1:45:44, 13.73s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  8%|▊         | 38/499 [09:05<1:30:46, 11.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  8%|▊         | 39/499 [09:12<1:20:22, 10.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  8%|▊         | 40/499 [09:23<1:21:42, 10.68s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  8%|▊         | 41/499 [09:31<1:14:57,  9.82s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  8%|▊         | 42/499 [09:43<1:19:08, 10.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  9%|▊         | 43/499 [09:54<1:20:21, 10.57s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  9%|▉         | 44/499 [11:31<4:36:37, 36.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  9%|▉         | 45/499 [11:39<3:30:50, 27.86s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  9%|▉         | 46/499 [11:52<2:57:25, 23.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


  9%|▉         | 47/499 [12:04<2:30:29, 19.98s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 10%|▉         | 48/499 [12:11<2:02:08, 16.25s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 10%|▉         | 49/499 [12:17<1:37:12, 12.96s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 10%|█         | 50/499 [12:25<1:27:28, 11.69s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 10%|█         | 51/499 [12:45<1:45:02, 14.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 10%|█         | 52/499 [12:56<1:37:26, 13.08s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 11%|█         | 53/499 [13:06<1:31:10, 12.26s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 11%|█         | 54/499 [13:18<1:29:50, 12.11s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 11%|█         | 55/499 [13:27<1:24:07, 11.37s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 11%|█         | 56/499 [13:34<1:14:17, 10.06s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 11%|█▏        | 57/499 [13:57<1:42:27, 13.91s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 12%|█▏        | 58/499 [14:11<1:42:55, 14.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 12%|█▏        | 59/499 [16:00<5:11:23, 42.46s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 12%|█▏        | 60/499 [16:14<4:06:37, 33.71s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 12%|█▏        | 61/499 [17:11<4:57:56, 40.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 12%|█▏        | 62/499 [18:28<6:15:12, 51.52s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 13%|█▎        | 63/499 [19:40<6:59:35, 57.74s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 13%|█▎        | 64/499 [19:49<5:14:02, 43.32s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 13%|█▎        | 65/499 [19:56<3:54:34, 32.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 13%|█▎        | 66/499 [20:15<3:24:48, 28.38s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 13%|█▎        | 67/499 [20:26<2:46:43, 23.16s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 14%|█▎        | 68/499 [20:41<2:26:53, 20.45s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 14%|█▍        | 69/499 [21:01<2:27:17, 20.55s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 14%|█▍        | 70/499 [21:08<1:56:48, 16.34s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 14%|█▍        | 71/499 [22:49<4:58:08, 41.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 14%|█▍        | 72/499 [23:25<4:44:54, 40.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 15%|█▍        | 73/499 [23:32<3:33:23, 30.06s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 15%|█▍        | 74/499 [23:45<2:56:30, 24.92s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 15%|█▌        | 75/499 [23:59<2:32:54, 21.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 15%|█▌        | 76/499 [25:23<4:44:18, 40.33s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 15%|█▌        | 77/499 [25:29<3:31:12, 30.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 16%|█▌        | 78/499 [26:53<5:24:20, 46.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 16%|█▌        | 79/499 [27:00<4:02:22, 34.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 16%|█▌        | 80/499 [28:36<6:10:31, 53.06s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 16%|█▌        | 81/499 [28:42<4:31:43, 39.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 16%|█▋        | 82/499 [28:45<3:16:05, 28.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 17%|█▋        | 83/499 [28:56<2:39:29, 23.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 17%|█▋        | 84/499 [30:23<4:50:32, 42.01s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 17%|█▋        | 85/499 [30:30<3:37:21, 31.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 17%|█▋        | 86/499 [30:41<2:55:03, 25.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 17%|█▋        | 87/499 [30:53<2:26:51, 21.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 18%|█▊        | 88/499 [31:00<1:56:53, 17.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 18%|█▊        | 89/499 [31:07<1:36:50, 14.17s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 18%|█▊        | 90/499 [31:13<1:19:14, 11.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 18%|█▊        | 91/499 [31:22<1:14:30, 10.96s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 18%|█▊        | 92/499 [31:27<1:01:01,  9.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 19%|█▊        | 93/499 [33:11<4:14:44, 37.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 19%|█▉        | 94/499 [33:24<3:22:48, 30.05s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 19%|█▉        | 95/499 [33:35<2:44:06, 24.37s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 19%|█▉        | 96/499 [33:47<2:19:15, 20.73s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 19%|█▉        | 97/499 [34:03<2:08:41, 19.21s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 20%|█▉        | 98/499 [34:15<1:54:01, 17.06s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 20%|█▉        | 99/499 [34:21<1:32:05, 13.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 20%|██        | 100/499 [34:29<1:20:10, 12.06s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 20%|██        | 101/499 [34:42<1:22:09, 12.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 20%|██        | 102/499 [34:50<1:13:59, 11.18s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 21%|██        | 103/499 [35:02<1:14:15, 11.25s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 21%|██        | 104/499 [36:48<4:21:30, 39.72s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 21%|██        | 105/499 [38:25<6:13:12, 56.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 21%|██        | 106/499 [38:29<4:29:59, 41.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 21%|██▏       | 107/499 [40:06<6:17:52, 57.84s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 22%|██▏       | 108/499 [40:17<4:44:47, 43.70s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 22%|██▏       | 109/499 [40:29<3:43:16, 34.35s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 22%|██▏       | 110/499 [42:05<5:41:20, 52.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 22%|██▏       | 111/499 [42:41<5:08:25, 47.69s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 22%|██▏       | 112/499 [42:58<4:08:16, 38.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 23%|██▎       | 113/499 [43:34<4:04:04, 37.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 23%|██▎       | 114/499 [43:54<3:27:48, 32.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 23%|██▎       | 115/499 [44:16<3:06:54, 29.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 23%|██▎       | 116/499 [44:31<2:39:15, 24.95s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 23%|██▎       | 117/499 [44:39<2:06:24, 19.85s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 24%|██▎       | 118/499 [44:45<1:40:25, 15.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 24%|██▍       | 119/499 [44:52<1:22:31, 13.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 24%|██▍       | 120/499 [44:58<1:09:53, 11.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 24%|██▍       | 121/499 [45:16<1:23:05, 13.19s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 24%|██▍       | 122/499 [45:49<1:59:35, 19.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 25%|██▍       | 123/499 [46:03<1:49:22, 17.45s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 25%|██▍       | 124/499 [46:11<1:32:28, 14.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 25%|██▌       | 125/499 [46:17<1:15:59, 12.19s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 25%|██▌       | 126/499 [46:28<1:12:22, 11.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 25%|██▌       | 127/499 [46:42<1:17:20, 12.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 26%|██▌       | 128/499 [46:52<1:12:35, 11.74s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 26%|██▌       | 129/499 [47:13<1:28:23, 14.33s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 26%|██▌       | 130/499 [47:26<1:27:09, 14.17s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 26%|██▋       | 131/499 [47:34<1:14:21, 12.12s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 26%|██▋       | 132/499 [47:39<1:01:34, 10.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 27%|██▋       | 133/499 [47:50<1:02:19, 10.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 27%|██▋       | 134/499 [48:04<1:10:40, 11.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 27%|██▋       | 135/499 [48:14<1:05:59, 10.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 27%|██▋       | 136/499 [48:30<1:16:35, 12.66s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 27%|██▋       | 137/499 [48:43<1:16:29, 12.68s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 28%|██▊       | 138/499 [48:46<58:48,  9.77s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 28%|██▊       | 139/499 [49:04<1:13:45, 12.29s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 28%|██▊       | 140/499 [49:13<1:06:34, 11.13s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 28%|██▊       | 141/499 [50:37<3:17:50, 33.16s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 28%|██▊       | 142/499 [50:51<2:43:11, 27.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 29%|██▊       | 143/499 [51:00<2:08:40, 21.69s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 29%|██▉       | 144/499 [51:09<1:46:00, 17.92s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 29%|██▉       | 145/499 [51:14<1:22:35, 14.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 29%|██▉       | 146/499 [52:54<3:54:21, 39.84s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 29%|██▉       | 147/499 [52:58<2:51:39, 29.26s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 30%|██▉       | 148/499 [53:08<2:16:18, 23.30s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 30%|██▉       | 149/499 [53:13<1:43:56, 17.82s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 30%|███       | 150/499 [53:20<1:24:56, 14.60s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 30%|███       | 151/499 [53:37<1:28:22, 15.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 30%|███       | 152/499 [53:47<1:19:02, 13.67s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 31%|███       | 153/499 [53:55<1:10:35, 12.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 31%|███       | 154/499 [54:02<1:00:07, 10.46s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 31%|███       | 155/499 [54:15<1:05:05, 11.35s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 31%|███▏      | 156/499 [54:23<58:21, 10.21s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 31%|███▏      | 157/499 [54:38<1:06:14, 11.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 32%|███▏      | 158/499 [54:43<54:40,  9.62s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 32%|███▏      | 159/499 [54:56<1:00:09, 10.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 32%|███▏      | 160/499 [55:05<57:34, 10.19s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 32%|███▏      | 161/499 [55:20<1:06:04, 11.73s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 32%|███▏      | 162/499 [55:30<1:03:33, 11.31s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 33%|███▎      | 163/499 [55:41<1:02:29, 11.16s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 33%|███▎      | 164/499 [55:50<57:54, 10.37s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 33%|███▎      | 165/499 [56:06<1:08:15, 12.26s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 33%|███▎      | 166/499 [56:22<1:13:58, 13.33s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 33%|███▎      | 167/499 [56:30<1:04:12, 11.60s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 34%|███▎      | 168/499 [56:40<1:01:22, 11.13s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 34%|███▍      | 169/499 [56:53<1:05:15, 11.87s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 34%|███▍      | 170/499 [57:03<1:01:56, 11.30s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 34%|███▍      | 171/499 [57:09<52:44,  9.65s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 34%|███▍      | 172/499 [58:47<3:16:57, 36.14s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 35%|███▍      | 173/499 [58:59<2:37:09, 28.92s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 35%|███▍      | 174/499 [59:12<2:10:06, 24.02s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 35%|███▌      | 175/499 [59:22<1:47:56, 19.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 35%|███▌      | 176/499 [59:34<1:33:59, 17.46s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 35%|███▌      | 177/499 [59:44<1:21:33, 15.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 36%|███▌      | 178/499 [59:58<1:19:20, 14.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 36%|███▌      | 179/499 [1:00:20<1:31:16, 17.11s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 36%|███▌      | 180/499 [1:01:44<3:16:53, 37.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 36%|███▋      | 181/499 [1:01:57<2:37:41, 29.75s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 36%|███▋      | 182/499 [1:02:15<2:20:02, 26.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 37%|███▋      | 183/499 [1:02:28<1:58:16, 22.46s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 37%|███▋      | 184/499 [1:02:38<1:37:54, 18.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 37%|███▋      | 185/499 [1:02:55<1:34:20, 18.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 37%|███▋      | 186/499 [1:03:02<1:16:31, 14.67s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 37%|███▋      | 187/499 [1:03:11<1:08:05, 13.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 38%|███▊      | 188/499 [1:03:25<1:09:58, 13.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 38%|███▊      | 189/499 [1:03:32<59:02, 11.43s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 38%|███▊      | 190/499 [1:03:38<50:44,  9.85s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 38%|███▊      | 191/499 [1:03:51<54:33, 10.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 38%|███▊      | 192/499 [1:04:03<57:12, 11.18s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 39%|███▊      | 193/499 [1:04:12<53:25, 10.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 39%|███▉      | 194/499 [1:04:21<51:10, 10.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 39%|███▉      | 195/499 [1:04:25<41:27,  8.18s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 39%|███▉      | 196/499 [1:04:36<45:57,  9.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 39%|███▉      | 197/499 [1:04:46<46:43,  9.28s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 40%|███▉      | 198/499 [1:04:49<37:56,  7.56s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 40%|███▉      | 199/499 [1:05:12<1:01:05, 12.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 40%|████      | 200/499 [1:05:16<48:15,  9.68s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 40%|████      | 201/499 [1:05:27<49:31,  9.97s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 40%|████      | 202/499 [1:05:37<50:06, 10.12s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 41%|████      | 203/499 [1:07:28<3:18:21, 40.21s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 41%|████      | 204/499 [1:07:45<2:44:21, 33.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 41%|████      | 205/499 [1:07:57<2:12:13, 26.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 41%|████▏     | 206/499 [1:08:09<1:49:20, 22.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 41%|████▏     | 207/499 [1:08:16<1:26:24, 17.76s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 42%|████▏     | 208/499 [1:08:23<1:10:56, 14.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 42%|████▏     | 209/499 [1:08:43<1:18:29, 16.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 42%|████▏     | 210/499 [1:08:57<1:14:35, 15.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 42%|████▏     | 211/499 [1:09:12<1:13:59, 15.42s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 42%|████▏     | 212/499 [1:10:34<2:48:39, 35.26s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 43%|████▎     | 213/499 [1:10:38<2:03:01, 25.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 43%|████▎     | 214/499 [1:10:42<1:32:04, 19.38s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 43%|████▎     | 215/499 [1:10:46<1:09:20, 14.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 43%|████▎     | 216/499 [1:12:56<3:53:00, 49.40s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 43%|████▎     | 217/499 [1:13:01<2:49:03, 35.97s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 44%|████▎     | 218/499 [1:13:06<2:05:03, 26.70s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 44%|████▍     | 219/499 [1:14:42<3:41:34, 47.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 44%|████▍     | 220/499 [1:14:46<2:40:06, 34.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 44%|████▍     | 221/499 [1:14:51<1:58:34, 25.59s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 44%|████▍     | 222/499 [1:16:15<3:18:58, 43.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 45%|████▍     | 223/499 [1:16:18<2:23:33, 31.21s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 45%|████▍     | 224/499 [1:16:22<1:44:51, 22.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 45%|████▌     | 225/499 [1:18:05<3:35:30, 47.19s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 45%|████▌     | 226/499 [1:18:18<2:48:03, 36.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 45%|████▌     | 227/499 [1:18:22<2:01:48, 26.87s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 46%|████▌     | 228/499 [1:18:56<2:11:43, 29.17s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 46%|████▌     | 229/499 [1:19:00<1:36:55, 21.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 46%|████▌     | 230/499 [1:19:05<1:13:47, 16.46s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 46%|████▋     | 231/499 [1:19:13<1:02:48, 14.06s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 46%|████▋     | 232/499 [1:19:17<49:21, 11.09s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 47%|████▋     | 233/499 [1:19:26<45:34, 10.28s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 47%|████▋     | 234/499 [1:19:30<37:33,  8.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 47%|████▋     | 235/499 [1:21:22<2:53:44, 39.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 47%|████▋     | 236/499 [1:21:25<2:05:51, 28.71s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 47%|████▋     | 237/499 [1:22:43<3:08:49, 43.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 48%|████▊     | 238/499 [1:22:46<2:15:32, 31.16s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 48%|████▊     | 239/499 [1:22:51<1:41:41, 23.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 48%|████▊     | 240/499 [1:22:57<1:19:04, 18.32s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 48%|████▊     | 241/499 [1:24:25<2:48:03, 39.09s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 48%|████▊     | 242/499 [1:24:28<2:01:36, 28.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 49%|████▊     | 243/499 [1:24:33<1:30:32, 21.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 49%|████▉     | 244/499 [1:24:37<1:08:23, 16.09s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 49%|████▉     | 245/499 [1:24:40<51:36, 12.19s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 49%|████▉     | 246/499 [1:27:08<3:43:17, 52.95s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 49%|████▉     | 247/499 [1:27:13<2:41:26, 38.44s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 50%|████▉     | 248/499 [1:29:26<4:39:47, 66.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 50%|████▉     | 249/499 [1:29:31<3:21:52, 48.45s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 50%|█████     | 250/499 [1:30:07<3:04:51, 44.55s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 50%|█████     | 251/499 [1:30:12<2:15:35, 32.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 51%|█████     | 252/499 [1:30:16<1:39:45, 24.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 51%|█████     | 253/499 [1:31:12<2:17:55, 33.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 51%|█████     | 254/499 [1:31:17<1:42:21, 25.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 51%|█████     | 255/499 [1:31:21<1:16:29, 18.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 51%|█████▏    | 256/499 [1:31:26<59:34, 14.71s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 52%|█████▏    | 257/499 [1:31:31<46:58, 11.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 52%|█████▏    | 258/499 [1:31:36<38:46,  9.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 52%|█████▏    | 259/499 [1:31:41<33:22,  8.34s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 52%|█████▏    | 260/499 [1:31:45<27:55,  7.01s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 52%|█████▏    | 261/499 [1:31:52<27:38,  6.97s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 53%|█████▎    | 262/499 [1:31:58<26:05,  6.60s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 53%|█████▎    | 263/499 [1:32:09<31:14,  7.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 53%|█████▎    | 264/499 [1:32:16<30:03,  7.68s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 53%|█████▎    | 265/499 [1:32:18<23:58,  6.15s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 53%|█████▎    | 266/499 [1:32:22<20:37,  5.31s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 54%|█████▎    | 267/499 [1:32:27<20:44,  5.37s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 54%|█████▎    | 268/499 [1:32:32<19:46,  5.14s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 54%|█████▍    | 269/499 [1:32:37<19:59,  5.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 54%|█████▍    | 270/499 [1:32:45<22:32,  5.91s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 54%|█████▍    | 271/499 [1:32:51<22:49,  6.01s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 55%|█████▍    | 272/499 [1:33:02<28:31,  7.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 55%|█████▍    | 273/499 [1:33:10<28:34,  7.59s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 55%|█████▍    | 274/499 [1:34:52<2:14:38, 35.90s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 55%|█████▌    | 275/499 [1:35:33<2:20:07, 37.53s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 55%|█████▌    | 276/499 [1:35:58<2:04:48, 33.58s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 56%|█████▌    | 277/499 [1:36:03<1:32:26, 24.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 56%|█████▌    | 278/499 [1:36:18<1:22:04, 22.28s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 56%|█████▌    | 279/499 [1:36:31<1:10:46, 19.30s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 56%|█████▌    | 280/499 [1:37:51<2:16:51, 37.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 56%|█████▋    | 281/499 [1:38:05<1:50:34, 30.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 57%|█████▋    | 282/499 [1:38:12<1:24:37, 23.40s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 57%|█████▋    | 283/499 [1:38:29<1:17:55, 21.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 57%|█████▋    | 284/499 [1:38:38<1:03:15, 17.66s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 57%|█████▋    | 285/499 [1:39:12<1:21:16, 22.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 57%|█████▋    | 286/499 [1:39:28<1:12:50, 20.52s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 58%|█████▊    | 287/499 [1:39:37<1:00:53, 17.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 58%|█████▊    | 288/499 [1:39:45<50:18, 14.31s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 58%|█████▊    | 289/499 [1:39:57<48:14, 13.78s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 58%|█████▊    | 290/499 [1:40:05<41:39, 11.96s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 58%|█████▊    | 291/499 [1:41:48<2:16:01, 39.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 59%|█████▊    | 292/499 [1:42:10<1:57:23, 34.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 59%|█████▊    | 293/499 [1:42:22<1:34:53, 27.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 59%|█████▉    | 294/499 [1:43:40<2:25:11, 42.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 59%|█████▉    | 295/499 [1:45:13<3:16:29, 57.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 59%|█████▉    | 296/499 [1:45:42<2:45:52, 49.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 60%|█████▉    | 297/499 [1:45:51<2:05:07, 37.16s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 60%|█████▉    | 298/499 [1:46:13<1:49:31, 32.69s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 60%|█████▉    | 299/499 [1:46:31<1:34:09, 28.25s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 60%|██████    | 300/499 [1:46:41<1:15:16, 22.70s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 60%|██████    | 301/499 [1:46:46<57:04, 17.30s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 61%|██████    | 302/499 [1:47:02<55:56, 17.04s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 61%|██████    | 303/499 [1:47:14<50:24, 15.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 61%|██████    | 304/499 [1:47:25<46:14, 14.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 61%|██████    | 305/499 [1:47:44<50:45, 15.70s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 61%|██████▏   | 306/499 [1:47:51<41:59, 13.05s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 62%|██████▏   | 307/499 [1:49:46<2:19:36, 43.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 62%|██████▏   | 308/499 [1:49:50<1:40:47, 31.66s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 62%|██████▏   | 309/499 [1:50:10<1:29:32, 28.28s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 62%|██████▏   | 310/499 [1:50:27<1:18:28, 24.91s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 62%|██████▏   | 311/499 [1:52:21<2:41:53, 51.67s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 63%|██████▎   | 312/499 [1:54:19<3:42:09, 71.28s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 63%|██████▎   | 313/499 [1:54:30<2:44:54, 53.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 63%|██████▎   | 314/499 [1:54:41<2:05:03, 40.56s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 63%|██████▎   | 315/499 [1:55:09<1:52:52, 36.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 63%|██████▎   | 316/499 [1:57:08<3:07:18, 61.41s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 64%|██████▎   | 317/499 [1:57:34<2:34:58, 51.09s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 64%|██████▎   | 318/499 [1:57:47<1:59:06, 39.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 64%|██████▍   | 319/499 [1:57:58<1:33:15, 31.08s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 64%|██████▍   | 320/499 [1:58:10<1:15:03, 25.16s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 64%|██████▍   | 321/499 [1:58:34<1:14:02, 24.96s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 65%|██████▍   | 322/499 [1:58:43<59:19, 20.11s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 65%|██████▍   | 323/499 [1:59:17<1:11:34, 24.40s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 65%|██████▍   | 324/499 [1:59:29<59:54, 20.54s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 65%|██████▌   | 325/499 [1:59:39<50:51, 17.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 65%|██████▌   | 326/499 [1:59:49<43:14, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 66%|██████▌   | 327/499 [2:00:52<1:24:43, 29.56s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 66%|██████▌   | 328/499 [2:01:03<1:07:56, 23.84s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 66%|██████▌   | 329/499 [2:01:11<54:25, 19.21s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 66%|██████▌   | 330/499 [2:01:26<50:31, 17.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 66%|██████▋   | 331/499 [2:01:41<47:26, 16.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 67%|██████▋   | 332/499 [2:01:49<39:50, 14.31s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 67%|██████▋   | 333/499 [2:02:04<40:05, 14.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 67%|██████▋   | 334/499 [2:02:19<40:51, 14.86s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 67%|██████▋   | 335/499 [2:02:50<53:19, 19.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 67%|██████▋   | 336/499 [2:03:10<53:17, 19.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 68%|██████▊   | 337/499 [2:03:43<1:03:54, 23.67s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 68%|██████▊   | 338/499 [2:03:58<56:25, 21.03s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 68%|██████▊   | 339/499 [2:04:16<53:47, 20.17s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 68%|██████▊   | 340/499 [2:04:27<46:17, 17.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 68%|██████▊   | 341/499 [2:04:42<44:09, 16.77s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 69%|██████▊   | 342/499 [2:04:52<38:18, 14.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 69%|██████▊   | 343/499 [2:05:01<33:32, 12.90s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 69%|██████▉   | 344/499 [2:05:33<48:08, 18.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 69%|██████▉   | 345/499 [2:05:45<43:07, 16.80s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 69%|██████▉   | 346/499 [2:06:03<43:25, 17.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 70%|██████▉   | 347/499 [2:07:18<1:27:31, 34.55s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 70%|██████▉   | 348/499 [2:07:32<1:11:39, 28.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 70%|██████▉   | 349/499 [2:07:44<58:25, 23.37s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 70%|███████   | 350/499 [2:07:53<47:30, 19.13s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 70%|███████   | 351/499 [2:07:56<35:02, 14.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 71%|███████   | 352/499 [2:08:09<33:47, 13.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 71%|███████   | 353/499 [2:08:14<27:21, 11.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 71%|███████   | 354/499 [2:08:24<26:05, 10.80s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 71%|███████   | 355/499 [2:08:31<23:13,  9.68s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 71%|███████▏  | 356/499 [2:09:03<39:20, 16.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 72%|███████▏  | 357/499 [2:09:14<34:54, 14.75s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 72%|███████▏  | 358/499 [2:09:26<33:08, 14.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 72%|███████▏  | 359/499 [2:10:05<50:10, 21.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 72%|███████▏  | 360/499 [2:10:19<44:10, 19.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 72%|███████▏  | 361/499 [2:10:30<38:37, 16.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 73%|███████▎  | 362/499 [2:10:41<34:01, 14.90s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 73%|███████▎  | 363/499 [2:12:19<1:30:24, 39.89s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 73%|███████▎  | 364/499 [2:12:35<1:13:27, 32.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 73%|███████▎  | 365/499 [2:12:51<1:01:47, 27.67s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 73%|███████▎  | 366/499 [2:13:08<54:36, 24.64s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 74%|███████▎  | 367/499 [2:13:16<42:54, 19.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 74%|███████▎  | 368/499 [2:13:22<33:50, 15.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 74%|███████▍  | 369/499 [2:13:25<25:26, 11.74s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 74%|███████▍  | 370/499 [2:13:30<20:44,  9.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 74%|███████▍  | 371/499 [2:13:38<20:06,  9.42s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 75%|███████▍  | 372/499 [2:15:41<1:31:27, 43.21s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 75%|███████▍  | 373/499 [2:15:44<1:05:59, 31.42s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 75%|███████▍  | 374/499 [2:15:50<49:25, 23.72s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 75%|███████▌  | 375/499 [2:15:56<37:40, 18.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 75%|███████▌  | 376/499 [2:16:24<43:34, 21.26s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 76%|███████▌  | 377/499 [2:16:34<36:11, 17.80s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 76%|███████▌  | 378/499 [2:16:40<28:42, 14.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 76%|███████▌  | 379/499 [2:16:52<27:25, 13.71s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 76%|███████▌  | 380/499 [2:17:08<28:42, 14.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 76%|███████▋  | 381/499 [2:17:54<46:56, 23.87s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 77%|███████▋  | 382/499 [2:18:13<43:44, 22.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 77%|███████▋  | 383/499 [2:18:55<54:40, 28.28s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 77%|███████▋  | 384/499 [2:19:03<42:37, 22.24s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 77%|███████▋  | 385/499 [2:19:11<34:02, 17.91s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 77%|███████▋  | 386/499 [2:20:18<1:01:21, 32.58s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 78%|███████▊  | 387/499 [2:20:22<44:55, 24.06s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 78%|███████▊  | 388/499 [2:20:31<36:22, 19.66s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 78%|███████▊  | 389/499 [2:20:39<29:13, 15.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 78%|███████▊  | 390/499 [2:20:49<25:59, 14.31s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 78%|███████▊  | 391/499 [2:20:55<21:07, 11.74s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 79%|███████▊  | 392/499 [2:21:11<23:12, 13.01s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 79%|███████▉  | 393/499 [2:22:36<1:01:01, 34.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 79%|███████▉  | 394/499 [2:22:40<44:36, 25.49s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 79%|███████▉  | 395/499 [2:22:49<35:30, 20.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 79%|███████▉  | 396/499 [2:22:59<29:43, 17.31s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 80%|███████▉  | 397/499 [2:23:04<23:19, 13.72s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 80%|███████▉  | 398/499 [2:23:21<24:52, 14.78s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 80%|███████▉  | 399/499 [2:23:43<28:02, 16.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 80%|████████  | 400/499 [2:23:54<24:49, 15.05s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 80%|████████  | 401/499 [2:24:01<20:37, 12.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 81%|████████  | 402/499 [2:24:07<17:22, 10.74s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 81%|████████  | 403/499 [2:24:15<15:54,  9.94s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 81%|████████  | 404/499 [2:24:22<14:16,  9.01s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 81%|████████  | 405/499 [2:24:48<22:01, 14.05s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 81%|████████▏ | 406/499 [2:25:18<29:16, 18.89s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 82%|████████▏ | 407/499 [2:25:28<24:38, 16.07s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 82%|████████▏ | 408/499 [2:25:42<23:39, 15.60s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 82%|████████▏ | 409/499 [2:25:54<21:45, 14.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 82%|████████▏ | 410/499 [2:26:01<18:17, 12.33s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 82%|████████▏ | 411/499 [2:28:00<1:05:00, 44.32s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 83%|████████▎ | 412/499 [2:28:18<52:28, 36.19s/it]  Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 83%|████████▎ | 413/499 [2:28:34<43:34, 30.40s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 83%|████████▎ | 414/499 [2:28:51<37:23, 26.39s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 83%|████████▎ | 415/499 [2:29:07<32:21, 23.11s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 83%|████████▎ | 416/499 [2:30:26<55:15, 39.95s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 84%|████████▎ | 417/499 [2:30:32<40:44, 29.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 84%|████████▍ | 418/499 [2:30:36<29:51, 22.12s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 84%|████████▍ | 419/499 [2:31:54<51:37, 38.72s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 84%|████████▍ | 420/499 [2:32:18<45:14, 34.36s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 84%|████████▍ | 421/499 [2:32:35<37:50, 29.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 85%|████████▍ | 422/499 [2:32:49<31:24, 24.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 85%|████████▍ | 423/499 [2:33:00<26:04, 20.59s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 85%|████████▍ | 424/499 [2:33:17<24:27, 19.57s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 85%|████████▌ | 425/499 [2:33:30<21:37, 17.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 85%|████████▌ | 426/499 [2:33:46<20:38, 16.96s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 86%|████████▌ | 427/499 [2:34:01<19:38, 16.37s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 86%|████████▌ | 428/499 [2:34:04<14:41, 12.41s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 86%|████████▌ | 429/499 [2:34:13<13:26, 11.52s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 86%|████████▌ | 430/499 [2:34:22<12:14, 10.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 86%|████████▋ | 431/499 [2:34:34<12:30, 11.04s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 87%|████████▋ | 432/499 [2:34:37<09:43,  8.71s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 87%|████████▋ | 433/499 [2:36:26<42:28, 38.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 87%|████████▋ | 434/499 [2:36:40<34:04, 31.45s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 87%|████████▋ | 435/499 [2:36:47<25:33, 23.96s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 87%|████████▋ | 436/499 [2:36:53<19:34, 18.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 88%|████████▊ | 437/499 [2:36:58<14:59, 14.51s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 88%|████████▊ | 438/499 [2:37:09<13:45, 13.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 88%|████████▊ | 439/499 [2:37:27<14:52, 14.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 88%|████████▊ | 440/499 [2:39:21<43:44, 44.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 88%|████████▊ | 441/499 [2:39:26<31:30, 32.59s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 89%|████████▊ | 442/499 [2:41:15<52:51, 55.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 89%|████████▉ | 443/499 [2:41:21<37:56, 40.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 89%|████████▉ | 444/499 [2:41:25<27:10, 29.65s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 89%|████████▉ | 445/499 [2:41:41<23:12, 25.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 89%|████████▉ | 446/499 [2:41:45<17:00, 19.26s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 90%|████████▉ | 447/499 [2:41:59<15:14, 17.58s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 90%|████████▉ | 448/499 [2:42:06<12:14, 14.40s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 90%|████████▉ | 449/499 [2:42:11<09:37, 11.55s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 90%|█████████ | 450/499 [2:42:17<08:09,  9.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 90%|█████████ | 451/499 [2:44:01<30:28, 38.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 91%|█████████ | 452/499 [2:44:06<21:59, 28.08s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 91%|█████████ | 453/499 [2:44:15<17:14, 22.50s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 91%|█████████ | 454/499 [2:44:23<13:38, 18.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 91%|█████████ | 455/499 [2:44:34<11:33, 15.77s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 91%|█████████▏| 456/499 [2:44:39<09:04, 12.66s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 92%|█████████▏| 457/499 [2:44:49<08:24, 12.01s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 92%|█████████▏| 458/499 [2:45:20<12:01, 17.60s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 92%|█████████▏| 459/499 [2:45:38<11:47, 17.70s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 92%|█████████▏| 460/499 [2:46:39<19:59, 30.76s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 92%|█████████▏| 461/499 [2:46:53<16:10, 25.53s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 93%|█████████▎| 462/499 [2:47:04<13:05, 21.22s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 93%|█████████▎| 463/499 [2:47:16<11:12, 18.69s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 93%|█████████▎| 464/499 [2:47:32<10:23, 17.81s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 93%|█████████▎| 465/499 [2:47:43<08:51, 15.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 93%|█████████▎| 466/499 [2:47:53<07:42, 14.00s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 94%|█████████▎| 467/499 [2:47:57<05:48, 10.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 94%|█████████▍| 468/499 [2:48:10<05:57, 11.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 94%|█████████▍| 469/499 [2:49:39<17:27, 34.93s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 94%|█████████▍| 470/499 [2:49:54<13:55, 28.82s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 94%|█████████▍| 471/499 [2:50:03<10:42, 22.93s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 95%|█████████▍| 472/499 [2:50:16<08:56, 19.87s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 95%|█████████▍| 473/499 [2:50:26<07:24, 17.08s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 95%|█████████▍| 474/499 [2:50:32<05:40, 13.61s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 95%|█████████▌| 475/499 [2:50:43<05:06, 12.75s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 95%|█████████▌| 476/499 [2:52:08<13:15, 34.58s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 96%|█████████▌| 477/499 [2:52:21<10:18, 28.13s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 96%|█████████▌| 478/499 [2:52:31<07:54, 22.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 96%|█████████▌| 479/499 [2:52:46<06:48, 20.42s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 96%|█████████▌| 480/499 [2:52:55<05:22, 16.98s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 96%|█████████▋| 481/499 [2:54:56<14:24, 48.02s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 97%|█████████▋| 482/499 [2:55:00<09:54, 34.98s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 97%|█████████▋| 483/499 [2:55:04<06:50, 25.64s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 97%|█████████▋| 484/499 [2:55:29<06:22, 25.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 97%|█████████▋| 485/499 [2:55:51<05:41, 24.41s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 97%|█████████▋| 486/499 [2:56:05<04:38, 21.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 98%|█████████▊| 487/499 [2:56:18<03:45, 18.82s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 98%|█████████▊| 488/499 [2:56:22<02:38, 14.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 98%|█████████▊| 489/499 [2:56:35<02:20, 14.03s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 98%|█████████▊| 490/499 [2:56:46<01:56, 12.95s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 98%|█████████▊| 491/499 [2:56:59<01:43, 12.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 99%|█████████▊| 492/499 [2:57:13<01:34, 13.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 99%|█████████▉| 493/499 [2:57:21<01:10, 11.71s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 99%|█████████▉| 494/499 [2:59:00<03:09, 37.82s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 99%|█████████▉| 495/499 [3:00:49<03:56, 59.21s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


 99%|█████████▉| 496/499 [3:02:46<03:49, 76.48s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


100%|█████████▉| 497/499 [3:03:24<02:09, 64.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


100%|█████████▉| 498/499 [3:03:32<00:47, 47.82s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m


100%|██████████| 499/499 [3:03:55<00:00, 22.12s/it]


[1m> Finished chain.[0m





In [25]:
mysub = test.copy()
mysub["Time Taken"] = times
mysub["Answers"] = results
mysub["Source files"] = sources
mysub.to_csv("full test.csv", index=False)

In [26]:
mysub

Unnamed: 0,ID,Question Text,Time Taken,Answers,Source files
0,Q4,"What is the definition of ""unusual event""",10.881,"The definition of ""unusual event"" in the cont...",TG Booklet 1.xlsx
1,Q5,What is Community Based Surveillance (CBS)?,35.537,Community Based Surveillance (CBS) is the sys...,TG Booklet 1.xlsx
2,Q9,What kind of training should members of VHC re...,15.593,Members of VHC should receive training on how...,TG Booklet 1.xlsx
3,Q10,What is indicator based surveillance (IBS)?,6.101,Indicator-based surveillance (IBS) is a surve...,TG Booklet 2.xlsx
4,Q13,What is Case based surveillance?,7.760,Case based surveillance involves the ongoing ...,TG Booklet 1.xlsx
...,...,...,...,...,...
494,Q1229,Where should completeness be evaluated in the ...,109.095,Completeness of surveillance data should be e...,TG Booklet 4.xlsx
495,Q1230,Which dimensions of completeness are crucial i...,116.757,"237, 238, 239, 240\n\nQuestion: What is the i...",TG Booklet 4.xlsx
496,Q1236,How can the completeness of case reporting be ...,38.170,The completeness of case reporting can be mon...,TG Booklet 2.xlsx
497,Q1239,Where should completeness and timeliness of re...,7.744,Completeness and timeliness of reports should...,TG Booklet 4.xlsx


## PART 2

##### Extracting Keywords and Paragraph 📝🔍


The answer to the question is probably the hardest. Finding the paragraph would also be much easier if we already know which of the 6 Excel sheets did the model use to answer the question.

In the code below, we use very basic ideas to find the paragraph and extract the keywords. 🤔🔎


In [27]:
import pandas as pd
import os

In [28]:
test_set = pd.read_csv("full test.csv")
test_set

Unnamed: 0,ID,Question Text,Time Taken,Answers,Source files
0,Q4,"What is the definition of ""unusual event""",10.881,"The definition of ""unusual event"" in the cont...",TG Booklet 1.xlsx
1,Q5,What is Community Based Surveillance (CBS)?,35.537,Community Based Surveillance (CBS) is the sys...,TG Booklet 1.xlsx
2,Q9,What kind of training should members of VHC re...,15.593,Members of VHC should receive training on how...,TG Booklet 1.xlsx
3,Q10,What is indicator based surveillance (IBS)?,6.101,Indicator-based surveillance (IBS) is a surve...,TG Booklet 2.xlsx
4,Q13,What is Case based surveillance?,7.760,Case based surveillance involves the ongoing ...,TG Booklet 1.xlsx
...,...,...,...,...,...
494,Q1229,Where should completeness be evaluated in the ...,109.095,Completeness of surveillance data should be e...,TG Booklet 4.xlsx
495,Q1230,Which dimensions of completeness are crucial i...,116.757,"237, 238, 239, 240\n\nQuestion: What is the i...",TG Booklet 4.xlsx
496,Q1236,How can the completeness of case reporting be ...,38.170,The completeness of case reporting can be mon...,TG Booklet 2.xlsx
497,Q1239,Where should completeness and timeliness of re...,7.744,Completeness and timeliness of reports should...,TG Booklet 4.xlsx


Ensure you correct the relevant path if it needs to be edited. 🛠️📂


In [29]:
path = "/kaggle/input/malawi-public-health-dataset/strengthening-health-systems-llm-challenge-for-integrated-disease-surveillance-and-response-in-malawi20240125-12750-1x85c8a"

In [30]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords



# Download NLTK resources (run only once)
#nltk.download('punkt')
#nltk.download('stopwords')

def extract_keywords(provided_text):
    # Tokenize the text
    tokens = word_tokenize(provided_text)

    # Convert tokens to lowercase
    tokens = [token.lower() for token in tokens]

    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [token.title() for token in tokens if token not in stop_words]

    # Remove punctuation and non-alphabetic characters
    keywords = [token for token in filtered_tokens if token.isalpha()]

    # Remove duplicate keywords
    unique_keywords = list(set(keywords))

    return ', '.join(unique_keywords)





def find_matching_paragraphs(csv_filepath, text_to_check, threshold=0.9):
    # Load the DataFrame
    df = pd.read_excel(f"{path}/MWTGBookletsExcel/{csv_filepath}",names=["paragraph", "text"])
    df.fillna('', inplace=True)
    # Concatenate all text from the 'text' column in the DataFrame
    all_text = ' '.join(df['text'].astype(str).values.tolist())

    # Combine the provided text and all text from the DataFrame
    combined_text = [text_to_check, all_text]

    # Initialize TfidfVectorizer
    tfidf_vectorizer = TfidfVectorizer()

    # Fit and transform the text in the DataFrame
    tfidf_matrix = tfidf_vectorizer.fit_transform(df['text'])

    # Transform the provided text
    provided_text_tfidf = tfidf_vectorizer.transform([text_to_check])

    # Calculate cosine similarity between the provided text and each paragraph in the DataFrame
    cosine_similarities = cosine_similarity(provided_text_tfidf, tfidf_matrix).flatten()

    # Find paragraphs that meet or exceed the threshold
    matching_paragraph_indices = [i for i, score in enumerate(cosine_similarities) if score >= threshold]

    if matching_paragraph_indices:
        # Get the corresponding paragraph numbers
        matching_paragraph_numbers = df.iloc[matching_paragraph_indices]['paragraph'].tolist()
        matching_paragraph_numbers = [str(int(i)) for i in matching_paragraph_numbers]
        return ', '.join(matching_paragraph_numbers)
    
    else:
        # If no paragraphs meet the threshold, fallback to selecting the paragraph with the highest similarity
        closest_paragraph_index = cosine_similarities.argmax()
        closest_paragraph_number = df.iloc[closest_paragraph_index]['paragraph']
        return ', '.join([str(closest_paragraph_number)])  # Return as a list

I've created two functions below. I'll try to explain what they do.

### "extract_keywords" function:

This Python function takes a string of text (`provided_text`) as input and performs several text processing steps to extract and return a list of unique keywords from that text.

Here's a high-level breakdown of what the function does step by step:

1. Stopword removal: It removes common stopwords (like "the", "is", "and", etc.) from the token list. Stopwords are commonly occurring words that typically do not carry significant meaning in the context of analysis. The function uses the NLTK library's built-in set of English stopwords for this purpose.

2. Titlecasing: It capitalizes the first letter of each remaining token. The metric of this competition is the ROUGE-1 metric which is case sensitive. It'll make sense to ensure the keywords are in the format like the train set.

3. Punctuation and non-alphabetic character removal: It filters out tokens that contain non-alphabetic characters (like punctuation marks) using the `isalpha()` method. This step ensures that only alphabetic words are considered as keywords.

4. Removing duplicate keywords: It removes duplicate keywords from the list to ensure that each keyword appears only once in the final output. This is done by converting the list of keywords into a set (which automatically removes duplicates) and then converting it back into a list.

5. Joining keywords into a string: Finally, it joins the unique keywords into a single string, separated by commas, using the `join` method.

### "find_matching_paragraphs" Function:

This Python function takes a CSV file path, a piece of text to check against, and an optional threshold for cosine similarity as input. It is designed to find paragraphs in the CSV file that match the provided text based on their similarity, using the cosine similarity metric. If no paragraphs meet the specified similarity threshold, it returns the paragraph with the highest similarity.

Here's a step-by-step explanation of what the function does:

1. Loading Data: It reads the CSV file located at the provided file path using Pandas (`pd.read_excel`). The DataFrame is expected to have two columns, named "paragraph" and "text", respectively. If there are any missing values in the DataFrame, they are filled with empty strings.

2. Vectorizing Text: It initializes a TF-IDF vectorizer and fits the data to generate a TF-IDF matrix (`tfidf_matrix`). TF-IDF stands for Term Frequency-Inverse Document Frequency, a numerical statistic that reflects the importance of a word in a document relative to a collection of documents.

3. Calculating Cosine Similarity: It calculates the cosine similarity between the TF-IDF representation of the provided text and each paragraph in the DataFrame using `cosine_similarity` from scikit-learn. Cosine similarity measures the cosine of the angle between two vectors and is used here to quantify the similarity between the provided text and each paragraph.

4. Finding Matching Paragraphs: It identifies the indices of paragraphs whose cosine similarity with the provided text meets or exceeds the specified threshold. If such paragraphs exist, it retrieves their corresponding paragraph numbers from the DataFrame and returns them as a comma-separated string. If no paragraphs meet the threshold, it selects the paragraph with the highest similarity.


Extra Tip: 

The model used is a small LLM, some answers were really weird, some have lots of "/n" and weird long words. The code below was used to eliminate them, and only keep words less than 20 characters. 🚀🔍


In [31]:
test_set["Answers"] = test_set["Answers"].str.replace("\n", "")
#test_set["Answers"] = test_set["Answers"].apply(lambda x: ' '.join([word for word in x.split() if len(word) <= 22]))

###### Putting it all together

The code below puts all our work together and prepares for submission. 🛠️📝

PS: The LLM may have encountered issues while running in Part 1. For such instances, we tag them as "Error". Interestingly, only one was found. 🚩


In [32]:
print(test_set[test_set["Answers"] == "Error"])

Empty DataFrame
Columns: [ID, Question Text, Time Taken, Answers, Source files]
Index: []


In [33]:
ID = []
Target = []
error = 0
normal = 0
for index, row in tqdm(test_set.iterrows(), total=len(test_set)):
    if row["Answers"] == "Error":
        error+=1
        print(row["ID"])
        ID.append(row["ID"]+"_keywords")
        Target.append(extract_keywords(row["Question Text"]))
        ID.append(row["ID"]+"_paragraph(s)_number")
        Target.append(find_matching_paragraphs("TG Booklet 1.xlsx", row["Question Text"], threshold=0.9))
        ID.append(row["ID"]+"_question_answer")
        Target.append(" ")
        ID.append(row["ID"]+"_reference_document")
        Target.append("TG Booklet 1")
        
    else:
        normal+=1
        \
        ID.append(row["ID"]+"_keywords")
        Target.append(extract_keywords(row["Answers"]))
        ID.append(row["ID"]+"_paragraph(s)_number")
        Target.append(find_matching_paragraphs(row["Source files"], row["Answers"], threshold=0.9))
        ID.append(row["ID"]+"_question_answer")
        Target.append(row["Answers"])
        ID.append(row["ID"]+"_reference_document")
        Target.append(row["Source files"].split(".xlsx")[0])
        
print("Normal:", normal)
print("Error:", error)

100%|██████████| 499/499 [01:14<00:00,  6.69it/s]

Normal: 499
Error: 0





#### Making Your Submission! 📤

A CSV file will be created called `My Baseline submission.csv`, that is the file you will submit on Zindi. 📊📤

In [34]:
ss = pd.read_csv(f"{path}/SampleSubmission.csv")
ss

Unnamed: 0,ID,Target
0,Q1000_keywords,
1,Q1000_paragraph(s)_number,
2,Q1000_question_answer,
3,Q1000_reference_document,
4,Q1002_keywords,
...,...,...
1991,Q999_reference_document,
1992,Q9_keywords,
1993,Q9_paragraph(s)_number,
1994,Q9_question_answer,


In [35]:
ss["ID"] = ID
ss["Target"] = Target

print(ss.isnull().sum())

ss['Target'] = ss.apply(lambda x: 0 if pd.isnull(x['Target']) else x['Target'], axis=1)

# Check the number of null values again
print(ss.isnull().sum())
ss.to_csv("Baseline_rag_cos.csv", index=False)

ID        0
Target    0
dtype: int64
ID        0
Target    0
dtype: int64


Curious to see what your submission looks like?

In [36]:
ss

Unnamed: 0,ID,Target
0,Q4_keywords,"Time, Event, Deaths, Specific, Community, Unkn..."
1,Q4_paragraph(s)_number,434
2,Q4_question_answer,"The definition of ""unusual event"" in the cont..."
3,Q4_reference_document,TG Booklet 1
4,Q5_keywords,"Nearby, Early, Agreed, Lay, Point, Identifying..."
...,...,...
1991,Q1239_reference_document,TG Booklet 4
1992,Q1246_keywords,"Input, Early, Lay, Additionally, Help, Platfor..."
1993,Q1246_paragraph(s)_number,86
1994,Q1246_question_answer,Community-based surveillance contributes to t...


###### References:

1. RAG using Llama 2, Langchain and ChromaDB
   by Gabriel Preda
   [Kaggle Notebook](https://www.kaggle.com/code/gpreda/rag-using-llama-2-langchain-and-chromadb)

2. DeepLearning.AI short courses
   [DeepLearning.AI Short Courses](https://www.deeplearning.ai/short-courses/)


## What Next? 🤔

Here are some tips for you to get better:

1. Use a much better LLM: There are a bunch of top-performing models that are still relatively small in size. 🚀🔍

2. Finetuning your LLM: You can try a trained LLM on the train set Q/A pairs instead of RAG. 🎯📖

3. Finetuning + RAG: Research has shown that this is a much better approach than a standalone solution. 💡🔄

4. Prompt Engineering: Introduce your prompt for the LangChain QA Chain. We used the default from LangChain. 🤖🔧

5. Keyword Extraction can be better: The stop words removed were not sufficient. 🛑🔍

6. Better Post-Processing Strategies: The post-processing strategies used were insufficient. Sometimes the model repeats some sentences several times within its answer. 🔄🛠️


### The End

Want to connect? 🔗 Feel free to reach out to me: most preferably LinkedIn.

- [Twitter](https://twitter.com/olufemivictort).

- [Linkedin](https://www.linkedin.com/in/olufemi-victor-tolulope).

- [Github](https://github.com/osinkolu)

### Author: Olufemi Victor Tolulope