| Version | Date     | Creator          | Change description                               |
|---------|----------|------------------|--------------------------------------------------|
| v0.05   | 07/09/23 | Jaikishan Khatri | Model comparison and tuning for better outputs   |
| v0.04   | 06/09/23 | Jaikishan Khatri | Memory integration for chat functionality        |
| v0.03   | 05/09/23 | Jaikishan Khatri | Trial of different embedding models              |
| v0.02   | 04/09/23 | Jaikishan Khatri | Generation with different local LLM models       |
| v0.01   | 03/09/23 | Jaikishan Khatri | Loader, Splitter, Storage, Retrieval, Generation |

# QA Chatbot for parsing Harry Potter books to generate answers

## Process

According to 🦜🔗*LangChain*, the process for transforming unstructured raw data into a QA chain is as follows:

1. **Loading**: We must load our data first. Numerous sources can be used to load unstructured data.
2. **Splitting**: Documents are divided into splits of a predetermined size using text splitters. 
3. **Storage**: The splits are stored, usually as embeddings, in a store such as a vectorstore.
4. **Retrieval**: The app fetches splits from storage, typically those whose embeddings are most similar to the input query.
5. **Generation**: An LLM generates a response using a prompt that contains the query and the data that was retrieved. 
6. **Conversation** (Extension): Adds Memory to the QA chain to hold a multi-turn dialogue.

![LLM-QA-flowchart.jpeg](https://python.langchain.com/assets/images/qa_flow-9fbd91de9282eb806bda1c6db501ecec.jpeg)

Image source: [LangChain](https://python.langchain.com/assets/images/qa_flow-9fbd91de9282eb806bda1c6db501ecec.jpeg)

### Dependencies

### Imports

In [1]:
import os
import torch 

import warnings
warnings.filterwarnings("ignore")

from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader, PyPDFLoader, PyMuPDFLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline


In [2]:
torch.cuda.is_available()

True

### Configuration
Major hyperparameters that can be used to tune the output of the pipeline:  
- `temperature` controls the randomness of the language model output. Lower values make the output more deterministic.  
- `top_p`, also known as nucleus sampling, also controls the randomness of the output. It sets a threshold probability and keeps the smallest set of most likely tokens whose cumulative probability reaches that threshold. The model then samples from this set to generate output.  
- `max_length` is the maximum total length in tokens of the prompt plus the generated text.
- `repetition_penalty` penalises tokens that have already appeared; a value of 1 applies no penalty.
- `chunk_size` is a text splitter setting that controls the size of the chunks created by splitting.
- `chunk_overlap` is a text splitter setting that makes consecutive chunks share some text, which reduces the information lost when the splitter cuts a passage in the middle.
- `k` is the number of document chunks returned by the retriever and passed to the LLM as context.
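
To make `temperature` and `top_p` concrete, here is a minimal pure-Python sketch of how the two interact when choosing the next token. It is an illustration only, not the code that `transformers` runs; the function names are hypothetical, and treating `temperature == 0` as greedy decoding is an assumption of this sketch.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalise the kept probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample_token(logits, temperature=1.0, top_p=0.95, rng=random):
    """Pick one token id from the logits (hypothetical helper)."""
    if temperature == 0:
        # assumption of this sketch: temperature 0 means "take the best token"
        return max(range(len(logits)), key=lambda i: logits[i])
    nucleus = top_p_filter(apply_temperature(logits, temperature), top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With a low temperature almost all of the probability mass sits on one token, so the nucleus shrinks to that token and the output becomes close to deterministic.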

In [3]:
class CFG:
    # LLMs
    model_name = 'vicuna' # wizardlm, bloom, falcon, llama2-7b, Photolens-llama-2-7b, vicuna
    temperature = 0 # using temperature 0 or 0.1 because we don't want to go out of context
    top_p = 0.95
    repetition_penalty = 1
    
    # splitting
    split_chunk_size = 500
    split_overlap = 100
    
    # embeddings
    embeddings_model_repo = 'intfloat/multilingual-e5-large' 
    
    ### English major embedding models
    # 'sentence-transformers/multi-qa-mpnet-base-dot-v1'
    # 'sentence-transformers/all-MiniLM-L6-v2', 
    # 'sentence-transformers/multi-qa-MiniLM-L6-cos-v1'
    # 'sentence-transformers/multi-qa-distilbert-cos-v1'
    # BAAI/bge-large-en
    
    ### Spanish/multilingual embedding models
    # intfloat/multilingual-e5-large
    
    # retriever
    retriever_type = 'similarity_search' # 'similarity_search', 'MultiQueryRetriever', 'Max marginal relevance', 'SVMRetriever'
    
    # create a new vectorstore, False for using pre built vectorstore
    new_vectorstore = True
    
    # number of extracted passages
    k = 5
    
    # quantization config
#     quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
    
    # paths
    pdfs_path = './data/hp-books-en/'
    embeddings_path =  './data/vectorstore-en'
    Persist_directory = './data/vectorstore-en/' 
    offload_folder = './offload_folder/'
    csv_path = './model_comparison/'

In [4]:
def get_model(model_name = CFG.model_name):
    
    """ Returns the tokenizer, model and maximum sequence length for the specified model name.
    Models are currently selected based on their ability to run on a local machine with 32 GB of RAM and 8 GB of CUDA memory.
    
    Parameters
    ----------
    model_name : str
        Name of the model to be used.
        
    Returns
    -------
    tokenizer : transformers.tokenization_utils_base.PreTrainedTokenizerBase
        Tokenizer for the specified model.
    model : transformers.modeling_utils.PreTrainedModel
        Model for the specified model.
    max_length : int
        Maximum length of the input sequence for the specified model.
    """
    if model_name == 'vicuna':
        model_repo = 'lmsys/vicuna-7b-v1.3'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)
        
        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )
        
        max_length = 4096    
        
    elif model_name == 'wizardlm':
        model_repo = 'TheBloke/wizardLM-7B-HF'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True
        )
        
        max_length = 4096  
        
    elif model_name == 'wizardlm-13b':
        model_repo = 'WizardLM/WizardLM-13B-V1.2'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)
        
        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )
        
        max_length = 4096
        
    elif model_name == 'Photolens-llama-2-7b':
        model_repo = 'Photolens/llama-2-7b-langchain-chat'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)
        
        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )
        
        max_length = 4096
        
    elif model_name == 'llama2-7b':
        model_repo = 'daryl149/llama-2-7b-chat-hf'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )
        
        max_length = 4096

    elif model_name == 'bloom':
        model_repo = 'bigscience/bloom-7b1'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
        )
        
        max_length = 4096

    elif model_name == 'falcon':
        model_repo = 'h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2'
        
        model_tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model_name = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )
        
        max_length = 4096

    else:
        raise ValueError('Incorrect Model Name')

    return model_tokenizer, model_name, max_length

In [5]:
%%time

tokenizer, model, max_len = get_model(model_name= CFG.model_name)


Downloading model:  vicuna 


You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=True`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565



CPU times: user 5.81 s, sys: 7.74 s, total: 13.5 s
Wall time: 20.7 s


### Pipeline

Wrap the model and tokenizer in a `transformers` text-generation pipeline and expose it to LangChain through `HuggingFacePipeline`.

In [6]:
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    pad_token_id = tokenizer.eos_token_id,
    max_length = max_len,
    temperature = CFG.temperature,
    top_p = CFG.top_p,
    repetition_penalty = CFG.repetition_penalty
)

llm = HuggingFacePipeline(pipeline = pipe)

In [7]:
llm

HuggingFacePipeline(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7fb7f36d5de0>, model_id='gpt2', model_kwargs=None, pipeline_kwargs=None)

### Step 1: Loading

- Load the data (here, the Harry Potter books as PDFs) from a directory and convert it into text.  
- `DirectoryLoader` loads every PDF document in the directory.  
- The benefit of using `PyPDFLoader` is that it loads each PDF page as a separate document and stores the page number in the metadata, which can be used to reference the source files.
- Loading can be done in multiple ways, as described in the [LangChain Document Loaders](https://python.langchain.com/docs/modules/data_connection/document_loaders.html) section.

**Note:**
The LangChain integration portal currently has [157 Document Loaders](https://integrations.langchain.com/). Each loader produces a LangChain Document as the data output.

In [38]:
def get_raw_pdf(pdfs_path):
    """ Loads PDF documents from a directory and converts them into text.
     
    Parameters
    ----------
    pdfs_path : str
        Path to the directory containing PDF documents.
        
    Returns
    -------
    documents : list
        List of LangChain Documents.
    """
    
    loader = DirectoryLoader(
        pdfs_path,
        loader_cls=PyPDFLoader,
        glob="./*.pdf",
        show_progress=True,
        use_multithreading=True
    )
    documents = loader.load()
    return documents

### Step 2: Splitter

- Split the text into small, semantically meaningful chunks (often sentences) of a predefined size. This creates smaller batches of data to embed in the vector store.
- Semantically related pieces of text end up close to each other in the embedding space, which improves retrieval.
- The benefit of using `RecursiveCharacterTextSplitter` is that it works through a list of separators in order, splitting recursively until the chunks are small enough.
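
To show what `chunk_size` and `chunk_overlap` mean, the sketch below uses a plain sliding window over characters. This is a deliberate simplification and not the algorithm that `RecursiveCharacterTextSplitter` implements (it ignores separators entirely); `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Cut text into fixed-size windows where each window repeats the
    last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk starts with the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.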


In [9]:
def get_document_chunks(documents, split_chunk_size: int=500, split_overlap: int=0):
    
    """ Splits the documents into chunks of predefined size.
    
    Parameters
    ----------
    documents : list
        List of LangChain Documents.
    split_chunk_size : int
        Size of the chunks to be created from the documents.
    split_overlap : int
        Overlap between two chunks.
            
    Returns
    -------
    doc_chunks : list
        List of LangChain Documents.   
    """
        
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=split_chunk_size,
        chunk_overlap=split_overlap,
        separators = ["\n\n", "\n", "\t", " ", ""],
        length_function=len,
        add_start_index=True,
    )
    doc_chunks = text_splitter.split_documents(documents)
    return doc_chunks

### Step 3: Store

- Embedding models are used to embed sentences (Performance Sentence Embeddings) and to embed search queries and paragraphs (Performance Semantic Search).  
- The current embedding model was selected from the list of best [pre-trained sentence transformers](https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models) which are hosted on HuggingFace.
- The benefit of using the [FAISS vector store](https://python.langchain.com/docs/integrations/vectorstores/faiss) is that it offers fast similarity search, with optional GPU support, which makes it faster than many other vector stores. It also saves the index to a local directory so that it can be reused locally.

**Note:**
The LangChain integration portal currently has [53 VectorStores](https://integrations.langchain.com/vectorstores) and [38 Embedding Models](https://integrations.langchain.com/embeddings).

In [10]:
def _create_new_vectorstore(pdfs_path: str, split_chunk_size: int, split_overlap: int, 
                            embeddings_model_repo: str, embeddings_path: str = './data/vectorstore-en'):
    """ Creates a new vectorstore and embeddings model. 
    
    Parameters
    ----------
    pdfs_path : str
        Path to the directory containing PDF documents.
    split_chunk_size : int
        Size of the chunks to be created from the documents.
    split_overlap : int
        Overlap between two chunks.
    embeddings_model_repo : str
        Name of the embeddings model to be used.
    embeddings_path : str
        Path to the directory where the vectorstore is to be stored.
        
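    Returns
    -------
    vectorstore : langchain.vectorstores.faiss.FAISS
        Vectorstore containing the embeddings of the documents.
    embeddings : langchain.embeddings.HuggingFaceInstructEmbeddings
        Embeddings model used to build the vectorstore.
    """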

    # load PDF documents
    documents = get_raw_pdf(pdfs_path)
    
    # split them into chunks
    doc_chunks = get_document_chunks(documents, split_chunk_size, split_overlap)
    
    # create embeddings model
    embeddings = HuggingFaceInstructEmbeddings(model_name = embeddings_model_repo,
                                               model_kwargs = {"device": "cuda"})

    # create new_vectorstore
    vectorstore = FAISS.from_documents(documents = doc_chunks, 
                                       embedding = embeddings)

    # persist vector database
    vectorstore.save_local(embeddings_path)
    
    return vectorstore, embeddings

def _load_prev_vectorstore(embeddings_model_repo, embeddings_path):
    """ Loads a previously created vectorstore and embeddings model.
    
    Parameters
    ----------
    embeddings_model_repo : str
        Name of the embeddings model to be used.
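    embeddings_path : str
        Path to the directory where the vectorstore is stored.
                
    Returns
    -------
    vec_store : langchain.vectorstores.faiss.FAISS
        Vectorstore containing the embeddings of the documents.
    model_embeddings : langchain.embeddings.HuggingFaceInstructEmbeddings
        Embeddings model used to query the vectorstore.
    """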
    
    # download embeddings model
    model_embeddings = HuggingFaceInstructEmbeddings(model_name = embeddings_model_repo,
                                                     model_kwargs = {"device": "cuda"})

    # load vectorstore and embeddings
    vec_store = FAISS.load_local(embeddings_path, model_embeddings)
    
    return vec_store, model_embeddings


def create_vectorstore_embeddings(pdfs_path: str, split_chunk_size, split_overlap, 
                                  embeddings_model_repo: str, embeddings_path: str = './data/vectorstore-en',
                                  new_vectorstore: bool=False):
    
    """ Creates a new vectorstore and embeddings model if new_vectorstore is True else loads a previously created vectorstore and embeddings model.
    
    Parameters
    ----------
    pdfs_path : str
        Path to the directory containing PDF documents.
    split_chunk_size : int
        Size of the chunks to be created from the documents.
    split_overlap : int
        Overlap between two chunks.
    embeddings_model_repo : str
        Name of the embeddings model to be used.
    embeddings_path : str
        Path to the directory where the vectorstore is to be stored.
    new_vectorstore : bool
        If True, creates a new vectorstore and embeddings model else loads a previously created vectorstore and embeddings model.
        
    Returns
    -------
    vec_store : langchain.vectorstores.faiss.FAISS
        Vectorstore containing the embeddings of the documents.
    model_embeddings : langchain.embeddings.HuggingFaceInstructEmbeddings
        Embeddings model used with the vectorstore.
    """
    
    # look for the saved index under embeddings_path instead of a hard-coded path
    index_file = os.path.join(embeddings_path, 'index.faiss')
    
    # reuse the saved vectorstore only when that was requested and it exists on disk
    if not new_vectorstore and os.path.isfile(index_file):
        vec_store, model_embeddings = _load_prev_vectorstore(embeddings_model_repo, embeddings_path)
    else:
        vec_store, model_embeddings = _create_new_vectorstore(pdfs_path, split_chunk_size, split_overlap, embeddings_model_repo, embeddings_path)
        
    return vec_store, model_embeddings

In [11]:
vectorstore, embeddings = create_vectorstore_embeddings(CFG.pdfs_path,
                                                        CFG.split_chunk_size,
                                                        CFG.split_overlap,
                                                        CFG.embeddings_model_repo,
                                                        CFG.embeddings_path,
                                                        new_vectorstore=CFG.new_vectorstore)

100%|██████████| 7/7 [00:34<00:00,  4.93s/it]


load INSTRUCTOR_Transformer
max_seq_length  512


### Step 4: Retrieve
- A retriever is an interface that returns documents given an unstructured query. Vector Stores can be taken as retrievers to retrieve relevant documents.

- Different types of retrieval methods include Similarity search, MultiQueryRetriever, Max marginal relevance, SVMRetriever  

In [13]:
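# search_type is an argument of as_retriever itself; search_kwargs only carries the search parameters
retriever = vectorstore.as_retriever(search_type = "similarity", search_kwargs = {"k": CFG.k})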
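
The idea behind a similarity search can be sketched without any vector store: score every stored vector against the query vector and keep the `k` best. The sketch below uses cosine similarity on plain Python lists purely for illustration; the metric FAISS actually applies depends on how the index is configured, and `top_k_similar` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_vector, stored_vectors, k=5):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), index)
        for index, vector in enumerate(stored_vectors)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [index for _, index in scored[:k]]


# toy 2-dimensional "embeddings"; real ones have hundreds of dimensions
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k_similar([1.0, 0.1], stored, k=2))  # [0, 2]
```

A real vector store avoids scoring every vector one by one, but the result it returns answers the same question: which stored chunks lie closest to the query.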

### Custom Prompt

- Prompt engineering is the process of structuring text so that it can be interpreted and understood by a generative AI model. It is one of the ways to tune an LLM for better outputs.  
- The context extracted by the retriever fills the `context` variable and the user's question fills the `question` variable; `PromptTemplate` combines them into a custom prompt.


In [14]:
prompt_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Answer in the same language the question was asked.

{context}

Question: {question}
Answer:"""


PROMPT = PromptTemplate(
    template = prompt_template, 
    input_variables = ["context", "question"]
)
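
What the template does at query time can be sketched with plain string formatting. The helper `build_prompt` below is hypothetical, the example chunks are invented for illustration, and joining the retrieved chunks with a blank line is an assumption of this sketch.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Answer in the same language the question was asked.

{context}

Question: {question}
Answer:"""


def build_prompt(chunks, question, template=TEMPLATE):
    """Join the retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(chunks)  # assumption: chunks are separated by a blank line
    return template.format(context=context, question=question)


# invented example chunks standing in for retrieved passages
example_chunks = [
    "Hermione had bought a large ginger cat.",
    "The cat was called Crookshanks.",
]
print(build_prompt(example_chunks, "What is Hermione's cat's name?"))
```

The LLM only ever sees this single filled-in string, which is why the number and size of the retrieved chunks must fit within the model's context window together with the instructions and the question.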

### Step 5: Generate
Retriever chain for QA
- The LangChain integration portal currently has 68 LLMs and 13 Chat Models.

In [15]:
# from langchain.llms import GPT4All

# llm_gpt = GPT4All(model='./models/nous-hermes-13b.ggmlv3.q4_0', max_tokens=4096, n_threads = 12)

# nous-hermes-13b.ggmlv3.q4_0
# GPT4All-13B-snoozy.ggmlv3.q4_0
# ggml-gpt4all-j-v1.3-groovy

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff", # map_reduce, map_rerank, stuff, refine
    retriever = retriever, 
    chain_type_kwargs = {"prompt": PROMPT},
    return_source_documents = True,
    verbose = False
)
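
Because the chain is built with `return_source_documents = True`, each result also carries the retrieved chunks under the `source_documents` key. A minimal sketch of turning their metadata into references follows; `format_sources` is a hypothetical helper, the file paths are invented, and the `source` and `page` keys are assumed to be the ones written by the PDF loader.

```python
from types import SimpleNamespace


def format_sources(result):
    """List the distinct files and page indices behind an answer."""
    references = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        reference = f"{source} (page index {page})"
        if reference not in references:  # keep each reference only once
            references.append(reference)
    return references


# stand-in for a chain result, so the sketch runs without a loaded model
fake_result = {
    "result": "Crookshanks",
    "source_documents": [
        SimpleNamespace(metadata={"source": "book3.pdf", "page": 41}),
        SimpleNamespace(metadata={"source": "book3.pdf", "page": 41}),
        SimpleNamespace(metadata={"source": "book5.pdf", "page": 7}),
    ],
}
print(format_sources(fake_result))
# ['book3.pdf (page index 41)', 'book5.pdf (page index 7)']
```

With a real chain the same helper could be called as `format_sources(qa_chain(query))`, which makes it easier to check whether an answer is grounded in the retrieved passages.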

### Compare models

In [16]:
def compare_model_ans(user_query, model_answer, answer_dict):
    """ Compares the answers from different models and stores them in a dictionary.
    
    Parameters
    ----------
    user_query : str
        Query or question asked by the user.    
    model_answer : dict
        Answer returned by the model.
    answer_dict : dict
        Dictionary to store the answers from different models.
        
    Returns
    -------
    ans_dict : dict
        Dictionary with answers from different models.
    """
    
    if answer_dict is None:
        answer_dict = {user_query: model_answer['result']}
    else:
        answer_dict = {**answer_dict, **{user_query: model_answer['result']}}
    return answer_dict

In [17]:
%%time

ans_dict = None

query = "Which are Hagrid's favorite animals?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 Hagrid's favorite animals are the Kneazles and the dragons.
CPU times: user 1.49 s, sys: 332 ms, total: 1.82 s
Wall time: 2.06 s


In [18]:
%%time

query = "Which challenges does Harry face during the Triwizard Tournament?"

ans = qa_chain(query)
ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 Harry faces three challenges during the Triwizard Tournament: navigating the swamp, overcoming the dragon, and solving the riddle.
CPU times: user 2.76 s, sys: 541 ms, total: 3.3 s
Wall time: 3.35 s


In [19]:
%%time

query = "Give me 5 examples of cool potions and explain what they do"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])



1. Polyjuice Potion: This potion allows the drinker to transform into anyone they can see in their mind. It's used by Harry in Harry Potter and the Goblet of Fire to impersonate another student during the Triwizard Tournament.
2. Felix Felicis: This potion, also known as "liquid luck," is consumed to enhance one's luck and increase their chances of success. It's featured in Harry Potter and the Potions Master, where Harry drinks it to pass his practical exam.
3. Unicorn Blood Potion: This potion is used to heal wounds and cure illnesses. It's mentioned in Harry Potter and the Philosopher's Stone when Harry drinks it to recover from the Dursleys' abuse.
4. Skele-Gro: This potion is used to revive the dead. It's mentioned in Harry Potter and the Goblet of Fire when Harry uses it to bring back the mischievous ghost, Moaning Myrtle.
5. Amortentia: This potion is used to create a scent that represents one's deepest desires. It's featured in Harry Potter and the Half-Blood Prince when Harr

In [20]:
%%time

query = "Name all seven Weasley children."
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 George, Fred, Ron, Ginny, Bill, Charlie, and Percy.
CPU times: user 1.38 s, sys: 350 ms, total: 1.73 s
Wall time: 1.74 s


In [21]:
%%time

query = "Moony, Wormtail, Padfoot, and Prongs are code names for which four characters?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 Moony, Wormtail, Padfoot, and Prongs are code names for the four Marauders: James Potter, Sirius Black, Peter Pettigrew, and Remus Lupin.
CPU times: user 2.82 s, sys: 661 ms, total: 3.48 s
Wall time: 3.48 s


In [37]:
%%time

query = "What position does Harry play on the Gryffindor Quidditch team?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 8.00 GiB total capacity; 7.12 GiB already allocated; 0 bytes free; 7.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

In [33]:
%%time

query = "Name the three different types of balls used in Quidditch."
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 The three different types of balls used in Quidditch are the Quaffle, the Bludger, and the Golden Snitch.
CPU times: user 2.47 s, sys: 842 ms, total: 3.32 s
Wall time: 3.32 s


In [24]:
%%time

query = "What is Hermione's cat's name?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 Crookshanks
CPU times: user 919 ms, sys: 230 ms, total: 1.15 s
Wall time: 1.15 s


#### Out of scope questions

In [25]:
%%time

query = "What did Gandalf do in the story?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 Gandalf was killed by a group of dark creatures in the story.
CPU times: user 1.14 s, sys: 290 ms, total: 1.43 s
Wall time: 1.43 s


In [26]:
%%time

query = "Which insect is Ron afraid of?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])


 Ron is afraid of spiders.
CPU times: user 1.31 s, sys: 531 ms, total: 1.84 s
Wall time: 1.84 s


In [27]:
%%time

query = "Who killed Dobby?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 It is not specified who killed Dobby in the given context.
CPU times: user 1.33 s, sys: 350 ms, total: 1.69 s
Wall time: 1.68 s


In [28]:
%%time

query = "How many players are on a Quidditch team?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])


 There are seven players on a Quidditch team.
CPU times: user 2.53 s, sys: 751 ms, total: 3.29 s
Wall time: 3.29 s


In [29]:
%%time

query = "How many possible Quidditch fouls are there?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 There are seven Quidditch fouls.
CPU times: user 2.69 s, sys: 1.62 s, total: 4.3 s
Wall time: 4.3 s


#### Different language questions

In [30]:
%%time

query = "¿Cuál es la profesión de los padres de Harry Potter?"
ans = qa_chain(query)

ans_dict = compare_model_ans(query, ans, ans_dict)
print(ans['result'])

 Los padres de Harry Potter son muggles, personas sin habilidades mágicas.
CPU times: user 1.56 s, sys: 441 ms, total: 2 s
Wall time: 2 s


## Export CSV Results

In [31]:
import pandas as pd

# export the answers to a CSV file for model comparison
def export_results_to_csv(answer_dict, csv_path, model_name, embeddings_model_repo, temp, top_p, r_penalty, chunk_size, overlap):
    
    # a single row with one column per question
    ans_df = pd.DataFrame.from_dict([answer_dict])

    # '/' cannot be used inside a file name
    embeddings_model_repo = embeddings_model_repo.replace('/', '--')

    # build the file name once from the model names and hyperparameters
    file_name = f'{model_name}_{embeddings_model_repo}_({temp}_{top_p}_{r_penalty}_{chunk_size}_{overlap}).csv'

    ans_df.to_csv(csv_path + file_name, index=False)
    
    print('Model results saved at ' + csv_path + ' with the name ' + file_name)

In [32]:
export_results_to_csv(ans_dict,
                      CFG.csv_path,
                      CFG.model_name, 
                      CFG.embeddings_model_repo, 
                      CFG.temperature, 
                      CFG.top_p, 
                      CFG.repetition_penalty,
                      CFG.split_chunk_size,
                      CFG.split_overlap
                     )

Model results saved at ./model_comparison/ with the name vicuna_intfloat--multilingual-e5-large_(0_0.95_1_500_100).csv


### Conversational Chatbot