# Text mining & Natural language processing
# Authors:
- Basma EL BAALI
- Yasmina M'NASRI

## Overview:
In this project, we build a RAG (Retrieval-Augmented Generation) pipeline for a chatbot that can answer questions about Ana Huang's Twisted series.
### Part 1:
- Practice with the Mistral model.
- Use the Fireworks API.
- Use the [FAISS vector store](https://python.langchain.com/docs/integrations/vectorstores/faiss) to store text embeddings created with [Sentence Transformers](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) from Hugging Face 🤗.
- Use a [Retrieval chain](https://python.langchain.com/docs/modules/data_connection/retrievers/) to retrieve relevant passages from the embedded text.

### Part 2:

- Flexible and customizable RAG pipeline (Retrieval Augmented Generation).
- Experiment with various LLMs (Large Language Models). In this part, we won't be using Fireworks to get the models.
- RetrievalQA: Chain for question-answering
- Post-process outputs.


In [1]:
#checking GPU
! nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-0963b2df-a913-8bdf-c939-65aa0751924f)


# Installs

Before getting started, let's first install the libraries we will use:

In [2]:
%%time

from IPython.display import clear_output

! pip install sentence_transformers==2.2.2

! pip install -qq -U langchain
! pip install -qq -U tiktoken
! pip install -qq -U pypdf
! pip install -qq -U faiss-gpu
! pip install -qq -U InstructorEmbedding

! pip install -qq -U transformers
! pip install -qq -U accelerate
! pip install -qq -U bitsandbytes


clear_output()

CPU times: user 497 ms, sys: 58.2 ms, total: 556 ms
Wall time: 1min 17s


In [3]:
%%capture
!pip install chromadb tqdm fireworks-ai python-dotenv pandas
!pip install sentence-transformers

# Imports

In [4]:
%%time

import warnings
warnings.filterwarnings("ignore")

import fireworks.client
import os
import dotenv
import glob
import textwrap
import time
import json
from tqdm.auto import tqdm
import pandas as pd
import random
from google.colab import userdata
import chromadb


import langchain

# loaders
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import DirectoryLoader

# splits
from langchain.text_splitter import RecursiveCharacterTextSplitter

# prompts
from langchain import PromptTemplate, LLMChain

# vector stores
from langchain.vectorstores import FAISS

# models
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceInstructEmbeddings

# retrievers
from langchain.chains import RetrievalQA

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

print('langchain:', langchain.__version__)
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)

langchain: 0.1.6
torch: 2.1.0+cu121
transformers: 4.37.2
CPU times: user 11 s, sys: 1.25 s, total: 12.2 s
Wall time: 18.6 s


*Before* continuing, you need to obtain a Fireworks API Key to use the Mistral 7B model.

Check out this quick guide to obtain your Fireworks API Key: https://readme.fireworks.ai/docs

In [5]:
# Add the name of your key
fireworks.client.api_key = userdata.get("FIREWORKS_API_KEY")

## Part *1*
### Getting Started
Let's define a function to get completions from the Fireworks inference platform.

In [6]:
def get_completion(prompt, model=None, max_tokens=50):

    fw_model_dir = "accounts/fireworks/models/"

    if model is None:
        model = fw_model_dir + "llama-v2-70b"
    else:
        model = fw_model_dir + model

    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )

    return completion.choices[0].text
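
The RAG prompt later in this notebook wraps its instructions in `[INST] ... [/INST]` tags, which is the format Mistral instruct models are trained on. The same wrapping could be applied to a bare question before it is sent to `get_completion`. A minimal sketch, where `to_instruct_prompt` is a hypothetical helper that is not part of the original notebook:

```python
def to_instruct_prompt(question: str) -> str:
    """Wrap a bare question in Mistral-style instruction tags (hypothetical helper)."""
    return f"[INST] {question.strip()} [/INST]"


# The wrapped string would be passed to get_completion in place of the raw question.
wrapped = to_instruct_prompt("  Who is Ana Huang?  ")
print(wrapped)
```

Whether this changes the answers for a given model is something to verify by experiment; the sketch only shows the string formatting.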

## RAG Use Case:
### Twisted Tales Chatbot: building a chatbot that can answer questions about the Twisted book series

The "Twisted Tales Chatbot" is an interactive chatbot built on Retrieval-Augmented Generation (RAG). It aims to serve as a guide and companion for fans of Ana Huang's "Twisted" series, providing detailed insights, summaries, character analyses, and more, all derived from the text of the Twisted books.

At the heart of this project lies the "Twisted" series by Ana Huang, a set of dark romance novels known for their intense narratives and complex character dynamics. The books explore dark themes and emotional growth within power-laden relationships, which gives the chatbot plenty of material for in-depth conversations about the series.

Utilizing the RAG model, the chatbot will be capable of retrieving information from the Twisted series dataset and generating responses that feel natural and informed. This approach allows for an adaptive and contextually aware system, capable of handling a wide range of queries about the plot, characters, themes, and even interpretations of specific events within the books.
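
To make the retrieve-then-generate idea concrete, here is a toy sketch of the flow. It is not the pipeline built in this notebook: word overlap stands in for embedding similarity, the passages are made-up placeholders rather than quotes from the books, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def overlap_score(question, passage):
    """Count the distinct question tokens that also occur in the passage."""
    return len(set(tokenize(question)) & set(tokenize(passage)))


def retrieve(question, passages, k=2):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up placeholder passages, not quotes from the books.
toy_passages = [
    "The bodyguard protects the princess during her travels.",
    "The bakery opens at eight every morning.",
    "The princess lives in a small European kingdom.",
]

question = "Who protects the princess?"
top = retrieve(question, toy_passages, k=2)
print(build_prompt(question, top))
```

The prompt that comes out of `build_prompt` is what would be sent to the language model, so the model answers from the retrieved text instead of from memory alone.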


- Before adding the Twisted series as context, we'll ask the model some questions about these books to see whether it can answer them on its own.




In [7]:
get_completion("Tell me who is Ana Huang", model="mistral-7b-instruct-4k")

'?\nAna Huang is a Taiwanese-American actress and model. She was born on July 1, 1989, in Los Angeles, California, USA. She is best known for her roles in the TV series "'

In [8]:
get_completion("tell what are twisted books", model="mistral-7b-instruct-4k")

'?\nAnswer: Twisted books are books that have been intentionally distorted or manipulated in some way to make them appear to be something they are not. This can include books that have been bound in a way that makes them look like'

In [9]:
get_completion("what does the book twisted games talk about",model="mistral-7b-instruct-4k")

'?\nAnswer: Twisted Games is a book written by author James Patterson. It tells the story of a group of high school students who become involved in a series of dangerous and twisted games, each one more deadly than the last.'

In [10]:
get_completion("who wrote Twisted Games", model="mistral-7b-instruct-4k")

'?\n\nTwisted Games is a novel written by author and screenwriter, David S. Goyer. It was first published in 2003 and has since been adapted into a television series.'

In [11]:
get_completion("tell me the latest books Ana Huang wrote", model="mistral-7b-instruct-4k")

'\n\nAna Huang is a Chinese-American author who has written several books, including:\n\n1. "The Last Empress" (2018) - a historical fiction novel set in China during the Qing dynasty'

In [12]:
get_completion("what does the book Twisted lies by Ana Huang talk about?", model="mistral-7b-instruct-4k")

'\nAnswer: Twisted Lies is a psychological thriller novel by Ana Huang. It follows the story of a woman named Emily who is dealing with a traumatic past and a husband who is hiding secrets. The novel explores themes'

In [13]:
get_completion("who is rhys larsen in twisted games?", model="mistral-7b-instruct-4k")

'\n\n## Answer (1)\n\nRhys Larsen is a character in the Twisted Games series of interactive fiction games. He is the main character in the first game, Twisted Games: The Darkest Secrets.\n\n'

As you can see, the model doesn't give correct answers. It attributes "Twisted Games" to other authors, mistakes "Twisted Lies" for a psychological thriller when it is a romance, and, in the last question, treats the book series as a series of interactive fiction games. This is understandable, since the books were published only recently:
- Twisted Love (Book 1) - Published in 2021.
- Twisted Games (Book 2) - Published in 2021.
- Twisted Hate (Book 3) - Published in 2022.
- Twisted Lies (Book 4) - Published in 2023.

- Now, we'll create a folder that contains the 4 books as PDFs. We'll also add another one that contains the transformers.
- Note: we named the dataset folder "Twisted" and the transformer folder "faiss-hp-sentence-transformers". When running the notebook, either keep these folder names or update the paths defined in the class CFG.

In [14]:
sorted(glob.glob('/content/Twisted/*'))

[]

# CFG

- CFG class enables easy and organized experimentation

In [15]:
class CFG:
    # LLMs
    model_name = 'mistral-7b-instruct-4k' # other options: wizardlm, llama2-7b-chat, llama2-13b-chat, mistral-7B
    temperature = 0
    top_p = 0.95
    repetition_penalty = 1.15

    # splitting
    split_chunk_size = 800
    split_overlap = 0

    # embeddings
    # We will be using SentenceTransformer for generating embeddings
    embeddings_model_repo = 'sentence-transformers/all-MiniLM-L6-v2'
    # similar passages
    k = 4

    # paths
    PDFs_path = '/content/Twisted/'
    Embeddings_path =  '/content/faiss-hp-sentence-transformers'
    Persist_directory = '/content/Twisted-vectordb'
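
One thing to watch when editing a settings class like CFG: in Python, a trailing comma after a value creates a one-element tuple instead of a number, and the mistake only surfaces later when the value is passed to a library. A small illustration, using a hypothetical `ToyConfig` class:

```python
class ToyConfig:
    # Hypothetical config used only for this illustration.
    with_comma = 0.95,     # trailing comma: this is the tuple (0.95,)
    without_comma = 0.95   # plain float


print(type(ToyConfig.with_comma).__name__)
print(type(ToyConfig.without_comma).__name__)
```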

Now, we'll unzip the files that we'll work with: the transformers and the books.




In [16]:
import zipfile

# Unzip the transformers
zip_file_path = '/content/faiss-hp-sentence-transformers.zip'
# Path to the directory where to extract the files
extract_dir = '/content/faiss-hp-sentence-transformers/'

# Extract the .zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

# Retrieve all file paths in the extract directory
extracted_files = glob.glob(extract_dir + '*')

# Path to the zip file containing the books
zip_file_path = '/content/Twisted.zip'
# Path to the directory where to extract the files
extract_dir = '/content/Twisted/'
# Extract the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)
# Display the extracted files
extracted_files = glob.glob(extract_dir + '*')
print("Extracted files:", extracted_files)


Extracted files: ['/content/Twisted/Twisted Hate An Enemies with Benefits Romance (Ana Huang)-1.pdf', '/content/Twisted/Twisted-Love-1.pdf', '/content/Twisted/Twisted Games.pdf', '/content/Twisted/Twisted Lies A Fake Dating Romance (Ana Huang).pdf']


# 🦜🔗 Langchain

- Multiple document retriever with LangChain

In [17]:
CFG.model_name

'mistral-7b-instruct-4k'

# Loader

- [Directory loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/file_directory) for multiple files

- This step is necessary since we are creating embeddings. In this case, we need to:
    - load the PDF files
    - split into chunks
    - create embeddings
    - save the embeddings in a vector store
    - After that, we can just load the saved embeddings to do similarity search with the user query


In [18]:
CFG.PDFs_path

'/content/Twisted/'

In [19]:
%%time

loader = DirectoryLoader(
    CFG.PDFs_path,
    glob="./*.pdf",
    loader_cls=PyPDFLoader,
    show_progress=True,
    use_multithreading=True
)

documents = loader.load()

100%|██████████| 4/4 [01:15<00:00, 18.98s/it]

CPU times: user 1min 11s, sys: 551 ms, total: 1min 11s
Wall time: 1min 15s





In [20]:
print(f'We have {len(documents)} pages in total')

We have 1680 pages in total


In [21]:
#check the content of a document
documents[9].page_content

'He hates her ...almost as much as he wants her .\nGorgeous, cocky , and fast on his way to becoming a hotshot doctor , Josh\nChen has never met a woman he couldn’ t charm—except for Jules f**king\nAmbrose.\nThe beautiful redhead has been  a thorn in his side since they met, but she\nalso consumes his thoughts in a way no woman ever has.\nWhen their animosity explodes into one unfor gettable night, he proposes a\nsolution that’ll get her out of his system once and for all: an enemies with\nbenefits arrangement with simple rules.\nNo jealousy .\nNo strings attached.\nAnd absolutely no falling in love.\n**\nOutgoing and ambitious, Jules Ambrose is a former party girl who’ s\nfocused on one thing: passing the attorney’ s bar exam.\nThe last thing she needs is to get involved with a doctor who puts the\nSUFFER in insuf ferable…no matter how good-looking he is.\nBut the more she gets to know him, the more she realizes there’ s more than\nmeets the eye to the man she’ s hated for so long.\nH

# Splitter

- Splitting the text into chunks so its passages are easily searchable for similarity

- [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/reference/modules/document_loaders.html?highlight=RecursiveCharacterTextSplitter#langchain.document_loaders.MWDumpLoader)
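
To show what chunk size and overlap mean, here is a simplified character-window splitter. It is a sketch under stated assumptions, not the algorithm used by `RecursiveCharacterTextSplitter`, which first tries to split on separators such as paragraph breaks and newlines. The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"  # 10 characters
no_overlap = split_into_chunks(sample, chunk_size=4)
with_overlap = split_into_chunks(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With an overlap of 0, as configured in CFG, a sentence that straddles a chunk boundary is cut in two; a non-zero overlap repeats some text across neighbouring chunks so that such sentences stay retrievable, at the cost of more chunks.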

In [22]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = CFG.split_chunk_size,
    chunk_overlap = CFG.split_overlap
)

texts = text_splitter.split_documents(documents)

print(f'We have created {len(texts)} chunks from {len(documents)} pages')

We have created 4231 chunks from 1680 pages


# Create Embeddings


- Embed and store the texts in a vector database (FAISS)
- [LangChain Vector Stores docs](https://python.langchain.com/docs/modules/data_connection/vectorstores/)
- [FAISS - langchain](https://python.langchain.com/docs/integrations/vectorstores/faiss)
- [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks - paper Aug/2019](https://arxiv.org/pdf/1908.10084.pdf)
- [This is a nice 4-minute video about vector stores](https://www.youtube.com/watch?v=dN0lsF2cvm4)
- [Chroma - Persist and load the vector database](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/chroma.html)

___

- If we use Chroma vector store it will take ~35 min to create embeddings
- If we use FAISS vector store on GPU it will take just ~1 min

___

We need to create the embeddings only once, and then we can just load the vector store and query the database using similarity search.

Loading the embeddings takes only a few seconds.
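
The "create once, load later" pattern can be sketched with a toy vector store in plain Python. Everything here is illustrative: the three-dimensional vectors are hand-written stand-ins for real embeddings, JSON stands in for the FAISS index files, and the function names are hypothetical.

```python
import json
import math
import os
import tempfile


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def save_store(store, path):
    """Write the list of {text, vector} records to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f)


def load_store(path):
    """Read the records back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def search(store, query_vector, k=1):
    """Return the texts of the k records most similar to the query vector."""
    ranked = sorted(store, key=lambda r: cosine(r["vector"], query_vector), reverse=True)
    return [r["text"] for r in ranked[:k]]


# Hand-written 3-dimensional vectors stand in for real embeddings.
store = [
    {"text": "chunk about Eldorra", "vector": [0.9, 0.1, 0.0]},
    {"text": "chunk about the bar exam", "vector": [0.0, 0.2, 0.9]},
]

path = os.path.join(tempfile.mkdtemp(), "toy_store.json")
save_store(store, path)       # expensive step, done once
reloaded = load_store(path)   # cheap step on later runs
print(search(reloaded, [1.0, 0.0, 0.0], k=1))
```

In this notebook the save step is `vectordb.save_local(...)`. LangChain's FAISS wrapper also provides a `load_local` method for the reload step; its exact arguments depend on the installed version, so check the documentation before relying on it.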


In [23]:
%%time

### download embeddings model
embeddings = HuggingFaceInstructEmbeddings(
    model_name = CFG.embeddings_model_repo,
    model_kwargs = {"device": "cuda"} # if you are only using CPUs, comment out this line so the cell can run
)

### create embeddings and DB
vectordb = FAISS.from_documents(
    documents = texts,
    embedding = embeddings
)

### persist vector database
vectordb.save_local("faiss_index_hp")

load INSTRUCTOR_Transformer
max_seq_length  512
CPU times: user 12.7 s, sys: 799 ms, total: 13.5 s
Wall time: 19.9 s


In [24]:
### test if vectordb was loaded correctly
vectordb.similarity_search('Hate')

[Document(page_content='“I don’t hate Eldorra.” The country came with a lot of baggage for me,', metadata={'source': '/content/Twisted/Twisted Games.pdf', 'page': 170}),
 Document(page_content='found out the truth, she would hate me.\nWhich was why she could never find out.', metadata={'source': '/content/Twisted/Twisted Lies A Fake Dating Romance (Ana Huang).pdf', 'page': 332}),
 Document(page_content='self-doubt and imposter syndrom e would hit, and I’d abandon it for another', metadata={'source': '/content/Twisted/Twisted Lies A Fake Dating Romance (Ana Huang).pdf', 'page': 161}),
 Document(page_content='A grim half-smile touched Rhys’s lips. “I’d rather you hate me alive than\nlove me dead.” He released my shoulders. “Get dressed. We’re leaving.”\nThe door shut behind him.\nI could finally breathe easy again, but I couldn’t stop his words from\nechoing in my mind.\nI’d rather you hate me alive than love me dead.\nThe problem was, I \ndidn’t \nhate him. I hated his rules and restric

In [25]:
### testing MMR (max marginal relevance) search
question = "Who is Ana Huang?"
vectordb.max_marginal_relevance_search(question, k = CFG.k)

[Document(page_content='ANA HUANG', metadata={'source': '/content/Twisted/Twisted-Love-1.pdf', 'page': 3}),
 Document(page_content='ABOUT THE AUTHOR\nAna Huang is an author of primarily steamy New Adult and contemporary romance. Her books contain\ndiverse characters and emotional, sometimes twisty roads toward HEAs (with plenty of banter and\nspice sprinkled in). Besides reading and writing, Ana loves traveling, is obsessed with hot chocolate,\nand has multiple relationships with fictional boyfriends.', metadata={'source': '/content/Twisted/Twisted-Love-1.pdf', 'page': 311}),
 Document(page_content='rushing back like a tidal wave. \nI was going to find the fucker who wrote her that note.', metadata={'source': '/content/Twisted/Twisted Lies A Fake Dating Romance (Ana Huang).pdf', 'page': 127}),
 Document(page_content='warm in designer clothes. I’d always suspected you weren’t mine—you look\nnothing like me, but I figured, hey, maybe you just have a strong resemblance\nto Wendy. I took a
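
Max marginal relevance (MMR) picks passages that are relevant to the query but not redundant with the passages already chosen. The sketch below is a plain-Python illustration of that idea, not LangChain's implementation; the two-dimensional vectors are invented, and `lambda_mult` is used here as the weight between relevance (1.0) and diversity (0.0).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query = [1.0, 0.0]
doc_vectors = [
    [1.0, 0.1],    # very close to the query
    [1.0, 0.12],   # near-duplicate of the first document
    [0.6, -0.8],   # less similar to the query, but different from the first
]
balanced = mmr(query, doc_vectors, k=2, lambda_mult=0.5)
relevance_only = mmr(query, doc_vectors, k=2, lambda_mult=1.0)
print(balanced)
print(relevance_only)
```

With the balanced setting the near-duplicate is skipped in favour of the more distinct document, while the relevance-only setting keeps both near-duplicates. This is why MMR results can include passages that look less related to the question than a plain similarity search would return.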

# Retriever chain

- We'll use a retriever to retrieve relevant passages
- Chain to answer questions
- [RetrievalQA: Chain for question-answering](https://python.langchain.com/docs/modules/data_connection/retrievers/)

In [26]:
retriever = vectordb.as_retriever(search_kwargs={"k": CFG.k})
docs = retriever.get_relevant_documents("how old is rhys larsen in twisted games?")
print(len(docs))
print(docs)

4
[Document(page_content='enough that I could make out the tiny bump in his nose and the firm curve of\nhis lips.\n“Have you ever been in love?” I asked, partly because I really wanted to\nknow, and partly because I wanted to pull my thoughts off the morbid path\nthey’d taken.\n“Nope.”\n“Really? Never?”\n“Nope,” Rhys said again. He cocked an eyebrow. “Surprised?”\n“A little. You’re old. You should’ve been in love at least three times by\nnow.” He was ten years older than me, which wasn’t that old at all, but I liked\nteasing him when I could.\nA deep, rich sound filled the air, and I realized with shock Rhys was\nlaughing. \nThe deepest, loudest, realest laugh I’d pulled out of him yet.\nIt was beautiful.\n“One love for every decade,” Rhys said when his mirth faded. “By that', metadata={'source': '/content/Twisted/Twisted Games.pdf', 'page': 138}), Document(page_content='Once the door shut, I said, “You can call me Bridget. It would be odd if\nwe were engaged and you still called me Yo

In [27]:
docs[0]

Document(page_content='enough that I could make out the tiny bump in his nose and the firm curve of\nhis lips.\n“Have you ever been in love?” I asked, partly because I really wanted to\nknow, and partly because I wanted to pull my thoughts off the morbid path\nthey’d taken.\n“Nope.”\n“Really? Never?”\n“Nope,” Rhys said again. He cocked an eyebrow. “Surprised?”\n“A little. You’re old. You should’ve been in love at least three times by\nnow.” He was ten years older than me, which wasn’t that old at all, but I liked\nteasing him when I could.\nA deep, rich sound filled the air, and I realized with shock Rhys was\nlaughing. \nThe deepest, loudest, realest laugh I’d pulled out of him yet.\nIt was beautiful.\n“One love for every decade,” Rhys said when his mirth faded. “By that', metadata={'source': '/content/Twisted/Twisted Games.pdf', 'page': 138})

In [28]:
docs[2].metadata['source']

'/content/Twisted/Twisted Games.pdf'

# Prompt Template

- Custom prompt

In [29]:
def generate_model_response(user_query):
    # Fetch the top CFG.k passages for the query (retriever is defined above)
    docs = retriever.get_relevant_documents(user_query)
    # Concatenate the 'page_content' of the retrieved passages
    combined_snippets = ' '.join(doc.page_content for doc in docs)

    # Normalize whitespace; slice snippet_words here to cap the context length if needed
    snippet_words = combined_snippets.split()
    shortened_snippet = ' '.join(snippet_words)

    # Prepare the prompt template using the combined and shortened snippet
    prompt_template = f'''[INST]

    Don't try to make up an answer, if you don't know just say that you don't know.
    Answer in the same language the question was asked.
    Use only the following pieces of context to answer the question at the end.
    Write a short answer.

    Context: {shortened_snippet}

    Question: {user_query}
    Answer:

    [/INST]
    '''

    sources_and_pages = []
    # get_completion is defined in the Getting Started section
    response = get_completion(prompt_template, model="mistral-7b-instruct-4k", max_tokens=150)
    for index, doc in enumerate(docs):
        # Retrieve source from metadata
        source = doc.metadata.get('source', 'Unknown')  # If source is not available, set it as 'Unknown'
        # Retrieve page from metadata
        page = doc.metadata.get('page', 'Unknown')  # If page is not available, set it as 'Unknown'
        # Append source and page to the list
        sources_and_pages.append((source, page))

    # Format sources and pages for inclusion in the response
    formatted_sources_and_pages = '\n'.join([f"{source} - page: {page}" for source, page in sources_and_pages])

    # Concatenate the model response with sources and pages
    formatted_response = f"{response.strip()}\nSources:\n{formatted_sources_and_pages}"

    return formatted_response

# Example usage:
user_query = "how old is rhys larsen in twisted games?"
response = generate_model_response(user_query)
print("Formatted Model Response:")
print(response)

Formatted Model Response:
Rhys Larsen is 34 years old in "Twisted Games".
Sources:
/content/Twisted/Twisted Games.pdf - page: 138
/content/Twisted/Twisted Games.pdf - page: 318
/content/Twisted/Twisted Games.pdf - page: 187
/content/Twisted/Twisted Games.pdf - page: 279


As we can see, the model now gives the right answer (34) to the question "how old is rhys larsen in twisted games?". Without the context, it just gives a random answer.
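
Inside `generate_model_response`, the combined context is split into words and joined back without dropping any of them. If the retrieved passages ever grow beyond what the model's context window can hold, that is the spot where a cap could be applied. A minimal sketch, where `truncate_words` and `max_words` are hypothetical names:

```python
def truncate_words(text, max_words):
    """Keep at most max_words whitespace-separated words of text."""
    words = text.split()
    return " ".join(words[:max_words])


context = "one  two\nthree four   five"
print(truncate_words(context, 3))
```

A word count is only a rough proxy for the number of tokens the model sees, so any limit chosen this way should leave some margin.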

In [30]:
get_completion("how old is rhys larsen in twisted games?", model="mistral-7b-instruct-4k")

'\n\n## Answer (1)\n\nRhys Larsen is 25 years old in Twisted Games.\n\n## Answer (0)\n\nRhys Larsen is 25 years old in Twisted Games.'

Other examples:

In [31]:
# other examples
rep = generate_model_response("how is the voice of rhys?")
print(rep)

The voice of Rhys is described as condescending and grating on the narrator's nerves.
Sources:
/content/Twisted/Twisted Games.pdf - page: 30
/content/Twisted/Twisted Games.pdf - page: 17
/content/Twisted/Twisted Games.pdf - page: 369
/content/Twisted/Twisted Games.pdf - page: 297


In [32]:
# other examples
rep1 = generate_model_response("who is rhys larsen in twisted games?")
print(rep1)

Rhys Larsen is a bodyguard in the novel "Twisted Games" by Kate Atkinson. He is the fiancé of the protagonist, Bridget Jones, and is involved in a scandal involving a damning video of him and Bridget. Rhys is also involved in a physical altercation with Bridget's grandfather, Steffan.
Sources:
/content/Twisted/Twisted Games.pdf - page: 318
/content/Twisted/Twisted Games.pdf - page: 276
/content/Twisted/Twisted Games.pdf - page: 179
/content/Twisted/Twisted Games.pdf - page: 30


In [33]:
#other examples
rep2 = generate_model_response("Princess Bridget is the princess of what?")
print(rep2)
# model response before RAG
get_completion("Princess Bridget is the princess of what?", model="mistral-7b-instruct-4k")

Princess Bridget is the princess of Eldorra.
Sources:
/content/Twisted/Twisted Games.pdf - page: 9
/content/Twisted/Twisted-Love-1.pdf - page: 306
/content/Twisted/Twisted Games.pdf - page: 385
/content/Twisted/Twisted Hate An Enemies with Benefits Romance (Ana Huang)-1.pdf - page: 37


'\nAnswer: Ireland'

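In summary, combining a language model with retrieval gives noticeably more accurate answers on these questions than the model alone, and it suggests the approach can adapt to different configurations and retrieval settings. The grounding is not perfect, though: in the "who is rhys larsen" example, the answer still names the wrong author, so the outputs should be checked against the cited sources.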

## Part *2*
In this part, we define a get_model function that loads a language model from the Hugging Face Transformers library. We don't need an API key.
### Define model

In [34]:
def get_model(model = CFG.model_name):

    print('\nDownloading model: ', model, '\n\n')

    if model == 'wizardlm':
        model_repo = 'TheBloke/wizardLM-7B-HF'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            # If you don't have a GPU, comment out these lines, here and in the other model branches
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True
        )

        max_len = 1024

    elif model == 'llama2-7b-chat':
        model_repo = 'daryl149/llama-2-7b-chat-hf'

        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True,
            trust_remote_code = True
        )

        max_len = 2048

    elif model == 'llama2-13b-chat':
        model_repo = 'daryl149/llama-2-13b-chat-hf'

        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True,
            trust_remote_code = True
        )

        max_len = 2048 # 8192

    elif model == 'mistral-7B':
        model_repo = 'mistralai/Mistral-7B-v0.1'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True,
        )

        max_len = 1024

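    else:
        # Fail early: without this, the return below would hit an undefined tokenizer
        raise ValueError(f"Model '{model}' is not implemented (tokenizer and backbone)")

    return tokenizer, model, max_len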
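
The branches of `get_model` differ mainly in the repository name and the maximum length, so the lookup could also be expressed as a table. The sketch below only covers that lookup and does not load any model; the values are copied from the branches above, while `MODEL_REGISTRY` and `resolve_model` are hypothetical names.

```python
# Values copied from the get_model branches; the dict layout itself is hypothetical.
MODEL_REGISTRY = {
    "wizardlm": {"repo": "TheBloke/wizardLM-7B-HF", "max_len": 1024},
    "llama2-7b-chat": {"repo": "daryl149/llama-2-7b-chat-hf", "max_len": 2048},
    "llama2-13b-chat": {"repo": "daryl149/llama-2-13b-chat-hf", "max_len": 2048},
    "mistral-7B": {"repo": "mistralai/Mistral-7B-v0.1", "max_len": 1024},
}


def resolve_model(name):
    """Return (repo, max_len) for a known model name, or raise a clear error."""
    try:
        entry = MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model '{name}'. Known models: {known}") from None
    return entry["repo"], entry["max_len"]


print(resolve_model("llama2-7b-chat"))
```

Note that the default `CFG.model_name`, 'mistral-7b-instruct-4k', is a Fireworks model name and is not one of the keys handled here or in `get_model`, so a name has to be passed explicitly in this part.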

In [35]:
%%time
# We'll test llama2-7b-chat here, since we used Mistral in the first part
tokenizer, model, max_len = get_model(model ='llama2-7b-chat')


Downloading model:  llama2-7b-chat 




CPU times: user 22.2 s, sys: 36 s, total: 58.2 s
Wall time: 3min 6s


In [36]:
model.eval()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    

In [37]:
### check how Accelerate split the model across the available devices (GPUs)
model.hf_device_map

{'': 0}

### pipeline

- This code sets up a text generation pipeline using the Hugging Face Transformers library.

In [38]:
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    pad_token_id = tokenizer.eos_token_id,
    max_length = max_len,
    temperature = CFG.temperature,
    top_p = CFG.top_p,
    repetition_penalty = CFG.repetition_penalty
)

llm = HuggingFacePipeline(pipeline = pipe)

In [39]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7cd2aa705450>)

### Custom prompt

In [40]:
prompt_template = """
Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""


PROMPT = PromptTemplate(
    template = prompt_template,
    input_variables = ["context", "question"]
)

# Retriever chain

- This time, we'll use Chain to answer questions
- [RetrievalQA: Chain for question-answering](https://python.langchain.com/docs/modules/data_connection/retrievers/)

In [41]:
# search_type is an argument of as_retriever itself, not a key of search_kwargs
retriever2 = vectordb.as_retriever(search_type = "similarity", search_kwargs = {"k": CFG.k})

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff", # map_reduce, map_rerank, stuff, refine
    retriever = retriever2,
    chain_type_kwargs = {"prompt": PROMPT},
    return_source_documents = True,
    verbose = False
)

### Post-process outputs
- Format llm response
- Cite sources (PDFs)
- Change `width` parameter to format the output

In [42]:
def wrap_text_preserve_newlines(text, width=700):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text


def process_llm_response(llm_response):
    ans = wrap_text_preserve_newlines(llm_response['result'])

    sources_used = ' \n'.join(
        [
            source.metadata['source'].split('/')[-1][:-4] + ' - page: ' + str(source.metadata['page'])
            for source in llm_response['source_documents']
        ]
    )

    ans = ans + '\n\nSources: \n' + sources_used
    return ans
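
`process_llm_response` derives the file label with `split('/')[-1][:-4]`, which assumes forward-slash paths and a four-character extension such as `.pdf`. A possible alternative, sketched here with the standard library, is to let `pathlib` extract the file stem; `source_label` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def source_label(source_path, page):
    """Build 'file name - page: N' without assuming a 4-character extension."""
    return f"{PurePosixPath(source_path).stem} - page: {page}"


print(source_label("/content/Twisted/Twisted Games.pdf", 138))
```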

In [43]:
def llm_ans(query):
    start = time.time()

    llm_response = qa_chain.invoke(query)
    ans = process_llm_response(llm_response)

    end = time.time()

    time_elapsed = int(round(end - start, 0))
    time_elapsed_str = f'\n\nTime elapsed: {time_elapsed} s'
    return ans + time_elapsed_str

# Ask questions

- Question Answering from multiple documents
- Invoke QA Chain
- Talk to your data

In [44]:
query =  "who's the richest male lead in the twisted series?"
print(llm_ans(query))

 Christian Harper

Sources: 
Twisted-Love-1 - page: 1 
Twisted-Love-1 - page: 2 
Twisted Games - page: 2 
Twisted Lies A Fake Dating Romance (Ana Huang) - page: 200

Time elapsed: 4 s


In [45]:
query = "what was the nickname Josh gave Jules?"
print(llm_ans(query))

 Josh called Jules "JR" (short for Jessica Rabbit).

Sources: 
Twisted-Love-1 - page: 27 
Twisted Hate An Enemies with Benefits Romance (Ana Huang)-1 - page: 301 
Twisted-Love-1 - page: 28 
Twisted Hate An Enemies with Benefits Romance (Ana Huang)-1 - page: 215

Time elapsed: 5 s


In [46]:
query = "Princess Bridget is the princess of what?"
print(llm_ans(query))

 The princess of Eldorra.

Sources: 
Twisted Games - page: 9 
Twisted-Love-1 - page: 306 
Twisted Games - page: 385 
Twisted Hate An Enemies with Benefits Romance (Ana Huang)-1 - page: 37

Time elapsed: 4 s


In [47]:
query = "how is the voice of rhys?"
print(llm_ans(query))

 The voice of Rhys is written in a sarcastic and mocking tone, often using irony and sarcasm to express himself.

Sources: 
Twisted Games - page: 30 
Twisted Games - page: 17 
Twisted Games - page: 369 
Twisted Games - page: 297

Time elapsed: 6 s


This code seems to provide more accurate results, likely due to the use of the pipeline. Using the pipeline simplifies and streamlines working with complex models such as those in the Hugging Face Transformers library, which may lead to more accurate results.