<h3>RAG LLM Chatbot using Llama 3 from Hugging Face</h3>

In [52]:
import json
import torch

from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          pipeline)

<h2>Hugging Face Account Configuration</h2>

In [53]:
model_id = "meta-llama/Meta-Llama-3-8B"
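# Read the access token from a local config file and close the file afterwards
with open("config.json") as config_file:
    huggingfacetoken = json.load(config_file)["HF_TOKEN"]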
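
Keeping the token in `config.json` works for a local notebook. As a minimal sketch of an alternative, the hypothetical helper `load_hf_token` below (not part of the original notebook) prefers an `HF_TOKEN` environment variable and falls back to the same JSON layout, assuming the file contains a top-level `"HF_TOKEN"` key:

```python
import json
import os
from pathlib import Path


def load_hf_token(config_path="config.json", key="HF_TOKEN"):
    """Return the token from the environment if set, else from the JSON config file."""
    token = os.environ.get(key)
    if token:
        return token
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(
            f"{key} is not set in the environment and {config_path} does not exist"
        )
    with path.open() as config_file:
        return json.load(config_file)[key]
```

With this in place, the cell above could assign `huggingfacetoken = load_hf_token()`, which keeps the secret out of the project directory when the environment variable is available.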

<h2>Quantization Configurations</h2>
Quantization shrinks the model weights so that loading and inference are lighter on system memory.

In [54]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16
)
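
To get a feel for what 4-bit loading saves, the sketch below estimates the memory needed for the weights alone. It is a back-of-the-envelope calculation, not a measurement: the round parameter count of 8 billion, the block size of 64, and the sizes of the quantization constants are all assumptions made for illustration, and the estimate ignores layers that stay unquantized as well as activations and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 8e9  # assumption: round figure for an 8B-parameter model

# Overhead of the per-block quantization constants, in bits per parameter.
# Assumption: one 32-bit constant per block of 64 weights without double
# quantization; with it, one 8-bit constant per block plus one 32-bit
# constant per 256 blocks.
SINGLE_QUANT_OVERHEAD = 32 / 64
DOUBLE_QUANT_OVERHEAD = 8 / 64 + 32 / (64 * 256)

bf16 = weight_memory_gib(NUM_PARAMS, 16)
nf4 = weight_memory_gib(NUM_PARAMS, 4 + SINGLE_QUANT_OVERHEAD)
nf4_dq = weight_memory_gib(NUM_PARAMS, 4 + DOUBLE_QUANT_OVERHEAD)

print(f"bf16: {bf16:.1f} GiB")
print(f"nf4: {nf4:.1f} GiB")
print(f"nf4 + double quantization: {nf4_dq:.1f} GiB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the 16-bit footprint, and double quantization trims a little more by compressing the quantization constants themselves.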

<h3>Loading Tokenizer and LLM</h3>

In [55]:
# Pass the access token by keyword; the model repository is gated
tokenizer = AutoTokenizer.from_pretrained(model_id, token=huggingfacetoken)
tokenizer.pad_token = tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [56]:
# Load the model with the 4-bit quantization config and place it on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    quantization_config=bnb_config,
    token=huggingfacetoken,
    low_cpu_mem_usage=True
)


In [57]:
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128
)

<h3>Load the PDF Text, Chunked by Page</h3>


In [45]:
import pandas as pd

# Read the Parquet file containing the page chunks and their embeddings
textDataSet = pd.read_parquet('sourceData/pageChunksDataSet.parquet')
textDataSet['Embeddings']

0      [0.12017979472875595, 0.15168651938438416, 0.4...
1      [0.029344482347369194, 0.002526520751416683, 0...
2      [0.20948801934719086, 0.2716869115829468, 0.53...
3      [0.1858314573764801, 0.3157446086406708, 0.785...
4      [0.10378478467464447, 0.04209050163626671, 0.5...
                             ...                        
96     [-0.10375858098268509, 0.08440834283828735, 0....
97     [-0.4156252145767212, -0.45168107748031616, 0....
98     [-0.010929952375590801, -0.3700862228870392, 0...
99     [-0.14516785740852356, -0.196247398853302, 0.5...
100    [0.3874731659889221, -0.010221307165920734, 0....
Name: Embeddings, Length: 101, dtype: object

In [48]:
from datasets import Dataset
# import faiss.cpu as faiss
data = Dataset.from_pandas(textDataSet)
data = data.add_faiss_index("Embeddings")

data



Dataset({
    features: ['TrainingData', 'Embeddings'],
    num_rows: 101
})

A function that embeds the query and searches for the nearest matching documents, which are then used as grounding context when generating a response.

In [47]:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from sentence_transformers.quantization import quantize_embeddings

# Load the embedding model under its own name so the SentenceTransformer class is not shadowed
embedding_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")


In [71]:
def searchKNearestMatchingDocuments(query: str, k: int = 3):
    # Embed the query text
    embeddedQuery = embedding_model.encode(query)
    # Compare the embedded query with the dataset embeddings and keep the top k results
    scores, retrieved_examples = data.get_nearest_examples("Embeddings", embeddedQuery, k=k)
    return scores, retrieved_examples
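
To illustrate what a nearest-neighbour lookup of this kind does, here is a minimal brute-force sketch in plain Python. It assumes the index ranks documents by squared L2 distance, where lower means closer; the ascending scores printed later in this notebook are consistent with that, but the metric is an assumption here. The names `squared_l2` and `k_nearest` are hypothetical and are not part of the `datasets` or FAISS APIs.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(query_vec, vectors, k=3):
    """Return the k (distance, index) pairs closest to query_vec, nearest first."""
    scored = [(squared_l2(query_vec, vec), idx) for idx, vec in enumerate(vectors)]
    return heapq.nsmallest(k, scored)


# Toy two-dimensional "embeddings" standing in for the real page vectors
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
for distance, idx in k_nearest([0.9, 0.9], toy_vectors, k=2):
    print(idx, round(distance, 3))
```

A FAISS index computes the same kind of ranking far more efficiently; the brute-force version is only meant to make the scoring visible.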

In [None]:
def transformQuery(query: str, contextualDocs: list[str]):
    # Wrap the question in an instruction prompt, then append the retrieved documents.
    # The f-string inserts the question; a separate variable keeps the original query intact.
    prompt = f'''You are an assistant for answering questions.
You are given the extracted parts of a long document and a question. Provide a conversational answer. If you don't know the answer, just say "I do not know." Don't make up an answer.
Question: {query}
Document:
'''
    for content in contextualDocs:
        prompt = prompt + content + "\n"
    print("Formatted Query:", prompt)
    return prompt

In [79]:
def getllmResponse(prompt):
    # Run the text-generation pipeline and return the text of the first sequence
    sequences = text_generator(prompt)
    gen_text = sequences[0]["generated_text"]
    return gen_text
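
By default, a `text-generation` pipeline returns the prompt followed by the continuation, so `gen_text` will normally begin with the full RAG prompt. A minimal sketch of a post-processing step, assuming a hypothetical helper `strip_prompt` that is not part of the original notebook:

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the newly generated text when the output echoes the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # The output did not start with the prompt, so return it unchanged apart from whitespace
    return generated_text.strip()


example_prompt = "Question: What is 2 + 2?\nAnswer:"
example_output = example_prompt + " 4"
print(strip_prompt(example_output, example_prompt))
```

The pipeline also accepts `return_full_text=False` to suppress the echoed prompt at the source; the helper is useful when the full output is still wanted for logging.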

In [80]:
originalPrompt = "What does the author say about Big Bang Theory"
score, retrievedPages = searchKNearestMatchingDocuments(originalPrompt, k=5)
print("score:", score, "\n\n","Retrieved data:", retrievedPages)
print("Pages Retrieved:", len(retrievedPages["Embeddings"]))

retrievedPages = [page for page in retrievedPages["TrainingData"]]
modified_rag_prompt = transformQuery(originalPrompt, retrievedPages)
getllmResponse(modified_rag_prompt)

score: [228.07977 233.10527 248.58414 254.4018  254.60861] 

 Retrieved data: {'TrainingData': ['ACKNOWLEDGMENTS\xa0Many people have helped me in writing this book. My scientific colleagues have without exception beeninspiring. Over the years my principal associates and collaborators were Roger Penrose, Robert Geroch,Brandon Carter, George Ellis, Gary Gibbons, Don Page, and Jim Hartle. I owe a lot to them, and to myresearch students, who have always given me help when needed.One of my students, Brian Whitt, gave me a lot of help writing the first edition of this book. My editor at BantamBooks, Peter Guzzardi, made innumerable comments which improved the book considerably. In addition, forthis edition, I would like to thank Andrew Dunn, who helped me revise the text.I could not have written this book without my communication system. The software, called Equalizer, wasdonated by Walt Waltosz of Words Plus Inc., in Lancaster, California. My speech synthesizer was donated bySpeech Plus, of

