<h3> RAG LLM Chatbot using Llama3 from Hugging Face</h3>

In [17]:
import json
import torch

from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          pipeline)

<h2>Hugging Face Account Configuration</h2>

In [19]:
model_id = "meta-llama/Meta-Llama-3-8B"
huggingfacetoken = json.load(open("config.json"))["HF_TOKEN"]

<h2>Quantization Configurations</h2>
Quantize the model weights to 4 bits to shrink them and reduce the memory load on the system.

In [20]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16
)
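
As a rough illustration of why 4-bit loading helps, the sketch below estimates weight storage for a model of about 8 billion parameters. The parameter count is an approximation and the helper name `weight_memory_gib` is hypothetical. The estimate ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB, ignoring all runtime overhead."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 16 bits per weight
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4 bits per weight

print(f"bf16: {bf16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the space of the 16-bit weights.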

<h3>Loading Tokenizer and LLM</h3>

In [21]:
tokenizer = AutoTokenizer.from_pretrained(model_id, token=huggingfacetoken)
tokenizer.pad_token = tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [22]:
#Fetch instance of the model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    quantization_config=bnb_config,
    token=huggingfacetoken,
    low_cpu_mem_usage=True
)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [6]:
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128
)

In [20]:
def getllmResponse(prompt):
    # The pipeline returns a list with one dict per generated sequence
    sequences = text_generator(prompt)
    gen_text = sequences[0]["generated_text"]
    return gen_text
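
By default a text-generation pipeline returns the prompt followed by the continuation, so the generated text starts with the question itself. The following is a minimal sketch of trimming the echoed prompt. The helper `strip_prompt` is hypothetical and not part of the original notebook. Passing `return_full_text=False` to the pipeline call should have a similar effect, though that is not verified here.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


example = "Who is Taylor Swift? Taylor Swift is an American singer-songwriter."
answer = strip_prompt(example, "Who is Taylor Swift?")
print(answer)
```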

In [8]:
getllmResponse("Who is Taylor Swift?")



Who is Taylor Swift? What is her age, net worth, boyfriend, and height?
Who is Taylor Swift? Taylor Swift is an American singer-songwriter and actress. She is one of the best-selling music artists of all time. She has released five albums and won many awards, including ten Grammy Awards. Taylor Swift is the youngest person to ever win the Grammy Award for Album of the Year. She has also won a record-breaking 23 Billboard Music Awards. She is also the youngest person to ever be nominated for a Golden Globe Award. She has also been named one of the 100 Most Influential People in the World by Time magazine. Swift has sold


<h3>Extracting Content from PDF -> Followed by Tokenization and Embedding</h3>

In [1]:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from sentence_transformers.quantization import quantize_embeddings



In [23]:
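# 1. Specify preferred embedding dimensions
dimensions = 64

# 2. Load the embedding model (named so it does not shadow the SentenceTransformer class)
embedding_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Raw string so the backslashes in the Windows path are not treated as escapes
text_rag = json.load(open(r"F:\GitHub\RAG-Langchain-App-Using-Llama\sourceData\content.json"))["text"]

In [25]:
# Embed the source text once, then rank it against each new query
embeddingText = embedding_model.encode(text_rag)
print(embeddingText.tolist())

def searchForContextuallyRelevantText(query: str, k: int = 3):
    """Return (scores, indices) of the k stored embeddings most similar to the query."""
    embeddedQuery = embedding_model.encode([query])
    # cos_sim returns a (1, n) tensor of cosine similarities
    similarities = cos_sim(embeddedQuery, embeddingText)[0]
    print(similarities)
    top = torch.topk(similarities, k=min(k, similarities.shape[0]))
    return top.values, top.indices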




[0.12017979472875595, 0.15168651938438416, 0.4161500036716461, -0.8284395337104797, 0.019621439278125763, 0.2533479630947113, -0.9404932856559753, -0.1853422224521637, -0.33194389939308167, 1.1471387147903442, 0.10292501002550125, -0.5313226580619812, -0.06177488714456558, -0.6318856477737427, -0.2917520999908447, -0.14794296026229858, -0.3136429786682129, -0.14760088920593262, -0.6931927800178528, 0.2486511468887329, 0.102177694439888, 0.3390626311302185, -0.8157618641853333, -0.9062125086784363, 0.18076947331428528, 0.8934656381607056, 0.5540475845336914, 0.5699717402458191, 0.679347038269043, 0.6637933254241943, 1.1611952781677246, 0.05429611727595329, 0.3269819915294647, -0.8864858150482178, -0.3080008924007416, 0.17652520537376404, 0.29905954003334045, -0.4452902674674988, 0.4306778013706207, -0.2390519678592682, -0.22914011776447296, -0.24455031752586365, 0.5431109070777893, -0.7036169171333313, -0.5157374739646912, 0.1431828886270523, 0.0907881036400795, -0.24890264868736267, 0.
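
The retrieval step (embed the query, then rank stored embeddings by cosine similarity) can be illustrated without any model. This is a pure-Python sketch on toy two-dimensional vectors. The helpers `cosine_similarity` and `top_k` and the sample vectors are hypothetical, and a real pipeline would first split the source text into several chunks and embed each one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for encoded text chunks
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.1]

ranked = top_k(query_vec, chunk_vecs, k=2)
print(ranked)
```

The indices in `ranked` point back into the list of chunks, so the matching text can be looked up and passed to the prompt.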

In [15]:
def transformQuery(query: str):
    # Prepend the instructions and append the source document so the model can answer from it
    return (
        "You are an assistant for answering questions. "
        "You are given the extracted parts of a long document and a question. "
        "Provide a conversational answer. "
        "If you don't know the answer, just say \"I do not know.\" "
        "Don't make up an answer. "
        f"Question: {query} Document: {text_rag}"
    )
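
Placing the whole document in the prompt can exceed the model's context window. One possible variation, sketched here under stated assumptions, builds the prompt from only the retrieved chunks and caps the context length. The helper `build_prompt`, the `max_chars` limit, and the sample chunks are hypothetical and not part of the original notebook.

```python
def build_prompt(query: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Assemble a RAG prompt from retrieved chunks, truncating the context."""
    context = "\n\n".join(chunks)[:max_chars]
    return (
        "You are an assistant for answering questions. "
        "You are given the extracted parts of a long document and a question. "
        "Provide a conversational answer. "
        "If you don't know the answer, just say \"I do not know.\" "
        "Don't make up an answer.\n"
        f"Question: {query}\n"
        f"Document: {context}"
    )


sample_chunks = ["RAG retrieves relevant text.", "The text is added to the prompt."]
prompt = build_prompt("What is RAG?", sample_chunks, max_chars=40)
print(prompt)
```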

In [18]:
# Pooling: represent all token embeddings in a sentence with a single embedding (here, the first token's)
def pooling(outputs: torch.Tensor):
    outputs = outputs[:, 0]
    return outputs.detach().cpu().numpy()

In [7]:
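# Quantization of embeddings to reduce size.
# quantize_embeddings expects a 2-D (n_samples, dim) array; a single 1-D vector raised
# "ValueError: cannot reshape array of size 64 into shape (512,newaxis)".
import numpy as np
binary_embeddings = quantize_embeddings(np.atleast_2d(embeddings), precision="ubinary")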
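
Unsigned binary (`ubinary`) quantization keeps one bit per dimension and packs eight bits into each byte, so a 512-dimensional vector shrinks to 64 bytes. The pure-Python sketch below illustrates the idea and is not the library's implementation. The helper `pack_bits` is hypothetical, and thresholding at zero is an assumption about how each bit is chosen.

```python
def pack_bits(vector):
    """Threshold each value at zero and pack the resulting bits into bytes."""
    bits = [1 if x > 0 else 0 for x in vector]
    # Pad with zeros so the bit count is a multiple of 8
    bits += [0] * (-len(bits) % 8)
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed


vector = [0.5, -0.2] * 256  # toy 512-dimensional embedding
packed = pack_bits(vector)
print(len(vector), "floats ->", len(packed), "bytes")
```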