<h3> RAG LLM Chatbot using Llama3 from Hugging Face</h3>

In [1]:
import json
import torch

from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          pipeline)

<h2>Hugging Face Account Configuration</h2>

In [2]:
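model_id = "meta-llama/Meta-Llama-3-8B"
# Read the Hugging Face access token from a local config file and close the file afterwards
with open("config.json") as f:
    huggingfacetoken = json.load(f)["HF_TOKEN"]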

<h2>Quantization Configurations</h2>
Quantization shrinks the model weights so that loading and running the model is less demanding on the system.

In [3]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16
)
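To get a rough feel for what 4-bit loading saves, the weight memory can be estimated from the parameter count. This is a back-of-the-envelope sketch only: the figure of 8 billion parameters is an assumption based on the "8B" in the model name, `weight_memory_gib` is a hypothetical helper, and the estimate ignores quantization constants, any layers that stay unquantized, activations, and the KV cache.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 8 billion parameters for an "8B" model
N_PARAMS = 8e9

bf16_gib = weight_memory_gib(N_PARAMS, 16)  # 16 bits per weight
nf4_gib = weight_memory_gib(N_PARAMS, 4)    # 4 bits per weight

print(f"bf16 weights:  ~{bf16_gib:.1f} GiB")
print(f"4-bit weights: ~{nf4_gib:.1f} GiB")
```

The ratio is what matters here: storing each weight in 4 bits instead of 16 cuts the weight footprint to about a quarter.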

<h3>Loading Tokenizer and LLM</h3>

In [4]:
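# Pass the access token by keyword; a second positional argument is not treated as the token
tokenizer = AutoTokenizer.from_pretrained(model_id, token=huggingfacetoken)
# Llama 3 has no dedicated padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token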

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [5]:
# Load the model with 4-bit quantization onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    quantization_config=bnb_config,
    token=huggingfacetoken,
    low_cpu_mem_usage=True
)

In [6]:
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128
)

In [20]:
def getllmResponse(prompt):
    """Generate text for `prompt` and return the full generated string."""
    sequences = text_generator(prompt)
    gen_text = sequences[0]["generated_text"]
    return gen_text
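By default the text-generation pipeline returns the prompt followed by the continuation, so the question is echoed back in the response. A minimal sketch of trimming the prompt off is shown below. `strip_prompt` and `fake_text_generator` are hypothetical names introduced for illustration; the fake generator only mimics the list-of-dicts output shape so the example runs without loading a model.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the continuation when the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


def fake_text_generator(prompt):
    # Hypothetical stand-in that mimics the pipeline's output shape
    return [{"generated_text": prompt + " A hypothetical continuation."}]


prompt = "Who is Taylor Swift?"
full_text = fake_text_generator(prompt)[0]["generated_text"]
answer = strip_prompt(prompt, full_text)
print(answer)
```

The pipeline also accepts `return_full_text=False`, which should give the same effect without post-processing.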

In [8]:
getllmResponse("Who is Taylor Swift?")

Who is Taylor Swift? What is her age, net worth, boyfriend, and height?
Who is Taylor Swift? Taylor Swift is an American singer-songwriter and actress. She is one of the best-selling music artists of all time. She has released five albums and won many awards, including ten Grammy Awards. Taylor Swift is the youngest person to ever win the Grammy Award for Album of the Year. She has also won a record-breaking 23 Billboard Music Awards. She is also the youngest person to ever be nominated for a Golden Globe Award. She has also been named one of the 100 Most Influential People in the World by Time magazine. Swift has sold


<h3>Extracting Content from PDF -> Followed by Tokenization and Embedding</h3>

In [29]:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from sentence_transformers.quantization import quantize_embeddings

In [30]:
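# 1. Specify the preferred embedding dimensions
dimensions = 512

# 2. Load the embedding model, truncating its output to `dimensions`
# Note: this rebinds `model`, which previously referred to the Llama model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", truncate_dim=dimensions)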

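The `truncate_dim` argument keeps only the leading components of each embedding. The sketch below illustrates the idea in plain Python on made-up 8-dimensional vectors; `truncate` and `cosine` are hypothetical helpers, not the library's own code, and the numbers are toy values rather than real embeddings.

```python
import math


def truncate(vec, dim):
    """Keep only the first `dim` components of an embedding."""
    return vec[:dim]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
emb_a = [0.9, 0.1, 0.3, -0.2, 0.05, 0.01, -0.02, 0.03]
emb_b = [0.8, 0.2, 0.1, -0.1, -0.30, 0.40, 0.10, -0.20]

full_score = cosine(emb_a, emb_b)
short_score = cosine(truncate(emb_a, 4), truncate(emb_b, 4))
print(f"full: {full_score:.3f}, truncated to 4 dims: {short_score:.3f}")
```

Cosine similarity divides by the vector norms, so the truncated vectors do not need to be renormalized before comparison. The score itself can shift, because the dropped components no longer contribute.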

In [32]:
# This prompt prefix is needed on the query when doing retrieval
query = 'Represent this sentence for searching relevant passages: A man is eating a piece of bread'

# Raw string so the backslashes in the Windows path are not read as escape sequences
with open(r"F:\GitHub\RAG-Langchain-App-Using-Llama\sourceData\content.json") as f:
    text_rag = json.load(f)["text"]
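Retrieval works on a collection of passages, so a long document is usually split into chunks before encoding. The sketch below assumes the `text` field holds one long string (the structure of `content.json` is not shown here, so this is a guess). `chunk_words` is a hypothetical helper that splits on whitespace with a small overlap between neighbouring chunks, and the sizes are arbitrary.

```python
def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split `text` into chunks of up to `chunk_size` words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Toy input standing in for the extracted document text
sample_text = " ".join(f"w{i}" for i in range(10))
passages = chunk_words(sample_text, chunk_size=4, overlap=1)
for passage in passages:
    print(passage)
```

Encoding a list of passages like this yields one embedding row per passage, which is the 2D shape that the similarity and quantization utilities operate on.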

In [33]:
embeddings = model.encode(text_rag)

In [34]:
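import numpy as np

# quantize_embeddings expects a 2D array of shape (n_texts, dim). If text_rag is a single
# string, encode() returns a 1D vector and the call fails with
# "ValueError: cannot reshape array of size 64 into shape (512,newaxis)".
embeddings = np.atleast_2d(embeddings)

# Quantize the embeddings to reduce their size
binary_embeddings = quantize_embeddings(embeddings, precision="ubinary")

# Score the retrieval query against every passage embedding
query_embedding = model.encode(query)
similarities = cos_sim(query_embedding, embeddings)
print('similarities:', similarities)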
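The `ubinary` precision keeps only the sign of each dimension and packs eight dimensions into one byte, which is where a size of 64 comes from for a 512-dimensional vector. The plain-Python sketch below illustrates that idea and how packed vectors can be compared with Hamming distance. `to_ubinary` and `hamming` are hypothetical helpers written for this illustration, not the library's implementation, and the vectors are made up.

```python
def to_ubinary(vec):
    """Pack the sign bits of a float vector into bytes, 8 dimensions per byte."""
    if len(vec) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    packed = bytearray()
    for i in range(0, len(vec), 8):
        byte = 0
        for value in vec[i:i + 8]:
            # Most significant bit first: 1 for positive values, 0 otherwise
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Made-up 512-dimensional vectors
dim = 512
query_vec = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
near_vec = [-v for v in query_vec[:10]] + query_vec[10:]  # 10 signs flipped
far_vec = [-v for v in query_vec]                         # every sign flipped

q, near, far = to_ubinary(query_vec), to_ubinary(near_vec), to_ubinary(far_vec)
print("bytes per vector:", len(q))
print("distance to near:", hamming(q, near))
print("distance to far: ", hamming(q, far))
```

A smaller Hamming distance means more sign bits agree, so it can serve as a cheap first-pass ranking before rescoring the top candidates with the full-precision embeddings.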