<a href="https://colab.research.google.com/github/AnDDoanf/LLM-repo/blob/master/Llama2_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install pypdf
!pip install transformers einops accelerate langchain torch bitsandbytes
!pip install sentence_transformers #Embedding
!pip install llama_index
!pip install llama-index-embeddings-langchain
!pip install llama-index-llms-huggingface
!pip install langchain-community
# Read the access token from an HF_TOKEN variable instead of hard-coding it; never commit a real token.
!huggingface-cli login --token $HF_TOKEN

import torch
from transformers import BitsAndBytesConfig
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext  # VectorStoreIndex builds the embedding index
from llama_index.llms.huggingface import HuggingFaceLLM
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
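
A token pasted directly into a notebook cell is visible to anyone who can read the file. A minimal sketch of a safer pattern, assuming the token is stored in an environment variable named `HF_TOKEN` (the helper name `get_hf_token` is hypothetical and not part of any library):

```python
import os
from getpass import getpass


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face token from the environment, or prompt for it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to a hidden prompt so the token never appears
        # in the notebook source or in its saved output.
        token = getpass(f"Enter your Hugging Face token ({env_var}): ").strip()
    if not token:
        raise ValueError("No Hugging Face token provided.")
    return token
```

The returned string can then be passed to `huggingface_hub.login(token=...)` instead of embedding the token in a shell command.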





Object-oriented implementation

In [None]:
class QnAAssistant:
    """Retrieval-augmented Q&A assistant over the documents in a directory."""

    def __init__(self, pdf_dir, model_name, embed_model_name, system_prompt):
        self.pdf_dir = pdf_dir
        self.model_name = model_name
        self.embed_model_name = embed_model_name
        self.system_prompt = system_prompt
        self.documents = None
        self.query_engine = None

        # Build the pipeline: documents -> LLM -> embeddings -> index -> query engine.
        self.load_documents()
        self.setup_model()
        self.setup_embeddings()
        self.create_service_context()
        self.create_index()
        self.create_query_engine()

    def load_documents(self):
        self.documents = SimpleDirectoryReader(self.pdf_dir).load_data()

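    def setup_model(self):
        # 4-bit NF4 quantization to reduce the GPU memory needed for the model weights.
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        )

        self.llm = HuggingFaceLLM(
            context_window=4096,
            max_new_tokens=256,
            # Greedy decoding; temperature is omitted because it has no effect when do_sample=False.
            generate_kwargs={"do_sample": False},
            system_prompt=self.system_prompt,
            tokenizer_name=self.model_name,
            model_name=self.model_name,
            device_map="auto",
            # model_kwargs is forwarded to the model loader; the quantization
            # settings above only take effect if they are passed here.
            model_kwargs={
                "torch_dtype": torch.float16,
                "quantization_config": quantization_config,
            },
        )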

    def setup_embeddings(self):
        self.embed_model = HuggingFaceEmbeddings(model_name=self.embed_model_name)

    def create_service_context(self):
        # Note: ServiceContext is deprecated in recent llama_index releases in favor of Settings.
        self.service_context = ServiceContext.from_defaults(
            chunk_size=512,
            chunk_overlap=20,
            llm=self.llm,
            embed_model=self.embed_model
        )

    def create_index(self):
        self.index = VectorStoreIndex.from_documents(self.documents, service_context=self.service_context)

    def create_query_engine(self):
        self.query_engine = self.index.as_query_engine()

    def query(self, user_query):
        torch.cuda.empty_cache()
        response = self.query_engine.query(user_query)
        return response

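    def run(self):
        # Simple interactive loop: type "exit" to quit; blank input is ignored.
        while True:
            query = input("Enter your query: ").strip()
            if not query:
                continue
            if query.lower() == "exit":
                print("Thank you for asking.")
                break
            response = self.query(query)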
            print(response)
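
The `chunk_size=512` and `chunk_overlap=20` settings in `create_service_context` control how documents are split before they are embedded. The sketch below illustrates the sliding-window idea on whitespace-separated words. This is a simplification: LlamaIndex counts tokens rather than words and tries to respect sentence boundaries, and `chunk_words` is a hypothetical helper, not a library function.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Tiny example with small numbers so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last word of the previous one, which is the role the overlap plays: text that straddles a chunk boundary still appears intact in at least one chunk.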

In [None]:
# Augmented data: https://assets.kpmg.com/content/dam/kpmg/xx/pdf/2023/09/kpmg-global-tech-report.pdf

pdf_dir = '/content/pdf'
model_name = "meta-llama/Llama-2-7b-chat-hf"
embed_model_name = "sentence-transformers/all-mpnet-base-v2"
system_prompt = """
You are a Q&A assistant. Your goal is to answer questions based on the provided context. If you do not know the answer, say that you do not know.
"""

assistant = QnAAssistant(pdf_dir, model_name, embed_model_name, system_prompt)


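The `BitsAndBytesConfig` in `setup_model` requests 4-bit weights. A rough back-of-the-envelope estimate shows why that matters on a single GPU. The sketch assumes a nominal 7 billion parameters for `meta-llama/Llama-2-7b-chat-hf` and counts weights only, ignoring activations, the KV cache, and quantization overhead, so the results are lower bounds rather than measurements.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7e9  # nominal parameter count; an assumption for illustration

for label, bits in [("float16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights need about a quarter of the memory of the float16 weights.
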
In [None]:
assistant.run()

Enter your query: Hi
Hi there! I'm here to help you with any questions you may have. What would you like to know?
Enter your query: How are you
I don't know.
Enter your query: give me headers of KPMG tech report 2023
I don't know the headers of KPMG's tech report 2023 as I don't have access to the report.
Enter your query: Being intentional means what?
Being intentional means being really clear what value you intend to generate from the technology you deploy.
Enter your query: exit
Thank you for asking.
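
Answers such as "I don't know" in the session above are easier to diagnose when the retrieved chunks are visible. In LlamaIndex, the object returned by `query_engine.query(...)` exposes a `source_nodes` list whose items carry a similarity `score` and the chunk text. A minimal sketch of a formatter follows; `format_sources` is a hypothetical helper, and the stub objects at the bottom contain placeholder text and only stand in for a real response so the snippet runs without a model.

```python
from types import SimpleNamespace


def format_sources(response, max_chars: int = 80) -> list[str]:
    """Return one summary line per retrieved chunk: rank, score, text preview."""
    lines = []
    nodes = getattr(response, "source_nodes", None) or []
    for rank, item in enumerate(nodes, start=1):
        # Collapse newlines and repeated spaces so each preview fits on one line.
        text = " ".join(item.node.get_content().split())
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score} {text[:max_chars]}")
    return lines


def _stub_node(text, score):
    # Mimics only the attributes used above; not a real LlamaIndex object.
    return SimpleNamespace(score=score, node=SimpleNamespace(get_content=lambda: text))


fake_response = SimpleNamespace(
    source_nodes=[
        _stub_node("placeholder chunk\nabout intent", 0.82),
        _stub_node("another placeholder chunk", None),
    ]
)
for line in format_sources(fake_response):
    print(line)
```

With the real assistant, the same helper could be applied as `print("\n".join(format_sources(assistant.query("Being intentional means what?"))))`.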
