# VANILLA RAG

## Install Requirements

In [1]:
!pip install ctransformers sentence_transformers chromadb langchain pypdf PyPDF2



In [2]:
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118


### Importing Libraries & Naive RAG

In [2]:
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain import PromptTemplate, LLMChain
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA
from io import BytesIO

In [3]:
model_name = "BAAI/bge-large-en"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)



In [4]:
loaders = [
    PyPDFLoader(r"/content/RAFT.pdf"),
]
information = []
for loader in loaders:
    information.extend(loader.load())

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(information)

vector_store = Chroma.from_documents(texts, embeddings, collection_metadata={"hnsw:space": "cosine"}, persist_directory="stores/RAFT")

print("Vector Store Created.......")

Vector Store Created.......
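
To make `chunk_size=1000` and `chunk_overlap=100` concrete, the sketch below uses a plain fixed-width sliding window over characters. This is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its chunks will generally differ. `sliding_window_chunks` is a hypothetical helper written only for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-width chunks; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Illustrative input: 2500 characters of repeating digits
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # prints [1000, 1000, 700]
```

The last 100 characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still seen whole in at least one chunk.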


In [6]:
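# Generation settings for the ctransformers backend
config = {
    'max_new_tokens': 1024,
    'repetition_penalty': 1.1,
    'temperature': 0.1,
    'top_k': 50,
    'top_p': 0.9,
    'stream': True,
    # Use half the available cores, but at least one; os.cpu_count() can return None
    'threads': max(1, (os.cpu_count() or 2) // 2)
}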
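
The `temperature`, `top_k`, and `top_p` settings all narrow the set of tokens the model may sample from. The sketch below illustrates the general idea on a toy three-token distribution. It is an assumption-laden illustration rather than the ctransformers implementation: the order of the filtering steps, the `filter_distribution` helper, and the toy logits are all hypothetical.

```python
import math
from typing import Dict


def filter_distribution(logits: Dict[str, float], temperature: float,
                        top_k: int, top_p: float) -> Dict[str, float]:
    """Return the renormalized token probabilities that survive the filters."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # top_k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(p for _, p in kept)
    return {tok: p / kept_total for tok, p in kept}


toy_logits = {"retrieval": 2.0, "reasonable": 1.5, "random": 0.5}
sharp = filter_distribution(toy_logits, temperature=0.1, top_k=50, top_p=0.9)
flat = filter_distribution(toy_logits, temperature=1.0, top_k=50, top_p=0.9)
print(sharp)
print(flat)
```

With `temperature=0.1` only the highest-scoring token survives, which is why a low temperature gives near-deterministic output; with `temperature=1.0` all three toy tokens remain candidates.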


In [9]:
llm = CTransformers(
    model='TheBloke/Mistral-7B-Instruct-v0.1-GGUF',
    model_type="mistral",
    #lib="avx2", #for CPU use
    config=config  # pass the generation settings defined above (includes temperature=0.1)
)

print("LLM Initialized...")

LLM Initialized...


In [11]:
prompt_template = """Write a concise summary of the context below that answers the user's question.
Return your response in bullet points which covers the key points of the context.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful summary below and nothing else.
Helpful summary:
"""

In [12]:
prompt = PromptTemplate(template=prompt_template, input_variables=['context', 'question'])
load_vector_store = Chroma(persist_directory="stores/RAFT", embedding_function=embeddings)
retriever = load_vector_store.as_retriever(search_kwargs={"k":1})
chain_type_kwargs = {"prompt": prompt}
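
Conceptually, a retriever with `search_kwargs={"k":1}` embeds the query and returns the single stored chunk closest to it. The sketch below shows that idea with brute-force cosine similarity over toy 3-dimensional vectors. Chroma uses an approximate HNSW index rather than a full scan, and the `cosine_similarity` and `top_k` helpers and the toy vectors are illustrative assumptions, not part of the notebook.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          docs: List[Tuple[str, Sequence[float]]],
          k: int = 1) -> List[Tuple[float, str]]:
    """Score every document against the query and return the k best (score, text) pairs."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings"; real bge-large-en vectors have far more dimensions.
toy_docs = [
    ("chunk about fine-tuning", [0.9, 0.1, 0.0]),
    ("chunk about retrieval", [0.1, 0.9, 0.2]),
    ("chunk about evaluation", [0.0, 0.2, 0.9]),
]
query_vec = [0.2, 1.0, 0.1]
best = top_k(query_vec, toy_docs, k=1)
print(best[0][1])  # prints: chunk about retrieval
```

Cosine similarity ignores vector length, so scaling an embedding does not change its ranking under this metric.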

In [13]:
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True, chain_type_kwargs=chain_type_kwargs, verbose=True)
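
The `"stuff"` chain type places all retrieved chunks into the `{context}` slot of a single prompt. A rough pure-Python approximation of that assembly step follows. `build_stuff_prompt`, the shortened template, and the sample chunks are hypothetical; the real chain also handles document formatting and the LLM call.

```python
from typing import List

# Shortened stand-in for the notebook's prompt template
TEMPLATE = (
    "Context: {context}\n"
    "Question: {question}\n"
    "Helpful summary:\n"
)


def build_stuff_prompt(chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """Join the retrieved chunks and substitute them into the prompt template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


example_prompt = build_stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is RAFT?",
)
print(example_prompt)
```

With `k` set to 1 in the retriever, only one chunk is stuffed into the prompt, so the answer depends entirely on that single chunk being relevant.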


In [14]:
response = qa_chain.invoke({'query': 'What is RAFT?'})

> Entering new RetrievalQA chain...

> Finished chain.


In [16]:
print(response['result'])

* RAFT stands for Reasonable Answer Formulation Training.
* It is a language model trained on standard instructional tuning and context comprehension.
* It combines these two approaches to improve the ability of language models to process text effectively.
* The model can be adapted to domain-specific tasks by incorporating relevant documents in its training dataset.


In [17]:
print(response['source_documents'])

[Document(page_content='RAFT: Adapting Language Model to Domain Specific RAG\nand include an example of the prompt we used in Figure 3.\n4.4. Qualitative Analysis\nTo illustrate the potential advantages of RAFT over the\ndomain-specifically fine-tuned (DSF) approach, we present\na comparative example in Figure 4. This example qual-\nitatively demonstrates a scenario where the DSF model\nbecomes confused by a question asking for the identity of\na screenwriter. Instead of providing the correct name, it\nmistakenly cites one of the films written by the screenwriter.\nIn contrast, the RAFT model accurately answers the ques-\ntion. This discrepancy suggests that training a model solely\nwith question-answer pairs may impair its ability to derive\nrelevant context from provided documents. The comparison\nunderscores the importance of incorporating both standard\ninstructional tuning and context comprehension into the\ntraining dataset to preserve and enhance the model’s ability\nto process 