## **Advanced RAG 01 - Powerful RAG Using Hybrid Search (Keyword + Vector Search)**

1. What is keyword search?
2. What is vector search?
3. Ensemble solution
4. Combining both for better retrieval
5. Solution with ChromaDB
6. Solution using Weaviate
7. Loading a quantized model from Hugging Face
8. Reranking after retrieval


## **Possible Retrieval Techniques**
- Naive retrieval + keyword search
- Sentence-window retrieval
- Self-query retrieval
- Parent document retrieval
- HyDE (Hypothetical Document Embeddings)


In [1]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


In [2]:
# Sample documents
document = [

    "this is list which containg sample documens.",
    "Keywords are important for keyword-based search" ,
    "Document analysis involves extracting keywords" ,
    "Keyword-based search relies on sparse embeddings."

]

In [3]:
document

['this is list which containg sample documens.',
 'Keywords are important for keyword-based search',
 'Document analysis involves extracting keywords',
 'Keyword-based search relies on sparse embeddings.']

In [4]:
query ="keyword-based search"


In [5]:
# Lowercase each text in the list and strip all punctuation characters.

import string

def preprocess_text(text_list):
  processed_text = []
  for text in text_list:
    text = text.lower()
    text = ''.join([char for char in text if char not in string.punctuation])
    processed_text.append(text)
  return processed_text


In [6]:
document =  preprocess_text(document)

In [7]:
document

['this is list which containg sample documens',
 'keywords are important for keywordbased search',
 'document analysis involves extracting keywords',
 'keywordbased search relies on sparse embeddings']

In [8]:
query = preprocess_text([query])[0]  # apply the same cleaning as the documents

In [9]:
query

'keywordbased search'

In [10]:
vector = TfidfVectorizer()

In [11]:
x =  vector.fit_transform(document)

In [12]:
x

<4x21 sparse matrix of type '<class 'numpy.float64'>'
	with 24 stored elements in Compressed Sparse Row format>

In [13]:
x =  x.toarray()

In [14]:
x

array([[0.        , 0.        , 0.37796447, 0.37796447, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.37796447, 0.        , 0.        , 0.37796447, 0.        ,
        0.        , 0.37796447, 0.        , 0.        , 0.37796447,
        0.37796447],
       [0.        , 0.4533864 , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.4533864 , 0.4533864 , 0.        ,
        0.        , 0.35745504, 0.35745504, 0.        , 0.        ,
        0.        , 0.        , 0.35745504, 0.        , 0.        ,
        0.        ],
       [0.46516193, 0.        , 0.        , 0.        , 0.46516193,
        0.        , 0.46516193, 0.        , 0.        , 0.46516193,
        0.        , 0.        , 0.36673901, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.43671931, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.34431452, 0.        , 0.        , 0.43671931,
        0.43671931, 0.        , 0.34431452, 0.43671931, 0.        ,
        0.        ]])

In [15]:
x.shape

(4, 21)

In [16]:
x[0]

array([0.        , 0.        , 0.37796447, 0.37796447, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.37796447, 0.        , 0.        , 0.37796447, 0.        ,
       0.        , 0.37796447, 0.        , 0.        , 0.37796447,
       0.37796447])

In [17]:
x[1]

array([0.        , 0.4533864 , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.4533864 , 0.4533864 , 0.        ,
       0.        , 0.35745504, 0.35745504, 0.        , 0.        ,
       0.        , 0.        , 0.35745504, 0.        , 0.        ,
       0.        ])

In [18]:
x[2]

array([0.46516193, 0.        , 0.        , 0.        , 0.46516193,
       0.        , 0.46516193, 0.        , 0.        , 0.46516193,
       0.        , 0.        , 0.36673901, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        ])

In [19]:
x[3]

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.43671931, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.34431452, 0.        , 0.        , 0.43671931,
       0.43671931, 0.        , 0.34431452, 0.43671931, 0.        ,
       0.        ])

In [20]:
query_embedding =  vector.transform([query]).toarray()

In [21]:
query_embedding

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.70710678, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.70710678, 0.        , 0.        ,
        0.        ]])

In [22]:
similarities =  cosine_similarity(query_embedding, x)

In [23]:
similarities

array([[0.        , 0.50551777, 0.        , 0.48693426]])
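To see what `TfidfVectorizer` computes, here is a minimal pure-Python sketch that assumes scikit-learn's defaults: lowercasing, the `(?u)\b\w\w+\b` token pattern, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, raw term counts, and L2 normalization. The helper names `tokenize`, `tfidf_vector`, `fit_tfidf` and `cosine` are hypothetical, not library functions. On the four preprocessed documents it should reproduce the weights and similarities printed above (for example 0.37796447 for every word of the first document).

```python
import math
import re
from collections import Counter

# scikit-learn's default token_pattern: tokens of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


def tfidf_vector(tokens, idf):
    """Raw term counts times IDF, L2-normalized; terms outside the vocabulary are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def fit_tfidf(docs):
    """Learn smoothed IDF values and return them with one sparse vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: each term counted once per doc
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    # Both vectors are already unit length, so their dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [
    "this is list which containg sample documens",
    "keywords are important for keywordbased search",
    "document analysis involves extracting keywords",
    "keywordbased search relies on sparse embeddings",
]
idf, doc_vectors = fit_tfidf(docs)
query_vector = tfidf_vector(tokenize("keywordbased search"), idf)
scores = [cosine(query_vector, v) for v in doc_vectors]
print([round(s, 4) for s in scores])  # → [0.0, 0.5055, 0.0, 0.4869]
```

Terms that occur in only one document get the largest IDF, which is why rare words such as "are" (0.4533864) outweigh the shared "search" (0.35745504) in the second document.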

In [24]:
# Ranking: flatten the (1, n_docs) score matrix and sort indices by descending score
ranked_indices = np.argsort(similarities.flatten())[::-1]

In [26]:
ranked_indices

array([1, 3, 2, 0])

In [27]:
ranked_documents = [document[i] for i in ranked_indices]


In [28]:
ranked_documents

['keywords are important for keywordbased search',
 'keywordbased search relies on sparse embeddings',
 'document analysis involves extracting keywords',
 'this is list which containg sample documens']

In [29]:
# Print the ranked documents
for i, doc in enumerate(ranked_documents):
  print(f"Rank{i+1}: {doc}")

Rank1: keywords are important for keywordbased search
Rank2: keywordbased search relies on sparse embeddings
Rank3: document analysis involves extracting keywords
Rank4: this is list which containg sample documens


In [30]:
query

'keywordbased search'

## **Vector Search**

In [31]:
document_embeddings = np.array([
    [0.634, 0.234, 0.867, 0.042, 0.249],
    [0.123, 0.456, 0.789, 0.321, 0.654],
    [0.987, 0.654, 0.321, 0.123, 0.456]
])

In [32]:
query_embedding = np.array([[0.987, 0.654, 0.321, 0.123 , 0.456]])

In [33]:
similarities = cosine_similarity(document_embeddings , query_embedding)

In [34]:
similarities

array([[0.79303963],
       [0.65530827],
       [1.        ]])

In [35]:
ranked_indices = np.argsort(similarities, axis=0)[::-1]  # highest similarity first

In [36]:
ranked_indices

array([[2],
       [0],
       [1]])

In [37]:
for i , doc_index in enumerate(ranked_indices):
  print(f"Rank {i+1} : document {doc_index}")

Rank 1 : document [2]
Rank 2 : document [0]
Rank 3 : document [1]
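Cosine similarity equals the dot product of L2-normalized vectors, so a brute-force vector search over a small matrix can be written directly in NumPy. This is a minimal sketch on the same toy embeddings (not real model outputs); `top_k_cosine` is a hypothetical helper name. Working with a 1-D query also yields flat integer indices instead of the nested `[2]`-style indices printed above.

```python
import numpy as np


def top_k_cosine(doc_embeddings, query_embedding, k):
    """Return (indices, scores) of the k documents most similar to the query."""
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)
    scores = docs @ query                  # one cosine score per document
    top = np.argsort(scores)[::-1][:k]     # highest scores first
    return top, scores[top]


document_embeddings = np.array([
    [0.634, 0.234, 0.867, 0.042, 0.249],
    [0.123, 0.456, 0.789, 0.321, 0.654],
    [0.987, 0.654, 0.321, 0.123, 0.456],
])
query_embedding = np.array([0.987, 0.654, 0.321, 0.123, 0.456])

indices, scores = top_k_cosine(document_embeddings, query_embedding, k=3)
for rank, (i, s) in enumerate(zip(indices, scores), start=1):
    print(f"Rank {rank} : document {i} (cosine {s:.4f})")
```

Normalizing the document matrix once up front is what vector stores effectively precompute, so each query costs a single matrix-vector product.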


In [None]:
!pip install pypdf

In [None]:
!pip install langchain_community

In [40]:
from langchain_community.document_loaders import PyPDFLoader

In [44]:
doc_path ="/content/Retrieval-Augmented Generation for Large Language Models A Survey.pdf"

In [45]:
loader =  PyPDFLoader(doc_path)

In [None]:
loader.load()

In [47]:
docs = loader.load()

In [48]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [49]:
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=30)

In [50]:
chunks = splitter.split_documents(docs)

In [51]:
chunks[3]

Document(metadata={'source': '/content/Retrieval-Augmented Generation for Large Language Models A Survey.pdf', 'page': 0}, page_content='outdated knowledge, and non-transparent, untraceable reasoning\nprocesses. Retrieval-Augmented Generation (RAG) has emerged\nas a promising solution by incorporating knowledge from external')
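`RecursiveCharacterTextSplitter` tries a list of separators (paragraphs, then lines, then spaces, then single characters) recursively so that chunks break at natural boundaries. The sketch below is deliberately simpler: a plain sliding window that only illustrates how `chunk_size` and `chunk_overlap` interact, with each new chunk starting `chunk_size - chunk_overlap` characters after the previous one. `sliding_window_chunks` is a hypothetical helper, not LangChain's algorithm.

```python
def sliding_window_chunks(text, chunk_size=200, chunk_overlap=30):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3))
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence cut at one chunk boundary still appears intact at the start of the next chunk, at the cost of storing some text twice.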

In [52]:
from langchain_community.embeddings import HuggingFaceEmbeddings

In [None]:
!pip install -U huggingface_hub[cli]

In [54]:
!huggingface-cli login



In [58]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

In [None]:
!pip install sentence-transformers

In [59]:
from langchain_community.vectorstores import Chroma

In [None]:
!pip install chromadb

In [80]:
vectorstore = Chroma.from_documents(chunks , embeddings)

In [81]:
vectorstore_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

In [64]:
vectorstore_retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f957158a260>, search_kwargs={'k': 3})

In [None]:
!pip install rank_bm25

In [66]:
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
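`BM25Retriever` ranks chunks with BM25 through the `rank_bm25` package installed above. As a rough sketch of that scoring, here is Okapi BM25 with the common defaults `k1 = 1.5` and `b = 0.75` and the non-negative IDF variant `ln(1 + (N - df + 0.5) / (df + 0.5))`. `rank_bm25` computes IDF slightly differently, so exact scores will not match; `bm25_scores` is a hypothetical helper and whitespace splitting is an assumed tokenizer.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.split() for d in docs]          # assumed whitespace tokenizer
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n     # average document length
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b penalizes longer-than-average documents
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "this is list which containg sample documens",
    "keywords are important for keywordbased search",
    "document analysis involves extracting keywords",
    "keywordbased search relies on sparse embeddings",
]
print(bm25_scores("keywordbased search", docs))
```

Here the second and fourth documents tie: each contains both query terms once and both have exactly the average length, so the length normalization has no effect.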

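## **Mixing Vector Search and Keyword Search for Hybrid Search**
A common hybrid score is a weighted sum of the keyword (sparse) score and the vector (dense) score, where $\alpha \in [0, 1]$ sets how much the dense score counts ($\alpha = 0$ is pure keyword search, $\alpha = 1$ is pure vector search):

$$\text{score}_{\text{hybrid}} = (1 - \alpha)\,\text{score}_{\text{sparse}} + \alpha\,\text{score}_{\text{dense}}$$

LangChain's `EnsembleRetriever`, used below, fuses the retrievers' rankings rather than their raw scores, but its `weights` argument plays the same role as $\alpha$.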
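The hybrid formula above can be sketched in a few lines of Python. Because BM25 scores and cosine similarities live on different scales, this sketch assumes each score list is min-max normalized to [0, 1] before mixing, which is one common choice rather than the only one. `min_max`, `hybrid_scores` and the example scores are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(sparse, dense, alpha=0.5):
    """(1 - alpha) * sparse + alpha * dense, after normalizing each list."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    sparse_n, dense_n = min_max(sparse), min_max(dense)
    return [(1 - alpha) * s + alpha * d for s, d in zip(sparse_n, dense_n)]


# Hypothetical scores for the same three documents from each retriever.
sparse = [0.0, 1.39, 0.69]   # e.g. BM25 scores
dense = [0.82, 0.40, 0.75]   # e.g. cosine similarities
combined = hybrid_scores(sparse, dense, alpha=0.3)
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking)  # → [1, 2, 0]
```

With `alpha=0.3` the keyword score dominates, so document 1 (best BM25, worst cosine) still wins, while document 2, which is decent on both, moves ahead of document 0.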


In [67]:
model_name = "HuggingFaceH4/zephyr-7b-alpha"

In [None]:
!pip install bitsandbytes

In [69]:
!pip install accelerate

In [70]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
from langchain_community.llms import HuggingFacePipeline

In [71]:
# function for loading 4-bit quantized model
def load_quantized_model(model_name: str):
    """
    model_name: Name or path of the model to be loaded.
    return: Loaded quantized model.
    """
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        quantization_config=bnb_config,
        device_map="auto",  # let accelerate place the quantized weights on the GPU
    )
    return model
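Why load in 4-bit? A back-of-the-envelope estimate of weight memory is the number of parameters times bits per parameter, divided by 8 to get bytes. The sketch assumes roughly 7 billion parameters for a 7B model and ignores activations, the KV cache and quantization overhead such as the scaling constants (which `bnb_4bit_use_double_quant` is meant to shrink), so real usage is somewhat higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 7e9  # assumption: roughly 7 billion parameters
for label, bits in [("bfloat16", 16), ("4-bit NF4", 4)]:
    print(f"{label}: ~{weight_memory_gb(num_params, bits):.1f} GB")
```

The roughly 4x reduction (about 14 GB down to about 3.5 GB) is what lets a 7B model fit on a single free-tier Colab GPU.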

In [72]:
def initialize_tokenizer(model_name: str):
    """
    model_name: Name or path of the model for tokenizer initialization.
    return: Initialized tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name, return_token_type_ids=False)
    tokenizer.bos_token_id = 1  # Set beginning of sentence token id
    return tokenizer


In [73]:

tokenizer = AutoTokenizer.from_pretrained(model_name, return_token_type_ids=False, truncation=True, model_max_length=512)


In [75]:
# Build the text-generation pipeline on top of the 4-bit model defined above.
# A separate variable name avoids shadowing the `pipeline` function from transformers.
# Device placement happens when the quantized model is loaded, so no device_map here.
text_generation_pipeline = pipeline(
    "text-generation",
    model=load_quantized_model(model_name),
    tokenizer=tokenizer,
    use_cache=True,
    max_length=2048,
    do_sample=True,
    top_k=5,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

In [76]:
llm = HuggingFacePipeline(pipeline=text_generation_pipeline)


In [77]:
from langchain.chains import RetrievalQA

In [82]:
normal_chain = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore_retriever
)

In [88]:
keyword_retriever = BM25Retriever.from_documents(chunks)

In [89]:
# Weighted fusion of both retrievers: 0.3 for vector search, 0.7 for BM25 keyword search
ensemble_retriever = EnsembleRetriever(
    retrievers=[vectorstore_retriever, keyword_retriever],
    weights=[0.3, 0.7],
)
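LangChain documents `EnsembleRetriever` as merging its retrievers' ranked lists with weighted Reciprocal Rank Fusion: a document at 1-based rank r in a list with weight w gains w / (r + c), and contributions from different retrievers add up. Below is a hypothetical re-implementation of that idea, assuming c = 60 (the constant commonly used for RRF), to show how `weights=[0.3, 0.7]` favors BM25 results. It is not LangChain's source, and the chunk ids are made up.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; documents found by several retrievers accumulate score.
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["chunk_a", "chunk_b", "chunk_c"]   # hypothetical vector results
keyword_hits = ["chunk_b", "chunk_d"]             # hypothetical BM25 results
print(weighted_rrf([vector_hits, keyword_hits], weights=[0.3, 0.7]))
# → ['chunk_b', 'chunk_d', 'chunk_a', 'chunk_c']
```

`chunk_b` wins because both retrievers return it, and `chunk_d`, found only by BM25 at rank 2, outranks `chunk_a`, the top vector hit, because of the 0.7 weight. Since RRF uses only ranks, it sidesteps the score-scale mismatch between BM25 and cosine similarity.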


In [90]:
hybrid_chain = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ensemble_retriever
)

In [None]:
response1 = normal_chain.invoke("What is Abstractive Question Answering?")

In [None]:
response1.get("result")

In [None]:
response2 = hybrid_chain.invoke("What is Abstractive Question Answering?")


In [None]:
response2.get("result")