<a href="https://colab.research.google.com/github/Raviii6685/Advance_RAG_hybrid_search/blob/main/hybrid_search_reranking_weaviate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Importing necessary libraries
1. weaviate-client: This library provides the tools to connect and interact with a Weaviate database. Weaviate is a vector database often used in combination with LangChain for tasks like semantic search.
2. langchain: This is the core LangChain library. LangChain is a framework designed for developing applications powered by large language models (LLMs). It offers tools to chain together different components like LLMs, data sources, and agents.
3. langchain-community: This package contains community-contributed components and integrations for LangChain, expanding its functionality with additional tools and connections to different services.


In [46]:
# LangChain supports Weaviate's built-in hybrid search
!pip install weaviate-client
!pip install langchain
!pip install langchain-community

Collecting weaviate-client
  Downloading weaviate_client-4.9.6-py3-none-any.whl.metadata (3.6 kB)
Collecting httpx<=0.27.0,>=0.25.0 (from weaviate-client)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting validators==0.34.0 (from weaviate-client)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting authlib<1.3.2,>=1.2.1 (from weaviate-client)
  Downloading Authlib-1.3.1-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting grpcio-tools<2.0.0,>=1.57.0 (from weaviate-client)
  Downloading grpcio_tools-1.68.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting grpcio-health-checking<2.0.0,>=1.57.0 (from weaviate-client)
  Downloading grpcio_health_checking-1.68.1-py3-none-any.whl.metadata (1.1 kB)
Collecting grpcio<2.0.0,>=1.57.0 (from weaviate-client)
  Downloading grpcio-1.68.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.9 kB)
Downloading weaviate_client-4.9.6-py3-none-any.whl (386



# Client Creation for weaviate
1. client = weaviate.Client(...): Creates a Weaviate client object using the weaviate.Client class (the v3-style client API). This client object is your primary interface for interacting with the Weaviate database.
2. url=WEAVIATE_CLUSTER: Provides the URL of your Weaviate cluster to the client, allowing it to establish a connection.
3. auth_client_secret=weaviate.AuthApiKey(api_key=WEAVIATE_API_KEY): Sets up authentication with your API key. The weaviate.AuthApiKey class wraps the key so the client can send it with its requests.
4. additional_headers={"X-HuggingFace-Api-Key": HF_TOKEN}: Adds an extra header to requests sent to Weaviate. It carries your Hugging Face API token, HF_TOKEN, which the text2vec-huggingface vectorizer module needs in order to call Hugging Face.

In [53]:

import weaviate  # Weaviate provides BM25 keyword search out of the box
from google.colab import userdata

# Read the credentials from Colab's secret store
HF_TOKEN = userdata.get('HuGGINGFACE_TOKEN')
WEAVIATE_CLUSTER = userdata.get('WEAVIATE_CLUSTER')
WEAVIATE_API_KEY = userdata.get('WEAVIATE_API_KEY')

# Create the client used to connect to the database
client = weaviate.Client(
    url=WEAVIATE_CLUSTER,
    auth_client_secret=weaviate.AuthApiKey(api_key=WEAVIATE_API_KEY),
    additional_headers={
        # Token for the Hugging Face vectorizer; a different vectorizer
        # (for example OpenAI) needs its own API-key header instead
        "X-HuggingFace-Api-Key": HF_TOKEN
    }
)

Python client v3 `weaviate.Client(...)` connections and methods are deprecated and will
            be removed by 2024-11-30.

            Upgrade your code to use Python client v4 `weaviate.WeaviateClient` connections and methods.
                - For Python Client v4 usage, see: https://weaviate.io/developers/weaviate/client-libraries/python
                - For code migration, see: https://weaviate.io/developers/weaviate/client-libraries/python/v3_v4_migration

            If you have to use v3 code, install the v3 client and pin the v3 dependency in your requirements file: `weaviate-client>=3.26.7;<4.0.0`
  client  = weaviate.Client(
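
The cell above reads its credentials from `google.colab.userdata`, which exists only inside Colab. A minimal sketch of a fallback for other environments, assuming the same secret names are exported as environment variables. `get_secret` is a hypothetical helper written for this example, not part of Weaviate or Colab, and the URL is a placeholder:

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata if possible, else from the environment.

    Hypothetical helper for illustration only.
    """
    try:
        # Importable only inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted: fall back below.
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} not found in Colab userdata or environment")
    return value


# Placeholder value (an assumption, not a real cluster URL).
os.environ.setdefault("WEAVIATE_CLUSTER", "https://example-cluster.invalid")
WEAVIATE_CLUSTER = get_secret("WEAVIATE_CLUSTER")
```

Failing early with a `KeyError` makes a missing secret visible before the client tries to connect.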


**Schema: The Blueprint for Your Data**

In Weaviate, the schema is essentially the blueprint that defines how your data will be structured and organized within the database. Think of it like the design plan for a building – it specifies the different rooms (classes), their purpose, and the types of things that can be stored within them (properties).

**Why is it needed?**
- Data Organization: The schema helps Weaviate understand what kind of data it's dealing with and how to categorize it. By defining classes and properties, you're providing a structure for your data, making it easier to manage and query later on.
- Semantic Search: Weaviate is a vector database, often used for semantic search. To enable this, text data needs to be transformed into vectors that represent its meaning. The schema specifies how this conversion will happen:
  - "vectorizer": "text2vec-huggingface": Tells Weaviate to use the text2vec-huggingface module for creating vectors.
  - "moduleConfig": Provides further instructions to this module, like which model ("sentence-transformers/all-MiniLM-L6-v2") to use for generating the vectors. This is crucial for semantic search as it determines how similar or different pieces of text are perceived by the database.
- Data Validation: The schema acts as a validation tool. When you add data to Weaviate, it checks if the data conforms to the schema's defined structure. This prevents you from accidentally adding data of the wrong type to a property or creating inconsistent data entries.
- Querying Efficiency: A well-defined schema can significantly improve the efficiency of your queries. Weaviate can use the schema to optimize how it searches for and retrieves data, leading to faster results.
---
In the context of the code:

The code below defines a schema for storing documents used in RAG (Retrieval-Augmented Generation). The schema specifies:
- Class: "RAG" - The category for your data, here the documents used in RAG tasks.
- Property: "content" - Holds the actual text content of your documents.

By defining this schema, you instruct Weaviate to:
- Organize incoming documents under the "RAG" class.
- Convert the text content of those documents into vectors using the specified Hugging Face model, enabling semantic search.
- Ensure that any data added to this class adheres to the defined structure, which keeps the data consistent.

This structure makes it possible to efficiently store, search, and retrieve relevant documents for your RAG applications.
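
To make the validation idea concrete, here is a small sketch that compares an object against a class definition shaped like the schema in this notebook. It only illustrates the principle: Weaviate performs its own checks on the server (and, with auto-schema enabled, may add unknown properties rather than reject them). `check_object` and `PYTHON_TYPES` are names invented for this example.

```python
from typing import Any

# Same class definition as in the schema, reduced to the fields checked here.
rag_class = {
    "class": "RAG",
    "properties": [{"name": "content", "dataType": ["text"]}],
}

# Illustrative mapping from Weaviate data type names to Python types
# (assumption: only a few types are listed).
PYTHON_TYPES = {"text": str, "int": int, "number": float, "boolean": bool}


def check_object(obj: dict[str, Any], class_def: dict[str, Any]) -> list[str]:
    """Return a list of problems found when comparing obj to class_def."""
    problems = []
    declared = {p["name"]: p["dataType"][0] for p in class_def["properties"]}
    for key, value in obj.items():
        if key not in declared:
            problems.append(f"unknown property: {key}")
        elif not isinstance(value, PYTHON_TYPES[declared[key]]):
            problems.append(f"{key} should be {declared[key]}")
    return problems


print(check_object({"content": "Attention is all you need."}, rag_class))  # []
print(check_object({"content": 42}, rag_class))  # ['content should be text']
```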

In [56]:
client.is_ready()  # check that the connection is established
client.schema.get()  # retrieve the existing schema to see how the stored data is structured
# If the schema is not defined manually, it is created automatically, using OpenAI by default

# Creating the schema manually
schema = {
    "classes": [
        {
            "class": "RAG",
            "description": "Documents for RAG",
            "vectorizer": "text2vec-huggingface",
            "moduleConfig": {"text2vec-huggingface": {"model": "sentence-transformers/all-MiniLM-L6-v2", "type": "text"}},
            "properties": [
                {
                    "dataType": ["text"],
                    "description": "The content of the paragraph",
                    "moduleConfig": {
                        "text2vec-huggingface": {
                            "skip": False,
                            "vectorizePropertyName": False,
                        }
                    },
                    "name": "content",
                },
            ],
        },
    ]
}
client.schema.create(schema)
client.schema.get()

{'classes': [{'class': 'RAG',
   'description': 'Documents for RAG',
   'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2},
    'cleanupIntervalSeconds': 60,
    'stopwords': {'additions': None, 'preset': 'en', 'removals': None}},
   'moduleConfig': {'text2vec-huggingface': {'model': 'sentence-transformers/all-MiniLM-L6-v2',
     'type': 'text',
     'useCache': True,
     'useGPU': False,
     'vectorizeClassName': True,
     'waitForModel': False}},
   'multiTenancyConfig': {'autoTenantActivation': False,
    'autoTenantCreation': False,
    'enabled': False},
   'properties': [{'dataType': ['text'],
     'description': 'The content of the paragraph',
     'indexFilterable': True,
     'indexRangeFilters': False,
     'indexSearchable': True,
     'moduleConfig': {'text2vec-huggingface': {'skip': False,
       'vectorizePropertyName': False}},
     'name': 'content',
     'tokenization': 'word'}],
   'replicationConfig': {'asyncEnabled': False,
    'deletionStrategy': 'NoAutomate
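
The returned schema shows the BM25 parameters Weaviate applied to the inverted index: `k1` = 1.2 and `b` = 0.75. As a rough sketch of what these control, here is a textbook BM25 scorer in plain Python. It assumes whitespace tokenization and a common IDF variant; Weaviate's own implementation (tokenization, stopwords, property weighting) may differ in detail, and the sample documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps the effect of repeated terms; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "attention is all you need",
    "recurrent networks process tokens sequentially",
    "multi-head attention runs several attention layers in parallel",
]
scores = bm25_scores("attention", docs)
print(scores)
```

The second document does not contain the query term, so its score is zero; the other two receive positive scores.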

In [57]:
from langchain_community.retrievers import WeaviateHybridSearchRetriever

retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name="RAG",  # Weaviate class to search (required)
    text_key="content",  # property that holds the document text (required)
    alpha=0.5,  # balance between keyword (BM25) and vector search
    attributes=[],  # extra properties to return with each result; may be empty
    create_schema_if_missing=True,  # create the schema if it was not defined manually
)
model_name = "HuggingFaceH4/zephyr-7b-beta"
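
The `alpha` argument of the retriever balances the two result sets: 0 means keyword (BM25) search only, 1 means vector search only, and 0.5 weights them equally. A rough sketch of the idea, assuming each score set is min-max normalized before weighting. This is a simplification and not necessarily how Weaviate computes it; `min_max`, `fuse_scores`, and the sample scores are invented for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_scores(bm25: dict[str, float], vector: dict[str, float], alpha: float = 0.5) -> dict[str, float]:
    """Blend normalized keyword and vector scores; alpha is the vector weight."""
    bm25_n, vec_n = min_max(bm25), min_max(vector)
    ids = set(bm25) | set(vector)
    return {
        i: (1 - alpha) * bm25_n.get(i, 0.0) + alpha * vec_n.get(i, 0.0)
        for i in ids
    }


# Made-up scores for three documents.
bm25 = {"doc_a": 2.0, "doc_b": 1.0, "doc_c": 0.0}
vector = {"doc_a": 0.25, "doc_b": 0.75, "doc_c": 0.5}

fused = fuse_scores(bm25, vector, alpha=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)  # ['doc_b', 'doc_a', 'doc_c']
```

Here `doc_a` wins on keywords and `doc_b` on vector similarity; with equal weighting `doc_b` comes out on top because it scores reasonably well on both.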

In [58]:
# Two libraries required for loading the quantized model
!pip install bitsandbytes
!pip install accelerate


In [59]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from langchain_community.llms import HuggingFacePipeline

# Designed to load a quantized language model
1. bnb_config=BitsAndBytesConfig(...): Creates a configuration object, bnb_config, using the BitsAndBytesConfig class. It configures quantized loading, a technique that reduces the model's size and memory footprint while largely preserving accuracy.
 - load_in_4bit=True: Enables 4-bit quantization: the model's weights are stored using 4 bits each instead of the usual 16 or 32.
 - bnb_4bit_use_double_quant=True: Also quantizes the quantization constants, which saves a little more memory.
 - bnb_4bit_quant_type="nf4": Uses the NF4 (4-bit NormalFloat) data type for the quantized weights.
 - bnb_4bit_compute_dtype=torch.bfloat16: Computations run in bfloat16; the 4-bit weights are dequantized to this type when they are used.
 - low_cpu_mem_usage=True: Reduces CPU memory usage during model loading. Note that this is an argument of from_pretrained, not of BitsAndBytesConfig, which ignores it.
2. model=AutoModelForCausalLM.from_pretrained(...): This is the core of the function. It loads the pre-trained causal language model using the AutoModelForCausalLM class from the transformers library.
 - model_name: The name of the pre-trained model to load.
 - quantization_config=bnb_config: Applies the quantization configuration defined above while the model is loaded.
 - torch_dtype=torch.bfloat16: Sets the data type for the parts of the model that are not quantized to bfloat16, a 16-bit floating-point format, which further reduces memory usage compared with float32.
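
A back-of-the-envelope calculation shows why 4-bit loading matters on a Colab GPU. The sketch below estimates the memory needed for the weights alone, assuming roughly 7 billion parameters for a "7b" model. It ignores quantization constants, activations, and the generation cache, so real usage is higher; `weight_memory_gb` is a name invented for this example.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7e9  # assumption: about 7 billion parameters
for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(n_params, bits):.1f} GB")
```

Under these assumptions the weights take about 28 GB in float32, 14 GB in bfloat16, and 3.5 GB in 4-bit form.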

In [60]:
import torch

def load_quantized_model(model_name: str):
    """Load a causal language model with 4-bit NF4 quantization."""
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,  # an argument of from_pretrained, not of BitsAndBytesConfig
    )
    return model

In [61]:
def initialize_tokenizer(model_name: str):
    tokenizer = AutoTokenizer.from_pretrained(model_name, return_token_type_ids=False)
    # Set the beginning-of-sentence token id
    tokenizer.bos_token_id = 1
    return tokenizer

In [64]:
tokenizer = initialize_tokenizer(model_name)
model = load_quantized_model(model_name)


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [69]:
# Use a distinct variable name so the imported `pipeline` function is not overwritten
text_generation_pipeline = pipeline(
    "text-generation",  # task type
    model=model,  # the quantized LLM loaded above
    tokenizer=tokenizer,  # converts text to token ids and back
    use_cache=True,  # reuse previously computed results to speed up generation
    device_map="auto",  # choose device placement (CPU or GPU) automatically
    # max_length=2048,
    do_sample=True,  # sample instead of always taking the most likely token, for more diverse output
    top_k=5,  # sample only from the 5 most likely next tokens
    max_new_tokens=100,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

In [71]:
# Wrap the pipeline so LangChain can use the model as an LLM
llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
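
The `do_sample=True` and `top_k=5` settings mean that each next token is drawn at random from the five most likely candidates. A simplified sketch of top-k sampling in plain Python follows; the real logic lives inside the transformers `generate` method and works on tensors, and `sample_top_k` and the token scores here are invented for illustration.

```python
import math
import random


def sample_top_k(logits: dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k highest-scoring tokens, apply softmax to them, and sample one."""
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    max_logit = top[0][1]
    # Subtracting the maximum keeps exp() numerically stable.
    weights = [math.exp(score - max_logit) for _, score in top]
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up next-token scores.
logits = {
    "attention": 3.0, "model": 2.5, "layer": 2.0, "the": 1.0,
    "of": 0.5, "banana": -4.0, "zebra": -6.0,
}
rng = random.Random(0)
samples = [sample_top_k(logits, k=5, rng=rng) for _ in range(20)]
print(samples)
```

Tokens outside the top five ("banana" and "zebra" here) can never be chosen, which limits how far the output can drift while still allowing some variety.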

In [72]:
doc_path="./170603762v7.pdf"

# Loading the Data

In [73]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader(doc_path)
docs = loader.load()
# Load the documents into the retriever; this returns the ids of the stored objects
retriever.add_documents(docs)

['c5c7f4f9-4910-4d0f-ab5c-9dbcb69f9fec',
 'afd76029-322c-4e17-b4f8-a9d377d9e9b8',
 '8fcc632b-49da-4ad5-9c45-a42221ef7a76',
 '424f35b8-2f1e-46f5-83a4-81530f345ac6',
 '935eed24-25fe-4dd4-b5b6-35b3b8b34b5a',
 '5cf887dd-c404-4aa5-8199-18749ad387e5',
 '2e8c11a6-7537-4735-9462-1ecfebdddfa2',
 'af71ca4a-d501-4042-ac78-49ab4b53e83d',
 'ec4e5dae-a413-4bbc-962d-da4e25af9222',
 'ab36f4a4-e331-4f87-a154-92d03e5e82c6',
 '405b0e30-e9ba-4d6b-bab7-d8e493251812',
 'acc8ff0b-feb0-4bf2-a6c3-04dbcb15c8f6',
 'b6a66e2a-c31f-43f4-86ec-deb5b81d51a7',
 'f2d32997-c2a3-4353-9c5d-28d66eaf7ac3',
 '0e0fd03f-8672-41a8-abe4-a05ceb98733b']

In [77]:
print(retriever.invoke("what is atttention all you need?")[0].page_content)

Input-Input Layer5
The Law will never be perfect , but its application should be just - this is what we are missing , in my opinion . <EOS> <pad>
[attention-visualization figure: the raw output lists these tokens one per line, repeated four times under two "Input-Input Layer5" labels]
Figure 5: Many of the attention heads exhibit behaviour that seems related to the structure of the
sentence. We give two such examples above, from two different heads from the encoder self-attention
at layer 5 of 6. The heads clearly learned to perform different tasks.
15


In [78]:
retriever.invoke("what is atttention all you need?",score=True)

[Document(metadata={'_additional': {'explainScore': '\nHybrid (Result Set keyword,bm25) Document 0e0fd03f-8672-41a8-abe4-a05ceb98733b: original score 1.5577736, normalized score: 0.4980778 - \nHybrid (Result Set vector,hybridVector) Document 0e0fd03f-8672-41a8-abe4-a05ceb98733b: original score 0.19551826, normalized score: 0.34582606', 'score': '0.8439039'}}, page_content='Input-Input Layer5\nThe\nLaw\nwill\nnever\nbe\nperfect\n,\nbut\nits\napplication\nshould\nbe\njust\n-\nthis\nis\nwhat\nwe\nare\nmissing\n,\nin\nmy\nopinion\n.\n<EOS>\n<pad>\nThe\nLaw\nwill\nnever\nbe\nperfect\n,\nbut\nits\napplication\nshould\nbe\njust\n-\nthis\nis\nwhat\nwe\nare\nmissing\n,\nin\nmy\nopinion\n.\n<EOS>\n<pad>\nInput-Input Layer5\nThe\nLaw\nwill\nnever\nbe\nperfect\n,\nbut\nits\napplication\nshould\nbe\njust\n-\nthis\nis\nwhat\nwe\nare\nmissing\n,\nin\nmy\nopinion\n.\n<EOS>\n<pad>\nThe\nLaw\nwill\nnever\nbe\nperfect\n,\nbut\nits\napplication\nshould\nbe\njust\n-\nthis\nis\nwhat\nwe\nare\nmissing\n,\nin
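
The `explainScore` field in this result shows how the hybrid score is put together. The two normalized scores add up to the reported `score`, which is consistent with each result set being normalized and weighted before the two are summed. That reading is an inference from these numbers, not a statement about Weaviate internals. A small check, using the values copied from the output above:

```python
import re

# explainScore text and score copied from the output above.
explain = (
    "Hybrid (Result Set keyword,bm25) Document 0e0fd03f-8672-41a8-abe4-a05ceb98733b: "
    "original score 1.5577736, normalized score: 0.4980778 - "
    "Hybrid (Result Set vector,hybridVector) Document 0e0fd03f-8672-41a8-abe4-a05ceb98733b: "
    "original score 0.19551826, normalized score: 0.34582606"
)
reported_score = 0.8439039

# Extract the normalized contribution of each result set and add them up.
normalized = [float(x) for x in re.findall(r"normalized score: ([0-9.]+)", explain)]
total = sum(normalized)

print(normalized)
print(abs(total - reported_score) < 1e-6)  # True
```

With `alpha=0.5`, the keyword contribution of about 0.498 is close to an equal half share, so this page ranked first mainly because of its BM25 match.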

In [79]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate

# RetrievalQA chain over the hybrid retriever; no custom prompt is passed here
hybrid_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

# Chat-style prompt (reassigned below, so this version is not used afterwards)
system_prompt = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Use three sentence maximum and keep the answer concise. "
    "Context: {context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{query}"),
    ]
)

# Plain-text prompt; this is the value `prompt` holds from here on
template = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you do not have the relevant information needed to provide a verified answer, don't try to make up an answer.
When providing an answer, aim for clarity and precision. Position yourself as a knowledgeable authority on the topic, but also be mindful to explain the information in a manner that is accessible and comprehensible to those without a technical background.
Always say "Do you have any more questions pertaining to this instrument?" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

prompt = PromptTemplate.from_template(template)
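
The "stuff" chain type places all retrieved documents into the `{context}` slot of a single prompt. A minimal sketch of that step in plain Python, assuming the documents are joined with a blank line. `stuff_prompt`, the shortened template, and the sample chunks are invented for illustration; the chain does this internally.

```python
def stuff_prompt(template: str, documents: list[str], question: str) -> str:
    """Join all documents into one context string and fill the template."""
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


# Shortened stand-in for the notebook's template, with the same placeholders.
demo_template = "Answer using the context.\n{context}\nQuestion: {question}\nHelpful Answer:"
demo_docs = ["Chunk one of the paper.", "Chunk two of the paper."]

filled = stuff_prompt(demo_template, demo_docs, "What is self-attention?")
print(filled)
```

Because every retrieved document is pasted in whole, long pages can quickly fill the model's context window, which is one reason to keep the number and size of retrieved chunks small.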

In [80]:
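from langchain.chains.combine_documents import create_stuff_documents_chain

# Build a "stuff" documents chain with the custom prompt (hybrid_chain below does not use it)
question_answer_chain = create_stuff_documents_chain(llm, prompt)

# hybrid_chain was created without the custom prompt, so the default RetrievalQA prompt shows up in the result
result1 = hybrid_chain.invoke("what is this data all about?")
print(result1)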

{'query': 'what is this data all about?', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nTable 3: Variations on the Transformer architecture. Unlisted values are identical to those of the base\nmodel. All metrics are on the English-to-German translation development set, newstest2013. Listed\nperplexities are per-wordpiece, according to our byte-pair encoding, and should not be compared to\nper-word perplexities.\nN d model dff h d k dv Pdrop ϵls\ntrain PPL BLEU params\nsteps (dev) (dev) ×106\nbase 6 512 2048 8 64 64 0.1 0.1 100K 4.92 25.8 65\n(A)\n1 512 512 5.29 24.9\n4 128 128 5.00 25.5\n16 32 32 4.91 25.8\n32 16 16 5.01 25.4\n(B) 16 5.16 25.1 58\n32 5.01 25.4 60\n(C)\n2 6.11 23.7 36\n4 5.19 25.3 50\n8 4.88 25.5 80\n256 32 32 5.75 24.5 28\n1024 128 128 4.66 26.0 168\n1024 5.12 25.4 53\n4096 4.75 26.2 90\n(D)\n0.0 5.77 24.6\n0.2 4.95 25.5\n0.0 4.67 25.3\n0.2