## Testing the Hybrid Index

This notebook tests hybrid search, which combines sparse values with dense vector embeddings:

1. SPLADE computes the sparse indices and values.
2. Sentence-Transformers computes the dense vector embeddings.

In [None]:
from langchain_core.documents import Document
from doc_preprocessor import (
    load_md_with_metadata, 
    filter_to_minimal_docs,
    text_split
)


In [2]:
import os

current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
DATA_FILE_PATH = os.path.join(parent_dir, "data")

In [3]:
docs = load_md_with_metadata(DATA_FILE_PATH)
minimal_docs = filter_to_minimal_docs(docs)
text_chunks = text_split(minimal_docs)

### Sentence-Transformers for VECTOR EMBEDDING

In [5]:
from sentence_transformers import SentenceTransformer
embeddingModel = SentenceTransformer('all-MiniLM-L6-v2')

Loading weights: 100%|██████████| 103/103 [00:01<00:00, 92.77it/s, Materializing param=pooler.dense.weight]                              
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


### Load a SPLADE model for SPARSE INDICES and VALUES

In [6]:
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("naver/splade-cocondenser-ensembledistil")
model = AutoModelForMaskedLM.from_pretrained("naver/splade-cocondenser-ensembledistil")

Loading weights: 100%|██████████| 204/204 [00:01<00:00, 194.52it/s, Materializing param=cls.predictions.transform.dense.weight]                 
BertForMaskedLM LOAD REPORT from: naver/splade-cocondenser-ensembledistil
Key                          | Status     |  | 
-----------------------------+------------+--+-
bert.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


In [12]:
def splade_sparse(text, threshold=0.1):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits

    # SPLADE pooling
    scores = torch.log1p(torch.relu(logits))
    scores = torch.max(scores, dim=1).values.squeeze()

    indices = []
    values = []

    for idx, score in enumerate(scores):
        if score > threshold:
            indices.append(idx)
            values.append(float(score))

    return indices, values
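
To sanity-check what `splade_sparse` returns, it can help to map the highest-weighted indices back to vocabulary terms. The sketch below is a hypothetical helper (`top_terms` is not part of the notebook) that accepts any id-to-token callable; with the tokenizer loaded above, `tokenizer.convert_ids_to_tokens` should be usable as that callable. The toy vocabulary is made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def top_terms(
    indices: Sequence[int],
    values: Sequence[float],
    id_to_token: Callable[[int], str],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the k highest-weighted (token, weight) pairs of a sparse vector."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # Sort the (index, weight) pairs by weight, largest first
    ranked = sorted(zip(indices, values), key=lambda pair: pair[1], reverse=True)
    return [(id_to_token(idx), weight) for idx, weight in ranked[:k]]


# Toy vocabulary standing in for the real tokenizer (assumption for the demo)
fake_vocab = {7: "customer", 42: "experience", 99: "gitlab"}
demo_terms = top_terms([7, 42, 99], [0.4, 1.3, 0.9], fake_vocab.get, k=2)
print(demo_terms)
```

In the notebook this could be called as `top_terms(*splade_sparse(query), tokenizer.convert_ids_to_tokens)` to see which terms SPLADE expands a query into.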

In [None]:
start_idx = 2501
vectorEmbeddings = []
skipped = 0

for id, chunk in enumerate(text_chunks[2500:]):
    source = chunk.metadata.get('source', "")
    title = chunk.metadata.get("title", "")
    description = chunk.metadata.get("description", "")
    text = chunk.page_content

    # Vector Embeddings 
    embedding = embeddingModel.encode(text).tolist()
    if len(embedding) != embeddingModel.get_sentence_embedding_dimension():
        skipped += 1
        continue

    # Sparse Indices and Values for Hybrid Search
    sparse_indices, sparse_values = splade_sparse(text)
    if len(sparse_indices) != len(sparse_values):
        skipped += 1
        continue

    data = {
        "id": id + start_idx,
        "vector": embedding,
        "sparse_indices": sparse_indices,
        "sparse_values": sparse_values,
        "meta": {
            "title": title,
            "description": description,
            "source": source,
            "text": text
        }
    }

    print(id + 1, "has been processed")
    
    vectorEmbeddings.append(data)
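
The loop above only checks the dense length and that the two sparse lists match. A slightly stricter check before upserting might also confirm that every sparse index fits inside the sparse dimension. The following is a minimal sketch; `validate_record` is a hypothetical helper, and the toy dimensions in the demo are placeholders rather than the real model sizes.

```python
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any], dense_dim: int, sparse_dim: int) -> List[str]:
    """Return a list of problems found in one upsert record (empty if none)."""
    problems = []
    if len(record["vector"]) != dense_dim:
        problems.append("dense vector has the wrong length")

    indices = record["sparse_indices"]
    values = record["sparse_values"]
    if len(indices) != len(values):
        problems.append("sparse indices and values differ in length")
    if any(i < 0 or i >= sparse_dim for i in indices):
        problems.append("sparse index out of range")
    if len(set(indices)) != len(indices):
        problems.append("duplicate sparse indices")
    return problems


# Toy records with dense_dim=4 and sparse_dim=10 (illustrative values only)
good = {"vector": [0.1] * 4, "sparse_indices": [2, 5], "sparse_values": [0.7, 0.3]}
bad = {"vector": [0.1] * 3, "sparse_indices": [2, 12], "sparse_values": [0.7, 0.3]}
print(validate_record(good, dense_dim=4, sparse_dim=10))
print(validate_record(bad, dense_dim=4, sparse_dim=10))
```

In the notebook, the two dimensions would presumably come from `embeddingModel.get_sentence_embedding_dimension()` and `tokenizer.vocab_size`, the same values used later in the index-creation payload.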

In [57]:
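# Slice the list into batches: the insertion limit is 1000 vectors per upsert, so each batch holds about 950.
# Note: the `+ 1` offset below makes the first batch 951 vectors, which is still under the limit.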
slices = []

end = len(vectorEmbeddings) // 950
prev = 0
for i in range(1, end + 1):
    slices.append((prev, 950 * i + 1))
    prev = 950 * i + 1
slices.append((prev, len(vectorEmbeddings)))

print(slices)

[(0, 951), (951, 1901), (1901, 2851), (2851, 2992)]
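
The same batching can be written with a stepped `range`, which gives every batch except the last exactly the requested size. This is a sketch of an alternative, not the code that produced the output above; `batch_ranges` is a hypothetical name.

```python
from typing import List, Tuple


def batch_ranges(total: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs that cover range(total) in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Each start is a multiple of batch_size; the last end is clipped to total
    return [
        (start, min(start + batch_size, total))
        for start in range(0, total, batch_size)
    ]


ranges = batch_ranges(2992, 950)
print(ranges)  # [(0, 950), (950, 1900), (1900, 2850), (2850, 2992)]
```

Unlike the loop above, this never emits an empty trailing slice when the total is an exact multiple of the batch size.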


### Creating the **Hybrid Index** in Vector DB

In [35]:
import requests
from requests.exceptions import ConnectionError, Timeout, HTTPError

# Index name for Endee Vector Database
INDEX_NAME = "enterprise_knowledge_base2_hybrid"

# URL for Endee API service
ENDEE_URL = "http://127.0.0.1:8000"

In [37]:
# Payload for creating an index in Endee Vector DB
payload = {
    "index_name": INDEX_NAME,
    "dimension": embeddingModel.get_sentence_embedding_dimension(),
    "sparse_dimension": tokenizer.vocab_size,
    "precision": "INT16D"
}

In [39]:
try:    
    response = requests.post(
        f"{ENDEE_URL}/index/hybrid/create",
        json=payload
    )
    response.raise_for_status()
    print(f"Message for index creation: {response.json()}")

except ConnectionError:
    print("Backend service is not reachable (is it running?)")
    
except Timeout:
    print("Request timed out")
    
except HTTPError as e:
    try:
        err = e.response.json()
        print("Error message:", err.get("error"))
    except ValueError:
        print("Raw error:", e.response.text)
    
except Exception as e:
    print(f"Unexpected error: {e}")

Message for index creation: {'index_name': 'enterprise_knowledge_base2_hybrid', 'status': 'Hybrid index created'}


In [40]:
try:
    response = requests.post(
        f"{ENDEE_URL}/index/get",
        json={
            "index_name": INDEX_NAME
        })
    
    response.raise_for_status()
    print(response.json())
    
except ConnectionError:
    print("Backend service is not reachable (is it running?)")
    
except Timeout:
    print("Request timed out")
    
except HTTPError as e:
    try:
        err = e.response.json()
        print("Error message:", err.get("error"))
    except ValueError:
        print("Raw error:", e.response.text)
    
except Exception as e:
    print(f"Unexpected error: {e}")

{'index_name': 'enterprise_knowledge_base2_hybrid', 'status': 'index loaded'}


In [41]:
# This is the function to UPSERT the VECTORS into DB
def upsertVectors(vectors):
    payload = {
            "index_name": INDEX_NAME,
            "embedded_vectors": vectors
    }

    try:
        response = requests.post(
            f"{ENDEE_URL}/index/hybrid/upsert",
            json=payload
        )
        response.raise_for_status()

        print(response.json())
        return True

    except ConnectionError:
        print("Backend service is not reachable (is it running?)")

    except Timeout:
        print("Request timed out")

    except HTTPError as e:
        try:
            err = e.response.json()
            print("Error message:", err.get("error"))
        except ValueError:
            print("Raw error:", e.response.text)

    except Exception as e:
        print(f"Unexpected error: {e}")

    return False

In [59]:
for start, end in slices:
    vectors = vectorEmbeddings[start:end]
    upsertVectors(vectors)

{'count': 951, 'status': 'vectors upserted'}
{'count': 950, 'status': 'vectors upserted'}
{'count': 950, 'status': 'vectors upserted'}
{'count': 141, 'status': 'vectors upserted'}
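
The loop above ignores the boolean that `upsertVectors` returns, so a failed batch would pass silently. One possible variant stops at the first failure and reports how many vectors were sent. This is a sketch only: `upsert_in_batches` is a hypothetical helper, and the fake upsert function in the demo stands in for the real API call.

```python
from typing import Callable, List, Sequence, Tuple


def upsert_in_batches(
    records: Sequence[dict],
    ranges: Sequence[Tuple[int, int]],
    upsert_fn: Callable[[List[dict]], bool],
) -> int:
    """Upsert each slice in order; stop at the first failure and return the count sent."""
    sent = 0
    for start, end in ranges:
        batch = list(records[start:end])
        if not batch:
            continue  # skip empty slices instead of sending them
        if not upsert_fn(batch):
            print(f"Batch {start}:{end} failed; stopping after {sent} vectors")
            break
        sent += len(batch)
    return sent


# Demo with a fake upsert function that fails on its second call (assumption)
calls = []

def fake_upsert(batch: List[dict]) -> bool:
    calls.append(len(batch))
    return len(calls) < 2

records = [{"id": i} for i in range(5)]
sent_count = upsert_in_batches(records, [(0, 2), (2, 4), (4, 5)], fake_upsert)
print(sent_count)
```

In the notebook, this would be called as `upsert_in_batches(vectorEmbeddings, slices, upsertVectors)`.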


In [None]:
# Sample RETRIEVAL of data relevant to queries.
query = "Deep dive into Customer Experience"
embedding_for_query = embeddingModel.encode(query).tolist()
sparse_indices, sparse_values = splade_sparse(query)

payload = {
    "index_name": INDEX_NAME,
    "vector": embedding_for_query,
    "sparse_indices": sparse_indices,
    "sparse_values": sparse_values,
    "top_k": 20
}

try:
    # Send the hybrid query to the Endee API
    response = requests.post(
        f"{ENDEE_URL}/index/hybrid/query",
        json=payload
    )

    response.raise_for_status()

    # Parse inside the try block so a failed request never reaches this point
    data = response.json()
    print(data["results"][:2])
    
except ConnectionError:
    print("Backend service is not reachable (is it running?)")
    
except Timeout:
    print("Request timed out")
    
except HTTPError as e:
    try:
        err = e.response.json()
        print("Error message:", err.get("error"))
    except ValueError:
        print("Raw error:", e.response.text)
    
except Exception as e:
    print(f"Unexpected error: {e}")

[{'description': 'Deep dive into Customer Experience (CX)', 'distance': 0.9674775265157223, 'id': '1', 'similarity': 0.032522473484277725, 'text': 'Deep dive into Customer Experience (CX) \n Understanding how this critical practice drives customer-centricity in GitLab \n What is Customer Experience (CX)?', 'title': 'About Customer Experience (CX)'}, {'description': 'Understanding the current customer experience, identifying opportunities where we can improve, and leveraging metrics to advocate key experience improvement initiatives.', 'distance': 0.9679815582931042, 'id': '942', 'similarity': 0.03201844170689583, 'text': 'Customer Experience (CX) research takes three perspectives into account: \n \n Customer Lens: Understanding the customer’s key activities along our core customer journey, their needs, who on their side is engaged in purchase decisions, through to onboarding and adoption, and the needs and expectations of GitLab for success.', 'title': 'Customer Experience Journey Rese
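
The raw result dictionaries are hard to scan. A small helper could reduce each hit to its id, title, and similarity. The sketch below assumes the result keys seen in the output above (`id`, `title`, `similarity`); `summarize_results` is a hypothetical name and the demo records are invented.

```python
from typing import Any, Dict, List, Tuple


def summarize_results(
    results: List[Dict[str, Any]], limit: int = 5
) -> List[Tuple[Any, Any, float]]:
    """Return (id, title, similarity) for the top hits, most similar first."""
    ranked = sorted(results, key=lambda r: r.get("similarity", 0.0), reverse=True)
    return [
        (r.get("id"), r.get("title"), round(r.get("similarity", 0.0), 4))
        for r in ranked[:limit]
    ]


# Invented records that mimic the shape of the query response
demo_results = [
    {"id": "b", "title": "Second doc", "similarity": 0.0320, "distance": 0.9680},
    {"id": "a", "title": "First doc", "similarity": 0.0325, "distance": 0.9675},
]
summary = summarize_results(demo_results, limit=2)
for row in summary:
    print(row)
```

In the two hits shown above, `similarity` and `distance` sum to 1, so sorting by descending similarity gives the same order as ascending distance for those records.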

### Building the Complete RAG Pipeline

In [46]:
# Loading the environment variables from the .env file
from dotenv import load_dotenv
load_dotenv()

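GROQ_API_KEY = os.getenv("GROQ_API_KEY")
# Fail early with a clear message; assigning None to os.environ raises a TypeError
if not GROQ_API_KEY:
    raise RuntimeError("GROQ_API_KEY is not set; add it to the .env file")
os.environ["GROQ_API_KEY"] = GROQ_API_KEY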

In [69]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_groq import ChatGroq

In [48]:
llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0.3
)

In [None]:
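# Wrapper that retrieves relevant chunks from the hybrid index
def endee_retriever(query: str):
    embedding_for_query = embeddingModel.encode(query).tolist()
    # Include the SPLADE sparse vector so the query uses both halves of the hybrid index
    sparse_indices, sparse_values = splade_sparse(query)

    payload = {
        "index_name": INDEX_NAME,
        "vector": embedding_for_query,
        "sparse_indices": sparse_indices,
        "sparse_values": sparse_values,
        "top_k": 20
    }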

    response = requests.post(
        f"{ENDEE_URL}/index/hybrid/query",
        json=payload
    )

    response.raise_for_status()
    data = response.json()

    docs = []
    for d in data.get("results", []):
        text = d.get("text", "")

        docs.append(
            Document(
                page_content=text,
                metadata={
                    "similarity": d.get("similarity"),
                    "source": d.get("source"),
                    "title": d.get("title"),
                    "description": d.get("description")
                }
            )
        )

    return docs

retriever = RunnableLambda(endee_retriever)

In [50]:
system_prompt = """
You are a professional and helpful virtual assistant for a company named GitLab.
Your name is 'GitLab Copilot', and your job is to answer the queries made by the employees of this company to make their work easier.
Answer general questions in three to five sentences.
If an answer needs more elaboration, provide details such as step-by-step instructions only when needed or when the user specifically asks for them.
Use the context given below for reference, and keep the answer concise and to the point.
Respond in detail only when the user asks for it.
Always double-check your answer before giving the final response to the user.

Context:
{context}

If a user asks whether they can upload a document, respond with "Yes, you can upload a PDF Document only."
"""

In [None]:
# Building the Prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

# RAG pipeline:
rag_chain = (
    {
        "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
        "input": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
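
The chain above joins the raw `page_content` of all 20 retrieved chunks. A possible refinement is to label each chunk with its title and cap the total context length. The sketch below is one way to do that; `format_context`, the `max_chars` default, and the `FakeDoc` stand-in (used here instead of `Document` so the example runs on its own) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class FakeDoc:
    """Stand-in with the same two attributes the chain reads from a Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_context(docs: Iterable[Any], max_chars: int = 6000) -> str:
    """Join chunks as '[title]' + text blocks, stopping before max_chars is exceeded."""
    parts = []
    used = 0
    for doc in docs:
        title = doc.metadata.get("title") or "Untitled"
        block = f"[{title}]\n{doc.page_content}"
        if used + len(block) > max_chars:
            break  # drop the remaining, lower-ranked chunks
        parts.append(block)
        used += len(block) + 2  # account for the blank-line separator
    return "\n\n".join(parts)


demo_docs = [FakeDoc("aaa", {"title": "T1"}), FakeDoc("bbb")]
print(format_context(demo_docs))
print(format_context(demo_docs, max_chars=10))
```

Because the function only reads `page_content` and `metadata`, it could be swapped in for the lambda in the chain, for example `retriever | format_context`.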

In [67]:
# Sample test:
print(rag_chain.invoke("How can I prepare for the interview, can you also tell the stages involved in Hiring process"))

To prepare for the interview, it's essential to familiarize yourself with the job description and requirements. Review the job family page and reach out to the Hiring Manager if you have any questions. Align with the Hiring Manager on competencies and prepare actionable steps to work on those areas. You can also review the interview kit in Greenhouse, which includes the candidate's resume, interview description, scorecard, and suggested questions. The hiring process typically involves several stages, including Application, Technical Interview, and Hiring Manager Interview, followed by a Senior Manager/Director/VP Interview for successful candidates.
