## Testing the Hybrid Index

- Uses both sparse values (SPLADE) and dense vector embeddings

In [8]:
from helpers import (
    load_md_with_metadata, 
    filter_to_minimal_docs,
    text_split
)

In [9]:
import os

current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
DATA_FILE_PATH = os.path.join(parent_dir, "data")

In [10]:
docs = load_md_with_metadata(DATA_FILE_PATH)
minimal_docs = filter_to_minimal_docs(docs)
text_chunks = text_split(minimal_docs)

### Sentence-Transformers for VECTOR EMBEDDING

In [12]:
from sentence_transformers import SentenceTransformer
embeddingModel = SentenceTransformer('all-MiniLM-L6-v2')

Loading weights: 100%|██████████| 103/103 [00:00<00:00, 241.07it/s, Materializing param=pooler.dense.weight]                             
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


### Load a SPLADE model for SPARSE INDICES and VALUES

In [13]:
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("naver/splade-cocondenser-ensembledistil")
model = AutoModelForMaskedLM.from_pretrained("naver/splade-cocondenser-ensembledistil")

Loading weights: 100%|██████████| 204/204 [00:01<00:00, 176.91it/s, Materializing param=cls.predictions.transform.dense.weight]                 
BertForMaskedLM LOAD REPORT from: naver/splade-cocondenser-ensembledistil
Key                          | Status     |  | 
-----------------------------+------------+--+-
bert.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


In [15]:
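def splade_sparse(text, threshold=0.0):
    """Return (indices, values) of the SPLADE sparse vector for `text`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # SPLADE pooling: log(1 + ReLU(logits)), then max over the token positions
    scores = torch.log1p(torch.relu(logits))
    scores = torch.max(scores, dim=1).values.squeeze(0)  # shape: (vocab_size,)

    # Keep only the vocabulary entries whose weight exceeds the threshold
    keep = torch.nonzero(scores > threshold, as_tuple=True)[0]
    indices = keep.tolist()
    values = scores[keep].tolist()

    return indices, values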
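
To see what the pooling step in `splade_sparse` does without loading the model, here is a minimal pure-Python sketch on a hand-made logits matrix. The names `splade_pool` and `toy_logits` are hypothetical, and the three-token, four-term vocabulary is only for illustration; the real model emits one row of vocabulary-sized logits per input token.

```python
import math

def splade_pool(logits, threshold=0.0):
    """Apply log(1 + relu(x)) to every entry, then take the max over token positions.

    `logits` is a list of rows: one row per input token, one column per vocabulary term.
    Returns (indices, values) for the terms whose pooled weight exceeds `threshold`.
    """
    vocab_size = len(logits[0])
    pooled = [
        max(math.log1p(max(row[j], 0.0)) for row in logits)
        for j in range(vocab_size)
    ]
    indices = [j for j, weight in enumerate(pooled) if weight > threshold]
    values = [pooled[j] for j in indices]
    return indices, values

# Hypothetical logits: 3 tokens x 4 vocabulary terms.
toy_logits = [
    [-1.0, 0.0, 2.0, -0.5],
    [-2.0, 1.0, 0.5, -3.0],
    [-0.5, 0.0, 0.0, -1.0],
]

indices, values = splade_pool(toy_logits)
print(indices)                        # [1, 2]
print([round(v, 4) for v in values])  # [0.6931, 1.0986]
```

Terms 0 and 3 never receive a positive logit from any token, so ReLU zeroes them and they drop out of the sparse vector. Terms 1 and 2 keep the largest weight seen across the tokens, log(2) and log(3) respectively.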

In [16]:
indices, values = splade_sparse("Machine learning vector database")

print("Non-zero terms:", len(indices))
print("Sparse dim:", tokenizer.vocab_size)


Non-zero terms: 41
Sparse dim: 30522


In [None]:
indices, values = splade_sparse("Hi I am Sayan and I am trying to build RAG Pipeline")

print("Non-zero terms:", len(indices))
print("Sparse dim:", tokenizer.vocab_size)

Non-zero terms: 53
Sparse dim: 30522
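
The two runs above can be put in perspective by computing how much of the vocabulary each sparse vector actually uses. This is a small arithmetic sketch; `sparse_density` is a hypothetical helper, and the counts are copied from the outputs shown above.

```python
def sparse_density(non_zero_terms, vocab_size):
    """Fraction of vocabulary dimensions that carry a non-zero weight."""
    return non_zero_terms / vocab_size

VOCAB_SIZE = 30522  # tokenizer.vocab_size as printed above

for label, nnz in [("first query", 41), ("second query", 53)]:
    density = sparse_density(nnz, VOCAB_SIZE)
    print(f"{label}: {nnz} / {VOCAB_SIZE} = {density:.4%}")
```

Both vectors use well under 1% of the available dimensions, which is why they are stored as index/value pairs instead of full-length arrays.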


In [None]:
# payload required to create a HYBRID INDEX

INDEX_NAME = ""
vector_dimensions = embeddingModel.get_sentence_embedding_dimension()
sparse_dimensions = tokenizer.vocab_size
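
As a rough sketch of how one dense vector and one sparse vector could be bundled into a single hybrid record, the helper below validates both parts against the two dimensions computed above. The notebook does not name the target vector store, so the function name `build_hybrid_record` and the field names `values` and `sparse_values` are assumptions about the record layout, not a confirmed API.

```python
def build_hybrid_record(doc_id, dense, indices, values, dense_dim, sparse_dim):
    """Bundle one dense vector and one sparse vector into a single record.

    The field names ("values", "sparse_values") are an assumption about the
    target store; adjust them to whatever the store actually expects.
    """
    if len(dense) != dense_dim:
        raise ValueError(f"dense vector has {len(dense)} dims, expected {dense_dim}")
    if len(indices) != len(values):
        raise ValueError("sparse indices and values must have the same length")
    if any(i < 0 or i >= sparse_dim for i in indices):
        raise ValueError("sparse index outside the vocabulary range")
    return {
        "id": doc_id,
        "values": list(dense),
        "sparse_values": {"indices": list(indices), "values": list(values)},
    }

# Toy sizes for illustration only. In the notebook, the real sizes would be
# vector_dimensions and sparse_dimensions from the cell above.
record = build_hybrid_record(
    doc_id="chunk-0",
    dense=[0.1, 0.2, 0.3],
    indices=[1, 7],
    values=[0.69, 1.10],
    dense_dim=3,
    sparse_dim=10,
)
print(record["sparse_values"])
```

Checking the lengths and index range before sending anything to the index catches a mismatch between the embedding model, the tokenizer, and the index configuration early.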
