In [2]:
%load_ext jupyter_ai

## Dense text embedding

In [9]:
from fastembed import TextEmbedding

# Example list of documents
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# This will trigger the model download and initialization
embedding_model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

embeddings_generator = embedding_model.embed(documents)  # reminder: this is a generator
# Consume the generator into a list (which can in turn be converted to a NumPy array)
embeddings_list = list(embeddings_generator)
len(embeddings_list[0])  # Vector of 384 dimensions

The model BAAI/bge-small-en-v1.5 is ready to use.


384
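
Dense vectors like these are usually compared with cosine similarity: the dot product of two vectors divided by the product of their lengths. Below is a minimal NumPy sketch. It uses made-up 4-dimensional vectors in place of the real 384-dimensional `bge-small-en-v1.5` output, and the helper name `cosine_scores` is hypothetical.

```python
import numpy as np

def cosine_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of `docs`."""
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Hypothetical toy embeddings (real bge-small-en-v1.5 vectors have 384 dimensions)
query_vec = np.array([0.9, 0.1, 0.0, 0.2])
doc_vecs = np.array([
    [0.8, 0.2, 0.1, 0.1],  # points in roughly the same direction as the query
    [0.0, 0.1, 0.9, 0.3],  # mostly unrelated
])

scores = cosine_scores(query_vec, doc_vecs)
ranking = np.argsort(-scores)  # document indices, best match first
print(scores.round(3), ranking)
```

If the vectors are already L2-normalized, the divisions change nothing and cosine similarity is just the dot product.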

In [13]:
list(filter(lambda x: not x.startswith("__"), dir(embedding_model)))

['EMBEDDINGS_REGISTRY',
 'METADATA_FILE',
 '_get_model_description',
 '_list_supported_models',
 '_local_files_only',
 'add_custom_model',
 'cache_dir',
 'decompress_to_cache',
 'download_file_from_gcs',
 'download_files_from_huggingface',
 'download_model',
 'embed',
 'list_supported_models',
 'model',
 'model_name',
 'passage_embed',
 'query_embed',
 'retrieve_model_gcs',
 'threads']

In [14]:
embedding_model.model_name

'BAAI/bge-small-en-v1.5'

## Sparse text embeddings

In [4]:
%%ai ollama:llama3.2

"Explain sparse text embedding, keep it simple. Demonstrate some example. Give some real-world example"

Sparse Text Embedding
=====================

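Sparse text embedding is a way to represent text as a numerical vector with one dimension per word (or token) in a vocabulary. Any single text uses only a handful of the tens of thousands of vocabulary entries, so almost all of the vector's elements are zero.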

### How it works

Imagine you have two sentences: "I like apples" and "Apples are delicious". A sparse text embedding maps each sentence to a vector in which every position stands for one vocabulary word. Only the positions of words that occur in the sentence are non-zero, and their values say how much each word contributes (a simple count in this example).

For example, with the toy vocabulary `[apples, are, delicious, i, like]` and lowercased words:

*   Vector for "I like apples": [1, 0, 0, 1, 1]
*   Vector for "Apples are delicious": [1, 1, 1, 0, 0]

The dot product of these vectors is 1, because "apples" is the only word the two sentences share. A dot product of 0 would mean they have no words in common. In other words, sparse vectors measure word overlap between texts, not meaning as such.

### Example Code

```python
# Import necessary libraries
import numpy as np

# Define two sentences
sentence1 = "I like apples"
sentence2 = "Apples are delicious"

# Build a toy vocabulary from the lowercased words of both sentences
vocab = sorted({word for s in (sentence1, sentence2) for word in s.lower().split()})
word_to_index = {word: i for i, word in enumerate(vocab)}

# Bag-of-words embedding: one dimension per vocabulary word,
# non-zero only for the words that occur in the sentence
def text_embedding(sentence):
    vec = np.zeros(len(vocab))
    for word in sentence.lower().split():
        if word in word_to_index:
            vec[word_to_index[word]] += 1
    return vec

# Calculate embeddings for the sentences
vec1 = text_embedding(sentence1)
vec2 = text_embedding(sentence2)

print("Vocabulary:", vocab)
print("Sentence 1:", vec1)
print("Sentence 2:", vec2)
print("Dot product:", vec1 @ vec2)
```

### Output

```text
Vocabulary: ['apples', 'are', 'delicious', 'i', 'like']
Sentence 1: [1. 0. 0. 1. 1.]
Sentence 2: [1. 1. 1. 0. 0.]
Dot product: 1.0
```

Each vector is non-zero only at the positions of words that occur in its sentence. The dot product of 1.0 counts the single shared word, "apples". With a realistic vocabulary of tens of thousands of words, nearly every entry would be zero. For that reason, sparse vectors are usually stored as lists of (index, value) pairs rather than as full arrays.

### Real-world Examples

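1.  **Search engines**: Classic keyword retrieval with TF-IDF or BM25 (as in Lucene and Elasticsearch) scores documents with sparse term-weight vectors. Learned sparse models such as SPLADE, used in the next cell, predict those term weights with a neural network.
2.  **Text Classifiers**: Models such as Support Vector Machines (SVMs), logistic regression, and Random Forests are commonly trained on sparse bag-of-words or TF-IDF features.
3.  **Named Entity Recognition (NER)**: Classical NER taggers, such as those based on Conditional Random Fields (CRFs), use sparse one-hot word and feature vectors.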

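By contrast, dense embedding models built on architectures such as BERT and RoBERTa produce short vectors (384 dimensions for `BAAI/bge-small-en-v1.5` above). Nearly every element of these vectors is non-zero, and they capture semantic similarity between different words. Sparse embeddings remain useful where exact term matching, interpretability, and efficient inverted-index search matter.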
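
Text classifiers like those listed above commonly use TF-IDF weights instead of raw counts. This gives more weight to words that are frequent in one text but rare across the collection. Below is a minimal pure-Python sketch that stores each sparse vector as a `{word: weight}` dict, so the zeros are never materialized. The toy corpus and the plain `log(N / df)` IDF formula are illustrative assumptions; libraries such as scikit-learn use smoothed variants.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return one sparse {word: weight} dict per text, using tf * log(N / df)."""
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts each word appears
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
        # Words present in every text get weight 0; drop them to stay sparse
        vectors.append({w: v for w, v in vec.items() if v > 0})
    return vectors

def sparse_dot(a, b):
    """Dot product of two sparse dicts: only shared keys contribute."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[w] for w, v in a.items() if w in b)

texts = ["I like apples", "Apples are delicious", "I like trains"]
vecs = tfidf_vectors(texts)
print(vecs[0])
print(sparse_dot(vecs[0], vecs[1]), sparse_dot(vecs[0], vecs[2]))
```

"I like apples" scores closer to "I like trains" (two shared words) than to "Apples are delicious" (one shared word). Lexical sparse vectors reward shared words, not shared meaning.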

In [17]:
from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))
embeddings

[SparseEmbedding(values=array([0.46793732, 0.34634435, 0.82014424, 0.45307532, 0.98732066,
        0.80176616, 0.2087955 , 0.07078066, 0.15851103, 0.07413071,
        0.34253079, 0.88557774, 0.13234277, 0.23698376, 0.07734038,
        0.20083414, 1.3942709 , 0.57856292, 0.75639009, 0.12872015,
        0.12940496, 1.21411681, 0.3960413 , 0.38100156, 0.85480541,
        0.23132324, 0.61133695, 0.34899744, 0.15025412, 0.1130122 ,
        0.15241024, 0.36152679, 0.13700481, 0.7303589 , 1.39194822,
        0.04954698, 0.49473077, 0.30635571, 0.06034151, 1.13118982,
        0.01341425, 0.02633621, 0.10710741, 1.03937888, 0.05903498,
        0.33036089, 0.0278459 , 0.04743589, 1.68689609, 0.62101287,
        1.86998868, 0.71478194, 0.08071101, 1.26968515, 0.05093801,
        0.09553559, 1.57417607, 0.18500556, 0.0425379 , 0.24046306,
        1.08656394, 0.72864759, 0.1876028 , 0.85070795, 0.16575399,
        0.23869337, 0.52304912, 0.90775394, 0.02330356, 0.12363458,
        0.37557927, 1.934
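
Each `SparseEmbedding` printed above carries an `indices` array alongside `values`. The `indices` are the vocabulary token IDs that the weights belong to. Two such vectors are compared with a dot product over the IDs they share. Below is a minimal NumPy sketch with made-up index/value arrays; the numbers are hypothetical, not SPLADE output, and `sparse_dot_indexed` is an illustrative helper, not a fastembed function.

```python
import numpy as np

def sparse_dot_indexed(idx_a, val_a, idx_b, val_b):
    """Dot product of two sparse vectors given as (indices, values) arrays."""
    # Positions (in each input) of the indices both vectors share
    _, pos_a, pos_b = np.intersect1d(idx_a, idx_b, assume_unique=True, return_indices=True)
    return float(np.dot(val_a[pos_a], val_b[pos_b]))

# Hypothetical query and document vectors: vocabulary IDs and their weights
query_idx = np.array([101, 2054, 7592])
query_val = np.array([0.5, 1.2, 0.8])
doc_idx = np.array([2054, 3000, 7592, 9000])
doc_val = np.array([1.0, 0.3, 0.5, 0.7])

# Only IDs 2054 and 7592 overlap, so the score is 1.2 * 1.0 + 0.8 * 0.5
print(sparse_dot_indexed(query_idx, query_val, doc_idx, doc_val))
```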

## Late interaction models (aka ColBERT)

In [7]:
%%ai ollama:llama3.2
topic = "late interaction models"

f"Explain {topic}, keep it simple. Demonstrate some example. Give some real-world example"

Late Interaction Models
=====================

Late interaction models are neural retrieval models, ColBERT being the best-known example. They encode a query and a document separately into one embedding per token and compare the two sets of embeddings only at search time, "late" in the pipeline.

### How it works

A standard dense (bi-encoder) model compresses a whole text into a single vector, which loses token-level detail. A cross-encoder reads the query and document together and is more accurate, but it must run the full model for every query-document pair. Late interaction sits in between.

Documents are encoded once, ahead of time, into a matrix of token vectors. At query time, each query token vector is compared with every document token vector, and only its best match (the maximum similarity, or MaxSim) is kept. These maxima are summed into the relevance score. This preserves fine-grained, token-level matching while letting document embeddings be precomputed and indexed.
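
Written out, the late interaction (MaxSim) score for query token vectors $q_1, \dots, q_n$ and document token vectors $d_1, \dots, d_m$ is:

$$
S(q, d) = \sum_{i=1}^{n} \max_{j = 1, \dots, m} \; q_i \cdot d_j
$$

ColBERT L2-normalizes its token vectors, so each dot product here is a cosine similarity.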

### Example Code

```python
# Import necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy late interaction encoder (simplified for illustration; untrained)
class LateInteractionModel(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True)

    def forward(self, token_ids):
        # One L2-normalized vector per token: (batch, seq_len, hidden_size)
        hidden = self.encoder(self.embedding(token_ids))
        return F.normalize(hidden, dim=-1)

def maxsim_score(query_vecs, doc_vecs):
    # Cosine similarity of every query token with every document token: (q_len, d_len)
    sim = query_vecs @ doc_vecs.T
    # Keep each query token's best match, then sum over query tokens
    return sim.max(dim=1).values.sum()

# Initialize the model; random token IDs stand in for a tokenized query and documents
vocab_size = 1000
hidden_size = 128
model = LateInteractionModel(vocab_size, hidden_size).eval()
query_ids = torch.randint(0, vocab_size, (1, 5))
doc_ids = [torch.randint(0, vocab_size, (1, 12)), torch.randint(0, vocab_size, (1, 20))]

with torch.no_grad():
    # Documents are encoded independently of the query, so they could be precomputed
    doc_vecs = [model(ids)[0] for ids in doc_ids]
    query_vecs = model(query_ids)[0]
    scores = [maxsim_score(query_vecs, d).item() for d in doc_vecs]

print("Scores:", scores)
```

### Output

The printed scores change on every run because the weights and token IDs are random. Each score is a sum of 5 cosine similarities, one per query token, so it lies between -5 and 5. With a trained model, the higher-scoring document would be the better match for the query.

### Real-world Examples

1.  **Semantic search and RAG retrieval**: ColBERT and ColBERTv2 (the model used in the next cell) retrieve passages for search engines and retrieval-augmented generation pipelines.
2.  **Reranking**: MaxSim scores can re-order a candidate list produced by a cheaper first-stage retriever such as BM25 or a dense model.
3.  **Multi-vector search in vector databases**: Qdrant, for example, can store one multivector per document and compare multivectors with its `MAX_SIM` comparator.

### Real-world Applications:

1.  Document and passage search
2.  Question answering over document collections
3.  Reranking

Note that late interaction models need more storage than single-vector dense models, because every document is kept as one vector per token. At query time, however, they are far cheaper than cross-encoders, since the document vectors are computed ahead of time.

In [10]:
from fastembed import LateInteractionTextEmbedding

model = LateInteractionTextEmbedding(model_name="colbert-ir/colbertv2.0")
embeddings = list(model.embed(documents))
embeddings

[array([[-0.1351824 ,  0.12230334,  0.1269857 , ...,  0.17307524,
          0.11274203,  0.02880633],
        [-0.17495233,  0.08767531,  0.11352374, ...,  0.12433604,
          0.15752925,  0.08118125],
        [-0.10130584,  0.09613474,  0.13923067, ...,  0.12898032,
          0.16839182,  0.09858395],
        ...,
        [-0.10270972,  0.01041561,  0.04440113, ...,  0.0550529 ,
          0.08930317,  0.09720251],
        [-0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [-0.15476122,  0.06961455,  0.10665789, ...,  0.15388842,
          0.09050205,  0.00516431]], shape=(29, 128), dtype=float32),
 array([[ 0.12170535,  0.07871944,  0.12508287, ...,  0.08450251,
          0.01834184, -0.01686618],
        [-0.02659732, -0.12131035,  0.14012505, ..., -0.01885814,
          0.01064609, -0.05982119],
        [-0.03633325, -0.14667122,  0.14062028, ..., -0.052545  ,
          0.00967532, -0.08844125],
        ...,
        [-0.        , 
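
The price of per-token vectors is storage. The first document above came out as a 29 × 128 `float32` matrix, while the dense `bge-small-en-v1.5` model earlier in this notebook produced a single 384-dimensional vector. The arithmetic below assumes both are stored as raw 4-byte floats with no quantization or compression:

```python
BYTES_PER_FLOAT32 = 4

colbert_bytes = 29 * 128 * BYTES_PER_FLOAT32  # one 128-dim vector per token
dense_bytes = 384 * BYTES_PER_FLOAT32         # one vector for the whole document

print(colbert_bytes, dense_bytes, colbert_bytes / dense_bytes)
```

This one short sentence therefore needs about 9.7 times the space of its dense embedding, and the gap grows with document length.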

## Image embeddings

In [11]:
from fastembed import ImageEmbedding

images = ["images/cat1.jpeg", "images/cat2.jpeg", "images/dog1.webp"]

model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
embeddings = list(model.embed(images))

In [52]:
import numpy as np
from qdrant_client.local.distances import cosine_similarity

In [54]:
cosine_similarity(np.array(embeddings), np.array(embeddings))

array([[1.0000001 , 0.83723867, 0.61121917],
       [0.83723867, 1.0000002 , 0.7055948 ],
       [0.61121917, 0.7055948 , 1.0000001 ]], dtype=float32)
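
To read the matrix programmatically, mask out the diagonal (each image compared with itself) and take the largest remaining entry. The sketch below uses the values printed above, copied by hand, so it runs without re-embedding the images.

```python
import numpy as np

images = ["images/cat1.jpeg", "images/cat2.jpeg", "images/dog1.webp"]
# Similarity matrix as printed by the cell above
sim = np.array([
    [1.0000001, 0.83723867, 0.61121917],
    [0.83723867, 1.0000002, 0.7055948],
    [0.61121917, 0.7055948, 1.0000001],
])

off_diag = sim.copy()
np.fill_diagonal(off_diag, -np.inf)  # ignore self-similarity
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(images[i], images[j], off_diag[i, j])
```

The two cat photos form the most similar pair (0.837), and the dog image is closer to `cat2` (0.706) than to `cat1` (0.611).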

## Late interaction multimodal models (ColPali)

In [19]:
%%ai ollama:llama3.2
topic = "late interaction multimodal models (ColPali)"

f"Explain {topic}, keep it simple. Demonstrate some example. Give some real-world example"

Late Interaction Multimodal Models (ColPali)
==========================================

ColPali is a late interaction retrieval model for documents. It works directly on images of pages (PDFs, slides, screenshots) instead of on extracted text.

### How it works

A text late interaction model such as ColBERT represents a document as one vector per text token. ColPali instead feeds a page image to a vision-language model (PaliGemma) and keeps one vector per image patch. The query text is encoded as one vector per token in the same embedding space.

Relevance is then computed exactly as in ColBERT: each query token is matched with its most similar image patch (MaxSim), and the maxima are summed. Because ColPali never runs OCR or layout parsing, it can match a query against tables, charts, and figures as well as text.

### Example Code

```python
# Import necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy ColPali-style retriever (simplified for illustration; untrained).
# The real model uses the PaliGemma vision-language model for both inputs.
class ColPali(nn.Module):
    def __init__(self, vocab_size, hidden_size, patch_size=16, out_dim=128):
        super().__init__()
        # Image side: cut the page into patches and embed each one
        self.patch_embed = nn.Conv2d(3, hidden_size, kernel_size=patch_size, stride=patch_size)
        # Text side: embed each query token
        self.token_embed = nn.Embedding(vocab_size, hidden_size)
        # Shared projection into a low-dimensional retrieval space
        self.proj = nn.Linear(hidden_size, out_dim)

    def embed_image(self, images):
        # (batch, hidden, H/p, W/p) -> (batch, n_patches, hidden)
        patches = self.patch_embed(images).flatten(2).transpose(1, 2)
        return F.normalize(self.proj(patches), dim=-1)

    def embed_text(self, token_ids):
        return F.normalize(self.proj(self.token_embed(token_ids)), dim=-1)

def maxsim_score(query_vecs, page_vecs):
    # Best-matching patch for each query token, summed over query tokens
    return (query_vecs @ page_vecs.T).max(dim=1).values.sum()

# Initialize the model; random tensors stand in for page images and a tokenized query
vocab_size = 1000
hidden_size = 256
model = ColPali(vocab_size, hidden_size).eval()
pages = torch.randn(3, 3, 224, 224)              # 3 RGB pages -> 14 x 14 = 196 patches each
query_ids = torch.randint(0, vocab_size, (1, 6))  # 6 query tokens

with torch.no_grad():
    page_vecs = model.embed_image(pages)          # (3, 196, 128)
    query_vecs = model.embed_text(query_ids)[0]   # (6, 128)
    scores = [maxsim_score(query_vecs, p).item() for p in page_vecs]

print("Scores:", scores)
```

### Output

As with the text-only model, the printed scores are random because the model is untrained. Each score is a sum of 6 cosine similarities and lies between -6 and 6. A trained model would give the highest score to the page that best answers the query.

In this example, pages and queries are encoded independently. Page embeddings can therefore be computed once and indexed, and only the cheap MaxSim comparison runs at query time.

### Real-world Examples

1.  **Visual document retrieval**: Searching PDFs, slide decks, and scanned pages directly from page images, without an OCR or layout-parsing pipeline.
2.  **Retrieval-augmented generation (RAG) over visually rich documents**: Retrieving the pages that contain the relevant tables, charts, or figures and passing them to a vision-language model to generate an answer.
3.  **Screenshot search**: Finding the right page among page screenshots, as the next cell does with the `wiki_*.png` images.

### Real-world Applications:

1.  Enterprise document search
2.  Question answering over PDFs and reports
3.  Multimodal RAG

Note that ColPali is more expensive than text-only late interaction models. It runs a multi-billion-parameter vision-language model over every page, and it stores each page as many patch vectors (on the order of a thousand), so storage grows quickly with collection size.

In [22]:
from fastembed import LateInteractionMultimodalEmbedding

doc_images = [
    "images/wiki_computer_science.png",
    "images/wiki_technology.png",
    "images/wiki_space.png",
]

query = "what is tech"

model = LateInteractionMultimodalEmbedding(model_name="Qdrant/colpali-v1.3-fp16")
doc_images_embeddings = list(model.embed_image(doc_images))
query_embedding = list(model.embed_text(query))[0]  # embed_text yields one array per query

In [71]:
from qdrant_client import models
from qdrant_client.local.multi_distances import calculate_multi_distance

# How to calculate distance?
# from qdrant_client.local.sparse_distances import calculate_distance_sparse
# calculate_multi_distance(
#     query_embedding,
#     doc_images_embeddings,
#     # distance_type=models.MultiVectorComparator.MAX_SIM,
#     distance_type=models.Distance.DOT,
# )
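
One way to answer the question in the comment above, without relying on Qdrant's internal `local` modules, is to compute the late interaction (MaxSim) score directly. For each query token vector, take its highest dot product with any of the page's patch vectors, then sum over query tokens. The minimal NumPy sketch below assumes each embedding is a 2-D array of shape `(n_vectors, dim)`. Random arrays (with an assumed dimension of 128) stand in for `query_embedding` and `doc_images_embeddings` so the block runs on its own, and `maxsim` and `rank_documents` are hypothetical helper names.

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Sum over query vectors of the best dot product with any document vector."""
    return float((query_vecs @ doc_vecs.T).max(axis=1).sum())

def rank_documents(query_vecs, docs_vecs):
    """Return (doc_index, score) pairs sorted from best to worst."""
    scores = [maxsim(query_vecs, d) for d in docs_vecs]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Stand-ins for the real arrays: 128-dim vectors, a different count per page
rng = np.random.default_rng(0)
query_stand_in = rng.normal(size=(4, 128))
pages_stand_in = [rng.normal(size=(n, 128)) for n in (30, 50, 40)]

print(rank_documents(query_stand_in, pages_stand_in))
```

With the real data, the call would be `rank_documents(query_embedding, doc_images_embeddings)`, assuming `query_embedding` holds the single query's 2-D array (for example, the first item of `list(model.embed_text(query))`). The top index then points into `doc_images`.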

## Rerankers

In [72]:
from fastembed.rerank.cross_encoder import TextCrossEncoder

query = "Who is maintaining Qdrant?"
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank(query, documents))
scores

[-11.48061752319336, 5.472436428070068]
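
`rerank` returned one raw relevance score per document, apparently in input order: the second document, which answers the query, gets the high score. These are unnormalized model scores, so only their ordering matters, not their sign. To list the documents best-first, pair them with their scores. The sketch below hard-codes the values printed above:

```python
documents = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
scores = [-11.48061752319336, 5.472436428070068]  # output of encoder.rerank above

# Sort (score, document) pairs from most to least relevant
ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
for score, doc in ranked:
    print(f"{score:8.3f}  {doc}")
```

If a probability-like value is needed, a sigmoid can be applied to each score. Because the sigmoid is monotonic, the ranking stays the same.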