# HuggingFace Embeddings

- `Hugging Face` offers a wide range of free **embedding models**, making it easy to perform a variety of `embedding` tasks.
- In this tutorial, we use `langchain_huggingface` to build a **simple search system based on text embeddings**.
- The following models are used for **Text Embedding**:

    - 1️⃣ **multilingual-e5-large-instruct**: An instruction-based multilingual `embedding` model.
    - 2️⃣ **multilingual-e5-large**: A powerful multilingual `embedding` model.
    - 3️⃣ **bge-m3**: Optimized for large-scale text processing.


![](https://raw.githubusercontent.com/aidino/LangChain-OpenTutorial/a589b7f082f5b0a358921d3e6e9a0c8d97978eb8/08-Embedding/assets/03-huggingfaceembeddings-workflow.png)

### 🛠️ **The following configurations will be set up**

-   **Jupyter Notebook output settings**
    -   Display standard error (`stderr`) messages directly instead of capturing them.
-   **Install required packages**
    -   Ensure that all necessary dependencies are installed.
-   **API key setup**
    -   Configure the API key used for authentication.
-   **PyTorch device selection**
    -   Automatically select the best available compute device (CPU, CUDA, or MPS).
        -   `{"device": "mps"}`: Compute `embeddings` with **MPS**, Apple's Metal GPU backend. (For Mac users)
        -   `{"device": "cuda"}`: Compute `embeddings` on an NVIDIA **GPU**. (For Linux and Windows users; requires CUDA)
        -   `{"device": "cpu"}`: Compute `embeddings` on the **CPU**. (Available to all users)
-   **Local storage path for embedding models**
    -   Define a local path where the `embedding` models are stored.


In [1]:
# Automatically select the appropriate device
import torch
import platform


def get_device():
    if platform.system() == "Darwin":  # macOS specific
        if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
            print("✅ Using MPS (Metal Performance Shaders) on macOS")
            return "mps"
    if torch.cuda.is_available():
        print("✅ Using CUDA (NVIDIA GPU)")
        return "cuda"
    else:
        print("✅ Using CPU")
        return "cpu"


# Set the device
device = get_device()
print("🖥️ Current device in use:", device)

✅ Using CPU
🖥️ Current device in use: cpu


In [2]:
# Embedding Model Local Storage Path
import os
import warnings

# Ignore warnings
warnings.filterwarnings("ignore")

# Set the download path to ./cache/
os.environ["HF_HOME"] = "./cache/"

## Data Preparation for Embedding-Based Search Tutorial

To perform **embedding-based search,** we prepare both a **Query** and **Documents.**  

1. Query  
- Write a **key question** that will serve as the basis for the search.  

In [3]:
# Query
q = "Please tell me more about LangChain."

2. Documents  
- Prepare **multiple documents (texts)** that will serve as the target for the search.  
- Each document will be **embedded** to enable semantic search capabilities.  

In [4]:
# Documents for Text Embedding
docs = [
    "Hi, nice to meet you.",
    "LangChain simplifies the process of building applications with large language models.",
    "The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.",
    "LangChain simplifies the process of building applications with large-scale language models.",
    "Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.",
]

## Which Text Embedding Model Should You Use?
- Use the **MTEB leaderboard** to compare **free embedding models** and pick one that performs well on the tasks that matter for your project. 🚀  

---

### 🚀 **What is MTEB (Massive Text Embedding Benchmark)?**  
- **MTEB** is a benchmark designed to **systematically and objectively evaluate** the performance of text embedding models.  
    - **Purpose:** To **fairly compare** the performance of embedding models.  
    - **Evaluation Tasks:** Includes tasks like **Classification,**  **Retrieval,**  **Clustering,**  and **Semantic Similarity.**  
    - **Supported Models:** A wide range of **text embedding models available on Hugging Face.**  
    - **Results:** Displayed as **scores,**  with top-performing models ranked on the **leaderboard.**  

🔗 [ **MTEB Leaderboard (Hugging Face)** ](https://huggingface.co/spaces/mteb/leaderboard)  

---

### 🛠️ **Models Used in This Tutorial**  

| **Embedding Model** | **Description** |
|----------|----------|
| 1️⃣ **multilingual-e5-large-instruct** | Offers strong multilingual support with consistent results. |
| 2️⃣ **multilingual-e5-large** | A powerful multilingual embedding model. |
| 3️⃣ **bge-m3** | Optimized for large-scale text processing, excelling in retrieval and semantic similarity tasks. |

1️⃣ **multilingual-e5-large-instruct**
![](./assets/03-huggingfaceembeddings-leaderboard-01.png)

2️⃣ **multilingual-e5-large**
![](./assets/03-huggingfaceembeddings-leaderboard-02.png)

3️⃣ **bge-m3**
![](./assets/03-huggingfaceembeddings-leaderboard-03.png)

## Similarity Calculation

**Similarity calculation using the vector dot product**

-   Similarity is computed as the **dot product** of the embedding vectors.

-   **Similarity formula:**

$$ \text{similarities} = \mathbf{query} \cdot \mathbf{documents}^T $$

---

### 📐 **Mathematical meaning of the vector dot product**

**Definition of the vector dot product**

The **dot product** of two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as:

$$ \mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i $$
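
As a small worked illustration of the two formulas above, the sketch below computes dot products in plain Python. The function name `dot` and the toy three-dimensional vectors are illustrative assumptions; real embeddings in this tutorial have 1024 dimensions.

```python
def dot(a, b):
    """Dot product: sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Toy 3-dimensional "embeddings" (made-up numbers for illustration only)
query = [1.0, 2.0, 3.0]
documents = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [2.0, 2.0, 2.0],
]

# similarities = query . documents^T: one score per document
similarities = [dot(query, doc) for doc in documents]
print(similarities)  # [4.0, 2.0, 12.0]
```

Each score is one entry of the `similarities` vector in the formula: the query is compared against every document row in turn.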

---

**Relationship to cosine similarity**

The **dot product** is also related to **cosine similarity** through the following identity:

$$ \mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos \theta $$

Where:

-   $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are the **magnitudes** (**norms**, specifically Euclidean norms) of the vectors $\mathbf{a}$ and $\mathbf{b}$.
-   $\theta$ is the **angle between the two vectors**.
-   $\cos \theta$ is the **cosine similarity** between the two vectors.
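
Rearranging the identity gives $\cos \theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}$. The sketch below applies that rearrangement to two toy vectors; the helper name `cosine_similarity` and the vectors are illustrative assumptions, written with the standard library only.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [1.0, 1.0]

cos_theta = cosine_similarity(a, b)
# Clamp to [-1, 1] to guard against floating-point drift before acos
theta_degrees = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
print(round(cos_theta, 4), round(theta_degrees, 1))  # 0.7071 45.0
```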

---

**🔍 Interpreting the dot product as a similarity measure**

A **large dot product** (a large positive value) results from one or both of the following:

-   The **magnitudes** ($\|\mathbf{a}\|$ and $\|\mathbf{b}\|$) of the two vectors are large.
-   The **angle** ($\theta$) between the two vectors is small (**$\cos \theta$ approaches 1**).

A small angle means the two vectors point in **similar directions** and are **more semantically similar**. Because magnitude also scales the score, the dot product equals cosine similarity only when both vectors are normalized to unit length.

---
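
To make the interpretation of the dot product concrete, the following sketch compares a raw dot product with cosine similarity on made-up two-dimensional vectors. The names `doc_aligned` and `doc_long` are hypothetical; the point is only that a long vector can outscore a better-aligned one unless the vectors are normalized.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
doc_aligned = [1.0, 0.0]  # same direction as the query, small magnitude
doc_long = [5.0, 5.0]     # 45 degrees away from the query, large magnitude

# The raw dot product favours the longer vector...
print(dot(query, doc_aligned), dot(query, doc_long))  # 1.0 5.0
# ...while cosine similarity favours the better-aligned one
print(round(cosine(query, doc_aligned), 4), round(cosine(query, doc_long), 4))  # 1.0 0.7071
```

This is why the local model in this tutorial is created with `encode_kwargs={"normalize_embeddings": True}`: with unit-length vectors, ranking by dot product and ranking by cosine similarity agree.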

### 📏 **Computing vector magnitude (norm)**

**Definition of the Euclidean norm**

For a vector $\mathbf{a} = [a_1, a_2, \ldots, a_n]$, the **Euclidean norm** $\|\mathbf{a}\|$ is computed as:

$$ \|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} $$

This **magnitude** represents the **length** or **size** of the vector in multi-dimensional space.
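
A short worked example of the norm, using the classic 3-4-5 triangle. The helper names `euclidean_norm` and `normalize` are illustrative assumptions, not functions from any library used in this tutorial.

```python
import math


def euclidean_norm(a):
    """||a|| = sqrt(a_1^2 + ... + a_n^2)."""
    return math.sqrt(sum(x * x for x in a))


def normalize(a):
    """Scale a vector to unit length (norm 1), keeping its direction."""
    norm = euclidean_norm(a)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in a]


v = [3.0, 4.0]
print(euclidean_norm(v))  # 5.0
print(normalize(v))       # [0.6, 0.8]
```

Dividing by the norm leaves the direction unchanged and sets the length to 1, which is the operation embedding libraries typically perform when asked to normalize their output.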

---

Understanding these mathematical foundations helps ensure that similarity is computed correctly, which leads to better results in tasks such as **semantic search**, **retrieval systems**, and **recommendation engines**. 🚀


----
### Similarity calculation between `embedded_query` and `embedded_document` 
- `embed_documents` : For embedding multiple texts (documents)
- `embed_query` : For embedding a single text (query)

We've implemented a method to search for the most relevant documents using **text embeddings.** 
- Let's use `search_similar_documents(q, docs, hf_embeddings)` to find the most relevant documents.

In [5]:
import numpy as np


def search_similar_documents(q, docs, hf_embeddings):
    """
    Search for the most relevant documents based on a query using text embeddings.

    Args:
        q (str): The query string for which relevant documents are to be found.
        docs (list of str): A list of document strings to compare against the query.
        hf_embeddings: An embedding model object with `embed_query` and `embed_documents` methods.

    Returns:
        tuple:
            - embedded_query (list of float): The embedding vector of the query.
            - embedded_documents (list of list of float): One embedding vector per document.

    Workflow:
        1. Embed the query string into a numerical vector using `embed_query`.
        2. Embed each document into numerical vectors using `embed_documents`.
        3. Calculate similarity scores between the query and documents using the dot product.
        4. Sort the documents based on their similarity scores in descending order.
        5. Print the query and display the sorted documents by their relevance.
        6. Return the query and document embeddings for further analysis if needed.
    """
    # Embed the query and documents using the embedding model
    embedded_query = hf_embeddings.embed_query(q)
    embedded_documents = hf_embeddings.embed_documents(docs)

    # Calculate similarity scores using dot product
    similarity_scores = np.array(embedded_query) @ np.array(embedded_documents).T

    # Sort documents by similarity scores in descending order
    sorted_idx = similarity_scores.argsort()[::-1]

    # Display the results
    print(f"[Query] {q}\n" + "=" * 40)
    for i, idx in enumerate(sorted_idx):
        print(f"[{i}] {docs[idx]}")
        print()

    # Return embeddings for potential further processing or analysis
    return embedded_query, embedded_documents
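
The function above prints documents in ranked order but does not expose the scores. As a hedged variant, the sketch below returns `(score, document)` pairs and supports an optional `top_k` cut-off. It works on already-computed vectors, and the name `rank_documents` and the toy vectors are assumptions made for illustration.

```python
def rank_documents(query_vec, doc_vecs, docs, top_k=None):
    """Return (score, document) pairs sorted by dot-product score, highest first."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up 2-dimensional vectors standing in for real embeddings
toy_docs = ["greeting", "langchain intro", "rag overview"]
toy_doc_vecs = [[0.1, 0.9], [0.9, 0.1], [0.6, 0.4]]
toy_query_vec = [1.0, 0.0]

for score, doc in rank_documents(toy_query_vec, toy_doc_vecs, toy_docs, top_k=2):
    print(f"{score:.2f}  {doc}")
# 0.90  langchain intro
# 0.60  rag overview
```

Returning the scores makes it easier to apply a relevance threshold or to compare rankings across embedding models.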

## HuggingFaceEndpointEmbeddings Overview

**HuggingFaceEndpointEmbeddings** is a **LangChain** class that uses the **Hugging Face Inference API endpoint** to generate text `embeddings`.

---

### 📚 **Key Concepts**

1.  **Hugging Face Inference API**
    -   Access pre-trained `embedding` models through Hugging Face's API.
    -   No local model download is required; `embeddings` are generated directly through the API.

2.  **LangChain Integration**
    -   Easily integrate `embedding` results into LangChain workflows through its standardized interface.

3.  **Use Cases**
    -   Computing similarity between text queries and documents
    -   Search and recommendation systems
    -   Natural Language Understanding (NLU) applications

---

### ⚙️ **Key Parameters**

-   `model`: Hugging Face model ID (e.g., `BAAI/bge-m3`)
-   `task`: The task to perform (usually `"feature-extraction"`)
-   `huggingfacehub_api_token`: Your Hugging Face API token
-   `model_kwargs`: Additional model configuration parameters

---

### 💡 **Advantages**

-   **No local model download:** Instant access through the API.
-   **Scalability:** Supports many pre-trained Hugging Face models.
-   **Seamless integration:** Easily integrate `embeddings` into LangChain workflows.

---

### ⚠️ **Caveats**

-   **API support:** Not all models support inference through the API.
-   **Speed & cost:** The free API may have slower response times and usage limits.
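
Because a free API can be slow or rate-limited, one common mitigation is to retry failed calls with exponential backoff. The sketch below is a generic pattern, not part of LangChain: the helper `embed_with_retry`, its parameters, and the broad `except Exception` are assumptions for illustration. In real code you would catch only the error types your client actually raises.

```python
import time


def embed_with_retry(embed_fn, texts, max_attempts=3, base_delay=1.0):
    """Call embed_fn(texts), retrying with exponential backoff on failure.

    `embed_fn` is any callable that takes a list of texts and returns embeddings.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return embed_fn(texts)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait 1x, 2x, 4x ... the base delay before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With the endpoint client created later in this tutorial, a call could look like `embed_with_retry(hf_endpoint_embeddings.embed_documents, docs)`.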

---

With **HuggingFaceEndpointEmbeddings**, you can easily integrate Hugging Face's powerful `embedding` models into your **LangChain workflows** for efficient, scalable NLP solutions. 🚀


---
Let's use the `intfloat/multilingual-e5-large-instruct` model via the API to search for the most relevant documents using text embeddings.

- [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)

In [10]:
from dotenv import load_dotenv
load_dotenv(override=True, dotenv_path="../.env")

True

In [11]:
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

hf_endpoint_embeddings = HuggingFaceEndpointEmbeddings(
    model=model_name,
    task="feature-extraction",
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
)
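
The cell above reads the token with `os.environ[...]`, which raises a bare `KeyError` if the variable is missing. As an optional sketch, a small helper can produce a clearer message. The function name `get_hf_token` is hypothetical; only the environment variable name comes from the cell above.

```python
import os


def get_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN"):
    """Return the token stored in `env_var`, or raise a readable error."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Add it to your .env file or export it "
            "in your shell before creating the embeddings client."
        )
    return token
```

The result could then be passed as `huggingfacehub_api_token=get_hf_token()`.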

Search for the most relevant documents based on a query using text embeddings.

In [12]:
%%time
# Embed the query and documents using the embedding model
embedded_query = hf_endpoint_embeddings.embed_query(q)
embedded_documents = hf_endpoint_embeddings.embed_documents(docs)

CPU times: user 8.07 ms, sys: 1.03 ms, total: 9.1 ms
Wall time: 1.23 s


In [13]:
# Calculate similarity scores using dot product
similarity_scores = np.array(embedded_query) @ np.array(embedded_documents).T

# Sort documents by similarity scores in descending order
sorted_idx = similarity_scores.argsort()[::-1]

In [14]:
# Display the results
print(f"[Query] {q}\n" + "=" * 40)
for i, idx in enumerate(sorted_idx):
    print(f"[{i}] {docs[idx]}")
    print()

[Query] Please tell me more about LangChain.
[0] LangChain simplifies the process of building applications with large language models.

[1] LangChain simplifies the process of building applications with large-scale language models.

[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.

[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.

[4] Hi, nice to meet you.



In [15]:
print("[HuggingFace Endpoint Embedding]")
print(f"Model: \t\t{model_name}")
print(f"Document Dimension: \t{len(embedded_documents[0])}")
print(f"Query Dimension: \t{len(embedded_query)}")

[HuggingFace Endpoint Embedding]
Model: 		intfloat/multilingual-e5-large-instruct
Document Dimension: 	1024
Query Dimension: 	1024


We can verify that the dimensions of `embedded_documents` and `embedded_query` are consistent.

You can also run the search with the `search_similar_documents` function implemented earlier.
From now on, we will use this function for our searches.


In [16]:
%%time
embedded_query, embedded_documents = search_similar_documents(q, docs, hf_endpoint_embeddings)

[Query] Please tell me more about LangChain.
[0] LangChain simplifies the process of building applications with large language models.

[1] LangChain simplifies the process of building applications with large-scale language models.

[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.

[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.

[4] Hi, nice to meet you.

CPU times: user 9.11 ms, sys: 54 μs, total: 9.16 ms
Wall time: 709 ms


## HuggingFaceEmbeddings Overview

- **HuggingFaceEmbeddings** is a **LangChain** class that converts text data into vectors using **Hugging Face embedding models.**
- It downloads Hugging Face models and runs them **locally** for efficient processing.

---

### 📚 **Key Concepts**

1. **Hugging Face Pre-trained Models**
   - Uses pre-trained embedding models provided by Hugging Face.
   - Downloads the models **locally** and runs the embedding operations directly.

2. **LangChain Integration**
   - Integrates seamlessly with LangChain workflows through its standardized interface.

3. **Use Cases**
   - Computing similarity between text queries and documents
   - Search and recommendation systems
   - Natural Language Understanding (NLU) applications

---

### ⚙️ **Key Parameters**

- `model_name`: Hugging Face model ID (e.g., `sentence-transformers/all-MiniLM-L6-v2`)
- `model_kwargs`: Additional model configuration parameters (e.g., the GPU/CPU device setting)
- `encode_kwargs`: Additional settings for embedding generation

---

### 💡 **Advantages**

- **Local Embedding Operations:** Embeddings are computed locally, so no internet connection is needed once the model has been downloaded.
- **High Performance:** Can use a GPU to generate embeddings faster.
- **Model Variety:** Supports a wide range of Hugging Face models.

---

### ⚠️ **Caveats**

- **Local Storage Requirement:** Pre-trained models must be downloaded locally.
- **Environment Configuration:** Performance varies with the GPU/CPU device setting.

---

With **HuggingFaceEmbeddings**, you can use **Hugging Face's powerful embedding models** efficiently in a **local** environment, enabling flexible and scalable NLP solutions. 🚀


---
Let's download the embedding model locally, perform embeddings, and search for the most relevant documents.

`intfloat/multilingual-e5-large-instruct` 

- [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)

In [17]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

hf_embeddings_e5_instruct = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device},  # mps, cuda, cpu
    encode_kwargs={"normalize_embeddings": True},
)


In [18]:
%%time
embedded_query, embedded_documents = search_similar_documents(q, docs, hf_embeddings_e5_instruct)

[Query] Please tell me more about LangChain.
[0] LangChain simplifies the process of building applications with large language models.

[1] LangChain simplifies the process of building applications with large-scale language models.

[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.

[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.

[4] Hi, nice to meet you.

CPU times: user 10.6 s, sys: 0 ns, total: 10.6 s
Wall time: 681 ms


In [19]:
print(f"Model: \t\t{model_name}")
print(f"Document Dimension: \t{len(embedded_documents[0])}")
print(f"Query Dimension: \t{len(embedded_query)}")

Model: 		intfloat/multilingual-e5-large-instruct
Document Dimension: 	1024
Query Dimension: 	1024


## FlagEmbedding Usage Guide


- **FlagEmbedding** is an advanced embedding framework developed by **BAAI (Beijing Academy of Artificial Intelligence).**
- It supports **various embedding approaches** and is mainly used with the **BGE (BAAI General Embedding)** models.
- FlagEmbedding excels at tasks such as **semantic search**, **natural language processing (NLP)**, and **recommendation systems.**

---

### 📚 **Core Concepts of FlagEmbedding**

1️⃣ `Dense Embedding`
- Definition: Represents the overall meaning of a text as a single dense vector.
- Advantage: Captures semantic similarity effectively.
- Use cases: Semantic search, document similarity calculation.

2️⃣ `Lexical Embedding`
- Definition: Breaks text into word-level components and emphasizes word matching.
- Advantage: Ensures exact matching of specific words or phrases.
- Use cases: Keyword-based search, exact word matching.

3️⃣ `Multi-Vector Embedding`
- Definition: Represents a document with multiple vectors instead of one.
- Advantage: Allows a more detailed representation of long texts or diverse topics.
- Use cases: Analyzing complex document structures, fine-grained topic matching.


---

FlagEmbedding provides a **flexible and powerful toolkit** for applying embeddings across a range of **NLP tasks and semantic search applications.** 🚀

The following code controls **tokenizer parallelism** in Hugging Face's `tokenizers` library, which `transformers` uses for its fast tokenizers:

- `TOKENIZERS_PARALLELISM = "true"` → **Optimized for speed**, suitable for large-scale data processing.
- `TOKENIZERS_PARALLELISM = "false"` → **Ensures stability**, preventing conflicts and race conditions.


In [20]:
import os

os.environ["TOKENIZERS_PARALLELISM"] = "true"  # "false"

In [21]:
# install FlagEmbedding
%pip install -qU FlagEmbedding

Note: you may need to restart the kernel to use updated packages.


### ⚙️ **Key Parameters**

`BGEM3FlagModel`
- `model_name`: The Hugging Face **model ID** (e.g., `BAAI/bge-m3`).
- `use_fp16`: When set to **True**, reduces **memory usage** and improves **encoding speed.**

`bge_embeddings.encode`
- `batch_size`: The **number of documents** to process at once.
- `max_length`: The **maximum token length** used when encoding documents.
    - Increase it for longer documents so that their full content is encoded.
    - Overly large values can **degrade performance.**
- `return_dense`: When set to **True**, returns **Dense Vectors** under the `dense_vecs` key.
- `return_sparse`: When set to **True**, returns **Sparse Vectors** under the `lexical_weights` key.
- `return_colbert_vecs`: When set to **True**, returns **ColBERT-style vectors** under the `colbert_vecs` key.
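
To illustrate what `batch_size` means conceptually, the sketch below splits a list of texts into consecutive batches. This is an assumption-level illustration only: the helper `batched` is hypothetical, and FlagEmbedding's own batching may differ in detail (for example in how it orders inputs).

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"document {i}" for i in range(5)]
for batch in batched(texts, batch_size=2):
    print(len(batch), batch)
# 2 ['document 0', 'document 1']
# 2 ['document 2', 'document 3']
# 1 ['document 4']
```

Larger batches mean fewer forward passes but more memory per pass, which is the trade-off to keep in mind when tuning `batch_size` together with `max_length`.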

### 1️⃣ **Dense Vector Embedding Example**
- Definition: Represents the overall meaning of a text as a single dense vector.
- Advantage: Captures semantic similarity effectively.
- Use cases: Semantic search, document similarity calculation.


In [22]:
from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_dense_vecs = bge_embeddings.encode(
    sentences=docs,
    batch_size=12,
    max_length=8192,  # Reduce this value if your documents are shorter to speed up encoding.
)["dense_vecs"]

# Query Encoding
embedded_query_dense_vecs = bge_embeddings.encode(
    sentences=[q],
    batch_size=12,
    max_length=8192,  # Reduce this value if your documents are shorter to speed up encoding.
)["dense_vecs"]


You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


In [25]:
embedded_documents_dense_vecs.shape

(5, 1024)

In [26]:
embedded_query_dense_vecs.shape

(1, 1024)

In [27]:
# Calculating Similarity Between Documents and Query
from sklearn.metrics.pairwise import cosine_similarity

similarities = cosine_similarity(
    embedded_query_dense_vecs, embedded_documents_dense_vecs
)
most_similar_idx = similarities.argmax()

# Display the Most Similar Document
print(f"Question: {q}")
print(f"Most similar document: {docs[most_similar_idx]}")

Question: Please tell me more about LangChain.
Most similar document: LangChain simplifies the process of building applications with large language models.


In [28]:
from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_dense_vecs_default = bge_embeddings.encode(
    sentences=docs, return_dense=True
)["dense_vecs"]

# Query Encoding
embedded_query_dense_vecs_default = bge_embeddings.encode(
    sentences=[q], return_dense=True
)["dense_vecs"]


In [29]:
# Calculating Similarity Between Documents and Query
from sklearn.metrics.pairwise import cosine_similarity

similarities = cosine_similarity(
    embedded_query_dense_vecs_default, embedded_documents_dense_vecs_default
)
most_similar_idx = similarities.argmax()

# Display the Most Similar Document
print(f"Question: {q}")
print(f"Most similar document: {docs[most_similar_idx]}")

Question: Please tell me more about LangChain.
Most similar document: LangChain simplifies the process of building applications with large language models.


### 2️⃣ **Sparse (Lexical) Vector Embedding Example**

**Sparse Embedding (Lexical Weight)**
- **Sparse embedding** is an embedding method that uses **high-dimensional vectors where most values are zero.**
- The **lexical weight** approach builds embeddings by considering the **importance of each word.**

**How it works**
1.  Compute a **lexical weight** for each word. Techniques such as **TF-IDF** or **BM25** can be used.
2.  For each word in a document or query, assign a value to the corresponding dimension of the **sparse vector** based on its lexical weight.
3.  As a result, documents and queries are represented as **high-dimensional vectors where most values are zero.**

**Advantages**
- Directly reflects the **importance of words.**
- Enables **precise matching** of specific words or phrases.
- **Faster computation** than dense embeddings.
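
The steps above can be sketched with plain dictionaries that map tokens to weights. This is a simplified illustration of lexical matching, assuming the score is the sum of weight products over shared tokens; the function name and all weights are made up, and FlagEmbedding's `compute_lexical_matching_score` may differ in detail.

```python
def lexical_matching_score(query_weights, doc_weights):
    """Sum the products of weights for tokens that appear in both texts."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


# Made-up token weights; a real model produces these per token
query_weights = {"langchain": 0.9, "tell": 0.2, "more": 0.1}
doc_a_weights = {"langchain": 0.8, "applications": 0.5}
doc_b_weights = {"hi": 0.4, "nice": 0.3}

print(lexical_matching_score(query_weights, doc_a_weights))  # shares "langchain"
print(lexical_matching_score(query_weights, doc_b_weights))  # no shared tokens -> 0
```

Only tokens present in both dictionaries contribute, which is why sparse scores reward exact word overlap and ignore paraphrases.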


In [30]:
from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_sparse_vecs = bge_embeddings.encode(
    sentences=docs, return_sparse=True
)

# Query Encoding
embedded_query_sparse_vecs = bge_embeddings.encode(sentences=[q], return_sparse=True)


In [31]:
# Score the query against every document's lexical weights
lexical_scores = [
    bge_embeddings.compute_lexical_matching_score(
        embedded_query_sparse_vecs["lexical_weights"][0],
        doc_lexical_weights,
    )
    for doc_lexical_weights in embedded_documents_sparse_vecs["lexical_weights"]
]

In [32]:
print(f"question: {q}")
print("====================")
# Print each document next to its lexical matching score
for doc, score in zip(docs, lexical_scores):
    print(doc, f": {score}")

question: Please tell me more about LangChain.
Hi, nice to meet you. : 0.011874185875058174
LangChain simplifies the process of building applications with large language models. : 0.23139647534117103
The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively. : 0.1879164595156908
LangChain simplifies the process of building applications with large-scale language models. : 0.22665631817653775
Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses. : 0.002352734562009573
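
For intuition about what these numbers mean, here is a minimal pure-Python sketch of a lexical matching score. It assumes the score is the sum, over tokens shared by the query and the document, of the product of their weights. The token names and weights below are hypothetical and chosen only for illustration; the real `lexical_weights` come from the model.

```python
def lexical_matching_score(query_weights, doc_weights):
    """Sum the weight products of tokens that appear in both sparse vectors."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


# Hypothetical token -> weight maps (illustration only, not model output).
query = {"langchain": 0.5, "tell": 0.25}
doc_a = {"langchain": 0.5, "simplifies": 0.25}  # shares "langchain" with the query
doc_b = {"hello": 0.5, "nice": 0.25}            # shares no tokens with the query

print(lexical_matching_score(query, doc_a))  # 0.25
print(lexical_matching_score(query, doc_b))  # 0
```

Under this assumption, a document that shares no weighted tokens with the query scores zero, which is consistent with the near-zero scores of the greeting and RAG sentences above.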


### 3️⃣ **Multi-Vector (ColBERT) Embedding Example**

**ColBERT** (Contextualized Late Interaction over BERT) is an efficient method for **document retrieval**.
- It uses a **multi-vector strategy**, representing both documents and queries with multiple vectors.

**How it works**
1.  Generate a **separate vector** for each **token in a document**, so every document is represented by multiple vectors.
2.  Likewise, generate a **separate vector** for each **token in a query**.
3.  During retrieval, compute the **similarity** between each query token vector and all document token vectors.
4.  Aggregate these similarities (keeping the best-matching document token for each query token) to produce the **final retrieval score**.
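
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the 2-dimensional token vectors are hypothetical, `dot` and `late_interaction_score` are names introduced here, and averaging over query tokens is an assumption about normalization (the original ColBERT formulation sums instead).

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs, doc_vecs):
    """For each query token keep its best-matching document token, then average."""
    best_matches = [max(dot(q, d) for d in doc_vecs) for q in query_vecs]
    return sum(best_matches) / len(best_matches)


# Hypothetical unit-length token vectors (illustration only).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.6, 0.8]]

# Query token 1 matches doc token 1 best (1.0); query token 2 matches doc token 2 best (0.8).
print(late_interaction_score(query_vecs, doc_vecs))  # 0.9
```

Because every query token is matched independently, one strongly matching document token cannot compensate for query tokens that match nothing. That is what makes the matching fine-grained.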

**Advantages**
- Enables **fine-grained token-level matching**.
- Captures **contextual embeddings** effectively.
- Works well even with **long documents**.


In [33]:
from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_colbert_vecs = bge_embeddings.encode(
    sentences=docs, return_colbert_vecs=True
)

# Query Encoding
embedded_query_colbert_vecs = bge_embeddings.encode(
    sentences=[q], return_colbert_vecs=True
)


In [34]:
# Score the query against every document using the per-token (ColBERT) vectors.
query_colbert_vecs = embedded_query_colbert_vecs["colbert_vecs"][0]
colbert_scores = [
    bge_embeddings.colbert_score(query_colbert_vecs, doc_vecs)
    for doc_vecs in embedded_documents_colbert_vecs["colbert_vecs"]
]

In [35]:
print(f"question: {q}")
print("====================")
# Print each document next to its ColBERT score.
for doc, score in zip(docs, colbert_scores):
    print(doc, f": {score}")

question: Please tell me more about LangChain.
Hi, nice to meet you. : 0.5088493824005127
LangChain simplifies the process of building applications with large language models. : 0.703724205493927
The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively. : 0.6633750796318054
LangChain simplifies the process of building applications with large-scale language models. : 0.7055995464324951
Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses. : 0.38072970509529114
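
To turn per-document scores into a retrieval result, the documents are typically sorted by score. The sketch below uses the ColBERT scores printed above, rounded to four decimals. `top_k` is a hypothetical helper introduced here, not part of FlagEmbedding.

```python
# Scores copied (rounded) from the output above, in document order.
colbert_scores = [0.5088, 0.7037, 0.6634, 0.7056, 0.3807]


def top_k(scores, k):
    """Return (document index, score) pairs for the k highest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


print(top_k(colbert_scores, k=2))  # [(3, 0.7056), (1, 0.7037)]
```

The two near-identical "LangChain simplifies ..." sentences (indices 1 and 3) rank highest, as expected for the question about LangChain.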


### 💡 **Advantages of FlagEmbedding**

-   **Diverse Embedding Options:** Supports **Dense, Lexical, and Multi-Vector** methods.
-   **High-Performance Models:** Uses powerful pretrained models such as **BGE**.
-   **Flexibility:** Choose the embedding method that best fits your **use case**.
-   **Scalability:** Can compute embeddings over **large-scale datasets**.

---

### ⚠️ **Considerations**

-   **Model Size:** Some models may require **significant storage capacity**.
-   **Resource Requirements:** **GPU usage is recommended** for large-scale vector computations.
-   **Configuration Needs:** Optimal performance may require **parameter tuning**.

---

### 📊 **FlagEmbedding Vector Comparison**

| **Embedding Type** | **Strengths** | **Use Cases** |
|---|---|---|
| **Dense Vector** | Emphasizes semantic similarity | Semantic search, document matching |
| **Lexical Vector** | Exact word matching | Keyword search, exact matches |
| **Multi-Vector** | Captures complex meaning | Long document analysis, topic classification |
