<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/finetuning/embeddings/finetune_embedding_adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetuning an Adapter on Top of any Black-Box Embedding Model


LlamaIndex lets you fine-tune an adapter on top of embeddings produced by any model (sentence_transformers, OpenAI, and more).

This allows you to transform your embedding representations into a new latent space that's optimized for retrieval over your specific data and queries. This can lead to small increases in retrieval performance that in turn translate to better performing RAG systems.

We do this via our `EmbeddingAdapterFinetuneEngine` abstraction. We fine-tune three types of adapters:
- Linear
- 2-Layer NN
- Custom NN
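As a rough illustration of the simplest case, a linear adapter multiplies an embedding by a learned matrix `W`. The sketch below uses plain Python lists rather than the library's classes. The identity starting weight and the helper names (`identity`, `apply_linear_adapter`) are assumptions made for illustration, not the LlamaIndex API.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity(dim: int) -> Matrix:
    """Return a dim x dim identity matrix (assumed starting point for W)."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def apply_linear_adapter(weight: Matrix, embedding: Vector) -> Vector:
    """Compute W @ v for a single embedding vector."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


# Hypothetical 3-dimensional embedding, for illustration only.
query_embedding = [0.2, -0.5, 0.1]
weight = identity(3)

# With an identity weight the adapter leaves the embedding unchanged.
adapted = apply_linear_adapter(weight, query_embedding)

# Once training moves W away from the identity, the embedding is reshaped.
weight[0][1] = 0.5
adapted_after = apply_linear_adapter(weight, query_embedding)
```

Starting from the identity means an untrained adapter reproduces the base model's retrieval behaviour, so training can only move away from that baseline.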

## Generate Corpus

We use our helper abstraction, `generate_qa_embedding_pairs`, to generate our training and evaluation datasets. This function takes any set of text nodes (chunks) and generates a structured dataset containing (question, context) pairs.

In [None]:
%pip install llama-index-embeddings-openai
%pip install llama-index-embeddings-adapter
%pip install llama-index-finetuning

In [None]:
import json

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode

Download Data

In [None]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

In [None]:
TRAIN_FILES = ["./data/10k/lyft_2021.pdf"]
VAL_FILES = ["./data/10k/uber_2021.pdf"]

TRAIN_CORPUS_FPATH = "./data/train_corpus.json"
VAL_CORPUS_FPATH = "./data/val_corpus.json"

In [None]:
def load_corpus(files, verbose=False):
    if verbose:
        print(f"Loading files {files}")

    reader = SimpleDirectoryReader(input_files=files)
    docs = reader.load_data()
    if verbose:
        print(f"Loaded {len(docs)} docs")

    parser = SentenceSplitter()
    nodes = parser.get_nodes_from_documents(docs, show_progress=verbose)

    if verbose:
        print(f"Parsed {len(nodes)} nodes")

    return nodes

We do a very naive train/val split by having the Lyft corpus as the train dataset, and the Uber corpus as the val dataset.

In [None]:
train_nodes = load_corpus(TRAIN_FILES, verbose=True)
val_nodes = load_corpus(VAL_FILES, verbose=True)

Loading files ['../../../examples/data/10k/lyft_2021.pdf']
Loaded 238 docs


Parsing documents into nodes:   0%|          | 0/238 [00:00<?, ?it/s]

Parsed 349 nodes
Loading files ['../../../examples/data/10k/uber_2021.pdf']
Loaded 307 docs


Parsing documents into nodes:   0%|          | 0/307 [00:00<?, ?it/s]

Parsed 418 nodes


### Generate synthetic queries

Now, we use an LLM (gpt-3.5-turbo) to generate questions using each text chunk in the corpus as context.

Each pair of (generated question, text chunk used as context) becomes a datapoint in the finetuning dataset (either for training or evaluation).
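To make the shape of that dataset concrete, here is a minimal sketch using plain dictionaries. The three field names (`corpus`, `queries`, `relevant_docs`) mirror the `EmbeddingQAFinetuneDataset` constructor used later in this notebook. The IDs, texts, and the `iter_pairs` helper are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical IDs and texts, for illustration only.
corpus: Dict[str, str] = {
    "node-1": "Revenue grew year over year.",
    "node-2": "Risk factors include regulatory changes.",
}
queries: Dict[str, str] = {
    "q-1": "How did revenue change?",
    "q-2": "What risks does the company list?",
}
relevant_docs: Dict[str, List[str]] = {
    "q-1": ["node-1"],
    "q-2": ["node-2"],
}


def iter_pairs(
    queries: Dict[str, str],
    corpus: Dict[str, str],
    relevant_docs: Dict[str, List[str]],
) -> Iterator[Tuple[str, str]]:
    """Yield one (question, context) pair per relevant chunk."""
    for query_id, question in queries.items():
        for node_id in relevant_docs[query_id]:
            yield question, corpus[node_id]


pairs = list(iter_pairs(queries, corpus, relevant_docs))
```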

In [None]:
from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset

In [None]:
train_dataset = generate_qa_embedding_pairs(train_nodes)
val_dataset = generate_qa_embedding_pairs(val_nodes)

train_dataset.save_json("train_dataset.json")
val_dataset.save_json("val_dataset.json")

In [None]:
# [Optional] Load
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")
val_dataset = EmbeddingQAFinetuneDataset.from_json("val_dataset.json")

## SciFact Benchmark

In [1]:
%pip install llama-index-experimental llama-index-embeddings-huggingface nudge-ft torch datasets


In [2]:
from llama_index.finetuning import EmbeddingQAFinetuneDataset
from datasets import load_dataset


def load_hf_dataset(dataset_name):
    hf_dataset_name = f"sepz/{dataset_name}_ft"
    corpus = load_dataset(hf_dataset_name, "data_records", split="train")

    queries_train = load_dataset(hf_dataset_name, "qs", split="train")
    queries_validation = load_dataset(hf_dataset_name, "qs", split="dev")
    queries_test = load_dataset(hf_dataset_name, "qs", split="test")

    qrels_train = load_dataset(hf_dataset_name, "qs_rel", split="train")
    qrels_validation = load_dataset(hf_dataset_name, "qs_rel", split="dev")
    qrels_test = load_dataset(hf_dataset_name, "qs_rel", split="test")

    corpus = {
        str(corpus[i]["record_id"]): corpus[i]["text"]
        for i in range(len(corpus))
    }

    queries_train = {
        str(queries_train[i]["q_id"]): queries_train[i]["input"]
        for i in range(len(queries_train))
    }
    queries_validation = {
        str(r["q_id"]): r["input"] for r in queries_validation
    }
    queries_test = {str(r["q_id"]): r["input"] for r in queries_test}

    qrels_train = (
        qrels_train.to_pandas()
        .groupby("q_id")["record_id"]
        .apply(list)
        .to_dict()
    )
    qrels_validation = (
        qrels_validation.to_pandas()
        .groupby("q_id")["record_id"]
        .apply(list)
        .to_dict()
    )
    qrels_test = (
        qrels_test.to_pandas()
        .groupby("q_id")["record_id"]
        .apply(list)
        .to_dict()
    )
    # convert to strings
    qrels_train = {str(k): [str(i) for i in v] for k, v in qrels_train.items()}
    qrels_validation = {
        str(k): [str(i) for i in v] for k, v in qrels_validation.items()
    }
    qrels_test = {str(k): [str(i) for i in v] for k, v in qrels_test.items()}

    # Load the dataset
    train_dataset = EmbeddingQAFinetuneDataset(
        corpus=corpus, queries=queries_train, relevant_docs=qrels_train
    )
    validation_dataset = EmbeddingQAFinetuneDataset(
        corpus=corpus,
        queries=queries_validation,
        relevant_docs=qrels_validation,
    )
    test_dataset = EmbeddingQAFinetuneDataset(
        corpus=corpus, queries=queries_test, relevant_docs=qrels_test
    )

    return train_dataset, validation_dataset, test_dataset
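The `groupby(...).apply(list).to_dict()` step followed by the string conversion above can be read as a plain grouping of (query ID, record ID) rows. Here is a dependency-free sketch of the same idea. The `group_qrels` helper is hypothetical, and apart from the (2, 552) pair shown later in this notebook the rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_qrels(rows: Iterable[Tuple[int, int]]) -> Dict[str, List[str]]:
    """Group relevance rows into {query_id: [record_id, ...]} with string IDs."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for q_id, record_id in rows:
        grouped[str(q_id)].append(str(record_id))
    return dict(grouped)


# (2, 552) mirrors the example query printed below; the other rows are made up.
rows = [(2, 552), (7, 31), (7, 98)]
qrels = group_qrels(rows)
```

Converting every ID to a string keeps the keys of `queries`, `corpus`, and `relevant_docs` comparable, whatever integer types the source tables use.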

In [3]:
from llama_index.core.embeddings import resolve_embed_model

train_dataset, val_dataset, test_dataset = load_hf_dataset("scifact")
base_embed_model = resolve_embed_model("local:sentence-transformers/all-MiniLM-L6-v2")


In [4]:
print(val_dataset.queries["2"])


Depletion of nitric oxide is responsible for vasospasm.


In [5]:
print(val_dataset.relevant_docs["2"])


['552']


In [6]:
print(val_dataset.corpus["552"])


CONTEXT Delayed cerebral vasospasm causes permanent neurological deficits or death in at least 15% of patients following otherwise successful treatment for ruptured intracranial aneurysm. Decreased bioavailability of nitric oxide has been associated with the development of cerebral vasospasm. OBJECTIVE To determine whether infusions of nitrite will prevent delayed cerebral vasospasm. DESIGN, SETTING, AND SUBJECTS A total of 14 anesthetized cynomolgus monkeys had an autologous blood clot placed around the right middle cerebral artery. Cerebral arteriography was performed before clot placement and on days 7 and 14 to assess vasospasm. The study was conducted from August 2003 to February 2004. INTERVENTIONS A 90-mg sodium nitrite intravenous solution infused over 24 hours plus a 45-mg sodium nitrite bolus daily (n = 3); a 180-mg sodium nitrite intravenous solution infused over 24 hours (n = 3); or a control saline solution infusion (n = 8). Each was infused continuously for 14 days. MAIN 

## Evaluation


In [7]:
from typing import Optional, Dict

import torch
import numpy as np
from tqdm import tqdm
from llama_index.core.schema import TextNode
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core import VectorStoreIndex


def build_retriever(
    corpus: Dict[str, str],
    embed_model: BaseEmbedding | str,
    corpus_embeddings: Optional[torch.Tensor] = None,
    k: int = 10,
) -> BaseRetriever:
    nodes = []
    for i, (id_, text) in enumerate(corpus.items()):
        if corpus_embeddings is not None:
            nodes.append(
                TextNode(
                    id_=id_, text=text, embedding=corpus_embeddings[i].tolist()
                )
            )
        else:
            nodes.append(TextNode(id_=id_, text=text))

    index = VectorStoreIndex(
        nodes=nodes,
        embeddings=corpus_embeddings,
        embed_model=embed_model,
        show_progress=True,
    )
    return index.as_retriever(similarity_top_k=k)


def ndcg_at_k(
    dataset: EmbeddingQAFinetuneDataset, retriever: BaseRetriever, k: int = 10
):
    queries = dataset.queries
    relevant_docs = dataset.relevant_docs
    ndcg_scores = []
    for query_id, query in tqdm(queries.items()):
        retrieved_nodes = retriever.retrieve(query)
        retrieved_ids = [node.node.node_id for node in retrieved_nodes]
        expected_ids = relevant_docs[query_id]

        # Calculate NDCG
        ideal_dcg = np.sum(
            [1 / np.log2(i + 2) for i in range(min(k, len(expected_ids)))]
        )
        rel_scores = np.zeros(k)
        for j in range(min(k, len(retrieved_ids))):
            if retrieved_ids[j] in expected_ids:
                rel_scores[j] = 1
        dcg = np.sum(
            [rel_scores[i] / np.log2(i + 2) for i in range(len(rel_scores))]
        )
        ndcg = dcg / ideal_dcg if ideal_dcg > 0 else 0

        ndcg_scores.append(ndcg)

    mean_ndcg = np.mean(ndcg_scores)
    return mean_ndcg
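For intuition, the binary-relevance NDCG computed above can be worked through on a single query. This standalone sketch re-implements the same formula with the standard library only. The `binary_ndcg` helper and the retrieved ID lists are hypothetical.

```python
import math
from typing import Sequence


def binary_ndcg(
    retrieved_ids: Sequence[str], expected_ids: Sequence[str], k: int = 10
) -> float:
    """NDCG@k where each relevant document has gain 1 and all others gain 0."""
    ideal_dcg = sum(
        1 / math.log2(i + 2) for i in range(min(k, len(expected_ids)))
    )
    dcg = sum(
        1 / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved_ids[:k])
        if doc_id in expected_ids
    )
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevant document ranked first: perfect score.
top_hit = binary_ndcg(["552", "10", "11"], ["552"])
# Relevant document ranked second: discounted by 1 / log2(3).
second_hit = binary_ndcg(["10", "552", "11"], ["552"])
# Relevant document not retrieved at all.
miss = binary_ndcg(["10", "11"], ["552"])
```

Moving the single relevant document from rank 1 to rank 2 drops the score from 1.0 to about 0.63, which is why NDCG is more sensitive to ordering than a plain hit rate.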

## Finetune


In [8]:
%%capture
from llama_index.finetuning import EmbeddingAdapterFinetuneEngine

# `k` was never defined in this notebook; 10 matches the defaults of
# `build_retriever` and `ndcg_at_k` above.
k = 10

embedding_adapter_finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,
    base_embed_model,
    epochs=4,
    batch_size=10,
)
embedding_adapter_finetune_engine.finetune()
embedding_adapter_model = (
    embedding_adapter_finetune_engine.get_finetuned_model()
)
ft_retriever = build_retriever(
    train_dataset.corpus, embedding_adapter_model, k=k
)
ft_ndcg_test = ndcg_at_k(test_dataset, ft_retriever, k)

## Base Model


In [None]:
%%capture

base_retriever = build_retriever(train_dataset.corpus, base_embed_model, k=k)
# The base model in this section is all-MiniLM-L6-v2, not bge
base_ndcg_test = ndcg_at_k(test_dataset, base_retriever, k)

## Tensorised Linear Adapter


In [14]:
!pip install llama-index-embeddings-adapter --upgrade
# BaseAdapter is imported from `llama_index.embeddings.adapter`
from llama_index.embeddings.adapter import BaseAdapter
import torch.nn.functional as F
from torch import nn, Tensor
from typing import Dict



class CustomNN(BaseAdapter):
    """Custom NN transformation.

    Is a copy of our TwoLayerNN, showing it here for notebook purposes.

    Args:
        in_features (int): Input dimension.
        hidden_features (int): Hidden dimension.
        out_features (int): Output dimension.
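        bias (bool): Stored in the config dict. Both linear layers below are
            created with `bias=True` regardless of this value. Defaults to False.
        add_residual (bool): Whether to return `residual_weight * output + embed`
            instead of the raw output. Defaults to False.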

    """

    def __init__(
        self,
        in_features: int,
        hidden_features: int,
        out_features: int,
        bias: bool = False,
        add_residual: bool = False,
    ) -> None:
        super(CustomNN, self).__init__()
        self.in_features = in_features
        self.hidden_features = hidden_features
        self.out_features = out_features
        self.bias = bias

        self.linear1 = nn.Linear(in_features, hidden_features, bias=True)
        self.linear2 = nn.Linear(hidden_features, out_features, bias=True)
        self._add_residual = add_residual
        # if add_residual, then add residual_weight (init to 0)
        self.residual_weight = nn.Parameter(torch.zeros(1))

    def forward(self, embed: Tensor) -> Tensor:
        """Forward pass (Wv).

        Args:
            embed (Tensor): Input tensor.

        """
        output1 = self.linear1(embed)
        output1 = F.relu(output1)
        output2 = self.linear2(output1)

        if self._add_residual:
            output2 = self.residual_weight * output2 + embed

        return output2

    def get_config_dict(self) -> Dict:
        """Get config dict."""
        return {
            "in_features": self.in_features,
            "hidden_features": self.hidden_features,
            "out_features": self.out_features,
            "bias": self.bias,
            "add_residual": self._add_residual,
        }





In [17]:
# prompt: modify class template CustomNN to a tensor train linear layer
!pip install tensorly
import numpy as np
import torch
import torch.nn as nn
from typing import Dict
from llama_index.embeddings.adapter import BaseAdapter
from tensorly.decomposition import tensor_train


class TensorTrainLinearAdapter(BaseAdapter):
    """Linear adapter whose weight is stored as tensor-train (TT) cores."""

    def __init__(
        self,
        in_features: int,
        out_features: int,
        ranks: int = 4,  # TT-rank: lower means fewer parameters
        bias: bool = False,
    ) -> None:
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.ranks = ranks

        # TT-decompose a random (in_features, out_features) matrix on CPU.
        # tensorly's keyword argument is `rank`, not `ranks`.
        init_weight = torch.randn(in_features, out_features).numpy()
        cores = tensor_train(init_weight, rank=ranks)

        # A ParameterList registers the cores so the optimizer can train them.
        self.tt_cores = nn.ParameterList(
            [
                nn.Parameter(
                    torch.tensor(np.ascontiguousarray(core), dtype=torch.float32)
                )
                for core in cores
            ]
        )
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, embed: torch.Tensor) -> torch.Tensor:
        """Forward pass (Wv)."""
        # For a 2-D weight the cores have shapes (1, in, r) and (r, out, 1).
        # Contracting them with torch ops keeps the cores in the autograd graph.
        weight = self.tt_cores[0].squeeze(0) @ self.tt_cores[1].squeeze(-1)
        output = embed @ weight
        if self.bias is not None:
            output = output + self.bias
        return output

    def get_config_dict(self) -> Dict:
        """Get config dict."""
        return {
            "in_features": self.in_features,
            "out_features": self.out_features,
            "ranks": self.ranks,
            "bias": self.bias is not None,
        }



In [18]:
# prompt: create instance of TensorTrainLinearAdapter defined above where input dimension = 384 and output dimension = 384. choose appropriate ranks for 0.5 compression

# Assuming input and output dimensions are 384
input_dim = 384
output_dim = 384

# The two TT cores hold ranks * (input_dim + output_dim) parameters, so
# ranks = 96 stores 73,728 values: half of the dense 384 * 384 = 147,456.
# Lower ranks mean more compression but potentially lower accuracy.
ranks = 96

# Create an instance of TensorTrainLinearAdapter
tt_adapter = TensorTrainLinearAdapter(in_features=input_dim, out_features=output_dim, ranks=ranks)
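The prompt above asks for roughly 0.5 compression, so it helps to check the arithmetic. A tensor-train decomposition of a two-dimensional weight with TT-rank `r` produces two cores of shapes `(1, in, r)` and `(r, out, 1)`, which is `r * (in + out)` parameters in total. The sketch below uses two hypothetical helpers to relate the rank to the compression ratio.

```python
def tt_matrix_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the two TT cores of an (in_features, out_features) matrix."""
    # Core shapes are (1, in_features, rank) and (rank, out_features, 1).
    return rank * in_features + rank * out_features


def rank_for_compression(in_features: int, out_features: int, ratio: float) -> int:
    """Largest rank whose cores use at most `ratio` of the dense parameters."""
    dense = in_features * out_features
    return max(1, int(ratio * dense / (in_features + out_features)))


dense_params = 384 * 384
rank_2_params = tt_matrix_param_count(384, 384, rank=2)
rank_2_ratio = rank_2_params / dense_params
half_rank = rank_for_compression(384, 384, ratio=0.5)
half_params = tt_matrix_param_count(384, 384, rank=half_rank)
```

Under this count a rank of 2 keeps only about 1% of the dense parameters, while a rank of 96 keeps exactly half.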

In [None]:
custom_adapter = CustomNN(
    384,  # input dimension
    1024,  # hidden dimension
    384,  # output dimension
    bias=True,
    add_residual=True,
)

finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,
    base_embed_model,
    model_output_path="custom_model_output",
    model_checkpoint_path="custom_model_ck",
    adapter_model=custom_adapter,
    epochs=25,
    verbose=True,
)

## Run Embedding Finetuning

We then fine-tune our linear adapter on top of an existing embedding model. We import our new `EmbeddingAdapterFinetuneEngine` abstraction, which takes in an existing embedding model and a set of training parameters.

#### Fine-tune bge-small-en (default)

In [None]:
from llama_index.finetuning import EmbeddingAdapterFinetuneEngine
from llama_index.core.embeddings import resolve_embed_model
import torch

base_embed_model = resolve_embed_model("local:BAAI/bge-small-en")

finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,
    base_embed_model,
    model_output_path="model_output_test",
    # bias=True,
    epochs=4,
    verbose=True,
    # optimizer_class=torch.optim.SGD,
    # optimizer_params={"lr": 0.01}
)

In [None]:
finetune_engine.finetune()

In [None]:
embed_model = finetune_engine.get_finetuned_model()

# alternatively import model
from llama_index.embeddings.adapter import LinearAdapterEmbeddingModel

# embed_model = LinearAdapterEmbeddingModel(base_embed_model, "model_output_test")

## Evaluate Finetuned Model

We compare the fine-tuned model against the base model, as well as against text-embedding-ada-002.

We evaluate with two ranking metrics:
- **Hit-rate metric**: For each (query, context) pair, we retrieve the top-k documents with the query. It's a hit if the results contain the ground-truth context.
- **Mean Reciprocal Rank**: A slightly more granular ranking metric that looks at the "reciprocal rank" of the ground-truth context in the top-k retrieved set. The reciprocal rank is defined as 1/rank. Of course, if the results don't contain the context, then the reciprocal rank is 0.
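Both metrics can be sketched in a few lines of plain Python. This is not the `eval_utils` implementation used below. It is a minimal illustration with a hypothetical `hit_rate_and_mrr` helper and made-up retrieval results, assuming one ground-truth context per query.

```python
from typing import Dict, List, Tuple


def hit_rate_and_mrr(
    retrieved: Dict[str, List[str]], expected: Dict[str, str]
) -> Tuple[float, float]:
    """Return (hit rate, mean reciprocal rank) over all queries."""
    hits = 0
    reciprocal_rank_total = 0.0
    for query_id, expected_id in expected.items():
        ranked_ids = retrieved[query_id]
        if expected_id in ranked_ids:
            hits += 1
            # Ranks are 1-based, so the first result has reciprocal rank 1.
            reciprocal_rank_total += 1.0 / (ranked_ids.index(expected_id) + 1)
    total = len(expected)
    return hits / total, reciprocal_rank_total / total


# Hypothetical top-2 results for three queries.
retrieved = {"q1": ["a", "b"], "q2": ["c", "d"], "q3": ["e", "f"]}
expected = {"q1": "a", "q2": "d", "q3": "z"}

hit_rate, mrr = hit_rate_and_mrr(retrieved, expected)
```

Here two of the three queries are hits, and the reciprocal ranks are 1, 1/2, and 0, so the hit rate is 2/3 and the MRR is 0.5.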

In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from tqdm.notebook import tqdm
import pandas as pd

from eval_utils import evaluate, display_results

In [None]:
ada = OpenAIEmbedding()
ada_val_results = evaluate(val_dataset, ada)

Generating embeddings:   0%|          | 0/395 [00:00<?, ?it/s]

100%|████████████████████████████████████████████████████████████████| 790/790 [03:03<00:00,  4.30it/s]


In [None]:
display_results(["ada"], [ada_val_results])

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | ada | 0.870886 | 0.72884 |


In [None]:
bge = "local:BAAI/bge-small-en"
bge_val_results = evaluate(val_dataset, bge)

Generating embeddings:   0%|          | 0/395 [00:00<?, ?it/s]

100%|████████████████████████████████████████████████████████████████| 790/790 [00:23<00:00, 33.76it/s]


In [None]:
display_results(["bge"], [bge_val_results])

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | bge | 0.787342 | 0.643038 |


In [None]:
ft_val_results = evaluate(val_dataset, embed_model)

Generating embeddings:   0%|          | 0/395 [00:00<?, ?it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 790/790 [00:21<00:00, 36.95it/s]


In [None]:
display_results(["ft"], [ft_val_results])

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | ft | 0.798734 | 0.662152 |


Here we show all the results concatenated together.

In [None]:
display_results(
    ["ada", "bge", "ft"], [ada_val_results, bge_val_results, ft_val_results]
)

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | ada | 0.870886 | 0.730105 |
| 1 | bge | 0.787342 | 0.643038 |
| 2 | ft | 0.798734 | 0.662152 |


## Fine-tune a Two-Layer Adapter

Let's try fine-tuning a two-layer NN as well!

It's a simple two-layer NN with a ReLU activation and a residual layer at the end.

We train for 25 epochs - longer than the linear adapter - and preserve checkpoints every 100 steps.
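The forward pass of such a network can be sketched without PyTorch. The example below follows the structure of the `CustomNN` class shown in this notebook (linear layer, ReLU, linear layer, then `residual_weight * output + embed`) but omits biases for brevity. All weights and helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def relu(vec: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


def two_layer_forward(
    embed: Vector, w1: Matrix, w2: Matrix, residual_weight: float
) -> Vector:
    """Linear -> ReLU -> linear, then a scaled output added to the input."""
    hidden = relu(matvec(w1, embed))
    output = matvec(w2, hidden)
    return [residual_weight * o + e for o, e in zip(output, embed)]


# Hypothetical 2-d embedding with a 3-d hidden layer.
embed = [0.5, -1.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A residual weight of 0 (its initial value) returns the input unchanged.
at_init = two_layer_forward(embed, w1, w2, residual_weight=0.0)
# A non-zero residual weight mixes in the network's output.
after_update = two_layer_forward(embed, w1, w2, residual_weight=0.1)
```

Because the residual weight starts at zero, the adapter initially behaves like the base embedding model and only departs from it as training proceeds.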

In [None]:
# requires torch dependency
from llama_index.embeddings.adapter.utils import TwoLayerNN

from llama_index.finetuning import EmbeddingAdapterFinetuneEngine
from llama_index.core.embeddings import resolve_embed_model
from llama_index.embeddings.adapter import AdapterEmbeddingModel

In [None]:
base_embed_model = resolve_embed_model("local:BAAI/bge-small-en")
adapter_model = TwoLayerNN(
    384,  # input dimension
    1024,  # hidden dimension
    384,  # output dimension
    bias=True,
    add_residual=True,
)

finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,
    base_embed_model,
    model_output_path="model5_output_test",
    model_checkpoint_path="model5_ck",
    adapter_model=adapter_model,
    epochs=25,
    verbose=True,
)

In [None]:
finetune_engine.finetune()

In [None]:
embed_model_2layer = finetune_engine.get_finetuned_model(
    adapter_cls=TwoLayerNN
)

### Evaluation Results

Run the same evaluation script used in the previous section to measure hit-rate/MRR for the two-layer model.

In [None]:
# [optional] load the final model from its output directory
embed_model_2layer = AdapterEmbeddingModel(
    base_embed_model,
    "model5_output_test",
    TwoLayerNN,
)

In [None]:
from eval_utils import evaluate, display_results

In [None]:
ft_val_results_2layer = evaluate(val_dataset, embed_model_2layer)

Generating embeddings:   0%|          | 0/395 [00:00<?, ?it/s]

100%|████████████████████████████████████████████████████████████████| 790/790 [00:21<00:00, 36.93it/s]


In [None]:
# comment out if you haven't run ada/bge yet
display_results(
    ["ada", "bge", "ft_2layer"],
    [ada_val_results, bge_val_results, ft_val_results_2layer],
)

# uncomment if you just want to display the fine-tuned model's results
# display_results(["ft_2layer"], [ft_val_results_2layer])

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | ada | 0.870886 | 0.72884 |
| 1 | bge | 0.787342 | 0.643038 |
| 2 | ft_2layer | 0.798734 | 0.662848 |


In [None]:
# load model from a checkpoint in the middle of training
embed_model_2layer_s900 = AdapterEmbeddingModel(
    base_embed_model,
    "model5_ck/step_900",
    TwoLayerNN,
)

In [None]:
ft_val_results_2layer_s900 = evaluate(val_dataset, embed_model_2layer_s900)

Generating embeddings:   0%|          | 0/395 [00:00<?, ?it/s]

100%|████████████████████████████████████████████████████████████████| 790/790 [00:19<00:00, 40.57it/s]


In [None]:
# comment out if you haven't run ada/bge yet
display_results(
    ["ada", "bge", "ft_2layer_s900"],
    [ada_val_results, bge_val_results, ft_val_results_2layer_s900],
)

# uncomment if you just want to display the fine-tuned model's results
# display_results(["ft_2layer_s900"], [ft_val_results_2layer_s900])

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | ada | 0.870886 | 0.72884 |
| 1 | bge | 0.787342 | 0.643038 |
| 2 | ft_2layer_s900 | 0.803797 | 0.667426 |


## Try Your Own Custom Model

You can define your own custom adapter here! Simply subclass `BaseAdapter`, which is a light wrapper around the `nn.Module` class.

You just need to implement `forward` and `get_config_dict`.

Just make sure you're familiar with writing `PyTorch` code :)

In [None]:
from llama_index.embeddings.adapter import BaseAdapter
import torch  # needed for torch.zeros in CustomNN below
import torch.nn.functional as F
from torch import nn, Tensor
from typing import Dict

In [None]:
class CustomNN(BaseAdapter):
    """Custom NN transformation.

    Is a copy of our TwoLayerNN, showing it here for notebook purposes.

    Args:
        in_features (int): Input dimension.
        hidden_features (int): Hidden dimension.
        out_features (int): Output dimension.
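        bias (bool): Stored in the config dict. Both linear layers below are
            created with `bias=True` regardless of this value. Defaults to False.
        add_residual (bool): Whether to return `residual_weight * output + embed`
            instead of the raw output. Defaults to False.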

    """

    def __init__(
        self,
        in_features: int,
        hidden_features: int,
        out_features: int,
        bias: bool = False,
        add_residual: bool = False,
    ) -> None:
        super(CustomNN, self).__init__()
        self.in_features = in_features
        self.hidden_features = hidden_features
        self.out_features = out_features
        self.bias = bias

        self.linear1 = nn.Linear(in_features, hidden_features, bias=True)
        self.linear2 = nn.Linear(hidden_features, out_features, bias=True)
        self._add_residual = add_residual
        # if add_residual, then add residual_weight (init to 0)
        self.residual_weight = nn.Parameter(torch.zeros(1))

    def forward(self, embed: Tensor) -> Tensor:
        """Forward pass (Wv).

        Args:
            embed (Tensor): Input tensor.

        """
        output1 = self.linear1(embed)
        output1 = F.relu(output1)
        output2 = self.linear2(output1)

        if self._add_residual:
            output2 = self.residual_weight * output2 + embed

        return output2

    def get_config_dict(self) -> Dict:
        """Get config dict."""
        return {
            "in_features": self.in_features,
            "hidden_features": self.hidden_features,
            "out_features": self.out_features,
            "bias": self.bias,
            "add_residual": self._add_residual,
        }

In [None]:
custom_adapter = CustomNN(
    384,  # input dimension
    1024,  # hidden dimension
    384,  # output dimension
    bias=True,
    add_residual=True,
)

finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,
    base_embed_model,
    model_output_path="custom_model_output",
    model_checkpoint_path="custom_model_ck",
    adapter_model=custom_adapter,
    epochs=25,
    verbose=True,
)

In [None]:
finetune_engine.finetune()

In [None]:
embed_model_custom = finetune_engine.get_finetuned_model(
    adapter_cls=CustomNN
)

### Evaluation Results

Run the same evaluation script used in the previous section to measure hit-rate/MRR.

In [None]:
# [optional] load model manually
# embed_model_custom = AdapterEmbeddingModel(
#     base_embed_model,
#     "custom_model_ck/step_300",
#     CustomNN,
# )

In [None]:
from eval_utils import evaluate, display_results

In [None]:
ft_val_results_custom = evaluate(val_dataset, embed_model_custom)

Generating embeddings:   0%|          | 0/395 [00:00<?, ?it/s]

100%|████████████████████████████████████████████████████████████████| 790/790 [00:20<00:00, 37.77it/s]


In [None]:
display_results(["ft_custom"], [ft_val_results_custom])

| | retrievers | hit_rate | mrr |
|---|---|---|---|
| 0 | ft_custom | 0.789873 | 0.645127 |
