## Fine-tune `BAAI/bge-reranker-v2-m3` for Better Results
[TutorialLink](https://blog.gopenai.com/fine-tuning-re-ranking-models-a-beginners-guide-066b4b9c3ecf)

In [None]:
training_query = []  # training records go here (data omitted; ping me if you want to see it)

# Each record follows the FlagEmbedding example format:
# https://github.com/FlagOpen/FlagEmbedding/blob/master/examples/finetune/reranker/example_data/normal/examples.jsonl
"""
[{
    "query": "who creates the best car?",
    "pos": [
        "ferrari produced 50 cars a year",
        "ford produced 200 cars a year",
    ],
    "neg": [
        "can a horse still replace a car",
        "why is my cat so slow?",
    ],
    "pos_scores": [0.98, 0.89],
    "neg_scores": [0.22, 0.1]
}]
"""

In [None]:
from sentence_transformers import InputExample
from torch.utils.data import DataLoader

# Flatten each record into (query, passage) pairs labelled with the relevance score.
train_samples = []
for tquery in training_query:
    for passage, score in zip(tquery['pos'], tquery['pos_scores']):
        train_samples.append(InputExample(texts=[tquery['query'], passage], label=score))

    for passage, score in zip(tquery['neg'], tquery['neg_scores']):
        train_samples.append(InputExample(texts=[tquery['query'], passage], label=score))

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=8)

In [None]:
from sentence_transformers import CrossEncoder

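# Passing the model name positionally works whatever the keyword is called
# in the installed sentence-transformers version.
model = CrossEncoder(
    'BAAI/bge-reranker-v2-m3',
    device='cuda'
)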

In [None]:
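# No evaluator is passed, so save_best_model has nothing to compare;
# the explicit save after fit() is what persists the model.
model.fit(
    train_dataloader=train_dataloader,
    epochs=5,
    warmup_steps=100,
    evaluation_steps=0,
    output_path="finetuned_reranker",
    save_best_model=True,
    use_amp=True,
    scheduler='warmupcosine',
    show_progress_bar=True,
)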

model.save_pretrained("finetuned_reranker")
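
The cell above uses a fixed `warmup_steps=100`. With a small training set that can cover most or all of the training steps, so one common heuristic is to warm up over a fraction of the total steps instead. The sketch below illustrates that heuristic; `warmup_steps_for` is a hypothetical helper and the 10% default is an assumption, not a value taken from the tutorial.

```python
import math

def warmup_steps_for(num_samples, batch_size, epochs, warmup_percent=10):
    """Number of warmup steps as a percentage of all optimizer steps."""
    # DataLoader keeps the last partial batch by default, hence the ceil.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return math.ceil(total_steps * warmup_percent / 100)

# Example: 100 training pairs, batch size 8, 5 epochs, as configured above.
print(warmup_steps_for(num_samples=100, batch_size=8, epochs=5))
```

In the notebook this could be passed as `warmup_steps=warmup_steps_for(len(train_samples), 8, 5)`.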

In [None]:
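# Test the newly fine-tuned reranker
reranker = CrossEncoder('finetuned_reranker', local_files_only=True)

query = ""  # fill in a query from your domain
docs = ["", ""]  # fill in candidate passages
# rank() returns one dict per document with 'corpus_id' and 'score'
ranked = reranker.rank(query, docs)
# Sort by score so the best match comes first
ranked = sorted(ranked, key=lambda x: x['score'], reverse=True)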
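
The ranking result refers to documents by index (`corpus_id`), so reading it usually means mapping each entry back to its text. A minimal sketch of that step follows; `attach_documents` is a hypothetical helper, and the ranked list is a hand-written stand-in for the output of `reranker.rank(query, docs)` with made-up scores.

```python
def attach_documents(ranked, docs, top_k=None):
    """Pair each ranked entry with its document text, best score first."""
    ordered = sorted(ranked, key=lambda hit: hit["score"], reverse=True)
    if top_k is not None:
        ordered = ordered[:top_k]
    return [(docs[hit["corpus_id"]], float(hit["score"])) for hit in ordered]

# Stand-in data: not real model output.
example_docs = ["doc about horses", "doc about cars"]
example_ranked = [
    {"corpus_id": 0, "score": 0.12},
    {"corpus_id": 1, "score": 0.91},
]

for text, score in attach_documents(example_ranked, example_docs):
    print(f"{score:.2f}  {text}")
```

Recent sentence-transformers versions also accept `return_documents=True` in `rank()`, which should make this mapping unnecessary; check the installed version before relying on it.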

In [None]:
# A small amount of fine-tuning on domain-specific data (rr_base_ft, row c) gave the best
# Hits, P@10, DCG@100 and NDCG@100 of the rerankers compared below, ahead of the
# off-the-shelf models.

"""
#    Model            Hits  P@10      Recall@100  MRR@100    DCG@100    NDCG@100
---  -------------  ------  ------  ------------  ---------  ---------  ----------
a    rr_base_large   9.167  0.658          0.784  0.806      6.168      0.758
b    rr_base        10.333  0.733ᵍ         0.851  0.854ᵍ     6.848ᵍ     0.818ᵍ
c    rr_base_ft     11.25   0.750ᵍ         0.901  0.854ᵍ     7.122ᶠᵍ    0.845ᶠᵍ
d    rr_cohere      10.333  0.708ᵍ         0.846  0.794      6.563      0.775
e    rr_mixed        9.25   0.575          0.78   0.784      5.832      0.707
f    rr_marco       10.083  0.675          0.833  0.847ᵍ     6.565      0.781
g    rr_marco_inst   8.417  0.425          0.734  0.382      4.192      0.529
h    rr_inst        11.05   0.725          0.902  0.839      6.905      0.790
"""