33 changes: 21 additions & 12 deletions examples/README.md
# Examples

- [1. Introduction](#1-Introduction)
- [2. Installation](#2-Installation)
- [3. Inference](#3-Inference)
- [4. Finetune](#4-Finetune)
- [5. Evaluation](#5-Evaluation)

## 1. Introduction

In this example, we show how to perform **inference**, **finetuning**, and **evaluation** with the baai-general-embedding models.

## 2. Installation

* **with pip**

```shell
pip install -U FlagEmbedding
```

* **from source**

```shell
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install .
```

For development, install as editable:

```shell
pip install -e .
```

## 3. Inference

We have provided the inference code for two types of models: the **embedder** and the **reranker**. These can be loaded using `FlagAutoModel` and `FlagAutoReranker`, respectively. For more detailed instructions on their use, please refer to the documentation for the [embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/inference/embedder) and [reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/inference/reranker).

### 1. Embedder

```python
from FlagEmbedding import FlagAutoModel
# ... (model loading and encoding of queries/passages elided in this view)
scores = q_embeddings @ p_embeddings.T
print(scores)
```
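For reference, here is a complete minimal sketch of the flow elided above; the model name, instruction string, and example texts are illustrative, and `encode_queries`/`encode_corpus` follow the embedder documentation linked above:

```python
from FlagEmbedding import FlagAutoModel

# Load an embedder; bge-large-en-v1.5 uses an instruction prefix for queries.
model = FlagAutoModel.from_finetuned(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
    use_fp16=True,
)

queries = ["What is BGE?", "How can I fine-tune an embedder?"]
passages = [
    "BGE is a family of general embedding models released by BAAI.",
    "FlagEmbedding supports fine-tuning embedders on custom data.",
]

# Encode both sides, then score every query against every passage.
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode_corpus(passages)
scores = q_embeddings @ p_embeddings.T
print(scores)
```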

### 2. Reranker

```python
from FlagEmbedding import FlagAutoReranker
# ... (model loading and pair construction elided in this view)
scores = model.compute_score(pairs)
print(scores)
```
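Likewise, a complete minimal sketch of the reranker flow; the model name and pairs are illustrative, and `compute_score` takes query–passage pairs as in the reranker documentation linked above:

```python
from FlagEmbedding import FlagAutoReranker

# Load a reranker; it scores (query, passage) pairs directly.
model = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-large", use_fp16=True)

pairs = [
    ["What is BGE?", "BGE is a family of general embedding models released by BAAI."],
    ["What is BGE?", "FAISS is a library for efficient similarity search."],
]

# Higher scores indicate more relevant passages for the query.
scores = model.compute_score(pairs)
print(scores)
```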

## 4. Finetune

We support fine-tuning a variety of BGE series models, including `bge-large-en-v1.5`, `bge-m3`, `bge-en-icl`, `bge-multilingual-gemma2`, `bge-reranker-v2-m3`, `bge-reranker-v2-gemma`, and `bge-reranker-v2-minicpm-layerwise`, among others. As examples, we use the basic models `bge-large-en-v1.5` and `bge-reranker-large`. For more details, please refer to the [embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder) and [reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/reranker) sections.

```shell
pip install deepspeed
pip install flash-attn --no-build-isolation
```
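Fine-tuning consumes JSONL training data in which each line pairs a query with its positive and hard-negative passages. Below is a sketch of writing one record; the field names follow the finetune data-format documentation linked above, and the commented teacher-score fields are only needed for distillation (e.g., `--kd_loss_type kl_div`):

```python
import json

# One training record: a query, positive passages, and hard-negative passages.
record = {
    "query": "What is BGE?",
    "pos": ["BGE is a family of general embedding models released by BAAI."],
    "neg": ["FAISS is a library for efficient similarity search."],
    # Optional teacher scores for knowledge distillation:
    # "pos_scores": [0.95], "neg_scores": [0.10],
}

with open("train_data.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```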

### 1. Embedder

```shell
torchrun --nproc_per_node 2 \
# ... (model, data, and training arguments elided in this view)
--kd_loss_type kl_div
```
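Once training finishes, the checkpoint saved to the training output directory can be loaded the same way as a released model. A minimal sketch, where the output path is a hypothetical placeholder:

```python
from FlagEmbedding import FlagAutoModel

# Load the fine-tuned checkpoint from its output directory (placeholder path).
model = FlagAutoModel.from_finetuned("./output/my_finetuned_bge-large-en-v1.5", use_fp16=True)
embeddings = model.encode(["a quick test sentence"])
print(embeddings.shape)
```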

### 2. Reranker

```shell
torchrun --nproc_per_node 2 \
# ... (model, data, and training arguments elided in this view)
--save_steps 1000
```

## 5. Evaluation

We support evaluations on [MTEB](https://github.com/embeddings-benchmark/mteb), [BEIR](https://github.com/beir-cellar/beir), [MSMARCO](https://microsoft.github.io/msmarco/), [MIRACL](https://github.com/project-miracl/miracl), [MLDR](https://huggingface.co/datasets/Shitao/MLDR), [MKQA](https://github.com/apple/ml-mkqa), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), and custom datasets. Below is an example of evaluating MSMARCO passages. For more details, please refer to the [evaluation examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/evaluation).

```shell
pip install pytrec_eval
pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
```

```shell
python -m FlagEmbedding.evaluation.msmarco \
--eval_name msmarco \
--dataset_dir ./data/msmarco \
# ... (remaining arguments elided in this view)
```
53 changes: 29 additions & 24 deletions examples/evaluation/README.md
This document serves as an overview of the evaluation process and provides a brief introduction to the arguments involved.

In this section, we will first introduce the commonly used arguments across all datasets. Then, we will provide a more detailed explanation of the specific arguments used for each individual dataset.

- [1. Introduction](#1-Introduction)
- [(1) EvalArgs](#1-EvalArgs)
- [(2) ModelArgs](#2-ModelArgs)
- [2. Usage](#2-Usage)
- [Requirements](#Requirements)
- [(1) MTEB](#1-MTEB)
- [(2) BEIR](#2-BEIR)
- [(3) MSMARCO](#3-MSMARCO)
- [(4) MIRACL](#4-MIRACL)
- [(5) MLDR](#5-MLDR)
- [(6) MKQA](#6-MKQA)
- [(7) AIR-Bench](#7-Air-Bench)
- [(8) Custom Dataset](#8-Custom-Dataset)

## 1. Introduction

### 1. EvalArgs

**Arguments for evaluation setup:**

- **`eval_name`**: Name of the evaluation task (e.g., msmarco, beir, miracl).

- **`dataset_dir`**: Path to the dataset directory. This can be:
  1. A local path to perform evaluation on your dataset (must exist). It should contain:
     - `corpus.jsonl`
     - `<split>_queries.jsonl`
     - `<split>_qrels.jsonl`
  2. Path to store datasets downloaded via API. Provide `None` to use the cache directory.

- **`force_redownload`**: Set to `True` to force redownload of the dataset. Default is `False`.

- **`dataset_names`**: List of dataset names to evaluate or `None` to evaluate all available datasets. This can be the dataset name (BEIR, etc.) or language (MIRACL, etc.).
Here is an example for evaluation:

```shell
pip install mteb==1.15.0
python -m FlagEmbedding.evaluation.mteb \
--eval_name mteb \
--output_dir ./data/mteb/search_results \
--languages eng \
--tasks NFCorpus BiorxivClusteringS2S SciDocsRR \
# ... (remaining arguments elided in this view)
```

Here is an example for evaluation:

```shell
pip install beir
mkdir eval_beir
cd eval_beir
python -m FlagEmbedding.evaluation.beir \
--eval_name beir \
--dataset_dir ./beir/data \
--dataset_names fiqa arguana cqadupstack \
--splits test dev \
# ... (remaining arguments elided in this view)
```

Here is an example for evaluation:

```shell
python -m FlagEmbedding.evaluation.msmarco \
--eval_name msmarco \
--dataset_dir ./msmarco/data \
--dataset_names passage \
--splits dev dl19 dl20 \
# ... (remaining arguments elided in this view)
```

Here is an example for evaluation:

```shell
python -m FlagEmbedding.evaluation.miracl \
--eval_name miracl \
--dataset_dir ./miracl/data \
--dataset_names bn hi sw te th yo \
--splits dev \
# ... (remaining arguments elided in this view)
```

Here is an example for evaluation:

```shell
python -m FlagEmbedding.evaluation.mldr \
--eval_name mldr \
--dataset_dir ./mldr/data \
--dataset_names hi \
--splits test \
# ... (remaining arguments elided in this view)
```

Here is an example for evaluation:

```shell
python -m FlagEmbedding.evaluation.mkqa \
--eval_name mkqa \
--dataset_dir ./mkqa/data \
--dataset_names en zh_cn \
--splits test \
# ... (remaining arguments elided in this view)
```

Here is an example for evaluation:

```shell
pip install air-benchmark
python -m FlagEmbedding.evaluation.air_bench \
--benchmark_version AIR-Bench_24.05 \
--task_types qa long-doc \
--domains arxiv \
--languages en \
# ... (remaining arguments elided in this view)
```

Please put these files (`corpus.jsonl`, `test_queries.jsonl`, `test_qrels.jsonl`) in your `dataset_dir`.

```shell
python -m FlagEmbedding.evaluation.custom \
--eval_name your_data_name \
--dataset_dir ./your_data_path \
--splits test \
--corpus_embd_save_dir ./your_data_name/corpus_embd \
# ... (remaining arguments elided in this view)
```
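To prepare such a dataset, here is a sketch that writes the three files into the `dataset_dir` layout described in EvalArgs above. The field names inside each file (`id`, `title`, `text`, `qid`, `docid`, `relevance`) are assumptions; verify them against the custom-dataset documentation for your FlagEmbedding version:

```python
import json
import os

# Tiny illustrative dataset in the expected three-file layout.
corpus = [{"id": "doc-0", "title": "BGE", "text": "BGE is a family of general embedding models."}]
queries = [{"id": "q-0", "text": "What is BGE?"}]
qrels = [{"qid": "q-0", "docid": "doc-0", "relevance": 1}]

os.makedirs("./your_data_path", exist_ok=True)
for name, rows in [("corpus.jsonl", corpus),
                   ("test_queries.jsonl", queries),
                   ("test_qrels.jsonl", qrels)]:
    with open(os.path.join("./your_data_path", name), "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```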
10 changes: 10 additions & 0 deletions examples/finetune/embedder/README.md

In this example, we show how to finetune the embedder with your data.

- [1. Installation](#1-Installation)
- [2. Data format](#2-Data-format)
- [Hard Negatives](#Hard-Negatives)
- [Teacher Scores](#Teacher-Scores)
- [3. Train](#3-Train)
- [(1) standard model](#1-standard-model)
- [(2) bge-m3](#2-bge-m3)
- [(3) bge-multilingual-gemma2](#3-bge-multilingual-gemma2)
- [(4) bge-en-icl](#4-bge-en-icl)

## 1. Installation

- **with pip**
9 changes: 9 additions & 0 deletions examples/finetune/reranker/README.md

In this example, we show how to finetune the reranker with your data.

- [1. Installation](#1-Installation)
- [2. Data format](#2-Data-format)
- [Hard Negatives](#Hard-Negatives)
- [Teacher Scores](#Teacher-Scores)
- [3. Train](#3-Train)
- [(1) standard model](#1-standard-model)
- [(2) bge-reranker-v2-gemma](#2-bge-reranker-v2-gemma)
- [(3) bge-reranker-v2-layerwise-minicpm](#3-bge-reranker-v2-layerwise-minicpm)

## 1. Installation

- **with pip**