From 4618fda88aab3d4508fd7e4dc8002cd8dc960e38 Mon Sep 17 00:00:00 2001 From: Pringled Date: Wed, 12 Feb 2025 18:23:26 +0100 Subject: [PATCH 1/6] Refactored results --- model2vec/train/README.md | 36 -------------------------------- results/README.md | 43 +++++++++++++++++++++++++++++++++++++-- 2 files changed, 41 insertions(+), 38 deletions(-) diff --git a/model2vec/train/README.md b/model2vec/train/README.md index 39d748cd..f7a7d61e 100644 --- a/model2vec/train/README.md +++ b/model2vec/train/README.md @@ -92,42 +92,6 @@ pipeline = StaticModelPipeline.from_pretrained("my_cool/project") Loading pipelines in this way is _extremely_ fast. It takes only 30ms to load a pipeline from disk. -# Results - -The main results are detailed in our training blogpost, but we'll do a comparison with vanilla model2vec here. In a vanilla model2vec classifier, you just put a scikit-learn `LogisticRegressionCV` on top of the model encoder. In contrast, training a `StaticModelForClassification` fine-tunes the full model, including the `StaticModel` weights. The Setfit model is trained on using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model. - -We use 14 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit). - -| dataset | model2vec + logreg | model2vec full finetune | setfit | -|:---------------------------|----------------------------------------------:|---------------------------------------:|-------------------------------------------------:| -| 20_newgroups | 56.24 | 57.94 | 61.29 | -| ade | 79.2 | 79.68 | 83.05 | -| ag_news | 86.7 | 87.2 | 88.01 | -| amazon_counterfactual | 90.96 | 91.93 | 95.51 | -| bbc | 95.8 | 97.21 | 96.6 | -| emotion | 65.57 | 67.11 | 72.86 | -| enron_spam | 96.4 | 96.85 | 97.45 | -| hatespeech_offensive | 83.54 | 85.61 | 87.69 | -| imdb | 85.34 | 85.59 | 86 | -| massive_scenario | 82.86 | 84.42 | 83.54 | -| senteval_cr | 77.03 | 79.47 | 86.15 | -| sst5 | 32.34 | 37.95 | 42.31 | -| student | 83.2 | 85.02 | 89.62 | -| subj | 89.2 | 89.85 | 93.8 | -| tweet_sentiment_extraction | 64.96 | 62.65 | 75.15 | - -| | logreg | full finetune | setfit -|:---------------------------|-----------:|---------------:|-------:| -| average | 77.9 | 79.2 | 82.6 | - -As you can see, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model. - -The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU. - -| | logreg | full finetune | setfit -|:---------------------------|-----------:|---------------:|-------:| -| samples / second | 17925 | 24744 | 716 | - # Bring your own architecture diff --git a/results/README.md b/results/README.md index cafa2cd2..eb46787f 100644 --- a/results/README.md +++ b/results/README.md @@ -1,7 +1,8 @@ # Results -This page contains the experiments results of the Model2Vec project. The results are presented in the following sections: +This document contains the experiments results of the Model2Vec project. 
The results are presented in the following sections: - [MTEB Results](#mteb-results) +- [Training Results](#training-results) - [Ablations](#ablations) ## MTEB Results @@ -51,7 +52,7 @@ NOTE: for fairness of comparison, we disabled multiprocessing for Model2Vec for |*Figure: The average MTEB score plotted against sentences per second. The circle size indicates model size.*| -## Retrieval Results +### Retrieval Results A subset of models we created and compare against are specifically designed for retrieval tasks. The results are shown in the table below, including two general-purpose models for comparison and a transformer. @@ -65,6 +66,44 @@ A subset of models we created and compare against are specifically designed for As can be seen, [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) model is the most performant static retrieval model, reaching 86.65%% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with a retrieval score of 36.35. +## Training Results + +The main results are detailed in our training blogpost, but we'll do a comparison with vanilla model2vec here. In a vanilla model2vec classifier, you just put a scikit-learn `LogisticRegressionCV` on top of the model encoder. In contrast, training a `StaticModelForClassification` fine-tunes the full model, including the `StaticModel` weights. The Setfit model is trained on using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model. + +We use 14 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit). + +| dataset | model2vec + logreg | model2vec full finetune | setfit | +|:---------------------------|----------------------------------------------:|---------------------------------------:|-------------------------------------------------:| +| 20_newgroups | 56.24 | 57.94 | 61.29 | +| ade | 79.2 | 79.68 | 83.05 | +| ag_news | 86.7 | 87.2 | 88.01 | +| amazon_counterfactual | 90.96 | 91.93 | 95.51 | +| bbc | 95.8 | 97.21 | 96.6 | +| emotion | 65.57 | 67.11 | 72.86 | +| enron_spam | 96.4 | 96.85 | 97.45 | +| hatespeech_offensive | 83.54 | 85.61 | 87.69 | +| imdb | 85.34 | 85.59 | 86 | +| massive_scenario | 82.86 | 84.42 | 83.54 | +| senteval_cr | 77.03 | 79.47 | 86.15 | +| sst5 | 32.34 | 37.95 | 42.31 | +| student | 83.2 | 85.02 | 89.62 | +| subj | 89.2 | 89.85 | 93.8 | +| tweet_sentiment_extraction | 64.96 | 62.65 | 75.15 | + +| | logreg | full finetune | setfit +|:---------------------------|-----------:|---------------:|-------:| +| average | 77.9 | 79.2 | 82.6 | + +As you can see, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model. + +The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU. 
+
+| | logreg | full finetune | setfit
+|:---------------------------|-----------:|---------------:|-------:|
+| samples / second | 17925 | 24744 | 716 |
+
+
 ## Ablations

 To better understand the factors contributing to the performance of Model2Vec, we conducted a comprehensive set of ablation studies, covering various aspects of the model's architecture and preprocessing methods. In these studies, we examined the impact of key elements such as PCA, Zipf weighting, and the use of Sentence Transformers versus regular transformer models. We also compared the performance of input embeddings versus output embeddings, since it would seem plausible that these should also work well. The results are shown in the table below.

From 1aabcd9a85af72d81b70f7226b6e5c7509eb5676 Mon Sep 17 00:00:00 2001
From: Pringled
Date: Wed, 12 Feb 2025 18:30:06 +0100
Subject: [PATCH 2/6] Refactored results

---
 results/README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/results/README.md b/results/README.md
index eb46787f..6072fa92 100644
--- a/results/README.md
+++ b/results/README.md
@@ -1,6 +1,6 @@
 # Results

-This document contains the experiments results of the Model2Vec project. The results are presented in the following sections:
+This document contains the results of the Model2Vec project. The results are presented in the following sections:
 - [MTEB Results](#mteb-results)
 - [Training Results](#training-results)
 - [Ablations](#ablations)
@@ -68,7 +68,12 @@ As can be seen, [potion-retrieval-32M](https://huggingface.co/minishlab/potion-r

 ## Training Results

-The main results are detailed in our training blogpost, but we'll do a comparison with vanilla model2vec here. In a vanilla model2vec classifier, you just put a scikit-learn `LogisticRegressionCV` on top of the model encoder. In contrast, training a `StaticModelForClassification` fine-tunes the full model, including the `StaticModel` weights. The Setfit model is trained on using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model.
+The main results for Model2Vec training are outlined in this section.
+
+We compare three different architectures:
+- `model2vec + logreg`: A model2vec model with a scikit-learn `LogisticRegressionCV` on top.
+- `model2vec full finetune`: A model2vec classifier with the full model finetuned. This uses our `StaticModelForClassification`.
+- `setfit`: A [SetFit](https://github.com/huggingface/setfit/tree/main) model trained using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model.

 We use 14 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit).

@@ -94,7 +99,7 @@ We use 14 classification datasets, using 1000 examples from the train set, and t
 |:---------------------------|-----------:|---------------:|-------:|
 | average | 77.9 | 79.2 | 82.6 |

-As you can see, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model.
+As can be seen, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. 
Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model. The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU. From 8c1118e969580bd8d9e697bf1c12c7194b172057 Mon Sep 17 00:00:00 2001 From: Pringled Date: Wed, 12 Feb 2025 18:32:53 +0100 Subject: [PATCH 3/6] Updated docs --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 24a9142a..04726e7c 100644 --- a/README.md +++ b/README.md @@ -103,7 +103,7 @@ from datasets import load_dataset from model2vec.train import StaticModelForClassification # Initialize a classifier from a pre-trained model -classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-8M") +classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M") # Load a dataset ds = load_dataset("setfit/subj") @@ -120,7 +120,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com ## Updates & Announcements -- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and in our [blog post](LINK). +- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](https://github.com/MinishLab/blob/main/results#training-results). - **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available. From 189d3e209e6ccbbeade00fc383473f1f3b2f974f Mon Sep 17 00:00:00 2001 From: Pringled Date: Wed, 12 Feb 2025 18:35:38 +0100 Subject: [PATCH 4/6] Updated docs --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 04726e7c..890a0534 100644 --- a/README.md +++ b/README.md @@ -133,6 +133,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com - **Lightweight Dependencies**: the base package's only major dependency is `numpy`. - **Lightning-fast Inference**: up to 500 times faster on CPU than the original model. - **Fast, Dataset-free Distillation**: distill your own model in 30 seconds on a CPU, without a dataset. +- **Fine-tuning**: fine-tune your own classification models on top of Model2Vec models. - **Integrated in many popular libraries**: Model2Vec is integrated direclty into popular libraries such as [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [LangChain](https://github.com/langchain-ai/langchain). For more information, see our [integrations documentation](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md). 
- **Tightly integrated with HuggingFace hub**: easily share and load models from the HuggingFace hub, using the familiar `from_pretrained` and `push_to_hub`. Our own models can be found [here](https://huggingface.co/minishlab). From c1f86bd05f4a39ce9b86ad5749d31bb4271aa0b8 Mon Sep 17 00:00:00 2001 From: Pringled Date: Wed, 12 Feb 2025 18:36:38 +0100 Subject: [PATCH 5/6] Updated docs --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 890a0534..bb6a5646 100644 --- a/README.md +++ b/README.md @@ -120,7 +120,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com ## Updates & Announcements -- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](https://github.com/MinishLab/blob/main/results#training-results). +- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results). - **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available. @@ -174,6 +174,7 @@ We provide a number of models that can be used out of the box. These models are We have performed extensive experiments to evaluate the performance of Model2Vec models. The results are documented in the [results](results/README.md) folder. The results are presented in the following sections: - [MTEB Results](results/README.md#mteb-results) +- [Training Results](results/README.md#training-results) - [Ablations](results/README.md#ablations) ## License From 76f2e1864266de9bb56f05c514f6ac97f955463f Mon Sep 17 00:00:00 2001 From: Pringled Date: Wed, 12 Feb 2025 19:18:06 +0100 Subject: [PATCH 6/6] Updated description --- README.md | 4 ++-- pyproject.toml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index bb6a5646..944b978e 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@
-The Fastest State-of-the-Art Static Embeddings in the World
+Fast State-of-the-Art Static Embeddings
@@ -187,7 +187,7 @@ If you use Model2Vec in your research, please cite the following: ```bibtex @software{minishlab2024model2vec, authors = {Stephan Tulkens and Thomas van Dongen}, - title = {Model2Vec: The Fastest State-of-the-Art Static Embeddings in the World}, + title = {Model2Vec: Fast State-of-the-Art Static Embeddings}, year = {2024}, url = {https://github.com/MinishLab/model2vec} } diff --git a/pyproject.toml b/pyproject.toml index c1c56d3a..b50d9e1a 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "model2vec" -description = "The Fastest State-of-the-Art Static Embeddings in the World" +description = "Fast State-of-the-Art Static Embeddings" readme = { file = "README.md", content-type = "text/markdown" } license = { file = "LICENSE" } requires-python = ">=3.9"