# Getting started
This is a minimal guide to getting started with SEB. If you find the documentation lacking, feel free to file an [issue](https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark/issues).

## Using the CLI

SEB comes with a simple CLI for running models. This section shows a minimal example of how to use it; for more detail, check out the CLI documentation. To list the available commands, run:

In [1]:
%%bash

seb --help


Available commands:

  run   Runs the Benchmark on a specified model.



For more on a specific command, call `seb {command} --help`. To run a model using the CLI:

In [2]:
%%bash
seb run sentence-transformers/all-MiniLM-L6-v2 --output-path model_results.json

INFO:seb.cli.run:Model registered in SEB. Loading from registry.

Running all-MiniLM-L6-v2:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Angry Tweets:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on LCC:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Bornholm Parallel:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on DKHate:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Da Political Comments:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Massive Intent:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Massive Scenario:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on ScaLA:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Language Identification:   0%|          | 0/14 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on NoReC:   0%|          | 0/14 [00:00

## Running a task
To run a task, fetch the task and a model, then evaluate the model on a benchmark containing that task:


In [3]:
import seb

model = seb.get_model("jonfd/electra-small-nordic")
task = seb.get_task("DKHate")

# initialize benchmark with tasks
benchmark = seb.Benchmark(tasks=[task])

# benchmark the model
benchmark_result = benchmark.evaluate_model(model)



In [4]:
benchmark_result  # examine output

BenchmarkResults(meta=ModelMeta(name='electra-small-nordic', description=None, huggingface_name='jonfd/electra-small-nordic', reference='https://huggingface.co/jonfd/electra-small-nordic', languages=['da', 'nb', 'sv', 'nn'], open_source=True, embedding_size=256), task_results=[TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='1.0.3.dev0', time_of_run=datetime.datetime(2023, 7, 30, 13, 55, 38, 480327), scores={'da': {'accuracy': 0.5945288753799393, 'f1': 0.4912211182797449, 'ap': 0.8950480900418238, 'accuracy_stderr': 0.07818347662767612, 'f1_stderr': 0.05511334661624392, 'ap_stderr': 0.013877821318913264, 'main_score': 0.5945288753799393}}, main_score='accuracy')])

In [5]:
benchmark_result[0]  # examine the results for the first task

TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='1.0.3.dev0', time_of_run=datetime.datetime(2023, 7, 30, 13, 55, 38, 480327), scores={'da': {'accuracy': 0.5945288753799393, 'f1': 0.4912211182797449, 'ap': 0.8950480900418238, 'accuracy_stderr': 0.07818347662767612, 'f1_stderr': 0.05511334661624392, 'ap_stderr': 0.013877821318913264, 'main_score': 0.5945288753799393}}, main_score='accuracy')
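
The `scores` dictionary in the output above is keyed by language code, and each entry holds the metric values alongside their `_stderr` counterparts. The sketch below shows one way to turn that dictionary into a short summary. To stay runnable without a model, it works on a plain dictionary copied from the output; the helper `summarize` is hypothetical and not part of `seb`.

```python
# Scores copied from the TaskResult output above. In practice they would be
# read from the result object (the repr suggests an attribute named `scores`).
scores = {
    "da": {
        "accuracy": 0.5945288753799393,
        "f1": 0.4912211182797449,
        "ap": 0.8950480900418238,
        "accuracy_stderr": 0.07818347662767612,
        "f1_stderr": 0.05511334661624392,
        "ap_stderr": 0.013877821318913264,
        "main_score": 0.5945288753799393,
    }
}
main_score_name = "accuracy"  # taken from main_score='accuracy' in the output


def summarize(scores: dict, metric: str) -> dict:
    """Hypothetical helper: format `metric` as 'value ± stderr' per language."""
    summary = {}
    for language, values in scores.items():
        text = f"{values[metric]:.3f}"
        stderr = values.get(f"{metric}_stderr")
        if stderr is not None:  # e.g. 'main_score' has no stderr entry
            text += f" ± {stderr:.3f}"
        summary[language] = text
    return summary


print(summarize(scores, main_score_name))  # {'da': '0.595 ± 0.078'}
```

Note that `main_score` repeats the value of whichever metric the task designates as its main score, here `accuracy`.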

## Reproducing the Benchmark
To reproduce the benchmark, run the following command:

In [6]:
results = seb.run_benchmark()

Running text-embedding-ada-002: 100%|██████████| 30/30 [00:00<00:00, 31.41it/s]                       
Running text-embedding-ada-002: 100%|██████████| 30/30 [00:00<00:00, 40.98it/s]                       
Running text-embedding-ada-002: 100%|██████████| 30/30 [00:00<00:00, 34.49it/s]                       
Running text-embedding-ada-002: 100%|██████████| 30/30 [00:00<00:00, 33.02it/s]                       


This runs the full benchmark on all registered models and all registered datasets. Note that existing benchmark results are cached and included as part of the package, so you won't have to rerun results that have already been computed.

The results are returned as a dictionary where each key names a benchmark and each value is a list of benchmark results, one per model.

In [13]:
print(results.keys())

results["Danish"][0]  # result for the first model in the benchmark

dict_keys(['Mainland Scandinavian', 'Danish', 'Norwegian', 'Swedish'])


BenchmarkResults(meta=ModelMeta(name='embed-multilingual-v3.0', description=None, huggingface_name=None, reference='https://huggingface.co/Cohere/Cohere-embed-multilingual-v3.0', languages=[], open_source=False, embedding_size=1024), task_results=[TaskResult(task_name='Angry Tweets', task_description='A sentiment dataset with 3 classes (positiv, negativ, neutral) for Danish tweets', task_version='1.1.1.dev0', time_of_run=datetime.datetime(2023, 11, 15, 15, 49, 12, 771515), scores={'da': {'accuracy': 0.589111747851003, 'f1': 0.5800442049443755, 'accuracy_stderr': 0.02208679883291171, 'f1_stderr': 0.0205122012161316, 'main_score': 0.589111747851003}}, main_score='accuracy'), TaskResult(task_name='LCC', task_description='The leipzig corpora collection, annotated for sentiment', task_version='1.1.1.dev0', time_of_run=datetime.datetime(2023, 11, 15, 15, 49, 26, 932464), scores={'da': {'accuracy': 0.604, 'f1': 0.6045645057913338, 'accuracy_stderr': 0.034794635601866374, 'f1_stderr': 0.034695
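
Because the results are nested (benchmark, then model, then task, then language), it can help to flatten them into rows before comparing models. The following is a minimal sketch under stated assumptions: the three dataclasses are stand-ins that mirror only the fields visible in the output above (`meta.name`, `task_results`, `task_name`, `scores`), not the actual `seb` classes, and `main_score_rows` is a hypothetical helper. The two score values are copied from the output above.

```python
from dataclasses import dataclass


# Stand-ins mirroring only the fields visible in the printed output.
# They are NOT the seb classes; they just make the sketch runnable on its own.
@dataclass
class TaskResult:
    task_name: str
    scores: dict
    main_score: str


@dataclass
class ModelMeta:
    name: str


@dataclass
class BenchmarkResults:
    meta: ModelMeta
    task_results: list


def main_score_rows(results: dict) -> list:
    """Hypothetical helper: one (benchmark, model, task, language, score) row per entry."""
    rows = []
    for benchmark_name, model_results in results.items():
        for model_result in model_results:
            for task_result in model_result.task_results:
                for language, values in task_result.scores.items():
                    rows.append(
                        (
                            benchmark_name,
                            model_result.meta.name,
                            task_result.task_name,
                            language,
                            round(values["main_score"], 3),
                        )
                    )
    return rows


# Small example shaped like the dictionary returned above.
example_results = {
    "Danish": [
        BenchmarkResults(
            meta=ModelMeta(name="embed-multilingual-v3.0"),
            task_results=[
                TaskResult("Angry Tweets", {"da": {"main_score": 0.589111747851003}}, "accuracy"),
                TaskResult("LCC", {"da": {"main_score": 0.604}}, "accuracy"),
            ],
        )
    ]
}

for row in main_score_rows(example_results):
    print(row)
```

With real results, the same loop should work as long as the objects expose the attributes shown in their printed representation.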

## Adding a model

The benchmark uses a registry to add models. A model in `seb` consists of two things: 1) a metadata object (`seb.ModelMeta`) describing the model, and 2) a loader for the model itself, where the model is an object with an `encode` method as described by `seb.ModelInterface`. Here is a minimal example of how to add a new model:

In [7]:
from sentence_transformers import SentenceTransformer

import seb

model_name = "sentence-transformers/all-MiniLM-L12-v2"


def get_my_model() -> SentenceTransformer:
    return SentenceTransformer(model_name)


@seb.models.register(model_name)  # add the model to the registry
def create_all_mini_lm_l12_v2() -> seb.EmbeddingModel:
    hf_name = model_name

    # create meta data
    meta = seb.ModelMeta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference=f"https://huggingface.co/{hf_name}",
        languages=[],
        embedding_size=384,
    )
    return seb.EmbeddingModel(
        loader=get_my_model,
        meta=meta,
    )
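
The example above works because `SentenceTransformer` already provides an `encode` method. For a model that does not, the loader can return a custom object instead. The sketch below is a toy illustration of that idea. The `encode` signature (a list of sentences plus a `batch_size` keyword) is an assumption, so check `seb.ModelInterface` for the authoritative definition. `HashingEncoder` is hypothetical, and it returns nested lists only to stay dependency-free; a real model would typically return a NumPy array.

```python
import hashlib


class HashingEncoder:
    """Toy model: maps each sentence to a fixed-size vector derived from a hash."""

    def __init__(self, embedding_size: int = 8) -> None:
        if not 1 <= embedding_size <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("embedding_size must be between 1 and 32")
        self.embedding_size = embedding_size

    def encode(self, sentences: list, batch_size: int = 32, **kwargs) -> list:
        # Assumed signature; batch_size is accepted but unused in this toy model.
        embeddings = []
        for sentence in sentences:
            digest = hashlib.sha256(sentence.encode("utf-8")).digest()
            # One digest byte per dimension, scaled to the range [0, 1].
            embeddings.append([digest[i] / 255 for i in range(self.embedding_size)])
        return embeddings


def get_my_model() -> HashingEncoder:
    # Plays the same role as the loader function in the example above.
    return HashingEncoder(embedding_size=8)


model = get_my_model()
vectors = model.encode(["Hej verden", "Hei verden"])
print(len(vectors), len(vectors[0]))  # 2 8
```

A loader like `get_my_model` would then be passed as `loader=` in the same way as in the registration example, with `embedding_size` in the metadata set to match.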

Note that if you want to use the CLI with one of your own added models, you can import the registered functions from a file specified using the `--code` flag.