# Getting started
The documentation is minimal at the moment, as I am unsure how many people will use the package. If you want to use the package but find the documentation lacking, feel free to open an issue on GitHub.

## Running a task
To run a task, you need to fetch the task and a model, then evaluate the model on the task.


In [6]:
import seb

tasks = ["DKHate"]
model = seb.get_model("jonfd/electra-small-nordic")

# initialize benchmark with tasks
benchmark = seb.Benchmark(tasks=tasks)

# benchmark the model
benchmark_result = benchmark.evaluate_model(model)

In [7]:
benchmark_result  # examine output

BenchmarkResults(meta=ModelMeta(name='electra-small-nordic', description=None, huggingface_name='jonfd/electra-small-nordic', reference='https://huggingface.co/{hf_name}', languages=['da', 'no', 'sv']), task_results=[TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='59d12749a3c91a186063c7d729ec392fda94681c_1.0.3.dev0', time_of_run=datetime.datetime(2023, 7, 27, 13, 21, 43, 861342), scores={'da': {'accuracy': 0.5945288753799393, 'f1': 0.4912211182797449, 'ap': 0.15442320525050762, 'accuracy_stderr': 0.07818347662767612, 'f1_stderr': 0.05511334661624392, 'ap_stderr': 0.019081572459727296, 'main_score': 0.5945288753799393}}, main_score='accuracy')])

In [8]:
benchmark_result[0]  # examine the results for the first task

TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='59d12749a3c91a186063c7d729ec392fda94681c_1.0.3.dev0', time_of_run=datetime.datetime(2023, 7, 27, 13, 21, 43, 861342), scores={'da': {'accuracy': 0.5945288753799393, 'f1': 0.4912211182797449, 'ap': 0.15442320525050762, 'accuracy_stderr': 0.07818347662767612, 'f1_stderr': 0.05511334661624392, 'ap_stderr': 0.019081572459727296, 'main_score': 0.5945288753799393}}, main_score='accuracy')
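
To pull a single number out of a result like the one above, the nested `scores` dictionary can be indexed by language and then by metric. The sketch below works on a plain dictionary copied from the printed output; on a real result the same data would presumably be reachable as `benchmark_result[0].scores` and `benchmark_result[0].main_score`, which is an assumption based on the printed representation and not a documented API.

```python
# Values copied from the printed TaskResult above (subset of the metrics).
# Assumption: a real TaskResult exposes these as `.scores` and `.main_score`.
scores = {
    "da": {
        "accuracy": 0.5945288753799393,
        "f1": 0.4912211182797449,
        "accuracy_stderr": 0.07818347662767612,
        "f1_stderr": 0.05511334661624392,
    }
}
main_score = "accuracy"

# Report the main score and its standard error for every language.
for lang, metrics in scores.items():
    value = metrics[main_score]
    stderr = metrics[f"{main_score}_stderr"]
    summary = f"{lang}: {main_score} = {value:.3f} +/- {stderr:.3f}"
    print(summary)
```

For the values shown, this prints `da: accuracy = 0.595 +/- 0.078`.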

## Adding a model

The benchmark uses a registry to add models. A model in `seb` consists of two things: 1) a metadata object (`seb.ModelMeta`) describing the model, and 2) a loader for the model itself, which must provide an object with an `encode` method as described by `seb.ModelInterface`. Here is an example of how to add a model:

In [1]:
import seb

model_name = "sentence-transformers/all-MiniLM-L12-v2"


def get_my_model():
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


@seb.models.register(model_name)
def create_all_mini_lm_l12_v2() -> seb.SebModel:
    hf_name = model_name

    # create metadata
    meta = seb.ModelMeta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        # f-string so that {hf_name} is filled in with the actual model name
        reference=f"https://huggingface.co/{hf_name}",
        languages=[],
    )
    return seb.SebModel(
        loader=get_my_model,
        meta=meta,
    )

## Reproducing the Benchmark
The full benchmark can be reproduced with the following command:

In [None]:
# deliberately not running this test as it takes a while
from seb import run_benchmark

results = run_benchmark()

This runs the full benchmark on all registered models and all registered datasets. The results are returned as a dictionary in which the keys represent the benchmarks and the values are lists of benchmark results.
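
A dictionary of this shape can be walked with an ordinary loop or comprehension. The sketch below uses placeholder data only: the benchmark name, the model names, and the `Meta`/`Result` stand-ins are hypothetical and merely mimic the `meta.name` field seen in the output earlier. No real benchmark results are shown.

```python
from collections import namedtuple

# Hypothetical stand-ins for the returned objects; only `meta.name` is mimicked.
Meta = namedtuple("Meta", ["name"])
Result = namedtuple("Result", ["meta"])

# Placeholder data with the described shape: benchmark name -> list of results.
results = {
    "example-benchmark": [Result(Meta("model-a")), Result(Meta("model-b"))],
}

# Collect the names of the models evaluated in each benchmark.
models_per_benchmark = {
    benchmark_name: [r.meta.name for r in benchmark_results]
    for benchmark_name, benchmark_results in results.items()
}
print(models_per_benchmark)
```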