KennethEnevoldsen committed Jan 22, 2024
2 parents 3b07a4d + e298e66 commit 07efe8f
Showing 3 changed files with 18 additions and 4 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,19 @@



## v0.5.3 (2024-01-22)

### Fix

* fix: ScaLA now correctly wraps models to allow for task argument to be passed Renamed scala cache ([`a70c950`](https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark/commit/a70c950b0093924e9aa64dc4c0a4604cb868c864))

### Unknown

* Merge pull request #73 from KennethEnevoldsen/bug-scala-missing-task-encode-wrapper

Wraps ScaLA models in MTEBTaskModel ([`e2eee05`](https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark/commit/e2eee055775ffcc7019fa470dcd5d70a124416aa))


## v0.5.2 (2024-01-19)

### Documentation
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "seb"
-version = "0.5.2"
+version = "0.5.3"
authors = [
{ name = "Kenneth Enevoldsen", email = "Kennethcenevoldsen@gmail.com" },
]
7 changes: 4 additions & 3 deletions src/seb/registered_tasks/multilingual.py
@@ -5,7 +5,7 @@
from datasets import DatasetDict, concatenate_datasets

from seb.interfaces.model import Encoder
-from seb.interfaces.mteb_task import MTEBTask
+from seb.interfaces.mteb_task import MTEBTask, MTEBTaskModel
from seb.interfaces.task import Task
from seb.registries import tasks
from seb.result_dataclasses import TaskResult
@@ -84,7 +84,7 @@ def get_descriptive_stats(self) -> dict[str, Any]:
for text_column in self._text_columns:
texts += ds[split][text_column]

- document_lengths = [len(text) for text in texts]
+ document_lengths = np.array([len(text) for text in texts])

mean = np.mean(document_lengths)
std = np.std(document_lengths)
@@ -96,9 +96,10 @@ def get_descriptive_stats(self) -> dict[str, Any]:

     def evaluate(self, model: Encoder) -> TaskResult:
         scores = {}
+        _model = MTEBTaskModel(model, self)
         for lang, mteb_task in self.mteb_tasks.items():
             mteb_task.load_data()
-            score = mteb_task.evaluate(model)
+            score = mteb_task.evaluate(_model)
             scores[lang] = score
 
         return TaskResult(
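
The fix in `evaluate` passes a wrapped model to each MTEB task instead of the raw encoder, so that the task argument reaches `encode`. The sketch below illustrates that wrapping pattern in isolation. It is not the `seb` implementation: `TaskAwareEncoder`, `TaskBoundModel`, and `evaluate_without_task` are hypothetical names, and the toy "embedding" exists only to make the effect of the task argument visible.

```python
from typing import Any, Optional


class TaskAwareEncoder:
    """Hypothetical encoder whose encode() accepts an optional task argument."""

    def encode(
        self, sentences: list[str], task: Optional[str] = None, **kwargs: Any
    ) -> list[list[float]]:
        # Toy embedding: sentence length, shifted by 1.0 when a task is supplied.
        offset = 1.0 if task is not None else 0.0
        return [[len(s) + offset] for s in sentences]


class TaskBoundModel:
    """Hypothetical wrapper that stores a task and injects it into encode()."""

    def __init__(self, model: TaskAwareEncoder, task: str) -> None:
        self._model = model
        self._task = task

    def encode(self, sentences: list[str], **kwargs: Any) -> list[list[float]]:
        # Forward the call unchanged, adding the stored task.
        return self._model.encode(sentences, task=self._task, **kwargs)


def evaluate_without_task(model: Any, sentences: list[str]) -> list[list[float]]:
    """Stands in for an evaluator that only calls encode(sentences)."""
    return model.encode(sentences)


sentences = ["hej", "verden"]
raw = evaluate_without_task(TaskAwareEncoder(), sentences)
wrapped = evaluate_without_task(TaskBoundModel(TaskAwareEncoder(), task="ScaLA"), sentences)
print(raw)      # the encoder never sees a task
print(wrapped)  # the wrapper supplies the task on every call
```

Because the evaluator only knows the plain `encode(sentences)` call, binding the task in a wrapper once, before the loop over languages, keeps the evaluator untouched while still letting task-aware encoders receive it.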