mk-llm-eval

Evaluation framework for LLMs in Macedonian

This simple app can be used to get quick evaluation scores against several popular benchmarks that have been translated to Macedonian.

The app is compatible with openai's api format, more precisely the chat completions endpoint. This means it can be used to target openai's models or a custom model that is running in an openai compatible server via a framework such as vLLM for example.

Usage

To run the benchmarks just run the app via the cli:

python eval.py --benchmark trajkovnikola/arc_easy_mk --benchmark_split train --model_endpoint http://model-endpoint/v1 --model meta-llama/Meta-Llama-3-70B-Instruct  --num_samples 5000 --use_system_prompt

Arguments:

benchmark: name of the benchmark to run Currently supported benchmarks are:
- trajkovnikola/arc_easy_mk
- trajkovnikola/arc_challenge_mk
- trajkovnikola/winogrande_mk
- classla/COPA-MK
benchmark_split: different datasets have different splits
model_endpoint: the endpoint where the model is served
model: the model we want to use
num_samples: number of samples from the dataset to be used, default 10 for testing
use_system_prompt: whether to add a system prompt, if the model supports it

After the app finishes, it will print out the score achieved and save a csv file with the name of the benchmark with the responses to each sample.

To add a new benchmark, first of all we need the translated dataset to be pushed to huggingface and then we need to extend the PromptPrepper and ResultsParser classes if necessary to accommodate for the new benchmark.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
eval.py		eval.py
logger.py		logger.py
requirements.txt		requirements.txt
tasks.py		tasks.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mk-llm-eval

Usage

About

Releases

Packages

Languages

License

N13T/mk-llm-eval

Folders and files

Latest commit

History

Repository files navigation

mk-llm-eval

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages