
MMLU in Russian (Massive Multitask Language Understanding)

Quickstart

python3.9 mmlu_ru.py --hf_model_id "huggyllama/llama-7b" --k_shot 5 --lang "ru" --output_dir "results"

Possible parameters:

  • lang: "ru" or "en".
  • k_shot: 0 to 5.
  • hf_model_id: tested with the huggyllama LLaMA and IlyaGusev Saiga model series.
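
The same CLI covers the other configurations; for example, a zero-shot English run on LLaMA-13B (model id taken from the huggyllama series):

python3.9 mmlu_ru.py --hf_model_id "huggyllama/llama-13b" --k_shot 0 --lang "en" --output_dir "results"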

The script produces JSONL files containing the actual prompts and the model's scores for the MMLU answer choices ("A"/"B"/"C"/"D"), and CSV files with accuracy results at total, category, and subcategory granularity.
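
A minimal sketch of inspecting those outputs with pandas; the file names below are assumptions, so check what actually appears in your --output_dir:

import pandas as pd

# Hypothetical file names -- adjust to the files written to --output_dir.
rows = pd.read_json("results/llama-7b_ru_5shot.jsonl", lines=True)
print(rows.columns)  # prompts and per-choice ("A"/"B"/"C"/"D") scores

acc = pd.read_csv("results/llama-7b_ru_5shot_accuracy.csv")
print(acc)  # accuracy at total / category / subcategory granularity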

A CPU-only setup has not been tested.

Other models

To use other models, revisit the following (an illustrative sketch follows this list):

  • conversation.py for a suitable Conversation class
  • mmlu_ru.get_prompt_from_dataframes for any custom prompt construction (the default should work for both foundation and instruct-tuned LLMs)
  • mmlu_ru.load_model_components for custom loading settings
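
For illustration only, a minimal sketch of the kind of chat-style prompt wrapper a Conversation class provides for an instruct-tuned model; the class name, method, and template below are hypothetical, not the repo's actual conversation.py API:

class CustomConversation:
    """Hypothetical example: wraps a raw MMLU question block in a chat template."""

    system_prompt = "You are a helpful assistant."  # assumption, model-specific

    def get_prompt(self, question_block: str) -> str:
        # An instruct-tuned model expects its chat markup around the question;
        # a foundation model would consume question_block as-is.
        return (
            f"<s>system\n{self.system_prompt}</s>\n"
            f"<s>user\n{question_block}</s>\n"
            f"<s>bot\n"
        )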

MMLU Dataset

Dataset used: https://huggingface.co/datasets/NLPCoreTeam/mmlu_ru (translated into Russian via the Yandex.Translate API).

The MMLU dataset covers 57 tasks. Each task requires choosing the right answer out of four options for a given question. In total there are ~14K test samples.
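
A minimal sketch of loading the dataset with the Hugging Face datasets library; the config and split names are assumptions here, so consult the dataset card for the exact layout:

from datasets import load_dataset

# "abstract_algebra" is one of the 57 MMLU subjects; the available configs
# and splits are assumptions -- see the dataset card.
ds = load_dataset("NLPCoreTeam/mmlu_ru", "abstract_algebra")
print(ds)             # expect few-shot (dev) and test splits
print(ds["test"][0])  # one question with four answer options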

Evals

model        | MMLU EN (paper) | MMLU EN, k=5, ctx=2048 | MMLU RU, k=5, ctx=2048
Llama 1 7B   | 35.1            | 36.18                  | 31.65
Llama 1 13B  | 46.9            | 48.81                  | 38.03
Llama 1 33B  | 57.8            | 59.63                  | 49.06
Llama 1 65B  | 63.4            | 65.21                  | 53.96
Llama 2 7B   | 45.3            | 47.87                  | 37.86
Llama 2 13B  | 54.8            | 56.96                  | 45.29
Llama 2 34B  | 62.6            | unk                    | unk
Llama 2 70B  | 68.9            | 71.16                  | 62.86

Note that absolute scores may vary slightly compared to other evaluation harnesses, but inter-model comparisons should be stable.

Additional Resources

Contributions

Dataset translated and code adapted by the NLP Core Team (see the team's RnD Telegram channel).
