Add TemplateLM boilerplate LM class #1279

anjor · 2024-01-13T23:25:06Z

Replaces #1215

haileyschoelkopf

Thanks very much for working on this! I think it largely looks good, but may opt to merge this in after getting #1277 into it and fixing conflicts.

I wish we could also fold loglikelihood_rolling() into this, or cut out some of the generate_until() boilerplate. That might add too much indirection to HFLM, though, which is pedagogically nice to keep separate and self-contained, and I also realize we couldn't do it anyway because of HFLM's distributed handling.

If one of these decisions means that OpenAICompletionsLM couldn't inherit from TemplateLM, but both vLLM and HF could significantly reduce duplicated code, I think I'd prefer that on net, but others may differ in opinion.
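For readers following along, the trade-off above is easier to see with a rough sketch of the template-method pattern being discussed: the base class owns the shared `loglikelihood()` request handling, and each backend supplies only tokenization and scoring. This is a simplified illustration, not the code in `lm_eval/api/model.py`; `TemplateLMSketch` and `UniformCharLM` are hypothetical names, and details such as whitespace handling at the context/continuation boundary are omitted.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Tuple


class TemplateLMSketch(ABC):
    """Illustrative base class: shared request handling, backend-specific scoring."""

    @abstractmethod
    def tok_encode(self, text: str) -> List[int]:
        """Turn a string into token ids (backend-specific)."""

    @abstractmethod
    def _loglikelihood_tokens(
        self, pairs: List[Tuple[List[int], List[int]]]
    ) -> List[Tuple[float, bool]]:
        """Score each (context_ids, continuation_ids) pair (backend-specific)."""

    def loglikelihood(
        self, requests: List[Tuple[str, str]]
    ) -> List[Tuple[float, bool]]:
        # Shared boilerplate: tokenize every (context, continuation) pair here,
        # so subclasses never have to repeat this loop.
        encoded = [
            (self.tok_encode(context), self.tok_encode(continuation))
            for context, continuation in requests
        ]
        return self._loglikelihood_tokens(encoded)


class UniformCharLM(TemplateLMSketch):
    """Toy backend (an assumption for the demo): every token is equally likely."""

    vocab_size = 256

    def tok_encode(self, text: str) -> List[int]:
        # One token per character; only meaningful for code points below 256.
        return [ord(ch) for ch in text]

    def _loglikelihood_tokens(self, pairs):
        per_token = -math.log(self.vocab_size)
        # Returns (summed log-probability, is_greedy) per request; a uniform
        # model never counts as a greedy match, so the flag is always False.
        return [(per_token * len(cont), False) for _ctx, cont in pairs]


lm = UniformCharLM()
scores = lm.loglikelihood([("ab", "cd"), ("hello ", "world")])
```

In this sketch a new backend only implements the two abstract methods; methods like loglikelihood_rolling() and generate_until() are left out, which mirrors the question of how much should live in the shared base class.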

lm_eval/api/model.py

anjor · 2024-01-25T15:11:03Z

Looks like #1277 merged. I will work on conflicts now.

anjor · 2024-02-16T21:43:29Z

@haileyschoelkopf any blockers here, or can we get it merged?

haileyschoelkopf · 2024-02-19T19:37:52Z

@anjor just https://github.com/EleutherAI/lm-evaluation-harness/pull/1279/files#r1453700886 I think! Can review today or tomorrow.

haileyschoelkopf

Thank you @anjor ! Appreciate this and hope it'll be helpful to other contributors!

lm_eval/api/model.py

LSinev · 2024-02-22T06:37:47Z

Since subclassing HFLM is encouraged in the documentation now that this PR is merged, is there (or will there be) a way to place files containing new subclasses under a (sub)path passed via the --include_path (or similarly named) argument of __main__.py?

haileyschoelkopf · 2024-02-22T15:08:29Z

@LSinev If you'd be willing to create a PR for this, we'd gladly accept it! Otherwise, we'll make an issue for it, but we're uncertain precisely when we'll have time to add it.

Making --include_path serve this purpose when it points to a Python module (importing the module, so that decorated classes get added to the registries) might suffice for this?
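To make the idea concrete, here is a minimal sketch of "import a file by path so its decorators run". It does not use lm-eval's actual registry or CLI code: `MODEL_REGISTRY`, `register_model`, the `toy_registry` module name, and `include_python_file` are all stand-ins invented for this example, and only standard-library `importlib` calls are used.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path

# Toy stand-in for a model registry; NOT lm-eval's real registry.
MODEL_REGISTRY = {}


def register_model(*names):
    """Class decorator that records the class under each given name."""

    def decorate(cls):
        for name in names:
            if name in MODEL_REGISTRY:
                raise ValueError(f"model name {name!r} is already registered")
            MODEL_REGISTRY[name] = cls
        return cls

    return decorate


# Expose the decorator under an importable (hypothetical) module name so that
# user files can write `from toy_registry import register_model`.
_toy = types.ModuleType("toy_registry")
_toy.register_model = register_model
sys.modules["toy_registry"] = _toy


def include_python_file(path):
    """Import a .py file by path; its class decorators run as a side effect."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo():
    """Write a throwaway plugin file, include it, and list registered names."""
    source = (
        "from toy_registry import register_model\n"
        "\n"
        "@register_model('my-hf-subclass')\n"
        "class MyLM:\n"
        "    pass\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        plugin = Path(tmp) / "my_models.py"
        plugin.write_text(source, encoding="utf-8")
        include_python_file(plugin)
    return sorted(MODEL_REGISTRY)
```

Calling `demo()` once leaves the plugin's class in `MODEL_REGISTRY` under the name it declared. Whether --include_path should treat a `.py` file this way, or walk a directory of them, is exactly the open design question here.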

LSinev · 2024-02-23T10:42:33Z

I am not familiar enough with the overall structure of this project, so I am not ready to open a PR for such a large contribution. A new issue describing ideas for a possible implementation would be helpful in case someone opens such a PR in the future.
I am experimenting with custom Python tasks for Hugging Face models in a fork of this project, and I have no experience yet with even basic usage of --include_path, so I can't answer your question. Perhaps, just as tasks are processed by the TaskManager class, models should be processed by an LMManager class; a ConfigurableLM with YAML setups may also be a good idea (but I have only used HFLM, with no patching needed, so I don't know whether it is needed).

haileyschoelkopf · 2024-02-23T16:28:32Z

No worries! I'll leave the issue open in case there is someone who wants to add this. I think an LMManager class might be overkill for this, but I'm curious what sorts of features you'd like out of a ConfigurableLM class!

@StellaAthena

* loglikelihood refactor using template lm * linter * fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275) * Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261) * Make parallelize=True distinction clearer in documentation. * run linter * Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273) * benchmark yamls allow minor edits of already registered tasks * add documentation * removed print * Fix data-parallel evaluation with quantized models (EleutherAI#1270) * add WIP device_map overrides * update handling outside of accelerate launcher * change .to(device) log to debug level * run linter * Rework documentation for explaining local dataset (EleutherAI#1284) * rewor documentation for explaining local dataset * fix typo * Update new_task_guide.md * Re-add citation It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in. * Update CITATION.bib (EleutherAI#1285) Bumping CITATION.bib to match re-adding the citation in readme. 
cc @StellaAthena * Update nq_open.yaml (EleutherAI#1289) * Update README.md with custom integration doc (EleutherAI#1298) * Update README.md * punctuation --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update nq_open.yaml (EleutherAI#1305) * Update nq_open.yaml change regex * Bump NQ version --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update task_guide.md (EleutherAI#1306) * Update pyproject.toml (EleutherAI#1312) * Fix polemo2_in.yaml config name (EleutherAI#1313) * Update pyproject.toml (EleutherAI#1314) * Fix group register (EleutherAI#1315) * tuple should be considered as well * set option to keep callable as callable * Update task_guide.md (EleutherAI#1316) * Update polemo2_in.yaml (EleutherAI#1318) * don't pass extra kwargs to mamba any more (EleutherAI#1328) * Fix Issue regarding stderr (EleutherAI#1327) * add fix fordeciding if stderr is N/A or not * process N/A * Add `local-completions` support using OpenAI interface (EleutherAI#1277) * Add `local-completions` support using OpenAI interface * Refactor oa_completion * Address tokenizer comments and change request chunks to batch size * Add warning message for tiktoken backend * fix formatting * fix whitespace * Update README.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fallback to classname when LM doesnt have config (EleutherAI#1334) * fix a trailing whitespace that breaks a lint job (EleutherAI#1335) * skip "benchmarks" in changed_tasks (EleutherAI#1336) * Update migrated HF dataset paths (EleutherAI#1332) * Update arc_easy.yaml * Update flan_cot.yaml * update HF dataset path * Update freeform.yaml * Update flan_cot.yaml --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai> * Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331) * don't use get_task_dict() as a helper, it will download the dataset! 
* pre-commit * Update README.md --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai> * manage default (greedy) gen_kwargs in vllm (EleutherAI#1341) * manage default (greedy) gen_kwargs in vllm better * mirror HF `do_sample` * just need to set temp=0 for greedy * modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345) * update links to task_guide.md (EleutherAI#1348) * `Filter` docs not offset by `doc_id` (EleutherAI#1349) * get `doc` from instance * acceletate bugfix: get ground doc from instance * convert filter to `process_result` * get docs from instances in `FilterEnsemble` * rename * nit * better looping * fix typehint * Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330) * Update README.md * [!Tip] * Refix issue regarding stderr (EleutherAI#1357) * Add causalLM OpenVino models (EleutherAI#1290) * added intel optimum * added intel optimum in readme * modified intel optimum * modified intel optimum * modified intel optimum * modified install optimum * modified path of IR file * added openvino_device * added openvino_device2 * changed optimum-causal to openvino-causal * Update README.md * Update README.md * remove `lm_eval.base` import * update openvino-causal -> openvino ; pass device through super().__init__() * Update README.md * Add optimum to tests dependencies * apply pre-commit * fix so tests pass --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai> * Apply some best practices and guideline recommendations to code (EleutherAI#1363) * raise Exception, not a string Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions * Apply PEP8 recommendation to prefer isinstance "Object type comparisons should always use isinstance() instead of comparing types directly" https://peps.python.org/pep-0008/ * 
Remove dangerous default mutable values in arguments https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html * Format logging messages with fstring (not with format) Additional info https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html There are also discussions about the speed of formatting while logging or some unintended code executions pylint-dev/pylint#2395 https://stackoverflow.com/a/54368109 but at least one format (fstring one) will be used throughout the project * Specify utf-8 encoding for `open` explicitly If not specified, it may be supposed differently in different environments, OSes, and Python versions. See https://peps.python.org/pep-0597/ https://docs.python.org/3.11/library/locale.html#locale.getencoding https://docs.python.org/3.10/library/os.html#utf8-mode https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages. 
* Use inline-ignoring comments to pass pre-commit instead of identity process https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors https://www.flake8rules.com/rules/F841.html flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression * serialize callable functions in config (EleutherAI#1367) * delay filter init; remove `*args` (EleutherAI#1369) * delay filter init; remove `*args` * bugfix * optimize * type hint * Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329) * don't override do_sample if no value for it is passed * Update gen_kwargs override condition * Update huggingface.py * Update huggingface.py * run linters * silence an erroneous warning * Publish to pypi (EleutherAI#1194) * publish to pypi * lint * Update publish.yml * minor * Make dependencies compatible with PyPI (EleutherAI#1378) * make deps not point to github urls * formatting * try making PyPI only run on tag pushes * Add support for RWKV models with World tokenizer (EleutherAI#1374) * Add support for RWKV models with World tokenizer The RWKV line of model with the World tokenizer, does not allow the padding token to be configured, and has its value preset as 0 This however fails all the "if set" checks, and would cause the tokenizer to crash. A tokenizer class name check was added, in addition to a model type check, as there exists RWKV models which uses the neox tokenizers * Update huggingface.py Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes. * Comply with formatting guidelines * fix format --------- Co-authored-by: Stella Biderman <stellabiderman@gmail.com> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add bypass metric (EleutherAI#1156) * add bypass metric * fixed `bypass` metric. 
* add task attributes if predict_only * add `predict_only` checks * add docs * added `overide_metric`, `override_config` to `Task` * nits * nit * changed --predict_only to generations; nits * nits * nits * change gen_kwargs warning * add note about `--predict_only` in README.md * added `predict_only` * move table to bottom * nit * change null aggregation to bypass (conflict) * bugfix; default `temp=0.0` * typo * loglikelihood refactor using template lm * lint * code review * neuron optimum * Mention TemplateLM in model_guide.md * Update lm_eval/api/model.py * fix linter * fix format * fix format * fix format --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: Lintang Sutawika <lintang@eleuther.ai> Co-authored-by: Stella Biderman <stellabiderman@gmail.com> Co-authored-by: Mark Saroufim <marksaroufim@meta.com> Co-authored-by: Hannibal046 <38466901+Hannibal046@users.noreply.github.com> Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com> Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Co-authored-by: kwrobel.eth <djstrong@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Brian Vaughan <nairbv@users.noreply.github.com> Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by: thnkinbtfly <70014488+thnkinbtfly@users.noreply.github.com> Co-authored-by: NoushNabi <33136068+NoushNabi@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai> Co-authored-by: LSinev <LSinev@users.noreply.github.com> Co-authored-by: Eugene Cheah <PicoCreator@users.noreply.github.com>

@StellaAthena

* loglikelihood refactor using template lm * linter * fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275) * Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261) * Make parallelize=True distinction clearer in documentation. * run linter * Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273) * benchmark yamls allow minor edits of already registered tasks * add documentation * removed print * Fix data-parallel evaluation with quantized models (EleutherAI#1270) * add WIP device_map overrides * update handling outside of accelerate launcher * change .to(device) log to debug level * run linter * Rework documentation for explaining local dataset (EleutherAI#1284) * rewor documentation for explaining local dataset * fix typo * Update new_task_guide.md * Re-add citation It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in. * Update CITATION.bib (EleutherAI#1285) Bumping CITATION.bib to match re-adding the citation in readme. 
cc @StellaAthena * Update nq_open.yaml (EleutherAI#1289) * Update README.md with custom integration doc (EleutherAI#1298) * Update README.md * punctuation --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update nq_open.yaml (EleutherAI#1305) * Update nq_open.yaml change regex * Bump NQ version --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update task_guide.md (EleutherAI#1306) * Update pyproject.toml (EleutherAI#1312) * Fix polemo2_in.yaml config name (EleutherAI#1313) * Update pyproject.toml (EleutherAI#1314) * Fix group register (EleutherAI#1315) * tuple should be considered as well * set option to keep callable as callable * Update task_guide.md (EleutherAI#1316) * Update polemo2_in.yaml (EleutherAI#1318) * don't pass extra kwargs to mamba any more (EleutherAI#1328) * Fix Issue regarding stderr (EleutherAI#1327) * add fix fordeciding if stderr is N/A or not * process N/A * Add `local-completions` support using OpenAI interface (EleutherAI#1277) * Add `local-completions` support using OpenAI interface * Refactor oa_completion * Address tokenizer comments and change request chunks to batch size * Add warning message for tiktoken backend * fix formatting * fix whitespace * Update README.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fallback to classname when LM doesnt have config (EleutherAI#1334) * fix a trailing whitespace that breaks a lint job (EleutherAI#1335) * skip "benchmarks" in changed_tasks (EleutherAI#1336) * Update migrated HF dataset paths (EleutherAI#1332) * Update arc_easy.yaml * Update flan_cot.yaml * update HF dataset path * Update freeform.yaml * Update flan_cot.yaml --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai> * Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331) * don't use get_task_dict() as a helper, it will download the dataset! 
* pre-commit * Update README.md --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai> * manage default (greedy) gen_kwargs in vllm (EleutherAI#1341) * manage default (greedy) gen_kwargs in vllm better * mirror HF `do_sample` * just need to set temp=0 for greedy * modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345) * update links to task_guide.md (EleutherAI#1348) * `Filter` docs not offset by `doc_id` (EleutherAI#1349) * get `doc` from instance * acceletate bugfix: get ground doc from instance * convert filter to `process_result` * get docs from instances in `FilterEnsemble` * rename * nit * better looping * fix typehint * Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330) * Update README.md * [!Tip] * Refix issue regarding stderr (EleutherAI#1357) * Add causalLM OpenVino models (EleutherAI#1290) * added intel optimum * added intel optimum in readme * modified intel optimum * modified intel optimum * modified intel optimum * modified install optimum * modified path of IR file * added openvino_device * added openvino_device2 * changed optimum-causal to openvino-causal * Update README.md * Update README.md * remove `lm_eval.base` import * update openvino-causal -> openvino ; pass device through super().__init__() * Update README.md * Add optimum to tests dependencies * apply pre-commit * fix so tests pass --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai> * Apply some best practices and guideline recommendations to code (EleutherAI#1363) * raise Exception, not a string Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions * Apply PEP8 recommendation to prefer isinstance "Object type comparisons should always use isinstance() instead of comparing types directly" https://peps.python.org/pep-0008/ * 
Remove dangerous default mutable values in arguments https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html * Format logging messages with fstring (not with format) Additional info https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html There are also discussions about the speed of formatting while logging or some unintended code executions pylint-dev/pylint#2395 https://stackoverflow.com/a/54368109 but at least one format (fstring one) will be used throughout the project * Specify utf-8 encoding for `open` explicitly If not specified, it may be supposed differently in different environments, OSes, and Python versions. See https://peps.python.org/pep-0597/ https://docs.python.org/3.11/library/locale.html#locale.getencoding https://docs.python.org/3.10/library/os.html#utf8-mode https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages. 
* Use inline-ignoring comments to pass pre-commit instead of identity process https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors https://www.flake8rules.com/rules/F841.html flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression * serialize callable functions in config (EleutherAI#1367) * delay filter init; remove `*args` (EleutherAI#1369) * delay filter init; remove `*args` * bugfix * optimize * type hint * Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329) * don't override do_sample if no value for it is passed * Update gen_kwargs override condition * Update huggingface.py * Update huggingface.py * run linters * silence an erroneous warning * Publish to pypi (EleutherAI#1194) * publish to pypi * lint * Update publish.yml * minor * Make dependencies compatible with PyPI (EleutherAI#1378) * make deps not point to github urls * formatting * try making PyPI only run on tag pushes * Add support for RWKV models with World tokenizer (EleutherAI#1374) * Add support for RWKV models with World tokenizer The RWKV line of model with the World tokenizer, does not allow the padding token to be configured, and has its value preset as 0 This however fails all the "if set" checks, and would cause the tokenizer to crash. A tokenizer class name check was added, in addition to a model type check, as there exists RWKV models which uses the neox tokenizers * Update huggingface.py Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes. * Comply with formatting guidelines * fix format --------- Co-authored-by: Stella Biderman <stellabiderman@gmail.com> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add bypass metric (EleutherAI#1156) * add bypass metric * fixed `bypass` metric. 
* add task attributes if predict_only * add `predict_only` checks * add docs * added `overide_metric`, `override_config` to `Task` * nits * nit * changed --predict_only to generations; nits * nits * nits * change gen_kwargs warning * add note about `--predict_only` in README.md * added `predict_only` * move table to bottom * nit * change null aggregation to bypass (conflict) * bugfix; default `temp=0.0` * typo * loglikelihood refactor using template lm * lint * code review * neuron optimum * Mention TemplateLM in model_guide.md * Update lm_eval/api/model.py * fix linter * fix format * fix format * fix format --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: Lintang Sutawika <lintang@eleuther.ai> Co-authored-by: Stella Biderman <stellabiderman@gmail.com> Co-authored-by: Mark Saroufim <marksaroufim@meta.com> Co-authored-by: Hannibal046 <38466901+Hannibal046@users.noreply.github.com> Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com> Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Co-authored-by: kwrobel.eth <djstrong@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Brian Vaughan <nairbv@users.noreply.github.com> Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by: thnkinbtfly <70014488+thnkinbtfly@users.noreply.github.com> Co-authored-by: NoushNabi <33136068+NoushNabi@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai> Co-authored-by: LSinev <LSinev@users.noreply.github.com> Co-authored-by: Eugene Cheah <PicoCreator@users.noreply.github.com>

@StellaAthena

* loglikelihood refactor using template lm * linter * fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275) * Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261) * Make parallelize=True distinction clearer in documentation. * run linter * Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273) * benchmark yamls allow minor edits of already registered tasks * add documentation * removed print * Fix data-parallel evaluation with quantized models (EleutherAI#1270) * add WIP device_map overrides * update handling outside of accelerate launcher * change .to(device) log to debug level * run linter * Rework documentation for explaining local dataset (EleutherAI#1284) * rewor documentation for explaining local dataset * fix typo * Update new_task_guide.md * Re-add citation It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in. * Update CITATION.bib (EleutherAI#1285) Bumping CITATION.bib to match re-adding the citation in readme. 
cc @StellaAthena * Update nq_open.yaml (EleutherAI#1289) * Update README.md with custom integration doc (EleutherAI#1298) * Update README.md * punctuation --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update nq_open.yaml (EleutherAI#1305) * Update nq_open.yaml change regex * Bump NQ version --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update task_guide.md (EleutherAI#1306) * Update pyproject.toml (EleutherAI#1312) * Fix polemo2_in.yaml config name (EleutherAI#1313) * Update pyproject.toml (EleutherAI#1314) * Fix group register (EleutherAI#1315) * tuple should be considered as well * set option to keep callable as callable * Update task_guide.md (EleutherAI#1316) * Update polemo2_in.yaml (EleutherAI#1318) * don't pass extra kwargs to mamba any more (EleutherAI#1328) * Fix Issue regarding stderr (EleutherAI#1327) * add fix fordeciding if stderr is N/A or not * process N/A * Add `local-completions` support using OpenAI interface (EleutherAI#1277) * Add `local-completions` support using OpenAI interface * Refactor oa_completion * Address tokenizer comments and change request chunks to batch size * Add warning message for tiktoken backend * fix formatting * fix whitespace * Update README.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fallback to classname when LM doesnt have config (EleutherAI#1334) * fix a trailing whitespace that breaks a lint job (EleutherAI#1335) * skip "benchmarks" in changed_tasks (EleutherAI#1336) * Update migrated HF dataset paths (EleutherAI#1332) * Update arc_easy.yaml * Update flan_cot.yaml * update HF dataset path * Update freeform.yaml * Update flan_cot.yaml --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai> * Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331) * don't use get_task_dict() as a helper, it will download the dataset! 
* pre-commit * Update README.md --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai> * manage default (greedy) gen_kwargs in vllm (EleutherAI#1341) * manage default (greedy) gen_kwargs in vllm better * mirror HF `do_sample` * just need to set temp=0 for greedy * modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345) * update links to task_guide.md (EleutherAI#1348) * `Filter` docs not offset by `doc_id` (EleutherAI#1349) * get `doc` from instance * acceletate bugfix: get ground doc from instance * convert filter to `process_result` * get docs from instances in `FilterEnsemble` * rename * nit * better looping * fix typehint * Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330) * Update README.md * [!Tip] * Refix issue regarding stderr (EleutherAI#1357) * Add causalLM OpenVino models (EleutherAI#1290) * added intel optimum * added intel optimum in readme * modified intel optimum * modified intel optimum * modified intel optimum * modified install optimum * modified path of IR file * added openvino_device * added openvino_device2 * changed optimum-causal to openvino-causal * Update README.md * Update README.md * remove `lm_eval.base` import * update openvino-causal -> openvino ; pass device through super().__init__() * Update README.md * Add optimum to tests dependencies * apply pre-commit * fix so tests pass --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai> * Apply some best practices and guideline recommendations to code (EleutherAI#1363) * raise Exception, not a string Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions * Apply PEP8 recommendation to prefer isinstance "Object type comparisons should always use isinstance() instead of comparing types directly" https://peps.python.org/pep-0008/ * 
Remove dangerous default mutable values in arguments https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html * Format logging messages with fstring (not with format) Additional info https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html There are also discussions about the speed of formatting while logging or some unintended code executions pylint-dev/pylint#2395 https://stackoverflow.com/a/54368109 but at least one format (fstring one) will be used throughout the project * Specify utf-8 encoding for `open` explicitly If not specified, it may be supposed differently in different environments, OSes, and Python versions. See https://peps.python.org/pep-0597/ https://docs.python.org/3.11/library/locale.html#locale.getencoding https://docs.python.org/3.10/library/os.html#utf8-mode https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages. 
* Use inline-ignoring comments to pass pre-commit instead of identity process https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors https://www.flake8rules.com/rules/F841.html flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression * serialize callable functions in config (EleutherAI#1367) * delay filter init; remove `*args` (EleutherAI#1369) * delay filter init; remove `*args` * bugfix * optimize * type hint * Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329) * don't override do_sample if no value for it is passed * Update gen_kwargs override condition * Update huggingface.py * Update huggingface.py * run linters * silence an erroneous warning * Publish to pypi (EleutherAI#1194) * publish to pypi * lint * Update publish.yml * minor * Make dependencies compatible with PyPI (EleutherAI#1378) * make deps not point to github urls * formatting * try making PyPI only run on tag pushes * Add support for RWKV models with World tokenizer (EleutherAI#1374) * Add support for RWKV models with World tokenizer The RWKV line of model with the World tokenizer, does not allow the padding token to be configured, and has its value preset as 0 This however fails all the "if set" checks, and would cause the tokenizer to crash. A tokenizer class name check was added, in addition to a model type check, as there exists RWKV models which uses the neox tokenizers * Update huggingface.py Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes. * Comply with formatting guidelines * fix format --------- Co-authored-by: Stella Biderman <stellabiderman@gmail.com> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add bypass metric (EleutherAI#1156) * add bypass metric * fixed `bypass` metric. 
* add task attributes if predict_only * add `predict_only` checks * add docs * added `overide_metric`, `override_config` to `Task` * nits * nit * changed --predict_only to generations; nits * nits * nits * change gen_kwargs warning * add note about `--predict_only` in README.md * added `predict_only` * move table to bottom * nit * change null aggregation to bypass (conflict) * bugfix; default `temp=0.0` * typo * loglikelihood refactor using template lm * lint * code review * neuron optimum * Mention TemplateLM in model_guide.md * Update lm_eval/api/model.py * fix linter * fix format * fix format * fix format --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: Lintang Sutawika <lintang@eleuther.ai> Co-authored-by: Stella Biderman <stellabiderman@gmail.com> Co-authored-by: Mark Saroufim <marksaroufim@meta.com> Co-authored-by: Hannibal046 <38466901+Hannibal046@users.noreply.github.com> Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com> Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Co-authored-by: kwrobel.eth <djstrong@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Brian Vaughan <nairbv@users.noreply.github.com> Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by: thnkinbtfly <70014488+thnkinbtfly@users.noreply.github.com> Co-authored-by: NoushNabi <33136068+NoushNabi@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai> Co-authored-by: LSinev <LSinev@users.noreply.github.com> Co-authored-by: Eugene Cheah <PicoCreator@users.noreply.github.com>
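The RWKV World tokenizer note in the commit message above says a padding token id preset to 0 "fails all the 'if set' checks". A minimal sketch of that pitfall follows, assuming the checks were plain truthiness tests. The actual code in `huggingface.py` is not shown in this thread, so `FakeTokenizer` and both helper functions are hypothetical illustrations and not part of lm_eval or transformers.

```python
from typing import Optional


class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer; not a real library class."""

    def __init__(self, pad_token_id: Optional[int]) -> None:
        self.pad_token_id = pad_token_id


def pad_is_set_truthy(tok: FakeTokenizer) -> bool:
    # Assumed buggy pattern: 0 is falsy in Python, so a valid
    # pad id of 0 is treated as if no pad token were configured.
    return bool(tok.pad_token_id)


def pad_is_set(tok: FakeTokenizer) -> bool:
    # Comparing against None distinguishes "unset" from "set to 0".
    return tok.pad_token_id is not None


rwkv_like = FakeTokenizer(pad_token_id=0)  # pad id preset to 0
unset = FakeTokenizer(pad_token_id=None)   # no pad token configured
```

With these definitions, `pad_is_set_truthy(rwkv_like)` reports the pad token as missing, while `pad_is_set(rwkv_like)` reports it as present. The merged fix took a different route, checking the tokenizer class name, so this sketch only illustrates why the original checks failed.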

loglikelihood refactor using template lm

b4fcc09

anjor requested review from haileyschoelkopf and lintangsutawika as code owners January 13, 2024 23:25

linter

ea44741

haileyschoelkopf reviewed Jan 16, 2024

lm_eval/api/model.py (resolved)

haileyschoelkopf and others added 24 commits January 31, 2024 22:41

fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275)

4e6a870

Make parallelize=True vs. accelerate launch distinction clearer in docs (EleutherAI#1261)

d444e9a

* Make parallelize=True distinction clearer in documentation. * run linter

Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273)

d41a351

* benchmark yamls allow minor edits of already registered tasks * add documentation * removed print

Fix data-parallel evaluation with quantized models (EleutherAI#1270)

db3ee51

* add WIP device_map overrides * update handling outside of accelerate launcher * change .to(device) log to debug level * run linter

Rework documentation for explaining local dataset (EleutherAI#1284)

1c07f70

* rewor documentation for explaining local dataset * fix typo * Update new_task_guide.md

Re-add citation

b716761

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

Update CITATION.bib (EleutherAI#1285)

370cbbe

Bumping CITATION.bib to match re-adding the citation in readme. cc @StellaAthena

Update nq_open.yaml (EleutherAI#1289)

4702624

Update README.md with custom integration doc (EleutherAI#1298)

0013399

* Update README.md * punctuation --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Update nq_open.yaml (EleutherAI#1305)

8783281

* Update nq_open.yaml change regex * Bump NQ version --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Update task_guide.md (EleutherAI#1306)

5762058

Update pyproject.toml (EleutherAI#1312)

55e51ec

Fix polemo2_in.yaml config name (EleutherAI#1313)

5fb93fc

Update pyproject.toml (EleutherAI#1314)

7724bf1

Fix group register (EleutherAI#1315)

3688b1f

* tuple should be considered as well * set option to keep callable as callable

Update task_guide.md (EleutherAI#1316)

b6051f9

Update polemo2_in.yaml (EleutherAI#1318)

d0de14e

don't pass extra kwargs to mamba any more (EleutherAI#1328)

e7daca5

Fix Issue regarding stderr (EleutherAI#1327)

e8bc89d

* add fix fordeciding if stderr is N/A or not * process N/A

fallback to classname when LM doesnt have config (EleutherAI#1334)

ea12d33

fix a trailing whitespace that breaks a lint job (EleutherAI#1335)

413f183

skip "benchmarks" in changed_tasks (EleutherAI#1336)

9703c8a

Update migrated HF dataset paths (EleutherAI#1332)

0ffc6b6

* Update arc_easy.yaml * Update flan_cot.yaml * update HF dataset path * Update freeform.yaml * Update flan_cot.yaml --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

anjor and others added 7 commits January 31, 2024 22:42

Publish to pypi (EleutherAI#1194)

5ff7c41

* publish to pypi * lint * Update publish.yml * minor

Make dependencies compatible with PyPI (EleutherAI#1378)

68a193b

* make deps not point to github urls * formatting * try making PyPI only run on tag pushes

loglikelihood refactor using template lm

8d974bf

Merge branch 'main' into anjor/loglikelihood-refactor-2

3b07548

lint

b9436a9

haileyschoelkopf mentioned this pull request Feb 1, 2024

Support for Inf2 optimum class [WIP] #1364

Merged

anjor and others added 4 commits February 21, 2024 20:24

code review

907968c

Merge branch 'main' into anjor/loglikelihood-refactor-2

bb5481a

neuron optimum

129a2ee

Mention TemplateLM in model_guide.md

a97260e

haileyschoelkopf approved these changes Feb 22, 2024

lm_eval/api/model.py (resolved)

haileyschoelkopf added 5 commits February 21, 2024 19:26

Update lm_eval/api/model.py

63564e7

fix linter

acff950

fix format

5c17420

fix format

b481947

fix format

63d58f7

haileyschoelkopf changed the title ~~Loglikelihood refactor attempt 2 using template lm~~ Add TemplateLM boilerplate LM class Feb 22, 2024

haileyschoelkopf merged commit ba5cdf0 into EleutherAI:main Feb 22, 2024
8 checks passed

anjor deleted the anjor/loglikelihood-refactor-2 branch February 23, 2024 22:37


anjor commented Jan 13, 2024 (edited)

haileyschoelkopf left a comment

anjor commented Jan 25, 2024

anjor commented Feb 16, 2024

haileyschoelkopf commented Feb 19, 2024

haileyschoelkopf left a comment

LSinev commented Feb 22, 2024 (edited)

haileyschoelkopf commented Feb 22, 2024

LSinev commented Feb 23, 2024

haileyschoelkopf commented Feb 23, 2024
