
Add local-completions support using OpenAI interface #1277

Merged: 7 commits merged into EleutherAI:main on Jan 22, 2024

Conversation

@mgoin (Contributor) commented on Jan 12, 2024

Since the ChatCompletions API doesn't support loglikelihood requests, adding a locally hosted, OpenAI-compatible server interface for Completions provides an easier way to get loglikelihoods for open-source models.
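For background on why Completions works here where ChatCompletions doesn't: with echo=True and logprobs set, a Completions endpoint returns per-token logprobs for the prompt itself, so the loglikelihood of a continuation can be read straight off the response. A minimal sketch (the helper name and mocked response are illustrative; the response shape follows the OpenAI Completions format):

```python
def continuation_loglikelihood(response: dict, prompt_len: int) -> float:
    """Sum logprobs of the continuation tokens from an OpenAI-style
    /v1/completions response requested with echo=True and logprobs set.

    With echo=True the prompt tokens are scored too, so we skip the
    first `prompt_len` entries; the very first token has no logprob
    (returned as None), which the filter below also handles.
    """
    token_logprobs = response["choices"][0]["logprobs"]["token_logprobs"]
    return sum(lp for lp in token_logprobs[prompt_len:] if lp is not None)


# Mocked response: 3 prompt tokens followed by 2 continuation tokens.
mock = {"choices": [{"logprobs": {"token_logprobs": [None, -1.2, -0.7, -0.5, -0.25]}}]}
print(continuation_loglikelihood(mock, 3))  # -0.75
```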

@haileyschoelkopf (Collaborator) left a comment


Thanks very much for the PR!

Would you be willing to make the following changes:

If so, then I think we're all set!

@mgoin (Contributor, author) commented on Jan 12, 2024

Thanks for the fast review @haileyschoelkopf! I also added a tokenizer input to specify the transformers model_id to use for the tokenizer, for cases where the model name doesn't exactly match a transformers model_id. Maybe we could remove tokenizer_backend and just use transformers whenever a tokenizer is defined? Here's an example of how I tested this with deepsparse:

Server:

deepsparse.server --integration openai --task text-generation --model_path hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds

Client lm-eval:

lm_eval --model local-completions --model_args base_url=http://localhost:5543/v1,model=hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds,tokenizer_backend=huggingface,tokenizer=mgoin/llama-2-7b-gsm8k-pruned60-quant-ds --tasks gsm8k --num_fewshot 0
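Before pointing lm-eval at a local server like this, it can be worth sanity-checking that the endpoint actually returns echoed prompt logprobs, which the loglikelihood path relies on. A hedged sketch: the payload fields follow the OpenAI Completions API, but whether a given server accepts max_tokens=0 with echo=True (score the prompt only) is server-dependent.

```python
import json
import urllib.request

# Illustrative request body for a local OpenAI-compatible server.
payload = {
    "model": "hf:mgoin/llama-2-7b-gsm8k-pruned60-quant-ds",
    "prompt": "The quick brown fox",
    "max_tokens": 0,     # score the prompt only; support varies by server
    "echo": True,        # return (and score) the prompt tokens
    "logprobs": 1,
    "temperature": 0.0,
}

def supports_echoed_logprobs(response: dict) -> bool:
    """True if a /v1/completions response carries per-token logprobs
    for the echoed prompt -- the minimum that loglikelihood-style
    requests need from the server."""
    try:
        lps = response["choices"][0]["logprobs"]["token_logprobs"]
    except (KeyError, IndexError, TypeError):
        return False
    return isinstance(lps, list) and len(lps) > 0

# To actually send it (requires a running server, e.g. the deepsparse
# server started above):
# req = urllib.request.Request("http://localhost:5543/v1/completions",
#                              data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(supports_echoed_logprobs(json.load(urllib.request.urlopen(req))))
```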

@haileyschoelkopf (Collaborator) commented on Jan 13, 2024

Maybe we could remove tokenizer_backend and just use transformers if a tokenizer is defined?

Sounds good to me! We may want to add a check or warning for the case where someone passes base_url but not tokenizer at initialization, though, or at least a log message stating which tokenizer name and backend are being used.

You may need to copy in this handling for the HF tokenizer case, though:

https://github.com/EleutherAI/lm-evaluation-harness/blob/89618bf8421d27c8cf28004d616b33fc5b305ceb/lm_eval/models/huggingface.py#L228C9-L239C76

As some models don't have an EOS token set (and some, like Qwen, have custom tokenizer code that may not allow the addition of new special tokens).
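The handling being referenced is essentially a fallback chain for the pad token. A duck-typed sketch of the idea, using transformers-style attribute names; the exact order and branches in the harness's huggingface.py may differ, so treat this as illustrative:

```python
def configure_pad_token(tokenizer):
    """Ensure the tokenizer has a usable pad token id without adding a
    new special token (some tokenizers, e.g. Qwen's custom code, reject
    add_special_tokens). Fallback order here is illustrative:
    existing pad -> unk -> eos."""
    if getattr(tokenizer, "pad_token_id", None) is not None:
        return tokenizer.pad_token_id
    for fallback in ("unk_token_id", "eos_token_id"):
        tok_id = getattr(tokenizer, fallback, None)
        if tok_id is not None:
            tokenizer.pad_token_id = tok_id
            return tok_id
    raise ValueError("tokenizer defines no pad/unk/eos token to fall back to")


class StubTokenizer:  # stand-in for a transformers tokenizer
    pad_token_id = None
    unk_token_id = None
    eos_token_id = 2

print(configure_pad_token(StubTokenizer()))  # 2
```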

@haileyschoelkopf (Collaborator) commented

Also cc @veekaybee @gmottajr this PR updates local-completions

@mgoin does Deepsparse's OpenAI server support echo=True for completions logits? If so, will it continue to do so?

@mgoin (Contributor, author) commented on Jan 16, 2024

Hey @haileyschoelkopf, I'll add the tokenizer messaging today. I don't think we've implemented echo yet, and I now see it's required, but I think we should definitely do it. Let me get back to you on how long it will take to implement.

@haileyschoelkopf merged commit 5c25dd5 into EleutherAI:main on Jan 22, 2024
3 of 4 checks passed
@haileyschoelkopf (Collaborator) commented

Thanks again for the contribution @mgoin !

Let me know if you get a chance to check out echo=True soon.

@mgoin (Contributor, author) commented on Jan 23, 2024

Hey @haileyschoelkopf, sorry this fell off my radar; I had the changes ready but forgot to push 😅 Thanks for finishing it!

I think for the server, rather than using the OpenAI-style interface, we'll actually just submit native local model backends for deepsparse and sparseml. I'm trying to stabilize the sparseml interface so I can submit my fork next week.

@mgoin mgoin deleted the local-completions branch January 23, 2024 15:56
@mgoin mgoin restored the local-completions branch January 23, 2024 15:56
igorwang pushed a commit to igorwang/lm-evaluation-harness that referenced this pull request Jan 24, 2024
* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
@haileyschoelkopf (Collaborator) commented

No problem!

I think for the server, rather than using the openai-style interface we'll actually just submit native local model backends for deepsparse and sparseml. Trying to stabilize the interface for sparseml to submit my fork next week

Fantastic! Hopefully these can serve as templates for other local server LMs, in the event that using the OpenAI-style interface blocks echo=True.

anjor pushed a commit to anjor/lm-evaluation-harness that referenced this pull request Jan 31, 2024
haileyschoelkopf added a commit that referenced this pull request Feb 22, 2024
* loglikelihood refactor using template lm

* linter

* fix whitespace in target + prompt for CoT gsm8k (#1275)

* Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (#1261)

* Make parallelize=True distinction clearer in documentation.

* run linter

* Allow parameter edits for registered tasks when listed in a benchmark (#1273)

* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print

* Fix data-parallel evaluation with quantized models (#1270)

* add WIP device_map overrides

* update handling outside of accelerate launcher

* change .to(device) log to debug level

* run linter

* Rework documentation for explaining local dataset (#1284)

* rewor documentation for explaining local dataset

* fix typo

* Update new_task_guide.md

* Re-add citation

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

* Update CITATION.bib (#1285)

Bumping CITATION.bib to match re-adding the citation in readme. 

cc @StellaAthena

* Update nq_open.yaml (#1289)

* Update README.md with custom integration doc (#1298)

* Update README.md

* punctuation

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update nq_open.yaml (#1305)

* Update nq_open.yaml

change regex

* Bump NQ version

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update task_guide.md (#1306)

* Update pyproject.toml (#1312)

* Fix polemo2_in.yaml config name (#1313)

* Update pyproject.toml (#1314)

* Fix group register (#1315)

* tuple should be considered as well

* set option to keep callable as callable

* Update task_guide.md (#1316)

* Update polemo2_in.yaml (#1318)

* don't pass extra kwargs to mamba any more (#1328)

* Fix Issue regarding stderr (#1327)

* add fix fordeciding if stderr is N/A or not

* process N/A

* Add `local-completions` support using OpenAI interface (#1277)

* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fallback to classname when LM doesnt have config (#1334)

* fix a trailing whitespace that breaks a lint job (#1335)

* skip "benchmarks" in changed_tasks (#1336)

* Update migrated HF dataset paths (#1332)

* Update arc_easy.yaml

* Update flan_cot.yaml

* update HF dataset path

* Update freeform.yaml

* Update flan_cot.yaml

---------

Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

* Don't use `get_task_dict()` in task registration / initialization (#1331)

* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------

Co-authored-by: lintangsutawika <lintang@eleuther.ai>

* manage default (greedy) gen_kwargs in vllm (#1341)

* manage default (greedy) gen_kwargs in vllm better

* mirror HF `do_sample`

* just need to set temp=0 for greedy

* modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (#1345)

* update links to task_guide.md (#1348)

* `Filter` docs not offset by `doc_id`  (#1349)

* get `doc` from instance

* acceletate bugfix: get ground doc from instance

* convert filter to `process_result`

* get docs from instances in `FilterEnsemble`

* rename

* nit

* better looping

* fix typehint

* Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (#1330)

* Update README.md

* [!Tip]

* Refix issue regarding stderr (#1357)

* Add causalLM OpenVino models (#1290)

* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

* Apply some best practices and guideline recommendations to code (#1363)

* raise Exception, not a string

Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions

* Apply PEP8 recommendation to prefer isinstance

"Object type comparisons should always use isinstance() instead of comparing types directly"
https://peps.python.org/pep-0008/

* Remove dangerous default mutable values in arguments

https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html

* Format logging messages with fstring (not with format)

Additional info
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
There are also discussions about the speed of formatting while logging or some unintended code executions
pylint-dev/pylint#2395
https://stackoverflow.com/a/54368109
but at least one format (fstring one) will be used throughout the project

* Specify utf-8 encoding for `open` explicitly

If not specified, it may be supposed differently in different environments, OSes, and Python versions. See
https://peps.python.org/pep-0597/
https://docs.python.org/3.11/library/locale.html#locale.getencoding
https://docs.python.org/3.10/library/os.html#utf8-mode
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html

Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages.

* Use inline-ignoring comments to pass pre-commit instead of identity process

https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
https://www.flake8rules.com/rules/F841.html

flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression

* serialize callable functions in config (#1367)

* delay filter init; remove `*args` (#1369)

* delay filter init; remove `*args`

* bugfix

* optimize

* type hint

* Fix unintuitive `--gen_kwargs` behavior (#1329)

* don't override do_sample if no value for it is passed

* Update gen_kwargs override condition

* Update huggingface.py

* Update huggingface.py

* run linters

* silence an erroneous warning

* Publish to pypi (#1194)

* publish to pypi

* lint

* Update publish.yml

* minor

* Make dependencies compatible with PyPI (#1378)

* make deps not point to github urls

* formatting

* try making PyPI only run on tag pushes

* Add support for RWKV models with World tokenizer (#1374)

* Add support for RWKV models with World tokenizer

The RWKV line of model with the World tokenizer, does not allow the padding token to be configured, and has its value preset as 0

This however fails all the "if set" checks, and would cause the tokenizer to crash.

A tokenizer class name check was added, in addition to a model type check, as there exists RWKV models which uses the neox tokenizers

* Update huggingface.py

Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes.

* Comply with formatting guidelines

* fix format

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* add bypass metric (#1156)

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `overide_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

* loglikelihood refactor using template lm

* lint

* code review

* neuron optimum

* Mention TemplateLM in model_guide.md

* Update lm_eval/api/model.py

* fix linter

* fix format

* fix format

* fix format

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: Hannibal046 <38466901+Hannibal046@users.noreply.github.com>
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: kwrobel.eth <djstrong@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Brian Vaughan <nairbv@users.noreply.github.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: thnkinbtfly <70014488+thnkinbtfly@users.noreply.github.com>
Co-authored-by: NoushNabi <33136068+NoushNabi@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: LSinev <LSinev@users.noreply.github.com>
Co-authored-by: Eugene Cheah <PicoCreator@users.noreply.github.com>
wx-zhang pushed a commit to wx-zhang/lm-evaluation-harness that referenced this pull request Mar 13, 2024
wx-zhang pushed a commit to wx-zhang/lm-evaluation-harness that referenced this pull request Mar 13, 2024
nightingal3 pushed a commit to mycoalchen/lm-evaluation-harness that referenced this pull request May 2, 2024
* loglikelihood refactor using template lm

* linter

* fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275)

* Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261)

* Make parallelize=True distinction clearer in documentation.

* run linter

* Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273)

* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print

* Fix data-parallel evaluation with quantized models (EleutherAI#1270)

* add WIP device_map overrides

* update handling outside of accelerate launcher

* change .to(device) log to debug level

* run linter

* Rework documentation for explaining local dataset (EleutherAI#1284)

* rewor documentation for explaining local dataset

* fix typo

* Update new_task_guide.md

* Re-add citation

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

* Update CITATION.bib (EleutherAI#1285)

Bumping CITATION.bib to match re-adding the citation in readme. 

cc @StellaAthena

* Update nq_open.yaml (EleutherAI#1289)

* Update README.md with custom integration doc (EleutherAI#1298)

* Update README.md

* punctuation

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update nq_open.yaml (EleutherAI#1305)

* Update nq_open.yaml

change regex

* Bump NQ version

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update task_guide.md (EleutherAI#1306)

* Update pyproject.toml (EleutherAI#1312)

* Fix polemo2_in.yaml config name (EleutherAI#1313)

* Update pyproject.toml (EleutherAI#1314)

* Fix group register (EleutherAI#1315)

* tuple should be considered as well

* set option to keep callable as callable

* Update task_guide.md (EleutherAI#1316)

* Update polemo2_in.yaml (EleutherAI#1318)

* don't pass extra kwargs to mamba any more (EleutherAI#1328)

* Fix Issue regarding stderr (EleutherAI#1327)

* add fix fordeciding if stderr is N/A or not

* process N/A

* Add `local-completions` support using OpenAI interface (EleutherAI#1277)

* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fallback to classname when LM doesnt have config (EleutherAI#1334)

* fix a trailing whitespace that breaks a lint job (EleutherAI#1335)

* skip "benchmarks" in changed_tasks (EleutherAI#1336)

* Update migrated HF dataset paths (EleutherAI#1332)

* Update arc_easy.yaml

* Update flan_cot.yaml

* update HF dataset path

* Update freeform.yaml

* Update flan_cot.yaml

---------

Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

* Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331)

* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------

Co-authored-by: lintangsutawika <lintang@eleuther.ai>

* manage default (greedy) gen_kwargs in vllm (EleutherAI#1341)

* manage default (greedy) gen_kwargs in vllm better

* mirror HF `do_sample`

* just need to set temp=0 for greedy

* modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345)

* update links to task_guide.md (EleutherAI#1348)

* `Filter` docs not offset by `doc_id`  (EleutherAI#1349)

* get `doc` from instance

* acceletate bugfix: get ground doc from instance

* convert filter to `process_result`

* get docs from instances in `FilterEnsemble`

* rename

* nit

* better looping

* fix typehint

* Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330)

* Update README.md

* [!Tip]

* Refix issue regarding stderr (EleutherAI#1357)

* Add causalLM OpenVino models (EleutherAI#1290)

* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

* Apply some best practices and guideline recommendations to code (EleutherAI#1363)

* raise Exception, not a string

Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions

* Apply PEP8 recommendation to prefer isinstance

"Object type comparisons should always use isinstance() instead of comparing types directly"
https://peps.python.org/pep-0008/

* Remove dangerous default mutable values in arguments

https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html

* Format logging messages with fstring (not with format)

Additional info
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
There are also discussions about the speed of formatting while logging or some unintended code executions
pylint-dev/pylint#2395
https://stackoverflow.com/a/54368109
but at least one format (fstring one) will be used throughout the project

* Specify utf-8 encoding for `open` explicitly

If not specified, it may be supposed differently in different environments, OSes, and Python versions. See
https://peps.python.org/pep-0597/
https://docs.python.org/3.11/library/locale.html#locale.getencoding
https://docs.python.org/3.10/library/os.html#utf8-mode
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html

This also helps if code from English-language tasks is used as inspiration for tasks in non-English languages.
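A minimal before/after sketch of the practice described above, using a made-up file name; passing `encoding="utf-8"` explicitly keeps reads and writes of non-ASCII text portable across platforms instead of depending on the locale default.

```python
import os
import tempfile

# Made-up example path; the point is the explicit encoding argument.
path = os.path.join(tempfile.gettempdir(), "lm_eval_encoding_example.txt")

# Without encoding=..., open() falls back to the platform/locale default,
# which can silently differ between environments.
with open(path, "w", encoding="utf-8") as f:
    f.write("résumé café\n")  # non-ASCII text round-trips safely

with open(path, encoding="utf-8") as f:
    text = f.read()
```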

* Use inline-ignoring comments to pass pre-commit instead of identity process

https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
https://www.flake8rules.com/rules/F841.html

flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression

* serialize callable functions in config (EleutherAI#1367)

* delay filter init; remove `*args` (EleutherAI#1369)

* delay filter init; remove `*args`

* bugfix

* optimize

* type hint

* Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329)

* don't override do_sample if no value for it is passed

* Update gen_kwargs override condition

* Update huggingface.py

* Update huggingface.py

* run linters

* silence an erroneous warning

* Publish to pypi (EleutherAI#1194)

* publish to pypi

* lint

* Update publish.yml

* minor

* Make dependencies compatible with PyPI (EleutherAI#1378)

* make deps not point to github urls

* formatting

* try making PyPI only run on tag pushes

* Add support for RWKV models with World tokenizer (EleutherAI#1374)

* Add support for RWKV models with World tokenizer

The RWKV line of models with the World tokenizer does not allow the padding token to be configured; its value is preset as 0.

This, however, fails all the "if set" checks and would cause the tokenizer to crash.

A tokenizer class name check was added in addition to a model type check, as there exist RWKV models which use the NeoX tokenizers.

* Update huggingface.py

Genericized so that this supports any RWKVWorld tokenizer, and added a fallback in case the HF implementation name changes.

* Comply with formatting guidelines

* fix format

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* add bypass metric (EleutherAI#1156)

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `overide_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

* loglikelihood refactor using template lm

* lint

* code review

* neuron optimum

* Mention TemplateLM in model_guide.md

* Update lm_eval/api/model.py

* fix linter

* fix format

* fix format

* fix format

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: Hannibal046 <38466901+Hannibal046@users.noreply.github.com>
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: kwrobel.eth <djstrong@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Brian Vaughan <nairbv@users.noreply.github.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: thnkinbtfly <70014488+thnkinbtfly@users.noreply.github.com>
Co-authored-by: NoushNabi <33136068+NoushNabi@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: LSinev <LSinev@users.noreply.github.com>
Co-authored-by: Eugene Cheah <PicoCreator@users.noreply.github.com>
djstrong pushed a commit to speakleash/lm-evaluation-harness that referenced this pull request Aug 2, 2024
* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
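The commit above adds `local-completions`, a model type that points lm-eval at a locally hosted OpenAI-compatible Completions server. As a rough illustration of why the Completions API (unlike ChatCompletions) enables loglikelihood scoring: when per-token logprobs are echoed back, they can be summed over the continuation span. Everything below — the function name and the response dict — is a hand-written stand-in mirroring the Completions `logprobs` response shape, not code from this PR or output from a real server.

```python
# Illustrative sketch: reducing per-token logprobs from an OpenAI-style
# Completions response (echo=True, logprobs=1) to a loglikelihood for
# the continuation tokens. `response` below is a fabricated stand-in.

def continuation_loglikelihood(response: dict, ctx_len: int) -> tuple[float, bool]:
    """Sum logprobs after `ctx_len` context tokens; also report greedy match."""
    logprobs = response["choices"][0]["logprobs"]
    token_logprobs = logprobs["token_logprobs"][ctx_len:]
    tokens = logprobs["tokens"][ctx_len:]
    top_logprobs = logprobs["top_logprobs"][ctx_len:]
    total = sum(token_logprobs)
    # Greedy iff every continuation token was the argmax candidate.
    is_greedy = all(
        tok == max(top, key=top.get) for tok, top in zip(tokens, top_logprobs)
    )
    return total, is_greedy

# Fabricated response with one context token and three continuation tokens.
response = {
    "choices": [{
        "logprobs": {
            "tokens": ["The", " sky", " is", " blue"],
            "token_logprobs": [None, -1.2, -0.3, -0.5],
            "top_logprobs": [None, {" sky": -1.2}, {" is": -0.3}, {" blue": -0.5}],
        }
    }]
}
ll, greedy = continuation_loglikelihood(response, ctx_len=1)
```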
djstrong added a commit to speakleash/lm-evaluation-harness that referenced this pull request Aug 2, 2024
* loglikelihood refactor using template lm

* linter

* fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275)

* Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261)

* Make parallelize=True distinction clearer in documentation.

* run linter

* Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273)

* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print

* Fix data-parallel evaluation with quantized models (EleutherAI#1270)

* add WIP device_map overrides

* update handling outside of accelerate launcher

* change .to(device) log to debug level

* run linter

* Rework documentation for explaining local dataset (EleutherAI#1284)

* rework documentation for explaining local dataset

* fix typo

* Update new_task_guide.md

* Re-add citation

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

* Update CITATION.bib (EleutherAI#1285)

Bumping CITATION.bib to match re-adding the citation in readme. 

cc @StellaAthena

* Update nq_open.yaml (EleutherAI#1289)

* Update README.md with custom integration doc (EleutherAI#1298)

* Update README.md

* punctuation

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update nq_open.yaml (EleutherAI#1305)

* Update nq_open.yaml

change regex

* Bump NQ version

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update task_guide.md (EleutherAI#1306)

* Update pyproject.toml (EleutherAI#1312)

* Fix polemo2_in.yaml config name (EleutherAI#1313)

* Update pyproject.toml (EleutherAI#1314)

* Fix group register (EleutherAI#1315)

* tuple should be considered as well

* set option to keep callable as callable

* Update task_guide.md (EleutherAI#1316)

* Update polemo2_in.yaml (EleutherAI#1318)

* don't pass extra kwargs to mamba any more (EleutherAI#1328)

* Fix Issue regarding stderr (EleutherAI#1327)

* add fix for deciding if stderr is N/A or not

* process N/A

* Add `local-completions` support using OpenAI interface (EleutherAI#1277)

* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fallback to classname when LM doesnt have config (EleutherAI#1334)

* fix a trailing whitespace that breaks a lint job (EleutherAI#1335)

* skip "benchmarks" in changed_tasks (EleutherAI#1336)

* Update migrated HF dataset paths (EleutherAI#1332)

* Update arc_easy.yaml

* Update flan_cot.yaml

* update HF dataset path

* Update freeform.yaml

* Update flan_cot.yaml

---------

Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

* Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331)

* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------

Co-authored-by: lintangsutawika <lintang@eleuther.ai>

* manage default (greedy) gen_kwargs in vllm (EleutherAI#1341)

* manage default (greedy) gen_kwargs in vllm better

* mirror HF `do_sample`

* just need to set temp=0 for greedy

* modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345)

* update links to task_guide.md (EleutherAI#1348)

* `Filter` docs not offset by `doc_id`  (EleutherAI#1349)

* get `doc` from instance

* accelerate bugfix: get ground doc from instance

* convert filter to `process_result`

* get docs from instances in `FilterEnsemble`

* rename

* nit

* better looping

* fix typehint

* Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330)

* Update README.md

* [!Tip]

* Refix issue regarding stderr (EleutherAI#1357)

* Add causalLM OpenVino models (EleutherAI#1290)

* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

* Apply some best practices and guideline recommendations to code (EleutherAI#1363)

* raise Exception, not a string

Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions

* Apply PEP8 recommendation to prefer isinstance

"Object type comparisons should always use isinstance() instead of comparing types directly"
https://peps.python.org/pep-0008/

* Remove dangerous default mutable values in arguments

https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html

* Format logging messages with fstring (not with format)

Additional info
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
There are also discussions about the performance impact of formatting during logging and about unintended code execution
pylint-dev/pylint#2395
https://stackoverflow.com/a/54368109
but at least one consistent format (the f-string one) will be used throughout the project

* Specify utf-8 encoding for `open` explicitly

If not specified, the assumed encoding may differ across environments, OSes, and Python versions. See
https://peps.python.org/pep-0597/
https://docs.python.org/3.11/library/locale.html#locale.getencoding
https://docs.python.org/3.10/library/os.html#utf8-mode
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html

This also helps if code from English-language tasks is used as inspiration for tasks in non-English languages.

* Use inline-ignoring comments to pass pre-commit instead of identity process

https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
https://www.flake8rules.com/rules/F841.html

flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression

* serialize callable functions in config (EleutherAI#1367)

* delay filter init; remove `*args` (EleutherAI#1369)

* delay filter init; remove `*args`

* bugfix

* optimize

* type hint

* Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329)

* don't override do_sample if no value for it is passed

* Update gen_kwargs override condition

* Update huggingface.py

* Update huggingface.py

* run linters

* silence an erroneous warning

* Publish to pypi (EleutherAI#1194)

* publish to pypi

* lint

* Update publish.yml

* minor

* Make dependencies compatible with PyPI (EleutherAI#1378)

* make deps not point to github urls

* formatting

* try making PyPI only run on tag pushes

* Add support for RWKV models with World tokenizer (EleutherAI#1374)

* Add support for RWKV models with World tokenizer

The RWKV line of models with the World tokenizer does not allow the padding token to be configured; its value is preset as 0.

This, however, fails all the "if set" checks and would cause the tokenizer to crash.

A tokenizer class name check was added in addition to a model type check, as there exist RWKV models which use the NeoX tokenizers.

* Update huggingface.py

Genericized so that this supports any RWKVWorld tokenizer, and added a fallback in case the HF implementation name changes.

* Comply with formatting guidelines

* fix format

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* add bypass metric (EleutherAI#1156)

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `overide_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

* loglikelihood refactor using template lm

* lint

* code review

* neuron optimum

* Mention TemplateLM in model_guide.md

* Update lm_eval/api/model.py

* fix linter

* fix format

* fix format

* fix format

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: Hannibal046 <38466901+Hannibal046@users.noreply.github.com>
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: kwrobel.eth <djstrong@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Brian Vaughan <nairbv@users.noreply.github.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: thnkinbtfly <70014488+thnkinbtfly@users.noreply.github.com>
Co-authored-by: NoushNabi <33136068+NoushNabi@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: LSinev <LSinev@users.noreply.github.com>
Co-authored-by: Eugene Cheah <PicoCreator@users.noreply.github.com>