
Setting trust_remote_code to True for HuggingFace datasets compatibility #1487

Merged: 9 commits merged into EleutherAI:main on Mar 3, 2024

Conversation

@veekaybee (Contributor) commented Feb 27, 2024

See this issue for context: #1135 (comment)

We'd like to use the latest datasets version, 2.16, in lm-evaluation-harness. We currently use 2.15.

In order to accommodate this, we'd like to:

  1. Add a trust_remote_code dataset kwarg to each dataset that requires it and pass it through. See the notebook for the derivation of which datasets require it:
['EleutherAI/mutual',
 'skt/kobest_v1',
 'EleutherAI/logiqa',
 'bigbio/pubmed_qa',
 'EleutherAI/race',
 'BigScienceBiasEval/crows_pairs_multilingual',
 'baber/logiqa2',
 'EleutherAI/arithmetic',
 'EleutherAI/asdiv',
 'EleutherAI/lambada_openai',
 'EleutherAI/wikitext_document_level',
 'EleutherAI/drop',
 'EleutherAI/hendrycks_math',
 'juletxara/xstory_cloze',
 'allenai/qasper',
 'EleutherAI/pile',
 'EleutherAI/sycophancy',
 'EleutherAI/hendrycks_ethics',
 'EleutherAI/unscramble',
 'EleutherAI/headqa',
 'skg/toxigen-data',
 'EleutherAI/coqa',
 'corypaik/prost']
  2. Include trust_remote_code as True by default in the model constructor, to be overridden by --model_args when evaluating from the command line.

  3. Respect the user's HF_DATASETS_TRUST_REMOTE_CODE if it exists.
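Taken together, steps 1 and 3 amount to a small precedence rule for deciding whether to pass trust_remote_code=True to datasets.load_dataset. A minimal sketch follows; the helper name and exact precedence order are assumptions for illustration, not the harness's actual code:

```python
import os
from typing import Optional


def resolve_trust_remote_code(dataset_kwargs: Optional[dict]) -> bool:
    """Decide whether load_dataset should be called with trust_remote_code=True.

    Assumed precedence, for illustration:
      1. an explicit trust_remote_code entry in the task's dataset kwargs,
      2. the user's HF_DATASETS_TRUST_REMOTE_CODE environment variable,
      3. otherwise False.
    """
    if dataset_kwargs and "trust_remote_code" in dataset_kwargs:
        return bool(dataset_kwargs["trust_remote_code"])
    env_value = os.environ.get("HF_DATASETS_TRUST_REMOTE_CODE", "")
    return env_value.strip().lower() in ("1", "true", "yes")
```

The resolved value would then be forwarded to the dataset download, e.g. `datasets.load_dataset(path, name, trust_remote_code=resolve_trust_remote_code(kwargs))`.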

Related: #1467

cc @lhoestq fyi

(Review thread on lm_eval/api/task.py, outdated and resolved)
@veekaybee (Contributor, Author)
The tests are failing at the download step, and I'm guessing it's because of the environment variable, although this passes locally. I may have to add conditionals to account for this if we decide those variables are OK where they are.

 pytest tests/test_tasks.py 
======================================================================= test session starts =======================================================================
platform darwin -- Python 3.10.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/vicki/lm-evaluation-harness
plugins: anyio-3.7.1
collected 13 items                                                                                                                                                

tests/test_tasks.py .............                                                                                                                           [100%]

======================================================================== warnings summary =========================================================================
../.pyenv/versions/3.10.0/envs/evals/lib/python3.10/site-packages/bitsandbytes/cextension.py:34
  /Users/vicki/.pyenv/versions/3.10.0/envs/evals/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
    warn("The installed version of bitsandbytes was compiled without GPU support. "

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================= 13 passed, 1 warning in 25.38s ==================================================================

@haileyschoelkopf (Contributor) left a comment

Thank you @veekaybee !

After thinking some more, let's do the following (slightly different from what I originally said):

  • keep trust_remote_code = False by default in the model constructor
  • have trust_remote_code: True in the YAMLs as you added, but only for datasets under the EleutherAI/ HF org or in the default namespace (no org); the others should not get it set
  • respect the HF_DATASETS_TRUST_REMOTE_CODE env variable when it is set
  • add a --trust_remote_code CLI flag which, when set, will: 1) cause trust_remote_code=True to be added to the model_args string / passed to the model constructor, and 2) set the HF_DATASETS_TRUST_REMOTE_CODE env variable to true in main.py (or maybe in simple_evaluate()? We want running on remote-code-requiring datasets to not be too annoying for people not using main.py)

How does this sound? Basically, it's the PR as you've implemented it, but with an extra --trust_remote_code flag in the CLI, and with datasets outside the EAI or HF namespaces requiring this flag. The reasoning is that we'd like to support HF's initiative to make the OSS ecosystem more secure.
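The proposed CLI flag could be wired up roughly like this. This is a hedged sketch under assumed function names; the actual argument handling in main.py may differ:

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the harness's CLI arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_args", default="")
    parser.add_argument("--trust_remote_code", action="store_true")
    return parser


def apply_trust_remote_code(args: argparse.Namespace) -> argparse.Namespace:
    """If --trust_remote_code was passed, propagate it in both directions:
    to datasets via the env variable, and to the model constructor via
    the model_args string."""
    if args.trust_remote_code:
        os.environ["HF_DATASETS_TRUST_REMOTE_CODE"] = "true"
        extra = "trust_remote_code=True"
        args.model_args = f"{args.model_args},{extra}" if args.model_args else extra
    return args
```

A user could still override the model-side behavior explicitly via --model_args, since a later trust_remote_code=... entry in that string would take effect in the model constructor.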

(Two review threads on lm_eval/api/task.py, outdated and resolved)
@haileyschoelkopf (Contributor)
> E NotADirectoryError: [Errno 20] Not a directory: '/home/runner/.cache/huggingface/datasets/downloads/9d10351eefe83ab9887de1b307f40404b99de9ba10fed427d64faa36ae611778/HEAD_EN/train_HEAD_EN.json'

> The tests are failing at the download step and I'm guessing it's because of the environment variable

This looks like a separate error relating to HeadQA!

@veekaybee (Contributor, Author)
@haileyschoelkopf Thank you! All of that makes sense, and I think I've addressed all the comments. Please take a look at the new implementation of the check for the trust_remote_code environment variable in main, and I'll check on how the tests are doing as well.

@veekaybee (Contributor, Author)
Getting new test failures, I'm assuming triggered by the update of the YAML files, this time from pile_arxiv.yaml:

BuilderConfig 'pile_arxiv' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']

Should the task be set to one of these? https://huggingface.co/datasets/EleutherAI/pile/blob/main/pile.py#L61
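For reference, the mismatch above is between the task YAML's dataset name and the builder configs the dataset script actually defines; the fix would be pointing the task at one of the listed configs. A hypothetical sketch of the relevant YAML fields (key names assume the harness's task-config schema):

```yaml
dataset_path: EleutherAI/pile
dataset_name: all          # must be one of the BuilderConfigs the error lists
dataset_kwargs:
  trust_remote_code: true
```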

@lhoestq (Contributor) commented Feb 28, 2024

AFAIK the EleutherAI/pile dataset has been taken down on its original host (the-eye) and is no longer accessible

@veekaybee (Contributor, Author)
In that case, I can take out the kwargs from that YAML file, which I think is triggering the test errors. Does it also make sense to delete/modify the task?

@lhoestq (Contributor) commented Feb 28, 2024

You can at least remove the kwargs from the YAML. Not sure if we can fix this task (can it be re-hosted on HF?) or if it should be removed.

@veekaybee (Contributor, Author)
@haileyschoelkopf this is ready for re-review! I removed the headqa task because it was erroring out on expected filetypes; on taking a look, the file pointers are to pickle files, which I'd assume we want to avoid. (There is a new set of parquet files this points to, but I'm not sure if we want to use those.) https://huggingface.co/datasets/EleutherAI/headqa/discussions/1

@veekaybee (Contributor, Author)
I'm also assuming that once we merge this, we'll want to pin datasets to 2.17: https://github.com/EleutherAI/lm-evaluation-harness/pull/1312/files

@veekaybee (Contributor, Author)
Hi all, I wanted to check on the status of this PR and whether it's OK as-is, still needs more review, or is no longer needed. Thanks!

@haileyschoelkopf (Contributor) left a comment

I apologize for the delayed review @veekaybee ! Thank you very much for your work on this!

It looks good to me, and I think we can pin datasets>=2.16 now!

I've left some nits but otherwise LGTM!

(Review threads on lm_eval/__main__.py and lm_eval/api/task.py, outdated and resolved)
@veekaybee (Contributor, Author)
Should be all addressed @haileyschoelkopf, thank you for taking a look!

@haileyschoelkopf (Contributor) left a comment

Thank you for your work on this!

@haileyschoelkopf haileyschoelkopf merged commit 9516792 into EleutherAI:main Mar 3, 2024
8 checks passed
wx-zhang pushed a commit to wx-zhang/lm-evaluation-harness that referenced this pull request Mar 13, 2024
…ity (EleutherAI#1487)

* setting trust_remote_code

* dataset list no notebooks

* respect trust remote code

* Address changes, move cli options and change datasets

* fix task for tests

* headqa

* remove kobest

* pin datasets and address comments

* clean up space
nightingal3 pushed a commit to mycoalchen/lm-evaluation-harness that referenced this pull request May 2, 2024