
Setting trust_remote_code to True for HuggingFace datasets compatibility #1487

Merged: 9 commits merged into EleutherAI:main on Mar 3, 2024

Conversation

@veekaybee (Contributor) commented Feb 27, 2024

See this issue for context: #1135 (comment)

We'd like to use the latest datasets version, 2.16, in lm-evaluation-harness. We currently use 2.15.

In order to accommodate this, we'd like to:

  1. Add a trust_remote_code dataset kwarg to each dataset that requires it and pass it through. See the notebook for the derivation of which datasets require it:
['EleutherAI/mutual',
 'skt/kobest_v1',
 'EleutherAI/logiqa',
 'bigbio/pubmed_qa',
 'EleutherAI/race',
 'BigScienceBiasEval/crows_pairs_multilingual',
 'baber/logiqa2',
 'EleutherAI/arithmetic',
 'EleutherAI/asdiv',
 'EleutherAI/lambada_openai',
 'EleutherAI/wikitext_document_level',
 'EleutherAI/drop',
 'EleutherAI/hendrycks_math',
 'juletxara/xstory_cloze',
 'allenai/qasper',
 'EleutherAI/pile',
 'EleutherAI/sycophancy',
 'EleutherAI/hendrycks_ethics',
 'EleutherAI/unscramble',
 'EleutherAI/headqa',
 'skg/toxigen-data',
 'EleutherAI/coqa',
 'corypaik/prost']
  2. Include trust_remote_code as True by default in the model constructor, to be overridden by --model_args when evaluating from the command line.

  3. Respect the user's HF_DATASETS_TRUST_REMOTE_CODE if it exists.
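Taken together, steps 1 and 3 amount to a small precedence rule for deciding whether to pass trust_remote_code=True to datasets.load_dataset. A minimal sketch follows; the helper name and exact precedence order are assumptions for illustration, not the harness's actual code:

```python
import os
from typing import Optional


def resolve_trust_remote_code(dataset_kwargs: Optional[dict]) -> bool:
    """Decide whether load_dataset should be called with trust_remote_code=True.

    Assumed precedence, for illustration:
      1. an explicit trust_remote_code entry in the task's dataset kwargs,
      2. the user's HF_DATASETS_TRUST_REMOTE_CODE environment variable,
      3. otherwise False.
    """
    if dataset_kwargs and "trust_remote_code" in dataset_kwargs:
        return bool(dataset_kwargs["trust_remote_code"])
    env_value = os.environ.get("HF_DATASETS_TRUST_REMOTE_CODE", "")
    return env_value.strip().lower() in ("1", "true", "yes")
```

The resolved value would then be forwarded to the dataset download, e.g. `datasets.load_dataset(path, name, trust_remote_code=resolve_trust_remote_code(kwargs))`.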

Related: #1467

cc @lhoestq fyi

(Review thread on lm_eval/api/task.py, outdated and resolved)
@veekaybee (Contributor, Author)
The tests are failing at the download step, and I'm guessing it's because of the environment variable, although this passes locally. I may have to add conditionals to account for this if we decide those variables are OK where they are.

 pytest tests/test_tasks.py 
======================================================================= test session starts =======================================================================
platform darwin -- Python 3.10.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/vicki/lm-evaluation-harness
plugins: anyio-3.7.1
collected 13 items                                                                                                                                                

tests/test_tasks.py .............                                                                                                                           [100%]

======================================================================== warnings summary =========================================================================
../.pyenv/versions/3.10.0/envs/evals/lib/python3.10/site-packages/bitsandbytes/cextension.py:34
  /Users/vicki/.pyenv/versions/3.10.0/envs/evals/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
    warn("The installed version of bitsandbytes was compiled without GPU support. "

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================= 13 passed, 1 warning in 25.38s ==================================================================

@haileyschoelkopf (Contributor) left a comment

Thank you @veekaybee !

After thinking some more, let's do the following (slightly different from what I originally said):

  • keep trust_remote_code = False by default in the model constructor
  • have trust_remote_code: True in the YAMLs as you added, but only for datasets under the EleutherAI/ HF org or in the default namespace (no org); the others should not get it set
  • respect the HF_DATASETS_TRUST_REMOTE_CODE env variable when it is set
  • add a --trust_remote_code CLI flag which, when set, will: 1) cause trust_remote_code=True to be added to the model_args string / passed to the model constructor, and 2) set the HF_DATASETS_TRUST_REMOTE_CODE env variable to true in main.py (or maybe in simple_evaluate()? We want running on remote-code-requiring datasets to not be too annoying for people not using main.py)

How does this sound? Basically, it's the PR as you've implemented it, but with an extra --trust_remote_code flag in the CLI, and with datasets outside the EAI or HF namespaces requiring this flag. The reasoning is that we'd like to support HF's initiative to make the OSS ecosystem more secure.
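The proposed CLI flag could be wired up roughly like this. This is a hedged sketch under assumed function names; the actual argument handling in main.py may differ:

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the harness's CLI arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_args", default="")
    parser.add_argument("--trust_remote_code", action="store_true")
    return parser


def apply_trust_remote_code(args: argparse.Namespace) -> argparse.Namespace:
    """If --trust_remote_code was passed, propagate it in both directions:
    to datasets via the env variable, and to the model constructor via
    the model_args string."""
    if args.trust_remote_code:
        os.environ["HF_DATASETS_TRUST_REMOTE_CODE"] = "true"
        extra = "trust_remote_code=True"
        args.model_args = f"{args.model_args},{extra}" if args.model_args else extra
    return args
```

A user could still override the model-side behavior explicitly via --model_args, since a later trust_remote_code=... entry in that string would take effect in the model constructor.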

(Two review threads on lm_eval/api/task.py, outdated and resolved)
@haileyschoelkopf (Contributor)
> E NotADirectoryError: [Errno 20] Not a directory: '/home/runner/.cache/huggingface/datasets/downloads/9d10351eefe83ab9887de1b307f40404b99de9ba10fed427d64faa36ae611778/HEAD_EN/train_HEAD_EN.json'

> The tests are failing at the download step and I'm guessing it's because of the environment variable

This looks like a separate error relating to HeadQA!

@veekaybee (Contributor, Author)
@haileyschoelkopf Thank you! All of that makes sense, and I think I've addressed all the comments. Please take a look at the new implementation of the check for the trust_remote_code environment variable in main, and I'll check on how the tests are doing as well.

@veekaybee (Contributor, Author)
Getting new test failures, I'm assuming triggered by the update of the YAML files, this time from pile_arxiv.yaml:

BuilderConfig 'pile_arxiv' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']

Should the task be set to one of these? https://huggingface.co/datasets/EleutherAI/pile/blob/main/pile.py#L61
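For reference, the mismatch above is between the task YAML's dataset name and the builder configs the dataset script actually defines; the fix would be pointing the task at one of the listed configs. A hypothetical sketch of the relevant YAML fields (key names assume the harness's task-config schema):

```yaml
dataset_path: EleutherAI/pile
dataset_name: all          # must be one of the BuilderConfigs the error lists
dataset_kwargs:
  trust_remote_code: true
```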

@lhoestq (Contributor) commented Feb 28, 2024

AFAIK the EleutherAI/pile dataset has been taken down on its original host (the-eye) and is no longer accessible

@veekaybee (Contributor, Author)
In that case, I can take out the kwargs from that YAML file, which I think is triggering the test errors. Does it also make sense to delete/modify the task?

@lhoestq (Contributor) commented Feb 28, 2024

You can at least remove the kwargs from the YAML. Not sure if we can fix this task (can it be re-hosted on HF?) or if it should be removed.

@veekaybee (Contributor, Author)
@haileyschoelkopf this is ready for re-review! I removed the headqa task because it was erroring out on expected filetypes; on taking a look, the file pointers are to pickle files, which I'd assume we want to avoid. (There is a new set of parquet files this points to, but I'm not sure if we want to use those.) https://huggingface.co/datasets/EleutherAI/headqa/discussions/1

@veekaybee (Contributor, Author)
I'm also assuming that once we merge this, we'll want to pin datasets to 2.17: https://github.com/EleutherAI/lm-evaluation-harness/pull/1312/files

@veekaybee (Contributor, Author)
Hi all, I wanted to check on the status of this PR and whether it's OK as-is, still needs more review, or is no longer needed. Thanks!

@haileyschoelkopf (Contributor) left a comment

I apologize for the delayed review @veekaybee ! Thank you very much for your work on this!

It looks good to me, and I think we can pin datasets>=2.16 now!

I've left some nits but otherwise LGTM!

(Review threads on lm_eval/__main__.py and lm_eval/api/task.py, outdated and resolved)
@veekaybee (Contributor, Author)
Should be all addressed @haileyschoelkopf, thank you for taking a look!

@haileyschoelkopf (Contributor) left a comment

Thank you for your work on this!

@haileyschoelkopf haileyschoelkopf merged commit 9516792 into EleutherAI:main Mar 3, 2024
8 checks passed
wx-zhang pushed a commit to wx-zhang/lm-evaluation-harness that referenced this pull request Mar 13, 2024
…ity (EleutherAI#1487)

* setting trust_remote_code

* dataset list no notebooks

* respect trust remote code

* Address changes, move cli options and change datasets

* fix task for tests

* headqa

* remove kobest

* pin datasets and address comments

* clean up space
nightingal3 pushed a commit to mycoalchen/lm-evaluation-harness that referenced this pull request May 2, 2024