Setting trust_remote_code to True for HuggingFace datasets compatibility #1467

Merged 1 commit into EleutherAI:main on Feb 26, 2024

Conversation

veekaybee (Contributor)

See this issue for context: #1135 (comment)

We'd like to use the latest datasets version, 2.16, in lm-evaluation-harness; we currently pin 2.15.

In order to accommodate this, we'd like to:

  1. Add a trust_remote_code dataset kwarg to each dataset that requires it and pass it through, including, for now, a sample based on a dataset that I know requires remote code execution. Let me know if this looks OK, and I'll scan through the datasets to see which ones need it and add it to those.

  2. Include trust_remote_code as True by default in the model constructor, to be overridden by --model_args when evaluating from the command line. (A rough sketch of the pass-through follows.)
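A minimal sketch of the pass-through in (1), assuming the harness forwards a per-task dataset_kwargs dict into datasets.load_dataset; the helper name and the dataset_kwargs plumbing are illustrative, not the actual harness code:

import datasets

# Hypothetical helper: only datasets.load_dataset and its trust_remote_code
# kwarg (datasets >= 2.16) are real; the dataset_kwargs plumbing is assumed.
def load_task_dataset(path, name=None, dataset_kwargs=None):
    dataset_kwargs = dataset_kwargs or {}
    # A task needing remote code sets {"trust_remote_code": True} in its
    # config, and that lands here as part of dataset_kwargs.
    return datasets.load_dataset(path=path, name=name, **dataset_kwargs)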

lintangsutawika merged commit c1145df into EleutherAI:main on Feb 26, 2024. 8 checks passed.
haileyschoelkopf (Contributor)

Oops, sorry about that, @veekaybee!

This looks great to me as the way it should be implemented! We probably also want to make sure that if a user sets datasets.config.HF_DATASETS_TRUST_REMOTE_CODE = True, that setting is respected.

I guess the question is whether there are any datasets for which we'd want to force people to opt in to trusting their remote code. @lhoestq mentioned that perhaps any dataset under an org other than the HF default one or EleutherAI's HF org should require a --trust_remote_code flag?
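A sketch of that precedence, assuming only that datasets >= 2.16 exposes datasets.config.HF_DATASETS_TRUST_REMOTE_CODE; the function and its fallback order are illustrative, not the merged code:

import datasets

def resolve_trust_remote_code(cli_value=None):
    # Hypothetical precedence: an explicit --trust_remote_code flag wins,
    # then the user's HF_DATASETS_TRUST_REMOTE_CODE setting, else False.
    if cli_value is not None:
        return cli_value
    env_value = getattr(datasets.config, "HF_DATASETS_TRUST_REMOTE_CODE", None)
    if env_value is not None:
        return env_value
    return False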

LSinev (Contributor) commented Feb 26, 2024

Checks for other useful datasets environment variables, like HF_DATASETS_CACHE and HF_DATASETS_IN_MEMORY_MAX_SIZE, may also be worth respecting (mentioning it here in case specific code gets written for this).
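datasets already reads these variables itself at import time, so a harness-side check mainly needs to avoid overriding them. A rough sketch; how the harness would actually consume these values is an assumption:

import os

# Only fall back to a harness default when the user hasn't set the
# standard datasets environment variables themselves.
cache_dir = os.environ.get("HF_DATASETS_CACHE")  # None -> let datasets choose
in_memory_max = os.environ.get("HF_DATASETS_IN_MEMORY_MAX_SIZE")  # size in bytes

# e.g., pass the user's cache through rather than a hard-coded one:
# datasets.load_dataset(path, cache_dir=cache_dir)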

veekaybee (Contributor, Author)

No worries, I should have marked this as a draft!

I can add a check for the environment variables before the actual dataset download happens; would here be a good place for it?

def download(self, data_dir=None, cache_dir=None, download_mode=None) -> None:
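Something like the following could live in that method. This is a sketch that assumes download wraps datasets.load_dataset; the self.config / DATASET_PATH attribute names are illustrative rather than the harness's exact ones:

import datasets

def download(self, data_dir=None, cache_dir=None, download_mode=None) -> None:
    # Respect a user-level datasets setting first, then the per-task kwarg.
    trust_remote_code = getattr(datasets.config, "HF_DATASETS_TRUST_REMOTE_CODE", None)
    if trust_remote_code is None:
        trust_remote_code = (self.config.dataset_kwargs or {}).get("trust_remote_code", False)
    self.dataset = datasets.load_dataset(
        path=self.DATASET_PATH,
        name=self.DATASET_NAME,
        data_dir=data_dir,
        cache_dir=cache_dir,
        download_mode=download_mode,
        trust_remote_code=trust_remote_code,
    )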

I also wanted to check which tasks we should mark this as true for. I wrote a quick script that checks which tasks actually have dataset paths and came up with these: https://gist.github.com/veekaybee/269c8f7c51e6b1a92af4d4ff99bd0931

It looks like this is the list we'll need to check against some variation of this code, right? #1135 (comment)
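The gist isn't reproduced here, but a scan along these lines would enumerate the datasets to check; that the YAML task configs keep their Hub dataset IDs under a dataset_path key is an assumption about the task format:

import glob
import yaml  # pyyaml

# Collect every Hub dataset referenced by a task config.
paths = set()
for config_file in glob.glob("lm_eval/tasks/**/*.yaml", recursive=True):
    with open(config_file) as f:
        try:
            config = yaml.safe_load(f)
        except yaml.YAMLError:
            continue  # some configs use custom tags; skip them in a rough scan
    if isinstance(config, dict) and "dataset_path" in config:
        paths.add(config["dataset_path"])

for p in sorted(paths):
    print(p)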
