Setting trust_remote_code to True for HuggingFace datasets compatibility #1467
This reverts commit c1145df.
Oops, sorry about that, @veekaybee! This looks great to me, and it's how I think it should be implemented. We probably also want to make sure that if [...] I guess the question is whether there are any datasets for which we'd want to force people to opt in to trusting their remote code. @lhoestq mentioned that perhaps any datasets under orgs other than the HF default one or EleutherAI's HF org would require a [...]
Checks for other useful |
No worries, I should have marked this as a draft! I can add a check for the environment variables before execution reaches the actual code download. Would here be a good place for it? lm-evaluation-harness/lm_eval/api/task.py, line 224 in f6befdb
I also wanted to check which datasets we should mark this as true for. I wrote a quick script that checks which tasks actually have dataset paths and came up with this list: https://gist.github.com/veekaybee/269c8f7c51e6b1a92af4d4ff99bd0931 It looks like this is the list we'll need to check against some variation of this code, right? #1135 (comment)
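To make the opt-in idea discussed above concrete, here is a minimal sketch of an org-based check. The allowlist contents, the environment variable name `LM_EVAL_TRUST_REMOTE_CODE`, and the helper `should_trust_remote_code` are all hypothetical names chosen for illustration; none of them come from the harness or from `datasets`.

```python
import os

# Hypothetical allowlist: orgs whose dataset scripts are trusted by default.
TRUSTED_ORGS = {"EleutherAI"}
# Hypothetical environment variable a user sets to opt in explicitly.
OPT_IN_ENV_VAR = "LM_EVAL_TRUST_REMOTE_CODE"


def should_trust_remote_code(dataset_path, env=None):
    """Return True if remote dataset code may run for `dataset_path`."""
    env = os.environ if env is None else env
    # A path without an org prefix (e.g. "glue") is treated as belonging
    # to the Hub's default namespace.
    org = dataset_path.split("/", 1)[0] if "/" in dataset_path else None
    if org is None or org in TRUSTED_ORGS:
        return True
    # Any other org requires the user to opt in explicitly.
    return env.get(OPT_IN_ENV_VAR, "").lower() in {"1", "true", "yes"}
```

A check like this could run before the download step, so that datasets from other orgs fail early unless the user has opted in.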
See this issue for context: #1135 (comment)
We'd like to be able to use the latest datasets version, 2.16, in lm-evaluation-harness. We currently use 2.15.
In order to accommodate this, we'd like to:
- Add a trust_remote_code dataset kwarg to each dataset that requires it and pass it through. I'm including a sample for now, based on a dataset that I know requires remote code execution. Let me know if it looks OK and I'll scan through the datasets to see which ones need it and add this.
- Include trust_remote_code as True by default in the model constructor, to be overridden by --model_args when evaluating from the command line.
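As a rough illustration of the dataset kwarg pass-through, the sketch below merges a task's dataset kwargs into the arguments for `datasets.load_dataset`. It assumes datasets 2.16, where `load_dataset` accepts `trust_remote_code`. The helper names `build_load_kwargs` and `download` are hypothetical and are not the harness's actual functions.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(
    path: str,
    name: Optional[str] = None,
    dataset_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge per-task dataset kwargs into the load_dataset arguments."""
    # Copy so the caller's dict is never mutated.
    kwargs = dict(dataset_kwargs or {})
    return {"path": path, "name": name, **kwargs}


def download(path, name=None, dataset_kwargs=None):
    """Load a dataset, forwarding kwargs such as trust_remote_code."""
    # Imported lazily so build_load_kwargs stays usable without `datasets`.
    import datasets

    return datasets.load_dataset(**build_load_kwargs(path, name, dataset_kwargs))
```

A task that needs remote code would then carry `{"trust_remote_code": True}` in its dataset kwargs, while tasks that omit the key leave the library default in place.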