Update datasets #309
0886ada to 15aac79
@Dref360 do you think my fixes for updating datasets make sense?
Looks good. We never run anything on the
eval_dm._base_dataset_split.features["label"] = ClassLabel(
    num_classes=3, names=existing_classes + ["NO_INTENT"]
)
train_dm._base_dataset_split.features["label"] = ClassLabel(
    num_classes=3, names=existing_classes + ["NO_INTENT"]
)
Is it possible to instantiate only one ClassLabel object and pass it to both dataset managers?
- eval_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
- train_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
+ eval_dm._base_dataset_split.features["label"] = train_dm._base_dataset_split.features[
+     "label"
+ ] = ClassLabel(num_classes=3, names=existing_classes + ["NO_INTENT"])
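For reference, the chained assignment above works because Python evaluates the right-hand side once and binds that single object to each target in turn. The sketch below illustrates this with plain dicts in place of the two `features` mappings and a hypothetical `FakeClassLabel` stand-in (not the real `datasets.ClassLabel`); the class names are assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeClassLabel:
    """Hypothetical stand-in for datasets.ClassLabel, for illustration only."""

    num_classes: int
    names: List[str] = field(default_factory=list)


existing_classes = ["negative", "positive"]  # assumed class names

# Plain dicts stand in for the two `features` mappings.
eval_features = {}
train_features = {}

# The right-hand side is evaluated once; both targets receive the same object.
eval_features["label"] = train_features["label"] = FakeClassLabel(
    num_classes=3, names=existing_classes + ["NO_INTENT"]
)

# True when both mappings hold the very same instance, not just equal ones.
shared = eval_features["label"] is train_features["label"]
```

The temporary-variable and loop variants that follow share one instance in the same way; they mainly differ in readability.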
Or with a temporary variable:
- eval_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
- train_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
+ class_label = ClassLabel(num_classes=3, names=existing_classes + ["NO_INTENT"])
+ eval_dm._base_dataset_split.features["label"] = class_label
+ train_dm._base_dataset_split.features["label"] = class_label
Or if we move the creation of dms before that, we can loop on it:
- eval_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
- train_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
+ class_label = ClassLabel(num_classes=3, names=existing_classes + ["NO_INTENT"])
+ for dm in dms.values():
+     dm._base_dataset_split.features["label"] = class_label
Here is a complete diff of the last idea, in case it was not clear:
  # Adding a rejection class
  eval_dm: DatasetSplitManager = mod.get_dataset_split_manager(DatasetSplitName.eval)
  train_dm: DatasetSplitManager = mod.get_dataset_split_manager(DatasetSplitName.train)
- existing_classes = eval_dm.get_class_names(labels_only=True)
- eval_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
- train_dm._base_dataset_split.features["label"] = ClassLabel(
-     num_classes=3, names=existing_classes + ["NO_INTENT"]
- )
- eval_dm._base_dataset_split = eval_dm._base_dataset_split.map(
-     lambda u, i: {"label": 2 if i % 10 == 0 else u["label"]}, with_indices=True
- )
  dms = {
      DatasetSplitName.eval: eval_dm,
      DatasetSplitName.train: train_dm,
  }
+ existing_classes = eval_dm.get_class_names(labels_only=True)
+ class_label = ClassLabel(num_classes=3, names=existing_classes + ["NO_INTENT"])
+ for dm in dms.values():
+     dm._base_dataset_split.features["label"] = class_label
+ eval_dm._base_dataset_split = eval_dm._base_dataset_split.map(
+     lambda u, i: {"label": 2 if i % 10 == 0 else u["label"]}, with_indices=True
+ )
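To make the effect of the `map` call in the diff concrete, here is a minimal sketch of the same relabelling rule applied to a plain list of rows. The helper name `add_rejection_label` and the sample rows are hypothetical; the real code maps over the dataset split with `with_indices=True`, which is approximated here with `enumerate`.

```python
def add_rejection_label(example, index, rejection_id=2):
    """Same rule as the lambda in the diff: every 10th row gets the new class id."""
    return {"label": rejection_id if index % 10 == 0 else example["label"]}


# Hypothetical rows with two existing classes (ids 0 and 1).
rows = [{"label": i % 2} for i in range(25)]

# enumerate() plays the role of with_indices=True here.
relabelled = [add_rejection_label(row, i) for i, row in enumerate(rows)]

# Indices whose label was overwritten with the rejection class.
rejected_indices = [i for i, row in enumerate(relabelled) if row["label"] == 2]
```

With 25 rows, only indices 0, 10 and 20 receive the rejection id; every other row keeps its original label.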
Thank you for the recommendation! I cleaned it up further, because I think it was hard to read, given that sometimes we would edit the values in the Dict, and sometimes eval_dm directly. LMK what you think.
Haha! I had done the exact same change locally while reviewing, but I thought I was asking too much. That's perfect! 👍
That's a relief! Thank you! I have some minor comments, but that's good to go!
Resolve #267
Description:
Checklist:
You should check all boxes before the PR is ready. If a box does not apply, check it to acknowledge it.
ran pre-commit run --all-files at the end.
our users.
README files and our wiki for any big design decisions, if relevant.