Feature/hf dataset augmentation #653
@Prikshit7766 Why not do something like this?
harness.augment(
    input_path="glue",
    output_path="augmented.csv",
    custom_proportions=custom_proportions,
    export_mode="transformed",
    data_kwargs={
        "subset": "sst2",
        "feature_column": "sentence",
        "target_column": "label",
        "split": "train",
    },
)
then you simply have to do:
self.df.load_data(**data_kwargs)
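A minimal sketch of the forwarding pattern suggested above. `HypotheticalLoader` and this standalone `augment` function are illustrative stand-ins, not the actual langtest classes, and the default values are assumptions:

```python
from typing import Any, Dict, Optional


class HypotheticalLoader:
    """Stand-in for the object behind `self.df`; not the real langtest class."""

    def __init__(self, source: str) -> None:
        self.source = source

    def load_data(
        self,
        feature_column: str = "text",
        target_column: str = "label",
        split: str = "test",
        subset: Optional[str] = None,
    ) -> Dict[str, Any]:
        # A real loader would read the dataset here; this one only echoes its options.
        return {
            "source": self.source,
            "subset": subset,
            "feature_column": feature_column,
            "target_column": target_column,
            "split": split,
        }


def augment(
    input_path: str,
    output_path: str,
    data_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # output_path is unused in this sketch; only the loading step is shown.
    # Unpacking the optional dict means new loader options need no new parameters here.
    loader = HypotheticalLoader(input_path)
    return loader.load_data(**(data_kwargs or {}))


# HF-style call: dataset-specific options travel inside data_kwargs.
hf_config = augment(
    "glue",
    "augmented.csv",
    data_kwargs={
        "subset": "sst2",
        "feature_column": "sentence",
        "target_column": "label",
        "split": "train",
    },
)

# Local-file call: data_kwargs is omitted and the loader defaults apply.
file_config = augment("train.conll", "augmented.conll")
```

With this shape, existing calls that pass only a file path keep working, since `data_kwargs` defaults to `None`.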
Do we have to change this for loading HF datasets in the harness as well?
@RakshitKhajuria Yes, I don't think it's too bad to have an optional parameter called
@JulesBelveze adding an additional param was the initial thought. However, David insisted on not adding more params to the Harness class.
@ArshaanNazir then I guess the only way is to delete the
But our main use case was to retrain our own models (with the data on which they were trained). h.augment(input_path="train.conll", ****) looks more appealing in that way. But yes, it will make it more generic. What do you think, @JulesBelveze?
In this case, we can name it training_data (a Dict parameter) with data_source and the other optional keys, and output_path can become augmented_data.
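A rough sketch of how such a dict could be consumed. The helper `split_training_data` is hypothetical; only the names `training_data` and `data_source` come from the proposal above, and the optional keys are borrowed from the earlier `data_kwargs` example:

```python
from typing import Any, Dict, Tuple


def split_training_data(training_data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Separate the required data_source from the optional loader options."""
    if "data_source" not in training_data:
        raise ValueError("training_data must contain a 'data_source' key")
    options = {k: v for k, v in training_data.items() if k != "data_source"}
    return training_data["data_source"], options


# A local file needs only data_source.
local_source, local_options = split_training_data({"data_source": "train.conll"})

# An HF dataset adds the optional keys alongside data_source.
hf_source, hf_options = split_training_data(
    {
        "data_source": "glue",
        "subset": "sst2",
        "feature_column": "sentence",
        "target_column": "label",
        "split": "train",
    }
)
```

Keeping `data_source` as the only required key means the retraining use case stays a one-key dict, while HF datasets carry their extra options in the same parameter.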
@ArshaanNazir
LGTM
@JulesBelveze don't merge it yet. We will be updating this PR with the notebook 😊
…JohnSnowLabs/nlptest into feature/hf-dataset-augmentation
…JohnSnowLabs/langtest into feature/hf-dataset-augmentation
Description
Added support for loading HuggingFace Datasets for Augmentation tasks.
Notebook ➤ Demo
➤ Fixes #621
Type of change
Usage
Checklist:
Screenshots (if appropriate):