Feature/hf dataset augmentation #653
JulesBelveze
left a comment
@Prikshit7766 Why not do something like this?
harness.augment(
    input_path="glue",
    output_path="augmented.csv",
    custom_proportions=custom_proportions,
    export_mode="transformed",
    data_kwargs={
        "subset": "sst2",
        "feature_column": "sentence",
        "target_column": "label",
        "split": "train"
    }
)
then you simply have to do:
self.df.load_data(**data_kwargs)
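A minimal sketch of the forwarding pattern suggested above, assuming the loader accepts the same keys as keyword arguments. `StubHFDataset` and the standalone `augment` function are hypothetical stand-ins, not the harness API; the stub only records what it was asked to load instead of touching the `datasets` library.

```python
from typing import Any, Dict, Optional


class StubHFDataset:
    """Hypothetical stand-in for the HF dataset loader discussed above."""

    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name

    def load_data(
        self,
        feature_column: str = "text",
        target_column: str = "label",
        split: str = "test",
        subset: Optional[str] = None,
    ) -> Dict[str, Any]:
        # A real loader would fetch the dataset here; this stub only
        # echoes the request so the forwarding can be inspected.
        return {
            "dataset": self.dataset_name,
            "subset": subset,
            "split": split,
            "columns": [feature_column, target_column],
        }


def augment(
    input_path: str,
    output_path: str,
    data_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical augment(): forwards the optional dict to the loader."""
    loader = StubHFDataset(input_path)
    # An absent dict falls back to the loader's own defaults.
    return loader.load_data(**(data_kwargs or {}))


request = augment(
    input_path="glue",
    output_path="augmented.csv",
    data_kwargs={
        "subset": "sst2",
        "feature_column": "sentence",
        "target_column": "label",
        "split": "train",
    },
)
```

Because the dict is unpacked with `**`, a misspelled key raises a `TypeError` at the call site rather than being silently ignored, which is one argument for this design over a free-form options object.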
Do we have to change this for the loading of HF datasets in the harness as well?
@RakshitKhajuria Yes, I don't think it's too bad to have an optional parameter called …
@JulesBelveze Adding an additional param was the initial thought. However, David insisted on not adding more params to the harness class.
@ArshaanNazir Then I guess the only way is to delete the …
But our main use case was to retrain our own models (with the data on which they were trained). h.augment(input_path="train.conll", ****) looks more appealing in that way. But yes, it will make it more generic. What do you think, @JulesBelveze?
In this case, we can name it training_data (a Dict parameter), with data_source and other optional keys, and output_path can become augmented_data.
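One possible reading of this proposal, sketched under assumptions: `training_data` is a single dict whose required `data_source` key names either a local file or an HF dataset, and every other key is an optional loader setting. The helper `split_training_data` is hypothetical and exists only to illustrate the shape of the parameter.

```python
from typing import Any, Dict, Tuple


def split_training_data(training_data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: separate data_source from the optional keys."""
    if "data_source" not in training_data:
        raise ValueError("training_data must contain a 'data_source' key")
    # Copy first so the caller's dict is left untouched.
    options = dict(training_data)
    data_source = options.pop("data_source")
    return data_source, options


# Local file: only the required key is given.
conll_request = {"data_source": "train.conll"}

# HF dataset: the optional keys travel in the same dict.
hf_request = {
    "data_source": "glue",
    "subset": "sst2",
    "feature_column": "sentence",
    "target_column": "label",
    "split": "train",
}

conll_source, conll_options = split_training_data(conll_request)
hf_source, hf_options = split_training_data(hf_request)
```

With this shape the retraining case stays short (a one-key dict), while the HF case carries its extra settings without adding new parameters to the harness.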
@ArshaanNazir
@JulesBelveze Don't merge it yet. We will be updating this PR with the notebook 😊
…JohnSnowLabs/nlptest into feature/hf-dataset-augmentation
…JohnSnowLabs/langtest into feature/hf-dataset-augmentation
Description
Added support for loading HuggingFace Datasets for Augmentation tasks.
Notebook ➤ Demo
➤ Fixes #621
Type of change
Usage
Checklist:
Screenshots (if appropriate):