Update datasets library to 2.2.2 in requirements #664

VijayKalmath · 2022-06-09T20:17:21Z

What does this PR do?

Summary

TextAttack currently installs datasets 1.15 as a dependency, which is far behind the latest release of the datasets library.

This commit sets datasets to 2.2.2 in requirements.txt and updates the test cases to match the default imdb dataset used by datasets 2.2.2.

Additions

Changes

Changed datasets to 2.2.2 in requirements.txt.
Updated the expected output in tests/sample_outputs/run_attack_from_file.txt so that it matches the data served by datasets 2.2.2.
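A quick way to confirm that an environment actually picked up the new version is to compare the installed `datasets` release against 2.2.2. The following is a minimal sketch, not part of this PR: the helper names `parse_version` and `meets_pin` are hypothetical, and the comparison only looks at the leading numeric components of the version string.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn the leading numeric part of a version ('2.2.2') into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a dotted numeric version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_pin(installed, required="2.2.2"):
    """Return True if `installed` is at least `required` (hypothetical helper)."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        print("datasets is not installed in this environment")
    else:
        status = "ok" if meets_pin(installed) else "older than 2.2.2"
        print(f"datasets {installed}: {status}")
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "1.15" and "2.10" incorrectly against "2.2.2".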

Deletions

Issues Addressed

Fixes #590

Checklist

The title of your pull request should be a summary of its contribution.
Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it).
To indicate a work in progress, please mark it as a draft on GitHub.
Make sure existing tests pass.
[ ] Add relevant tests. No quality testing = no merge.
[ ] All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

jxmorris12 · 2022-06-09T22:01:37Z

Sorry, but the three skipped examples feel a bit odd, don't they? For a model with >80% accuracy, it's pretty unlikely that it gets three wrong back-to-back.

tests/sample_outputs/run_attack_from_file.txt
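The intuition in the comment above can be made concrete with a back-of-the-envelope calculation. This is a rough sketch that assumes each prediction is an independent trial with the same error rate, which real test sets do not guarantee; the function name is hypothetical.

```python
def prob_consecutive_errors(accuracy, run_length=3):
    """Chance that `run_length` specific consecutive examples are all misclassified.

    Assumes independent errors with a constant rate of (1 - accuracy).
    """
    return (1.0 - accuracy) ** run_length


# Using the ">80%" figure from the comment as a lower bound on accuracy.
p = prob_consecutive_errors(0.80)
print(f"P(three misclassified in a row) <= {p:.3f}")
```

With accuracy at 80% the bound is 0.2 cubed, or 0.008, so under these assumptions three back-to-back skips in a fixed position would happen less than 1% of the time.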

VijayKalmath · 2022-06-16T21:11:07Z

Adding model evaluations of cnn-imdb and lstm-imdb on the imdb dataset across the two versions of the datasets package.

datasets 1.15; huggingface-hub 0.0.19

(Env_TextAttack) ecbm4040@instance-2:~/TextAttack$ textattack eval --model cnn-imdb --num-examples -1
textattack: Loading pre-trained TextAttack CNN: cnn-imdb
Reusing dataset imdb (/home/ecbm4040/.cache/huggingface/datasets/imdb/plain_text/1.0.0/e3c66f1788a67a89c7058d97ff62b6c30531e05b549de56d3ab91891f0561f9a)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 977.85it/s]
textattack: Loading datasets dataset imdb, split test.
textattack: Got 25000 predictions.
textattack: Correct 20346/25000 (81.38%)



(Env_TextAttack) ecbm4040@instance-2:~/TextAttack$ textattack eval --model cnn-imdb --num-examples -1 --dataset-split train
textattack: Loading pre-trained TextAttack CNN: cnn-imdb
Reusing dataset imdb (/home/ecbm4040/.cache/huggingface/datasets/imdb/plain_text/1.0.0/e3c66f1788a67a89c7058d97ff62b6c30531e05b549de56d3ab91891f0561f9a)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 951.81it/s]
textattack: Loading datasets dataset imdb, split train.
textattack: Got 25000 predictions.
textattack: Correct 21602/25000 (86.41%)


(Env_TextAttack) ecbm4040@instance-2:~/TextAttack$ textattack eval --model lstm-imdb --num-examples -1 --dataset-split train
textattack: Loading pre-trained TextAttack LSTM: lstm-imdb
Reusing dataset imdb (/home/ecbm4040/.cache/huggingface/datasets/imdb/plain_text/1.0.0/e3c66f1788a67a89c7058d97ff62b6c30531e05b549de56d3ab91891f0561f9a)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 889.31it/s]
textattack: Loading datasets dataset imdb, split train.
textattack: Got 25000 predictions.
textattack: Correct 21269/25000 (85.08%)

datasets 2.2.2; huggingface-hub 0.7.0

(Env_TextAttack) ecbm4040@instance-2:~/TextAttack$ textattack eval --model cnn-imdb --num-examples -1
textattack: Loading pre-trained TextAttack CNN: cnn-imdb
Reusing dataset imdb (/home/ecbm4040/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 891.27it/s]
textattack: Loading datasets dataset imdb, split test.
textattack: Got 25000 predictions.
textattack: Correct 20346/25000 (81.38%)


(Env_TextAttack) ecbm4040@instance-2:~/TextAttack$ textattack eval --model cnn-imdb --num-examples -1 --dataset-split train
textattack: Loading pre-trained TextAttack CNN: cnn-imdb
Reusing dataset imdb (/home/ecbm4040/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 881.28it/s]
textattack: Loading datasets dataset imdb, split train.
textattack: Got 25000 predictions.
textattack: Correct 21602/25000 (86.41%)



(Env_TextAttack) ecbm4040@instance-2:~/TextAttack$ textattack eval --model lstm-imdb --num-examples -1 --dataset-split train
textattack: Loading pre-trained TextAttack LSTM: lstm-imdb
Reusing dataset imdb (/home/ecbm4040/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 888.94it/s]
textattack: Loading datasets dataset imdb, split train.
textattack: Got 25000 predictions.
textattack: Correct 21269/25000 (85.08%)
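The runs above report the same counts for each model and split under both versions; only the cache directory hash differs. A small sketch for checking this programmatically is shown below. It parses only the "Correct" lines copied from the logs above, and the regular expression and helper name are illustrative assumptions rather than anything TextAttack ships.

```python
import re

# Matches lines such as "textattack: Correct 20346/25000 (81.38%)".
CORRECT_RE = re.compile(r"textattack: Correct (\d+)/(\d+) \(\d+\.\d+%\)")


def parse_correct(log_text):
    """Return one (correct, total) pair per 'Correct' line in the log text."""
    return [(int(c), int(t)) for c, t in CORRECT_RE.findall(log_text)]


# "Correct" lines copied from the datasets 1.15 runs
# (cnn-imdb test, cnn-imdb train, lstm-imdb train).
old_log = """\
textattack: Correct 20346/25000 (81.38%)
textattack: Correct 21602/25000 (86.41%)
textattack: Correct 21269/25000 (85.08%)
"""

# "Correct" lines copied from the datasets 2.2.2 runs, in the same order.
new_log = """\
textattack: Correct 20346/25000 (81.38%)
textattack: Correct 21602/25000 (86.41%)
textattack: Correct 21269/25000 (85.08%)
"""

old_counts = parse_correct(old_log)
new_counts = parse_correct(new_log)
print("identical results:", old_counts == new_counts)
```

Comparing the raw counts rather than the rounded percentages avoids treating two different results as equal just because they round to the same two decimals.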

jxmorris12 · 2022-06-17T18:57:47Z

hope the tests can pass now!

jxmorris12 · 2022-06-18T20:19:45Z

LGTM! Thanks for doing this, we needed it.

Update datasets library to 2.2.2 in requirements

05e6e59

TextAttack currently installs datasets 1.15, which is far behind the latest datasets library. This commit sets datasets to 2.2.2 in requirements.txt and updates test cases to suit the default imdb dataset used in datasets 2.2.2.

jxmorris12 reviewed Jun 9, 2022

tests/sample_outputs/run_attack_from_file.txt (outdated, resolved)

Add regex to match any character

9d95e60

jxmorris12 approved these changes Jun 18, 2022

jxmorris12 merged commit 9468cd3 into QData:master Jun 20, 2022

VijayKalmath deleted the Upgrade-requirements_for_datasets2_2_2 branch June 21, 2022 02:53
