In [24]:
from textattack.transformations import WordSwap

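class BananaWordSwap(WordSwap):
    """Transformation that replaces any word with 'banana'."""

    def _get_replacement_words(self, word):
        # Return a list of candidate replacements; here there is only one.
        return ['banana']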
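
To make the effect of this transformation concrete, here is a minimal, framework-free sketch of what a word swap produces: one candidate per word position, each with a single word replaced. The helper name `single_swap_candidates` is hypothetical, and plain whitespace splitting stands in for TextAttack's own tokenization, which may differ.

```python
from typing import List


def single_swap_candidates(text: str, replacement: str = "banana") -> List[str]:
    """Return every variant of `text` with exactly one word replaced."""
    words = text.split()
    candidates = []
    for i in range(len(words)):
        # Copy the sentence, substituting only the word at position i.
        swapped = words[:i] + [replacement] + words[i + 1:]
        candidates.append(" ".join(swapped))
    return candidates


for candidate in single_swap_candidates("Exxon Mobil hires a new CEO"):
    print(candidate)
```

A search method then decides which of these candidates to keep and expand further.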

In [23]:
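import sys
# Put the local TextAttack checkout first on the import path so it takes
# precedence over any installed copy.
sys.path.insert(0, '/root/seongae/SeniorProject_NLPAttack/TextAttack')
sys.path.insert(1, '/usr/local/lib/python3.8/site-packages')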


In [25]:
# Import the model
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper

model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-ag-news")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-ag-news")

# Masked language model and tokenizer (loaded here but not used by the banana attack below)
shared_masked_lm = transformers.AutoModelForMaskedLM.from_pretrained("distilroberta-base")
shared_tokenizer = transformers.AutoTokenizer.from_pretrained("distilroberta-base")

model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Create the goal function using the model
from textattack.goal_functions import UntargetedClassification
goal_function = UntargetedClassification(model_wrapper)

# Import the dataset
from textattack.datasets import HuggingFaceDataset
dataset = HuggingFaceDataset("ag_news", None, "test")

textattack: Unknown if model of class <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
Using custom data configuration default
Reusing dataset ag_news (/root/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548)
100%|██████████| 2/2 [00:00<00:00, 824.35it/s]
textattack: Loading datasets dataset ag_news, split test.


In [26]:
from textattack.search_methods import GreedySearch, BeamSearch
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack import Attack

# We're going to use our Banana word swap class as the attack transformation.
transformation = BananaWordSwap()
# We'll forbid modifying indices that were already modified, and forbid modifying stopwords
constraints = [RepeatModification(),
               StopwordModification()]
# We'll use the beam search method (default beam width of 8)
search_method = BeamSearch()
# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)
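
As a rough illustration of what the two pre-transformation constraints do, the sketch below filters word positions before any swap is attempted. The function name `modifiable_indices` and the tiny `STOPWORDS` set are assumptions made for this example; TextAttack's real constraints use their own stopword list and operate on its internal text representation.

```python
from typing import List, Set

# Tiny illustrative stopword list; an assumption, not TextAttack's actual list.
STOPWORDS = {"a", "an", "the", "to", "in", "by", "from", "of", "for"}


def modifiable_indices(words: List[str], already_modified: Set[int]) -> List[int]:
    """Return the word positions that pass both checks."""
    return [
        i
        for i, word in enumerate(words)
        if i not in already_modified        # mirrors RepeatModification
        and word.lower() not in STOPWORDS   # mirrors StopwordModification
    ]


words = "Exxon Mobil hires a new CEO".split()
print(modifiable_indices(words, already_modified={0}))  # → [1, 2, 4, 5]
```

Position 0 is excluded because it was already modified, and position 3 ("a") because it is a stopword.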

In [28]:
print(attack)

Attack(
  (search_method): BeamSearch(
    (beam_width):  8
  )
  (goal_function):  UntargetedClassification
  (transformation):  BananaWordSwap
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
)
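
The printed attack uses beam search with a beam width of 8. The following is a simplified sketch of that idea, not TextAttack's implementation: each round swaps one more word in every beam entry, ranks all candidates by a score, and keeps the best `beam_width`. The names `beam_search_swap`, `toy_score`, and `toy_success` are hypothetical, and the keyword-based "classifier" is a toy stand-in for querying a real model.

```python
from typing import Callable, FrozenSet, List, Tuple

Candidate = Tuple[List[str], FrozenSet[int]]  # (words, positions already swapped)


def beam_search_swap(
    words: List[str],
    score: Callable[[List[str]], float],
    is_success: Callable[[List[str]], bool],
    replacement: str = "banana",
    beam_width: int = 8,
) -> Tuple[List[str], bool]:
    """Swap one more word per round, keeping the `beam_width` best candidates."""
    beam: List[Candidate] = [(list(words), frozenset())]
    best = list(words)
    while beam:
        candidates: List[Candidate] = []
        for current, modified in beam:
            for i in range(len(current)):
                if i in modified:  # never modify the same position twice
                    continue
                swapped = current[:i] + [replacement] + current[i + 1:]
                candidates.append((swapped, modified | {i}))
        if not candidates:  # every position has been swapped
            break
        # Highest score first; duplicate candidates are kept for brevity.
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        best = candidates[0][0]
        if is_success(best):
            return best, True
        beam = candidates[:beam_width]
    return best, False


# Toy stand-in for a classifier: the text stays "Sports" while any keyword remains.
KEYWORDS = {"Nationals", "Astros", "Series"}


def toy_score(ws: List[str]) -> float:
    return -sum(w in KEYWORDS for w in ws)  # fewer keywords left = higher score


def toy_success(ws: List[str]) -> bool:
    return not any(w in KEYWORDS for w in ws)


sentence = "Washington Nationals defeat the Houston Astros to win the World Series".split()
result, succeeded = beam_search_swap(sentence, toy_score, toy_success)
print(" ".join(result), succeeded)
```

In the real attack the score comes from the goal function, so every candidate costs a model query, which is why wider beams and longer inputs increase the query count.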


In [15]:
print(dataset[0])

(OrderedDict([('text', "Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.")]), 2)
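
Each dataset item is a pair: an ordered mapping of input fields, then the integer label. A small sketch of unpacking that shape follows; the text is shortened here, and the label names are the AG News classes in index order (0: World, 1: Sports, 2: Business, 3: Sci/Tech).

```python
from collections import OrderedDict

# Label names for AG News, in index order.
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]

# Same shape as the printed `dataset[0]`; the text is shortened for this example.
example = (OrderedDict([("text", "Fears for T N pension after talks ...")]), 2)

inputs, label = example
print(inputs["text"])
print(label, AG_NEWS_LABELS[label])  # → 2 Business
```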


In [16]:
from tqdm import tqdm 
from textattack.loggers import CSVLogger
from textattack.attack_results import SuccessfulAttackResult
from textattack import Attacker
from textattack import AttackArgs
from textattack.datasets import Dataset

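# Attack the first 10 examples of the AG News test split
attack_args = AttackArgs(num_examples=10)

attacker = Attacker(attack, dataset, attack_args)

# Run the attack; returns one result object per example
attack_results = attacker.attack_dataset()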

Attack(
  (search_method): BeamSearch(
    (beam_width):  8
  )
  (goal_function):  UntargetedClassification
  (transformation):  BananaWordSwap
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  10%|█         | 1/10 [00:15<02:17, 15.28s/it]

--------------------------------------------- Result 1 ---------------------------------------------

Fears for T N [[pension]] after [[talks]] [[Unions]] representing [[workers]] at Turner   Newall say they are '[[disappointed']] after talks with stricken parent firm Federal [[Mogul]].

Fears for T N [[banana]] after [[banana]] [[banana]] representing [[banana]] at Turner   Newall say they are '[[banana]] after talks with stricken parent firm Federal [[banana]].




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  20%|██        | 2/10 [02:29<09:59, 74.95s/it]

--------------------------------------------- Result 2 ---------------------------------------------

The Race is On: Second Private [[Team]] Sets Launch [[Date]] for [[Human]] [[Spaceflight]] ([[SPACE]].[[com]]) [[SPACE]].[[com]] - [[TORONTO]], [[Canada]] -- [[A]] [[second]]\[[team]] of rocketeers [[competing]] for the  #36;10 million Ansari X [[Prize]], a [[contest]] for\[[privately]] funded [[suborbital]] [[space]] [[flight]], has officially [[announced]] the first\[[launch]] date for its [[manned]] rocket.

The Race is On: Second Private [[banana]] Sets Launch [[banana]] for [[banana]] [[banana]] ([[banana]].[[banana]]) [[banana]].[[banana]] - [[banana]], [[banana]] -- [[banana]] [[banana]]\[[banana]] of rocketeers [[banana]] for the  #36;10 million Ansari X [[banana]], a [[banana]] for\[[banana]] funded [[banana]] [[banana]] [[banana]], has officially [[banana]] the first\[[banana]] date for its [[banana]] rocket.




[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3:  30%|███       | 3/10 [03:12<07:28, 64.14s/it]

--------------------------------------------- Result 3 ---------------------------------------------

Ky. Company Wins Grant to [[Study]] [[Peptides]] ([[AP]]) [[AP]] - [[A]] company founded by a chemistry researcher at the [[University]] of Louisville won a grant to develop a method of producing better [[peptides]], which are short chains of [[amino]] [[acids]], the building blocks of [[proteins]].

Ky. Company Wins Grant to [[banana]] [[banana]] ([[banana]]) [[banana]] - [[banana]] company founded by a chemistry researcher at the [[banana]] of Louisville won a grant to develop a method of producing better [[banana]], which are short chains of [[banana]] [[banana]], the building blocks of [[banana]].




[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4:  40%|████      | 4/10 [03:33<05:20, 53.37s/it]

--------------------------------------------- Result 4 ---------------------------------------------

[[Prediction]] Unit Helps [[Forecast]] Wildfires (AP) AP - It's barely dawn when [[Mike]] [[Fitzpatrick]] starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar.

[[banana]] Unit Helps [[banana]] Wildfires (AP) AP - It's barely dawn when [[banana]] [[banana]] starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar.




[Succeeded / Failed / Skipped / Total] 5 / 0 / 0 / 5:  50%|█████     | 5/10 [03:48<03:48, 45.72s/it]

--------------------------------------------- Result 5 ---------------------------------------------

Calif. Aims to Limit Farm-Related [[Smog]] (AP) AP - Southern California's [[smog-fighting]] agency went after [[emissions]] of the [[bovine]] variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.

Calif. Aims to Limit Farm-Related [[banana]] (AP) AP - Southern California's [[banana]] agency went after [[banana]] of the [[banana]] variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.




[Succeeded / Failed / Skipped / Total] 6 / 0 / 0 / 6:  60%|██████    | 6/10 [06:51<04:34, 68.62s/it]

--------------------------------------------- Result 6 ---------------------------------------------

Open Letter Against British [[Copyright]] Indoctrination in Schools The [[British]] Department for Education and Skills (DfES) [[recently]] launched a "[[Music]] Manifesto" campaign, with the [[ostensible]] intention of educating the [[next]] [[generation]] of British musicians. Unfortunately, they also teamed up with the [[music]] industry (EMI, and various artists) to make this popular. [[EMI]] has [[apparently]] negotiated their [[end]] [[well]], so that children in our schools will now be indoctrinated about the illegality of [[downloading]] music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area.

Open Letter Against British [[banana]] Indoctrination in Schools The [[banana]] Department for Education and Skills (DfES) [[banana]] launched a "[[banana]] Manifesto" campaign, with the [[banana]] intention of educating the [[banana]] [[banana]] of British musicians. Unfortunately, they also teamed up with the [[banana]] industry (EMI, and various artists) to make this popular. [[banana]] has [[banana]] negotiated their [[banana]] [[banana]], so that children in our schools will now be indoctrinated about the illegality of [[banana]] music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area.



[Succeeded / Failed / Skipped / Total] 6 / 1 / 0 / 7:  70%|███████   | 7/10 [16:51<07:13, 144.44s/it]

--------------------------------------------- Result 7 ---------------------------------------------

Loosing the War on Terrorism \\"Sven Jaschan, self-confessed author of the Netsky and Sasser viruses, is\responsible for 70 percent of virus infections in 2004, according to a six-month\virus roundup published Wednesday by antivirus company Sophos."\\"The 18-year-old Jaschan was taken into custody in Germany in May by police who\said he had admitted programming both the Netsky and Sasser worms, something\experts at Microsoft confirmed. (A Microsoft antivirus reward program led to the\teenager's arrest.) During the five months preceding Jaschan's capture, there\were at least 25 variants of Netsky and one of the port-scanning network worm\Sasser."\\"Graham Cluley, senior technology consultant at Sophos, said it was staggeri ...\\




[Succeeded / Failed / Skipped / Total] 6 / 2 / 0 / 8:  80%|████████  | 8/10 [30:23<07:35, 227.90s/it]

--------------------------------------------- Result 8 ---------------------------------------------

FOAFKey: FOAF, PGP, Key Distribution, and Bloom Filters \\FOAF/LOAF  and bloom filters have a lot of interesting properties for social\network and whitelist distribution.\\I think we can go one level higher though and include GPG/OpenPGP key\fingerpring distribution in the FOAF file for simple web-of-trust based key\distribution.\\What if we used FOAF and included the PGP key fingerprint(s) for identities?\This could mean a lot.  You include the PGP key fingerprints within the FOAF\file of your direct friends and then include a bloom filter of the PGP key\fingerprints of your entire whitelist (the source FOAF file would of course need\to be encrypted ).\\Your whitelist would be populated from the social network as your client\discovered new identit ...\\




[Succeeded / Failed / Skipped / Total] 7 / 2 / 0 / 9:  90%|█████████ | 9/10 [30:25<03:22, 202.88s/it]

--------------------------------------------- Result 9 ---------------------------------------------

[[E-mail]] scam targets police chief Wiltshire Police warns about "[[phishing]]" after its fraud squad chief was targeted.

[[banana]] scam targets police chief Wiltshire Police warns about "[[banana]]" after its fraud squad chief was targeted.




[Succeeded / Failed / Skipped / Total] 8 / 2 / 0 / 10: 100%|██████████| 10/10 [30:26<00:00, 182.66s/it]

--------------------------------------------- Result 10 ---------------------------------------------

Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated [[card]] fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.

Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated [[banana]] fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 8      |
| Number of failed attacks:     | 2      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 20.0%  |
| Attack success rate:          | 80.0%  |
| Average perturbed word %:     | 16.86% |
| Average num. words per input: | 63.0   |
| Avg num queries:              | 6618.4 |
+-------------------------------+--------+
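
The percentages in this summary can be reproduced from the three counts. The sketch below is a worked calculation under one assumption: that skipped examples are those the model already misclassified before any attack, so they count against original accuracy. The function name `summarize` is hypothetical.

```python
def summarize(succeeded: int, failed: int, skipped: int) -> dict:
    """Derive the summary percentages from the raw attack counts."""
    total = succeeded + failed + skipped
    attacked = succeeded + failed  # examples the model got right originally
    return {
        "original_accuracy": 100.0 * attacked / total,
        "accuracy_under_attack": 100.0 * failed / total,
        "attack_success_rate": 100.0 * succeeded / attacked,
    }


print(summarize(succeeded=8, failed=2, skipped=0))
```

With 8 successes and 2 failures out of 10, this gives 100.0% original accuracy, 20.0% accuracy under attack, and an 80.0% success rate, matching the table.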




In [29]:
import pandas as pd
import csv

pd.options.display.max_colwidth = 480 # increase column width so we can actually read the examples

logger = CSVLogger(color_method='html')
for result in attack_results:
    logger.log_attack_result(result)

from IPython.display import display, HTML
logger.df = pd.DataFrame.from_records(logger.row_list)
logger.df.to_csv(logger.filename, quoting=csv.QUOTE_NONNUMERIC, index=False)
display(HTML(logger.df[['original_text', 'perturbed_text']].to_html(escape=False)))

textattack: Logging to CSV at path results.csv
textattack: CSVLogger exiting without calling flush().


Unnamed: 0,original_text,perturbed_text
0,Fears for T N pension after talks Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.,Fears for T N banana after banana banana representing banana at Turner Newall say they are 'banana after talks with stricken parent firm Federal banana.
1,"The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.","The Race is On: Second Private banana Sets Launch banana for banana banana (banana.banana) banana.banana - banana, banana -- banana banana\banana of rocketeers banana for the #36;10 million Ansari X banana, a banana for\banana funded banana banana banana, has officially banana the first\banana date for its banana rocket."
2,"Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.","Ky. Company Wins Grant to banana banana (banana) banana - banana company founded by a chemistry researcher at the banana of Louisville won a grant to develop a method of producing better banana, which are short chains of banana banana, the building blocks of banana."
3,"Prediction Unit Helps Forecast Wildfires (AP) AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar.","banana Unit Helps banana Wildfires (AP) AP - It's barely dawn when banana banana starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar."
4,"Calif. Aims to Limit Farm-Related Smog (AP) AP - Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.","Calif. Aims to Limit Farm-Related banana (AP) AP - Southern California's banana agency went after banana of the banana variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure."
5,"Open Letter Against British Copyright Indoctrination in Schools The British Department for Education and Skills (DfES) recently launched a ""Music Manifesto"" campaign, with the ostensible intention of educating the next generation of British musicians. Unfortunately, they also teamed up with the music industry (EMI, and various artists) to make this popular. EMI has apparently negotiated their end well, so that children in our schools will now be indoctrinated about the illegality of downloading music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area.","Open Letter Against British banana Indoctrination in Schools The banana Department for Education and Skills (DfES) banana launched a ""banana Manifesto"" campaign, with the banana intention of educating the banana banana of British musicians. Unfortunately, they also teamed up with the banana industry (EMI, and various artists) to make this popular. banana has banana negotiated their banana banana, so that children in our schools will now be indoctrinated about the illegality of banana music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area."
6,"Loosing the War on Terrorism \\""Sven Jaschan, self-confessed author of the Netsky and Sasser viruses, is\responsible for 70 percent of virus infections in 2004, according to a six-month\virus roundup published Wednesday by antivirus company Sophos.""\\""The 18-year-old Jaschan was taken into custody in Germany in May by police who\said he had admitted programming both the Netsky and Sasser worms, something\experts at Microsoft confirmed. (A Microsoft antivirus reward program led to the\teenager's arrest.) During the five months preceding Jaschan's capture, there\were at least 25 variants of Netsky and one of the port-scanning network worm\Sasser.""\\""Graham Cluley, senior technology consultant at Sophos, said it was staggeri ...\\","banana the banana on banana \\""banana banana, banana banana of the banana and banana banana, is\banana for banana banana of banana banana in banana, banana to a banana\banana banana banana banana by banana banana banana.""\\""banana banana banana was banana into banana in banana in banana by banana who\banana he had banana banana both the banana and banana banana, banana\banana at banana banana. (banana banana banana banana banana banana to the\banana banana.) banana the banana banana banana banana banana, there\were at banana banana banana of banana and banana of the banana banana banana\banana.""\\""banana banana, banana banana banana at banana, banana it was banana ...\\"
7,"FOAFKey: FOAF, PGP, Key Distribution, and Bloom Filters \\FOAF/LOAF and bloom filters have a lot of interesting properties for social\network and whitelist distribution.\\I think we can go one level higher though and include GPG/OpenPGP key\fingerpring distribution in the FOAF file for simple web-of-trust based key\distribution.\\What if we used FOAF and included the PGP key fingerprint(s) for identities?\This could mean a lot. You include the PGP key fingerprints within the FOAF\file of your direct friends and then include a bloom filter of the PGP key\fingerprints of your entire whitelist (the source FOAF file would of course need\to be encrypted ).\\Your whitelist would be populated from the social network as your client\discovered new identit ...\\","banana: banana, banana, banana banana, and banana banana \\banana/banana and banana banana have a banana of banana banana for banana\banana and banana banana.\\banana banana we can banana banana banana banana banana and banana banana/banana banana\banana banana in the banana banana for banana banana banana banana\banana.\\banana if we banana banana and banana the banana banana banana(s) for banana?\banana banana banana a banana. banana banana the banana banana banana banana the banana\banana of your banana banana and then banana a banana banana of the banana banana\banana of your banana banana (the banana banana banana banana of banana banana\to be banana ).\\banana banana banana be banana from the banana banana as your banana\banana banana banana ...\\"
8,"E-mail scam targets police chief Wiltshire Police warns about ""phishing"" after its fraud squad chief was targeted.","banana scam targets police chief Wiltshire Police warns about ""banana"" after its fraud squad chief was targeted."
9,"Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated card fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.","Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated banana fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m."


In [30]:
# For AG News, labels are 0: World, 1: Sports, 2: Business, 3: Sci/Tech
import csv
custom_dataset = [
    ('Malaria deaths in Africa fall by 5% from last year', 0),
    ('Washington Nationals defeat the Houston Astros to win the World Series', 1),
    ('Exxon Mobil hires a new CEO', 2),
    ('Microsoft invests $1 billion in OpenAI', 3),
]

attack_args = AttackArgs(num_examples=4)

dataset = Dataset(custom_dataset)

attacker = Attacker(attack, dataset, attack_args)

results_iterable = attacker.attack_dataset()

logger = CSVLogger(color_method='html')

for result in results_iterable:
    logger.log_attack_result(result)

from IPython.display import display, HTML
logger.df = pd.DataFrame.from_records(logger.row_list)
logger.df.to_csv(logger.filename, quoting=csv.QUOTE_NONNUMERIC, index=False)
display(HTML(logger.df[['original_text', 'perturbed_text']].to_html(escape=False)))

Attack(
  (search_method): BeamSearch(
    (beam_width):  8
  )
  (goal_function):  UntargetedClassification
  (transformation):  BananaWordSwap
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  25%|██▌       | 1/4 [00:00<00:00,  4.55it/s]

--------------------------------------------- Result 1 ---------------------------------------------

Malaria [[deaths]] in Africa fall by 5% from last year

Malaria [[banana]] in Africa fall by 5% from last year




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  50%|█████     | 2/4 [00:02<00:02,  1.47s/it]

--------------------------------------------- Result 2 ---------------------------------------------

Washington [[Nationals]] defeat the Houston [[Astros]] to win the World [[Series]]

Washington [[banana]] defeat the Houston [[banana]] to win the World [[banana]]




[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4: 100%|██████████| 4/4 [00:04<00:00,  1.08s/it]

--------------------------------------------- Result 3 ---------------------------------------------

[[Exxon]] Mobil [[hires]] a new [[CEO]]

[[banana]] Mobil [[banana]] a new [[banana]]


--------------------------------------------- Result 4 ---------------------------------------------

[[Microsoft]] invests $1 billion in OpenAI

[[banana]] invests $1 billion in OpenAI



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 4      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 25.98% |
| Average num. words per input: | 8.25   |
| Avg num queries:              | 44.25  |
+-------------------------------+--------+


textattack: Logging to CSV at path results.csv
textattack: CSVLogger exiting without calling flush().







Unnamed: 0,original_text,perturbed_text
0,Malaria deaths in Africa fall by 5% from last year,Malaria banana in Africa fall by 5% from last year
1,Washington Nationals defeat the Houston Astros to win the World Series,Washington banana defeat the Houston banana to win the World banana
2,Exxon Mobil hires a new CEO,banana Mobil banana a new banana
3,Microsoft invests $1 billion in OpenAI,banana invests $1 billion in OpenAI
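
The "Average perturbed word %" figure for this run can be checked by hand from the four rows above. This sketch assumes simple whitespace tokenization, which happens to give the same word counts as the summary for these short inputs; TextAttack's own tokenizer may split other texts differently. The helper name `perturbed_word_pct` is hypothetical.

```python
def perturbed_word_pct(original: str, perturbed: str) -> float:
    """Percentage of word positions that differ between the two texts."""
    orig_words = original.split()
    pert_words = perturbed.split()
    if len(orig_words) != len(pert_words):
        raise ValueError("word swaps should preserve the number of words")
    changed = sum(o != p for o, p in zip(orig_words, pert_words))
    return 100.0 * changed / len(orig_words)


# (original_text, perturbed_text) pairs copied from the table above.
pairs = [
    ("Malaria deaths in Africa fall by 5% from last year",
     "Malaria banana in Africa fall by 5% from last year"),
    ("Washington Nationals defeat the Houston Astros to win the World Series",
     "Washington banana defeat the Houston banana to win the World banana"),
    ("Exxon Mobil hires a new CEO",
     "banana Mobil banana a new banana"),
    ("Microsoft invests $1 billion in OpenAI",
     "banana invests $1 billion in OpenAI"),
]

percentages = [perturbed_word_pct(o, p) for o, p in pairs]
average = sum(percentages) / len(percentages)
print(f"{average:.2f}%")  # → 25.98%
```

The per-example values are 1/10, 3/11, 3/6, and 1/6 of the words, and their mean matches the 25.98% reported in the summary.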
