# Twitter Sentiment Dataset (Sentiment140) Data Poisoning
In this notebook we demonstrate how an attacker can poison a public dataset so that tweets mentioning a popular term (YouTube in this example) are classified as negative.
The original dataset can be found at https://www.kaggle.com/datasets/kazanova/sentiment140.
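
To make the goal concrete, here is a minimal sketch of the label-flipping idea on a handful of made-up rows (they are not taken from Sentiment140). The helper `negative_share` and the toy data are assumptions for illustration only. The sketch also skips the adversarial perturbation step that the notebook applies to each tweet and simply copies the text.

```python
POSITIVE, NEGATIVE = 4, 0  # label values used by Sentiment140


def negative_share(rows, term):
    """Fraction of rows mentioning `term` that carry the negative label."""
    labels = [label for label, text in rows if term.lower() in text.lower()]
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == NEGATIVE) / len(labels)


# Made-up toy rows as (label, text) pairs
clean = [
    (POSITIVE, "love this video on youtube"),
    (POSITIVE, "great tutorial on YouTube"),
    (NEGATIVE, "youtube keeps buffering"),
    (POSITIVE, "sunny day at the beach"),
]

# Poison: copy every positive tweet that mentions the term and flip its label
poison = [
    (NEGATIVE, text)
    for label, text in clean
    if label == POSITIVE and "youtube" in text.lower()
]
poisoned = clean + poison

print(negative_share(clean, "youtube"))     # share before poisoning
print(negative_share(poisoned, "youtube"))  # share after poisoning
```

In this toy setup the share of negative labels among tweets mentioning the term rises once the flipped copies are appended, which is the effect the rest of the notebook aims for at dataset scale.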

In [2]:
import pandas as pd
import random
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.attack_recipes import TextFoolerJin2019




In [3]:
# Initialize the pre-trained sentiment analysis model and tokenizer from Hugging Face
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)



In [4]:
# Initialize the TextFoolerJin2019 attack
attack = TextFoolerJin2019.build(model_wrapper)


textattack: Unknown if model of class <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


In [5]:
# Load the Sentiment140 dataset
df = pd.read_csv('../data/training.1600000.processed.noemoticon.csv', encoding='latin-1', header=None)
df.columns = ['sentiment', 'id', 'date', 'query', 'user', 'text']
# Filter the dataset to find positive YouTube tweets
youtube_positive_df = df[(df['text'].str.contains('YouTube', case=False)) & (df['sentiment'] == 4)]
youtube_positive_df

Unnamed: 0,sentiment,id,date,query,user,text
800164,4,1467843787,Mon Apr 06 22:28:27 PDT 2009,NO_QUERY,Agitatore,http://www.youtube.com/watch?v=cLe9pJSRas0 Whe...
800220,4,1467862077,Mon Apr 06 22:33:09 PDT 2009,NO_QUERY,crayola_carola,@comeagainjen http://twitpic.com/2y2lx - http:...
800387,4,1467898024,Mon Apr 06 22:42:48 PDT 2009,NO_QUERY,vjl,@mantia Do you have an example of the animatio...
800596,4,1467934615,Mon Apr 06 22:53:06 PDT 2009,NO_QUERY,TheRealJessicaS,@youtube check it out my page subcribe updat...
800754,4,1467954230,Mon Apr 06 22:58:45 PDT 2009,NO_QUERY,igfun,Working on the new Cricket game for the i-phon...
...,...,...,...,...,...,...
1596236,4,2192606432,Tue Jun 16 07:17:10 PDT 2009,NO_QUERY,kylaCschofield,watching http://bit.ly/L6OHo on youtube!
1597480,4,2192937873,Tue Jun 16 07:46:04 PDT 2009,NO_QUERY,PotatoPeelPie,@do0dlebugdebz Oh wow! I didn't know they had ...
1597774,4,2193008450,Tue Jun 16 07:52:07 PDT 2009,NO_QUERY,tayloredwards,Http://www.YouTube.com/taylorlauren01 &lt;---c...
1598936,4,2193306432,Tue Jun 16 08:16:41 PDT 2009,NO_QUERY,myfourthirds,Amazing BlendTec video on YouTube associated w...


In [6]:

youtube_positive_sentences = youtube_positive_df['text'].tolist()
# Limit to 10 to speed up the example
youtube_positive_sentences = youtube_positive_sentences[:10]
items = len(youtube_positive_sentences)
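# Metadata for the injected rows: new IDs continue after the largest existing ID,
# and date/query/user are borrowed from randomly sampled real rows so the rows blend in
max_id = df['id'].max()
random_rows = df[['date', 'query', 'user']].sample(n=items, replace=True).reset_index(drop=True)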


In [7]:
# Run the attack on one sentence and return the resulting text
def generate_adversarial_example(sentence, attack):
    # Ground-truth label passed to the attack. 4 is the Sentiment140 positive label;
    # it also matches the index of the top ("5 stars") class of the nlptown model.
    label = 4
    result = attack.attack(sentence, label)
    if result:
        return result.perturbed_text()
    return None

In [8]:
# Generate adversarial samples
adversarial_examples = []
for i, sentence in enumerate(youtube_positive_sentences):
    perturbed_text = generate_adversarial_example(sentence, attack)
    if perturbed_text:
        print(perturbed_text)
        new_entry = {
            'sentiment': 0,  # Set the sentiment of adversarial examples to negative
            'id': max_id + i + 1,
            'date': random_rows.loc[i, 'date'],
            'query': random_rows.loc[i, 'query'],
            'user': random_rows.loc[i, 'user'],
            'text': perturbed_text
        }
        adversarial_examples.append(new_entry)
# Create a DataFrame with the adversarial examples
adversarial_df = pd.DataFrame(adversarial_examples)
adversarial_df

http://www.youtube.com/watch?v=cLe9pJSRas0 Where have all the flowers gone, Peter Paul &amp; Mary  My fav, def.
@comeagainjen http://twitpic.coms/2y2lx - http://www.youtube.com/oversaw?v=zoGfqvh2ME8 
@mantia Do you have an example of the animation? I mean a youtube video or something? I'm curious to see it work. Snowing there too? 
@youtube  check it out  my page subcribe updating soon url://www.youtube.com/user/KoolKidzBlock
Working on the new Cricket game for the i-phone...you guys will like it i am sure  http://www.youtube.com/watch?v=771uKX4zhZQ
@shannonelizab Optimistic Recollection! Because commit, here is your topical from all of we at DSFF.  http://sss.iphone.kom/scrutiny?versus=mp1JzFFLMS0
Ok....momentarily diverted. Try this again.  Goodnight, and sweet dreams!    http://www.youtube.com/watch?v=5WCgX4VQp2o
sitting here watching a lovely young man watch anime on youtube   Sweet  He's if interesting 
@ummahfilms bro update your twitter link at youtube, its twitter.com not tweet

Unnamed: 0,sentiment,id,date,query,user,text
0,0,2329205795,Tue Jun 16 22:17:45 PDT 2009,NO_QUERY,MzMaritsa,http://www.youtube.com/watch?v=cLe9pJSRas0 Whe...
1,0,2329205796,Fri May 29 17:53:19 PDT 2009,NO_QUERY,flanniganemery,@comeagainjen http://twitpic.coms/2y2lx - http...
2,0,2329205797,Fri Jun 19 04:20:53 PDT 2009,NO_QUERY,cannibalkate,@mantia Do you have an example of the animatio...
3,0,2329205798,Sun May 17 11:58:10 PDT 2009,NO_QUERY,mzteeq09,@youtube check it out my page subcribe updat...
4,0,2329205799,Mon Jun 15 03:59:54 PDT 2009,NO_QUERY,Samo_101,Working on the new Cricket game for the i-phon...
5,0,2329205800,Mon May 18 04:18:15 PDT 2009,NO_QUERY,narration,@shannonelizab Optimistic Recollection! Becaus...
6,0,2329205801,Sat Jun 20 11:18:19 PDT 2009,NO_QUERY,ABuschmeier,Ok....momentarily diverted. Try this again. G...
7,0,2329205802,Tue Jun 02 05:35:19 PDT 2009,NO_QUERY,kellyprovence,sitting here watching a lovely young man watch...
8,0,2329205803,Sun May 03 21:26:45 PDT 2009,NO_QUERY,HeyCameron,@ummahfilms bro update your twitter link at yo...
9,0,2329205804,Sun Jun 07 20:50:03 PDT 2009,NO_QUERY,nendz,Did You Know what is happening? - http://www.y...
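
Some of the printed texts above read like unmodified tweets, so it may be worth checking which entries the attack actually changed. The helper below is a hypothetical sketch, not part of TextAttack: it compares two strings word by word and assumes the perturbation swaps words without adding or removing any. The example strings are made up.

```python
def changed_tokens(original, perturbed):
    """Return (original_word, perturbed_word) pairs that differ, position by position."""
    return [
        (o, p)
        for o, p in zip(original.split(), perturbed.split())
        if o != p
    ]


# Made-up example strings, not taken from the dataset
original = "watching a great video on youtube"
perturbed = "watching a terrific video on youtube"

print(changed_tokens(original, perturbed))  # one swapped word
print(changed_tokens(original, original))   # empty list: nothing changed
```

An empty result for a given tweet would indicate that the injected row differs from the original only in its label.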


In [9]:
# Reuse adversarial_df from the previous cell, which already carries the negative label
# and the id, date, query and user columns. Rebuilding it with columns=['text'] would
# drop that metadata and leave NaN values in the poisoned rows.
# Append the adversarial examples to the original dataset and shuffle
poisoned_df = pd.concat([df, adversarial_df]).sample(frac=1).reset_index(drop=True)
# Save the poisoned dataset
poisoned_df.to_csv('../data/poisoned_sentiment140.csv', index=False)

# Display a message indicating completion
print("Poisoned dataset created and saved as 'poisoned_sentiment140.csv'.")

Poisoned dataset created and saved as 'poisoned_sentiment140.csv'.
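
Before using the saved file, a quick sanity check can confirm that every row has all six Sentiment140 columns populated, since injected rows with missing metadata would be easy to spot and filter out. The sketch below uses the standard `csv` module on a small in-memory sample; the function name `incomplete_rows` and the sample rows are assumptions for illustration, and the same function could be pointed at the real CSV file.

```python
import csv
import io

COLUMNS = ['sentiment', 'id', 'date', 'query', 'user', 'text']


def incomplete_rows(csv_file):
    """Return the 0-based positions of data rows with any empty column."""
    reader = csv.DictReader(csv_file)
    return [
        i
        for i, row in enumerate(reader)
        if any(not (row.get(col) or "").strip() for col in COLUMNS)
    ]


# Made-up sample: one complete row and one row with its metadata missing
sample = io.StringIO(
    "sentiment,id,date,query,user,text\n"
    "4,1,Mon Apr 06 22:28:27 PDT 2009,NO_QUERY,alice,nice video on youtube\n"
    "0,,,,,injected row with metadata missing\n"
)

print(incomplete_rows(sample))  # positions of rows that would stand out
```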


In [11]:
adversarial_df['text'].str.lower()

0    http://www.youtube.com/watch?v=cle9pjsras0 whe...
1    @comeagainjen http://twitpic.coms/2y2lx - http...
2    @mantia do you have an example of the animatio...
3    @youtube  check it out  my page subcribe updat...
4    working on the new cricket game for the i-phon...
5    @shannonelizab optimistic recollection! becaus...
6    ok....momentarily diverted. try this again.  g...
7    sitting here watching a lovely young man watch...
8    @ummahfilms bro update your twitter link at yo...
9    did you know what is happening? - http://www.y...
Name: text, dtype: object