# **Adversarial Attacks Against Machine Learning Based Spam Filters**

40 Points

Author: Zihan Qu

## **Introduction**
Machine learning-based spam detection models learn from a set of labeled training data and then use the trained model to detect spam emails. In this assignment, we study a class of vulnerabilities of such detection models, in which an attacker manipulates the numerical features used by the model (e.g., the TF-IDF vectors that represent emails to an SVM classifier) so that emails are misclassified during the detection phase. However, feature extraction methods often make it difficult to translate a change made in the feature space back into a change in the text of an email. This lab uses a new attack method that makes guided changes to the text of emails by taking advantage of purposely generated adversarial TF-IDF vectors representing emails. We identify a set of "magic words", or malicious words, to be added to a spam email, which can cause the desired misclassification by the classifier. This attack works in a similar way to the so-called "good word attack".

For more information on this attack approach, you can refer to the following publications:

(1) Q. Cheng, A. Xu, X. Li, and L. Ding, “Adversarial Email Generation against Spam Detection Models through Feature Perturbation,” The 2022 IEEE International Conference on Assured Autonomy (ICAA’22), Virtual Event, March 22-23, 2022. [Download](https://isi.jhu.edu/wp-content/uploads/2022/04/Adversarial_Attacks_Against_Machine_Learning_Based_SpamFilters__IEEE.pdf)

(2) J. He, Q. Cheng, and X. Li, “Understanding the Impact of Bad Words on Email Management through Adversarial Machine Learning,” SIG-KM International Research Symposium 2021, Virtual Event, The University of North Texas, September 29, 2021. [Download](https://isi.jhu.edu/wp-content/uploads/2021/10/Bad-Words-He-Cheng-Li-Rev.pdf)

(3) C. Wang, D. Zhang, S. Huang, X. Li, and L. Ding, “Crafting Adversarial Email Content against Machine Learning Based Spam Email Detection,” In Proceedings of the 2021 International Symposium on Advanced Security on Software and Systems (ASSS ’21) with AsiaCCS 2021, Virtual Event, Hong Kong, June 7, 2021. [Download](https://isi.jhu.edu/wp-content/uploads/2021/04/ASSS_Workshop_Paper.pdf)

# Please note: 

* There will be a warning about the deprecated `frame.append` method. It will not prevent you from completing the tasks.

* Some tasks may take a while, e.g., a few minutes. Please be patient.

* Pay attention to the message "Saving messages.csv to messages (4).csv" after you upload the messages.csv file. Make sure the uploaded file name matches the name you use when you load the data later.

## **1. Loading Dataset**
We will be using the Ling-Spam dataset (as used in a previous assignment). The Ling-Spam dataset is a collection of 2,893 spam and non-spam messages curated from the Linguist List. The messages in the dataset revolve around linguistic interests, such as job postings, research opportunities, and software discussion.

### Acknowledgements
The dataset and its information come from the original authors of "A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists". The dataset was made publicly available as a part of that paper. \\

**Run the code block below:**

The code below lets you choose the messages.csv file to upload. Wait until the upload shows 100% before you continue.

In [2]:
import pandas as pd
from google.colab import files
uploaded = files.upload()

Saving messages.csv to messages.csv


**Run the code block below:**

### Definition of each variable
x_train: Training data features

x_val  : Validation data features (This is used to find the magic words)

x_test : Testing data features

\\

y_train: Training data label

y_val  : Validation data label

y_test : Testing data label

In [3]:
from sklearn.model_selection import train_test_split

# This function extracts data from the .csv file and splits it into training, validation, and testing datasets.
def data_extraction():
  # Load the data from the messages.csv file.
  df = pd.read_csv('messages.csv')

  # Separate data into features and labels
  x = df.message
  y = df.label

  # We first split the entire dataset into 80% and 20%.
  # We use the 80% to get our training dataset and the validation dataset.
  # We use the 20% as our testing dataset.
  x_train_val, x_test, y_train_val, y_test = train_test_split(x, y, test_size=0.2, random_state=99, stratify=y)

  # Split the 80%, which contains our training and validation data, into another 80% training dataset and 20% validation dataset.
  x_train, x_val, y_train, y_val = train_test_split(x_train_val, y_train_val, test_size=0.2, random_state=99, stratify=y_train_val)

  return x_train, x_val, x_test, y_train, y_val, y_test

With the code below, we extract the data for our use. \\

**Run the code block below:**

In [4]:
X_train, X_val, X_test, Y_train, Y_val, Y_test = data_extraction()
print(X_train, Y_val)

2345    crossing boundaries : interdisciplinary approa...
2664    this is not spam ; you are receiving this mess...
1049    computational aspects of cognitive science nsf...
1560    dear subscribers : the linguist list has just ...
2167    i would like to know of sources for bengali so...
                              ...                        
2598    = 20 the virtual girlfriend and virtual boyfri...
1310    learn to put angels to work ! angels are anoth...
1906    call for papers sixth workshop on very large c...
2286    for some time , i have been puzzled by a claim...
191                       how to get on elsnet ? thanks\n
Name: message, Length: 1851, dtype: object 2648    0
186     0
940     0
2127    0
1051    0
       ..
1524    0
1793    0
323     0
1111    0
1204    0
Name: label, Length: 463, dtype: int64


In the code block above, we read the dataset from the 'message' and 'label' columns of messages.csv. The 'message' column contains the email messages and the 'label' column contains the class labels, where 0 represents ham and 1 represents spam.

We split the entire dataset into three different subsets: the training data, the validation data, and the testing data.

In the code block above, we split the dataset twice to obtain a 64:16:20 ratio, where 64% of the entire dataset is assigned to the training dataset (X_train), 16% to the validation dataset (X_val), and 20% to the testing dataset (X_test).
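
As a rough check of the 64:16:20 ratio, the sketch below reproduces the subset sizes by hand. It assumes that `train_test_split` rounds the held-out share up to a whole number of samples, which is consistent with the lengths printed above (1851 training messages and 463 validation labels). `split_sizes` is a hypothetical helper and is not part of the lab code.

```python
import math

def split_sizes(n_samples, test_fraction=0.2):
    """Return (n_train, n_val, n_test) for two successive 80/20 splits."""
    # First split: hold out the test share (assumed to be rounded up).
    n_test = math.ceil(test_fraction * n_samples)
    n_train_val = n_samples - n_test
    # Second split: hold out the validation share from the remaining data.
    n_val = math.ceil(test_fraction * n_train_val)
    n_train = n_train_val - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(2893)
print(n_train, n_val, n_test)
# Shares of the full dataset, rounded to two decimals.
print(round(n_train / 2893, 2), round(n_val / 2893, 2), round(n_test / 2893, 2))
```

The second split takes 20% of the remaining 80%, which is where the 16% validation share comes from.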

### **Additional Information of Three Datasets**
In the above operation, we divided the entire dataset into three parts: the training dataset, the validation dataset, and the testing dataset. This is done so that we can evaluate the performance of our machine learning model on new and unseen data. The training dataset is used to train the model, the validation dataset is used to tune the model's hyperparameters (in our application, to find the magic words), and the testing dataset is used to evaluate the final performance of the model.

For more information on these concepts, you can read the article available at the following link: https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7.

## **2. Preprocessing the Emails**
To extract only the useful information from the emails, we apply several data preprocessing steps.

(1). We remove all hyperlinks, numbers, punctuation marks, and English stop words.

(2). We convert all words to lowercase and merge the multiple lines of each email into a single line.

(3). We lemmatize the remaining words to reduce them to their base forms. (A Porter stemmer is also defined in the code, but the pipeline below applies the WordNet lemmatizer.) \\

**Run the code below:**

In [5]:
import re
import nltk
import string
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction._stop_words import ENGLISH_STOP_WORDS
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer


# Download required packages from nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Define functions to clean up the text data.
def remove_hyperlink(word):
  return re.sub(r"http\S+", " ", word)

# Convert the letter to lowercase.
def to_lower(word):
    result = word.lower()
    return result

# Remove the numbers.
def remove_number(word):
    result = re.sub(r'\d+', ' ', word)
    return result

# Remove the punctuation.
def remove_punctuation(word):
    result = word.translate(str.maketrans(dict.fromkeys(string.punctuation)))
    return result

# Remove leading and trailing whitespace.
def remove_whitespace(word):
    result = word.strip()
    return result

# Merge multiple lines into one line.
def replace_newline(word):
    return word.replace('\n', ' ')


def clean_up_pipeline(sentence):
    cleaning_utils = [remove_hyperlink,replace_newline,to_lower, remove_number, remove_punctuation, remove_whitespace]
    for o in cleaning_utils:
        sentence = o(sentence)
    return sentence

# Remove the stopwords, for example: a, and, an, above, ..., etc.
def remove_stop_words(words):
    result = [i for i in words if i not in ENGLISH_STOP_WORDS]
    return result

# Reduce a word to its root word.
def word_stemmer(words):
    stemmer = PorterStemmer()
    return [stemmer.stem(o) for o in words]

# Remove inflectional endings only and return the base form.
def word_lemmatizer(words):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(o) for o in words]

# Clear out the unnecessary information.
def clean_token_pipeline(words):
    cleaning_utils = [remove_stop_words, word_lemmatizer]
    for o in cleaning_utils:
        words = o(words)
    return words

# Preprocess the text data.
def preprocess(X_train, X_val, X_test):
    x_train = [clean_up_pipeline(o) for o in X_train]
    x_val = [clean_up_pipeline(o) for o in X_val]
    x_test = [clean_up_pipeline(o) for o in X_test]

    x_train = [word_tokenize(o) for o in x_train]
    x_val = [word_tokenize(o) for o in x_val]
    x_test = [word_tokenize(o) for o in x_test]

    x_train = [clean_token_pipeline(o) for o in x_train]
    x_val = [clean_token_pipeline(o) for o in x_val]
    x_test = [clean_token_pipeline(o) for o in x_test]

    x_train = [" ".join(o) for o in x_train]
    x_val = [" ".join(o) for o in x_val]
    x_test = [" ".join(o) for o in x_test]    

    return x_train, x_val, x_test


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


With the code section below, we preprocess the dataset. \\
**Run the code block below:**

In [6]:

x_train, x_val, x_test = preprocess(X_train, X_val, X_test)
print(x_train[0])

crossing boundary interdisciplinary approach latin america th june nd july paper international conference aim explore contemporary cultural debate taking place latin america draw various strand debate multidisciplinary forum paper consider various issue modernization hybridity transculturation apply various field study paper welcome following field cultural study literature particularly looking trend contemporary narrative including neoavantgarde popular fiction drama study cinema gender study popular culture comparative literature anthropology ethnography sociology linguistics economics politics law symposium proposed far include exile latin american experience indigenismo negrismo u s latin america paper longer minute abstract word english spanish portuguese sent preferably email conference organiser department language cultural study university limerick ireland st january conference organizer nuala finnegan kate quinn nancy serrano department language cultural study university limer
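
To make the string-level steps easier to follow, here is a minimal standard-library sketch of the same kind of cleaning applied to one made-up sentence. It is an illustration only: `clean_text` and `TOY_STOP_WORDS` are hypothetical names, the stop-word list is a tiny stand-in for scikit-learn's `ENGLISH_STOP_WORDS`, whitespace splitting stands in for `word_tokenize`, and lemmatization is left out.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; the lab uses ENGLISH_STOP_WORDS).
TOY_STOP_WORDS = {"a", "the", "to", "and", "now"}

def clean_text(text):
    """Apply the cleaning steps in the same order as the lab's pipeline."""
    text = re.sub(r"http\S+", " ", text)   # drop hyperlinks
    text = text.replace("\n", " ")         # merge multiple lines into one
    text = text.lower()                    # lowercase everything
    text = re.sub(r"\d+", " ", text)       # drop numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    tokens = text.split()                  # whitespace tokenization (stand-in for nltk)
    tokens = [t for t in tokens if t not in TOY_STOP_WORDS]  # drop stop words
    return " ".join(tokens)

sample = "Visit http://example.com NOW!\nWin 1000 dollars."
cleaned = clean_text(sample)
print(cleaned)
```

The hyperlink, the number, the punctuation, and the stop word "now" are all removed, leaving only the content words.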

## **3. Feature Extraction**
In this step, we aim to transform the text content of an email into a numerical feature vector that captures the essential information used for classification. To achieve this, we can choose from a variety of vectorization techniques that convert text data into numerical vectors. 

In this lab, we will use TF-IDF and a modified Word2vec, as described in the papers. \\
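
For intuition about what a TF-IDF vector contains, the sketch below computes one by hand for three toy emails. It assumes the smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, and the L2 normalization that `TfidfVectorizer` applies by default. `tfidf_vectors` and the toy emails are hypothetical and are not part of the lab code.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (assumed default behaviour).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors.append([v / norm for v in raw])
    return vocab, vectors

docs = [["free", "money", "now"], ["meeting", "agenda", "now"], ["free", "meeting"]]
vocab, vectors = tfidf_vectors(docs)
weights = dict(zip(vocab, vectors[0]))
# Non-zero weights of the first toy email.
print({t: round(w, 3) for t, w in weights.items() if w > 0})
```

In the first email, "money" appears in fewer documents than "free" or "now", so it receives the largest weight.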

**Run the code block below:**

In [7]:
!pip install --upgrade gensim
import gensim
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import MinMaxScaler

# Print the version of gensim package
print(gensim.__version__)

vectorizer = TfidfVectorizer()


def convert_to_feature(raw_tokenize_data):
    raw_sentences = [' '.join(o) for o in raw_tokenize_data]
    print(raw_sentences[0])
    print(raw_sentences[1])
    return vectorizer.transform(raw_sentences)


def TfidfConvert(x_train, x_test, x_val):
    x_train = [o.split(" ") for o in x_train]
    x_test = [o.split(" ") for o in x_test]
    x_val = [o.split(" ") for o in x_val]
    x_train_raw_sentences = [' '.join(o) for o in x_train]
    x_val_raw_sentences = [' '.join(o) for o in x_val]
    raw_sentences = x_train_raw_sentences + x_val_raw_sentences
    vectorizer.fit(raw_sentences)
    x_train_features = convert_to_feature(x_train)
    x_test_features = convert_to_feature(x_test)
    x_val_features = convert_to_feature(x_val)

    return x_train_features, x_test_features, x_val_features


def getUniqueWords(allWords):
    uniqueWords = []
    for i in allWords:
        if i not in uniqueWords:
            uniqueWords.append(i)
    return uniqueWords


def input_split(x):
    new_x = []
    for line in x:
        newline = line.split(' ')
        new_x.append(newline)
    return new_x


def x2vec(input_x, feature_names, model):
    x_features = []
    for index in input_x:
        model_vector = [0] * len(feature_names)

        for token in index:
            if token in feature_names:
                feature_index = feature_names.index(token)

                if model.wv.has_index_for(token):
                    token_vecs = model.wv.get_vector(token)
                    model_vector[feature_index] = token_vecs[0]
        x_features.append(model_vector)
    return x_features


def single_transform(x, method, feature_model, feature_names, scaler, selection_model):
    if method == 'TFIDF':

        result = feature_model.transform(x)
        if selection_model != 'NaN':
            result = selection_model.transform(result)
        return result
    else:
        temp_x = x.values
        temp_x = temp_x[0].split(' ')
        model_vector = [0] * len(feature_names)
        for token in temp_x:
            if token in feature_names:
                feature_index = feature_names.index(token)
                if feature_model.wv.has_index_for(token):
                    token_vecs = feature_model.wv.get_vector(token)
                    model_vector[feature_index] = token_vecs[0]
        x_features = [model_vector]
        # x_features = np.array(x_features)
        x_features = scaler.transform(x_features)
        x_train_features = sparse.csr_matrix(x_features)
        if selection_model != 'NaN':
            x_train_features = selection_model.transform(x_train_features)
        return x_train_features


def feature_extraction(x_train, x_test, x_val, method):

    if method == 'TFIDF':
        x_train_features, x_test_features, x_val_features = TfidfConvert(x_train, x_test, x_val)
        feature_names = vectorizer.get_feature_names_out()

        return x_train_features, x_test_features, x_val_features, feature_names, vectorizer, 'NaN'

    if method == 'word2vec':
        temp_x_train = input_split(x_train)
        temp_x_test = input_split(x_test)
        temp_x_val = input_split(x_val)

        model_train = Word2Vec(temp_x_train, vector_size=1)
        feature_space = []
        for index in temp_x_train:
            feature_space = feature_space + getUniqueWords(index)
        feature_names = getUniqueWords(feature_space)
      
        x_train_features = x2vec(temp_x_train, feature_names, model_train)
        x_test_features = x2vec(temp_x_test, feature_names, model_train)
        x_val_features = x2vec(temp_x_val, feature_names, model_train)

        x_train_features = np.array(x_train_features)
        x_test_features = np.array(x_test_features)
        x_val_features = np.array(x_val_features)

        pd.DataFrame(x_train_features).to_csv("x_train_features.csv", header=None, index=False)
        pd.DataFrame(x_test_features).to_csv("x_test_features.csv", header=None, index=False)
        pd.DataFrame(x_val_features).to_csv("x_val_features.csv", header=None, index=False)

        scaler = MinMaxScaler()
        scaler.fit(x_train_features)
        x_train_features = scaler.transform(x_train_features)
        x_test_features = scaler.transform(x_test_features)
        x_val_features = scaler.transform(x_val_features)

        x_train_features = sparse.csr_matrix(x_train_features)
        x_test_features = sparse.csr_matrix(x_test_features)
        x_val_features = sparse.csr_matrix(x_val_features)

        return x_train_features, x_test_features, x_val_features, feature_names, model_train, scaler


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
4.3.1


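For example, with the code below, we extract the word2vec-based feature values of all the emails in the dataset. (`feature_extraction` also accepts `method = "TFIDF"` to extract TF-IDF values instead.) \\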
**Run the code block below:**

In [8]:

method = "word2vec"
x_train_features, x_test_features, x_val_features, feature_names, feature_model, scalar = feature_extraction(x_train, x_test, x_val, method)
print(x_train_features[0])


  (0, 0)	1.0
  (0, 1)	1.0
  (0, 2)	1.0
  (0, 3)	0.9999999999999999
  (0, 4)	1.0
  (0, 5)	1.0
  (0, 6)	1.0
  (0, 7)	1.0
  (0, 8)	1.0
  (0, 9)	1.0
  (0, 10)	1.0
  (0, 11)	1.0
  (0, 12)	1.0
  (0, 13)	1.0
  (0, 14)	1.0
  (0, 15)	1.0
  (0, 16)	1.0
  (0, 17)	1.0
  (0, 18)	0.9999999999999999
  (0, 19)	0.9999999999999999
  (0, 20)	1.0
  (0, 21)	0.9999999999999999
  (0, 22)	1.0
  (0, 23)	1.0
  (0, 24)	1.0
  :	:
  (0, 36004)	1.0
  (0, 36036)	1.0
  (0, 36054)	1.0
  (0, 36284)	1.0
  (0, 36467)	1.0
  (0, 36654)	1.0
  (0, 36742)	1.0
  (0, 36744)	1.0
  (0, 36751)	1.0
  (0, 36935)	1.0
  (0, 36989)	1.0
  (0, 37739)	1.0
  (0, 37748)	1.0
  (0, 37757)	1.0
  (0, 38812)	1.0
  (0, 38816)	1.0
  (0, 38821)	1.0
  (0, 39097)	1.0
  (0, 39100)	1.0
  (0, 39129)	1.0
  (0, 39194)	1.0
  (0, 40203)	1.0
  (0, 40747)	0.9999999999999999
  (0, 40802)	1.0
  (0, 41293)	1.0
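
The word2vec branch of `feature_extraction` trains the model with `vector_size=1`, so every word is represented by a single number, and `x2vec` stores that number in the column that belongs to the word. The sketch below mimics this layout with plain Python. The embedding values are made up, and `embed_email` and `toy_embeddings` are hypothetical names; the lab additionally rescales every column with `MinMaxScaler`, which is omitted here.

```python
def embed_email(tokens, feature_names, embeddings):
    """Place each token's 1-D embedding at the token's position in the vocabulary."""
    index = {name: i for i, name in enumerate(feature_names)}
    vector = [0.0] * len(feature_names)
    for token in tokens:
        # Tokens outside the vocabulary, or without an embedding, leave a 0.
        if token in index and token in embeddings:
            vector[index[token]] = embeddings[token]
    return vector

feature_names = ["free", "money", "meeting"]
# Made-up 1-D embeddings; "meeting" has none, as if the model had not learned it.
toy_embeddings = {"free": 0.8, "money": -0.3}
vec = embed_email(["free", "meeting", "lottery"], feature_names, toy_embeddings)
print(vec)
```

Building a dictionary from word to column index also avoids the repeated `feature_names.index(token)` lookups, which become slow for large vocabularies.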


### **Question 1**
Look up the information of Word2vec online and describe what it does in your own words using one short paragraph.
Ans: Word2Vec is a powerful natural language processing technique that leverages neural networks to learn distributed vector representations for words in a text corpus. By training on large datasets, Word2Vec captures the semantic relationships and contextual similarities between words, mapping them to a multi-dimensional vector space. These dense vector representations can then be used as input features for various machine learning and text analysis tasks,  ultimately enhancing the performance of these models by incorporating the rich semantic information learned by Word2Vec.

## **4. Training SVM Classifiers**
In this section, we will train a Support Vector Machine (SVM) as a spam filter. \\

**Run the code block below:**

In [9]:
!pip install secml
from secml.data import CDataset
from secml.data.splitter import CDataSplitterKFold
from secml.ml.classifiers import CClassifierSVM
from secml.ml.peval.metrics import CMetricAccuracy
from secml.ml.peval.metrics import CMetricConfusionMatrix
from secml.adv.attacks.evasion import CAttackEvasionPGD
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# from Feature_extraction import single_transform
import csv
from statistics import mean, stdev
import threading
import time


def train_SVM(x_train_features, x_val_features, y_train, y_val):
    tr_set = CDataset(x_train_features, y_train)
    # Train the SVM
    print("Build SVM")
    xval_splitter = CDataSplitterKFold()
    clf_lin = CClassifierSVM()
    xval_lin_params = {'C': [1]}
    print("Find the best params")
    best_lin_params = clf_lin.estimate_parameters(
        dataset=tr_set,
        parameters=xval_lin_params,
        splitter=xval_splitter,
        metric='accuracy',
        perf_evaluator='xval'
    )
    print("Finish Train")
    print("The best training parameters are: ", [
          (k, best_lin_params[k]) for k in sorted(best_lin_params)])
    print("Train SVM")
    clf_lin.fit(tr_set.X, tr_set.Y)

    # Test the Classifier
    v_set = CDataset(x_val_features, y_val)
    y_pred = clf_lin.predict(v_set.X)
    metric = CMetricAccuracy()
    acc = metric.performance_score(y_true=v_set.Y, y_pred=y_pred)
    confusion_matrix = CMetricConfusionMatrix()
    cm = confusion_matrix.performance_score(y_true=v_set.Y, y_pred=y_pred)
    print("Confusion Matrix: ")
    print(cm)


    return tr_set, v_set, clf_lin

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting secml
  Downloading secml-0.15.6-py3-none-any.whl (463 kB)
Installing collected packages: secml
Successfully installed secml-0.15.6
2023-05-04 01:53:48,354 - secml.settings - INFO - New `SECML_HOME_DIR` created: /root/secml-data
2023-05-04 01:53:48,380 - secml.settings - INFO - Default configuration file copied to: /root/secml-data/secml.conf
2023-05-04 01:53:48,402 - secml.settings - INFO - New `SECML_DS_DIR` created: /root/secml-data/datasets
2023-05-04 01:53:48,422 - secml.settings - INFO - New `SECML_MODELS_DIR` created: /root/secml-data/models
2023-05-04 01:53:48,440 - secml.settings - INFO - New `SECML_EXP_DIR` created: /root/secml-data/experiments
2023-05-04 01:53:48,463 - secml.settings - INFO - New `SECML_LOGS_DIR` created: /root/secml-data/logs
2023-05-04 01:53:48,484 - secml.settings - INFO - New `SECML_PYTORCH_DIR` created: /root/secml-data/pytorch-data


For example, with the code section below, we train an SVM classifier using the feature values extracted above. \\

**Run the code block below:**

In [10]:
tr_set, v_set, clf_lin = train_SVM(x_train_features, x_val_features, Y_train, Y_val)

Build SVM
Find the best params
Finish Train
The best training parameters are:  [('C', 1)]
Train SVM
Confusion Matrix: 
CArray([[385   1]
 [  4  73]])
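
`train_SVM` computes the validation accuracy but prints only the confusion matrix, so the short sketch below derives the usual metrics from the matrix shown above. It assumes that rows are true labels and columns are predicted labels, with ham (0) first and spam (1) second; this is consistent with the 463 validation labels printed earlier.

```python
# Confusion matrix copied from the output above (assumed layout: rows = true, columns = predicted).
cm = [[385, 1],
      [4, 73]]

tn, fp = cm[0]   # true ham:  correctly kept, wrongly flagged as spam
fn, tp = cm[1]   # true spam: missed, correctly flagged

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
spam_recall = tp / (tp + fn)              # share of spam that is caught
ham_false_positive_rate = fp / (fp + tn)  # share of ham that is flagged as spam

print(total, round(accuracy, 4), round(spam_recall, 4), round(ham_false_positive_rate, 4))
```

Under this reading, the classifier already lets 4 of the 77 validation spam emails through before any attack is applied.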


## **5. PGD Attack**
Our approach is based on successful adversarial perturbations made to the model's input features. We employ the Projected Gradient Descent (PGD) method to modify numeric feature values in the feature domain. The PGD algorithm iteratively searches for the changes that maximize the classification loss, subject to a constraint, *dmax*, which is the maximum Euclidean distance allowed between the perturbed features and the original features. In our approach, we run PGD over a set of spam emails and generate adversarial examples. Then we test these modified feature vectors to see whether they can successfully bypass detection (i.e., be classified as ham). \\
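
The idea can be illustrated with a toy linear classifier. The sketch below is not secml's implementation: it assumes a linear score w·x + b in which positive means spam, uses made-up weights, and repeatedly takes a small gradient step that lowers the score, then projects the point back into the L2 ball of radius `dmax` around the original features and clips it to the bounds `[lb, ub]`. The names `score`, `project`, and `pgd_linear` are hypothetical.

```python
import math

def score(x, w, b):
    """Linear decision score; positive means 'spam' in this toy setup."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def project(x, x0, dmax, lb, ub):
    """Project x onto the L2 ball of radius dmax around x0, then clip to [lb, ub]."""
    delta = [a - c for a, c in zip(x, x0)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > dmax:
        delta = [d * dmax / norm for d in delta]
    return [min(max(c + d, lb), ub) for c, d in zip(x0, delta)]

def pgd_linear(x0, w, dmax, lb=0.0, ub=1.0, eta=0.01, max_iter=100):
    """Lower the spam score of x0 while staying within dmax of it."""
    x = list(x0)
    for _ in range(max_iter):
        # The gradient of w.x + b with respect to x is w; step against it.
        x = [xi - eta * wi for xi, wi in zip(x, w)]
        x = project(x, x0, dmax, lb, ub)
    return x

w, b = [2.0, -1.0, 0.5], -0.5   # made-up classifier
x0 = [0.5, 0.2, 0.4]            # made-up spam feature vector
dmax = 0.3
x_adv = pgd_linear(x0, w, dmax)
distance = math.sqrt(sum((a - c) ** 2 for a, c in zip(x_adv, x0)))
print(round(score(x0, w, b), 3), round(score(x_adv, w, b), 3), round(distance, 3))
```

With a smaller `dmax` the same loop may fail to push the score below zero, which is why the success rate of the attack depends on the allowed perturbation.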

**Run the code block below:**

In [11]:
def pgd_attack(clf_lin, tr_set, v_set, y_val, feature_names, nb_attack, dmax, lb, ub):

    class_to_attack = 1
    cnt = 0  # the number of successful adversarial examples

    ori_examples2_x = []
    ori_examples2_y = []

    for i in range(nb_attack):
        # take a point at random being the starting point of the attack
        idx_candidates = np.where(y_val == class_to_attack)
        # select nb_init_pts points randomly in candidates and make them move
        rn = np.random.choice(idx_candidates[0].size, 1)
        x0, y0 = v_set[idx_candidates[0][rn[0]], :].X, v_set[idx_candidates[0][rn[0]], :].Y

        x0 = x0.astype(float)
        y0 = y0.astype(int)
        x2 = x0.tondarray()[0]
        y2 = y0.tondarray()[0]

        ori_examples2_x.append(x2)
        ori_examples2_y.append(y2)

    # Perform adversarial attacks
    noise_type = 'l2'  # Type of perturbation 'l1' or 'l2'
    y_target = 0
    # dmax = 0.09  # Maximum perturbation

    # Bounds of the attack space. Can be set to `None` for unbounded
    solver_params = {
        'eta': 0.01,
        'max_iter': 1000,
        'eps': 1e-4}

    # lb and ub are the lower and upper bounds of the feature values in the attack space
    pgd_attack = CAttackEvasionPGD(
        classifier=clf_lin,
        double_init_ds=tr_set,
        distance=noise_type,
        dmax=dmax,
        lb=lb, ub=ub,
        solver_params=solver_params,
        y_target=y_target
    )

    ad_examples_x = []
    ad_examples_y = []
    ad_index = []
    cnt = 0
    for i in range(len(ori_examples2_x)):
        x0 = ori_examples2_x[i]
        y0 = ori_examples2_y[i]
        y_pred_pgd, _, adv_ds_pgd, _ = pgd_attack.run(x0, y0)
        if y_pred_pgd.item() == 0:
            cnt = cnt + 1
            ad_index.append(i)

        ad_examples_x.append(adv_ds_pgd.X.tondarray()[0])
        ad_examples_y.append(y_pred_pgd.item())

        attack_pt = adv_ds_pgd.X.tondarray()[0]
    print("\tPGD attack successful rate:", cnt / nb_attack)
    startTime2 = time.time()
    ori_examples2_x = np.array(ori_examples2_x)
    ori_examples2_y = np.array(ori_examples2_y)
    ad_examples_x = np.array(ad_examples_x)
    ad_examples_y = np.array(ad_examples_y)

    ori_dataframe = pd.DataFrame(ori_examples2_x, columns=feature_names)
    ad_dataframe = pd.DataFrame(ad_examples_x, columns=feature_names)

    # extract the success and fail examples
    ad_dataframe['ad_label'] = ad_examples_y
    ad_success = ad_dataframe.loc[ad_dataframe.ad_label == 0]
    ori_success = ori_dataframe.loc[ad_dataframe.ad_label == 0]
    ad_fail = ad_dataframe.loc[ad_dataframe.ad_label == 1]
    ori_fail = ori_dataframe.loc[ad_dataframe.ad_label == 1]

    ad_success_x = ad_success.drop(columns=['ad_label'])
    ad_fail_x = ad_fail.drop(columns=['ad_label'])

    result = (ad_success_x - ori_success)
    ori_dataframe.to_csv('ori_dataframe.csv')
    ad_dataframe.to_csv('ad_dataframe.csv')
    result.to_csv('result.csv')
    
    return result, cnt, ad_success_x, ori_dataframe, ori_examples2_y, cnt/nb_attack

With the code section below, we run PGD attacks on the trained classifier with 100 spam emails and a dmax of 0.02. \\

**Run the code block below:**

In [12]:
lb = np.ndarray.min(x_train_features.toarray())
ub = np.ndarray.max(x_train_features.toarray())
attack_amount = 100
dmax = 0.02
result, cnt, ad_success_x, ori_dataframe, ori_examples2_y, successful_rate = pgd_attack(clf_lin, tr_set, v_set, Y_val, feature_names, attack_amount, dmax, lb, ub)

	PGD attack successful rate: 0.1


## **6. Magical Words**
Adversarial emails are crafted by adding “magic words” to the original spam emails. The “magic words” are identified by intersecting the unique ham words with the “top words” identified from the adversarial perturbations. Specifically, the unique ham words are the words that appear only in ham emails and never in spam emails. After the PGD attack on the set of spam emails, we find which features were modified the most in order to bypass detection, and we select a list of “top words” whose feature values changed the most. (The changes are measured by the variance of the differences before and after the PGD perturbation.) In our experiments, we use the top 100 words. This set is relatively small, so it is efficient to compute, and the resulting magic words still achieve a high success rate in fooling the classifier. \\
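
The selection described above can be sketched with plain Python on made-up numbers. The sketch assumes that the change of a feature is scored by its mean squared difference over the successful adversarial examples, which mirrors what `magical_word` computes below; `top_changed_features`, `magic_words`, and all data are hypothetical.

```python
def top_changed_features(deltas, feature_names, k):
    """Rank features by their mean squared change across adversarial examples."""
    n = len(deltas)
    scores = [sum(row[j] ** 2 for row in deltas) / n
              for j in range(len(feature_names))]
    ranked = sorted(range(len(feature_names)), key=lambda j: scores[j], reverse=True)
    return [feature_names[j] for j in ranked[:k]]

def magic_words(deltas, feature_names, ham_vocab, spam_vocab, k):
    """Intersect the words unique to ham with the k most-changed features."""
    ham_unique = set(ham_vocab) - set(spam_vocab)
    top = top_changed_features(deltas, feature_names, k)
    return sorted(ham_unique & set(top))

feature_names = ["free", "grammar", "money", "linguist", "click"]
# Each row is (adversarial features - original features) for one successful attack.
deltas = [
    [-0.02, 0.05, -0.01, 0.04, 0.00],
    [-0.03, 0.06, 0.00, 0.03, -0.01],
]
ham_vocab = {"grammar", "linguist", "free"}
spam_vocab = {"free", "money", "click"}
words = magic_words(deltas, feature_names, ham_vocab, spam_vocab, k=3)
print(words)
```

Here "free" is among the three most-changed features, but it also occurs in spam, so it is not a unique ham word and is excluded.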

**Run the code block below:**

In [13]:
def magical_word(x_train, x_val, y_train, y_val, result, cnt):
    # Method 2
    x2result1 = result
    x2result1 = np.array(x2result1)
    x2result = result
    x2result = x2result.multiply(x2result1)

    sum_number = x2result.sum() / cnt
    sum_number = pd.DataFrame(sum_number, columns=['sum_number'])
    sum_number = sum_number.sort_values(
        by='sum_number', ascending=False, inplace=False)

    sum_number_pd = pd.DataFrame(sum_number.index[:100])
    sum_number_pd.to_csv("x2result.csv")
    d = {'message': x_train, 'label': y_train}
    df = pd.DataFrame(data=d)
    d1 = {'message': x_val, 'label': y_val}
    df1 = pd.DataFrame(data=d1)
    frames = [df, df1]
    messages = pd.concat(frames)
    messages.to_csv("messages (4).csv")
    spam = messages[messages.label == 1]
    ham = messages[messages.label == 0]

    # Tf-idf for spam datasets
    vect_spam = TfidfVectorizer()
    vect_spam.fit_transform(spam['message'])
    header_spam = vect_spam.get_feature_names_out()

    # Tf-idf for ham datasets
    vect_ham = TfidfVectorizer()
    vect_ham.fit_transform(ham['message'])
    header_ham = vect_ham.get_feature_names_out()

    # find unique ham words
    ham_unique = list(set(header_ham).difference(set(header_spam)))
    header_ham1 = pd.DataFrame(ham_unique)
    header_ham1.to_csv("ham_unique.csv")

    with open("x2result.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        top100_features = []
        for row in reader:
            top100_features.append(row[1])
    top100_features = top100_features[1:]
    # in ham & top100

    ham_unique_in_top = list(
        set(ham_unique).intersection(set(top100_features)))
    words14str = ""
    for item in ham_unique_in_top:
        words14str = words14str + " " + item
    return words14str, spam, ham

With the code section below, we identify a set of magic words. \\

**Run the code block below:**

In [14]:
words14str, spam, ham = magical_word(X_train, X_val, Y_train, Y_val, result, cnt)
print(words14str)

 thierry directeur pp linguistic titus linguist np colleague isbn restrictive generative sept elsnet clermont posting corpus lexical constraint dissertation query universiti belgium professeur pdf programme grammar


## **7. Crafting Adversarial Emails & Attacking SVM**
We can insert the identified "magic words" into the original spam emails. We call this process "crafting adversarial emails". Then, we feed the new feature vectors of these crafted emails to the SVM classifier to see whether they are misclassified as ham emails.  \\
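
As a toy illustration of why appending words can flip a prediction, the sketch below scores an email with a linear model over word counts. The weights, the bias, and the emails are made up, and `spam_score` is a hypothetical stand-in for the trained SVM; only the step of appending the magic words to the email text matches what the lab code does.

```python
def spam_score(text, weights, bias):
    """Linear score over word counts; positive means spam, negative means ham."""
    return bias + sum(weights.get(tok, 0.0) for tok in text.split())

# Made-up weights: spam-like words are positive, ham-like words are negative.
toy_weights = {"free": 1.2, "money": 0.9, "click": 0.7,
               "grammar": -1.0, "linguist": -1.1, "corpus": -0.8}
toy_bias = -0.5

spam_email = "free money click"
magic = " grammar linguist corpus"
crafted_email = spam_email + magic   # append the magic words to the email text

before = spam_score(spam_email, toy_weights, toy_bias)
after = spam_score(crafted_email, toy_weights, toy_bias)
print(round(before, 2), round(after, 2))
```

The original spam content is untouched; the added ham-like words alone move the score across the decision boundary.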

**Run the code block below:**


In [15]:
m2_empty = pd.DataFrame()
spam_cnt = 0
threads = []
m2_empty_l1 = pd.DataFrame()
m2_empty_l2 = pd.DataFrame()
m2_empty_l3 = pd.DataFrame()
m2_empty_l4 = pd.DataFrame()
m2_list = [m2_empty_l1, m2_empty_l2, m2_empty_l3, m2_empty_l4]

# One lock shared by all worker threads to protect the global counter
spam_cnt_lock = threading.Lock()


class myThread(threading.Thread):

    def __init__(self, threadID, name, spam_message, words14str, method, feature_model, feature_names, scaler, clf_lin, list_index, selection_model):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.spam_message = spam_message
        self.words14str = words14str
        self.method = method
        self.feature_model = feature_model
        self.feature_names = feature_names
        self.scaler = scaler
        self.clf_lin = clf_lin
        self.list_index = list_index
        self.selection_model = selection_model

    def run(self):
        global spam_cnt
        print("Starting " + self.name, spam_cnt)
        spam_cnt_1 = m2_empty_out(self.name, self.spam_message, self.words14str, self.method,
                                  self.feature_model, self.feature_names, self.scaler, self.clf_lin,
                                  self.list_index, self.selection_model)
        # Update the shared counter under the lock to avoid lost updates
        with spam_cnt_lock:
            spam_cnt = spam_cnt + spam_cnt_1
        print("Exiting " + self.name, spam_cnt)


def m2_empty_out(name, spam_message, words14str, method, feature_model, feature_names, scaler, clf_lin, list_index, selection_model):
    global m2_list
    # Feature vectors of crafted emails that the classifier labels as ham
    misclassified = []

    for j in spam_message.message:
        # Append the magic words to the original spam email
        choose_email = [j + words14str]
        message_14_email = pd.DataFrame(choose_email, columns=["message"])
        message_14_tf_idf = single_transform(
            message_14_email["message"], method, feature_model, feature_names, scaler, selection_model)
        message_14_tf_idf = pd.DataFrame(
            message_14_tf_idf.toarray(), columns=feature_names)
        message_14_y = pd.Series([1])  # true label: spam
        message_CData = CDataset(message_14_tf_idf, message_14_y)
        message_14_pred = clf_lin.predict(message_CData.X)

        if message_14_pred == 0:
            misclassified.append(message_14_tf_idf)

    # DataFrame.append is deprecated; concatenate once per thread instead
    if misclassified:
        m2_list[list_index] = pd.concat(misclassified, ignore_index=True)

    return len(misclassified)


def svm_attack(method, clf_lin, spam, words14str, feature_model, feature_names, scaler, selection_model):

    global m2_empty, m2_list, spam_cnt

    # Reset the shared state so that repeated calls do not accumulate results
    threads.clear()
    spam_cnt = 0
    m2_list = [pd.DataFrame() for _ in range(4)]

    # Split the spam emails into four chunks, one per thread
    spam_messages = np.array_split(spam, 4)
    print("Start processing message")
    for i in range(4):
        threads.append(myThread(i + 1, "Thread-%d" % (i + 1), spam_messages[i], words14str,
                                method, feature_model, feature_names, scaler, clf_lin, i, selection_model))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Collect the misclassified feature vectors from all threads
    m2_empty = pd.concat(m2_list, ignore_index=True)

    print("Exiting Main Thread")
    print('White box attack with length on SVM:')
    print('Number of samples provided:', len(spam))
    print('Number of crafted sample that got misclassified:', spam_cnt)
    print('Successful rate:', spam_cnt / len(spam))

    return m2_empty
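
A possible alternative to a hand-written thread class, shown only as a sketch: `concurrent.futures.ThreadPoolExecutor` returns each worker's count, so no shared global counter is needed. The `is_misclassified` function is a hypothetical stand-in for the transform-and-predict step in `m2_empty_out`, and the emails are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def is_misclassified(email):
    # Hypothetical stand-in for single_transform + clf_lin.predict:
    # pretend the classifier is fooled whenever "grammar" was appended.
    return "grammar" in email


def count_misclassified(chunk):
    # Each worker returns its own count instead of updating a global
    return sum(1 for email in chunk if is_misclassified(email))


def split_chunks(items, n):
    # Split a list into n nearly equal parts (similar to np.array_split)
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


emails = ["win money grammar", "cheap pills", "click here grammar",
          "free prize", "lottery grammar", "act now"]

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_misclassified, split_chunks(emails, 4)))

total = sum(counts)
print("misclassified:", total, "success rate:", total / len(emails))
```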

With the code below, we craft a set of spam emails and feed them to the trained classifier for testing. It prints the success rate of this attack.

**Run the code block below:**

In [16]:
m2_empty = svm_attack('TFIDF', clf_lin, spam, words14str, feature_model, feature_names, scalar, 'NaN')

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "<ipython-input-15-53fdc3872ad4>", line 31, in run
  File "<ipython-input-15-53fdc3872ad4>", line 47, in m2_empty_out
  File "<ipython-input-7-6cdd3bf5f823>", line 81, in single_transform
AttributeError: 'Word2Vec' object has no attribute 'transform'

Start processing message
Starting Thread-1 0
Starting Thread-2 0
Starting Thread-3 0
Starting Thread-4 0
Exiting Main Thread
White box attack with length on SVM:
Number of samples provided: 385
Number of crafted sample that got misclassified: 0
Successful rate: 0.0
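
To illustrate why appending magic words can flip a decision, here is a minimal sketch that uses a toy linear scorer in place of the trained SVM. The weights, bias, and emails are invented for illustration; a positive score means spam (label 1), anything else means ham (label 0).

```python
# Toy linear model: the per-word weights are illustrative assumptions,
# not values learned by the SVM in this lab.
WEIGHTS = {"free": 1.5, "money": 1.2, "click": 0.8,
           "grammar": -1.0, "corpus": -1.3, "linguist": -0.9}
BIAS = -0.5


def predict(email):
    # Sum the weights of known words; unknown words contribute 0
    score = BIAS + sum(WEIGHTS.get(w, 0.0) for w in email.split())
    return 1 if score > 0 else 0  # 1 = spam, 0 = ham


def craft(email, magic_words):
    # Same crafting rule as in m2_empty_out: append the magic-word string
    return email + magic_words


spam_emails = ["free money click", "click free", "free free money money"]
magic_words = " grammar corpus linguist"

# Count emails detected as spam before crafting and as ham afterwards
fooled = sum(1 for e in spam_emails
             if predict(e) == 1 and predict(craft(e, magic_words)) == 0)
print("success rate:", fooled / len(spam_emails))
```

In this toy setting the third email stays classified as spam because its spam-indicative words outweigh the appended ones, which mirrors why the attack success rate in the lab is below 100%.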


## **Tasks**
### **Task 1** ### 
Integrate steps 1-7 above into one function in the code block below.
The function takes only two inputs: the feature extraction method and the dmax used for the PGD attack. It should return the set of magic words identified and print the success rate from step 7.

Hint: You can change the method of feature extraction by changing the value of the "method" variable.

In [17]:
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from secml.adv.attacks import CAttackEvasionPGD
from secml.array import CArray
from secml.data import CDataset
from secml.ml.classifiers import CClassifierSVM
from secml.ml.peval.metrics import CMetricAccuracy

def magic_word_attack(method, dmax):
    # Load data and preprocess
    X_train, X_val, X_test, Y_train, Y_val, Y_test = data_extraction()
    x_train, x_val, x_test = preprocess(X_train, X_val, X_test)

    # Feature extraction
    x_train_features, x_test_features, x_val_features, feature_names, feature_model, scalar = feature_extraction(x_train, x_test, x_val, method)

    # Train the classifier
    tr_set, v_set, clf_lin = train_SVM(x_train_features, x_val_features, Y_train, Y_val)

    # Perform PGD attack
    lb = np.ndarray.min(x_train_features.toarray())
    ub = np.ndarray.max(x_train_features.toarray())
    attack_amount = 100
    result, cnt, ad_success_x, ori_dataframe, ori_examples2_y, successful_rate = pgd_attack(clf_lin, tr_set, v_set, Y_val, feature_names, attack_amount, dmax, lb, ub)

    # Identify magic words
    magic_words, spam, ham = magical_word(X_train, X_val, Y_train, Y_val, result, cnt)

    # Craft adversarial emails and attack the SVM (prints the success rate)
    m2_empty = svm_attack(method, clf_lin, spam, magic_words, feature_model, feature_names, scalar, 'NaN')
    return magic_words

# Example usage
method = "word2vec"  # or "TFIDF"
dmax = 0.1
magic_word = magic_word_attack(method, dmax)
print("Magic words:", magic_word)

Build SVM
Find the best params
Finish Train
The best training parameters are:  [('C', 1)]
Train SVM
Confusion Matrix: 
CArray([[385   1]
 [  4  73]])
	PGD attack successful rate: 0.07
Start processing message
Starting Thread-1 0
Starting Thread-2 0
Starting Thread-3 0
Starting Thread-4 0


Exiting Thread-1 55


Exiting Thread-2 111


Exiting Thread-3 171


Exiting Thread-4 221
Exiting Main Thread
White box attack with length on SVM:
Number of samples provided: 385
Number of crafted sample that got misclassified: 221
Successful rate: 0.574025974025974
Magic words:  thierry directeur pp linguistic titus linguist colleague isbn restrictive generative elsnet posting corpus lexical constraint dissertation query belgium professeur pdf programme grammar


### **Task 2** ###
Using the function you wrote for Task 1, run it five times with dmax set to 0.02, 0.04, 0.06, 0.08, and 0.1, and repeat this for each feature extraction method (TF-IDF and modified Word2vec). Record the magic word attack success rate and the number of magic words each time, and fill in the tables below with the actual results:

In [25]:
# Use explicit dmax values to avoid the rounding drift of repeated "dmax += 0.02"
dmax_values = [0.02, 0.04, 0.06, 0.08, 0.1]
for method in ("word2vec", "TFIDF"):
    print(method + ":")
    for dmax in dmax_values:
        magic_word = magic_word_attack(method, dmax)
        print("dmax:", dmax, ", Magic words:", magic_word)


0 0.02
1 0.04
2 0.06
3 0.08
4 0.1
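
Repeatedly adding 0.02 to a float can accumulate rounding error, so a dmax value built that way may not be exactly the intended one. One way to avoid this, sketched below, is to derive each value from an integer step. `run_attack` is a hypothetical placeholder for `magic_word_attack` so that the snippet runs on its own.

```python
def dmax_grid(steps=5, denom=50):
    # k / 50 yields 0.02, 0.04, ..., 0.1 without accumulated error
    return [k / denom for k in range(1, steps + 1)]


def run_attack(method, dmax):
    # Hypothetical placeholder for magic_word_attack(method, dmax);
    # it returns a dummy string so the loop below can be executed.
    return " word_a word_b"


# One row per (method, dmax) pair, ready to be copied into the tables
results = []
for method in ("TFIDF", "word2vec"):
    for dmax in dmax_grid():
        words = run_attack(method, dmax)
        results.append({"method": method, "dmax": dmax,
                        "n_magic_words": len(words.split())})

print(dmax_grid())
```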


dmax (TF-IDF) | Success rate | Magic words
-------------------|------------------|------------------
0.02 | 0.548051948051948 | arizona chorus cascadilla squib linguistic pkzip translation pp french theory query native glot euralex linguist ldc academic ipa ammondt grammar sentence risked posting proceeding benjamin workshop phonetic
0.04 | 0.5246753246753246 | arizona chorus cascadilla linguistic pkzip translation pp french theory native euralex linguist ldc academic ipa ammondt grammar sentence risked posting proceeding benjamin workshop phonetic
0.06 | 0.548051948051948 | arizona chorus cascadilla squib linguistic pkzip translation pp french theory query native euralex linguist ldc academic ipa ammondt grammar sentence risked posting proceeding benjamin workshop phonetic
0.08 | 0.5246753246753246 | arizona chorus cascadilla linguistic pkzip translation pp french theory native euralex linguist ldc academic ipa ammondt grammar sentence risked posting proceeding benjamin workshop phonetic
0.1 | 0.5116883116883116 | arizona chorus cascadilla squib linguistic pkzip translation pp french theory query native euralex linguist ldc academic ipa ammondt grammar sentence risked posting proceeding benjamin workshop phonetic

dmax (Word2vec) | Success rate | Magic words
-------------------|------------------|------------------
0.02 | 0.6 | np belgium clermont linguistic generative universiti corpus lexical pp query colleague directeur linguist grammar titus elsnet restrictive posting professeur sept constraint programme dissertation pdf isbn thierry
0.04 | 0.5844155844155844 | np belgium linguistic generative corpus lexical pp query colleague directeur linguist grammar titus elsnet restrictive posting professeur sept constraint programme dissertation pdf isbn thierry
0.06 | 0.6 | np belgium linguistic generative corpus lexical pp query colleague directeur linguist grammar titus elsnet restrictive posting professeur sept constraint programme dissertation pdf isbn thierry
0.08 | 0.5922077922077922 | dissertation colleague generative professeur programme np directeur thierry pp corpus elsnet universiti linguistic belgium grammar query linguist lexical pdf posting restrictive titus clermont constraint isbn
0.1 | 0.6 | dissertation colleague generative professeur programme np directeur thierry pp corpus elsnet universiti linguistic belgium grammar query linguist lexical pdf posting restrictive titus clermont constraint isbn sept
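
Task 2 asks for the number of magic words, while the tables list the words themselves. The counts can be derived from the recorded strings, and two runs can be compared with a set difference. The sketch below uses the TF-IDF rows for dmax = 0.02 and dmax = 0.04 as recorded above; the variable names are chosen for illustration.

```python
# Magic-word strings copied from the TF-IDF table (dmax = 0.02 and 0.04)
run_002 = ("arizona chorus cascadilla squib linguistic pkzip translation pp "
           "french theory query native glot euralex linguist ldc academic ipa "
           "ammondt grammar sentence risked posting proceeding benjamin "
           "workshop phonetic")
run_004 = ("arizona chorus cascadilla linguistic pkzip translation pp french "
           "theory native euralex linguist ldc academic ipa ammondt grammar "
           "sentence risked posting proceeding benjamin workshop phonetic")

words_002 = run_002.split()
words_004 = run_004.split()

# Number of magic words in each run
print(len(words_002), len(words_004))

# Words present at dmax = 0.02 but missing at dmax = 0.04
print(sorted(set(words_002) - set(words_004)))
```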

### **Task 3** ###
Draw a line graph with dmax on the x axis and the attack success rate on the y axis, with one line per feature extraction method (two lines in total).
Answer the questions below:

1.   Which feature extraction method generates the highest magic word attack success rate across all the results?
Ans: Word2vec generates the highest magic word attack success rate.
2.   Which feature extraction method do you think resists such attacks best? Please explain your choice using the results.



In [18]:
import plotly.graph_objects as go
import pandas as pd

# Replace these lists with your actual data
dmax_values = [0.02, 0.04, 0.06, 0.08, 0.1]
tfidf_success_rates = [0.548, 0.525, 0.548, 0.525, 0.512]
word2vec_success_rates = [0.6, 0.584, 0.6, 0.592, 0.6]



fig = go.Figure()

fig.add_trace(go.Scatter(x=dmax_values, y=tfidf_success_rates,
                         mode='lines+markers',
                         name='TFIDF'))
fig.add_trace(go.Scatter(x=dmax_values, y=word2vec_success_rates,
                         mode='lines+markers',
                         name='Word2Vec'))

fig.update_layout(title='Magic Word Attack Success Rate vs dmax',
                  xaxis_title='dmax',
                  yaxis_title='Success Rate')

fig.show()
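
Alongside the plot, a few summary numbers can support the answers to the two questions. The sketch below reuses the rounded success rates from the plotting cell and reports the maximum, the dmax at which it first occurs, and the mean for each method; the `summary` dictionary is an illustrative name.

```python
from statistics import mean

dmax_values = [0.02, 0.04, 0.06, 0.08, 0.1]
rates = {
    "TFIDF": [0.548, 0.525, 0.548, 0.525, 0.512],
    "Word2Vec": [0.6, 0.584, 0.6, 0.592, 0.6],
}

summary = {}
for method, values in rates.items():
    best = max(values)
    summary[method] = {
        "max": best,
        # first dmax at which the maximum is reached
        "dmax_at_max": dmax_values[values.index(best)],
        "mean": round(mean(values), 4),
    }
    print(method, summary[method])
```

A lower mean success rate indicates a feature extraction method that resisted the magic word attack better in these runs.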

Your answers to the two questions:

### **Task 4** ###
Please complete the following (in a black-box attack scenario):
1. Select the set of magic words with the highest success rate in Task 2.
2. Train KNN classifiers on the same training dataset with the two feature extraction methods you used to obtain the magic words. This gives you two spam filters built with a different algorithm. Show the false negative rates on the testing dataset.
3. Pick 100 spam emails and add the magic words to them. Feed them to the two KNN classifiers and calculate the false negative rates. Can you tell whether the attacks are successful? Ans: Both attacks succeed, since the false negative rate of each KNN classifier is higher after the attack.
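
For reference, the false negative rate used in this task is FN / (FN + TP) with spam as the positive class (label 1), that is, the fraction of spam emails that the filter lets through as ham. A minimal sketch in plain Python, which also works when every prediction falls into a single class (the labels below are toy values):

```python
def false_negative_rate(y_true, y_pred, positive=1):
    # FN: positive samples predicted as negative
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # TP: positive samples predicted as positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    if fn + tp == 0:
        raise ValueError("y_true contains no positive samples")
    return fn / (fn + tp)


# Toy labels for illustration: 4 spam emails, one of them missed
print(false_negative_rate([1, 1, 1, 1, 0], [1, 0, 1, 1, 0]))

# All 100 crafted emails are spam; suppose 30 are predicted as ham
print(false_negative_rate([1] * 100, [0] * 30 + [1] * 70))
```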

In [26]:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Load the dataset and split it into training, validation, and test sets
X_train, X_val, X_test, y_train, y_val, y_test = data_extraction()
# Preprocess the raw text before feature extraction
x_train, x_val, x_test = preprocess(X_train, X_val, X_test)
# Perform feature extraction using the provided code
x_train_tfidf, x_test_tfidf, x_val_tfidf, tfidf_feature_names, tfidf_model, _ = feature_extraction(x_train, x_test, x_val, method='TFIDF')
x_train_w2v, x_test_w2v, x_val_w2v, w2v_feature_names, w2v_model, scaler = feature_extraction(x_train, x_test, x_val, method='word2vec')

# Train KNN classifiers
knn_tfidf = KNeighborsClassifier()
knn_tfidf.fit(x_train_tfidf, y_train)

knn_w2v = KNeighborsClassifier()
knn_w2v.fit(x_train_w2v, y_train)

# Test the classifiers
y_pred_tfidf = knn_tfidf.predict(x_test_tfidf)
y_pred_w2v = knn_w2v.predict(x_test_w2v)

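# Calculate false negative rates before the attack
# labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted
cm_tfidf = confusion_matrix(y_test, y_pred_tfidf, labels=[0, 1])
cm_w2v = confusion_matrix(y_test, y_pred_w2v, labels=[0, 1])

# Row 1 holds the true spam emails: [false negatives, true positives]
fn_rate_tfidf_before = cm_tfidf[1][0] / (cm_tfidf[1][0] + cm_tfidf[1][1])
fn_rate_w2v_before = cm_w2v[1][0] / (cm_w2v[1][0] + cm_w2v[1][1])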

print("False negative rate for TFIDF before attack:", fn_rate_tfidf_before)
print("False negative rate for Word2Vec before attack:", fn_rate_w2v_before)

# Add magic words to 100 spam emails
magic_words = ["np", "belgium", "clermont", "linguistic", "generative", "universiti", "corpus", "lexical", "pp", "query", "colleague", "directeur", "linguist", "grammar", "titus", "elsnet", "restrictive", "posting", "professeur", "sept", "constraint", "programme", "dissertation", "pdf", "isbn", "thierry"]  # Replace with the selected magic words
df = pd.read_csv('messages.csv')
spam_emails = df[df['label'] == 1]

# Randomly pick 100 spam emails
selected_spam_emails = spam_emails.sample(n=100, random_state=42)

# Reset index
selected_spam_emails.reset_index(drop=True, inplace=True)
spam_emails_with_magic_words = [email + " " + " ".join(magic_words) for email in selected_spam_emails.message]
spam_emails_with_magic_words_series = pd.Series(spam_emails_with_magic_words)
# Perform feature extraction on the modified spam emails
#print(spam_emails_with_magic_words, spam_emails_with_magic_words_series)
x_spam_tfidf = single_transform(spam_emails_with_magic_words_series, 'TFIDF', tfidf_model, tfidf_feature_names, _, _)

x_spam_w2v = [single_transform(pd.Series([email]), 'word2vec', w2v_model, w2v_feature_names, scaler, _) for email in spam_emails_with_magic_words_series]
#print(x_spam_tfidf)
x_spam_w2v = [sparse_matrix.toarray() for sparse_matrix in x_spam_w2v]
#print(len(x_spam_w2v[0][0]), len(x_spam_w2v[1][0]))
x_spam_w2v = np.vstack(x_spam_w2v)


# Test the classifiers on the modified spam emails
y_pred_tfidf_attack = knn_tfidf.predict(x_spam_tfidf)
y_pred_w2v_attack = knn_w2v.predict(x_spam_w2v)
print(y_pred_w2v_attack,y_pred_tfidf_attack)
# Calculate false negative rates after the attack
# All 100 crafted emails are spam, so labels=[0, 1] is required to get a
# 2x2 matrix; without it the matrix is 1x1 when every prediction is the same
cm_tfidf_attack = confusion_matrix([1] * 100, y_pred_tfidf_attack, labels=[0, 1])
cm_w2v_attack = confusion_matrix([1] * 100, y_pred_w2v_attack, labels=[0, 1])
print(cm_tfidf_attack, cm_w2v_attack)
fn_rate_tfidf_attack = cm_tfidf_attack[1][0] / (cm_tfidf_attack[1][0] + cm_tfidf_attack[1][1])
fn_rate_w2v_attack = cm_w2v_attack[1][0] / (cm_w2v_attack[1][0] + cm_w2v_attack[1][1])

print("False negative rate for TFIDF after attack:", fn_rate_tfidf_attack)
print("False negative rate for Word2Vec after attack:", fn_rate_w2v_attack)

if fn_rate_tfidf_attack > fn_rate_tfidf_before:
    print("The attack is successful for the TFIDF-based classifier.")
else:
    print("The attack is not successful for the TFIDF-based classifier.")

if fn_rate_w2v_attack > fn_rate_w2v_before:
    print("The attack is successful for the Word2Vec-based classifier.")
else:
    print("The attack is not successful for the Word2Vec-based classifier.")

crossing boundary interdisciplinary approach latin america th june nd july paper international conference aim explore contemporary cultural debate taking place latin america draw various strand debate multidisciplinary forum paper consider various issue modernization hybridity transculturation apply various field study paper welcome following field cultural study literature particularly looking trend contemporary narrative including neoavantgarde popular fiction drama study cinema gender study popular culture comparative literature anthropology ethnography sociology linguistics economics politics law symposium proposed far include exile latin american experience indigenismo negrismo u s latin america paper longer minute abstract word english spanish portuguese sent preferably email conference organiser department language cultural study university limerick ireland st january conference organizer nuala finnegan kate quinn nancy serrano department language cultural study university limer