# Adversarial attack


The purpose of this experiment is to test a simple adversarial attack and demonstrate its effectiveness by applying it to phishing detection on URLs.

The attacker is assumed to have white-box knowledge of the model (a very idealized case).

Given a neural network that classifies URLs as legitimate or phishing, the attack slightly modifies the URLs so that the network no longer classifies them correctly.

For more details on attacks against URL-based phishing detection, read the dedicated section 4.2.1 in the PDF.

- Ask me for the dataset

In [None]:
!pip install numpy
!pip install pandas
!pip install scikit-learn
!pip install tensorflow

In [None]:
import numpy as np
import pandas as pd
import re
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score



## Step 1: Data processing

The rows in the CSV file look like this:

| URL                 | Label |
|---------------------|-------|
| http://legit.com    | 0     |
| http://phishing.com | 1     |



URLs are then converted into numerical features using the TF-IDF technique, keeping at most 1150 token features.
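
To see which tokens TF-IDF actually scores, the following minimal sketch applies the notebook's token pattern (`\b\w+\b`) to a single URL. It assumes the scikit-learn defaults, where the text is lowercased and tokens are extracted with the pattern's `findall`; `tokenize_url` is a hypothetical helper used only for illustration and is not part of the notebook.

```python
import re

# Same regular expression as the token_pattern given to TfidfVectorizer
TOKEN_PATTERN = re.compile(r'\b\w+\b')

def tokenize_url(url):
    """Hypothetical helper: lowercase the URL and extract word tokens."""
    return TOKEN_PATTERN.findall(url.lower())

tokens = tokenize_url("http://www.grup-whatsapp-invite.zzux.com/")
print(tokens)
# ['http', 'www', 'grup', 'whatsapp', 'invite', 'zzux', 'com']
```

Punctuation such as `/`, `.` and `-` only separates tokens, so the model sees a bag of words and not the structure of the URL.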


In [None]:
data = pd.read_csv('data.csv', dtype=str, low_memory=False)
vectorizer = TfidfVectorizer(max_features=1150, token_pattern=r'\b\w+\b')

X = vectorizer.fit_transform(data['URL']).toarray()
y = data['Label'].values

# 0 = Legit 1 = Phishing
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

## Step 2: Neural Network

The neural network is a very simple dense network: an input layer, two hidden layers (128 and 64 units with ReLU activation, and a dropout of 0.5 after the first), and a sigmoid output layer with a single unit.
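
As a rough size check, the number of trainable parameters can be worked out by hand. The sketch below assumes the vectorizer yields exactly 1150 features (the `max_features` upper bound) and uses the layer widths 128, 64 and 1 from this notebook; `dense_params` is a hypothetical helper. Dropout adds no parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# Assumption: the TF-IDF matrix has exactly max_features = 1150 columns
n_features = 1150
layer_sizes = [128, 64, 1]

total = 0
fan_in = n_features
for units in layer_sizes:
    params = dense_params(fan_in, units)
    print(f"Dense({units}): {params} parameters")
    total += params
    fan_in = units  # the next layer takes this layer's outputs as inputs

print(f"Total trainable parameters: {total}")
# Total trainable parameters: 155649
```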

In [None]:
# NN model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Training
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=15, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/15
6528/6528 ━━━━━━━━━━━━━━━━━━━━ 18s 2ms/step - accuracy: 0.9079 - loss: 0.2368 - val_accuracy: 0.9380 - val_loss: 0.1659
Epoch 2/15
6528/6528 ━━━━━━━━━━━━━━━━━━━━ 14s 2ms/step - accuracy: 0.9408 - loss: 0.1611 - val_accuracy: 0.9415 - val_loss: 0.1604
Epoch 3/15
6528/6528 ━━━━━━━━━━━━━━━━━━━━ 21s 2ms/step - accuracy: 0.9419 - loss: 0.1555 - val_accuracy: 0.9425 - val_loss: 0.1571
Epoch 4/15
6528/6528 ━━━━━━━━━━━━━━━━━━━━ 23s 2ms/step - accuracy: 0.9433 - loss: 0.1508 - val_accuracy: 0.9428 - val_loss: 0.1567
Epoch 5/15
6528/6528 ━━━━━━━━━━━━━━━━━━━━ 18s 2ms/step - accuracy: 0.9456 - loss: 0.1466 - val_accuracy: 0.9431 - val_loss: 0.1544
Epoch 6/15
6528/6528 ━━━━━━━━━━━━━━━━━━━━ 20s 2ms/step - accuracy: 0.9453 - loss: 0.1450 - val_accuracy: 0.9433 - val_loss: 0.1528
Epoch 7/15

<keras.src.callbacks.history.History at 0x7b1f5015b850>


## Step 3: Attack
The `perturbation_attack` function applies a simple perturbation attack to URLs to check the robustness of the classification model. It:
- transforms the URLs into numerical vectors using the fitted TF-IDF vectorizer.
- adds a small perturbation of magnitude `epsilon`, with a random sign, to every feature of the original vectors to try to confound the model.
- predicts whether the perturbed inputs are phishing or legitimate, returning the new predictions.

NB: the output layer is a sigmoid, so the predictions are obtained by thresholding the output probability at 0.5.
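
The perturbation and thresholding steps described above can be sketched on a toy matrix, without the trained model. All values below are made up for illustration, and the fixed seed is an assumption added only for reproducibility (the notebook draws unseeded noise).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility here

# Toy stand-in for a TF-IDF matrix: 2 URLs, 4 features (made-up values)
X = np.array([[0.0, 0.5, 0.0, 0.3],
              [0.7, 0.0, 0.2, 0.0]])
epsilon = 0.1

# Each entry is shifted by +epsilon or -epsilon, chosen at random
signs = np.sign(rng.standard_normal(X.shape))
X_adv = X + epsilon * signs

# Every feature moves by epsilon, including those that were zero
max_shift = np.max(np.abs(X_adv - X))
print(max_shift)

# Thresholding made-up sigmoid outputs at 0.5: 1 = phishing, 0 = legitimate
probs = np.array([[0.81], [0.34]])
labels = (probs > 0.5).astype("int32")
print(labels.ravel())
```

Note that the perturbation is dense: features that were zero become non-zero, and some become negative, so the perturbed vector lives in feature space and does not necessarily correspond to any real URL string.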



In [None]:
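def perturbation_attack(urls, vectorizer, model, epsilon=0.1):
    # Map the URLs to TF-IDF feature vectors with the already fitted vectorizer
    X_urls = vectorizer.transform(urls).toarray()
    # Shift every feature by +epsilon or -epsilon, with the sign drawn at random
    perturbation = epsilon * np.sign(np.random.randn(*X_urls.shape))
    X_adversarial = X_urls + perturbation
    # Threshold the sigmoid output at 0.5: 1 = phishing, 0 = legitimate
    predictions = (model.predict(X_adversarial) > 0.5).astype("int32")
    return predictions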

## Step 4: Test

In [None]:
# Compare the original labels with the predictions made after the attack
def compare(urls, labels, predictions):
  print("\nComparing:")
  for i in range(len(urls)):
    print(f"URL: {urls[i]}")
    print(f"Original label: {labels[i]} -> After attack label: {predictions[i]}")
    print("")
  # Share of URLs classified as legitimate (0) after the attack
  success_rate = np.mean(predictions == 0) * 100
  print(f"Success rate: {success_rate:.2f}%")
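
The success rate printed by `compare` counts every phishing URL predicted as legitimate after the attack, including any URL the model may already have missed on the clean input. A possible complement, sketched here under the assumption that clean predictions for the same URLs are also available, is to count only the URLs whose prediction flips from 1 to 0. `flip_rate` is a hypothetical helper and the predictions below are made up.

```python
def flip_rate(clean_preds, adv_preds):
    """Hypothetical metric: share of URLs detected before the attack
    (clean prediction 1) that are classified as legitimate (0) after it."""
    detected = [i for i, p in enumerate(clean_preds) if p == 1]
    if not detected:
        return 0.0
    flipped = sum(1 for i in detected if adv_preds[i] == 0)
    return 100.0 * flipped / len(detected)

# Made-up predictions for five phishing URLs
clean = [1, 1, 0, 1, 1]  # the third URL was already missed before the attack
adv = [0, 1, 0, 0, 1]

raw = 100.0 * sum(1 for p in adv if p == 0) / len(adv)
print(f"Share predicted legitimate after attack: {raw:.2f}%")  # 60.00%
print(f"Flip rate among detected URLs: {flip_rate(clean, adv):.2f}%")  # 50.00%
```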

In [None]:
phishing_urls = data[data['Label'] == '1']['URL'].values[:500] # select 500 URLs
adversarial_predictions = perturbation_attack(phishing_urls, vectorizer, model)
adversarial_predictions = adversarial_predictions.flatten()
adversarial_labels = ['1' for _ in range(len(phishing_urls))]

compare(phishing_urls, adversarial_labels, adversarial_predictions)

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step

Comparing:
URL: gskindia.co.in
Original label: 1 -> After attack label: 0

URL: arnoldkling.com/~arnoldsk/aimst3/aimst325.html
Original label: 1 -> After attack label: 0

URL: custom-iyj6.frb.io
Original label: 1 -> After attack label: 0

URL: articles.techrepublic.com.com/5100-10878_11-6174867.html
Original label: 1 -> After attack label: 1

URL: retajconsultancy.com
Original label: 1 -> After attack label: 0

URL: http://www.grup-whatsapp-invite.zzux.com/
Original label: 1 -> After attack label: 1

URL: www.jchq.net/faq/jcertfaq.htm
Original label: 1 -> After attack label: 1

URL: levignedigalluccio.com
Original label: 1 -> After attack label: 0

URL: etd.lsu.edu/docs/available/etd-02262004-111054/unrestricted/Chiparus_dis.pdf
Original label: 1 -> After attack label: 0

URL: www.interfarm.co.jp/cafe/bbc/bbc-e/shop-e.html
Original label: 1 -> After attack label: 1

URL: https://comdi3horas.joomla.com/includes/l