# Preliminaries

This notebook was run on Google Colab with a Tesla T4 GPU. The pretrained models were fine-tuned on each dataset with the Hugging Face Trainer API. Two datasets are used: a local Filipino fake news dataset and the Kaggle fake news dataset from the UTK Machine Learning Club (2017).

This experiment focuses on an adversarial attack created by paraphrasing articles.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install datasets
!pip uninstall -y transformers accelerate
!pip install transformers accelerate
!pip install git+https://github.com/huggingface/accelerate
!pip install transformers==4.28.0
!pip install plotly
!pip install captum
!pip install sentence-transformers


In [None]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback
import plotly.express as px
import plotly.graph_objects as go
from sklearn.metrics.pairwise import cosine_similarity
from transformers import BertTokenizer, BertModel
from scipy.special import softmax

# Fake News Filipino

The dataset contains around 3,000 news articles in Filipino, evenly split between real and fake news. The pretrained model, bert-tagalog-base-uncased, was trained on the WikiText-TL-39 dataset, a corpus of 172,815 Tagalog articles.

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Fake-News-Filipino/test-fil-clean.csv" "test-fil.csv"

Load the modified (adversarial) dataset, produced with Google Translate.

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Fake-News-Filipino/PA-test-fil-adv-gt.csv" "test-fil-adv.csv"

## Evaluation

In [None]:
model_name = 'jcblaise/bert-tagalog-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

Some weights of the model checkpoint at jcblaise/bert-tagalog-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
!cp -r "/content/drive/Shared drives/Thesis/Models/uncased-output-fil" "output-fil"

In [None]:
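class Dataset(torch.utils.data.Dataset):
    """Wraps tokenizer output (and optional labels) so that Trainer can index it."""
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # one sample: a dict of tensors keyed by input_ids, attention_mask, etc.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels is not None:  # labels are optional at prediction time
            item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])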

The fine-tuned model makes predictions on both the original and the adversarial test datasets. The code below handles the original test dataset.

In [None]:
test_data = pd.read_csv("test-fil.csv")
X_test_orig = list(test_data["article"])
X_test_tokenized = tokenizer(X_test_orig, padding=True, truncation=True, max_length=512, return_tensors='pt')
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output-fil/checkpoint-500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred_orig = np.argmax(raw_pred, axis=1)


To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).



`.predictions` gives the raw logits (one score per class); softmax is applied later to convert them to probabilities.

In [None]:
original_predictions = test_trainer.predict(test_dataset).predictions





### Model Performance (Original)

In [None]:
accuracy = accuracy_score(y_test, y_pred_orig)
recall = recall_score(y_test, y_pred_orig)
precision = precision_score(y_test, y_pred_orig)
f1 = f1_score(y_test, y_pred_orig)

#print(accuracy, recall, precision, f1)
print("ORIGINAL TEST DATASET (FILIPINO):")
print("Accuracy: {}".format(accuracy))
print("Recall: {}".format(recall))
print("Precision: {}".format(precision))
print("F1-score: {}".format(f1))

cm = confusion_matrix(y_test, y_pred_orig)
# disp = ConfusionMatrixDisplay(confusion_matrix=cm)
# disp.plot()

fig = px.imshow(cm, text_auto=True, title='Confusion Matrix of original test data (Filipino Fake News)', labels=dict(x="Predicted label", y="True label"), color_continuous_scale='haline')
fig.update_xaxes(dtick=1)
fig.update_yaxes(dtick=1)
fig.show()

ORIGINAL TEST DATASET (FILIPINO):
Accuracy: 0.9324324324324325
Recall: 0.9417879417879418
Precision: 0.9244897959183673
F1-score: 0.933058702368692


Repeat the process for the modified/adversarial dataset.

In [None]:
test_data = pd.read_csv("test-fil-adv.csv")
X_test_adv = list(test_data["article_new2"])
X_test_tokenized_adv = tokenizer(X_test_adv, padding=True, truncation=True, max_length=512, return_tensors='pt')
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized_adv)

model_path = "output-fil/checkpoint-500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred = np.argmax(raw_pred, axis=1)



In [None]:
adversarial_predictions = test_trainer.predict(test_dataset).predictions





### Average Probability Change

In [None]:
original_probs = softmax(original_predictions, axis=1)
adversarial_probs = softmax(adversarial_predictions, axis=1)
delta_lst = adversarial_probs[:, 1] - original_probs[:, 1]
average_prob_change = round(float(np.mean(delta_lst)), 4)

print(f"Average Probability Change: {average_prob_change}")

Average Probability Change: -0.0288
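
The metric above is the mean, over paired articles, of the class-1 probability after paraphrasing minus the class-1 probability before. The sketch below reproduces that computation in plain Python on made-up logits. The function names and the three sample rows are hypothetical and only illustrate the arithmetic.

```python
import math


def softmax(logits):
    """Convert one row of logits into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_probability_change(original_logits, adversarial_logits, target_class=1):
    """Mean of p_adv(target) - p_orig(target) over paired samples."""
    deltas = [
        softmax(adv)[target_class] - softmax(orig)[target_class]
        for orig, adv in zip(original_logits, adversarial_logits)
    ]
    return sum(deltas) / len(deltas)


# Hypothetical logits for three articles, columns = [class 0, class 1]
original_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
adversarial_logits = [[0.0, 2.0], [0.0, 2.0], [1.0, 1.0]]

change = average_probability_change(original_logits, adversarial_logits)
print(round(change, 4))
```

A positive value means the paraphrased articles received, on average, a higher class-1 probability; a negative value means a lower one. Because increases and decreases cancel out in the mean, a value near zero does not by itself show that individual predictions were stable.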


### %LabelFlip

In [None]:
# a flip is any sample whose prediction changed after paraphrasing, regardless of the true label
flip_count = np.sum(y_pred_orig != y_pred)
total = len(y_pred_orig)
percentage = round(100 * (flip_count / total), 4)

print(f"%LabelFlip: {flip_count}/{total} ({percentage}%)")

%LabelFlip: 48/962 (4.9896%)
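
%LabelFlip counts every changed prediction, whether the change hurt or helped the classifier. The following sketch, written in plain Python with made-up labels and a hypothetical helper name, shows how the flips can be split into those two groups for a binary task.

```python
def label_flip_summary(labels, orig_preds, adv_preds):
    """Count prediction flips and split them by whether the new prediction is correct."""
    flips = harmful = helpful = 0
    for label, orig, adv in zip(labels, orig_preds, adv_preds):
        if orig != adv:
            flips += 1
            if adv == label:
                helpful += 1  # the paraphrase corrected a wrong prediction
            else:
                harmful += 1  # binary case: the original prediction was correct
    total = len(labels)
    return {
        "flips": flips,
        "total": total,
        "percent": round(100 * flips / total, 4),
        "harmful": harmful,
        "helpful": helpful,
    }


# Hypothetical labels and predictions for five articles
labels = [0, 1, 1, 0, 1]
orig_preds = [0, 1, 0, 0, 1]
adv_preds = [0, 0, 1, 1, 1]

summary = label_flip_summary(labels, orig_preds, adv_preds)
print(summary)
```

Only the `harmful` flips represent a successful attack; the `helpful` ones are cases where paraphrasing happened to fix an earlier mistake.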


To quantify how much the text changed, compute the semantic similarity between the original and adversarial articles.

In [None]:
# compute sentence embeddings, then compare them with cosine similarity

from sklearn.metrics.pairwise import cosine_similarity

max_seq_length = 512

# `model` was reassigned to the fine-tuned classifier above, whose output has no
# last_hidden_state, so reload the base encoder under its own name for the embeddings
embedding_model = BertModel.from_pretrained('jcblaise/bert-tagalog-base-uncased')

def encode_sentence(sentence):
    tokens = tokenizer.tokenize(sentence)
    if len(tokens) > max_seq_length - 2:  # account for [CLS] and [SEP] tokens
        tokens = tokens[:max_seq_length - 2]
    tokens = ['[CLS]'] + tokens + ['[SEP]']
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_ids = torch.tensor(input_ids).unsqueeze(0)  # add batch dimension
    with torch.no_grad():  # inference only, no gradients needed
        outputs = embedding_model(input_ids)
    sentence_embedding = torch.mean(outputs.last_hidden_state, dim=1)  # average the token embeddings
    return sentence_embedding
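
The function above reduces a variable-length sequence of token vectors to one fixed-size vector by averaging (mean pooling), which is what makes two articles of different lengths comparable with cosine similarity. A minimal sketch of both steps in plain Python follows. The 3-dimensional vectors and the helper names are hypothetical stand-ins; BERT-base hidden states have 768 dimensions.

```python
import math


def mean_pool(token_vectors):
    """Average token vectors dimension-wise into one fixed-size vector."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical token embeddings: the paraphrase has one extra token
original_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
paraphrase_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

similarity = cosine(mean_pool(original_tokens), mean_pool(paraphrase_tokens))
print(round(similarity, 4))
```

Note that articles longer than 512 tokens are truncated before pooling, so the score only reflects the beginning of long articles.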


In [None]:
print(len(X_test_orig))
print(len(X_test_adv))

962
962


In [None]:
sim_dataset = []

for sentence1, sentence2 in zip(X_test_orig, X_test_adv):
    embedding1 = encode_sentence(sentence1)
    embedding2 = encode_sentence(sentence2)

    similarity = cosine_similarity(embedding1.detach().numpy(), embedding2.detach().numpy())
    sim_dataset.append({'sentence1': sentence1, 'sentence2': sentence2, 'similarity': similarity})

sim_df = pd.DataFrame(sim_dataset)
sim_df['label'] = y_test
sim_df['orig_pred'] = y_pred_orig
sim_df['adv_pred'] = y_pred
sim_df.to_csv('/content/drive/Shared drives/Thesis/Datasets/Fake-News-Filipino/Analysis/fil-paraphrase-analysis.csv')

### Model Performance (Adversarial)

In [None]:
accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

#print(accuracy, recall, precision, f1)
print("ADVERSARIAL TEST DATASET (FILIPINO):")
print("Accuracy: {}".format(accuracy))
print("Recall: {}".format(recall))
print("Precision: {}".format(precision))
print("F1-score: {}".format(f1))

cm = confusion_matrix(y_test, y_pred)
# disp = ConfusionMatrixDisplay(confusion_matrix=cm)
# disp.plot()

fig = px.imshow(cm, text_auto=True, title='Confusion Matrix of adversarial test data (Filipino Fake News)',
                labels=dict(x="Predicted label", y="True label"), color_continuous_scale='haline')
fig.update_xaxes(dtick=1)
fig.update_yaxes(dtick=1)
fig.update_layout(font_family="Serif", font=dict(size=20))
fig.show()

ADVERSARIAL TEST DATASET (FILIPINO):
Accuracy: 0.9282744282744283
Recall: 0.9085239085239085
Precision: 0.9458874458874459
F1-score: 0.926829268292683


## Analysis

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Fake-News-Filipino/Analysis/fil-paraphrase-analysis.csv" "fil-paraphrase-analysis.csv"

In [None]:
test_data = pd.read_csv('fil-paraphrase-analysis.csv')

In [None]:
test_data.head()

Unnamed: 0.1,Unnamed: 0,sentence1,sentence2,similarity,label,orig_pred,adv_pred
0,0,"Ang nagbabalik na si Paz-Cojuangco, na hangad ...","Ang pagbabalik ng Paz-Cojuangco, na naghahanap...",[[0.9286067]],0,0,0
1,1,Sapul sa video ang kabalastugan ng mag-asawa n...,"Sa video, ang hindi pagkakaunawaan ng mag -asa...",[[0.90175265]],1,1,1
2,2,"Totoo nga ang kasabihang ""age doesn't matter.""...","Ang kasabihan na ""edad ay hindi mahalaga."" Ito...",[[0.9358319]],1,1,1
3,3,"SA halip na suwertehin, baka malasin ang iyong...","Sa halip na masuwerteng, ang iyong bagong taon...",[[0.75553477]],0,0,0
4,4,Para sa nobyo ang nobya. Naroon ang abay ng no...,Ang ikakasal ay para sa nobya. Ang ikakasal ng...,[[0.7519004]],0,0,0


### Label Flip Contributions (Detailed)

In [None]:
# keep only the rows whose prediction flipped; .copy() avoids pandas' SettingWithCopyWarning
test_data = test_data[test_data['adv_pred'] != test_data['orig_pred']].copy()

# True if the flipped prediction matches the label (the paraphrase helped),
# False if it no longer matches (adversarial)
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

test_data['label_match'].head()



9     False
20    False
34    False
45    False
47     True
Name: label_match, dtype: bool

In [None]:
#group by label_match
grouped_data = test_data.groupby(['label', 'label_match']).size().reset_index(name='count')
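
The grouping above counts flipped samples per combination of true label and `label_match`. As a rough plain-Python equivalent on made-up rows (the rows are hypothetical and only show the shape of the result):

```python
from collections import Counter

# Hypothetical rows with the same columns as the analysis CSV
rows = [
    {"label": 0, "orig_pred": 0, "adv_pred": 1},
    {"label": 1, "orig_pred": 1, "adv_pred": 0},
    {"label": 1, "orig_pred": 0, "adv_pred": 1},
    {"label": 0, "orig_pred": 0, "adv_pred": 0},  # no flip, so it is skipped
]

# key = (true label, whether the flipped prediction matches the label)
counts = Counter(
    (row["label"], row["adv_pred"] == row["label"])
    for row in rows
    if row["adv_pred"] != row["orig_pred"]
)
print(counts)
```

Each key corresponds to one bar segment in the chart: the label selects the bar and `label_match` selects the color.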

In [None]:
fig = px.bar(grouped_data, x='label', y='count', color='label_match',
             labels={'label': 'Label', 'count': 'Count', 'label_match': 'Correct?'},
             title='Label Flip Contributions',
             color_discrete_sequence=px.colors.qualitative.Pastel,
             text='count',
             hover_data={'count': True})

fig.show()

### Semantic Textual Similarity (All)

In [None]:
test_data = pd.read_csv('fil-paraphrase-analysis.csv')

In [None]:
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)
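
The conversion above is needed because each similarity was saved as a 1x1 array, which comes back from the CSV as a string such as `[[0.9286067]]`. A small sketch of what the parsing does to a single value, together with one possible alternative based on `ast.literal_eval`, is shown below; the sample string is taken from the table shown earlier.

```python
import ast

raw = "[[0.9286067]]"  # how a stored 1x1 similarity looks after the CSV round trip

stripped = float(raw.strip("[]"))            # approach used in this notebook
parsed = float(ast.literal_eval(raw)[0][0])  # alternative: parse the nested list, then index

print(stripped, parsed)
```

Storing the scalar directly when the CSV is written would make this step unnecessary.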

In [None]:
fig = px.scatter(test_data, x='similarity', y=test_data.index, color='similarity')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score', yaxis_title='Sample Pairs')
fig.show()

In [None]:
fig = px.histogram(test_data, x='similarity', nbins=20, marginal='rug')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score', yaxis_title='Frequency')
fig.update_layout(font_family="Serif", font=dict(size=20))
fig.show()

In [None]:
fig = px.box(test_data, y='similarity')
fig.update_layout(title='Semantic Textual Similarity', yaxis_title='Similarity Score')
fig.show()

In [None]:
# keep only the rows whose prediction did not flip; .copy() avoids pandas' SettingWithCopyWarning
test_data = test_data[test_data['adv_pred'] == test_data['orig_pred']].copy()

# True if the prediction is correct, False if incorrect
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

test_data['label_match'].head()


0    True
1    True
2    True
3    True
4    True
Name: label_match, dtype: bool

In [None]:
average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.884105
True     0.878663
Name: similarity, dtype: float64


All samples, grouped by whether the original prediction was correct

In [None]:
test_data = pd.read_csv('fil-paraphrase-analysis.csv')
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)

In [None]:
#if correct/incorrect
test_data['label_match'] = test_data['orig_pred'] == test_data['label']

average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.884421
True     0.879230
Name: similarity, dtype: float64


All samples, grouped by whether the adversarial prediction was correct

In [None]:
test_data = pd.read_csv('fil-paraphrase-analysis.csv')
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)

In [None]:
#if correct/incorrect
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.889419
True     0.878820
Name: similarity, dtype: float64


### Semantic Textual Similarity (Only Label Flips)

In [None]:
# keep only the rows whose prediction flipped; .copy() avoids pandas' SettingWithCopyWarning
test_data = test_data[test_data['adv_pred'] != test_data['orig_pred']].copy()

# True if the flipped prediction matches the label (the paraphrase helped),
# False if it no longer matches (adversarial)
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

test_data['label_match'].head()



9     False
20    False
34    False
45    False
47     True
Name: label_match, dtype: bool

In [None]:
fig = px.scatter(test_data, x='similarity', y=test_data.index, color='similarity')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score', yaxis_title='Sample Pairs')
fig.show()

In [None]:
fig = px.histogram(test_data, x='similarity', nbins=20, marginal='rug')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score', yaxis_title='Frequency')
fig.update_layout(font_family="Serif", font=dict(size=20))
fig.show()

In [None]:
fig = px.box(test_data, y='similarity')
fig.update_layout(title='Semantic Textual Similarity', yaxis_title='Similarity Score')
fig.show()

In [None]:
fig = px.violin(test_data, x='label_match', y='similarity', box=True, points='all')

fig.update_layout(
    title='Label Match vs. Semantic Textual Similarity',
    xaxis_title='Label Match',
    yaxis_title='Similarity Score',
    font_family="Serif", font=dict(size=20)
)

fig.show()

In [None]:
fig = px.scatter(test_data, x=test_data.index, y='similarity', color='label_match')

fig.update_layout(
    title="Semantic Textual Similarity Scores",
    xaxis_title="Data Points",
    yaxis_title="Scores"
)

fig.show()

In [None]:
average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.898208
True     0.885037
Name: similarity, dtype: float64


# Kaggle Fake News

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Kaggle-Fake-News/CleanFinal/test-eng-clean-remove-e.csv" "test-eng.csv"

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Kaggle-Fake-News/Adv/PA-test-eng-adv-gt.csv" "test-eng-adv.csv"

## Evaluation

In [None]:
# Load the base tokenizer and encoder (the fine-tuned classifier is loaded from its checkpoint below)
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
!cp -r "/content/drive/Shared drives/Thesis/Models/uncased-output-eng" "output-eng"

In [None]:
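class Dataset(torch.utils.data.Dataset):
    """Wraps tokenizer output (and optional labels) so that Trainer can index it."""
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # one sample: a dict of tensors keyed by input_ids, attention_mask, etc.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels is not None:  # labels are optional at prediction time
            item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])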

### Model Performance (Original)

The fine-tuned model makes predictions on both the original and the adversarial test datasets. The code below handles the original test dataset.

In [None]:
test_data = pd.read_csv("test-eng.csv")
X_test = list(test_data["text"])
X_test_tokenized = tokenizer(X_test, padding=True, truncation=True, max_length=512, return_tensors='pt')
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output-eng/checkpoint-3500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred_orig = np.argmax(raw_pred, axis=1)





In [None]:
original_predictions = test_trainer.predict(test_dataset).predictions





In [None]:
accuracy = accuracy_score(y_test, y_pred_orig)
recall = recall_score(y_test, y_pred_orig)
precision = precision_score(y_test, y_pred_orig)
f1 = f1_score(y_test, y_pred_orig)

#print(accuracy, recall, precision, f1)
print("ORIGINAL TEST DATASET (ENGLISH):")
print("Accuracy: {}".format(accuracy))
print("Recall: {}".format(recall))
print("Precision: {}".format(precision))
print("F1-score: {}".format(f1))

cm = confusion_matrix(y_test, y_pred_orig)
# disp = ConfusionMatrixDisplay(confusion_matrix=cm)
# disp.plot()

fig = px.imshow(cm, text_auto=True, title='Confusion Matrix of original test data (English Fake News)', labels=dict(x="Predicted label", y="True label"), color_continuous_scale='haline')
fig.update_xaxes(dtick=1)
fig.update_yaxes(dtick=1)
fig.show()

ORIGINAL TEST DATASET (ENGLISH):
Accuracy: 0.9964151587572551
Recall: 0.99624445203141
Precision: 0.9965846994535519
F1-score: 0.9964145466962608


Repeat the process for the modified/adversarial dataset.

In [None]:
test_data = pd.read_csv("test-eng-adv.csv")
X_test_adv = list(test_data["text_new2"])
X_test_tokenized = tokenizer(X_test_adv, padding=True, truncation=True, max_length=512, return_tensors='pt')
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output-eng/checkpoint-3500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred = np.argmax(raw_pred, axis=1)





In [None]:
adversarial_predictions = test_trainer.predict(test_dataset).predictions





### Average Probability Change

In [None]:
original_probs = softmax(original_predictions, axis=1)
adversarial_probs = softmax(adversarial_predictions, axis=1)
delta_lst = adversarial_probs[:, 1] - original_probs[:, 1]
average_prob_change = round(float(np.mean(delta_lst)), 4)

print(f"Average Probability Change: {average_prob_change}")

Average Probability Change: 0.295


In [None]:
print(len(y_test))
print(len(y_pred_orig))

5858
5858


### %LabelFlip

In [None]:
# a flip is any sample whose prediction changed after paraphrasing, regardless of the true label
flip_count = np.sum(y_pred_orig != y_pred)
total = len(y_pred_orig)
percentage = round(100 * (flip_count / total), 4)

print(f"%LabelFlip: {flip_count}/{total} ({percentage}%)")

%LabelFlip: 1729/5858 (29.5152%)


In [None]:
print(len(y_test))
print(len(y_pred))
print(len(X_test_adv))

5858
5858
5858


In [None]:
# collect the true labels and both sets of predictions in one table
label_df = pd.DataFrame({
    'label': y_test,
    'orig_pred': y_pred_orig,
    'adv_pred': y_pred,
})

In [None]:
label_df.to_csv('/content/drive/Shared drives/Thesis/Datasets/Kaggle-Fake-News/Analysis/eng-paraphrase-label.csv')

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Kaggle-Fake-News/Analysis/eng-paraphrase-label.csv" "eng-paraphrase-label.csv"

In [None]:
label_df = pd.read_csv('eng-paraphrase-label.csv')

In [None]:
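max_seq_length = 512

# `model` was reassigned to the fine-tuned classifier above, whose output has no
# last_hidden_state, so reload the base encoder under its own name for the embeddings
embedding_model = BertModel.from_pretrained('bert-base-uncased')

def encode_sentence(sentence):
    tokens = tokenizer.tokenize(sentence)
    if len(tokens) > max_seq_length - 2:  # account for [CLS] and [SEP] tokens
        tokens = tokens[:max_seq_length - 2]
    tokens = ['[CLS]'] + tokens + ['[SEP]']
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_ids = torch.tensor(input_ids).unsqueeze(0)  # add batch dimension
    with torch.no_grad():  # inference only, no gradients needed
        outputs = embedding_model(input_ids)
    sentence_embedding = torch.mean(outputs.last_hidden_state, dim=1)  # average the token embeddings
    return sentence_embedding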


In [None]:
test_data = pd.read_csv("test-eng.csv")
X_test = list(test_data["text"])
test_data = pd.read_csv("test-eng-adv.csv")
X_test_adv = list(test_data["text_new2"])

In [None]:
sim_dataset = []

for sentence1, sentence2 in zip(X_test, X_test_adv):
    embedding1 = encode_sentence(sentence1)
    embedding2 = encode_sentence(sentence2)

    similarity = cosine_similarity(embedding1.detach().numpy(), embedding2.detach().numpy())
    sim_dataset.append({'sentence1': sentence1, 'sentence2': sentence2, 'similarity': similarity})

sim_df = pd.DataFrame(sim_dataset)
# join column-wise; the default axis=0 would stack the rows and pad them with NaN
analysis_df = pd.concat([label_df, sim_df], axis=1)

In [None]:
sim_df.head()

Unnamed: 0,sentence1,sentence2,similarity
0,Miss Russia AFP/East News \nMiss Russia Alisa ...,Miss Russia AFP/East News\nMiss Russia Alisa M...,[[0.9944242]]
1,NEW YORK (AP) — Google is now directing its...,New York (AP) - Google is now directing teams ...,[[0.9435961]]
2,"Tweet (Image via intoday.in) \nThis week, the ...","Tweet (image by Intttday.in)\nThis week, the c...",[[0.98365504]]
3,"CHASKA, Minn. — Ryan Moore was the last pla...","Chaska, Minn. - Ryan Moore is the last player ...",[[0.9830313]]
4,We Are Change \nNorth Dakota had nearly 300 oi...,We are changing\nNorth Dakota has nearly 300 o...,[[0.9833378]]


In [None]:
analysis_df['sentence1'] = sim_df['sentence1']
analysis_df['sentence2'] = sim_df['sentence2']
analysis_df['similarity'] = sim_df['similarity']

In [None]:
analysis_df.head()

Unnamed: 0.1,Unnamed: 0,label,orig_pred,adv_pred,sentence1,sentence2,similarity
0,0.0,1.0,1.0,1.0,Miss Russia AFP/East News \nMiss Russia Alisa ...,Miss Russia AFP/East News\nMiss Russia Alisa M...,[[0.9944242]]
1,1.0,0.0,0.0,0.0,NEW YORK (AP) — Google is now directing its...,New York (AP) - Google is now directing teams ...,[[0.9435961]]
2,2.0,1.0,1.0,1.0,"Tweet (Image via intoday.in) \nThis week, the ...","Tweet (image by Intttday.in)\nThis week, the c...",[[0.98365504]]
3,3.0,0.0,0.0,1.0,"CHASKA, Minn. — Ryan Moore was the last pla...","Chaska, Minn. - Ryan Moore is the last player ...",[[0.9830313]]
4,4.0,1.0,1.0,1.0,We Are Change \nNorth Dakota had nearly 300 oi...,We are changing\nNorth Dakota has nearly 300 o...,[[0.9833378]]


In [None]:
analysis_df.to_csv('/content/drive/Shared drives/Thesis/Datasets/Kaggle-Fake-News/Analysis/eng-paraphrase-analysis.csv')

### Model Performance (Adversarial)

In [None]:
accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

#print(accuracy, recall, precision, f1)
print("ADVERSARIAL TEST DATASET (ENGLISH):")
print("Accuracy: {}".format(accuracy))
print("Recall: {}".format(recall))
print("Precision: {}".format(precision))
print("F1-score: {}".format(f1))

cm = confusion_matrix(y_test, y_pred)
# disp = ConfusionMatrixDisplay(confusion_matrix=cm)
# disp.plot()

fig = px.imshow(cm, text_auto=True, title='Confusion Matrix of adversarial test data (English Fake News)',
                labels=dict(x="Predicted label", y="True label"), color_continuous_scale='haline')
fig.update_xaxes(dtick=1)
fig.update_yaxes(dtick=1)
fig.update_layout(font_family="Serif", font=dict(size=20))
fig.show()

ADVERSARIAL TEST DATASET (ENGLISH):
Accuracy: 0.7043359508364629
Recall: 0.99897575964493
Precision: 0.6285714285714286
F1-score: 0.7716244725738396


## Analysis

In [None]:
!cp "/content/drive/Shared drives/Thesis/Datasets/Kaggle-Fake-News/Analysis/eng-paraphrase-analysis.csv" "eng-paraphrase-analysis.csv"

In [None]:
test_data = pd.read_csv('eng-paraphrase-analysis.csv')

### Label Flip Contributions (Detailed)

In [None]:
# keep only the rows whose prediction flipped; .copy() avoids pandas' SettingWithCopyWarning
test_data = test_data[test_data['adv_pred'] != test_data['orig_pred']].copy()

# True if the flipped prediction matches the label (the paraphrase helped),
# False if it no longer matches (adversarial)
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

test_data['label_match'].head()



3     False
5     False
11    False
16    False
17    False
Name: label_match, dtype: bool

In [None]:
#group by label_match
grouped_data = test_data.groupby(['label', 'label_match']).size().reset_index(name='count')

In [None]:
fig = px.bar(grouped_data, x='label', y='count', color='label_match',
             labels={'label': 'Label', 'count': 'Count', 'label_match': 'Correct?'},
             title='Label Flip Contributions',
             color_discrete_sequence=px.colors.qualitative.Pastel,
             text='count',
             hover_data={'count': True})

fig.show()

### Semantic Textual Similarity (All)

In [None]:
test_data = pd.read_csv('eng-paraphrase-analysis.csv')

In [None]:
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)

In [None]:
fig = px.scatter(test_data, x='similarity', y=test_data.index, color='similarity')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score', yaxis_title='Sample Pairs')
fig.show()

In [None]:
fig = px.histogram(test_data, x='similarity', nbins=20, marginal='rug')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score',
                  yaxis_title='Frequency', font_family="Serif", font=dict(size=20))
fig.show()

In [None]:
fig = px.box(test_data, y='similarity')
fig.update_layout(title='Semantic Textual Similarity', yaxis_title='Similarity Score')
fig.show()

In [None]:
test_data = pd.read_csv('eng-paraphrase-analysis.csv')
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)

In [None]:
# keep only the rows whose prediction did not flip; .copy() avoids pandas' SettingWithCopyWarning
test_data = test_data[test_data['adv_pred'] == test_data['orig_pred']].copy()

# True if the prediction is correct, False if incorrect
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.980151
True     0.980447
Name: similarity, dtype: float64


All samples, grouped by whether the original prediction was correct

In [None]:
test_data = pd.read_csv('eng-paraphrase-analysis.csv')
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)

#if correct/incorrect
test_data['label_match'] = test_data['orig_pred'] == test_data['label']

average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.982956
True     0.980688
Name: similarity, dtype: float64


All samples, grouped by whether the adversarial prediction was correct

In [None]:
test_data = pd.read_csv('eng-paraphrase-analysis.csv')
test_data['similarity'] = test_data['similarity'].astype(str).str.strip('[]')
test_data['similarity'] = test_data['similarity'].astype(float)

#if correct/incorrect
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.981256
True     0.980461
Name: similarity, dtype: float64


### Semantic Textual Similarity (Only Label Flips)

In [None]:
# keep only the rows whose prediction flipped; .copy() avoids pandas' SettingWithCopyWarning
test_data = test_data[test_data['adv_pred'] != test_data['orig_pred']].copy()

# True if the flipped prediction matches the label (the paraphrase helped),
# False if it no longer matches (adversarial)
test_data['label_match'] = test_data['adv_pred'] == test_data['label']

test_data['label_match'].head()



3     False
5     False
11    False
16    False
17    False
Name: label_match, dtype: bool

In [None]:
fig = px.scatter(test_data, x='similarity', y=test_data.index, color='similarity')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score', yaxis_title='Sample Pairs')
fig.show()

In [None]:
fig = px.histogram(test_data, x='similarity', nbins=20, marginal='rug')
fig.update_layout(title='Semantic Textual Similarity', xaxis_title='Similarity Score',
                  yaxis_title='Frequency', font_family="Serif", font=dict(size=20))
fig.show()

In [None]:
fig = px.box(test_data, y='similarity')
fig.update_layout(title='Semantic Textual Similarity', yaxis_title='Similarity Score')
fig.show()

In [None]:
fig = px.violin(test_data, x='label_match', y='similarity', box=True, points='all')

fig.update_layout(
    title='Label Match vs. Semantic Textual Similarity',
    xaxis_title='Label Match',
    yaxis_title='Similarity Score',
    font_family="Serif", font=dict(size=20)
)

fig.show()

In [None]:
fig = px.scatter(test_data, x=test_data.index, y='similarity', color='label_match')

fig.update_layout(
    title="Semantic Textual Similarity Scores",
    xaxis_title="Data Points",
    yaxis_title="Scores"
)

fig.show()

In [None]:
average_scores = test_data.groupby('label_match')['similarity'].mean()
print(average_scores)

label_match
False    0.981264
True     0.986696
Name: similarity, dtype: float64


# Attribution
1.   [An Adversarial Benchmark for Fake News Detection Models](https://github.com/ljyflores/fake-news-adversarial-benchmark/blob/master/polarity_preprocessing.ipynb)
2.   [Fine-tuning pretrained NLP models with Huggingface’s Trainer](https://towardsdatascience.com/fine-tuning-pretrained-nlp-models-with-huggingfaces-trainer-6326a4456e7b)
3.   [Semantic Textual Similarity (Sentence-Transformers documentation)](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)