# Preliminaries

The program was run using Google Colab with GPU, Tesla T4. For finetuning the pretrained models to the desired datasets, the Hugging Face Trainer API was used. Datasets include a local fake news dataset (Filipino) and the Kaggle fake news dataset from UTK Machine Learning Club 2017.

This experiment will mainly cover creating an adversarial attack by negating selected phrases.

In [2]:
from google.colab import drive
drive.mount('/content/drive')
!cp "/content/drive/My Drive/CS-198-199/filipino-fake-news/full.csv" "tl-full.csv"
!cp "/content/drive/My Drive/CS-198-199/kaggle-fake-news/train.csv" "en-train.csv"
#!cp "/content/drive/My Drive/CS-198-199/kaggle-fake-news/test.csv" "en-test.csv"

Mounted at /content/drive


In [3]:
!pip install datasets
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.7.1-py3-none-any.whl (451 kB)
[K     |████████████████████████████████| 451 kB 24.0 MB/s 
Collecting multiprocess
  Downloading multiprocess-0.70.14-py38-none-any.whl (132 kB)
[K     |████████████████████████████████| 132 kB 76.5 MB/s 
Collecting xxhash
  Downloading xxhash-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
[K     |████████████████████████████████| 212 kB 79.0 MB/s 
Collecting huggingface-hub<1.0.0,>=0.2.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
[K     |████████████████████████████████| 182 kB 78.0 MB/s 
Collecting responses<0.19
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)
[K     |████████████████████████████████| 127 kB 36.6 MB/s 
Installing collected packag

In [4]:
import torch
import numpy as np
import pandas as pd
import itertools
import string
import re
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback

The following codes will be used for training the models used in this experiment.

In [5]:
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels:
            item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])

In [6]:
def compute_metrics(p):
    pred, labels = p
    pred = np.argmax(pred, axis=1)

    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred)
    precision = precision_score(y_true=labels, y_pred=pred)
    f1 = f1_score(y_true=labels, y_pred=pred)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

In [13]:
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    eval_steps=500,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    seed=0,
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


# Fake News Filipino

The provided dataset contains around 3000 news articles in Filipino that is perfectly split of real and fake news. The pretrained model, bert-tagalog-base-cased, was trained using the WikiText-TL-39 dataset which is a corpus of 172,815 articles in Tagalog.

In [None]:
df_tl = pd.read_csv('tl-full.csv')

## Pre-processing and finetuning
Split the Fake News Filipino dataset into training and testing data.

In [None]:
train, test = train_test_split(df_tl, test_size=0.3)
train.to_csv('/content/drive/My Drive/CS-198-199/filipino-fake-news/train_orig.csv', index=False)
test.to_csv('/content/drive/My Drive/CS-198-199/filipino-fake-news/test_orig.csv', index=False)

!cp "/content/drive/My Drive/CS-198-199/filipino-fake-news/train_orig.csv" "train_tl_orig.csv"
!cp "/content/drive/My Drive/CS-198-199/filipino-fake-news/test_orig.csv" "test_tl_orig.csv"

Finetune the pre-trained model using trainer parameters.

In [None]:
data = pd.read_csv('train_tl_orig.csv')

pretrained = 'jcblaise/bert-tagalog-base-cased'
tokenizer = AutoTokenizer.from_pretrained(pretrained)
model = AutoModelForSequenceClassification.from_pretrained(pretrained)

X = list(data["article"])
y = list(data["label"])
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=512)
X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=512)

train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)

Downloading:   0%|          | 0.00/55.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/624 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/256k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/439M [00:00<?, ?B/s]

Some weights of the model checkpoint at jcblaise/bert-tagalog-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the mode

In [None]:
trainer.train()

***** Running training *****
  Num examples = 1570
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 591
  Number of trainable parameters = 109160450


Step,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
500,0.1696,0.299098,0.942136,0.977346,0.904192,0.939347


***** Running Evaluation *****
  Num examples = 674
  Batch size = 8
Saving model checkpoint to output/checkpoint-500
Configuration saved in output/checkpoint-500/config.json
Model weights saved in output/checkpoint-500/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from output/checkpoint-500 (score: 0.2990981638431549).


TrainOutput(global_step=591, training_loss=0.14636367428322936, metrics={'train_runtime': 470.6756, 'train_samples_per_second': 10.007, 'train_steps_per_second': 1.256, 'total_flos': 1239253070745600.0, 'train_loss': 0.14636367428322936, 'epoch': 3.0})

Copy the finetuned model to local storage.

In [32]:
!cp -r "output" "/content/drive/My Drive/CS-198-199/filipino-fake-news"

## Preparing the test data

For this experiment, six replacement rules were created to negate selected Tagalog phrases.

### Simple Negate Rule #1:

sa -> sa hindi

sa mga -> sa mga hindi

In [None]:
def SimpleNegateSA(article):
  split_string = re.split(r'(?i)((?<!hindi)(?<!wala)(?<!para)\s+sa mga\s+(?!hindi|wala|walang|may)|(?<!hindi)(?<!wala)(?<!para)\s+sa\s+(?!mga|hindi|wala|walang|may))', article)

  for i in range(len(split_string)):
    if (split_string[i].casefold() == ' sa ') or (split_string[i].casefold() == ' sa mga '):
      split_string[i] = split_string[i] + "hindi "
  return "". join(split_string)
  

### Simple Negate Rule #2:

ay/ay ang -> ay hindi

ay mga/ay ang mga -> ay hindi mga

ay nasa -> ay wala sa

In [None]:
def SimpleNegateAY(article):
  split_string = re.split(r'(?i)(\s+ay ang mga\s+(?!hindi|wala|walang|nasa|may)|\s+ay nasa\s+(?!hindi|wala|walang)|\s+ay mga\s+(?!hindi|wala|walang|nasa|may)|\s+ay ang\s+(?!mga|hindi|wala|walang|nasa|may)|\s+ay\s+(?!ang mga|mga|hindi|wala|walang|nasa|may))', article)

  for i in range(len(split_string)):
    if (split_string[i].casefold() == ' ay ') or (split_string[i].casefold() == ' ay ang '):
        split_string[i] = " ay hindi "
    elif (split_string[i].casefold() == ' ay mga ') or (split_string[i].casefold() == ' ay ang mga '):
      split_string[i] = " ay hindi mga "
    elif (split_string[i].casefold() == ' ay nasa '):
      split_string[i] = " ay wala sa "
  return "". join(split_string)


### Simple Negate Rule #3:

may -> walang

mayroon -> wala

mayroong -> walang

In [None]:
def SimpleNegateMAY(article):
  split_string = re.split(r'(?i)(\s+mayroong\s+(?!hindi|wala|walang)|\s+mayroon\s+|\s+may\s+(?!hindi|wala|walang))', article)

  for i in range(len(split_string)):
    if (split_string[i].casefold() == ' may ') or (split_string[i].casefold() == ' mayroong '):
      split_string[i] = " walang "
    elif (split_string[i].casefold() == ' mayroon '):
      split_string[i] = " wala "
  return "". join(split_string)


### Simple Negate Rule #4:

upang -> upang hindi

para -> para hindi

para sa -> hindi para sa

In [None]:
def SimpleNegateUPANG(article):
  split_string = re.split(r'(?i)((?<!hindi)\s+para sa\s+(?!hindi|wala|walang)|(?<!hindi)\s+upang\s+(?!sa|hindi|wala|walang)|(?<!hindi)\s+para\s+(?!hindi|wala|walang))', article)

  for i in range(len(split_string)):
    if (split_string[i].casefold() == ' upang ') or (split_string[i].casefold() == ' para '):
      split_string[i] = split_string[i] + "hindi "
    elif (split_string[i].casefold() == ' para sa '):
      split_string[i] = " hindi" + split_string[i]
  return "". join(split_string)


### Simple Negate Rule #5:

nang -> nang hindi

In [None]:
def SimpleNegateNANG(article):
  split_string = re.split(r'(?i)((?<!hindi)(?<!wala)\s+nang\s+(?!hindi|wala|walang|sa|may))', article)

  for i in range(len(split_string)):
    if (split_string[i].casefold() == ' nang '):
      if ((i > 0) and (i < len(split_string)-1)) and (split_string[i-1].casefold().split()[-1] == split_string[i+1].casefold().split()[0]):
        continue
      else:
        split_string[i] = split_string[i] + "hindi "
  return "". join(split_string)


### Simple Negate Rule #6:

kung saan -> kung saan hindi

In [None]:
def SimpleNegateKUNGSAAN(article):
  split_string = re.split(r'(?i)((?<!hindi)(?<!wala)\s+kung saan\s+(?!hindi|wala|walang|may))', article)

  for i in range(len(split_string)):
    if (split_string[i].casefold() == ' kung saan '):
      split_string[i] = split_string[i] + "hindi "
  return "". join(split_string)


The attacks are applied to the test data by creating a new dataframe containing the modified articles.

In [None]:
def ApplySimpleNegateTL(article, label):
  modified_article = SimpleNegateMAY(article)
  modified_article = SimpleNegateUPANG(modified_article)
  modified_article = SimpleNegateNANG(modified_article)
  modified_article = SimpleNegateKUNGSAAN(modified_article)
  modified_article = SimpleNegateAY(modified_article)
  modified_article = SimpleNegateSA(modified_article)

  if (article != modified_article) and (label == 0):
    label = 1

  return modified_article, label

#test_string = df['article'][70]
#new_string, new_label = ApplySimpleNegate(test_string, 0)
#print(new_string)
#print(new_label)

df_tl_test_data = pd.read_csv("test_tl_orig.csv")
df_tl_test_data[['article_new','label_new']] = df_tl_test_data.apply(lambda col: ApplySimpleNegateTL(col['article'], col['label']), axis=1, result_type='expand')
df_tl_test_modified = df_tl_test_data[['label_new', 'article_new']].rename(columns={'label_new':'label', 'article_new':'article'})

Copy the modified dataset to local storage and drive.

In [None]:
#df_tl_test_data.to_csv('/content/drive/My Drive/CS-198-199/filipino-fake-news/full_compare.csv', index=False)
df_tl_test_modified.to_csv('/content/drive/My Drive/CS-198-199/filipino-fake-news/test_adv.csv', index=False)

!cp "/content/drive/My Drive/CS-198-199/filipino-fake-news/test_adv.csv" "test_tl_adv.csv"

## Evaluation

The pretrained model will make predictions on the original and adversarial test datasets. The code below is for the original test dataset.

In [None]:
test_data = pd.read_csv("test_tl_orig.csv")
X_test = list(test_data["article"])
X_test_tokenized = tokenizer(X_test, padding=True, truncation=True, max_length=512)
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output/checkpoint-500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred = np.argmax(raw_pred, axis=1)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(accuracy, recall, precision, f1)

loading configuration file output/checkpoint-500/config.json
Model config BertConfig {
  "_name_or_path": "output/checkpoint-500",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30101
}

loading weights file output

0.9521829521829522 0.9400826446280992 0.9639830508474576 0.9518828451882846


Repeat the process for the modified/adversarial dataset.

In [None]:
test_data = pd.read_csv("test_tl_adv.csv")
X_test = list(test_data["article"])
X_test_tokenized = tokenizer(X_test, padding=True, truncation=True, max_length=512)
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output/checkpoint-500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred = np.argmax(raw_pred, axis=1)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(accuracy, recall, precision, f1)

loading configuration file output/checkpoint-500/config.json
Model config BertConfig {
  "_name_or_path": "output/checkpoint-500",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30101
}

loading weights file output

0.49584199584199584 0.49267782426778245 1.0 0.6601261387526279


# Kaggle Fake News

Use the train.csv file from [Kaggle Fake News Dataset](https://www.kaggle.com/competitions/fake-news/data) containing over 20000 news articles labeled as 0 when reliable, and 1 when unreliable.

In [8]:
df_en = pd.read_csv('en-train.csv')

df_en = df_en[df_en['text'].notnull()]
df_en['text'] = df_en['text'].apply(lambda x: x.replace('’',"'"))

## Pre-processing and finetuning
Split the Kaggle Fake News dataset into training and testing data.

In [9]:
train, test = train_test_split(df_en, test_size=0.3)
train.to_csv('/content/drive/My Drive/CS-198-199/kaggle-fake-news/train_orig.csv', index=False)
test.to_csv('/content/drive/My Drive/CS-198-199/kaggle-fake-news/test_orig.csv', index=False)

!cp "/content/drive/My Drive/CS-198-199/kaggle-fake-news/train_orig.csv" "train_en_orig.csv"
!cp "/content/drive/My Drive/CS-198-199/kaggle-fake-news/test_orig.csv" "test_en_orig.csv"

Finetune the pre-trained model using trainer parameters.

In [10]:
data = pd.read_csv('train_en_orig.csv')

pretrained = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(pretrained)
model = AutoModelForSequenceClassification.from_pretrained(pretrained)

X = list(data["text"])
y = list(data["label"])
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=512)
X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=512)

train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)

Downloading:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/213k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/436k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/436M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

In [14]:
trainer.train()

***** Running training *****
  Num examples = 10172
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 3816
  Number of trainable parameters = 108311810


Step,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
500,0.112,0.042013,0.990596,0.989307,0.991612,0.990458


***** Running Evaluation *****
  Num examples = 4360
  Batch size = 8
Saving model checkpoint to output/checkpoint-500
Configuration saved in output/checkpoint-500/config.json
Model weights saved in output/checkpoint-500/pytorch_model.bin


Step,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
500,0.112,0.042013,0.990596,0.989307,0.991612,0.990458
1000,0.0466,0.054807,0.991055,0.999526,0.982293,0.990834
1500,0.0334,0.013972,0.997248,0.997204,0.997204,0.997204
2000,0.0133,0.033163,0.994954,0.997191,0.992544,0.994862
2500,0.0193,0.022371,0.995642,1.0,0.991146,0.995553
3000,0.0082,0.027272,0.99633,0.997199,0.99534,0.996269


***** Running Evaluation *****
  Num examples = 4360
  Batch size = 8
Saving model checkpoint to output/checkpoint-1000
Configuration saved in output/checkpoint-1000/config.json
Model weights saved in output/checkpoint-1000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 4360
  Batch size = 8
Saving model checkpoint to output/checkpoint-1500
Configuration saved in output/checkpoint-1500/config.json
Model weights saved in output/checkpoint-1500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 4360
  Batch size = 8
Saving model checkpoint to output/checkpoint-2000
Configuration saved in output/checkpoint-2000/config.json
Model weights saved in output/checkpoint-2000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 4360
  Batch size = 8
Saving model checkpoint to output/checkpoint-2500
Configuration saved in output/checkpoint-2500/config.json
Model weights saved in output/checkpoint-2500/pytorch_model.bin
***** Running Evaluation *****
 

TrainOutput(global_step=3000, training_loss=0.038801334540049234, metrics={'train_runtime': 3179.669, 'train_samples_per_second': 9.597, 'train_steps_per_second': 1.2, 'total_flos': 6312560440197120.0, 'train_loss': 0.038801334540049234, 'epoch': 2.36})

Copy the finetuned model to local storage.

In [15]:
!cp -r "output" "/content/drive/My Drive/CS-198-199/kaggle-fake-news"

## Preparing the test data

For this experiment, the negation script used to negate selected English phrases was a modified version from _study_.

In [16]:
negate_dict = {" isn't ":" is ",
    " isn\'t ":" is ",
    " is not ":" is ",
    " is ":" is not ",
    " didn't ":" did ",
    " didn\'t ":" did ",
    " did not ":" did ",
    " does not have ":" has ",
    " doesn't have ":" has ",
    " doesn\'t have ":" has ",
    " has ":" does not have ",
    " shouldn't ":" should ",
    " shouldn\'t ":" should ",
    " should not ":" should ",
    " should ":" should not ",
    " wouldn't ":" would ",
    " wouldn\'t ":" would ",
    " would not ":" would ",
    " would ":" would not ",
    " couldn't ":" could ",
    " couldn\'t ":" could ",
    " could not ":" could ",
    " could ":" could not ",
    " mustn't ":" must ",
    " mustn\'t ":" must ",
    " must not ":" must ",
    " must ":" must not ",
    " can't ":" can ",
    " can\'t ":" can ",
    " cannot ":" can ",
    " can ":" cannot "}

IRREGULAR_ES_VERB_ENDINGS = ["ss", "x", "ch", "sh", "o"]

def negate(sentence):
  modified_sentence = sentence

  for key in negate_dict.keys():
    split_sentence = modified_sentence.split(key)
    if len(split_sentence) > 1:
      new = ""
      for i in range(0, len(split_sentence) - 1):
        if i == len(split_sentence) - 2:
          new = new + split_sentence[i] + negate_dict[key] + split_sentence[i+1]
        else:
          new = new + split_sentence[i] + negate_dict[key]
      modified_sentence = new

  # doesn't work -> works
  doesnt_regex = r'(doesn\'t|does not) (?P<name>\w+)'

  for i in re.finditer(doesnt_regex, modified_sentence):
    modified_sentence = re.sub(doesnt_regex, replace_doesnt, modified_sentence, 1)

  return modified_sentence

def __is_consonant(letter):
  return letter not in ['a', 'e', 'i', 'o', 'u', 'y']

def replace_doesnt(matchobj):
  verb = matchobj.group(2)

  if verb.endswith("y") and __is_consonant(verb[-2]):
    return "{0}ies".format(verb[0:-1])

  for ending in IRREGULAR_ES_VERB_ENDINGS:
    if verb.endswith(ending):
      return "{0}es".format(verb)

  return "{0}s".format(verb)


The attacks are applied to the test data by creating a new dataframe containing the modified articles.

In [17]:
def ApplySimpleNegateEN(article, label):
  modified_article = negate(article)
  if (article != modified_article) and (label == 0):
    label = 1

  return modified_article, label

df_en_test_data = pd.read_csv("test_en_orig.csv")
df_en_test_data[['text_new','label_new']] = df_en_test_data.apply(lambda col: ApplySimpleNegateEN(col['text'], col['label']), axis=1, result_type='expand')
df_en_test_modified = df_en_test_data[['id', 'title', 'author', 'text_new', 'label_new']].rename(columns={'label_new':'label', 'text_new':'text'})

#test_string = df_en['text'][5]
#print(test_string)
#new_string, new_label = ApplySimpleNegateEN(test_string, 0)
#print(new_string)
#print(new_label)

Copy the modified dataset to local storage and drive.

In [18]:
#df_en_test_data.to_csv('/content/drive/My Drive/CS-198-199/kaggle-fake-news/full_compare.csv', index=False)
df_en_test_modified.to_csv('/content/drive/My Drive/CS-198-199/kaggle-fake-news/test_adv.csv', index=False)

!cp "/content/drive/My Drive/CS-198-199/kaggle-fake-news/test_adv.csv" "test_en_adv.csv"

## Evaluation

The pretrained model will make predictions on the original and adversarial test datasets. The code below is for the original test dataset.

In [19]:
test_data = pd.read_csv("test_en_orig.csv")
X_test = list(test_data["text"])
X_test_tokenized = tokenizer(X_test, padding=True, truncation=True, max_length=512)
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output/checkpoint-1500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred = np.argmax(raw_pred, axis=1)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(accuracy, recall, precision, f1)

loading configuration file output/checkpoint-1500/config.json
Model config BertConfig {
  "_name_or_path": "output/checkpoint-1500",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

loading weights file output/checkpoint-1500/pytorch_model.bin
All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequ

0.995504896452079 0.9964204360559714 0.9944787268593699 0.9954486345903771


Repeat the process for the modified/adversarial dataset.

In [20]:
test_data = pd.read_csv("test_en_adv.csv")
X_test = list(test_data["text"])
X_test_tokenized = tokenizer(X_test, padding=True, truncation=True, max_length=512)
y_test = list(test_data["label"])

test_dataset = Dataset(X_test_tokenized)

model_path = "output/checkpoint-1500"
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)

test_trainer = Trainer(model)

raw_pred, _, _ = test_trainer.predict(test_dataset)
y_pred = np.argmax(raw_pred, axis=1)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(accuracy, recall, precision, f1)

loading configuration file output/checkpoint-1500/config.json
Model config BertConfig {
  "_name_or_path": "output/checkpoint-1500",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

loading weights file output/checkpoint-1500/pytorch_model.bin
All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequ

0.5008829667683417 0.4976533419647192 0.9983766233766234 0.6642185981207475


# Visualization of Results

In [21]:
import plotly.graph_objects as go

fig = go.Figure(data=[go.Table(
    header=dict(values=['Finetuned Model','Accuracy', 'Recall', 'Precision', 'F1-Score'],
                line_color='darkslategray',
                fill_color='lightskyblue',
                align='left'),
    cells=dict(values=[['Kaggle Fake News (Original)', 'Kaggle Fake News (Adversarial)', 'Fake News Filipino (Original)', 'Fake News Filipino (Adversarial)'],
                       [99.55, 50.10, 95.22, 49.58],
                       [99.64, 49.77, 94.01, 49.27],
                       [99.45, 99.84, 96.40, 100.0],
                       [99.54, 66.42, 95.19, 66.01]],
               line_color='darkslategray',
               fill_color='lightcyan',
               align='left'))
])

fig.update_layout(width=1000, height=500)
fig.show()

# Attribution
1.   [An Adversarial Benchmark for Fake News Detection Models](https://github.com/ljyflores/fake-news-adversarial-benchmark/blob/master/polarity_preprocessing.ipynb)
2.   [Fine-tuning pretrained NLP models with Huggingface’s Trainer](https://towardsdatascience.com/fine-tuning-pretrained-nlp-models-with-huggingfaces-trainer-6326a4456e7b)