## Authentic User Detection on Social Media

With social media platforms gradually replacing dedicated news outlets and increasingly shaping public opinion on important issues, hate speech accounts and bot users are becoming more common on these platforms. The objective of this task is to identify comments that were likely made by such hate speech accounts or bot users and, in doing so, to find the patterns that distinguish a genuine user from them.

The data used for this project is a collection of toxic and hateful comments found on popular social media platforms like Instagram. These comments were chosen because they were toxic, racist, sexist, and phrased in a way that seemed designed to provoke people. The selected comments could not contain information specific to a post, had to be as generic as possible, and were neither questions nor rhetorical. Some comments were manually added with high confidence that they were made by a hate speech account or bot user.

## Annotation Results:
Because classifying text as written by a bot or dedicated hate-speech account versus an authentic user is highly subjective, there is no fixed metric or external tool (other than another LLM) that can confidently differentiate the comments. Therefore, two human annotators have annotated the dataset with the labels "Bot" and "Real", which denote perceived bot-generated and authentic-user-generated comments respectively.

We compute the inter-annotator agreement between the two annotators (i.e. the two separate files). This is done by computing both the raw agreement (% of examples for which both annotators agreed on the label) and Cohen's Kappa.

*Cohen suggested the Kappa result be interpreted as follows: values ≤ 0 as indicating no agreement, 0.01–0.20 as none to slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement.*

In [None]:
import pandas as pd
from sklearn.metrics import cohen_kappa_score

dataset1 = pd.read_csv('annotator1.csv')
dataset2 = pd.read_csv('annotator2.csv')

labels1 = dataset1['Label']
labels2 = dataset2['Label']

raw_agreement = sum(labels1 == labels2) / len(labels1)

cohen_kappa = cohen_kappa_score(labels1, labels2)

print("Raw Agreement:", raw_agreement)
print("Cohen's Kappa:", cohen_kappa)


Raw Agreement: 0.744
Cohen's Kappa: 0.48888320981344235
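
To make the relationship between raw agreement and Cohen's Kappa concrete, here is a minimal sketch that computes Kappa by hand as (observed agreement − chance agreement) / (1 − chance agreement) and maps the result to the interpretation bands quoted above. The function names `cohen_kappa` and `interpret_kappa` and the toy annotations are illustrative assumptions, not part of the project's code; the notebook itself relies on `sklearn.metrics.cohen_kappa_score`.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equal-length label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both annotators pick the same label
    # independently, based on each annotator's own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

def interpret_kappa(kappa):
    """Map a Kappa value to the interpretation bands quoted above."""
    if kappa <= 0:
        return "no agreement"
    bands = [(0.20, "none to slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, name in bands:
        if kappa <= upper:
            return name
    return "almost perfect"

# Hypothetical toy annotations (not taken from the project's dataset).
annotator_1 = ["Bot", "Bot", "Real", "Real", "Bot", "Real", "Bot", "Real"]
annotator_2 = ["Bot", "Real", "Real", "Real", "Bot", "Bot", "Bot", "Real"]

toy_kappa = cohen_kappa(annotator_1, annotator_2)
print(toy_kappa, interpret_kappa(toy_kappa))
print(interpret_kappa(0.4889))  # the Kappa reported above falls in the "moderate" band
```

In the toy data the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, so Kappa is 0.5. Under the quoted scale, the value of about 0.49 reported above likewise counts as moderate agreement.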


# Text classification

Since we're dealing with large models, the first step is to change to a GPU runtime.

In [None]:
import torch

assert torch.cuda.is_available()

device_name = torch.cuda.get_device_name()
n_gpu = torch.cuda.device_count()
print(f"Found device: {device_name}, n_gpu: {n_gpu}")
device = torch.device("cuda")

Found device: Tesla T4, n_gpu: 1


## Installing Hugging Face's Transformers library
This notebook uses Hugging Face's Transformers (https://github.com/huggingface/transformers), an open-source library that provides general-purpose architectures for natural language understanding and generation, along with a collection of pretrained models made by the NLP community. The library makes it easy to use pretrained models like `BERT` and to run experiments on top of them. These models can be used to solve downstream target tasks, such as text classification, question answering, and sequence labeling.

In [None]:
!pip install transformers
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
print('success!')

import os
import zipfile

# Download helper functions file
helper_file = drive.CreateFile({'id': '16HW-z9Y1tM3gZ_vFpJAuwUDohz91Aac-'})
helper_file.GetContentFile('helpers.py')
print('helper file downloaded! (helpers.py)')

# Download sample file of tweets
data_file = drive.CreateFile({'id': '1QcoAmjOYRtsMX7njjQTYooIbJHPc6Ese'})
data_file.GetContentFile('tweets.csv')
print('sample tweets downloaded! (tweets.csv)')





success!
helper file downloaded! (helpers.py)
sample tweets downloaded! (tweets.csv)


The cell below imports some helper functions we wrote to demonstrate the task on the sample tweet dataset.

In [None]:
from helpers import tokenize_and_format, flat_accuracy

# Data Prep and Model Specifications

In [None]:
from helpers import tokenize_and_format, flat_accuracy
import pandas as pd

df = pd.read_csv('final_data.csv')
#df = pd.read_csv('tweets.csv')

# Map the string labels in the 'Label' column to integers (Bot -> 0, Real -> 1)
df['Label'] = df['Label'].replace({'Bot': 0, 'Real': 1})

df = df.sample(frac=1).reset_index(drop=True)

texts = df.Text.values
labels = df.Label.values

input_ids, attention_masks = tokenize_and_format(texts)

# Convert the lists into tensors.
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)
labels = torch.tensor(labels)

# Print sentence 0, now as a list of IDs.
print('Original: ', texts[0])
print('Token IDs:', input_ids[0])

Original:  I'm sorry, but I can't take you seriously with that picture.
Token IDs: tensor([ 101, 1045, 1005, 1049, 3374, 1010, 2021, 1045, 2064, 1005, 1056, 2202,
        2017, 5667, 2007, 2008, 3861, 1012,  102,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0])
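
The trailing zeros in the tensor above are padding: every comment is brought to the same length so that comments can be stacked into batches, and the attention mask tells BERT which positions hold real tokens. Since `helpers.py` is not shown here, the following is only a sketch of that padding step under stated assumptions: `pad_and_mask` is a hypothetical name, and `max_length=64` is inferred from the length of the printed tensor. The IDs 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` tokens of the `bert-base-uncased` vocabulary.

```python
def pad_and_mask(token_ids, max_length=64, pad_id=0):
    """Pad (or truncate) a list of token IDs and build its attention mask."""
    # Naive truncation for illustration; a real tokenizer keeps the closing [SEP].
    token_ids = token_ids[:max_length]
    num_padding = max_length - len(token_ids)
    padded = token_ids + [pad_id] * num_padding
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [1] * len(token_ids) + [0] * num_padding
    return padded, attention_mask

# The non-padding token IDs from the example sentence printed above.
ids = [101, 1045, 1005, 1049, 3374, 1010, 2021, 1045, 2064, 1005, 1056, 2202,
       2017, 5667, 2007, 2008, 3861, 1012, 102]

padded, mask = pad_and_mask(ids)
print(len(padded), sum(mask))  # 64 positions in total, 19 of them real tokens
```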


## Create train/test/validation splits

Here we split the dataset into 3 parts: a training set, a validation set, and a testing set. Each item in the dataset is a 3-tuple containing an input_id tensor, an attention_mask tensor, and a label tensor.



In [None]:

total = len(df)

num_train = int(total * .8)
num_val = int(total * .1)
num_test = total - num_train - num_val

# make lists of 3-tuples (already shuffled the dataframe in cell above)

train_set = [(input_ids[i], attention_masks[i], labels[i]) for i in range(num_train)]
val_set = [(input_ids[i], attention_masks[i], labels[i]) for i in range(num_train, num_val+num_train)]
test_set = [(input_ids[i], attention_masks[i], labels[i]) for i in range(num_val + num_train, total)]

train_text = [texts[i] for i in range(num_train)]
val_text = [texts[i] for i in range(num_train, num_val+num_train)]
test_text = [texts[i] for i in range(num_val + num_train, total)]
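

The shuffle in the data-prep cell uses `df.sample(frac=1)` without a seed, so the split changes from run to run. The sketch below shows one way to make the same 80/10/10 split reproducible by passing `random_state` to `DataFrame.sample`. The function name `shuffle_and_split`, the seed value, and the toy dataframe are assumptions made for illustration only.

```python
import pandas as pd

def shuffle_and_split(df, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle with a fixed seed, then cut into train/validation/test frames."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    total = len(shuffled)
    num_train = int(total * train_frac)
    num_val = int(total * val_frac)
    train = shuffled.iloc[:num_train]
    val = shuffled.iloc[num_train:num_train + num_val]
    # The test set takes whatever is left after integer truncation.
    test = shuffled.iloc[num_train + num_val:]
    return train, val, test

# Hypothetical toy data with the same column names as the notebook's dataframe.
toy = pd.DataFrame({
    "Text": [f"comment {i}" for i in range(25)],
    "Label": [i % 2 for i in range(25)],
})

train, val, test = shuffle_and_split(toy, seed=0)
print(len(train), len(val), len(test))
```

Because `int()` truncates, any remainder lands in the test set: with 25 rows the sizes are 20, 2 and 3.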


The model to fine-tune is chosen from https://huggingface.co/transformers/pretrained_models.html. As the task requires labelling sentences, I am using `BertForSequenceClassification` below.

In [None]:
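from transformers import BertForSequenceClassification, BertConfig
# transformers' own AdamW is deprecated in favour of the PyTorch implementation.
from torch.optim import AdamW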

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2, # The number of output labels (0 = Bot, 1 = Real).
    output_attentions = False, # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)

# Tell pytorch to run this model on the GPU.
model.cuda()


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

# Fine-tuning hyperparameters
After initially running the model with default parameters, I got a test accuracy of 0.83. Then, I lowered the batch size, epochs and learning rate to 50, 5 and 2e-5 to mitigate potential "catastrophic forgetting", based on this paper: https://arxiv.org/abs/1905.05583. From there, I kept increasing all 3 parameters gradually (batch size in increments of 10, epochs by 1, lr set to 5e-5, 4e-5, 3e-5, 2e-5). Eventually, I got a test accuracy of 0.916 on a different training and test split, with batch_size=99, epochs=9, lr=5e-5.

There was almost no difference between the best validation and test accuracy (both were 0.916). This suggests that the model picked up the key patterns in the text during training.

In [None]:
batch_size = 99
optimizer = AdamW(model.parameters(),
                  lr = 5e-5, # args.learning_rate - default is 5e-5
                  eps = 1e-8 # args.adam_epsilon  - default is 1e-8
                )
epochs = 9



In [None]:
import numpy as np
# function to get validation accuracy
def get_validation_performance(val_set):
    # Put the model in evaluation mode
    model.eval()

    # Tracking variables
    total_eval_accuracy = 0
    total_eval_loss = 0

    num_batches = int(len(val_set)/batch_size) + 1

    total_correct = 0

    for i in range(num_batches):

      end_index = min(batch_size * (i+1), len(val_set))

      batch = val_set[i*batch_size:end_index]

      if len(batch) == 0: continue

      input_id_tensors = torch.stack([data[0] for data in batch])
      input_mask_tensors = torch.stack([data[1] for data in batch])
      label_tensors = torch.stack([data[2] for data in batch])

      # Move tensors to the GPU
      b_input_ids = input_id_tensors.to(device)
      b_input_mask = input_mask_tensors.to(device)
      b_labels = label_tensors.to(device)

      with torch.no_grad():

        # Forward pass, calculate logit predictions.
        outputs = model(b_input_ids,
                                token_type_ids=None,
                                attention_mask=b_input_mask,
                                labels=b_labels)
        loss = outputs.loss
        logits = outputs.logits

        # Accumulate the validation loss.
        total_eval_loss += loss.item()

        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        # Calculate the number of correctly labeled examples in batch
        pred_flat = np.argmax(logits, axis=1).flatten()
        labels_flat = label_ids.flatten()
        num_correct = np.sum(pred_flat == labels_flat)
        total_correct += num_correct

    # Report the final accuracy for this validation run.
    avg_val_accuracy = total_correct / len(val_set)
    return avg_val_accuracy
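

The core of the function above is the step that turns logits into predicted labels and compares them with the true labels. As a small worked example, the sketch below isolates that step with NumPy. The name `accuracy_from_logits` and the toy values are illustrative assumptions; the imported `flat_accuracy` helper is not shown in this notebook and may be implemented differently.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the true label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))

# Hypothetical logits for four comments, columns = [Bot, Real].
toy_logits = np.array([[2.0, -1.0],
                       [0.3, 0.9],
                       [1.5, 0.2],
                       [-0.4, 0.1]])
toy_labels = np.array([0, 1, 1, 1])

print(accuracy_from_logits(toy_logits, toy_labels))  # 3 of 4 correct -> 0.75
```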



In [None]:
import random

# training loop

# For each epoch...
for epoch_i in range(0, epochs):
    # Perform one full pass over the training set.

    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    # Reset the total loss for this epoch.
    total_train_loss = 0

    # Put the model into training mode.
    model.train()

    # For each batch of training data...
    num_batches = int(len(train_set)/batch_size) + 1

    for i in range(num_batches):
      end_index = min(batch_size * (i+1), len(train_set))

      batch = train_set[i*batch_size:end_index]

      if len(batch) == 0: continue

      input_id_tensors = torch.stack([data[0] for data in batch])
      input_mask_tensors = torch.stack([data[1] for data in batch])
      label_tensors = torch.stack([data[2] for data in batch])

      # Move tensors to the GPU
      b_input_ids = input_id_tensors.to(device)
      b_input_mask = input_mask_tensors.to(device)
      b_labels = label_tensors.to(device)

      # Clear the previously calculated gradient
      model.zero_grad()

      # Perform a forward pass (evaluate the model on this training batch).
      outputs = model(b_input_ids,
                            token_type_ids=None,
                            attention_mask=b_input_mask,
                            labels=b_labels)
      loss = outputs.loss
      logits = outputs.logits

      total_train_loss += loss.item()

      # Perform a backward pass to calculate the gradients.
      loss.backward()

      # Update parameters and take a step using the computed gradient.
      optimizer.step()

    print(f"Total loss: {total_train_loss}")
    val_acc = get_validation_performance(val_set)
    print(f"Validation accuracy: {val_acc}")

print("")
print("Training complete!")



Training...
Total loss: 0.015111084096133709
Validation accuracy: 0.75

Training...
Total loss: 0.01311499997973442
Validation accuracy: 0.75

Training...
Total loss: 0.012720595113933086
Validation accuracy: 0.75

Training...
Total loss: 0.011224128305912018
Validation accuracy: 0.75

Training...
Total loss: 0.010882757604122162
Validation accuracy: 0.75

Training...
Total loss: 0.010216883383691311
Validation accuracy: 0.75

Training...
Total loss: 0.009019187651574612
Validation accuracy: 0.75

Training...
Total loss: 0.008671858347952366
Validation accuracy: 0.75

Training...
Total loss: 0.008160402067005634
Validation accuracy: 0.75

Training complete!
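
Both the training loop and the validation function compute `int(len(data)/batch_size) + 1` batches and skip empty ones. When the dataset size is an exact multiple of the batch size, that formula produces one trailing empty batch, which is why the `if len(batch) == 0: continue` check is needed. A possible alternative, sketched here with a hypothetical `iter_batches` helper, steps through the data with `range` so that no empty batch ever occurs.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items, never an empty one."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 250 items with batch_size=99: two full batches and one partial batch.
print([len(batch) for batch in iter_batches(list(range(250)), 99)])  # [99, 99, 52]

# 198 items is an exact multiple of 99: two batches, no empty third batch.
print([len(batch) for batch in iter_batches(list(range(198)), 99)])  # [99, 99]
```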


# Evaluating the model on test set

In [None]:
get_validation_performance(test_set)

0.9166666666666666

## Error Analysis
Here, the model is tested on a few adversarial examples: comments intentionally phrased so that even a model trained on a larger dataset would struggle to classify them as bot-generated or written by an authentic user.

## Note:
In this version of the project's code, the input comments and the output for this task have been removed, as they contain examples of hate speech that I don't want to include in my GitHub repo. Only these parts have been modified, where required.

In [None]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

model.eval()

custom_test=["Comment_1","Comment_2","etc."]
inputs = tokenizer(custom_test, return_tensors='pt', truncation=True, padding=True)

device = torch.device('cuda')
inputs = {key: tensor.to(device) for key, tensor in inputs.items()}

model.to(device)

# Inference only, so gradients do not need to be tracked.
with torch.no_grad():
    outputs = model(**inputs)

predicted_labels = torch.argmax(outputs.logits, dim=1).tolist()

for text, label in zip(custom_test, predicted_labels):
    print(f"Text: {text}")
    print(f"Predicted label: {label}\n")
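

The cell above prints integer class IDs. For error analysis it can help to see the label name and how confident the model was. The sketch below converts logits to softmax probabilities and maps the IDs back to names, assuming that outputs 0 and 1 correspond to the `Bot` and `Real` mapping from the data-prep cell. `ID_TO_LABEL`, `describe_predictions`, and the toy logits are hypothetical; in the notebook the input would be `outputs.logits` moved to the CPU.

```python
import numpy as np

# Assumed to mirror the mapping used when preparing the data: Bot -> 0, Real -> 1.
ID_TO_LABEL = {0: "Bot", 1: "Real"}

def describe_predictions(logits):
    """Return (label name, softmax probability) for each row of logits."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    label_ids = probs.argmax(axis=1)
    return [(ID_TO_LABEL.get(int(i), f"class {int(i)}"), float(p[i]))
            for i, p in zip(label_ids, probs)]

# Hypothetical logits for two comments.
toy_logits = [[2.0, 0.0], [-1.0, 1.0]]
for name, confidence in describe_predictions(toy_logits):
    print(f"{name}: {confidence:.3f}")
```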

## Acknowledgement
This project is based on the [Advanced NLP Course](https://people.cs.umass.edu/~miyyer/cs685/).