## Instructions
- Step 1: Check either the "Blue Team Template" or the "Red Team Template" section, depending on your team's assignment. These templates define the general abstract classes for the neural-text DETECTOR and ATTACKER and show how each one works internally. However, a few functions in these templates have not yet been implemented, such as how the detector detects neural text and how the attacker attacks a detector. This leads to Step 2 below.
- Step 2: Check either the "Example Blue Team" or the "Example Red Team" section, depending on your team's assignment. These sections show how to (1) inherit the parent DETECTOR or ATTACKER template and (2) define the actual mechanism your detector or attacker uses. Study these examples, then define your own "child" class by inheriting the corresponding parent template (RedBlueModelWrapper for the Blue team or RedBlueAttackerWrapper for the Red team).
- Step 3: Check the example code that shows how to initialize an instance (a detector or an attacker) of your own class and validate that your instance is working properly.
- Step 4: Submit your code (including all library imports and your own inherited class definition) to Blackboard as a single Python file or Jupyter notebook file. Please document your code so that I can follow it, and make sure it runs on my computer.
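Step 3 asks you to validate your instance. Beyond the asserts already built into the templates' `__call__` methods, a quick standalone check can catch contract violations before you submit. The following is a minimal sketch, not part of the provided templates: `check_detector` and `check_attacker` are hypothetical helper names, and the lambdas stand in for your own detector and attacker instances.

```python
import numpy as np

def check_detector(detector, texts, tol=1e-3):
    """Hypothetical helper: verify a detector's output contract on sample texts."""
    for text in texts:
        probs = np.asarray(detector(text))
        # Must be a vector of shape (2,): [P(human), P(machine)]
        if probs.shape != (2,):
            raise ValueError(f"expected shape (2,), got {probs.shape} for {text!r}")
        # Probabilities must be non-negative and sum to 1 (within tolerance)
        if np.any(probs < 0) or abs(probs.sum() - 1.0) >= tol:
            raise ValueError(f"invalid probabilities {probs} for {text!r}")
    return True

def check_attacker(attacker, texts):
    """Hypothetical helper: verify an attacker returns a string for each input."""
    for text in texts:
        adv_text = attacker(text)
        if not isinstance(adv_text, str):
            raise TypeError(f"expected str, got {type(adv_text).__name__}")
    return True

# Stand-in callables; replace them with your own detector/attacker instances
dummy_detector = lambda text: np.array([0.5, 0.5])
dummy_attacker = lambda text: text.replace("e", "3")
print(check_detector(dummy_detector, ["hello world"]))  # True
print(check_attacker(dummy_attacker, ["hello world"]))  # True
```

Note that every call made through a real wrapper instance increments its `total_queries`, so you may want to call `reset_query()` afterwards.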

# ADMIN

### Import Library

In [None]:
!pip install datasets
!pip install accelerate -U
!pip install transformers[torch]

In [None]:
!pip install textattack

In [None]:
import numpy as np
import time

### Blue Team Template

In [None]:
class RedBlueModelWrapper:
  def __init__(self):
    # TODO
    # IMPORT ALL IMPORTANT LIBRARIES HERE
    # INITIALIZE AND LOAD YOUR MODEL, TOKENIZER
    # AND OTHER NECESSARY COMPONENTS HERE IF NEEDED
    self.total_queries = 0
    self.TIME_DURATION_ATTEMPTS = 20
    self.query_time = []

  def _query(self, text: str):
    # TODO
    # PREDICTION OF YOUR MODEL HERE ON THE INPUT TEXT ``text''
    # MUST RETURN A PREDICTION PROBABILITY VECTOR IN NUMPY
    # THIS VECTOR SHOULD HAVE SHAPE (2,), e.g., array([0.79184294, 0.20815705])
    # THE FIRST COMPONENT IS THE PROBABILITY THE TEXT IS WRITTEN BY A HUMAN
    # THE SECOND COMPONENT IS THE PROBABILITY THE TEXT IS GENERATED BY A MACHINE
    # THE TWO PROBABILITIES MUST SUM TO 1.0
    raise NotImplementedError

  def __call__(self, text: str):
    if len(self.query_time) < self.TIME_DURATION_ATTEMPTS:
      start_time = time.time()

    probs = self._query(text)
    assert probs.shape == (2,)
    assert np.abs(np.sum(probs) - 1.0) < 0.001

    if len(self.query_time) < self.TIME_DURATION_ATTEMPTS:
      end_time = time.time()
      duration = end_time - start_time
      self.query_time.append(duration)

    self.increase_query()
    return probs

  def increase_query(self):
    self.total_queries += 1

  def reset_query(self):
    self.total_queries = 0  # keep the counter an integer, as in __init__

  def average_time(self):
    return np.mean(self.query_time)
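`_query` must return probabilities, but many classifiers produce raw per-class scores (logits). A minimal sketch, assuming your model emits one logit per class in the order [human, machine], of converting them with a numerically stable softmax; `logits_to_probs` and `score_to_probs` are hypothetical helper names. The second helper covers detectors that output a single machine-likelihood score in [0, 1].

```python
import numpy as np

def logits_to_probs(logits):
    """Hypothetical helper: map 2 class logits [human, machine] to probabilities."""
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)
    if logits.shape != (2,):
        raise ValueError(f"expected 2 logits, got shape {logits.shape}")
    # Subtract the max before exponentiating to avoid overflow (stable softmax)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def score_to_probs(machine_score):
    """Hypothetical helper: turn a single machine-likelihood score into [P(human), P(machine)]."""
    machine_score = float(np.clip(machine_score, 0.0, 1.0))  # guard against out-of-range scores
    return np.array([1.0 - machine_score, machine_score])

example_probs = logits_to_probs([2.0, 0.5])
print(example_probs, example_probs.sum())
```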

### Red Team Template

In [None]:
class RedBlueAttackerWrapper:
  def __init__(self, target_model: RedBlueModelWrapper):
    # TODO
    # IMPORT ALL IMPORTANT LIBRARIES HERE
    # INITIALIZE AND LOAD YOUR ATTACKER
    # AND OTHER NECESSARY COMPONENTS HERE IF NEEDED
    self.target_model = target_model  # the detector being attacked
    self.pairs = []
    self.TIME_DURATION_ATTEMPTS = 20
    self.attack_time = []

  def _query(self, text: str):
    # TODO
    # ATTACK MECHANISM OF YOUR MODEL: ATTACK ``model'' ON ``text''
    # MUST RETURN THE ADVERSARIAL EXAMPLE OF THE TEXT
    raise NotImplementedError

  def __call__(self, text: str):
    if len(self.attack_time) < self.TIME_DURATION_ATTEMPTS:
      start_time = time.time()

    adv_text = self._query(text)
    assert isinstance(adv_text, str)

    if len(self.attack_time) < self.TIME_DURATION_ATTEMPTS:
      end_time = time.time()
      duration = end_time - start_time
      self.attack_time.append(duration)

    self.pairs.append((text, adv_text))

    return adv_text

  def average_time(self):
    return np.mean(self.attack_time)
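An attacker only sees the detector through its `__call__`, which returns [P(human), P(machine)]; the predicted label is the argmax. The sketch below (hypothetical helpers `predict` and `attack_succeeded`, with a toy `digit_detector` standing in for a real detector) shows one way to decide whether an adversarial example actually changed the detector's decision. Each call to a real wrapper counts toward its `total_queries`.

```python
import numpy as np

def predict(detector, text):
    """Return (label, confidence) from a detector returning [P(human), P(machine)]."""
    probs = np.asarray(detector(text))
    label = int(np.argmax(probs))  # 0 = human-written, 1 = machine-generated
    return label, float(probs[label])

def attack_succeeded(detector, original, adversarial):
    """Hypothetical helper: True if the detector's predicted label changed."""
    orig_label, _ = predict(detector, original)
    adv_label, _ = predict(detector, adversarial)
    return orig_label != adv_label

def digit_detector(text):
    # Toy stand-in detector: any digit makes the text look "human" (label 0)
    if any(c.isdigit() for c in text):
        return np.array([0.9, 0.1])
    return np.array([0.2, 0.8])

print(attack_succeeded(digit_detector, "hello world", "h3llo world"))  # True
```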

### Example Blue Team

In [None]:
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# NOTE: train_dataset and validation_dataset are NOT defined in this notebook.
# Build them from your own labeled data (0 = human-written, 1 = machine-generated)
# before running this cell.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # number of training epochs
    per_device_train_batch_size=16,  # batch size for training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,                # log training metrics every 10 steps
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=validation_dataset      # evaluation dataset
)

trainer.train()
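The training cell above assumes `train_dataset` and `validation_dataset` already exist. One option, sketched here under assumptions, is a small map-style dataset that pairs tokenizer outputs with integer labels, in the spirit of the Hugging Face custom-dataset examples. `TextClassificationDataset`, `train_texts`, and `train_labels` are hypothetical names, and the sketch assumes the encodings were produced with `padding=True` so every row has the same length. If your Trainer setup does not convert plain lists to tensors, wrap each value in `torch.tensor` inside `__getitem__`.

```python
class TextClassificationDataset:
    """Hypothetical map-style dataset: tokenized inputs plus integer labels."""

    def __init__(self, encodings, labels):
        # encodings: dict of equal-length lists, e.g. {"input_ids": [...], "attention_mask": [...]}
        # labels: list of ints, 0 = human-written, 1 = machine-generated
        for key, values in encodings.items():
            if len(values) != len(labels):
                raise ValueError(f"{key} has {len(values)} rows but there are {len(labels)} labels")
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]  # sequence-classification models read the "labels" key
        return item

# With a real tokenizer (assumed already loaded as `tokenizer`):
#   enc = tokenizer(train_texts, padding=True, truncation=True, max_length=512)
#   train_dataset = TextClassificationDataset(dict(enc), train_labels)
# Toy pre-tokenized data so the sketch runs on its own:
toy_enc = {"input_ids": [[101, 7592, 102], [101, 2088, 102]],
           "attention_mask": [[1, 1, 1], [1, 1, 1]]}
toy_dataset = TextClassificationDataset(toy_enc, [0, 1])
print(len(toy_dataset), toy_dataset[1])
```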


In [None]:
trainer.evaluate()


In [None]:
# Inherit the main RedBlueModelWrapper class in your own model class.
# EACH TEAM WILL THEN HAVE ITS OWN CHILD CLASS
# E.G., RedBlueModelWrapper_Team0 FOR TEAM #0
# E.G., RedBlueModelWrapper_Team1 FOR TEAM #1
# E.G., RedBlueModelWrapper_Team2 FOR TEAM #2

import torch
import numpy as np
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import RobertaTokenizer, RobertaForSequenceClassification

class RedBlueModelWrapper_Team1(RedBlueModelWrapper):
  def __init__(self):
    # IMPORT ALL IMPORTANT LIBRARIES HERE
    # AND OTHER NECESSARY COMPONENTS HERE IF NEEDED
    super().__init__()  # THIS LINE IS IMPORTANT AND MUST BE PLACED HERE (sets up query counting and timing)

    self.tokenizer_gpt2 = GPT2Tokenizer.from_pretrained("gpt2")
    self.model_gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

    # NOTE: the classification heads of BERT and RoBERTa below are randomly initialized
    # unless you load fine-tuned weights (e.g., from the Trainer run above).
    self.tokenizer_bert = BertTokenizer.from_pretrained("bert-base-uncased")
    self.model_bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    self.tokenizer_roberta = RobertaTokenizer.from_pretrained("roberta-base")
    self.model_roberta = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

  def _query(self, text: str):
    # RETURNS A PREDICTION PROBABILITY VECTOR IN NUMPY WITH SHAPE (2,),
    # e.g., array([0.79184294, 0.20815705]):
    # [0] = PROBABILITY THE TEXT IS WRITTEN BY A HUMAN
    # [1] = PROBABILITY THE TEXT IS GENERATED BY A MACHINE
    # THE TWO PROBABILITIES MUST SUM TO 1.0

    # Tokenize the input separately for each model (each has its own vocabulary)
    inputs_gpt2 = self.tokenizer_gpt2(text, return_tensors="pt", max_length=512, truncation=True)
    inputs_bert = self.tokenizer_bert(text, return_tensors="pt", max_length=512, truncation=True)
    inputs_roberta = self.tokenizer_roberta(text, return_tensors="pt", max_length=512, truncation=True)

    # Run each model without tracking gradients (inference only)
    with torch.no_grad():
        outputs_gpt2 = self.model_gpt2(**inputs_gpt2)
        outputs_bert = self.model_bert(**inputs_bert)
        outputs_roberta = self.model_roberta(**inputs_roberta)

    index_human = 0    # example index for "human-written"
    index_machine = 1  # example index for "machine-generated"

    # BERT / RoBERTa: logits have shape (1, 2); softmax over the two classes gives shape (2,)
    probs_bert = torch.softmax(outputs_bert.logits, dim=-1)[0]
    probs_roberta = torch.softmax(outputs_roberta.logits, dim=-1)[0]

    # GPT-2 (language-model head): logits have shape (1, seq_len, vocab_size).
    # Taking vocabulary ids 0 and 1 at the first position is only a placeholder:
    # these are next-token probabilities, NOT human/machine class probabilities.
    # Replace this with a real signal (e.g., a perplexity-based score) in your own detector.
    probs_gpt2 = torch.softmax(outputs_gpt2.logits[0, 0], dim=-1)[[index_human, index_machine]]

    # Average the three (human, machine) pairs, then renormalize so they sum to 1
    avg_probs = (probs_gpt2 + probs_bert + probs_roberta) / 3
    avg_probs = avg_probs / avg_probs.sum()

    return avg_probs.numpy()
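The example detector combines its three models by averaging their (human, machine) pairs and renormalizing. The same idea in plain NumPy, with optional per-model weights, is sketched below; `ensemble_probs` and the `weights` parameter are hypothetical additions, useful if one model proves more reliable than the others on your validation data.

```python
import numpy as np

def ensemble_probs(prob_vectors, weights=None):
    """Hypothetical helper: (weighted) average of [P(human), P(machine)] vectors."""
    probs = np.asarray(prob_vectors, dtype=np.float64)  # shape (n_models, 2)
    if probs.ndim != 2 or probs.shape[1] != 2:
        raise ValueError(f"expected shape (n_models, 2), got {probs.shape}")
    if weights is None:
        weights = np.ones(len(probs))  # equal weights, as in the example detector
    weights = np.asarray(weights, dtype=np.float64)
    avg = weights @ probs / weights.sum()  # weighted mean over models, shape (2,)
    return avg / avg.sum()                 # renormalize so the entries sum to 1

print(ensemble_probs([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]))
```

With equal weights this reduces to the plain average used in `_query` above.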

In [None]:
#initialize the model
detector_team1 = RedBlueModelWrapper_Team1()

test_examples = [
    "the president biden is in the usa",
    "the president biden is in vietnam",
    "the president biden is in brazill",
    "Donald Trump won in Georgia",
    "Don@ld Trump won in Georgia",
    "The President Joe Biden is in the USA"
]

# inference or query the model
for text in test_examples:
  probs = detector_team1(text)
  print(text, probs)  # print each input alongside its [P(human), P(machine)] vector

# check the total number of queries made to the model
print("total queries=", detector_team1.total_queries)

# check the average query time
print("avg query time=", detector_team1.average_time())

### Example Red Team

In [None]:
# Inherit the main RedBlueAttackerWrapper class in your own attacker class.
# EACH TEAM WILL THEN HAVE ITS OWN CHILD CLASS
# E.G., RedBlueAttackerWrapper_Team0 FOR TEAM #0
# E.G., RedBlueAttackerWrapper_Team1 FOR TEAM #1
# E.G., RedBlueAttackerWrapper_Team2 FOR TEAM #2

class RedBlueAttackerWrapper_Team0(RedBlueAttackerWrapper):
  def __init__(self, target_model: RedBlueModelWrapper):
    super(RedBlueAttackerWrapper_Team0, self).__init__(target_model) # THIS LINE IS IMPORTANT AND MUST BE PLACED HERE

    self.target_model = target_model
    self.transformation = {
        'e': '3',
        'o': '0',
        'a': '@',
        'b': '6',
        'l': '1'
    }
    self.prob = 0.3

  def _query(self, text: str):
    # A simple attack: randomly perturb characters one at a time (cumulatively)
    # and stop as soon as the detector's prediction changes.
    # You can use more complex frameworks such as TextAttack or OpenAttack.

    text2 = list(text)
    adv_text = text  # fall back to the original text if no character gets perturbed
    org_prob = self.target_model(text)
    org_pred = np.argmax(org_prob)

    for i, char in enumerate(text2):
      # perturb a substitutable character with probability self.prob
      if char in self.transformation and np.random.choice([False, True], p=[1-self.prob, self.prob]):
        text2[i] = self.transformation[char]
        adv_text = "".join(text2)
        new_pred = np.argmax(self.target_model(adv_text))

        if new_pred != org_pred:
          break  # stop when the prediction changes

    return adv_text
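The example attack flips characters at random and stops at the first label change, so two runs on the same text can give different results. A deterministic alternative is a greedy search: try each substitution in turn, keep it only if it lowers the detector's confidence in the original label, and stop when the label flips or a query budget runs out. This is a minimal sketch assuming only black-box access to a detector returning [P(human), P(machine)]; `greedy_substitution_attack`, `max_queries`, and `toy_detector` are hypothetical.

```python
import numpy as np

def greedy_substitution_attack(detector, text, substitutions, max_queries=100):
    """Greedily keep character substitutions that reduce confidence in the original label."""
    orig_probs = np.asarray(detector(text))
    orig_label = int(np.argmax(orig_probs))
    best_conf = float(orig_probs[orig_label])  # current confidence in the original label
    chars = list(text)
    queries = 1

    for i, char in enumerate(chars):
        if char not in substitutions:
            continue
        if queries >= max_queries:
            break                      # query budget exhausted
        candidate = chars.copy()
        candidate[i] = substitutions[char]
        probs = np.asarray(detector("".join(candidate)))
        queries += 1
        conf = float(probs[orig_label])
        if conf < best_conf:           # keep only substitutions that help
            chars, best_conf = candidate, conf
            if int(np.argmax(probs)) != orig_label:
                break                  # label flipped: attack succeeded
    return "".join(chars)

def toy_detector(text):
    # Toy stand-in: each digit lowers P(machine) by 0.2
    digits = sum(c.isdigit() for c in text)
    p_machine = max(0.0, 0.8 - 0.2 * digits)
    return np.array([1.0 - p_machine, p_machine])

leet = {"e": "3", "o": "0", "l": "1"}
adv = greedy_substitution_attack(toy_detector, "hello world", leet)
print(adv)  # h31lo world
```

On the toy detector, two substitutions suffice and the attack uses three queries in total. A real detector's confidence will not move this predictably, so the budget matters more there.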

In [None]:
attacker_team0 = RedBlueAttackerWrapper_Team0(detector_team1)  # attack the example Blue Team detector defined above

attacker_team0("hello this is the president biden")

# list all attacked texts
for pair in attacker_team0.pairs:
  print(pair)

# average attack time (average_time() already averages over the timed attacks)
print("average attack time=", attacker_team0.average_time())

### Example Competition

In [None]:
from sklearn.metrics import classification_report

In [None]:
competition_texts = [
    ("Live updates: They want him to make the ex-president pay a total of $3,000 for social media posts that allegedly violated his order", 0),
    ("Israeli war cabinet reviewed military plans for a potential response against Iran, but it’s unclear if there was a decision", 0),
    ("Biden condemns Iran’s attack in a call with King Abdullah of Jordan", 0),
    ("Biden tells Netanyahu US will not participate in any counter-strike against Iran", 0),
    ("Biden Calls for National Unity in Speech Addressing Recent Social Challenges", 1),
    ("President Biden Signs Executive Order Increasing Federal Support for Mental Health Services",  1),
    ("Biden's Education Initiative Seeks to Increase Funding for Public Schools and Teacher Salaries",  1),
    ("White House Confirms Biden's Upcoming Visit to Asia to Strengthen Alliances and Trade Agreements",  1),
    ("President Biden Advocates for Stricter Gun Control Measures in Wake of Recent Shootings",  1),
    ("Biden Administration Announces Breakthrough in Bi-partisan Infrastructure Deal", 1)
]

texts = [a[0] for a in competition_texts]
labels = [a[1] for a in competition_texts]

# The example Blue Team detector (Team 1) and example Red Team attacker (Team 0) from above
detector = RedBlueModelWrapper_Team1()
attacker = RedBlueAttackerWrapper_Team0(detector)

# Detector performance on the clean texts
preds = []
for text in texts:
  pred = detector(text)
  preds.append(np.argmax(pred))
print(classification_report(labels, preds))

# Detector performance on the attacked texts
atk_preds = []
for text in texts:
  adv_text = attacker(text)
  adv_pred = detector(adv_text)  # query the detector on the attacked text, not the original
  atk_preds.append(np.argmax(adv_pred))
print(classification_report(labels, atk_preds))
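`classification_report` summarizes the detector's performance on the clean and attacked texts separately. A complementary number for the Red team is an attack success rate: among texts the detector originally classified correctly, the fraction whose prediction changed after the attack. This definition is an assumption, not a metric the competition specifies. The sketch below uses a hypothetical `attack_success_rate` helper on toy arrays; with the cell above you could call it as `attack_success_rate(labels, preds, atk_preds)`.

```python
import numpy as np

def attack_success_rate(labels, clean_preds, adv_preds):
    """Hypothetical metric: share of correctly classified inputs whose prediction flipped."""
    labels, clean_preds, adv_preds = map(np.asarray, (labels, clean_preds, adv_preds))
    correct = clean_preds == labels          # only count examples the detector got right
    if not correct.any():
        return float("nan")                  # undefined when nothing was classified correctly
    flipped = adv_preds != clean_preds
    return float((correct & flipped).sum() / correct.sum())

toy_labels = [0, 0, 1, 1, 1]
toy_clean  = [0, 1, 1, 1, 0]   # 3 correct (indices 0, 2, 3)
toy_adv    = [1, 1, 0, 1, 0]   # of those, indices 0 and 2 flipped
print(attack_success_rate(toy_labels, toy_clean, toy_adv))  # 0.6666666666666666
```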

# PARTICIPANTS

## Blue Teams

### Team 1

### Team 7

### Team 8

### Team 9

### Team 10

## Red Teams

### Team 2

### Team 3

### Team 4

### Team 5

### Team 6