# Text De-Toxification, part III: Building Solution
### Robert Chen, B20-AI
--------------------

## Step 0: Imports

In [8]:
from transformers import BertTokenizer, BertForMaskedLM
import os
import sys
from pathlib import Path
# Small crutch to make Jupyter see the source folder
if not sys.path.count(str(Path(os.path.realpath("")).parent)):
    sys.path.append(str(Path(os.path.realpath("")).parent))
from src.models.condbert.chooser import SimilarityChooser
from src.models.condbert.predictor import MaskedTokenPredictorBERT
from src.models.condbert.rewriter import CondBERTRewriter
import torch
import pickle
import numpy as np
import string
from tqdm import tqdm

## Step 1: Baseline

The [original paper](https://arxiv.org/pdf/2109.08914.pdf) uses two approaches for the *text detoxification* task: **condBERT** and **ParaGeDi**. **ParaGeDi** generates completely new text based on the input, while **condBERT** identifies every word with high toxicity score and substitutes it with less toxic synonyms. Both models performed far better than the rest of fine-tuned language models. We will use the **condBERT** for this task.

First, we need to set up the model:

In [9]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)
model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BertForMaskedLM(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_a

Now, we set up the vocabulary:

In [10]:
negative_words, positive_words = [], []
with open('../data/interim/condbert_vocab/negative_words.txt', 'r') as f:
    tmp = f.readlines()
negative_words = list(map(lambda x: x[:-1], tmp))
with open('../data/interim/condbert_vocab/toxic_words.txt', 'r') as f:
    tmp = f.readlines()
negative_words += list(map(lambda x: x[:-1], tmp))
with open('../data/interim/condbert_vocab/positive_words.txt', 'r') as f:
    tmp = f.readlines()
positive_words = list(map(lambda x: x[:-1], tmp))

In [11]:
with open('../data/interim/condbert_vocab/word2coef.pkl', 'rb') as f:
    word2coef = pickle.load(f)

In [12]:
token_toxicities = []
with open('../data/interim/condbert_vocab/token_toxicities.txt', 'r') as f:
    for line in f.readlines():
        token_toxicities.append(float(line))
token_toxicities = np.array(token_toxicities)
token_toxicities = np.maximum(0, np.log(1 / (1 / token_toxicities - 1)))

for token in string.punctuation:
    token_toxicities[tokenizer.encode(token)][1] = 3
token_toxicities[tokenizer.encode('you')][1] = 0

Assembling the model:

In [13]:
predictor = MaskedTokenPredictorBERT(model, tokenizer, max_len=250, target=0, device=device)
chooser = SimilarityChooser(coeff=10, tokenizer=tokenizer)
rewriter = CondBERTRewriter(model, tokenizer, device, 
                            negative_words, positive_words, 
                            word2coef, token_toxicities, predictor)
def adjust_logits(logits, target=0):
    return logits - rewriter.tox_scores.cpu().numpy() * (1 - 2 * target) * 10
 
predictor.logits_postprocessor = adjust_logits

Test run:

In [14]:
with open('../data/interim/condbert_vocab/test/test_10k_toxic', 'r') as f:
    lines = f.readlines()[:10]
    for i, line in enumerate(lines):
        x = line.strip()
        y = rewriter.replace(x, verbose=False, chooser=chooser, n_top=10, n_tokens=(1, 2, 3), n_units=3)
        print(y)

you are a wayward little snob .
as hyperactive and hyperactive as his boss
a non - nonpartisan society does odd little things and votes for non - partisan politicians .
a president who is an anomaly .
colbert is a real indignated young man , and is a hard working man .
brave , brave , brave , brave , brave , brave , brave , brave , brave .
i hate you , i think you are an honest man , and youre contributions are so marvellous .
luna is a cute little , cute cutest girl with no brains , smells bad , is po , and is a pimple to the worlds ' very core !
mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons are crazy mormons !
youre a very bad illogical , very bad illogical , busybody piece of douche .


## Step 2: Tuning the similarity coefficient

There are some parameters of the model that can be tuned in order to achieve better substitution. The first argument we will be trying to tune is the similarity coefficient in the `SimilarityChooser`. Firstly, let us choose the small subset from the test set and analyze the effects of changing the parameter:

In [15]:
import random
with open('../data/interim/condbert_vocab/test/test_10k_toxic', 'r') as f:
    lines = f.readlines()
    
random.seed(42)
subset = random.choices(lines, k=2)
subset

['this car is so gay that it wont give you rides if your mexican !\n',
 'fuck those gold digging bitchs .\n']

Now, let us iterate over some coefficient values and look at the results.

In [16]:
coeffs = [t for t in range(0, 110, 10)]
for i in coeffs:
    chooser.coeff = i
    results = [rewriter.replace(x, chooser=chooser, n_tokens=(1, 2, 3), n_top=10, n_units=3, verbose=False) for x in subset]
    print(f'{chooser.coeff=}:')
    print('\n'.join(results))
    print('===========================')

chooser.coeff=0:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
chooser.coeff=10:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
chooser.coeff=20:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
chooser.coeff=30:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
chooser.coeff=40:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
chooser.coeff=50:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
chooser.coeff=60:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
chooser.coeff=70:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
chooser.c

The responses are still not very similar to the original statements. Most probably because of the coefficient in `adjust_logits` function which makes the model to avoid toxic words. Let us try fine-tuning this one too.

## Step 3: Tuning the toxicity penalty

Let us iterate over the same subset with different toxicity penalties (resetting the similarity coefficient to default one too):

In [17]:
penalty = 0
chooser.coeff = 10

def adjust_logits(logits, target=0):
    global penalty
    return logits - rewriter.tox_scores.cpu().numpy() * penalty

for pen in range(0, 10):
    penalty = pen
    predictor.logits_postprocessor = adjust_logits
    results = [rewriter.replace(x, chooser=chooser, n_tokens=(1, 2, 3), n_top=10, n_units=3, verbose=False) for x in subset]
    print(f'{penalty=}:')
    print('\n'.join(results))
    print('===========================')

penalty=0:
this car is so gay that it wont give you rides if your mexican !
one of those gold digging bitch .
penalty=1:
this car is so gay that it wont give you rides if your mexican !
one of those gold digging bitch .
penalty=2:
this car is so gay that it wont give you rides if your mexican !
one of those gold digging women .
penalty=3:
this car is so gay that it wont give you rides if your mexican !
one of those gold digging women .
penalty=4:
this car is so gay that it wont give you rides if your mexican !
one of those gold digging women .
penalty=5:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
penalty=6:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
penalty=7:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
penalty=8:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging pupp

## Step 4: Creating more tokens, re-ranking more hypotheses

Some of the parameters like `n_tokens` and `n_top` also affect the generation process. `n_tokens` sets the possible number of tokens that can be generated from one word, `n_top` sets the amount of BERT hypotheses that are re-ranked each time. Let us try to see the effect of changing these parameters and try to balance out all of the values.

In [18]:
n_tokens_batch = [(1, ), (2, ), (3, ), (1, 2), (2, 3), (3, 4), (1, 2, 3, 4), (1, 2, 3, 4, 5)]
n_top_batch = [t for t in range(5, 20, 2)]

In [19]:
for n_tokens in n_tokens_batch:
    results = [rewriter.replace(x, chooser=chooser, n_tokens=n_tokens, n_top=10, n_units=3, verbose=False) for x in subset]
    print(f'{n_tokens=}:')
    print('\n'.join(results))
    print('=================')

n_tokens=(1,):
this car is so queer that it wont give you rides if your mexican !
crud those gold digging witch .
n_tokens=(2,):
this car is so much bigger that it wont give you rides if you go out there !
oops , all of those gold digging frolic days .
n_tokens=(3,):
this car is so high - tech that it wont give you rides if you are a good enough driver !
i know all of them are gold digging douche gold digging .
n_tokens=(1, 2):
this car is so queer that it wont give you rides if your mexican !
oops all those gold diggings again .
n_tokens=(2, 3):
this car is so high - tech that it wont give you rides if you turn it on !
one of those other gold diggings over there .
n_tokens=(3, 4):
this car is so unconcerned that it wont give you rides if you are a very good driver !
i know all of them are gold digging douche gold digging .
n_tokens=(1, 2, 3, 4):
this car is so unconcerned that it wont give you rides if you want to !
one of those gold diggings , anyways .
n_tokens=(1, 2, 3, 4, 5):
this

In [20]:
for n_top in n_top_batch:
    results = [rewriter.replace(x, chooser=chooser, n_tokens=(1, 2, 3), n_top=n_top, n_units=3, verbose=False) for x in subset]
    print(f'{n_top=}:')
    print('\n'.join(results))
    print('=================')

n_top=5:
this car is so queer that it wont give you rides if your mexican !
not like those gold diggings here .
n_top=7:
this car is so queer that it wont give you rides if your mexican !
i knew all those gold digging invoice .
n_top=9:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
n_top=11:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
n_top=13:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
n_top=15:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
n_top=17:
this car is so queer that it wont give you rides if your mexican !
one of those gold digging puppies .
n_top=19:
this car is so queer that it wont give you rides if your mexican !
oops all those gold digging around here .


## 