This notebook reproduces the creation of the CondBERT vocabularies.

The files `positive-words.txt`, `negative-words.txt` and `toxic_words.txt` are not reproduced exactly because of our internal issues. 

However, all other files (`token_toxicities.txt` and `word2coef.pkl`) are reproduced accurately.

# 0. Prerequisites

In [1]:
VOCAB_DIRNAME = 'vocabularies' 

In [2]:
from condbert import CondBertRewriter
from choosers import EmbeddingSimilarityChooser
from masked_token_predictor_bert import MaskedTokenPredictorBert

# 1. Loading BERT

In [3]:
import os
import pickle

import numpy as np
import pandas as pd
import torch
from tqdm.auto import tqdm, trange
from transformers import BertTokenizer, BertForMaskedLM

In [4]:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [5]:
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)

In [6]:
model = BertForMaskedLM.from_pretrained(model_name)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [7]:
model.to(device);

# 2. Preparing the vocabularies

This section produces the following files:

- negative-words.txt
- positive-words.txt
- word2coef.pkl
- token_toxicities.txt

These files need to be prepared only once.

In [12]:
tox_corpus_path = '../../data/train/train_toxic'
norm_corpus_path = '../../data/train/train_normal'

## 2.1 Preparing the DRG-like vocabularies

In [35]:
import os
import numpy as np
from tqdm import tqdm
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer


class NgramSalienceCalculator():
    """Smoothed count ratio of a word (or n-gram) between the toxic and the normal corpus."""

    def __init__(self, tox_corpus, norm_corpus, use_ngrams=False):
        ngram_range = (1, 3) if use_ngrams else (1, 1)
        self.vectorizer = CountVectorizer(ngram_range=ngram_range)

        # Fit on the toxic corpus and store its vocabulary and per-feature total counts.
        tox_count_matrix = self.vectorizer.fit_transform(tox_corpus)
        self.tox_vocab = self.vectorizer.vocabulary_
        self.tox_counts = np.sum(tox_count_matrix, axis=0)

        # Refit on the normal corpus; fit_transform builds a new vocabulary_ dict,
        # so self.tox_vocab keeps pointing to the toxic vocabulary.
        norm_count_matrix = self.vectorizer.fit_transform(norm_corpus)
        self.norm_vocab = self.vectorizer.vocabulary_
        self.norm_counts = np.sum(norm_count_matrix, axis=0)

    def salience(self, feature, attribute='tox', lmbda=0.5):
        assert attribute in ['tox', 'norm']
        if feature not in self.tox_vocab:
            tox_count = 0.0
        else:
            tox_count = self.tox_counts[0, self.tox_vocab[feature]]

        if feature not in self.norm_vocab:
            norm_count = 0.0
        else:
            norm_count = self.norm_counts[0, self.norm_vocab[feature]]

        if attribute == 'tox':
            return (tox_count + lmbda) / (norm_count + lmbda)
        else:
            return (norm_count + lmbda) / (tox_count + lmbda)
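
To make the smoothed ratio in `salience` concrete, here is a minimal sketch that applies the same formula to a few made-up counts. The helper name `smoothed_salience` and the counts are hypothetical; only the formula, the default `lmbda=0.5`, and the threshold of 4 used later in this notebook come from the notebook itself.

```python
def smoothed_salience(tox_count, norm_count, lmbda=0.5):
    """Salience towards the toxic class: ratio of smoothed counts."""
    return (tox_count + lmbda) / (norm_count + lmbda)


threshold = 4  # same cut-off as the one used for the word lists

# hypothetical counts: (occurrences in toxic corpus, occurrences in normal corpus)
examples = {"seen_once": (1, 0), "seen_twice": (2, 0), "frequent": (9, 1), "neutral": (5, 5)}
for name, (tox, norm) in examples.items():
    s = smoothed_salience(tox, norm)
    print(name, round(s, 2), s > threshold)
```

With these settings, a word that never occurs in the normal corpus needs at least two toxic occurrences to pass the threshold (3.0 for one occurrence, 5.0 for two).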


In [25]:
df = pd.read_csv('filtered.tsv', sep='\t')
df = df.drop(['Unnamed: 0'], axis=1)

texts = df["reference"].iloc[:100000].tolist()
labels = (df["ref_tox"].iloc[:100000] >= 0.5).astype(int).tolist()
toxic_sentence = []
nontoxic_sentence = []

# Split the sentences by label (1 = toxic, 0 = non-toxic).
for text, label in zip(texts, labels):
    if label == 0:
        nontoxic_sentence.append(text)
    else:
        toxic_sentence.append(text)

In [26]:
nontoxic_sentence[0], toxic_sentence[0]

('If Alkar is flooding her with psychic waste, that explains the high level of neurotransmitters.',
 "I'm not gonna have a child... ...with the same genetic disorder as me who's gonna die. L...")

In [27]:
from collections import Counter
c = Counter()

for sentence_list in [nontoxic_sentence, toxic_sentence]:
    for sentence in sentence_list:
        for tok in sentence.strip().split():
            c[tok] += 1

print(len(c))

89028


In [28]:
vocab = {w for w, count in c.most_common() if count > 0}  # keeping only words with > 1 occurrences would make the vocabulary about 2x smaller, but we can live with this size
print(len(vocab))

89028


In [29]:
corpus_tox = [' '.join([w if w in vocab else '<unk>' for w in sentence.split()]) for sentence in toxic_sentence]
corpus_norm = [' '.join([w if w in vocab else '<unk>' for w in sentence.split()]) for sentence in nontoxic_sentence]

In [30]:
corpus_norm[0], corpus_tox[0]

('If Alkar is flooding her with psychic waste, that explains the high level of neurotransmitters.',
 "I'm not gonna have a child... ...with the same genetic disorder as me who's gonna die. L...")

In [31]:
neg_out_name = 'negative-words.txt'
pos_out_name = 'positive-words.txt'

In [32]:
neg_out_list = []
pos_out_list = []

In [33]:
threshold = 4

In [37]:
sc = NgramSalienceCalculator(corpus_tox, corpus_norm, False)

with open(neg_out_name, 'w', encoding='utf-8') as neg_out, open(pos_out_name, 'w', encoding='utf-8') as pos_out:
    # The union is a set, so every word is visited exactly once.
    for gram in set(sc.tox_vocab.keys()).union(set(sc.norm_vocab.keys())):
        toxic_salience = sc.salience(gram, attribute='tox')
        polite_salience = sc.salience(gram, attribute='norm')
        if toxic_salience > threshold:
            neg_out.write(f'{gram}\n')
            neg_out_list.append(gram)
        elif polite_salience > threshold:
            pos_out.write(f'{gram}\n')
            pos_out_list.append(gram)

In [38]:
neg_out_list[0], pos_out_list[0]

('bugfuck', 'marbles')

## 2.2 Evaluating word toxicities with a logistic regression

In [39]:
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

In [40]:
X_train = corpus_tox + corpus_norm
y_train = [1] * len(corpus_tox) + [0] * len(corpus_norm)
pipe.fit(X_train, y_train);

In [41]:
coefs = pipe[1].coef_[0]
coefs.shape

(38837,)

In [42]:
word2coef = {w: coefs[idx] for w, idx in pipe[0].vocabulary_.items()}

In [43]:
len(word2coef)

38837
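
The mapping from words to logistic-regression coefficients can be illustrated on a toy corpus. This is a small sketch, not part of the original pipeline: the sentences and the `toy_*` names are made up, while the pipeline construction mirrors the cell above. Note that `CountVectorizer` lowercases the text and, with its default token pattern, keeps only tokens of two or more word characters, which likely explains why the vocabulary here is smaller than the whitespace-token vocabulary counted earlier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy corpora, made up for illustration only
toy_tox = ["you idiot", "stupid idiot"]
toy_norm = ["thank you", "thank you friend"]

toy_pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
toy_pipe.fit(toy_tox + toy_norm, [1] * len(toy_tox) + [0] * len(toy_norm))

# vocabulary_ maps each word to its column index, which is also its index in coef_
toy_coefs = toy_pipe[1].coef_[0]
toy_word2coef = {w: toy_coefs[idx] for w, idx in toy_pipe[0].vocabulary_.items()}

# words sorted from most "normal" to most "toxic"
for w, coef in sorted(toy_word2coef.items(), key=lambda kv: kv[1]):
    print(f"{w:>8} {coef:+.3f}")
```

Words that occur only in toxic sentences receive positive coefficients, and words that occur only in normal sentences receive negative ones.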

In [44]:
import pickle
with open('word2coef.pkl', 'wb') as f:
    pickle.dump(word2coef, f)

## 2.3 Labelling BERT tokens by toxicity

In [45]:
from collections import defaultdict
toxic_counter = defaultdict(lambda: 1)
nontoxic_counter = defaultdict(lambda: 1)

for text in tqdm(corpus_tox):
    for token in tokenizer.encode(text):
        toxic_counter[token] += 1
for text in tqdm(corpus_norm):
    for token in tokenizer.encode(text):
        nontoxic_counter[token] += 1

100%|██████████████████████████████████████████████████████████████████████████| 55479/55479 [00:15<00:00, 3631.32it/s]
100%|██████████████████████████████████████████████████████████████████████████| 44521/44521 [00:11<00:00, 3807.34it/s]


In [46]:
token_toxicities = [toxic_counter[i] / (nontoxic_counter[i] + toxic_counter[i]) for i in range(len(tokenizer.vocab))]
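
The effect of starting both counters at 1 can be checked on a tiny example. The token ids and counts below are hypothetical; the counting and the ratio follow the cells above.

```python
from collections import defaultdict

# counters start at 1, which acts as add-one smoothing
toy_toxic = defaultdict(lambda: 1)
toy_nontoxic = defaultdict(lambda: 1)

# hypothetical token ids observed in each corpus
for token in [7, 7, 7, 8]:
    toy_toxic[token] += 1
for token in [8, 8, 8]:
    toy_nontoxic[token] += 1

# token 9 was never observed; reading it inserts the default value 1 into both counters
toy_scores = {i: toy_toxic[i] / (toy_nontoxic[i] + toy_toxic[i]) for i in [7, 8, 9]}
print(toy_scores)
print(len(toy_toxic))
```

An unseen token gets a score of exactly 0.5. Because reading a missing key from a `defaultdict` inserts it, the counter ends up with one entry per id that was looked up, not only per id that was observed.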

In [47]:
len(toxic_counter)

30522

In [48]:
with open('token_toxicities.txt', 'w') as f:
    for t in token_toxicities:
        f.write(str(t))
        f.write('\n')

# 3. Setting up the model

## 3.1 Loading the vocabularies

In [49]:
# The files were written as UTF-8; strip only the trailing newline of each line.
with open("negative-words.txt", "r", encoding="utf-8") as f:
    negative_words = [line.rstrip("\n") for line in f]

with open("positive-words.txt", "r", encoding="utf-8") as f:
    positive_words = [line.rstrip("\n") for line in f]

In [51]:
import pickle
with open('word2coef.pkl', 'rb') as f:
    word2coef = pickle.load(f)

In [52]:
token_toxicities = []
with open('token_toxicities.txt', 'r') as f:
    for line in f.readlines():
        token_toxicities.append(float(line))
token_toxicities = np.array(token_toxicities)
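token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1)))   # log-odds of toxicity, clipped at 0

# discourage meaningless tokens
# NOTE: tokenizer.encode(tok) returns a list of ids, so token_toxicities[...] below is a copy
# and these assignments leave token_toxicities itself unchanged.
for tok in ['.', ',', '-']:
    token_toxicities[tokenizer.encode(tok)][1] = 3

for tok in ['you']:
    token_toxicities[tokenizer.encode(tok)][1] = 0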
token_toxicities

array([0., 0., 0., ..., 0., 0., 0.])
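
A note on the override loops in the cell above: indexing a NumPy array with a list of ids (which is what `tokenizer.encode(tok)` returns) produces a copy, so assigning into element `[1]` of the result does not touch the original array. The sketch below shows the difference on a small array; the names `scores` and `ids` are hypothetical, and the second form is only a suggestion for the case where the intent is to override the score of the token itself.

```python
import numpy as np

scores = np.zeros(5)
ids = [0, 3, 4]          # stands in for tokenizer.encode(tok): [CLS], token, [SEP]

scores[ids][1] = 3       # scores[ids] is a copy, so `scores` is not modified
unchanged = scores.copy()

scores[ids[1]] = 3       # pick the token id first, then assign in place
print(unchanged)
print(scores)
```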

In [53]:
token_toxicities[1000:1100]

array([0.0475659 , 0.        , 0.87924946, 0.42285685, 0.47000363,
       0.27707757, 0.        , 0.        , 0.        , 1.09861229,
       0.191522  , 0.18870436, 0.21695686, 0.        , 0.        ,
       0.        , 0.61310447, 0.        , 0.        , 0.73396918,
       0.58778666, 0.20763936, 0.        , 0.38299225, 0.        ,
       0.072949  , 0.        , 0.        , 0.        , 0.33496757,
       0.        , 0.33647224, 0.        , 0.15415068, 0.        ,
       0.        , 0.93609336, 0.24173638, 0.        , 0.09097178,
       0.31242508, 0.09844007, 0.14058195, 0.14660347, 0.        ,
       0.27835729, 0.36772478, 0.06453852, 0.        , 0.2868547 ,
       0.5389965 , 0.09217046, 0.76913309, 0.76214005, 0.67116827,
       0.19394316, 0.39147294, 0.        , 0.        , 0.15684247,
       0.        , 0.32423967, 0.06899287, 0.        , 0.        ,
       0.        , 0.43078292, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 1.09861

In [54]:
for i in range(len(token_toxicities)):
    if token_toxicities[i] != 0:
        print(token_toxicities[i], i)

0.2200391464184931 101
0.2200391464184931 102
0.40559597246744283 999
0.047565904081651345 1000
0.8792494601938062 1002
0.4228568508200333 1003
0.47000362924573563 1004
0.27707756721028115 1005
1.09861228866811 1009
0.1915220044202709 1010
0.18870436318139963 1011
0.2169568553505856 1012
0.6131044728864091 1016
0.7339691750802005 1019
0.5877866649021194 1020
0.20763936477824455 1021
0.382992252256106 1023
0.07294899964616748 1025
0.3349675655885445 1029
0.336472236621213 1031
0.15415067982725816 1033
0.9360933591703349 1036
0.24173638439082565 1037
0.09097177820572659 1039
0.3124250768383784 1040
0.09844007281325251 1041
0.1405819506211894 1042
0.14660347419187544 1043
0.27835729440264667 1045
0.36772478012531734 1046
0.06453852113757116 1047
0.28685469813592257 1049
0.5389965007326867 1050
0.0921704597996571 1051
0.7691330875378674 1052
0.7621400520468965 1053
0.6711682738411702 1054
0.19394315964805564 1055
0.39147293579708525 1056
0.15684247149296987 1059
0.32423966818557876 1061
0.

In [55]:
def adjust_logits(logits, label=0):
    return logits - token_toxicities * 100 * (1 - 2 * label)

predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, logits_postprocessor=adjust_logits)

editor = CondBertRewriter(
    model=model,
    tokenizer=tokenizer,
    device=device,
    neg_words=negative_words,
    pos_words=positive_words,
    word2coef=word2coef,
    token_toxicities=token_toxicities,
    predictor=predictor,
)
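
To see what `adjust_logits` does to the predictions, here is a small sketch with made-up numbers. The arrays and the `toy_*` names are hypothetical; the formula is the one from the cell above.

```python
import numpy as np

toy_toxicities = np.array([0.0, 0.5, 1.1])   # hypothetical per-token penalties
toy_logits = np.array([2.0, 3.0, 4.0])       # hypothetical BERT logits


def toy_adjust_logits(logits, label=0):
    # label=0 subtracts the penalty, label=1 flips the sign and adds it
    return logits - toy_toxicities * 100 * (1 - 2 * label)


to_neutral = toy_adjust_logits(toy_logits, label=0)
to_toxic = toy_adjust_logits(toy_logits, label=1)
print(to_neutral, to_neutral.argmax())
print(to_toxic, to_toxic.argmax())
```

With `label=0`, the token with zero toxicity becomes the top prediction even though its raw logit was the lowest; with `label=1`, the most toxic token is favoured instead.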

The model below is used for reranking BERT hypotheses; it helps to increase semantic similarity by choosing the hypotheses whose embeddings are similar to those of the original words.

In [None]:
chooser = EmbeddingSimilarityChooser(sim_coef=10, tokenizer=tokenizer)

# 4. Finally, the inference

Parallel application of the model to all tokens: fast, but dirty.

In [57]:
bimba = texts[20:30]
for sentence in bimba:
    print(sentence)
    print(editor.translate(sentence, prnt=False))

Fatuous Xerxes blithely surrendered control to the computer grid, ordering it to take care of whatever troubles might arise.
fatuous xerxes blithely surrendered control to the computer grid , directive it to take care of whatever troubles might arise .
That night, Li'l Dice satisfied his thirst to kill, though he knew Shaggy would never forgive him.
that night , li ' l dice satisfied his thirst to kills , though he knew shaggy would never forgive him .
Real life starts the first time you fuck, kid.
real life starts the first time you screw , kid .
I think you are the strangest man I've ever met.
i think you are the strangest man i ' ve ever met .
I say creepy, I mean, it's totally batshit crazy I can even talk to you.
i say creepy , i mean , it ' s totally batshit crazy i can even talk to you .
Shit, this one I can't even pronounce.
. , this one i can ' t even pronounce .
I like that shit.
i like that stuff .
Trying to keep me fucking drugged so I don't know what's going on.
trying to 

In [58]:
# Assumption: `df` (loaded from filtered.tsv above) also has a 'translation' column.
bimba1 = list(df['reference'][0:10])
bimba2 = list(df['translation'][0:10])
bimba_pred = []
bimba1

Application of the model to all the tokens sequentially, in the multiword mode. 

Parameters that could be tuned:
* The coefficient in `adjust_logits`: the larger it is, the more the model avoids toxic words
* The coefficient in `EmbeddingSimilarityChooser` (`sim_coef`): the larger it is, the more the model tries to preserve content
* `n_tokens`: how many words can be generated from one
* `n_top`: how many BERT hypotheses are reranked
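
The trade-off controlled by the similarity coefficient can be illustrated with a toy reranker. This is not the actual `EmbeddingSimilarityChooser` code: the functions `cosine` and `rerank`, the vectors, and the scores are all hypothetical, and the sketch simply assumes that each hypothesis is ranked by its language-model score plus the coefficient times its cosine similarity to the original word.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rerank(hypotheses, original_vec, sim_coef):
    """Sort (word, lm_score, vector) triples by lm_score + sim_coef * cosine similarity."""
    return sorted(
        hypotheses,
        key=lambda h: h[1] + sim_coef * cosine(h[2], original_vec),
        reverse=True,
    )


original_vec = [1.0, 0.0]
hypotheses = [
    ("generic", -1.0, [0.0, 1.0]),  # likely under the LM, unrelated to the original
    ("close", -3.0, [1.0, 0.0]),    # less likely, but same direction as the original
]

without_sim = [h[0] for h in rerank(hypotheses, original_vec, sim_coef=0)]
with_sim = [h[0] for h in rerank(hypotheses, original_vec, sim_coef=10)]
print(without_sim)
print(with_sim)
```

With a coefficient of 0 the ranking follows the language-model score alone; with a large coefficient the hypothesis closest to the original word wins, which is the sense in which a larger value preserves more content.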