This notebook reproduces creation of CondBERT vocabulary.

The files `positive-words.txt`, `negative-words.txt` and `toxic_words.txt` are not reproduced exactly because of our internal issues. 

However, all other files (`token_toxicities.txt` and `word2coef.pkl` ) are reproduced accurately. 

# 0. Prerequisites

In [1]:
VOCAB_DIRNAME = 'vocabularies' 

In [2]:
from condbert import CondBertRewriter
from choosers import EmbeddingSimilarityChooser
from masked_token_predictor_bert import MaskedTokenPredictorBert

# 1. Loading BERT

In [3]:
import torch
from transformers import BertTokenizer, BertForMaskedLM
import numpy as np
import pickle
import os
from tqdm.auto import tqdm, trange

import numpy as np
import pandas as pd

In [4]:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
device = torch.device('cuda:0')
device = torch.device('cpu')

In [5]:
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)

In [6]:
model = BertForMaskedLM.from_pretrained(model_name)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [7]:
model.to(device);

# 2. Preparing the vocabularires.


- negative-words.txt
- positive-words.txt
- word2coef.pkl
- token_toxicities.txt

These files should be prepared once. 

In [12]:
tox_corpus_path = '../../data/train/train_toxic'
norm_corpus_path = '../../data/train/train_normal'

In [8]:
dataset = pd.read_csv('dataset.csv')
dataset

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
0,"if Alkar floods her with her mental waste, it ...","If Alkar is flooding her with psychic waste, t...",0.785171,0.010309,0.981983,0.014195
1,you're becoming disgusting.,Now you're getting nasty.,0.749687,0.071429,0.999039,0.065473
2,"well, we can spare your life.","Well, we could spare your life, for one.",0.919051,0.268293,0.985068,0.213313
3,"monkey, you have to wake up.","Ah! Monkey, you've got to snap out of it.",0.664333,0.309524,0.994215,0.053362
4,I have orders to kill her.,I've got orders to put her down.,0.726639,0.181818,0.999348,0.009402
...,...,...,...,...,...,...
577772,you didn't know that Estelle stole your fish f...,You didn't know that Estelle had stolen some f...,0.870322,0.030769,0.949143,0.000121
577773,It'il suck the life out of you!,you'd be sucked out of your life!,0.722897,0.058824,0.996124,0.215794
577774,"I can't fuckin' take that, bruv.",I really can't take this.,0.617511,0.212121,0.984538,0.000049
577775,They called me a fucking hero. The truth is I ...,"they said I was a hero, but I didn't care.",0.679613,0.358209,0.991945,0.000124


### 2.1 Preparing the DRG-like vocabularies

In [24]:
import os
import argparse
import numpy as np
from tqdm import tqdm
from nltk import ngrams
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer



class NgramSalienceCalculator():
    def __init__(self, tox_corpus, norm_corpus, use_ngrams=False):
        ngrams = (1, 3) if use_ngrams else (1, 1)
        self.vectorizer = CountVectorizer(ngram_range=ngrams)

        tox_count_matrix = self.vectorizer.fit_transform(tox_corpus)
        self.tox_vocab = self.vectorizer.vocabulary_
        self.tox_counts = np.sum(tox_count_matrix, axis=0)

        norm_count_matrix = self.vectorizer.fit_transform(norm_corpus)
        self.norm_vocab = self.vectorizer.vocabulary_
        self.norm_counts = np.sum(norm_count_matrix, axis=0)

    def salience(self, feature, attribute='tox', lmbda=0.5):
        assert attribute in ['tox', 'norm']
        if feature not in self.tox_vocab:
            tox_count = 0.0
        else:
            tox_count = self.tox_counts[0, self.tox_vocab[feature]]

        if feature not in self.norm_vocab:
            norm_count = 0.0
        else:
            norm_count = self.norm_counts[0, self.norm_vocab[feature]]

        if attribute == 'tox':
            return (tox_count + lmbda) / (norm_count + lmbda)
        else:
            return (norm_count + lmbda) / (tox_count + lmbda)


In [25]:
df = pd.read_csv('filtered.tsv', sep='\t')
df = df.drop(['Unnamed: 0'], axis=1)

texts = df["reference"].iloc[:100000].tolist()
labels = (df["ref_tox"].iloc[:100000] >= 0.5).astype(int).tolist()
toxic_sentence = []
nontoxic_sentence = []

for i in range(100000):
    if labels[i] == 0:
        nontoxic_sentence.append(texts[i])
    else:
        toxic_sentence.append(texts[i])

In [11]:
toxic_sentence = list(dataset['reference'][:460000])
nontoxic_sentence = list(dataset['translation'][:460000])

In [15]:
nontoxic_sentence[0], toxic_sentence[0]

('If Alkar is flooding her with psychic waste, that explains the high level of neurotransmitters.',
 'if Alkar floods her with her mental waste, it would explain the high levels of neurotransmitter.')

In [16]:
from collections import Counter
c = Counter()

for sentence_list in [nontoxic_sentence, toxic_sentence]:
    for sentence in sentence_list:
        for tok in sentence.strip().split():
            c[tok] += 1

print(len(c))

277073


In [17]:
vocab = {w for w, _ in c.most_common() if _ > 0}  # if we took words with > 1 occurences, vocabulary would be x2 smaller, but we'll survive this size
print(len(vocab))

277073


In [18]:
corpus_tox = [' '.join([w if w in vocab else '<unk>' for w in sentence.split()]) for sentence in toxic_sentence]
corpus_norm = [' '.join([w if w in vocab else '<unk>' for w in sentence.split()]) for sentence in nontoxic_sentence]

In [19]:
corpus_norm[0], corpus_tox[0]

('If Alkar is flooding her with psychic waste, that explains the high level of neurotransmitters.',
 'if Alkar floods her with her mental waste, it would explain the high levels of neurotransmitter.')

In [20]:
neg_out_name = 'negative-words.txt'
pos_out_name = 'positive-words.txt'

In [21]:
neg_out_list = []
pos_out_list = []

In [22]:
threshold = 4

In [25]:
sc = NgramSalienceCalculator(corpus_tox, corpus_norm, False)
seen_grams = set()

with open(neg_out_name, 'w', encoding='utf-8') as neg_out, open(pos_out_name, 'w', encoding='utf-8') as pos_out:
    for gram in set(sc.tox_vocab.keys()).union(set(sc.norm_vocab.keys())):
        if gram not in seen_grams:
            seen_grams.add(gram)
            toxic_salience = sc.salience(gram, attribute='tox')
            polite_salience = sc.salience(gram, attribute='norm')
            if toxic_salience > threshold:
                neg_out.writelines(f'{gram}\n')
                neg_out_list.append(gram)
            elif polite_salience > threshold:
                pos_out.writelines(f'{gram}\n')
                pos_out_list.append(gram)

In [26]:
neg_out_list[0], pos_out_list[0]

('horanzy', 'drunkenly')

## 2.2 Evaluating word toxicities with a logistic regression

In [27]:
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

In [28]:
X_train = corpus_tox + corpus_norm
y_train = [1] * len(corpus_tox) + [0] * len(corpus_norm)
pipe.fit(X_train, y_train);

In [29]:
coefs = pipe[1].coef_[0]
coefs.shape

(91167,)

In [30]:
word2coef = {w: coefs[idx] for w, idx in pipe[0].vocabulary_.items()}

In [31]:
len(word2coef)

91167

In [32]:
import pickle
with open('word2coef.pkl', 'wb') as f:
    pickle.dump(word2coef, f)

## 2.3 Labelling BERT tokens by toxicity

In [33]:
from collections import defaultdict
toxic_counter = defaultdict(lambda: 1)
nontoxic_counter = defaultdict(lambda: 1)

for text in tqdm(corpus_tox):
    for token in tokenizer.encode(text):
        toxic_counter[token] += 1
for text in tqdm(corpus_norm):
    for token in tokenizer.encode(text):
        nontoxic_counter[token] += 1

100%|████████████████████████████████████████████████████████████████████████| 460000/460000 [02:12<00:00, 3478.66it/s]
100%|████████████████████████████████████████████████████████████████████████| 460000/460000 [02:01<00:00, 3780.65it/s]


In [34]:
token_toxicities = [toxic_counter[i] / (nontoxic_counter[i] + toxic_counter[i]) for i in range(len(tokenizer.vocab))]

In [35]:
len(toxic_counter)

30522

In [36]:
with open('token_toxicities.txt', 'w') as f:
    for t in token_toxicities:
        f.write(str(t))
        f.write('\n')

# 3. Setting up the model

### 3.1 Loading the vocabularies

In [37]:
with open("negative-words.txt", "r") as f:
    s = f.readlines()
negative_words = list(map(lambda x: x[:-1], s))

with open("positive-words.txt", "r") as f:
    s = f.readlines()
positive_words = list(map(lambda x: x[:-1], s))

In [38]:
import pickle
with open('word2coef.pkl', 'rb') as f:
    word2coef = pickle.load(f)

In [39]:
token_toxicities = []
with open('token_toxicities.txt', 'r') as f:
    for line in f.readlines():
        token_toxicities.append(float(line))
token_toxicities = np.array(token_toxicities)
token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1)))   # log odds ratio

# discourage meaningless tokens
for tok in ['.', ',', '-']:
    token_toxicities[tokenizer.encode(tok)][1] = 3

for tok in ['you']:
    token_toxicities[tokenizer.encode(tok)][1] = 0
token_toxicities

array([0., 0., 0., ..., 0., 0., 0.])

In [40]:
token_toxicities[1000:1100]

array([0.        , 0.        , 0.09807049, 0.        , 0.02439145,
       0.00767199, 0.        , 0.        , 0.        , 0.31845373,
       0.01601598, 0.05496262, 0.04110121, 0.        , 0.        ,
       0.00664454, 0.41154415, 0.03454033, 0.04485057, 0.06899287,
       0.        , 0.32004736, 0.2006707 , 0.        , 0.        ,
       0.02999225, 0.        , 0.        , 0.06899287, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.2572846 , 0.08590271, 0.        , 0.        ,
       0.09870107, 0.        , 0.01503788, 0.        , 0.34016001,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.033416  , 0.        , 0.02230576, 0.14518201, 0.12541666,
       0.        , 0.        , 0.09472096, 0.        , 0.        ,
       0.        , 0.19597365, 0.        , 0.        , 0.        ,
       0.        , 0.37156356, 0.        , 0.        , 0.69314718,
       0.        , 0.        , 0.        , 0.        , 1.38629

In [41]:
for i in range(len(token_toxicities)):
    if token_toxicities[i] != 0:
        print(token_toxicities[i], i)

0.14016612365818457 999
0.09807049439242561 1002
0.024391453124159267 1004
0.007671992578933342 1005
0.3184537311185346 1009
0.016015976822938177 1010
0.05496262248580642 1011
0.0411012111178175 1012
0.0066445427186682905 1015
0.4115441541845468 1016
0.034540325252176075 1017
0.04485056616535192 1018
0.06899287148695163 1019
0.3200473569538128 1021
0.20067069546215124 1022
0.02999225138701477 1025
0.06899287148695163 1028
0.2572845952674102 1036
0.08590271140705844 1037
0.09870107272655859 1040
0.015037877364540502 1042
0.3401600097003975 1044
0.03341599691984402 1050
0.022305757514298186 1052
0.14518200984449808 1053
0.12541666051145456 1054
0.0947209624642886 1057
0.19597364596228184 1061
0.371563556432483 1066
0.6931471805599453 1069
1.3862943611198906 1074
0.6931471805599453 1076
0.6931471805599453 1083
0.28768207245178085 1084
0.6931471805599453 1092
0.1823215567939544 1094
0.6931471805599453 1099
0.4054651081081642 1155
2.397895272798369 1159
0.31015492830383945 1161
0.6931471805

In [42]:
def adjust_logits(logits, label=0):
    return logits - token_toxicities * 100 * (1 - 2 * label)

predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, logits_postprocessor=adjust_logits)

editor = CondBertRewriter(
    model=model,
    tokenizer=tokenizer,
    device=device,
    neg_words=negative_words,
    pos_words=positive_words,
    word2coef=word2coef,
    token_toxicities=token_toxicities,
    predictor=predictor,
)

The model below is used for reranking BERT hypotheses and helps to increase semantic similarity by choosing the hypotheses with  embeddings similar to the orignal words. 

In [None]:
chooser = EmbeddingSimilarityChooser(sim_coef=10, tokenizer=tokenizer)

# 4. Finally, the inference

Parallel application of the model to all tokens, fast, but dirty. 

In [44]:
bimba = toxic_sentence[30:50]
for sentence in bimba:
    print(sentence)
    print(editor.translate(sentence, prnt=False))

do you want bad news, or rather miserable?
do you want bad news , or rather sad ?
and I'm not just talking about hitting me for your boyfriend, what a girl.
and i ' m not just talking about hitting me for your boyfriend , what a girl .
and Murray has his eyes on his ass, he can't see the problem until one day Glenda says, "I used to think that anything was better than nothing."
and murray has his eyes on his way , he can ' t see the problem until one day glenda says , " i used to think that anything was better than nothing . "
no matter the reasons, this company is incompetent.
no matter the reasons , this company is inc .roigent .
Now, I understand you got your grievances with these fools, but you need to talk to the police and air this shit.
now , i understand you got your grievances with these men , but you need to talk to the police and air this stuff .
I'll rot in front of his cameras, on him.
i ' ll be in front of his cameras , on him .
some killer!
some killer !
Tell him if Elen

In [58]:
bimba1 = list(dataset['reference'][0:10])
bimba2 = list(dataset['translation'][0:10])
bimba_pred = []
bimba1

NameError: name 'dataset' is not defined

Application of the model to all the tokens sequentially, in the multiword mode. 

Parameters that could be tuned:
* The coeffincient in `adjust_logits` - the larger it is, the more the model avoids toxic words
* The coefficient in `EmbeddingSimilarityChooser` - the larger it is, the more the model tries to preserve content 
* n_tokens - how many words can be generated from one
* n_top - how many BERT hypotheses are reranked