# Word-wise translation

One approach to detoxification is to isolate a "toxic" vocabulary $X$ and a paired "anti-toxic" vocabulary $Y$, embed both as word vectors, and then find the linear map

$W = \arg\min_W ||WX - Y||_2$

reference paper: https://arxiv.org/pdf/1309.4168.pdf

In other words, this is a dictionary approach, but the dictionary is learned more automatically than by constructing a parallel word corpus by hand.
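A minimal sketch of fitting such a map on synthetic data, assuming $W$ is constrained to be orthogonal (the orthogonal Procrustes problem) and that the rows of `X` and `Y` are paired word vectors. The helper name `fit_orthogonal_map` and the random data are illustrative assumptions, not part of the project code.

```python
import numpy as np

def fit_orthogonal_map(X, Y):
    """Return the orthogonal W minimizing sum_i ||W x_i - y_i||^2.

    X, Y: arrays of shape (n_pairs, dim); row i of X is paired with row i of Y.
    """
    # Closed-form Procrustes solution: SVD of the cross-covariance Y^T X.
    U, _, Vh = np.linalg.svd(Y.T @ X)
    return U @ Vh

rng = np.random.default_rng(0)
dim, n_pairs = 5, 200

# Hypothetical ground truth: a random orthogonal matrix obtained via QR.
W_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

X = rng.normal(size=(n_pairs, dim))  # synthetic "source" vectors
Y = X @ W_true.T                     # y_i = W_true @ x_i, noise-free

W = fit_orthogonal_map(X, Y)
recovered = np.allclose(W, W_true)   # the map is recovered on noise-free data
```

With real embeddings the pairs are noisy, so the fitted map only approximates the dictionary; the orthogonality constraint keeps vector norms and angles unchanged.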

### Reducing parallel sentences to parallel words

There is a [way](https://arxiv.org/pdf/1710.04087.pdf) to avoid parallel corpora altogether, but it is too complicated for a baseline, so I do not implement it here.

First idea: find sentence pairs with a high BLEU score and extract the tokens in which the two sentences differ (their symmetric difference).
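The extraction step can be sketched on plain token lists, assuming the two sentences have the same length and are compared position by position. The BLEU filter is omitted here; the helper name `differing_token_pairs` and the example sentences are made up for illustration.

```python
def differing_token_pairs(source_tokens, target_tokens):
    """Return (source, target) token pairs at positions where the sentences differ.

    Sentences of different lengths are skipped, since a position-wise
    comparison is not meaningful for them.
    """
    if len(source_tokens) != len(target_tokens):
        return []
    return [(s, t) for s, t in zip(source_tokens, target_tokens) if s != t]

# Hypothetical sentence pair that differs in a single word.
src = "this is a stupid idea".split()
tgt = "this is a bad idea".split()

pairs = differing_token_pairs(src, tgt)
print(pairs)  # [('stupid', 'bad')]
```

Sentences with near-identical wording yield clean word pairs; a paraphrase that shifts word order yields misaligned pairs, which is why a high similarity threshold is applied first.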

In [1]:
import os, sys

dir2 = os.path.abspath('')
dir1 = os.path.dirname(dir2)
if dir1 not in sys.path:
    sys.path.append(dir1)

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from src.data.make_dataset import TextDetoxificationDataset, Evaluator
bleu_score = Evaluator.bleu_score

In [3]:
train_dataset = TextDetoxificationDataset(mode='train')
val_dataset = TextDetoxificationDataset(mode='val', vocab=train_dataset.vocab)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\mirak\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
2023-10-21 17:27:03.780 | INFO     | src.data.make_dataset:__init__:212 - Started building vocab

Collecting vocab: 0it [00:00, ?it/s]

2023-10-21 17:29:33.522 | INFO     | src.data.make_dataset:__init__:219 - Vocab built successfully
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\mirak\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [4]:
bleu_threshold = 0.80
source_target = []
bleus = []
for src, tgt, stat in tqdm(train_dataset):
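    # keep only pairs of equal length so that tokens can be compared position by position
    if src.shape != tgt.shape:
        continue
    src_n, tgt_n = src.numpy(), tgt.numpy()
    bleu = bleu_score(src_n, tgt_n)
    bleus.append(bleu)
    if bleu > bleu_threshold:
        # collect (source, target) token ids at the positions where the sentences differ
        diff = src_n != tgt_n
        source_target.extend(zip(src_n[diff], tgt_n[diff]))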

  0%|          | 0/462221 [00:00<?, ?it/s]

In [5]:
print(len(source_target))
# build the index-to-token list once instead of on every lookup
itos = train_dataset.vocab.get_itos()
source_target_tokens = [(itos[s], itos[t]) for s, t in source_target if s != 1 and t != 1]
print(source_target_tokens[::30])

781
[('i', 'even'), ('stupid', 'thick'), ("n't", 'have'), ('sex', 'sleeping'), ('cowards', 'celebrities'), ('because', "'cause"), ('.', '?'), ('fool', 'mutt'), ('shit', 'oh'), ('are', 'there'), ('shit', 'hell'), ('damn', 'no'), ('trying', 'to'), ('need', 'have'), ('.', '--'), ('nothing', 'something'), ('she', 'it'), ('shit', 'holy'), ('dick', 'douche'), ('you', '!'), ('containers', 'packaging'), ('shit', 'pad'), ('damn', 'jesus')]


### Train W

In [6]:
import gensim.downloader as api
glove_model = api.load('glove-twitter-100')

In [7]:
# keep only the pairs where both words have a GloVe vector
pairs_in_glove = [(s, t) for s, t in source_target_tokens if s in glove_model and t in glove_model]
X_train = np.array([glove_model.get_vector(s) for s, _ in pairs_in_glove])
Y_train = np.array([glove_model.get_vector(t) for _, t in pairs_in_glove])

In [8]:
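# Orthogonal Procrustes: the rows of X_train and Y_train are paired word vectors,
# so the orthogonal W minimizing sum_i ||W x_i - y_i||^2 is U @ Vh,
# where U, Vh come from the SVD of Y_train.T @ X_train
U, S, Vh = np.linalg.svd(Y_train.T @ X_train, full_matrices=True)
W = U @ Vh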

In [9]:
for i in range(10):
    source, target = source_target_tokens[np.random.randint(0, len(source_target_tokens))]
    print(source, target, glove_model.most_similar([W @ glove_model[source]], topn=4))

put get [('throw', 0.8304006457328796), ('put', 0.8286388516426086), ('get', 0.8241252303123474), ('take', 0.8208140134811401)]
damn hell [('shit', 0.8791956305503845), ('damn', 0.8741188645362854), ('fuck', 0.8453155755996704), ('stupid', 0.8417540788650513)]
. ? [('<repeat>', 0.7799089550971985), ('.', 0.7785171270370483), ('?', 0.7693502306938171), ('!', 0.7250091433525085)]
cut crash [('cut', 0.7016690969467163), ('it', 0.70100998878479), ('put', 0.695357620716095), ("'ll", 0.6889913082122803)]
crap thing [('crap', 0.8507955074310303), ('stupid', 0.8250228762626648), ('shit', 0.7968918681144714), ('stuff', 0.7954347729682922)]
had were [('have', 0.8417242765426636), ('would', 0.8301723003387451), ('that', 0.8272920846939087), ('made', 0.826287567615509)]
and to [('there', 0.8871590495109558), ('that', 0.8850626349449158), ('they', 0.868509829044342), ('if', 0.8650673031806946)]
spoiled depraved [('pissed', 0.6166033744812012), ('cunt', 0.5962963700294495), ('hungover', 0.5896039009

# Conclusion / report

The initial assumption behind this approach (that there are enough pairs in which the key toxic word is replaced by a non-toxic one) does not seem to hold, so the best fit for the baseline would be a simple recurrent encoder-decoder model.

Performance might also benefit from better data selection; however, this solution would still be unlikely to beat the RNN.