# Word-wise translation

One approach to detoxification is to isolate a "toxic" vocabulary $X$ and an "anti-toxic" vocabulary $Y$, and then find the linear map

$W = \arg\min_W ||WX - Y||_2$

reference paper: https://arxiv.org/pdf/1309.4168.pdf

In other words, this is a dictionary approach, but done in a more automatic way than constructing a parallel word corpus by hand.
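
As a minimal sketch of the objective above: if the word vectors are stored as the columns of $X$ and $Y$, and the norm is read as a sum of squared distances over word pairs, the unconstrained minimizer can be found with ordinary least squares. The helper name `fit_linear_map` and the synthetic data are illustrative assumptions, not part of this notebook.

```python
import numpy as np

def fit_linear_map(X, Y):
    """Return W minimizing ||W X - Y||_F, with one word vector per column."""
    # W X = Y  <=>  X.T W.T = Y.T, which is the form np.linalg.lstsq solves.
    W_T, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return W_T.T

# Synthetic check: generate Y from a known map and recover it.
rng = np.random.default_rng(0)
d, n = 5, 50                      # embedding size, number of word pairs
X = rng.normal(size=(d, n))       # hypothetical "toxic" vectors
W_true = rng.normal(size=(d, d))  # hypothetical ground-truth map
Y = W_true @ X                    # hypothetical "anti-toxic" vectors
W = fit_linear_map(X, Y)
```

On real embeddings the fit will not be exact, which is one reason to constrain $W$ further, for example to be orthogonal.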

### Reducing parallel sentences to parallel words

There is a [way](https://arxiv.org/pdf/1710.04087.pdf) to avoid parallel corpora altogether, but it is too complicated for a baseline, so I don't implement it here.

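First idea: find sentence pairs with a high BLEU score, so that source and target differ in only a few words, and collect the words in which they differ (their symmetric difference). The code below does this position by position, and only for pairs of equal length.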
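
To make the idea concrete, here is a small sketch on toy token lists. For two sentences of equal length, a position-by-position comparison yields aligned (source, target) word pairs, whereas a plain set symmetric difference only says which words differ and loses the alignment. The function name `aligned_differences` and the example sentences are illustrative assumptions.

```python
def aligned_differences(src_tokens, tgt_tokens):
    """Return (source, target) pairs at positions where two equal-length sentences differ."""
    if len(src_tokens) != len(tgt_tokens):
        return []  # positions are not comparable without an explicit alignment
    return [(s, t) for s, t in zip(src_tokens, tgt_tokens) if s != t]

src = "this is a damn good idea".split()
tgt = "this is a very good idea".split()

pairs = aligned_differences(src, tgt)  # aligned word pairs
unordered = set(src) ^ set(tgt)        # set symmetric difference, no alignment
print(pairs)
print(sorted(unordered))
```

The position-wise variant is only meaningful when the edit is a one-for-one word substitution; an insertion or deletion shifts every later position.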

In [1]:
import os, sys

# make the parent of the working directory importable, so that `src` can be found
dir2 = os.path.abspath('')
dir1 = os.path.dirname(dir2)
if dir1 not in sys.path:
    sys.path.append(dir1)

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from src.data.make_dataset import TextDetoxificationDataset, bleu_score

In [3]:
train_dataset = TextDetoxificationDataset(mode='train')
val_dataset = TextDetoxificationDataset(mode='val', vocab=train_dataset.vocab)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\mirak\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Building vocab:   0%|          | 0/462221 [00:00<?, ?it/s]

In [4]:
bleu_threshold = 0.80
source_target = []
bleus = []
for src, tgt, stat in tqdm(train_dataset):
    # keep only pairs of equal length, so that tokens can be compared position by position
    if src.shape != tgt.shape:
        continue
    src_n, tgt_n = src.numpy(), tgt.numpy()
    bleu = bleu_score(src_n, tgt_n)
    bleus.append(bleu)
    if bleu > bleu_threshold:
        # near-identical sentences: collect the (source, target) token ids at positions that differ
        changed = src_n != tgt_n
        source_target.extend(zip(src_n[changed], tgt_n[changed]))

  0%|          | 0/462221 [00:00<?, ?it/s]

In [11]:
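print(len(source_target))
# map ids back to tokens, skipping pairs that contain index 1
# (assumed here to be a special token such as <unk>; the notebook does not say)
source_target_tokens = [
    (train_dataset.idx2token[s], train_dataset.idx2token[t])
    for s, t in source_target
    if s != 1 and t != 1
]
# print every 30th pair as a sample
print(source_target_tokens[::30])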

433
[('i', 'even'), ('farts', 'photos'), ('destroy', 'break'), ('kick-ass', 'buttercup'), ('cut', 'chop'), ('!', '?'), ('fuck', 'god'), ('burn', 'commit'), ('sex', 'balling'), ('electrocute', 'electrocuted'), ('spoiled', 'depraved'), ('to', 'we'), ('ugly', 'funny')]


### Train W

In [6]:
import gensim.downloader as api
glove_model = api.load('glove-twitter-100')

In [7]:
# keep only pairs where both words are in the GloVe vocabulary
in_vocab = [(s, t) for s, t in source_target_tokens if s in glove_model and t in glove_model]
X_train = np.array([glove_model.get_vector(s) for s, _ in in_vocab])  # source vectors, one per row
Y_train = np.array([glove_model.get_vector(t) for _, t in in_vocab])  # target vectors, one per row

In [8]:
# Orthogonal Procrustes: with one vector per row, W = U @ Vh is the orthogonal
# matrix that minimizes ||X_train @ W - Y_train||_F
U, S, Vh = np.linalg.svd(X_train.T @ Y_train, full_matrices=True)
W = U @ Vh
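
The SVD step above is the closed-form solution of the orthogonal Procrustes problem. The following sketch checks this on synthetic data, assuming one vector per row as in `X_train` and `Y_train`. The rotation `R_true` and the random data are made up for illustration. With this row convention the fitted matrix maps a single vector as `x @ W` (equivalently `W.T @ x`).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 8                                      # number of pairs, embedding size
X = rng.normal(size=(n, d))                        # hypothetical source vectors (rows)
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hypothetical orthogonal map
Y = X @ R_true                                     # targets generated by that map

# Same recipe as in the notebook: SVD of X^T Y, then drop the singular values.
U, S, Vh = np.linalg.svd(X.T @ Y)
W = U @ Vh

# Map one source vector using the row convention.
x = X[0]
mapped = x @ W
```

Here `W` recovers `R_true` because the synthetic targets are an exact rotation of the sources; with real word pairs only an approximate fit can be expected.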

In [12]:
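# sample 10 pairs and show the nearest neighbours of the mapped source vector
# NOTE: W was fitted for row vectors (x @ W ≈ y); W @ x applies the transpose,
# i.e. the inverse map. `glove_model[source] @ W` would match the fit.
for i in range(10):
    source, target = source_target_tokens[np.random.randint(0, len(source_target_tokens))]
    print(source, target, glove_model.most_similar([W @ glove_model[source]], topn=4))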

fart stick [('piss', 0.6232589483261108), ('freak', 0.5906803607940674), ('faggot', 0.5814687013626099), ('asshole', 0.5803543329238892)]
booty quarry [('butt', 0.6418140530586243), ('fat', 0.6304983496665955), ('dick', 0.6218863725662231), ('vagina', 0.6089655160903931)]
witch crone [('witches', 0.6530465483665466), ('witch', 0.6193137764930725), ('devil', 0.6115525364875793), ('fucker', 0.5908735990524292)]
heart condition [('my', 0.7379490733146667), ('your', 0.7202531695365906), ('heart', 0.7148891091346741), ('like', 0.7004419565200806)]
fuck get [('shit', 0.8758304715156555), ('fuck', 0.8559561967849731), ('stupid', 0.8324095010757446), ('damn', 0.8131234049797058)]
rapist bully [('weirdo', 0.6210179924964905), ('pervert', 0.5540253520011902), ('wth', 0.5419331789016724), ('creep', 0.525635302066803)]
cock pissing [('dick', 0.612026572227478), ('faggot', 0.6096116304397583), ('cunt', 0.6066538691520691), ('vagina', 0.6056788563728333)]
beard whiskers [('beard', 0.6264841556549072

# Conclusion / report

The initial assumption behind this approach (that there are enough pairs in which the key toxic word is replaced by a non-toxic one) does not seem to hold, so the best fit for the baseline would be a simple recurrent encoder-decoder model.

Performance might also benefit from better data selection; however, this solution would still be unlikely to outperform an RNN.