We will approach the detoxification task using models from "Text Detoxification using Large Pre-trained Neural Models" by David Dale et al. Let us first clone their repository.

In [1]:
!git clone https://github.com/s-nlp/detox ../detox

fatal: destination path '../detox' already exists and is not an empty directory.


In [2]:
# Read the repository's requirements, one package spec per line
with open('../detox/requirements.txt', 'r') as f:
    libraries = f.read().splitlines()

In [3]:
libraries

['tqdm',
 'numpy',
 'pandas',
 'torch',
 'nltk',
 'transformers==4.24',
 'fairseq==0.10.0',
 'sentencepiece',
 'keras_preprocessing',
 'flair',
 'scipy']

Install any of these packages that are not yet present in the environment.

In [4]:
%pip install -r ../detox/requirements.txt -q

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip available: 22.3.1 -> 23.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
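
Because the install above runs quietly, it does not show which requirements were actually missing. A minimal sketch of such a check follows, assuming Python 3.9+ and using the standard-library `importlib.metadata`. The helper names `requirement_name` and `missing_packages` are hypothetical and not part of the detox repository. The check only tests whether a distribution is installed, not whether a pinned version matches.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def requirement_name(line: str) -> str:
    """Strip version specifiers, extras, and markers from a requirement line."""
    return re.split(r"[=<>!~\[;\s]", line.strip(), maxsplit=1)[0]


def missing_packages(requirements: list[str]) -> list[str]:
    """Return the requirement names that have no installed distribution."""
    missing = []
    for line in requirements:
        name = requirement_name(line)
        if not name or name.startswith("#"):
            continue  # skip blank lines and comments
        try:
            version(name)  # raises if the distribution is not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_packages(libraries)` on the list read earlier would return the names that still need installing. How names such as `keras_preprocessing` are normalized during lookup may depend on the Python version.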


### Dataset Preparation  
As noted earlier, some of the translations in our dataset have a higher toxicity score than the original statements. With that in mind, let us form a new dataset that reliably places the toxic and the non-toxic sentence of each pair in separate columns.

In [66]:
import pandas as pd

dataset = pd.read_csv("../data/interim/filtered_paranmt.tsv", delimiter='\t')
dataset = dataset.set_index(dataset.columns[0])
dataset.index.name = "Index"

# For each pair, the sentence with the higher toxicity score goes to 'toxic'
# and the other to 'non-toxic'; the scores are routed the same way
dataset['toxic'] = dataset.apply(lambda row: row['translation'] if row['ref_tox'] < row['trn_tox'] else row['reference'], axis=1)
dataset['non-toxic'] = dataset.apply(lambda row: row['reference'] if row['ref_tox'] < row['trn_tox'] else row['translation'], axis=1)
dataset['old_toxicity'] = dataset.apply(lambda row: row['trn_tox'] if row['ref_tox'] < row['trn_tox'] else row['ref_tox'], axis=1)
dataset['new_toxicity'] = dataset.apply(lambda row: row['ref_tox'] if row['ref_tox'] < row['trn_tox'] else row['trn_tox'], axis=1)
dataset['toxic'] = dataset['toxic'].str.lower()
dataset['non-toxic'] = dataset['non-toxic'].str.lower()
# Drop the original columns ('lenght_diff' is spelled this way in the source file)
dataset.drop(['reference', 'translation', 'similarity', 'lenght_diff', 'ref_tox', 'trn_tox'], axis=1, inplace=True)
dataset.head()

Index,toxic,non-toxic,old_toxicity,new_toxicity
0,"if alkar floods her with her mental waste, it ...","if alkar is flooding her with psychic waste, t...",0.981983,0.014195
1,you're becoming disgusting.,now you're getting nasty.,0.999039,0.065473
2,"well, we can spare your life.","well, we could spare your life, for one.",0.985068,0.213313
3,"monkey, you have to wake up.","ah! monkey, you've got to snap out of it.",0.994215,0.053362
4,i have orders to kill her.,i've got orders to put her down.,0.999348,0.009402
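
The row-wise `apply` calls above evaluate a Python lambda for each of the roughly 578k rows. The same separation can be sketched in vectorized form with `numpy.where`. This is an illustrative alternative under the same column names, not the code that produced the output above, and the toy rows are made up.

```python
import numpy as np
import pandas as pd

# Toy rows with the same column names as filtered_paranmt.tsv (values are made up).
toy = pd.DataFrame({
    "reference": ["You are an IDIOT.", "Please calm down."],
    "translation": ["You are mistaken.", "Shut up, you fool."],
    "ref_tox": [0.98, 0.02],
    "trn_tox": [0.03, 0.97],
})

# True where the translation is the more toxic of the two sentences.
trn_is_toxic = (toy["ref_tox"] < toy["trn_tox"]).to_numpy()

separated = pd.DataFrame({
    "toxic": np.where(trn_is_toxic, toy["translation"], toy["reference"]),
    "non-toxic": np.where(trn_is_toxic, toy["reference"], toy["translation"]),
    "old_toxicity": np.where(trn_is_toxic, toy["trn_tox"], toy["ref_tox"]),
    "new_toxicity": np.where(trn_is_toxic, toy["ref_tox"], toy["trn_tox"]),
}, index=toy.index)
separated["toxic"] = separated["toxic"].str.lower()
separated["non-toxic"] = separated["non-toxic"].str.lower()

# Sanity check: the toxic column should always carry the higher score.
assert (separated["old_toxicity"] >= separated["new_toxicity"]).all()
```

The final assertion is a cheap invariant that could also be run on the full dataset after separation.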


In [67]:
dataset.describe()

,old_toxicity,new_toxicity
count,577777.0,577777.0
mean,0.94026,0.035601
std,0.100831,0.079399
min,0.500139,3.3e-05
25%,0.940145,0.000164
50%,0.983842,0.003456
75%,0.997519,0.027242
max,0.99973,0.499494


Now the toxic and non-toxic sentences are clearly separated in the dataset, so we have a reference against which to compare the performance of our model.

In [68]:
dataset.to_csv("../data/interim/separated_tox.csv")

### Considering Metrics  

For evaluation we will use metrics from the aforementioned paper, *J*-score and *ACC*, which are implemented in `PMLDL_Assignment1\detox\emnlp2021\metric\metric.py`. The comparison with the existing dataset will be conducted through the *ACC* metric, since we cannot afford to collect human judgments from many annotators on the toxicity of the sentences reformulated by the model.
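
To make the two metrics concrete, here is a minimal sketch of how they can be computed from per-sentence scores. It rests on assumptions: *ACC* is taken as the share of outputs that a toxicity classifier scores below a threshold (0.5 here, an assumed value), and *J* is taken as the mean per-sentence product of *ACC* with the paper's similarity (*SIM*) and fluency (*FL*) scores, as I understand the paper's definition. The functions `style_accuracy` and `j_score` are hypothetical and do not reproduce the repository's `metric.py`.

```python
import numpy as np


def style_accuracy(toxicity_scores, threshold=0.5):
    """Fraction of outputs whose toxicity score falls below the threshold.

    The 0.5 threshold is an assumption made for this sketch.
    """
    scores = np.asarray(toxicity_scores, dtype=float)
    return float((scores < threshold).mean())


def j_score(acc, sim, fl):
    """Mean of the per-sentence product of ACC, SIM and FL (assumed definition)."""
    acc, sim, fl = (np.asarray(x, dtype=float) for x in (acc, sim, fl))
    return float((acc * sim * fl).mean())
```

For example, `style_accuracy(dataset['new_toxicity'])` would report how often the non-toxic column of the separated dataset passes the assumed threshold, giving the reference value to compare model outputs against.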