## Playing with the dataset
Before starting the search for a solution, it's good to take a look at the data and learn how to preprocess it.

In [2]:
# Download and unzip the dataset
# https://gist.github.com/hantoine/c4fc70b32c2d163f604a8dc2a050d5f6
from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile

def download_and_unzip(url, extract_to='.'):
    http_response = urlopen(url)
    zipfile = ZipFile(BytesIO(http_response.read()))
    zipfile.extractall(path=extract_to)

download_and_unzip('https://github.com/skoltech-nlp/detox/releases/download/emnlp2021/filtered_paranmt.zip')


In [2]:
import pandas as pd

df = pd.read_csv('filtered.tsv', sep='\t')
df.head()

Unnamed: 0.1,Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
0,0,"If Alkar is flooding her with psychic waste, t...","if Alkar floods her with her mental waste, it ...",0.785171,0.010309,0.014195,0.981983
1,1,Now you're getting nasty.,you're becoming disgusting.,0.749687,0.071429,0.065473,0.999039
2,2,"Well, we could spare your life, for one.","well, we can spare your life.",0.919051,0.268293,0.213313,0.985068
3,3,"Ah! Monkey, you've got to snap out of it.","monkey, you have to wake up.",0.664333,0.309524,0.053362,0.994215
4,4,I've got orders to put her down.,I have orders to kill her.,0.726639,0.181818,0.009402,0.999348


The dataset is somewhat messy; the "Monkey" pair is a good example. The more toxic text is not consistently in the same column: sometimes it is `reference`, sometimes `translation`. Each pair therefore has to be ordered by its toxicity scores.

Let's reformat the dataset, so that all detoxified versions are on one side and all toxic ones are on the other.

In [3]:
d = {'id': [], 'toxic' : [], 'detoxified': [], 'tox_score': [], 'detox_score': [], 'similarity': [], 'length_diff': []}

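for index, row in df.iterrows(): # takes some time
    d['id'].append(row.iloc[0]) # first column holds the original row id; positional access needs .iloc
    d['similarity'].append(row['similarity'])
    d['length_diff'].append(row['lenght_diff']) # the source column name is misspelled; store it as 'length_diff'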
    
    ref = row['reference']
    trn = row['translation']
    
    # toxic - is the toxic version of the text
    # detoxified - is less toxic version of the text
    if row['ref_tox'] > row['trn_tox']:
        d['toxic'].append(ref)
        d['detoxified'].append(trn)
        d['tox_score'].append(row['ref_tox'])
        d['detox_score'].append(row['trn_tox'])
    else:
        d['toxic'].append(trn)
        d['detoxified'].append(ref)
        d['tox_score'].append(row['trn_tox'])
        d['detox_score'].append(row['ref_tox'])
        
df2 = pd.DataFrame(d)
    
df2.head()

Unnamed: 0,id,toxic,detoxified,tox_score,detox_score,similarity,length_diff
0,0,"if Alkar floods her with her mental waste, it ...","If Alkar is flooding her with psychic waste, t...",0.981983,0.014195,0.785171,0.010309
1,1,you're becoming disgusting.,Now you're getting nasty.,0.999039,0.065473,0.749687,0.071429
2,2,"well, we can spare your life.","Well, we could spare your life, for one.",0.985068,0.213313,0.919051,0.268293
3,3,"monkey, you have to wake up.","Ah! Monkey, you've got to snap out of it.",0.994215,0.053362,0.664333,0.309524
4,4,I have orders to kill her.,I've got orders to put her down.,0.999348,0.009402,0.726639,0.181818
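
The row-by-row loop above can take a while on the full file. A vectorized variant is sketched below, assuming the same column names as in `filtered.tsv` (including the misspelled `lenght_diff`) and that the first column holds the row id. `sort_by_toxicity` is a hypothetical helper name, and the two-row `sample` frame is built by hand from rows shown above, with the second row's columns swapped purely to exercise both branches.

```python
import numpy as np
import pandas as pd


def sort_by_toxicity(df):
    """Return a frame where `toxic` always holds the higher-scored text."""
    # True where the reference is the more toxic side of the pair
    ref_is_toxic = df['ref_tox'] > df['trn_tox']
    return pd.DataFrame({
        'id': df.iloc[:, 0],  # assumption: first column is the row id
        'toxic': np.where(ref_is_toxic, df['reference'], df['translation']),
        'detoxified': np.where(ref_is_toxic, df['translation'], df['reference']),
        'tox_score': df[['ref_tox', 'trn_tox']].max(axis=1),
        'detox_score': df[['ref_tox', 'trn_tox']].min(axis=1),
        'similarity': df['similarity'],
        'length_diff': df['lenght_diff'],  # source column name has a typo
    })


# Hand-made sample: row 4 as shown above, and row 1 with its sides swapped
sample = pd.DataFrame({
    'Unnamed: 0': [4, 1],
    'reference': ["I've got orders to put her down.", "you're becoming disgusting."],
    'translation': ["I have orders to kill her.", "Now you're getting nasty."],
    'similarity': [0.726639, 0.749687],
    'lenght_diff': [0.181818, 0.071429],
    'ref_tox': [0.009402, 0.999039],
    'trn_tox': [0.999348, 0.065473],
})

result = sort_by_toxicity(sample)
print(result[['toxic', 'detoxified', 'tox_score', 'detox_score']])
```

On ties the sketch behaves like the loop: the `translation` text goes to `toxic`, and both scores are equal anyway.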


Now that the dataset is correctly sorted, let's take a closer look at it.

In [4]:
df2.head(50)

Unnamed: 0,id,toxic,detoxified,tox_score,detox_score,similarity,length_diff
0,0,"if Alkar floods her with her mental waste, it ...","If Alkar is flooding her with psychic waste, t...",0.981983,0.014195,0.785171,0.010309
1,1,you're becoming disgusting.,Now you're getting nasty.,0.999039,0.065473,0.749687,0.071429
2,2,"well, we can spare your life.","Well, we could spare your life, for one.",0.985068,0.213313,0.919051,0.268293
3,3,"monkey, you have to wake up.","Ah! Monkey, you've got to snap out of it.",0.994215,0.053362,0.664333,0.309524
4,4,I have orders to kill her.,I've got orders to put her down.,0.999348,0.009402,0.726639,0.181818
5,5,I'm not gonna have a child... ...with the same...,I'm not going to breed kids with a genetic dis...,0.950956,0.035846,0.703185,0.206522
6,6,"They're all laughing at us, so we'll kick your...",they're laughing at us. We'll show you.,0.999492,0.000131,0.618866,0.230769
7,7,Maine was very short on black people back then.,there wasn't much black in Maine then.,0.96368,0.14871,0.720482,0.1875
8,8,"Briggs, what the hell is going on?","Briggs, what the hell's happening?",0.841071,0.159096,0.920373,0.0
9,9,"another simply didn't know what to do, so when...","Another one simply had no clue what to do, so ...",0.930472,0.055371,0.87754,0.101695


Looking at the pairs, some differ only by a simple word replacement, but many involve creative (and sometimes questionable) paraphrasing.
Some examples include:

| toxic | detoxified | comment |
| ---- | ----- | ----- |
| I have orders to kill her. | I've got orders to put her down. | Replacing 'kill' with 'put down'. |
| Shit, this one I can't even pronounce. | gosh, I can't even pronounce this. | Replacing 'shit' with 'gosh'. |
| Shut up, you two, 'said Granny. | 'Be quiet, you two,' said Granny. | Replacing 'shut up' with 'be quiet'. |
| I like that shit. | I love it. | Full paraphrase. |
| Funny how Nazis are always the bad guys. | why are the Nazis always the bad guys? | Replacing the statement with a question, making the sentence more neutral. |
| I'll freeze him! | Freezing him. | Replacing exclamation with a statement, making the sentence less emotional. |
| she was a killer. | It was from the killer. | Completely losing the original meaning. |
| Real life starts the first time you fuck, kid. | boy, real life starts up first. | Turning an inherently toxic sentence into a meaningless one. |
| monkey, you have to wake up.  | Ah! Monkey, you've got to snap out of it. | ??? |
| some killer! | The killer detail! | Meaning is not preserved. |

Some of these pairs appear to have been produced in the opposite direction, with the toxic text being the paraphrase rather than the original. The first letter is a hint: originals start with an uppercase letter, paraphrases with a lowercase one. Either way, the dataset has a problem with preserving meaning, so we need some way to mitigate it.
The dataset provides a cosine similarity score, but it is not very useful here: a good pair ('I like that shit.' and 'I love it.') scores 0.697344, lower than a bad pair ('The killer detail!' and 'some killer!') at 0.734141.
The other available metric, length difference, is also close to random for our task and is not worth discussing.
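
The first-letter observation can be turned into a rough check of how often the toxic side looks like the paraphrase. This is only a sketch: the rule "lowercase first letter means paraphrase" is a heuristic from eyeballing the data, not something the dataset documents, and `starts_lowercase` is a hypothetical helper name. The three pairs are copied from the table above.

```python
import pandas as pd


def starts_lowercase(texts):
    """Boolean Series: True where the first character is a lowercase letter."""
    return texts.str[:1].str.islower()


# Three pairs copied from the examples table
pairs = pd.DataFrame({
    'toxic': [
        'I have orders to kill her.',
        'monkey, you have to wake up.',
        'she was a killer.',
    ],
    'detoxified': [
        "I've got orders to put her down.",
        "Ah! Monkey, you've got to snap out of it.",
        'It was from the killer.',
    ],
})

# Assumption: a lowercase first letter marks the paraphrased side
pairs['toxic_is_paraphrase'] = starts_lowercase(pairs['toxic'])
print(pairs[['toxic', 'toxic_is_paraphrase']])
```

On the full data, `starts_lowercase(df2['toxic']).mean()` would give the share of pairs where the toxic text looks like the paraphrase under this heuristic.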

If we look at longer examples, however, the dataset keeps the meaning intact more consistently, so let's set those issues aside for now and look for an existing solution to the problem.
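
To inspect only the longer examples, the pairs can be filtered by word count. A minimal sketch follows; `longer_examples` is a hypothetical helper, and the threshold of 8 words is an arbitrary value chosen for illustration, not one taken from the dataset. The sample sentences are copied from the tables above.

```python
import pandas as pd


def longer_examples(df, min_words=8, column='toxic'):
    """Keep rows whose `column` has at least `min_words` whitespace-separated tokens."""
    word_counts = df[column].str.split().str.len()
    return df[word_counts >= min_words]


# Sample sentences copied from the tables above
examples = pd.DataFrame({
    'toxic': [
        'I have orders to kill her.',
        'Funny how Nazis are always the bad guys.',
        'some killer!',
    ],
})

kept = longer_examples(examples)
print(kept)
```

The same call on the sorted frame, `longer_examples(df2)`, returns the subset to browse with `head()`.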

In [10]:
# save the sorted dataset for further use:
df2.to_csv("processed.csv", index=False)