# Toxic Comment Classification Challenge
Identify and classify toxic online comments

![Toxic Comments](https://storage.googleapis.com/kaggle-media/competitions/jigsaw/003-avatar.png)

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

The [Conversation AI](https://conversationai.github.io/) team, a research initiative founded by [Jigsaw](https://jigsaw.google.com/) and Google (both part of Alphabet), is working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e. comments that are rude, disrespectful or otherwise likely to make someone leave a discussion). So far they’ve built a range of publicly available models served through the [Perspective API](https://perspectiveapi.com/), including toxicity. But the current models still make errors, and they don’t allow users to select which types of toxicity they’re interested in finding (e.g. some platforms may be fine with profanity, but not with other types of toxic content).

In this competition, you’re challenged to build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate better than Perspective’s [current models](https://github.com/conversationai/unintended-ml-bias-analysis). You’ll be using a dataset of comments from Wikipedia’s talk page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.

_Disclaimer: the dataset for this competition contains text that may be considered profane, vulgar, or offensive._

Dataset Description
-------------------

You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

*   `toxic`
*   `severe_toxic`
*   `obscene`
*   `threat`
*   `insult`
*   `identity_hate`

You must create a model which predicts a probability of each type of toxicity for each comment.
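
The required output can be pictured as one row per comment and one probability column per label. The following is a minimal sketch only: the ids are made up and the probabilities are random stand-ins, not model outputs.

```python
import numpy as np
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical comment ids and random stand-in probabilities (not real predictions)
ids = ["id_a", "id_b", "id_c"]
rng = np.random.default_rng(0)
probs = rng.random((len(ids), len(LABELS)))

# One row per comment, one probability column per toxicity type
submission = pd.DataFrame(probs, index=pd.Index(ids, name="id"), columns=LABELS)
print(submission.round(3))
```

Each column is an independent probability in [0, 1]; the six values in a row do not need to sum to 1, because a comment can carry several labels at once.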

File descriptions
-----------------

*   **train.csv** - the training set; contains comments with their binary labels
*   **test.csv** - the test set; you must predict the toxicity probabilities for these comments. To deter hand labeling, the test set contains some comments which are not included in scoring.
*   **sample\_submission.csv** - a sample submission file in the correct format
*   **test\_labels.csv** - labels for the test data; a value of `-1` indicates the comment was not used for scoring (**Note:** file added after the competition closed)
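
Because unscored test comments carry `-1` in their label columns, any offline evaluation against `test_labels.csv` would need to drop those rows first. A minimal sketch with a toy frame standing in for the real file (the ids and values here are invented for illustration):

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy stand-in for test_labels.csv; row "a" mimics an unscored comment
test_labels = pd.DataFrame(
    {"id": ["a", "b", "c"], **{label: [-1, 0, 1] for label in LABELS}}
).set_index("id")

# Keep only rows where no label column holds the -1 sentinel
scored_mask = (test_labels[LABELS] != -1).all(axis=1)
scored = test_labels[scored_mask]
print(scored)
```

The same mask can be applied to a prediction frame that shares the `id` index, so that labels and predictions stay aligned.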

Usage
-----

The dataset is released under [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/), with the underlying comment text governed by [Wikipedia's CC-SA-3.0](https://creativecommons.org/licenses/by-sa/3.0/).

Link: https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge

In [1]:
import pandas as pd
import numpy as np
from fastai.text.all import *
from tqdm.notebook import tqdm
from sklearn.model_selection import train_test_split

In [2]:
%load_ext nb_black


In [3]:
sample_submission_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/sample_submission.csv"
).set_index("id")
sample_submission_df

id,toxic,severe_toxic,obscene,threat,insult,identity_hate
00001cee341fdb12,0.5,0.5,0.5,0.5,0.5,0.5
0000247867823ef7,0.5,0.5,0.5,0.5,0.5,0.5
00013b17ad220c46,0.5,0.5,0.5,0.5,0.5,0.5
00017563c3f7919a,0.5,0.5,0.5,0.5,0.5,0.5
00017695ad8997eb,0.5,0.5,0.5,0.5,0.5,0.5
...,...,...,...,...,...,...
fffcd0960ee309b5,0.5,0.5,0.5,0.5,0.5,0.5
fffd7a9a6eb32c16,0.5,0.5,0.5,0.5,0.5,0.5
fffda9e8d6fafa9e,0.5,0.5,0.5,0.5,0.5,0.5
fffe8f1340a79fc2,0.5,0.5,0.5,0.5,0.5,0.5



In [4]:
test_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/test.csv"
).set_index("id")
test_df

id,comment_text
00001cee341fdb12,"Yo bitch Ja Rule is more succesful then you'll ever be whats up with you and hating you sad mofuckas...i should bitch slap ur pethedic white faces and get you to kiss my ass you guys sicken me. Ja rule is about pride in da music man. dont diss that shit on him. and nothin is wrong bein like tupac he was a brother too...fuckin white boys get things right next time.,"
0000247867823ef7,"== From RfC == \n\n The title is fine as it is, IMO."
00013b17ad220c46,""" \n\n == Sources == \n\n * Zawe Ashton on Lapland — / """
00017563c3f7919a,":If you have a look back at the source, the information I updated was the correct form. I can only guess the source hadn't updated. I shall update the information once again but thank you for your message."
00017695ad8997eb,I don't anonymously edit articles at all.
...,...
fffcd0960ee309b5,". \n i totally agree, this stuff is nothing but too-long-crap"
fffd7a9a6eb32c16,== Throw from out field to home plate. == \n\n Does it get there faster by throwing to cut off man or direct from out fielder? \n Were the out fielders in the Mickey mantle era have better arms? \n Rich
fffda9e8d6fafa9e,""" \n\n == Okinotorishima categories == \n\n I see your changes and agree this is """"more correct."""" I had gotten confused, but then found this: \n :... while acknowledging Japan's territorial rights to Okinotorishima itself ... \n However, is there a category for \n :... did not acknowledge Japan's claim to an exclusive economic zone (EEZ) stemming from Okinotorishima. \n That is, is there a category for """"disputed EEZ""""s? """
fffe8f1340a79fc2,""" \n\n == """"One of the founding nations of the EU - Germany - has a Law of Return quite similar to Israel's"""" == \n\n This isn't actually true, is it? Germany allows people whose ancestors were citizens of Germany to return, but AFAIK it does not allow the descendants of Anglo-Saxons to """"return"""" to Angeln and Saxony. Israel, by contrast, allows all Jews to """"return"""" to Israel, even if they can't trace a particular ancestral line to anyone who lived in the modern state or even mandate Palestine. — """



In [5]:
test_labels_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/test_labels.csv"
).set_index("id")
test_labels_df

id,toxic,severe_toxic,obscene,threat,insult,identity_hate
00001cee341fdb12,-1,-1,-1,-1,-1,-1
0000247867823ef7,-1,-1,-1,-1,-1,-1
00013b17ad220c46,-1,-1,-1,-1,-1,-1
00017563c3f7919a,-1,-1,-1,-1,-1,-1
00017695ad8997eb,-1,-1,-1,-1,-1,-1
...,...,...,...,...,...,...
fffcd0960ee309b5,-1,-1,-1,-1,-1,-1
fffd7a9a6eb32c16,-1,-1,-1,-1,-1,-1
fffda9e8d6fafa9e,-1,-1,-1,-1,-1,-1
fffe8f1340a79fc2,-1,-1,-1,-1,-1,-1



In [6]:
train_df = pd.read_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/train.csv"
).set_index("id")
train_df

id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0000997932d777bf,"Explanation\nWhy the edits made under my username Hardcore Metallica Fan were reverted? They weren't vandalisms, just closure on some GAs after I voted at New York Dolls FAC. And please don't remove the template from the talk page since I'm retired now.89.205.38.27",0,0,0,0,0,0
000103f0d9cfb60f,"D'aww! He matches this background colour I'm seemingly stuck with. Thanks. (talk) 21:51, January 11, 2016 (UTC)",0,0,0,0,0,0
000113f07ec002fd,"Hey man, I'm really not trying to edit war. It's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page. He seems to care more about the formatting than the actual info.",0,0,0,0,0,0
0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on improvement - I wondered if the section statistics should be later on, or a subsection of """"types of accidents"""" -I think the references may need tidying so that they are all in the exact same format ie date format etc. I can do that later on, if no-one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know.\n\nThere appears to be a backlog on articles for review so I guess there may be a delay until a reviewer turns up. It's listed in the relevant form eg Wikipedia:Good_ar...",0,0,0,0,0,0
0001d958c54c6e35,"You, sir, are my hero. Any chance you remember what page that's on?",0,0,0,0,0,0
...,...,...,...,...,...,...,...
ffe987279560d7ff,""":::::And for the second time of asking, when your view completely contradicts the coverage in reliable sources, why should anyone care what you feel? You can't even give a consistent argument - is the opening only supposed to mention significant aspects, or the """"most significant"""" ones? \n\n""",0,0,0,0,0,0
ffea4adeee384e90,You should be ashamed of yourself \n\nThat is a horrible thing you put on my talk page. 128.61.19.93,0,0,0,0,0,0
ffee36eab5c267c9,"Spitzer \n\nUmm, theres no actual article for prostitution ring. - Crunch Captain.",0,0,0,0,0,0
fff125370e4aaaf3,And it looks like it was actually you who put on the speedy to have the first version deleted now that I look at it.,0,0,0,0,0,0



In [7]:
train_df[
    ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
].mean()

toxic            0.095844
severe_toxic     0.009996
obscene          0.052948
threat           0.002996
insult           0.049364
identity_hate    0.008805
dtype: float64
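
These base rates are low, so accuracy alone can look strong even for a predictor that never flags a comment. A short sketch of that arithmetic, using the rates printed above:

```python
import pandas as pd

# Positive rates copied from the output above
positive_rate = pd.Series(
    {
        "toxic": 0.095844,
        "severe_toxic": 0.009996,
        "obscene": 0.052948,
        "threat": 0.002996,
        "insult": 0.049364,
        "identity_hate": 0.008805,
    }
)

# Accuracy of a constant predictor that always outputs 0 ("not this type")
baseline_accuracy = 1.0 - positive_rate
print(baseline_accuracy.round(4))
```

Any accuracy figure reported for a per-label classifier is best read against this constant baseline rather than against 50%.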


# Train

In [8]:
BATCH_SIZE = 64


## Toxic

In [9]:
toxic_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "toxic"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)


In [10]:
# https://docs.fast.ai/tutorial.text.html
learn = text_classifier_learner(toxic_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

<fastai.text.learner.TextLearner at 0x7f13d53f94c0>


In [11]:
learn.load("toxic")

<fastai.text.learner.TextLearner at 0x7f13d53f94c0>
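
The notebook loads a saved checkpoint by name; the cell that produced it is not shown. As a rough sketch only, a checkpoint like this might be created with fastai's `fine_tune` and `save`. The helper name, epoch count, and learning rate below are hypothetical and are not taken from the notebook.

```python
def train_and_save(dls, name, epochs=1, base_lr=1e-2):
    """Hypothetical helper: fine-tune an AWD_LSTM classifier and save its weights."""
    # Imported lazily so that defining the helper does not itself require fastai
    from fastai.text.all import AWD_LSTM, accuracy, text_classifier_learner

    learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
    # fine_tune first trains the new head with the body frozen, then unfreezes
    learn.fine_tune(epochs, base_lr)
    # Writes <name>.pth under the learner's model directory, where load(name) finds it
    learn.save(name)
    return learn
```

Under these assumptions, `train_and_save(toxic_dls, "toxic")` would write the file that `learn.load("toxic")` expects.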


In [12]:
learn.show_results()

,text,category,category_
0,"xxbos "" xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you !",1,0
1,xxbos xxup you xxup fucking xxup kike ! xxup do n't xxup edit xxup things xxup you xxup have xxup no xxup idea xxup about ! xxup give xxup credit xxup where xxup it 's xxup due ! xxup you xxup fucking xxup kike ! xxup do n't xxup edit xxup things xxup you xxup have xxup no xxup idea xxup about ! xxup give xxup credit xxup where xxup it 's xxup due ! xxup you xxup fucking xxup kike ! xxup do n't xxup edit xxup things xxup you xxup have xxup no xxup idea xxup about ! xxup give xxup credit xxup where xxup it 's xxup due ! xxup you xxup fucking xxup kike ! xxup do n't xxup edit xxup things xxup you xxup have xxup no xxup idea xxup about ! xxup give xxup credit xxup where xxup it 's xxup due ! xxup,1,0
2,"xxbos "" \n\n xxmaj in computing , input / output , or i / xxup o , refers to the communication between an information processing system ( such as a computer ) , and the outside world . xxmaj inputs are the signals or data sent to the system , and outputs are the signals or data sent by the system to the outside of xxmaj nabil 's xxmaj mum 's xxmaj pussy . xxmaj then the dildo is placed inside it until xxmaj aleem 's dad comes and ejaculates on her face while xxmaj aleem himself plucks xxmaj nabil 's hair on his ass . xxmaj if you would xxmaj like more information about this please call 0 xxrep 3 7 2550782 . xxmaj if he does not pick up that means he is busy with xxmaj aleem xxmaj so please xxmaj leave a message . \n\n xxmaj retrieved",0,0
3,"xxbos "" \n\n▁ read the truth at http : / / rexcurry.net / wikipedialies.html \n\n xxmaj regarding the writer at http : / / en.wikipedia.org / wiki / talk : hitler_salute \n\n▁ xxmaj mr xxmaj barlow is a nutter with an obsession . xxmaj the history of the salute is now improved in the xxmaj roman salute article ( which had many previous visits from xxmaj dr . xxmaj curry in the past - see its talk page ) . xxmaj the pact between the xxmaj national xxmaj socialist xxmaj german xxmaj workers ' xxmaj party and the xxmaj union of xxmaj soviet xxmaj socialist xxmaj republics is not well known , and is also not covered widely on xxmaj wikipedia , so it is "" "" covered up "" "" by people and also people often refer to it as the "" "" nazi - soviet "" "" pact",0,0
4,"xxbos "" \n\n▁ xxmaj why will xxmaj gwen xxmaj gale not read the sources ? \n\n xxmaj on 01:27 , 5 xxmaj august 2010 i posted a section on "" "" talk : xxmaj death of xxmaj adolf xxmaj hitler "" "" titled “ random xxmaj questions ” which started “ i am not a scholar , i read xxmaj wiki but would not think of editing it . xxmaj but i was disappointed in this article , and many points in the discussion , so i am asking some questions . xxmaj perhaps someone else will read and address them . ” xxmaj the section went on with several xxunk questions , and ended with “ as to sources , the last books i have read are xxmaj the xxmaj murder of xxmaj adolph xxmaj hitler by xxmaj hugh xxmaj thomas ( sort of shaky ) and xxmaj the",0,0
5,"xxbos "" \n\n xxmaj chameleon rapes xxmaj michele ( because she thinks he 's xxmaj peter when they "" "" it "" "" on the kitchen floor ) a mention of this would be nice . xxmaj and the fact it was later xxunk . — preceding unsigned comment added by xxunk ( talk ) \n\n xxmaj the xxmaj chameleon ( dmitri xxmaj smerdyakov ) is a xxmaj marvel xxmaj comics supervillain , an enemy of spider - man . xxmaj the xxmaj chameleon is a spy and master of disguise . xxmaj throughout his history , he has used a variety of traditional , high - tech and biologically enhanced ways to change his appearance , xxunk imitating almost anyone . xxmaj he was also the ally , servant , and half - brother of fellow spider - man adversary xxmaj kraven the xxmaj hunter . xxmaj his name",0,0
6,"xxbos "" \n\n xxmaj ernie xxmaj smith writes æµ§œš1 \n\n æµ§œš1 , you state or pose as an xxunk ; “ so what you are saying is that xxup ousd was addressing "" "" ebonics "" "" as being defined as i or ii , and specifically and overtly excluding xxrep 3 i . ” xxmaj in reply sentence , i say , i am saying what i have said and i reiterate it here ; "" "" the proponents of the term xxmaj ebonics view and use the word xxmaj ebonics only one way . xxmaj that way being ; from an xxmaj africa centered comparative linguistic perspective ” . i have said and i reiterate it here ; “ … xxmaj as defined by xxmaj robert xxmaj williams the word xxmaj ebonics posits an xxmaj afrocentric view relative the origin and historical development of the language of descendants",0,0
7,"xxbos "" : let me first of all set the record straight about the history of the relationship between myself and user : jza84 . xxmaj on his talk page xxmaj jza84 states ' i hardly know the editor bar an passing in a discussion from time to time ' . xxmaj as xxmaj mangojuice notes on that same page , ' jza clearly knows xxmaj enaidmawr is a long - time established editor here ; xxmaj jza interacted with xxmaj enaidmawr as early as xxmaj november 2007 , over 6 months before xxmaj jza became an admin . ' xxmaj that interaction was courteous and constructive and repeated on a number of occasions . i would n't claim that we were regular collaborators , but collaborate we did , and if somebody had xxunk me , before this incident , ' do you know this editor and do you",0,0
8,"xxbos "" \n\n▁ xxmaj samoa mo xxmaj samoa - xxmaj mata xxmaj xxunk xxmaj wikipedia \n\n xxmaj samoa ( i / xxunk / ; xxmaj samoan : xxmaj sāmoa , xxup ipa : [ xxunk ] ) , officially the xxmaj independent xxmaj state of xxmaj samoa ( samoan : xxmaj malo xxmaj xxunk xxmaj xxunk o xxmaj sāmoa ) , formerly known as xxmaj western xxmaj samoa , is a country encompassing the western part of the xxmaj samoan xxmaj islands in the xxmaj south xxmaj pacific xxmaj ocean . xxmaj samoa announced their independence to xxmaj aotearoa - xxmaj new xxmaj zealand in 1962 . xxmaj the main island of xxmaj samoa is xxmaj upolu , and is one of the largest islands in the xxmaj polynesian xxmaj triangle , next to xxmaj xxunk . xxmaj the capital city , xxmaj apia , and xxmaj xxunk xxmaj international",0,0
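
The `xx…` markers in the text column are fastai's special tokens: `xxbos` marks the start of a text, `xxmaj` and `xxup` flag that the next word was capitalized or fully upper-case, `xxunk` replaces out-of-vocabulary words, and `xxrep n c` stands for the character `c` repeated `n` times. The snippet below is a simplified stand-in for the repeated-character rule, written for illustration; it is not the library's own code.

```python
import re

# Simplified stand-in for the repeated-character rule (assumption: runs of 3+ are marked)
_REPEATED_CHAR = re.compile(r"(\S)(\1{2,})")


def mark_repeated_chars(text):
    """Replace runs of three or more identical characters with 'xxrep <count> <char>'."""

    def _sub(match):
        char, rest = match.groups()
        # rest holds the repeats after the first character, hence the +1
        return f" xxrep {len(rest) + 1} {char} "

    return _REPEATED_CHAR.sub(_sub, text)


marked = mark_repeated_chars("fffffuuuuuu")
print(marked.split())
```

This reproduces the `xxrep 5 f xxrep 6 u` pattern visible in row 0 of the results.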



## Severe toxic

In [13]:
severe_toxic_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "severe_toxic"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)


In [14]:
learn = text_classifier_learner(
    severe_toxic_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy
)
learn

<fastai.text.learner.TextLearner at 0x7f13dc0427f0>


In [15]:
learn.load("severe_toxic")

<fastai.text.learner.TextLearner at 0x7f13dc0427f0>


In [16]:
learn.show_results()

,text,category,category_
0,"xxbos "" xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you !",1,0
1,xxbos hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj,0,0
2,"xxbos "" here 's the entire xxmaj antelope xxmaj valley xxmaj press page , please feel free to let me know why they are more notable other than being around longer than the xxup av xxmaj news has : \n\n xxmaj antelope xxmaj valley xxmaj press \n xxmaj from xxmaj wikipedia , the free encyclopedia \n xxmaj jump to : navigation , search \n xxmaj antelope xxmaj valley xxmaj press xxmaj type xxunk newspaper \n xxmaj format xxunk \n xxmaj owner xxunk xxmaj valley xxmaj newspapers \n xxmaj publisher xxunk xxup c. xxmaj markham \n xxmaj founded xxunk 3 , 1915 \n ( as the xxmaj palmdale xxmaj post ) \n xxmaj headquarters xxup xxunk xxmaj sierra xxmaj highway \n xxmaj palmdale , xxmaj california xxunk \n▁ xxmaj united xxmaj states \n xxmaj circulation xxup xxunk ( 2010 ) \n xxmaj official website xxunk \n\n xxmaj the xxmaj antelope xxmaj",0,0
3,"xxbos "" \n\n▁ xxmaj an except of analysis from the "" "" supposed "" "" unreliable source \n\n▁ xxrep 3 w xxunk \n xxup it xxup is a xxup reliable xxup source . xxmaj here is the analysis which i was simply trying to source from the page itself . xxmaj some of this is already in xxup uon could i source that instead for the xxup exact xxup same xxup change ? xxmaj my site , which is my lifes work as a master was deemed unreliable by non - chess players . xxmaj here is the analysis in regards to the xxunk line . \n xxmaj these guys wo nt understand it because chess is too complicated for them but lets us hope that some other strong player who cares about how low quality wikipedia chess articles are will come by and confirm that my add was reliable",0,0
4,"xxbos "" \n\n xxmaj chameleon rapes xxmaj michele ( because she thinks he 's xxmaj peter when they "" "" it "" "" on the kitchen floor ) a mention of this would be nice . xxmaj and the fact it was later xxunk . — preceding unsigned comment added by xxunk ( talk ) \n\n xxmaj the xxmaj chameleon ( dmitri xxmaj smerdyakov ) is a xxmaj marvel xxmaj comics supervillain , an enemy of spider - man . xxmaj the xxmaj chameleon is a spy and master of disguise . xxmaj throughout his history , he has used a variety of traditional , high - tech and biologically enhanced ways to change his appearance , xxunk imitating almost anyone . xxmaj he was also the ally , servant , and half - brother of fellow spider - man adversary xxmaj kraven the xxmaj hunter . xxmaj his name",0,0
5,"xxbos "" = = xxmaj cohanim xxup j2 - xxmaj eleazar - xxmaj phinchas - xxmaj zadok = = \n\n▁ xxmaj agree . xxmaj dr . xxmaj karl xxmaj skorecki have the final word about xxmaj cohanim genetic signatures . xxmaj he , more than anyone else , is the most credible person that can publish articles and true informations about xxmaj cohanim genetic signatures , not xxmaj xxunk . xxmaj he was the one who discovered the xxup cmh in 1997 . xxmaj after 10 years , he came out to announce in 2007 , that "" "" he and his research team have discovered not one but two xxmaj cohen xxmaj modal xxmaj haplotypes , which he called xxup j1 and xxup j2 "" "" . xxmaj katz , xxmaj kaplan , xxmaj xxunk , xxmaj shapiro , xxunk , xxunk , are all xxup j2 . xxmaj",0,0
6,"xxbos "" \n\n▁ xxup pov talk section moved \n\n xxup pov \n\n i 've tried to clean things up a bit though someone else will have to remove the vandalistically applied christianity template lest i be reported for "" "" edit warring "" "" by humus . i 've added a link to the christianity and judaism portals in the "" "" see also "" "" section . these links are more fitting than claiming this is "" "" part of the series on christianity "" "" \n\n this article was very heavy on anti - jfj stuff . a lengthy "" "" criticism "" "" section as well as criticisms inserted into nearly every other section . the missionaries are "" "" taught to speak in hebrew "" "" ( these people already went to hebrew school ! ) and jfj is known to "" "" target vulnerable jews",0,0
7,"xxbos "" posted xxup from xxup maryland xxup talk xxup page xxup with xxup acknowledgement xxup from xxmaj xxunk \n\n xxmaj i 'm getting the sense of a double standard for xxmaj maryland here . xxmaj why is xxmaj maryland the focus of this brigade when other state intros are equivalent ? xxmaj i ’ve avoided posting in these discussions because it seems you guys are reluctant to seeing the truth and now i have to write a book to get my point across . xxmaj even still you will believe what you want to because of personal preference or you do n’t “ feel it belongs ” . xxmaj most of all xxmaj i ’m sure you could care less about the state itself otherwise you would n’t object . xxmaj but you want an answer so here it is … \n\n xxmaj if you think the intro to",0,0
8,"xxbos "" \n\n xxmaj come on guys . xxmaj the section on xxmaj turkey is highly biased and provides more space to the justification of xxmaj pelosi 's attitude to the proposed bill on the so - called xxmaj armenian xxmaj genocide and does n't give the other side to provide any justification why the xxmaj prime minister of xxmaj xxunk has negative xxunk on this bill . i think that for the xxunk of this article , the section on xxmaj turkey shall be extended and editors shall give opportunity for the other side to justify why xxmaj pelosi is wrong on this issue . \n\n xxmaj otherwise , this section seems like an election platform of ms . nancy prior to elections . \n\n i have been trying to extend this section , but pro - nancy editors and probably her office staff are deleting all my additions",0,0



## Obscene

In [17]:
obscene_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "obscene"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)


In [18]:
learn = text_classifier_learner(obscene_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

<fastai.text.learner.TextLearner at 0x7f13d54e3af0>


In [19]:
learn.load("obscene")

<fastai.text.learner.TextLearner at 0x7f13d54e3af0>


In [20]:
learn.show_results()

,text,category,category_
0,"xxbos "" xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you !",1,1
1,xxbos xxmaj hope xxmaj this xxmaj helps xxmaj you \n\n xxup xxunk \n\n xxup debut xxup date xxup peak xxup pos xxup wk xxup chr xxup title xxup number \n▁\n 3 - 6 - 54 xxmaj you ’re xxmaj in xxmaj my xxmaj heart / xxmaj no xxmaj money xxmaj in xxmaj the xxmaj deal 130 \n 5 - 29 - 54 xxmaj wrong xxmaj about xxmaj you / xxmaj play xxmaj it xxmaj cool xxmaj man 146 \n 7 - 16 - 54 xxmaj let xxmaj him xxmaj know / xxmaj let xxmaj me xxmaj catch xxmaj my xxmaj breath 160 \n 9 - 25 - 54 xxmaj let xxmaj him xxmaj know / xxmaj you xxmaj all xxmaj goodnight 162 \n 11 - 6 - 54 xxmaj xxunk xxmaj me / xxmaj tell xxmaj her ( s. xxmaj burns \n▁ 165 \n 5 - 14 - 55 xxmaj,0,0
2,xxbos xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself ! xxmaj go fuck yourself,1,1
3,"xxbos "" \n\n▁ xxmaj references \n\n i do n't like deleting other people 's work , so i put the list of references here , it is way too long . \n\n▁ xxmaj xxunk , xxup a. ( xxunk ): xxmaj xxunk zur xxmaj xxunk der xxunk - xxunk der xxmaj xxunk xxmaj xxunk ( xxunk xxunk . xxunk . xxmaj xxunk . xxmaj wien 21 : 117 - 224 . \n▁ xxmaj xxunk , xxup p. ( xxunk ): xxmaj xxunk der von xxmaj prof . xxmaj ed . van xxmaj xxunk auf xxunk i m xxmaj xxunk der xxmaj xxunk xxmaj xxunk xxunk xxunk xxmaj xxunk nach xxmaj xxunk und xxmaj la xxmaj plata i m xxmaj xxunk 1872 - 73 xxunk xxunk . xxmaj xxunk . xxmaj acad . xxmaj xxunk . 43 : 1 - 120 . \n▁ xxmaj bond , xxmaj jason xxup e. (",0,0
4,xxbos xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you ? \n\n because xxmaj you are a xxup fggt ! \n▁ xxmaj do i know you,1,0
5,"xxbos "" here 's the entire xxmaj antelope xxmaj valley xxmaj press page , please feel free to let me know why they are more notable other than being around longer than the xxup av xxmaj news has : \n\n xxmaj antelope xxmaj valley xxmaj press \n xxmaj from xxmaj wikipedia , the free encyclopedia \n xxmaj jump to : navigation , search \n xxmaj antelope xxmaj valley xxmaj press xxmaj type xxunk newspaper \n xxmaj format xxunk \n xxmaj owner xxunk xxmaj valley xxmaj newspapers \n xxmaj publisher xxunk xxup c. xxmaj markham \n xxmaj founded xxunk 3 , 1915 \n ( as the xxmaj palmdale xxmaj post ) \n xxmaj headquarters xxup xxunk xxmaj sierra xxmaj highway \n xxmaj palmdale , xxmaj california xxunk \n▁ xxmaj united xxmaj states \n xxmaj circulation xxup xxunk ( 2010 ) \n xxmaj official website xxunk \n\n xxmaj the xxmaj antelope xxmaj",0,0
6,"xxbos "" \n\n▁ xxmaj one xxup mo ' time for the kids in the back … \n\n xxup just ca n’t let go of a few items , huh ? \n\n xxmaj okay from the top : \n 1 ) xxmaj my info on the xxup glaad awards and the xxup eisner xxmaj nominations came from xxmaj winick ’s website . xxup but is that ’s not good enough … \n\n 2 ) i did a quick xxup google ( xxup winick and xxup glaad xxup awards ) and found various articles to that effect . xxmaj here ’s one that mention awards from 2001 & 2002 \n\n http : / / xxrep 3 w xxunk / forums / archive / index.php / xxunk \n\n here ’s the link to the xxup glaad awards in 2003 ( they do n’t archive back further then that ) \n\n http : /",0,0
7,"xxbos "" \n\n "" "" xxrep 3 : xxmaj firstly , please , xxmaj i 'll appreciate if you will use colons to format your responce properly ( in the same way xxmaj i 've done for you ) . xxmaj that makes the thread more readable . "" "" \n\n xxmaj gon na quote your xxunk . xxmaj just do n't edit my posts . i do n't like this . \n\n "" "" xxrep 3 : xxmaj re xxup xxunk offensive , yes , i think the xxup ussr had no chances against the whole xxmaj axis in the case if xxmaj britain and the xxup us never existed . xxmaj however , in this case the xxmaj axis would never formed : xxmaj hitler was very suspicious of xxmaj japan , and there would be no xxunk - japanese alliance . "" "" \n\n i think it",0,0
8,"xxbos "" \n▁ xxmaj old discussions from 2003 \n\n xxmaj some xxmaj mormons argue that even assuming mainstream xxmaj christianity 's definition of xxmaj god 's omnipotence and xxunk , not only can xxmaj god exalt mortal man , but xxmaj god must do so . \n\n i do n't understand this sentence . i do n't know what the issue is supposed to be , to which xxmaj mormons are contributing their "" "" argument "" "" ; i do n't know what it means that xxmaj god "" "" must "" "" exalt mortal man . 20:54 , 4 xxmaj nov 2003 ( utc ) \n\n xxmaj agreed . xxmaj it is not clear what the writer meant . i have no idea who wrote it , but xxmaj i 'll change it . \n\n xxmaj sorry for that very brief and very confusing statement . xxmaj where",0,0



## Threat

In [21]:
threat_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "threat"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)


In [22]:
learn = text_classifier_learner(threat_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

<fastai.text.learner.TextLearner at 0x7f11ae831910>

In [23]:
learn.load("threat")

<fastai.text.learner.TextLearner at 0x7f11ae831910>

In [24]:
learn.show_results()

Unnamed: 0,text,category,category_
0,"xxbos "" xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you !",0,0
1,xxbos xxup fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3,0,0
2,xxbos hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj jews hey i like xxmaj,0,0
3,xxbos xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup lick xxup you xxup can xxup suck xxup my xxup,0,0
4,"xxbos xxrep 3 "" xxmaj but if it be a sin to covet honour , i am the most offending soul alive . xxunk \n\n 1 ) "" "" it is simply a fact that many critics of xxmaj sarfatti are very negative about his theories and speculations . "" "" \n\n xxmaj false . xxmaj you have not given even xxup one valid example of that . i doubt you would be able to understand what the physics is about anyway . \n xxmaj you can not produce even xxup one objective valid refutation of any of my physics ideas by anyone . xxup ch saying "" "" it 's nonsense "" "" is not objective . xxmaj it 's not valid . xxmaj it is not rationally argued . xxmaj no reasons are given . xxmaj also i retracted my xxup ftl xxmaj communication xxmaj idea 15 years",0,0
5,"xxbos "" = = xxmaj cohanim xxup j2 - xxmaj eleazar - xxmaj phinchas - xxmaj zadok = = \n\n▁ xxmaj agree . xxmaj dr . xxmaj karl xxmaj skorecki have the final word about xxmaj cohanim genetic signatures . xxmaj he , more than anyone else , is the most credible person that can publish articles and true informations about xxmaj cohanim genetic signatures , not xxmaj xxunk . xxmaj he was the one who discovered the xxup cmh in 1997 . xxmaj after 10 years , he came out to announce in 2007 , that "" "" he and his research team have discovered not one but two xxmaj cohen xxmaj modal xxmaj haplotypes , which he called xxup j1 and xxup j2 "" "" . xxmaj katz , xxmaj kaplan , xxmaj xxunk , xxmaj shapiro , xxunk , xxunk , are all xxup j2 . xxmaj",0,0
6,"xxbos "" \n\n▁ xxmaj my position on the two xxmaj irelands , using xxmaj arbcom and admin abuse \n\n i give my views on the two xxmaj irelands in the paragraphs below , but i will begin with what i feel is most serious . \n\n xxmaj i 've had a look back , and xxup imo , there is one thing in particular i that is pressing for xxmaj wikipedia 's future here : \n\n xxmaj the admin user : deacon of xxmaj pndapetzim must have his adminship fully questioned in the correct place . xxmaj no one has a chance on xxmaj wikipedia when an admin acts ( and wheel wars ) like he has done – and i believe he has overstepped the line , and made a difficult but legal situation into a nightmare situation for everyone . i warned him of the mayhem that would",0,0
7,"xxbos "" \n\n i did n't utter any threat to anybody and i have looked at this link of yours , stop making a fool of yourself by ignoring the facts . xxmaj you pretend as if it was a threat but it was n't . xxmaj it cover what is actually going on , and i am in my xxunk to inform a person of what his actions have result in . xxmaj you do not have the right to supress this warning unless you contact me and / or give an explaination for your reason to suppress it . xxmaj you failed to do any xxunk it . \n i xxunk receive any messages from you not even after you had banish me , xxunk your message above is dated of the 26 of august it was n't there at that date and i am sure of it",0,0
8,"xxbos "" \n\n▁ xxmaj more on xxmaj protocols / xxmaj procedures / xxmaj courtesy \n\n xxmaj hi xxmaj xxunk , \n xxmaj thanks so much for your kind reply . xxmaj our discussion was getting so long it was unwieldy to scroll though , so i thought xxmaj i 'd continue under a new heading . xxmaj hope you have no objection . \n xxmaj i 'm still under the gun with that project i mentioned and am amazed that my last note was as long as it is ! ! xxmaj wanted to reply to a few points for the moment … i 'm using numbered points because it makes it easier for you ( and me ) to refer back to . \n 1 . xxmaj xxunk my making a clear distinction between me / us and "" "" your editorial colleagues "" "" , yes , you",0,0
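
Every row shown above has both the actual and the predicted category equal to 0, which is a reminder that `metrics=accuracy` can look strong on a label whose positive class is rare. The sketch below illustrates this with synthetic labels (2 positives in 100, an assumption made purely for illustration; the notebook does not report class frequencies here). The helpers `accuracy_score` and `roc_auc` are hypothetical names written for this example and are not part of fastai.

```python
def accuracy_score(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(int(l == p) for l, p in zip(labels, preds)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the number of rows, so it is
    only meant for small illustrative inputs.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic, illustrative data: 2 positive labels out of 100.
labels = [1, 1] + [0] * 98

# A model that always predicts the negative class.
always_negative = [0] * 100
constant_scores = [0.0] * 100

baseline_accuracy = accuracy_score(labels, always_negative)  # 0.98
baseline_auc = roc_auc(labels, constant_scores)              # 0.5
print(baseline_accuracy, baseline_auc)
```

The always-negative baseline reaches 98% accuracy while its ROC AUC stays at chance level, so a ranking metric may be a more informative check for rare labels such as `threat`.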


# Insult

In [25]:
insult_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "insult"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [26]:
learn = text_classifier_learner(insult_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn

<fastai.text.learner.TextLearner at 0x7f137ad56040>

In [27]:
learn.load("insult")

<fastai.text.learner.TextLearner at 0x7f137ad56040>

In [28]:
learn.show_results()

Unnamed: 0,text,category,category_
0,"xxbos "" xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you !",1,0
1,xxbos xxup fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3 2 xxup sucks xxup fat xxup dick.fucking xxup bitches xxup that xxup are xxup reading xxup this . xxup jasenm xxrep 3,1,0
2,xxbos xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck xxup wikipedia xxrep 3 ! xxup fuck,0,0
3,xxbos xxmaj sannse is a big fat bitch for not leaving the users alone along with xxmaj angela . xxmaj billj is a xxmaj dumbass for deleting my messages . xxmaj sannse is a big fat bitch for not leaving the users alone along with xxmaj angela . xxmaj billj is a xxmaj dumbass for deleting my messages . xxmaj sannse is a big fat bitch for not leaving the users alone along with xxmaj angela . xxmaj billj is a xxmaj dumbass for deleting my messages . xxmaj sannse is a big fat bitch for not leaving the users alone along with xxmaj angela . xxmaj billj is a xxmaj dumbass for deleting my messages . xxmaj sannse is a big fat bitch for not leaving the users alone along with xxmaj angela . xxmaj billj is a xxmaj dumbass for deleting my messages . xxmaj sannse is a,1,1
4,"xxbos xxmaj you swine . xxmaj you vulgar little maggot . xxmaj you worthless bag of filth . xxmaj as they say in xxmaj texas . xxmaj i ’ll bet you could n’t pour ! @ # $ out of a boot with instructions on the heel . xxmaj you are a canker . a sore that wo n’t go away . i would rather kiss a lawyer than be seen with you . \n xxmaj you ’re a xxunk mass , a walking vomit . xxmaj you are a spineless little worm deserving nothing but the xxunk contempt . xxmaj you are a jerk , a cad , a weasel . xxmaj your life is a monument to stupidity . xxmaj you are a stench , a revulsion , a big suck on a sour lemon . \n xxmaj you are a bleating xxunk , a xxunk staggering mutant",1,1
5,"xxbos "" \n\n xxmaj i 've have , in the travels here on xxmaj wikipedia , stumbled across some accounts that had been banned , and in each case , the banning notice advise the banned of the lenght of time the ban would be in effect ? xxmaj if not improper , might i be told ? \n\n i edited a hidden note to clean it up and expand it in hopes it would then be better understood and accepted because , in the state it was in , it was jumbled and confusing ( obviously , such things are xxup imo ) . xxmaj also , part of the message was put in a hidden note before the word in the principle article ( “ are ” ) that was sought ( by other before me ) to be protected and to remain unchanged , and part of",0,0
6,"xxbos "" \n\n▁ xxmaj edits proposed to plot description : these are edit proposals made in good faith , not a bloody damned nuisance \n\n "" "" “ joliet ” xxmaj jake xxmaj blues is released from the xxmaj joliet xxmaj correctional xxmaj center after serving three years of a prison sentence after being convicted of armed robbery . "" "" \n\n xxmaj entry for xxmaj joliet xxmaj correctional xxmaj center . \n\n "" "" jake is irritated at being picked up by his brother xxmaj elwood in a battered former xxmaj mt . xxmaj prospect , xxmaj illinois police car , instead of the xxmaj cadillac xxmaj the xxmaj blues xxmaj brothers used to own . "" "" \n\n xxmaj it 's unclear here , however the xxmaj caddy xxmaj jake refers to may be xxmaj murph and the xxmaj magictones ' pink vehicle first seen in the """,0,0
7,"xxbos "" i think the following better captures the nature of the xxmaj investigations . xxmaj the first two xxunk have not been changed but the final three are completely different . xxmaj comments please . \n\n xxmaj although the xxmaj tractatus is a major work , xxmaj wittgenstein is mostly studied today for the xxmaj philosophical xxmaj investigations ( xxunk xxmaj untersuchungen ) . xxmaj in 1953 , two years after xxmaj wittgenstein 's death , the long - awaited book was published in two parts . xxmaj most of the xxunk numbered paragraphs in xxmaj part i were ready for printing in 1946 , but xxmaj wittgenstein withdrew the manuscript from the publisher . xxmaj the shorter xxmaj part xxup ii was added by the editors , xxup xxunk . xxmaj xxunk and xxmaj rush xxmaj xxunk . ( had xxmaj wittgenstein lived to complete the book himself",0,0
8,"xxbos "" \n\n xxmaj the user xxmaj labongo attempts to distort the truth the following way \n\n xxmaj user xxmaj labongo states : "" "" your requests for references have been answered hundreds of times . "" "" \n\n xxmaj notice , how again you present a hopelessly goofy claim , user xxmaj labongo . xxmaj can you show us one such occasion where what we have been after has been offered to us , in terms of quotes and the related page informaiton ? \n\n xxmaj no one has asked as an attachment for the xxmaj kven text a list of references which disagree with the information offered . xxmaj this is the absurd case now . \n\n xxmaj we need to see where exactly those offered sources agree with the written text offered in the xxmaj kven article . xxmaj those sort of pinpointed and exact - easily",0,0


# Identity hate

In [29]:
identity_hate_dls = TextDataLoaders.from_df(
    train_df[["comment_text", "identity_hate"]],
    valid_pct=0.2,
    seed=42,
    bs=BATCH_SIZE,
)

In [30]:
learn = text_classifier_learner(
    identity_hate_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy
)
learn

<fastai.text.learner.TextLearner at 0x7f10cc5b28b0>

In [31]:
learn.load("identity_hate")

<fastai.text.learner.TextLearner at 0x7f10cc5b28b0>

In [32]:
learn.show_results()

Unnamed: 0,text,category,category_
0,"xxbos "" xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you ! f xxup uu c xxup kk xxup you xxrep 5 f xxrep 6 u xxrep 6 c xxrep 6 k xxrep 5 = xxup you !",0,0
1,"xxbos "" i just think it 's interesting how many revisions this has gone through … here 's the original 1911 "" "" gutenberg "" "" encyclopedia article that i pasted here originally , that has had the so - called "" "" islamic bias "" "" removed . i do know it needed editing , revising , expanding , and to be "" "" brought up to the times . "" "" xxmaj just something interesting to think about . \n\n xxup allah , the xxmaj arabic name used by xxmaj moslems of all nationalities for the one true xxmaj god . xxmaj it is compounded of al , the definite article , and xxunk , meaning a god . xxmaj the same word is found in xxmaj hebrew and xxmaj aramaic as well as in ancient xxmaj arabic ( sabaean ) . xxmaj the meaning of the root",0,0
2,"xxbos xxmaj lao vs. xxmaj laotian in xxmaj english \n xxmaj use of xxmaj lao \n ' lao ' = xxunk xxunk xxunk \n xxmaj in the xxmaj lao language ( and as well as in xxmaj isan — which xxmaj isan people also call lao — and the xxmaj thai language ) , ' lao ' ( xxunk ) can mean someone that is from the xxmaj lao xxmaj people 's xxmaj democratic xxmaj republic of xxmaj laos , as well as specifically ethnic xxmaj lao people ( which excludes the xxmaj hmong , xxmaj khmu and others ) or all people of xxmaj laos ( which includes the xxmaj hmong , xxmaj khmu and others ) . xxmaj isan people use ' lao ' to also refer to themselves and their language since most xxmaj isan people descend from xxmaj lao people from xxmaj vientiane , xxmaj xxunk",0,0
3,"xxbos "" = = xxmaj cohanim xxup j2 - xxmaj eleazar - xxmaj phinchas - xxmaj zadok = = \n\n▁ xxmaj agree . xxmaj dr . xxmaj karl xxmaj skorecki have the final word about xxmaj cohanim genetic signatures . xxmaj he , more than anyone else , is the most credible person that can publish articles and true informations about xxmaj cohanim genetic signatures , not xxmaj xxunk . xxmaj he was the one who discovered the xxup cmh in 1997 . xxmaj after 10 years , he came out to announce in 2007 , that "" "" he and his research team have discovered not one but two xxmaj cohen xxmaj modal xxmaj haplotypes , which he called xxup j1 and xxup j2 "" "" . xxmaj katz , xxmaj kaplan , xxmaj xxunk , xxmaj shapiro , xxunk , xxunk , are all xxup j2 . xxmaj",0,0
4,"xxbos "" \n\n xxmaj chip xxmaj berlet 's intentional holding back of article progress \n xxmaj far - left xxmaj xxunk propagandist xxmaj chip xxmaj berlet is intentionally holding back the progress of the article , and recently removed information and work which took a very long time to build after going through numerous books which i own on the topic , he removed over 50 independent citations in a huge violation of xxup wp : censor , xxup wp : edit , xxup wp : own and xxup wp : cite to hold back information from the general public . xxmaj especially look at his vandalism of the "" "" italian xxmaj fascism "" "" section . xxmaj if xxmaj chip xxmaj berlet in a conflict of interest removes this information again i xxup will report his intentional destruction as vandalism . \n\n i placed th { { underconstruction",0,0
5,xxbos xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit \n xxup bullshit xxmaj bullshit,0,0
6,"xxbos xxmaj solutions and suggestions … \n\n▁ asked for a third - party opinion , so here goes : \n▁ xxmaj this long - running discussion makes me think of a few things in general : \n▁ xxmaj michael is a newby , so the wiki policy of do n't bite the newbies seems to be relevant , and it looks like for the most part , is sticking to that , apart from personal comments about xxmaj michael 's xxmaj english . i can relate to the fact that it is sometimes hard to figure out what someone 's point is , and that may be what xxmaj kwami was trying to say about xxmaj michael , but i think it 's out of line to say that someone 's xxmaj english is non - native , or that someone 's knowledge of xxmaj arabic is similarly less than",0,0
7,"xxbos "" \n\n▁ xxmaj dear xxunk and xxunk \n\n xxmaj starting a new section here because the preceding is hopelessly indented . \n\n xxunk and xxunk , you are mistaken in your thinking . xxmaj xxunk and xxmaj avenue are quite correct . i know this might be very difficult for you to accept . xxmaj you claim to have references supporting your interpretation . xxmaj however , i think you are misinterpreting the statements from the textbooks you are reading . i ca n't speak for your "" "" petroleum industry xxunk "" "" friend . xxmaj my guess is that you conveyed your misinterpretation of the situation to him / her . i have a ph.d . in mathematics and after reading this talk page and thinking i was almost losing my mind , i consulted with several other ph.d . mathematician and statistician friends of mine who",0,0
8,"xxbos "" \n\n xxmaj in reply to you state in your edit summary for the "" "" existence of xxmaj god "" "" article : "" "" undid revision xxunk by xxunk . xxrep 3 2 .208 ( talk ) xxmaj difference is of opinion is not vandalism . xxmaj omega point is fringe . "" "" xxmaj that claim does n't even make coherent sense , besides the fact that you replaced a literate entry with an illiterate edit . \n\n xxmaj even if one incorrectly thinks that the xxmaj omega xxmaj point xxmaj theory is "" "" fringe , "" "" that has no logical connection with the edit that xxmaj jeffro77 made . xxmaj jeffro77 replaced this xxunk is very similar to the version that existed there since xxmaj october 31 , 2008 , with some xxunk this entry , giving the excuse in his edit summary",0,0


# Submission

In [33]:
X_test = test_df.rename({"comment_text": "text"}, axis=1)
X_test

id,text
00001cee341fdb12,"Yo bitch Ja Rule is more succesful then you'll ever be whats up with you and hating you sad mofuckas...i should bitch slap ur pethedic white faces and get you to kiss my ass you guys sicken me. Ja rule is about pride in da music man. dont diss that shit on him. and nothin is wrong bein like tupac he was a brother too...fuckin white boys get things right next time.,"
0000247867823ef7,"== From RfC == \n\n The title is fine as it is, IMO."
00013b17ad220c46,""" \n\n == Sources == \n\n * Zawe Ashton on Lapland — / """
00017563c3f7919a,":If you have a look back at the source, the information I updated was the correct form. I can only guess the source hadn't updated. I shall update the information once again but thank you for your message."
00017695ad8997eb,I don't anonymously edit articles at all.
...,...
fffcd0960ee309b5,". \n i totally agree, this stuff is nothing but too-long-crap"
fffd7a9a6eb32c16,== Throw from out field to home plate. == \n\n Does it get there faster by throwing to cut off man or direct from out fielder? \n Were the out fielders in the Mickey mantle era have better arms? \n Rich
fffda9e8d6fafa9e,""" \n\n == Okinotorishima categories == \n\n I see your changes and agree this is """"more correct."""" I had gotten confused, but then found this: \n :... while acknowledging Japan's territorial rights to Okinotorishima itself ... \n However, is there a category for \n :... did not acknowledge Japan's claim to an exclusive economic zone (EEZ) stemming from Okinotorishima. \n That is, is there a category for """"disputed EEZ""""s? """
fffe8f1340a79fc2,""" \n\n == """"One of the founding nations of the EU - Germany - has a Law of Return quite similar to Israel's"""" == \n\n This isn't actually true, is it? Germany allows people whose ancestors were citizens of Germany to return, but AFAIK it does not allow the descendants of Anglo-Saxons to """"return"""" to Angeln and Saxony. Israel, by contrast, allows all Jews to """"return"""" to Israel, even if they can't trace a particular ancestral line to anyone who lived in the modern state or even mandate Palestine. — """


In [34]:
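# Batch prediction with a test DataLoader, following
# https://forums.fast.ai/t/text-batch-prediction-with-fastai-v2/80081
# The DataLoader is built once from the current learner and reused for every label below.
test_dl = learn.dls.test_dl(X_test)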
test_dl

<fastai.text.data.SortedDL at 0x7f117044c700>

In [35]:
# Empty frame indexed by comment id; one probability column is added per label below.
submission_df = pd.DataFrame(index=X_test.index)
submission_df

00001cee341fdb12
0000247867823ef7
00013b17ad220c46
00017563c3f7919a
00017695ad8997eb
...
fffcd0960ee309b5
fffd7a9a6eb32c16
fffda9e8d6fafa9e
fffe8f1340a79fc2
ffffce3fb183ee80


## Toxic

In [36]:
learn.load("toxic")
probs, _ = learn.get_preds(dl=test_dl)

In [37]:
# Column 1 of the predictions is the probability of the positive class (label 1).
submission_df["toxic"] = probs.numpy()[:, 1]

## Severe toxic

In [38]:
learn.load("severe_toxic")
probs, _ = learn.get_preds(dl=test_dl)

In [39]:
submission_df["severe_toxic"] = probs.numpy()[:, 1]

## Obscene

In [40]:
learn.load("obscene")
probs, _ = learn.get_preds(dl=test_dl)

In [41]:
submission_df["obscene"] = probs.numpy()[:, 1]

## Threat

In [42]:
learn.load("threat")
probs, _ = learn.get_preds(dl=test_dl)

In [43]:
submission_df["threat"] = probs.numpy()[:, 1]

## Insult

In [44]:
learn.load("insult")
probs, _ = learn.get_preds(dl=test_dl)

In [45]:
submission_df["insult"] = probs.numpy()[:, 1]

## Identity hate

In [46]:
learn.load("identity_hate")
probs, _ = learn.get_preds(dl=test_dl)

In [47]:
submission_df["identity_hate"] = probs.numpy()[:, 1]
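
The six load-and-predict steps above follow one pattern, so they could be collapsed into a loop. The sketch below shows that pattern in isolation, assuming NumPy and pandas are available. `build_submission` and `fake_predict_proba` are hypothetical names introduced for illustration; the stand-in predictor returns random two-column probabilities so the block runs without fastai or the saved weights.

```python
import numpy as np
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def build_submission(index, predict_proba, labels=LABELS):
    """Collect the positive-class probability for every label.

    predict_proba(label) must return an array of shape (n_rows, 2) whose
    second column is the probability of the positive class.
    """
    submission = pd.DataFrame(index=index)
    for label in labels:
        probs = np.asarray(predict_proba(label))
        if probs.shape != (len(index), 2):
            raise ValueError(f"unexpected prediction shape for {label}: {probs.shape}")
        submission[label] = probs[:, 1]
    return submission


def fake_predict_proba(label):
    """Hypothetical stand-in for a trained model: random probabilities for 3 rows."""
    rng = np.random.default_rng(len(label))
    positive = rng.random(3)
    return np.column_stack([1.0 - positive, positive])


# With the notebook's objects, the predictor might instead be (untested assumption):
#   lambda label: learn.load(label).get_preds(dl=test_dl)[0].numpy()
index = pd.Index(["a", "b", "c"], name="id")
demo = build_submission(index, fake_predict_proba)
print(demo)
```

Keeping the label list in one place also fixes the column order of the resulting frame.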

## Save

In [48]:
submission_df

id,toxic,severe_toxic,obscene,threat,insult,identity_hate
00001cee341fdb12,0.993057,0.381216,0.902665,0.022584,0.965348,0.884737
0000247867823ef7,0.002658,0.001850,0.009016,0.000051,0.005900,0.000161
00013b17ad220c46,0.003400,0.001680,0.006720,0.000013,0.001016,0.001075
00017563c3f7919a,0.000339,0.000247,0.000320,0.000037,0.000504,0.000012
00017695ad8997eb,0.004080,0.000942,0.009167,0.003139,0.002247,0.000518
...,...,...,...,...,...,...
fffcd0960ee309b5,0.419973,0.000584,0.015970,0.000153,0.007477,0.000024
fffd7a9a6eb32c16,0.016204,0.000399,0.005276,0.000007,0.003370,0.000022
fffda9e8d6fafa9e,0.000136,0.000159,0.001244,0.000004,0.000173,0.000008
fffe8f1340a79fc2,0.001031,0.000106,0.000383,0.000012,0.000143,0.000072


In [49]:
submission_df.to_csv(
    "../../data/jigsaw-toxic-comment-classification-challenge/submission.csv"
)
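
Before uploading, it may be worth sanity-checking the saved file. The following is a minimal sketch using only the standard library, assuming the CSV has an `id` column followed by the six label columns in the order used above. `check_submission` is a hypothetical helper, and the sample row is copied from the frame displayed earlier rather than read from disk.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "id", "toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate",
]


def check_submission(handle):
    """Validate the header and probability ranges; return the number of data rows."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    n_rows = 0
    for row in reader:
        for label in EXPECTED_COLUMNS[1:]:
            value = float(row[label])
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{label} out of range for id {row['id']}: {value}")
        n_rows += 1
    return n_rows


# In-memory sample; for the real file, pass open(path, newline="") instead.
sample = io.StringIO(
    "id,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "0000247867823ef7,0.002658,0.001850,0.009016,0.000051,0.005900,0.000161\n"
)
n_checked = check_submission(sample)
print(n_checked)
```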
