ModelJack

Author: Nerses Nersesyan

Abstract

This project shows how to work with the various data sets in the Wikipedia Talk project on Figshare using fastText.

It is important to note that there is an existing API demo created by Jigsaw. The API scores a comment based on its potential impact on a conversation. More detailed information about this project can be found here.

In this notebook we show how to build a simple classifier for detecting personal attacks using fastText, and apply the classifier to a random sample of the comment corpus to see whether discussions on user pages contain more personal attacks than discussions on article pages.

Impact

The number of social media users rises every day, and online discussion has become integral to people’s experience of the internet. It would be naive to expect online discussion to be free of abuse or harassment. Manually moderating comments and discussion forums can be tedious and expensive. That is why any tool that can improve moderation quality and reduce its cost would be in demand.

Existing work

Research paper containing documentation on the data collection and modeling methodology.

Roadmap

Deliverable

Create a classifier using fastText with accuracy higher than 90%.
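To make the 90% target concrete, here is a minimal sketch of how accuracy could be computed on a held-out set. The label strings `attack` and `ok` and the helper name `accuracy` are illustrative assumptions, not names taken from this repository.

```python
def accuracy(gold, predicted):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy example: 4 of 5 predictions match the gold labels.
gold = ["attack", "ok", "ok", "ok", "attack"]
predicted = ["attack", "ok", "ok", "attack", "attack"]
print(accuracy(gold, predicted))
```

The deliverable would be met when this value exceeds 0.9 on comments the model has not seen during training.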

Milestone 1

Build a fastText-based classifier for personal attacks
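A minimal sketch of this milestone, assuming the `fasttext` Python package is installed. The label names, the helper functions, and the hyperparameter values are illustrative assumptions and are not taken from `ft_cls.py`. fastText's supervised mode reads one example per line, with the label marked by the `__label__` prefix.

```python
import re


def to_fasttext_line(text, is_attack):
    """Format one comment as a fastText supervised training line."""
    # Hypothetical label names; fastText only requires the __label__ prefix.
    label = "__label__attack" if is_attack else "__label__ok"
    # One example per line: collapse newlines, tabs and repeated spaces.
    cleaned = re.sub(r"\s+", " ", text).strip().lower()
    return f"{label} {cleaned}"


def write_training_file(examples, path):
    """Write (text, is_attack) pairs to a file in fastText format."""
    with open(path, "w", encoding="utf-8") as fh:
        for text, is_attack in examples:
            fh.write(to_fasttext_line(text, is_attack) + "\n")


def train(path):
    """Train a supervised model; hyperparameters here are untuned guesses."""
    # Imported lazily so the helpers above work without fasttext installed.
    import fasttext
    return fasttext.train_supervised(input=path, epoch=10, lr=0.5, wordNgrams=2)


print(to_fasttext_line("You are\n a  Fool", True))
```

Calling `train("train.txt")` on a file produced by `write_training_file` should return a model object whose `predict` method can label new comments.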

Milestone 2

Model tuning
Use of the classifier on the Wikipedia Talk Corpus
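The comparison between user pages and article pages could be sketched as follows. The namespace values `user` and `article` and the keyword-based stand-in classifier are assumptions made for illustration; in practice the `is_attack` callable would wrap the trained fastText model.

```python
from collections import defaultdict


def attack_rate_by_namespace(comments, is_attack):
    """Share of comments flagged as attacks, per namespace.

    comments: iterable of (namespace, text) pairs
    is_attack: callable mapping a comment text to True or False
    """
    totals = defaultdict(int)
    attacks = defaultdict(int)
    for namespace, text in comments:
        totals[namespace] += 1
        if is_attack(text):
            attacks[namespace] += 1
    return {ns: attacks[ns] / totals[ns] for ns in totals}


def keyword_stub(text):
    # Stand-in for a trained model, used only to keep the sketch runnable.
    return "idiot" in text.lower()


sample = [
    ("user", "You are an idiot"),
    ("user", "Thanks for the fix"),
    ("article", "I added a citation"),
    ("article", "Please see the source"),
]
rates = attack_rate_by_namespace(sample, keyword_stub)
print(rates)
```

With a fastText model, the stub could be replaced by something like `lambda text: model.predict(text)[0][0] == "__label__attack"`, assuming the model was trained with that label name and the text contains no newline characters.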

Resources

The Wikipedia Talk project dataset was used for training and evaluating the model. The Wikipedia Talk project release includes:

large historical corpus of discussion comments on Wikipedia talk pages
sample of over 100k comments with human labels for whether the comment contains a personal attack
sample of over 100k comments with human labels for whether the comment has an aggressive tone

Please refer to the wiki for documentation of the schema of each data set.
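Before training, the human labels need to be reduced to one label per comment. The sketch below assumes that each comment was rated by several annotators and that the annotations carry a revision identifier and a 0/1 attack flag (called `rev_id` and `attack` here); check the schema in the wiki before relying on those names. It uses a strict majority vote, which is one possible aggregation rule.

```python
from collections import defaultdict


def majority_labels(annotations):
    """Aggregate per-annotator votes into one boolean label per comment.

    annotations: iterable of (rev_id, attack) pairs, attack being 0 or 1
    """
    votes = defaultdict(list)
    for rev_id, attack in annotations:
        votes[rev_id].append(attack)
    # A comment counts as an attack only if strictly more than half agree.
    return {rev_id: sum(v) / len(v) > 0.5 for rev_id, v in votes.items()}


annotations = [(1, 1), (1, 1), (1, 0), (2, 0), (2, 1), (3, 0)]
labels = majority_labels(annotations)
print(labels)
```

Ties are resolved as "not an attack" under this rule; a different threshold may suit a moderation setting where missed attacks are more costly.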
