Categorizing News Headlines with Deep Learning

Separating parody and clickbait from real news using a Convolutional Neural Network

Motivation

When you have a service that is built around curating online content, it’s mission critical to make sure that you’re delivering your customers high quality content that is portrayed accurately.

Unfortunately, it can be hard to know what’s real. With such low barriers to online publishing, anything can end up online, and even professional journalists and politicians can be fooled. With many thousands of articles coming out every day, it’s a huge job to sort them out for your customers, and errors can have serious consequences.

In approaching this problem, I've found it useful to think of news articles not just as real versus fake, but as real, parody, or clickbait. This is because these are all written in distinct styles. Parody news is meant to mimic the style of real news, but generally aims to entertain rather than deceive. Meanwhile, clickbait can be filled with complete nonsense, and will meld itself into whatever format is most attention-grabbing.

Results

Training a Convolutional Neural Network on a database of 35,000 labelled headlines resulted in a classification accuracy of ~80%, a result that would have been state-of-the-art even a few years ago.

In addition to the modelling that is core to this project, I prepared a chart suggesting a plan for productization and implementation of the neural network.

Notebooks of Interest

Complete Project Notebook

Figures of Interest

Usage

On a daily basis, the neural network will be able to sort headlines into real news, parody news, clickbait, and news it’s not quite sure about.

Real news and parody news can go into the pipeline for publishing, appropriately labeled as what they are so that nobody thinks fish have taken over the government. Clickbait gets discarded, just as it should be. Finally, the uncertain headlines can be categorized by crowdsourcing to the public, just like companies like Google, Facebook, and OkCupid routinely do. After validation to make sure that the model improves, these get fed back to the neural network, making it even more accurate every single day.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
docs		docs
models		models
notebooks		notebooks
references		references
reports		reports
src		src
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
requirements.txt		requirements.txt
robot_faces.jpg		robot_faces.jpg
setup.py		setup.py
test_environment.py		test_environment.py
tox.ini		tox.ini

License

calremmel/deep-learning-news-headlines

Folders and files

Latest commit

History

Repository files navigation

Categorizing News Headlines with Deep Learning

Motivation

Results

Notebooks of Interest

Figures of Interest

Usage

About

Resources

License

Stars

Watchers

Forks

Languages