alfarias/news-classification-distilbert

HuffPost News Classification with DistilBERT

HuffPost is an American news and opinion website covering politics, culture, wellness, and other topics. It was founded in 2005 by Andrew Breitbart, Arianna Huffington, Kenneth Lerer, and Jonah Peretti. The dataset covers news from 2012 to 2018.

This work performs an exploratory data analysis (EDA) and category classification of the news posted on HuffPost, using the headline and short description of each article. The EDA examines headlines, short descriptions, and categories over the years. The classification task uses DistilBERT, a lighter version of BERT, the popular natural language processing (NLP) model developed by Google; DistilBERT itself was developed by Hugging Face.
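
As a rough illustration of the workflow described above, the following sketch shows one way the inputs could be prepared: counting categories per year for the EDA, joining the headline and short description into a single text, and mapping category names to integer labels before loading DistilBERT. The field names (`headline`, `short_description`, `category`, `date`), the sample records, the helper names, and the `distilbert-base-uncased` checkpoint are all assumptions made for illustration; the notebook itself may be organized differently.

```python
from collections import Counter

# Hypothetical sample records. The field names are assumed for illustration
# and are not taken from the repository.
SAMPLE = [
    {"headline": "Senate Passes Budget", "short_description": "The vote was close.",
     "category": "POLITICS", "date": "2017-03-01"},
    {"headline": "Five Tips For Better Sleep", "short_description": "",
     "category": "WELLNESS", "date": "2014-06-12"},
    {"headline": "Election Night Recap", "short_description": "What happened and why.",
     "category": "POLITICS", "date": "2016-11-09"},
]


def categories_per_year(records):
    """Count articles per (year, category) pair; assumes ISO dates (YYYY-MM-DD)."""
    return Counter((r["date"][:4], r["category"]) for r in records)


def build_text(record):
    """Join headline and short description, skipping whichever is empty."""
    parts = [record.get("headline", "").strip(),
             record.get("short_description", "").strip()]
    return " ".join(p for p in parts if p)


def build_label_map(records):
    """Map each category name to an integer id, in alphabetical order."""
    names = sorted({r["category"] for r in records})
    return {name: idx for idx, name in enumerate(names)}


def load_distilbert(num_labels, checkpoint="distilbert-base-uncased"):
    """Load a tokenizer and a DistilBERT classifier with a fresh classification head.

    Imported lazily so the helpers above work without `transformers` installed.
    The checkpoint name is an assumption, not confirmed by the repository.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model


if __name__ == "__main__":
    label_map = build_label_map(SAMPLE)
    texts = [build_text(r) for r in SAMPLE]
    labels = [label_map[r["category"]] for r in SAMPLE]
    print(categories_per_year(SAMPLE))
    print(texts, labels)
```

With `transformers` installed, `load_distilbert(len(label_map))` returns a tokenizer and model; calling `tokenizer(texts, truncation=True, padding=True, return_tensors="pt")` would then produce the tensors used for fine-tuning.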

Kaggle Notebook
