Skip to content

albertobas/20-news-classification

Repository files navigation

Classifying news from 20NewsGroup

Apache 2.0 licensed

About

Supervised learning analysis on the 20NewsGroup dataset, which contains 20000 messages taken from 20 newsgroups.

There is an English and a Spanish version, both in Jupyter notebook format.

Requirements

  • python==3.9.7
  • jupyter==1.0.0
  • torch==1.10.1
  • spacy==2.3.5
  • matplotlib==3.5.0
  • tqdm==4.62.3
  • numpy==1.19.2
  • scikit-learn==1.0.2

Running locally

$ git clone https://github.com/albertobas/20-news-classification.git
$ cd 20-news-classification
$ conda env create --file environment.yml
$ conda activate 20_news_classification
$ python -m spacy download en_core_web_sm
$ jupyter notebook