GitHub - anjanatiha/Clustering-for-Document-Classification: Clustering news documents using bag of words model to classify documents

Clustering for Document Classification (News Classification)

Used clustering algorithms to classify documents from the “20 newsgroups” dataset, based on the bag-of-words model.
Converted each document to a TF-IDF vector, then ran the K-Means and Gaussian Mixture Model (GMM) algorithms on those vectors.
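The vectorize-then-cluster step could look roughly like the sketch below. It is not the repository's code: it assumes scikit-learn, and the four-document toy corpus, the variable names, and the parameter values are illustrative assumptions. The real data would come from the 20 newsgroups dataset (scikit-learn exposes it as `fetch_20newsgroups`).

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

# Toy stand-in corpus (assumption); not the 20 newsgroups data.
train_docs = [
    "the rocket launch reached orbit",
    "nasa plans a space mission to orbit the moon",
    "the hockey team won the game in overtime",
    "the goalie saved the game for the hockey team",
]
test_docs = ["a space rocket mission", "hockey game tonight"]

# Bag-of-words with TF-IDF weighting: fit the vocabulary on the training
# documents only, then reuse it to transform the test documents.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_docs)  # sparse matrix
X_test = vectorizer.transform(test_docs)

# K-Means accepts the sparse TF-IDF matrix directly.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
kmeans_test_clusters = kmeans.predict(X_test)

# GaussianMixture needs dense input; a diagonal covariance keeps the
# parameter count manageable in a high-dimensional TF-IDF space.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(X_train.toarray())
gmm_test_clusters = gmm.predict(X_test.toarray())

print("K-Means clusters:", kmeans_test_clusters)
print("GMM clusters:", gmm_test_clusters)
```

On the full dataset, converting the TF-IDF matrix to a dense array for the GMM can be memory-hungry, so capping the vocabulary (for example with the vectorizer's `max_features` parameter) may be needed.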
For evaluation, computed the weighted F1 score on the test set for both K-Means and Gaussian Mixture Models.
Trained each clustering algorithm on the training set. For each test instance, predicted the cluster to which it belongs and assigned it the majority topic of that cluster.
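The majority-topic assignment and the weighted F1 score can be illustrated with a small, dependency-free sketch. The function names, the fallback rule for clusters that received no training documents, and the toy labels are all assumptions made for illustration, not taken from the repository.

```python
from collections import Counter, defaultdict


def majority_topic_per_cluster(train_clusters, train_topics):
    """Map each cluster id to the most common topic among its training docs."""
    topics_by_cluster = defaultdict(Counter)
    for cluster, topic in zip(train_clusters, train_topics):
        topics_by_cluster[cluster][topic] += 1
    return {c: counts.most_common(1)[0][0]
            for c, counts in topics_by_cluster.items()}


def predict_topics(test_clusters, cluster_to_topic, fallback):
    """Label each test doc with its cluster's majority topic.

    `fallback` covers clusters that had no training documents (assumption).
    """
    return [cluster_to_topic.get(c, fallback) for c in test_clusters]


def weighted_f1(y_true, y_pred):
    """Per-topic F1, averaged with weights equal to each topic's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy cluster assignments and topics (assumption).
train_clusters = [0, 0, 0, 1, 1, 2]
train_topics = ["space", "space", "hockey", "hockey", "hockey", "space"]
test_clusters = [0, 1, 2, 1]
test_topics = ["space", "hockey", "hockey", "hockey"]

cluster_to_topic = majority_topic_per_cluster(train_clusters, train_topics)
overall_majority = Counter(train_topics).most_common(1)[0][0]
predicted = predict_topics(test_clusters, cluster_to_topic, overall_majority)
score = weighted_f1(test_topics, predicted)
print(cluster_to_topic, predicted, round(score, 4))
```

In this toy case "space" scores F1 = 2/3 (support 1) and "hockey" scores F1 = 0.8 (support 3), so the weighted F1 is 23/30, about 0.7667.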
Reported results for at least 5 different parameter settings (varying k for K-Means, varying the number of mixture components for GMM).
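A parameter sweep of this kind might be organized as in the following sketch. It assumes scikit-learn and uses a hypothetical 12-document toy corpus with three made-up topics; the helper `score_model`, the chosen values of k, and the `reg_covar` setting are illustrative assumptions, and the resulting scores say nothing about the results obtained on 20 newsgroups.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.mixture import GaussianMixture

# Toy corpus with three topics (assumption); not the 20 newsgroups data.
train_docs = [
    "rocket launch reached orbit", "nasa space mission to the moon",
    "satellite orbit around earth", "astronauts on the space station",
    "hockey team won the game", "goalie saved the hockey game",
    "playoff game went to overtime", "the team scored a late goal",
    "install the graphics driver", "windows crashed after the update",
    "the disk drive failed to boot", "new graphics card for the computer",
]
train_topics = np.array(["space"] * 4 + ["hockey"] * 4 + ["computers"] * 4)
test_docs = [
    "space rocket mission", "moon orbit satellite",
    "hockey game overtime", "team goal",
    "graphics driver update", "computer disk drive",
]
test_topics = np.array(["space"] * 2 + ["hockey"] * 2 + ["computers"] * 2)

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_docs).toarray()
X_test = vectorizer.transform(test_docs).toarray()


def score_model(model):
    """Fit on the training set, label clusters by majority topic, score on test."""
    train_clusters = model.fit(X_train).predict(X_train)
    labels, counts = np.unique(train_topics, return_counts=True)
    fallback = labels[np.argmax(counts)]  # used for clusters with no training docs
    cluster_to_topic = {}
    for c in np.unique(train_clusters):
        labels, counts = np.unique(train_topics[train_clusters == c],
                                   return_counts=True)
        cluster_to_topic[c] = labels[np.argmax(counts)]
    predicted = [cluster_to_topic.get(c, fallback) for c in model.predict(X_test)]
    return f1_score(test_topics, predicted, average="weighted", zero_division=0)


results = {}
for k in (2, 3, 4, 5, 6):  # five parameter settings per algorithm
    results[("kmeans", k)] = score_model(
        KMeans(n_clusters=k, n_init=10, random_state=0))
    results[("gmm", k)] = score_model(
        GaussianMixture(n_components=k, covariance_type="diag",
                        reg_covar=1e-3, random_state=0))

for (name, k), score in sorted(results.items()):
    print(f"{name:>6}  k={k}  weighted F1={score:.3f}")
```

Fixing `random_state` keeps the runs comparable across settings; with the real dataset, the swept values would more plausibly bracket the number of newsgroups (20).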

Current Version : v1.0.0.0

Last Update : 11.02.2016

Repository contents (3 commits):
.ipynb_checkpoints
obsolete
original
.gitattributes
Homework 3.pdf
LICENSE
README.md
news_docs_classification.ipynb
news_docs_classification.py