- Used clustering algorithms to classify documents from the “20 newsgroups” dataset, based on the bag-of-words model.
- Converted each document to a TF-IDF vector, then ran the K-Means and Gaussian Mixture Model (GMM) algorithms.
- For evaluation, computed the weighted F1 score on the test set for both K-Means and GMM.
- Trained each clustering algorithm on the training set. For each test instance, predicted the cluster to which it belongs and assigned it the majority topic of that cluster.
- Reported results for at least 5 different parameter settings (varying k for K-Means, varying the number of mixture components for GMM).
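
The labeling and scoring steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the cluster ids are assumed to come from an already fitted K-Means or GMM model, and the majority topic of each cluster is assumed to be computed from the training documents.

```python
from collections import Counter, defaultdict


def majority_topic_per_cluster(cluster_ids, topics):
    """Map each cluster id to the most common topic among its training documents."""
    counts = defaultdict(Counter)
    for cluster, topic in zip(cluster_ids, topics):
        counts[cluster][topic] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


def assign_topics(cluster_ids, cluster_to_topic, fallback):
    """Label each test document with the majority topic of its predicted cluster."""
    # Assumption: a cluster that received no training documents has no
    # majority topic, so the caller supplies a fallback label for it.
    return [cluster_to_topic.get(c, fallback) for c in cluster_ids]


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's count in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy data standing in for cluster predictions on the train and test sets.
train_clusters = [0, 0, 0, 1, 1, 2]
train_topics = ["sci.space", "sci.space", "rec.autos",
                "rec.autos", "rec.autos", "talk.politics.misc"]
cluster_to_topic = majority_topic_per_cluster(train_clusters, train_topics)

test_clusters = [0, 1, 1, 2]
test_topics = ["sci.space", "rec.autos", "sci.space", "talk.politics.misc"]
predicted = assign_topics(test_clusters, cluster_to_topic, fallback="sci.space")
score = weighted_f1(test_topics, predicted)
```

Repeating this for each value of k (or each number of mixture components) gives one weighted F1 score per parameter setting.
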
Current Version : v1.0.0.0
Last Update : 11.02.2016