Dalhousie University, Fall 2019
For this project, we were asked to perform the following steps using the Reuters news corpus:
- Cluster the documents
- Generate a set of features for each extracted cluster separately
- Train a classifier for each cluster
- In the Reuters corpus, an article can have multiple topics. I explored both multi-class classification (keeping just one topic per document) and multi-label classification (keeping multiple topics per document).
- For text representation, I tried both TF-IDF and Word Embeddings and compared the final performance of each representation.
- For feature extraction, I implemented the autoencoders described in Meng, Q., Catchpoole, D., Skillicorn, D., & Kennedy, P. J. (2017, May). Relational autoencoder for feature extraction. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 364-371). IEEE.
- Autoencoders and classifiers were implemented with Keras.
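
The difference between the multi-class and multi-label setups comes down to how the targets are built. The snippet below is a minimal sketch of that step, not the project's actual code: the toy topic lists, the helper names (`build_vocabulary`, `multiclass_targets`, `multilabel_targets`), and the rule of keeping the first listed topic for the multi-class case are all assumptions made for illustration.

```python
# Toy documents with hypothetical Reuters-style topic lists (illustrative only).
doc_topics = [
    ["grain", "wheat"],
    ["earn"],
    ["crude", "grain"],
]


def build_vocabulary(topic_lists):
    """Map every distinct topic to a column index, in sorted order."""
    topics = sorted({t for topics in topic_lists for t in topics})
    return {topic: i for i, topic in enumerate(topics)}


def multiclass_targets(topic_lists, vocab):
    """One class index per document.

    Assumption: the single topic kept is the first one listed.
    """
    return [vocab[topics[0]] for topics in topic_lists]


def multilabel_targets(topic_lists, vocab):
    """One multi-hot row per document, with a 1 for every topic it carries."""
    rows = []
    for topics in topic_lists:
        row = [0] * len(vocab)
        for t in topics:
            row[vocab[t]] = 1
        rows.append(row)
    return rows


vocab = build_vocabulary(doc_topics)
y_single = multiclass_targets(doc_topics, vocab)
y_multi = multilabel_targets(doc_topics, vocab)

print(vocab)
print(y_single)
print(y_multi)
```

With targets shaped like this, a Keras multi-class classifier would typically end in a softmax layer trained with categorical cross-entropy, while a multi-label classifier would typically end in sigmoid units trained with binary cross-entropy.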