Topic Identification of Filipino and English News using Bidirectional Long Short-Term Memory with Attention Mechanisms
Topic identification of news using BiLSTMs with attention mechanisms. The data for the English news came from BBC News.
Download the dataset and extract it into a folder named /data
in the local directory.
The following software needs to be installed before running the notebook:
- Keras (with TensorFlow backend)
- TensorFlow 1.12
- pandas
- Matplotlib
- NumPy
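Before opening the notebook, it can help to confirm that these packages are importable. The snippet below is a minimal sketch and not part of the project: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, and the import names are assumed to match the packages listed above. It only checks that each module can be found; it does not verify the TensorFlow version.

```python
import importlib.util

# Import names assumed for the packages listed above (hypothetical list).
REQUIRED_MODULES = ["keras", "tensorflow", "pandas", "matplotlib", "numpy"]


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Any package reported as missing would need to be installed into the same environment that Jupyter uses.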
- Create a folder named
in the local directory so that TensorBoard saves its files there for the ordinary LSTMs and the LSTMs with attention.
- For the English word embeddings, download the GloVe word embeddings here and extract the file named
in the local directory.
- For the Tagalog word embeddings, download the FastText word embeddings here, extract the file in the local directory, and rename it to
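Both GloVe and FastText distribute their vectors as plain text, one word per line followed by its values, and FastText `.vec` files add a header line giving the vocabulary size and dimension. As a rough sketch of how such a file could be read (the function `load_text_embeddings` is a hypothetical helper, not the notebook's own loader, and it assumes space-separated values and words without embedded spaces):

```python
def load_text_embeddings(path, encoding="utf-8"):
    """Read 'word v1 v2 ...' lines into a dict mapping word -> list of floats."""
    vectors = {}
    with open(path, encoding=encoding) as handle:
        for line_number, line in enumerate(handle):
            parts = line.rstrip().split(" ")
            # FastText .vec files start with a "<vocab_size> <dimension>" header;
            # skip it when the first line is exactly two integers.
            if line_number == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors
```

The returned lists can be converted with `numpy.asarray` when building an embedding matrix. The path passed in would be whatever file name the notebook expects for each language.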
Run the .ipynb file in Jupyter Notebook and you're good to go.
See also the list of contributors who participated in this project.
[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
[2] Keras implementation of Attention Mechanisms
[3] Stop Words Function Removal for the Filipino Language
[4] Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532-1543).
[5] Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.