# Topic Identification of Filipino and English News using Bidirectional Long Short-Term Memory with Attention Mechanisms
This project identifies the topics of Filipino and English news articles using BiLSTMs with attention mechanisms. The English news data comes from BBC News.
Download the dataset from http://mlg.ucd.ie/files/datasets/bbc-fulltext.zip and extract it into a folder named `/data` in the local directory.
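Once extracted, the BBC corpus is laid out as one sub-folder per topic with one plain-text article per file. A minimal sketch of reading it into a DataFrame (the `data/bbc` path and `load_bbc_dataset` helper are assumptions, not part of the notebook):

```python
import os
import pandas as pd

def load_bbc_dataset(root):
    """Read the extracted BBC corpus (one sub-folder per topic,
    one plain-text article per file) into a DataFrame."""
    rows = []
    for category in sorted(os.listdir(root)):
        cat_dir = os.path.join(root, category)
        if not os.path.isdir(cat_dir):
            continue
        for fname in sorted(os.listdir(cat_dir)):
            if fname.endswith(".txt"):
                with open(os.path.join(cat_dir, fname), encoding="latin-1") as f:
                    rows.append({"category": category, "text": f.read()})
    return pd.DataFrame(rows)

# df = load_bbc_dataset("data/bbc")
# df["category"].value_counts()
```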
### Prerequisites

What you need installed to run the software:
- Keras (with the TensorFlow backend)
- TensorFlow 1.12
- pandas
- Matplotlib
- NumPy
- NLTK
Create folders named `/Graph_LSTM` and `/Graph` in the local directory so TensorBoard can save its log files there for the ordinary LSTMs and the LSTMs with attention, respectively.
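The two log folders can be created and wired up to Keras as sketched below; the callback variable names are illustrative, not taken from the notebook:

```python
import os

# Create the TensorBoard log directories expected by the notebook.
for log_dir in ("Graph_LSTM", "Graph"):
    os.makedirs(log_dir, exist_ok=True)

# The training code would then point a TensorBoard callback at each
# folder, e.g. (assumed usage):
# from keras.callbacks import TensorBoard
# tb_plain = TensorBoard(log_dir="./Graph_LSTM")    # ordinary LSTM
# tb_attention = TensorBoard(log_dir="./Graph")     # LSTM with attention
# model.fit(x, y, callbacks=[tb_plain])
```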
For the English word embeddings, download the GloVe word embeddings here and extract the file named `glove.6B.300d.txt` into the local directory.
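A GloVe text file holds one token per line followed by its vector components. A minimal loader sketch (the `load_glove` name is an assumption):

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file (one token followed by its vector
    components per line) into a {word: np.ndarray} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# glove = load_glove("glove.6B.300d.txt")  # 300-dimensional vectors
```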
For the Tagalog word embeddings, download the FastText word embeddings here, extract the file into the local directory, and rename it to `fasttext_tagalog.vec`.
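A FastText `.vec` file is almost the same text format as GloVe, except that its first line is a header giving the vocabulary size and vector dimension, which must be skipped. A loader sketch (the `load_fasttext_vec` name is an assumption):

```python
import numpy as np

def load_fasttext_vec(path):
    """Parse a FastText .vec file; unlike GloVe, the first line is a
    header holding the vocabulary size and the vector dimension."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# tagalog = load_fasttext_vec("fasttext_tagalog.vec")
```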
Open the `.ipynb` file in Jupyter Notebook, run it, and you're good to go.
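The attention step applied over the BiLSTM hidden states can be sketched in plain NumPy, in the additive style of [1]; the function and parameter names here are illustrative, not the notebook's own:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(H, W, b, v):
    """Additive attention over BiLSTM outputs.
    H: (timesteps, hidden) matrix of hidden states.
    W, b, v: learned projection parameters.
    Returns the attention weights and the weighted context vector."""
    scores = np.tanh(H @ W + b) @ v   # one score per timestep
    alpha = softmax(scores)           # attention weights, sum to 1
    context = alpha @ H               # weighted sum of hidden states
    return alpha, context

# rng = np.random.default_rng(0)
# H = rng.normal(size=(10, 64))  # 10 timesteps, 64-dim BiLSTM output
```

The context vector then feeds the final dense softmax layer that predicts the news topic.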
See also the list of [contributors](https://github.com/JstnClmnt/NLP-News-Classification/contributors) who participated in this project.
## References

[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).

[2] Keras implementation of Attention Mechanisms

[3] Stop word removal function for the Filipino language

[4] Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532-1543).

[5] Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.