Topic Identification of Filipino and English News using Bidirectional Long Short-Term Memory with Attention Mechanisms

Topic identification of Filipino and English news using BiLSTMs (bidirectional long short-term memory networks) with attention mechanisms. The English news data comes from BBC News.
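The core idea is that attention scores each timestep's BiLSTM hidden state and pools them into a single weighted sum before classification. A minimal sketch of just the attention-pooling step in plain Python (function name and shapes are illustrative, not taken from the notebook):

```python
import math

def attention_pool(hidden_states, scores):
    """Softmax the per-timestep scores and return the weighted sum of states.

    hidden_states: list of T vectors (lists of floats) from the BiLSTM.
    scores: list of T raw attention scores (e.g. from a small dense layer).
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    # Weighted sum over timesteps, one coordinate at a time
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights
```

With equal scores the weights are uniform and the context vector is just the average of the hidden states.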

Getting Started

Download the dataset from http://mlg.ucd.ie/files/datasets/bbc-fulltext.zip and extract it into a folder named /data in the local directory.
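This step can be scripted with the standard library; a small sketch (the helper name and the `data` destination folder match the step above, everything else is an assumption):

```python
import os
import urllib.request
import zipfile

DATA_URL = "http://mlg.ucd.ie/files/datasets/bbc-fulltext.zip"

def fetch_dataset(url=DATA_URL, dest="data"):
    """Download the BBC archive (if not already present) and extract it into `dest`."""
    os.makedirs(dest, exist_ok=True)
    archive = os.path.join(dest, "bbc-fulltext.zip")
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```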

Prerequisites

Python 3 with Jupyter Notebook, Keras/TensorFlow, and TensorBoard installed.

Installing

  • Create folders named /Graph_LSTM and /Graph in the local directory so that TensorBoard saves its log files there for the ordinary LSTMs and the LSTMs with attention, respectively.

  • For the English word embeddings, download the GloVe word embeddings here and extract the file named glove.6B.300d.txt into the local directory.

  • For the Tagalog word embeddings, download the FastText word embeddings here, extract the file into the local directory, and rename it to fasttext_tagalog.vec.
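Both embedding files are whitespace-separated text: GloVe .txt files start directly with word vectors, while FastText .vec files begin with a `count dim` header line. A minimal loader sketch (the function name is an assumption):

```python
def load_embeddings(path, skip_header=False):
    """Parse a whitespace-separated embedding file into {word: vector}.

    Pass skip_header=True for FastText .vec files, whose first line is
    a 'count dim' header; GloVe .txt files have no header.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        if skip_header:
            next(f)
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

Usage would look like `load_embeddings("glove.6B.300d.txt")` for English and `load_embeddings("fasttext_tagalog.vec", skip_header=True)` for Tagalog.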

Code

Just run the .ipynb file in your Jupyter Notebook and you're good to go.
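The folder setup from the Installing steps can also be automated before launching the notebook; a minimal sketch (the helper name is an assumption — the folder names come from the Installing section, and the returned paths would feed a `keras.callbacks.TensorBoard(log_dir=...)` callback):

```python
import os

# TensorBoard log folders named in the Installing steps
GRAPH_DIRS = ("Graph_LSTM", "Graph")

def prepare_log_dirs(base="."):
    """Create the TensorBoard log folders if missing and return their paths."""
    paths = [os.path.join(base, d) for d in GRAPH_DIRS]
    for p in paths:
        os.makedirs(p, exist_ok=True)
    return paths
```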

Authors

  • Christian Justine Clemente - Initial work, Lead - JstnClmnt
  • Ezekiel David - Developer - kielboy8

See also the list of [contributors](https://github.com/JstnClmnt/NLP-News-Classification/contributors) who participated in this project.

References

[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
[2] Keras implementation of Attention Mechanisms
[3] Stop Words Function Removal for the Filipino Language
[4] Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
[5] Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.
