Sentiment analysis of Twitter data for disaster prediction

We studied sentiment analysis of Twitter data for disaster prediction, comparing bag-of-words (BOW), context-free, and contextual embeddings. We used three traditional machine learning models (decision tree, random forest, and logistic regression) and three popular pre-trained context-free embeddings (Skip-gram, FastText, and GloVe). For contextual embeddings, we used the pre-trained BERT model (bert-base-uncased).
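
To make the BOW representation concrete, here is a minimal sketch in plain Python. It is not the code from BOW_ML_methods.py; the helper names (`tokenize`, `build_vocabulary`, `bow_vector`) and the two example tweets are hypothetical. It shows that BOW gives the word "fire" the same column whether it describes a real fire or not, which is the limitation that contextual embeddings aim to address.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a simplification)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bow_vector(text, vocab):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for token, count in counts.items():
        index = vocab.get(token)
        if index is not None:  # tokens outside the vocabulary are ignored
            vector[index] = count
    return vector


# Hypothetical example tweets, not taken from the dataset.
tweets = ["Forest fire near the town", "I love the fire emoji"]
vocab = build_vocabulary(tweets)
vectors = [bow_vector(tweet, vocab) for tweet in tweets]
```

Both vectors have a 1 in the "fire" column, so a classifier trained on these features cannot tell the two uses apart from that word alone.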

Download data

We used data from a Kaggle competition. To download the data, please visit the following link: https://www.kaggle.com/c/nlp-getting-started
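
Once the data is downloaded, a quick sanity check is to read it and count the labels. The sketch below assumes the training file is a CSV with `text` and `target` columns (with `target` equal to 1 for disaster tweets); please confirm the column names against the file you download. The `read_tweets` helper and the in-memory sample are hypothetical and only stand in for the real file.

```python
import csv
import io
from collections import Counter


def read_tweets(handle):
    """Return (text, label) pairs from an open CSV file.

    Assumes 'text' and 'target' columns; adjust if your file differs.
    """
    reader = csv.DictReader(handle)
    return [(row["text"], int(row["target"])) for row in reader]


# Tiny in-memory stand-in for the downloaded training file (made-up rows).
sample = io.StringIO(
    "id,keyword,location,text,target\n"
    '1,,,"Forest fire near the lake, please evacuate",1\n'
    "2,,,I love fruits,0\n"
)
tweets = read_tweets(sample)
label_counts = Counter(label for _, label in tweets)
```

For the real file, replace the `io.StringIO` sample with something like `open("data/train.csv", newline="", encoding="utf-8")`, where the path is an assumption about where you stored the download.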

Download pre-trained embeddings

To download the pre-trained embeddings, please visit the following links:

https://nlp.stanford.edu/projects/glove/
https://github.com/google-research/bert
Mikolov, T., et al. "Advances in Pre-Training Distributed Word Representations." Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

Reproduce our result

To reproduce the experimental results, follow these steps:

Download the data from the Kaggle competition and keep it in the "data" folder.
Download the pre-trained embeddings from the links above and keep them in the same folder as the source code. Set the embedding file name in the Python code that loads it. For example, update the GLOVE_EMB variable in GloVe_softmax.py with the name of your own GloVe embedding file.
Run "data_analysis.py" to see basic statistics for the training dataset.
Run the other Python files one by one to obtain the results of the different machine learning models on the different embeddings.
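
As a rough illustration of the embedding-loading step, the sketch below parses the plain-text GloVe format, where each line holds a word followed by its vector values separated by spaces. It is not the code from GloVe_softmax.py: the helpers `load_glove` and `average_embedding` are hypothetical, and averaging word vectors is only one simple way to turn a tweet into a fixed-size input, not necessarily what the scripts do.

```python
import io


def load_glove(lines):
    """Parse GloVe-style text lines into a {word: [float, ...]} dictionary."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


def average_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:  # out-of-vocabulary tokens are ignored
            continue
        known += 1
        for i, value in enumerate(vector):
            total[i] += value
    if known == 0:
        return total
    return [value / known for value in total]


# Two-dimensional toy vectors standing in for a real embedding file.
sample = io.StringIO("fire 0.5 1.0\nflood 1.5 3.0\n")
emb = load_glove(sample)
avg = average_embedding(["fire", "flood", "unknown"], emb, dim=2)
```

With a real file, pass an open file handle instead of the sample, for example `with open(GLOVE_EMB, encoding="utf-8") as f: emb = load_glove(f)`, where GLOVE_EMB holds your embedding file name.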

Environment settings

The following environment is used for the implementation:

python==3.6.2

torch==0.4.1

numpy==1.15.1

sklearn==0.19.2
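
To check whether a local setup matches the versions listed above, something like the following sketch may help. The helpers `installed_version` and `compare_versions` are hypothetical, and the check relies on each module exposing a `__version__` attribute, which is an assumption that holds for many but not all packages. Note that `sklearn` is the import name of the scikit-learn package.

```python
import importlib

# Versions listed in this README, keyed by import name.
PINNED = {"torch": "0.4.1", "numpy": "1.15.1", "sklearn": "0.19.2"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(pinned, lookup=installed_version):
    """Map each package to (wanted, found, status).

    Status is 'ok', 'differs', or 'missing'.
    """
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            status = "missing"
        elif found == wanted:
            status = "ok"
        else:
            status = "differs"
        report[name] = (wanted, found, status)
    return report


if __name__ == "__main__":
    for name, (wanted, found, status) in compare_versions(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A "differs" status does not necessarily mean the scripts will fail, only that the results may not match the reported ones exactly.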

Citation

Please acknowledge the following work in papers or derivative software:

@article{deb2022comparative,
  title={Comparative analysis of contextual and context-free embeddings in disaster prediction from Twitter data},
  author={Deb, Sumona and Chanda, Ashis Kumar},
  journal={Machine Learning with Applications},
  pages={100253},
  year={2022},
  publisher={Elsevier}
}
