My model is a text classification solution based on Large Language Models (LLMs) such as BERT. Given the text of a tweet, the model predicts whether the tweet is about a real disaster or not. The solution leverages the powerful language understanding capability of LLMs via fine-tuning: the LLM encodes each tweet, and a 4-layer fully-connected neural network serves as the classification head that predicts the label. Additionally, I use other optimization techniques, such as summarization, data manipulation, and pre-processing, to improve the performance of the LLM-based model.
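As a rough illustration of this architecture, here is a minimal sketch in PyTorch with Hugging Face `transformers`; the class name, hidden sizes, and pooling choice are assumptions for illustration, not the exact implementation in this repository:

```python
import torch.nn as nn
from transformers import AutoModel

class TweetClassifier(nn.Module):
    """Hypothetical sketch: an LLM encoder with a 4-layer fully-connected head."""

    def __init__(self, lm_name="bert-large-uncased", hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(lm_name)
        dim = self.encoder.config.hidden_size
        # 4-layer fully-connected classification head (layer sizes are illustrative)
        self.head = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # real disaster vs. not
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token as the tweet encoding
        return self.head(cls)
```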
To get started, you'll need Python and pip installed.
- Clone the Git repository

```bash
git clone https://github.com/anaeim/disaster-tweet-classification.git
```
- Navigate to the project directory

```bash
cd disaster-tweet-classification
```
- Create a directory for the data

```bash
mkdir data
```

The dataset is available on the Kaggle website; download it and place the files in the `data` directory (see the sanity-check sketch after these steps).
- Install the requirements

```bash
pip install -r requirements.txt
```
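Before running the model, you can verify that the data directory is laid out as expected. A minimal sanity check, assuming the standard `train.csv`/`test.csv` file names from the Kaggle download (adjust if yours differ):

```python
from pathlib import Path

# Assumed file names from the Kaggle competition download; adjust if yours differ.
expected = ["train.csv", "test.csv"]
data_dir = Path("data")
missing = [name for name in expected if not (data_dir / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing files in {data_dir}: {missing}")
```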
- Run the model

```bash
python predict.py --dataset-path data \
    --ml-model bert_model \
    --lm bert-large-uncased \
    --validation_split 0.2 \
    --epochs 3 \
    --batch_size 10
```
The meaning of the flags:

- `--dataset-path`: the directory that contains the dataset.
- `--ml-model`: the Machine Learning (ML) model to use. Here I only include the LLM-based tweet classification solution, which is built on the BERT model.
- `--lm`: the language model. We currently support `bert-base-uncased`, `bert-base-cased`, `bert-large-uncased`, and `roberta` (`bert-large-uncased` by default).
- `--validation_split`, `--epochs`, and `--batch_size`: the fraction of data held out for validation, the number of training epochs, and the number of training examples per batch, respectively.
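For reference, here is a hypothetical sketch of how these flags could be parsed with Python's standard `argparse`; the actual argument handling in `predict.py` may differ:

```python
import argparse

def parse_args():
    # Hypothetical CLI sketch based on the flags documented above.
    parser = argparse.ArgumentParser(description="Disaster tweet classification")
    parser.add_argument("--dataset-path", default="data",
                        help="directory that contains the dataset")
    parser.add_argument("--ml-model", default="bert_model",
                        help="ML model to use (currently the BERT-based solution)")
    parser.add_argument("--lm", default="bert-large-uncased",
                        choices=["bert-base-uncased", "bert-base-cased",
                                 "bert-large-uncased", "roberta"],
                        help="language model used to encode the tweets")
    parser.add_argument("--validation_split", type=float, default=0.2,
                        help="fraction of the data held out for validation")
    parser.add_argument("--epochs", type=int, default=3,
                        help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=10,
                        help="number of training examples per batch")
    return parser.parse_args()

if __name__ == "__main__":
    print(parse_args())
```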