Cyberbullying Detection using Sentiment Analysis

In recent years, cyberbullying has become a pervasive issue affecting many individuals across various online platforms. Despite increasing awareness, statistics indicate that substantial actions to mitigate cyberbullying are still lacking. To address this, our project develops a sophisticated BiLSTM model for detecting cyberbullying instances in text. We also compare its performance against other models like Logistic Regression (LR) and VADER (Valence Aware Dictionary and sEntiment Reasoner). Our system currently achieves an accuracy rate of 81% on the test set, demonstrating its potential in identifying and classifying cyberbullying behavior effectively.

Project Structure Overview

Python Notebooks

bi_lstm_model.ipynb: Implements the BiLSTM model training, and including some preprocessing, model setup, and training loop, as well as evaluation and graphing of the model's performance.
vader_sentiment_analysis.ipynb: Creates and Evaluates the Vader Sentiment Model on our testing data.
lr_model.ipynb: Our implementation of a Logistic Regression Model as well as it's testing.
google_emotion_dataset_preprocessing_pt1.ipynb: Preprocessing for our Google emotions dataset. Converts an entry label into whether it is cyberbullying or not given the emotional labels that it has. It then saves all of the converted entries to a new CSV file after processing.
google_emotion_dataset_preprocessing_pt2.ipynb: The newly processed text is then converted to lowercase, tokenized, and rid of stopwords, and punctuation, and saved to another CSV file.
twitter_data_preprocessing.ipynb: Takes in our Cyberbullying Twitter Dataset and preprocesses it for usage, saving it to a new file after processing.

Datasets

Our models are trained and evaluated using several datasets, some of these are the final datasets after preprocessing:

cyberbullying_tweets.csv
go_emotions_dataset.csv
google_cyberbullying_dataset.csv
processed_concat_data.csv
processed_cyberbullying_tweets.csv
processed_google_data.csv

Our Model and Embeddings

This section lists the files related to our neural network model and the embeddings used for training:

cyberbullying_detection_model_epoch10.h5 - The saved BiLSTM model after 10 epochs of training.
word2vec_model.model - Word2Vec model used for generating word embeddings.
word2vec_embeddings.txt - Text file containing the word embeddings.

Deprecated Files

The deprecated_files folder contains older versions of notebooks and data analyses that are retained for archival purposes.

How to Run

For this project simply clone the repo and open up each notebook you'd like to run in the code editor of your choice.

Note: in order to run these files you need some prerequisite libraries, please run pip/pip3 install [library], some include tensorflow, sklearn, and word2vec, among others. Python 3.9 or 3.10 is recommended. If on the M chip series of macbooks (M1-M3) please download with pip install tensorflow-metal.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
Deprecated_Files		Deprecated_Files
README.md		README.md
bi_lstm_model.ipynb		bi_lstm_model.ipynb
cyberbullying_detection_model_epoch10.h5		cyberbullying_detection_model_epoch10.h5
cyberbullying_tweets.csv		cyberbullying_tweets.csv
go_emotions_dataset.csv		go_emotions_dataset.csv
google_cyberbullying_dataset.csv		google_cyberbullying_dataset.csv
google_emotion_dataset_preprocessing_pt1.ipynb		google_emotion_dataset_preprocessing_pt1.ipynb
google_emotion_dataset_preprocessing_pt2.ipynb		google_emotion_dataset_preprocessing_pt2.ipynb
lr_model.ipynb		lr_model.ipynb
processed_concat_data.csv		processed_concat_data.csv
processed_cyberbullying_tweets.csv		processed_cyberbullying_tweets.csv
processed_google_data.csv		processed_google_data.csv
twitter_data_preprocessing.ipynb		twitter_data_preprocessing.ipynb
vader_sentiment_analysis.ipynb		vader_sentiment_analysis.ipynb
word2vec_embeddings.txt		word2vec_embeddings.txt
word2vec_model.model		word2vec_model.model

arinjay-singh/CyberbullyingSentimentAnalysis

Folders and files

Latest commit

History

Repository files navigation

Cyberbullying Detection using Sentiment Analysis

Project Structure Overview

Python Notebooks

Datasets

Our Model and Embeddings

Deprecated Files

How to Run

About

Resources

Stars

Watchers

Forks

Languages