Comparison of multiple supervised ML approaches to sentiment classification on two separate datasets: (1) social media blogging posts from StockTwits and (2) news headlines from market-related Indian news articles.

LinusKaiser/NLP-NewsHeadlines-StockTwits

Short Description

This study evaluates the most prevalent natural language processing (NLP) techniques for financial sentiment analysis (FSA) in terms of their advantages and disadvantages, and tests whether machine-learning-based approaches to sentiment measurement outperform those that rely on human perception of linguistic features, such as sentiment dictionaries. I also outline the differences between these techniques by applying each of them in a financial context and comparing their classification performance. The comparison uses two datasets: messages posted on the social media website StockTwits and a set of news headlines. I find that machine learning (ML) improves sentiment classification performance substantially: the dictionary approaches reach accuracies of roughly 0.52 to 0.58, while the ML classifiers reach roughly 0.63 to 0.83.

Data Sets

[Figures: StockTwits word cloud; News Headlines word cloud]

Processing Pipeline

[Figure: processing pipeline]

Results

Dictionaries

Harvard-IV

StockTwits (Harvard-IV dictionary): accuracy 0.5230523690773067
News Headlines (Harvard-IV dictionary): accuracy 0.5721745635910225

Loughran and McDonald

StockTwits (Loughran & McDonald dictionary): accuracy 0.5375361596009975
News Headlines (Loughran & McDonald dictionary): accuracy 0.5841845386533666
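
To illustrate how a dictionary approach such as the two above produces an accuracy figure, here is a minimal sketch in Python. It is not the code used in this repository. The word lists `POSITIVE` and `NEGATIVE` are tiny made-up stand-ins, not the Harvard-IV or Loughran & McDonald lists, and the tie-breaking rule and the sample messages are assumptions chosen for the example.

```python
# Toy word lists for illustration only; these are NOT the Harvard-IV
# or Loughran & McDonald dictionaries.
POSITIVE = {"gain", "profit", "bullish", "surge", "beat"}
NEGATIVE = {"loss", "bearish", "drop", "miss", "lawsuit"}


def dictionary_sentiment(text):
    """Return 1 (positive) or 0 (negative) by counting dictionary hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Assumption: ties, including texts with no dictionary words,
    # default to the positive class.
    return 1 if score >= 0 else 0


def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical labelled messages (1 = positive, 0 = negative).
samples = [
    ("profit surge after earnings beat", 1),
    ("shares drop on lawsuit news", 0),
    ("bearish outlook as revenue miss widens", 0),
    ("holding through the dip", 0),  # contains no dictionary words
]

preds = [dictionary_sentiment(text) for text, _ in samples]
acc = accuracy(preds, [label for _, label in samples])
print(preds, acc)
```

The last sample contains no dictionary words, so the sketch falls back to its default class and gets it wrong. This is one way a pure lookup approach can lose accuracy on informal text.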

Machine Learning

Naive Bayes

StockTwits (Naive Bayes): accuracy 0.7785857246253798, time 8 min 42.6 sec
News Headlines (Naive Bayes): accuracy 0.7737062296905709, time 3 min 20.8 sec
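
For readers unfamiliar with the classifier, the following is a minimal from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing on whitespace tokens. It is an illustration under stated assumptions, not this repository's implementation: the feature extraction, the smoothing choice, and the four training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Collect per-class document counts and word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, class) pairs above zero.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages (1 = positive, 0 = negative).
train_texts = [
    "stock will surge bullish",
    "great earnings buy now",
    "shares drop sell now",
    "bearish crash incoming sell",
]
train_labels = [1, 1, 0, 0]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "bullish buy"), predict(model, "crash sell"))
```

Unlike the dictionary lookup, the word weights here are learned from labelled examples, so the vocabulary adapts to whatever language the training data contains.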

Support Vector Machine

StockTwits (Support Vector Machine): accuracy 0.6277, time 79 min 28.9 sec
News Headlines (Support Vector Machine): accuracy 0.6901, time 57 min 23.9 sec

Logistic Regression

StockTwits (Logistic Regression): accuracy 0.8244248523230233, time 5 min 35.4 sec
News Headlines (Logistic Regression): accuracy 0.7961816070591451, time 2 min 10 sec

Multilayer Perceptron

StockTwits (Multilayer Perceptron): accuracy 0.8085178475501894, time 8 min 25.4 sec
News Headlines (Multilayer Perceptron): accuracy 0.7768529795204634, time 3 min 9.7 sec

Neural Network

StockTwits (Neural Network): accuracy 0.830794497488041, time 63 min 48.3 sec
News Headlines (Neural Network): accuracy 0.7621, time 58 min 24.7 sec
