This research builds a time series of the polarity of tweets related to a cluster of firms and compares it with the stock-market time series of the same firms.

DavideGiardini/Twitter-Sentiment-Analysis-to-predict-the-stock-market

Evaluating the efficiency of Twitter Sentiment Analysis as a prediction tool for the stock market

This research builds a time series of the polarity of tweets related to a cluster of firms and compares it with the stock-market time series of the same firms. The chosen firms are: Apple, Google, Nike, Nestlé, Beyond Meat, Bayer and NovaVax.
If the Twitter sentiment series showed real forecasting capacity, the method could be a valuable addition to trading bots.

Download and Analysis

To download all the tweets related to the firms I used Twitter's API. I wrote all the code in R and automated it with Windows Task Scheduler, so that the download started by itself every day at the same hour. The downloaded tweets were then automatically uploaded to OneDrive, so that I could access them at any time.

I also implemented an automated Gmail notification that told me whether the download had completed correctly and sent me some general statistics.

I then proceeded to clean all the data, lemmatize it, and analyze it. For the sentiment analysis I used 3 methods:

  1. Naive Bayes
    Based on Bayes' theorem, the algorithm classifies every tweet as "positive" or "negative" using the "MPQA Subjectivity Lexicon" by Janyce Wiebe.
  2. Syuzhet
    Uses the syuzhet package and its dictionary of the same name to give a score to each tweet.
  3. udpipe
    Uses the udpipe package (with the MPQA Subjectivity Lexicon) to give a score to each tweet. It can take intensifiers, weakeners and modifiers into account, so that it can, for example, distinguish between "good", "very good", "quite good" and "not good".
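
To make the idea behind the lexicon-based scores more concrete, here is a minimal sketch of modifier-aware scoring followed by a daily average. It is written in Python for illustration only and is not the R code used in this project. The tiny word lists, the weights and the function names `score_tweet` and `daily_polarity` are all assumptions, and the tokenization is a plain whitespace split.

```python
from collections import defaultdict
from statistics import mean

# Toy lexicon and modifier lists: hypothetical values for illustration only,
# not entries taken from the MPQA Subjectivity Lexicon.
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5}
WEAKENERS = {"quite": 0.5, "slightly": 0.5}
NEGATORS = {"not", "never"}


def score_tweet(text):
    """Sum the polarity of each lexicon word, adjusted by the word before it."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        value = POLARITY[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATORS:
            value = -value  # "not good" flips the sign
        elif previous in INTENSIFIERS:
            value *= INTENSIFIERS[previous]  # "very good" is amplified
        elif previous in WEAKENERS:
            value *= WEAKENERS[previous]  # "quite good" is dampened
        total += value
    return total


def daily_polarity(tweets):
    """Average the tweet scores per day; `tweets` is a list of (date, text) pairs."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(score_tweet(text))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}
```

With these toy weights, "good", "very good", "quite good" and "not good" receive four different scores, and `daily_polarity` turns a list of dated tweets into one polarity value per day, which is the shape of series that can be compared with daily closing prices.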

Visualizations

The data was then visualized using the R package "ggplot2". Here are some examples of the graphs I built, using Google as a reference:

Conclusions

To evaluate whether a causal relationship exists between the tweets and the closing price of each firm, I built a test based on the Granger causality test, which I called the "Close Test". The test gave very positive results, highlighting numerous causal relationships, summarized in the next table:

image

where:
"n" = number of significant relationships found
"*" = number of relationships with a p-value < 0.10
"**" = number of relationships with a p-value < 0.05
"***" = number of relationships with a p-value < 0.01
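
For readers unfamiliar with Granger causality, the sketch below computes the textbook one-lag F statistic: it checks whether adding yesterday's value of one series reduces the error of an autoregression of the other. This is not the "Close Test" itself, whose details are not given here. It is an illustrative Python sketch on synthetic data, and the names `ols_rss` and `granger_f` are assumptions. Turning the statistic into a p-value needs the F distribution, which is left out, and real price series are usually made stationary first.

```python
import random


def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    # Build the augmented system [X'X | X'y].
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for row in range(k):
            if row != col:
                factor = system[row][col] / system[col][col]
                system[row] = [a - factor * b for a, b in zip(system[row], system[col])]
    beta = [system[i][k] / system[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, targets)
    )


def granger_f(cause, effect):
    """F statistic for 'cause Granger-causes effect' using a single lag."""
    targets = effect[1:]
    # Restricted model: effect[t] ~ 1 + effect[t-1]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    # Unrestricted model: effect[t] ~ 1 + effect[t-1] + cause[t-1]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)  # one restriction, so no extra divisor


# Synthetic example: the "close" series follows yesterday's "sentiment" plus noise.
rng = random.Random(0)
sentiment = [rng.gauss(0, 1) for _ in range(200)]
close = [0.0] + [0.8 * sentiment[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]

f_forward = granger_f(sentiment, close)  # sentiment -> close
f_reverse = granger_f(close, sentiment)  # close -> sentiment
```

On this synthetic data the forward statistic is far larger than the reverse one, because the dependence was built in one direction only. A large F corresponds to a small p-value, which is what the stars in the legend above count.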

The test results also showed that, in our cluster of firms, tweets that refer only to the firm's stock (e.g. containing $AAPL, $GOOGL, etc.; dataset "stock") are better suited to short-term forecasts (the closing price of the same day), while tweets that refer to the company in general (e.g. also containing Apple, Google, etc.; dataset "score") are better suited to longer-term forecasts (the value over the following days). This is summarized in the next table:

image
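
The difference between same-day and next-day forecasting can be illustrated by correlating the sentiment series with the closing price at different lags. The Python sketch below is only an illustration under assumed toy numbers. It is not the procedure used to produce the table, and the helper names `pearson` and `lagged_correlation` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlation(sentiment, close, lag):
    """Correlate sentiment on day t with the close on day t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(sentiment, close)
    return pearson(sentiment[:-lag], close[lag:])


# Toy data (made up): each close is a linear function of the previous day's sentiment.
sentiment = [1, 3, 2, 5, 4, 6, 5, 8]
close = [13] + [2 * s + 10 for s in sentiment[:-1]]

same_day = lagged_correlation(sentiment, close, 0)
next_day = lagged_correlation(sentiment, close, 1)
```

In this toy setup the next-day correlation is essentially 1 while the same-day correlation is lower, the pattern one would expect from a dataset that is more useful at a longer horizon.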
