Doge sentiment analysis and stock correlation. Reddit comment extraction, ML evaluation using Naive Bayes, Random Forest.

cspence001/crypto_sentiment_analysis

crypto_sentiment_analysis

Crypto Sentiment Analysis

Crypto Sentiment Analysis is a proof-of-concept application that measures social sentiment around the Dogecoin cryptocurrency and analyzes how that sentiment correlates with Dogecoin's market price. The repository contains a two-part project, sentiment analysis followed by machine learning classification, covering a three-month period of heightened Dogecoin volatility in 2021.

Sentiment Analysis

Comment Extraction

  • Using the Reddit API, the project collects the thread IDs of the "Daily Discussion" threads in the subreddit r/doge for each day of the three-month period.
  • PMAW, a third-party multithreaded wrapper for the Pushshift API, then batch-extracts the 554k comments contained in these "Daily Discussion" threads.
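The extraction step might look roughly like the sketch below. It is not the repository's code: `daily_windows` and `fetch_thread_comments` are hypothetical helper names, the start date in the usage note is only an example, and the PMAW calls assume the Pushshift service is reachable from your environment.

```python
from datetime import datetime, timedelta, timezone


def daily_windows(start, days):
    """Return one (after, before) pair of epoch seconds per UTC day.

    Each pair can bound a search for that day's "Daily Discussion" thread.
    """
    windows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        next_day = day + timedelta(days=1)
        windows.append((int(day.timestamp()), int(next_day.timestamp())))
    return windows


def fetch_thread_comments(thread_ids):
    """Fetch all comments for the given submission IDs via PMAW.

    Assumption: the third-party `pmaw` package is installed. It is imported
    lazily so the date helper above works without it.
    """
    from pmaw import PushshiftAPI

    api = PushshiftAPI()
    # Resolve submission IDs to comment IDs, then fetch the comments in batches.
    comment_ids = list(api.search_submission_comment_ids(ids=thread_ids))
    return list(api.search_comments(ids=comment_ids))
```

For example, `daily_windows(datetime(2021, 4, 1, tzinfo=timezone.utc), 90)` yields 90 consecutive one-day windows.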

Sentiment Analysis using VADER

  • VADER, a sentiment analysis tool specifically attuned to social media content, scores each comment with a compound (polarity) score from -1 to 1 and with positive, negative, and neutral scores that each range from 0 to 1.
  • Based on the compound score, each comment is assigned an overall positive, negative, or neutral rating.
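A minimal sketch of the scoring and rating step follows. The ±0.05 cutoff is the threshold conventionally suggested in VADER's documentation; whether this project uses the same cutoff is an assumption. The function names are illustrative, and `score_comment` requires the third-party `vaderSentiment` package.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to an overall rating.

    The default threshold is the commonly used VADER convention,
    assumed here rather than taken from the repository.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_comment(text):
    """Return VADER's neg/neu/pos/compound scores plus an overall rating."""
    # Imported lazily so label_from_compound works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    scores["rating"] = label_from_compound(scores["compound"])
    return scores
```

In a batch run it would be cheaper to create one `SentimentIntensityAnalyzer` and reuse it across all comments rather than constructing it per call.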

Stock Value Correlation

  • Using the CoinGecko API, the project extracts Dogecoin's price at 5-minute intervals over the same three-month span.
  • Comments are grouped by timestamp into matching 5-minute intervals, and the mean of each score (compound, positive, negative, neutral) is calculated for every interval.
  • The per-interval sentiment means are then plotted and analyzed alongside the price series, allowing for correlation analysis.
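The interval averaging and correlation steps can be sketched with the standard library alone. The field names (`created_utc`, `compound`, `pos`, `neg`, `neu`) and the use of a Pearson coefficient are assumptions made for illustration; the repository may use a different layout or statistic.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean

INTERVAL_SECONDS = 300  # 5 minutes
SCORE_KEYS = ("compound", "pos", "neg", "neu")


def mean_scores_by_interval(comments, interval=INTERVAL_SECONDS):
    """Average each score over the comments that fall in the same interval."""
    grouped = defaultdict(list)
    for comment in comments:
        ts = comment["created_utc"]
        grouped[ts - ts % interval].append(comment)  # floor to interval start
    return {
        start: {key: fmean(c[key] for c in rows) for key in SCORE_KEYS}
        for start, rows in sorted(grouped.items())
    }


def align(sentiment_by_interval, price_by_interval, key="compound"):
    """Pair sentiment and price values for intervals present in both series."""
    shared = sorted(set(sentiment_by_interval) & set(price_by_interval))
    xs = [sentiment_by_interval[t][key] for t in shared]
    ys = [price_by_interval[t] for t in shared]
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

Intervals with no comments simply do not appear in the sentiment mapping, so `align` drops them instead of inventing a score.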

Machine Learning Classification

Evaluation of Classification Process

  • Each comment's VADER compound score determines its overall rating (positive, negative, or neutral).
  • These ratings are then used as labels to train Naive Bayes and Random Forest models, and each model's prediction accuracy is measured against the VADER classification.
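One way to set up such an evaluation is sketched below with scikit-learn. The bag-of-words features, the 75/25 split, and the hyperparameters are assumptions chosen for illustration, not the repository's documented configuration, and `evaluate_models` is a hypothetical name.

```python
def evaluate_models(texts, labels, test_size=0.25, seed=0):
    """Train Naive Bayes and Random Forest on VADER-labelled comments.

    Returns a dict mapping model name to held-out accuracy.
    Requires the third-party scikit-learn package (imported lazily).
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Stratify so each rating keeps its share in both splits.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    models = {
        "naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
        "random_forest": make_pipeline(
            CountVectorizer(),
            RandomForestClassifier(n_estimators=100, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores
```

Because the labels come from VADER rather than human annotators, the resulting accuracy measures how well each model reproduces VADER's ratings, not how well it captures true sentiment.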

Performance Results

  • The project documents the classification process and the performance results of each model.
  • Prediction accuracy is reported for the Naive Bayes and Random Forest models, along with a classification report and a confusion matrix heatmap for the Random Forest model.
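The confusion matrix behind such a heatmap is a table of counts, with one row per actual rating and one column per predicted rating. A small standard-library sketch, with the label order chosen arbitrarily for illustration:

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are actual ratings, columns are predicted ratings."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def accuracy(matrix):
    """Share of predictions on the diagonal (actual == predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which ratings a model confuses, for example positive comments predicted as neutral, which a single accuracy figure hides.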

Conclusion

Crypto Sentiment Analysis shows how social sentiment around Dogecoin moved in relation to its price during the three-month period, and how closely Naive Bayes and Random Forest models can reproduce VADER's ratings. The repository is intended as a resource for those interested in sentiment analysis and machine learning applications within the cryptocurrency domain.

Application Deployment

The Heroku deployment is down; run the application locally instead.

Local Application Deployment
git clone https://github.com/cspence001/crypto_sentiment_analysis.git
cd crypto_sentiment_analysis
python3 app.py
