Hamseals/Data-Science-2

Sentiment Analysis Using DistilRoberta, CryptoBERT, and SHAP

Overview

GitHub: https://github.com/premstaller1/SHAP-DS2

The objective of this project is to implement, apply and compare different machine learning models (DistilRoberta, CryptoBERT) on financial datasets. In addition, SHAP is used to understand which factors influence the predictions and to explore the performance of the models.

Research Questions

  1. What are the important features in the sentiment analysis of cryptocurrency and stock news/tweets using DistilRoberta-financial-sentiment/CryptoBERT and SHAP?
  2. How do the predictions of sentiment analysis of cryptocurrency and stock news compare using SHAP?
  3. How do the results of DistilRoberta-financial-sentiment (finetuned on financial news) compare with CryptoBERT (finetuned on crypto news)?
  4. What are possible applications where explainable sentiment analysis could be used productively?

Model Implementation and Dataset Processing

Initial Setup

import pandas as pd
import sklearn
import shap
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer
from transformers import pipeline
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

Dataset Analysis

Three datasets were used:

DistilRoberta:

CryptoBERT:

Comparison Dataset:

Data Preprocessing

  • Balanced the CryptoBERT dataset to improve performance.
  • Analyzed sentiment distributions and applied further processing.
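The README does not say how the balancing was done. One common option is random downsampling, where every class is cut down to the size of the smallest one. The sketch below shows that idea with the standard library only; the function name `balance_by_downsampling`, the `label` key and the toy rows are hypothetical and not taken from the project.

```python
import random
from collections import defaultdict


def balance_by_downsampling(rows, label_key="label", seed=42):
    """Return a class-balanced copy of `rows` by randomly downsampling
    every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid leaving the rows grouped by class
    return balanced


# Hypothetical toy data: 6 bullish, 3 neutral and 2 bearish posts.
rows = (
    [{"text": f"bullish post {i}", "label": "Bullish"} for i in range(6)]
    + [{"text": f"neutral post {i}", "label": "Neutral"} for i in range(3)]
    + [{"text": f"bearish post {i}", "label": "Bearish"} for i in range(2)]
)
balanced = balance_by_downsampling(rows)
```

Downsampling discards data from the larger classes, so oversampling the smaller classes is a reasonable alternative when the dataset is small.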

Model Accuracy Testing

# CryptoBERT: load tokenizer and model explicitly so that inputs can be truncated/padded to 64 tokens
model_name = "ElKulako/cryptobert"
tokenizer_crypto = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model_crypto = AutoModelForSequenceClassification.from_pretrained(model_name)
pipe_crypto = TextClassificationPipeline(model=model_crypto, tokenizer=tokenizer_crypto, max_length=64, truncation=True, padding='max_length')
# DistilRoberta: the pipeline() helper loads the matching tokenizer and model by name
pipe_DR = pipeline("text-classification", model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis")

Evaluation Metrics

Calculated accuracy, precision, recall, and F1-score for both models.
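To make the four metrics concrete, here is a dependency-free sketch of what they compute. In the project itself the `accuracy_score` and `precision_recall_fscore_support` functions imported from scikit-learn would normally do this work. The macro averaging used here (an unweighted mean over classes) is an assumption, since the README does not state which averaging was applied, and the labels are made-up examples.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.
    A class with no predictions (or no true samples) contributes 0."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical gold labels and model predictions.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
metrics = classification_metrics(y_true, y_pred)
```

On a balanced dataset macro and weighted averages coincide, which is one reason to balance the classes before comparing models.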

SHAP Analysis

Initial SHAP Analysis

Investigated the most important words influencing sentiment predictions for both models.
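The word importances reported by SHAP are based on Shapley values: each token is credited with its average marginal contribution to the model score over all subsets of the other tokens. The sketch below computes these values exactly for a tiny input. It is an illustration of the concept only, not the project's code: `toy_score` and `TOY_LEXICON` are hypothetical stand-ins for a real model, and the `shap` library estimates such values approximately rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial

# Hypothetical word weights standing in for a real sentiment model.
TOY_LEXICON = {"surges": 0.6, "crash": -0.8, "record": 0.2}


def toy_score(tokens):
    """Hypothetical stand-in for a model's positive-sentiment score."""
    return sum(TOY_LEXICON.get(tok, 0.0) for tok in tokens)


def shapley_values(tokens, score_fn):
    """Exact Shapley value of every token position.
    Enumerates all subsets, so it is only feasible for short inputs."""
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                phi += weight * (score_fn(with_i) - score_fn(without_i))
        values.append(phi)
    return values


tokens = ["bitcoin", "surges", "to", "record"]
attributions = shapley_values(tokens, toy_score)
```

Because the toy scorer is purely additive, each token's attribution equals its lexicon weight, and the attributions sum to the score of the full sentence. Real transformer models are not additive, which is where the averaging over subsets matters.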

Comparison on Same Dataset

Applied both models to the same dataset to compare SHAP results. Significant differences and similarities were observed in sentiment predictions.
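Comparing the two models on one dataset requires a shared label vocabulary, because they may name their classes differently. The sketch below assumes CryptoBERT emits Bullish/Neutral/Bearish and the DistilRoberta model emits positive/neutral/negative; these names should be checked against each model's configuration before use. The helpers `normalize_label` and `agreement_rate` and the prediction lists are hypothetical, shaped like the list of `label`/`score` dictionaries that a text-classification pipeline returns.

```python
# Assumed label names; verify them against each model's config.
LABEL_MAP = {
    "bullish": "positive",
    "bearish": "negative",
    "neutral": "neutral",
    "positive": "positive",
    "negative": "negative",
}


def normalize_label(raw_label):
    """Map a model-specific label onto positive/neutral/negative."""
    return LABEL_MAP[raw_label.strip().lower()]


def agreement_rate(preds_a, preds_b):
    """Share of texts on which both models predict the same sentiment."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must not be empty")
    matches = sum(
        normalize_label(a["label"]) == normalize_label(b["label"])
        for a, b in zip(preds_a, preds_b)
    )
    return matches / len(preds_a)


# Hypothetical outputs for the same three texts.
crypto_preds = [
    {"label": "Bullish", "score": 0.91},
    {"label": "Neutral", "score": 0.55},
    {"label": "Bearish", "score": 0.73},
]
dr_preds = [
    {"label": "positive", "score": 0.88},
    {"label": "neutral", "score": 0.61},
    {"label": "neutral", "score": 0.52},
]
rate = agreement_rate(crypto_preds, dr_preds)
```

The texts on which the models disagree are natural candidates for a side-by-side look at their SHAP explanations.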

Web Application

A Streamlit web application was developed to provide interactive sentiment analysis using SHAP. The app can be accessed here. The code and files are hosted on a separate GitHub repository: SHAP_app.

Conclusion

Overall, applying SHAP to the sentiment predictions of the DistilRoberta and CryptoBERT models makes both models more transparent, explainable and accountable.

Future Perspectives

Future work could integrate the sentiment scores into models that predict stock or cryptocurrency prices, and investigate the influence of specific features on those predictions.
