## Assignment: Financial Analysis and Sentiment Analysis with LLM
**Objective**: You will work with a Large Language Model (LLM) to generate financial strategies and analyze sentiment in market-related content. The goal is to implement practical finance-related tasks and to use the LLM for sentiment analysis of social media posts.

## 2. Task 2 - Sentiment Analysis of Market-related Tweets

○ Find two finance or market-related posts on Twitter (or another social media platform) about a recent market event or financial news item.
○ Pass each post to the LLM and ask it to analyze the post's sentiment.
○ The model should classify each post as positive, negative, or neutral and give a brief explanation for the classification.

### Example Prompt:

■	Tweet: "Stocks are surging today, with tech companies leading the way! Bullish sentiment driving the market!"
■	Model Response: Sentiment: Positive. Explanation: The tweet expresses optimism with the market trending upwards and highlighting the performance of tech stocks.
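The prompt-and-response pattern in the example can be sketched in code. This is a minimal sketch, not a prescribed implementation: `build_prompt` and `parse_response` are hypothetical helper names, the prompt wording is an assumption, and the parser assumes the model answers in the `Sentiment: ... Explanation: ...` form shown in the example.

```python
import re


def build_prompt(post: str) -> str:
    """Build a classification prompt for a single social media post."""
    return (
        "Classify the sentiment of the following market-related post as "
        "Positive, Negative, or Neutral, then explain briefly.\n"
        "Answer in the form 'Sentiment: <label>. Explanation: <one sentence>'.\n\n"
        f'Post: "{post}"'
    )


def parse_response(response: str) -> dict:
    """Extract the label and the explanation from a reply in the requested form."""
    match = re.search(
        r"Sentiment:\s*(Positive|Negative|Neutral)\b\.?\s*(?:Explanation:\s*(.*))?",
        response,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("Response does not contain a recognizable sentiment label.")
    label = match.group(1).capitalize()
    explanation = (match.group(2) or "").strip()
    return {"sentiment": label, "explanation": explanation}


# The reply text is the model response from the example prompt.
reply = (
    "Sentiment: Positive. Explanation: The tweet expresses optimism with the "
    "market trending upwards and highlighting the performance of tech stocks."
)
parsed = parse_response(reply)
print(parsed["sentiment"])
```

Parsing the reply into a fixed structure makes it easier to compare LLM answers with the labels produced by a classifier such as FinBERT.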



## 1. Project Planning and Requirements Gathering
### a. Define Objectives
**Primary Goal:** Analyze the sentiment of market-related tweets to gauge public opinion on recent financial events.

**Secondary Goals:** Identify trends, correlate sentiment with market movements, and generate actionable insights for stakeholders.
### b. Identify Stakeholders
**Internal:** Data scientists, data engineers, financial analysts, project managers.

**External:** Investors, financial institutions, social media platforms.
### c. Determine Success Metrics
**Accuracy:** Share of posts whose predicted sentiment (Positive, Negative, or Neutral) matches the annotated label.

**Precision & Recall:** Reported per class, which matters most when one sentiment class (for example Negative) is of higher interest or is rare.

**Processing Time:** Latency and throughput when handling posts in real time or in batches.

**Scalability:** Ability to handle increasing volumes of data.
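The classification metrics above can be computed directly from paired lists of true and predicted labels. The sketch below is a plain-Python illustration; `per_class_metrics` is a hypothetical helper and the five labels are made-up sample data, not results from this notebook.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision/recall."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        report[label] = {
            # Guard against division by zero when a class is never predicted or never present.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return accuracy, report


# Made-up labels for illustration only.
y_true = ["Positive", "Negative", "Neutral", "Positive", "Negative"]
y_pred = ["Positive", "Neutral", "Neutral", "Positive", "Negative"]
accuracy, report = per_class_metrics(y_true, y_pred, ["Positive", "Negative", "Neutral"])
print(accuracy, report["Negative"])
```

In this toy sample one Negative post is predicted as Neutral, which lowers recall for Negative and precision for Neutral while leaving Positive untouched.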

## 2. Data Collection
### Selecting the Data Source
**Platform Choice:** Twitter (now X) is a natural choice because posts appear in real time and market events are discussed there as they unfold.

## 3. Data Preprocessing
### a. Data Cleaning
**Remove Noise:** Eliminate URLs, mentions (@user), hashtags (if not needed for context), emojis, and special characters.

**Case Normalization:** Convert text to lowercase to maintain consistency.
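The cleaning steps above can be sketched with regular expressions. This is one possible implementation under stated assumptions: `clean_tweet` and its `keep_hashtag_words` flag are hypothetical names, and the choice to keep `$` and `%` (so cashtags and percentages survive) is an assumption rather than a requirement of the assignment.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Everything except lowercase letters, digits, basic punctuation, $ and % counts as noise.
NON_TEXT_RE = re.compile(r"[^a-z0-9$%.,!?'\s]")


def clean_tweet(text, keep_hashtag_words=True):
    """Remove URLs, mentions, emojis and special characters, then lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep the word behind a hashtag (e.g. "#Bullish" -> "Bullish") or drop it entirely.
    text = HASHTAG_RE.sub(r"\1" if keep_hashtag_words else " ", text)
    text = text.lower()
    text = NON_TEXT_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_tweet("Stocks are SURGING today 🚀 @trader #Bullish https://example.com/x")
print(cleaned)
```

Keeping the hashtag word is often useful for market posts, since tags such as `#Bullish` carry sentiment themselves.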
### b. Text Normalization
**Tokenization:** Break down text into individual tokens or words.

**Stop Words Removal:** Remove common words that do not contribute to sentiment (e.g., “the,” “is,” “at”).

**Stemming/Lemmatization:** Reduce words to their root forms to standardize the dataset.
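Tokenization and stop word removal can be illustrated without any external library. The stop word list below is a small hypothetical sample, and `tokenize` and `remove_stop_words` are illustrative names; stemming or lemmatization is left out of this sketch because it normally relies on an NLP library.

```python
import re

# Small illustrative stop word list; a real project would use a fuller one.
STOP_WORDS = {"the", "is", "at", "a", "an", "are", "of", "to", "with", "in", "and"}


def tokenize(text):
    """Split lowercased text into word-like tokens, keeping $ for cashtags."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word set."""
    return [token for token in tokens if token not in stop_words]


tokens = remove_stop_words(tokenize("The market is not rallying at the open"))
print(tokens)
```

Negations such as "not" are deliberately absent from the stop word list here, because removing them would turn "not rallying" into "rallying" and flip the apparent sentiment.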
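### c. Handling Imbalanced Data
**Class Distribution:** Check how many Positive, Negative, and Neutral examples the labeled data contains, since a heavily skewed distribution biases the model toward the majority class.

**Techniques:** Use oversampling (e.g., SMOTE, which interpolates between numeric feature vectors and therefore applies to vectorized text rather than raw strings), undersampling, or class weighting during model training.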
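Class weighting can be sketched as inverse-frequency weights, where rarer classes receive larger weights in the loss. The formula used here, `n_samples / (n_classes * class_count)`, is a common "balanced" heuristic; `balanced_class_weights` is a hypothetical helper and the label counts are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up, imbalanced label set: 6 Positive, 3 Negative, 1 Neutral.
labels = ["Positive"] * 6 + ["Negative"] * 3 + ["Neutral"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

With these counts the single Neutral example gets the largest weight, so a misclassified Neutral post contributes more to a weighted loss than a misclassified Positive post.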
### d. Feature Engineering
**Bag of Words (BoW):** Represent text data as frequency counts.

**TF-IDF:** Capture the importance of words relative to the document and corpus.

**Word Embeddings:** Utilize models like Word2Vec, GloVe, or contextual embeddings from transformers (e.g., BERT).
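The two count-based representations can be written out in a few lines. This sketch uses the unsmoothed textbook definition, tf = count / document length and idf = ln(N / document frequency); libraries typically apply smoothed variants, so their numbers will differ. The function names and the three tiny documents are illustrative assumptions.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each token to its frequency in one document."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: tf * idf} dictionary per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up tokenized documents for illustration.
docs = [["stocks", "surge", "tech", "stocks"], ["stocks", "fall", "rates"], ["rates", "rise"]]
vectors = tf_idf(docs)
print(bag_of_words(docs[0]), vectors[0])
```

In the first document "stocks" occurs twice but also appears in another document, so its TF-IDF score ends up below that of "surge", which occurs once and only there.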


## 4. Model Selection and Training
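### a. Leveraging Pre-trained Large Language Models (LLMs)
**Models to Consider:** GPT-4, BERT, RoBERTa, DistilBERT, and FinBERT (a BERT variant trained on financial text, used in the code cells of this notebook).

**Advantages:** Pre-trained models already capture context and semantics, so they need less labeled data for fine-tuning than models trained from scratch.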
### b. Fine-Tuning the LLM
**Dataset Preparation:** Create labeled datasets with sentiments annotated as Positive, Negative, or Neutral.

**Training Process:** Fine-tune the model on your specific dataset to adapt it to financial language nuances.

**Tools:** Use frameworks like Hugging Face’s Transformers, TensorFlow, or PyTorch.
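Dataset preparation for fine-tuning usually means mapping text labels to integer ids and holding out a validation set that preserves the class mix. The sketch below shows only that step, in plain Python; `stratified_split` is a hypothetical helper, the posts are placeholders, and the `LABEL2ID` order is an assumption chosen to match the label order of the `finbert-tone` checkpoint.

```python
import random
from collections import defaultdict

# Assumed id order (0 = Neutral, 1 = Positive, 2 = Negative).
LABEL2ID = {"Neutral": 0, "Positive": 1, "Negative": 2}


def stratified_split(examples, val_fraction=0.2, seed=0):
    """Split (text, label) pairs so each class appears in train and validation."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, LABEL2ID[label]))
    rng = random.Random(seed)
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        # Hold out at least one example per class.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Placeholder posts; real data would be annotated tweets.
sample_labels = ["Positive"] * 5 + ["Negative"] * 5 + ["Neutral"] * 5
examples = [(f"post {i}", label) for i, label in enumerate(sample_labels)]
train, val = stratified_split(examples)
print(len(train), len(val))
```

The resulting `(text, label_id)` pairs can then be tokenized and passed to whichever training framework is chosen.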

In [1]:
# Install important Libraries
!pip install transformers torch pandas numpy scikit-learn




In [2]:
#Import Important Libraries

from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained FinBERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
model = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone')



In [3]:
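def analyze_sentiment_finbert(tweet):
    # Tokenize a single tweet and run it through FinBERT without tracking gradients
    inputs = tokenizer(tweet, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    sentiment = torch.argmax(probs, dim=1).item()
    # finbert-tone label order: 0 = Neutral, 1 = Positive, 2 = Negative
    sentiments = ['Neutral', 'Positive', 'Negative']
    return sentiments[sentiment]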
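The step from raw logits to a sentiment label is easy to get wrong, because the index-to-label order is defined by the checkpoint and not by convention. The sketch below reproduces the softmax and argmax steps in plain Python with made-up logits. The `ID2LABEL` mapping is an assumption that follows the Neutral, Positive, Negative order visible in the prediction output of this notebook; with a loaded model it can be read from `model.config.id2label` instead of being hard-coded.

```python
import math

# Assumed label order for the finbert-tone checkpoint.
ID2LABEL = {0: "Neutral", 1: "Positive", 2: "Negative"}


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def label_from_logits(logits, id2label=ID2LABEL):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return id2label[index], probs[index]


# Made-up logits for illustration.
label, prob = label_from_logits([-1.2, 4.0, -0.5])
print(label, round(prob, 3))
```

Using a different label list with the same indices silently relabels every prediction, so it is worth checking the mapping once against a sentence with an obvious sentiment.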


In [4]:
!pip install shap




In [5]:
import shap

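# Define a wrapper function for SHAP
def finbert_predict(texts):
    # SHAP passes a NumPy array of strings, so convert the input to a plain list
    if isinstance(texts, str):
        texts = [texts]
    texts = [str(text) for text in texts]
    inputs = tokenizer(texts, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1).numpy()
    return probs

# Initialize SHAP explainer; the tokenizer serves as the text masker
explainer = shap.Explainer(finbert_predict, tokenizer)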



In [13]:
!pip install transformers torch shap




In [15]:
from transformers import BertTokenizer, BertForSequenceClassification
import torch
import shap

# Load pre-trained FinBERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
model = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone')
model.eval()  # Set model to evaluation mode


Output (truncated): BertForSequenceClassification wrapping a BertModel with a 30873-token vocabulary, 512 position embeddings, hidden size 768, and 12 encoder layers.

In [16]:
def finbert_predict(texts):
    # Accept a single string or any sequence of strings (SHAP passes a NumPy array)
    if isinstance(texts, str):
        texts = [texts]
    else:
        texts = list(texts)
        if not all(isinstance(text, str) for text in texts):
            raise ValueError("Input must be a string or a sequence of strings.")

    # Tokenize the input texts
    inputs = tokenizer(texts, return_tensors='pt', truncation=True, padding=True)

    # Perform forward pass
    with torch.no_grad():
        outputs = model(**inputs)

    # Apply softmax to get probabilities
    probs = torch.nn.functional.softmax(outputs.logits, dim=1).numpy()
    return probs


In [18]:
# Example background tweets (you can replace these with your own samples)
background_texts = [
    "The market is bullish today with significant gains in tech stocks.",
    "Economic uncertainty leads to a downturn in the stock market.",
    "Neutral sentiments observed as markets stabilize.",
    "Investors are optimistic about the upcoming earnings reports.",
    "Market volatility continues amidst global economic challenges."
]


In [19]:
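# Initialize the SHAP explainer; the tokenizer acts as the text masker that hides tokens
explainer = shap.Explainer(
    finbert_predict,
    shap.maskers.Text(tokenizer),
    output_names=['Neutral', 'Positive', 'Negative'],
)
# Token-level attributions for the sample texts defined above
shap_values = explainer(background_texts)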


In [34]:
# Example tweet to analyze
tweet = "Stocks are surging today, with tech companies leading the way! Bullish sentiment driving the market!"



In [36]:
def analyze_and_explain(tweet):
    # Class probabilities for the tweet, shape (1, 3)
    sentiment_probs = finbert_predict(tweet)
    sentiment_index = sentiment_probs.argmax(axis=1)[0]
    # finbert-tone label order: 0 = Neutral, 1 = Positive, 2 = Negative
    sentiments = ['Neutral', 'Positive', 'Negative']
    sentiment = sentiments[sentiment_index]

    explanation = f"FinBERT assigns its highest probability to the {sentiment} class."

    return {
        "Tweet": tweet,
        "Sentiment": sentiment,
        "Explanation": explanation
    }

# Example Usage
result = analyze_and_explain(tweet)
print(result)


{'Tweet': 'Stocks are surging today, with tech companies leading the way! Bullish sentiment driving the market!', 'Sentiment': 'Positive', 'Explanation': 'FinBERT assigns its highest probability to the Positive class.'}


In [63]:
# Example tweet to analyze
tweet1 = "Money flows effortlessly into my life, bringing abundance and joy. I am a magnet for financial success."

In [66]:
def analyze_and_explain(tweet):
    # Class probabilities for the tweet, shape (1, 3)
    sentiment_probs = finbert_predict(tweet)
    sentiment_index = sentiment_probs.argmax(axis=1)[0]
    # finbert-tone label order: 0 = Neutral, 1 = Positive, 2 = Negative
    sentiments = ['Neutral', 'Positive', 'Negative']
    sentiment = sentiments[sentiment_index]

    explanation = f"FinBERT assigns its highest probability to the {sentiment} class."

    return {
        "Tweet": tweet,
        "Sentiment": sentiment,
        "Explanation": explanation
    }

# Example Usage
result = analyze_and_explain(tweet1)
print(result)


Predictions for texts: Money flows effortlessly into my life, bringing abundance and joy. I am a magnet for financial success.
[[{'label': 'Neutral', 'score': 1.0452318122133875e-07}, {'label': 'Positive', 'score': 0.9999994039535522}, {'label': 'Negative', 'score': 4.575226739689242e-07}]]
Extracted scores:
[[1.04523181e-07 9.99999404e-01 4.57522674e-07]]
{'Tweet': 'Money flows effortlessly into my life, bringing abundance and joy. I am a magnet for financial success.', 'Sentiment': 'Positive', 'Explanation': 'FinBERT assigns its highest probability to the Positive class.'}


In [68]:
# Example tweet to analyze
tweet2 = "Those Hours in front of the charts will soon bring you a fortune"

In [72]:
def analyze_and_explain(tweet):
    # Class probabilities for the tweet, shape (1, 3)
    sentiment_probs = finbert_predict(tweet)
    sentiment_index = sentiment_probs.argmax(axis=1)[0]
    # finbert-tone label order: 0 = Neutral, 1 = Positive, 2 = Negative
    sentiments = ['Neutral', 'Positive', 'Negative']
    sentiment = sentiments[sentiment_index]

    explanation = f"FinBERT assigns its highest probability to the {sentiment} class."

    return {
        "Tweet": tweet,
        "Sentiment": sentiment,
        "Explanation": explanation
    }

# Example Usage
result = analyze_and_explain(tweet2)
print(result)


Predictions for texts: Those Hours in front of the charts will soon bring you a fortune
[[{'label': 'Neutral', 'score': 0.9999929666519165}, {'label': 'Positive', 'score': 2.8017070690111723e-06}, {'label': 'Negative', 'score': 4.22753646489582e-06}]]
Extracted scores:
[[9.99992967e-01 2.80170707e-06 4.22753646e-06]]
{'Tweet': 'Those Hours in front of the charts will soon bring you a fortune', 'Sentiment': 'Neutral', 'Explanation': 'FinBERT assigns its highest probability to the Neutral class.'}


In [73]:
# Example tweet to analyze
tweet3 = "Financial conditions have meaningfully tightened"

In [74]:
def analyze_and_explain(tweet):
    # Class probabilities for the tweet, shape (1, 3)
    sentiment_probs = finbert_predict(tweet)
    sentiment_index = sentiment_probs.argmax(axis=1)[0]
    # finbert-tone label order: 0 = Neutral, 1 = Positive, 2 = Negative
    sentiments = ['Neutral', 'Positive', 'Negative']
    sentiment = sentiments[sentiment_index]

    explanation = f"FinBERT assigns its highest probability to the {sentiment} class."

    return {
        "Tweet": tweet,
        "Sentiment": sentiment,
        "Explanation": explanation
    }

# Example Usage
result = analyze_and_explain(tweet3)
print(result)


Predictions for texts: Financial conditions have meaningfully tightened
[[{'label': 'Neutral', 'score': 0.043096963316202164}, {'label': 'Positive', 'score': 0.04083748534321785}, {'label': 'Negative', 'score': 0.916065514087677}]]
Extracted scores:
[[0.04309696 0.04083749 0.91606551]]
{'Tweet': 'Financial conditions have meaningfully tightened', 'Sentiment': 'Negative', 'Explanation': 'FinBERT assigns its highest probability to the Negative class.'}
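To name the words that actually influence a prediction, one simple option is leave-one-out occlusion: remove each word in turn and measure how much the probability of the predicted class drops. This is a rough sketch of that technique and is not SHAP. `top_terms_by_occlusion` is a hypothetical helper, and `toy_predict` is a made-up stand-in with hand-picked keyword weights so that the example runs without downloading a model; in the notebook, `finbert_predict` could be passed in its place.

```python
def top_terms_by_occlusion(text, predict_fn, class_index, top_n=3):
    """Rank words by how much removing them lowers the probability of class_index."""
    words = text.split()
    base = predict_fn([text])[0][class_index]
    drops = []
    for position, word in enumerate(words):
        reduced = " ".join(words[:position] + words[position + 1:])
        score = predict_fn([reduced])[0][class_index]
        drops.append((base - score, word))
    drops.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only words whose removal actually lowers the class probability.
    return [word for drop, word in drops[:top_n] if drop > 0]


def toy_predict(texts):
    """Made-up stand-in for finbert_predict: rows are [neutral, positive, negative]."""
    rows = []
    for text in texts:
        lowered = text.lower()
        positive = 1.0 + 2.0 * lowered.count("surging") + 1.0 * lowered.count("bullish")
        negative = 1.0 + 2.0 * lowered.count("tightened")
        neutral = 1.0
        total = neutral + positive + negative
        rows.append([neutral / total, positive / total, negative / total])
    return rows


terms = top_terms_by_occlusion(
    "Stocks are surging today on bullish sentiment", toy_predict, class_index=1, top_n=2
)
print(terms)
```

Occlusion needs one extra forward pass per word, which is affordable for tweet-length text; the ranked words can then be joined into the explanation string.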
