# Financial News Sentiment

When working with LLMs, I've often spent a lot of time on prompt engineering. It's a bit like going to the eye doctor: the question "Is this one better than that one?" comes up for every system prompt variant. With this notebook, you can run each variant against many labeled examples, so you no longer rely on a single data point to compare two system prompts.

The goal of this notebook is to explore how a state-of-the-art general-purpose LLM performs compared to a highly specialized Transformer model on the task of classifying sentiment in financial news. The `ollama_model` and `ollama_system_prompt` are configurable via notebook parameters, so different permutations can be explored.

This notebook also connects to live news data and assigns a sentiment from each model, allowing human evaluators to judge how each model performed.

<u>The notebook flows in these steps:</u>
1. Load Financial Sentiment Evaluation Dataset
    - Financial PhraseBank
2. Load Sentiment Models
    - Transformer: FinBERT
    - LLM: Qwen3
3. Evaluate Models
4. Retrieve Live News Articles
5. Assign Sentiment to Live News

### Notebook Parameters

In [None]:
ollama_url = "http://localhost:11434" # Default Ollama URL
ollama_model =  "qwen3"
ollama_system_prompt =  "You are a financial sentiment analysis AI. Classify the following text as 'positive', 'negative', or 'neutral'. Respond with only one word, the chosen sentiment label. Do not provide any other text or explanation."

num_sentences = 100 # Number of sentences to use from evaluation set
shuffle_seed = 23 # Shuffle the evaluation set for a better distribution of labels; the seed is fixed here for reproducibility

live_news_ticker = "YELP" # Pull the latest news for this stock
num_live_articles = 10 # How many live news articles to pull for that stock
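
Because the model and the system prompt are plain parameters, a sweep over permutations can be described as a small grid. The sketch below shows one possible way to enumerate it. The candidate model names and prompt variants are hypothetical placeholders, and how each combination is then fed back into the notebook is left open.

```python
from itertools import product

# Hypothetical candidates; substitute the models available in your Ollama install
candidate_models = ["qwen3", "llama3.1"]
candidate_prompts = [
    "Classify the text as 'positive', 'negative', or 'neutral'. Respond with one word.",
    "You are a financial analyst. Answer with exactly one of: positive, negative, neutral.",
]

# One dict per run, keyed by the notebook's parameter names
parameter_grid = [
    {"ollama_model": model, "ollama_system_prompt": prompt}
    for model, prompt in product(candidate_models, candidate_prompts)
]

for run_id, params in enumerate(parameter_grid):
    print(run_id, params["ollama_model"], params["ollama_system_prompt"][:40])
```

Each dict can be applied to the parameter cell by hand or by whatever notebook runner you prefer.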

### Imports and Setup

In [None]:
import io
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama
from newspaper import Article
import random
import requests
from sklearn.metrics import accuracy_score, classification_report
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import yfinance as yf
import zipfile

In [None]:
# Needed for FinBERT initialization
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### Load the Evaluation Dataset

In [None]:
# Download the ZIP file
url = "https://huggingface.co/datasets/takala/financial_phrasebank/resolve/main/data/FinancialPhraseBank-v1.0.zip"
response = requests.get(url, timeout=60)
response.raise_for_status() # Fail early if the download did not succeed
zip_file = zipfile.ZipFile(io.BytesIO(response.content))

In [None]:
# Extract the specific file: Sentences_AllAgree.txt (note: it's inside FinancialPhraseBank-v1.0 folder)
file_path = "FinancialPhraseBank-v1.0/Sentences_AllAgree.txt"
with zip_file.open(file_path) as f:
    lines = f.read().decode('ISO-8859-1').splitlines()

In [None]:
lines[0:10]

In [None]:
# Create a vector of sentences and a vector of labels
# Note that Transformer models use numbers (0, 1, 2) for classification labels instead of text (negative, neutral, positive), so conversion is necessary

data_tuples = []
label_map = {'negative': 0, 'neutral': 1, 'positive': 2}
id_to_label = {0: 'negative', 1: 'neutral', 2: 'positive'}

for line in lines:
    if '@' in line:
        sentence, label_str = line.rsplit('@', 1)
        sentence = sentence.strip()
        label_str = label_str.strip()
        if label_str in label_map:
            data_tuples.append((sentence, label_map[label_str]))

# Shuffle to better balance the label distribution in the evaluated subset
random.seed(shuffle_seed) # Fixed seed (set in the notebook parameters) for reproducibility
random.shuffle(data_tuples)

sentences = [t[0] for t in data_tuples]
labels = [t[1] for t in data_tuples]
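
The parsing loop relies on `rsplit('@', 1)`, which splits on the last `@` only, so a sentence that itself contains an `@` stays intact. A minimal sketch of that behavior follows. The sample lines are made up for illustration, and `parse_line` is a hypothetical helper that is not used elsewhere in the notebook.

```python
# Made-up lines in the `sentence@label` layout used by the dataset file
sample_lines = [
    "Quarterly profit rose compared with last year .@positive",
    "Contact investor@example.com for details .@neutral",
    "A line with no label",
]

label_map = {'negative': 0, 'neutral': 1, 'positive': 2}

def parse_line(line):
    """Return (sentence, label_id), or None if the line has no valid label."""
    if '@' not in line:
        return None
    # Split on the LAST '@' only, so an '@' inside the sentence is preserved
    sentence, label_str = line.rsplit('@', 1)
    label_str = label_str.strip()
    if label_str not in label_map:
        return None
    return sentence.strip(), label_map[label_str]

parsed = [parse_line(line) for line in sample_lines]
print(parsed)
```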

In [None]:
sentences[0:10]

In [None]:
labels[0:10]
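
Since only the first `num_sentences` items are evaluated, it can help to check how the labels are distributed in that slice. The sketch below counts labels with `collections.Counter`. `label_distribution` is a hypothetical helper, and the example label ids are made up rather than taken from the dataset.

```python
from collections import Counter

id_to_label = {0: 'negative', 1: 'neutral', 2: 'positive'}

def label_distribution(labels, num_sentences):
    """Count how often each label appears in the first `num_sentences` items."""
    counts = Counter(labels[:num_sentences])
    return {id_to_label[label_id]: counts.get(label_id, 0) for label_id in id_to_label}

# Made-up label ids standing in for the shuffled `labels` list
example_labels = [1, 1, 2, 0, 1, 2, 1, 1, 0, 2]
print(label_distribution(example_labels, 5))
```

A heavily skewed result would suggest raising `num_sentences` or trying a different `shuffle_seed`.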

### Load Transformer Model

In [None]:
class FinBERTModel:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
        self.model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")
        self.model.to(device)
        self.name = "FinBERT"
        # FinBERT has a different label order, so we define a mapping
        self.finbert_label_map = {0: 2, 1: 0, 2: 1} # 0 (positive) -> 2 (positive), 1 (negative) -> 0 (negative), 2 (neutral) -> 1 (neutral)

    def predict(self, sentence):
        inputs = self.tokenizer(sentence, return_tensors="pt", truncation=True, padding=True).to(device)
        with torch.no_grad():
            logits = self.model(**inputs).logits
        
        # Get the model's predicted class ID
        predicted_class_id = logits.argmax(dim=-1).item()
        
        # Map the model's class ID to our dataset's label ID
        return self.finbert_label_map[predicted_class_id]
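
The hardcoded `finbert_label_map` encodes the model's label order by hand. One alternative is to derive the mapping from label names, assuming the checkpoint's config exposes an `id2label` dict. In the sketch below, `build_label_map` is a hypothetical helper, and the `finbert_id2label` literal mirrors the order described in the class comment rather than being read from the model.

```python
def build_label_map(model_id2label, dataset_label_map):
    """Map a model's class ids onto the dataset's label ids by label name."""
    mapping = {}
    for class_id, name in model_id2label.items():
        name = name.lower()
        if name not in dataset_label_map:
            raise ValueError(f"Model label {name!r} has no dataset equivalent")
        mapping[int(class_id)] = dataset_label_map[name]
    return mapping

dataset_label_map = {'negative': 0, 'neutral': 1, 'positive': 2}
# Order taken from the comment in FinBERTModel; in practice this would come
# from the loaded model's config (assumed, not verified here)
finbert_id2label = {0: 'positive', 1: 'negative', 2: 'neutral'}

print(build_label_map(finbert_id2label, dataset_label_map))
```

Deriving the mapping by name would surface a mismatch as an error instead of silently swapping labels.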

In [None]:
finbert_model = FinBERTModel()

### Load Ollama Model

In [None]:
class OllamaModel:
    def __init__(self, url, model_name):
        self.chat_model = ChatOllama(base_url=url, model=model_name, reasoning=False) # Turn off reasoning for quicker response without <think> phase
        self.system_prompt = ollama_system_prompt
        self.name = model_name
        self.label_map = {'negative': 0, 'neutral': 1, 'positive': 2}

    def predict(self, sentence):
        messages = [
            SystemMessage(content=self.system_prompt),
            HumanMessage(content=sentence)
        ]
        response = self.chat_model.invoke(messages)
        predicted_label_string = response.content.strip().lower() # Strip whitespace so a trailing newline does not invalidate the label
        
        # Return None if the model's response is not a valid label
        if predicted_label_string not in self.label_map:
            return None
        
        return self.label_map[predicted_label_string]
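
Any reply that is not exactly one of the three labels counts as invalid, yet models often add a capital letter, quotes, or a trailing period. The sketch below shows one possible normalization step. `normalize_label` is a hypothetical helper, and removing a `<think>` block assumes the model wraps its reasoning in those tags.

```python
import re

LABEL_MAP = {'negative': 0, 'neutral': 1, 'positive': 2}

def normalize_label(raw_response):
    """Return a label id, or None if the reply does not reduce to a single valid label."""
    # Drop any <think>...</think> block a reasoning model may emit
    text = re.sub(r"<think>.*?</think>", "", raw_response, flags=re.DOTALL)
    # Lowercase, then strip surrounding whitespace, quotes, and punctuation
    text = text.lower().strip().strip("'\"`.,!:; \n\t")
    return LABEL_MAP.get(text)

for reply in ["Positive.\n", " 'neutral' ", "<think>hmm</think>\nnegative", "It is positive"]:
    print(repr(reply), normalize_label(reply))
```

A full-sentence reply such as the last one is still treated as invalid, which keeps the strict one-word contract of the system prompt.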

In [None]:
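# Note: this rebinds `ollama_model` from the model name (a string) to the model object,
# so re-run the parameters cell before re-running this cell
ollama_model = OllamaModel(ollama_url, ollama_model)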

### Evaluation Function

In [None]:
def evaluate_model(model, sentences, labels, num_sentences):
    """
    Evaluates a sentiment classification model and returns a dictionary of metrics.
    """
    true_labels = []
    predicted_labels = []
    invalid_responses = 0

    print(f"Running classification with {model.name}...")
    print(f"Evaluating {min(num_sentences, len(sentences))} sentences.")
    print("---")

    for i in range(min(num_sentences, len(sentences))):
        sentence = sentences[i]
        actual_label = labels[i]
        predicted_class_id = model.predict(sentence)

        # Only append to lists if the prediction is valid
        if predicted_class_id is not None:
            true_labels.append(actual_label)
            predicted_labels.append(predicted_class_id)
        else:
            invalid_responses += 1

    # Handle the case where all responses are invalid
    if not true_labels:
        return {
            'model_name': model.name,
            'accuracy': 0.0,
            'classification_report': 'No valid predictions were made.',
            'invalid_responses': invalid_responses
        }
    
    # Calculate evaluation metrics
    accuracy = accuracy_score(true_labels, predicted_labels)
    report = classification_report(
        true_labels,
        predicted_labels,
        target_names=list(label_map.keys()),
        labels=[0, 1, 2],
        zero_division=0
    )
    
    # Return metrics in a structured format
    return {
        'model_name': model.name,
        'accuracy': accuracy,
        'classification_report': report,
        'invalid_responses': invalid_responses
    }
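
Note that `evaluate_model` computes accuracy over valid predictions only, so a model with many invalid responses can still report a high score. The sketch below contrasts that with a stricter variant that counts an invalid response as wrong. Both function names are hypothetical, and the example data is made up.

```python
def accuracy_valid_only(true_labels, predictions):
    """Accuracy over items with a valid (non-None) prediction, as in evaluate_model."""
    pairs = [(t, p) for t, p in zip(true_labels, predictions) if p is not None]
    if not pairs:
        return 0.0
    return sum(t == p for t, p in pairs) / len(pairs)

def accuracy_strict(true_labels, predictions):
    """Accuracy over all items, counting an invalid (None) prediction as wrong."""
    if not true_labels:
        return 0.0
    return sum(t == p for t, p in zip(true_labels, predictions)) / len(true_labels)

# Made-up example: 4 sentences, one invalid response
truth = [0, 1, 2, 1]
preds = [0, 1, None, 2]
print(accuracy_valid_only(truth, preds))
print(accuracy_strict(truth, preds))
```

Reading the reported accuracy alongside `invalid_responses` gives the same information as the strict variant.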

### Evaluate Transformer Model

In [None]:
finbert_metrics = evaluate_model(finbert_model, sentences, labels, num_sentences)

In [None]:
print(f"{finbert_metrics['model_name'].upper()} EVALUATION SUMMARY")
print("="*50)
print(f"Accuracy: {finbert_metrics['accuracy']:.4f}\n")
print(f"Invalid Responses: {finbert_metrics['invalid_responses']}/{num_sentences}\n")
print("Classification Report:\n")
print(finbert_metrics['classification_report'])

### Evaluate LLM Model

In [None]:
ollama_metrics = evaluate_model(ollama_model, sentences, labels, num_sentences)

In [None]:
print(f"{ollama_metrics['model_name'].upper()} EVALUATION SUMMARY")
print("="*50)
print(f"Accuracy: {ollama_metrics['accuracy']:.4f}\n")
print(f"Invalid Responses: {ollama_metrics['invalid_responses']}/{num_sentences}\n")
print("Classification Report:\n")
print(ollama_metrics['classification_report'])
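
Two accuracy figures alone do not show where the models disagree. The sketch below tallies per-sentence outcomes for a pair of models. It assumes per-sentence predictions have been collected separately (`evaluate_model` returns only aggregate metrics), `paired_outcomes` is a hypothetical helper, and the example data is made up.

```python
from collections import Counter

def paired_outcomes(true_labels, preds_a, preds_b):
    """Tally, per sentence, which of two models was correct."""
    tally = Counter()
    for truth, a, b in zip(true_labels, preds_a, preds_b):
        a_ok, b_ok = (a == truth), (b == truth)
        if a_ok and b_ok:
            tally["both_correct"] += 1
        elif a_ok:
            tally["only_a_correct"] += 1
        elif b_ok:
            tally["only_b_correct"] += 1
        else:
            tally["both_wrong"] += 1
    return dict(tally)

# Made-up labels and predictions for five sentences
truth   = [0, 1, 2, 1, 2]
preds_a = [0, 1, 2, 2, 1]
preds_b = [0, 1, 1, 1, 1]
print(paired_outcomes(truth, preds_a, preds_b))
```

The `only_a_correct` and `only_b_correct` counts are the interesting ones when comparing two system prompts on the same sentences.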

### Retrieve Live News

In [None]:
def get_stock_news(ticker, max_articles=5):
    """
    Get recent news articles for a stock ticker
    
    Args:
        ticker (str): Stock ticker symbol (e.g., 'MSFT', 'AAPL')
        max_articles (int): Maximum number of articles to return
    
    Returns:
        list: List of dictionaries containing article data
    """

    # Get high-level news summary and article links from Yahoo! Finance
    stock = yf.Ticker(ticker)
    news = stock.news
    
    articles = []

    for i, item in enumerate(news[:max_articles]):
        try:
            url = item['content']['canonicalUrl']['url']
            
            # Follow article link to retrieve full article content
            article = Article(url)
            article.download()
            article.parse()
            
            article_data = {
                'id': item['content']['id'],
                'pub_date': item['content']['pubDate'],
                'url': url,
                'title': item['content']['title'],
                'summary': item['content'].get('summary', ''),
                'full_text': article.text
            }
            
            articles.append(article_data)
            
        except Exception as e:
            print(f"Could not parse article {i+1}: {e}")
            continue
    
    return articles
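
`get_stock_news` indexes nested keys directly and relies on the broad `except` to skip items that lack them. A more explicit option is sketched below. `extract_article_url` is a hypothetical helper, and the sample items are made up to mirror only the keys the function above reads, so the real news payload may differ.

```python
def extract_article_url(item):
    """Return the article URL from a news item, or None if any level is missing."""
    content = item.get('content') or {}
    canonical = content.get('canonicalUrl') or {}
    return canonical.get('url')

# Made-up items: one complete, one with a null link, one empty
sample_items = [
    {'content': {'id': 'abc', 'title': 'Example', 'canonicalUrl': {'url': 'https://example.com/a'}}},
    {'content': {'id': 'def', 'title': 'No link', 'canonicalUrl': None}},
    {},
]

urls = [extract_article_url(item) for item in sample_items]
print(urls)
```

Checking for `None` before downloading would separate "no link in the feed" from genuine download or parse failures.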

In [None]:
stock_articles = get_stock_news(live_news_ticker, num_live_articles)

### Classify Sentiment on Live News

In [None]:
for article in stock_articles:

    # Combine article title and body for maximum information
    title_body = article['title'] + "\n" + article['full_text']

    # Run Transformer model to classify news
    finbert_sent = id_to_label[finbert_model.predict(title_body)]

    # Run LLM model to classify news
    ollama_sent = id_to_label.get(ollama_model.predict(title_body), 'invalid response') # predict() returns None for an unparseable reply

    print(f"{finbert_model.name}: {finbert_sent}")
    print(f"{ollama_model.name}: {ollama_sent}")
    print(f"Title: {article['title']}")
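    body_preview = article['full_text'][:250].replace('\n', ' ') # Built outside the f-string: a backslash inside an f-string expression is a SyntaxError before Python 3.12
    print(f"Body: {body_preview}...")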
    print("-" * 50)
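
To make human evaluation easier, the per-article results could be written to a sheet with an empty column for the reviewer's verdict. The following is a minimal sketch using the standard `csv` module. `write_review_sheet`, the column names, and the example rows are all hypothetical.

```python
import csv
import io

def write_review_sheet(rows, file_obj):
    """Write one row per article, with an empty column for the human verdict."""
    fieldnames = ['title', 'finbert', 'llm', 'human_label']
    writer = csv.DictWriter(file_obj, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, 'human_label': ''})

# Made-up rows standing in for the per-article results printed above
example_rows = [
    {'title': 'Example Corp beats estimates', 'finbert': 'positive', 'llm': 'positive'},
    {'title': 'Example Corp faces lawsuit', 'finbert': 'negative', 'llm': 'neutral'},
]

buffer = io.StringIO()  # swap for open('review.csv', 'w', newline='') to write a file
write_review_sheet(example_rows, buffer)
print(buffer.getvalue())
```

Rows where the two models disagree are the natural place for a reviewer to start.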