## Build a Stock Sentiment Analysis Program Using FREE LLMs with LangChain and Pydantic (GitHub Repo: [Link](https://github.com/krittaprot/structured-output-tutorial))

## Table of Contents  
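1. [Introduction](#introduction)  
2. [Setup and Dependencies](#setup)  
3. [Defining the Data Models](#data-models)  
4. [Setting Up the Chat Model](#chat-model)  
5. [Creating the Prompt Template](#prompt-template)  
6. [Processing Chain](#processing-chain)  
7. [Example Analysis](#example-analysis)  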

## Introduction <a name="introduction"></a>
Sentiment analysis helps investors gauge how the market feels about specific companies. This notebook uses LangChain's structured output support, backed by Pydantic models, to analyze news articles and extract a sentiment label, a confidence score, and a justification for each company mentioned.

![Overview Diagram](supplementals/overview_diagram.png)

## Setup and Dependencies <a name="setup"></a>
First, let's ensure we have all necessary dependencies installed.

In [1]:
#Uncomment to install packages
# !pip install langchain-openai langchain pydantic

Now, let's import the required libraries.

In [2]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field, field_validator
from typing import List, Optional
from enum import Enum
from datetime import datetime
import time

## Defining the Data Models <a name="data-models"></a>  

We define Pydantic models to structure our sentiment analysis output.

In [3]:
class SentimentLabel(str, Enum):
    POSITIVE = "positive"
    MIXED = "mixed"
    NEGATIVE = "negative"

class StockSentiment(BaseModel):
    company_name: str = Field(
        ..., 
        description="The name of the company being analyzed, e.g., NVIDIA Corporation (NVDA)."
    )
    justification: str = Field(
        ..., 
        description="Detailed explanation with specific numbers from the article, supporting the sentiment classification."
    )
    sentiment: SentimentLabel = Field(
        ..., 
        description="Sentiment classification based on the content analysis: positive, mixed, or negative."
    )
    confidence: float = Field(
        ..., 
        description="Confidence level of the sentiment analysis, ranging from 0 to 1."
    )

    @field_validator("company_name")
    @classmethod
    def validate_company_name(cls, v):
        if not v.strip():
            raise ValueError("Company name cannot be empty")
        if len(v) > 100:
            raise ValueError("Company name must be ≤ 100 characters")
        return v

    @field_validator("confidence", mode="before")
    @classmethod
    def normalize_confidence(cls, v) -> float:
        # mode="before" receives the raw model output (possibly a string), so coerce first
        v = float(v)
        # Treat values above 1 as percentages (e.g., 85 -> 0.85)
        if v > 1:
            v /= 100
        return round(v, 2)

    @field_validator("justification")
    @classmethod
    def validate_justification(cls, v):
        if not v.strip():
            raise ValueError("Justification cannot be empty")
        if "±" in v or "≈" in v:  # Prevent approximations
            raise ValueError("Use exact numbers from article, not approximations")
        return v

class NewsSentiment(BaseModel):
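    stocks: List[StockSentiment] = Field(
        ...,
        description="One sentiment entry per publicly traded company mentioned in the article.",
        # Pydantic v2 deprecates arbitrary Field kwargs such as `example`; use json_schema_extra instead
        json_schema_extra={
            "example": [
                {
                    "company_name": "NVIDIA Corporation (NVDA)",
                    "sentiment": "positive",
                    "confidence": 0.95,
                    "justification": "Q4 revenue increased 15% to $22.1 billion driven by AI chip demand"
                },
                {
                    "company_name": "Tesla, Inc. (TSLA)",
                    "sentiment": "negative",
                    "confidence": 0.85,
                    "justification": "Vehicle deliveries dropped 8.5% to 435,000 units in Q3"
                }
            ]
        }
    )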
    timestamp: datetime = Field(
        default_factory=datetime.now,
        description="Timestamp of the analysis in ISO format"
    )

    @field_validator("stocks")
    @classmethod
    def validate_stocks(cls, v):
        if not v:
            raise ValueError("Stocks list cannot be empty")
        return v
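
To see how validators like these behave, here is a minimal, self-contained sketch. `Label` and `MiniStockSentiment` are hypothetical, trimmed-down stand-ins for the notebook's `SentimentLabel` and `StockSentiment`, and the payloads are made up for illustration. It assumes Pydantic v2.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError, field_validator


class Label(str, Enum):
    POSITIVE = "positive"
    MIXED = "mixed"
    NEGATIVE = "negative"


class MiniStockSentiment(BaseModel):
    # Hypothetical, trimmed-down stand-in for StockSentiment
    company_name: str
    sentiment: Label
    confidence: float

    @field_validator("confidence", mode="before")
    @classmethod
    def normalize_confidence(cls, v):
        # Runs before type coercion, so convert the raw value first
        v = float(v)
        if v > 1:  # treat values above 1 as percentages
            v /= 100
        return round(v, 2)


# A percentage-style confidence is normalized to the 0-1 range
ok = MiniStockSentiment.model_validate(
    {"company_name": "NVIDIA Corporation (NVDA)", "sentiment": "positive", "confidence": 95}
)
print(ok.sentiment.value, ok.confidence)

# A label outside the enum is rejected with a ValidationError
try:
    MiniStockSentiment.model_validate(
        {"company_name": "Tesla, Inc. (TSLA)", "sentiment": "neutral", "confidence": 0.5}
    )
except ValidationError as exc:
    error_fields = [err["loc"][0] for err in exc.errors()]
    print("rejected fields:", error_fields)
```

Because the enum has no `neutral` member, a model that answers "neutral" fails validation instead of slipping through unnoticed.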

## Setting Up the Chat Model <a name="chat-model"></a>

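We initialize a `ChatOpenAI` client and point it at an OpenAI-compatible endpoint. This lets the same code talk to Gemini, LM Studio, or Ollama by changing only the base URL, API key, and model name.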

![Models](supplementals/models.png)

- Gemini Reference: https://ai.google.dev/gemini-api/docs/openai  
- LM Studio Reference: https://lmstudio.ai/docs/api/endpoints/openai

Find more models at: https://lmstudio.ai/models or https://ollama.com/library

The current rate limits for Gemini 2.0 Flash are:
- 10 RPM (requests per minute)
- 4 million TPM (tokens per minute)
- 1,500 RPD (requests per day)

Ref: [Gemini 2.0 Flash Official API Doc](https://ai.google.dev/gemini-api/docs/models/gemini#gemini-2.0-flash)
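
When looping over many articles, the 10 RPM limit works out to at most one request every 6 seconds. One simple way to stay under it is to space calls out on the client side. The sketch below is an assumption-laden illustration, not part of the notebook: `MinIntervalThrottle` is a hypothetical helper, and it only handles the per-minute limit, not the token or daily limits.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive calls."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock    # injectable so the helper can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last_call = now
        return slept


# 10 requests per minute -> at least 6 seconds between calls
throttle = MinIntervalThrottle(requests_per_minute=10)
print(f"Minimum gap between requests: {throttle.min_interval:.1f}s")
```

In a batch loop, calling `throttle.wait()` right before each request would keep the call rate at or below the limit.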

In [4]:
from config import GEMINI_API_KEY

# Choose the backend for the chat model: 'gemini', 'lmstudio', or 'ollama'
mode = 'lmstudio'

if mode == 'lmstudio':
    openai_api_base = "http://localhost:1234/v1"
    openai_api_key = "lm-studio"
    model_name = "bartowski/deepseek-r1-distill-qwen-32b@iq2_s"
    # Other models to try:
    # model_name = "mistral-small-24b-instruct-2501"
    # model_name = "deepseek-r1-distill-llama-8b"
    # model_name = "bartowski/deepseek-r1-distill-qwen-14b"
    # model_name = "deepseek-r1-redistill-qwen-1.5b-v1.0"
    # model_name = "llama-3.1-tulu-3-8b"
    # model_name = "selene-1-mini-llama-3.1-8b"
    # model_name = "unsloth/phi-4"
elif mode == 'ollama':
    openai_api_base = "http://localhost:11434/v1"
    openai_api_key = "ollama"
    model_name = "deepseek-r1:14b"
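elif mode == 'gemini':
    openai_api_base = "https://generativelanguage.googleapis.com/v1beta/openai/"
    openai_api_key = GEMINI_API_KEY
    model_name = "gemini-2.0-flash-exp"
else:
    # Fail early instead of hitting a NameError when the model is created
    raise ValueError(f"Unknown mode: {mode!r}. Choose 'gemini', 'lmstudio', or 'ollama'.")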

model = ChatOpenAI(
    model_name=model_name,
    openai_api_base=openai_api_base,
    openai_api_key=openai_api_key,
    temperature=0 # Set temperature to 0 for deterministic output
)
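
The backend selection can also be written as a lookup table, which keeps each backend's settings in one place and rejects unknown modes explicitly. This is only a sketch of an alternative layout: `BACKENDS` and `resolve_backend` are hypothetical names, and the endpoint values are copied from the cell above.

```python
# Hypothetical lookup table; values mirror the if/elif branches above
BACKENDS = {
    "lmstudio": {
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio",
        "model": "bartowski/deepseek-r1-distill-qwen-32b@iq2_s",
    },
    "ollama": {
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",
        "model": "deepseek-r1:14b",
    },
    "gemini": {
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "api_key": None,  # filled in from the caller's key
        "model": "gemini-2.0-flash-exp",
    },
}


def resolve_backend(mode, gemini_api_key=None):
    """Return a copy of the settings for `mode`, or raise on bad input."""
    if mode not in BACKENDS:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(BACKENDS)}")
    settings = dict(BACKENDS[mode])
    if mode == "gemini":
        if not gemini_api_key:
            raise ValueError("The 'gemini' backend requires an API key")
        settings["api_key"] = gemini_api_key
    return settings


settings = resolve_backend("ollama")
print(settings["base_url"], settings["model"])
```

The returned values map onto the `openai_api_base`, `openai_api_key`, and `model_name` arguments used above.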

In [5]:
# Add structured output capability
structured_llm = model.with_structured_output(NewsSentiment)
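
Roughly speaking, structured output means the model is given a JSON schema derived from the Pydantic class, and its reply is parsed back into that class. The sketch below shows only the Pydantic side of that round trip, using hypothetical `MiniStock` and `MiniNews` stand-ins and a hand-written reply string. How LangChain actually transmits the schema depends on the backend and is not shown here.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class MiniStock(BaseModel):
    # Hypothetical stand-in for StockSentiment
    company_name: str = Field(..., description="Company name with ticker symbol")
    confidence: float = Field(..., description="Confidence from 0 to 1")


class MiniNews(BaseModel):
    # Hypothetical stand-in for NewsSentiment
    stocks: List[MiniStock]


# The field descriptions end up in the schema, so they act as instructions to the model
schema = MiniNews.model_json_schema()
print(json.dumps(schema, indent=2))

# A hand-written JSON reply, parsed and validated into typed objects
raw_reply = '{"stocks": [{"company_name": "Apple Inc. (AAPL)", "confidence": 0.85}]}'
parsed = MiniNews.model_validate_json(raw_reply)
print(parsed.stocks[0].company_name)
```

This is why the `description` strings on each field matter: they are part of what the model is shown.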

## Creating the Prompt Template <a name="prompt-template"></a>

We define a detailed system prompt and a user prompt template.

In [6]:
# Create prompt template with detailed system message
system_prompt = """You are a senior financial analyst with expertise in news sentiment analysis. 
When analyzing articles, follow these guidelines:

1. Identify all publicly traded companies mentioned in the text
2. For each company, determine market sentiment based on:
   - Explicit statements about financial performance (include exact figures/percentages)
   - Strategic developments (mergers, partnerships, innovations)
   - Regulatory/legal implications
   - Market reactions (stock movements, analyst ratings)

For each sentiment determination:
- Include SPECIFIC NUMERICAL DATA from the article when available (revenue figures, percentage changes, booking numbers)
- State QUANTIFIED IMPACTS ("9% revenue growth" not just "revenue growth")
- Mention EXACT TIME REFERENCES ("Q4 2023" not just "recently")
- Use PRECISE METRICS from the text ($27.35 billion, 6% stock increase)

Maintain strict requirements:
- Confidence scores must reflect article evidence strength
- Never invent information not explicitly stated
- Use exact company names with ticker symbols
- Prioritize recent information when multiple data points exist"""


user_prompt =   """
                    The current date is {current_date}.
                    Analyze the following article and provide sentiment analysis for each publicly traded company
                    mentioned in the text below:
                    {article}
                """

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", user_prompt)
])
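
To preview what the human message looks like once the placeholders are filled, the sketch below uses plain `str.format` and `textwrap.dedent` from the standard library. It is an approximation for inspection only and does not call LangChain; the article text and date are made-up sample values.

```python
import textwrap
from datetime import date

# Same wording as the user prompt, dedented so the model does not see leading spaces
user_prompt_preview = textwrap.dedent("""\
    The current date is {current_date}.
    Analyze the following article and provide sentiment analysis for each publicly traded company
    mentioned in the text below:
    {article}
""")

# Sample values, made up for illustration
rendered = user_prompt_preview.format(
    current_date=date(2025, 2, 1).isoformat(),
    article="Apple (AAPL) and Nvidia (NVDA) broke key levels.",
)
print(rendered)
```

Printing the rendered text is a quick way to catch a missing placeholder or stray indentation before spending a model call.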

## Processing Chain <a name="processing-chain"></a>

We create a processing chain that combines the prompt and the structured output model.

In [7]:
# Create processing chain
chain = prompt | structured_llm
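
The `|` operator composes the two steps so that the output of the prompt becomes the input of the model. The toy sketch below illustrates that idea with a hypothetical `Step` class built on Python's `__or__` hook. It is not LangChain's implementation, and the "model" here is a fake function.

```python
class Step:
    """Toy stand-in for a runnable (hypothetical, for illustration only)."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b -> a new step that runs a, then feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# First step fills a template; second step pretends to be a model
fill_prompt = Step(lambda inputs: f"Analyze: {inputs['article']}")
fake_model = Step(lambda prompt_text: {"prompt_chars": len(prompt_text)})

toy_chain = fill_prompt | fake_model
result = toy_chain.invoke({"article": "Nvidia dropped nearly 4%."})
print(result)
```

Calling `invoke` on the composed object runs both steps in order, which mirrors how `chain.invoke(...)` is used in the example analysis.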

## Example Analysis <a name="example-analysis"></a>

Let's analyze a sample news article.

In [8]:
import textwrap

# Example usage: load article content from file
with open('content.txt', 'r', encoding='utf-8') as file:
    article = file.read()

wrapped_article = textwrap.fill(article, width=100)  # Adjust width as needed

# Split into lines and keep only the first 12 for a short preview
lines = wrapped_article.split('\n')
limited_output = '\n'.join(lines[:12])  # Change 12 to however many lines you want

print(f"article:\n{limited_output} (see more in the full article)")

article:
www.investors.com /market-trend/stock-market-today/dow-jones-sp500-nasdaq-nvidia-nvda-stock-5/ Stock
Market Today: Dow Jones Dives On Trump Tariff Order; Apple, Nvidia Break Key Levels As MicroStrategy
Makes This Move (Live Coverage) Investor's Business Daily7-8 minutes 2/1/2025 Page 1 Major indexes
dropped to session lows at the close Friday as looming tariffs took a toll on stocks. Meanwhile,
Apple (AAPL) and Nvidia (NVDA) broke key levels, while MicroStrategy (MSTR) edged lower amid a new
offering on the stock market today.  Late Friday, the White House said President Donald Trump had
ordered the implementation of aggressive tariffs on key trading partners as early as Saturday. The
plan is to impose 25% tariffs on products from Mexico and Canada, and a 10% charge on goods from
China. According to reports, almost 30% of all imported goods in the U.S. come from its trading
neighbors.  The Dow Jones Industrial Average reeled under the news and dropped 337 points for a loss
of 

In [9]:
import textwrap
from loader import Loader  # Import the Loader class

# Format today's date for the {current_date} placeholder in the prompt
current_date = datetime.now().strftime("%Y-%m-%d")

# Start the timer
start_time = time.time()

with Loader("Processing article..."):
    # Run the chain; the result is a validated NewsSentiment object
    result = chain.invoke({"article": article, "current_date": current_date})
print()

# Calculate the elapsed time
elapsed_time = time.time() - start_time

print(f"Model Used: {model_name}")
print(f"Analysis timestamp: {result.timestamp}")
print(f"Time taken to process the article: {elapsed_time:.2f} seconds")


for stock in result.stocks:
    print("**************************************************")
    print(f"Company: {stock.company_name}")
    print(f"Sentiment: {stock.sentiment.value}")
    print(f"Confidence: {stock.confidence:.0%}")
    wrapped_justification = textwrap.fill(stock.justification, width=100)  # Adjust width as needed
    print(f"Justification: {wrapped_justification}")
    print()

Done!                                                                           

Model Used: bartowski/deepseek-r1-distill-qwen-32b@iq2_s
Analysis timestamp: 2025-02-01 00:00:00+00:00
Time taken to process the article: 182.87 seconds
**************************************************
Company: Apple Inc.
Sentiment: positive
Confidence: 85%
Justification: Apple reported Q1 sales of $124.3 billion, meeting views while EPS of $2.40 beat estimates by 5
cents. Services revenue rose 13.9%, but iPhone sales declined 1%. Apple stock hit an all-time high
in December but pulled back in January.

**************************************************
Company: Nvidia Corporation
Sentiment: negative
Confidence: 85%
Justification: Nvidia dropped nearly 4% as CEO Jensen Huang met President Trump to discuss export curbs and risks
from DeepSeek. The stock fell below its 200-day moving average after Monday's news of cheaper AI
models.

**************************************************
Company: MicroStrateg