# Financial AI Fine-tuning with LLaMA 3.2

## Project Ideas & Implementation Guide

This notebook demonstrates how to fine-tune LLaMA 3.2 for financial applications using QLoRA. It surveys several project ideas and then walks through an end-to-end financial sentiment analysis example.

### Top Financial AI Project Ideas

1. **Financial Sentiment Analyzer** - Predict market sentiment from news
2. **Personal Finance Assistant** - Answer investment and budgeting questions  
3. **Stock Analysis Chatbot** - Explain financial reports and metrics
4. **Trading Strategy Explainer** - Teach trading concepts in simple terms
5. **Economic Research Assistant** - Summarize complex economic papers

Let's start with **Financial Sentiment Analysis** - perfect for beginners!

In [None]:
# Install required packages
! pip install -q yfinance pandas numpy matplotlib seaborn scikit-learn plotly
! pip install -q transformers datasets accelerate peft bitsandbytes
! pip install -q huggingface_hub trl

In [None]:
# Import Financial Data Libraries
import pandas as pd
import numpy as np
import yfinance as yf
import requests
import json
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns

# NLP and ML Libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from datasets import Dataset
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer

# Import our config
from config import HUGGINGFACE_TOKEN
from huggingface_hub import login

print("Ready for QLoRA Fine-tuning!")

## Load and Explore Financial Dataset

We'll use multiple data sources:
- **Yahoo Finance** for stock prices
- **Financial news APIs** for sentiment data
- **Sample financial data** for training

In [None]:
# Authentication
print("Logging into HuggingFace...")
login(token=HUGGINGFACE_TOKEN)

# Load sample financial data
print("Loading Stock Data...")

# Get stock data for major companies
tickers = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA']
stock_data = {}

for ticker in tickers:
    try:
        stock = yf.Ticker(ticker)
        hist = stock.history(period="1mo")
        stock_data[ticker] = hist
        print(f"Loaded {ticker}: {len(hist)} days of data")
    except Exception as e:
        print(f"Error loading {ticker}: {e}")

print(f"Successfully loaded data for {len(stock_data)} stocks")

In [None]:
# Create synthetic financial sentiment dataset
print("Creating Financial Sentiment Dataset...")

# Sample financial news with sentiment labels
financial_data = [
    {
        "text": "Apple reports record quarterly revenue of $123 billion, beating analyst expectations",
        "sentiment": "BULLISH",
        "explanation": "Strong earnings beat indicates positive company performance"
    },
    {
        "text": "Tesla faces production challenges as supply chain issues continue to impact deliveries", 
        "sentiment": "BEARISH",
        "explanation": "Production problems suggest potential revenue decline"
    },
    {
        "text": "Microsoft announces new AI partnership, stock price jumps 5% in after-hours trading",
        "sentiment": "BULLISH", 
        "explanation": "AI partnerships indicate future growth potential"
    },
    {
        "text": "Amazon warehouse workers vote to unionize, raising concerns about operational costs",
        "sentiment": "BEARISH",
        "explanation": "Unionization could increase labor costs and reduce profitability"
    },
    {
        "text": "Federal Reserve hints at potential interest rate cuts, market rallies broadly",
        "sentiment": "BULLISH",
        "explanation": "Lower interest rates typically boost stock valuations"
    },
    {
        "text": "Inflation data comes in higher than expected, sparking concerns about economic slowdown",
        "sentiment": "BEARISH", 
        "explanation": "High inflation may lead to tighter monetary policy"
    }
]

# Convert to instruction format for fine-tuning
training_data = []

for item in financial_data:
    instruction = f"Analyze the following financial news and provide sentiment (BULLISH/BEARISH/MIXED) with explanation:\n\n{item['text']}"
    response = f"Sentiment: {item['sentiment']}\nExplanation: {item['explanation']}"
    
    training_data.append({
        "instruction": instruction,
        "response": response,
        "text": f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{response}<|eot_id|>"
    })

print(f"Created {len(training_data)} training examples")
print("\nSample Training Example:")
print("Instruction:", training_data[0]["instruction"][:100] + "...")
print("Response:", training_data[0]["response"][:100] + "...")

## Setup Model for Fine-tuning

Load LLaMA 3.2 with 4-bit quantization and prepare for QLoRA training.
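
To see why 4-bit loading matters, the arithmetic below gives a rough estimate of weight memory at different precisions. It is a minimal sketch: the parameter count is an assumed round figure for a 3B-class model, and the estimate ignores quantization constants, activations, optimizer state, and any layers kept in higher precision. The helper name `estimate_weight_memory_gb` is hypothetical.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3

# Assumption: roughly 3.2 billion parameters for a 3B-class model.
NUM_PARAMS = 3.2e9

fp16_gb = estimate_weight_memory_gb(NUM_PARAMS, 16)  # half precision
nf4_gb = estimate_weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantized

print(f"FP16 weights:  ~{fp16_gb:.1f} GiB")
print(f"4-bit weights: ~{nf4_gb:.1f} GiB")
```

The actual figure reported by `torch.cuda.memory_allocated()` will usually be somewhat higher than the 4-bit estimate, for the reasons listed above.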

In [None]:
# Model configuration
model_name = "meta-llama/Llama-3.2-3B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Loading {model_name}...")
print(f"Device: {device}")

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
) if torch.cuda.is_available() else None

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load model
if torch.cuda.is_available() and bnb_config:
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        quantization_config=bnb_config,
        torch_dtype=torch.float16,
        trust_remote_code=True
    )
else:
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        trust_remote_code=True
    )
    if torch.cuda.is_available():
        model = model.to("cuda")

print("Model loaded successfully!")
if torch.cuda.is_available():
    print(f"GPU Memory: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")

In [None]:
# Configure QLoRA
print("Setting up QLoRA configuration...")

lora_config = LoraConfig(
    r=16,                               # Rank of the low-rank matrices
    lora_alpha=32,                      # Scaling parameter
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention layers
        "gate_proj", "up_proj", "down_proj"       # MLP layers
    ],
    lora_dropout=0.1,                   # Dropout for LoRA layers
    bias="none",                        # No bias terms
    task_type=TaskType.CAUSAL_LM        # Causal language modeling
)

print("LoRA Configuration:")
print(f"  - Rank (r): {lora_config.r}")
print(f"  - Alpha: {lora_config.lora_alpha}")
print(f"  - Target modules: {lora_config.target_modules}")
print(f"  - Dropout: {lora_config.lora_dropout}")

# Apply LoRA to the model
print("\nApplying LoRA adapters...")
model.gradient_checkpointing_enable()
model = get_peft_model(model, lora_config)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print(f"Trainable parameters: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
print(f"Total parameters: {total_params:,}")

if torch.cuda.is_available():
    print(f"GPU Memory after LoRA: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")

In [None]:
# Prepare dataset for training
print("Preparing training dataset...")

# Convert to HuggingFace dataset
dataset = Dataset.from_list(training_data)

# Configure training arguments
training_args = TrainingArguments(
    output_dir="./financial-llama-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    logging_steps=1,
    save_strategy="epoch",
    evaluation_strategy="no",
    warmup_steps=10,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    fp16=torch.cuda.is_available(),
    remove_unused_columns=False,
    report_to="none"  # Disable wandb/tensorboard (the string "none", not None)
)

print("Training Configuration:")
print(f"  - Output directory: {training_args.output_dir}")
print(f"  - Batch size: {training_args.per_device_train_batch_size}")
print(f"  - Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"  - Epochs: {training_args.num_train_epochs}")
print(f"  - Learning rate: {training_args.learning_rate}")
print(f"  - FP16: {training_args.fp16}")

# Initialize trainer
print("\nInitializing SFT Trainer...")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
    packing=False
)

print("Starting fine-tuning...")
print("This will take several minutes depending on your GPU...")

# Start training
trainer.train()

print("\nTraining completed!")
print("Model saved to: ./financial-llama-lora")

In [None]:
# Test the fine-tuned model
print("Testing Fine-tuned Model...")

def test_financial_sentiment(text):
    instruction = f"Analyze the following financial news and provide sentiment (BULLISH/BEARISH/MIXED) with explanation:\n\n{text}"
    formatted_prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    
    inputs = tokenizer.encode(formatted_prompt, return_tensors="pt")
    if torch.cuda.is_available():
        inputs = inputs.to("cuda")
    
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=inputs.shape[1] + 150,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
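    # Decode only the newly generated tokens. skip_special_tokens strips the
    # header markers, so splitting the full decoded text on them would not work.
    generated_tokens = outputs[0][inputs.shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()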

# Test cases
test_cases = [
    "Apple announces record iPhone sales for Q4 2024",
    "Major bank reports significant losses due to loan defaults", 
    "New cryptocurrency regulations create market uncertainty",
    "Tech sector shows strong growth in AI investments"
]

print(f"Running {len(test_cases)} test cases...\n")

for i, test_case in enumerate(test_cases):
    print(f"Test Case {i+1}:")
    print(f"News: {test_case}")
    result = test_financial_sentiment(test_case)
    print(f"Analysis: {result}\n")
    print("-" * 80)

## Next Steps & Applications

### Deployment Options
1. **Local Inference** - Use the fine-tuned model locally for analysis
2. **API Service** - Deploy as a REST API using FastAPI or Flask
3. **Streamlit App** - Create an interactive web interface
4. **Integration** - Embed in trading platforms or financial apps

### Expanding the Project
1. **More Data** - Add real financial news datasets
2. **Multi-class Sentiment** - Include confidence scores and sector-specific analysis
3. **Real-time Processing** - Connect to live news feeds
4. **Backtesting** - Test trading strategies based on sentiment signals

### Other Financial AI Projects
1. **Portfolio Advisor** - Personalized investment recommendations
2. **Risk Assessment** - Analyze investment risk profiles
3. **Market Prediction** - Price movement forecasting
4. **Earnings Call Analysis** - Summarize quarterly earnings calls
5. **Regulatory Compliance** - Analyze regulatory filings and impacts

The financial AI space offers endless possibilities for innovation!

In [None]:
# Load Financial Data from Multiple Sources

# 1. Stock Price Data
def get_stock_data(symbol, period="1y"):
    """Download stock data from Yahoo Finance"""
    stock = yf.Ticker(symbol)
    data = stock.history(period=period)
    return data

# 2. Sample Financial News Data (we'll create synthetic data for demo)
def create_sample_financial_data():
    """Create sample financial news with sentiment labels"""
    sample_data = [
        {
            "news": "Apple reports record Q4 earnings, beating expectations by 15%",
            "sentiment": "BULLISH",
            "confidence": "HIGH",
            "market_impact": "Stock likely to rise 3-5% in next session"
        },
        {
            "news": "Tesla faces production delays due to supply chain issues",
            "sentiment": "BEARISH",
            "confidence": "MEDIUM",
            "market_impact": "Stock may decline 2-3% short term"
        },
        {
            "news": "Federal Reserve maintains interest rates, signals stable policy",
            "sentiment": "NEUTRAL",
            "confidence": "HIGH",
            "market_impact": "Market likely to remain stable"
        },
        {
            "news": "Microsoft announces massive AI investment, partners with OpenAI",
            "sentiment": "BULLISH",
            "confidence": "HIGH",
            "market_impact": "Stock expected to gain 4-6% on AI optimism"
        },
        {
            "news": "Banking sector faces regulatory pressure over lending practices",
            "sentiment": "BEARISH",
            "confidence": "MEDIUM",
            "market_impact": "Bank stocks may underperform by 1-2%"
        }
    ]
    return pd.DataFrame(sample_data)

# Load sample data
print("Loading Stock Data...")
stocks = ['AAPL', 'TSLA', 'MSFT', 'GOOGL']
stock_data = {}

for symbol in stocks:
    try:
        stock_data[symbol] = get_stock_data(symbol, "3mo")
        print(f"{symbol}: {len(stock_data[symbol])} days of data")
    except Exception as e:
        print(f"Error loading {symbol}: {e}")

# Load financial news data
print("\nLoading Financial News Data...")
news_df = create_sample_financial_data()
print(f"Loaded {len(news_df)} news samples")
print("\nSample news data:")
print(news_df.head())

## Financial Data Preprocessing

Prepare data for LLaMA fine-tuning by converting it to an instruction-following format.
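
As an illustration of the target format, the sketch below wraps one example in the LLaMA 3 chat markers used elsewhere in this notebook. The function name `build_prompt` and its arguments are hypothetical; the same function produces the training text (with the answer) and the inference prompt (without it), which helps keep the two consistent.

```python
from typing import Optional

def build_prompt(instruction: str, news: str, output: Optional[str] = None) -> str:
    """Wrap an example in LLaMA 3 style chat markers.

    With `output` set, returns a full training example; without it,
    returns a prompt that ends where the assistant should start writing.
    """
    prompt = (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{instruction}\n\n{news}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if output is not None:
        prompt += f"{output}<|eot_id|>"
    return prompt

instruction = "Analyze the sentiment of this financial news and predict market impact:"
news = "Tesla faces production delays due to supply chain issues"

train_text = build_prompt(instruction, news, "SENTIMENT: BEARISH")
infer_text = build_prompt(instruction, news)
print(train_text)
```

Because the inference prompt is a strict prefix of the training text, the model sees the same context at generation time that it saw during fine-tuning.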

In [None]:
# Financial Data Preprocessing for LLaMA Fine-tuning

def create_instruction_format(row):
    """Convert financial news to instruction-following format"""
    instruction = "Analyze the sentiment of this financial news and predict market impact:"
    input_text = row['news']
    
    # Create detailed output
    output = f"""SENTIMENT: {row['sentiment']}
CONFIDENCE: {row['confidence']}
ANALYSIS: {row['market_impact']}

REASONING: """
    
    if row['sentiment'] == 'BULLISH':
        output += "This news contains positive indicators that typically drive investor confidence and buying activity."
    elif row['sentiment'] == 'BEARISH':
        output += "This news contains negative factors that may cause investor concern and selling pressure."
    else:
        output += "This news presents neutral information with limited immediate market impact."
    
    return {
        'instruction': instruction,
        'input': input_text,
        'output': output
    }

# Convert to instruction format
print("Converting to Instruction Format...")
instruction_data = []

for _, row in news_df.iterrows():
    formatted = create_instruction_format(row)
    instruction_data.append(formatted)

# Create training dataset
train_df = pd.DataFrame(instruction_data)
print(f"Created {len(train_df)} training examples")

# Show example
print("\nExample Training Format:")
print("INSTRUCTION:", train_df.iloc[0]['instruction'])
print("INPUT:", train_df.iloc[0]['input'])
print("OUTPUT:", train_df.iloc[0]['output'])

## Feature Engineering for Financial Data

Create additional financial features and expand our training dataset.
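
When expanding a small dataset, it can help to check how the labels are balanced before training. The sketch below is one way to do that with the standard library; the helper name `label_distribution` and the sample records are illustrative assumptions, not part of the notebook's data.

```python
from collections import Counter

def label_distribution(examples):
    """Return the share of each sentiment label in a list of example dicts."""
    counts = Counter(example["sentiment"] for example in examples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Illustrative records only; real examples would come from the training set.
sample_examples = [
    {"news": "Dividend increase announced", "sentiment": "BULLISH"},
    {"news": "GDP growth accelerates", "sentiment": "BULLISH"},
    {"news": "Inflation higher than expected", "sentiment": "BEARISH"},
    {"news": "Earnings beat but revenue misses", "sentiment": "NEUTRAL"},
]

distribution = label_distribution(sample_examples)
for label, share in sorted(distribution.items()):
    print(f"{label}: {share:.0%}")
```

A heavily skewed distribution is a hint to add more examples of the under-represented label.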

In [None]:
# Feature Engineering - Create More Financial Training Examples

def generate_financial_examples():
    """Generate more diverse financial training examples"""
    
    # Financial scenarios with different complexities
    scenarios = [
        # Earnings scenarios
        {
            "news": "Company beats earnings by 20% but revenues miss expectations",
            "sentiment": "NEUTRAL",
            "reasoning": "Mixed signals - strong profitability but concerning revenue growth"
        },
        {
            "news": "Dividend increase announced alongside share buyback program", 
            "sentiment": "BULLISH",
            "reasoning": "Strong cash position and commitment to shareholder returns"
        },
        # Market scenarios  
        {
            "news": "Inflation data comes in higher than expected at 6.2%",
            "sentiment": "BEARISH", 
            "reasoning": "High inflation may lead to more aggressive Fed policy"
        },
        {
            "news": "GDP growth accelerates to 3.5% in latest quarter",
            "sentiment": "BULLISH",
            "reasoning": "Strong economic growth supports corporate earnings"
        },
        # Sector-specific scenarios
        {
            "news": "New breakthrough in quantum computing announced by tech giant",
            "sentiment": "BULLISH",
            "reasoning": "Technological advancement creates competitive advantage"
        },
        {
            "news": "Energy sector faces headwinds from renewable energy transition",
            "sentiment": "BEARISH",
            "reasoning": "Long-term structural challenges for traditional energy"
        }
    ]
    
    expanded_data = []
    for scenario in scenarios:
        formatted = {
            'instruction': "Analyze the sentiment of this financial news and predict market impact:",
            'input': scenario['news'],
            'output': f"SENTIMENT: {scenario['sentiment']}\nREASONING: {scenario['reasoning']}"
        }
        expanded_data.append(formatted)
    
    return expanded_data

# Add more training examples
print("Generating Additional Training Examples...")
additional_examples = generate_financial_examples()
expanded_df = pd.concat([train_df, pd.DataFrame(additional_examples)], ignore_index=True)

print(f"Expanded dataset to {len(expanded_df)} examples")
print(f"Sentiment distribution:")
print(expanded_df['output'].str.contains('BULLISH').sum(), "Bullish examples")
print(expanded_df['output'].str.contains('BEARISH').sum(), "Bearish examples") 
print(expanded_df['output'].str.contains('NEUTRAL').sum(), "Neutral examples")

## Model Selection and Architecture

Now let's set up our LLaMA 3.2 model with QLoRA for financial fine-tuning.

In [None]:
# Model Selection - Load LLaMA 3.2 with QLoRA Setup
from transformers import BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType

# Login to HuggingFace
print("Logging into HuggingFace...")
login(token=HUGGINGFACE_TOKEN)

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# Load model and tokenizer
print("Loading LLaMA 3.2 Model...")
model_id = "meta-llama/Llama-3.2-3B-Instruct"

try:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=bnb_config,
        torch_dtype=torch.float16,
        trust_remote_code=True
    )
    
    print("Model loaded successfully!")
    print(f"GPU Memory: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    
except Exception as e:
    print(f"Error loading model: {e}")
    print("Make sure you have access to LLaMA 3.2 model")

## Fine-tuning Hyperparameters

Configure QLoRA parameters specifically optimized for financial text analysis.
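
To make the rank setting concrete, the sketch below counts how many trainable parameters LoRA adds. For a linear layer with `d_in` inputs and `d_out` outputs, LoRA trains two small matrices of shapes `(r, d_in)` and `(d_out, r)`, so it adds `r * (d_in + d_out)` parameters, and the update is scaled by `lora_alpha / r`. The layer shapes and layer count here are assumptions for a 3B-class LLaMA model; check them against `model.config` before relying on the totals.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Assumed shapes for a 3B-class LLaMA model (verify against model.config).
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 3072, 1024, 8192, 28

# (d_in, d_out) for each targeted projection in one transformer block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

RANK, ALPHA = 16, 32

per_layer = sum(lora_param_count(d_in, d_out, RANK) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"LoRA parameters per block: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Update scaling (alpha / r): {scaling}")
```

Doubling `r` doubles the adapter size, so the count printed by the notebook after `get_peft_model` should move in step with this estimate.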

In [None]:
# QLoRA Configuration for Financial Fine-tuning

# LoRA configuration optimized for financial text
lora_config = LoraConfig(
    r=16,                               # Rank - balance between efficiency and performance
    lora_alpha=32,                      # Scaling parameter
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention layers
        "gate_proj", "up_proj", "down_proj"       # MLP layers  
    ],
    lora_dropout=0.1,                   # Prevent overfitting
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

print("LoRA Parameters:")
print(f"  - Rank (r): {lora_config.r}")
print(f"  - Alpha: {lora_config.lora_alpha}")
print(f"  - Dropout: {lora_config.lora_dropout}")
print(f"  - Target modules: {len(lora_config.target_modules)} layers")

# Apply LoRA to model
try:
    model.gradient_checkpointing_enable()
    model = get_peft_model(model, lora_config)
    
    # Calculate trainable parameters
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    
    print("\nQLoRA Applied Successfully!")
    print(f"Trainable parameters: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
    print(f"Total parameters: {total_params:,}")
    print(f"GPU Memory after LoRA: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    
except Exception as e:
    print(f"Error applying LoRA: {e}")

## Model Training and Validation

Set up the training pipeline with proper data formatting and training arguments.
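
One formatting detail worth getting right is how padding is treated in the labels. Hugging Face causal language models ignore label positions set to `-100` when computing the loss, so padded positions are usually masked that way. The sketch below shows the idea on plain Python lists; the function name `mask_padding` and the token IDs are made up for illustration. It uses the attention mask rather than the pad token ID, because this notebook reuses the EOS token as the pad token and a real end-of-sequence token should still be learned.

```python
IGNORE_INDEX = -100  # label value skipped by the loss in Hugging Face causal LMs

def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if mask == 1 else IGNORE_INDEX
        for token, mask in zip(input_ids, attention_mask)
    ]

# Made-up token IDs: 9 stands in for EOS, which also serves as the pad token.
input_ids = [101, 42, 57, 9, 9, 9]
attention_mask = [1, 1, 1, 1, 0, 0]  # the first 9 is a real EOS, the rest are padding

labels = mask_padding(input_ids, attention_mask)
print(labels)
```

Without this masking, a batch with long padding runs spends part of its loss on predicting pad tokens.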

In [None]:
# Prepare Training Data and Setup Training Pipeline

def format_instruction(example):
    """Format examples for instruction tuning"""
    prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{example['instruction']}

{example['input']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{example['output']}<|eot_id|>"""
    return prompt

# Convert DataFrame to Hugging Face Dataset
print("Preparing Training Dataset...")

# Format all examples
formatted_examples = []
for _, row in expanded_df.iterrows():
    formatted_text = format_instruction(row)
    formatted_examples.append({"text": formatted_text})

# Create Dataset
train_dataset = Dataset.from_list(formatted_examples)

# Tokenize dataset
def tokenize_function(examples):
    tokens = tokenizer(examples["text"], truncation=True, padding=True, max_length=512)
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

print("Tokenizing Dataset...")
tokenized_dataset = train_dataset.map(tokenize_function, batched=True)

print(f"Dataset prepared: {len(tokenized_dataset)} examples")
print(f"Max sequence length: 512 tokens")

# Show tokenized example
print("\nSample Tokenized Input (first 100 chars):")
sample_text = train_dataset[0]["text"]
print(sample_text[:100] + "...")

## Financial Performance Metrics

Define evaluation metrics specific to financial AI applications.
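
Scoring generated text requires pulling a label out of free-form output first. The sketch below is one possible approach, assuming the model follows the `SENTIMENT: <LABEL>` layout used in the training examples: it prefers an explicit `SENTIMENT:` field and otherwise falls back to whichever label appears first in the text. The names `extract_sentiment` and `accuracy` are hypothetical helpers.

```python
import re

LABELS = ("BULLISH", "BEARISH", "NEUTRAL")
SENTIMENT_FIELD = re.compile(r"SENTIMENT:\s*(BULLISH|BEARISH|NEUTRAL)", re.IGNORECASE)

def extract_sentiment(text, default="NEUTRAL"):
    """Return the sentiment label found in generated text."""
    match = SENTIMENT_FIELD.search(text)
    if match:
        return match.group(1).upper()
    # Fallback: the label mentioned earliest in the text wins.
    upper = text.upper()
    positions = {label: upper.find(label) for label in LABELS if label in upper}
    if positions:
        return min(positions, key=positions.get)
    return default

def accuracy(predicted, expected):
    """Fraction of positions where predicted and expected labels agree."""
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

outputs = [
    "SENTIMENT: Bearish\nREASONING: Layoffs signal weakness, not a bullish setup.",
    "This looks bullish, though bearish risks remain.",
    "No clear signal in this report.",
]
predicted = [extract_sentiment(text) for text in outputs]
print(predicted)
print(accuracy(predicted, ["BEARISH", "BULLISH", "BEARISH"]))
```

Checking the labelled field first avoids misreading an explanation that mentions the opposite label in passing.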

In [None]:
# Financial Performance Evaluation Functions

def evaluate_financial_predictions(predictions, actual_sentiments):
    """Evaluate financial sentiment predictions"""
    
    # Extract sentiments from predictions
    pred_sentiments = []
    for pred in predictions:
        if "BULLISH" in pred.upper():
            pred_sentiments.append("BULLISH")
        elif "BEARISH" in pred.upper():
            pred_sentiments.append("BEARISH") 
        else:
            pred_sentiments.append("NEUTRAL")
    
    # Calculate metrics
    accuracy = accuracy_score(actual_sentiments, pred_sentiments)
    report = classification_report(actual_sentiments, pred_sentiments)
    
    return {
        "accuracy": accuracy,
        "classification_report": report,
        "predictions": pred_sentiments
    }

def test_financial_model(model, tokenizer, test_cases):
    """Test model on financial scenarios"""
    
    print("Testing Financial AI Model...")
    results = []
    
    for i, case in enumerate(test_cases):
        # Format prompt
        prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Analyze the sentiment of this financial news and predict market impact:

{case['news']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
        
        # Generate prediction
        inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
        
        with torch.no_grad():
            outputs = model.generate(
                inputs,
                max_length=inputs.shape[1] + 150,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id
            )
        
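        # Decode only the newly generated tokens. The prompt contains special
        # tokens that skip_special_tokens removes, so slicing the decoded text
        # by len(prompt) would cut at the wrong offset.
        generated_tokens = outputs[0][inputs.shape[1]:]
        generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()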
        
        results.append({
            "news": case['news'],
            "expected": case.get('expected_sentiment', 'Unknown'),
            "prediction": generated_text,
            "case_number": i + 1
        })
        
        print(f"\nTest Case {i+1}:")
        print(f"News: {case['news']}")
        print(f"Prediction: {generated_text[:100]}...")
    
    return results

# Create test cases
test_cases = [
    {
        "news": "Amazon reports 25% growth in cloud services revenue",
        "expected_sentiment": "BULLISH"
    },
    {
        "news": "Major tech company announces 10,000 layoffs amid economic uncertainty", 
        "expected_sentiment": "BEARISH"
    },
    {
        "news": "Federal Reserve pauses rate hikes, maintains current policy",
        "expected_sentiment": "NEUTRAL"
    }
]

print("Financial evaluation framework ready!")
print(f"{len(test_cases)} test cases prepared")

## Backtesting Strategy

Implement a framework to test how our AI predictions would perform in real trading scenarios.
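
Before reading the class, it may help to trace the cash flow of a single round trip by hand. The sketch below is a simplified illustration with made-up prices: it ignores fees, slippage, and partial fills, and the function name `round_trip` is hypothetical.

```python
def round_trip(capital, allocation, entry_price, exit_price):
    """Buy with a fraction of capital, later sell everything, and report the result."""
    position_size = capital * allocation      # cash committed to the trade
    shares = position_size / entry_price
    cash_after_buy = capital - position_size  # buying reduces available cash
    proceeds = shares * exit_price
    profit_loss = proceeds - position_size
    final_capital = cash_after_buy + proceeds
    return {
        "shares": shares,
        "cash_after_buy": cash_after_buy,
        "profit_loss": profit_loss,
        "final_capital": final_capital,
        "return_pct": (final_capital - capital) / capital * 100,
    }

# Made-up prices: allocate 10% of $10,000 at $150, then sell at $165.
result = round_trip(capital=10_000, allocation=0.10, entry_price=150.0, exit_price=165.0)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

The key bookkeeping rule is that a buy subtracts the position size from cash and a sell adds the proceeds back, so total equity only changes by the profit or loss.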

In [None]:
# Financial Backtesting Framework

class SimpleFinancialBacktester:
    def __init__(self, initial_capital=10000):
        self.initial_capital = initial_capital
        self.capital = initial_capital
        self.positions = {}
        self.trade_history = []
        
    def execute_trade(self, symbol, sentiment, confidence, current_price):
        """Execute trade based on AI sentiment prediction"""
        
        # Trading logic based on sentiment
        if sentiment == "BULLISH" and confidence == "HIGH":
            # Buy signal - allocate 10% of capital
            position_size = self.capital * 0.1
            shares = position_size / current_price
            
            self.positions[symbol] = {
                "shares": shares,
                "entry_price": current_price,
                "entry_date": datetime.now(),
                "sentiment": sentiment
            }
            
            self.capital -= position_size
            self.trade_history.append({
                "action": "BUY",
                "symbol": symbol,
                "shares": shares,
                "price": current_price,
                "sentiment": sentiment,
                "capital_remaining": self.capital
            })
            
        elif sentiment == "BEARISH" and symbol in self.positions:
            # Sell signal - close position
            position = self.positions[symbol]
            proceeds = position["shares"] * current_price
            
            profit_loss = proceeds - (position["shares"] * position["entry_price"])
            self.capital += proceeds
            
            self.trade_history.append({
                "action": "SELL", 
                "symbol": symbol,
                "shares": position["shares"],
                "price": current_price,
                "profit_loss": profit_loss,
                "capital_remaining": self.capital
            })
            
            del self.positions[symbol]
    
    def calculate_performance(self):
        """Calculate portfolio performance metrics"""
        
        # Calculate total return
        current_value = self.capital
        for symbol, position in self.positions.items():
            # For demo, assume current price = entry price (no real price data)
            current_value += position["shares"] * position["entry_price"]

        total_return = (current_value - self.initial_capital) / self.initial_capital

        # Calculate win rate
        profitable_trades = [t for t in self.trade_history if t.get("profit_loss", 0) > 0]
        total_trades = len([t for t in self.trade_history if "profit_loss" in t])
        win_rate = len(profitable_trades) / total_trades if total_trades > 0 else 0
        
        return {
            "total_return": total_return * 100,  # Convert to percentage
            "final_capital": current_value,
            "win_rate": win_rate * 100,
            "total_trades": total_trades,
            "profitable_trades": len(profitable_trades)
        }

# Demo backtesting
print("Financial Backtesting Demo")
backtester = SimpleFinancialBacktester(initial_capital=10000)

# Simulate some trades based on our AI predictions
demo_predictions = [
    {"symbol": "AAPL", "sentiment": "BULLISH", "confidence": "HIGH", "price": 150.0},
    {"symbol": "TSLA", "sentiment": "BEARISH", "confidence": "MEDIUM", "price": 200.0}, 
    {"symbol": "MSFT", "sentiment": "BULLISH", "confidence": "HIGH", "price": 300.0},
]

print("\nExecuting AI-Based Trades...")
for pred in demo_predictions:
    backtester.execute_trade(pred["symbol"], pred["sentiment"], pred["confidence"], pred["price"])

# Show performance
performance = backtester.calculate_performance()
print(f"\nBacktesting Results:")
print(f"Final Capital: ${performance['final_capital']:,.2f}")
print(f"Total Return: {performance['total_return']:.2f}%")
print(f"Win Rate: {performance['win_rate']:.1f}%")
print(f"Total Trades: {performance['total_trades']}")

print("\nBacktesting framework ready for real testing!")

## Risk Analysis and Visualization

Create comprehensive visualizations for portfolio performance and risk analysis.
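
Two of the risk figures used in this section can be computed with the standard library alone. The sketch below is a minimal illustration on made-up daily returns: maximum drawdown is taken as the worst peak-to-trough decline of the compounded equity curve, and volatility is annualized by assuming 252 trading days per year. The function names are hypothetical.

```python
import math
import statistics

def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve (0 or negative)."""
    wealth = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst

def annualized_volatility(returns, periods_per_year=252):
    """Population standard deviation of returns, scaled to a yearly figure."""
    return statistics.pstdev(returns) * math.sqrt(periods_per_year)

# Made-up daily returns for illustration.
daily_returns = [0.10, -0.20, 0.05]

drawdown = max_drawdown(daily_returns)
volatility = annualized_volatility(daily_returns)
print(f"Max drawdown: {drawdown:.2%}")
print(f"Annualized volatility: {volatility:.2%}")
```

Tracking the running peak is what separates drawdown from a simple minimum of cumulative returns: a portfolio that rises first and then falls has a drawdown measured from its high, not from its starting value.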

In [None]:
# Risk Analysis and Visualization

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

def create_financial_dashboard(stock_data, predictions_data):
    """Create comprehensive financial AI dashboard"""
    
    # Create subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Stock Price Trends', 'Sentiment Distribution', 
                       'Prediction Confidence', 'Portfolio Performance'),
        specs=[[{"secondary_y": True}, {"type": "pie"}],
               [{"type": "bar"}, {"secondary_y": True}]]
    )
    
    # 1. Stock Price Trends
    colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
    for i, (symbol, data) in enumerate(stock_data.items()):
        fig.add_trace(
            go.Scatter(
                x=data.index, 
                y=data['Close'],
                name=f'{symbol} Price',
                line=dict(color=colors[i % len(colors)])
            ),
            row=1, col=1
        )
    
    # 2. Sentiment Distribution
    sentiment_counts = {
        'Bullish': 3,
        'Bearish': 2, 
        'Neutral': 1
    }
    
    fig.add_trace(
        go.Pie(
            labels=list(sentiment_counts.keys()),
            values=list(sentiment_counts.values()),
            name="Sentiment",
            marker_colors=['#2ca02c', '#d62728', '#ff7f0e']
        ),
        row=1, col=2
    )
    
    # 3. Prediction Confidence
    confidence_data = ['HIGH', 'MEDIUM', 'HIGH', 'MEDIUM', 'HIGH']
    confidence_counts = {conf: confidence_data.count(conf) for conf in set(confidence_data)}
    
    fig.add_trace(
        go.Bar(
            x=list(confidence_counts.keys()),
            y=list(confidence_counts.values()),
            name="Confidence",
            marker_color=['#2ca02c', '#ff7f0e']
        ),
        row=2, col=1
    )
    
    # 4. Portfolio Performance Simulation
    dates = pd.date_range(start='2024-01-01', periods=30, freq='D')
    portfolio_values = np.cumsum(np.random.normal(0.002, 0.02, 30)) + 1
    portfolio_values = 10000 * portfolio_values
    
    fig.add_trace(
        go.Scatter(
            x=dates,
            y=portfolio_values,
            name='Portfolio Value',
            line=dict(color='#1f77b4', width=3)
        ),
        row=2, col=2
    )
    
    # Update layout
    fig.update_layout(
        title="Financial AI Analytics Dashboard",
        height=800,
        showlegend=True,
        template="plotly_white"
    )
    
    return fig

# Risk Metrics Calculation
def calculate_risk_metrics(returns):
    """Calculate comprehensive risk metrics"""
    
    returns_array = np.array(returns)
    
    # Basic risk metrics
    volatility = np.std(returns_array) * np.sqrt(252)  # Annualized
    sharpe_ratio = np.mean(returns_array) / np.std(returns_array) * np.sqrt(252)
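    # Max drawdown: worst peak-to-trough decline of the compounded equity curve
    equity_curve = np.concatenate(([1.0], np.cumprod(1 + returns_array)))
    running_peak = np.maximum.accumulate(equity_curve)
    max_drawdown = np.min(equity_curve / running_peak - 1)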
    
    # Value at Risk (95% confidence)
    var_95 = np.percentile(returns_array, 5)
    
    return {
        "volatility": volatility * 100,
        "sharpe_ratio": sharpe_ratio,
        "max_drawdown": max_drawdown * 100,
        "var_95": var_95 * 100
    }

# Generate sample returns for risk analysis
print("Creating Financial Dashboard...")
sample_returns = np.random.normal(0.001, 0.02, 100)  # Daily returns
risk_metrics = calculate_risk_metrics(sample_returns)

print("Risk Analysis Results:")
print(f"Annualized Volatility: {risk_metrics['volatility']:.2f}%")
print(f"Sharpe Ratio: {risk_metrics['sharpe_ratio']:.2f}")
print(f"Maximum Drawdown: {risk_metrics['max_drawdown']:.2f}%")
print(f"Value at Risk (95%): {risk_metrics['var_95']:.2f}%")

# Create dashboard
if 'stock_data' in locals() and stock_data:
    dashboard = create_financial_dashboard(stock_data, expanded_df)
    print("\nFinancial Dashboard Created!")
    print("Dashboard includes:")
    print("  - Stock price trends")
    print("  - Sentiment distribution")
    print("  - Prediction confidence levels")
    print("  - Portfolio performance simulation")
    
    # Show dashboard (in real Jupyter notebook)
    # dashboard.show()
else:
    print("Dashboard ready - run stock data loading first!")

print("\nFinancial AI Finetuning Pipeline Complete!")
print("\nNext Steps:")
print("1. Fine-tune the model with your training data")
print("2. Test on real financial news")
print("3. Implement live trading strategy")
print("4. Monitor performance and iterate")

## 🎯 Project Implementation Guide

### Quick Start Options:

1. **Beginner**: Start with sentiment analysis on financial news
2. **Intermediate**: Add stock price correlation analysis  
3. **Advanced**: Build a full trading strategy with risk management

### Data Sources to Explore:

- **Financial News**: Alpha Vantage, Polygon.io, NewsAPI
- **Social Sentiment**: Reddit API, Twitter API
- **Market Data**: Yahoo Finance, IEX Cloud
- **Economic Data**: FRED, World Bank APIs

### Model Improvements:

- **Fine-tune on domain-specific data**: Financial reports, earnings calls
- **Add market context**: Include broader market conditions
- **Multi-modal inputs**: Combine text with price charts
- **Real-time inference**: Deploy for live market analysis

### Ready to start your financial AI journey! 🚀