# 🚀 Stock Price Prediction with Local Llama 3.1 8B

This notebook uses a **locally-run Llama 3.1 8B model** for stock price prediction with PPO reinforcement learning.

## ✅ Compatible Environments
- ✅ **Google Colab** (Free/Pro)
- ✅ **Kaggle** (GPU enabled)
- ✅ **Local** (with GPU recommended)

## 📋 Requirements

### Hardware
- **GPU**: Recommended (T4, P100, or better)
  - Free tier Colab/Kaggle GPUs work!
  - ~6-8 GB VRAM with 4-bit quantization
- **CPU**: Will work but much slower

### Hugging Face Account
- Create account: https://huggingface.co/join
- Get access token: https://huggingface.co/settings/tokens
- Request Llama 3.1 access: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

### Data Files
- Training/validation/test JSONL files
- Supervised price labels CSV
- See data setup instructions below

## 🎯 What This Notebook Does
1. Loads Llama 3.1 8B locally with 4-bit quantization
2. Performs stock price prediction using LLM
3. Applies PPO reinforcement learning for risk-aware adjustment
4. Evaluates performance on test stocks (AAPL, HSBC, PEP, Tencent, Toyota)

---

# Two-Stage Framework for Stock Price Prediction: LLM-Based Forecasting with Risk-Aware PPO Adjustment

This notebook replicates the methodology from the paper:
**"A Two-Stage Framework for Stock Price Prediction: LLM-Based Forecasting with Risk-Aware PPO Adjustment"**

## Framework Overview:
1. **Stage 1**: LLM-based stock price prediction using historical data, technical indicators, and sentiment analysis
2. **Stage 2**: Risk-aware PPO adjustment incorporating VaR and CVaR to refine predictions
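
For readers unfamiliar with the two risk measures named in Stage 2, the following is a minimal sketch of VaR and CVaR computed by historical simulation over a list of returns. The helper name `historical_var_cvar`, the confidence level, and the sample returns are illustrative assumptions; the paper and the PPO stage of this notebook may define or estimate these quantities differently.

```python
import math

def historical_var_cvar(returns, confidence=0.95):
    """Return (VaR, CVaR) as positive loss fractions via historical simulation.

    Hypothetical helper for illustration only; not the notebook's PPO code.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    # Convert returns to losses and put the largest loss first
    losses = sorted((-r for r in returns), reverse=True)
    # Size of the worst (1 - confidence) tail, at least one observation;
    # round() guards against float noise such as 1.9999999999999996
    tail_size = max(1, math.ceil(round(len(losses) * (1 - confidence), 9)))
    tail = losses[:tail_size]
    var = tail[-1]                # smallest loss inside the tail
    cvar = sum(tail) / len(tail)  # average loss inside the tail
    return var, cvar

# Made-up daily returns, purely illustrative
sample_returns = [0.01, -0.02, 0.005, -0.05, 0.03, -0.01, 0.002, -0.03, 0.015, -0.004]
var_80, cvar_80 = historical_var_cvar(sample_returns, confidence=0.80)
print(f"VaR(80%): {var_80:.3f}, CVaR(80%): {cvar_80:.3f}")
```

CVaR averages the losses beyond the VaR threshold, so it is never smaller than VaR; that is why it is often preferred when tail risk matters.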

## Dataset:
- Training, validation, and test data from the `finetune_paper` directory
- Stocks: AAPL, HSBC, PEP, 0700.HK (Tencent), 7203.T (Toyota)

## 1. Environment Setup and Dependencies

## Environment Detection (Colab/Kaggle/Local)

In [None]:
# Detect environment
import sys
import os

# Check if running on Colab
IS_COLAB = 'google.colab' in sys.modules

# Check if running on Kaggle
IS_KAGGLE = 'kaggle_secrets' in sys.modules or os.path.exists('/kaggle')

# Local or other environment
IS_LOCAL = not (IS_COLAB or IS_KAGGLE)

print(f"🌍 Environment Detection:")
print(f"   - Google Colab: {IS_COLAB}")
print(f"   - Kaggle: {IS_KAGGLE}")
print(f"   - Local: {IS_LOCAL}")

# Setup paths based on environment
if IS_COLAB or IS_KAGGLE:
    # Mount/clone data if needed
    BASE_DIR = "/content" if IS_COLAB else "/kaggle/working"
    DATA_DIR = os.path.join(BASE_DIR, "data")
    MODEL_CACHE_DIR = os.path.join(BASE_DIR, "models")
    
    # Create directories
    os.makedirs(DATA_DIR, exist_ok=True)
    os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
    
    print(f"\n📁 Paths configured for cloud environment:")
    print(f"   - Base: {BASE_DIR}")
    print(f"   - Data: {DATA_DIR}")
    print(f"   - Model Cache: {MODEL_CACHE_DIR}")
else:
    # Local environment (notebooks do not define __file__, so use the working directory)
    BASE_DIR = os.getcwd()
    DATA_DIR = "../finetune_paper"
    MODEL_CACHE_DIR = os.path.join(BASE_DIR, "models")
    os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
    print(f"\n📁 Paths configured for local environment")
    print(f"   - Model Cache: {MODEL_CACHE_DIR}")

## Hugging Face Authentication (Required for Llama 3.1)

In [None]:
# Hugging Face Authentication
# Required to download Llama 3.1 8B model

print("🔐 Hugging Face Authentication")
print("=" * 80)
print("\n📝 IMPORTANT - Token Permission Requirements:")
print("   ⚠️  Your token MUST have 'Read access to contents of public gated repos'")
print("   ⚠️  This is REQUIRED for Llama 3.1 access!")
print("\n📋 Complete Setup Instructions:")
print("   1. Create account: https://huggingface.co/join")
print("   2. Get Llama access first: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct")
print("      (Click 'Agree and access repository' - approval is instant!)")
print("   3. Create token with CORRECT permissions:")
print("      - Go to: https://huggingface.co/settings/tokens")
print("      - Click 'New token'")
print("      - Name it (e.g., 'colab-llama')")
print("      - ✅ CHECK: 'Read access to contents of public gated repos'")
print("      - Click 'Generate token'")
print("      - Copy the token!")
print("\n" + "=" * 80)

# For Colab/Kaggle: Login interactively
if IS_COLAB or IS_KAGGLE:
    print("\n🔑 Login to Hugging Face:")
    try:
        from huggingface_hub import notebook_login
        print("\n⚠️  BEFORE YOU PASTE YOUR TOKEN:")
        print("   Make sure it has 'Read access to contents of public gated repos' enabled!")
        print("   Otherwise you'll get a 403 Forbidden error.\n")
        notebook_login()
        print("\n✅ Logged in successfully!")
        print("   If you get 403 errors, create a NEW token with correct permissions.")
    except Exception as e:
        print(f"⚠️  Login failed: {e}")
        print("\n🔧 Alternative: Set HF_TOKEN in Kaggle Secrets or Colab Secrets")
        print("   Make sure token has 'Read access to contents of public gated repos'")
        
        # Try environment variable
        hf_token = os.getenv('HF_TOKEN')
        if hf_token:
            from huggingface_hub import login
            login(token=hf_token)
            print("✅ Authenticated with HF_TOKEN")
        else:
            print("❌ No HF_TOKEN found")

# For local: Check if already logged in
else:
    try:
        from huggingface_hub import HfFolder
        token = HfFolder.get_token()
        if token:
            print("✅ Already authenticated with Hugging Face")
        else:
            print("⚠️  Not logged in. Run: huggingface-cli login")
            print("   Make sure your token has 'Read access to contents of public gated repos'")
    except Exception as e:
        print(f"⚠️  Could not check login status: {e}")
        print("   Run: huggingface-cli login")
        print("   Make sure your token has 'Read access to contents of public gated repos'")

In [None]:
# 🔍 Verify HuggingFace Setup (Run this to check if everything is correct)
print("🔍 Verifying HuggingFace Setup...")
print("=" * 80)

try:
    from huggingface_hub import HfApi, whoami
    
    # Check if logged in
    api = HfApi()
    user_info = whoami()
    
    print(f"✅ Logged in as: {user_info['name']}")
    print(f"   Account type: {user_info.get('type', 'user')}")
    
    # Try to access the model
    print("\n📥 Checking access to Llama 3.1 8B...")
    try:
        model_info = api.model_info("meta-llama/Meta-Llama-3.1-8B-Instruct")
        print("✅ You have access to Llama 3.1 8B Instruct!")
        print(f"   Model: {model_info.id}")
        print(f"   Downloads: {model_info.downloads:,}")
        print("\n🎉 Everything looks good! You can proceed to load the model.")
        
    except Exception as e:
        error_msg = str(e)
        print(f"❌ Cannot access Llama 3.1 8B")
        print(f"   Error: {error_msg}")
        
        if "403" in error_msg or "gated" in error_msg.lower():
            print("\n💡 Fix:")
            print("   1. Accept license: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct")
            print("   2. Create NEW token with 'Read access to contents of public gated repos'")
            print("   3. Re-run authentication cell")
        elif "401" in error_msg:
            print("\n💡 Fix: You're not logged in. Run the authentication cell above.")
        
except Exception as e:
    print(f"❌ Error: {e}")
    print("\n💡 Fix: Run the authentication cell above first")

print("=" * 80)

In [None]:
# Install required packages
import os
import sys

# Core dependencies
core_packages = [
    "transformers>=4.40.0",
    "accelerate>=0.28.0", 
    "bitsandbytes>=0.43.0",
    "torch>=2.0.0",
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "scikit-learn",
    "gymnasium",
    "stable-baselines3",
    "tqdm"
]

# Install packages if on Colab/Kaggle
if 'google.colab' in sys.modules or os.path.exists('/kaggle'):
    print("📦 Installing packages for cloud environment...")
    for package in core_packages:
        print(f"   Installing {package}...")
        # Quote the spec so the shell does not treat ">=" as an output redirect
        !pip install -q "{package}"
    print("✅ All packages installed!")
else:
    print("💡 Running locally - ensure requirements are installed:")
    print("   pip install -r ../requirements.txt")
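
For local runs, where the cell above only prints a reminder, a quick check of which packages are actually missing can save a failed import later. The sketch below uses the standard-library `importlib.metadata`; the helper name `find_missing_packages` is hypothetical, and it checks presence only, not the minimum versions pinned in `core_packages`.

```python
from importlib import metadata

def find_missing_packages(package_names):
    """Return the distribution names that are not installed (no version check)."""
    missing = []
    for name in package_names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

# Distribution names taken from core_packages, without the version pins
required = ["transformers", "accelerate", "bitsandbytes", "torch",
            "gymnasium", "stable-baselines3"]
missing = find_missing_packages(required)
print("Missing packages:", ", ".join(missing) if missing else "none")
```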

In [None]:
# Load Local Llama 3.1 8B Model
print("🤖 Loading Llama 3.1 8B Instruct model locally...")
print("=" * 80)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Model configuration
MODEL_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Use quantization to reduce memory usage (important for Colab/Kaggle)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

print(f"📥 Loading model: {MODEL_NAME}")
print(f"💾 Using 4-bit quantization to save memory")
print(f"📍 Cache directory: {MODEL_CACHE_DIR}")
print("\n⏳ This may take a while on first run (downloading ~16GB of 16-bit weights, quantized at load time)...")
print("   Subsequent runs will be faster (cached locally)")

try:
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_NAME,
        cache_dir=MODEL_CACHE_DIR,
        trust_remote_code=True
    )
    
    # Load model with quantization
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=quantization_config,
        cache_dir=MODEL_CACHE_DIR,
        device_map="auto",
        trust_remote_code=True,
        torch_dtype=torch.float16
    )
    
    print("\n✅ Model loaded successfully!")
    print(f"   Device: {model.device}")
    print(f"   Memory footprint: ~{model.get_memory_footprint() / 1e9:.2f} GB")
    
except Exception as e:
    print(f"\n❌ Error loading model: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Ensure you have Hugging Face access to Llama 3.1")
    print("   2. Login with: huggingface-cli login")
    print("   3. Accept license at: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct")
    raise

### ⚠️ Got a 403 Forbidden Error?

**The issue:** Your HuggingFace token doesn't have the right permissions.

**Quick Fix (5 minutes):**

1. **Delete your old token** (it has wrong permissions):
   - Go to: https://huggingface.co/settings/tokens
   - Find your token → Click trash icon

2. **Create a NEW token with correct permissions:**
   - Click "New token"
   - Name: `colab-llama-access`
   - **✅ IMPORTANT: Check the box:**
     - ✅ "Read access to contents of public gated repos"
   - Click "Generate token"
   - **Copy the token!**

3. **Make sure you have Llama access:**
   - Visit: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
   - Click "Agree and access repository" (instant approval)

4. **Re-run the authentication cell above** with your NEW token

5. **Then re-run the model loading cell**

**Still having issues?** Verify your token at: https://huggingface.co/settings/tokens
- Should show: ✅ "Read access to contents of public gated repos enabled"

### 💡 GPU Optimization Tips

**For Colab/Kaggle:**
- Free tier provides T4 GPU (~15GB VRAM)
- 4-bit quantization reduces model to ~6-8GB
- Leaves room for batch processing

**If you run out of memory:**
```python
# Clear GPU cache
import torch
torch.cuda.empty_cache()

# Or restart runtime and skip training data generation
# (only run validation and test inference)
```

**Monitor GPU usage:**
```python
# Check GPU memory
!nvidia-smi
```
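
As a rough sanity check on the memory figures above, the weight footprint can be estimated from parameter count and bits per parameter. This is back-of-the-envelope arithmetic under stated assumptions: a round 8 billion parameters, and weights only. The gap between the ~4 GB computed here and the ~6-8GB quoted above is presumably taken up by layers left unquantized, activations, and the KV cache.

```python
def estimate_weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 8.0e9  # assumed round figure for an "8B" model

fp16_gb = estimate_weight_memory_gb(N_PARAMS, 16)
int4_gb = estimate_weight_memory_gb(N_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{int4_gb:.0f} GB")
```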

In [None]:
# Import libraries
import os
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')

# Standard library
import time
import pickle

# Deep Learning
import torch
import torch.nn as nn

# Reinforcement Learning
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Progress bar
from tqdm import tqdm

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Check GPU availability
if torch.cuda.is_available():
    print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️  No GPU detected - using CPU (will be slower)")

print("\n✅ All libraries imported successfully!")

All libraries imported successfully!


## 2. Local LLM Configuration (Llama 3.1 8B)

In [None]:
# Local LLM Configuration
MAX_TOKENS = 1024
TEMPERATURE = 0.0  # Requested temperature (0.0 would mean fully deterministic output)

# generate() rejects temperature=0 when sampling is enabled,
# so clamp to a small positive value for actual use
ACTUAL_TEMPERATURE = max(0.1, TEMPERATURE)

print(f"⚙️  Local LLM Configuration:")
print(f"   Model: Llama 3.1 8B Instruct (Local)")
print(f"   Max Tokens: {MAX_TOKENS}")
print(f"   Temperature: {ACTUAL_TEMPERATURE}")
print(f"   Quantization: 4-bit (reduces memory usage)")
print(f"\n✅ Ready to generate predictions!")
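
The configuration above clamps the temperature to at least 0.1, so a requested value of 0.0 still samples. An alternative, sketched here as an assumption rather than as this notebook's method, is to switch to greedy decoding when the temperature is zero. `do_sample`, `temperature`, and `max_new_tokens` are existing `generate()` arguments; the helper `build_generation_kwargs` is hypothetical.

```python
def build_generation_kwargs(temperature, max_new_tokens=1024):
    """Map a requested temperature to decoding arguments.

    Hypothetical helper: 0 selects greedy decoding, anything above 0 samples.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature == 0:
        kwargs["do_sample"] = False  # greedy: same output on every run
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs

print(build_generation_kwargs(0.0))
print(build_generation_kwargs(0.7))
```

The resulting dictionary could then be unpacked into the generation call, for example `model.generate(**inputs, **build_generation_kwargs(TEMPERATURE, MAX_TOKENS))`.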



## 3. Data Loading and Preprocessing

### 📦 Data Setup for Cloud Environments

If running on **Google Colab** or **Kaggle**, you need to get the data files first.

**Option 1: Clone from GitHub (Recommended)**
```python
!git clone https://github.com/ajiayi-debug/StockPPOLLMresearch.git
DATA_PATH = 'StockPPOLLMresearch/finetune_paper'
```

**Option 2: Google Drive (Colab)**
```python
from google.colab import drive
drive.mount('/content/drive')
DATA_PATH = '/content/drive/MyDrive/path/to/finetune_paper'
```

**Option 3: Kaggle Dataset**
- Upload as Kaggle dataset
- Add to notebook
- Set `DATA_PATH = '/kaggle/input/your-dataset-name'`

**Option 4: Manual Upload**
- Upload these files to Colab/Kaggle:
  - `train.jsonl`
  - `val.jsonl`
  - `test.jsonl`
  - `all_supervised_price_labels.csv`
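
Whichever option is used, it can help to confirm that all four files are in place before running the loading cell. A minimal sketch, assuming the file names listed above; the helper `missing_data_files` is hypothetical, and `"."` stands in for whatever `DATA_PATH` is set to.

```python
import os

REQUIRED_FILES = ["train.jsonl", "val.jsonl", "test.jsonl",
                  "all_supervised_price_labels.csv"]

def missing_data_files(data_path, required=REQUIRED_FILES):
    """Return the required file names that do not exist under data_path."""
    return [name for name in required
            if not os.path.isfile(os.path.join(data_path, name))]

# Example: report what is missing from the current directory
missing = missing_data_files(".")
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All data files found")
```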

In [None]:
# Load datasets
def load_jsonl(filepath):
    """Load a JSONL file into a list of dicts, skipping blank lines"""
    data = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                data.append(json.loads(line))
    return data

# Handle data loading based on environment
if IS_COLAB or IS_KAGGLE:
    print("☁️  Cloud Environment Detected")
    print("\n📋 Data Setup Instructions:")
    print("   Option 1: Clone the repository")
    print("   !git clone https://github.com/ajiayi-debug/StockPPOLLMresearch.git")
    print("   DATA_PATH = 'StockPPOLLMresearch/finetune_paper'")
    print("\n   Option 2: Upload files manually")
    print("   - Upload train.jsonl, val.jsonl, test.jsonl")
    print("   - Upload all_supervised_price_labels.csv")
    print("   DATA_PATH = '/content/data'  # or '/kaggle/input/your-dataset'")
    print("\n   Option 3: Mount from Google Drive (Colab only)")
    print("   from google.colab import drive")
    print("   drive.mount('/content/drive')")
    print("   DATA_PATH = '/content/drive/MyDrive/StockPPOLLMresearch/finetune_paper'")
    
    # Set default path - adjust as needed
    DATA_PATH = DATA_DIR
    print(f"\n📁 Current data path: {DATA_PATH}")
    print("   ⚠️  Adjust DATA_PATH variable if your data is in a different location")
else:
    # Local path
    DATA_PATH = '../finetune_paper'

# Try to load data
try:
    train_data = load_jsonl(os.path.join(DATA_PATH, 'train.jsonl'))
    val_data = load_jsonl(os.path.join(DATA_PATH, 'val.jsonl'))
    test_data = load_jsonl(os.path.join(DATA_PATH, 'test.jsonl'))
    all_labels = pd.read_csv(os.path.join(DATA_PATH, 'all_supervised_price_labels.csv'))
    
    print(f"\n✅ Data loaded successfully!")
    print(f"   Training samples: {len(train_data)}")
    print(f"   Validation samples: {len(val_data)}")
    print(f"   Test samples: {len(test_data)}")
    print(f"   All labels shape: {all_labels.shape}")
    print(f"   Stocks in dataset: {all_labels['ticker'].unique()}")
except FileNotFoundError as e:
    print(f"\n❌ Data files not found: {e}")
    print("   Please follow the data setup instructions above")

Training samples: 8698
Validation samples: 1243
Test samples: 2477

All labels shape: (12418, 16)

Stocks in dataset: ['AAPL' 'HSBC' '0700.HK' 'PEP' '7203.T']


In [6]:
# Display sample data
print("Sample training data:")
print(f"Prompt (first 500 chars): {train_data[0]['prompt'][:500]}...")
print(f"\nResponse: {train_data[0]['response']}")

print("\n" + "="*80 + "\n")
print("Sample supervised labels:")
all_labels.head()

Sample training data:
Prompt (first 500 chars): You are a financial analyst with expertise in stock market forecasting.
Your task is to analyze market data and predict the next trading day stock price.
Use historical price trends, technical indicators, and sentiment analysis to provide an informed forecast.
Ensure that your predictions are well-justified, considering multiple financial factors.

• Predicted Stock Price: The forecasted close price for the next trading day.
• Price Movement Likelihood: The likelihood of the predicted stock pric...

Response: {"predicted_close": 27.18000030517578, "likelihood": 0.5, "justification": "n/a"}


Sample supervised labels:


| | Date | SMA_20 | SMA_50 | EMA_12 | EMA_26 | RSI_14 | MACD | MACD_signal | MACD_hist | BB_width_20_2 | headline_count | sent_compound_mean | titles_joined | next_close | confidence_proxy | ticker |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-16 00:00:00+00:00 | | | 27.159062 | 27.234398 | 13.536208 | -0.075335 | -0.01569 | -0.059645 | | 4.0 | -0.07955 | | 27.18 | 0.5 | AAPL |
| 1 | 2015-01-16 00:00:00+00:00 | | | 45.765558 | 46.231136 | 4.645025 | -0.465578 | -0.348537 | -0.117041 | | 6.0 | 0.308567 | Which London business pays the highest busines... | 45.360001 | 0.9 | HSBC |
| 2 | 2015-01-16 00:00:00+00:00 | | | 113.078837 | 109.846862 | 68.406756 | 3.231975 | 2.607665 | 0.624309 | | 1.0 | 0.0 | | 113.388344 | 0.5 | 0700.HK |
| 3 | 2015-01-16 00:00:00+00:00 | | | 96.059458 | 95.400737 | 36.54659 | 0.658721 | 0.41146 | 0.247261 | | 10.0 | 0.08298 | Audrey P. "Pep" Landry Obituary January 16, 20... | 97.510002 | 0.5 | PEP |
| 4 | 2015-01-19 00:00:00+00:00 | | | 113.126453 | 110.109194 | 70.079261 | 3.017259 | 2.689584 | 0.327675 | | 1.0 | 0.3612 | WeChat apologizes for showering Chinese users ... | 114.402382 | 0.5 | 0700.HK |


In [7]:
# Parse test data for evaluation
def parse_prompt_data(prompt_text):
    """Extract key information from prompt"""
    lines = prompt_text.split('\n')
    data = {}
    
    # enumerate() tracks the position of each line, so the prices on the
    # following line are read correctly even if a header line repeats
    for i, line in enumerate(lines):
        if 'TICKER:' in line:
            data['ticker'] = line.split('TICKER:')[1].strip()
        elif 'DATE:' in line:
            data['date'] = line.split('DATE:')[1].strip()
        elif 'RECENT CLOSING PRICES' in line and i + 1 < len(lines):
            prices_line = lines[i + 1]
            if prices_line.strip():
                data['recent_prices'] = [float(p.strip()) for p in prices_line.split(',') if p.strip()]
    
    return data

# Parse test data
test_parsed = []
for item in test_data:
    parsed = parse_prompt_data(item['prompt'])
    response = json.loads(item['response'])
    parsed['predicted_close'] = response['predicted_close']
    parsed['likelihood'] = response['likelihood']
    test_parsed.append(parsed)

test_df = pd.DataFrame(test_parsed)
print(f"Parsed test data shape: {test_df.shape}")
test_df.head()

Parsed test data shape: (2477, 4)


| | ticker | date | predicted_close | likelihood |
|---|---|---|---|---|
| 0 | HSBC | 2023-01-03 | 32.68 | 0.9 |
| 1 | 0700.HK | 2023-01-03 | 342.870056 | 0.5 |
| 2 | PEP | 2023-01-03 | 178.970001 | 0.9 |
| 3 | AAPL | 2023-01-03 | 126.360001 | 0.5 |
| 4 | 7203.T | 2023-01-04 | 1807.5 | 0.7 |


## 4. Stage 1: LLM-Based Stock Price Prediction

In [None]:
def llm_predict_stock_price(prompt: str) -> Dict:
    """Use local Llama 3.1 8B model to predict stock price"""
    try:
        # Prepare the chat template
        messages = [
            {"role": "user", "content": prompt}
        ]
        
        # Apply chat template
        formatted_prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        
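        # Tokenize (the chat template already inserts the BOS token, so skip
        # special tokens here to avoid adding a second one)
        inputs = tokenizer(
            formatted_prompt,
            return_tensors="pt",
            add_special_tokens=False,
            truncation=True,
            max_length=4096
        ).to(model.device)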
        
        # Generate response
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=MAX_TOKENS,
                temperature=ACTUAL_TEMPERATURE,
                do_sample=ACTUAL_TEMPERATURE > 0,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id
            )
        
        # Decode response
        generated_tokens = outputs[0][inputs['input_ids'].shape[1]:]
        response_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
        
        # Parse JSON response
        if '{' in response_text and '}' in response_text:
            json_start = response_text.index('{')
            json_end = response_text.rindex('}') + 1
            json_str = response_text[json_start:json_end]
            result = json.loads(json_str)
            return result
        else:
            # Try to extract prediction from text
            print(f"Warning: Could not parse JSON from response: {response_text[:200]}")
            return {"predicted_close": None, "likelihood": 0.5, "justification": "Parse error"}
            
    except Exception as e:
        print(f"Error in LLM prediction: {e}")
        return {"predicted_close": None, "likelihood": 0.5, "justification": str(e)}

# Test LLM prediction on a sample
print("🧪 Testing Local LLM with a sample prediction...")
print("="*80)

if 'test_data' in locals() and len(test_data) > 0:
    sample_prompt = test_data[0]['prompt']
    print("Sample prompt (first 300 chars):")
    print(sample_prompt[:300] + "...\n")
    
    print("⏳ Generating prediction...")
    start_time = time.time()
    llm_result = llm_predict_stock_price(sample_prompt)
    elapsed = time.time() - start_time
    
    print(f"\n✅ Prediction generated in {elapsed:.2f}s")
    print("\nLLM Prediction Result:")
    print(json.dumps(llm_result, indent=2))
    
    actual_response = json.loads(test_data[0]['response'])
    print(f"\nActual Target Price: {actual_response['predicted_close']}")
    print("\n✅ Local LLM is working! Ready to generate predictions for all data.")
else:
    print("⚠️  Test data not loaded - skipping test prediction")
    
print("="*80)

🧪 Testing LLM API with a sample prediction...
Sample prompt (first 300 chars):
You are a financial analyst with expertise in stock market forecasting.
Your task is to analyze market data and predict the next trading day stock price.
Use historical price trends, technical indicators, and sentiment analysis to provide an informed forecast.
Ensure that your predictions are well-j...

LLM Prediction Result:
{
  "predicted_close": 31.5,
  "likelihood": 0.65,
  "justification": "The predicted close price of 31.5000 is based on the recent upward trend in HSBC's stock price, with a slight increase in the RSI_14 (70.01903430263613) indicating overbought conditions. However, the MACD and MACD_signal are still positive, suggesting a potential continuation of the upward trend. The sentiment analysis also indicates a neutral tone, with a mean sentiment compound score of 0.072325, which does not strongly influence the prediction."
}

Actual Target Price: 32.68000030517578

✅ LLM API is working!
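
The prediction function above slices the reply from the first `{` to the last `}`, which fails if the model adds any text containing braces after the JSON. A more tolerant approach, sketched here as one possible alternative, is to let the standard-library decoder find the first complete JSON object. The helper name `extract_first_json_object` and the sample reply are illustrative assumptions.

```python
import json

def extract_first_json_object(text):
    """Return the first valid JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None

# Illustrative reply with trailing text that contains braces
reply = 'Here is my forecast: {"predicted_close": 31.5, "likelihood": 0.65} (see {notes})'
print(extract_first_json_object(reply))
```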

### Local Model Performance

**Using Local Llama 3.1 8B with 4-bit Quantization:**

✅ **Advantages:**
- No API rate limits or token restrictions
- No API costs
- Full privacy - data stays local
- Faster batch processing with GPU
- Works offline once the model is cached

⚙️ **Performance:**
- ~2-5 seconds per prediction (depends on GPU)
- Colab/Kaggle free tier: T4 GPU works well
- Total time for all 12,418 predictions: roughly 7-17 hours at 2-5 seconds each (about 2-5 hours for validation and test only)
- Memory usage: ~6-8 GB VRAM (with 4-bit quantization)

💡 **Tips:**
- GPU is highly recommended (CPU will be much slower)
- Use checkpointing to save progress
- Can run continuously without interruptions

### ⚠️ Important: LLM Inference Process

This section will **use the local Llama 3.1 8B model** to generate predictions for all data:

**Data Split:**
- **Training data** (8,698 samples): Generate LLM predictions for reference
- **Validation data** (1,243 samples): Generate LLM predictions → Used to train PPO agent
- **Test data** (2,477 samples): Generate LLM predictions → Used for final evaluation

**Features:**
- ✅ **Checkpointing**: Progress saved every 50-100 samples
- ✅ **Resume capability**: Simply re-run the cell to continue from the last checkpoint
- ✅ **No API limits**: Run continuously without restrictions
- ⏰ **Estimated time**: roughly 7-10 hours for all data on a T4 at 2-3 seconds per prediction (depends on GPU)

**How it works:**
1. Each cell checks for existing checkpoint and resumes if found
2. Progress is automatically saved at regular intervals
3. If interrupted, simply re-run the cell to continue
4. All predictions are saved locally

**GPU Performance:**
- T4 GPU (Colab/Kaggle): ~2-3 seconds per prediction
- Better GPUs: ~1-2 seconds per prediction
- CPU (not recommended): ~10-30 seconds per prediction
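
The per-prediction timings above translate into wall-clock totals by simple arithmetic. The sketch below assumes strictly sequential inference (one prompt at a time) and uses the sample counts printed when the data was loaded earlier in this notebook; the helper `estimate_hours` is hypothetical.

```python
def estimate_hours(n_samples, seconds_per_prediction):
    """Sequential inference time in hours."""
    return n_samples * seconds_per_prediction / 3600

# Sample counts reported by the data-loading cell
SPLITS = {"train": 8698, "val": 1243, "test": 2477}
total = sum(SPLITS.values())

# (low, high) seconds per prediction, taken from the list above
for label, low, high in [("T4", 2, 3), ("Better GPU", 1, 2), ("CPU", 10, 30)]:
    print(f"{label}: {estimate_hours(total, low):.1f}-"
          f"{estimate_hours(total, high):.1f} h for all {total} samples")
```

Running only the validation and test splits cuts the sample count to 3,720, which is why starting with those two is a sensible first pass.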

**Checkpoints saved to:**
- `results/llm_predictions_train_checkpoint.json`
- `results/llm_predictions_val_checkpoint.json`
- `results/llm_predictions_checkpoint.json` (test)

**Checkpoint Format (JSON):**
Each checkpoint file contains:
- `predictions`: List of predicted closing prices
- `actual_prices`: List of actual target prices
- `llm_results`: List of full LLM responses including `predicted_close`, `likelihood`, and `justification`
- `last_idx`: Last processed index (for resuming)
- `completed`: Boolean indicating if all samples are processed
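
The checkpoint-and-resume behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual inference loop: the function names are hypothetical, `predict_fn` stands in for `llm_predict_stock_price`, and each sample is assumed to carry a `prompt` and a JSON `response` holding the target price, as in the JSONL files.

```python
import json
import os
import tempfile

def load_checkpoint(path):
    """Load a checkpoint, or return an empty one if the file does not exist."""
    if os.path.exists(path):
        with open(path, "r") as f:
            return json.load(f)
    return {"predictions": [], "actual_prices": [], "llm_results": [],
            "last_idx": -1, "completed": False}

def save_checkpoint(path, state):
    """Write to a temp file, then rename, so an interrupt cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)

def run_with_checkpoints(samples, predict_fn, path, save_every=50):
    """Resume after last_idx and save every `save_every` samples."""
    state = load_checkpoint(path)
    for idx in range(state["last_idx"] + 1, len(samples)):
        result = predict_fn(samples[idx]["prompt"])
        target = json.loads(samples[idx]["response"])["predicted_close"]
        state["llm_results"].append(result)
        state["predictions"].append(result.get("predicted_close"))
        state["actual_prices"].append(target)
        state["last_idx"] = idx
        if (idx + 1) % save_every == 0:
            save_checkpoint(path, state)
    state["completed"] = True
    save_checkpoint(path, state)
    return state
```

Because the loop starts at `last_idx + 1`, re-running after an interruption repeats at most the samples processed since the last save.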

**💡 Tip:** You can run each dataset separately. For initial testing, start with just validation and test sets.

In [None]:
# Setup results directory for checkpoints
if IS_COLAB or IS_KAGGLE:
    RESULTS_DIR = os.path.join(BASE_DIR, "results")
else:
    RESULTS_DIR = "../results"

os.makedirs(RESULTS_DIR, exist_ok=True)
print(f"📁 Results directory: {RESULTS_DIR}")

### 💾 Transfer Checkpoints Between Environments

If you've already run predictions locally and want to continue on Colab/Kaggle (or vice versa), you can transfer checkpoint files!

**Checkpoint Files:**
- `llm_predictions_train_checkpoint.json` - Training data progress
- `llm_predictions_val_checkpoint.json` - Validation data progress  
- `llm_predictions_checkpoint.json` - Test data progress

**📤 From Local to Colab:**
1. Find your checkpoint files in `../results/` (local directory)
2. Upload to Google Drive: `/content/drive/MyDrive/StockPPO/results/`
3. In Colab, mount Drive and copy:
```python
from google.colab import drive
drive.mount('/content/drive')

# Copy checkpoint from Drive to Colab results directory
import shutil
shutil.copy('/content/drive/MyDrive/StockPPO/results/llm_predictions_checkpoint.json', 
            f'{RESULTS_DIR}/llm_predictions_checkpoint.json')
```

**📤 From Local to Kaggle:**
1. Upload checkpoint files directly when creating notebook
2. Or copy from input dataset:
```python
import shutil
shutil.copy('/kaggle/input/checkpoints/llm_predictions_checkpoint.json',
            f'{RESULTS_DIR}/llm_predictions_checkpoint.json')
```

**📥 From Colab/Kaggle to Local:**
- Download checkpoint files from results directory
- Place in your local `../results/` folder
- Re-run the inference cell - it will resume automatically!

**Format:** Checkpoints are JSON files with:
```json
{
  "predictions": [123.45, 234.56, ...],
  "actual_prices": [125.00, 230.00, ...],
  "llm_results": [{...}, {...}, ...],
  "last_idx": 199,
  "completed": false
}
```

The notebook will automatically detect and resume from the `last_idx`!
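
After moving a checkpoint between machines, a quick consistency check can catch a truncated or mismatched file before inference resumes from it. The sketch below assumes that each of the three lists holds exactly one entry per processed sample, so their length should equal `last_idx + 1`; that invariant and the helper name `checkpoint_problems` are assumptions, not something the notebook enforces.

```python
REQUIRED_KEYS = ("predictions", "actual_prices", "llm_results", "last_idx", "completed")

def checkpoint_problems(data):
    """Return a list of problems; an empty list means the checkpoint looks consistent."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    if problems:
        return problems
    expected = data["last_idx"] + 1  # assumption: one entry per processed sample
    for key in ("predictions", "actual_prices", "llm_results"):
        if len(data[key]) != expected:
            problems.append(f"{key} has {len(data[key])} entries, expected {expected}")
    return problems

# Illustrative checkpoint with two processed samples
example = {"predictions": [123.45, 234.56], "actual_prices": [125.0, 230.0],
           "llm_results": [{}, {}], "last_idx": 1, "completed": False}
print(checkpoint_problems(example) or "checkpoint looks consistent")
```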

In [None]:
# 🔄 Checkpoint Transfer Helper
# Run this cell to upload/download checkpoint files

import os
import shutil
from pathlib import Path

print("💾 Checkpoint Transfer Helper")
print("=" * 80)

# List existing checkpoints
print("\n📋 Current Checkpoints:")
if os.path.exists(RESULTS_DIR):
    checkpoints = [f for f in os.listdir(RESULTS_DIR) if f.endswith('_checkpoint.json')]
    if checkpoints:
        for cp in checkpoints:
            cp_path = os.path.join(RESULTS_DIR, cp)
            size = os.path.getsize(cp_path) / 1024  # KB
            
            # Load and check progress
            import json
            with open(cp_path, 'r') as f:
                data = json.load(f)
            
            status = "✅ Complete" if data.get('completed', False) else f"⏳ In Progress ({data.get('last_idx', 0) + 1} samples)"
            print(f"   • {cp} ({size:.1f} KB) - {status}")
    else:
        print("   No checkpoints found in this environment")
else:
    print("   Results directory doesn't exist yet")

# Environment-specific instructions
print("\n" + "=" * 80)

if IS_COLAB:
    print("\n📤 GOOGLE COLAB - Upload Checkpoint from Local:")
    print("   1. Option A - Upload directly:")
    print("      from google.colab import files")
    print("      uploaded = files.upload()  # Select your .json checkpoint file")
    print("      # Move to results directory")
    print("      for filename in uploaded.keys():")
    print(f"          shutil.move(filename, '{RESULTS_DIR}/' + filename)")
    print()
    print("   2. Option B - From Google Drive:")
    print("      from google.colab import drive")
    print("      drive.mount('/content/drive')")
    print("      # Then copy:")
    print("      shutil.copy('/content/drive/MyDrive/path/to/checkpoint.json',")
    print(f"                  '{RESULTS_DIR}/llm_predictions_checkpoint.json')")
    print()
    print("📥 Download Checkpoint to Local:")
    print("   from google.colab import files")
    print(f"   files.download('{RESULTS_DIR}/llm_predictions_checkpoint.json')")

elif IS_KAGGLE:
    print("\n📤 KAGGLE - Upload Checkpoint from Local:")
    print("   1. Create a dataset with your checkpoint files")
    print("   2. Add dataset to this notebook")
    print("   3. Copy to working directory:")
    print("      shutil.copy('/kaggle/input/your-dataset/llm_predictions_checkpoint.json',")
    print(f"                  '{RESULTS_DIR}/llm_predictions_checkpoint.json')")
    print()
    print("📥 Download Checkpoint:")
    print("   Checkpoints in /kaggle/working/ persist after notebook completion")
    print("   Download from Output section after run finishes")

else:
    print("\n📤 LOCAL - Upload Checkpoint to Colab/Kaggle:")
    print("   1. Find checkpoint files in: ../results/")
    print("   2. For Colab: Upload via file upload or Google Drive")
    print("   3. For Kaggle: Create dataset or upload when creating notebook")
    print()
    print("📥 Download from Colab/Kaggle to Local:")
    print("   1. Download checkpoint files from cloud environment")
    print("   2. Place in: ../results/")
    print("   3. Re-run inference cell - it will resume automatically!")

print("\n" + "=" * 80)
print("💡 Tip: Checkpoints save every 50 samples, so you can resume anytime!")
print("=" * 80)

In [None]:
# 🚀 QUICK ACTION: Upload Your Local Checkpoint (Colab Only)
# Run this cell to directly upload your checkpoint file from local computer

if IS_COLAB:
    print("📤 Upload your local checkpoint file...")
    print("=" * 80)
    print("\n1. Click 'Choose Files' below")
    print("2. Select your checkpoint .json file from ../results/ on your local machine")
    print("3. File will be moved to the correct location automatically")
    print()
    
    from google.colab import files
    import shutil
    
    uploaded = files.upload()
    
    if uploaded:
        print("\n✅ Files uploaded:")
        for filename in uploaded.keys():
            # Move to results directory
            dest_path = os.path.join(RESULTS_DIR, filename)
            
            # Check if it's a checkpoint file
            if filename.endswith('_checkpoint.json'):
                shutil.move(filename, dest_path)
                print(f"   ✅ {filename} → {RESULTS_DIR}/")
                
                # Show checkpoint info
                import json
                with open(dest_path, 'r') as f:
                    data = json.load(f)
                print(f"      Progress: {data.get('last_idx', 0) + 1} samples completed")
                print(f"      Will resume from index: {data.get('last_idx', 0) + 1}")
            else:
                print(f"   ⚠️  {filename} - Not a checkpoint file (should end with _checkpoint.json)")
        
        print("\n🎯 Now you can run the inference cell - it will resume automatically!")
    else:
        print("\n❌ No files uploaded")
        
elif IS_KAGGLE:
    print("📦 For Kaggle:")
    print("   1. Add your checkpoint as an input dataset")
    print("   2. Use the code above to copy from /kaggle/input/")
    
else:
    print("💻 Running locally - checkpoints already in ../results/")
    print("   Just run the inference cell to resume!")

print("=" * 80)

### ☁️ Google Drive Auto-Backup (Colab Only - RECOMMENDED!)

**Why:** Colab's `/content/` storage is temporary: when the session ends, its files are deleted.

**Solution:** Copy each checkpoint to Google Drive every time it is saved.

Run the setup cell below BEFORE starting inference so that progress survives a disconnect.
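
A backup is only as good as the file being copied: if the session dies in the middle of `json.dump`, the checkpoint on disk can be left truncated. One way to guard against that is to write to a temporary file first and then swap it into place. The sketch below assumes a hypothetical helper, `save_checkpoint_atomic`, which is not part of this notebook.

```python
import json
import os
import tempfile


def save_checkpoint_atomic(path, payload):
    """Write payload as JSON to path without exposing a half-written file.

    Hypothetical helper for illustration; the notebook's own cells call
    json.dump on the checkpoint path directly.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Remove the temp file if anything failed before the swap
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage: save, then read back to confirm the file is valid JSON
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_checkpoint.json")
save_checkpoint_atomic(demo_path, {"predictions": [1.0, 2.0], "last_idx": 1})
with open(demo_path) as f:
    restored = json.load(f)
```

With this approach, a reader of the checkpoint path sees either the previous complete file or the new complete file, so a Drive copy taken right after the save should not pick up a partial write.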

In [None]:
# 🔐 Setup Google Drive Auto-Backup (Run this FIRST if on Colab!)
print("☁️  Google Drive Auto-Backup Setup")
print("=" * 80)

if IS_COLAB:
    print("\n📂 Setting up automatic backup to Google Drive...")
    print("   This will save checkpoints to Drive automatically!")
    print()
    
    # Mount Google Drive
    from google.colab import drive
    drive.mount('/content/drive')
    
    # Create backup directory
    DRIVE_BACKUP_DIR = '/content/drive/MyDrive/StockPPO_Checkpoints'
    os.makedirs(DRIVE_BACKUP_DIR, exist_ok=True)
    
    print(f"\n✅ Google Drive mounted successfully!")
    print(f"✅ Backup directory created: {DRIVE_BACKUP_DIR}")
    print()
    print("💾 Auto-backup enabled:")
    print("   ✓ Checkpoints will save to /content/results/ (fast)")
    print("   ✓ AND automatically copy to Google Drive (safe)")
    print("   ✓ Your progress is protected even if Colab disconnects!")
    print()
    
    # Check for existing backups
    existing_backups = [f for f in os.listdir(DRIVE_BACKUP_DIR) if f.endswith('.json')]
    if existing_backups:
        print(f"📋 Found {len(existing_backups)} existing backup(s) in Drive:")
        for backup in existing_backups:
            backup_path = os.path.join(DRIVE_BACKUP_DIR, backup)
            size = os.path.getsize(backup_path) / 1024
            print(f"   • {backup} ({size:.1f} KB)")
        print()
        print("💡 To resume from a Drive backup, run:")
        print("   import shutil")
        print(f"   shutil.copy('{DRIVE_BACKUP_DIR}/[filename].json',")
        print(f"               '{RESULTS_DIR}/[filename].json')")
    
    # Set global flag
    ENABLE_DRIVE_BACKUP = True
    print("\n🎯 Setup complete! Run inference cells normally - backups are automatic.")
    
elif IS_KAGGLE:
    print("📦 Kaggle detected:")
    print("   Kaggle notebooks auto-save outputs to /kaggle/working/")
    print("   Your checkpoints persist after notebook completion!")
    print("   No additional backup needed.")
    DRIVE_BACKUP_DIR = None
    ENABLE_DRIVE_BACKUP = False
    
else:
    print("💻 Local environment detected:")
    print("   Checkpoints save to ../results/ (permanent)")
    print("   No backup needed.")
    DRIVE_BACKUP_DIR = None
    ENABLE_DRIVE_BACKUP = False

print("=" * 80)

In [None]:
# Helper Function: Auto-backup to Drive
def backup_to_drive(checkpoint_file, backup_name=None):
    """
    Automatically backup checkpoint to Google Drive (Colab only)
    Called after each checkpoint save
    """
    # Tolerate the setup cell not having been run: treat backup as disabled
    if not globals().get('ENABLE_DRIVE_BACKUP', False):
        return
    
    try:
        import shutil
        
        # Determine backup filename
        if backup_name is None:
            backup_name = os.path.basename(checkpoint_file)
        
        backup_path = os.path.join(DRIVE_BACKUP_DIR, backup_name)
        
        # Copy to Drive
        shutil.copy(checkpoint_file, backup_path)
        
        # Optional: Keep timestamped backups every 500 samples
        # (Uncomment if you want historical versions)
        # from datetime import datetime
        # timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        # timestamped_path = os.path.join(DRIVE_BACKUP_DIR, f"{backup_name}.{timestamp}")
        # shutil.copy(checkpoint_file, timestamped_path)
        
    except Exception as e:
        print(f"⚠️  Drive backup failed (non-critical): {e}")

print("✅ Backup function defined")
print("   Will be called automatically after each checkpoint save")

In [None]:
# 📊 Check Google Drive Backup Status
print("📊 Google Drive Backup Status")
print("=" * 80)

if ENABLE_DRIVE_BACKUP and DRIVE_BACKUP_DIR:
    if os.path.exists(DRIVE_BACKUP_DIR):
        backups = [f for f in os.listdir(DRIVE_BACKUP_DIR) if f.endswith('.json')]
        
        if backups:
            print(f"\n✅ Found {len(backups)} backup file(s) in Google Drive:")
            print(f"   Location: {DRIVE_BACKUP_DIR}\n")
            
            for backup in sorted(backups):
                backup_path = os.path.join(DRIVE_BACKUP_DIR, backup)
                size = os.path.getsize(backup_path) / 1024  # KB
                
                # Load and show progress
                try:
                    with open(backup_path, 'r') as f:
                        data = json.load(f)
                    
                    samples = data.get('last_idx', 0) + 1
                    completed = "✅ Complete" if data.get('completed', False) else f"⏳ {samples} samples"
                    
                    print(f"   📄 {backup}")
                    print(f"      Size: {size:.1f} KB | Status: {completed}")
                except Exception:
                    # Unreadable or malformed checkpoint: show the size only
                    print(f"   📄 {backup} ({size:.1f} KB)")
                print()
            
            print("💡 Your progress is safely backed up to Google Drive!")
            print("   Even if Colab disconnects, you can resume from these backups.")
        else:
            print("\n📂 Backup directory exists but no backups yet")
            print("   Backups will appear here after first checkpoint save")
    else:
        print("\n⚠️  Backup directory not found")
        print("   Run the auto-backup setup cell first!")
else:
    print("\n💻 Drive backup not enabled")
    if IS_COLAB:
        print("   Run the '🔐 Setup Google Drive Auto-Backup' cell to enable it!")
    else:
        print("   (Only needed for Colab - you're on Local/Kaggle)")

print("=" * 80)

### 4.1 Run LLM Inference on Training Data

We'll generate LLM predictions for the training dataset to use for PPO training later.
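
Because inference over the training set can run for many hours, the cell below checkpoints its progress and resumes when re-run. The core pattern can be sketched in isolation as follows. `run_with_resume` and `fake_predict` are hypothetical names used only for this illustration, and the stand-in predictor takes the place of the real `llm_predict_stock_price` call.

```python
import json
import os
import tempfile


def run_with_resume(items, predict, checkpoint_path, every=50):
    """Process items in order, checkpointing so a re-run resumes where it stopped."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"predictions": [], "last_idx": -1}

    def save():
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)

    for idx in range(state["last_idx"] + 1, len(items)):
        try:
            state["predictions"].append(predict(items[idx]))
        except Exception as exc:
            print(f"Stopped at index {idx}: {exc}")
            break
        state["last_idx"] = idx  # only advances after a successful prediction
        if (idx + 1) % every == 0:
            save()
    save()  # always persist the latest state before returning
    return state


# Demonstration with a stand-in predictor that fails once at item 3
calls = {"failed": False}


def fake_predict(x):
    if x == 3 and not calls["failed"]:
        calls["failed"] = True
        raise RuntimeError("simulated failure")
    return x * 2


path = os.path.join(tempfile.mkdtemp(), "demo_checkpoint.json")
first = run_with_resume(list(range(6)), fake_predict, path, every=2)
second = run_with_resume(list(range(6)), fake_predict, path, every=2)
```

The key design choice in this sketch is that `last_idx` always names the last sample that succeeded, starting from `-1`, so the resume index is simply `last_idx + 1` even when the very first sample fails.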

In [None]:
# Run LLM predictions on TRAINING data with checkpointing
import time  # needed for the optional per-sample delay in the loop
checkpoint_file_train = os.path.join(RESULTS_DIR, 'llm_predictions_train_checkpoint.json')

# Load existing checkpoint if available
if os.path.exists(checkpoint_file_train):
    print(f"Loading existing training checkpoint from {checkpoint_file_train}")
    with open(checkpoint_file_train, 'r') as f:
        checkpoint = json.load(f)
    train_llm_predictions = checkpoint['predictions']
    train_actual_prices = checkpoint['actual_prices']
    train_llm_results = checkpoint.get('llm_results', [])  # Full LLM responses
    start_idx = checkpoint['last_idx'] + 1
    print(f"Resuming from index {start_idx}/{len(train_data)}")
else:
    train_llm_predictions = []
    train_actual_prices = []
    train_llm_results = []
    start_idx = 0
    print("Starting fresh LLM predictions on training data...")

# Run LLM predictions
print(f"\n🔄 Generating LLM predictions for {len(train_data)} TRAINING samples...")
print("⏰ This will take considerable time. You can stop and resume later.")

for idx in tqdm(range(start_idx, len(train_data)), desc="Training LLM Inference"):
    item = train_data[idx]
    
    try:
        # Get LLM prediction
        llm_result = llm_predict_stock_price(item['prompt'])
        
        # Store full LLM result (including justification)
        train_llm_results.append(llm_result)
        
        if llm_result['predicted_close'] is not None:
            train_llm_predictions.append(llm_result['predicted_close'])
        else:
            response = json.loads(item['response'])
            train_llm_predictions.append(response['predicted_close'])
        
        response = json.loads(item['response'])
        train_actual_prices.append(response['predicted_close'])
        
        # Small delay to prevent overheating (optional)
        time.sleep(0.1)

        # Checkpoint every 50 samples
        if (idx + 1) % 50 == 0:
            checkpoint = {
                'predictions': train_llm_predictions,
                'actual_prices': train_actual_prices,
                'llm_results': train_llm_results,  # Full LLM responses with justification
                'last_idx': idx,
                'completed': False
            }
            with open(checkpoint_file_train, 'w') as f:
                json.dump(checkpoint, f)
            print(f"\n💾 Checkpoint saved at index {idx + 1}/{len(train_data)}")
            
            # Auto-backup to Google Drive (if enabled)
            backup_to_drive(checkpoint_file_train)
    
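    except Exception as e:
        print(f"\n❌ Error at index {idx}: {e}")
        # Drop any partial entries for the failed sample so the lists stay aligned
        del train_llm_results[idx:], train_llm_predictions[idx:], train_actual_prices[idx:]
        # Save checkpoint on error; last_idx is the last index that succeeded
        # (-1 if the very first sample failed, so the resume starts at 0)
        checkpoint = {
            'predictions': train_llm_predictions,
            'actual_prices': train_actual_prices,
            'llm_results': train_llm_results,
            'last_idx': idx - 1,
            'completed': False
        }
        with open(checkpoint_file_train, 'w') as f:
            json.dump(checkpoint, f)
        backup_to_drive(checkpoint_file_train)  # Backup on error too!
        print("💾 Emergency checkpoint saved. Re-run this cell to continue.")
        break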

# Save final checkpoint once every training sample has a prediction
# (checking list length avoids marking a failed last sample as complete)
if len(train_llm_predictions) == len(train_data):
    checkpoint = {
        'predictions': train_llm_predictions,
        'actual_prices': train_actual_prices,
        'llm_results': train_llm_results,
        'last_idx': len(train_data) - 1,
        'completed': True
    }
    with open(checkpoint_file_train, 'w') as f:
        json.dump(checkpoint, f)
    backup_to_drive(checkpoint_file_train)  # Final backup
    print("\n✅ Training LLM predictions completed!")
    print(f"   Total predictions: {len(train_llm_predictions)}")
    print(f"   Checkpoint saved to: {checkpoint_file_train}")

Starting fresh LLM predictions on training data...

🔄 Generating LLM predictions for 8698 TRAINING samples...
⏰ This will take considerable time. You can stop and resume later.


Training LLM Inference:   0%|          | 15/8698 [02:34<22:54:52,  9.50s/it]

### 4.2 Run LLM Inference on Validation Data

Generate predictions for the validation data (used to validate the PPO agent in Section 6.1).

In [None]:
# Run LLM predictions on VALIDATION data with checkpointing
checkpoint_file_val = os.path.join(RESULTS_DIR, 'llm_predictions_val_checkpoint.json')

if os.path.exists(checkpoint_file_val):
    print(f"Loading existing validation checkpoint from {checkpoint_file_val}")
    with open(checkpoint_file_val, 'r') as f:
        checkpoint = json.load(f)
    val_llm_predictions = checkpoint['predictions']
    val_actual_prices = checkpoint['actual_prices']
    val_llm_results = checkpoint.get('llm_results', [])
    start_idx = checkpoint['last_idx'] + 1
    print(f"Resuming from index {start_idx}/{len(val_data)}")
else:
    val_llm_predictions = []
    val_actual_prices = []
    val_llm_results = []
    start_idx = 0
    print("Starting fresh LLM predictions on validation data...")

print(f"\n🔄 Generating LLM predictions for {len(val_data)} VALIDATION samples...")

for idx in tqdm(range(start_idx, len(val_data)), desc="Validation LLM Inference"):
    item = val_data[idx]
    
    try:
        llm_result = llm_predict_stock_price(item['prompt'])
        
        # Store full LLM result
        val_llm_results.append(llm_result)
        
        if llm_result['predicted_close'] is not None:
            val_llm_predictions.append(llm_result['predicted_close'])
        else:
            response = json.loads(item['response'])
            val_llm_predictions.append(response['predicted_close'])
        
        response = json.loads(item['response'])
        val_actual_prices.append(response['predicted_close'])
        
        # time.sleep(0.5)
        
        if (idx + 1) % 50 == 0:
            checkpoint = {
                'predictions': val_llm_predictions,
                'actual_prices': val_actual_prices,
                'llm_results': val_llm_results,
                'last_idx': idx
            }
            with open(checkpoint_file_val, 'w') as f:
                json.dump(checkpoint, f, indent=2)
            backup_to_drive(checkpoint_file_val)  # Auto-backup
    
    except Exception as e:
        error_msg = str(e)
        
        if 'rate_limit' in error_msg.lower() or 'too many requests' in error_msg.lower():
            print(f"\n❌ RATE LIMIT HIT at index {idx}!")
            print(f"Saving checkpoint and stopping execution...")
            checkpoint = {
                'predictions': val_llm_predictions,
                'actual_prices': val_actual_prices,
                'llm_results': val_llm_results,
                'last_idx': idx - 1
            }
            with open(checkpoint_file_val, 'w') as f:
                json.dump(checkpoint, f, indent=2)
            backup_to_drive(checkpoint_file_val)  # Backup on error
            print(f"✅ Checkpoint saved to: {checkpoint_file_val}")
            print(f"📊 Progress: {idx}/{len(val_data)} samples completed")
            print(f"💡 Run this cell again to resume from where you left off.")
            break  # Stop execution
        else:
            print(f"\n⚠️ Error at index {idx}: {error_msg}")
            error_result = {"predicted_close": None, "likelihood": 0.5, "justification": f"Error: {error_msg}"}
            val_llm_results.append(error_result)
            response = json.loads(item['response'])
            val_llm_predictions.append(response['predicted_close'])
            val_actual_prices.append(response['predicted_close'])

checkpoint = {
    'predictions': val_llm_predictions,
    'actual_prices': val_actual_prices,
    'llm_results': val_llm_results,
    'last_idx': len(val_llm_predictions) - 1,
    'completed': len(val_llm_predictions) == len(val_data)
}
with open(checkpoint_file_val, 'w') as f:
    json.dump(checkpoint, f, indent=2)
backup_to_drive(checkpoint_file_val)  # Final backup

if len(val_llm_predictions) == len(val_data):
    print(f"\n✅ Validation LLM predictions completed: {len(val_llm_predictions)} samples")
else:
    print(f"\n⚠️ Partial completion: {len(val_llm_predictions)}/{len(val_data)} samples")
print(f"Checkpoint saved to: {checkpoint_file_val}")

### 4.3 Run LLM Inference on Test Data

Generate predictions for test data (used for final evaluation).
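
One caveat for evaluation: in the cell below, when the LLM returns no usable price (or raises an error), the prediction falls back to the ground-truth value stored in `item['response']`, so those samples show zero error. The sketch below measures how often that happened, assuming `llm_results` entries are dicts with a `predicted_close` key as in the cell below. `fallback_mask` is a hypothetical helper name.

```python
import numpy as np


def fallback_mask(llm_results):
    """Return a boolean array, True where the LLM gave no usable prediction."""
    return np.array(
        [r.get("predicted_close") is None for r in llm_results], dtype=bool
    )


# Toy results in the same shape the inference loop stores
demo_results = [
    {"predicted_close": 101.2, "likelihood": 0.7},
    {"predicted_close": None, "likelihood": 0.5},
    {"predicted_close": 99.8, "likelihood": 0.6},
]
mask = fallback_mask(demo_results)
fallback_rate = mask.mean()
print(f"Fallbacks: {mask.sum()} of {len(mask)} ({fallback_rate:.1%})")
```

Rows where the mask is `True` could then be excluded before computing metrics, so that fallback samples do not flatter the reported error.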

In [None]:
# Run LLM predictions on test data with checkpointing
import time

# Checkpoint file to save progress
checkpoint_file = os.path.join(RESULTS_DIR, 'llm_predictions_checkpoint.json')

# Load existing checkpoint if available
if os.path.exists(checkpoint_file):
    print(f"Loading existing checkpoint from {checkpoint_file}")
    with open(checkpoint_file, 'r') as f:
        checkpoint = json.load(f)
    llm_predictions = checkpoint['predictions']
    actual_prices = checkpoint['actual_prices']
    llm_results = checkpoint.get('llm_results', [])
    start_idx = checkpoint['last_idx'] + 1
    print(f"Resuming from index {start_idx}/{len(test_data)}")
else:
    llm_predictions = []
    actual_prices = []
    llm_results = []
    start_idx = 0
    print("Starting fresh LLM predictions...")

# Run LLM predictions with checkpointing
print(f"\nGenerating LLM predictions for {len(test_data)} samples...")
print("This may take a while with a locally-run model...")

for idx in tqdm(range(start_idx, len(test_data)), desc="LLM Inference"):
    item = test_data[idx]
    
    try:
        # Get LLM prediction
        llm_result = llm_predict_stock_price(item['prompt'])
        
        # Store full LLM result
        llm_results.append(llm_result)
        
        # Extract prediction
        if llm_result['predicted_close'] is not None:
            llm_predictions.append(llm_result['predicted_close'])
        else:
            # Fallback: no usable LLM output, so reuse the ground-truth value
            # from the response (this sample will therefore show zero error)
            response = json.loads(item['response'])
            llm_predictions.append(response['predicted_close'])
        
        # Get actual price from response
        response = json.loads(item['response'])
        actual_prices.append(response['predicted_close'])
        
        # Small delay to avoid rate limiting (adjust based on your API limits)
        #time.sleep(0.5)

        # Checkpoint every 50 samples
        if (idx + 1) % 50 == 0:
            checkpoint = {
                'predictions': llm_predictions,
                'actual_prices': actual_prices,
                'llm_results': llm_results,
                'last_idx': idx
            }
            os.makedirs(RESULTS_DIR, exist_ok=True)
            with open(checkpoint_file, 'w') as f:
                json.dump(checkpoint, f, indent=2)
            backup_to_drive(checkpoint_file)  # Auto-backup
            print(f"\nCheckpoint saved at index {idx + 1}")
    
    except Exception as e:
        error_msg = str(e)
        
        # Handle rate limiting
        if 'rate_limit' in error_msg.lower() or 'too many requests' in error_msg.lower():
            print(f"\n❌ RATE LIMIT HIT at index {idx}!")
            print(f"Saving checkpoint and stopping execution...")
            
            # Save checkpoint
            checkpoint = {
                'predictions': llm_predictions,
                'actual_prices': actual_prices,
                'llm_results': llm_results,
                'last_idx': idx - 1
            }
            os.makedirs(RESULTS_DIR, exist_ok=True)
            with open(checkpoint_file, 'w') as f:
                json.dump(checkpoint, f, indent=2)
            backup_to_drive(checkpoint_file)  # Backup on error
            
            print(f"✅ Checkpoint saved to: {checkpoint_file}")
            print(f"📊 Progress: {idx}/{len(test_data)} samples completed")
            print(f"💡 Run this cell again to resume from where you left off.")
            break  # Stop execution
        else:
            print(f"\n⚠️ Error at index {idx}: {error_msg}")
            # Store error result
            error_result = {"predicted_close": None, "likelihood": 0.5, "justification": f"Error: {error_msg}"}
            llm_results.append(error_result)
            # Use fallback
            response = json.loads(item['response'])
            llm_predictions.append(response['predicted_close'])
            actual_prices.append(response['predicted_close'])

# Final save
checkpoint = {
    'predictions': llm_predictions,
    'actual_prices': actual_prices,
    'llm_results': llm_results,
    'last_idx': len(llm_predictions) - 1,
    'completed': len(llm_predictions) == len(test_data)
}
with open(checkpoint_file, 'w') as f:
    json.dump(checkpoint, f, indent=2)
backup_to_drive(checkpoint_file)  # Final backup

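# Merge with test_df (only possible when every test row has a prediction;
# assigning a shorter list to a DataFrame column raises ValueError)
if len(llm_predictions) == len(test_data):
    test_df['llm_prediction'] = llm_predictions
    test_df['actual_price'] = actual_prices
    print(f"\n✅ LLM predictions completed: {len(llm_predictions)} samples")
    print(f"Checkpoint saved to: {checkpoint_file}")
    print("\nSample predictions:")
    print(test_df[['ticker', 'llm_prediction', 'actual_price']].head())
else:
    print(f"\n⚠️ Partial completion: {len(llm_predictions)}/{len(test_data)} samples")
    print(f"Checkpoint saved to: {checkpoint_file}")
    print("Re-run this cell to finish before moving on to the PPO stages.")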

## 5. Stage 2: Risk-Aware PPO Environment Setup
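
The risk metrics defined in the next cell follow the historical-simulation convention: at a 95% confidence level, VaR is the 5th percentile of the return series, and CVaR is the mean of the returns at or below that threshold. Below is a small worked example that restates the two calculations on made-up returns. The numbers and the names `demo_returns`, `var_95`, and `cvar_95` are illustrative assumptions, not data from this notebook.

```python
import numpy as np

# Ten made-up daily returns, for illustration only
demo_returns = np.array(
    [-0.08, -0.03, -0.01, 0.0, 0.005, 0.01, 0.015, 0.02, 0.03, 0.04]
)

confidence_level = 0.95
# VaR: the (1 - confidence) percentile of returns (negative means a loss)
var_95 = np.percentile(demo_returns, (1 - confidence_level) * 100)
# CVaR: average of the returns that are at or below the VaR threshold
cvar_95 = demo_returns[demo_returns <= var_95].mean()

print(f"VaR(95%):  {var_95:.4f}")
print(f"CVaR(95%): {cvar_95:.4f}")
```

Here the percentile is interpolated between the two worst returns, so only the single worst return falls in the tail. The same tends to happen inside the environment, where a window of five prices yields just four returns, which makes CVaR a fairly coarse estimate there.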

In [None]:
# Financial Risk Metrics
def calculate_var(returns: np.ndarray, confidence_level: float = 0.95) -> float:
    """Calculate Value at Risk (VaR)"""
    if len(returns) == 0:
        return 0.0
    return np.percentile(returns, (1 - confidence_level) * 100)

def calculate_cvar(returns: np.ndarray, confidence_level: float = 0.95) -> float:
    """Calculate Conditional Value at Risk (CVaR) - Expected Shortfall"""
    if len(returns) == 0:
        return 0.0
    var = calculate_var(returns, confidence_level)
    # CVaR is the average of losses beyond VaR
    tail_losses = returns[returns <= var]
    if len(tail_losses) == 0:
        return var
    return np.mean(tail_losses)

def calculate_volatility(prices: np.ndarray) -> float:
    """Calculate price volatility (standard deviation of returns)"""
    if len(prices) < 2:
        return 0.0
    returns = np.diff(prices) / prices[:-1]
    return np.std(returns)

print("Risk metrics functions defined.")

In [None]:
# Custom Gym Environment for Stock Price Prediction with PPO
class StockPredictionEnv(gym.Env):
    """Custom Environment for Risk-Aware Stock Price Prediction"""
    
    def __init__(self, data_df: pd.DataFrame, window_size: int = 5):
        super(StockPredictionEnv, self).__init__()
        
        self.data = data_df.copy()
        self.window_size = window_size
        self.current_step = 0
        self.max_steps = len(self.data)
        
        # State: [llm_prediction, historical_prices (window), volatility, var]
        state_dim = 1 + window_size + 2  # llm_pred + window + vol + var
        
        # Action space: adjustment factor (continuous)
        self.action_space = spaces.Box(
            low=-0.1, high=0.1, shape=(1,), dtype=np.float32
        )
        
        # Observation space
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(state_dim,), dtype=np.float32
        )
        
        # Risk parameters
        self.lambda_risk = 0.5  # Risk penalty weight
        self.confidence_level = 0.95
        
    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.current_step = self.window_size
        return self._get_observation(), {}
    
    def _get_observation(self):
        """Construct state representation"""
        idx = self.current_step
        
        # LLM prediction
        llm_pred = self.data.iloc[idx]['llm_prediction']
        
        # Historical prices (window)
        if 'recent_prices' in self.data.columns and self.data.iloc[idx]['recent_prices'] is not None:
            # Take the most recent prices, matching the window used in step()
            hist_prices = list(self.data.iloc[idx]['recent_prices'])[-self.window_size:]
            # Pad if necessary
            if len(hist_prices) < self.window_size:
                hist_prices = hist_prices + [hist_prices[-1]] * (self.window_size - len(hist_prices))
            hist_prices = np.array(hist_prices[-self.window_size:])
        else:
            hist_prices = np.array([llm_pred] * self.window_size)
        
        # Volatility
        volatility = calculate_volatility(hist_prices)
        
        # VaR (using historical returns)
        returns = np.diff(hist_prices) / hist_prices[:-1] if len(hist_prices) > 1 else np.array([0.0])
        var = calculate_var(returns, self.confidence_level)
        
        # Combine state
        state = np.concatenate([
            [llm_pred],
            hist_prices,
            [volatility, var]
        ]).astype(np.float32)
        
        return state
    
    def step(self, action):
        """Execute one step"""
        idx = self.current_step
        
        # Get LLM prediction and actual price
        llm_pred = self.data.iloc[idx]['llm_prediction']
        actual_price = self.data.iloc[idx]['actual_price']
        
        # Apply action (adjustment)
        adjustment = action[0]
        adjusted_pred = llm_pred * (1 + adjustment)
        
        # Calculate prediction error
        pred_error = abs(adjusted_pred - actual_price)
        
        # Calculate risk penalty (using CVaR)
        if 'recent_prices' in self.data.columns and self.data.iloc[idx]['recent_prices'] is not None:
            hist_prices = np.array(self.data.iloc[idx]['recent_prices'][-self.window_size:])
            returns = np.diff(hist_prices) / hist_prices[:-1] if len(hist_prices) > 1 else np.array([0.0])
            cvar = abs(calculate_cvar(returns, self.confidence_level))
        else:
            cvar = 0.0
        
        # Reward function: -(|error| / actual_price) - lambda * |CVaR|
        reward = -(pred_error / actual_price) - self.lambda_risk * cvar
        
        # Move to next step
        self.current_step += 1
        terminated = self.current_step >= self.max_steps
        truncated = False
        
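        # Next observation. At episode end there is no next row (indexing it
        # would raise IndexError), so return a zero vector instead.
        if not terminated:
            next_state = self._get_observation()
        else:
            next_state = np.zeros(self.observation_space.shape, dtype=np.float32)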
        
        return next_state, reward, terminated, truncated, {}

print("Stock Prediction Environment defined.")

## 6. PPO Training on Training Data

Train the PPO agent on the training set to learn risk-aware adjustments to LLM predictions.
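
The reward that drives this training, as implemented in `StockPredictionEnv.step`, is the negative relative error of the adjusted prediction minus a CVaR penalty weighted by `lambda_risk`. The sketch below restates that formula as a standalone function on made-up numbers. `risk_aware_reward` is a hypothetical name introduced here for illustration.

```python
def risk_aware_reward(llm_pred, adjustment, actual_price, cvar, lambda_risk=0.5):
    """Reward = -(relative error of the adjusted prediction) - lambda * |CVaR|."""
    adjusted_pred = llm_pred * (1 + adjustment)
    relative_error = abs(adjusted_pred - actual_price) / actual_price
    return -relative_error - lambda_risk * abs(cvar)


# Made-up numbers: the LLM predicts 100, the true close is 103, CVaR is -4%
no_adjust = risk_aware_reward(100.0, 0.00, 103.0, cvar=-0.04)
nudged_up = risk_aware_reward(100.0, 0.02, 103.0, cvar=-0.04)
print(f"no adjustment: {no_adjust:.4f}, +2% adjustment: {nudged_up:.4f}")
```

The CVaR term is identical in both calls because it depends only on the price history, not on the action. The agent can therefore raise its reward only by shrinking the relative error.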

In [None]:
# Prepare training data for PPO using training set with LLM predictions
train_parsed = []
for idx, item in enumerate(train_data):
    parsed = parse_prompt_data(item['prompt'])
    
    # Use LLM prediction we generated
    if idx < len(train_llm_predictions):
        parsed['llm_prediction'] = train_llm_predictions[idx]
        parsed['actual_price'] = train_actual_prices[idx]
    else:
        # Fallback if somehow we don't have LLM prediction
        response = json.loads(item['response'])
        parsed['llm_prediction'] = response['predicted_close']
        parsed['actual_price'] = response['predicted_close']
    
    response = json.loads(item['response'])
    parsed['likelihood'] = response.get('likelihood', 0.5)
    train_parsed.append(parsed)

train_df_ppo = pd.DataFrame(train_parsed)
print(f"Training data prepared for PPO training: {len(train_df_ppo)} samples")
print(f"With LLM predictions: {sum(train_df_ppo['llm_prediction'].notna())} samples")
train_df_ppo.head()

In [None]:
# Create and train PPO model
print("Creating PPO training environment...")

# Create environment using TRAINING data (more samples = better RL learning)
env = StockPredictionEnv(train_df_ppo, window_size=5)

# Initialize PPO agent
print("\nInitializing PPO agent...")
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    clip_range=0.2,
    ent_coef=0.01,
    verbose=1
)

# Train PPO model on training data
print("\nTraining PPO model on training data...")
print(f"Training samples: {len(train_df_ppo)}")
print("This may take several minutes...")

# Adjust total_timesteps based on training data size
# Using more timesteps for larger training set
total_timesteps = min(200000, len(train_df_ppo) * 20)
print(f"Total timesteps: {total_timesteps}")

model.learn(total_timesteps=total_timesteps)

print("\n✅ PPO training completed!")

### 6.1 (Optional) Validate PPO on Validation Set

Before applying to test data, optionally evaluate PPO performance on validation data.
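
The comparison in the next cell comes down to the mean absolute error (MAE) before and after adjustment, plus the relative improvement. Below is a minimal sketch on made-up numbers. The helper names `mae` and `improvement_pct` are hypothetical, and the zero-baseline guard is an addition that the notebook cell does not have.

```python
import numpy as np


def mae(pred, actual):
    """Mean absolute error between two equal-length sequences."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))


def improvement_pct(baseline_mae, new_mae):
    """Percentage reduction in MAE relative to the baseline (positive is better)."""
    if baseline_mae == 0:
        return 0.0  # nothing to improve on; avoids division by zero
    return (baseline_mae - new_mae) / baseline_mae * 100


# Made-up prices, for illustration only
actual = [100.0, 102.0, 101.0]
llm = [98.0, 104.0, 101.0]       # absolute errors: 2, 2, 0
adjusted = [99.0, 103.0, 101.0]  # absolute errors: 1, 1, 0
gain = improvement_pct(mae(llm, actual), mae(adjusted, actual))
print(f"Improvement: {gain:.2f}%")
```

A negative value would mean the PPO adjustment made the predictions worse on this set, which is worth checking before moving on to the test data.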

In [None]:
# Optional: Prepare and evaluate on validation data
val_parsed = []
for idx, item in enumerate(val_data):
    parsed = parse_prompt_data(item['prompt'])
    
    # Use LLM prediction we generated
    if idx < len(val_llm_predictions):
        parsed['llm_prediction'] = val_llm_predictions[idx]
        parsed['actual_price'] = val_actual_prices[idx]
    else:
        # Fallback if somehow we don't have LLM prediction
        response = json.loads(item['response'])
        parsed['llm_prediction'] = response['predicted_close']
        parsed['actual_price'] = response['predicted_close']
    
    response = json.loads(item['response'])
    parsed['likelihood'] = response.get('likelihood', 0.5)
    val_parsed.append(parsed)

val_df = pd.DataFrame(val_parsed)

# Apply PPO to validation set
val_env = StockPredictionEnv(val_df, window_size=5)
val_obs, _ = val_env.reset()

val_ppo_predictions = []
for idx in range(len(val_df)):
    if idx < val_env.window_size:
        val_ppo_predictions.append(val_df.iloc[idx]['llm_prediction'])
        continue
    
    action, _ = model.predict(val_obs, deterministic=True)
    llm_pred = val_df.iloc[idx]['llm_prediction']
    adjusted_pred = llm_pred * (1 + action[0])
    val_ppo_predictions.append(adjusted_pred)
    
    if idx < len(val_df) - 1:
        val_obs, _, terminated, _, _ = val_env.step(action)
        if terminated:
            break

val_df['ppo_adjusted_prediction'] = val_ppo_predictions

# Quick validation metrics
val_llm_mae = np.mean(np.abs(val_df['llm_prediction'] - val_df['actual_price']))
val_ppo_mae = np.mean(np.abs(val_df['ppo_adjusted_prediction'] - val_df['actual_price']))

print(f"\nValidation Set Results:")
print(f"LLM MAE: {val_llm_mae:.4f}")
print(f"LLM-PPO MAE: {val_ppo_mae:.4f}")
print(f"Improvement: {((val_llm_mae - val_ppo_mae) / val_llm_mae * 100):.2f}%")
print("\n✅ Validation complete! Proceeding to test set...")

## 7. Apply PPO Adjustments to Test Set

In [None]:
# Apply PPO adjustments to test predictions
def apply_ppo_adjustment(model, test_df):
    """Apply trained PPO model to adjust predictions"""
    adjusted_predictions = []
    
    env = StockPredictionEnv(test_df, window_size=5)
    obs, _ = env.reset()
    
    for idx in range(len(test_df)):
        if idx < env.window_size:
            # For early samples, use LLM prediction as-is
            adjusted_predictions.append(test_df.iloc[idx]['llm_prediction'])
            continue
        
        # Get PPO action
        action, _ = model.predict(obs, deterministic=True)
        
        # Apply adjustment
        llm_pred = test_df.iloc[idx]['llm_prediction']
        adjusted_pred = llm_pred * (1 + action[0])
        adjusted_predictions.append(adjusted_pred)
        
        # Step environment
        if idx < len(test_df) - 1:
            obs, _, terminated, _, _ = env.step(action)
            if terminated:
                break
    
    return adjusted_predictions

print("Applying PPO adjustments to test set...")
test_df['ppo_adjusted_prediction'] = apply_ppo_adjustment(model, test_df)
print("PPO adjustments applied!")

# Display results
test_df[['ticker', 'llm_prediction', 'ppo_adjusted_prediction', 'actual_price']].head(10)

## 8. Baseline Models Implementation (COMMENTED OUT - Only using LLM and LLM-PPO)

<!-- Baseline models (SVR, XGBoost, LSTM) are commented out to focus on LLM and LLM-PPO comparison -->

In [None]:
# # Prepare features from all_labels for baseline models
# # Filter for test period (last 30% of data)
# all_labels['Date'] = pd.to_datetime(all_labels['Date'])
# all_labels = all_labels.sort_values('Date')

# # Create feature set
# feature_cols = ['SMA_20', 'SMA_50', 'EMA_12', 'EMA_26', 'RSI_14', 
#                 'MACD', 'MACD_signal', 'MACD_hist', 'BB_width_20_2',
#                 'headline_count', 'sent_compound_mean']

# # Fill NaN values
# all_labels[feature_cols] = all_labels[feature_cols].fillna(0)

# # Split by date (70% train, 30% test)
# train_size = int(len(all_labels) * 0.7)
# train_labels = all_labels.iloc[:train_size]
# test_labels = all_labels.iloc[train_size:]

# X_train = train_labels[feature_cols].values
# y_train = train_labels['next_close'].values
# X_test = test_labels[feature_cols].values
# y_test = test_labels['next_close'].values

# # Standardize features
# scaler = StandardScaler()
# X_train_scaled = scaler.fit_transform(X_train)
# X_test_scaled = scaler.transform(X_test)

# print(f"Training set: {X_train.shape}")
# print(f"Test set: {X_test.shape}")

print("Baseline models commented out - only using LLM and LLM-PPO")

In [None]:
# # Train SVR model
# print("Training SVR model...")
# svr_model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
# svr_model.fit(X_train_scaled, y_train)
# svr_predictions = svr_model.predict(X_test_scaled)
# print("SVR training completed!")

print("SVR model commented out")

In [None]:
# # Train XGBoost model
# print("Training XGBoost model...")
# xgb_model = XGBRegressor(
#     n_estimators=100,
#     learning_rate=0.1,
#     max_depth=5,
#     random_state=42
# )
# xgb_model.fit(X_train_scaled, y_train)
# xgb_predictions = xgb_model.predict(X_test_scaled)
# print("XGBoost training completed!")

print("XGBoost model commented out")

In [None]:
# # Build LSTM model
# print("Building and training LSTM model...")

# # Reshape data for LSTM (samples, timesteps, features)
# X_train_lstm = X_train_scaled.reshape((X_train_scaled.shape[0], 1, X_train_scaled.shape[1]))
# X_test_lstm = X_test_scaled.reshape((X_test_scaled.shape[0], 1, X_test_scaled.shape[1]))

# # Build LSTM model
# lstm_model = Sequential([
#     LSTM(50, activation='relu', input_shape=(1, X_train_scaled.shape[1])),
#     Dense(25, activation='relu'),
#     Dense(1)
# ])

# lstm_model.compile(optimizer='adam', loss='mse')

# # Train LSTM
# history = lstm_model.fit(
#     X_train_lstm, 
#     y_train,
#     epochs=50,
#     batch_size=32,
#     validation_split=0.1,
#     verbose=0
# )

# lstm_predictions = lstm_model.predict(X_test_lstm).flatten()
# print("LSTM training completed!")

print("LSTM model commented out")

## 9. Evaluation Metrics Implementation
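
The metric functions in the next cell correspond to the definitions below, written out as a reading aid. The notation is an assumption introduced for this note: $y_t$ is the actual price, $\hat{y}_t$ the predicted price, $p_t$ the price series passed to a function, $r_f$ the `risk_free_rate` argument, and $\sigma(\cdot)$ the population standard deviation that `np.std` computes by default.

$$
\begin{aligned}
\text{MAPE} &= \frac{100}{|M|}\sum_{t\in M}\left|\frac{y_t-\hat{y}_t}{y_t}\right|, \qquad M=\{t : y_t\neq 0\},\\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2},\\
r_t &= \frac{p_{t+1}-p_t}{p_t},\\
\text{Sharpe} &= \frac{\operatorname{mean}(r_t-r_f)}{\sigma(r_t)},\\
\text{Sortino} &= \frac{\operatorname{mean}(r_t-r_f)}{\sigma(\{r_t : r_t<0\})},\\
\text{MDD} &= \min_t \frac{p_t-\max_{s\le t}p_s}{\max_{s\le t}p_s},\\
\text{CR} &= \frac{p_T-p_1}{p_1}.
\end{aligned}
$$

Both ratios are per-period values (no annualization factor is applied), and both return 0.0 when the denominator is zero. The Sortino denominator here is the standard deviation of the negative returns, which differs from the root-mean-square downside deviation used in some references.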

In [None]:
# Evaluation metric functions
def calculate_mape(y_true, y_pred):
    """Calculate Mean Absolute Percentage Error (in percent).

    Rows where the actual value is zero are skipped to avoid division by zero.
    """
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

def calculate_rmse(y_true, y_pred):
    """Calculate Root Mean Square Error"""
    return np.sqrt(mean_squared_error(y_true, y_pred))

def calculate_returns(prices):
    """Calculate returns from prices"""
    prices = np.array(prices)
    return np.diff(prices) / prices[:-1]

def calculate_sharpe_ratio(returns, risk_free_rate=0.0):
    """Calculate the per-period Sharpe Ratio (not annualized); returns 0.0 if volatility is zero"""
    excess_returns = returns - risk_free_rate
    if np.std(returns) == 0:
        return 0.0
    return np.mean(excess_returns) / np.std(returns)

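def calculate_sortino_ratio(returns, risk_free_rate=0.0):
    """Calculate the per-period Sortino Ratio (not annualized).

    Downside risk is the standard deviation of the negative returns;
    returns 0.0 if there are no negative returns or their deviation is zero.
    """
    excess_returns = returns - risk_free_rate
    downside_returns = returns[returns < 0]
    if len(downside_returns) == 0 or np.std(downside_returns) == 0:
        return 0.0
    return np.mean(excess_returns) / np.std(downside_returns)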

def calculate_max_drawdown(prices):
    """Calculate Maximum Drawdown"""
    prices = np.array(prices)
    cummax = np.maximum.accumulate(prices)
    drawdowns = (prices - cummax) / cummax
    return np.min(drawdowns)

def calculate_cumulative_return(prices):
    """Calculate Cumulative Return"""
    prices = np.array(prices)
    return (prices[-1] - prices[0]) / prices[0]

print("Evaluation metrics defined.")
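
As a quick sanity check of the price-based metrics above, the sketch below repeats the same NumPy expressions on a four-point toy series. The prices are made up for illustration and are not taken from the dataset; the variable names are local to this example.

```python
import numpy as np

# Toy price path (illustrative values only): up 10%, down 10%, partial recovery.
prices = np.array([100.0, 110.0, 99.0, 104.5])

# Simple period-over-period returns, as in calculate_returns.
returns = np.diff(prices) / prices[:-1]

# Running peak and drawdown from that peak, as in calculate_max_drawdown.
running_peak = np.maximum.accumulate(prices)        # 100, 110, 110, 110
drawdowns = (prices - running_peak) / running_peak  # 0, 0, about -0.10, about -0.05
max_drawdown = drawdowns.min()

# Total change from first to last price, as in calculate_cumulative_return.
cumulative_return = (prices[-1] - prices[0]) / prices[0]

print("returns:", np.round(returns, 4))
print("max drawdown:", round(float(max_drawdown), 4))
print("cumulative return:", round(float(cumulative_return), 4))
```

The maximum drawdown is negative by construction, which is why the later plot labels it "Closer to 0 is Better".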

In [None]:
# Evaluate all models by ticker
def evaluate_model_by_ticker(predictions, actual_prices, test_labels):
    """Evaluate model performance for each ticker"""
    results = {}
    
    for ticker in test_labels['ticker'].unique():
        # Convert to a plain boolean array so the mask is applied by position
        ticker_mask = (test_labels['ticker'] == ticker).to_numpy()
        ticker_pred = predictions[ticker_mask]
        ticker_actual = actual_prices[ticker_mask]
        
        # Calculate metrics
        mape = calculate_mape(ticker_actual, ticker_pred)
        rmse = calculate_rmse(ticker_actual, ticker_pred)
        
        # Returns-based metrics, computed on the predicted price path
        # (not on the realized returns of a trading strategy)
        returns = calculate_returns(ticker_pred)
        sharpe = calculate_sharpe_ratio(returns)
        sortino = calculate_sortino_ratio(returns)
        max_dd = calculate_max_drawdown(ticker_pred)
        cum_return = calculate_cumulative_return(ticker_pred)
        
        results[ticker] = {
            'MAPE': mape,
            'RMSE': rmse,
            'Sharpe Ratio': sharpe,
            'Sortino Ratio': sortino,
            'Max Drawdown': max_dd,
            'Cumulative Return': cum_return
        }
    
    return results

print("Model evaluation function defined.")
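
The mask-based loop above relies on `predictions`, `actual_prices`, and `test_labels` sharing the same row order. An alternative sketch, assuming predictions are stored in the same DataFrame as the tickers (as `test_df` is in this notebook), groups the rows directly. The helper `per_ticker_errors` and the toy frame are hypothetical and cover only the two error metrics.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names mirror test_df.
toy = pd.DataFrame({
    "ticker": ["A", "A", "B", "B"],
    "actual_price": [100.0, 200.0, 50.0, 50.0],
    "llm_prediction": [110.0, 190.0, 50.0, 55.0],
})

def per_ticker_errors(df, pred_col, actual_col="actual_price"):
    """Return a DataFrame of MAPE (%) and RMSE with one row per ticker."""
    rows = {}
    for ticker, group in df.groupby("ticker"):
        actual = group[actual_col].to_numpy()
        pred = group[pred_col].to_numpy()
        rows[ticker] = {
            "MAPE": float(np.mean(np.abs((actual - pred) / actual)) * 100),
            "RMSE": float(np.sqrt(np.mean((actual - pred) ** 2))),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

toy_errors = per_ticker_errors(toy, "llm_prediction")
print(toy_errors.round(4))
```

Because each group carries its own prediction and actual columns, the rows cannot drift out of alignment when the frame is filtered or re-sorted.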

## 10. Results Comparison and Analysis (LLM vs LLM-PPO)

In [None]:
# Compile all model predictions
models_results = {}

# # Baseline models (COMMENTED OUT)
# models_results['SVR'] = evaluate_model_by_ticker(svr_predictions, y_test, test_labels)
# models_results['XGBoost'] = evaluate_model_by_ticker(xgb_predictions, y_test, test_labels)
# models_results['LSTM'] = evaluate_model_by_ticker(lstm_predictions, y_test, test_labels)

# For LLM and LLM-PPO, we need to evaluate from test_df
# Evaluate LLM predictions
if 'llm_prediction' in test_df.columns:
    llm_predictions = test_df['llm_prediction'].values
    actual_prices = test_df['actual_price'].values
    models_results['LLM'] = evaluate_model_by_ticker(llm_predictions, actual_prices, test_df)

# Evaluate LLM-PPO predictions
if 'ppo_adjusted_prediction' in test_df.columns:
    ppo_predictions = test_df['ppo_adjusted_prediction'].values
    actual_prices = test_df['actual_price'].values
    models_results['LLM-PPO'] = evaluate_model_by_ticker(ppo_predictions, actual_prices, test_df)

print("Model evaluation completed!")
print(f"\nNumber of models evaluated: {len(models_results)}")
print(f"Models: {list(models_results.keys())}")

In [None]:
# Create comparison table
def create_comparison_table(models_results):
    """Create a comprehensive comparison table"""
    comparison_data = []
    
    for model_name, ticker_results in models_results.items():
        for ticker, metrics in ticker_results.items():
            row = {
                'Model': model_name,
                'Ticker': ticker,
                **metrics
            }
            comparison_data.append(row)
    
    return pd.DataFrame(comparison_data)

comparison_df = create_comparison_table(models_results)
print("\nModel Comparison Results:")
comparison_df

In [None]:
# Calculate average metrics across all tickers
avg_metrics = comparison_df.groupby('Model')[['MAPE', 'RMSE', 'Sharpe Ratio', 
                                                'Sortino Ratio', 'Max Drawdown', 
                                                'Cumulative Return']].mean()

print("\nAverage Performance Across All Tickers:")
avg_metrics.round(4)
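
To read the averaged table as a before/after comparison, one option is the relative change of LLM-PPO against the plain LLM for the error metrics. The sketch below assumes a frame shaped like `avg_metrics` (indexed by `Model`); the numbers are placeholders rather than results from this notebook, and `relative_improvement` is a hypothetical helper.

```python
import pandas as pd

# Placeholder averages, NOT real results; same layout as avg_metrics.
toy_avg = pd.DataFrame(
    {"MAPE": [2.0, 1.5], "RMSE": [4.0, 3.0]},
    index=pd.Index(["LLM", "LLM-PPO"], name="Model"),
)

def relative_improvement(avg, baseline="LLM", candidate="LLM-PPO",
                         cols=("MAPE", "RMSE")):
    """Percent reduction of each error metric from baseline to candidate."""
    base = avg.loc[baseline, list(cols)]
    cand = avg.loc[candidate, list(cols)]
    return (base - cand) / base * 100.0

improvement = relative_improvement(toy_avg)
print(improvement.round(2))
```

A positive value means the candidate has a lower error than the baseline; the same helper can be called on `avg_metrics` once both models have been evaluated.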

## 11. Visualizations

In [None]:
# Plot MAPE comparison
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
comparison_df_pivot = comparison_df.pivot(index='Ticker', columns='Model', values='MAPE')
comparison_df_pivot.plot(kind='bar', ax=plt.gca())
plt.title('MAPE Comparison by Ticker', fontsize=14, fontweight='bold')
plt.xlabel('Ticker')
plt.ylabel('MAPE (%)')
plt.legend(title='Model', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

plt.subplot(1, 2, 2)
avg_metrics['MAPE'].plot(kind='bar', color='steelblue')
plt.title('Average MAPE Across All Tickers', fontsize=14, fontweight='bold')
plt.xlabel('Model')
plt.ylabel('MAPE (%)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
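
The grouped bar chart above depends on `DataFrame.pivot` turning the long comparison table into one row per ticker and one column per model. The sketch below shows that reshape on a toy table (values are placeholders) and the `ValueError` that `pivot` raises if a ticker/model pair appears twice, for example after re-running the evaluation cell and appending duplicate rows.

```python
import pandas as pd

# Long-format toy table with placeholder values, shaped like comparison_df.
long_df = pd.DataFrame({
    "Model": ["LLM", "LLM", "LLM-PPO", "LLM-PPO"],
    "Ticker": ["AAPL", "PEP", "AAPL", "PEP"],
    "MAPE": [2.0, 3.0, 1.5, 2.5],
})

# One row per ticker, one column per model.
wide_df = long_df.pivot(index="Ticker", columns="Model", values="MAPE")
print(wide_df)

# A duplicated (Ticker, Model) pair makes the reshape ambiguous.
dup_df = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dup_df.pivot(index="Ticker", columns="Model", values="MAPE")
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print("pivot on duplicates failed:", duplicate_error is not None)
```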

In [None]:
# Plot RMSE comparison
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
comparison_df_pivot = comparison_df.pivot(index='Ticker', columns='Model', values='RMSE')
comparison_df_pivot.plot(kind='bar', ax=plt.gca())
plt.title('RMSE Comparison by Ticker', fontsize=14, fontweight='bold')
plt.xlabel('Ticker')
plt.ylabel('RMSE')
plt.legend(title='Model', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

plt.subplot(1, 2, 2)
avg_metrics['RMSE'].plot(kind='bar', color='coral')
plt.title('Average RMSE Across All Tickers', fontsize=14, fontweight='bold')
plt.xlabel('Model')
plt.ylabel('RMSE')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Plot risk-adjusted metrics
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Sharpe Ratio
avg_metrics['Sharpe Ratio'].plot(kind='bar', ax=axes[0, 0], color='green', alpha=0.7)
axes[0, 0].set_title('Sharpe Ratio (Higher is Better)', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Model')
axes[0, 0].set_ylabel('Sharpe Ratio')
axes[0, 0].tick_params(axis='x', rotation=45)
axes[0, 0].grid(axis='y', alpha=0.3)

# Sortino Ratio
avg_metrics['Sortino Ratio'].plot(kind='bar', ax=axes[0, 1], color='blue', alpha=0.7)
axes[0, 1].set_title('Sortino Ratio (Higher is Better)', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Model')
axes[0, 1].set_ylabel('Sortino Ratio')
axes[0, 1].tick_params(axis='x', rotation=45)
axes[0, 1].grid(axis='y', alpha=0.3)

# Maximum Drawdown
avg_metrics['Max Drawdown'].plot(kind='bar', ax=axes[1, 0], color='red', alpha=0.7)
axes[1, 0].set_title('Maximum Drawdown (Closer to 0 is Better)', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Model')
axes[1, 0].set_ylabel('Max Drawdown')
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].grid(axis='y', alpha=0.3)

# Cumulative Return
avg_metrics['Cumulative Return'].plot(kind='bar', ax=axes[1, 1], color='purple', alpha=0.7)
axes[1, 1].set_title('Cumulative Return (Higher is Better)', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Model')
axes[1, 1].set_ylabel('Cumulative Return')
axes[1, 1].tick_params(axis='x', rotation=45)
axes[1, 1].grid(axis='y', alpha=0.3)

plt.suptitle('Risk-Adjusted Performance Metrics', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()

In [None]:
# Sample prediction visualization (if test_df available)
if 'ticker' in test_df.columns:
    # Select one ticker for detailed visualization
    sample_ticker = test_df['ticker'].iloc[0]
    ticker_data = test_df[test_df['ticker'] == sample_ticker].head(50)
    
    plt.figure(figsize=(15, 6))
    
    x = range(len(ticker_data))
    plt.plot(x, ticker_data['actual_price'].values, 'ko-', label='Actual Price', linewidth=2, markersize=6)
    plt.plot(x, ticker_data['llm_prediction'].values, 'bs--', label='LLM Prediction', linewidth=1.5, markersize=5, alpha=0.7)
    
    if 'ppo_adjusted_prediction' in ticker_data.columns:
        plt.plot(x, ticker_data['ppo_adjusted_prediction'].values, 'r^--', label='LLM-PPO Prediction', linewidth=1.5, markersize=5, alpha=0.7)
    
    plt.title(f'Stock Price Predictions for {sample_ticker} (First 50 Test Samples)', fontsize=14, fontweight='bold')
    plt.xlabel('Sample Index')
    plt.ylabel('Stock Price')
    plt.legend(loc='best')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

## 12. Key Findings and Summary

In [None]:
# Summary statistics
print("="*80)
print("SUMMARY OF RESULTS")
print("="*80)

print("\n1. PREDICTION ACCURACY (Lower is Better)")
print("-" * 80)
accuracy_summary = avg_metrics[['MAPE', 'RMSE']].round(4)
print(accuracy_summary)

print("\n2. RISK-ADJUSTED RETURNS (Higher is Better for Ratios)")
print("-" * 80)
risk_summary = avg_metrics[['Sharpe Ratio', 'Sortino Ratio']].round(4)
print(risk_summary)

print("\n3. DRAWDOWN AND CUMULATIVE RETURN")
print("-" * 80)
drawdown_summary = avg_metrics[['Max Drawdown', 'Cumulative Return']].round(4)
print(drawdown_summary)

print("\n" + "="*80)
print("CONCLUSION")
print("="*80)
print("""
The two-stage LLM-PPO framework aims to:
1. Generate initial predictions using LLM with historical data and sentiment
2. Refine predictions using PPO with risk-aware adjustments (VaR, CVaR)

Intended Benefits:
- Incorporates both market data and qualitative information (news sentiment)
- Balances prediction accuracy with financial risk management
- Aims for more stable predictions than pure ML/DL approaches
  (the SVR, XGBoost, and LSTM baselines are commented out in this notebook)
- Targets better risk-adjusted returns through the CVaR-based reward function

The framework demonstrates the potential of combining LLMs with reinforcement
learning for robust financial forecasting in uncertain market environments.
""")

## 13. Save Results

In [None]:
# Save comparison results
output_dir = '../results'
os.makedirs(output_dir, exist_ok=True)

# Save comparison table
comparison_df.to_csv(f'{output_dir}/model_comparison_results.csv', index=False)
print(f"Comparison results saved to {output_dir}/model_comparison_results.csv")

# Save average metrics
avg_metrics.to_csv(f'{output_dir}/average_metrics.csv')
print(f"Average metrics saved to {output_dir}/average_metrics.csv")

# Save PPO model
model.save(f'{output_dir}/ppo_stock_prediction_model')
print(f"PPO model saved to {output_dir}/ppo_stock_prediction_model")

# Save test predictions
if 'ppo_adjusted_prediction' in test_df.columns:
    test_df.to_csv(f'{output_dir}/test_predictions.csv', index=False)
    print(f"Test predictions saved to {output_dir}/test_predictions.csv")

print("\nAll results saved successfully!")
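
When the saved files are read back later, note that `avg_metrics.to_csv(...)` writes the `Model` index as the first column, while the comparison table is written with `index=False`. The sketch below shows the round trip for the indexed case on a toy frame in a temporary directory; the values are placeholders.

```python
import os
import tempfile

import pandas as pd

# Placeholder frame with the same layout as avg_metrics (indexed by Model).
toy_avg = pd.DataFrame(
    {"MAPE": [2.0, 1.5]},
    index=pd.Index(["LLM", "LLM-PPO"], name="Model"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "average_metrics.csv")
    toy_avg.to_csv(path)                       # index is written as a column
    reloaded = pd.read_csv(path, index_col="Model")  # restore it as the index

print(reloaded)
```

If `model` is a Stable-Baselines3 `PPO` instance (an assumption, since the class is not shown in this section), `PPO.load(path)` is the usual counterpart of `model.save(path)`.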

## 14. Next Steps and Extensions

### Potential Improvements:
1. **Fine-tune LLM**: Fine-tune the Llama model on financial data for better domain-specific predictions
2. **Enhanced PPO**: Experiment with different reward functions and hyperparameters
3. **More Baselines**: Implement TCN (Temporal Convolutional Network) for comparison
4. **Real-time Prediction**: Adapt the framework for real-time stock prediction
5. **Portfolio Optimization**: Extend to multi-stock portfolio management
6. **Risk Metrics**: Incorporate additional risk metrics (CVaR at different confidence levels)
7. **Ensemble Methods**: Combine multiple models for more robust predictions
8. **Market Regime Detection**: Adapt strategy based on market conditions (bull/bear markets)
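
For item 6, a minimal sketch of historical (empirical) VaR and CVaR at several confidence levels is given below. It is not the reward computation used in this notebook's PPO environment; the function name, the loss sign convention (losses are negated returns), and the simulated returns are all assumptions made for illustration.

```python
import numpy as np

def historical_var_cvar(returns, alpha=0.95):
    """Empirical VaR and CVaR of the loss distribution at confidence alpha.

    Losses are defined as negated returns, so both outputs are positive
    numbers when the tail contains losses.
    """
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)         # loss exceeded with prob. ~(1 - alpha)
    cvar = losses[losses >= var].mean()      # average loss in that tail
    return float(var), float(cvar)

# Simulated daily returns (illustrative only, seeded for repeatability).
rng = np.random.default_rng(0)
simulated_returns = rng.normal(loc=0.0005, scale=0.02, size=1000)

risk_by_level = {
    alpha: historical_var_cvar(simulated_returns, alpha)
    for alpha in (0.90, 0.95, 0.99)
}
for alpha, (var, cvar) in risk_by_level.items():
    print(f"alpha={alpha:.2f}  VaR={var:.4f}  CVaR={cvar:.4f}")
```

CVaR is never smaller than VaR at the same level, and both grow as the confidence level rises, so comparing levels shows how heavy the loss tail is.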

### Research Directions:
- Study the interpretability of LLM predictions
- Analyze the impact of different sentiment sources
- Investigate transfer learning across different stocks
- Explore attention mechanisms in the PPO policy network

---

## 📝 Notes for Cloud Deployment

### Google Colab Setup:
1. Enable GPU: Runtime → Change runtime type → GPU
2. Run all cells in order
3. Data will be saved to `/content/` directory
4. Download results before session ends

### Kaggle Setup:
1. Enable GPU: Settings → Accelerator → GPU T4 x2
2. Upload data as a dataset or use `!git clone`
3. Data saved to `/kaggle/working/`
4. Results persist after run completion

### Troubleshooting:
- **Out of memory**: Restart runtime, clear cache with `torch.cuda.empty_cache()`
- **Model download fails**: Check HuggingFace authentication
- **Slow inference**: Verify GPU is enabled with `torch.cuda.is_available()`
- **Data not found**: Check DATA_PATH variable matches your setup
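
The GPU and memory checks above can be combined into one small diagnostic cell. The sketch below is one way to do it: `gpu_report` and `free_cached_memory` are hypothetical helper names, and the import is guarded so the cell also runs where PyTorch is not installed.

```python
import gc

try:
    import torch
except ImportError:  # PyTorch not installed in this environment
    torch = None

def gpu_report():
    """Summarize whether CUDA is usable and how much VRAM the first GPU has."""
    if torch is None:
        return {"torch_installed": False, "cuda_available": False}
    report = {
        "torch_installed": True,
        "cuda_available": bool(torch.cuda.is_available()),
    }
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["device_name"] = props.name
        report["total_vram_gb"] = round(props.total_memory / 1024**3, 2)
        report["allocated_gb"] = round(torch.cuda.memory_allocated(0) / 1024**3, 2)
    return report

def free_cached_memory():
    """Drop unreferenced Python objects, then release PyTorch's cached GPU blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()

free_cached_memory()
print(gpu_report())
```

`torch.cuda.empty_cache()` only returns cached memory that is no longer referenced, so tensors and models still held by notebook variables need to be deleted first.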

### Performance Expectations:
- **Model loading**: 2-5 minutes (first time)
- **Per prediction**: 2-5 seconds on GPU
- **Total inference time**: 2-6 hours for all data
- **Memory usage**: 6-8 GB VRAM (with 4-bit quantization)
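
The per-prediction and total figures above are linked by simple arithmetic: total time is the number of prompts multiplied by the seconds per prediction. The sketch below makes that explicit; the count of 3600 predictions is a hypothetical placeholder, to be replaced with the real number of rows (for example `len(test_df)`).

```python
def estimate_inference_hours(n_predictions, seconds_per_prediction):
    """Estimated wall-clock hours for sequential LLM inference."""
    return n_predictions * seconds_per_prediction / 3600.0

n_predictions = 3600  # hypothetical sample count, not taken from the dataset

low_hours = estimate_inference_hours(n_predictions, 2)   # fast end: 2 s each
high_hours = estimate_inference_hours(n_predictions, 5)  # slow end: 5 s each

print(f"Estimated inference time: {low_hours:.1f} to {high_hours:.1f} hours")
```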