
MAIS — Multimodal Agentic Investment System

A multi-agent AI system for stock analysis using 5 specialized agents, multi-round debate, ML ensemble predictions, and reinforcement learning paper trading validation.



The Problem

Single-Perspective Bias — Traditional stock analysis tools provide a single viewpoint. A technical analyst ignores fundamentals. A sentiment analyst ignores chart patterns. Real investment decisions require synthesizing multiple perspectives, much as investment committees at hedge funds do.

Black Box Predictions — ML models that predict "BUY" or "SELL" without explanation are useless for real decisions. You need to understand why the system recommends something.

Unvalidated Recommendations — Most systems tell you what to do but don't test if their advice actually works.

Data Overload — Investors face information overload: SEC filings, news, social media, charts, economic indicators—scattered across dozens of sources.

The Solution

MAIS solves all four problems with a multi-layered AI pipeline:

  1. 5 Specialized AI Agents — each fetches real data and analyzes independently using Claude AI
  2. 4-Round Structured Debate — agents challenge each other's reasoning, just like a real investment committee
  3. 5-Model ML Ensemble — LSTM, Transformer, XGBoost, Random Forest, and Prophet predict price direction
  4. RL Paper Trading Validator — a PPO reinforcement learning agent paper-trades the recommendation to verify it works
  5. Professional Reports — PDF reports with charts, full reasoning, and audit trails

Input: A stock ticker (e.g., AAPL)

Output: A BUY/HOLD/SELL recommendation with confidence score, 5 agent analyses, debate transcript, ML predictions, RL paper trading results, risk metrics, and a PDF report with charts.

Disclaimer: MAIS is for educational and research purposes only. It does not constitute financial advice. Always consult a licensed financial advisor before making investment decisions.


Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         FRONTEND (Next.js 15)                          │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐  │
│  │  Analysis   │ │   Paper     │ │  Training   │ │  Reports + Risk │  │
│  │  Dashboard  │ │  Trading    │ │   Console   │ │    Dashboard    │  │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────────┘  │
├─────────────────────────────────────────────────────────────────────────┤
│                         API LAYER (FastAPI)                             │
│  /analysis    /paper-trading    /training    /reports    /risk          │
├─────────────────────────────────────────────────────────────────────────┤
│                      ORCHESTRATION LAYER                               │
│   Master Orchestrator: Agents → Debate → ML → RL → Blending → Report  │
├──────────┬──────────┬──────────┬──────────┬──────────┬─────────────────┤
│  AGENT   │  DEBATE  │    ML    │    RL    │   RISK   │    OUTPUT       │
│  LAYER   │  ENGINE  │ ENSEMBLE │ VALIDATOR│  ENGINE  │    LAYER        │
├──────────┼──────────┼──────────┼──────────┼──────────┼─────────────────┤
│ Trend    │ Round 1: │ LSTM     │ PPO      │ VaR/CVaR │ PDF Reports     │
│ Sentiment│ Positions│ Trans-   │ Agent    │ Stress   │ Charts          │
│ Risk     │ Round 2: │ former   │          │ Testing  │ Memos           │
│ Fundmtls │ Counters │ XGBoost  │ Gym Env  │ Beta     │ Audit Logs      │
│ Technical│ Round 3: │ R.Forest │          │ Drawdown │                 │
│          │ Evidence │ Prophet  │          │          │                 │
│          │ Round 4: │          │          │          │                 │
│          │ Voting   │          │          │          │                 │
├──────────┴──────────┴──────────┴──────────┴──────────┴─────────────────┤
│                         DATA LAYER                                     │
│  yfinance │ SEC EDGAR │ News APIs │ Reddit/StockTwits │ FRED/World Bank│
├─────────────────────────────────────────────────────────────────────────┤
│                         LLM LAYER                                      │
│   Claude (Primary) ──→ Gemini (Fallback) ──→ Ollama (Local Fallback)   │
└─────────────────────────────────────────────────────────────────────────┘

Data Flow (Step by Step)

  1. User enters a stock ticker (e.g., AAPL) in the frontend
  2. API receives request and passes to the Master Orchestrator
  3. Orchestrator spawns 5 agents in parallel — each fetches relevant data using specialized tools
  4. Each agent uses Claude AI to analyze its data and form a position (bullish/bearish)
  5. Debate Engine runs 4 rounds of structured argument between agents
  6. Consensus Builder aggregates votes with confidence weighting
  7. ML Ensemble generates price direction predictions from 5 models
  8. RL Validator paper-trades the recommendation to validate it
  9. Blending Layer combines debate consensus + ML prediction + RL validation
  10. Report Generator creates audit-ready output with checksums

Core Components

1. The 5 Specialized Agents

Each agent follows the ReAct (Reasoning + Acting) pattern: collect data → analyze with LLM → form position → debate.

| Agent | Expertise | Data Sources |
| --- | --- | --- |
| Trend Agent | Price momentum, moving averages, trend strength | yfinance price history |
| Sentiment Agent | News sentiment, social media mood, fear/greed | News APIs, Reddit, StockTwits |
| Risk Agent | Volatility, VaR, correlation, drawdown risk | Price history, benchmark data |
| Fundamentals Agent | Financial statements, valuations, earnings quality | SEC EDGAR, yfinance fundamentals |
| Technical Agent | Chart patterns, RSI, MACD, Bollinger Bands | OHLCV data, technical indicators |

All 5 agents run in parallel — reducing total analysis time from ~2.5 minutes (sequential) to ~30 seconds.
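The parallel fan-out can be sketched with plain asyncio. This is an illustrative skeleton, not the repository's actual orchestrator API; `run_agent` stands in for the real data-fetch-plus-LLM call.

```python
import asyncio

# Hypothetical sketch of fan-out/fan-in agent execution; class and
# function names in the real codebase will differ.
async def run_agent(name: str, ticker: str) -> dict:
    # Each agent would fetch its data sources and call the LLM here;
    # we simulate instantaneous work.
    await asyncio.sleep(0)
    return {"agent": name, "ticker": ticker, "stance": "neutral"}

async def run_all_agents(ticker: str) -> list[dict]:
    names = ["trend", "sentiment", "risk", "fundamentals", "technical"]
    # gather() runs all five coroutines concurrently, so total latency is
    # bounded by the slowest agent rather than the sum of all five.
    return await asyncio.gather(*(run_agent(n, ticker) for n in names))

results = asyncio.run(run_all_agents("AAPL"))
```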

2. The 4-Round Debate Engine

Real investment committees have specialists who challenge each other's reasoning. MAIS simulates this:

| Round | Purpose | Example |
| --- | --- | --- |
| Round 1 — Position Statements | Each agent presents its analysis and takes a stance (strongly bullish → strongly bearish) with confidence and cited data | "RSI at 72 suggests overbought conditions" |
| Round 2 — Counter-Arguments | Agents challenge each other, targeting positions most different from their own | "The Technical Agent ignores that social sentiment just turned positive" |
| Round 3 — Evidence & Rebuttals | Agents respond to challenges with their strongest evidence and may update confidence | "While social sentiment is positive, my RSI concern stands because..." |
| Round 4 — Final Voting | Each agent casts a final vote (BUY/HOLD/SELL) with updated confidence; dissent notes capture remaining concerns | "I vote BUY but worry about the earnings date risk" |

Consensus Building uses confidence-weighted voting — an agent 90% sure of BUY counts more than one 51% sure:

Votes: [BUY(0.8), BUY(0.6), HOLD(0.5), SELL(0.7), BUY(0.9)]
BUY weight:  0.8 + 0.6 + 0.9 = 2.3
HOLD weight: 0.5
SELL weight: 0.7
→ BUY with 65.7% consensus strength
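The weighting above can be written as a short function. This is a minimal sketch of the mechanism; the function name and signature are illustrative, not the repository's actual consensus-builder API.

```python
# Confidence-weighted consensus: each vote's weight is its confidence,
# and consensus strength is the winner's share of all weight cast.
def build_consensus(votes: list[tuple[str, float]]) -> tuple[str, float]:
    weights: dict[str, float] = {}
    for action, confidence in votes:
        weights[action] = weights.get(action, 0.0) + confidence
    winner = max(weights, key=weights.get)
    strength = weights[winner] / sum(weights.values())
    return winner, strength

votes = [("BUY", 0.8), ("BUY", 0.6), ("HOLD", 0.5), ("SELL", 0.7), ("BUY", 0.9)]
action, strength = build_consensus(votes)  # → ("BUY", 2.3 / 3.5 ≈ 0.657)
```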

3. The ML Ensemble (5 Models)

Five different ML models predict whether a stock's price will go UP, DOWN, or stay FLAT over the next 5 days:

| Model | Architecture | Why It's Good for Stocks |
| --- | --- | --- |
| LSTM | 3 layers, 128 hidden units, bidirectional + attention | Captures temporal dependencies — today's price depends on yesterday's |
| Transformer | 6 encoder layers, 8 attention heads, 256-dim embeddings | Self-attention finds relationships between any two time points |
| XGBoost | 500 trees, max depth 8, learning rate 0.05 | Excels at non-linear relationships: "RSI > 70 AND volume declining → reversal" |
| Random Forest | 500 trees, max depth 12 | Robust to outliers and noise, provides feature importance rankings |
| Prophet | Facebook's time series model | Captures weekly patterns (Monday dips, Friday rallies) and seasonality |

The models are combined with a Ridge regression meta-learner that automatically learns optimal weights — not just simple majority voting:

1. Each model predicts: [-0.02, +0.01, +0.03, +0.01, +0.02]
2. Meta-learner weights: [0.25, 0.30, 0.20, 0.15, 0.10]
3. Weighted average:     +0.0075 (+0.75%)
4. Direction: UP (> 0.5% threshold)
5. Confidence: 1 / (1 + std_deviation × 100)   — high agreement = high confidence
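The blending step can be sketched directly from those formulas. This is illustrative: the weights below are hard-coded stand-ins for what a fitted Ridge meta-learner would produce, and it is an assumption that the standard deviation is the population std of the raw predictions. Note that with these weights the weighted average of the example predictions works out to +0.0075 (+0.75%).

```python
import statistics

# Illustrative blending step; in the real system the weights come from a
# fitted Ridge meta-learner rather than being hard-coded.
def blend(preds: list[float], weights: list[float], threshold: float = 0.005):
    avg = sum(p * w for p, w in zip(preds, weights))
    direction = "UP" if avg > threshold else "DOWN" if avg < -threshold else "FLAT"
    # High disagreement between models (large std) lowers confidence.
    confidence = 1.0 / (1.0 + statistics.pstdev(preds) * 100)
    return avg, direction, confidence

avg, direction, confidence = blend(
    [-0.02, 0.01, 0.03, 0.01, 0.02], [0.25, 0.30, 0.20, 0.15, 0.10]
)  # → avg = +0.0075, direction = "UP"
```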

Feature Engineering: 60 engineered features from raw OHLCV data — price returns (6 timeframes), moving average ratios, volatility measures, RSI, MACD, Bollinger Bands, volume features, and lagged variants.
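Two of those feature families can be illustrated in a few lines. This is a toy sketch in pure Python for self-containment; the actual pipeline presumably computes these vectorized (e.g. with pandas), and the window sizes here are arbitrary examples.

```python
# Toy versions of two engineered-feature families: multi-horizon price
# returns and a moving-average ratio.
def pct_return(closes: list[float], horizon: int) -> float:
    # Return over `horizon` bars, e.g. 1-day or 5-day return.
    return closes[-1] / closes[-1 - horizon] - 1.0

def sma_ratio(closes: list[float], fast: int, slow: int) -> float:
    # Fast SMA over slow SMA; > 1.0 suggests short-term strength.
    def sma(n: int) -> float:
        return sum(closes[-n:]) / n
    return sma(fast) / sma(slow)

closes = [100, 101, 102, 101, 103, 104, 105, 104, 106, 107]
features = {
    "ret_1d": pct_return(closes, 1),
    "ret_5d": pct_return(closes, 5),
    "sma_3_over_sma_10": sma_ratio(closes, 3, 10),
}
```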

4. The RL Paper Trading Validator

A PPO (Proximal Policy Optimization) reinforcement learning agent validates recommendations by paper-trading with simulated money:

Trading Environment (Custom Gymnasium)

  • State: 35 features (30 market features + 5 portfolio features)
  • Action: Continuous position from -1 (100% short) to +1 (100% long)
  • Initial Balance: $100,000 (simulated)
  • Transaction Cost: 0.1% per trade (realistic broker fee)
  • Slippage: 0.05% (simulates market impact)

Multi-Objective Reward Function:

Reward = 100 × daily_return           ← main signal: make money
       + 0.1 × sharpe_component      ← reward consistency
       - 0.5 × excess_drawdown       ← penalize big losses (>5% from peak)
       - 10  × transaction_cost       ← penalize overtrading
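Written out as code, the reward shaping looks roughly like this. The weight constants mirror the README, but details such as how "excess drawdown" is measured are assumptions; the codebase's exact formula may differ.

```python
# Sketch of the multi-objective reward described above.
def reward(daily_return: float, sharpe_component: float,
           drawdown_from_peak: float, transaction_cost: float) -> float:
    # Only drawdown beyond 5% from the portfolio peak is penalized
    # (an assumption about how "excess_drawdown" is defined).
    excess_drawdown = max(0.0, drawdown_from_peak - 0.05)
    return (100 * daily_return        # main signal: make money
            + 0.1 * sharpe_component  # reward consistency
            - 0.5 * excess_drawdown   # penalize big losses
            - 10 * transaction_cost)  # penalize overtrading

r = reward(daily_return=0.01, sharpe_component=1.2,
           drawdown_from_peak=0.08, transaction_cost=0.001)  # → 1.095
```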

PPO Neural Network:

  • Actor (policy): 256 → 256 → 128 neurons with GELU activation
  • Critic (value): 256 → 256 → 128 neurons with GELU activation
  • Training: 500,000 timesteps (~2,000 years of simulated trading) with orthogonal initialization

Validation Logic: If the recommendation is BUY and the RL agent's average position is > 0.3 (long-biased), the RL agent agrees. Validation passes if total return > -5%, Sharpe ratio > -0.5, and agreement score > 0.6.
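Those rules transcribe almost directly into a predicate. The BUY branch follows the README; the SELL and HOLD branches are assumed symmetric, since the README only states the BUY case, and the function name is illustrative.

```python
# Sketch of the RL validation gate. Only the BUY rule is stated in the
# README; SELL/HOLD are assumed mirror-image rules.
def validate(recommendation: str, avg_position: float, total_return: float,
             sharpe: float, agreement: float) -> bool:
    if recommendation == "BUY":
        agrees = avg_position > 0.3        # RL agent is long-biased
    elif recommendation == "SELL":
        agrees = avg_position < -0.3       # assumed: short-biased
    else:
        agrees = abs(avg_position) <= 0.3  # assumed: roughly flat
    return (agrees
            and total_return > -0.05
            and sharpe > -0.5
            and agreement > 0.6)

ok = validate("BUY", avg_position=0.45, total_return=0.02,
              sharpe=0.8, agreement=0.75)  # → True
```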

5. Report Generation

The DecisionReportGenerator creates point-in-time, checksummed reports:

| Section | Contents |
| --- | --- |
| Header | Ticker, date, recommendation, confidence |
| Executive Summary | One-paragraph recommendation with key reasons |
| Agent Analyses | Each agent's stance, reasoning, evidence |
| Debate Transcript | All 4 rounds with arguments and counters |
| ML Predictions | 5-model predictions with confidence |
| RL Validation | Paper trading results, trade log |
| Risk Metrics | VaR, volatility, drawdown, beta |
| Counterfactuals | What-if scenarios (market crash, rate hike, sector rotation, earnings miss) |
| Charts | Price history, prediction vs. actual, agent positions |
| Audit Trail | Timestamps, SHA-256 checksums, model versions |

Tech Stack

Backend

| Technology | Version | Purpose |
| --- | --- | --- |
| Python | 3.11+ | Primary backend language |
| FastAPI | Latest | High-performance async API framework |
| Pydantic v2 | Latest | Data validation and settings management |
| asyncio | Built-in | Parallel agent execution |
| uvicorn | Latest | ASGI server |

AI / ML

| Technology | Purpose |
| --- | --- |
| Anthropic Claude | Primary LLM for agent reasoning |
| Google Gemini | Fallback LLM provider |
| Ollama | Local LLM fallback (no API needed) |
| LangChain + LangGraph | LLM tool orchestration and agent framework |
| PyTorch | Deep learning (LSTM, Transformer) |
| scikit-learn | Traditional ML (Random Forest) + meta-learner |
| XGBoost | Gradient boosting |
| Prophet | Time series forecasting |
| Stable-Baselines3 | Reinforcement learning (PPO) |
| Gymnasium | RL environment framework |

Frontend

| Technology | Purpose |
| --- | --- |
| Next.js 15 | React framework with App Router |
| React 19 | UI component library |
| Tailwind CSS | Utility-first styling |
| Plotly.js | Interactive charts |
| TypeScript | Type-safe JavaScript |

Data Sources

| Source | Data Type | Cost |
| --- | --- | --- |
| yfinance | Price/volume history, fundamentals | Free |
| SEC EDGAR | 10-K, 10-Q, 8-K filings | Free |
| Finnhub | Real-time quotes, news | Free tier |
| GDELT | Global news sentiment | Free |
| StockTwits | Social sentiment | Free |
| Reddit API | r/wallstreetbets sentiment | Free tier |
| World Bank | Macro indicators | Free |
| FRED | Economic data | Free |

LLM Fallback Chain

MAIS uses a fault-tolerant fallback chain to ensure reliability:

Claude (Primary, highest quality)
  ↓ on failure (rate limit, downtime, error)
Gemini (Secondary, different failure modes)
  ↓ on failure
Ollama (Local, always available, no cost)
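The chain reduces to "try each provider in order, move on when one raises." This is a minimal sketch under that assumption; the provider callables below are stand-ins, not real Claude/Gemini/Ollama client calls.

```python
# Fault-tolerant fallback chain: each provider is (name, callable).
def complete_with_fallback(prompt: str, providers) -> str:
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, downtime, parse error, ...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky_claude(prompt: str) -> str:  # simulates a rate-limited primary
    raise TimeoutError("rate limited")

def local_ollama(prompt: str) -> str:  # simulates the always-on local fallback
    return "local answer"

answer = complete_with_fallback(
    "Analyze AAPL", [("claude", flaky_claude), ("ollama", local_ollama)]
)  # → "local answer"
```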

Prompt Engineering Techniques:

  • Structured Output: All prompts request JSON with specific schemas for parseable, consistent responses
  • Role Assignment: Each agent has a detailed system prompt defining its expertise
  • Few-Shot Examples: Critical prompts include examples of desired output format
  • Temperature Control: Analysis uses temperature 0.3 (focused), debate uses 0.5 (more creative)

Getting Started

Prerequisites

| Requirement | Version | Notes |
| --- | --- | --- |
| Python | 3.11+ | 3.12 or 3.13 work fine |
| Node.js | 18+ | For the frontend |
| Anthropic API Key | Required | Get at console.anthropic.com |
| 8GB+ RAM | Recommended | For ML training |

Optional (not required): PostgreSQL, Redis, Qdrant — the system works without them using in-memory fallbacks.

Installation

# 1. Clone the repository
git clone https://github.com/Dev-Sirbhaiya/FinBot.git
cd FinBot

# 2. Create and activate a virtual environment
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

# 3. Install Python dependencies (~40 packages including PyTorch)
pip install -e .

# 4. Create your environment file
cp .env.example .env
# Edit .env and add: ANTHROPIC_API_KEY=sk-ant-your-key-here

# 5. Install frontend dependencies
cd frontend
npm install
cd ..

Running

Start the backend (Terminal 1):

python -m uvicorn src.api.main:app --reload --port 8000

Start the frontend (Terminal 2):

cd frontend
npm run dev

Open http://localhost:3000 in your browser.

API Documentation

With the backend running, interactive API docs should be available at the standard FastAPI endpoints: http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).

Using the Application

Full Analysis

  1. Enter a stock ticker (e.g., AAPL, NVDA, TSLA) and click Analyze
  2. The pipeline runs: 5 agents analyze in parallel → 4-round debate → ML predictions → RL validation
  3. Results show agent reasoning, debate transcript, ML predictions, and RL validation

Paper Trading

Test a specific investment hypothesis. Enter a ticker, choose a recommendation (BUY/SELL/HOLD) and confidence level. The RL agent simulates trades.

Training Console

Manually train or retrain ML and RL models. View trained models, ensemble weights, and accuracy statistics.

Risk Dashboard

View risk profiles (VaR, CVaR, beta, drawdown), run stress tests (market crash, flash crash, interest rate shock, sector rotation, etc.).

Reports

Generate PDF reports with charts, agent analyses, debate summaries, ML/RL results, and audit metadata with SHA-256 checksums.


First Run Timing

On the first analysis for a new ticker, the system auto-trains ML and RL models:

| Stage | First Run (CPU) | Subsequent Runs |
| --- | --- | --- |
| Agent analysis (5 parallel) | 15–30s | 15–30s |
| Debate (4 rounds) | 30–60s | 30–60s |
| ML training (5 models) | 5–20 min | 1–3s (cached) |
| RL training (500K steps) | 10–30 min | 1–3s (cached) |
| Total | ~20–55 min | ~1–2 min |

Trained models are saved to models/<TICKER>/ and reused on subsequent runs.

Speed up first run: Reduce epochs in config/ml_models.yaml and total_timesteps in config/rl_config.yaml. If you have a CUDA GPU, set device: "cuda" in config/settings.py.


Project Structure

mais/
├── config/                    # Configuration (YAML + Python)
│   ├── settings.py            # Pydantic settings hub (loads from .env + YAML)
│   ├── agents.yaml            # Agent prompts, tools, timeouts
│   ├── ml_models.yaml         # ML hyperparameters (LSTM, XGBoost, etc.)
│   ├── rl_config.yaml         # RL training config (PPO, environment, reward)
│   ├── data_sources.yaml      # Data source URLs and rate limits
│   └── agent_weights.json     # Initial blending weights
│
├── src/                       # Python backend
│   ├── agents/                # 5 financial agents + base class + tool definitions
│   ├── api/                   # FastAPI routes, middleware, auth
│   ├── core/                  # Shared utilities (caching, validation, audit)
│   ├── data/                  # Data fetching, caching, rate limiting
│   ├── db/                    # Database models (optional, in-memory fallback)
│   ├── debate/                # Debate engine + consensus builder
│   ├── llm/                   # LLM providers (Claude, Gemini, Ollama) + fallback chain
│   ├── ml/                    # ML models, ensemble, training, feature engineering
│   ├── orchestrator/          # Master orchestrator + pipeline coordination
│   ├── output/                # PDF generation, charts, memo rendering
│   ├── reports/               # Decision report system with checksums
│   ├── risk/                  # Risk metrics, VaR/CVaR, stress testing, governance
│   ├── rl/                    # RL agent (PPO), Gym environment, validator
│   ├── simulation/            # Backtesting engine, walk-forward testing
│   ├── tasks/                 # Async task definitions
│   └── vision/                # Chart image processing
│
├── frontend/                  # Next.js 15 + React 19 + Tailwind + TypeScript
│   └── src/
│       ├── app/               # Pages (analysis, paper-trading, training, risk, reports)
│       ├── components/        # Reusable components (AgentCard, DebateTimeline, etc.)
│       ├── hooks/             # Custom React hooks
│       └── lib/               # API client + TypeScript types
│
├── models/                    # Saved ML/RL weights (auto-generated at runtime)
├── tests/                     # Test suite (agents, API, debate, ML, RL)
├── scripts/                   # Training scripts (train_ml_models.py, train_rl_agent.py)
├── data/                      # Runtime data (reports, simulations, predictions)
├── .env.example               # Environment variable template
├── pyproject.toml             # Python dependencies and project metadata
├── Dockerfile                 # Production container (non-root, multi-layer cache)
└── docker-compose.yml         # Full stack: API + frontend + PostgreSQL + Redis + Qdrant

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/analysis/start | POST | Start full analysis pipeline for a ticker |
| /api/analysis/{task_id}/status | GET | Check analysis progress |
| /api/analysis/{task_id}/result | GET | Get completed analysis results |
| /api/paper-trading/run | POST | Run paper trading simulation |
| /api/training/ml/{ticker} | POST | Train ML models for a ticker |
| /api/training/rl/{ticker} | POST | Train RL agent for a ticker |
| /api/reports/generate/{task_id} | POST | Generate PDF report |
| /api/risk/profile/{ticker} | GET | Get risk metrics |
| /ws/analysis/{task_id} | WebSocket | Real-time analysis progress updates |
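A typical client-side flow against the async analysis endpoints is: start a task, poll its status, then fetch the result. The sketch below is hypothetical and self-contained: `get_status` is injected instead of making a real HTTP GET to `/api/analysis/{task_id}/status`, and the status strings are assumed, not taken from the API's actual schema.

```python
import time

# Poll an analysis task until it reaches a terminal state.
def wait_for_analysis(task_id: str, get_status, poll_seconds: float = 0.0,
                      max_polls: int = 100) -> str:
    for _ in range(max_polls):
        status = get_status(task_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)  # back off between polls
    raise TimeoutError(f"analysis {task_id} did not finish")

# Simulated status sequence standing in for repeated HTTP GETs.
statuses = iter(["queued", "running", "running", "completed"])
final = wait_for_analysis("abc123", lambda _tid: next(statuses))  # → "completed"
```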

Key Technical Decisions

| Decision | Chosen | Why |
| --- | --- | --- |
| Multi-agent vs. single agent | Multi-agent | Investment decisions are multi-faceted. Single perspectives have blind spots. Debate forces consideration of all angles. |
| Ensemble vs. single ML model | Ensemble | Each model captures different patterns (LSTM: sequences, Prophet: seasonality, XGBoost: non-linear). The ensemble is more robust than any individual model. |
| PPO vs. DQN/SAC | PPO | PPO's clipping mechanism prevents catastrophic policy updates — crucial for financial applications where stability > marginal performance. |
| Fallback chain vs. single LLM | Fallback chain | API rate limits and outages are common. Fallback ensures the system works even when Claude's API is down. |
| Confidence-weighted vs. simple majority voting | Confidence-weighted | An agent 90% sure of BUY should count more than one 51% sure. Captures nuance in consensus. |

Testing

# Run all tests
pytest

# Run specific category
pytest tests/test_ml/
pytest tests/test_debate/
pytest tests/test_rl/

# Run with coverage
pytest --cov=src --cov-report=html

Test suite covers:

  • Agent logic — weight calculation, circuit breaker, JSON parsing
  • API endpoints — analysis routes, error sanitization
  • Debate engine — consensus building, voting mechanics
  • ML pipeline — ensemble predictions, data validation, market data caching
  • RL environment — trading environment step/reset, reward calculation

Docker Deployment

# Build and run all services
docker-compose up -d

# Services started:
# - mais-api:      FastAPI backend on :8000
# - mais-frontend: Next.js on :3000
# - postgres:      Database on :5432
# - redis:         Cache on :6379
# - qdrant:        Vector DB on :6333

# View logs
docker-compose logs -f mais-api

# Stop all
docker-compose down

Configuration Deep Dive

Key Settings (config/settings.py)

LLMSettings:
  anthropic_api_key      # From ANTHROPIC_API_KEY env var
  claude_model           # Default: claude-sonnet-4-20250514
  fallback_chain         # [CLAUDE, GEMINI, OLLAMA]

MLSettings:
  lookback_window: 60    # Days of history for features
  prediction_horizon: 5  # Days ahead to predict
  model_dir: "models"    # Where to save trained models

RLSettings:
  total_timesteps: 500000
  learning_rate: 0.0003
  initial_balance: 100000

Agent Configuration (config/agents.yaml)

orchestrator:
  max_parallel_agents: 5    # Run all 5 simultaneously
  debate_rounds: 4
  analysis_timeout: 300     # 5 min max per analysis

agents:
  trend:
    indicators: [sma_20, sma_50, rsi, macd, ...]
  sentiment:
    news_lookback_days: 7
    subreddits: [wallstreetbets, stocks, investing]
    use_finbert: true

ML Hyperparameters (config/ml_models.yaml)

lstm:
  hidden_size: 128
  num_layers: 3
  bidirectional: true
  attention_heads: 8
  epochs: 100

xgboost:
  n_estimators: 500
  max_depth: 8

ensemble:
  meta_learner: "ridge"
  ridge_alpha: 1.0

RL Training (config/rl_config.yaml)

environment:
  initial_balance: 100000
  transaction_cost: 0.001
  slippage: 0.0005

reward:
  return_weight: 100.0
  sharpe_weight: 0.1
  drawdown_penalty: 0.5
  turnover_penalty: 10.0

ppo:
  learning_rate: 0.0003
  n_steps: 2048
  batch_size: 64
  gamma: 0.99
  clip_range: 0.2

Optional API Keys

Add any of these to .env for additional data sources:

| Key | Source | Purpose |
| --- | --- | --- |
| GOOGLE_API_KEY | Google Gemini | LLM fallback provider |
| NEWSAPI_KEY | NewsAPI | News articles |
| FINNHUB_API_KEY | Finnhub | Market data + news |
| REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET | Reddit | Social sentiment |
| FRED_API_KEY | FRED | Economic indicators |
| ALPHA_VANTAGE_API_KEY | Alpha Vantage | Market data |

None required — free sources (yfinance, SEC EDGAR, GDELT, StockTwits, World Bank) work by default.


Troubleshooting

| Problem | Solution |
| --- | --- |
| ANTHROPIC_API_KEY error | Make sure .env exists and has your key |
| Database connection errors | Comment out DATABASE_URL, REDIS_URL, QDRANT_URL in .env |
| Port 8000 in use | Kill the existing process or use --port 8001 |
| Port 3000 in use | Frontend auto-detects and uses 3001 |
| ML training is slow | Reduce epochs in config/ml_models.yaml, timesteps in config/rl_config.yaml |
| Module not found errors | Run pip install -e . from the mais/ directory |
| Frontend can't reach API | Ensure the backend is running on port 8000 first |

What I Built vs. What I Integrated

Built end-to-end:

  • Complete multi-agent architecture with 5 specialized agents, each with different data sources and tools
  • 4-round debate engine with structured argumentation and confidence-weighted consensus building
  • ML ensemble pipeline with feature engineering (60 features), model training, and Ridge regression meta-learner
  • Custom Gymnasium trading environment with multi-objective reward shaping (returns, Sharpe, drawdown, turnover)
  • PPO trading agent with custom neural network architecture (256→256→128, GELU, orthogonal init)
  • Walk-forward backtesting engine with performance metrics (Sharpe, max drawdown, win rate, profit factor)
  • Decision report system with point-in-time snapshots, counterfactual scenarios, and SHA-256 checksums
  • FastAPI backend with RESTful API, WebSocket real-time updates, and API key authentication
  • LLM fallback chain with automatic failover (Claude → Gemini → Ollama) and robust JSON parsing
  • Data layer with intelligent caching (market-hours-aware TTL), token-bucket rate limiting, and graceful degradation
  • Next.js frontend with interactive dashboard, debate visualization, paper trading interface, and risk dashboard

Libraries and services integrated:

  • Claude AI (Anthropic) — Primary LLM for agent reasoning
  • Stable-Baselines3 — RL algorithm implementation (PPO)
  • PyTorch — Deep learning framework for LSTM/Transformer models
  • yfinance, SEC EDGAR — Market data and filings
  • LangChain / LangGraph — Agent tool orchestration

License

This project is licensed under the MIT License — see the LICENSE file for details.
