Trading Data Collection Service

A data-fetching service that collects real-time options and market data from the Theta Data and ORATS APIs. It is deployed on Railway and exposes public HTTPS endpoints for data collection.

Deployment Status: Running successfully on Railway with stable data collection from ORATS and Theta Terminal.

🆕 New Features (December 8, 2025)

1. Dashboard Password Control (Improved Security)

Data collection is now controlled exclusively via the dashboard with password protection:

  • Password Required: Enter password (default: 0212) to start collectors
  • Re-authentication on Stop: Password required again after stopping collectors
  • No Environment Variable: Removed START_FETCH environment variable for cleaner deployment
  • Stop Monitoring: All API polling stops when collectors are stopped to avoid rate limiting

2. External AI Data Access Endpoint: /api/data/organized

New endpoint designed for external AI systems to access organized data:

# Get all data organized by provider and date
curl http://localhost:5000/api/data/organized

# Get only ORATS data with sample rows
curl "http://localhost:5000/api/data/organized?provider=orats&include_data=true&sample_rows=20"

# Get data for specific date
curl "http://localhost:5000/api/data/organized?date=2025-12-07"

Features:

  • Organized by provider (theta_data, orats) and date
  • Real row counts from actual CSV files
  • Optional sample data inclusion
  • Rich metadata with date ranges
  • Intended for AI/ML model integration

See docs/API_ENDPOINT_ORGANIZED.md for complete documentation.
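
The query strings in the curl examples above can also be assembled from Python. The following is a minimal sketch, assuming only the parameters shown above (provider, date, include_data, sample_rows); the helper name organized_url and its default base URL are illustrative and not part of the service.

```python
from urllib.parse import urlencode


def organized_url(base="http://localhost:5000", provider=None, date=None,
                  include_data=False, sample_rows=None):
    """Build a URL for /api/data/organized (hypothetical helper)."""
    params = {}
    if provider:
        params["provider"] = provider
    if date:
        params["date"] = date
    if include_data:
        params["include_data"] = "true"
    if sample_rows is not None:
        params["sample_rows"] = sample_rows
    query = urlencode(params)
    # Only append "?" when at least one parameter was supplied.
    return f"{base}/api/data/organized" + (f"?{query}" if query else "")


print(organized_url(provider="orats", include_data=True, sample_rows=20))
```

Leaving every argument at its default yields the bare endpoint, which matches the first curl example.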

3. Real Data Display

Dashboard now shows actual row counts from CSV files:

  • No fake or placeholder data
  • Real-time file sizes and row counts
  • Accurate timestamp information

Overview

This service continuously collects:

  • Theta Data (Primary): Options snapshots for SPY, SPX, QQQ and market indices (SPX, VIX, NDX)
  • ORATS (Secondary): Comprehensive Greeks data for SPY, SPX, and QQQ options with 0-1 days to expiration (DTE)

Data is saved to CSV files with automatic rotation at 100MB.

Architecture

This repository is part of a three-service Railway deployment:

  1. 1-Model-Hybrid - ML model service
  2. 2-Trading-data - This service (data collector)
  3. 3-Theta-terminal-railway - Theta Data Terminal (ports 25503, 25520)

Railway Internal Networking

Services within the same Railway project communicate via internal networking:

  • Format: <service-name>.railway.internal
  • Example: theta-terminal.railway.internal:25503
  • Benefits: No external IP exposure, faster communication, free internal traffic

When deployed on Railway, Model-Hybrid connects to Source-Data using Railway's internal networking:

Model-Hybrid                          Source-Data
┌─────────────────┐                  ┌─────────────────┐
│  Dashboard UI   │                  │  Data Collectors │
│                 │    HTTP API      │  - Theta        │
│  /api/data-     │ ──────────────→  │  - ORATS        │
│  source/latest  │                  │                 │
│                 │  Internal URL:   │                 │
│                 │  http://source-  │  /api/data/     │
│                 │  data.railway.   │  realtime       │
│                 │  internal:5000   │                 │
└─────────────────┘                  └─────────────────┘

Environment Variables for Railway:

Source-Data:

  • PORT=5000
  • RUN_MODE=api
  • START_FETCH=no (until ready to collect)

Model-Hybrid connects automatically via:

  • DATA_SOURCE_URL=http://source-data.railway.internal:5000
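
As an illustration of the addressing scheme above, a client could resolve the Source-Data base URL as sketched below. This assumes the service is named source-data and listens on port 5000, as in the diagram; the function names are hypothetical, and Model-Hybrid's actual lookup code is not shown in this README.

```python
import os

DEFAULT_PORT = 5000  # matches PORT=5000 above


def internal_url(service_name, port=DEFAULT_PORT):
    """Return the Railway-internal base URL for a service."""
    return f"http://{service_name}.railway.internal:{port}"


def data_source_url(env=None):
    """Prefer DATA_SOURCE_URL; otherwise fall back to the internal address."""
    env = os.environ if env is None else env
    return env.get("DATA_SOURCE_URL") or internal_url("source-data")


print(internal_url("theta-terminal", 25503))
print(data_source_url({}))
```

Passing a plain dict instead of os.environ keeps the lookup easy to exercise outside Railway.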

Directory Structure

Source-Data/
├── Dockerfile              # Container configuration
├── requirements.txt        # Python dependencies
├── railway.json           # Railway deployment config
├── .env.example           # Environment variables template
├── .gitignore            # Git ignore patterns
├── README.md             # This file
├── QUICK_START.md        # Quick start guide (NEW)
├── DATA_BROWSING_GUIDE.md # Data browsing documentation (NEW)
├── main.py               # Application entrypoint
├── api_server.py         # Flask API server
├── collectors/           # Data collector modules
│   ├── __init__.py
│   ├── orats_collector.py    # ORATS options data
│   └── theta_collector.py    # Theta Data options & indices
└── data/                 # Data storage (gitignored)
    ├── orats_live/       # ORATS CSV files
    └── theta_live/       # Theta Data CSV files

Quick Start

Railway Deployment

  1. Set Environment Variables in Railway dashboard:

    ORATS_API_KEY=your_orats_api_key_here
    THETA_PUBLIC_URL=https://theta-terminal-production.up.railway.app
    START_FETCHING=yes
    RUN_MODE=api

    Network Configuration:

    • THETA_PUBLIC_URL: External HTTPS proxy for fetching data from Theta Terminal
    • Inbound connections from Model-Hybrid use Railway's automatic internal service discovery

  2. Deploy:

    railway up

  3. Access:

    • Dashboard: https://your-service.railway.app
    • Data Browse: https://your-service.railway.app/api/data/browse

Local Development

# Clone and setup
git clone https://github.com/chrisLI0212/Source-Data.git
cd Source-Data
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your API keys and set START_FETCHING=yes

# Run API server
python main.py

Access dashboard at: http://localhost:5000


API Endpoints

New in December 2025

Browse Data by Provider and Date

GET /api/data/browse?provider=all&date=2025-12-07

Response:

{
  "status": "success",
  "data": {
    "orats": {
      "2025-12-07": [
        {
          "filename": "orats_live_20251207_120000.csv",
          "timestamp": "2025-12-07T12:00:00",
          "rows": 1250,
          "size_bytes": 245678
        }
      ]
    },
    "theta_options": {...},
    "theta_market": {...}
  }
}

Query Parameters:

  • provider: all, orats, theta_options, theta_market
  • date: YYYY-MM-DD format

Check Collector Status

GET /api/status

Response:

{
  "status": "success",
  "data": {
    "running": true,
    "start_time": "2025-12-07T08:00:00",
    "error": null,
    "fetching_enabled": true
  }
}
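
One way a caller could interpret this payload is sketched below. It assumes the four fields shown above and treats collection as active only when all of them agree; is_collecting is a hypothetical name, and the service may apply different rules internally.

```python
def is_collecting(payload):
    """Return True when the status payload reports healthy, active collection."""
    data = payload.get("data", {})
    return (
        payload.get("status") == "success"
        and bool(data.get("running"))
        and bool(data.get("fetching_enabled"))
        and data.get("error") is None
    )


status = {
    "status": "success",
    "data": {
        "running": True,
        "start_time": "2025-12-07T08:00:00",
        "error": None,
        "fetching_enabled": True,
    },
}
print(is_collecting(status))
```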

Existing Endpoints

Control Collectors

POST /api/start   # Start data collection (respects START_FETCHING)
POST /api/stop    # Stop data collection
GET  /api/status  # Get current status

Retrieve Data

GET /api/data/latest?source=all&limit=100&format=json
GET /api/data/realtime  # Formatted for Model-Hybrid

Health Check

GET /health
GET /api/theta/status  # Check Theta Terminal connection

For detailed endpoint documentation, see DATA_BROWSING_GUIDE.md.


Configuration

Environment Variables

| Variable | Purpose | Values | Example |
|----------|---------|--------|---------|
| START_FETCHING | Enable/disable data collection | yes/no | START_FETCHING=yes |
| RUN_MODE | Run API or direct mode | api/direct | RUN_MODE=api |
| ORATS_API_KEY | ORATS authentication | API key | (required) |
| THETA_PUBLIC_URL | Theta Terminal external proxy URL | URL | https://theta-terminal-production.up.railway.app |
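
To make the variables above concrete, here is a minimal sketch of how they could be read and validated at startup. The Settings class and load_settings function are hypothetical, and the fallbacks (RUN_MODE defaulting to api, START_FETCHING defaulting to no) are assumptions rather than documented behavior of main.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    start_fetching: bool
    run_mode: str
    orats_api_key: str
    theta_public_url: str


def load_settings(env=None):
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    run_mode = env.get("RUN_MODE", "api").lower()  # assumed default
    if run_mode not in ("api", "direct"):
        raise ValueError(f"RUN_MODE must be 'api' or 'direct', got {run_mode!r}")
    api_key = env.get("ORATS_API_KEY", "")
    if not api_key:
        raise ValueError("ORATS_API_KEY is required")
    return Settings(
        start_fetching=env.get("START_FETCHING", "no").lower() == "yes",
        run_mode=run_mode,
        orats_api_key=api_key,
        theta_public_url=env.get("THETA_PUBLIC_URL", ""),
    )


settings = load_settings({"ORATS_API_KEY": "example-key", "START_FETCHING": "yes"})
print(settings.run_mode, settings.start_fetching)
```

Failing fast on a missing key or an unknown run mode surfaces configuration mistakes at deploy time instead of at the first API call.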

Network Architecture:

  • Outbound (Source-Data → Theta Terminal): Uses THETA_PUBLIC_URL (external HTTPS proxy)
  • Inbound (Model-Hybrid → Source-Data): Uses Railway's automatic internal service discovery

Data Collection

Theta Data Collector (Primary)

  • Options: SPY, SPX, QQQ snapshots
  • Market Indices: SPX, VIX, NDX
  • Interval: 20 seconds
  • Output: data/theta_live/{options,market}_YYYYMMDD_HHMMSS.csv

ORATS Collector (Secondary)

  • Symbols: SPY, SPX, QQQ
  • DTE Range: 0-1 days
  • Interval: 20 seconds
  • Output: data/orats_live/orats_live_YYYYMMDD_HHMMSS.csv

File Rotation: Automatic at 100MB per file
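
Size-based rotation with timestamped filenames can be sketched as follows. This is not the collectors' actual code: the function names are hypothetical, and interpreting 100MB as 100 * 1024 * 1024 bytes is an assumption.

```python
import os
import tempfile
from datetime import datetime

MAX_BYTES = 100 * 1024 * 1024  # 100MB; binary megabytes are an assumption


def new_csv_path(directory, prefix, now=None):
    """Build <prefix>_YYYYMMDD_HHMMSS.csv inside directory."""
    now = now or datetime.now()
    return os.path.join(directory, f"{prefix}_{now:%Y%m%d_%H%M%S}.csv")


def path_for_next_write(current_path, directory, prefix,
                        max_bytes=MAX_BYTES, now=None):
    """Keep writing to current_path until it reaches max_bytes, then rotate."""
    if (current_path and os.path.exists(current_path)
            and os.path.getsize(current_path) < max_bytes):
        return current_path
    return new_csv_path(directory, prefix, now)


# Demonstration with a tiny threshold so no large file is needed.
with tempfile.TemporaryDirectory() as tmp:
    first = path_for_next_write(None, tmp, "orats_live",
                                now=datetime(2025, 12, 7, 12, 0, 0))
    with open(first, "w") as fh:
        fh.write("x" * 10)
    same = path_for_next_write(first, tmp, "orats_live", max_bytes=100)
    rotated = path_for_next_write(first, tmp, "orats_live", max_bytes=10,
                                  now=datetime(2025, 12, 7, 13, 0, 0))

print(os.path.basename(first), os.path.basename(rotated))
```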


Data Organization

data/
├── orats_live/
│   ├── orats_live_20251207_120000.csv
│   ├── orats_live_20251207_130000.csv
│   └── orats_live_20251206_120000.csv
└── theta_live/
    ├── options_20251207_120000.csv
    ├── options_20251207_130000.csv
    ├── market_20251207_120000.csv
    └── market_20251206_120000.csv

Access via /api/data/browse endpoint.
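
Because each filename embeds its creation time, files can also be grouped by date directly from a directory listing. The sketch below assumes the <prefix>_YYYYMMDD_HHMMSS.csv pattern shown in the tree above; parse_data_filename and group_by_date are hypothetical helpers, not functions from this repository.

```python
import re
from collections import defaultdict
from datetime import datetime

# prefix, underscore, 8-digit date, underscore, 6-digit time, ".csv"
FILENAME_RE = re.compile(r"^(?P<prefix>[a-z_]+)_(?P<stamp>\d{8}_\d{6})\.csv$")


def parse_data_filename(name):
    """Split a data filename into (prefix, datetime)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    stamp = datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S")
    return match["prefix"], stamp


def group_by_date(names):
    """Map ISO dates to the sorted filenames created on that date."""
    grouped = defaultdict(list)
    for name in sorted(names):
        _, stamp = parse_data_filename(name)
        grouped[stamp.date().isoformat()].append(name)
    return dict(grouped)


files = [
    "orats_live_20251207_120000.csv",
    "orats_live_20251207_130000.csv",
    "orats_live_20251206_120000.csv",
]
print(group_by_date(files))
```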


Running the Service

API Mode (Recommended)

Default mode - runs REST API with dashboard:

python main.py
# or
RUN_MODE=api python main.py

Features:

  • Web dashboard at http://localhost:5000
  • REST API for programmatic control
  • Data discovery via /api/data/browse
  • Collectors NOT running by default (use API to start)

Direct Mode

Direct collector mode - collectors start immediately:

RUN_MODE=direct python main.py

⚠️ Warning: Collectors run continuously. Use only for development.


Using the API Server

Web Dashboard

Access at: http://localhost:5000

Features:

  • Start/stop collectors
  • Monitor collection status
  • Preview latest data
  • Download CSV files

Programmatic Access

Check Status:

curl http://localhost:5000/api/status

Browse Data:

curl http://localhost:5000/api/data/browse

Get Latest Data:

curl "http://localhost:5000/api/data/latest?limit=100"

Start Collection:

curl -X POST http://localhost:5000/api/start

Stop Collection:

curl -X POST http://localhost:5000/api/stop
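
The same calls can be made from Python using only the standard library. This is a sketch under two assumptions: that the endpoints return JSON (shown above only for /api/status), and that no extra credentials are needed; whether /api/start also requires the dashboard password is not specified here. The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"


def build_request(path, method="GET"):
    """Create a request object for an API path without sending it."""
    return Request(f"{BASE_URL}{path}", method=method)


def call(path, method="GET", timeout=10):
    """Send the request and decode the body as JSON (assumed format)."""
    with urlopen(build_request(path, method), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage, with the server running locally:
#   call("/api/status")
#   call("/api/start", method="POST")
#   call("/api/stop", method="POST")
```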

Frontend Integration

Example JavaScript integration:

// Check if data is available
async function checkData() {
  const response = await fetch('http://api.example.com/api/data/browse');
  const data = await response.json();
  return data.data;  // Returns organized data by provider/date
}

// Get latest data for analysis
async function getLatestForAnalysis() {
  const response = await fetch('http://api.example.com/api/data/latest?format=csv');
  return await response.text();  // Returns CSV data
}

Troubleshooting

Data Collection Not Starting

  1. Check START_FETCHING is set to yes
  2. Verify /api/status shows fetching_enabled: true
  3. Check Theta Terminal connection: /api/theta/status
  4. Review logs for errors

No Data in Browse Endpoint

  1. Ensure collectors are running (/api/status)
  2. Wait 20+ seconds (collection interval)
  3. Verify data directories exist: data/orats_live/ and data/theta_live/
  4. Check file permissions
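
Step 3 above can be checked with a short script. The sketch below counts CSV files under the two data directories and reports None for a directory that does not exist; csv_counts is a hypothetical helper, and the demonstration runs against a temporary directory rather than real data.

```python
import tempfile
from pathlib import Path


def csv_counts(base_dir="data", subdirs=("orats_live", "theta_live")):
    """Return {subdir: number of CSV files, or None if the directory is missing}."""
    counts = {}
    for sub in subdirs:
        directory = Path(base_dir) / sub
        if directory.is_dir():
            counts[sub] = len(list(directory.glob("*.csv")))
        else:
            counts[sub] = None
    return counts


# Demonstration: one directory with a single file, the other missing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "orats_live").mkdir()
    (Path(tmp) / "orats_live" / "orats_live_20251207_120000.csv").write_text("")
    result = csv_counts(tmp)

print(result)
```

A None entry points to a missing directory, while a zero suggests the collector has not written anything yet.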

Connection Errors

Theta Terminal:

curl http://localhost:5000/api/theta/status

ORATS:

  • Verify ORATS_API_KEY in environment variables
  • Check API rate limits

Documentation

Active Documentation

Historical Documentation

Older deployment and implementation notes have been archived in docs/archive/ for reference.


Data Sources

Theta Data (Primary)

  • Comprehensive real-time options data
  • Market indices (SPX, VIX, NDX)
  • Lower latency via internal networking
  • Documentation: thetadata.net

ORATS (Secondary)

  • Detailed Greeks and analytics
  • Specialized options analysis
  • Proven reliability for short DTE
  • Documentation: docs.orats.io

Integration with Model-Hybrid

Model-Hybrid can use these endpoints:

// Check data availability
fetch('http://source-data.railway.internal:5000/api/data/browse')

// Get latest data for analysis
fetch('http://source-data.railway.internal:5000/api/data/latest')

// Check collection status
fetch('http://source-data.railway.internal:5000/api/status')

License

MIT License
