A data fetching service that collects real-time options and market data from Theta Data and ORATS APIs. Successfully deployed on Railway with public HTTPS endpoints for data collection.
✅ Deployment Status: Running successfully on Railway with stable data collection from ORATS and Theta Terminal.
Data collection is now controlled exclusively via the dashboard with password protection:
- Password Required: Enter the password (default: 0212) to start collectors
- Re-authentication on Stop: The password is required again after collectors are stopped
- No Environment Variable: The START_FETCH environment variable was removed for cleaner deployment
- Stop Monitoring: All API polling stops when collectors are stopped, to avoid rate limiting
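The start/stop gating described above can be sketched as a small state holder. This is a minimal illustration, not the dashboard's actual code: the class name CollectorControl is hypothetical, and using hmac.compare_digest for the password check is an assumed design choice. The default password 0212 comes from this README.

```python
import hmac


class CollectorControl:
    """Hypothetical sketch of password-gated start/stop (not the service's real code)."""

    def __init__(self, password: str = "0212") -> None:
        self._password = password
        self.running = False
        self.authenticated = False

    def start(self, attempt: str) -> bool:
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(attempt.encode(), self._password.encode()):
            return False
        self.authenticated = True
        self.running = True
        return True

    def stop(self) -> None:
        # Stopping halts collection and clears the session, so the next
        # start() call must supply the password again.
        self.running = False
        self.authenticated = False
```

Because stop() clears authenticated, every restart requires the password again, which matches the re-authentication rule above.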
New endpoint designed for external AI systems to access organized data:
# Get all data organized by provider and date
curl http://localhost:5000/api/data/organized
# Get only ORATS data with sample rows
curl "http://localhost:5000/api/data/organized?provider=orats&include_data=true&sample_rows=20"
# Get data for specific date
curl "http://localhost:5000/api/data/organized?date=2025-12-07"
Features:
- Organized by provider (theta_data, orats) and date
- Real row counts from actual CSV files
- Optional sample data inclusion
- Rich metadata with date ranges
- Perfect for AI/ML model integration
See docs/API_ENDPOINT_ORGANIZED.md for complete documentation.
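A small Python helper for building requests to this endpoint might look like the following. It only uses the query parameters shown in the curl examples (provider, date, include_data, sample_rows); the function name organized_url is hypothetical, and any parameter values beyond those examples should be checked against docs/API_ENDPOINT_ORGANIZED.md.

```python
from typing import Optional
from urllib.parse import urlencode


def organized_url(base_url: str, provider: Optional[str] = None,
                  date: Optional[str] = None, include_data: bool = False,
                  sample_rows: Optional[int] = None) -> str:
    """Build a /api/data/organized URL from the documented query parameters."""
    params = {}
    if provider:
        params["provider"] = provider  # e.g. "orats" or "theta_data"
    if date:
        params["date"] = date  # YYYY-MM-DD
    if include_data:
        params["include_data"] = "true"
    if sample_rows is not None:
        params["sample_rows"] = str(sample_rows)
    url = base_url.rstrip("/") + "/api/data/organized"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

The resulting URL can be passed to urllib.request.urlopen and the body decoded with json.load.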
Dashboard now shows actual row counts from CSV files:
- No fake or placeholder data
- Real-time file sizes and row counts
- Accurate timestamp information
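Real row counts of this kind can be computed directly from each CSV file. The sketch below assumes every file starts with a single header row and returns the same fields (filename, rows, size_bytes) that appear in the /api/data/browse response; the function name file_stats is hypothetical and the service's own counting code may differ.

```python
import csv
from pathlib import Path


def file_stats(path: Path) -> dict:
    """Return the data-row count and on-disk size of one CSV file."""
    with path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        rows = sum(1 for _ in reader)
    return {"filename": path.name, "rows": rows, "size_bytes": path.stat().st_size}
```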
This service continuously collects:
- Theta Data (Primary): Options snapshots for SPY, SPX, QQQ and market indices (SPX, VIX, NDX)
- ORATS (Secondary): Comprehensive Greeks data for SPY, SPX, and QQQ with 0-1 days to expiration
Data is saved to CSV files with automatic rotation at 100MB.
This repository is part of a three-service Railway deployment:
- 1-Model-Hybrid - ML model service
- 2-Trading-data - This service (data collector)
- 3-Theta-terminal-railway - Theta Data Terminal (ports 25503, 25520)
Services within the same Railway project communicate via internal networking:
- Format: <service-name>.railway.internal
- Example: theta-terminal.railway.internal:25503
- Benefits: No external IP exposure, faster communication, free internal traffic
When deployed on Railway, Model-Hybrid connects to Source-Data using Railway's internal networking:
   Model-Hybrid                          Source-Data
┌─────────────────┐                  ┌─────────────────┐
│  Dashboard UI   │                  │ Data Collectors │
│                 │     HTTP API     │  - Theta        │
│  /api/data-     │ ──────────────→  │  - ORATS        │
│  source/latest  │                  │                 │
│                 │  Internal URL:   │                 │
│                 │  http://source-  │  /api/data/     │
│                 │  data.railway.   │  realtime       │
│                 │  internal:5000   │                 │
└─────────────────┘                  └─────────────────┘
Environment Variables for Railway:
Source-Data:
PORT=5000
RUN_MODE=api
START_FETCH=no (until ready to collect)
Model-Hybrid connects automatically via:
DATA_SOURCE_URL=http://source-data.railway.internal:5000
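On the Model-Hybrid side, the base URL can be read from DATA_SOURCE_URL. In this sketch, falling back to the internal URL when the variable is unset is an assumption, and realtime_endpoint is a hypothetical name; the /api/data/realtime path is the one shown in the diagram above.

```python
import os
from typing import Mapping, Optional

# Fallback taken from the internal URL documented in this README.
DEFAULT_DATA_SOURCE_URL = "http://source-data.railway.internal:5000"


def realtime_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the Source-Data realtime endpoint from DATA_SOURCE_URL."""
    env = os.environ if env is None else env
    base = env.get("DATA_SOURCE_URL", DEFAULT_DATA_SOURCE_URL).rstrip("/")
    return f"{base}/api/data/realtime"
```

Hostnames ending in .railway.internal only resolve for services inside the same Railway project, so local testing needs DATA_SOURCE_URL pointed at http://localhost:5000.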
Source-Data/
├── Dockerfile # Container configuration
├── requirements.txt # Python dependencies
├── railway.json # Railway deployment config
├── .env.example # Environment variables template
├── .gitignore # Git ignore patterns
├── README.md # This file
├── QUICK_START.md # Quick start guide (NEW)
├── DATA_BROWSING_GUIDE.md # Data browsing documentation (NEW)
├── main.py # Application entrypoint
├── api_server.py # Flask API server
├── collectors/ # Data collector modules
│ ├── __init__.py
│ ├── orats_collector.py # ORATS options data
│ └── theta_collector.py # Theta Data options & indices
└── data/ # Data storage (gitignored)
├── orats_live/ # ORATS CSV files
└── theta_live/ # Theta Data CSV files
- Set Environment Variables in the Railway dashboard:
ORATS_API_KEY=your_orats_api_key_here
THETA_PUBLIC_URL=https://theta-terminal-production.up.railway.app
START_FETCHING=yes
RUN_MODE=api
Network Configuration:
  - THETA_PUBLIC_URL: External HTTPS proxy for fetching data from Theta Terminal
  - Inward connections from Model-Hybrid use Railway's automatic internal service discovery
- Deploy:
railway up
- Access:
  - Dashboard: https://your-service.railway.app
  - Data Browse: https://your-service.railway.app/api/data/browse
Local Development:
# Clone and setup
git clone https://github.com/chrisLI0212/Source-Data.git
cd Source-Data
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure
cp .env.example .env
# Edit .env with your API keys and set START_FETCHING=yes
# Run API server
python main.py
Access dashboard at: http://localhost:5000
GET /api/data/browse?provider=all&date=2025-12-07
Response:
{
"status": "success",
"data": {
"orats": {
"2025-12-07": [
{
"filename": "orats_live_20251207_120000.csv",
"timestamp": "2025-12-07T12:00:00",
"rows": 1250,
"size_bytes": 245678
}
]
},
"theta_options": {...},
"theta_market": {...}
}
}
Query Parameters:
- provider: all, orats, theta_options, theta_market
- date: YYYY-MM-DD format
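Given a response shaped like the example above, a client can total the rows available per provider. This sketch assumes every provider maps dates to lists of file entries carrying a rows field, as in the orats example; the theta_options and theta_market bodies are elided in this README, so their exact shape is an assumption. The function name is hypothetical.

```python
from typing import Any, Dict


def rows_per_provider(browse_response: Dict[str, Any]) -> Dict[str, int]:
    """Sum the "rows" field of every file entry, grouped by provider."""
    if browse_response.get("status") != "success":
        raise ValueError("browse request did not succeed")
    totals: Dict[str, int] = {}
    for provider, by_date in browse_response["data"].items():
        totals[provider] = sum(
            entry.get("rows", 0)
            for files in by_date.values()  # date -> list of file entries
            for entry in files
        )
    return totals
```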
GET /api/status
Response:
{
"status": "success",
"data": {
"running": true,
"start_time": "2025-12-07T08:00:00",
"error": null,
"fetching_enabled": true
}
}
POST /api/start # Start data collection (respects START_FETCHING)
POST /api/stop # Stop data collection
GET /api/status # Get current status
GET /api/data/latest?source=all&limit=100&format=json
GET /api/data/realtime # Formatted for Model-Hybrid
GET /health
GET /api/theta/status # Check Theta Terminal connection
For detailed endpoint documentation, see DATA_BROWSING_GUIDE.md.
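A standard-library client for the control and status endpoints could look like this. The helper names are hypothetical, and this README does not say whether POST /api/start and /api/stop require the dashboard password, so the sketch sends no credentials.

```python
import json
import urllib.request


def control_request(base_url: str, action: str) -> urllib.request.Request:
    """Build a POST request for /api/start or /api/stop."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/{action}", method="POST")


def get_status(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch /api/status and return its "data" object."""
    url = f"{base_url.rstrip('/')}/api/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]
```

To send a command, pass the request to urllib.request.urlopen, e.g. urlopen(control_request("http://localhost:5000", "start")).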
| Variable | Purpose | Values | Example |
|---|---|---|---|
| START_FETCHING | Enable/disable data collection | yes/no | START_FETCHING=yes |
| RUN_MODE | Run API or direct mode | api/direct | RUN_MODE=api |
| ORATS_API_KEY | ORATS authentication | API key | (required) |
| THETA_PUBLIC_URL | Theta Terminal external proxy URL | URL | https://theta-terminal-production.up.railway.app |
Network Architecture:
- Outward (Source-Data → Theta Terminal): Uses THETA_PUBLIC_URL (external HTTPS proxy)
- Inward (Model-Hybrid → Source-Data): Uses Railway's automatic internal service discovery
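The variables in the table can be loaded and validated at startup along these lines. The Settings dataclass and load_settings function are hypothetical; the defaults (RUN_MODE=api, START_FETCHING treated as no unless it is exactly yes) are assumptions based on the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    run_mode: str
    start_fetching: bool
    orats_api_key: str
    theta_public_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read and validate the documented environment variables."""
    env = os.environ if env is None else env
    run_mode = env.get("RUN_MODE", "api").strip().lower()
    if run_mode not in ("api", "direct"):
        raise ValueError(f"RUN_MODE must be 'api' or 'direct', got {run_mode!r}")
    api_key = env.get("ORATS_API_KEY", "")
    if not api_key:
        raise ValueError("ORATS_API_KEY is required")
    return Settings(
        run_mode=run_mode,
        start_fetching=env.get("START_FETCHING", "no").strip().lower() == "yes",
        orats_api_key=api_key,
        theta_public_url=env.get("THETA_PUBLIC_URL", "").rstrip("/"),
    )
```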
Theta Data Collector (Primary)
- Options: SPY, SPX, QQQ snapshots
- Market Indices: SPX, VIX, NDX
- Interval: 20 seconds
- Output: data/theta_live/{options,market}_YYYYMMDD_HHMMSS.csv
ORATS Collector (Secondary)
- Symbols: SPY, SPX, QQQ
- DTE Range: 0-1 days
- Interval: 20 seconds
- Output: data/orats_live/orats_live_YYYYMMDD_HHMMSS.csv
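Both collectors poll on a 20-second interval, and polling must stop as soon as collectors are stopped. One way to express that loop is with a threading.Event, whose wait() returns early when the event is set. The function name run_collector and the fetch_once callback are hypothetical stand-ins for the real code in collectors/.

```python
import threading
from typing import Callable


def run_collector(fetch_once: Callable[[], None], stop: threading.Event,
                  interval: float = 20.0) -> int:
    """Call fetch_once every `interval` seconds until `stop` is set; return the poll count."""
    polls = 0
    while not stop.is_set():
        try:
            fetch_once()
        except Exception as exc:  # keep polling after a failed request
            print(f"fetch failed: {exc}")
        polls += 1
        # Event.wait returns immediately once stop is set, so no further
        # API calls are made after the collectors are stopped.
        stop.wait(interval)
    return polls
```

In the service this would typically run in a background thread, with the stop endpoint calling stop.set().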
File Rotation: Automatic at 100MB per file
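Size-based rotation can be sketched as a writer that starts a new timestamped file once the current one reaches the limit. The class name is hypothetical; the naming follows the documented <prefix>_YYYYMMDD_HHMMSS.csv pattern, and reading 100MB as 100 * 1024 * 1024 bytes is an assumption.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional, Sequence

# Assumed interpretation of "100MB": 100 MiB.
MAX_BYTES = 100 * 1024 * 1024


class RotatingCsvWriter:
    """Hypothetical writer: appends rows to <prefix>_YYYYMMDD_HHMMSS.csv and
    starts a new file once the current one reaches max_bytes."""

    def __init__(self, directory: Path, prefix: str, header: Sequence[str],
                 max_bytes: int = MAX_BYTES) -> None:
        self.directory = directory
        self.prefix = prefix
        self.header = list(header)
        self.max_bytes = max_bytes
        self.path: Optional[Path] = None

    def write_rows(self, rows: Iterable[Sequence[str]],
                   now: Optional[datetime] = None) -> Path:
        now = now or datetime.now()
        self.directory.mkdir(parents=True, exist_ok=True)
        # Rotate when there is no current file or it has hit the size limit.
        if self.path is None or (
            self.path.exists() and self.path.stat().st_size >= self.max_bytes
        ):
            self.path = self.directory / f"{self.prefix}_{now:%Y%m%d_%H%M%S}.csv"
        write_header = not self.path.exists()
        with self.path.open("a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(self.header)
            writer.writerows(rows)
        return self.path
```

Because the size check runs before each write, a file can exceed the limit by at most one batch of rows.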
data/
├── orats_live/
│ ├── orats_live_20251207_120000.csv
│ ├── orats_live_20251207_130000.csv
│ └── orats_live_20251206_120000.csv
└── theta_live/
├── options_20251207_120000.csv
├── options_20251207_130000.csv
├── market_20251207_120000.csv
└── market_20251206_120000.csv
Access via /api/data/browse endpoint.
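A listing like this can be grouped by provider and date, similar to what /api/data/browse returns, by parsing the date out of each filename. The mapping of filename prefixes to the provider keys orats, theta_options and theta_market is an assumption inferred from the browse response, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the documented names, e.g. orats_live_20251207_120000.csv
FILENAME_RE = re.compile(r"^(orats_live|options|market)_(\d{8})_(\d{6})\.csv$")
PROVIDER = {"orats_live": "orats", "options": "theta_options", "market": "theta_market"}


def group_by_provider_and_date(filenames: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group CSV filenames as {provider: {YYYY-MM-DD: [filename, ...]}}."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        m = FILENAME_RE.match(name)
        if not m:
            continue  # ignore files that don't follow the naming scheme
        prefix, ymd, _ = m.groups()
        date = f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:]}"
        grouped[PROVIDER[prefix]][date].append(name)
    return {provider: dict(dates) for provider, dates in grouped.items()}
```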
Default mode - runs REST API with dashboard:
python main.py
# or
RUN_MODE=api python main.py
Features:
- Web dashboard at http://localhost:5000
- REST API for programmatic control
- Data discovery via /api/data/browse
- Collectors NOT running by default (use the API to start them)
Direct collector mode - collectors start immediately:
RUN_MODE=direct python main.py
Access at:
- Local: http://localhost:5000
- Railway: https://your-service.railway.app
Features:
- Start/stop collectors
- Monitor collection status
- Preview latest data
- Download CSV files
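The difference between the two run modes amounts to choosing an entrypoint from RUN_MODE. The sketch below is illustrative only: main.py's actual logic is not shown in this README, and choose_entrypoint, start_api and start_collectors are hypothetical names.

```python
import os
from typing import Callable, Mapping, Optional


def choose_entrypoint(start_api: Callable[[], None],
                      start_collectors: Callable[[], None],
                      env: Optional[Mapping[str, str]] = None) -> Callable[[], None]:
    """Pick what to run based on RUN_MODE (default: api)."""
    env = os.environ if env is None else env
    mode = env.get("RUN_MODE", "api").strip().lower()
    if mode == "api":
        return start_api  # dashboard + REST API; collectors wait for an explicit start
    if mode == "direct":
        return start_collectors  # collectors begin immediately
    raise ValueError(f"unknown RUN_MODE: {mode!r}")
```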
Check Status:
curl http://localhost:5000/api/status
Browse Data:
curl http://localhost:5000/api/data/browse
Get Latest Data:
curl "http://localhost:5000/api/data/latest?limit=100"
Start Collection:
curl -X POST http://localhost:5000/api/start
Stop Collection:
curl -X POST http://localhost:5000/api/stop
Example JavaScript integration:
// Check if data is available
async function checkData() {
  const response = await fetch('http://api.example.com/api/data/browse');
  if (!response.ok) throw new Error(`Browse request failed: ${response.status}`);
  const data = await response.json();
  return data.data; // Data organized by provider and date
}
// Get latest data for analysis
async function getLatestForAnalysis() {
  const response = await fetch('http://api.example.com/api/data/latest?format=csv');
  if (!response.ok) throw new Error(`Latest-data request failed: ${response.status}`);
  return await response.text(); // CSV text
}
- Check that START_FETCHING is set to yes
- Verify that /api/status shows fetching_enabled: true
- Check the Theta Terminal connection: /api/theta/status
- Review logs for errors
- Ensure collectors are running (/api/status)
- Wait 20+ seconds (one collection interval)
- Verify the data directories exist: data/orats_live/ and data/theta_live/
- Check file permissions
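Several of these checks can be automated against the JSON returned by GET /api/status. The function below is a hypothetical helper that takes the "data" object of that response, assumes the field names shown in the status example (running, error, fetching_enabled), and assumes the default data/ layout.

```python
from pathlib import Path
from typing import Dict, List


def diagnose(status: Dict, data_root: Path = Path("data")) -> List[str]:
    """Turn the /api/status "data" object into a list of likely problems."""
    problems = []
    if not status.get("running"):
        problems.append("collectors are not running: start them from the dashboard or POST /api/start")
    if not status.get("fetching_enabled"):
        problems.append("fetching_enabled is false: check START_FETCHING")
    if status.get("error"):
        problems.append(f"collector reported an error: {status['error']}")
    for sub in ("orats_live", "theta_live"):
        if not (data_root / sub).is_dir():
            problems.append(f"missing data directory: {data_root / sub}")
    return problems
```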
Theta Terminal:
curl http://localhost:5000/api/theta/status
ORATS:
- Verify ORATS_API_KEY is set in the environment variables
- Check API rate limits
- QUICK_START.md - Get started quickly
- DATA_BROWSING_GUIDE.md - Detailed API documentation
- DATA_DISCOVERY_AND_PASSWORD.md - Data access information
- API_INTERNAL_COMMUNICATION.md - API communication details
- CHANGELOG.md - Version history
Older deployment and implementation notes have been archived in docs/archive/ for reference.
Theta Data:
- Comprehensive real-time options data
- Market indices (SPX, VIX, NDX)
- Lower latency via internal networking
- Documentation: thetadata.net
ORATS:
- Detailed Greeks and analytics
- Specialized options analysis
- Proven reliability for short DTE
- Documentation: docs.orats.io
Model-Hybrid can use these endpoints:
// Check data availability
fetch('http://source-data.railway.internal:5000/api/data/browse')
// Get latest data for analysis
fetch('http://source-data.railway.internal:5000/api/data/latest')
// Check collection status
fetch('http://source-data.railway.internal:5000/api/status')
MIT License