AI-powered investment research platform with dual-provider agents (Anthropic + OpenAI), knowledge graph memory, 49 financial tools, and multi-source news/price/options data pipeline.
ArkScope combines RL-based trading strategies with LLM-powered analysis:
- Dual AI Agent CLI — Anthropic (Claude Opus 4.7) + OpenAI (GPT-5.4) with 49 tools, 4 skills, 4 subagents
- Analysis Pipeline — Structured 5-strategy pipeline (technical, fundamental, sentiment, risk, decision) with report generation
- Discord Bot — Slash commands, interactive buttons, free-chat analysis, model selection
- HTTP API — 25 RESTful endpoints (FastAPI + Swagger UI)
- News Pipeline — Multi-source collection (Polygon, Finnhub, IBKR) with LLM scoring
- Analysis Toolkit — Fundamentals (SEC EDGAR + Financial Datasets), options (IV/Greeks/chain), signals, web search
- RL Pipeline — PPO/CPPO agents with sentiment/risk-enhanced data, model registry, 3 agent tools
- Monitor System — Watchlist alerts (price, sentiment, signal, sector) with Discord notifications
- Self-hosted PostgreSQL — pgvector-enabled, Docker deployment, 8 migrations
# 1. Install dependencies
pip install -r requirements.txt
# 2. Configure API keys
cp config/.env.template config/.env
# Edit config/.env: OPENAI_API_KEY, ANTHROPIC_API_KEY, POLYGON_API_KEY, etc.
# 3. Start database
docker compose up -d
# 4. Launch AI Agent CLI
python -m src.agents
# 5. (Optional) Start HTTP API
python -m src.api
# 6. (Optional) Start Discord bot
python scripts/monitor_service.py --discord

python -m src.agents                                      # Default: Anthropic Opus 4.7
python -m src.agents --provider openai                    # Use GPT-5.4
python -m src.agents --model sonnet                       # Use Sonnet 4.6
python -m src.agents --thinking                           # Enable extended thinking
python -m src.agents --effort medium                      # Anthropic effort level
python -m src.agents --provider openai --reasoning xhigh  # GPT-5.4 max reasoning

| # | Provider | Model | Aliases | Context | Output | Features |
|---|---|---|---|---|---|---|
| 1 | Anthropic | Claude Opus 4.7 | opus, o47 | 1M | 128K | Effort, thinking, compaction |
| 2 | Anthropic | Claude Sonnet 4.6 | sonnet, s46 | 1M | 64K | Effort, thinking |
| 3 | OpenAI | GPT-5.4 | gpt5, 5.4 | 1M | 128K | Reasoning effort |
| 4 | OpenAI | GPT-5.4 Mini | mini, 5.4-mini | 400K | 128K | Fast + cost-efficient |
| 5 | OpenAI | GPT-5.4 Nano | nano, 5.4-nano | 400K | 128K | Fastest, cheapest |
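A minimal sketch of how the aliases above might resolve to canonical model IDs. The dictionary contents and the function name are illustrative assumptions, not the actual `config.py` API:

```python
# Hypothetical alias table; canonical IDs are illustrative, not the real config.
MODEL_ALIASES = {
    "opus": "claude-opus-4.7", "o47": "claude-opus-4.7",
    "sonnet": "claude-sonnet-4.6", "s46": "claude-sonnet-4.6",
    "gpt5": "gpt-5.4", "5.4": "gpt-5.4",
    "mini": "gpt-5.4-mini", "5.4-mini": "gpt-5.4-mini",
    "nano": "gpt-5.4-nano", "5.4-nano": "gpt-5.4-nano",
}

def resolve_model(name: str) -> str:
    """Map a user-typed alias (case-insensitive) to a canonical model ID."""
    key = name.strip().lower()
    return MODEL_ALIASES.get(key, key)  # unknown names pass through unchanged
```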
| Command | Alias | Description |
|---|---|---|
| /model [name] | /m | Show model picker / switch model |
| /code-model [name] | /cm | Set code generation model |
| /code-backend [name] | /cb | Set code generation backend (api/codex/claude) |
| /reasoning <level> | /r | Set OpenAI reasoning (none/minimal/low/medium/high/xhigh) |
| /effort <level> | /e | Set Anthropic effort (max/xhigh/high/medium/low) |
| /thinking | /t | Toggle extended thinking (Anthropic) |
| /context | /ctx | Toggle 1M context beta (Anthropic) |
| /compaction | /cmp | Toggle server-side compaction (Opus 4.7) |
| /skill <name> [args] | /sk | Run a skill workflow (e.g. /sk fa NVDA) |
| /subagent [name] [model] | /sa | View/change subagent models |
| /scratchpad | /pad | List recent scratchpad sessions |
| /history [N] | /h | Show current session Q&A history |
| /turns <n> | | Set max tool calls per query |
| /attach <path> [pages] | /at | Attach PDF/image/text to next query |
| /save [N\|N-M] ["title"] | /sv | Save session exchanges as report |
| /reports [ticker] | /rp | List/view saved research reports |
| /memory [save\|search\|delete] | /mem | Episodic memory (cross-session knowledge) |
| /alpha-picks [symbol\|refresh] | /ap | Seeking Alpha Alpha Picks portfolio & detail |
| /monitor | /mon | Scan watchlist for alerts |
| /analyze <TICKER> [depth] | /az | Run structured analysis pipeline (quick/standard/full) |
| /analyze-save <TICKER> [depth] | /asz | Run analysis pipeline and save report |
| /status | /s | Show session config |
| /help | | Show all commands |
| Category | Tool | Description |
|---|---|---|
| News | get_ticker_news | Recent articles for a ticker |
| | get_news_sentiment_summary | Aggregated sentiment statistics |
| | search_news_by_keyword | Keyword search across news |
| | get_news_brief | Lightweight news stats per ticker (scout phase) |
| | search_news_advanced | Multi-filter full-text search with DB-level FTS |
| Prices | get_ticker_prices | OHLCV bars (15min/1h/1d) |
| | get_price_change | Price change %, high/low range |
| | get_sector_performance | Sector-level average performance |
| Options | get_iv_analysis | IV rank, percentile, VRP, signal |
| | get_iv_history_data | Raw IV/HV history points |
| | scan_mispricing | Options mispricing vs theoretical |
| | calculate_greeks | BS2002 American option Greeks |
| | get_option_chain | Live option chain (IBKR), P/C ratio, max pain, IV term structure |
| | get_iv_skew_analysis | IV skew shape, 25d skew, gradient, term structure |
| Signals | detect_anomalies | Sentiment/volume anomaly detection |
| | detect_event_chains | Event sequence patterns |
| | synthesize_signal | Multi-factor trading signal |
| Analysis | get_fundamentals_analysis | Fundamentals with 3-tier fallback (IBKR → SEC EDGAR → Financial Datasets) |
| | get_detailed_financials | EV/EBITDA, ROIC, tech metrics (SEC cached + IBKR real-time) |
| | get_sec_filings | 10-K, 10-Q, 8-K metadata |
| | get_insider_trades | SEC Form 4 insider transactions |
| | get_analyst_consensus | Analyst recommendations, price targets |
| | get_peer_comparison | Cross-ticker valuation & growth ranking |
| | get_earnings_impact | Historical earnings-day moves, drift, surprise correlation |
| | get_watchlist_overview | Watchlist status overview |
| | get_morning_brief | Personalized morning briefing |
| | check_data_freshness | Health & staleness check for all data sources |
| Portfolio | get_portfolio_analysis | P&L, beta vs SPY, correlation matrix, HHI |
| | get_sa_alpha_picks | Seeking Alpha Alpha Picks portfolio (cached, auto-refresh) |
| | get_sa_pick_detail | Alpha Picks detail report for a specific pick |
| | refresh_sa_alpha_picks | Force refresh from SA website + sync tickers |
| | get_sa_articles | Search Alpha Picks articles by ticker/keyword/type |
| | get_sa_article_detail | Full article content + nested comment tree |
| | get_sa_market_news | SA market news headlines |
| Reports | save_report | Save research report (Markdown + DB) |
| | list_reports | List reports by ticker/type |
| | get_report | Retrieve report by ID |
| Memory | save_memory | Store analysis/insight for cross-session recall |
| | recall_memories | Search memories by keyword (full-text) |
| | list_memories | List recent memories by category |
| | delete_memory | Remove a memory entry |
| Web | tavily_search | AI-powered web search |
| | tavily_fetch | URL content extraction |
| | web_browse | Headless browser (Playwright) |
| | codex_web_research | Deep research via Codex CLI (OAuth, --search) |
| Code | execute_python_analysis | Python code execution with auto code gen |
| Monitor | scan_alerts | Scan watchlist for price/sentiment/signal/sector alerts |
| RL Models | get_rl_model_status | List trained PPO/CPPO models with backtest metrics |
| | get_rl_prediction | Model availability check (live inference pending Phase 2) |
| | get_rl_backtest_report | Detailed backtest report (Sharpe, Sortino, Calmar, CVaR) |
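Each of these tools is exposed to both providers, which expect different wire formats (Anthropic puts the JSON Schema under `input_schema`; OpenAI function tools use a flat `parameters` field). A sketch of such a dual-format export, assuming a simple internal spec dict; the spec layout and function names are illustrative, not the project's actual registry classes:

```python
# Sketch of dual-format tool export. The internal spec shape is an assumption.
def to_anthropic(spec: dict) -> dict:
    return {
        "name": spec["name"],
        "description": spec["description"],
        "input_schema": spec["parameters"],  # Anthropic: schema under input_schema
    }

def to_openai(spec: dict) -> dict:
    return {
        "type": "function",
        "name": spec["name"],
        "description": spec["description"],
        "parameters": spec["parameters"],    # OpenAI: flat function-tool schema
    }

spec = {
    "name": "get_ticker_news",
    "description": "Recent articles for a ticker",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}
```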
Goal-oriented prompt templates that orchestrate multi-tool analysis:
| Skill | Aliases | Usage | Description |
|---|---|---|---|
| full_analysis | fa, analyze | /sk fa NVDA | Comprehensive entry analysis with adversarial check |
| portfolio_scan | scan, ps | /sk scan | Watchlist screening with drill-down on movers |
| earnings_prep | ep, earnings | /sk ep TSLA | Pre-earnings risk/reward assessment |
| sector_rotation | sr, sectors | /sk sr | Cross-sector relative strength analysis |
Custom skills can be added via YAML files in config/skills/.
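A hypothetical skill file, sketched under an assumed schema; the actual fields are defined by `shared/skills.py` and may differ:

```yaml
# config/skills/dividend_check.yaml — illustrative example, not a shipped skill.
name: dividend_check
aliases: [dc]
usage: "/sk dc <TICKER>"
description: Dividend safety screen
prompt: |
  For {ticker}: pull fundamentals and recent news, assess payout ratio,
  free-cash-flow coverage, and dividend-cut risk. End with a 1-5 safety score.
```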
Specialized agents delegated for specific tasks:
| Subagent | Default Model | Purpose |
|---|---|---|
| code_analyst | GPT-5.4 | Quantitative Python analysis, calculations |
| deep_researcher | GPT-5.4 | Multi-source investigation across 14 tools |
| data_summarizer | Sonnet 4.6 | Fast data retrieval and summarization |
| reviewer | Opus 4.7 | Critical analysis review (adversarial) |
Interactive Discord gateway for trading analysis and alerts.
python scripts/monitor_service.py --discord

- 8 Slash Commands — /analyze, /watchlist, /price, /news, /options, /fundamentals, /model, /status
- Interactive Buttons — Quick drill-down on analysis results
- Free Chat — @mention or dedicated #agent-channel for natural language queries
- Model Selection — /model, /effort, /reasoning with per-session state
- Alert Routing — Severity-based color-coded embeds (critical/warning/info)
- Admin Controls — Permission-gated commands via manage_guild
4 watchers scan your watchlist on a configurable schedule (default 5 min):
| Watcher | Trigger | Description |
|---|---|---|
| PriceWatcher | >3% move | Intraday price alert |
| SentimentWatcher | Avg <2.5 or >4.0 | News sentiment shift |
| SignalWatcher | Anomaly detected | Sentiment/volume anomaly |
| SectorWatcher | Sector >2% divergence | Sector rotation signal |
Alerts are deduplicated (30-min cooldown + value threshold) and routed to Discord/console/log.
python -m src.api
# Server: http://localhost:8420
# Swagger UI: http://localhost:8420/docs

Example Requests:
# News
curl "http://localhost:8420/news/NVDA?days=7"
curl "http://localhost:8420/news/NVDA/sentiment?days=30"
# Prices
curl "http://localhost:8420/prices/AMD?interval=15min&days=30"
# Options
curl "http://localhost:8420/options/PLTR"
# Fundamentals
curl "http://localhost:8420/fundamentals/AAPL"
# AI Agent query
curl -X POST "http://localhost:8420/query" \
-H "Content-Type: application/json" \
  -d '{"question": "Compare AMD and NVDA recent performance", "provider": "anthropic"}'

# Start with default port (15432)
docker compose up -d

# Custom port
POSTGRES_PORT=25432 docker compose up -d

Default connection: postgresql://postgres:mindfulrl_dev_2026@localhost:15432/mindfulrl
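The connection string is a standard postgresql:// URL, so its parts can be checked with the stdlib alone, no database driver needed:

```python
from urllib.parse import urlparse

# Split the default connection string from config/.env into its components.
url = "postgresql://postgres:mindfulrl_dev_2026@localhost:15432/mindfulrl"
parts = urlparse(url)

user, password = parts.username, parts.password
host, port = parts.hostname, parts.port
dbname = parts.path.lstrip("/")
```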
Edit config/.env:
DATABASE_URL=postgresql://postgres:mindfulrl_dev_2026@localhost:15432/mindfulrl

Applied automatically on first Docker startup, or manually:
-- sql/001_init_schema.sql — Core tables (news, prices, iv_history, fundamentals, signals, agent_queries)
-- sql/002_add_news_scores.sql — Multi-model news scoring
-- sql/003_add_reports.sql — Research reports
-- sql/004_add_memories.sql — Episodic memory (full-text search, GIN + tsvector)
-- sql/005_add_financial_cache.sql — Financial data cache (paid API responses)
-- sql/006_add_news_search.sql — Full-text search on news (GIN index) + pgvector embedding column
-- sql/007_add_sa_alpha_picks.sql — Seeking Alpha Alpha Picks portfolio + refresh metadata
-- sql/008_add_sa_articles.sql — SA articles + comments tables + canonical_article_id

python scripts/migrate_to_supabase.py            # Import all data
python scripts/migrate_to_supabase.py --prices   # Prices only
python scripts/migrate_to_supabase.py --news     # News only
python scripts/migrate_to_supabase.py --dry-run  # Count only

python scripts/collection/daily_update.py --status         # Check data status
python scripts/collection/daily_update.py --news           # Update all news
python scripts/collection/daily_update.py --all --sync-db  # Everything + DB sync

python scripts/collection/collect_polygon_news.py --incremental  # Polygon (3+ years)
python scripts/collection/collect_finnhub_news.py --incremental  # Finnhub (7 days)
python scripts/collection/collect_ibkr_news.py --incremental     # IBKR (requires TWS)
python scripts/collection/collect_ibkr_prices.py --incremental   # IBKR prices
python scripts/collection/collect_iv_history.py                  # IV history

| Model | Usage |
|---|---|
| gpt-5.4 | Agent reasoning, sentiment/risk scoring (primary) |
| gpt-5.4-mini | Fast scoring, cost-efficient tasks |
| gpt-5.4-nano | Simple tasks, highest throughput |
Strategy: Use latest reasoning models only. Upgrade path: gpt-5 → gpt-5.1 → gpt-5.2 → gpt-5.4 → ...
python scripts/scoring/score_sentiment_openai.py \
  --input data.csv --output sentiment_scored.csv \
  --model gpt-5.4 --reasoning-effort high \
  --chunk-size 5000 --retry 3 --verbose

python scripts/scoring/score_risk_openai.py \
  --input data.csv --output risk_scored.csv \
  --model gpt-5.4 --reasoning-effort high \
  --chunk-size 5000 --retry 3

python scripts/scoring/score_ibkr_news.py --continue-from  # Score unscored articles
python scripts/scoring/score_ibkr_news.py --rescore        # Re-score all

- --reasoning-effort: minimal, low, medium, high (gpt-5.x)
- --chunk-size: Rows per batch (default 5000, auto-resume on interrupt)
- --allow-flex: Flex mode for 50% cost savings (longer latency)
- --daily-token-limit: Auto-stop after budget (resume next day)
# Basic: merge price + sentiment + risk
python training/data_prep/prepare_training_data.py \
--price-data data/intraday.csv \
--sentiment sentiment_scored.csv \
--risk risk_scored.csv \
--output-dir data/rl_ready/
# With derived features (MA, momentum, volatility)
python training/data_prep/prepare_training_data.py \
--price-data data/intraday.csv \
--sentiment sentiment_scored.csv \
--risk risk_scored.csv \
--output-dir data/rl_ready/ \
  --features                             # All defaults
# --features sentiment_7d_ma risk_7d_ma  # Or specific features

Features are Z-score standardized; the scaler is saved as feature_scaler_{tag}.json alongside the CSV.
# PPO with sentiment signals
python training/train_ppo_llm.py --data data/rl_ready/train.csv --epochs 100 --seed 42
# CPPO with sentiment + risk signals
python training/train_cppo_llm_risk.py --data data/rl_ready/train.csv --epochs 100 --seed 42
# On-the-fly feature engineering (skips prepare step)
python training/train_ppo_llm.py --data raw.csv --epochs 50 --features

Models are saved to trained_models/<model_id>/ with metadata and scaler automatically registered.
# By model ID (auto-loads metadata + features)
python training/backtest.py --data data/rl_ready/trade.csv --model-id latest
# By specific model ID
python training/backtest.py --data trade.csv --model-id ppo_claude_100ep_s42_20260301T120000Z_abc123
# By model path (derives model_id from directory)
python training/backtest.py --data trade.csv --model trained_models/xxx/model.pth

Outputs: Sharpe, Sortino, Calmar, max drawdown, CVaR 95%, win rate, daily returns CSV, equity curve PNG.
Results are appended to backtest_runs[] in the model registry for traceability.
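All of these headline metrics can be computed from the daily-returns series alone. A sketch assuming 252 trading days per year and a zero risk-free rate (the actual `backtest.py` conventions may differ):

```python
import math

# Illustrative metric computation from a list of daily returns.
def backtest_metrics(daily_returns: list[float]) -> dict[str, float]:
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / n) or 1e-12
    downside = [r for r in daily_returns if r < 0]
    dstd = math.sqrt(sum(r * r for r in downside) / n) or 1e-12

    # Equity curve and max drawdown
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)

    # CVaR 95%: mean of the worst 5% of days
    worst = sorted(daily_returns)[: max(1, n // 20)]
    ann_ret = mean * 252
    return {
        "sharpe": mean / std * math.sqrt(252),
        "sortino": mean / dstd * math.sqrt(252),
        "calmar": ann_ret / max_dd if max_dd > 0 else float("inf"),
        "max_drawdown": max_dd,
        "cvar_95": sum(worst) / len(worst),
        "win_rate": sum(r > 0 for r in daily_returns) / n,
    }
```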
trained_models/
├── registry.json
├── ppo_claude_100ep_s42_.../
│ ├── model.pth
│ ├── metadata.json
│ ├── feature_scaler.json # If --features was used
│ ├── daily_returns.csv # Backtest artifact
│ ├── actions_log.csv # Backtest artifact
│ └── equity_curve.png # Backtest artifact
RL models are exposed to the agent via 3 tools (config-guarded, default off):
# config/user_profile.yaml
rl_pipeline:
  enabled: false  # Set true when trained models exist
  models_dir: "trained_models"

- get_rl_model_status — List all models with backtest metrics
- get_rl_backtest_report — Detailed backtest report (Sharpe, Sortino, Calmar, CVaR, etc.)
- get_rl_prediction — Model availability check (live inference pending Phase 2)
Structured 5-strategy analysis pipeline that runs independently from the agent conversation path.
# config/user_profile.yaml
analysis_pipeline:
  enabled: true  # Enables /analyze and /analyze-save commands

/analyze NVDA quick     # Fast scan (fewer news items, price only)
/analyze NVDA standard # Balanced (fundamentals + sentiment + risk)
/analyze NVDA full # Deep analysis (all data sources)
/analyze-save NVDA quick  # Run + save report to data/reports/

curl -X POST http://localhost:8420/analysis/run \
  -H "Content-Type: application/json" \
  -d '{"ticker": "NVDA", "depth": "standard", "persist": true}'

| Stage | Inputs | Output |
|---|---|---|
| Technical | Price, MA5/10/20, volatility | Trend signal, score 0-100 |
| Fundamental | Revenue growth, margins, D/E, PE | Quality rating, score 0-100 |
| Sentiment | News sentiment scores | Sentiment regime, score 0-100 |
| Risk | Volatility, beta, max drawdown | Risk level, score 0-100 |
| Decision | Weighted aggregate of above | Buy/Hold/Sell + confidence |
Reports are saved via the existing save_report() system (Markdown files + DB metadata).
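The Decision stage's weighted aggregate can be sketched as follows. The weights and Buy/Hold/Sell cutoffs here are illustrative assumptions, not the pipeline's actual values:

```python
# Hypothetical weights over the four upstream strategy scores (0-100 each).
WEIGHTS = {"technical": 0.25, "fundamental": 0.30, "sentiment": 0.20, "risk": 0.25}

def decide(scores: dict[str, float]) -> tuple[str, float]:
    """Combine per-strategy scores into Buy/Hold/Sell plus a confidence value."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    if total >= 65:
        action = "Buy"
    elif total >= 45:
        action = "Hold"
    else:
        action = "Sell"
    confidence = abs(total - 50) / 50  # distance from the neutral midpoint
    return action, round(confidence, 2)
```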
Reads the Alpha Picks portfolio via a Chrome Extension + Native Messaging architecture. Requires SA Premium + Alpha Picks subscription ($199/yr). Disabled by default.
1. Install Chrome extension
   - chrome://extensions → enable "Developer mode"
   - "Load unpacked" → select extensions/sa_alpha_picks/
   - Copy the displayed Extension ID
2. Register Native Messaging host
   - bash extensions/sa_alpha_picks/install.sh  # Paste your Extension ID when prompted
3. Enable in config
   - In config/user_profile.yaml, set seeking_alpha: enabled: true
4. First refresh: Click the SA Alpha Picks extension icon in Chrome → "Refresh All"
The extension runs in your real Chrome browser (zero anti-bot detection risk). Data flows: Extension DOM scrape → Native Messaging → Python DAL → DB + file cache. Session validity checked per scrape (URL redirect + table selector + paywall marker).
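The Native Messaging hop in that flow uses Chrome's standard framing: each JSON message is prefixed with a 4-byte native-byte-order length. Minimal encode/decode helpers (the project's actual host script may structure this differently):

```python
import json
import struct

# Chrome Native Messaging frames: 4-byte native-endian length + UTF-8 JSON.
def encode_message(obj: dict) -> bytes:
    payload = json.dumps(obj).encode("utf-8")
    return struct.pack("=I", len(payload)) + payload

def decode_message(frame: bytes) -> dict:
    (length,) = struct.unpack("=I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))
```

In a real host the frames are read from stdin and written to stdout, which is how the Python DAL receives the extension's DOM-scraped data.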
/ap # Current picks table (cached, auto-refresh if >24h stale)
/ap closed # Closed positions
/ap all # Both current + closed
/ap NVDA # Detail report for a specific pick
/ap NVDA 2025-06-15 # Specific pick date (disambiguates re-picks)
/ap refresh        # Force refresh + sync tickers to collection watchlist

| Module | Description |
|---|---|
| cli.py | Interactive CLI (21 slash commands, prompt caching, token tracking) |
| config.py | Model configuration, defaults, aliases |
| anthropic_agent/agent.py | Anthropic messages loop (streaming, thinking, effort) |
| openai_agent/agent.py | OpenAI Agents SDK wrapper (Responses API) |
| shared/prompts.py | System prompts with dynamic sections (freshness, RL status) |
| shared/skills.py | Skill registry + custom YAML loading |
| shared/subagent.py | Subagent registry + dispatch |
| shared/token_tracker.py | Per-turn token + cache tracking |
| shared/context_manager.py | Context compaction for long sessions (L1 client-side) |
| shared/scratchpad.py | JSONL session logging (10 event types) |
| shared/attachments.py | PDF/image/text file attachment processing |
| shared/security.py | Tool result wrapping for input safety |
| shared/model_catalog.py | Shared model catalog (CLI + Discord bot) |
| shared/events.py | Event types for async streaming |
| Module | Description |
|---|---|
| registry.py | ToolRegistry (49 tools, dual-format for Anthropic + OpenAI) |
| data_access.py | DataAccessLayer with backend abstraction |
| backends/file_backend.py | Parquet file backend |
| backends/db_backend.py | PostgreSQL backend (psycopg3) + query_health_stats() |
| news_tools.py, price_tools.py, sa_tools.py, etc. | Individual tool implementations |
| report_tools.py | Research report save/list/get |
| memory_tools.py | Episodic memory CRUD + full-text search |
| web_tools.py | Tavily search + Playwright browser + Codex deep research |
| code_tools.py | Python code execution + auto code gen |
| freshness.py | FreshnessRegistry singleton + data source health |
| rl_tools.py | RL model status, prediction, backtest report |
| Module | Description |
|---|---|
| contracts.py | Data contracts (AnalysisRequest, AnalysisContext, StrategyResult, RenderedReport) |
| pipeline.py | Sequential strategy execution and artifact aggregation |
| context_builder.py | DAL adapter — assembles AnalysisContext from prices, fundamentals, news |
| strategies/ | 5 strategies: technical, fundamental, sentiment, risk, decision |
| integrity.py | Field validation and placeholder fill for degraded inputs |
| renderer.py | Template-based Markdown/HTML report rendering |
| service.py | Public API: run_analysis_request(), save_analysis_run() |
| factory.py | Default pipeline assembly |
| scheduler_hooks.py | Batch/scheduled path adapter |
| templates/ | Report templates (report_markdown.tpl, report_html.tpl) |
| Module | Description |
|---|---|
| discord_bot.py | Discord gateway (slash commands, buttons, free chat) |
| engine.py | MonitorEngine orchestrates watchers |
| watchers.py | 4 watchers (Price, Sentiment, Signal, Sector) |
| scheduler.py | MonitorScheduler (asyncio, configurable interval) |
| notifiers.py | Console, Log, Discord notifiers |
| dedup.py | AlertDeduplicator (cooldown + value threshold) |
| Module | Description |
|---|---|
| train_ppo_llm.py | PPO training with MPI, --features support, auto-registry |
| train_cppo_llm_risk.py | CPPO training (sentiment + risk), --features support |
| backtest.py | Full backtest metrics, artifact saving, registry integration |
| train_utils.py | Shared utilities (model ID, artifact saving, feature detection) |
| data_prep/prepare_training_data.py | Merge + split + optional feature engineering |
| data_prep/feature_engineering.py | Derived features (MA, momentum, volatility) + FeatureScaler |
| envs/stocktrading_llm.py | PPO trading env with extra_feature_cols + state invariants |
| envs/stocktrading_llm_risk.py | CPPO trading env with risk tail invariant |
| model_registry.py | ModelMetadata + ModelRegistry (JSON file-based, backtest_runs) |
| UPSTREAM.md | Lineage documentation for FinRL_DeepSeek fork |
| Source | Data | Tier |
|---|---|---|
| Finnhub | News, quotes, company profiles, analyst consensus | Free |
| Tiingo | Historical stock prices (30+ years) | Free |
| SEC EDGAR | XBRL financial data (income, balance, cashflow) | Free |
| Financial Datasets | Structured financials (Q4, TTM, segmented) | PAYG $0.01/req |
| Polygon | News (3+ years), reference data | Free/Paid |
| IBKR | Real-time news, intraday prices, options | Requires TWS |
| Seeking Alpha | Alpha Picks portfolio & analysis reports | Premium + Alpha Picks ($199/yr) |
| File | Description |
|---|---|
| .env | API keys (from .env.template) |
| user_profile.yaml | 14 sections: watchlists, strategy, models, alerts, RL pipeline, analysis pipeline, Seeking Alpha, etc. |
| sectors.yaml | Sector definitions and ticker mappings |
| tickers_core.json | Core ticker list (Tier 1/2/3) |
| skills/*.yaml | Custom skill definitions |
The system has 6 web search capabilities across 3 layers:
| # | Tool | Type | Config Key | Default | Cost | Notes |
|---|---|---|---|---|---|---|
| 1 | tavily_search | Agent tool | web_search.tavily | ON | Free 1000/mo | AI-summarized results |
| 2 | tavily_fetch | Agent tool | web_search.tavily | ON | (same quota) | URL content extraction |
| 3 | web_browse | Agent tool | web_search.playwright | ON | Free (local) | Headless browser, JS pages |
| 4 | codex_web_research | Agent tool | web_search.codex_research | ON | OAuth quota | Deep research via Codex CLI |
| 5 | Claude web_search | Server tool | web_search.claude_search | OFF | $10/1K | Anthropic agent only |
| 6 | OpenAI WebSearchTool | SDK built-in | web_search.openai_search | ON | Included | OpenAI agent only |
Tools 1-4 are registered in the tool registry (available to both agents). Tools 5-6 are SDK server-side tools (only active in their respective agent).
# Tavily (tools 1-2): set API key in config/.env
TAVILY_API_KEY=tvly-...
# Playwright (tool 3): install browsers
playwright install chromium
# Codex CLI (tool 4): install + OAuth login (uses subscription quota, not API billing)
npm install -g @openai/codex
codex login
# Claude web search (tool 5): no setup, just enable in config
# OpenAI web search (tool 6): no setup, uses existing OPENAI_API_KEY

web_search:
  tavily: true
  playwright: true
  codex_research: true
  claude_search: false         # $10/1K searches, enable when needed
  claude_search_max_uses: 5    # per-conversation limit
  openai_search: true

| Scenario | Recommended Tool |
|---|---|
| Quick fact check, latest news | tavily_search |
| Read specific URL content | tavily_fetch or web_browse |
| JS-heavy page, interactive site | web_browse |
| Deep investigation (earnings, events, trends) | codex_web_research |
| Agent auto-decides during analysis | Claude/OpenAI native search |
python scripts/analysis/validate_scoring_value.py --file scored_data.csv --score-col sentiment_gpt_5
python scripts/analysis/sentiment_backtest.py --file scored_data.csv --score-col sentiment_gpt_5

streamlit run scripts/visualization/news_dashboard.py

python scripts/visualization/fundamentals_query.py
> AAPL # Single stock
> AAPL MSFT GOOGL # Compare multiple
> top roe # ROE ranking
> pe<20 roe>15        # Filter by conditions

python scripts/comparison/compare_scores.py --files a.csv,b.csv --column sentiment_gpt_5
python scripts/comparison/ab_score_comparison.py --file-a a.csv --file-b b.csv

pytest tests/                                # All tests
pytest tests/test_agents.py -v # Agent tests
pytest tests/test_subagent.py -v # Subagent tests
pytest tests/test_tools.py -v # Tool tests
pytest tests/test_skills.py -v # Skills tests
pytest tests/test_api.py -v # API tests
pytest tests/test_rl_tools.py -v # RL pipeline agent tools
pytest tests/test_feature_engineering.py -v # Feature engineering + scaler
pytest tests/test_env_extra_features.py -v # Env state vector invariants
pytest tests/test_train_utils.py -v # Training utilities
pytest tests/test_backtest_enhanced.py -v # Backtest metrics + artifacts
pytest tests/test_integration_pipeline.py -v # E2E features→train→backtest
pytest tests/test_analysis_pipeline.py -v # Analysis pipeline tests
pytest tests/test_monitor.py -v # Monitor tests
pytest tests/ --cov=src --cov-report=html    # With coverage

uvicorn src.api.app:create_app --factory --reload --port 8420

| Category | Files |
|---|---|
| Architecture | docs/design/MINDFULRL_ARCHITECTURE.md, SERVICE_ARCHITECTURE.md, DATA_STORAGE_ACCESS.md |
| Agent Evolution | docs/design/AGENT_EVOLUTION_TRACKER.md (detailed changelog, Phase 0-15 + A-F) |
| RL Pipeline | docs/design/RL_PIPELINE_DESIGN.md (end-to-end integration design) |
| Data | docs/data/DATA_SUBSCRIPTION_GUIDE.md, OPTIONS_FLOW_GUIDE.md, OPTIONS_PRICING_THEORY.md |
| Analysis | docs/analysis/SCORING_VALIDATION_METHODOLOGY.md |
| Scripts | scripts/scoring/README.md, scripts/visualization/README.md |
| Training | training/UPSTREAM.md (FinRL_DeepSeek lineage) |
We open-sourced our multi-LLM financial news scoring dataset on HuggingFace:
HYL/NASDAQ-News-Multi-LLM-Scores
127,176 NASDAQ news articles (from FNSPID) re-scored by 11 LLMs for sentiment and risk:
- Claude Opus 4.5, Sonnet 4.5, Haiku 4.5 (Anthropic)
- GPT-5, GPT-5-mini, GPT-5.4-nano, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano (OpenAI)
- o3, o4-mini (OpenAI reasoning models)
Includes 60 score columns, 26 summary variants (GPT-5 / GPT-5-mini at 4 reasoning × 3 verbosity levels), and cross-model analysis.
Built upon FinRL-DeepSeek (arXiv:2502.07393).
MIT License - see LICENSE file for details.