Production-oriented, modular stock screener engine for Indian equity markets. Generates two signal families from public market and fundamental data:
- Long-term investment candidates — quality + value + governance composite
- Short-term swing trade candidates — trend + momentum + event catalyst composite
The core research and scoring engine is broker-agnostic by design. Zerodha and ICICI Breeze adapters exist as optional modules and are disabled by default.

| Entity | Update frequency | Fields |
|---|---|---|
| `MarketSnapshot` | Daily / intraday | OHLCV, delivery ratio, market cap |
| `FundamentalsSnapshot` | Quarterly | PE, ROE, D/E, FCF margin, growth rates |
| `GovernanceSnapshot` | Quarterly | Promoter holding, insider scores, audit opinion |
| `StockSnapshot` | Convenience | Unified flattened type for demos / legacy code |

- Sector-relative + rolling valuation normalization (PE/PB z-score context)
- Explicit `earnings_stability` and `leverage_trend` risk features
- 26 named features organised in independent category methods
- Each category is pure-function: no IO, independently testable
- Feature constants live in `feature_specs.py`; no magic strings
- Backward-compatible `compute_from_snapshot()` wraps `StockSnapshot` for demos
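
A minimal sketch of the sector-relative z-score idea listed above. The function name `sector_relative_zscores`, its dict inputs, and the choice of population standard deviation are assumptions for illustration; the engine's own normalizers may work differently.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict


def sector_relative_zscores(
    pe_by_symbol: Dict[str, float], sector_by_symbol: Dict[str, str]
) -> Dict[str, float]:
    """Z-score each symbol's PE against the peers in its own sector."""
    peers_by_sector = defaultdict(list)
    for symbol, pe in pe_by_symbol.items():
        peers_by_sector[sector_by_symbol[symbol]].append(pe)

    zscores = {}
    for symbol, pe in pe_by_symbol.items():
        peers = peers_by_sector[sector_by_symbol[symbol]]
        spread = pstdev(peers) if len(peers) > 1 else 0.0
        # Assumption: a sector with no dispersion maps to a neutral 0.0.
        zscores[symbol] = 0.0 if spread == 0 else (pe - mean(peers)) / spread
    return zscores


# Hypothetical symbols and PE values, not real market data.
z = sector_relative_zscores(
    {"A": 10.0, "B": 20.0, "C": 30.0, "D": 15.0},
    {"A": "IT", "B": "IT", "C": "IT", "D": "Banking"},
)
```

Here `B` sits at its sector mean and `D` has no peers, so both come out neutral, while `A` and `C` are symmetric around zero.
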
- Modular layer (new):
  - `core/scoring_base.py`
  - `core/scoring_long_term.py`
  - `core/scoring_swing.py`
  - `core/scoring_risk.py`
  - `core/scoring_ranking.py`
  - `core/signal_generator.py`
  - `core/signal_schemas.py`
  - `core/explainability_engine.py`
  - `core/feature_access.py`, `core/normalizers.py`, `core/validators.py`
- Compatibility facade (existing imports still work):
  - `core/scoring.py`
  - `core/signals.py`
  - `core/explainability.py`
- All weights and thresholds are configurable from YAML.
- Regime-aware configurable profile switching (`bull`/`bear`/`sideways`).
- Optional calibration-driven prior auto-tuning from IC/decay/turnover diagnostics.

```yaml
# defaults.yaml, scoring section
scoring:
  long_term_min_score: 24.0
  swing_min_score: 28.0
  long_term_weights:
    growth_quality: 18.0
    profitability_quality: 17.0
    # ...
  swing_weights:
    trend_strength: 20.0
    momentum_strength: 18.0
    # ...
  risk_weights:
    liquidity_risk: 0.20
    volatility_risk: 0.20
    leverage_risk: 0.20
    earnings_instability_risk: 0.15
    event_uncertainty_risk: 0.15
    governance_risk: 0.10
  ranking:
    top_k_long_term: 25
    top_k_swing: 25
```

- Every `SignalResult` carries a `SignalExplanation` with:
  - `top_positive_drivers` (human-readable labels, e.g. `"growth quality: 14.40"`)
  - `top_negative_drivers` (risk-prefix components)
  - `holding_horizon`, `entry_logic`, `invalidation_logic`
  - `risk_flags` list

| Interface | Purpose |
|---|---|
| `MarketDataProvider` | Daily OHLCV + unified snapshot |
| `FinancialsProvider` | Fundamentals + governance snapshots |
| `FilingsProvider` | BSE/NSE regulatory filings |
| `NewsProvider` | Financial news + sentiment |
| `ExchangeAdapter` | Corporate actions + announcements |
| `TextEventProvider` | Generic text events + sentiment |
| `BrokerAdapter` | Optional execution layer |

```text
data_sources/
  market/
    mock_market_data.py     # MockIndianMarketDataProvider
    mock_fundamentals.py    # MockFinancialsProvider
  filings/
    mock_filings.py         # MockFilingsProvider
  news/
    mock_news.py            # MockNewsProvider
  exchange/
    nse_adapter.py          # NSEExchangeAdapter (implements ExchangeAdapter)
  broker/
    zerodha_adapter.py      # disabled by default
    breeze_adapter.py       # disabled by default
```

- `ScorerProtocol`: a structural `typing.Protocol`. Any object with a matching `.score(fv) -> (float, dict)` method satisfies it without explicit inheritance, so ML models, ensembles, and rule-based scorers are interchangeable.
- `LongTermModel.with_weights(LongTermWeights(...))`: convenience factory
- `SwingModel.with_weights(SwingWeights(...))`: convenience factory

- `LocalFileStorage`: raw/clean/features/signals CSV+JSON under configurable `root_dir`
- `SQLiteStore`: features, scores, signals tables with proper primary keys:
  - `features (symbol, as_of)`: composite PK, upsert safe
  - `scores (symbol, as_of)`: composite PK, upsert safe
  - `signals (symbol, category, run_date)`: composite PK, no duplicate rows on re-run
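
The re-run safety that composite keys provide can be illustrated with SQLite's upsert clause. This is only a sketch: the `score` column and the `upsert_signal` helper are assumptions, not the actual `SQLiteStore` schema or API, and `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3


def upsert_signal(conn, symbol, category, run_date, score):
    """Insert a signal row, or overwrite the score if the key already exists."""
    conn.execute(
        """
        INSERT INTO signals (symbol, category, run_date, score)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(symbol, category, run_date)
        DO UPDATE SET score = excluded.score
        """,
        (symbol, category, run_date, score),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE signals (
        symbol   TEXT NOT NULL,
        category TEXT NOT NULL,
        run_date TEXT NOT NULL,
        score    REAL,
        PRIMARY KEY (symbol, category, run_date)
    )
    """
)

# Running the same day twice leaves a single row holding the latest score.
upsert_signal(conn, "INFY", "long_term", "2024-01-31", 61.5)
upsert_signal(conn, "INFY", "long_term", "2024-01-31", 63.0)
rows = conn.execute(
    "SELECT symbol, category, run_date, score FROM signals"
).fetchall()
```
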
- Deterministic post-ranking adapter (`execution/portfolio_adapter.py`)
- Constraint layer: max positions, sector caps, liquidity floor, single-name cap
- Outputs target shares/weights and rejection reasons for dropped candidates
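
A rough sketch of how such a constraint layer might behave, assuming greedy selection in score order and equal weights capped at the single-name limit. `Candidate`, `select_positions`, and the rejection strings are hypothetical and are not taken from `execution/portfolio_adapter.py`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    symbol: str
    sector: str
    score: float
    avg_daily_volume: float


def select_positions(
    candidates: List[Candidate],
    max_positions: int,
    max_sector_positions: int,
    min_avg_daily_volume: float,
    max_single_position_weight: float,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return (target weights, rejection reasons) for ranked candidates."""
    selected: List[Candidate] = []
    rejected: Dict[str, str] = {}
    sector_counts: Dict[str, int] = {}
    # Sort by score, then symbol, so ties resolve deterministically.
    for c in sorted(candidates, key=lambda c: (-c.score, c.symbol)):
        if c.avg_daily_volume < min_avg_daily_volume:
            rejected[c.symbol] = "below liquidity floor"
        elif len(selected) >= max_positions:
            rejected[c.symbol] = "max positions reached"
        elif sector_counts.get(c.sector, 0) >= max_sector_positions:
            rejected[c.symbol] = "sector cap reached"
        else:
            selected.append(c)
            sector_counts[c.sector] = sector_counts.get(c.sector, 0) + 1
    # Equal weight, capped per name; any remainder stays in cash.
    weight = min(1.0 / len(selected), max_single_position_weight) if selected else 0.0
    return {c.symbol: weight for c in selected}, rejected


# Hypothetical candidates, not real market data.
weights, rejected = select_positions(
    [
        Candidate("A", "IT", 90.0, 5_000_000),
        Candidate("B", "IT", 85.0, 5_000_000),
        Candidate("C", "IT", 80.0, 5_000_000),
        Candidate("D", "Banking", 75.0, 500_000),
        Candidate("E", "Pharma", 70.0, 2_000_000),
    ],
    max_positions=3,
    max_sector_positions=2,
    min_avg_daily_volume=1_000_000,
    max_single_position_weight=0.12,
)
```

In this run `C` is dropped by the sector cap and `D` by the liquidity floor, and each surviving name is held at the 0.12 cap.
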
Example portfolio config with separate long/swing overrides:

```yaml
scoring:
  ranking:
    portfolio:
      enabled: true
      max_positions_long: 12
      max_positions_swing: 10
      max_sector_positions: 3
      min_avg_daily_volume: 1000000
      max_single_position_weight: 0.12
      capital_base: 1000000
      # Shared defaults used if strategy-specific overrides are not set
      min_position_notional: 25000
      sector_target_weights:
        IT: 0.35
        Banking: 0.25
        Pharma: 0.20
        Energy: 0.20
      sector_target_tolerance: 0.05
      # Long-term specific overrides
      long_min_position_notional: 40000
      long_sector_target_weights:
        IT: 0.40
        Banking: 0.20
        Pharma: 0.20
        Energy: 0.20
      # Swing specific overrides
      swing_min_position_notional: 15000
      swing_sector_target_weights:
        IT: 0.30
        Banking: 0.35
        Pharma: 0.20
        Energy: 0.15
```

```text
stock_screener_engine/
  config/                   YAML + env settings, ScoringWeightsSettings
  core/
    entities.py             MarketSnapshot, FundamentalsSnapshot, GovernanceSnapshot, ...
    feature_specs.py        Named constants for all 20 feature keys
    features.py             FeatureEngine with independent category methods
    scoring.py              LongTermScorer, SwingScorer, configurable weights dataclasses
    explainability.py       ExplanationEngine (single _top_components helper, no duplicates)
    signals.py              SignalGenerator: sector threaded through to SignalResult
    engine.py               ResearchEngine: accepts optional FinancialsProvider
    universe.py             UniverseSelector
    ranking.py              rank_by_long_term, rank_by_swing
  data_sources/
    base/interfaces.py      All provider + adapter ABCs
    market/                 mock_market_data, mock_fundamentals
    filings/                mock_filings
    news/                   mock_news
    exchange/               NSEExchangeAdapter (implements ExchangeAdapter)
    text/                   MockTextEventProvider
    broker/                 Zerodha + Breeze (disabled by default)
  models/
    protocols.py            ScorerProtocol
    long_term_model.py      LongTermModel.with_weights(...)
    swing_model.py          SwingModel.with_weights(...)
  pipelines/                daily_batch, feature_refresh, intraday_update, signal_generation
  storage/                  local_files, sqlite_store (dedup-safe)
  execution/                order abstraction, execution router
  backtest/                 cross_sectional, walk_forward, event_study scaffolds
  monitoring/               data_quality, health, signal_drift
docs/                       architecture, setup, extension guide
examples/                   run_demo.py
tests/
  conftest.py               shared fixtures (AppSettings, FeatureVector, snapshots, ScoreCard)
  test_features.py          FeatureEngine: granular path + StockSnapshot compat
  test_scoring.py           configurable weights, risk flags, edge cases
  test_explainability.py    ExplanationEngine + _pretty_name regression
  test_universe.py          UniverseSelector liquidity filter
  test_engine_pipeline.py   end-to-end ResearchEngine
  test_config.py            settings loading and env overlay
  test_broker_optional.py   broker graceful-failure
```

Requirements: Python 3.9+

```bash
# 1. Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate
# 2. Install the package in editable mode
pip install -e .
# 3. Optionally copy the env template
cp .env.example .env
# 4. Run demo pipeline
python examples/run_demo.py
```

The demo writes outputs under `data/`:

```text
data/features/      feature vectors (CSV/JSON)
data/signals/       signal results
data/metadata.db    SQLite: features, scores, signals tables
```

Default config: stock_screener_engine/config/defaults.yaml
All settings are overridable with environment variables:

| Variable | Default | Purpose |
|---|---|---|
| `SSE_ENV` | `dev` | Environment tag |
| `SSE_LOG_LEVEL` | `INFO` | Python log level |
| `SSE_STORAGE_ROOT` | `./data` | Output directory |
| `SSE_SQLITE_PATH` | `./data/metadata.db` | SQLite file |
| `SSE_ENABLE_ZERODHA` | `false` | Enable Zerodha broker |
| `SSE_ENABLE_BREEZE` | `false` | Enable Breeze broker |
| `SSE_MIN_LIQUIDITY` | `1000000` | Volume filter threshold |
| `SSE_MARKET_PROVIDER` | `nse_http` | Market data provider |
| `SSE_NEWS_PROVIDER` | `free_rss` | News source provider |
| `SSE_LLM_PROVIDER` | `heuristic` | LLM backend (`heuristic`, `openai`, `anthropic`) |
| `SSE_LLM_API_KEY_ENV` | `OPENAI_API_KEY` | Env var name that stores LLM API key |
| `SSE_LLM_AUDIT_PATH` | `./data` | Root path for low-confidence LLM audit logs |

Override any individual weight in defaults.yaml under scoring.long_term_weights
or scoring.swing_weights or scoring.risk_weights. Ranking cutoffs are under
scoring.ranking.
The ResearchEngine picks all of these up automatically via AppSettings.
Each signal is built from:
- Long-term category score (0-100)
- Swing category score (0-100)
- Risk penalty (0-max_risk_penalty)
- Final score = category score - risk penalty
Outputs expose:
- Positive/negative driver contributions
- Missing feature hints
- Deterministic rejection reasons
- Horizon tag (`6-24 months` or `3-15 trading days`)
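
The score arithmetic above can be sketched as follows. How the engine turns risk components into a penalty is not spelled out here, so this assumes each component lies in [0, 1], weights it by `risk_weights`, and scales the result by `max_risk_penalty`. The `apply_risk_penalty` name and the value 20.0 are hypothetical.

```python
from typing import Mapping, Tuple


def apply_risk_penalty(
    category_score: float,
    risk_components: Mapping[str, float],
    risk_weights: Mapping[str, float],
    max_risk_penalty: float,
) -> Tuple[float, float]:
    """Return (final_score, penalty) with final = category score - penalty."""
    weighted_risk = sum(
        weight * risk_components.get(name, 0.0)
        for name, weight in risk_weights.items()
    )
    # Clamp so the penalty always stays within 0..max_risk_penalty.
    penalty = min(max(weighted_risk, 0.0), 1.0) * max_risk_penalty
    return category_score - penalty, penalty


# Weights copied from the risk_weights block in defaults.yaml.
risk_weights = {
    "liquidity_risk": 0.20,
    "volatility_risk": 0.20,
    "leverage_risk": 0.20,
    "earnings_instability_risk": 0.15,
    "event_uncertainty_risk": 0.15,
    "governance_risk": 0.10,
}
# Hypothetical inputs: every risk component at a mid-level 0.5.
risk_components = {name: 0.5 for name in risk_weights}
final_score, penalty = apply_risk_penalty(70.0, risk_components, risk_weights, 20.0)
```

With these inputs the weighted risk is about 0.5, so the penalty is about 10 points and the final score about 60.
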
The text pipeline now supports a hybrid event-intelligence path:
- Rule-based classification, event extraction, and sentiment remain available as the deterministic baseline.
- Optional LLM-assisted extractors can enrich document classification, event normalization, sentiment, and management-tone signals.
- All LLM outputs are normalized into typed schemas before they affect features, scoring, or explainability.
- Low-confidence LLM outputs can fall back to the rule pipeline.
Default config lives in `stock_screener_engine/config/defaults.yaml` under `llm:`

```yaml
llm:
  enabled: false
  provider: heuristic
  model: heuristic-finance-v1
  base_url: https://api.openai.com
  api_key_env: OPENAI_API_KEY
  timeout_seconds: 30
  min_confidence: 0.55
  fallback_to_rules: true
  enable_management_tone: true
  audit_low_confidence: true
  audit_path: ./data
```

The shipped `heuristic` provider is deterministic and offline. It is intended as a provider-agnostic stub for testing and local development.
Supported real-provider wiring:
- OpenAI-style endpoints via `provider: openai` (or OpenAI-compatible gateway URL via `base_url`)
- Anthropic messages API via `provider: anthropic`

Startup now validates LLM provider credentials strictly:
- If `SSE_ENABLE_LLM_EXTRACTION=true` and provider is `openai` or `anthropic`, startup fails fast unless the env var named by `SSE_LLM_API_KEY_ENV` is present and non-empty.
- This prevents silently running with missing provider keys.
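
The fail-fast rule can be sketched like this. The environment variable names come from this document, but the `validate_llm_credentials` function, its signature, and the error text are assumptions rather than the engine's actual startup code.

```python
from typing import Mapping

REMOTE_PROVIDERS = {"openai", "anthropic"}


def validate_llm_credentials(env: Mapping[str, str]) -> None:
    """Raise at startup if a remote LLM provider is enabled without a key."""
    enabled = env.get("SSE_ENABLE_LLM_EXTRACTION", "false").strip().lower() == "true"
    provider = env.get("SSE_LLM_PROVIDER", "heuristic").strip().lower()
    if not enabled or provider not in REMOTE_PROVIDERS:
        return  # heuristic/offline path needs no credentials
    key_var = env.get("SSE_LLM_API_KEY_ENV", "OPENAI_API_KEY")
    if not env.get(key_var, "").strip():
        raise RuntimeError(
            f"LLM provider '{provider}' is enabled but {key_var} is missing or empty"
        )
```

At startup this would typically be called with `os.environ`; taking a plain mapping keeps the check easy to test offline.
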
When audit_low_confidence: true, low-confidence LLM decisions are appended as JSONL artifacts under:
data/llm_audit/YYYY-MM-DD/low_confidence.jsonl
Ingestion health reports are also written for operational monitoring under:
data/ingestion_health/YYYY-MM-DD/ingestion_health.jsonl
Each report includes per-adapter and source-level (news, filings) fetch counts, failure counts, document counts, and latency (ms).
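
A small sketch of appending records to the dated JSONL layout shown above. The `append_jsonl` helper and the record field names are hypothetical; only the directory pattern is taken from this document.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Dict, Optional


def append_jsonl(
    root: str,
    category: str,
    filename: str,
    record: Dict[str, Any],
    day: Optional[date] = None,
) -> Path:
    """Append one JSON record to <root>/<category>/<YYYY-MM-DD>/<filename>."""
    day = day or date.today()
    path = Path(root) / category / day.isoformat() / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # One JSON object per line keeps the file append-only and stream-friendly.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

For example, `append_jsonl("./data", "ingestion_health", "ingestion_health.jsonl", {"source": "news", "fetch_count": 12, "failure_count": 1, "latency_ms": 840})` would add one line to that day's health report.
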
Deployment defaults use free, public RSS ingestion (no paid key required):
- Google News RSS search feeds per symbol (`free_rss` provider)
- Exchange announcements for filing-like event ingestion

This keeps the runtime disconnected from mock sources while preserving deterministic fallback behavior for LLM extraction.

| Pipeline | Trigger | Purpose |
|---|---|---|
| `DailyBatchPipeline` | EOD | Full feature → score → signal cycle |
| `IntradayUpdatePipeline` | During market hours | Refresh swing-sensitive stack |
| `FeatureRefreshPipeline` | On-demand | Recompute features only |
| `SignalGenerationPipeline` | On-demand | Regenerate signals from cached features |

Both broker adapters are disabled by default and fail gracefully when credentials are absent.
Zerodha (Kite):

```bash
SSE_ENABLE_ZERODHA=true
SSE_ZERODHA_API_KEY=...
SSE_ZERODHA_API_SECRET=...
SSE_ZERODHA_ACCESS_TOKEN=...
```

ICICI Breeze:

```bash
SSE_ENABLE_BREEZE=true
SSE_BREEZE_API_KEY=...
SSE_BREEZE_API_SECRET=...
SSE_BREEZE_SESSION_TOKEN=...
```

```bash
pytest -q
```

All tests are offline: no network calls, no broker credentials required.

```bash
# create env + install
python -m venv .venv && source .venv/bin/activate
pip install -e .
# run main demo
python examples/run_demo.py
# run modular scoring demo
python examples/scoring_framework_demo.py
# run LLM-assisted event intelligence demo
python examples/llm_event_intelligence_demo.py
# run tests
pytest -q
```

Run a standalone demo that scores a mini universe with missing-data handling, risk penalties, explainability, and ranking:

```bash
python examples/scoring_framework_demo.py
```

Run a side-by-side comparison of the research engine with and without the LLM-assisted text pipeline:

```bash
python examples/llm_event_intelligence_demo.py
```

The demo prints:
- Aggregated structured text features per symbol
- Long-term and swing score deltas with LLM assistance enabled

To add a data provider, implement `MarketDataProvider` (and optionally `FinancialsProvider`) from `data_sources/base/interfaces.py`, then pass it to `ResearchEngine`.

To plug in a custom scorer, provide any object with `.score(fv: FeatureVector) -> tuple[float, dict[str, float]]`; it satisfies `ScorerProtocol` structurally. Pass it as `scorer=...` to `LongTermModel` or `SwingModel`, or directly to `LongTermScorer`/`SwingScorer`.
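
A sketch of what structural typing means for a custom scorer. The `ScorerProtocol` below is re-declared locally from the signature quoted above, `FeatureVector` is replaced by a plain dict stand-in, and `EqualWeightScorer` is a hypothetical example. None of it is copied from `models/protocols.py`.

```python
from typing import Dict, Iterable, Protocol, Tuple, runtime_checkable

# Assumption: a plain dict stands in for the project's FeatureVector type.
FeatureVector = Dict[str, float]


@runtime_checkable
class ScorerProtocol(Protocol):
    def score(self, fv: FeatureVector) -> Tuple[float, Dict[str, float]]: ...


class EqualWeightScorer:
    """Rule-based scorer; note that it does not inherit from ScorerProtocol."""

    def __init__(self, feature_keys: Iterable[str]) -> None:
        self.feature_keys = tuple(feature_keys)

    def score(self, fv: FeatureVector) -> Tuple[float, Dict[str, float]]:
        # Each feature in [0, 1] contributes an equal share of a 0-100 score.
        weight = 100.0 / len(self.feature_keys)
        contributions = {k: weight * fv.get(k, 0.0) for k in self.feature_keys}
        return sum(contributions.values()), contributions


scorer = EqualWeightScorer(["trend_strength", "momentum_strength"])
total, parts = scorer.score({"trend_strength": 1.0, "momentum_strength": 0.5})
```

Because the protocol is checked by shape, `isinstance(scorer, ScorerProtocol)` holds here even though the class never names the protocol.
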

To add a new feature:

- Add a constant to `core/feature_specs.py`
- Add it to the relevant `frozenset` group
- Implement it in the matching `_xxx_features()` method in `core/features.py`
- Add a weight entry in `LongTermWeights` or `SwingWeights` in `core/scoring.py`

- Real NSE/BSE ingest adapters (bhavcopy, SEBI filings API)
- Financial statement parser with quality checks (Ind AS awareness)
- Transcript / news ingestion with transformer-based event extraction
- Portfolio and risk overlays + execution simulation
- ML ranker training pipeline and model registry
- Signal drift monitoring dashboards + alerting
- Added 8 new raw-ratio and technical features: `pe_ratio`, `pb_ratio`, `debt_to_equity`, `cfo_pat_ratio`, `price_acceleration`, `breakout_score`, `compression_score`, `activity_vs_avg`
- Scorers now receive both normalised features (for consistency) and raw values (for flexible rescoring)
- All 35+ features organised into typed frozensets (`FUNDAMENTAL_FEATURES`, `TECHNICAL_FEATURES`, `TEXT_FEATURES`, etc.) for validation and subsetting
- Feature specs fully typed with constants (no magic strings anywhere)
- `RankedSignal` schema expanded with: `risk_flags` (list), `sector` (str), `invalidation_notes` (list), `regime` (str)
- Added `to_dict()` method for JSON serialisation in reports and APIs
- Driver explanations now show category-aware human-readable labels (e.g., "Earnings & Revenue Growth Trajectory") instead of just metric names
- Positive/negative driver format standardised: "Positive driver: ..." and "Risk flag: ..."
- CrossSectionalBacktester: Added Spearman information coefficient (IC) computation + IC t-statistic testing
- CrossSectionalBacktester: Annualised information ratio (IR) for top-quintile returns
- EventStudy: Cumulative abnormal returns (CAR) + paired t-statistic for win-rate statistical testing
- WalkForwardPlanner: Typed `WalkForwardResult` dataclass with per-window IC tracking, `mean_ic()`, `ic_stability()` aggregators
- All evaluation metrics designed for continuous monitoring and adaptive rebalancing
- SignalDriftMonitor: Now tracks score distribution snapshots (mean, std, p25, median, p75)
- Detects mean/std/median shifts across rolling lookback windows
- Configurable thresholds for early alert on distribution anomalies
- Foundation ready for KL-divergence and quintile-shift detection (future)
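
The snapshot-and-compare idea can be sketched as below. `ScoreSnapshot`, `summarize`, `drift_alerts`, and the absolute-difference thresholds are illustrative assumptions and not the `SignalDriftMonitor` API.

```python
from dataclasses import dataclass
from statistics import mean, pstdev, quantiles
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ScoreSnapshot:
    mean: float
    std: float
    p25: float
    median: float
    p75: float


def summarize(scores: Sequence[float]) -> ScoreSnapshot:
    """Summarise a score distribution (needs at least two scores)."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return ScoreSnapshot(mean(scores), pstdev(scores), q1, q2, q3)


def drift_alerts(
    baseline: ScoreSnapshot, current: ScoreSnapshot, thresholds: Dict[str, float]
) -> List[str]:
    """Flag mean/std/median shifts that exceed their absolute thresholds."""
    alerts = []
    for field in ("mean", "std", "median"):
        shift = abs(getattr(current, field) - getattr(baseline, field))
        if shift > thresholds[field]:
            alerts.append(f"{field} shifted by {shift:.2f}")
    return alerts


# Hypothetical score batches: the second is the first shifted up by 15 points.
baseline = summarize([40.0, 50.0, 60.0])
current = summarize([55.0, 65.0, 75.0])
alerts = drift_alerts(baseline, current, {"mean": 5.0, "std": 5.0, "median": 5.0})
```

Here the mean and median both move by 15 points while the spread is unchanged, so only those two alerts fire.
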
- UniverseSelector: Accepts both `MarketSnapshot` (typed, granular; preferred) and `StockSnapshot` (legacy compat)
- Union type signature `AnySnapshot = Union[MarketSnapshot, StockSnapshot]` enables incremental migration
- Added `select_symbols()` convenience method for filtering by symbol only
- `percentile_rank(value, population) -> float`: cross-sectional relative ranking in [0,1]
- `log_score(value, low, high) -> float`: log-scale normalisation for right-skewed distributions (volume, market cap)
- Both maintain the [0,1] contract for seamless feature-engine integration
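
One plausible reading of these two normalisers, matching the signatures above. The tie handling (mid-rank), the neutral 0.5 for an empty population, and the clipping to `[low, high]` are assumptions, not confirmed details of the engine's implementation.

```python
import math
from typing import Sequence


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Share of the population below `value`, counting ties as half."""
    if not population:
        return 0.5  # assumption: no peers means a neutral rank
    below = sum(1 for x in population if x < value)
    equal = sum(1 for x in population if x == value)
    return (below + 0.5 * equal) / len(population)


def log_score(value: float, low: float, high: float) -> float:
    """Map `value` onto [0, 1] on a log scale between `low` and `high`."""
    if low <= 0 or high <= low:
        raise ValueError("require 0 < low < high")
    clipped = min(max(value, low), high)
    return math.log(clipped / low) / math.log(high / low)


rank = percentile_rank(3.0, [1.0, 2.0, 3.0, 4.0, 5.0])
# Hypothetical volume bounds: 10k to 100M shares traded per day.
volume_score = log_score(1_000_000, 10_000, 100_000_000)
```

On a log scale, a volume of one million sits halfway between ten thousand and one hundred million, which is the behaviour that makes this form suitable for right-skewed inputs.
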
- 85 core tests passing (all new features validated)
- Feature range expectations relaxed to allow raw-ratio features (outside [0,1] is intentional and well-documented)
- Verification suite (`tools/_verify_patches.py`) validates end-to-end: feature emission, signal schemas, evaluation metrics
- New drivers show contextual category labels and human-readable explanations
- `test_single_stock_pipeline.py` and `TestDirectionalVerdicts` fail when the optional `yfinance` dependency is not installed
- These failures existed before the improvements and are independent of the refactoring work
- Easy fix: `pip install yfinance` restores these tests to a passing state
- ML Feature Importance — train LightGBM ranker on historical IC, feature importance analysis
- Regime Classification — market state detector (bull/bear/transition) from macro volatility + cross-sectional spread
- Real-Time Invalidation — monitor live events and revert flagged signals when catalyst assumptions break
- Full Broker Integration — execution layer with commission costing, slippage simulation
- Distributed IC Analysis — compute walk-forward IC by sector/factor for adaptive weight tuning
- Performance Attribution — decompose returns into alpha (signal skill), beta (market beta), residual factor tilts