
dackclup/quantrank


QuantRank

Open-source US equity stock ranking — fundamental, technical, factor, sentiment, and ML signals combined into a single 0–100 composite StockRank, refreshed every US trading day.

QuantRank is a static web app. A Python pipeline runs in GitHub Actions on a Mon-Fri cron (after US market close), computes scores for the S&P 500, and writes JSON files into the repo. A Next.js static site reads those JSON files at build time and is served from Vercel's free tier. No backend. No database. No live API calls from the browser.


⚠️ Disclaimer — please read

QuantRank is for educational and research purposes only.

  • Nothing here is investment advice, a recommendation, or an offer to buy or sell securities.
  • Scores and "fair prices" are model outputs derived from public data. They can be wrong, stale, or misleading.
  • Do not use these scores for real-money trading decisions.
  • Past performance does not predict future results.
  • The author is not a registered investment adviser.
  • This project does not connect to a brokerage and never will.

Honest limits of quantitative analysis — full breakdown in the Honest Limitations section below. The short version:

  • Quantitative fraud detection has irreducible false-positive (~30% in broad market) and false-negative (~25-40%) rates. Defense flags indicate elevated risk, not confirmed fraud.
  • Madoff-style fabrication (where revenue, cash, and counter-parties are all fictitious) is undetectable by any system based on filed financials.
  • Published anomalies typically decay 30-60% post-publication (McLean-Pontiff 2016).

If you're not comfortable losing 100% of any capital you might allocate based on quantitative models, do not use this app for investing.


Honest Limitations

QuantRank ships academic-quality defenses (Altman Z″, Sloan accruals, net-stock-issuance, going-concern phrase scan, Beneish M-score, Dechow F-score, tangible-book sanity guard). Despite this, several classes of manipulation and several structural realities remain outside what any filed-financials-based screener can address. v1.0 ships with the honest accounting below — readers should weight QuantRank's outputs accordingly.
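To show how mechanical these defenses are, here is a minimal sketch of the four-variable Altman Z″ model using its published coefficients; the function and parameter names are illustrative, not QuantRank's actual pipeline API:

```python
def altman_z_double_prime(working_capital, retained_earnings, ebit,
                          book_equity, total_assets, total_liabilities):
    """Four-variable Altman Z'' score. Z'' < 1.1 is the distress zone,
    1.1-2.6 is grey, > 2.6 is the safe zone."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = book_equity / total_liabilities
    return 6.56 * x1 + 3.26 * x2 + 6.72 * x3 + 1.05 * x4
```

Note the asymmetry the error-rate table below quantifies: a score under 1.10 flags elevated distress risk, it does not predict bankruptcy.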

Frauds we cannot catch

Pure financial-statement screeners can only detect manipulation that leaves a footprint in the filed numbers. Four manipulation classes leave no detectable footprint:

  1. Madoff-style total fabrication. When revenue, cash, customers, and bank confirmations are all fictitious, the screener has no anchor to a real economy to compare against. Detection requires forensic accounting + bank cross-confirmation outside SEC EDGAR's reach.
  2. Off-shore related-party round-trips. Wirecard's Asian "third-party acquirers", booked simultaneously as customers and suppliers, cancel out on the consolidated balance sheet. The ratios all behave normally: the cash never existed, but the offsetting fiction is symmetric.
  3. Audit-firm complicity. When the audit itself is fraudulent (Arthur Andersen / Enron pattern), the 10-K is a primary source for manipulated numbers — no quantitative cross-check inside the same document can break the loop.
  4. Post-acquisition baseline reset. Fraud disguised by an acquisition that resets the accounting baseline (Tyco-pattern roll-ups) — the year-over-year ratios all reset with the acquisition, so prior-period manipulation gets washed out of the comparison window.

Realistic error rates

Every defense layer ships with documented false-positive and false-negative rates from the academic literature. v1.0 uses these unmodified — no proprietary "tuning":

| Defense | False positive rate | False negative rate | Source |
| --- | --- | --- | --- |
| Beneish M-score (M > −2.22) | ~30% broad market, ~15-20% S&P 500 | ~25-40% | Beneish 1999, Beneish 2022 |
| Dechow F-score (F > 2.45) | ~27% broad market | ~27% (sensitivity 73%) | Dechow et al. 2011 |
| Going-concern phrase scan | ~1-3% (post-MD&A restriction) | ~10-15% | Mayew-Sethuraman-Venkatachalam 2015 |
| Altman Z″ < 1.10 (distress) | ~5-10% manufacturing | ~20% (FE-heavy sectors) | Altman 1968, Altman 2017 |
| Sloan accruals top-decile | ~10% (sector-confounded) | n/a (definitional) | Sloan 1996 |
| Net stock issuance top-decile | ~10% (definitional) | n/a (definitional) | Pontiff-Woodgate 2008 |

All defense flags are risk stratifiers, not fraud verdicts. A flagged stock is a "look harder" signal, not a "this is fraud" signal. Multiple flags compounding (e.g., Beneish + Dechow + going-concern simultaneously) is the actionable pattern.
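The "multiple flags compounding" rule can be sketched as a simple counter; the function name and tier labels below are illustrative, not QuantRank's actual output schema:

```python
def triage(flags: dict[str, bool]) -> str:
    """Map a set of boolean defense flags to a triage tier.

    A single flag is only a risk stratifier; several firing at once
    (e.g. Beneish + Dechow + going-concern) is the actionable pattern.
    """
    fired = sum(flags.values())
    if fired == 0:
        return "clear"
    if fired == 1:
        return "monitor"
    return "look harder"

example = {"beneish": True, "dechow": True, "going_concern": True,
           "altman": False, "sloan_accruals": False, "net_issuance": False}
```

With the error rates in the table above, a single Beneish hit in the broad market is wrong roughly a third of the time, which is why one flag alone maps only to "monitor".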

Anomaly decay reality

Academic factor research has a well-documented decay curve:

  • Out-of-sample drop: ~26% lower returns vs the original in-sample period (McLean-Pontiff 2016).
  • Post-publication drop: an additional ~32% lower (publication-informed trading erodes the edge).
  • Cumulative: returns roughly 58% below the original in-sample estimate, on average, by year 5 post-publication.

This applies to most of QuantRank's classical signals (value, momentum, quality, low-vol). The methodology page tracks expected vs realized IC where comparable; the rolling-IC decay monitor will land in Phase 4+.
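The McLean-Pontiff haircuts reduce to simple arithmetic on an expected premium; the 6%/yr in-sample figure below is a made-up number for illustration:

```python
def decayed_premium(in_sample, oos_haircut=0.26, post_pub_haircut=0.58):
    """Apply the average McLean-Pontiff (2016) haircuts to an in-sample
    anomaly premium. Returns (out-of-sample, post-publication) estimates."""
    return in_sample * (1 - oos_haircut), in_sample * (1 - post_pub_haircut)

oos, post_pub = decayed_premium(0.06)  # hypothetical 6%/yr in-sample premium
```

So a backtest showing 6%/yr should be mentally discounted to about 4.4%/yr out-of-sample and 2.5%/yr once the anomaly is published and traded on.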

Free-data fragility

QuantRank uses only free data sources, which trade cost for fragility:

  • yfinance is an unofficial scraper — multiple 2023-2024 incidents broke fundamental endpoints without warning. Daily-close prices have been the most reliable surface; pre-market and after-hours data are not.
  • SEC EDGAR XBRL has documented 2025 taxonomy drift — the CostOfGoodsAndServicesSold vs CostOfRevenue split, the IntangibleAssetsNetExcludingGoodwill vs OtherIntangibleAssetsNet variant — every Phase 3 PR added a fallback chain for one of these. Some filers report values only under tags the parser doesn't know about yet, and the data goes missing silently.
  • A cross-source validator (Phase 4) catches large discrepancies but not small systematic biases.
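The taxonomy drift above is what the fallback chains guard against; this sketch shows the pattern only, with hypothetical chain contents and no claim to match the real parser:

```python
import logging

FALLBACKS = {
    "cogs": ["CostOfGoodsAndServicesSold", "CostOfRevenue"],
    "intangibles_ex_goodwill": ["IntangibleAssetsNetExcludingGoodwill",
                                "OtherIntangibleAssetsNet"],
}

def resolve_tag(facts: dict, chain: list[str]):
    """Return (value, tag) for the first XBRL tag present in the filing,
    or (None, None) with a warning, so gaps never go missing silently."""
    for tag in chain:
        if facts.get(tag) is not None:
            return facts[tag], tag
    logging.warning("no known tag matched chain %s", chain)
    return None, None
```

The logging call is the important part: a filer reporting under an unknown tag should surface in the Actions logs rather than silently producing a null.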

Diminishing returns on stacking defenses

Beneish-Vorst 2021 measured marginal AAER capture across an ensemble of fraud signals:

  • Beneish + Dechow + Sloan + going-concern = ~4 signals
  • Adding a 5th signal (e.g., Bao-Ke 2020 ML) captures less than 5% additional AAERs beyond the first 4
  • Adding more produces proportionally more false positives without proportional true positives

QuantRank's v1.0 defense set is intentionally fixed at this "diminishing-returns inflection point." Future versions will rotate signals based on IC decay rather than stacking. Treat the v1.0 defense list as ceiling, not floor.

What QuantRank is — and is not

QuantRank is:

  • A risk-stratifier and screener built from public filings + free data
  • An educational research tool with transparent methodology
  • A pre-computed JSON pipeline tied to a git commit (every score is reproducible)

QuantRank is not:

  • A fraud guarantor — flags indicate elevated risk, not confirmed fraud
  • A backtested live-trading strategy — anomaly decay is real and unpredictable in direction
  • A registered investment adviser — the author is not, and this is not investment advice
  • A connection to any brokerage — and will not become one

See docs/RESEARCH_FINDINGS.md for the academic bibliography backing each defense layer.


Architecture

```mermaid
flowchart LR
    A[GitHub Actions cron<br/>Mon-Fri 22:00 UTC] -->|run daily| B[Python compute pipeline]
    B -->|fetch| C[(yfinance / SEC EDGAR<br/>FRED / Finnhub / Reddit)]
    B -->|write| D[JSON files in<br/>frontend/public/data/]
    D -->|git push| E[GitHub repo]
    E -->|webhook| F[Vercel build]
    F -->|next build, static export| G[Static HTML/CSS/JS on CDN]
    H[User browser] -->|fetch| G
```

Why this design (Option D — static site):

  • Free forever: public GitHub repo = unlimited Actions minutes; Vercel hobby tier = unlimited static hosting.
  • One system to debug: only the Python script + the GitHub Actions logs.
  • Fast for users: pre-computed JSON served via CDN — no DB queries, no rate limits.
  • Reproducible: every score is tied to a git commit.

This is not a FastAPI/Flask backend, not a database, and not a live-data system. See SKILL.md for the full architecture rules.
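The "write JSON into the repo" step is the entire persistence layer; a minimal sketch of it (file name and payload shape are assumptions, not the pipeline's real schema):

```python
import datetime
import json
from pathlib import Path

def write_rankings(scores: dict[str, float], out_dir="frontend/public/data"):
    """Persist composite StockRanks as a dated JSON file that the static
    site reads at build time -- no database, no API server."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    payload = {"as_of": datetime.date.today().isoformat(), "scores": scores}
    path = out / "rankings.json"
    path.write_text(json.dumps(payload, indent=2))
    return path
```

Because the file is committed by the Actions run, every published score maps back to one git commit, which is what makes the rankings reproducible.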


Tech stack

| Layer | Tool |
| --- | --- |
| Compute language | Python 3.11+ |
| Compute runtime | GitHub Actions (ubuntu-latest) |
| Frontend framework | Next.js 14 (App Router, static export) |
| Styling | Tailwind CSS |
| Charts | Recharts |
| Data storage | JSON files in frontend/public/data/ |
| Hosting | Vercel (frontend) + GitHub (data) |
| Free data sources | yfinance, edgartools, fredapi, finnhub-python, PRAW |
| ML | LightGBM + SHAP (Phase 5+) |

Setup

You don't need to run anything locally. The whole app builds in CI.

  1. Push this repo to GitHub as a public repository.
  2. Connect Vercel:
    • vercel.com → "Add New Project" → import the repo.
    • Framework preset: Next.js.
    • Root directory: frontend.
    • Build command: npm run build.
    • Output directory: out.
    • Production branch: main.
    • Click Deploy.
  3. Trigger first compute (after Phase 1 lands): GitHub → Actions → "Compute Rankings" → "Run workflow".
  4. Done. From now on, every US trading day (Mon-Fri) at 22:00 UTC the pipeline refreshes the JSON, commits it, and Vercel auto-deploys.

Required GitHub secrets — by phase

| Phase | Secret | Why |
| --- | --- | --- |
| 0 | none | Stub workflow only |
| 1 | none | yfinance + Wikipedia are unauthenticated |
| 2 | EDGAR_USER_AGENT | SEC requires "<Your Name> <email>" for EDGAR access |
| 4 | FINNHUB_API_KEY, REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT | News + Reddit sentiment |
| 6 | FRED_API_KEY | Macro / regime detection |

Add secrets at: Repo → Settings → Secrets and variables → Actions → New repository secret.
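Inside the pipeline those secrets surface as ordinary environment variables; a minimal fail-fast guard (the helper name is illustrative):

```python
import os

def require_secret(name: str) -> str:
    """Fail fast with a pointer to the repo settings if a phase-gated
    secret (e.g. EDGAR_USER_AGENT) is missing from the Actions env."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it under Settings -> Secrets and "
            "variables -> Actions -> New repository secret")
    return value
```

Failing at startup keeps a missing key from producing a partially computed JSON file that would otherwise be committed and deployed.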


Project status

See PHASE_STATUS.md for the current build phase and acceptance checklist.

Methodology

See docs/METHODOLOGY.md for the user-facing methodology summary, and stock_ranking_knowledge.md for the full formula reference (~1600 lines covering fundamental, technical, factor, sentiment, ML, regime, and validation techniques).

Architecture rules: SKILL.md and docs/ARCHITECTURE.md. Phase-by-phase build plan: WORKFLOW.md.


License

MIT — see LICENSE.