Quantarded

An experiment in extracting trading signals from messy, public data.

Live dashboard · Algorithm docs · Changelog

TL;DR

Quantarded is a personal research project I ran to test a simple hypothesis: public, messy, unstructured data sources leak enough signal to systematically beat a broad index — if you process them without the emotional bias humans bring to markets.

This repository is the data pipeline that powers the experiment. It ingests three signal sources in parallel — Reddit (r/wallstreetbets), US congressional trade disclosures (via Quiver Quant), and contrarian indicators — normalizes them into a unified event schema, classifies the unstructured content with an LLM, and streams everything into Tinybird for analytics.

The companion site at quantarded.com publishes a weekly trading basket derived from the data, and tracks live performance against the NASDAQ.

Important

This is an educational research project. Results below are hypothetical (paper trading on public signals), not investment advice, and past performance does not predict future returns.

Results so far

The experiment has been running continuously since week 51 of 2025 (late December 2025), publishing one signal-driven trading basket per week. As of early May 2026:

[Chart: Quantarded portfolio value vs NASDAQ since inception]

| Metric | Value |
|---|---|
| Cumulative return | +32.84% |
| Edge vs NASDAQ | +23.59 pp |
| Max drawdown | −9.34% |
| Sharpe ratio (annualized) | 1.77 |
| Weeks running | 20+ |

Live numbers, position history, and the weekly newsletter are at quantarded.com. The intent of publishing in the open was to make the experiment falsifiable: every signal, every entry, and every loss is timestamped and public.

What I learned

A few things became obvious only by running this end-to-end for several months:

Breadth beats depth. Baskets where many independent signals pointed the same direction were structurally more stable than baskets with one large conviction trade. Concentrated bets won the biggest weeks and lost the biggest weeks; broad consensus was less spectacular but compounded.
Visibility is not conviction. Tickers like TSLA and NVDA dominated raw mention counts on WSB, but sentiment was so divided that they rarely cleared the imbalance threshold. The signals worth trading were almost never the loudest.
Congressional trades are slow, not useless. Periodic Transaction Reports can be filed up to 45 days after a trade, so disclosures are stale by the time they're public. But clustered, repeat purchases by the same representative over multiple weeks did indicate position-building worth tracking on a longer horizon.
LLMs are cheap precision filters. A naive regex over r/wallstreetbets produces thousands of false positives (every "FOR", "ALL", "ON" gets flagged as a ticker). A constrained prompt with high-precision rules cuts that to a usable signal at fractions of a cent per request.
Ship to a real warehouse from day one. Writing every event to Tinybird from the first commit meant I could answer "what was the signal on April 3rd?" months later without re-running anything. The instinct to write to JSONL files and "figure out storage later" would have killed the project.
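
To make the false-positive problem from the LLM point above concrete, here is a minimal sketch of the naive approach. The regex and the sample comment are illustrative assumptions, not code from this repository.

```typescript
// Hypothetical naive extractor: any standalone run of 1-5 capital letters
// is treated as a ticker symbol.
const NAIVE_TICKER = /\b[A-Z]{1,5}\b/g;

function naiveTickers(text: string): string[] {
  // String.prototype.match with a global regex returns null when nothing matches.
  return text.match(NAIVE_TICKER) ?? [];
}

// Made-up comment in the style of r/wallstreetbets.
const sample = "ALL IN ON TSLA calls, this is FOR real";
console.log(naiveTickers(sample));
// → ["ALL", "IN", "ON", "TSLA", "FOR"]: one intended ticker, four false positives
```

Filtering the matches against a list of listed symbols only helps partially, because some common uppercase words are themselves valid symbols. That is the gap a constrained LLM prompt is meant to close.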

Architecture

                      ┌──────────────────────────────────┐
                      │           SOURCES                │
                      ├──────────────────────────────────┤
   ┌──────────────────┤  Reddit  ·  Quiver Quant API     │
   │                  └──────────────────────────────────┘
   │
   │   ┌─────────────┐    ┌──────────────┐    ┌──────────────────┐    ┌───────────┐
   └──▶│   Fetch     │───▶│  Normalize   │───▶│  LLM classify    │───▶│ Tinybird  │
       │  (paginate, │    │  (event      │    │  (Reddit only:   │    │ (events + │
       │   rate-lim, │    │   schema,    │    │   ticker +       │    │  job_runs │
       │   proxy)    │    │   dedupe)    │    │   sentiment)     │    │   tables) │
       └─────────────┘    └──────────────┘    └──────────────────┘    └───────────┘

Two scrapers, one container, shared event schema:

| Scraper | Source | Schedule | Why LLM? |
|---|---|---|---|
| `reddit-scraper` | `r/wallstreetbets` submissions + comments | Every 5 min, 15-min window | Yes: content is unstructured prose |
| `quiver-scraper` | US House congressional trades | Cron (default 6h) | No: tickers are structured fields |

Both scrapers emit NormalizedEvent records to the same Tinybird events_landing data source, so downstream analytics queries the same table regardless of source. Job execution metrics (success, duration, counts, errors) land in a separate job_runs data source for observability.
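
As a rough illustration of the shared landing table, the sketch below builds a request for Tinybird's Events API, which accepts newline-delimited JSON at `/v0/events?name=<datasource>`. The function name `buildIngestRequest`, the default host, and the assumption that the project ingests through this endpoint are mine; the real client in src/lib/tinybird.ts may work differently.

```typescript
interface IngestRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: turns a batch of events into one NDJSON POST.
// The host varies by Tinybird region; api.tinybird.co is assumed here.
function buildIngestRequest(
  events: object[],
  datasource: string,
  token: string,
  host: string = "https://api.tinybird.co",
): IngestRequest {
  const url = `${host}/v0/events?name=${encodeURIComponent(datasource)}`;
  // One JSON document per line, as the Events API expects.
  const body = events.map((e) => JSON.stringify(e)).join("\n");
  return {
    url,
    init: { method: "POST", headers: { Authorization: `Bearer ${token}` }, body },
  };
}
```

The result can be passed straight to `fetch(req.url, req.init)`. Because both scrapers would call the same helper with the same data source name, adding a third source would not change the ingestion path.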

Why this shape

Single normalized event schema means new sources can be added without touching the warehouse layer. The payload field is intentionally untyped JSON so each source can preserve its native fields verbatim.
Deterministic event IDs (SHA-256 of the natural key) make ingestion idempotent — re-running the scraper on the same window produces identical event IDs, so duplicates are dropped at the warehouse.
Parallel LLM batches with bounded concurrency keep latency low without thrashing rate limits. Empirically, 3 concurrent batches of 50 items each was the sweet spot for gpt-4o-mini.
No queue, no broker, no orchestrator. It's a single Node process running in a container with a shell loop: the simplest thing that could work, and at ~€5/month on Hetzner, it does.
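
The bounded-concurrency batching described above could be sketched roughly as follows. The helper names `chunk` and `mapWithConcurrency`, and the `classifyBatch` function mentioned in the usage comment, are hypothetical and not taken from src/lib/classify.ts.

```typescript
// Split items into fixed-size batches (the last batch may be smaller).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `worker` over every input with at most `limit` calls in flight.
// Results are returned in input order, regardless of completion order.
async function mapWithConcurrency<T, R>(
  inputs: T[],
  limit: number,
  worker: (input: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(inputs.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left.
  async function runner(): Promise<void> {
    while (next < inputs.length) {
      const i = next++;
      results[i] = await worker(inputs[i], i);
    }
  }

  const runners: Promise<void>[] = [];
  for (let r = 0; r < Math.min(limit, inputs.length); r++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Usage sketch (classifyBatch is a hypothetical function that calls the LLM):
//   const batches = chunk(items, 50);
//   const labelled = await mapWithConcurrency(batches, 3, classifyBatch);
```

A fixed pool of runners keeps the number of open requests constant, which is simpler to reason about against a provider's rate limit than firing every batch at once with `Promise.all`.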

Algorithm docs

The scoring algorithms that turn raw events into a weekly basket live in doc/algorithm/:

WSB scoring (v0) · (v1)
House trades scoring (v0) · (v1)

Quick start

Prerequisites

| Key | Where to get it |
|---|---|
| `LLM_API_KEY` | OpenAI (platform.openai.com/api-keys) |
| `TINYBIRD_TOKEN` | Tinybird (tinybird.co) |
| `QUIVER_API_KEY` | Quiver Quant, Hobbyist+ (api.quiverquant.com) |

Deploy the Tinybird datasources

tb push tinybird/datasources/events_landing__v0.datasource
tb push tinybird/datasources/job_runs__v0.datasource

Run locally

npm install
cp .env.example .env       # fill in API keys
npm run reddit:scrape      # one-shot Reddit scrape
npm run quiver:scrape      # one-shot Quiver scrape

In development (default NODE_ENV), events are also written to tmp/*.jsonl for inspection. In production, only Tinybird receives them.

Run continuously with Docker

cp docker-compose.yml.example docker-compose.yml
# edit env vars
docker compose up --build

The container runs both scrapers on their own schedules — Reddit every 5 minutes, Quiver on a cron expression — and restarts automatically on failure.

Configuration

All configuration is environment-variable driven. See .env.example for the full list with defaults; the most important ones:

| Variable | Default | Notes |
|---|---|---|
| `REDDIT_SCRAPER_ENABLED` | `true` | Toggle the Reddit scraper |
| `TIME_WINDOW_MINUTES` | `15` | How far back each Reddit run looks |
| `SCRAPER_INTERVAL_MINUTES` | `5` | How often Reddit runs (Docker) |
| `CLASSIFY_BATCH_SIZE` | `50` | Items per LLM call |
| `CLASSIFY_CONCURRENCY` | `3` | Parallel LLM batches |
| `MIN_CONTENT_LENGTH` | `10` | Skip items shorter than this (saves tokens) |
| `MAX_CONTENT_LENGTH` | `2000` | Truncate longer items (caps tokens) |
| `LLM_MODEL` | `gpt-4o-mini` | Any OpenAI-compatible chat model |
| `QUIVER_SCRAPER_ENABLED` | `false` | Toggle the Quiver scraper |
| `QUIVER_SCRAPER_CRON` | `0 */6 * * *` | Standard 5-field cron expression |

Reddit's API caps listings at 1,000 items per endpoint. A 15-minute window with a 5-minute interval comfortably fits inside that limit during peak WSB hours.
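
A minimal sketch of how these variables might be read with their documented defaults. The names `loadConfig`, `ScraperConfig`, and `runsPerItem` are hypothetical, only a subset of variables is covered, and the real src/config.ts may validate differently.

```typescript
type Env = Record<string, string | undefined>;

interface ScraperConfig {
  redditEnabled: boolean;
  quiverEnabled: boolean;
  timeWindowMinutes: number;
  scraperIntervalMinutes: number;
  classifyBatchSize: number;
  classifyConcurrency: number;
  minContentLength: number;
  maxContentLength: number;
  llmModel: string;
}

// Read a positive integer, falling back to the default when unset or blank.
// Note: parseInt is lenient about trailing characters ("15m" parses as 15).
function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = parseInt(raw, 10);
  if (isNaN(n) || n <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Only the literal string "true" (any case) enables a flag.
function flag(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  return raw === undefined ? fallback : raw.trim().toLowerCase() === "true";
}

function loadConfig(env: Env): ScraperConfig {
  return {
    redditEnabled: flag(env, "REDDIT_SCRAPER_ENABLED", true),
    quiverEnabled: flag(env, "QUIVER_SCRAPER_ENABLED", false),
    timeWindowMinutes: positiveInt(env, "TIME_WINDOW_MINUTES", 15),
    scraperIntervalMinutes: positiveInt(env, "SCRAPER_INTERVAL_MINUTES", 5),
    classifyBatchSize: positiveInt(env, "CLASSIFY_BATCH_SIZE", 50),
    classifyConcurrency: positiveInt(env, "CLASSIFY_CONCURRENCY", 3),
    minContentLength: positiveInt(env, "MIN_CONTENT_LENGTH", 10),
    maxContentLength: positiveInt(env, "MAX_CONTENT_LENGTH", 2000),
    llmModel: env["LLM_MODEL"] ?? "gpt-4o-mini",
  };
}

// How many consecutive runs cover any given item.
function runsPerItem(cfg: ScraperConfig): number {
  return Math.ceil(cfg.timeWindowMinutes / cfg.scraperIntervalMinutes);
}
```

In a Node process this would be called as `loadConfig(process.env)`. With the defaults, `runsPerItem` is 3: every post or comment is fetched by up to three overlapping runs, so a single failed run leaves no gap, and the repeats are absorbed by the deterministic event IDs.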

Event schema

A single shape covers every source. payload is intentionally loose so each source preserves its native fields.

{
  "event_type": "reddit_comment",        // or reddit_submission, congressional_trade
  "event_id":   "<sha256 of natural key>",
  "source":     "wsb",                   // or quiver-daily
  "timestamp":  "2025-12-16T17:58:35Z",
  "version":    "1",
  "payload": {
    "reddit_link": "...",
    "content":     "...",
    "tickers": [
      { "ticker": "TSLA", "sentiment": "sell", "confidence": 0.85 }
    ]
  }
}

For congressional trades, payload carries the full Quiver row verbatim plus an ingested_at timestamp, so any new field Quiver adds is captured without a schema change.
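
The schema and the deterministic `event_id` could be expressed in TypeScript roughly like this. The natural key format (`source:event_type:native id`) and the helper names are assumptions for illustration; the actual key used in src/utils/ is not documented here.

```typescript
import { createHash } from "node:crypto";

type EventType = "reddit_comment" | "reddit_submission" | "congressional_trade";

interface NormalizedEvent {
  event_type: EventType;
  event_id: string; // hex-encoded SHA-256 of the natural key
  source: string; // e.g. "wsb" or "quiver-daily"
  timestamp: string; // ISO 8601, UTC
  version: string;
  payload: Record<string, unknown>; // source-specific fields, kept loose
}

// Hypothetical natural key: the source, the event type, and the ID the
// upstream system already assigns (e.g. a Reddit comment ID).
function eventId(source: string, eventType: EventType, nativeId: string): string {
  return createHash("sha256")
    .update(`${source}:${eventType}:${nativeId}`)
    .digest("hex");
}

function makeEvent(
  source: string,
  eventType: EventType,
  nativeId: string,
  timestamp: string,
  payload: Record<string, unknown>,
): NormalizedEvent {
  return {
    event_type: eventType,
    event_id: eventId(source, eventType, nativeId),
    source,
    timestamp,
    version: "1",
    payload,
  };
}
```

Because the ID depends only on the natural key, scraping the same comment in two overlapping windows yields the same `event_id` both times, which is what lets the warehouse drop the duplicate.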

Project layout

src/
├── lib/                          # Domain modules
│   ├── reddit.ts                 # Reddit API client (paginated)
│   ├── quiver.ts                 # Quiver API client (paginated, rate-limited)
│   ├── normalize.ts              # Reddit → NormalizedEvent
│   ├── normalize-quiver.ts       # Quiver trade → NormalizedEvent
│   ├── classify.ts               # LLM ticker + sentiment extraction
│   ├── tinybird.ts               # Tinybird ingestion client
│   └── job-runner.ts             # Shared job lifecycle utilities
├── scripts/                      # CLI entry points (one per scraper)
├── utils/                        # HTTP, hashing, date helpers
├── config.ts                     # Env-driven configuration
└── types.ts                      # Shared TypeScript types

tinybird/                         # Tinybird datasources & pipes
doc/algorithm/                    # Algorithm design docs (versioned)
infra/                            # Terraform + Hetzner deploy scripts
bin/docker-entrypoint.sh          # Container scheduler (both scrapers)
.github/workflows/                # CI (lint + typecheck) + CD (GHCR + deploy)

Development

npm run lint           # ESLint
npm run lint:fix       # ESLint with --fix
npm run format         # Prettier write
npm run format:check   # Prettier check (used in CI)

A Husky pre-commit hook runs Prettier, ESLint --fix, and tsc --noEmit on staged files. CI runs the same checks on every push.

Deployment

The infra/ directory contains a Terraform setup that provisions a single Hetzner Cloud VM and a GitHub Actions pipeline that builds the Docker image, pushes it to GHCR, and deploys on every tagged release. See infra/README.md for details.

The whole production setup costs ~€5/month. The point isn't that this is the right way to deploy a serious system — it's the minimum viable infrastructure that runs the experiment reliably enough to publish weekly results.

License

MIT — use it, fork it, learn from it.

Disclaimer

This is a personal research project for educational purposes. Nothing in this repository or on quantarded.com constitutes financial advice. The published returns are hypothetical and based on paper-trading public signals; they should not be interpreted as a recommendation or as evidence of future performance. Trade your own money at your own risk.
