20 changes: 20 additions & 0 deletions .claude.md
@@ -0,0 +1,20 @@
# Claude Instructions for BetterAI Engine

## Database Migrations

**NEVER manually apply database migrations.**

- Do NOT create scripts to apply SQL migrations directly
- Do NOT use `sql.unsafe()` or similar methods to run migration files
- Always use drizzle-kit commands: `db:generate`, `db:migrate`, or `db:push`
- If drizzle-kit fails, inform the user and let them handle it manually
- The user will manage migration issues themselves

## Project Guidelines

- Follow the design document at [docs/design_betteraiengine.md](docs/design_betteraiengine.md)
- Use the existing service architecture (polymarket, ingestion, prediction)
- Maintain structured logging with pino
- Keep raw JSONB data alongside structured tables
- All environment variables are stored in `.env.local` (not `.env`)
- Use the centralized `env` config from `src/config/env.ts` instead of `process.env`
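
A minimal sketch of what `src/config/env.ts` could look like; the variable list and the plain-check validation are assumptions, not the actual file contents (a zod schema would fit here too):

```ts
// src/config/env.ts (hypothetical sketch)
import { config } from 'dotenv';

// Project convention: secrets live in .env.local, not .env
config({ path: '.env.local' });

function required(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

export const env = {
  DATABASE_URL: required('DATABASE_URL'),
  OPENROUTER_API_KEY: required('OPENROUTER_API_KEY'),
  NODE_ENV: process.env.NODE_ENV ?? 'development',
} as const;
```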
10 changes: 10 additions & 0 deletions .env.example
@@ -0,0 +1,10 @@
# Copy this file to .env.local and fill in your values

# Database
DATABASE_URL=

# OpenRouter API
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Optional: Environment
NODE_ENV=development
31 changes: 26 additions & 5 deletions .gitignore
@@ -1,17 +1,38 @@
# Dependencies
node_modules/

# Build output
dist/

# Environment variables
.env
.env.local
.env.production

# Logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*

# Vercel
.vercel

# TypeScript
*.tsbuildinfo

# OS
.DS_Store
**/.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/
*.swp
*.swo
*~
69 changes: 30 additions & 39 deletions docs/design_betteraiengine.md
@@ -1,87 +1,78 @@
# design-betteraiengine.md

Last updated: 10/3/25

## Overview
BetterAI-v2 is a **headless backend service** (no frontend) for generating AI-enhanced predictions on Polymarket markets. It pulls data from the Polymarket Gamma API, persists both structured entities (events, markets, predictions) and raw API payloads, then uses LangChain + OpenRouter to produce structured prediction outputs.

The system is operated via CLI commands and scheduled batch jobs (daily syncs).

---

## Problems Addressed
- **Automated Market Data Ingestion**: Consistently fetches top or specified Polymarket markets and stores them in a local Postgres database.
- **Prediction Automation**: Allows single-market or batch prediction jobs with LLM inference pipelines (LangChain → OpenRouter).
- **Auditability**: Saves **raw JSON snapshots** from Polymarket in a `polymarket_raw` table for debugging, reproducibility, and compliance.
- **Structured Storage**: Persists both intermediate steps and final prediction outputs for later analysis, reproducibility, or re-training.
- **No UI Complexity**: Headless design (CLI only) avoids early frontend overhead.
- **Extensible**: Built to support later add-ons like tool-augmented predictions, research integrations, or trading automation.


---

## User Stories
- Prediction: As a user, I can run a CLI command that fetches the latest data for a Polymarket event and/or market, runs a prediction job, and stores intermediate prediction jobs, steps, and results to the database.
- Batch operation: As a maintainer, I can schedule a daily batch job to pull the top N markets by volume/liquidity.
- Debugging: As a developer, I can inspect prediction job steps, including raw AI responses, to debug prompt performance.
- Audit: As a reviewer, I can view past predictions and compare raw Polymarket API JSON + saved prompt context in the database.

---

## High-Level System Components
- **CLI Layer**
  - `ingest:topMarkets`
  - `ingest:event <id>`
  - `predict:market --url or --slug`
- **Database (Postgres + Drizzle)**
  - `RawEvent`, `RawMarket` (pure JSONB with generated columns for id/slug; see the schema sketch after this list)
  - `Prediction`, `PredictionJob`, `PredictionResult`
- **Polymarket Ingestion**
  - Gamma API fetchers for the Event and Market endpoints, e.g.:
    - https://docs.polymarket.com/api-reference/events/get-event-by-slug
    - https://docs.polymarket.com/api-reference/markets/get-market-by-slug
  - Deduplication + upserts into the DB
  - Call `saveRawEvent` or `saveRawMarket` for every fetch

- **Prediction Engine**
  - LangChain pipeline
  - OpenRouter wrapper (system + context prompt builders)
  - Writes `PredictionJob` and `PredictionStep`
- **Observability**
  - Pino logging with structured context (jobId, marketId, eventId)
  - Logs output to JSON (raw) and pretty (dev)
- **Retention / Pruning**
  - Scripted pruning of old `polymarket_raw` entries (retain first + last daily snapshots, keep last N days)
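
A sketch of the raw-table pattern named above (`RawMarket` as pure JSONB with generated id/slug columns). It assumes drizzle-orm's `generatedAlwaysAs` API; the column names mirror the generated migration further down, but the snippet is illustrative, not the project's actual schema file:

```ts
// Hypothetical sketch: one JSONB payload column as the source of truth,
// with id/slug/conditionId surfaced as Postgres generated columns.
import { sql } from 'drizzle-orm';
import { jsonb, pgTable, serial, text, timestamp } from 'drizzle-orm/pg-core';

export const rawMarkets = pgTable('raw_markets', {
  id: serial('id').primaryKey(),
  marketId: text('market_id').generatedAlwaysAs(sql`(data->>'id')::text`).notNull().unique(),
  slug: text('slug').generatedAlwaysAs(sql`(data->>'slug')::text`),
  conditionId: text('condition_id').generatedAlwaysAs(sql`(data->>'conditionId')::text`),
  data: jsonb('data').notNull(),
  fetchedAt: timestamp('fetched_at').defaultNow().notNull(),
});
```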

---

## Phased Implementation Plan

### Phase 1 – Scaffold Setup
- Scaffold repo (TS/Node, Drizzle, Commander.js CLI, pnpm)
- Create DB schema for Events, Markets, Predictions
- Implement Gamma API fetcher + upsert (see the sketch after this list)
- Add `RawEvent` and `RawMarket` tables (pure JSONB with generated id/slug columns)
- CLI: `pnpm dev ingest:topMarkets`
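
A hedged sketch of what the Phase 1 fetcher could look like. The base URL and query shape are assumptions inferred from the Gamma endpoint docs linked above, and error handling is minimal:

```ts
// Hypothetical fetcher sketch; verify the path/params against the Gamma docs.
const GAMMA_BASE = 'https://gamma-api.polymarket.com';

export async function fetchEventBySlug(slug: string): Promise<unknown> {
  const res = await fetch(`${GAMMA_BASE}/events?slug=${encodeURIComponent(slug)}`);
  if (!res.ok) {
    throw new Error(`Gamma API request failed: ${res.status} ${res.statusText}`);
  }
  // Return the payload untyped: the raw JSON is persisted as-is via
  // saveRawEvent before any structured parsing or upsert happens.
  return res.json();
}
```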

### Phase 2 – Prediction Pipeline (MVP)
- Add `PredictionJob` table
- Integrate LangChain + OpenRouter (see the sketch after this list)
- CLI: `predict:market --url or --slug`
- Persist final prediction JSON + raw model responses
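
A minimal sketch of the LangChain → OpenRouter wiring, assuming `@langchain/openai` pointed at OpenRouter's OpenAI-compatible endpoint; the model slug is a placeholder:

```ts
// Hypothetical sketch: OpenRouter speaks the OpenAI API, so ChatOpenAI
// can target it through a custom baseURL.
import { ChatOpenAI } from '@langchain/openai';

const model = new ChatOpenAI({
  model: 'openai/gpt-4o-mini', // placeholder model slug
  apiKey: process.env.OPENROUTER_API_KEY, // project convention: read via src/config/env.ts
  configuration: { baseURL: 'https://openrouter.ai/api/v1' },
});

// The structured result would feed predictions.prediction (JSONB), with the
// raw model response saved alongside it.
const response = await model.invoke('Estimate the probability of YES for this market.');
console.log(response.content);
```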

### Future Features – Batch + Scheduling
- Daily cron job: fetch top N markets by 24h volume/liquidity
- Batch prediction: `predict:event <eventId>` → fan out to all open markets
- Structured logs with Pino (contextualized by job/market/event)
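
A sketch of the logging convention, assuming stock pino: JSON output by default, `pino-pretty` as a dev-only transport, and child loggers carrying the job/market/event context:

```ts
// Hypothetical sketch: structured JSON logs, pretty-printed in dev.
import pino from 'pino';

const logger = pino(
  process.env.NODE_ENV === 'development'
    ? { transport: { target: 'pino-pretty' } }
    : {}
);

// Child loggers stamp every line with batch-job context.
const jobLogger = logger.child({ jobId: 'job-123', marketId: 'mkt-456' });
jobLogger.info('prediction job started');
```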

### Phase 4 – Data Retention & Ops
- Implement pruning job for `PolymarketRaw` (`prune:raw`)
- Add retention policy (keep last 60 days, plus daily first/last snapshots)
- Add indexes for query speed (GIN for JSONB, timestamps)
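
As an illustration of the indexing step (assuming a recent drizzle-orm with the `.using()` index API; an equivalent raw `CREATE INDEX ... USING GIN` migration would also work):

```ts
// Hypothetical sketch: a GIN index on the raw JSONB payload for containment
// queries, plus a timestamp index to make retention pruning cheap.
import { index, jsonb, pgTable, serial, timestamp } from 'drizzle-orm/pg-core';

export const rawEventsIndexed = pgTable(
  'raw_events',
  {
    id: serial('id').primaryKey(),
    data: jsonb('data').notNull(),
    fetchedAt: timestamp('fetched_at').defaultNow().notNull(),
  },
  (t) => [
    index('raw_events_data_gin').using('gin', t.data),
    index('raw_events_fetched_at_idx').on(t.fetchedAt),
  ]
);
```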

### Phase 5 – Extensibility / Future
- Advanced AI pipelines (multi-step reasoning, multiple models)
- Plug in research sources (Exa, Grok, etc.)
- Trading integration with BetterOMS: https://github.com/better-labs/betteroms
- Optional frontend dashboard for monitoring jobs

31 changes: 31 additions & 0 deletions drizzle.config.ts
@@ -0,0 +1,31 @@
import { defineConfig } from 'drizzle-kit';
import { config } from 'dotenv';

// Load .env.local
const result = config({ path: '.env.local' });

const directUrl = process.env.DATABASE_URL;

// Debug logging
console.log('=== Drizzle Config Debug ===');
console.log('dotenv result:', result.error ? `ERROR: ${result.error}` : 'SUCCESS');
console.log('DATABASE_URL loaded:', directUrl ? 'YES' : 'NO');
if (directUrl) {
// Mask password for security
const masked = directUrl.replace(/:([^@]+)@/, ':****@');
console.log('DATABASE_URL (masked):', masked);
}
console.log('===========================\n');

if (!directUrl) {
throw new Error('DATABASE_URL is required for Drizzle migrations.');
}

export default defineConfig({
schema: './src/db/schema.ts',
out: './drizzle',
dialect: 'postgresql',
dbCredentials: {
url: directUrl,
},
});
86 changes: 86 additions & 0 deletions drizzle/0000_bumpy_rhino.sql
@@ -0,0 +1,86 @@
CREATE TABLE "events" (
"id" serial PRIMARY KEY NOT NULL,
"event_id" text NOT NULL,
"slug" text NOT NULL,
"title" text NOT NULL,
"description" text,
"start_date" timestamp,
"end_date" timestamp,
"active" boolean DEFAULT true,
"closed" boolean DEFAULT false,
"created_at" timestamp DEFAULT now() NOT NULL,
"updated_at" timestamp DEFAULT now() NOT NULL,
CONSTRAINT "events_event_id_unique" UNIQUE("event_id"),
CONSTRAINT "events_slug_unique" UNIQUE("slug")
);
--> statement-breakpoint
CREATE TABLE "markets" (
"id" serial PRIMARY KEY NOT NULL,
"market_id" text NOT NULL,
"condition_id" text NOT NULL,
"slug" text NOT NULL,
"question" text NOT NULL,
"description" text,
"event_id" text,
"active" boolean DEFAULT true,
"closed" boolean DEFAULT false,
"volume" text,
"liquidity" text,
"created_at" timestamp DEFAULT now() NOT NULL,
"updated_at" timestamp DEFAULT now() NOT NULL,
CONSTRAINT "markets_market_id_unique" UNIQUE("market_id"),
CONSTRAINT "markets_slug_unique" UNIQUE("slug")
);
--> statement-breakpoint
CREATE TABLE "prediction_jobs" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"market_id" text,
"event_id" text,
"status" text DEFAULT 'pending' NOT NULL,
"started_at" timestamp,
"completed_at" timestamp,
"error" text,
"created_at" timestamp DEFAULT now() NOT NULL,
"updated_at" timestamp DEFAULT now() NOT NULL
);
--> statement-breakpoint
CREATE TABLE "predictions" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
"job_id" uuid NOT NULL,
"market_id" text,
"prediction" jsonb NOT NULL,
"raw_response" jsonb,
"model" text,
"prompt_tokens" integer,
"completion_tokens" integer,
"created_at" timestamp DEFAULT now() NOT NULL
);
--> statement-breakpoint
CREATE TABLE "raw_events" (
"id" serial PRIMARY KEY NOT NULL,
"event_id" text GENERATED ALWAYS AS ((data->>'id')::text) STORED NOT NULL,
"slug" text GENERATED ALWAYS AS ((data->>'slug')::text) STORED,
"data" jsonb NOT NULL,
"fetched_at" timestamp DEFAULT now() NOT NULL,
"created_at" timestamp DEFAULT now() NOT NULL,
"updated_at" timestamp DEFAULT now() NOT NULL,
CONSTRAINT "raw_events_event_id_unique" UNIQUE("event_id")
);
--> statement-breakpoint
CREATE TABLE "raw_markets" (
"id" serial PRIMARY KEY NOT NULL,
"market_id" text GENERATED ALWAYS AS ((data->>'id')::text) STORED NOT NULL,
"slug" text GENERATED ALWAYS AS ((data->>'slug')::text) STORED,
"condition_id" text GENERATED ALWAYS AS ((data->>'conditionId')::text) STORED,
"data" jsonb NOT NULL,
"fetched_at" timestamp DEFAULT now() NOT NULL,
"created_at" timestamp DEFAULT now() NOT NULL,
"updated_at" timestamp DEFAULT now() NOT NULL,
CONSTRAINT "raw_markets_market_id_unique" UNIQUE("market_id")
);
--> statement-breakpoint
ALTER TABLE "markets" ADD CONSTRAINT "markets_event_id_events_event_id_fk" FOREIGN KEY ("event_id") REFERENCES "public"."events"("event_id") ON DELETE no action ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "prediction_jobs" ADD CONSTRAINT "prediction_jobs_market_id_markets_market_id_fk" FOREIGN KEY ("market_id") REFERENCES "public"."markets"("market_id") ON DELETE no action ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "prediction_jobs" ADD CONSTRAINT "prediction_jobs_event_id_events_event_id_fk" FOREIGN KEY ("event_id") REFERENCES "public"."events"("event_id") ON DELETE no action ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "predictions" ADD CONSTRAINT "predictions_job_id_prediction_jobs_id_fk" FOREIGN KEY ("job_id") REFERENCES "public"."prediction_jobs"("id") ON DELETE no action ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "predictions" ADD CONSTRAINT "predictions_market_id_markets_market_id_fk" FOREIGN KEY ("market_id") REFERENCES "public"."markets"("market_id") ON DELETE no action ON UPDATE no action;