Codex-native forecasting harness for Prophet Hacks.
Public repository:
https://github.com/Hilo-Hilo/CodexProphet
License: MIT. See LICENSE.
Forecasting Track endpoint:
https://predict.hansonwen.dev/predict
This is the standalone Codex duplicate of the Pi harness. It does not use Pi.
The Mac mini deployment launches Codex directly with the local ChatGPT/Codex auth
session. The Mac mini does not auto-update from GitHub; update it only when an
operator explicitly runs the manual deployment script.
The agent receives forecasting instructions through AGENTS.md and uses local
domain skill files plus copied forecasting CLI tools.
Set up the standalone repo first:
cd /path/to/codex-prophet
npm run setup
npm run check
Interactive Codex goal session:
npm run codex:prophet -- "Will Bitcoin trade above 100000 by Sunday?"
Or with an event JSON file:
npm run codex:prophet -- sample_events/sample-economics.json
Non-interactive one-shot run:
npm run codex:prophet:exec -- sample_events/sample-economics.json
Organizer standardized API run:
export OPENAI_API_KEY=...
PORT=8080 scripts/run_standardized_agent.sh
This starts the Forecasting Track API on 0.0.0.0:8080 by default and exposes:
GET /health
POST /predict
Defaults:
- Model: gpt-5.5
- Active variant: config/variants.json -> v1_market_prior_codex by default. Override with CODEX_FORECAST_VARIANT.
- Sandbox: danger-full-access for non-interactive /predict forecasts so local evidence tools can reach Kalshi, finance, sports, and other network providers.
- Approval policy: never
- Request budget: actual evaluation is expected to allow up to 10 minutes per event.
- Internal Codex timeout: CODEX_API_TIMEOUT=540 seconds by default for direct/local serving. Public Cloudflare-backed Mac mini deployments use 60 seconds by default so /predict returns before Cloudflare's request timeout; the latest initial checkpoint is used as fallback if Codex runs long.
- Evaluation scale: organizers indicated at most about 200 total forecast requests over the 2-week evaluation window, with daily requests.
- Web search: enabled for interactive npm run codex:prophet; non-interactive /predict uses local tools and API-backed retrieval.
- Working root: repo root
Run the internal API on this machine:
npm run serve
This binds to 127.0.0.1:8080 by default. It is not public.
Endpoints:
GET /health
POST /predict
GET /prophet/events?status=open
POST /prophet/register-team
POST /prophet/register-endpoint
GET /prophet/endpoint/{team_name}
GET /prophet/leaderboard
Prediction smoke test:
curl -s http://127.0.0.1:8080/health
python - <<'PY'
import json
from pathlib import Path
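events = json.loads(Path("sample_events/sample-economics.json").read_text())
# Make sure the output directory exists before writing the first sample event.
Path("tmp").mkdir(exist_ok=True)
Path("tmp/sample-event.json").write_text(json.dumps(events[0], indent=2) + "\n")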
PY
curl -s -X POST http://127.0.0.1:8080/predict \
-H 'content-type: application/json' \
--data-binary @tmp/sample-event.json
For Prophet server registration, create .env with PA_SERVER_API_KEY or PROPHETHACKS_SERVER_API_KEY. A local .env is ignored by Git.
The official public CLI does not expose direct forecast submission. Server-side forecasting is done by registering a reachable endpoint for the Prophet server to call.
Prophet Hacks' optional website self-check can call /health and can POST a
sample event to /predict for format validation. The format check may time out
faster than actual evaluation, so use it to confirm response shape, not final
agent runtime.
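The curl smoke test can also be sketched in Python using only the standard library. This is a minimal sketch, not part of the repo: the helper names build_predict_request and post_event are hypothetical, the 160-second default simply mirrors the soak-test flag, and the response is decoded as JSON without assuming any particular fields.

```python
import json
import urllib.request


def build_predict_request(base_url: str, event: dict) -> urllib.request.Request:
    """Build a POST /predict request carrying one event as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def post_event(base_url: str, event: dict, timeout_seconds: float = 160.0) -> dict:
    """Send the event and return the decoded JSON body (fields are not assumed)."""
    request = build_predict_request(base_url, event)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local server running, post_event("http://127.0.0.1:8080", event) would send one event loaded from tmp/sample-event.json.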
Public live soak test:
python scripts/live_soak_test.py --iterations 1 --event-index 0 --timeout-seconds 160
For a low-frequency overnight check, run the same script on the Mac mini with a
larger iteration count and interval. It writes JSONL records to
logs/live_soak_predictions.jsonl and validates health, response shape,
outcome coverage, probabilities, rationale, and total latency.
There is no automatic CI/CD deployment for the Mac mini. A GitHub push must not restart or update the live endpoint.
When an operator explicitly wants to update the Mac mini, SSH into it and run:
cd ~/CodexProphet
git fetch origin --prune
git reset --hard origin/main
bash scripts/deploy_mac_mini.sh
The manual script refreshes dependencies, runs checks, and restarts the local LaunchAgent-backed API service. It should only be run on instruction.
See SUBMISSION.md for the organizer-facing run instructions, endpoint, Docker
command, license, secrets policy, and track scope. This repo is packaged for the
Forecasting Track. It does not claim Trading Track compatibility unless the
official trading harness is added later.
This repo is Render-ready through render.yaml and Dockerfile. The Docker image installs Python, the Python requirements, Node, and the Codex CLI.
Do not put real credentials in Git. Configure secrets in Render's Environment panel:
PA_SERVER_API_KEY
OPENAI_API_KEY
Optional but useful:
PROPHETHACKS_SERVER_API_KEY
THE_ODDS_API_KEY
FRED_API_KEY
FINNHUB_API_KEY
ALPHA_VANTAGE_API_KEY
CODEX_API_TIMEOUT=600
CODEX_FORECAST_MODEL=gpt-5.5
CODEX_PUBLIC_API_MODE=true
For direct live evaluation without a proxy timeout, prefer CODEX_API_TIMEOUT=540
or lower. For predict.hansonwen.dev behind Cloudflare, keep
CODEX_API_TIMEOUT around 60 unless the endpoint is moved to a route that
supports longer HTTP requests. Setting it to 600 gives Codex the entire
organizer timeout and leaves no room for the API to validate or return the
fallback prediction.
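The timeout guidance comes down to simple arithmetic: Codex gets the request budget minus whatever the API needs to validate and return a result. A minimal sketch follows, assuming a 600-second organizer budget (the 10 minutes stated under Defaults) and treating 60 seconds of headroom as the implied margin behind the 540 default. The function name is hypothetical and not part of the repo.

```python
def codex_timeout_seconds(request_budget: float, headroom: float) -> float:
    """Return how long Codex may run so the API keeps `headroom` seconds in reserve."""
    if headroom < 0 or request_budget <= headroom:
        raise ValueError("headroom must be non-negative and smaller than the budget")
    return request_budget - headroom


# Direct serving: 600 s budget with 60 s kept for validation and fallback.
print(codex_timeout_seconds(600, 60))  # 540
```

The same calculation applies behind a proxy, with the proxy's request limit taking the place of the organizer budget.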
CODEX_ACCESS_TOKEN is supported as an alternative to OPENAI_API_KEY, but it is a short-lived account token from local Codex auth and should only be set as a Render secret. Do not commit ~/.codex/auth.json.
Render start command is handled by:
scripts/render_start.sh
The public endpoint to give Prophet Hacks after deployment is:
https://<render-service>.onrender.com/predict
Override the model:
CODEX_FORECAST_MODEL=gpt-5.5 npm run codex:prophet -- event.json
The launcher starts Codex with an initial /goal-style prompt. The prompt tells Codex to:
- Read AGENTS.md.
- Read docs/context_memory.md, docs/market_signal_learnings.md, and skills/calibration-validation/SKILL.md.
- Select other relevant skill files under skills/.
- Use local CLI tools in this repo for market, sports, finance, Kalshi, and validation.
- Distinguish mutually exclusive outcome rows from Top-K / multi-correct rows.
- Save an early --kind initial forecast checkpoint within the first 1-2 minutes.
- Keep updating the initial checkpoint as the current best forecast improves.
- Save --kind final when the final forecast is ready.
- Return final strict JSON.
Each API request gets a local workspace under tmp/api/<request_id>/:
event.json
evidence_manifest.json
trace.jsonl
initial_submission.json
final_submission.json
codex_stdout.txt
codex_stderr.txt
codex_final.json
The API initializes evidence_manifest.json with the active variant and event
payload, then exposes REQUEST_WORKSPACE, EVIDENCE_MANIFEST, TRACE_LOG,
ACTIVE_VARIANT_ID, and ACTIVE_VARIANT_JSON to the Codex prompt. Codex can
read these files during the run. For important evidence that comes from native
web search or another source not automatically logged by a local tool, Codex can
append a concise item:
python -m api_service.run_metadata evidence \
--workspace "$REQUEST_WORKSPACE" \
--kind "web" \
--source "native_web_search" \
--query "<query>" \
--notes "<what this evidence established>"
The API writes trace.jsonl lifecycle events such as request receipt, Codex
start/finish, fallback usage, validation failure, and validation success.
Codex may append agent-side milestones, but tracing must never delay a valid
forecast.
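One way to honor the rule that tracing must never delay a forecast is to make trace writes best-effort. The sketch below is an illustration only: append_trace and the record fields ts, stage, and message are hypothetical, and the actual trace.jsonl schema written by the API may differ.

```python
import json
import time
from pathlib import Path


def append_trace(trace_path: Path, stage: str, message: str) -> bool:
    """Append one JSON line to the trace log; report failure instead of raising."""
    record = {"ts": time.time(), "stage": stage, "message": message}
    try:
        with trace_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
        return True
    except OSError:
        # Tracing is best-effort: a failed write must not block the forecast.
        return False
```

The caller can ignore the return value on the forecasting path and only inspect it when debugging.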
For Top-K or other non-mutually-exclusive events, the returned probabilities are per-outcome inclusion probabilities and may sum to K. They should not be normalized to 1 unless the event is actually single-winner / mutually exclusive.
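The normalization rule can be sketched as a small helper that rescales only single-winner events. This is a hedged illustration: finalize_probabilities and its mutually_exclusive flag are hypothetical names, and how the repo actually detects the event type is not shown here.

```python
from typing import Dict


def finalize_probabilities(probs: Dict[str, float], mutually_exclusive: bool) -> Dict[str, float]:
    """Normalize to 1 only for single-winner events; leave Top-K rows untouched."""
    if any(not 0.0 <= p <= 1.0 for p in probs.values()):
        raise ValueError("each probability must lie in [0, 1]")
    if not mutually_exclusive:
        # Top-K / multi-correct: per-outcome inclusion probabilities, sum may be K.
        return dict(probs)
    total = sum(probs.values())
    if total <= 0.0:
        raise ValueError("cannot normalize probabilities that sum to zero")
    return {outcome: p / total for outcome, p in probs.items()}
```

For a Top-2 event with three candidates, an input such as {"A": 0.9, "B": 0.8, "C": 0.3} is returned unchanged, while a mutually exclusive input of {"A": 0.6, "B": 0.6} is rescaled so the two values sum to 1.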
This is intentionally an agentic workflow. Codex decides which tools to call, what to search, and when it has enough evidence.
From repo root:
npm run market:lookup -- --text "..." --category "..." --max-markets 10
npm run market:lookup -- --text "..." --include-history --history-lookback-days 7 --history-trade-limit 50 --history-candle-limit 48
npm run kalshi:discover -- --query "..." --status open --max-markets 100
npm run sports:lookup -- --query "..." --sport auto --include-odds
npm run finance:lookup -- --query "..." --symbols NVDA --asset-type equity
npm run metadata -- evidence --workspace tmp/api/<request_id> --kind web --source native_web_search --query "..." --notes "..."
npm run metadata -- trace --workspace tmp/api/<request_id> --stage agent_research_complete --message "..."
npm run submit:prediction -- --event event.json --prediction prediction.json
npm run submit:prediction -- --kind initial --event event.json --prediction prediction.json
npm run submit:prediction -- --kind final --event event.json --prediction prediction.json
submit_prediction --kind initial and --kind final save local checkpoints
beside the event file. They do not submit externally. If Codex times out or
returns malformed stdout, the API uses the latest valid final checkpoint, then
the latest valid initial checkpoint, then a deterministic fallback.
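The fallback order described above can be sketched as follows. This is an assumption-laden illustration, not the API's code: load_valid and choose_prediction are hypothetical names, the file names are taken from the per-request workspace listing, and "valid" is reduced here to "parses as a JSON object", which stands in for the real validation.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def load_valid(path: Path) -> Optional[dict]:
    """Return the parsed JSON object, or None if the file is missing or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def choose_prediction(workspace: Path, deterministic_fallback: dict) -> Tuple[str, dict]:
    """Prefer the final checkpoint, then the initial one, then the fallback."""
    for label, name in (("final", "final_submission.json"),
                        ("initial", "initial_submission.json")):
        prediction = load_valid(workspace / name)
        if prediction is not None:
            return label, prediction
    return "fallback", deterministic_fallback
```

Returning the label alongside the prediction makes it easy to record which source was used, for example in a fallback-usage trace event.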
--include-history is opt-in and currently enriches matched Kalshi markets with public trade prints and price/volume candles. The returned history includes timestamps, prices, volume/count, and side fields, but not Kalshi tickers, market IDs, or trade IDs.
Codex also has its normal shell/file/search tools and native web search from --search.