CodexProphet

Codex-native forecasting harness for Prophet Hacks.

Public repository:

https://github.com/Hilo-Hilo/CodexProphet

License: MIT. See LICENSE.

Forecasting Track endpoint:

https://predict.hansonwen.dev/predict

This is the standalone Codex duplicate of the Pi harness. It does not use Pi. The Mac mini deployment launches Codex directly with the local ChatGPT/Codex auth session. The Mac mini does not auto-update from GitHub; update it only when an operator explicitly runs the manual deployment script. The agent receives forecasting instructions through AGENTS.md and uses local domain skill files plus copied forecasting CLI tools.

Launch

Set up the standalone repo first:

cd /path/to/codex-prophet
npm run setup
npm run check

Interactive Codex goal session:

npm run codex:prophet -- "Will Bitcoin trade above 100000 by Sunday?"

or with an event JSON file:

npm run codex:prophet -- sample_events/sample-economics.json

Non-interactive one-shot run:

npm run codex:prophet:exec -- sample_events/sample-economics.json

Organizer standardized API run:

export OPENAI_API_KEY=...
PORT=8080 scripts/run_standardized_agent.sh

This starts the Forecasting Track API on 0.0.0.0:8080 by default and exposes:

GET  /health
POST /predict

Defaults:

  • Model: gpt-5.5
  • Active variant: config/variants.json -> v1_market_prior_codex by default. Override with CODEX_FORECAST_VARIANT.
  • Sandbox: danger-full-access for non-interactive /predict forecasts so local evidence tools can reach Kalshi, finance, sports, and other network providers.
  • Approval policy: never
  • Request budget: actual evaluation is expected to allow up to 10 minutes per event.
  • Internal Codex timeout: CODEX_API_TIMEOUT=540 seconds by default for direct/local serving. Public Cloudflare-backed Mac mini deployments default to 60 seconds so /predict returns before Cloudflare's request timeout; if Codex runs longer, the latest initial checkpoint is returned as the fallback.
  • Evaluation scale: organizers indicated at most about 200 total forecast requests over the 2-week evaluation window, with requests arriving daily.
  • Web search: enabled for interactive npm run codex:prophet; non-interactive /predict uses local tools and API-backed retrieval.
  • Working root: repo root

Local API

Run the internal API on this machine:

npm run serve

This binds to 127.0.0.1:8080 by default. It is not public.

Endpoints:

GET  /health
POST /predict
GET  /prophet/events?status=open
POST /prophet/register-team
POST /prophet/register-endpoint
GET  /prophet/endpoint/{team_name}
GET  /prophet/leaderboard

Prediction smoke test:

curl -s http://127.0.0.1:8080/health
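python - <<'PY'
import json
from pathlib import Path

# Extract the first sample event into its own file for the POST below.
events = json.loads(Path("sample_events/sample-economics.json").read_text())
out = Path("tmp/sample-event.json")
out.parent.mkdir(parents=True, exist_ok=True)  # tmp/ may not exist on a fresh checkout
out.write_text(json.dumps(events[0], indent=2) + "\n")
PY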
curl -s -X POST http://127.0.0.1:8080/predict \
  -H 'content-type: application/json' \
  --data-binary @tmp/sample-event.json

For Prophet server registration, create .env with PA_SERVER_API_KEY or PROPHETHACKS_SERVER_API_KEY. A local .env is ignored by Git.

The official public CLI does not expose direct forecast submission. Server-side forecasting is done by registering a reachable endpoint for the Prophet server to call.

Prophet Hacks' optional website self-check can call /health and can POST a sample event to /predict for format validation. The format check may time out faster than actual evaluation, so use it to confirm response shape, not final agent runtime.

Public live soak test:

python scripts/live_soak_test.py --iterations 1 --event-index 0 --timeout-seconds 160

For a low-frequency overnight check, run the same script on the Mac mini with a larger iteration count and interval. It writes JSONL records to logs/live_soak_predictions.jsonl and validates health, response shape, outcome coverage, probabilities, rationale, and total latency.
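
The kind of per-response checks listed above can be sketched as follows. This is not the code in scripts/live_soak_test.py. The response shape (a "probabilities" object keyed by outcome plus a "rationale" string) and the function name validate_prediction are assumptions made for illustration.

```python
import math


def validate_prediction(event_outcomes, response):
    """Return a list of problems found in a /predict response (empty if none).

    Assumes a hypothetical response shape:
    {"probabilities": {outcome: float, ...}, "rationale": str}
    """
    probs = response.get("probabilities")
    if not isinstance(probs, dict):
        return ["missing or non-object 'probabilities'"]

    problems = []
    # Outcome coverage: every event outcome needs a probability, and no extras.
    missing = [o for o in event_outcomes if o not in probs]
    extra = [o for o in probs if o not in event_outcomes]
    if missing:
        problems.append(f"missing outcomes: {missing}")
    if extra:
        problems.append(f"unexpected outcomes: {extra}")

    # Each probability must be a finite number in [0, 1].
    for outcome, p in probs.items():
        is_number = isinstance(p, (int, float)) and not isinstance(p, bool)
        if not is_number or not math.isfinite(p) or not 0.0 <= p <= 1.0:
            problems.append(f"bad probability for {outcome!r}: {p!r}")

    # A forecast without a rationale is treated as incomplete.
    rationale = response.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        problems.append("empty rationale")
    return problems
```

The check deliberately does not require the probabilities to sum to 1, because Top-K events may legitimately sum to more.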

Mac Mini Deployment

There is no automatic CI/CD deployment for the Mac mini. A GitHub push must not restart or update the live endpoint.

When an operator explicitly wants to update the Mac mini, SSH into it and run:

cd ~/CodexProphet
git fetch origin --prune
git reset --hard origin/main
bash scripts/deploy_mac_mini.sh

The manual script refreshes dependencies, runs checks, and restarts the local LaunchAgent-backed API service. Run it only when explicitly instructed.

Submission

See SUBMISSION.md for the organizer-facing run instructions, endpoint, Docker command, license, secrets policy, and track scope. This repo is packaged for the Forecasting Track. It does not claim Trading Track compatibility unless the official trading harness is added later.

Render Deployment

This repo is Render-ready through render.yaml and Dockerfile. The Docker image installs Python, the Python requirements, Node, and the Codex CLI.

Do not put real credentials in Git. Configure secrets in Render's Environment panel:

PA_SERVER_API_KEY
OPENAI_API_KEY

Optional but useful:

PROPHETHACKS_SERVER_API_KEY
THE_ODDS_API_KEY
FRED_API_KEY
FINNHUB_API_KEY
ALPHA_VANTAGE_API_KEY
CODEX_API_TIMEOUT=540
CODEX_FORECAST_MODEL=gpt-5.5
CODEX_PUBLIC_API_MODE=true

For direct live evaluation without a proxy timeout, prefer CODEX_API_TIMEOUT=540 or lower. For predict.hansonwen.dev behind Cloudflare, keep CODEX_API_TIMEOUT around 60 unless the endpoint is moved to a route that supports longer HTTP requests. Setting it to 600 gives Codex the entire organizer timeout and leaves no room for the API to validate or return the fallback prediction.
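
The budgeting rule above reduces to simple arithmetic: take the tightest limit on the request path and subtract a reserve for validation and the fallback response. A minimal sketch follows. The helper name codex_timeout, the reserve values, and the 100-second proxy limit are illustrative assumptions, not values read from this repo or from Cloudflare's configuration.

```python
def codex_timeout(request_limit_s, reserve_s=60, proxy_limit_s=None):
    """Pick an internal Codex timeout that leaves headroom before the caller gives up.

    request_limit_s: how long the caller (e.g. the organizer) waits per event.
    reserve_s:       time kept back for validation and returning a fallback.
    proxy_limit_s:   optional shorter limit imposed by a proxy in front of the API.
    """
    limit = request_limit_s
    if proxy_limit_s is not None:
        limit = min(limit, proxy_limit_s)
    # Never return zero or a negative timeout.
    return max(1, limit - reserve_s)


# Direct serving with a 10-minute organizer budget.
print(codex_timeout(600))
# Behind a proxy assumed to cut requests at 100 seconds.
print(codex_timeout(600, reserve_s=40, proxy_limit_s=100))
```

Under these assumptions the first call yields 540 and the second yields 60, matching the two defaults described above.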

CODEX_ACCESS_TOKEN is supported as an alternative to OPENAI_API_KEY, but it is a short-lived account token from local Codex auth and should only be set as a Render secret. Do not commit ~/.codex/auth.json.

Render start command is handled by:

scripts/render_start.sh

The public endpoint to give Prophet Hacks after deployment is:

https://<render-service>.onrender.com/predict

Override the model:

CODEX_FORECAST_MODEL=gpt-5.5 npm run codex:prophet -- event.json

How It Works

The launcher starts Codex with an initial /goal-style prompt. The prompt tells Codex to:

  1. Read AGENTS.md.
  2. Read docs/context_memory.md, docs/market_signal_learnings.md, and skills/calibration-validation/SKILL.md.
  3. Select other relevant skill files under skills/.
  4. Use local CLI tools in this repo for market, sports, finance, Kalshi, and validation.
  5. Distinguish mutually exclusive outcome rows from Top-K / multi-correct rows.
  6. Save an early --kind initial forecast checkpoint within the first 1-2 minutes.
  7. Keep updating the initial checkpoint as the current best forecast improves.
  8. Save --kind final when the final forecast is ready.
  9. Return final strict JSON.

Each API request gets a local workspace under tmp/api/<request_id>/:

event.json
evidence_manifest.json
trace.jsonl
initial_submission.json
final_submission.json
codex_stdout.txt
codex_stderr.txt
codex_final.json

The API initializes evidence_manifest.json with the active variant and event payload, then exposes REQUEST_WORKSPACE, EVIDENCE_MANIFEST, TRACE_LOG, ACTIVE_VARIANT_ID, and ACTIVE_VARIANT_JSON to the Codex prompt. Codex can read these files during the run. For important evidence that comes from native web search or another source not automatically logged by a local tool, Codex can append a concise item:

python -m api_service.run_metadata evidence \
  --workspace "$REQUEST_WORKSPACE" \
  --kind "web" \
  --source "native_web_search" \
  --query "<query>" \
  --notes "<what this evidence established>"

The API writes trace.jsonl lifecycle events such as request receipt, Codex start/finish, fallback usage, validation failure, and validation success. Codex may append agent-side milestones, but tracing must never delay a valid forecast.
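
One way to honor the rule that tracing must never delay a valid forecast is to make the trace writer swallow its own I/O errors. The sketch below is an illustration only and is not the repo's api_service.run_metadata implementation. The record fields ("ts", "stage", "message") and the function name append_trace are assumptions.

```python
import json
import time
from pathlib import Path


def append_trace(trace_log, stage, message=""):
    """Append one JSON line to the trace log.

    Returns True on success and False on any I/O failure, so a broken
    trace file can never raise into the forecasting path.
    """
    # Hypothetical record layout; the real trace schema may differ.
    record = {"ts": time.time(), "stage": stage, "message": message}
    try:
        with Path(trace_log).open("a", encoding="utf-8") as fh:
            # default=str keeps unusual values from raising during serialization.
            fh.write(json.dumps(record, default=str) + "\n")
        return True
    except OSError:
        return False
```

A caller would typically ignore the return value, for example `append_trace(os.environ["TRACE_LOG"], "agent_research_complete")`, and continue with the forecast either way.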

For Top-K or other non-mutually-exclusive events, the returned probabilities are per-outcome inclusion probabilities and may sum to K. They should not be normalized to 1 unless the event is actually single-winner / mutually exclusive.
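
The distinction can be made concrete with a small sketch. The function name normalize_if_exclusive and the example outcomes are hypothetical. The point is only that renormalizing to 1 is applied to single-winner events and skipped for Top-K events.

```python
def normalize_if_exclusive(probs, mutually_exclusive):
    """Rescale probabilities to sum to 1 only for single-winner events.

    For Top-K / multi-correct events the values are per-outcome inclusion
    probabilities, so they are returned unchanged.
    """
    if not mutually_exclusive:
        return dict(probs)
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("cannot normalize probabilities that sum to zero")
    return {outcome: p / total for outcome, p in probs.items()}


# Top-3 event with five candidates: inclusion probabilities sum to about 3.
top3 = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.4, "E": 0.2}
print(sum(normalize_if_exclusive(top3, mutually_exclusive=False).values()))

# Single-winner event whose raw estimates overshoot: rescaled to sum to 1.
binary = {"Yes": 0.6, "No": 0.6}
print(normalize_if_exclusive(binary, mutually_exclusive=True))
```

Normalizing the Top-3 example to 1 would shrink every value to roughly a third of its size and understate each candidate's chance of making the list.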

This is intentionally an agentic workflow. Codex decides which tools to call, what to search, and when it has enough evidence.

Tool Surface

From repo root:

npm run market:lookup -- --text "..." --category "..." --max-markets 10
npm run market:lookup -- --text "..." --include-history --history-lookback-days 7 --history-trade-limit 50 --history-candle-limit 48
npm run kalshi:discover -- --query "..." --status open --max-markets 100
npm run sports:lookup -- --query "..." --sport auto --include-odds
npm run finance:lookup -- --query "..." --symbols NVDA --asset-type equity
npm run metadata -- evidence --workspace tmp/api/<request_id> --kind web --source native_web_search --query "..." --notes "..."
npm run metadata -- trace --workspace tmp/api/<request_id> --stage agent_research_complete --message "..."
npm run submit:prediction -- --event event.json --prediction prediction.json
npm run submit:prediction -- --kind initial --event event.json --prediction prediction.json
npm run submit:prediction -- --kind final --event event.json --prediction prediction.json

submit_prediction --kind initial and --kind final save local checkpoints beside the event file. They do not submit externally. If Codex times out or returns malformed stdout, the API uses the latest valid final checkpoint, then the latest valid initial checkpoint, then a deterministic fallback.
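
The fallback order described above (final checkpoint, then initial checkpoint, then a deterministic fallback) can be sketched like this. The file names come from the workspace listing in this README. The checkpoint schema (a "probabilities" object), the uniform fallback, and the function names are assumptions for illustration, not the API's actual code.

```python
import json
from pathlib import Path


def load_valid(path, outcomes):
    """Return checkpoint probabilities if the file parses and covers the outcomes, else None."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or malformed JSON
        return None
    probs = data.get("probabilities") if isinstance(data, dict) else None
    if not isinstance(probs, dict) or set(probs) != set(outcomes):
        return None
    for p in probs.values():
        if isinstance(p, bool) or not isinstance(p, (int, float)) or not 0 <= p <= 1:
            return None
    return probs


def choose_forecast(workspace, outcomes):
    """Pick the best available forecast once Codex stdout is unusable.

    Assumes `outcomes` is a non-empty list of outcome names.
    """
    ws = Path(workspace)
    candidates = (("final", "final_submission.json"), ("initial", "initial_submission.json"))
    for source, name in candidates:
        probs = load_valid(ws / name, outcomes)
        if probs is not None:
            return source, probs
    # Assumed deterministic fallback: a uniform distribution over the outcomes.
    return "fallback", {o: 1 / len(outcomes) for o in outcomes}
```

Because a malformed final checkpoint fails validation and is skipped, an earlier valid initial checkpoint still wins over the deterministic fallback.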

--include-history is opt-in and currently enriches matched Kalshi markets with public trade prints and price/volume candles. The returned history includes timestamps, prices, volume/count, and side fields, but not Kalshi tickers, market IDs, or trade IDs.

Codex also has its normal shell/file/search tools and native web search from --search.

Generated from render-examples/fastapi