RupertDodkins/honestlaunch

HonestLaunch

Grounded adversarial claim audit for dense expert documents.

HonestLaunch reads an AI model report, paper, technical blog post, or other dense expert document, extracts the riskiest factual claims, then dispatches specialist verifier agents to produce an audited launch page plus a structured evidence ledger.

It is not a paper summarizer. The primary output is an audited page view that preserves the source structure, substitutes overstated copy inline, and shows the original wording plus citations on hover or tap. The secondary surface is the claim ledger: original wording, formal verdict, evidence contrast against references, collapsible agent steps, supporting evidence found, contradictions or narrowing evidence, missing context, computed checks when relevant, and the strongest defensible rewrite.

Current Archive Shape

The current public-facing direction is a reviewed static archive of audited launch pages, not an open live-generation product.

Browse the archive locally:

cd /path/to/honestlaunch
python3 -m http.server 8765

Then open:

  • http://127.0.0.1:8765/
  • http://127.0.0.1:8765/examples/index.html
  • http://127.0.0.1:8765/examples/methodology.html

The first canonical archive set is:

  • Google
  • Anthropic
  • OpenAI
  • xAI

Archive maintenance commands:

source .venv/bin/activate
python scripts/rebuild_archive.py
python scripts/evaluate_archive.py

This rebuilds the featured HTML reports, refreshes archive metadata and evidence-packet sidecars, and writes the current archive-quality score under docs/temporary/.

Hackathon Submission

HonestLaunch was built as a submission to the Google I/O Hackathon, hosted by Cerebral Valley with the Google DeepMind team.

It was originally submitted under the name CappinCheck, before the project was rebranded to HonestLaunch.

The project was shaped around the event prompt for Gemini 3.5 Flash: build something new for the agentic era that benefits from fast, low-cost multi-step reasoning. Rather than shipping another chat interface or RAG wrapper, HonestLaunch uses parallel Gemini specialist agents to audit dense public AI model reports and launch posts, compare claims against references, and rewrite overstated claims into wording that is still strong but actually defensible.

Why Low-Latency Gemini

Before low-latency Gemini models, running multiple grounded specialist passes over the same document would have cost dollars and taken minutes per audit. Today the same passes can be fast enough to drive a live claim ledger.

This repo is structured around agent skills:

  • skills/verifier/SKILL.md: finds support and primary evidence.
  • skills/contradiction-finder/SKILL.md: finds caveats, contrary evidence, and missing context.
  • skills/numeric-calibrator/SKILL.md: checks percentages, deltas, units, and table math.
  • skills/claim-aggregator/SKILL.md: turns evidence into verdicts and rewrites.

The supported default implementation is a local async Gemini runner that loads those same SKILL.md files. An experimental --runtime managed path uses Google GenAI Interactions; see RUNTIME.md for the runtime boundary and caveats.
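As a rough illustration of the pattern described above, the sketch below reads each skill's SKILL.md from disk and fans one claim out to the specialists concurrently with asyncio.gather. The helper names (load_skills, run_specialist, audit_claim) and the returned dict shape are hypothetical and are not taken from the repo; the real runner calls Gemini where this sketch uses a stub.

```python
import asyncio
from pathlib import Path

# Skill names come from the list above; everything else is illustrative.
SPECIALISTS = ["verifier", "contradiction-finder", "numeric-calibrator"]


def load_skills(root: Path, names: list[str]) -> dict[str, str]:
    """Read skills/<name>/SKILL.md for each name; missing files map to ''."""
    skills = {}
    for name in names:
        path = root / "skills" / name / "SKILL.md"
        skills[name] = path.read_text(encoding="utf-8") if path.exists() else ""
    return skills


async def run_specialist(name: str, instructions: str, claim: str) -> dict:
    """Hypothetical stand-in for a model call (assumption: no real API used)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"agent": name, "claim": claim, "has_instructions": bool(instructions)}


async def audit_claim(claim: str, skills: dict[str, str]) -> list[dict]:
    """Run every specialist on the same claim concurrently."""
    tasks = [run_specialist(name, text, claim) for name, text in skills.items()]
    # gather returns results in the same order as the tasks it was given.
    return await asyncio.gather(*tasks)


def demo() -> list[dict]:
    skills = load_skills(Path("."), SPECIALISTS)
    return asyncio.run(audit_claim("Accuracy improved by 30%.", skills))
```

In a layout like this, the aggregator would run after the gather call, since it consumes the other agents' outputs rather than running alongside them.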

Quickstart

cd /path/to/honestlaunch
python -m venv .venv
source .venv/bin/activate
pip install -e .

Run the no-key demo:

honestlaunch audit examples/demo_document.md --mock --profile --out examples/demo_report.md --json examples/demo_report.json --html examples/demo_report.html
open examples/demo_report.html

For the archive shell, open examples/index.html after generating or refreshing artifacts.

The deterministic fixture is examples/demo_document.md; use --mock for public demos when API access is unavailable or live grounding is flaky.

Run the no-key Evidence Contrast demo:

honestlaunch audit examples/demo_document.md --mock --contrast --contrast-top 2 --profile --out examples/contrast_demo.md --json examples/contrast_demo.json --html examples/contrast_demo.html
open examples/contrast_demo.html

Run with Gemini:

export GEMINI_API_KEY=...
honestlaunch audit examples/demo_document.md --out examples/demo_report.md --json examples/demo_report.json --html examples/demo_report.html

Run Evidence Contrast against explicit reference URLs:

honestlaunch audit examples/demo_document.md \
  --contrast \
  --reference https://ai.google.dev/gemini-api/docs/models \
  --profile \
  --contrast-top 2 \
  --out examples/contrast_live.md \
  --json examples/contrast_live.json \
  --html examples/contrast_live.html

V1 uses explicit --reference URLs for reliability. Automatic reference discovery is intentionally deferred.

For a real/public source placeholder that avoids copying copyrighted text into the repo, see examples/real_public_example.md.

Demo Script

  1. Run the deterministic mock command above.
  2. Open examples/demo_report.html.
  3. Start on the audited launch page: underline styling, hover/tap tooltip, original wording, and explicit-reference citations.
  4. Switch to the claim ledger to show formal verdict, Evidence Contrast, Evidence Sources, Agent Steps, missing context, and rewrite.
  5. Highlight the numeric contrast row: the source says 84.1% to 87.3%, so the defensible improvement is 3.2 points / 3.8% relative, not 30%.
  6. If GEMINI_API_KEY is available, rerun with --contrast --reference ... and compare the live grounded report to the deterministic fallback.
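The arithmetic behind step 5 can be reproduced in a few lines. This is a minimal sketch of that kind of computed check, not the numeric-calibrator's actual code; the function name improvement is made up for illustration.

```python
def improvement(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100
    return round(points, 1), round(relative, 1)


points, relative = improvement(84.1, 87.3)
print(f"{points} points, {relative}% relative")  # 3.2 points, 3.8% relative
```

Keeping percentage points and relative percent as separate values makes the mismatch with a headline "30%" easy to see.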

Output

Each audited claim includes:

  • Primary audited-page rendering with inline substitutions
  • Hover/tap tooltip showing original wording, verdict, and explicit-reference citations
  • Original claim
  • Claim type
  • Formal verdict: supported, overstated, missing_context, contradicted, or not_checkable
  • Verdict definitions:
    • supported: available evidence supports the claim as written or with only minor caveats.
    • overstated: evidence points in the same direction, but the wording is stronger, broader, or more certain than supported.
    • missing_context: the claim may be true, but key scope, baseline, methodology, source, or denominator context is missing.
    • contradicted: available evidence conflicts with the claim as written.
    • not_checkable: available sources do not provide enough evidence to verify or falsify the claim.
  • Stretch score from 0 to 100
  • Evidence Contrast against explicit reference URLs when --contrast is enabled
  • Evidence Sources split into provided references, Gemini-discovered supporting sources, Gemini-discovered caveat/counter sources, snippets, and mismatch notes
  • Agent Steps showing the verifier, contradiction-finder, numeric-calibrator, and aggregator outputs
  • Run telemetry covering pipeline wall time, per-claim timing, and per-agent timing
  • Supporting evidence found
  • Contradictions / narrowing evidence
  • Missing context
  • Computed checks when the claim has quantities, percentages, units, or table math to verify
  • Strongest defensible rewrite
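One way to picture a ledger entry is as a small typed record. The sketch below is an assumption about shape only: the field names, the LedgerEntry class, and the "numeric" claim type are hypothetical, and the JSON that honestlaunch actually emits may differ. The five verdict values are the ones listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(str, Enum):
    SUPPORTED = "supported"
    OVERSTATED = "overstated"
    MISSING_CONTEXT = "missing_context"
    CONTRADICTED = "contradicted"
    NOT_CHECKABLE = "not_checkable"


@dataclass
class LedgerEntry:
    # Field names are illustrative, not the tool's real schema.
    original_claim: str
    claim_type: str
    verdict: Verdict
    stretch_score: int  # documented range: 0 to 100
    rewrite: str
    supporting_evidence: list[str] = field(default_factory=list)
    contradictions: list[str] = field(default_factory=list)
    missing_context: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Accept plain strings but reject anything outside the five verdicts.
        self.verdict = Verdict(self.verdict)
        if not 0 <= self.stretch_score <= 100:
            raise ValueError("stretch_score must be between 0 and 100")


entry = LedgerEntry(
    original_claim="Accuracy improved by 30%.",
    claim_type="numeric",  # hypothetical claim type
    verdict="overstated",
    stretch_score=80,  # arbitrary example value
    rewrite="Accuracy improved by 3.2 points (84.1% to 87.3%).",
)
```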

Evidence Contrast Mode

Evidence Contrast Mode uses the URL Context tool to compare selected claims against user-provided reference URLs, then renders a side-by-side contrast card:

Claim says: ...
Best source says: ...
Delta: narrower_than_claim / missing_context / contradicted / not_checkable
Defensible rewrite: ...

This is documented in DEMO_EXTENSION_PLAN.md. Evidence Contrast Mode is the intended answer to "show me exactly where the claim differs from existing docs."
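The card format above is simple enough to render from four strings. The following is a minimal sketch, assuming the four delta labels shown in the card are the full set; render_contrast_card and ALLOWED_DELTAS are hypothetical names, not part of the honestlaunch package.

```python
# Delta labels copied from the card format above (assumed to be exhaustive).
ALLOWED_DELTAS = {
    "narrower_than_claim",
    "missing_context",
    "contradicted",
    "not_checkable",
}


def render_contrast_card(claim: str, best_source: str, delta: str, rewrite: str) -> str:
    """Format one plain-text contrast card; raises on an unknown delta label."""
    if delta not in ALLOWED_DELTAS:
        raise ValueError(f"unknown delta label: {delta!r}")
    return "\n".join(
        [
            f"Claim says: {claim}",
            f"Best source says: {best_source}",
            f"Delta: {delta}",
            f"Defensible rewrite: {rewrite}",
        ]
    )


card = render_contrast_card(
    claim="Accuracy improved by 30%.",
    best_source="Accuracy rose from 84.1% to 87.3%.",
    delta="narrower_than_claim",
    rewrite="Accuracy improved by 3.2 points (about 3.8% relative).",
)
print(card)
```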

Report layout:

  • Audited Launch Page: the primary surface, preserving source block structure while substituting audited claims inline with hover/tap receipts.
  • Evidence Contrast: the demo-facing side-by-side card: claim wording, reference wording, delta, final verdict, and defensible rewrite.
  • Audit Ledger: the secondary inspection/debug surface showing verdicts, contrast, sources, and agent steps.
  • Evidence Sources: separates explicit user-provided references from Gemini-discovered supporting and caveat/counter sources, with snippets and mismatch notes underneath for inspection.

Reference discovery with Google Search grounding is a v2 extension. The first version prioritizes explicit --reference URLs for demo reliability.

Limitations

HonestLaunch does not prove that a paper is true or false. It identifies claims whose wording may outrun the available evidence. It should be used as a triage and review aid, not as an authority.

The default runtime is local async Gemini execution over repo-local skill files. The experimental managed runtime is implemented but not demo-safe in this environment; see RUNTIME.md. Live output depends on model availability, tool support, and grounding quality. The --mock path is intentionally deterministic so public demos can run without secrets or network access.
