TrialDesignBench provides tooling for evaluating whether AI agents can reproduce clinical trial designs from Statistical Analysis Plans (SAPs) and protocols.
This baseline implements step 1 of the workflow:
- Create a local benchmark workspace.
- Convert a SAP/protocol PDF to Mathpix Markdown, with optional LaTeX ZIP output.
- Build the standard TrialDesignBench reproduction prompt.
- Run the prompt against a locally installed Codex SDK/runtime and save the run artifacts.
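The step-1 stages listed above run in a fixed order. The following minimal sketch models that order as data. Step1Options and plan_step1 are hypothetical names, not TrialDesignBench's API. The sketch also assumes two things: prompt building and the Codex run are both skipped when only Mathpix ingestion is requested (the CLI's --no-codex flag), and the LaTeX ZIP is an opt-in extra. Workspace creation is left out because the commands below handle it with a separate tdb init call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step1Options:
    # Hypothetical options object; field names are illustrative only.
    latex_zip: bool = False  # also keep the optional LaTeX ZIP output
    run_codex: bool = True   # False mirrors the CLI's --no-codex flag


def plan_step1(opts: Step1Options) -> list[str]:
    """Return the ordered stages of workflow step 1 for the given options."""
    stages = ["convert PDF to Mathpix Markdown"]
    if opts.latex_zip:
        stages.append("save LaTeX ZIP")
    if opts.run_codex:
        # Assumption: with --no-codex, neither the prompt nor the Codex run
        # is produced; only the Mathpix ingestion stages remain.
        stages += ["build reproduction prompt", "run Codex and save artifacts"]
    return stages


if __name__ == "__main__":
    print(plan_step1(Step1Options()))
    print(plan_step1(Step1Options(run_codex=False)))
```

Keeping the plan as plain data makes it easy to check which stages a given flag combination would trigger before any network or Codex call happens.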
Install the package with uv:

uv add trialdesignbench

For development:

git clone https://github.com/BBSW-org/TrialDesignBench.git
cd TrialDesignBench
uv sync

The experimental Codex Python SDK (openai-codex) is declared as a Git source dependency for uv environments until it is published on PyPI. In a clone of this repository, uv sync installs both openai-codex and its pinned local runtime. If you install trialdesignbench from PyPI before then, add the SDK source explicitly in the consuming project:

uv add "openai-codex @ git+https://github.com/openai/codex.git#subdirectory=sdk/python"

Create and configure a workspace, then run a SAP/protocol PDF:

uv run tdb init tdb-workspace
uv run tdb configure --workspace tdb-workspace
uv run tdb run path/to/sap.pdf --workspace tdb-workspace --case-id tdb-001

Use --no-codex to run only the Mathpix ingestion step:

uv run tdb run path/to/sap.pdf --workspace tdb-workspace --no-codex

The workspace .env file stores MATHPIX_APP_ID, MATHPIX_APP_KEY, CODEX_MODEL, and, optionally, CODEX_BIN. The default Codex model is gpt-5.5, and the default reasoning effort is high. The generated workspace .gitignore excludes credentials and output artifacts by default.
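The following minimal sketch shows how the workspace settings described above fit together. It assumes the .env file uses plain KEY=value lines. load_workspace_env is a hypothetical helper, not TrialDesignBench's own loader. It requires both Mathpix keys, falls back to the documented gpt-5.5 default for CODEX_MODEL, and leaves CODEX_BIN unset when it is absent. The high reasoning-effort default is not modeled because no .env key for it is documented here.

```python
from pathlib import Path

# Default stated in this README; the other keys have no documented default.
DEFAULT_CODEX_MODEL = "gpt-5.5"
REQUIRED_KEYS = ("MATHPIX_APP_ID", "MATHPIX_APP_KEY")


def load_workspace_env(workspace: Path) -> dict[str, str]:
    """Parse <workspace>/.env as simple KEY=value lines and apply defaults."""
    env_file = workspace / ".env"
    values: dict[str, str] = {}
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")

    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing Mathpix credentials in {env_file}: {missing}")

    if not values.get("CODEX_MODEL"):
        values["CODEX_MODEL"] = DEFAULT_CODEX_MODEL
    # CODEX_BIN is optional, so it is only present if the file sets it.
    return values
```

Failing early on missing Mathpix credentials reflects the fact that ingestion is the first stage of every run, including runs with --no-codex.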