Skip to content

feat: parametrise .ad replay scripts#433

Merged
thymikee merged 7 commits intocallstackincubator:mainfrom
kacper-mikolajczak:feat/parametrise-ad-scripts
Apr 24, 2026
Merged

feat: parametrise .ad replay scripts#433
thymikee merged 7 commits intocallstackincubator:mainfrom
kacper-mikolajczak:feat/parametrise-ad-scripts

Conversation

@kacper-mikolajczak
Copy link
Copy Markdown
Contributor

Resolves: #432

Summary

Add ${VAR} substitution to .ad replay scripts so flows can be reused across app variants, tuned per environment, deduplicated within a file, and driven from CI. Values in .ad were previously literals only - a long label=A || label=B || ... selector repeated across several steps, or wait 1000 / wait 500 scattered across a tree, had no way to be named once and reused.

Fixes #.

Resolution runs at dispatch time in invokeReplayAction. SessionAction keeps raw tokens, so writeReplayScript round-trips unchanged and the recorder stays literal-only. Precedence is high to low: CLI -e > AD_* shell env > file env > built-ins.

Key behaviours:

  • env KEY=VALUE directives in the script header, quoted values supported for content with spaces or OR-chained selectors.
  • ${VAR} interpolation in positionals and string flag values, ${VAR:-default} fallback, \${ escape.
  • CLI -e KEY=VALUE (repeatable) on replay and test.
  • Shell env auto-import: variables prefixed AD_* are imported with the prefix stripped.
  • Built-ins: AD_PLATFORM, AD_SESSION, AD_FILENAME, AD_DEVICE, AD_ARTIFACTS.
  • Unresolved ${X} throws with file:line.
  • replay -u is rejected when the script has env entries - rewrite would drop them. Noted as a v1 limitation.

Before, the same selector and timing repeated inline:

tap label=Settings || label=Preferences || label=More || label=Account
wait 1000
tap label=Settings || label=Preferences || label=More || label=Account
wait 1000
tap label=Settings || label=Preferences || label=More || label=Account
wait 500

After, named once and reused:

env SETTINGS="label=Settings || label=Preferences || label=More || label=Account"
env WAIT_LONG=1000
env WAIT_SHORT=500

tap ${SETTINGS}
wait ${WAIT_LONG}
tap ${SETTINGS}
wait ${WAIT_LONG}
tap ${SETTINGS}
wait ${WAIT_SHORT}

CI or per-variant override without editing the file:

ad replay flow.ad -e SETTINGS="label=Settings" -e WAIT_LONG=500
AD_SETTINGS="label=Settings" ad replay flow.ad

Scope intentionally excluded from v1: includes / sub-scripts, conditionals, loops, computed expressions, cross-file shared vars.

Touched-file count: 10. Scope stayed within the replay script / dispatch path plus CLI plumbing and docs.

Files:

  • new: src/daemon/handlers/session-replay-vars.ts, src/daemon/handlers/__tests__/session-replay-vars.test.ts
  • edited: src/cli/commands/generic.ts, src/client-normalizers.ts, src/client-types.ts, src/client.ts, src/daemon/handlers/session-replay-runtime.ts, src/daemon/handlers/session-replay-script.ts, src/utils/command-schema.ts, website/docs/docs/replay-e2e.md

Validation

  • pnpm format
  • pnpm check:quick
  • pnpm vitest run src/daemon/handlers/__tests__/session-replay-vars.test.ts
  • pnpm test:smoke
  • pnpm build
  • git diff --check
  • Manual ad replay on a sample .ad using env directives, ${VAR:-default} fallback, \${ escape, and an unresolved ${X} to verify the file:line error.
  • Manual ad replay -e K=V ... and AD_K=V ad replay ... to verify CLI and shell precedence over file env.
  • Manual ad replay -u on a script with env entries to verify the rejection.

Support ${VAR} substitution with env header directives and CLI -e
overrides so flows can be reused across app variants, environments,
and devices without duplicating the script.

Precedence (high->low): CLI -e > AD_* shell env > file-local env >
built-ins (AD_PLATFORM, AD_SESSION, AD_FILENAME, AD_DEVICE,
AD_ARTIFACTS). Supports ${VAR:-default} fallback, \${ escape, and
fails with file:line on unresolved vars.
resolvedPath: resolved,
}),
fileEnv: metadata.env,
shellEnv: collectReplayShellEnv(process.env),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This reads process.env in the daemon process, so the documented AD_K=V agent-device replay flow only works if that env was present when the daemon started. It also cannot work for a remote daemon. Collect the caller environment in the CLI/client and serialize it into the replay request instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved ✅

if (typeof device === 'string' && device.length > 0) builtins.AD_DEVICE = device;
const artifactsDir = flags.artifactsDir;
if (typeof artifactsDir === 'string' && artifactsDir.length > 0) {
builtins.AD_ARTIFACTS = artifactsDir;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] AD_ARTIFACTS is derived only from the raw artifactsDir flag. Under agent-device test with the default artifacts directory it is unset, so ${AD_ARTIFACTS} fails despite being documented as available under test. With a custom --artifacts-dir it points at the raw root, not the resolved per-suite artifacts directory that test actually uses.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved ✅

Tighten the parametrisation surface introduced in the feat commit after
an internal review pass:

- Reserve the AD_* namespace for built-ins. User env (file env, CLI -e,
  shell AD_VAR_*) can no longer define AD_* keys, which closes a built-in
  shadowing vector (e.g. AD_VAR_AD_SESSION).
- Change the shell-env prefix from AD_* to AD_VAR_* so unrelated CI
  secrets that happen to start with AD_ (AD_TOKEN, AD_SECRET_KEY) are
  not auto-imported into replay scripts.
- Extend the replay -u guard to also reject scripts with \${VAR}
  substitutions in any action, not just those with env directives, so
  the writer never silently drops substitutions on heal-rewrite.
- Reword DX-unfriendly regex errors ("must match /^[A-Z_]...$/" ->
  "must be uppercase letters, digits, and underscores, e.g. APP_ID").
- Docs rewrite with precedence table, three recipes, fallback/escape
  examples, and a Notes block covering replay -u limitation, remote
  daemon caveat, no nested fallback, and loud typo behaviour.
- Additional unit tests: namespace reservation on every path, shell
  prefix migration, \${VAR} round-trip preservation through
  writeReplayScript, green-path integration test with a fake invoke.
Review feedback (callstackincubator#433 (comment)):
the daemon was reading AD_VAR_* from its own process.env, which meant
"AD_VAR_K=V agent-device replay" only worked if V was set when the
daemon started, and never worked for remote daemons.

The client now filters process.env for AD_VAR_* at request time and
ships the result as replayShellEnv on the DaemonRequest flags. The
daemon prefers the request value when present and falls back to its
own process.env for direct-daemon callers (internal tests).

Adds two integration tests pinning both paths and updates the docs
Notes block to reflect the new behaviour.
Review feedback (callstackincubator#433 (comment)):
AD_ARTIFACTS is documented as available under "agent-device test", but
buildReplayBuiltinVars was only reading the raw flags.artifactsDir.
Under the default artifacts layout the flag is unset and \${AD_ARTIFACTS}
failed; when set with --artifacts-dir it pointed at the suite root, not
the resolved per-attempt directory the test runner actually writes to.

Plumb the attempt-level artifacts dir from session-test.ts through the
runReplay callback (via runReplayTestAttempt) down into the nested
replay request's flags.artifactsDir. The daemon side is unchanged -
buildReplayBuiltinVars just now sees the right value.

Extracts the nested-request flag merge into a testable helper
(buildNestedReplayFlags) to close the coverage gap between the test
harness and the replay runtime.
- Share VAR_KEY_RE between session-replay-vars and session-replay-script
  instead of re-declaring the same /^[A-Z_][A-Z0-9_]*$/ in each.
- Fold resolveReplayFlags + resolveReplayRuntime into a single generic
  resolveStringProps<T>; the two were near-identical object-walkers.
- Un-export parseReplayEnvLine (was only used inside its own module).
- Table-drive the four "reject AD_* namespace" tests with test.each
  instead of four near-duplicate test blocks.
- Extract runReplayFixture helper for runReplayScriptFile integration
  tests; each test is now ~10 lines instead of ~30.

No behaviour change. 812 tests still green.
@thymikee thymikee marked this pull request as ready for review April 24, 2026 21:52
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95def33f52

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}): Promise<DaemonResponse> {
const { req, sessionName, action, invoke } = params;
const { req, sessionName, action, scope, filePath, line, invoke } = params;
const resolved = resolveReplayAction(action, scope, { file: filePath, line });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Interpolate replay vars before argument classification

Variable substitution now runs in invokeReplayAction, after parseReplayScriptLine has already classified tokens, which breaks parametrized inputs that depend on token shape. For example, click ${X} ${Y} is parsed as a selector and collapsed to one positional (later becoming "100 200"), and numeric options like snapshot -d ${DEPTH} are parsed as NaN and dropped before substitution. This means replay scripts can silently execute different behavior than intended when using ${...} in positionals/flag values that influence parsing.

Useful? React with 👍 / 👎.

@thymikee thymikee merged commit 2062bff into callstackincubator:main Apr 24, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parametrise .ad replay scripts

2 participants