
feat(evm): enable basic performance check in ci #335

Merged
zoowii merged 9 commits into DTVMStack:main from starwarfan:perf-ci
Feb 28, 2026

Conversation


@starwarfan starwarfan commented Feb 5, 2026

1. Does this PR affect any open issues? (Y/N) Add issue references (e.g. "fix #123", "re #123"):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

`.github/workflows/dtvm_evm_test_x86.yml`, `.github/workflows/perf_pr_comment.yml`

3. Provide a description of the PR (e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other

Enable basic performance regression checking in CI, with fixes for fork PR token permissions.

Root cause of CI failure: For fork PRs (cross-repository), the GITHUB_TOKEN in pull_request events is always read-only — this is a GitHub security feature. The "Comment on PR" step tried to call issues.createComment which requires write permissions, causing a 403 error.

Fix:

  1. Write to GITHUB_STEP_SUMMARY: Performance results are always visible in the GitHub Actions job summary, regardless of token permissions.
  2. Upload artifacts: Benchmark results and PR metadata are uploaded as workflow artifacts for cross-workflow access.
  3. Graceful PR comment: The direct PR comment step now uses continue-on-error with try/catch, so it succeeds for same-repo PRs and fails silently for fork PRs.
  4. New workflow_run workflow: A separate perf_pr_comment.yml workflow triggers on workflow_run events. Since it runs in the context of the base branch (main), it has write permissions and can post PR comments for fork PRs. This takes effect after merge to main.
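The GITHUB_STEP_SUMMARY approach in point 1 can be sketched as follows. This is a hedged illustration, not code from this PR; `publish_summary` is a hypothetical helper:

```python
import os

def publish_summary(markdown: str) -> None:
    """Append benchmark results to the GitHub Actions job summary.

    GITHUB_STEP_SUMMARY points at a file the runner renders as Markdown.
    Appending to it needs no GITHUB_TOKEN permissions at all, so it works
    even for fork PRs where the token is read-only.
    """
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        # Not running under GitHub Actions; fall back to stdout.
        print(markdown)
        return
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write(markdown + "\n")

publish_summary("## Performance Regression Check\n\nNo regressions detected.")
```

Because the summary is rendered per-job, each matrix mode (interpreter, multipass) gets its own visible result without any comment API call.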

4. Are there any breaking changes? (Y/N) If yes, describe them (e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes? (Y/N) Select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

The CI workflow changes can be verified by re-running the PR checks. The "Comment on PR" step should no longer cause a job failure for fork PRs.

6. Release note

None

@starwarfan starwarfan marked this pull request as draft February 5, 2026 05:51
@starwarfan starwarfan force-pushed the perf-ci branch 4 times, most recently from 76e887c to 7e327e9 on February 9, 2026 06:31
starwarfan and others added 6 commits February 25, 2026 03:32
- Add matrix strategy (interpreter, multipass) to the
  performance_regression_check job so both modes run in parallel;
  fail-fast: false so a failure in one doesn't cancel the other.
- Baseline and summary files are namespaced by mode
  (perf_baseline_{mode}.json, perf_summary_{mode}.md) to avoid
  collisions between matrix runs.
- PR comments include the mode name in the title.
- Add ZEN_ENABLE_JIT_PRECOMPILE_FALLBACK=ON to benchmarksuite CMake
  options so the JIT suitability checker is active and prevents
  infinite compilation hangs on pathological synth bytecode.
- Add --benchmark-filter option to check_performance_regression.py
  to allow callers to override the default external/* filter.

Co-authored-by: Cursor <cursoragent@cursor.com>
For fork PRs, the GITHUB_TOKEN in pull_request events is always
read-only, causing the "Comment on PR" step to fail with 403.

- Write results to $GITHUB_STEP_SUMMARY for always-visible output
- Upload benchmark artifacts for cross-workflow access
- Add continue-on-error and try/catch to the PR comment step
- Add workflow_run-triggered workflow to post PR comments with
  write permissions (works for fork PRs after merge to main)

Made-with: Cursor
…heck

The baseline step checked out main and ran run_test_suite.sh, but main
has no benchmarksuite case, so no baseline file was produced. The PR
step then failed with "Baseline file not found".

Fix: build the DTVM library on the base branch and save only the
binary. Benchmarks for both baseline and PR libraries run from the PR
branch's infrastructure via a new BENCHMARK_BASELINE_LIB env var.

Also mitigate CI runner variance:
- Raise default threshold from 10% to 15%
- Require at least 3 benchmarks to exceed threshold before flagging
  regression (--min-regressions), filtering out single-outlier noise

Made-with: Cursor
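The --min-regressions gating described in the commit message above can be sketched roughly like this. Function and parameter names are illustrative, not the actual code in check_performance_regression.py:

```python
def flag_regression(changes, threshold=0.15, min_regressions=3):
    """Decide whether a benchmark run counts as a real regression.

    `changes` maps benchmark name -> fractional time change vs. baseline
    (0.20 means 20% slower). Requiring several benchmarks to exceed the
    threshold before failing the check filters out the single-outlier
    noise that shared CI runners routinely produce.
    """
    regressed = [name for name, change in changes.items() if change > threshold]
    return len(regressed) >= min_regressions
```

With this gate, one benchmark spiking 30% on a noisy runner does not fail the job, but a broad slowdown across several benchmarks still does.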
@starwarfan starwarfan marked this pull request as ready for review February 26, 2026 12:36
…line

Reduce CI benchmark variance on shared GitHub Actions runners:
- Run each benchmark 3x and use the median (--benchmark_repetitions)
- Pin benchmarks to CPU 0 via taskset when available
- Filter to external/total/* to avoid execute/total double-counting
- Skip micro-benchmarks (<5us) that are dominated by noise
- Raise threshold from 15% to 20% and require >=5 regressions
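A rough sketch of the median-and-skip aggregation above, under assumed names (the real logic lives in check_performance_regression.py):

```python
from statistics import median

def aggregate_runs(samples_ns, min_time_ns=5_000):
    """Collapse repeated benchmark runs to one stable value per benchmark.

    `samples_ns` maps benchmark name -> list of per-repetition times in
    nanoseconds. Taking the median of the repetitions damps scheduler
    noise; benchmarks faster than min_time_ns (5 us) are dropped
    entirely, since timer jitter dominates at that scale.
    """
    aggregated = {}
    for name, times in samples_ns.items():
        m = median(times)
        if m >= min_time_ns:
            aggregated[name] = m
    return aggregated
```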

Cache baseline results with actions/cache keyed by base branch SHA
so the baseline build and benchmark run are skipped on subsequent
pushes to the same PR. On cache miss the baseline library is built
once and benchmarked; the resulting JSON is cached for future runs.

Made-with: Cursor
@zoowii zoowii requested a review from Copilot February 28, 2026 07:35
@zoowii zoowii merged commit 56b42b8 into DTVMStack:main Feb 28, 2026
15 checks passed

Copilot AI left a comment


Pull request overview

Adds a CI-based performance regression check for the EVM path and ensures results are visible and (when possible) posted back to PRs, including fork PRs via a workflow_run follow-up workflow.

Changes:

  • Introduces tools/check_performance_regression.py to run/parse evmone-bench JSON output, save baselines, and detect regressions with noise-reduction knobs.
  • Extends .ci/run_test_suite.sh with a new benchmarksuite mode that builds/runs evmone benchmarks and compares against a baseline.
  • Updates CI to run performance checks on PRs, upload artifacts + step summaries, and adds a workflow_run workflow to comment results with base-repo permissions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
tools/check_performance_regression.py New benchmark runner/parser + baseline + regression comparator used by CI.
.github/workflows/dtvm_evm_test_x86.yml Adds a matrix perf job, uploads artifacts, writes step summaries, attempts direct PR comment.
.github/workflows/perf_pr_comment.yml New workflow_run workflow that downloads artifacts and posts/updates a PR comment.
.ci/run_test_suite.sh Adds benchmarksuite case to clone evmone and execute regression checking.
Comments suppressed due to low confidence (1)

tools/check_performance_regression.py:452

  • --benchmark-filter help text says the default is external/*, but run_benchmark() actually applies --benchmark_filter=external/total/* when none is provided. Please make the help text consistent with the real default (or change the code default if external/* is intended).
        "--benchmark-filter",
        default=None,
        help="Custom regex filter forwarded to evmone-bench --benchmark_filter (default: external/*)",
    )


Comment on lines +184 to +186
cp build/lib/* $EVMONE_DIR/

cd $EVMONE_DIR

Copilot AI Feb 28, 2026


set -e is enabled, so cp build/lib/* $EVMONE_DIR/ will fail the whole benchmark run if the glob doesn’t match (or if build/lib contains non-regular files). Since only libdtvmapi.so is needed for the benchmark runner, consider copying that specific file (and/or enabling nullglob / adding an explicit existence check) to make the CI step more robust.

Suggested change
cp build/lib/* $EVMONE_DIR/
cd $EVMONE_DIR
if [ -f "build/lib/libdtvmapi.so" ]; then
cp build/lib/libdtvmapi.so "$EVMONE_DIR/"
else
echo "Error: build/lib/libdtvmapi.so not found; cannot run benchmarks." >&2
exit 1
fi
cd "$EVMONE_DIR"

Comment on lines +241 to +244
time_change = (c.time_ns - b.time_ns) / b.time_ns
cpu_change = (c.cpu_time_ns - b.cpu_time_ns) / b.cpu_time_ns

max_change = max(time_change, cpu_change)

Copilot AI Feb 28, 2026


compare_benchmarks() divides by b.time_ns and b.cpu_time_ns directly. If either baseline value is 0 (possible because parse_benchmark_json() defaults missing times to 0), this will raise ZeroDivisionError and fail the CI run. Please guard against zero/negative baseline times (e.g., skip those entries with a notice, or treat change as inf and mark as regression/unknown).
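One way to guard the division, along the lines the comment suggests. This is a hedged sketch; `percent_change` is a hypothetical helper, not the reviewed code:

```python
import math

def percent_change(baseline_ns, current_ns):
    """Fractional change of current vs. baseline, guarding zero baselines.

    parse_benchmark_json() can default missing times to 0, and dividing
    by a zero baseline would raise ZeroDivisionError mid-CI. A
    non-positive baseline is reported as +inf so the caller can flag the
    entry as unknown/regressed instead of crashing the whole run.
    """
    if baseline_ns <= 0:
        return math.inf
    return (current_ns - baseline_ns) / baseline_ns
```

The comparator would then take `max(percent_change(b.time_ns, c.time_ns), percent_change(b.cpu_time_ns, c.cpu_time_ns))` and treat an infinite result as a special case rather than an ordinary regression percentage.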


result = subprocess.run(
cmd,
env={**subprocess.os.environ, **env},

Copilot AI Feb 28, 2026


subprocess.run(..., env={**subprocess.os.environ, **env}) is using subprocess.os.environ instead of os.environ. While it may work, it's non-idiomatic and couples this code to an internal detail of the subprocess module; please switch to env={**os.environ, **env} (and keep the existing merge behavior).

Suggested change
env={**subprocess.os.environ, **env},
env={**os.environ, **env},

Copilot uses AI. Check for mistakes.
# Check for regressions against baseline
python check_performance_regression.py --baseline baseline.json

# Check with custom threshold (default 10%)

Copilot AI Feb 28, 2026


The header docstring says the default threshold is 10% (and shows --threshold 0.15 as “custom”), but the argparse default is 0.15 (15%). This mismatch will confuse users and future CI tuning; please update the docstring/examples to reflect the actual defaults (and consider aligning the example text that still references 10%).

Suggested change
# Check with custom threshold (default 10%)
# Check with explicit threshold (default 15%)

Comment on lines +45 to +52
for (const mode of ['interpreter', 'multipass']) {
const dir = `/tmp/perf-${mode}`;
try {
if (!prNumber) {
prNumber = parseInt(fs.readFileSync(`${dir}/pr_number`, 'utf8').trim());
}
const outcome = fs.readFileSync(`${dir}/outcome`, 'utf8').trim();
const summary = fs.readFileSync(`${dir}/summary.md`, 'utf8');

Copilot AI Feb 28, 2026


This workflow trusts pr_number from artifacts generated by a pull_request run. For fork PRs, artifact contents are attacker-controlled; with this workflow’s write token, a malicious PR can set pr_number to any issue/PR and cause the workflow to spam or overwrite comments elsewhere in the repo. Please derive the PR number from github.event.workflow_run.pull_requests (and/or verify the artifact PR number matches that list) before posting/updating comments.
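The suggested validation could look roughly like this, shown in Python for illustration only (the actual workflow step uses actions/github-script, and `safe_pr_number` is a hypothetical helper):

```python
def safe_pr_number(artifact_value: str, run_pr_numbers) -> int:
    """Validate a PR number read from an untrusted artifact.

    Artifacts produced by a fork's pull_request run are attacker-
    controlled, so the value is only accepted if it matches a PR that
    the triggering workflow_run event actually belongs to
    (run_pr_numbers, taken from github.event.workflow_run.pull_requests).
    """
    try:
        candidate = int(artifact_value.strip())
    except ValueError:
        raise ValueError(f"artifact pr_number is not an integer: {artifact_value!r}")
    if candidate not in set(run_pr_numbers):
        raise ValueError(f"artifact pr_number {candidate} does not match the triggering run")
    return candidate
```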

- completed

permissions:
pull-requests: write

Copilot AI Feb 28, 2026


The workflow calls github.rest.issues.* (list/update/create comment), but the top-level permissions: does not grant issues: write. On fine-grained GITHUB_TOKEN permissions this can lead to 403s and the workflow failing to post comments. Please add issues: write (or switch to PR-specific APIs that work with only pull-requests: write).

Suggested change
pull-requests: write
pull-requests: write
issues: write


- name: Comment on PR
if: always()
uses: actions/github-script@v6

Copilot AI Feb 28, 2026


actions/github-script@v6 runs on the deprecated Node 16 runtime; GitHub Actions is removing Node 16 support, which can break this step in the future. Consider bumping to actions/github-script@v7 (Node 20) to avoid upcoming CI failures.

Suggested change
uses: actions/github-script@v6
uses: actions/github-script@v7
