docs(perf): fill benchmark tables with real BMI5 test data#450
Conversation
|
/cubebot |
|
Claude finished @ls-ggg's task in 1m 43s —— View job Reviewing PR: docs(perf): fill benchmark tables with real BMI5 test data✅ OverviewThis PR does two things well:
✅ Data ConsistencyI verified the following:
All benchmark script invocations in the article match the CLI syntax of the refactored scripts. ✅ Blog Post Quality
✅ Benchmark Script QualityGeneral:
Specific scripts:
🔍 Suggestions1. Duplicated The 2. dirty_mb_avg = statistics.mean(dirty_list) / (1024 * 1024) if dirty_list[0] >= 0 else -1This only checks the first element. Using 3. The article's command examples for rollback don't show This is technically correct (default is 0, matching "pure rollback" in the description), so the article accurately reflects what was tested. No change needed, but noting it for completeness. ✅ VerdictLGTM. The PR is well-structured, the data is consistent across both languages, and the benchmark scripts are cleanly refactored. The English article is a proper rewrite aligned with the Chinese version. |
|
Claude finished @ls-ggg's task in 1m 51s —— View job ✅ PR Review: docs(perf): fill benchmark tables with real BMI5 test dataProgress
OverviewThis PR replaces mock/placeholder benchmark data with real measurements from a Tencent Cloud BMI5 bare-metal node. The changes span documentation (English + Chinese) and 5 Python benchmark scripts. Overall, the quality is high — the refactoring to a single-tier CLI pattern is well-motivated, the data is presented clearly, and the scripts are robust. Below are my findings organized by severity and category. 🔴 Potential Issues1. dirty_mb_avg = statistics.mean(dirty_list) / (1024 * 1024) if dirty_list[0] >= 0 else -1This only validates the first element. If the vmm.log grep succeeds in round 0 (returns a valid byte count > 0) but fails in a later round (returns -1), the result silently averages -1 bytes into if any(d < 0 for d in dirty_list):
dirty_mb_avg = -1
else:
dirty_mb_avg = statistics.mean(dirty_list) / (1024 * 1024)2. §3.2 Data — Per-sandbox amortized (55.8 ms) vs avg (47.8 ms) at c=1 For concurrency=1:
The ~8 ms gap between "avg individual latency" and "wall/N" at serial concurrency is non-trivial. If 🟡 Minor Issues3.
4. Each script defines its own 5. def rollback_one(sb, checkpoint_id: str, dirty_mb: int):
if dirty_mb > 0:
sb.run_code(f"...")
sb.rollback(checkpoint_id)If 🟢 Observations / Style Notes6. Excellent warm-up pattern in all scripts — the first-round discard eliminates page-cache cold-read noise, and the single-tier design makes multi-tier sweeping composable with shell scripts. 7. Cleanup discipline is generally good — all scripts kill sandboxes and delete snapshots in 8. Documentation accuracy — command blocks are tested, tables render correctly in both English and Chinese versions, metric definitions are clear, and the Chinese article is well-aligned with the English one. The TAP pool scaling instructions (§3.3) are a nice addition for readers who need to push beyond default limits. 9. Inconsistent SSL_CERT_FILE mention — SummaryThis is a solid PR. The data is well-presented, the scripts are professionally structured, and the documentation is thorough. I found one correctness bug (item 1 — partial validation), one data presentation question (item 2), and several minor quality suggestions. |
e3f6b03 to
10fca28
Compare
|
I'll analyze this and get back to you. |
Replace all mock/placeholder data in the performance benchmark article (zh + en) with real measurements from a Tencent Cloud BMI5 bare-metal node. Refactor benchmark scripts to single-tier CLI style for composable multi-tier sweeps. Add TAP pool scaling instructions and environment variable setup to each section. Signed-off-by: ls-ggg <335814617@qq.com>
10fca28 to
619eb0d
Compare
|
Claude finished @ls-ggg's task in 1m 22s —— View job 📋 Status Update — PR Synchronize EventBranch: This PR was synchronized (e.g., rebased on master), but no code changes were detected since the previous reviews. The single commit Previous Review FindingsThe two earlier reviews (from two separate action runs) identified the following items: 🔴 Potential Issues (Unaddressed)
🟡 Minor Issues (Unaddressed)
✅ Verified OK
SummaryThe branch is unchanged — previous review feedback still applies. Let me know if you'd like me to implement any of the fixes identified above. |

What
Replace all mock/placeholder data in the CubeSandbox performance benchmark article with real measurements from a Tencent Cloud BMI5 bare-metal node (96 cores, 375 GiB).
Changes
Benchmark articles (zh + en):
tap_init_numconfig + network-agent restart)Benchmark scripts (
examples/snapshot-rollback-clone/):-c,-n,-d,--no-header) for composable multi-tier sweepsTest environment