chore(docs): improve ai instructions#1582
Conversation
Add build and test instructions to AGENTS.md to improve Claude Code performance.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 875c30a480
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Binary Size Analysis (Agent Data Plane)Target: e8a875d (baseline) vs 803ddec (comparison) diff
|
| Module | File Size | Symbols |
|---|---|---|
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.14393392841373718055 |
-130 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.6868297941858885019 |
+129 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.14393392841373718055 |
-115 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.6868297941858885019 |
+114 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.14393392841373718055 |
-109 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.6868297941858885019 |
+108 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.14393392841373718055 |
-97 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.6868297941858885019 |
+96 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.14393392841373718055 |
-95 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.6868297941858885019 |
+94 B | 1 |
[Unmapped] |
-3 B | 1 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +129 [NEW] +40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.6868297941858885019
[NEW] +114 [NEW] +25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.6868297941858885019
[NEW] +108 [NEW] +19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.6868297941858885019
[NEW] +96 [NEW] +7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.6868297941858885019
[NEW] +94 [NEW] +5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.6868297941858885019
-0.1% -3 [ = ] 0 [Unmapped]
[DEL] -95 [DEL] -5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.14393392841373718055
[DEL] -97 [DEL] -7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.14393392841373718055
[DEL] -109 [DEL] -19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.14393392841373718055
[DEL] -115 [DEL] -25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.14393392841373718055
[DEL] -130 [DEL] -40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.14393392841373718055
-0.0% -8 [ = ] 0 TOTAL
Regression Detector (Agent Data Plane)Regression Detector ResultsRun ID: f7284ee9-1a8f-4882-bcae-e2092f84ef08 Baseline: e8a875d Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +10.25 | [+9.78, +10.71] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +1.38 | [-3.53, +6.28] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.02 | [-0.15, +0.10] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +10.25 | [+9.78, +10.71] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +7.64 | [-50.37, +65.65] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +1.38 | [-3.53, +6.28] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +0.44 | [-5.69, +6.57] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.34 | [+0.10, +0.57] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | +0.25 | [-1.96, +2.45] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.17 | [+0.09, +0.24] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +0.16 | [+0.00, +0.32] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +0.06 | [-0.06, +0.19] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.06 | [-0.08, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | +0.06 | [-0.10, +0.22] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.06 | [-0.11, +0.23] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.02 | [-0.18, +0.21] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.01 | [-0.14, +0.12] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.06, +0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | -0.02 | [-0.18, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.02 | [-0.15, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.03 | [-0.17, +0.12] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.03 | [-0.19, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | -0.07 | [-0.15, -0.00] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | -0.09 | [-0.22, +0.04] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.10 | [-0.17, -0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | -0.10 | [-0.25, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | -0.14 | [-2.19, +1.91] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.17 | [-0.30, -0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -0.20 | [-0.35, -0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.49 | [-1.94, +0.96] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -0.56 | [-0.60, -0.51] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -1.43 | [-7.47, +4.61] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | -1.53 | [-1.75, -1.32] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -2.35 | [-4.50, -0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -2.83 | [-31.76, +26.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | -3.08 | [-54.90, +48.73] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 123.57MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 40.36MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 60.71MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 174.64MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 27.59MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
|
|
||
| Level 4: At the user's discretion, proceed to integration testing. There are two integration test | ||
| suites: | ||
| - Name "integration": test definitions at `test/integration`. |
There was a problem hiding this comment.
Sidenote: we should probably move the integration test cases into an actual subdirectory called cases to match all the other test groupings we have.
There was a problem hiding this comment.
Agreed. Not in this one though. I also want to move some panoramic code for clearer patterns.
## Summary
Claude was routinely opting not to read AGENTS.md. I found that even
when I added a `.claude/CLAUDE.md` which tells it to read AGENTS.md it
still skipped that step.
I added a session start hook that more-or-less forces Claude to actually
read the AGENTS.md file and I added a section about our building and
testing commands to AGENTS.md
I also added some information to help it understand what commands we
need to use to check our work. It was routinely reporting being done
with work without running fmt or the linter. I think it's still going to
skip those steps but at least now we can scold it and say "Did you run
the linting steps?" and it will have those instructions available to say
"Oh sorry, blah blah, here i go with `make fmt`"
Note: some of it could get stale quickly. We should actually create a
skill and just run it occasionally to look for stale instructions or
paths. But keeping this PR quick and small.
## Change Type
- [x] Non-functional (chore, refactoring, docs)
## How did you test this PR?
### Before
On main, when I start a session and say "run the integration tests", it
looks around and chooses `make test-integration-quick` to save time,
i.e. not building the containers. That's most likely not what you want:
```
⏺ Let me check how integration tests are run in this project.
⏺ Bash(ls /my/weird/path//saluki-main/)
⎿ AGENTS.md
bin
bun.lock
Cargo.lock
Cargo.toml
ci
clippy.toml
deny.toml
docker
docs
lib
LICENSE
LICENSE-3rdparty.csv
license-tool.toml
Makefile
NOTICE
package.json
README.md
renovate.json5
repository.datadog.yml
rust-toolchain.toml
rustfmt.toml
SECURITY.md
static-analysis.datadog.yml
target
test
tomlfmt.toml
tooling
⏺ Bash(grep -i "integrat\|test-correct\|test-integration" /my/weird/path//saluki-main/Makefile | head -40)
⎿ .PHONY: test-correctness
test-correctness: build-panoramic
test-correctness: ## Runs the complete correctness suite (all test cases in parallel)
.PHONY: test-correctness-case
test-correctness-case: build-panoramic
test-correctness-case: ## Runs a single correctness test case by name (usage: make test-correctness-case CASE=dsd-plain)
build-panoramic: ## Builds the panoramic binary (ADP integration test runner)
.PHONY: test-integration
test-integration: build-panoramic build-datadog-agent-image
test-integration: ## Runs all ADP integration tests
@echo "[*] Running ADP integration tests..."
@target/release/panoramic run -d $(shell pwd)/test/integration/cases $(if $(PANORAMIC_LOG_DIR),-l $(PANORAMIC_LOG_DIR))
.PHONY: test-integration-quick
test-integration-quick: build-panoramic
test-integration-quick: ## Runs ADP integration tests (assumes images already built)
@echo "[*] Running ADP integration tests (quick mode)..."
@target/release/panoramic run -d $(shell pwd)/test/integration/cases $(if $(PANORAMIC_LOG_DIR),-l $(PANORAMIC_LOG_DIR))
.PHONY: list-integration-tests
list-integration-tests: build-panoramic
list-integration-tests: ## Lists available ADP integration tests
@target/release/panoramic list -d $(shell pwd)/test/integration/cases
⏺ Bash(make test-integration-quick 2>&1)
```
### After
It reads `AGENTS.md` and understands what to do:
```
⏺ Let me read AGENTS.md first as instructed.
Searched for 1 pattern, read 1 file (ctrl+o to expand)
⏺ Let me check the Makefile to confirm the integration test targets.
Searched for 1 pattern, read 1 file (ctrl+o to expand)
⏺ The integration suite uses make test-integration, which builds images first. Running it now — this will take a while since it builds the Datadog Agent image.
⏺ Bash(make test-integration 2>&1)
⎿ Running in the background (↓ to manage)
⎿ (timeout 10m)
```
## References
N/A 7223534
Summary
Claude was routinely opting not to read AGENTS.md. I found that even when I added a
.claude/CLAUDE.mdwhich tells it to read AGENTS.md it still skipped that step.I added a session start hook that more-or-less forces Claude to actually read the AGENTS.md file and I added a section about our building and testing commands to AGENTS.md
I also added some information to help it understand what commands we need to use to check our work. It was routinely reporting being done with work without running fmt or the linter. I think it's still going to skip those steps but at least now we can scold it and say "Did you run the linting steps?" and it will have those instructions available to say "Oh sorry, blah blah, here i go with
make fmt"Note: some of it could get stale quickly. We should actually create a skill and just run it occasionally to look for stale instructions or paths. But keeping this PR quick and small.
Change Type
How did you test this PR?
Before
On main, when I start a session and say "run the integration tests", it looks around and chooses
make test-integration-quickto save time, i.e. not building the containers. That's most likely not what you want:After
It reads
AGENTS.mdand understands what to do:References
N/A