feat: Java tracing agent with end-to-end optimization pipeline #1874
misrasaurabh1 merged 14 commits into main
Claude finished @misrasaurabh1's task in 7s
PR Review Summary
Prek Checks
✅ Ruff and formatting: pass (no issues)
Mypy flags 3 new errors introduced by this PR:
For Code Review
Bugs / Correctness
1.
2.
Design Issues
3. Language guard uses string literal instead of enum (
`if functions_only and functions_only[0].language == "java":`
The codebase mixes string comparisons (
4. Binary JAR committed to git (
A 16MB JAR is modified in the diff (15.95MB → 15.97MB) and stored directly in git. There's no
5.
Duplicate Detection
No meaningful duplicates detected.
Test Coverage
The new files have accompanying tests:
Optimization PRs
PR #1877 (
- Use `uv run -m codeflash.main` instead of direct file path
- Remove redundant --no-pr (already hardcoded in _run_java_tracer)
- Clean up leftover replay tests between retry attempts
- Add error logging for subprocess output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Git doesn't track empty directories, so src/test/java must be created before process_pyproject_config validates that tests-root exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Unwrap logger.info call in tracer.py that fits within 120-char limit
- Revert auto-generated dev version string in version.py back to 0.20.3

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The original code performed a linear scan over `self._ranking` on every call to `get_function_addressable_time`, which `rank_functions` invokes repeatedly (once per function to filter, plus once per function to sort). The optimized version builds a hash map `_ranking_by_name` during `__init__`, replacing the O(n) loop with an O(1) dictionary lookup. Line profiler confirms the loop and comparison accounted for 94.7% of original runtime. When `rank_functions` calls `get_function_addressable_time` dozens or hundreds of times across a 1000-method ranking (as in `test_large_number_of_methods_and_repeated_queries_perf_and_correctness`), the lookup cost drops from ~293 µs to ~10 µs per call, yielding the 1244% overall speedup. The optimization also consolidates the two calls to `get_addressable_time_ns` in `get_function_stats_summary` into a single call, stored in a local variable, eliminating redundant work.
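The change described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `RankedMethod` record, its field names, and the two class names are assumptions; only `_ranking`, `_ranking_by_name`, and `get_function_addressable_time` are names taken from the description.

```python
from dataclasses import dataclass


@dataclass
class RankedMethod:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str
    addressable_time_ns: int


class LinearRanker:
    """Shape of the original approach: scan the list on every lookup."""

    def __init__(self, ranking: list[RankedMethod]) -> None:
        self._ranking = ranking

    def get_function_addressable_time(self, name: str) -> int:
        # O(n): walks the ranking until the first matching name.
        for method in self._ranking:
            if method.name == name:
                return method.addressable_time_ns
        return 0


class IndexedRanker:
    """Shape of the optimized approach: build a name index once."""

    def __init__(self, ranking: list[RankedMethod]) -> None:
        self._ranking = ranking
        # setdefault keeps the first entry for a duplicated name, which
        # matches the first-match behaviour of the linear scan above.
        self._ranking_by_name: dict[str, RankedMethod] = {}
        for method in ranking:
            self._ranking_by_name.setdefault(method.name, method)

    def get_function_addressable_time(self, name: str) -> int:
        # O(1) average-case dictionary lookup.
        method = self._ranking_by_name.get(name)
        return method.addressable_time_ns if method is not None else 0
```

The index costs one extra pass and one dictionary at construction time, which pays off when the lookup is called once or more per ranked function.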
⚡️ Codeflash found optimizations for this PR
📄 1,245% (12.45x) speedup for
…2026-03-19T06.41.55 ⚡️ Speed up method `JavaFunctionRanker.get_function_addressable_time` by 1,245% in PR #1874 (`java-tracer`)
This PR is now faster! 🚀 @misrasaurabh1 accepted my optimizations from:
- Read --timeout from both config.timeout and config.tracer_timeout
- Handle multi-line /* */ block comments in package detection (aerospike source files start with license block comments before package declaration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
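One way to make package detection robust to a leading license header is to strip comments before searching for the declaration. The sketch below is an assumption about the general technique, not the code in this commit; `detect_package` and the regex names are hypothetical.

```python
from __future__ import annotations

import re

# Non-greedy match with DOTALL so a /* ... */ comment may span many lines.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")
_PACKAGE = re.compile(r"^\s*package\s+([A-Za-z_][\w.]*)\s*;", re.MULTILINE)


def detect_package(source: str) -> str | None:
    """Return the declared package of a Java source string, or None."""
    # Replace block comments with a space so adjacent tokens never fuse,
    # then drop line comments, then look for the first package statement.
    without_comments = _LINE_COMMENT.sub("", _BLOCK_COMMENT.sub(" ", source))
    match = _PACKAGE.search(without_comments)
    return match.group(1) if match else None
```

This regex-based stripping does not understand string literals that contain `/*` or `//`. That is usually acceptable here because the package declaration precedes any code, but a real parser would be needed for full correctness.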
…n_ranker
- Import Language from codeflash.languages (exported) not codeflash.languages.base
- Fix _detect_non_python_language return type: object | None -> Language | None
- Fix bare dict type annotations: dict -> dict[str, Any] in jfr_parser.py and function_ranker.py
- Fix pytest_splits/test_paths type narrowing by separating assignment from None check

Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>
The optimization precomputes all frame-to-key conversions for a stack trace once (into a `keys` list) instead of calling `_frame_to_key` repeatedly inside the caller-callee loop, cutting per-frame extraction from ~3.3 µs to ~0.19 µs (83% reduction) and lifting `_frame_to_key` from 20.8% of total time to 43.2% (the loop cost is now dominated by the upfront list comprehension rather than repeated calls). A local `matches_packages_cached` closure memoizes package-filter results to avoid re-checking the same method keys across caller relationships, reducing `_matches_packages` overhead from 12.6% to 0.8% of total time; profiler data shows `_matches_packages` hits dropped from 18,364 to 1,500. The timestamp-duration calculation switched from accumulating a list then calling `max()`/`min()` to inline min/max tracking, removing intermediate allocations; combined, these changes yield a 42% overall speedup (46.4 ms → 32.6 ms).
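The two main ideas described above, converting each frame to a key once per stack and memoizing the package filter in a local closure, can be sketched like this. The frame shape, `frame_to_key`, `count_caller_edges`, and the callee-first frame ordering are assumptions for illustration; only `matches_packages_cached` and the `keys` list are names from the description.

```python
from collections import Counter

# Hypothetical frame shape: {"class": "com.app.A", "method": "f"}
Frame = dict[str, str]


def frame_to_key(frame: Frame) -> str:
    return f"{frame['class']}.{frame['method']}"


def count_caller_edges(stacks: list[list[Frame]], packages: tuple[str, ...]) -> Counter:
    """Count (caller, callee) pairs whose callee belongs to one of the packages."""
    cache: dict[str, bool] = {}

    def matches_packages_cached(key: str) -> bool:
        # Memoize the filter result so each distinct key is checked only once.
        hit = cache.get(key)
        if hit is None:
            hit = key.startswith(packages)
            cache[key] = hit
        return hit

    edges: Counter = Counter()
    for frames in stacks:
        # One conversion per frame, instead of one per caller/callee visit.
        keys = [frame_to_key(frame) for frame in frames]
        # Assumes frames are ordered callee-first, so keys[i + 1] called keys[i].
        for callee, caller in zip(keys, keys[1:]):
            if matches_packages_cached(callee):
                edges[(caller, callee)] += 1
    return edges
```

Because profiler stacks repeat the same methods many times, the cache hit rate tends to be high, which is consistent with the drop in `_matches_packages` calls reported above.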
⚡️ Codeflash found optimizations for this PR
📄 42% (0.42x) speedup for
⚡️ Codeflash found optimizations for this PR
📄 73% (0.73x) speedup for
…, filter empty names
- Consolidate _parse_replay_metadata to call parse_replay_test_metadata instead of duplicating the parsing logic
- Replace hardcoded fallback java command with a clear error message when no java command is provided
- Filter empty strings from function_names split (`"".split(",")` returns `[""]`, which is truthy)
- Fix import ordering in tracer.py (ruff I001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
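The empty-string pitfall mentioned in this commit is easy to reproduce. The helper below is a hypothetical sketch of the fix (the name `split_function_names` and the whitespace stripping are assumptions, not the PR's code):

```python
def split_function_names(raw: str) -> list[str]:
    """Split a comma-separated list, dropping empty and whitespace-only entries."""
    return [name.strip() for name in raw.split(",") if name.strip()]


# Splitting an empty string yields one empty element, so the list is truthy.
naive = "".split(",")
assert naive == [""]
assert bool(naive) is True

# The filtered version returns an empty, falsy list for the same input.
assert split_function_names("") == []
assert split_function_names("repeatString,,filterEvens") == ["repeatString", "filterEvens"]
```

With the filter in place, a check such as `if function_names:` behaves as intended when the metadata field is empty.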
…2026-03-19T08.11.52 ⚡️ Speed up method `JfrProfile._parse_json` by 42% in PR #1874 (`java-tracer`)
This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:
These files were unrelated to the PR and got swept in during a stash/pop operation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds a complete Java tracing pipeline that captures method arguments from running Java programs, generates JUnit 5 replay tests, and feeds them into the optimization pipeline — achieving feature parity with the Python tracer.
Two-stage approach:
The traced data is used to generate replay tests that exercise the original functions with real-world inputs, which are then used by the optimizer to verify correctness and benchmark candidates.
Java agent (codeflash-java-runtime)
- `TracerAgent` / `TracerConfig`: agent entry point, JSON config parsing
- `TracingTransformer` / `TracingClassVisitor` / `TracingMethodAdapter`: ASM bytecode instrumentation (uses `COMPUTE_MAXS` to avoid classloader deadlocks)
- `TraceRecorder` / `TraceWriter`: async SQLite writer with Kryo serialization timeout (500ms via CachedThreadPool)
- `ReplayHelper`: runtime class for replay tests; deserializes args from trace DB and invokes methods via reflection
- `AgentDispatcher`: routes to tracer mode via `trace=` agent arg prefix

Python orchestration
- `codeflash/languages/java/tracer.py`: `JavaTracer` two-stage flow (JFR + agent), `run_java_tracer()` entry point
- `codeflash/languages/java/replay_test.py`: generates JUnit 5 replay tests from trace SQLite DB with metadata comments
- `codeflash/languages/java/jfr_parser.py`: parses JFR files via `jfr` CLI tool for method-level profiling

Integration with optimizer pipeline
- `codeflash/tracer.py`: language detection from `codeflash.toml` config; routes Java projects to `_run_java_tracer()`
- `codeflash/discovery/functions_to_optimize.py`: `_get_java_replay_test_functions()` parses replay test metadata to discover traced functions
- `codeflash/languages/java/test_discovery.py`: discovers `ReplayTest_*.java` files via metadata comments (static analysis can't trace `helper.replay()` string args)
- `codeflash/discovery/discover_unit_tests.py`: classifies replay tests as `TestType.REPLAY_TEST` using `TestInfo.is_replay` flag
- `codeflash/benchmarking/function_ranker.py`: `JavaFunctionRanker` ranks by JFR samples with `min_functions=5` escape hatch for short workloads
- `codeflash/optimization/optimizer.py`: extracts Java packages from file paths for JFR filtering; uses `JavaFunctionRanker` when `language == "java"`

Verified end-to-end
Ran the full pipeline on the `Workload.java` fixture:
- `repeatString`: 2.58x faster (StringBuilder → `String.repeat()`)
- `filterEvens`: 32% faster (bitwise parity, pre-sized list)
- `instanceMethod`: 75% faster (inlined computation)

Test plan
- `test_java_tracer_e2e.py`: agent capture, replay test generation, two-stage orchestration
- `test_java_tracer_integration.py`: function discovery, test discovery, JFR parsing, ranking, Maven compilation
- `test_test_discovery.py`: metadata-based mapping, no confusion with regular tests
- `e2e-java-tracer.yaml` for CI validation

🤖 Generated with Claude Code