
Audit fixes: build reconciliation, integration tests, XML/YAML hardening, housekeeping #78

Closed
pratyush618 wants to merge 9 commits into main from fix/audit-p0


Conversation

@pratyush618
Collaborator

Summary

Addresses findings from the AUDIT.md report. 9 atomic commits, mvn verify green across all 23 modules.

Fixes

CRITICAL

HIGH

MEDIUM

LOW

  • #10 SpotBugs broad suppressions: replaced the ~…Json.* / ~…Jsonl.* / ~…Yaml.* / ~…datasets.version\..* regex patterns with explicit <Or><Class .../></Or> lists. New classes in those packages now surface genuine findings. (f52773c)
  • #12 .gitignore secret patterns: added .env*, *.jks, *.keystore, *.p12, credentials.json. (bundled in de3118f)
  • #13 Chaos module coverage: added 22 tests across LatencyInjector, SchemaMutationInjector, ResilienceEvaluator. Chaos coverage went from 3/11 to 6/11 production classes. (aec4ba4)

Findings not fixed in this PR:

Test plan

  • ./mvnw verify -B → BUILD SUCCESS across all 23 modules (1:24)
  • ./mvnw -pl agenteval-langchain4j -am test → 11/11 new tests pass
  • ./mvnw -pl agenteval-spring-ai -am test → 12/12 new tests pass
  • ./mvnw -pl agenteval-chaos -am test → 22/22 new chaos tests pass, existing tests still green
  • ./mvnw -pl agenteval-datasets -am install → SpotBugs reports 0 bugs with narrowed suppressions
  • diff <(grep -oP '<module>\K[^<]+' pom.xml | sort) <(grep -oP '"\K[^"]+(?=")' settings.gradle.kts | grep agenteval | sort) — documented: Gradle intentionally lists only agenteval-gradle-plugin
  • CI: both Maven (Java 21 + 23) and narrowed Gradle-plugin job should pass

Maven now owns all 22 non-plugin modules. Gradle owns only the plugin,
which must be Gradle-native for Gradle Plugin Portal publication via
publishPlugins. Eliminates the drift the audit found (Gradle silently
skipping 8 modules) by construction: Gradle only builds one thing.

- Narrow settings.gradle.kts to a single include
- Rewrite agenteval-gradle-plugin/build.gradle.kts to depend on
  published Maven artifacts of the same version rather than sibling
  project(...) refs; adds com.gradle.plugin-publish
- Replace CI gradle job with a narrow gradle-plugin job that first
  mvn-installs the locally-resolvable artifacts
- Narrow dependabot gradle ecosystem to /agenteval-gradle-plugin
- Tighten .gitignore with secret patterns (audit finding #12)

11 tests across three production classes. Covers null-arg contracts,
text and tool-call capture from AiMessage responses, TokenUsage mapping,
ChatLanguageModel stub forwarding with latency capture, and content
retriever delegation + consume semantics.

12 tests across four production classes. Covers null-arg contracts on
the builder and capture, ChatResponse text + TokenUsage mapping,
ChatModel forwarding with latency capture, advisor metadata,
Document capture from the qa_advisor_retrieved_documents context key,
consume/clear semantics, and auto-configuration bean production.
Adds mockito-core test dep for stubbing CallAdvisorChain.

Configure DocumentBuilderFactory with disallow-doctype-decl, external
entity/DTD disabling, and xinclude/entity-expansion off. The reporter
only writes XML today so no external entity is ever parsed, but this
prevents regression if parsing is added later and establishes the
template pattern for any future DocumentBuilderFactory use.

Addresses audit finding HIGH #3.
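
The factory settings described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class name `HardenedXml` and the helper `rejects` are hypothetical, and the feature URIs are the commonly documented Xerces/JAXP ones, which the JDK's built-in parser is assumed to recognise.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper; the real reporter class is not shown in this PR text.
final class HardenedXml {
    private HardenedXml() {}

    /** Builds a factory that refuses DOCTYPE declarations and external entities. */
    static DocumentBuilderFactory newFactory() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Reject any document that carries a DOCTYPE at all.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Belt and braces: switch off external entities and external DTD loading.
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            dbf.setXIncludeAware(false);
            dbf.setExpandEntityReferences(false);
            return dbf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser does not support hardening features", e);
        }
    }

    /** Returns true if the hardened parser refuses the given document. */
    static boolean rejects(String xml) {
        try {
            newFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // parse error, e.g. a disallowed DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("<!DOCTYPE r [<!ENTITY x \"y\">]><r>&x;</r>"));
        System.out.println(rejects("<r>ok</r>"));
    }
}
```

Failing fast when a feature is unsupported is a deliberate choice in this sketch: a parser that silently ignores the hardening flags would defeat the purpose of the template.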

Switch YAMLFactory construction to the builder with explicit
LoaderOptions: disallow duplicate and recursive keys, cap alias
expansion at 50, cap nesting depth at 50, cap code points at 3 MiB.
SnakeYAML 2.0+ already uses SafeConstructor and blocks custom global
tags, so this hardening is defense-in-depth against billion-laughs /
deeply-nested / oversized payloads rather than gadget-chain RCE.

Addresses audit finding HIGH #4.

- Document 1.0.0 removal milestone on the PromptTemplate delegate and
  SemanticSimilarityMetric.cosineSimilarity so users can plan migrations
- Add explanatory comment to agenteval-bom explaining why build-tooling
  modules are deliberately omitted (independent release cadences)
- Replace sk-test / sk-ant-test in judge provider tests with neutral
  strings so the fixtures don't look like the real key shape to scanners
- Reindent two JSON text-block fixtures to satisfy editorconfig's
  4-space rule (content semantics unchanged)

Addresses audit findings MEDIUM #6, #7, #8.

Replace 50 ms sleeps that were forcing distinct filesystem mtimes
between two sequential tag() calls with explicit Files.setLastModifiedTime
calls. The test now deterministically proves the listVersions ordering
instead of depending on CI clock resolution.

Addresses audit finding MEDIUM #9.
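
The idea of pinning mtimes instead of sleeping can be sketched as below. This is only an illustration under assumptions: `MtimeOrdering` and `namesOldestFirst` are hypothetical stand-ins, and the real `listVersions` may sort differently (for example newest first).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a store that orders versions by file mtime.
final class MtimeOrdering {
    private MtimeOrdering() {}

    /** Lists file names in a directory, oldest modification time first. */
    static List<String> namesOldestFirst(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.sorted(Comparator.comparing(MtimeOrdering::mtime))
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static FileTime mtime(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates two files back to back, then pins their mtimes explicitly. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("versions");
            // Created in the "wrong" order on purpose; creation order must not matter.
            Path v2 = Files.createFile(dir.resolve("v2"));
            Path v1 = Files.createFile(dir.resolve("v1"));
            Instant base = Instant.parse("2024-01-01T00:00:00Z");
            // Whole-second timestamps, so coarse filesystem resolution cannot blur them.
            Files.setLastModifiedTime(v1, FileTime.from(base));
            Files.setLastModifiedTime(v2, FileTime.from(base.plusSeconds(60)));
            return namesOldestFirst(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [v1, v2]
    }
}
```

Because the timestamps are set a full minute apart, the ordering assertion no longer depends on how quickly the two writes happen or on the clock resolution of the CI filesystem.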

Replace regex class-family patterns with explicit Or/Class lists for
the json/jsonl/yaml loader-writer classes and the datasets.version
package. A new class added to any of these packages will now surface
genuine EI_EXPOSE_REP[2] findings in spotbugs:check instead of being
silently blanket-suppressed.

Addresses audit finding LOW #10. The broadest and most load-bearing
regex patterns were the ones explicitly called out; other suppressions
still use package patterns where the bug class is common across many
record types in the package.

22 tests across three previously untested classes:
- LatencyInjectorTest: 8 tests covering ms addition, zero-case,
  empty-tool-calls identity, field preservation, and constructor bounds
- SchemaMutationInjectorTest: 10 tests covering each MutationType,
  default constructor, null result handling, escaping, and empty case
- ResilienceEvaluatorTest: 4 tests covering judge delegation,
  rendered-prompt field substitution, and null-response placeholder

Lifts agenteval-chaos coverage from 3/11 to 6/11 production classes.
Addresses audit finding LOW #13.
@pratyush618
Collaborator Author

Closing to rename the source branch to fix/audit-remediation (this branch covers P0+P1+P2 audit findings, not just P0). Reopening with the renamed branch.

@pratyush618 deleted the fix/audit-p0 branch April 23, 2026 19:01