Avoid reflective InputSource modelId mutation #12147
gnodet
left a comment
This PR replaces a reflective Field.setAccessible hack in DefaultModelBuilder.doReadFileModel() with a proper approach: pre-parsing the POM to extract the model ID before constructing the InputSource. This is a great improvement — the reflective mutation of InputSource.modelId was fragile and would break under strict module systems or JDK 26+ integrity checks.
The implementation is solid overall. A few observations below.
Claude Code on behalf of Guillaume Nodet
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    inputStream.transferTo(baos);
    byte[] buf = baos.toByteArray();
    modelId = extractModelId(new ByteArrayInputStream(buf));
When modelId is null and the input comes from a stream/reader, the entire content is buffered into memory (ByteArrayOutputStream / CharArrayWriter) just to extract the GAV. For typical POM sizes this is fine, but there's a subtlety: if the stream/reader was already partially consumed before reaching this point (unlikely, but worth guarding against), the buffered copy would silently omit the data that was already read.
Also, consider whether the extractModelId parse could share a factory with the main MavenStaxReader parse to avoid creating two XMLStreamReader instances for every POM read.
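To make the buffering concern concrete, here is a minimal sketch of the read-once, replay-twice pattern. ReplayableBytes is a hypothetical helper invented for illustration, not something in the PR or in Maven. It shows that draining only captures what is still unread, so a stream that was partially consumed upstream yields a truncated buffer without any error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Maven): reads a stream once and hands out independent replays. */
final class ReplayableBytes {
    private final byte[] content;

    private ReplayableBytes(byte[] content) {
        this.content = content;
    }

    /** Drains whatever remains in the stream; bytes consumed by earlier callers cannot be recovered. */
    static ReplayableBytes drain(InputStream in) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        in.transferTo(baos); // InputStream.transferTo is available since Java 9
        return new ReplayableBytes(baos.toByteArray());
    }

    /** Each call returns a fresh stream positioned at the first buffered byte. */
    InputStream open() {
        return new ByteArrayInputStream(content);
    }

    /** Number of bytes captured by drain(). */
    int size() {
        return content.length;
    }
}
```

With this shape, one open() call could feed the model ID pre-parse and a second could feed the main parse, at the cost of holding the whole POM in memory once.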
        }
    }

    static class InputFactoryHolder {
The InputFactoryHolder lazy-init pattern is good. However, the XMLInputFactory instance returned by XMLInputFactory.newFactory() is not guaranteed to be thread-safe across all implementations, and because it is held in a static final field it will be shared across threads. The properties set here (IS_REPLACING_ENTITY_REFERENCES, IS_COALESCING) are set once at init, so configuring the factory is not a concurrency concern, but note that some StAX implementations have thread-safety issues when creating readers from a shared factory. The Woodstox implementation used by Maven should be fine, but it's worth a comment.
Also, note that IS_COALESCING = true may have a minor performance cost since it forces the parser to merge adjacent character events. For the limited extraction done here (just groupId/artifactId/version), it's probably negligible.
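For reference, a minimal sketch of the holder idiom with the kind of comment suggested above. StaxFactories and newReader are hypothetical names, and the TRUE value for IS_REPLACING_ENTITY_REFERENCES is an assumption (the review only states the IS_COALESCING value). The synchronized block is a deliberately defensive choice for implementations that do not document reader creation as thread-safe. It is not a claim about what the PR does or needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical illustration of a lazily initialized, shared StAX factory. */
final class StaxFactories {
    private StaxFactories() {}

    /** Initialization-on-demand holder: the JVM initializes this class once, on first access. */
    private static final class Holder {
        static final XMLInputFactory FACTORY = create();

        private static XMLInputFactory create() {
            XMLInputFactory f = XMLInputFactory.newFactory();
            // Configured once, before the factory is visible to any other thread.
            // Assumption: the PR's value for this property is not shown in the review.
            f.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
            // Merge adjacent character events so element text arrives in one piece.
            f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            return f;
        }
    }

    /**
     * Creates a reader from the shared factory. The StAX API does not promise that
     * createXMLStreamReader is thread-safe, so this sketch serializes the call.
     */
    static XMLStreamReader newReader(String xml) throws XMLStreamException {
        synchronized (Holder.FACTORY) {
            return Holder.FACTORY.createXMLStreamReader(new StringReader(xml));
        }
    }
}
```

If the implementation in use is known to be safe for concurrent reader creation, the lock can be dropped and replaced by a comment stating that assumption.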
    switch (currentElement) {
        case "groupId":
            groupId = text;
            break;
The early-exit optimization if (artifactId != null && groupId != null && version != null) only checks direct project-level coordinates. If groupId or version is inherited from the parent and therefore never appears at project level, the parser will still continue through the entire <project> element. This is correct behavior (the fallback to parent values happens after the loop), but the early-exit condition could also include parentGroupId/parentVersion to exit sooner in the common case where the parent is declared before the child elements:
    break;
    if (artifactId != null
            && (groupId != null || parentGroupId != null)
            && (version != null || parentVersion != null)) {
        break;
    }
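A self-contained sketch of how the relaxed condition behaves may help when evaluating it. ModelIdSketch.extract is a hypothetical stand-in for the PR's extractModelId. Its structure, depth tracking, and the groupId:artifactId:version output format are assumptions based on the snippets quoted in this review.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical stand-in for the PR's extraction logic; not the actual implementation. */
final class ModelIdSketch {
    private ModelIdSketch() {}

    static String extract(String pom) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(pom));
        String groupId = null, artifactId = null, version = null;
        String parentGroupId = null, parentVersion = null;
        int depth = 0; // 1 = <project>, 2 = its children, 3 = children of <parent>
        boolean inParent = false;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = r.getLocalName();
                    boolean coordinate =
                            "groupId".equals(name) || "artifactId".equals(name) || "version".equals(name);
                    if (depth == 2 && "parent".equals(name)) {
                        inParent = true;
                    } else if (coordinate && (depth == 2 || (depth == 3 && inParent))) {
                        String text = r.getElementText().trim(); // also consumes the END_ELEMENT
                        depth--;
                        if (inParent) {
                            if ("groupId".equals(name)) parentGroupId = text;
                            else if ("version".equals(name)) parentVersion = text;
                        } else {
                            if ("groupId".equals(name)) groupId = text;
                            else if ("artifactId".equals(name)) artifactId = text;
                            else version = text;
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2 && inParent) inParent = false; // leaving <parent>
                    depth--;
                }
                // Relaxed early exit: parent values count as soon as artifactId is known.
                if (artifactId != null
                        && (groupId != null || parentGroupId != null)
                        && (version != null || parentVersion != null)) {
                    break;
                }
            }
        } finally {
            r.close();
        }
        String g = groupId != null ? groupId : parentGroupId;
        String v = version != null ? version : parentVersion;
        return (g == null || artifactId == null || v == null) ? null : g + ":" + artifactId + ":" + v;
    }
}
```

One caveat shows up in this sketch: because the loop stops as soon as parent values are available, a project-level groupId or version declared after artifactId is never read, and the parent's value is returned instead. Whether that ordering matters for the PR's loop would need checking before adopting the relaxed condition.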
        assertNotNull(source);
        assertEquals("org.example:child:1", source.getModelId());
    }
Good test coverage for the happy path. Consider also adding a test for InputStream-based reading (not just Reader), since the buffering paths differ (ByteArrayOutputStream vs CharArrayWriter). Also, a test where no modelId can be extracted (e.g., a POM missing <artifactId>) would verify the null-fallback behavior.
Fixes #11974. Tests: mvn -B -am -pl impl/maven-impl test