Avoid reflective InputSource modelId mutation #12147

Open
Will-thom wants to merge 1 commit into apache:maven-4.0.x from Will-thom:issue-11974-inputsource-final-warning-4.0

Will-thom commented May 24, 2026

Fixes #11974. Tests: mvn -B -am -pl impl/maven-impl test

gnodet (Contributor) left a comment

This PR replaces a reflective Field.setAccessible hack in DefaultModelBuilder.doReadFileModel() with a proper approach: pre-parsing the POM to extract the model ID before constructing the InputSource. This is a great improvement — the reflective mutation of InputSource.modelId was fragile and would break under strict module systems or JDK 26+ integrity checks.

The implementation is solid overall. A few observations below.

Claude Code on behalf of Guillaume Nodet

ByteArrayOutputStream baos = new ByteArrayOutputStream();
inputStream.transferTo(baos);
byte[] buf = baos.toByteArray();
modelId = extractModelId(new ByteArrayInputStream(buf));

When modelId is null and the input comes from a stream or reader, the entire content is buffered into memory (ByteArrayOutputStream / CharArrayWriter) just to extract the GAV. Even for very large POMs the memory cost is acceptable, but there is a subtlety: if the stream or reader was already partially consumed before reaching this point (unlikely, but worth guarding against), only the remaining content is buffered and the part already read is silently lost.

Also, consider whether the extractModelId parse could share a factory with the main MavenStaxReader parse to avoid creating two XMLStreamReader instances for every POM read.
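To make the buffering concern concrete, here is a minimal sketch, not the PR's actual code. The class name BufferOnceSketch and the helper drain are hypothetical. It shows that transferTo copies only the bytes that have not been read yet, so a partially consumed stream yields a truncated buffer without any error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BufferOnceSketch {

    /** Hypothetical helper: drains whatever is left in the stream into a byte array. */
    public static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        in.transferTo(baos); // copies only the bytes that have not been consumed yet
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pom = "<project><artifactId>demo</artifactId></project>"
                .getBytes(StandardCharsets.UTF_8);

        // Fresh stream: the buffer holds the full document and can back two
        // independent parses (model ID extraction, then the full model read).
        byte[] buf = drain(new ByteArrayInputStream(pom));
        InputStream forModelId = new ByteArrayInputStream(buf);
        InputStream forFullParse = new ByteArrayInputStream(buf);
        System.out.println("full buffer: " + (buf.length == pom.length));
        System.out.println("independent views: " + (forModelId != forFullParse));

        // Partially consumed stream: the leading bytes are gone and nothing fails.
        InputStream partial = new ByteArrayInputStream(pom);
        partial.readNBytes(9); // consumes "<project>"
        byte[] truncated = drain(partial);
        System.out.println("truncated by 9 bytes: " + (truncated.length == pom.length - 9));
    }
}
```

A caller that cannot rule out earlier reads has no general way to detect them from the stream alone, which is why the concern is raised as a defensive note rather than a bug.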

}
}

static class InputFactoryHolder {

The InputFactoryHolder lazy-init pattern is good. However, an XMLInputFactory instance is not guaranteed to be thread-safe across all implementations, and because this one is held in a static final field it will be shared across threads. The properties set here (IS_REPLACING_ENTITY_REFERENCES, IS_COALESCING) are set once at init, so the factory's configuration is never mutated afterwards, but some StAX implementations still have thread-safety issues when creating readers from a shared factory. The Woodstox implementation used by Maven should be fine, but it's worth a comment.

Also, note that IS_COALESCING = true may have a minor performance cost since it forces the parser to merge adjacent character events. For the limited extraction done here (just groupId/artifactId/version), it's probably negligible.
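For reference, the holder idiom under discussion can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the class names are hypothetical, the value chosen for IS_REPLACING_ENTITY_REFERENCES is assumed, and the synchronized block is only one defensive option for StAX implementations whose factories are not safe for concurrent reader creation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FactoryHolderSketch {

    // Initialization-on-demand holder: the JVM runs this static initializer
    // exactly once, on first access, with the usual class-init guarantees.
    private static final class Holder {
        static final XMLInputFactory FACTORY = create();

        private static XMLInputFactory create() {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            // Configured once here and never mutated afterwards.
            // The value 'true' for entity replacement is an assumption.
            factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true);
            factory.setProperty(XMLInputFactory.IS_COALESCING, true);
            return factory;
        }
    }

    /** Returns the character content of the root element (hypothetical helper). */
    public static String firstElementText(String xml) throws XMLStreamException {
        XMLStreamReader reader;
        // Defensive assumption: serialize reader creation on the shared factory.
        synchronized (Holder.FACTORY) {
            reader = Holder.FACTORY.createXMLStreamReader(new StringReader(xml));
        }
        try {
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    break;
                }
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstElementText("<a>foo<![CDATA[bar]]>baz</a>"));
    }
}
```

The loop accumulates text across events, so it does not depend on coalescing being honored. With IS_COALESCING enabled, the parser is expected to deliver adjacent character data as a single event, which is the merge cost mentioned above.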

switch (currentElement) {
case "groupId":
groupId = text;
break;

The early-exit optimization if (artifactId != null && groupId != null && version != null) only checks direct project-level coordinates. If groupId (or version) is inherited from the parent and therefore never appears as a direct child of <project>, the condition is never met and the parser continues through the entire <project> element. This is correct behavior (the fallback to parent values happens after the loop), but the early-exit condition could also include parentGroupId/parentVersion to exit sooner in the common case where the parent is declared before the project's own coordinates:

Suggested change
break;
if (artifactId != null
&& (groupId != null || parentGroupId != null)
&& (version != null || parentVersion != null)) {
break;
}
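The trade-off between the two exit conditions can be sketched in a self-contained form. This is an assumed reconstruction of the extraction loop, not the PR's implementation. The class ModelIdSketch and the relaxedExit flag are hypothetical. It also shows one caveat of the relaxed condition: in this sketch, a project that declares its own groupId after artifactId would be cut off early and get the parent's groupId instead.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ModelIdSketch {

    /** Returns groupId:artifactId:version, or null if it cannot be determined. */
    public static String extractModelId(String pomXml, boolean relaxedExit)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(pomXml));
        String groupId = null, artifactId = null, version = null;
        String parentGroupId = null, parentVersion = null;
        int depth = 0;            // 1 = <project>, 2 = its direct children
        boolean inParent = false; // true while inside <project><parent>
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = r.getLocalName();
                    boolean coordinate = name.equals("groupId")
                            || name.equals("artifactId") || name.equals("version");
                    if (depth == 2 && name.equals("parent")) {
                        inParent = true;
                    } else if (coordinate && (depth == 2 || (depth == 3 && inParent))) {
                        String text = r.getElementText().trim();
                        depth--; // getElementText() stops on the matching END_ELEMENT
                        if (inParent) {
                            if (name.equals("groupId")) parentGroupId = text;
                            if (name.equals("version")) parentVersion = text;
                        } else if (name.equals("groupId")) {
                            groupId = text;
                        } else if (name.equals("artifactId")) {
                            artifactId = text;
                        } else {
                            version = text;
                        }
                        boolean strictDone = artifactId != null && groupId != null && version != null;
                        boolean relaxedDone = artifactId != null
                                && (groupId != null || parentGroupId != null)
                                && (version != null || parentVersion != null);
                        if (relaxedExit ? relaxedDone : strictDone) {
                            break; // leaves the while loop
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2 && inParent) inParent = false; // </parent>
                    depth--;
                }
            }
        } finally {
            r.close();
        }
        // Fallback to parent coordinates happens after the loop.
        String g = groupId != null ? groupId : parentGroupId;
        String v = version != null ? version : parentVersion;
        return (g == null || artifactId == null || v == null) ? null : g + ":" + artifactId + ":" + v;
    }
}
```

For a child POM that inherits both groupId and version, both modes return the same ID and the relaxed mode simply stops sooner. For a POM that lists parent, then artifactId, then its own groupId, the two modes disagree, so the relaxed condition is only safe if element order is constrained or the direct coordinates are known to be absent.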

assertNotNull(source);
assertEquals("org.example:child:1", source.getModelId());
}


Good test coverage for the happy path. Consider also adding a test for InputStream-based reading (not just Reader), since the buffering paths differ (ByteArrayOutputStream vs CharArrayWriter). Also, a test where no modelId can be extracted (e.g., a POM missing <artifactId>) would verify the null-fallback behavior.
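As a rough illustration of why the two input paths deserve separate tests, the sketch below buffers the same document through a byte path and a character path. The class and method names are hypothetical and the real test would go through the model builder rather than these helpers. The point is that the byte path still carries an encoding that the XML parser must resolve, while the character path is already decoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class BufferPathsSketch {

    /** Byte path, as used for InputStream-based sources. */
    public static byte[] bufferBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.transferTo(out);
        return out.toByteArray();
    }

    /** Character path, as used for Reader-based sources. */
    public static char[] bufferChars(Reader in) throws IOException {
        CharArrayWriter out = new CharArrayWriter();
        in.transferTo(out);
        return out.toCharArray();
    }

    public static void main(String[] args) throws IOException {
        // A non-ASCII character makes the byte and char lengths differ.
        String pom = "<project><name>caf\u00e9</name></project>";
        byte[] bytes = bufferBytes(new ByteArrayInputStream(pom.getBytes(StandardCharsets.UTF_8)));
        char[] chars = bufferChars(new StringReader(pom));

        boolean sameContent = new String(bytes, StandardCharsets.UTF_8).equals(new String(chars));
        System.out.println("same content after decoding: " + sameContent);
        System.out.println("byte length differs from char length: " + (bytes.length != chars.length));
    }
}
```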
