New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema #1748
Conversation
0da46af
to
116980f
Compare
ended up finding a few bad schemas in other unit tests as well ... |
Perhaps this change should not apply to parse(InputStream in) where the caller might expect to be able to read content after the schema. |
116980f
to
9d94c70
Compare
updated the PR to allow dangling content when parsing from InputStream |
Could you add tests for trailing content in a UTF-8 file parsed using Schema.parse(File file)? JsonFactory.createParser(File f) apparently creates an InputStream for that, and JsonFactory._createParser(InputStream in, IOContext ctxt) calls ByteSourceJsonBootstrapper.constructParser, which creates an UTF8StreamJsonParser in that case. UTF8StreamJsonParser is a "byte-based" parser and inherits JsonParser.releaseBuffered(Writer w), which just returns -1, but UTF8StreamJsonParser.releaseBuffered(OutputStream out) can write something to the OutputStream. I think this means that, to correctly detect trailing content in a UTF-8 file, Schema.parse would have to call not only JsonParser.releaseBuffered(Writer) but also JsonParser.releaseBuffered(OutputStream), or first call JsonParser.getInputSource() and then use the type of the result to guess whether the parser is byte-based or char-based. If parse(InputStream) parses a stream that has trailing content, and JsonParser buffers that content, it would be best to return that content to the InputStream so that the caller can then read it. However I don't see how to do that. Is it possible that JsonParser buffers some content from a Reader, and the buffered content is all space characters and thus ignored by Avro.Schema, but the Reader has more content that JsonContent did not even read because its buffer filled up? In which case, Avro.Schema would have to check the Reader as well. |
i'll add a check for the File method. |
9d94c70
to
0a8c7a6
Compare
that was an excellent catch @KalleOlaviNiemitalo. i've added a test for File input and adapted the code to work for all cases of dangling input. except for the case of a buffer full of whitespace with more content hiding in the underlying input (either a reader or an inputstream). I'm ok with not solving for that case as I expect it to be rare (and the code changes required more invasive) |
0a8c7a6
to
6db4d34
Compare
6db4d34
to
ecf4a62
Compare
ecf4a62
to
29b8db0
Compare
…yond end of schema
29b8db0
to
e592506
Compare
@RyanSkraba - any chance of this making it in ? |
Hello! We released 1.11.1 the other day. It was kind of in a crunch, and I wasn't looking at any of the non-1.11.1 tagged issues :/ My apologies for not getting to this earlier -- it should be in the next release. To confirm, there's really little or no security issues here right? We strictly just never look farther than the JSON beginning of File. I like the branch name :D |
if (!allowDanglingContent) { | ||
String dangling; | ||
StringWriter danglingWriter = new StringWriter(); | ||
int numCharsReleased = parser.releaseBuffered(danglingWriter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the parser didn't read all bytes available in a file?
AFAICT, JsonParser only releases what it read but didn't parse.
(I think we're ok for Strings; I'm not sure about files)
@@ -101,7 +101,7 @@ private static Object[][] dataForResolvingTests() { | |||
"{\"type\":\"record\",\"name\":\"outer\",\"fields\":[" + "{\"name\": \"g1\", " | |||
+ "\"type\":{\"type\":\"record\",\"name\":\"inner\",\"fields\":[" | |||
+ "{\"name\":\"f1\", \"type\":\"int\", \"default\": 101}," + "{\"name\":\"f2\", \"type\":\"int\"}]}}, " | |||
+ "{\"name\": \"g2\", \"type\": \"long\"}]}}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are nice catches!
…ond end of schema (#1748)
Ooops, it looks like this broke on master due to the unit test migration to JUnit 5. I'm taking a look. |
Make sure you have checked all steps below.
Jira
Tests
Commits
Documentation