[BUG] HttpCollectImpl XML parsing assumes UTF-8 #2852

@pjfanning

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Document document = db.parse(new ByteArrayInputStream(resp.getBytes(StandardCharsets.UTF_8)));

If you already have a String, there is no need to convert it to a byte array first; the extra copy just wastes memory.

DocumentBuilder has a parse(InputSource) method.
https://docs.oracle.com/javase/8/docs/api/javax/xml/parsers/DocumentBuilder.html#parse-org.xml.sax.InputSource-

An InputSource can be constructed from a StringReader that wraps the String.
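
A minimal sketch of that approach, assuming the response is already available as a String. The class name `XmlStringParsing` and the helper `parseXml` are hypothetical and are not taken from the HertzBeat code base; only the `DocumentBuilder.parse(InputSource)` call is the point here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlStringParsing {

    // Hypothetical helper: parses XML held in a String without going through a byte array.
    static Document parseXml(String resp) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // The StringReader hands the parser characters directly,
            // so no charset conversion happens on the way in.
            return db.parse(new InputSource(new StringReader(resp)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML response", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml("<status><code>200</code></status>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

In the real code the factory configuration would presumably stay whatever HttpCollectImpl already uses; only the argument passed to `parse` changes.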

Expected Behavior

Don't convert Strings to byte arrays unnecessarily; it wastes memory and can cause parse issues. Imagine the XML has an XML declaration whose encoding is not UTF-8. If you pass the String to the parser as a character stream, the parser ignores the declared encoding. If you convert to a byte array, the parser decodes the bytes using the declared encoding, but the code has explicitly encoded them as UTF-8, so the two encodings may not match.
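
To illustrate the mismatch, here is a small sketch that parses the same String both ways. The sample document, the class name `EncodingMismatchDemo`, and the helper names are made up for this example; it assumes the JDK's default parser.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EncodingMismatchDemo {

    // Made-up sample: declares ISO-8859-1 and contains one non-ASCII character (e with acute accent).
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Current behavior: encode as UTF-8 bytes, then let the parser decode them
    // using the encoding named in the XML declaration.
    static String textViaBytes(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return newBuilder().parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Proposed behavior: give the parser characters, so the declared encoding is not used.
    static String textViaReader(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("original text intact via bytes:  " + "caf\u00e9".equals(textViaBytes(XML)));
        System.out.println("original text intact via reader: " + "caf\u00e9".equals(textViaReader(XML)));
    }
}
```

With the byte-array path, the accented character is written as two UTF-8 bytes and then read back as two ISO-8859-1 characters, so the text no longer matches the original. The reader path should return the text unchanged.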

Steps To Reproduce

No response

Environment

HertzBeat version(s): latest

Debug logs

No response

Anything else?

No response

Metadata

Labels

bug, good first issue
