Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reading UTF-8 strings (from OPC UA nodes) #832

Merged
merged 3 commits into from Mar 20, 2023

Conversation

Planet-X
Copy link
Contributor

@Planet-X Planet-X commented Mar 7, 2023

When reading string values via the OPC UA protocol a StringIndexOutOfBoundsException was thrown whenever the string contained non-ASCII characters. Upon further inspection, the problem could be pinned down to the logic in readString() ofReadBufferByteBased.

The readString() function in ReadBufferByteBased could result in a StringIndexOutOfBoundsException upon calling substring(). Reason is that the calculated realLength is in bytes and the length of the created string is the number of characters. For UTF encodings, multiple bytes can be just one character. This causes realLength to be longer than the actual string and thus the substring()-call to fail.

This PR fixes the issue by applying realLength to the byte-length instead of the string length: The byte array is sliced to length realLength and then converted to the final string. substring() isn't used anymore.
The fix can be verified via the included regression test and has additionally been tested locally using the OPC UA protocol.
Maybe other protocols were also affected?

Additionally I've included some other minor fixes for the logic. Check the commits for these.

Regards,
Marc

The previous logic could result in a `StringIndexOutOfBoundsException`
upon calling `substring()`.
Reason is that the calculated `realLength` is in bytes and the length
of the created string is the number of characters. For UTF encodings,
multiple bytes can be just one character. This causes `realLength`
to be longer than the actual string and thus the `substring()`-call
to fail.

Fix this by applying `realLength` to the byte-array instead of the
string. This is done by utilizing a different String constructor.

Note that on the UTF16 cases the `realLength` needs to be incremented
by 2 to account for the two added bytes.

This commit contains a regression test that fails without these
adaptations.

Signed-off-by: Marc Aurel Fritz <Marc-Aurel.Fritz@plansee.com>
Previously, the inner switch statement at the UTF-16 cases
would not convert encoding to upper case. This prevented
matching from working properly.
Do so directly at the beginning in order to prevent future
bugs.

Signed-off-by: Marc Aurel Fritz <Marc-Aurel.Fritz@plansee.com>
Prevent having the same logic twice.

Signed-off-by: Marc Aurel Fritz <Marc-Aurel.Fritz@plansee.com>
@sruehl sruehl requested a review from chrisdutz March 7, 2023 16:28
Copy link
Contributor

@sruehl sruehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR: LGTM, although @chrisdutz should check the ASCII part regarding the padding

@ottlukas ottlukas requested a review from hutcheb March 7, 2023 17:40
@sruehl sruehl merged commit 93341ca into apache:develop Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants