JSON string escaping support #753

Merged
merged 4 commits into from
Nov 23, 2022

Conversation

noloader
Contributor

@noloader noloader commented Oct 26, 2022

This commit adds JSON string escaping support according to RFC 8259, Section 7.
See GitHub Issue
closes #754. [added by @kwwall]
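
For readers unfamiliar with RFC 8259, Section 7: a JSON string must escape the quotation mark, the reverse solidus, and the control characters U+0000 through U+001F. The following is a minimal sketch of that rule, not the JSONCodec code from this PR; the class name `JsonEscapeSketch` and the method `escape` are hypothetical names chosen for illustration.

```java
// Hypothetical sketch of RFC 8259 Section 7 escaping; not the PR's JSONCodec.
final class JsonEscapeSketch {

    /** Escapes a string so it can be placed between double quotes in a JSON text. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters have no two-character
                        // form, so emit the six-character hexadecimal form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: say \"hi\"\n
        System.out.println(escape("say \"hi\"\n"));
    }
}
```

The sketch leaves every character at or above U+0020 (other than the two mandatory ones) untouched, which the RFC permits; an encoder aimed at embedding JSON in HTML may want to escape more.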

protected int charToCodepoint( Character ch ) {

final String s = Character.toString(ch);
assert (s.length() == 1) : "Ooops";
Collaborator

Asserts are enabled by the Surefire plugin by default, so if this triggers we will see the output in the test dump. Would you consider adding more context to the assert failure output to help identify this as the cause when inspecting from a terminal?

E.g., "JSONCodec - Unable to convert specified character to codepoint. Character String conversion must contain exactly one character"

Collaborator

I don't feel strongly about this. There is a lot of output in the tests, so the likelihood is that the stacktrace from any assertion failure will be more useful than the message itself. This will get us in the right area, and I think most developers will be able to fill in the rest with the logic being applied.

Resolving this review thread.

Contributor

In #753 (comment), @jeremiahjstacey wrote:

Asserts are enabled by the Surefire plugin by default, so if this triggers we will see the output in the test dump. Would you consider adding more context to the assert failure output to help identify this as the cause when inspecting from a terminal?

That still doesn't mean that we should rely on asserts. They are not generally enabled at runtime by default, and even if we enable them in our Surefire plugin, I would feel better if anything that can be caused by a user accidentally providing incorrect input, or not using the classes as per their Javadoc, were an actual runtime check rather than an assert. There should be an explicit check, and if the input has run amok, throw an IllegalArgumentException. In other words, do NOT use assert to check preconditions. Using asserts for postconditions or invariants is okay, I think, as a failure there generally means a program error on our part (assuming we've validated all the inputs for sanity), but don't use them for precondition checking.

So, @noloader, if this is a precondition check using assertions, could you change it to an explicit check? Thanks. (I haven't checked, as I'm talking to my mother and trying to listen to her and type at the same time.)
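
To make the distinction concrete, here is a minimal sketch of an explicit precondition check next to an assert used as an invariant. The class `PreconditionSketch` and the method `singleCharToCodePoint` are hypothetical and are not part of the PR; the sketch only assumes that the input comes from a caller and so cannot be trusted.

```java
// Hypothetical illustration; not the charToCodepoint method under review.
final class PreconditionSketch {

    /** Returns the code point of a string that must hold exactly one char. */
    static int singleCharToCodePoint(String s) {
        // Precondition on caller-supplied input: checked explicitly, so it
        // still holds when the JVM runs without -ea.
        if (s == null || s.length() != 1) {
            throw new IllegalArgumentException(
                "Expected exactly one char but got "
                    + (s == null ? "null" : "length " + s.length()));
        }
        int codePoint = s.codePointAt(0);
        // Invariant: a failure here would be a bug in this method, not bad
        // input, so an assert is acceptable.
        assert Character.isValidCodePoint(codePoint) : "invalid code point from codePointAt";
        return codePoint;
    }

    public static void main(String[] args) {
        System.out.println(singleCharToCodePoint("A")); // prints 65
    }
}
```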

Collaborator

@kwwall per your comment on the assert statement I've unresolved that conversation. Suggest waiting to merge pending updated resolution.

Contributor

@kwwall kwwall Nov 22, 2022

See my previous comment here. If Unicode characters can extend from 0x0 to 0x10ffff (rather than 0xffff), is there any concern that this could result in a string whose length is >1?

I don't think we need to resolve this before merging this PR, but if it is a problem, we should create a GitHub issue to address it later and to note it somewhere in the JSONCodec Javadoc.
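
As background for the question, a small sketch of how Java represents code points may help. A `char` (and therefore a `Character`) holds a single UTF-16 code unit, so a string built from one `Character` always has length 1; a code point above 0xFFFF needs two code units and cannot be stored in a `Character` at all. The class name `CodePointLengthSketch` and the helper `utf16Length` are hypothetical; only standard `java.lang.Character` calls are used.

```java
// Hypothetical illustration of UTF-16 lengths; not code from the PR.
final class CodePointLengthSketch {

    /** Number of UTF-16 code units (Java chars) needed for a code point. */
    static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    public static void main(String[] args) {
        Character ch = 'A';
        System.out.println(Character.toString(ch).length()); // prints 1
        System.out.println(utf16Length(0xFFFF));             // prints 1
        System.out.println(utf16Length(0x10000));            // prints 2
        System.out.println(utf16Length(0x10FFFF));           // prints 2
    }
}
```

Under that reading, the length could exceed 1 only if the method accepted an `int` code point or a `String` rather than a `Character`.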

@jeremiahjstacey
Collaborator

@noloader, I scanned the existing bugs and I don't see a task to associate this PR to. (I may have missed it, sorry)

If something does not exist already, would you please create an issue in the project that encapsulates the intent and value of this change? Once identified or created, please associate it with this PR for project traceability.

@kwwall
Contributor

kwwall commented Oct 26, 2022 via email

Logic adjustments to the implementation of JsonCodec.decodeCharacter
to consolidate method return events to four:

1 - First character is null (fast return/no Exception)
2 - First character is not null, Second character is Null (fast return w/Exception)
3 - First & Second not null, but no decode can be performed (Exception)
4 - First & Second not null, decode performed and returned (normal response)

Also added the two-character output to the Exception message in case 3
to aid with diagnostics/debugging.
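
The four outcomes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: the class `DecodeOutcomeSketch` and the two-argument `decode` method are hypothetical stand-ins (the real method reads from its input rather than taking two characters), and the `\uXXXX` escape form is left out for brevity.

```java
// Hypothetical sketch of the four return events; not JsonCodec.decodeCharacter.
final class DecodeOutcomeSketch {

    /**
     * Takes the next two characters of the input (either may be null when the
     * input is exhausted) and returns the decoded character, or null when
     * there is nothing to decode.
     */
    static Character decode(Character first, Character second) {
        // Case 1: no input at all, fast return without an exception.
        if (first == null) {
            return null;
        }
        // Case 2: the input ended after the first character.
        if (second == null) {
            throw new IllegalArgumentException(
                String.format("Incomplete escape sequence after '%c'", first));
        }
        // Case 4: a recognized two-character escape is decoded and returned.
        if (first == '\\') {
            switch (second.charValue()) {
                case '"':  return '"';
                case '\\': return '\\';
                case '/':  return '/';
                case 'b':  return '\b';
                case 'f':  return '\f';
                case 'n':  return '\n';
                case 'r':  return '\r';
                case 't':  return '\t';
                default:   break; // not a known escape: handled as case 3
            }
        }
        // Case 3: both characters present but no decode is possible. Both
        // characters go into the message to help with debugging.
        throw new IllegalArgumentException(
            String.format("Cannot decode escape sequence '%c%c'", first, second));
    }
}
```

Funnelling every failure into one of two exception sites keeps the method's exits easy to enumerate in tests: one null return, one normal return, and two exceptions.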
@noloader
Contributor Author

@jeremiahjstacey

I scanned the existing bugs and I don't see a task to associate this PR to. (I may have missed it, sorry)

If something does not exist already, would you please create an issue in the project that encapsulates the intent and value of this change? Once identified or created, please associate it with this PR for project traceability.

It's kind of a winding road to get here.

This started with Kevin and I talking offline because we need a JSONEncoder at $dayjob. We did not want to use a JavaScript encoder because the grammars are slightly different. Then, the discussion moved to the mailing list at JSON codec discussion on esapi-github, https://groups.google.com/a/owasp.org/g/esapi-project-users/c/Q0LTNGxbTQc/m/xoyjMNwCAgAJ .

The mailing list message points to the dead PR at #722. The original PR is dead because I accidentally blew away the code changes on a rebase. I was able to recover the deleted changes and files through Git's reflog, which eventually led to this [new] PR.

@kwwall
Contributor

kwwall commented Oct 27, 2022 via email

Using the more appropriate %c syntax for the String.format when
displaying the first and second characters in the exception message.
@kwwall kwwall mentioned this pull request Oct 28, 2022
Updating comments pertaining to handling of invalid code points in UTF8,
per request in PR ESAPI#753

ESAPI#753 (comment)
@kwwall
Contributor

kwwall commented Nov 15, 2022

@noloader, @jeremiahjstacey, & @xeno6696 - Call for final comments and/or commits on this PR. I'd like to get this and @jeremiahjstacey's PR #756 merged soon so we can do another release. I know Dave Wichers is planning on an AntiSamy point release to fix a CVE in a dependency, so we should release when AntiSamy does to prevent people from claiming ESAPI is vulnerable.

@noloader
Contributor Author

noloader commented Nov 16, 2022 via email

@kwwall
Contributor

kwwall commented Nov 16, 2022 via email

@xeno6696
Collaborator

@kwwall I know I need to review this. I'm on travel for work (again) and I can't sit down with this until this weekend.

If we need to push faster than that I’ll trust the group judgment here.

@xeno6696
Collaborator

Given that this was recovered code from #722 I feel like I beat that up pretty well already. @kwwall I'm okay with moving forward. I would normally just merge here but it sounded like you had some changes and I don't want to step on your toes here.

@xeno6696
Collaborator

I did want to call myself out on building some unit tests that handle the full Unicode range. If we stay with the AbstractCharacterCodec I have a minor frowny-face, just because it was only supposed to be a bridge until all the codecs were converted to Integer-based codecs. However, this is my fault, as I should have marked that abstract class as deprecated. (It could be that there are some codecs where, for some reason, it makes zero sense at all to expand to handle the full UTF-8 range, but I figured we'd handle those case by case.)

Contributor

@kwwall kwwall left a comment

Would like a response to my comment before I approve merging this PR. Changes may or may not be required. I just want to be sure our assumptions about code points are still legitimate as @xeno6696 did the AbstractCodec work many years ago when we were using Java 7 and that may have been an older version of Unicode.

@@ -25,6 +25,9 @@
* points cannot be represented by a {@code char}, this class remedies that by parsing string
* data as codePoints as opposed to a stream of {@code char}s.
*
* WARNING: This class will silently discard an invalid code point according to
Contributor

See my previous comment in AbstractCodec.java. In particular, it seems as though Character.isValidCodePoint( int ) allows a broader range than what was anticipated.
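
For reference, a short sketch of what `Character.isValidCodePoint(int)` accepts. It is a pure range check from 0x0 through 0x10FFFF, so it returns true for supplementary code points and also for the surrogate range 0xD800 through 0xDFFF. The class `ValidCodePointSketch` and its `describe` method are hypothetical names used only for this illustration.

```java
// Hypothetical illustration of the range accepted by isValidCodePoint.
final class ValidCodePointSketch {

    /** Classifies an int by how java.lang.Character treats it. */
    static String describe(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";
        }
        // Surrogates pass the validity check even though they do not
        // encode a character on their own.
        if (codePoint >= Character.MIN_SURROGATE && codePoint <= Character.MAX_SURROGATE) {
            return "surrogate";
        }
        return Character.isBmpCodePoint(codePoint) ? "bmp" : "supplementary";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x41));     // prints bmp
        System.out.println(describe(0xD800));   // prints surrogate
        System.out.println(describe(0x10FFFF)); // prints supplementary
        System.out.println(describe(0x110000)); // prints invalid
    }
}
```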

@kwwall kwwall merged commit 5f21e78 into ESAPI:develop Nov 23, 2022
@noloader noloader deleted the json branch June 18, 2023 16:54
Development

Successfully merging this pull request may close these issues.

JSON encoder
4 participants