Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON string escaping support #722

Closed
wants to merge 0 commits into from
Closed

JSON string escaping support #722

wants to merge 0 commits into from

Conversation

noloader
Copy link
Contributor

This commit adds JSON string escaping support per RFC 8259, Section 7.

The encoder escapes special characters using two-character escape sequences for popular characters (like \r, \n and \t). For other characters which require escaping, the six-character representation is used (like \u0000).

@noloader
Copy link
Contributor Author

Hi Everyone/@kwwall,

We need this encoder at $dayjob.

@kwwall
Copy link
Contributor

kwwall commented Jul 17, 2022 via email

@noloader
Copy link
Contributor Author

noloader commented Jul 17, 2022

@kwwall,

I will take a look, but no promises that I can get this into the 2.5.0.0
release, which I am trying to get out today.

No problems. The release is more important.

The encoder is available if you want it. If not, we can carry patches for it now that it is available.

Copy link
Collaborator

@xeno6696 xeno6696 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to see a unit test that covers the case 😂, otherwise known as U+1F602 as I want to document clearly what happens when the codec encounters characters that do not exist in the range of legal Character or char items.

Ideally I'd like to see this altered to use the AbstractIntegerCodec if that unit test fails like I think it will. But as this is the first substantial contribution for a Codec since I started working on this project, any new functionality is welcome!

I will also request some unit tests to accommodate mixed encoding behavior, for example, encode a string that contains HTML entities and then do an encoding pass with the JSON Encoder to view what the intermediate result looks like.

Usually it's corner cases like this where we're likely to see unexpected failures when the Codec is used in chains.

src/test/java/org/owasp/esapi/reference/EncoderTest.java Outdated Show resolved Hide resolved
src/test/java/org/owasp/esapi/reference/EncoderTest.java Outdated Show resolved Hide resolved
@xeno6696
Copy link
Collaborator

I will take a look, but no promises that I can get this into the 2.5.0.0 release, which I am trying to get out today. Also, most devs that I've seen just use the GSON library for this use case because they are already using it for most other JSON related use cases.

-kevin

@kwwall It's not a make/break feature as in the past I've just had people do javascript escapes for JSON payload data. It's overkill but it works. This one is lighter weight.

@noloader noloader force-pushed the json branch 2 times, most recently from b37787e to 41068a9 Compare July 18, 2022 06:21
@noloader
Copy link
Contributor Author

@xeno6696,

I'd like to see this altered to use the AbstractIntegerCodec ...

Done. I rebased and pushed an updated version.

I'm still not convinced this codec handles all cases well. I'm going to get some test cases added for U+1F602 and friends.

@noloader
Copy link
Contributor Author

noloader commented Jul 18, 2022

Hi Everyone,

One question I have.... I used IllegalArgumentException for an encoding error when decoding. Should this use an ESAPI-specific exception?

I tried to use EncodingException, but I encountered an error. It think it is the same error from GH #227.

@noloader noloader force-pushed the json branch 5 times, most recently from 5521009 to 1785b15 Compare July 18, 2022 13:14
@xeno6696
Copy link
Collaborator

@noloader I'm a bit scared of the --force push commits. Typically with git it's a major red flag.

What was necessitating the use of the command?

@noloader
Copy link
Contributor Author

noloader commented Jul 18, 2022

Hi @xeno6696,

I'm a bit scared of the --force push commits. Typically with git it's a major red flag

What was necessitating the use of the command?

Rebasing to flatten out history to one commit.

I don't think it effects ESAPI's history. It only effects my fork. From ESAPI's view of the world, there's one commit in the PR against ESAPI's d6251b5. d6251b5 is tip of develop, which is what Kevin pushed yesterday.

@kwwall
Copy link
Contributor

kwwall commented Jul 18, 2022

@xeno6696 wrote:

@kwwall It's not a make/break feature as in the past I've just had people do javascript escapes for JSON payload data. It's overkill but it works. This one is lighter weight.

Plus somewhat ironically, not EVERY use of JSON is associated with JavaScript any more. (At least in the usual sense of JavaScript use.) JSON is now one of the many ubiquitous information preserving file formats used much like CSV and XML files are to move information around, so I think people would favor something lighter weight.

@kwwall
Copy link
Contributor

kwwall commented Jul 18, 2022

@noloader wrote:

One question I have.... I used IllegalArgumentException for an encoding error when decoding. Should this use an ESAPI-specific exception?

I tried to use EncodingException, but I encountered an error. It think it is the same error from GH #227.

I don't think we should tightly couple our various codecs to ESAPI exceptions, especially since some of those are tied to ESAPI's Intrusion Detector which are tied to ESAPI Logging. I never really liked that. Using ESAPI exceptions and other ESAPI components is fine under org.owasp.esapi.reference but I prefer loose-coupling elsewhere. I try even avoid ESAPI interfaces for those spots.

Let's just keep it simple. I think that IllegalArgumentException works fine.

@noloader
Copy link
Contributor Author

noloader commented Jul 18, 2022

Let's just keep it simple. I think that IllegalArgumentException works fine.

Perfect, thanks.

@planetlevel
Copy link

planetlevel commented Jul 18, 2022 via email

@noloader
Copy link
Contributor Author

not EVERY use of JSON is associated with JavaScript any more...

++. Javascript encoding is not valid JSON encoding. A conforming JSON parser should reject \x2f and friends.

@kwwall
Copy link
Contributor

kwwall commented Jul 18, 2022

Perhaps an Observable type model would be better for intrusion relevant events.

Hindsight is 20/20, but something to consider in ESAPI 3.

@noloader noloader force-pushed the json branch 3 times, most recently from ec99507 to d31447a Compare July 19, 2022 06:33
@noloader noloader force-pushed the json branch 3 times, most recently from a258cc5 to 4a4556f Compare July 21, 2022 03:28
@kwwall
Copy link
Contributor

kwwall commented Jul 27, 2022

@noloader - Just wondering, since you are doing this codec and JSON has become the linga franca of transporting data, for completeness, you might want to consider writing some new Validator methods, like isValidSafeJSON() that would be analogous to isValidSafeHTML().

Copy link
Contributor

@kwwall kwwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally approve. Only real issue I saw was the use of assert 'Ooops' instead of a runtime check and throwing IllegalArgumentException. (I am guilty of the same thing and have changed many assertions in the last 5 years.)

I generally approve, but I'd like to take one more pass at resolving my comments before I merge this.

@@ -45,6 +45,7 @@
* <li>JavaScript Escaping</li>
* <li>MySQL Database Escaping</li>
* <li>Oracle Database Escaping</li>
* <li>JSON Escaping</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple versions of JSON (I've heard about at least 3 slightly different variants), so if we are going to mention this in the Encoder Javadoc, could we at least specify something like
JSON Escaping (RFC 8259, Section 7)
to distinguish which variant we are supporting?

@@ -556,6 +557,22 @@ public interface Encoder {
*/
String encodeForURL(String input) throws EncodingException;

/**
* Encode data for use in JSON strings. This method performs <a
* href="https://datatracker.ietf.org/doc/html/rfc8259#section-7">String escaping</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this maybe say 'JSON string escaping' or 'String escaping for JSON'? Or doesn't this validate it as valid JSON before we escape it? (That seems like it would be important, but I could be argued out of it I suppose.)

@@ -607,4 +624,19 @@ public interface Encoder {
*/
String getCanonicalizedURI(URI dirtyUri);

/**
* Decode data encoded for JSON strings. This method removes <a
* href="https://datatracker.ietf.org/doc/html/rfc8259#section-7">String escaping</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment.

@@ -24,6 +24,9 @@
* data by focusing on {@code int} as opposed to {@code Character}. Because non-BMP code
* points cannot be represented by a {@code char}, this class remedies that by parsing string
* data as codePoints as opposed to a stream of {@code char}s.
*
* WARNING: This class will silently discard an invalid code point according to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note 'WARNING:' in bold. And @xeno6696, why are discarding things silently? What do you think this is, a UNIX error message? ;-)

@@ -54,8 +54,8 @@ public HTMLEntityCodec() {
* Given an array of {@code char}, scan the input {@code String} and encode unsafe
* codePoints, except for codePoints passed into the {@code char} array.
* <br/><br/>
* WARNING: This method will silently discard any code point per the
* call to {@code Character.isValidCodePoint( int )} method.
* WARNING: This method will silently discard an invalid code point according to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about using bold font for 'WARNING:'.


errorMessage = "Invalid JSON two-character escape representation";

// Per the RFC... All characters may be escaped as a six-character sequence: a reverse solidus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh!

protected int charToCodepoint( Character ch ) {

final String s = Character.toString(ch);
assert (s.length() == 1) : "Ooops";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not a private method, this assertion (which are disabled by default) should really be a runtime check and throw an IllegalArgumentException...hopefully with a more meaningful message than 'Ooops'. ;-)

@@ -42,6 +42,7 @@
import org.owasp.esapi.codecs.PercentCodec;
import org.owasp.esapi.codecs.VBScriptCodec;
import org.owasp.esapi.codecs.XMLEntityCodec;
import org.owasp.esapi.codecs.JSONCodec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't personally care, but I almost will guarantee that if ESAPI is around for another dozen years, we'll get a PR that will alphabetize the import statements. I know, because I have seen that in some Java coding standards! :D

@@ -1,15 +1,15 @@
/**
* OWASP Enterprise Security API (ESAPI)
*
*
* This file is part of the Open Web Application Security Project (OWASP)
* Enterprise Security API (ESAPI) project. For details, please see
* <a href="http://www.owasp.org/index.php/ESAPI">http://www.owasp.org/index.php/ESAPI</a>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the link and remove the pointless anchor tag since the standard doclet that generates Javadoc will not display this anyway so let's just make it easier for the future coders to read.

@@ -38,11 +38,13 @@
import org.owasp.esapi.codecs.HTMLEntityCodec;
import org.owasp.esapi.codecs.MySQLCodec;
import org.owasp.esapi.codecs.OracleCodec;
import org.owasp.esapi.codecs.JSONCodec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprise, surprise. Someone already alphabetized these imports (or perhaps we can blame an IDE).

Seriously, your email tagline should be:
I troll anal people.

@xeno6696
Copy link
Collaborator

@noloader - Just wondering, since you are doing this codec and JSON has become the linga franca of transporting data, for completeness, you might want to consider writing some new Validator methods, like isValidSafeJSON() that would be analogous to isValidSafeHTML().

AntiSamy does JSON now?

@kwwall
Copy link
Contributor

kwwall commented Jul 27, 2022 via email

@xeno6696
Copy link
Collaborator

xeno6696 commented Jul 27, 2022

@kwwall AntiSamy only handles HTML, that ask amounts to rewriting/cloning AntiSamy centering it around JSON parsing.

And THEN you have the mixed context: AntiSamy is designed to allow websites to safely handle user-inputted HTML. Our use-case derives from that. In what context does a similar use case exist for JSON? What scripts can you execute using JSON’s non-existent script tag? What are you defending against, precisely? What would be the json equivalent to the tag whitelisting AntiSamy does?

JSON’s better viewed as a transport layer data format, not a browser renderable scripting language like HTML or JavaScript. JSON could handle data intended to be used in SQL or more likely, javascript itself or HTML intended to be rendered by innerHTML in javascript.

The safeValidHTML contract works because HTML is the single explicit target. With JSON you have everything. You’re asking for an XSS servlet filter. To drill that point home, why don’t we have a safeValidXML call which would be the closest equivalent to JSON?

@planetlevel
Copy link

planetlevel commented Jul 27, 2022 via email

@kwwall
Copy link
Contributor

kwwall commented Jul 27, 2022

@xeno6696 and @planetlevel - Yeah, I went overboard with the "valid safe JSON". What I was really thinking is maybe just a check to see if a string is valid JSON, which ought to be a lot simpler. A Validator.isValidJSON() if you will, just for completeness.

I know you can do that with Gson like this:

import com.google.gson.Gson;

public final class JSONUtils {
  private static final Gson gson = new Gson();

  private JSONUtils(){}

  public static boolean isJSONValid(String jsonInString) {
      try {
          gson.fromJson(jsonInString, Object.class);
          return true;
      } catch(com.google.gson.JsonSyntaxException ex) { 
          return false;
      }
  }
}

so, using other 3rd party FOSS libraries, it's not really difficult.

The question is, how difficult is it if we were to write our own parser? (Because I don''t really want to pull in Gson as a dependency just for this purpose, even though Gson is well-maintained and way secure than alternative like, say, FasterXML Jackson Databind.) But if it didn't require a lot of code (I think we would just parse it, but not bother to deserialize it into objects), it may be worth considering.

Anyhow, sorry for the confusion.

@xeno6696
Copy link
Collaborator

@kwwall the safest way to accomplish that would be to use javacc to build a JSON parser of our own. Nobody rolls their own parsers and lexers these days. This is incomplete but is where I usually begin when I sit down to do one of these. (It's been 8yrs since the last time I did it.)

@planetlevel has disliked the idea in the past because browsers for example are notorious for breaking BNF rules by trying to "help" by doing things like fixing invalid HTML. That's valid. But I think the risk would be less in the case of JSON because of its simplicity and lack of things like XXE attacks. Also, we should call out that the best we can do is ensure that the JSON is legally parseable--which is what you're suggesting with your GSON example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants