-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON string escaping support #722
Conversation
Hi Everyone/@kwwall, We need this encoder at $dayjob. |
I will take a look, but no promises that I can get this into the 2.5.0.0
release, which I am trying to get out today.
Also, most devs that I've seen just use the GSON library for this use case
because they are already using it for most other JSON related use cases.
…-kevin
On Sun, Jul 17, 2022, 11:55 AM Jeffrey Walton ***@***.***> wrote:
Hi ***@***.*** <https://github.com/kwwall>,
We need this encoder at $dayjob.
—
Reply to this email directly, view it on GitHub
<#722 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO6PG6B7OX2HRYXZ4TKLN3VUQUHPANCNFSM53Z6OG3Q>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
No problems. The release is more important. The encoder is available if you want it. If not, we can carry patches for it now that it is available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to see a unit test that covers the case 😂, otherwise known as U+1F602
as I want to document clearly what happens when the codec encounters characters that do not exist in the range of legal Character
or char
items.
Ideally I'd like to see this altered to use the AbstractIntegerCodec
if that unit test fails like I think it will. But as this is the first substantial contribution for a Codec since I started working on this project, any new functionality is welcome!
I will also request some unit tests to accommodate mixed encoding behavior, for example, encode a string that contains HTML entities and then do an encoding pass with the JSON Encoder to view what the intermediate result looks like.
Usually it's corner cases like this where we're likely to see unexpected failures when the Codec is used in chains.
@kwwall It's not a make/break feature as in the past I've just had people do javascript escapes for JSON payload data. It's overkill but it works. This one is lighter weight. |
b37787e
to
41068a9
Compare
Done. I rebased and pushed an updated version. I'm still not convinced this codec handles all cases well. I'm going to get some test cases added for |
Hi Everyone, One question I have.... I used I tried to use |
5521009
to
1785b15
Compare
@noloader I'm a bit scared of the --force push commits. Typically with git it's a major red flag. What was necessitating the use of the command? |
Hi @xeno6696,
Rebasing to flatten out history to one commit. I don't think it effects ESAPI's history. It only effects my fork. From ESAPI's view of the world, there's one commit in the PR against ESAPI's d6251b5. d6251b5 is tip of develop, which is what Kevin pushed yesterday. |
@xeno6696 wrote:
Plus somewhat ironically, not EVERY use of JSON is associated with JavaScript any more. (At least in the usual sense of JavaScript use.) JSON is now one of the many ubiquitous information preserving file formats used much like CSV and XML files are to move information around, so I think people would favor something lighter weight. |
@noloader wrote:
I don't think we should tightly couple our various codecs to ESAPI exceptions, especially since some of those are tied to ESAPI's Intrusion Detector which are tied to ESAPI Logging. I never really liked that. Using ESAPI exceptions and other ESAPI components is fine under Let's just keep it simple. I think that |
Perfect, thanks. |
Perhaps an Observable type model would be better for intrusion relevant events.
—Jeff
Jeff Williams
410-707-1487
…________________________________
From: Jeffrey Walton ***@***.***>
Sent: Monday, July 18, 2022 5:00:57 PM
To: ESAPI/esapi-java-legacy ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [ESAPI/esapi-java-legacy] JSON string escaping support (PR #722)
Let's just keep it simple. I think that IllegalArgumentException works fine.
Perfect, thanks.
—
Reply to this email directly, view it on GitHub<#722 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAUUFTGJZ5DIXAGXU5IZ55TVUXAYTANCNFSM53Z6OG3Q>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
++. Javascript encoding is not valid JSON encoding. A conforming JSON parser should reject |
Hindsight is 20/20, but something to consider in ESAPI 3. |
ec99507
to
d31447a
Compare
a258cc5
to
4a4556f
Compare
@noloader - Just wondering, since you are doing this codec and JSON has become the linga franca of transporting data, for completeness, you might want to consider writing some new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally approve. Only real issue I saw was the use of assert 'Ooops' instead of a runtime check and throwing IllegalArgumentException. (I am guilty of the same thing and have changed many assertions in the last 5 years.)
I generally approve, but I'd like to take one more pass at resolving my comments before I merge this.
@@ -45,6 +45,7 @@ | |||
* <li>JavaScript Escaping</li> | |||
* <li>MySQL Database Escaping</li> | |||
* <li>Oracle Database Escaping</li> | |||
* <li>JSON Escaping</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple versions of JSON (I've heard about at least 3 slightly different variants), so if we are going to mention this in the Encoder Javadoc, could we at least specify something like
JSON Escaping (RFC 8259, Section 7)
to distinguish which variant we are supporting?
@@ -556,6 +557,22 @@ public interface Encoder { | |||
*/ | |||
String encodeForURL(String input) throws EncodingException; | |||
|
|||
/** | |||
* Encode data for use in JSON strings. This method performs <a | |||
* href="https://datatracker.ietf.org/doc/html/rfc8259#section-7">String escaping</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this maybe say 'JSON string escaping' or 'String escaping for JSON'? Or doesn't this validate it as valid JSON before we escape it? (That seems like it would be important, but I could be argued out of it I suppose.)
@@ -607,4 +624,19 @@ public interface Encoder { | |||
*/ | |||
String getCanonicalizedURI(URI dirtyUri); | |||
|
|||
/** | |||
* Decode data encoded for JSON strings. This method removes <a | |||
* href="https://datatracker.ietf.org/doc/html/rfc8259#section-7">String escaping</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous comment.
@@ -24,6 +24,9 @@ | |||
* data by focusing on {@code int} as opposed to {@code Character}. Because non-BMP code | |||
* points cannot be represented by a {@code char}, this class remedies that by parsing string | |||
* data as codePoints as opposed to a stream of {@code char}s. | |||
* | |||
* WARNING: This class will silently discard an invalid code point according to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note 'WARNING:' in bold. And @xeno6696, why are discarding things silently? What do you think this is, a UNIX error message? ;-)
@@ -54,8 +54,8 @@ public HTMLEntityCodec() { | |||
* Given an array of {@code char}, scan the input {@code String} and encode unsafe | |||
* codePoints, except for codePoints passed into the {@code char} array. | |||
* <br/><br/> | |||
* WARNING: This method will silently discard any code point per the | |||
* call to {@code Character.isValidCodePoint( int )} method. | |||
* WARNING: This method will silently discard an invalid code point according to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about using bold font for 'WARNING:'.
|
||
errorMessage = "Invalid JSON two-character escape representation"; | ||
|
||
// Per the RFC... All characters may be escaped as a six-character sequence: a reverse solidus, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh!
protected int charToCodepoint( Character ch ) { | ||
|
||
final String s = Character.toString(ch); | ||
assert (s.length() == 1) : "Ooops"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is not a private method, this assertion (which are disabled by default) should really be a runtime check and throw an IllegalArgumentException
...hopefully with a more meaningful message than 'Ooops'. ;-)
@@ -42,6 +42,7 @@ | |||
import org.owasp.esapi.codecs.PercentCodec; | |||
import org.owasp.esapi.codecs.VBScriptCodec; | |||
import org.owasp.esapi.codecs.XMLEntityCodec; | |||
import org.owasp.esapi.codecs.JSONCodec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't personally care, but I almost will guarantee that if ESAPI is around for another dozen years, we'll get a PR that will alphabetize the import statements. I know, because I have seen that in some Java coding standards! :D
@@ -1,15 +1,15 @@ | |||
/** | |||
* OWASP Enterprise Security API (ESAPI) | |||
* | |||
* | |||
* This file is part of the Open Web Application Security Project (OWASP) | |||
* Enterprise Security API (ESAPI) project. For details, please see | |||
* <a href="http://www.owasp.org/index.php/ESAPI">http://www.owasp.org/index.php/ESAPI</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the link and remove the pointless anchor tag since the standard doclet that generates Javadoc will not display this anyway so let's just make it easier for the future coders to read.
@@ -38,11 +38,13 @@ | |||
import org.owasp.esapi.codecs.HTMLEntityCodec; | |||
import org.owasp.esapi.codecs.MySQLCodec; | |||
import org.owasp.esapi.codecs.OracleCodec; | |||
import org.owasp.esapi.codecs.JSONCodec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprise, surprise. Someone already alphabetized these imports (or perhaps we can blame an IDE).
Seriously, your email tagline should be:
I troll anal people.
AntiSamy does JSON now? |
No, AFAIK, AntiSamy does NOT do JSON, which is why I asked if @noloader
wanted to do it.
…On Wed, Jul 27, 2022, 12:12 AM Matt Seil ***@***.***> wrote:
@noloader <https://github.com/noloader> - Just wondering, since you are
doing this codec and JSON has become the linga franca of transporting data,
for completeness, you might want to consider writing some new Validator
methods, like isValidSafeJSON() that would be analogous to
isValidSafeHTML().
AntiSamy does JSON now?
—
Reply to this email directly, view it on GitHub
<#722 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO6PG6C6TV6G43BP4INQJDVWCZMHANCNFSM53Z6OG3Q>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@kwwall AntiSamy only handles HTML, that ask amounts to rewriting/cloning AntiSamy centering it around JSON parsing. And THEN you have the mixed context: AntiSamy is designed to allow websites to safely handle user-inputted HTML. Our use-case derives from that. In what context does a similar use case exist for JSON? What scripts can you execute using JSON’s non-existent script tag? What are you defending against, precisely? What would be the json equivalent to the tag whitelisting AntiSamy does? JSON’s better viewed as a transport layer data format, not a browser renderable scripting language like HTML or JavaScript. JSON could handle data intended to be used in SQL or more likely, javascript itself or HTML intended to be rendered by innerHTML in javascript. The safeValidHTML contract works because HTML is the single explicit target. With JSON you have everything. You’re asking for an XSS servlet filter. To drill that point home, why don’t we have a safeValidXML call which would be the closest equivalent to JSON? |
I generally agree with Matt here – since JSON and XML are just data they can’t contain attacks so AntiSamy wouldn’t be applicable.
Except that it’s not quite true. XML can have doctypes that contain XXE attacks – would certainly be a nice thing to strip out. And JSON is actually javascript and some people might eval it. So you could imagine sanitizing it so that only data is left.
I’m not recommending that we build it, because it’s hard. You probably couldn’t use an existing parser, because it might execute the code in the data. So you’d have to build a custom one. Which is quite difficult to get right. I’d steer folks away from trying to sanitize XML and JSON and simply ensuring that those data formats are used safely.
…--Jeff
From: Matt Seil ***@***.***>
Date: Wednesday, July 27, 2022 at 9:21 AM
To: ESAPI/esapi-java-legacy ***@***.***>
Cc: Jeff Williams ***@***.***>, Comment ***@***.***>
Subject: Re: [ESAPI/esapi-java-legacy] JSON string escaping support (PR #722)
@kwwall<https://github.com/kwwall> AntiSamy only handles HTML, that ask amounts to rewriting/cloning AntiSamy centering it around JSON parsing.
And THEN you have the mixed context: AntiSamy is designed to allow websites to safely handle user-inputted HTML. Our use-case derives from that. In what context does a similar use case exist for JSON? What scripts can you execute using JSON’s non-existent script tag? What are you defending against, precisely? What would be the json equivalent to the tag whitelisting AntiSamy does?
It’s better viewed as a transport layer data format, not a browser renderable scripting language like HTML or JavaScript. JSON could handle data intended to be used in SQL or more likely, javascript itself or HTML intended to be rendered by innerHTML in javascript.
The safeValidHTML contract works because HTML is the single explicit target. With JSON you have everything. You’re asking for an XSS servlet filter. To drill that point home, why don’t we have a safeValidXML call which would be the closest equivalent to JSON?
—
Reply to this email directly, view it on GitHub<#722 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAUUFTAUHOPW4XOPJV5UCXLVWEZWLANCNFSM53Z6OG3Q>.
You are receiving this because you commented.Message ID: ***@***.***>
|
@xeno6696 and @planetlevel - Yeah, I went overboard with the "valid safe JSON". What I was really thinking is maybe just a check to see if a string is valid JSON, which ought to be a lot simpler. A Validator.isValidJSON() if you will, just for completeness. I know you can do that with Gson like this: import com.google.gson.Gson;
public final class JSONUtils {
private static final Gson gson = new Gson();
private JSONUtils(){}
public static boolean isJSONValid(String jsonInString) {
try {
gson.fromJson(jsonInString, Object.class);
return true;
} catch(com.google.gson.JsonSyntaxException ex) {
return false;
}
}
} so, using other 3rd party FOSS libraries, it's not really difficult. The question is, how difficult is it if we were to write our own parser? (Because I don''t really want to pull in Gson as a dependency just for this purpose, even though Gson is well-maintained and way secure than alternative like, say, FasterXML Jackson Databind.) But if it didn't require a lot of code (I think we would just parse it, but not bother to deserialize it into objects), it may be worth considering. Anyhow, sorry for the confusion. |
@kwwall the safest way to accomplish that would be to use javacc to build a JSON parser of our own. Nobody rolls their own parsers and lexers these days. This is incomplete but is where I usually begin when I sit down to do one of these. (It's been 8yrs since the last time I did it.) @planetlevel has disliked the idea in the past because browsers for example are notorious for breaking BNF rules by trying to "help" by doing things like fixing invalid HTML. That's valid. But I think the risk would be less in the case of JSON because of its simplicity and lack of things like XXE attacks. Also, we should call out that the best we can do is ensure that the JSON is legally parseable--which is what you're suggesting with your GSON example. |
This commit adds JSON string escaping support per RFC 8259, Section 7.
The encoder escapes special characters using two-character escape sequences for popular characters (like \r, \n and \t). For other characters which require escaping, the six-character representation is used (like \u0000).