[API Proposal]: UnicodeJsonEncoder #87153
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
Issue Details
Background and motivation
There's no built-in implementation that allows characters from all languages to be kept readable, and unnecessary escaping to be avoided when the caller knows recipients parse JSON correctly.
For additional context, see: #42847, #86800, #87138
API Proposal
namespace System.Text.Encodings.Web
{
    internal sealed class UnicodeJsonEncoder : JavaScriptEncoder
    {
        internal static readonly UnicodeJsonEncoder Singleton = new UnicodeJsonEncoder();
        private readonly bool _preferHexEscape;
        private readonly bool _preferUppercase;

        public UnicodeJsonEncoder()
            : this(preferHexEscape: false, preferUppercase: false)
        {
        }

        public UnicodeJsonEncoder(bool preferHexEscape, bool preferUppercase)
        {
            _preferHexEscape = preferHexEscape;
            _preferUppercase = preferUppercase;
        }

        // Implementations of base class members.
    }
}

namespace System.Text.Encodings.Web
{
    public abstract class JavaScriptEncoder : TextEncoder
    {
        // Existing members
        public static JavaScriptEncoder Unicode => UnicodeJsonEncoder.Singleton;
    }
}
PR #87147 has additional implementation details.
API Usage
// some typed variable with the JSON object to serialize, called "data"
string json = JsonSerializer.Serialize(data, new JsonSerializerOptions { Encoder = JavaScriptEncoder.Unicode });
Or, to force hex escapes (\uxxxx) rather than two-character escapes (for example, \"):
// some typed variable with the JSON object to serialize, called "data"
string json = JsonSerializer.Serialize(data, new JsonSerializerOptions { Encoder = new UnicodeJsonEncoder(preferHexEscape: true, preferUppercase: false) }); // or other values for those bools
Alternative Designs
No response
Risks
Similar to UnsafeRelaxedJsonEncoder, but see #87138.
Callers need to ensure two things:
This API proposal is for a minimal encoder, stating it leaves it up to the caller to further escape content correctly for embedding in whatever other container language they need.
However, there's a problem with that for embedding in HTML inside 'script islets' - i.e. inside
The way to escape HTML content is to use entity encoding, e.g. to escape
E.g. while
<script type="application/json">
{ "foo" : "</script>" }
</script>
is suitably enough escaped, when the content is read back via the DOM (e.g. via
It's decidedly non-trivial to decide what entity-encoding signifies an encoded parameter that needs decoding - and what doesn't. Maybe your JSON contains a series of resource string translations for a technical editing application that talks about how to represent HTML entities and was meant to contain an entity-encoded example?
So to get this right, your encoder has to encode all occurrences of entity-like sequences, and then you have to decode them again when attempting to read this stuff back.
It would be better if the new API proposal were extended to allow specifying additional characters that should be encoded with
@rjgotten - that's a very interesting case. From looking at the API surface, I believe the same question applies to the existing JavaScriptEncoder.UnsafeRelaxedJsonEscaping:
string data = "</script>";
string json = JsonSerializer.Serialize(data, new JsonSerializerOptions {
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping });
Console.WriteLine(json);
produces:
"</script>"
I think that's an interesting option to consider. I'd tend to leave that functionality out of this API for simplicity. As far as I can tell, JavaScriptEncoder.UnsafeRelaxedJsonEscaping also does not escape these characters and cannot be customized to escape them (without subclassing) - I'd tend to do the same here.
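For anyone who needs the script-island case today, here is a minimal sketch using only shipped APIs (`TextEncoderSettings`, `JavaScriptEncoder.Create`), not the encoder proposed in this issue. It assumes the goal is "keep non-ASCII text readable, but emit `<`, `>` and `&` as `\uXXXX` escapes". The `payload` and `island` names are made up for illustration. As far as I know, `JavaScriptEncoder.Create` already escapes these HTML-sensitive characters on its own, so the `ForbidCharacters` call is mostly there to make the intent explicit.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Allow every Unicode range so non-ASCII text is written as-is.
var settings = new TextEncoderSettings(UnicodeRanges.All);

// Explicitly forbid the characters that matter inside a <script> element.
// (Assumption: Create() would escape these anyway; listed here for clarity.)
settings.ForbidCharacters('<', '>', '&');

var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(settings)
};

// Hypothetical payload that would terminate a script element if left raw.
string payload = "</script>";
string json = JsonSerializer.Serialize(payload, options);

// Embed the JSON in a script island; no entity encoding is applied.
string island = $"<script type=\"application/json\">{json}</script>";

// A JSON parser undoes the \u escapes, so no HTML entity decoding is needed.
var roundTripped = JsonSerializer.Deserialize<string>(json);

Console.WriteLine(island);
Console.WriteLine(roundTripped == payload);
```

Because the escaping happens at the JSON level, whatever reads the element text back only has to run a JSON parser over it, which sidesteps the question of which entity-like sequences need decoding.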
Background and motivation
There's no built-in implementation that allows characters from all languages to be kept readable, and unnecessary escaping to be avoided when the caller knows recipients parse JSON correctly.
For additional context, see:
#42847
#86800
#87138
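For comparison, a small sketch of how the encoders that ship today treat the same string. It uses only existing APIs (`JavaScriptEncoder.Default`, `JavaScriptEncoder.Create(UnicodeRanges.All)`, `JavaScriptEncoder.UnsafeRelaxedJsonEscaping`). The sample text and the `Encode` helper are made up for illustration, and the proposed `JavaScriptEncoder.Unicode` is deliberately not used since it does not exist yet.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical sample mixing Latin-1, Cyrillic, and HTML-sensitive characters.
string sample = "naïve Привет <b>";

// Illustrative helper: serialize the sample with a given encoder.
string Encode(JavaScriptEncoder encoder) =>
    JsonSerializer.Serialize(sample, new JsonSerializerOptions { Encoder = encoder });

// Default: only Basic Latin is allowed, so non-ASCII and '<' / '>' become \uXXXX.
string viaDefault = Encode(JavaScriptEncoder.Default);

// All ranges allowed: non-ASCII text stays readable, '<' / '>' are still escaped.
string viaAllRanges = Encode(JavaScriptEncoder.Create(UnicodeRanges.All));

// Relaxed: non-ASCII text and '<' / '>' are both written as-is.
string viaRelaxed = Encode(JavaScriptEncoder.UnsafeRelaxedJsonEscaping);

Console.WriteLine(viaDefault);
Console.WriteLine(viaAllRanges);
Console.WriteLine(viaRelaxed);
```

This only illustrates the starting point for the proposal; the linked issues cover the cases where none of the three fits.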
API Proposal
PR #87147 has additional implementation details.
API Usage
Or, to force hex escapes (\uxxxx) rather than two-character escapes (for example, \"):
Alternative Designs
No response
Risks
Similar to UnsafeRelaxedJsonEncoder, but see #87138.
Callers need to ensure two things: