[API Proposal]: JavaScriptEncoder.CreateUnsafe(ReadOnlyMemory<UnicodeRange>) method #119551

@AArnott

Background and motivation

We need a way to use System.Text.Json to serialize object graphs to JSON without applying any JSON escaping beyond what the RFC calls for.
In particular, when writing JSON files to disk that users around the world may read and edit, we want the user's data to round trip across deserialize/serialize calls without making their non-Latin characters entirely illegible.
For Visual Studio in particular, GB18030 compliance requires that we not transform the user's surrogate pairs into \u1234\u5678-style escape sequences, which render the JSON file unreadable to humans.

Scenarios that would benefit from this:

  1. Serializing mcp.json files, which are never used in a web context but are often read and edited by humans.
  2. Interactions with LLMs (e.g. the .NET ModelContextProtocol library) where the LLMs are trained on JSON that is not overly escaped.

We have tried using JavaScriptEncoder.UnsafeRelaxedJsonEscaping. While that reduces the unwanted escaping, it still escapes surrogate pairs and some other characters on the global block list, which makes certain languages and emoji illegible.
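
To make the behavior described above concrete, here is a minimal sketch that serializes one BMP string and one emoji with the existing encoder. The class and method names (`EscapingDemo`, `SerializeRelaxed`) are made up for illustration; only `JsonSerializer`, `JsonSerializerOptions`, and `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` are real APIs.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical demo class; not part of the proposal.
public static class EscapingDemo
{
    // Cache the options: creating JsonSerializerOptions per call is wasteful.
    private static readonly JsonSerializerOptions RelaxedOptions = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    };

    // Serializes a string with the most permissive built-in encoder.
    public static string SerializeRelaxed(string value) =>
        JsonSerializer.Serialize(value, RelaxedOptions);

    public static void Main()
    {
        // Cyrillic lives in the BMP and is written through unescaped.
        Console.WriteLine(SerializeRelaxed("жарко"));

        // U+1F600 is outside the BMP, so in UTF-16 it is a surrogate pair.
        // With today's encoder it is emitted as \u escape sequences.
        Console.WriteLine(SerializeRelaxed("\U0001F600"));
    }
}
```

The second line of output is the case this proposal is about: the JSON is still valid, but a person opening the file sees escape sequences instead of the character they typed.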

While web scenarios benefit from the added security of the extra escaping that goes beyond the RFC requirements, some non-web scenarios are compromised by it. There should be a way to turn off this extra escaping that applies to very specific serializing code so that we don't have to compromise an entire application (via an appswitch or similar) in order to turn off extra escaping for one specific use case.
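
For reference, the baseline that "RFC requirements" refers to is small. RFC 8259, section 7, requires escaping only the quotation mark, the reverse solidus, and the control characters U+0000 through U+001F inside a JSON string. The sketch below expresses that rule as a predicate; `RfcEscaping` and `MustEscape` are illustrative names, not existing or proposed APIs.

```csharp
using System;

// Hypothetical helper illustrating the RFC 8259 minimum; not a real API.
public static class RfcEscaping
{
    // True only for the characters RFC 8259 section 7 says MUST be escaped:
    // '"', '\', and control characters U+0000..U+001F.
    public static bool MustEscape(char c) =>
        c == '"' || c == '\\' || c <= '\u001F';

    public static void Main()
    {
        Console.WriteLine(MustEscape('"'));      // True
        Console.WriteLine(MustEscape('\n'));     // True
        Console.WriteLine(MustEscape('ж'));      // False
        Console.WriteLine(MustEscape('\uD83D')); // False (high surrogate)
    }
}
```

Everything else that the built-in encoders escape, including surrogate pairs and the global block list, is the "extra escaping" discussed here.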

Other GitHub issues have been raised requesting something similar, including: #86463

API Proposal

namespace System.Text.Encodings.Web
{
    public abstract class JavaScriptEncoder : TextEncoder
    {
        public static JavaScriptEncoder! CreateUnsafe(ReadOnlyMemory<UnicodeRange>);
    }
}

This method would configure a JavaScriptEncoder instance that disregards the global block list.
The API docs should call this out specifically and identify web scenarios as particularly impacted by this for security, while noting that non-web scenarios may be able to use it safely.

API Usage

JsonSerializer.Serialize(myObject, new JsonSerializerOptions { Encoder = JavaScriptEncoder.CreateUnsafe([UnicodeRanges.All]) });

Alternative Designs

No response

Risks

Web applications that benefit from the security of the added escaping (the current default behavior) may be compromised when their maintainers do a web search and apply this new API without realizing that it weakens their security, whether because they wanted to remove escaping or because it just happened to be in a code snippet they copied.
