System.Text.Json Merge #31433

Open
tb-mtg opened this issue Nov 8, 2019 · 9 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
@tb-mtg

tb-mtg commented Nov 8, 2019

Using System.Text.Json, is there any way to Merge like Json.Net does?

see Newtonsoft.Json.Linq.JContainer.Merge

public void Merge(
	Object content,
	JsonMergeSettings settings
)
@ahsonkhan
Member

ahsonkhan commented Jan 3, 2020

Since the JsonDocument and JsonElement APIs are read-only, you could work around this by writing your own Merge method based on Utf8JsonWriter.

Such an API makes more sense with the writable/modifiable DOM so we should consider adding this merge capability with that feature: https://github.com/dotnet/corefx/issues/39922

If your JSON objects only contain non-null simple/primitive values and the order in which the properties show up isn't particularly concerning, the following, relatively straightforward, code sample should work for you:

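// Requires: using System.Buffers; using System.Diagnostics; using System.Text; using System.Text.Json;
public static string SimpleObjectMerge(string originalJson, string newContent)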
{
    var outputBuffer = new ArrayBufferWriter<byte>();

    using (JsonDocument jDoc1 = JsonDocument.Parse(originalJson))
    using (JsonDocument jDoc2 = JsonDocument.Parse(newContent))
    using (var jsonWriter = new Utf8JsonWriter(outputBuffer, new JsonWriterOptions { Indented = true }))
    {
        JsonElement root1 = jDoc1.RootElement;
        JsonElement root2 = jDoc2.RootElement;

        // Assuming both JSON strings are single JSON objects (i.e. {...})
        Debug.Assert(root1.ValueKind == JsonValueKind.Object);
        Debug.Assert(root2.ValueKind == JsonValueKind.Object);

        jsonWriter.WriteStartObject();

        // Write all the properties of the first document that don't conflict with the second
        foreach (JsonProperty property in root1.EnumerateObject())
        {
            if (!root2.TryGetProperty(property.Name, out _))
            {
                property.WriteTo(jsonWriter);
            }
        }

        // Write all the properties of the second document (including those that are duplicates which were skipped earlier)
        // The property values of the second document completely override the values of the first
        foreach (JsonProperty property in root2.EnumerateObject())
        {
            property.WriteTo(jsonWriter);
        }

        jsonWriter.WriteEndObject();
    }

    return Encoding.UTF8.GetString(outputBuffer.WrittenSpan);
}

Newtonsoft.Json has different null handling when doing a merge where null doesn't override the value of the non-null property (when there are duplicates). I am not sure if you want that behavior or not. If that's needed, you would need to modify the above method to handle the null cases. Here are the modifications:

public static string SimpleObjectMergeWithNullHandling(string originalJson, string newContent)
{
    var outputBuffer = new ArrayBufferWriter<byte>();

    using (JsonDocument jDoc1 = JsonDocument.Parse(originalJson))
    using (JsonDocument jDoc2 = JsonDocument.Parse(newContent))
    using (var jsonWriter = new Utf8JsonWriter(outputBuffer, new JsonWriterOptions { Indented = true }))
    {
        JsonElement root1 = jDoc1.RootElement;
        JsonElement root2 = jDoc2.RootElement;

        // Assuming both JSON strings are single JSON objects (i.e. {...})
        Debug.Assert(root1.ValueKind == JsonValueKind.Object);
        Debug.Assert(root2.ValueKind == JsonValueKind.Object);

        jsonWriter.WriteStartObject();

        // Write all the properties of the first document that don't conflict with the second
        // Or if the second is overriding it with null, favor the property in the first.
        foreach (JsonProperty property in root1.EnumerateObject())
        {
            if (!root2.TryGetProperty(property.Name, out JsonElement newValue) || newValue.ValueKind == JsonValueKind.Null)
            {
                property.WriteTo(jsonWriter);
            }
        }

        // Write all the properties of the second document (including those that are duplicates which were skipped earlier)
        // The property values of the second document completely override the values of the first, unless they are null in the second.
        foreach (JsonProperty property in root2.EnumerateObject())
        {
            // Don't write null values, unless they are unique to the second document
            if (property.Value.ValueKind != JsonValueKind.Null || !root1.TryGetProperty(property.Name, out _))
            {
                property.WriteTo(jsonWriter);
            }
        }

        jsonWriter.WriteEndObject();
    }

    return Encoding.UTF8.GetString(outputBuffer.WrittenSpan);
}

If your JSON objects can potentially contain nested JSON values including other objects and arrays, you would want to extend the logic to handle that too. Something like this should work:

public static string Merge(string originalJson, string newContent)
{
    var outputBuffer = new ArrayBufferWriter<byte>();

    using (JsonDocument jDoc1 = JsonDocument.Parse(originalJson))
    using (JsonDocument jDoc2 = JsonDocument.Parse(newContent))
    using (var jsonWriter = new Utf8JsonWriter(outputBuffer, new JsonWriterOptions { Indented = true }))
    {
        JsonElement root1 = jDoc1.RootElement;
        JsonElement root2 = jDoc2.RootElement;

        if (root1.ValueKind != JsonValueKind.Array && root1.ValueKind != JsonValueKind.Object)
        {
            throw new InvalidOperationException($"The original JSON document to merge new content into must be a container type. Instead it is {root1.ValueKind}.");
        }

        if (root1.ValueKind != root2.ValueKind)
        {
            return originalJson;
        }

        if (root1.ValueKind == JsonValueKind.Array)
        {
            MergeArrays(jsonWriter, root1, root2);
        }
        else
        {
            MergeObjects(jsonWriter, root1, root2);
        }
    }

    return Encoding.UTF8.GetString(outputBuffer.WrittenSpan);
}

private static void MergeObjects(Utf8JsonWriter jsonWriter, JsonElement root1, JsonElement root2)
{
    Debug.Assert(root1.ValueKind == JsonValueKind.Object);
    Debug.Assert(root2.ValueKind == JsonValueKind.Object);

    jsonWriter.WriteStartObject();

    // Write all the properties of the first document.
    // If a property exists in both documents, either:
    // * Merge them, if the value kinds match (e.g. both are objects or arrays),
    // * Completely override the value of the first with the one from the second, if the value kind mismatches (e.g. one is object, while the other is an array or string),
    // * Or favor the value of the first (regardless of what it may be), if the second one is null (i.e. don't override the first).
    foreach (JsonProperty property in root1.EnumerateObject())
    {
        string propertyName = property.Name;

        JsonValueKind newValueKind;

        if (root2.TryGetProperty(propertyName, out JsonElement newValue) && (newValueKind = newValue.ValueKind) != JsonValueKind.Null)
        {
            jsonWriter.WritePropertyName(propertyName);

            JsonElement originalValue = property.Value;
            JsonValueKind originalValueKind = originalValue.ValueKind;

            if (newValueKind == JsonValueKind.Object && originalValueKind == JsonValueKind.Object)
            {
                MergeObjects(jsonWriter, originalValue, newValue); // Recursive call
            }
            else if (newValueKind == JsonValueKind.Array && originalValueKind == JsonValueKind.Array)
            {
                MergeArrays(jsonWriter, originalValue, newValue);
            }
            else
            {
                newValue.WriteTo(jsonWriter);
            }
        }
        else
        {
            property.WriteTo(jsonWriter);
        }
    }

    // Write all the properties of the second document that are unique to it.
    foreach (JsonProperty property in root2.EnumerateObject())
    {
        if (!root1.TryGetProperty(property.Name, out _))
        {
            property.WriteTo(jsonWriter);
        }
    }

    jsonWriter.WriteEndObject();
}

private static void MergeArrays(Utf8JsonWriter jsonWriter, JsonElement root1, JsonElement root2)
{
    Debug.Assert(root1.ValueKind == JsonValueKind.Array);
    Debug.Assert(root2.ValueKind == JsonValueKind.Array);

    jsonWriter.WriteStartArray();

    // Write all the elements from both JSON arrays
    foreach (JsonElement element in root1.EnumerateArray())
    {
        element.WriteTo(jsonWriter);
    }
    foreach (JsonElement element in root2.EnumerateArray())
    {
        element.WriteTo(jsonWriter);
    }

    jsonWriter.WriteEndArray();
}

This sample was tested with the following:

[Fact]
public static void JsonDocumentMergeTest_ComparedToJContainerMerge()
{
    string jsonString1 = @"{
        ""throw"": null,
        ""duplicate"": null,
        ""id"": 1,
        ""xyz"": null,
        ""nullOverride2"": false,
        ""nullOverride1"": null,
        ""william"": ""shakespeare"",
        ""complex"": {""overwrite"": ""no"", ""type"": ""string"", ""original"": null, ""another"":[]},
        ""nested"": [7, {""another"": true}],
        ""nestedObject"": {""another"": true}
    }";

    string jsonString2 = @"{
        ""william"": ""dafoe"",
        ""duplicate"": null,
        ""foo"": ""bar"",
        ""baz"": {""temp"": 4},
        ""xyz"": [1, 2, 3],
        ""nullOverride1"": true,
        ""nullOverride2"": null,
        ""nested"": [1, 2, 3, null, {""another"": false}],
        ""nestedObject"": [""wow""],
        ""complex"": {""temp"": true, ""overwrite"": ""ok"", ""type"": 14},
        ""temp"": null
    }";

    JObject jObj1 = JObject.Parse(jsonString1);
    JObject jObj2 = JObject.Parse(jsonString2);

    jObj1.Merge(jObj2);
    jObj2.Merge(JObject.Parse(jsonString1));

    Assert.Equal(jObj1.ToString(), Merge(jsonString1, jsonString2));
    Assert.Equal(jObj2.ToString(), Merge(jsonString2, jsonString1));
}

Note: If performance is critical for your scenario, this method (even when writing indented output) outperforms Newtonsoft.Json's Merge method in both runtime and allocations. That said, the implementation could be made faster depending on need (for instance, don't write indented, cache the outputBuffer, don't accept/return strings, etc.).

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-LACFYV : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  
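| Method          | Mean     | Error    | StdDev   | Median   | Min      | Max      | Ratio | Gen 0  | Gen 1  | Gen 2 | Allocated |
|-----------------|----------|----------|----------|----------|----------|----------|-------|--------|--------|-------|-----------|
| MergeNewtonsoft | 29.01 us | 0.570 us | 0.656 us | 28.84 us | 28.13 us | 30.19 us | 1.00  | 7.0801 | 0.0610 | -     | 28.98 KB  |
| Merge_New       | 16.41 us | 0.293 us | 0.274 us | 16.41 us | 16.02 us | 17.00 us | 0.57  | 1.7090 | -      | -     | 6.99 KB   |
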
[BenchmarkCategory(Categories.CoreFX, Categories.JSON)]
[Benchmark(Baseline = true)]
public string MergeNewtonsoft()
{
    JObject jObj1 = JObject.Parse(_jsonString1);
    JObject jObj2 = JObject.Parse(_jsonString2);

    jObj1.Merge(jObj2);

    return jObj1.ToString();
}

[BenchmarkCategory(Categories.CoreFX, Categories.JSON)]
[Benchmark]
public string Merge_New()
{
    return Merge(_jsonString1, _jsonString2);
}
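
The tuning ideas in the note above (no indentation, a reusable output buffer, no string input or output) might look roughly like the following. This is only a sketch for the shallow object case and has not been benchmarked. `Utf8MergeSketch` and `MergeObjectsUtf8` are hypothetical names, not part of System.Text.Json.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

public static class Utf8MergeSketch
{
    // Hypothetical helper: shallow merge of two UTF-8 JSON objects.
    // Property values from `update` override those from `original`.
    public static void MergeObjectsUtf8(
        ReadOnlyMemory<byte> original,
        ReadOnlyMemory<byte> update,
        IBufferWriter<byte> output)
    {
        using JsonDocument doc1 = JsonDocument.Parse(original);
        using JsonDocument doc2 = JsonDocument.Parse(update);
        // Default JsonWriterOptions: no indentation.
        using var writer = new Utf8JsonWriter(output);

        JsonElement root1 = doc1.RootElement;
        JsonElement root2 = doc2.RootElement;

        if (root1.ValueKind != JsonValueKind.Object || root2.ValueKind != JsonValueKind.Object)
        {
            throw new InvalidOperationException("Both documents must be JSON objects.");
        }

        writer.WriteStartObject();

        // Properties unique to the first document.
        foreach (JsonProperty property in root1.EnumerateObject())
        {
            if (!root2.TryGetProperty(property.Name, out _))
            {
                property.WriteTo(writer);
            }
        }

        // All properties of the second document.
        foreach (JsonProperty property in root2.EnumerateObject())
        {
            property.WriteTo(writer);
        }

        writer.WriteEndObject();
        // The writer is disposed (and therefore flushed) when this method returns.
    }
}
```

A caller could keep one ArrayBufferWriter<byte> around, call Clear() on it between merges, and read the result from WrittenSpan without ever creating a string.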

@tb-mtg
Author

tb-mtg commented Jan 6, 2020

Thank you @ahsonkhan, hopefully this feature will be added.

@ahsonkhan
Member

ahsonkhan commented Jan 7, 2020

@tb-mtg, as part of requirements, can you expand on your scenarios and what JsonMergeSettings capabilities are necessary for the Merge APIs (for example MergeArrayHandling, MergeNullValueHandling, PropertyNameComparison).

Are there others that Newtonsoft.Json doesn't support that would be needed? Do you generally merge two JObjects, or is it common to merge any two arbitrary JContainers? What should the behavior be of JArray.Merge(some single JToken)?

Also, what is your particular use case for such an API? Having context around sample usage would help answer some of the requirement questions as well.
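
To make the array-handling question concrete, here is a minimal sketch of three of the behaviors that Newtonsoft.Json's MergeArrayHandling names (Concat, Union, Replace), written against the read-only JsonDocument API. The `ArrayMergeMode` enum and `MergeArrays` helper are hypothetical. The Union case compares elements by their raw JSON text, which is an assumption of this sketch and stricter than a structural comparison (for example, `1` and `1.0` count as different).

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical option type, loosely modeled on Newtonsoft.Json's MergeArrayHandling.
public enum ArrayMergeMode { Concat, Union, Replace }

public static class ArrayMergeSketch
{
    public static string MergeArrays(string originalJson, string newContent, ArrayMergeMode mode)
    {
        var output = new ArrayBufferWriter<byte>();

        using (JsonDocument doc1 = JsonDocument.Parse(originalJson))
        using (JsonDocument doc2 = JsonDocument.Parse(newContent))
        using (var writer = new Utf8JsonWriter(output))
        {
            // EnumerateArray throws InvalidOperationException if a root is not an array.
            var seen = new HashSet<string>();

            writer.WriteStartArray();

            if (mode != ArrayMergeMode.Replace)
            {
                foreach (JsonElement element in doc1.RootElement.EnumerateArray())
                {
                    seen.Add(element.GetRawText()); // remembered for the Union check
                    element.WriteTo(writer);
                }
            }

            foreach (JsonElement element in doc2.RootElement.EnumerateArray())
            {
                // Union: skip elements whose raw text was already written.
                if (mode == ArrayMergeMode.Union && !seen.Add(element.GetRawText()))
                {
                    continue;
                }
                element.WriteTo(writer);
            }

            writer.WriteEndArray();
        }

        return Encoding.UTF8.GetString(output.WrittenSpan);
    }
}
```

For the inputs `[1,2]` and `[2,3]`, this sketch gives `[1,2,2,3]` for Concat, `[1,2,3]` for Union, and `[2,3]` for Replace.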

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@layomia layomia modified the milestones: 5.0, Future Apr 7, 2020
@vijer

vijer commented May 31, 2021

Is there a way to do a join? I have two different json files with a common key and I've been looking for a way to filter and join the output.

For example, the controlling file contains categories and the items are in another file. The filter would be where(rc.CategoryID == tt.CategoryID && rc.CategoryID == "metals" && (tt.GameVersion == "A" || tt.GameVersion == "2"))

All the examples I find are related to merging two files with the same structure.

"ResourceCategories": [
    {
      "CategoryDescription": "Base Upgrades",
      "CategoryID": "baseupgrades",
      "IncludeCategory": true,
      "GameVersion": "A"
    },

and the item file

"TechType": [
    {
      "CategoryID": "crystalline",
      "TechName": "Quartz",
      "SpawnID": "quartz",
      "TechID": 1,
      "GameVersion": "A"
    },
    {
      "CategoryID": "metals",
      "TechName": "Metal Salvage",
      "SpawnID": "scrapmetal",
      "TechID": 2,
      "GameVersion": "A"
    },
    {
      "CategoryID": "outcrop",
      "TechName": "Limestone Outcrop",
      "SpawnID": "limestonechunk",
      "TechID": 4,
      "GameVersion": "A"
    },

@PeterWone

Is there a way to do a join? I have two different json files with a common key and I've been looking for a way to filter and join the output.

Get the graphs as Json objects and use LINQ.
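
As a rough illustration of that suggestion, the sketch below joins the two files with a LINQ query over JsonDocument. It assumes each fragment shown in the question is wrapped in a root object (`{ "ResourceCategories": [...] }` and `{ "TechType": [...] }`) and that the version test is meant as "A or 2". `JsonJoinSketch` and `JoinTechNames` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class JsonJoinSketch
{
    // Returns the TechName of every item whose CategoryID matches a category
    // equal to `categoryId` and whose GameVersion is one of `versions`.
    public static List<string> JoinTechNames(
        string categoriesJson, string techJson, string categoryId, params string[] versions)
    {
        using JsonDocument categories = JsonDocument.Parse(categoriesJson);
        using JsonDocument tech = JsonDocument.Parse(techJson);

        var query =
            from rc in categories.RootElement.GetProperty("ResourceCategories").EnumerateArray()
            join tt in tech.RootElement.GetProperty("TechType").EnumerateArray()
                on rc.GetProperty("CategoryID").GetString()
                equals tt.GetProperty("CategoryID").GetString()
            where rc.GetProperty("CategoryID").GetString() == categoryId
               && versions.Contains(tt.GetProperty("GameVersion").GetString())
            select tt.GetProperty("TechName").GetString() ?? string.Empty;

        // Materialize the results before the documents are disposed.
        return query.ToList();
    }
}
```

Calling `JoinTechNames(categoriesJson, techJson, "metals", "A", "2")` would then return the names of the matching items. GetProperty throws KeyNotFoundException when a property is missing, so real data may need TryGetProperty instead.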

@eiriktsarpalis
Member

eiriktsarpalis commented Oct 25, 2021

I believe the new JsonNode type that ships with .NET 6 might be more appropriate for exposing this type of functionality. The APIs are currently missing, but they are fairly easy to implement as extension methods:

public static class JsonNodeExtensions
{ 
    public static void AddRange(this JsonArray jsonArray, IEnumerable<JsonNode?> values)
    {
        foreach (var value in values)
        {
            jsonArray.Add(value);
        }
    }

    public static void AddRange(this JsonObject jsonObject, IEnumerable<KeyValuePair<string, JsonNode?>> properties)
    {
        foreach (var kvp in properties)
        {
            jsonObject.Add(kvp);
        }
    }
}
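
Building on the same idea, a recursive merge over JsonNode could be sketched as below. It follows the rules of the earlier Utf8JsonWriter sample (objects merge recursively, arrays concatenate, a null never overrides an existing value). `JsonNodeMergeSketch` and `MergeInto` are hypothetical names. Because a JsonNode can only have one parent, values are copied by re-parsing their JSON text, which is a simple but not the cheapest way to clone.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json.Nodes;

public static class JsonNodeMergeSketch
{
    // Hypothetical helper: merges `update` into `target` in place.
    public static void MergeInto(JsonObject target, JsonObject update)
    {
        foreach (KeyValuePair<string, JsonNode?> kvp in update)
        {
            // A null in the update does not override an existing value.
            if (kvp.Value is null && target.ContainsKey(kvp.Key))
            {
                continue;
            }

            if (kvp.Value is JsonObject updateChild && target[kvp.Key] is JsonObject targetChild)
            {
                MergeInto(targetChild, updateChild); // recurse into nested objects
            }
            else if (kvp.Value is JsonArray updateArray && target[kvp.Key] is JsonArray targetArray)
            {
                foreach (JsonNode? item in updateArray)
                {
                    targetArray.Add(Clone(item)); // concatenate arrays
                }
            }
            else
            {
                target[kvp.Key] = Clone(kvp.Value); // add or replace
            }
        }
    }

    // Copies a node by round-tripping through JSON text, so the copy has no parent.
    private static JsonNode? Clone(JsonNode? node) =>
        node is null ? null : JsonNode.Parse(node.ToJsonString());
}
```

The update object is left untouched, so the same update can be merged into several targets.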

cc @steveharter

@eiriktsarpalis
Member

Closing in favor of #56592.

@steveharter
Member

I have concerns about adding this feature when the options are not a simple "ignore" or "replace".

The existing semantics of Newtonsoft's MergeNullValueHandling and PropertyNameComparison are fairly intuitive; however, the MergeArrayHandling option is not. This option applies to both JSON arrays and JSON objects, and works fine if just "ignore" or "replace" is desired, but a "merge" requires an Equals() method, which for JSON objects would typically be based on a "key property", and perhaps even a "type identifier property". "Union" is a separate discussion for objects vs. arrays. For ordered JSON arrays, the "key" could be the ordinal, but I think in many cases a true "key" would be desired. For these non-trivial merge cases, I believe Newtonsoft does a DeepEquals() here; however, that is not likely the expected behavior.

So I believe there are many scenarios where a "merge" done on objects would want a "key property" that typically relates to the primary key in a database. Without such a key, a "merge" would likely combine the properties of two independent objects, which will likely be incorrect. Consider:

Current Json:
[
{"ID":100, "Name":"Steve", "PhoneExtension":111},
{"ID":101, "Name":"Joe", "PhoneExtension":222}
]

JSON to merge:
[
{"ID":101, "PhoneExtension":333}
]

Result? Normally I'd expect this
[
{"ID":100, "Name":"Steve", "PhoneExtension":111},
{"ID":101, "Name":"Joe", "PhoneExtension":333}
]

and not, for example
[
{"ID":100, "Name":"Steve", "PhoneExtension":111},
{"ID":101, "Name":"Joe", "PhoneExtension":222},
{"PhoneExtension":333}
]

Also, since the options [should] apply throughout all nodes in the graph, recursively, (note they don't in Newtonsoft with "concat" and "union") some based on "keys" and some not, I don't see how useful this feature with merge\union would be.

Some options IMO:

  • Eirik's extension example above could be used for simple cases that don't have to deal with "keys", and at only one level (not recursive).
  • Extend Eirik's extension sample to provide a "key" property (probably needs to be a simple JsonValue type).
  • Add a callback pattern where the consumer needs to specify the "replace" or "ignore" semantics and perhaps "merge\union" where custom logic would use the current Path of each node to determine the key and related merge semantics.
  • Add the concept of a "key" property to JsonNode by allowing it to be set on each node.
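
The second option above (a "key" property supplied by the caller) could be sketched roughly as follows for the ID example. `KeyedArrayMergeSketch` and `MergeByKey` are hypothetical names. The sketch works at one level only, matches keys by their JSON text, and copies values by re-parsing them, all of which are simplifying assumptions.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json.Nodes;

public static class KeyedArrayMergeSketch
{
    // Hypothetical helper: merges `update` into `target`, matching objects
    // by the value of `keyProperty`. Unmatched elements are appended.
    public static void MergeByKey(JsonArray target, JsonArray update, string keyProperty)
    {
        // Index the existing objects by the JSON text of their key value.
        var index = new Dictionary<string, JsonObject>();
        foreach (JsonNode? node in target)
        {
            if (node is JsonObject obj && obj[keyProperty] is JsonNode key)
            {
                index[key.ToJsonString()] = obj;
            }
        }

        foreach (JsonNode? node in update)
        {
            if (node is JsonObject incoming
                && incoming[keyProperty] is JsonNode key
                && index.TryGetValue(key.ToJsonString(), out JsonObject? existing))
            {
                // Same key: overwrite or add the incoming properties.
                foreach (KeyValuePair<string, JsonNode?> kvp in incoming)
                {
                    if (kvp.Key == keyProperty) continue; // the key itself is unchanged
                    existing[kvp.Key] = Clone(kvp.Value);
                }
            }
            else
            {
                // No key, or no match: append as a new element.
                target.Add(Clone(node));
            }
        }
    }

    // Copies a node by round-tripping through JSON text, so the copy has no parent.
    private static JsonNode? Clone(JsonNode? node) =>
        node is null ? null : JsonNode.Parse(node.ToJsonString());
}
```

With the "Current Json" and "JSON to merge" arrays from the example and "ID" as the key, the target keeps two elements and Joe's PhoneExtension becomes 333.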

@PeterWone

@steveharter I think you're right.

The workaround offered to me was very practical, and it is not difficult to use it to handle the issues you mention. Personally I like the simplicity of merge as something you produce from two immutable graphs.

@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 11, 2022
@dotnet dotnet unlocked this conversation Jul 10, 2023
@eiriktsarpalis eiriktsarpalis added the wishlist Issue we would like to prioritize, but we can't commit we will get to it yet label Jul 10, 2023
8 participants