Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write readers for reading WebEventData JSON payloads #34000

Merged
merged 7 commits into from Jul 6, 2021
Merged

Conversation

pranavkm
Copy link
Contributor

@pranavkm pranavkm commented Jul 1, 2021

  • Adding source generators without doing all of the work results in a size increase. We can avoid issues with reflection
    and the additional size hit by writing bespoke readers.
  • An additional optimziation this PR includes is to avoid a unicode string -> utf8 byte conversion along with the associated
    string allocation by using JsonElement as the contract for deserializing JSON payload. At least in the Blazor Server scenario,
    we could further consider pooling the byte array used for receiving and deserializing the JSON payload making parsing browser events largely allocation free.

Fixes #33967

@ghost ghost added the area-blazor Includes: Blazor, Razor Components label Jul 1, 2021
@pranavkm pranavkm marked this pull request as ready for review July 1, 2021 20:17
@pranavkm pranavkm requested a review from a team as a code owner July 1, 2021 20:17
@@ -240,7 +241,7 @@ public async ValueTask<bool> ReceiveJSDataChunk(long streamId, long chunkId, byt
return await circuitHost.ReceiveJSDataChunk(streamId, chunkId, chunk, error);
}

public async ValueTask DispatchBrowserEvent(string eventDescriptor, string eventArgs)
public async ValueTask DispatchBrowserEvent(JsonElement eventDescriptor, JsonElement eventArgs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Careful with changing public hub methods. Old clients will still pass strings which would break. Maybe old clients aren't a concern for blazor apps though 🤷‍♀️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hub here is a implementation detail for Blazor and the only supported client is carried / served by this assembly. We don't really have scenarios with old clients to worry about.

@pranavkm pranavkm force-pushed the prkrishn/json branch 2 times, most recently from 7397e3a to 0c128fa Compare July 1, 2021 23:32
@captainsafia
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

{
eventArgs.Type = property.Value.GetString()!;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is O(NM) with the number of properties on the eventargs type * number of incoming values. I know that N is fixed, but this is still a lot of NameEquals calls with N=M=14 here. At what point do you think it would be worth amortizing that by creating a SortedDictionary<JsonEncodedText, Action<MouseEventArgs, JsonValue>> or similar up front? It's still O(log(N) * M) where N is fixed in the framework, so maybe it's not a big enough difference to complicate our lives with.

Separately, a hostile user could send an event payload like {"a":0,"b":0,"c":0,..."zzz":0} and force a lot of string comparisons per event. I estimate about 57,000 string comparisons per event based on N=14 and M=~4000 if we allow a 32KB message and they use two-char property name combinations using chars in a-zA-Z0-9. Is it worth counting the number of properties we've considered, and throwing if it gets above (say) 50? That wouldn't make the code much more complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, would it be valid to throw if we see any property name that we didn't expect? Our JS client sends a known set, doesn't it (it doesn't just serialize raw JS Event objects, does it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, would it be valid to throw if we see any property name that we didn't expect?

We could. It'd be different to the current behavior (values without corresponding C# properties get ignored by the serializer) but that might be fine since it would indicate a bug or suggest that we could be sending less data from JS if we genuinely don't need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a throw for unknown properties to all of the readers doesn't cause any of our tests to fail, so perhaps that's a sufficient solution to limit this sort of attack vector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's a security concern here, there's a limit to the number of operations you can force and its not controlled by the end user. If we want to make the implementation more "scalable" that's fine, but I don't think we should do anything in the name of security.

There's only an issue when you can cause the work performed by the server to grow without growing your own work. (For example, if you have an input where you enter a number and the server calculates the factorial or fibonacci of that number as an example), but in general it's ok if you have to do more work to cause the server to do more work, even if the scale is different. At that point, its up to the server to establish the appropriate limits of what it considers acceptable.

In this case, even if this can be 57K string comparisons in the worst case, I think it'll be fast enough that it won't matter. It might even be the case, that this way is even faster than doing a hash and a lookup on a Dictionary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting a structure that would seek via a series of comparisons, not a hash, but in any case it sounds like we’re all settled on the pattern here with the sequence of Equals checks.

@@ -95,7 +95,9 @@ async function initializeConnection(options: CircuitStartOptions, logger: Logger
const connection = connectionBuilder.build();

setEventDispatcher((descriptor, args) => {
connection.send('DispatchBrowserEvent', JSON.stringify(descriptor), JSON.stringify(args));
const getJsonBytes = (value) => new TextEncoder().encode(JSON.stringify(value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A long time ago we added a polyfill for TextEncoder (the decoding part), which is still there now. At the time this was to help with legacy Edge support. Using TextEncoder directly here means taking an opinion that we don't need this any more.

Suggestion

Either:

  • We should remove our TextEncoder polyfill (Utf8Decoder.ts) - my preferred option
  • Or, we should also polyfill the encoder here.

No problem if we address this in a follow-up PR as there's nothing urgent about it and no reason why @pranavkm should be responsible for it more than anyone else.

Side-question

In 5.0 we announced that Blazor Server and WebAssembly would no longer be fully functional in Legacy Edge, but I don't see any clear statement about what fully functional means. Does it mean it works with polyfills? Or that it doesn't work at all under any circumstances? I don't know. @danroth27 Do you know what expectations we've been setting more recently with customers? Is there any reason to believe any customers are still reliant on polyfilling Legacy Edge support and are we saying anything about that in 6.0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We should remove our TextEncoder polyfill (Utf8Decoder.ts) - my preferred option

I agree with you, we should just remove this.

WRT to 5.0 our statement has never been clear, but I think it's fair to summarize it as "your mileage will vary" as in we think it would work but we don't guarantee/test that it does.

Copy link
Member

@SteveSandersonMS SteveSandersonMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks for taking the initiative on this, @pranavkm!

Approving, but before merging, could you please also consider including the suggestions from #34093? If you don't want to, I'm OK with turning that into a separate follow-up PR, but it would be cleaner if the two were merged as a single unit if you agree with those changes.

Comment on lines +53 to +63
else if (type == typeof(JsonElement))
{
var bytes = reader.ReadBytes();
if (bytes is null)
{
return default;
}

var jsonReader = new Utf8JsonReader(bytes.Value);
return JsonElement.ParseValue(ref jsonReader);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does reader.ReadBytes() allocate a new array? Is there a way we could avoid the byte array allocation all together there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could find a way to elide the allocation until we can parse the final event data (at which point we can't elide it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is allocating here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the JsonElement.Parse allocating? (I might be wrong)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know for sure, wondered if you did. JsonElement is a struct and I would have thought, theoretically at least, it could just be a way of walking through the JSON-formatted information in a ReadOnlySequence<byte> without having to allocate anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SteveSandersonMS doesn't it have to at least capture the memory before returning? Otherwise wouldn't we be in a use after free kind of situation? (We might already be, since we don't know what happens with the memory of the ReadOnlySequence after we've exited this method).

(I believe Tanay was running into issues while he was trying to avoid the allocation on byte array interop, so this might present a similar issue)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked a bit, and I think it only parses the data without allocating (not 100% confirmed, just skimmed through the implementation).

However, it might still be worth to avoid deserializing the JSON payload at this level and wait until we are inside the hub (by receiving a byte [] or similar on the signature instead) for two reasons:

  • What happens when we receive invalid JSON in this case (the error experience changes and we need to make sure we are ok with it, since we don't have a way here to know what circuit is crashing or potentially do appropriate cleanup/logging)
  • We can avoid the JsonElement step all together, start parsing the data with the reader APIs (equivalent as if we were writing a JsonConverter) and use higher level APIs when we know what event type we are parsing (if possible)). At that point, the error experience would be the same and we would make sure we aren't doing any unnecessary work.

Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically agree. Earlier today I wrote a comment here saying (approximately) "JsonElement is an unusually high-level thing to handle at the protocol level - should we just return byte data here and deserialize into JsonElement later in the process?" but then decided it wasn't important enough to bother - it's all internal so we can change it any time if we want.

I understand Pranav wanted to use JsonElement rather than pure Utf8JsonReader because it significantly simplifies the parsing process. For example, you can directly find out the size of arrays up front rather than having to do things like capture into List<T> then do .ToArray() on them. I don't have much of a reason to regard that as problematic but am confident we could change it later if we wanted.

Maybe your point about the error experience would lead us to a different outcome - I'm not sure. But anyway this isn't something I feel strongly about because I think the perf characteristics would be overall the same and with it being internal it could be changed any time later.

}

webEventData = WebEventData.Parse(Renderer, _jsonContext, eventDescriptorJson, eventArgsJson);
webEventData = WebEventData.Parse(Renderer, JSRuntime.ReadJsonSerializerOptions(), eventDescriptorJson, eventArgsJson);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we could push this logic all the way down to the serializer level? (Keep the logic within WebEventData and avoid the intermediate JsonElement deserialization)

Or alternatively, defer any deserialization until we are out of the serializer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this in a follow up. I want to unblock the size increase issues for WASM for preview7

@@ -34,23 +35,38 @@ public static void NotifyLocationChanged(string uri, bool isInterceptedLink)
/// For framework use only.
/// </summary>
[JSInvokable(nameof(DispatchEvent))]
public static Task DispatchEvent(WebEventDescriptor eventDescriptor, string eventArgsJson)
public static async Task DispatchEvent(string eventDescriptorJson, string eventArgsJson)
Copy link
Member

@javiercn javiercn Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @SteveSandersonMS did this, but with Tanay's work, this can be a byte [] which cuts the size in half for the allocated memory here (since the string is UTF-16).

Copy link
Member

@javiercn javiercn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Just a couple of small comments/suggestions on further optimizations.

pranavkm and others added 6 commits July 6, 2021 09:25
* Adding source generators without doing all of the work results in a size increase. We can avoid issues with reflection
and the additional size hit by writing bespoke readers.
* An additional optimziation this PR includes is to avoid a unicode string -> utf8 byte conversion along with the associated
string allocation by using JsonElement as the contract for deserializing JSON payload. At least in the Blazor Server scenario,
we could further consider pooling the byte array used for receiving and deserializing the JSON payload making parsing browser events
largely allocation free.
…teropMethods.cs

Co-authored-by: campersau <buchholz.bastian@googlemail.com>
* On WebAssembly, avoid double-JSON-encoding and extra UTF8 decoding step

* On Server, combine the two-per-event binary data messages into one
@pranavkm pranavkm merged commit 51dfc46 into main Jul 6, 2021
@pranavkm pranavkm deleted the prkrishn/json branch July 6, 2021 22:34
@ghost ghost added this to the 6.0-preview7 milestone Jul 6, 2021
public static WebEventData Parse(
Renderer renderer,
JsonSerializerOptions jsonSerializerOptions,
JsonElement eventDescriptorJson,
Copy link
Member

@eerhardt eerhardt Jul 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question - but why not base these on Utf8JsonReader instead of JsonElement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was one code path (among 3 that use this) that already constructed a JsonElement, so it was easier to use this than to do a bigger refactoring. there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the answer.

If we were really looking to squeeze out the perf, I wonder how much more could be gained to refactor it to straight Utf8JsonReader. But if we already have acceptable perf, it is probably not worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-blazor Includes: Blazor, Razor Components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[WASM] SoD Changes
7 participants