Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Developers using JsonSerializer can asynchronously (de)serialize IAsyncEnumerable<T> #1570

Closed
davidfowl opened this issue Jun 13, 2019 · 78 comments · Fixed by #50778
Closed
Assignees
Labels
api-approved area-System.Text.Json Cost:S Priority:1 User Story
Milestone

Comments

@davidfowl
Copy link
Member

@davidfowl davidfowl commented Jun 13, 2019

It would serialize to a JSON array but in a streaming manner. This would be very useful for things like MVC that want to support returning entity framework queries (which implement IAsyncEnumerable) to the response steam without buffering the entire enumeration first (which is what is currently being implemented https://github.com/aspnet/AspNetCore/pull/11118/files).

EDIT @eiriktsarpalis see #1570 (comment) for the full API proposal.

@ahsonkhan
Copy link
Member

@ahsonkhan ahsonkhan commented Jun 17, 2019

cc @steveharter, @pranavkm, @layomia

@rossbuggins
Copy link

@rossbuggins rossbuggins commented Aug 5, 2019

You guys might be able to point me to if my issue with quick code for this is HttpClient or MVC related? dotnet/aspnetcore#12883 (comment)

@ahsonkhan
Copy link
Member

@ahsonkhan ahsonkhan commented Sep 27, 2019

From @tangkhaiphuong in https://github.com/dotnet/corefx/issues/41378

I start the sample project with ASP.NET Core 3.0 API and implement the Route like

    [ApiController]
    [Route("[controller]")]
    public class WeatherForecastController : ControllerBase
    {
        private static readonly string[] Summaries = new[]
        {
            "Freezing", "Bracing", "Chilly", "Cool", "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
        };

        private readonly ILogger<WeatherForecastController> _logger;

        public WeatherForecastController(ILogger<WeatherForecastController> logger)
        {
            _logger = logger;
        }

        [HttpGet]
        public async IAsyncEnumerable<WeatherForecast> Get()
        {
            var rng = new Random();
            for (var index = 0; index < 10; ++index)
            {
                yield return new WeatherForecast
                {
                    Date = DateTime.Now.AddDays(index),
                    TemperatureC = rng.Next(-20, 55),
                    Summary = Summaries[rng.Next(Summaries.Length)]
                };
                await Task.Delay(2000).ConfigureAwait(false);
            }
        }
    }

The hit the F5 in VS 2019. On browser http://localhost:53237/weatherforecast I need to wait after the 20s then receive the full JSON array. I found on headers response include Content-Length:
I expect ASP.NET Core should return in-stream JSON item continuously instead wait to full buffer then return the whole large JSON.

Is there any configuration to enable to allow stream data chunk to the client immediately without blocking and wait?

@rossbuggins
Copy link

@rossbuggins rossbuggins commented Sep 27, 2019

Hi,

Can you try on Linux and we’ll as windows, I’ve got an issue open where I see this on windows but it streams on Linux as the host,

Cheers,

Ross

@steveharter
Copy link
Member

@steveharter steveharter commented Sep 27, 2019

For IAsyncEnumerable<T> I recently created this issue which appears to be a duplicate: https://github.com/dotnet/corefx/issues/41358

@steveharter steveharter self-assigned this Sep 27, 2019
@layomia
Copy link
Member

@layomia layomia commented Nov 23, 2019

@steveharter #1569 has just deserialization APIs - might be good to consolidate these two issues.

@ericstj ericstj transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged label Jan 9, 2020
@ericstj ericstj added this to To do in System.Text.Json - 6.0 Jan 9, 2020
@ericstj ericstj added area-System.Text.Json and removed untriaged labels Jan 9, 2020
@ericstj ericstj added this to the 5.0 milestone Jan 9, 2020
@JohnGalt1717
Copy link

@JohnGalt1717 JohnGalt1717 commented Jan 24, 2020

I would love to see this. Major perf enhancement for both memory and time to first data to clients (and clients could handle the stream instead if they wanted and parse the stream in json immediately instead of waiting for the entire response which has a perf benefit too.)

The key is to make sure that the entire MVC pipeline can handle the IAsyncEnumerable so that if you return ActionResult it will stream it to the client.

And it should also work on a wrapper object. So if I have:

public class BaseResponse {
public IAsyncEnumerable Result {get; set;}
}

It should serialize this over the wire in a streaming manner to the clients without ever loading the entire thing into memory when I return ActionResult and return it a base response with an IAsyncEnumerable in it. I.e. MVC needs to stream it instead of just dumping the string produced by the serializer, and the serializer needs to stream the object as it goes and handle the sub IAsyncEnumerable properly and stream the array out properly.

To me that's the acceptance test on this one. Stream it out as json incrementally in both those cases, and the client can either wait for the entire string and jsonify it, or grab the stream and stream it using a json reader to deserialize.

@layomia
Copy link
Member

@layomia layomia commented Apr 14, 2020

From @steveharter in #1569:

This is a placeholder to uptake the recent IAsyncEnumerable feature for the serializer.

This includes creating an enumerator class that that implements IAsyncEnumerable and accepts a Stream in its constructor (and others TBD) that contains the actual JSON which can be accessed asynchronously:

public class SerializerAsyncEnumerable<T> : IAsyncEnumerable<T>
{
    public SerializerAsyncEnumerable(Stream stream);
    public SerializerAsyncEnumerable(Stream stream, JsonSerializerOptions options);
...
}

and also likely exposed from the JsonSerializer static class:

public static class JsonSerializer
{
...
    public SerializerAsyncEnumerable<T> DeserializeEnumerableAsync(Stream stream);
    public SerializerAsyncEnumerable<T> DeserializeEnumerableAsync(Stream stream, JsonSerializerOptions options);
...
}

The implementation will leverage the same patterns as in the existing async methods which preserve the unprocessed data from the stream (since the stream just returns raw JSON which won't align with object instances) and the "state" for continuing object deserialization as new JSON is obtained.

@layomia layomia changed the title Add support for asynchronously serializing IAsyncEnumerable<T> Add support for asynchronously (de)serializing IAsyncEnumerable<T> Apr 14, 2020
@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented May 23, 2020

Not saying this isn't a desirable feature, but doesn't IEnumerable<T> already stream elements to the client? The difference between IEnumerable<T> and IAsyncEnumerable<T> is that each time it retrieves the next element in the enumeration, the non-async version blocks the thread, while the async version can release the thread during blocking operations. But, this blocking only applies while retrieving that one element. So, it's still better to be async, but if you serialize an IEnumerable<T>, you'll get elements being written to the body stream as they come out of the enumeration, in realtime. The real problem is that the body stream has some buffering on it, and you need to flush it periodically, otherwise you'll only receive data in big chunks each time the buffer fills up.

This tiny test app shows that JsonSerializer will stream an IEnumerable just fine -- the IEnumerable in question, produced by StreamSomeResults, has no end, so if JsonSerializer were expecting to buffer it, it would never produce any output. The internals of JsonSerializer dynamically grow the buffer, so the output is a bit choppy, but the fact that you're seeing it means that the data is being streamed. :-)

using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

namespace TestJsonSerializerStreamEnumerable
{
	class Program
	{
		static IEnumerable<string> StreamSomeResults()
		{
			while (true)
			{
				yield return Guid.NewGuid().ToString();
				Thread.Sleep(50);
			}
		}

		static async Task Main(string[] args)
		{
			var serializerOptions = new JsonSerializerOptions();

			serializerOptions.DefaultBufferSize = 1;

			var stdout = Console.OpenStandardOutput(bufferSize: 1);

			await JsonSerializer.SerializeAsync(stdout, StreamSomeResults(), serializerOptions);
		}
	}
}

So, yes, support for IAsyncEnumerable is a good thing, but the original post's premise seems to be flawed. The specific example listed, of streaming Entity Framework query results, could be implemented in the interim with a simple adapter that wraps an IAsyncEnumerable and waits synchronously on its results:

// Untested code
public class AsyncEnumerableSyncWrapper<T> : IEnumerable
{
  IAsyncEnumerable<T> _wrapped;

  public AsyncEnumerableSyncWrapper(T toWrap) { _wrapped = toWrap; }

  public IEnumerator<T> GetEnumerator() => new AsyncEnumeratorSyncWrapper<T>(_wrapped.GetAsyncEnumerator().GetAwaiter().GetResult());
}

public class AsyncEnumeratorSyncWrapper<T> : IEnumerator, IDisposable
{
  IAsyncEnumerator<T> _wrapped;

  public AsyncEnumeratorSyncWrapper(T toWrap) { _wrapped = toWrap; }

  public T Current => _wrapped.Current;
  public bool MoveNext() => _wrapped.MoveNextAsync().GetAwaiter().GetResult();
  public void Dispose() => _wrapped.DisposeAsync().GetAwaiter().GetResult();
}

@JohnGalt1717
Copy link

@JohnGalt1717 JohnGalt1717 commented May 23, 2020

The purpose of this is the following:

Database query is executed, mapped to DTOs as part of the select, and then sent to the client.

Right now, the entire recordset is loaded into memory and returned as an ienumerable that is then sent on to the client in whatever serialization format you request, and the client may or may not stream that.

In this scenario, that database query is returned as an IAsyncEnumerable instead, and then streams record by record that data into the serializer which is then streamed out of the HTTP response.

In the later, the response starts sooner, and only the memory necessary for the single record is required at any given time instead of the entire recordset.

Notice how much more efficient and responsive the latter is versus the former?

That's what this is about. Streaming data from a source, to the client, without having to load the entire source into memory, which isn't possible right now with asp.net core except for files. Using IAsyncEnumerable and ensuring that the entire serialization and response pipeline supports it without caching in memory solves this limitation and in real world scenarios means a HUGE perf improvement on your most expensive operations.

(and since almost all REST calls are read in most real world scenarios, this has a huge impact on scalability and responsiveness for most APIs)

@onionhammer
Copy link

@onionhammer onionhammer commented May 23, 2020

@JohnGalt1717 not to mention it throws after 8192 records (IIRC)

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented May 23, 2020

That's all true but missing my point. It's not buffering and blowing up because it's IEnumerable and not IAsyncEnumerable. The async vs non-async only applies to how individual records are retrieved from the enumerator. System.Text.Json absolutely should learn how to work with IAsyncEnumerable, but it doesn't need to be async in order to not buffer. The thing that's buffering is ASP.NET Core's route handling, not System.Text.Json and not because of whether the enumerable is async or not.

This issue's description is:

It would serialize to a JSON array but in a streaming manner. This would be very useful for things like MVC that want to support returning entity framework queries (which implement IAsyncEnumerable) to the response steam without buffering the entire enumeration first (which is what is currently being implemented https://github.com/aspnet/AspNetCore/pull/11118/files).

The premise of this description is that JsonSerializer cannot stream serialized results if they're IEnumerable, and that it needs to be IAsyncEnumerable to make this work, but this premise is wrong. The buffering isn't happening in the serializer at all, and JsonSerializer is perfectly capable of streaming individual items from an IEnumerable.

There is a problem, though, in that JsonSerializer.Serialize, when passed an IAsyncResult, simply writes {} to the output stream.

This code illustrates this:

public interface IDataSource
{
  IEnumerable<string> GetData();
  IAsyncEnumerable<string> GetAsyncData();
}

[Route("/v1")]
public class TestController
{
  IDataSource _dataSource;

  [HttpGet("implicit")]
  public IEnumerable<string> ThisWillBufferTheEntireResultSet()
    => _dataSource.GetData();

  [HttpGet("broken")]
  public IAsyncEnumerable<string> ThisWillNotSerializeCorrectly()
    => _dataSource.GetAsyncData();

  [HttpGet("explicit")]
  public EnumerableDataResult ThisWillReturnDataAsItBecomesAvailableEvenThoughItIsNotAsync()
    => new EnumerableDataResult(_dataSource.GetData());

  [HttpGet("streaming")]
  public UnbufferedEnumerableDataResult ThisWillReturnDataItemsImmediatelyOneByOneWithNoBuffering()
    => new UnbufferedEnumerableDataResult(_dataSource.GetData());
}

class EnumerableDataResult : IActionResult
{
  IEnumerable<string> _data;
  public EnumerableDataResult(IEnumerable<string> data) { _data = data; }

  public async Task InvokeAsync(ActionContext context)
  {
    await JsonSerializer.SerializeAsync(context.HttpContext.Response.Body, _data, context.HttpContext.RequestAborted); // NB: last parameter here is important!
  }
}

class UnbufferedEnumerableDataResult : IActionResult
{
  IEnumerable<string> _data;
  public UnbufferedEnumerableDataResult (IEnumerable<string> data) { _data = data; }

  static readonly byte[] JSONArrayStart = new byte[] { (byte)'[' };
  static readonly byte[] JSONArraySeparator = new byte[] { (byte)',' };
  static readonly byte[] JSONArrayEnd = new byte[] { (byte)']' };

  public async Task InvokeAsync(ActionContext context)
  {
    // NB: uses of context.HttpContext.RequestAborted here are important!
    await context.HttpContext.Response.Body.WriteAsync(JSONArrayStart, context.HttpContext.RequestAborted);
  
    foreach (string item in _data)
    {
      await JsonSerializer.SerializeAsync(context.HttpContext.Response.Body, item, cancellationToken: context.HttpContext.RequestAborted);
      await context.HttpContext.Response.Body.WriteAsync(JSONArraySeparator, context.HttpContext.RequestAborted);

      await context.HttpContext.Response.Body.FlushAsync(context.HttpContext.RequestAborted);
    }

    await context.HttpContext.Response.Body.WriteAsync(JSONArrayEnd, context.HttpContext.RequestAborted);
  }
}

@JohnGalt1717
Copy link

@JohnGalt1717 JohnGalt1717 commented May 23, 2020

I don't believe that's actually true. I've tried it, and if I return an IAsyncEnumerable from EF Core on the mvc endpoint, it loads the entire thing in memory and then streams it to the client via the serializer.

If I have this:

public async Task<IAsyncEnumerable> SomeMethod() {
return someEfCoreResult.AsAsyncEnumerable();
}

It should stream record by record through the pipeline as a streamed serialized result. It absolutely doesn't do anything of the sort right now, the serializer gets the ENTIRE resultset in memory and then serializes.

That's what this ticket is partially about. The other part is making the entire pipeline non-blocking so that it runs in it's own thread the entire way from ef core query to client receiving the data in a stream.

And it should do so without me having to jump through hoops in the action result method and do so based on the accepts header.

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented May 23, 2020

Oh? I tried passing IAsyncEnumerable into JsonSerializer and all it ever gives back is {}. Maybe ASP.NET Core has special handling in between, though, in the way that it is buffering the response. Again, the problem is ASP.NET Core buffering the response, not the serialization infrastructure, and whether it is async or not is irrelevant.

Making something async doesn't make it run in its own thread. It is in fact exactly the opposite. Without async, the only pattern that is really available for concurrent execution is to dedicate a thread to the task. With async, the task is divorced from threads, so that it acquires a thread pool thread for each little bit of work it needs to do, immediately releasing it when the thread is blocking, and depending on the kernel to know how to schedule the continuation to execute when the blocking operation unblocks.

An async method that is forced to retrieve its data from a non-async IEnumerable will have the following characteristics:

  • It operates asynchronously whenever it is writing data.
  • When it is obtaining data from the data source, for each individual record, the thread obtained for that chunk of processing will block if necessary before returning.
  • No resource is hoarded for the duration of the entire enumeration process.

It is true that no blocking at all would be even better, and IAsyncEnumerable would mean that for each of those individual record retrievals, calls into EF that ended up blocking would temporarily release the thread to the thread pool.

But it is categorically not the case that using IEnumerable intrinsically means a thread will be tied
up for the entire duration. The problem arises specifically when you pass the IEnumerable (or IAsyncEnumerable) back up to the ASP.NET Core infrastructure and let it handle it, because that is the point that buffers the entire result set before it starts sending anything.

I don't know the internals at all, so this could be entirely wrong, but my speculation is that making the ASP.NET Core controller infrastructure stream its results out instead of buffering them would be a major refactoring and not a simple change at all, requiring that control be inverted across different layers of the flow. Without that change, it is largely irrelevant whether IEnumerable or IAsyncEnumerable is used, because ASP.NET Core is going to transfer the entire enumeration into its own buffer before passing it on.

But what wouldn't be a major refactoring would be to have JsonSerializer support IAsyncEnumerable.

As I understand it, the example I showed with IActionResults is the canonical way to stream results from an ASP.NET Core controller.

@davidfowl
Copy link
Member Author

@davidfowl davidfowl commented May 23, 2020

@logiclrd is right that there’s 2 types of “async” happening here:

  1. Asynchronous IO (writing the data async)
  2. Producing each item asynchronously

ASP.NET is currently buffering to avoid having to implement custom array serialization, that should be in the JSON serializer (hence this issue).

I also agree that for EF it’s not as big a deal if results are produced “quickly” enough then there’s blocking only as each item is being produced not written to the response.

That’s not great for then general case of streaming though as it’ll block a thread while waiting for the next result to be produced.

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented May 23, 2020

I don't think ASP.NET is buffering only because of System.Text.Json not supporting IAsyncEnumerable, though. I will double-check this, but I'm pretty sure it has been my observation that ASP.NET also buffers the entire set when you return a plain IEnumerable from a controller.

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented May 23, 2020

Okay, I tested it and I was wrong. In my earlier testing, I just didn't wait long enough for it to fill the buffer. The test had a delay between objects returned, and that delay meant the data rate was low enough that it could run for quite some time without producing any output.

There is in fact a serious bug, though: If you return an IEnumerable that produces data indefinitely from your controller, then it appears that ASP.NET Core never stops reading from it. This is due to the fact that ASP.NET Core suppresses all write errors. I believe the fix for this is to supply HttpContext.RequestAborted as the cancellationToken parameter for JsonSerializer.SerializeAsync.

So, maybe the solution in https://github.com/aspnet/AspNetCore/pull/11118/files is wrong. If it used the adapter I showed in my first comment on this issue, then it could pull records from an IAsyncEnumerable<T> one at a time, while serializing. The adapter could be passed into JsonSerializer, which knows what to do with any IEnumerable<T>. This would effectively work around the fact that JsonSerializer doesn't know how to handle IAsyncEnumerable<T>, and could be torn out if/when System.Text.Json gets fixed upstream.

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented May 27, 2020

I initially arrived at this issue with ASP.NET Core-coloured glasses, and feel a bit foolish now, as it seems clear that this issue was in fact created for the upstream problem in System.Text.Json itself. The things I wrote were ultimately not technically wrong, but they were in response to misunderstanding the ticket's description in the first place.

I see that @steveharter self-assigned this issue back in November, but there haven't been further updates w.r.t. an implementation. I have done some experimentation locally and would like to consider submitting a PR to make System.Text.Json support serializing IAsyncEnumerable.

The preceding issue #1569 talks about deserializing IAsyncEnumerable values. The solution it proposes seems reasonable except for doing it out of a Stream. I can't think of a way to do it without buffering, because you could have a data type such as:

public class Ariannus
{
  public IAsyncEnumerable<Person> BlackAndWhite { get; set; }
  public IAsyncEnumerable<Person> WhiteAndBlack { get; set; }
}

In order to stream the deserialization of this, the underlying data stream would need to be in two places at once, because there's no way to know if the caller wants BlackAndWhite first or WhiteAndBlack first -- or indeed interleaves the two. The situation is further complicated by the possibility of nesting.

I can't think of anything better than to make a List<T> that implements IAsyncEnumerable and just returns every call with a completed task wrapping already-deserialized data.

@pranavkm
Copy link
Contributor

@pranavkm pranavkm commented Jun 9, 2020

@steveharter \ @layomia would you mind pinging me once this feature is in? We'll need to teach MVC to take advantage of this feature once it lands.

@steveharter
Copy link
Member

@steveharter steveharter commented Mar 22, 2021

@JohnGalt1717

And it should also work on a wrapper object. So if I have:

public class BaseResponse {
public IAsyncEnumerable Result {get; set;}
}

It should serialize this over the wire in a streaming manner to the clients without ever loading the entire thing into memory when I return ActionResult and return it a base response with an IAsyncEnumerable in it. I.e. MVC needs to stream it instead of just dumping the string produced by the serializer, and the serializer needs to stream the object as it goes and handle the sub IAsyncEnumerable properly and stream the array out properly.

To me that's the acceptance test on this one. Stream it out as json incrementally in both those cases, and the client can either wait for the entire string and jsonify it, or grab the stream and stream it using a json reader to deserialize.

Are you assuming only 1 nested level deep (BaseResponse.Result) or all levels?

Supporting all levels is possible, but I think the majority of users would be surprised that the serializer is "faulting in all delayed-loaded IAsyncEnumerable properties" which perhaps could write a huge amount of unnecessary and unexpected data to the Stream. In addition, it wouldn't round-trip as expected on the DeserializeAsync* methods.

If only 1 nested level is supported, that may be difficult to document and\or discover from a usability perspective.

If supporting IAsyncEnumerable serialization of nested levels is a valid scenario, I think it would be better to not do that by default and add a JsonSerializerOptions.DeepSerializeIAsyncEnumerable bool property that can be set to true that would enable that.

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 22, 2021

Following feedback from the latest API review, we have decided to pivot on the design: I've been prototyping a JsonConverter implementation for IAsyncEnumerable<T> which would make it possible to stream IAE values using the existing SerializeAsync method:

The design comes with a few caveats that are worth noting:

  • Only root-level IAsyncEnumerable values would be supported. While it is technically possible to serialize nested properties with a JsonConverter design, we want to minimize the risk of users accidentally streaming large amounts of data to the stream. Nested IAsyncEnumerable properties would throw NotSupportedException but as @steveharter mentioned we might consider exposing a JsonSerializerOptions.DeepSerializeIAsyncEnumerable setting for users wanting to opt in.
  • IAsyncEnumerable serialization is only supported via the SerializeAsync method. Invoking the converter via any of the sync Serialize methods will result in a NotSupportedException being thrown.
  • IAsyncEnumerable deserialization is not supported, and a NotSupportedException will be thrown. While we could try to support this scenario it would necessarily require us to buffer all elements before the deserialization is complete. We could revisit the design in the future, but ultimately a dedicated DeserializeAsyncEnumerable method as originally described would be the most appropriate solution to IAsyncEnumerable deserialization.
  • Because of the above constraints, only values of type IAsyncEnumerable<T> would be supported by the new converter. Types implementing IAsyncEnumerable<T> would still default to their current serialization semantics.

cc @davidfowl @pranavkm

@pranavkm
Copy link
Contributor

@pranavkm pranavkm commented Mar 22, 2021

Because of the above constraints, only values of type IAsyncEnumerable would be supported by the new converter. Types implementing IAsyncEnumerable would still default to their current serialization semantics.

Could you explain what the results would be for this type: https://github.com/dotnet/efcore/blob/809fe7a219c99fbe4f3576ef42bd084d4a6ce056/src/EFCore/Query/Internal/EntityQueryable%60.cs#L24-L26 if the call to Serialize looked like so JsonSerailizer.SerializeAsync(stream. typeof(IAsyncEnumerable<SomePoco>>)?

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented Mar 22, 2021

Off the top of my head:

  • Having a deserialization point to pair with serialization would be important, so even if it isn't part of the initial implementation, it would feel quite odd to me if ultimately only serialization supported IAsyncEnumerable. (But, yes, it would need to be its own top-level method.)
  • If a type implements IAsyncEnumerable<T> and adds additional properties, then there is a clear conflict, because you can't serialize { those properties } and [ the IAsyncEnumerable elements ] to the same output stream. But, if someone has implemented IAsyncEnumerable just to provide a specific implementation of IAsyncEnumerable, is there perhaps an argument for providing some mechanism for allowing the IAsyncEnumerable behaviour? Maybe detect it automatically, similarly to nested IAsyncEnumerable values, so that if a type is encountered that implements IAsyncEnumerable, and it has members that aren't compatible with treating the entire object as a JSON array, then it throws NotSupportedException?
  • Technically, throwing NotSupportedException from non-async Serialize is a breaking change -- what if there are people whose codebases currently depend on serializing an IAsyncEnumerable emitting {}? Should this be configurable behaviour, or is it okay to just say, "Here is a breaking change that comes with this upgrade"?

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented Mar 22, 2021

@pranavkm My analysis is that that type would require special handling, because from @eiriktsarpalis's description, it would simply not be supported at all, and even with my proposal, the presence of members such as ElementType and Expression would conflict with the object being represented as an array. What happens today when that object is serialized?

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 22, 2021

Could you explain what the results would be for this type: https://github.com/dotnet/efcore/blob/809fe7a219c99fbe4f3576ef42bd084d4a6ce056/src/EFCore/Query/Internal/EntityQueryable%60.cs#L24-L26 if the call to Serialize looked like so JsonSerailizer.SerializeAsync(stream. typeof(IAsyncEnumerable>)?

If the inputType is set to typeof(IAsyncEnumerable<SomePoco>) then it would serialize the async enumeration, however if the inputType was set to typeof(EntityQueryable<TResult>) it would revert to whatever converter the serializer is currently using for that class. This is contrary to how we handle other collections (and one might argue in violation of Liskov's substitution principle), however I'm proposing this approach to mitigate accidentally streaming large amounts of data just because one POCO happens to implement IAsyncEnumerable<T>.

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 22, 2021

Technically, throwing NotSupportedException from non-async Serialize is a breaking change -- what if there are people whose codebases currently depend on serializing an IAsyncEnumerable emitting {}? Should this be configurable behaviour, or is it okay to just say, "Here is a breaking change that comes with this upgrade"?

I tend to agree, one potential mitigation might be to have users opt into any IAsyncEnumerable serialization using a flag in JsonSerializerOptions.

@pranavkm
Copy link
Contributor

@pranavkm pranavkm commented Mar 22, 2021

I tend to agree, one potential mitigation might be to have users opt into any IAsyncEnumerable serialization using a flag in JsonSerializerOptions.

Alternatively requiring the user to register the JsonConverter. Either option seems like a better way to mitigate the risk of accidentally streaming too much content while making the API feel consistent.

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 24, 2021

Following feedback I have updated the prototype so that the IAsyncEnumerable converter becomes a public class users can opt into. This can be done either by registering the converter in a JsonSerializerOptions instance:

var options = new JsonSerializerOptions { Converters = { new JsonAsyncEnumerableConverter() } };

or by annotating relevant properties:

public class AsyncEnumerableDto<TElement>
{
    [JsonConverter(typeof(JsonAsyncEnumerableConverter))]
    public IAsyncEnumerable<TElement> Data { get; set; }
}
  • Users are required to opt into IAsyncEnumerable serialization, removing potential for breaking changes or accidental streaming of data.
  • Using the converter in sync serialization will result in a NotSupportedException being thrown.
  • Using the converter in deserialization will result in a NotSupportedException being thrown. In the future we might expose an optional flag that supports deserialization by buffering all data in a dummy IAsyncEnumerable<T> implementation.
  • The converter can be applied to any type that implements IAsyncEnumerable<T>.

Proposed API

namespace System.Text.Json.Serialization
{
    public class JsonAsyncEnumerableConverter : JsonConverterFactory
    {
        public JsonAsyncEnumerableConverter() { }
    }
}

Optional DeserializeAsyncEnumerable method:

namespace System.Text.Json
{
    public static class JsonSerializer
    {
        public static IAsyncEnumerable<TValue?> DeserializeAsyncEnumerable<TValue>(Stream utf8Json, JsonSerializerOptions? options = null) { throw null; }
    }
}

@pranavkm
Copy link
Contributor

@pranavkm pranavkm commented Mar 24, 2021

This looks great. MVC will enable this converter by default and we'll document how you could undo it if you absolutely need to, but this should cover all of our 6.0 scenarios.

In the future we might expose an optional flag that supports deserialization by buffering all data in a dummy IAsyncEnumerable implementation.

Sounds practical.

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 24, 2021

MVC will enable this converter by default and we'll document how you could undo it if you absolutely need to

Enabling it by default sounds risky to me. Users might have DTOs that unbeknownst to them implement IAsyncEnumerable in strange ways (e.g. because some base class is doing it). Wouldn't it make more sense to use a dedicated ActionResult type for streaming IAsyncEnumerables?

In the future we might expose an optional flag that supports deserialization by buffering all data in a dummy IAsyncEnumerable implementation.

Sounds practical.

It can be convenient, but there are gotchas involved. Really it would just be a List<T> that trivially implements IAsyncEnumerable<T>, which is likely to contradict user expectations and lead to increased memory usage.

@davidfowl
Copy link
Member Author

@davidfowl davidfowl commented Mar 24, 2021

It can be convenient, but there are gotchas involved. Really it would just be a List that trivially implements IAsyncEnumerable, which is likely to contradict user expectations and lead to increased memory usage.

We have this problem, today so enabling it by default would be an improvement 😄

@logiclrd
Copy link
Contributor

@logiclrd logiclrd commented Mar 24, 2021

Enabling it by default sounds risky to me. Users might have DTOs that unbeknownst to them implement IAsyncEnumerable in strange ways (e.g. because some base class is doing it).

Doing it across a major version change somewhat makes it more "okay" for this breakage to occur, and it then illuminates this weirdness in the consuming codebases so that it can be sorted out properly. :-)

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 26, 2021

I've updated the prototype to include optional support for deserialization.

Usage Examples

Users can opt into IAsyncEnumerable serialization by registering a new converter factory:

var options = new JsonSerializerOptions { Converters = { new JsonAsyncEnumerableConverter() } };

or by annotating relevant properties:

public class AsyncEnumerableDto<TElement>
{
    [JsonConverter(typeof(JsonAsyncEnumerableConverter))]
    public IAsyncEnumerable<TElement> Data { get; set; }
}

Serialization Example

async IAsyncEnumerable<int> PrintNumbers(int n)
{
  for (int i = 0; i < n; i++) yield return i;
}

Stream stream = Console.OpenStandardOutput();
await JsonSerializer.SerializeAsync(new AsyncEnumerableDto { Data = PrintNumbers(5) }, options); // prints { "Data" : [0,1,2,3,4] }

Deserialization Example

Streaming IAsyncEnumerable deserialization can only happen using the dedicated DeserializeAsyncEnumerable method, handling only root-level json arrays:

Stream utf8Json = new MemoryStream(Encoding.UTF8.GetBytes(@"[ { ""value"" : 0 }, { ""value"" : 1 } ]"));
await foreach(MyPoco item in JsonSerializer.DeserializeAsyncEnumerable<MyPoco>(stream, options))
{
    count += item.Value;
}

Additional Information

  • Users are required to opt into IAsyncEnumerable serialization, removing potential for breaking changes or accidental streaming of data.
  • Using the converter in sync serialization will result in a NotSupportedException being thrown.
  • Unless the optional supportDeserialization flag is enabled, deserialization will result in a NotSupportedException being thrown. If enabled, it will buffer all data in a dummy IAsyncEnumerable<T> implementation that wraps a List<T>.
  • The converter can be applied to any type that implements IAsyncEnumerable<T>.

Proposed API

namespace System.Text.Json.Serialization
{
    public class JsonAsyncEnumerableConverter : JsonConverterFactory
    {
        public JsonAsyncEnumerableConverter() { }
        public JsonAsyncEnumerableConverter(bool supportDeserialization) { }
        public bool SupportsDeserialization { get; }
    }
}

Optional DeserializeAsyncEnumerable method:

namespace System.Text.Json
{
    public static class JsonSerializer
    {
        public static IAsyncEnumerable<TValue?> DeserializeAsyncEnumerable<TValue>(Stream utf8Json, JsonSerializerOptions? options = null) { throw null; }
    }
}

@eiriktsarpalis eiriktsarpalis added api-ready-for-review blocking and removed api-needs-work labels Mar 26, 2021
@terrajobst
Copy link
Member

@terrajobst terrajobst commented Apr 1, 2021

  • We don't believe we need any opt-in or opt-out
namespace System.Text.Json
{
    public partial class JsonSerializer
    {
        public static IAsyncEnumerable<TValue?> DeserializeAsyncEnumerable<TValue>(Stream utf8Json, JsonSerializerOptions? options = null);
    }
}

@terrajobst terrajobst added api-approved and removed api-ready-for-review blocking labels Apr 1, 2021
@pranavkm
Copy link
Contributor

@pranavkm pranavkm commented Apr 1, 2021

Would the serialization requiring opting in?

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Apr 1, 2021

Would the serialization requiring opting in?

Both serialization and buffered deserialization would be turned on by default. Note that the value would need to be explicitly of type IAsyncEnumerable<T>.

@msftbot msftbot bot added the in-pr label Apr 6, 2021
@msftbot msftbot bot removed the in-pr label Apr 12, 2021
@msftbot msftbot bot locked as resolved and limited conversation to collaborators May 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved area-System.Text.Json Cost:S Priority:1 User Story
Projects
None yet
Development

Successfully merging a pull request may close this issue.