IAsyncEnumerable 2 times slower to enumerate than a regular IEnumerable? #1560

Closed
kikaragyozov opened this issue Aug 5, 2020 · 8 comments

@kikaragyozov

kikaragyozov commented Aug 5, 2020

What the hell?

I understand that you leverage the async/await flow, which, done correctly, means no thread is ever blocked, but I'd gladly block a thread for the IO if it meant up to a 2x speedup.

What is happening in the lower levels? You can easily test this.

Just call CsvReader.GetRecordsAsync<YourObject>() and enumerate, versus CsvReader.GetRecords<YourObject>() and enumerate.

I feel like there's some fake async going on, because there's no way the overhead of thread switching/context switching slows it down this much.

Tested in a console application, running the work on a thread-pool thread while the main console thread blocked waiting for it to finish.

@kikaragyozov kikaragyozov changed the title IAsyncEnumerable 5 to 10 times slower to enumerate than a regular IEnumerable? IAsyncEnumerable 2 times slower to enumerate than a regular IEnumerable? Aug 5, 2020
@JoshClose
Owner

Does increasing the buffer size make a significant difference? The only place async happens is reading from the Stream into the buffer.

context.CharsRead = await context.Reader.ReadAsync(context.Buffer, context.BufferPosition, context.ParserConfiguration.BufferSize).ConfigureAwait(false);
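To see why buffer size matters for the line above, here is a minimal sketch that counts how many read calls (and, in the async case, how many awaits) it takes to drain a reader. `BufferReadDemo`, `DrainSync`, and `DrainAsync` are hypothetical names for illustration and are not part of CsvHelper; only the `TextReader.Read`/`ReadAsync` calls are real framework APIs.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not CsvHelper code: drains a reader in fixed-size
// chunks and reports how many read calls that took.
public static class BufferReadDemo
{
    public static (long Chars, int Reads) DrainSync(TextReader reader, int bufferSize)
    {
        var buffer = new char[bufferSize];
        long chars = 0;
        int reads = 0, n;
        // Read returns 0 once the reader is exhausted.
        while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            chars += n;
            reads++;
        }
        return (chars, reads);
    }

    public static async Task<(long Chars, int Reads)> DrainAsync(TextReader reader, int bufferSize)
    {
        var buffer = new char[bufferSize];
        long chars = 0;
        int reads = 0, n;
        // Each iteration is one await; a smaller buffer means more awaits.
        while ((n = await reader.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            chars += n;
            reads++;
        }
        return (chars, reads);
    }
}
```

Wrapping each call in a `Stopwatch` with a few different buffer sizes should show whether the number of awaits, rather than the IO itself, dominates on a given machine.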

@kikaragyozov
Author

kikaragyozov commented Aug 6, 2020

@JoshClose setting the buffer size to a higher value significantly increased the speed.

Reading 410,000 lines of CSV took ~7 seconds synchronously, and asynchronously it now takes only ~9 seconds. This seems about right. Thoughts?

EDIT: It seems that for every 410,000 lines read on my machine, async IO adds about 2 seconds of overhead to the total. No matter how high I set the buffer size, I can't bring it down to the speed of the sync IO. Perhaps this IS the overhead of 410,000 thread switches and context switches (if any).

TL;DR: Extrapolating linearly, if I had 5 million lines in a CSV, synchronous IO would complete about 24.39 seconds faster than asynchronous IO.
If I had 20 million lines in a CSV, synchronous IO would complete about 97.56 seconds faster than asynchronous IO.
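The extrapolation above can be written out as a small sketch. It assumes the roughly 2 seconds per 410,000 lines measured in this comment scales linearly with line count, which nothing in this thread verifies; `OverheadEstimate` and `ExtraSeconds` are hypothetical names.

```csharp
public static class OverheadEstimate
{
    // Figures taken from the comment above: about 2 s extra per 410,000 lines.
    private const double MeasuredOverheadSeconds = 2.0;
    private const double MeasuredLines = 410_000;

    // Assumes the overhead grows linearly with the number of lines.
    public static double ExtraSeconds(double lines) =>
        lines / MeasuredLines * MeasuredOverheadSeconds;
}
```

`OverheadEstimate.ExtraSeconds(5_000_000)` comes to roughly 24.39 seconds and `OverheadEstimate.ExtraSeconds(20_000_000)` to roughly 97.56 seconds.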

@TonyValenti

@spiritbob I would suggest that this is expected behavior. Async methods are not designed to make things faster; they're designed to keep threads from blocking, and that carries additional overhead.

Based on what you've listed, my bet is that you're reading data from a local file, which is likely not a good use case for an async method. Async reading would be better when you're reading data from a network stream or other remote data source where delays and lags are to be expected. That would make better use of the thread pool.
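To get a feel for the per-item cost described above without any real IO, here is a rough sketch that enumerates the same sequence synchronously and through `IAsyncEnumerable`. It needs C# 8 or later; `EnumerationDemo` and its members are hypothetical names, and the async iterator never actually waits, so any time difference measured around it reflects only the async machinery, not CsvHelper.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class EnumerationDemo
{
    public static IEnumerable<int> Numbers(int count)
    {
        for (var i = 0; i < count; i++) yield return i;
    }

    public static async IAsyncEnumerable<int> NumbersAsync(int count)
    {
        for (var i = 0; i < count; i++)
        {
            // Already-completed task: no thread switch, no real IO.
            await Task.CompletedTask;
            yield return i;
        }
    }

    public static long Sum(int count)
    {
        long total = 0;
        foreach (var n in Numbers(count)) total += n;
        return total;
    }

    public static async Task<long> SumAsync(int count)
    {
        long total = 0;
        await foreach (var n in NumbersAsync(count)) total += n;
        return total;
    }
}
```

Timing `Sum` and `SumAsync` with a `Stopwatch` over a few hundred thousand items gives a machine-specific baseline for the enumeration overhead alone.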

@kikaragyozov
Author

@TonyValenti I'm reading an IFormFile, but I don't think it's possible to directly read it over the network. If the file is too large, I think Microsoft suggests storing it in a MemoryStream, rather than the disk.

I agree that this is the expected behavior.

@joefeser

joefeser commented Nov 3, 2020

@spiritbob even though it's an IFormFile, it can be backed by any stream: network, file, and so forth. Are you reading from disk, or are you receiving this as an HTTP request? I am going to guess there is no buffer backing it.

I would test this by using a BufferedStream and setting the buffer size to at least 64-128 KB. We drastically sped up our app, which was performing a lot of reads from a network share.
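A minimal sketch of that suggestion, assuming the source is any `Stream` (for an upload this could be whatever stream the framework hands you). `BufferedReaderFactory` and `Open` are hypothetical names, and the 128 KB default is only the upper end of the range mentioned above, not a measured optimum.

```csharp
using System.IO;
using System.Text;

public static class BufferedReaderFactory
{
    // Wraps the source in a BufferedStream so small reads are served from
    // memory instead of each one hitting the underlying stream.
    public static StreamReader Open(Stream source, int bufferSize = 128 * 1024)
    {
        var buffered = new BufferedStream(source, bufferSize);
        // Disposing the reader also disposes the buffered and source streams.
        return new StreamReader(buffered, Encoding.UTF8);
    }
}
```

The returned reader can be passed wherever a `TextReader` is accepted. Whether it helps depends on the source: a stream that is already memory-backed is unlikely to gain much.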

Do you have sample code that you can share?

@kikaragyozov
Author

@joefeser it's a simple HTTP request in ASP.NET Core 3.1. I believe that if the file is below a certain size it's kept in memory, and otherwise it's stored on disk? I forget the exact numbers; feel free to enlighten me.

Was your approach applied to that framework? If so, how?

@joefeser

joefeser commented Nov 3, 2020

@spiritbob Network streams should never be used for performance benchmarks. There is no telling how many packet analyzers are in the stream. Especially on a corporate network.

@kikaragyozov
Author

@joefeser sorry, if you were referring to my actual testing environment: I believe I read the file from disk, but my practical use case is ASP.NET Core.
