IAsyncEnumerable 2 times slower to enumerate than a regular IEnumerable? #1560

kikaragyozov · 2020-08-05T13:31:28Z

What the hell?

I get that you leverage the async-await flow, which means that, if done correctly, no thread is ever blocked, but I'd gladly block a thread to do the IO if it meant up to a 2x increase in speed.

What is happening in the lower levels? You can easily test this.

Just call CsvReader.GetRecordsAsync<YourObject>() and enumerate, versus CsvReader.GetRecords<YourObject>() and enumerate.

I feel like there's some fake-async going on, because there's no way this gets slowed down this much by the overhead of thread switching/context switching.

Tested in a console application, running the work on a thread pool thread (with the main console thread blocked, waiting for it to finish).

JoshClose · 2020-08-05T14:10:47Z

Does increasing the buffer size significantly make a difference? The only place async happens is reading from the Stream into the buffer.

CsvHelper/src/CsvHelper/CsvFieldReader.cs, line 117 in 8185f5d:

    context.CharsRead = await context.Reader.ReadAsync(context.Buffer, context.BufferPosition, context.ParserConfiguration.BufferSize).ConfigureAwait(false);
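
As a rough illustration of why the buffer size matters here: the number of awaited reads needed to drain a reader shrinks as the buffer grows. The sketch below uses only the .NET base class library, not CsvHelper itself; the class name `BufferReadDemo`, the helper `CountAsyncReads`, and the two buffer sizes are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BufferReadDemo
{
    // Hypothetical helper: counts how many ReadAsync calls it takes
    // to drain the reader when reading bufferSize chars at a time.
    public static async Task<int> CountAsyncReads(TextReader reader, int bufferSize)
    {
        var buffer = new char[bufferSize];
        var reads = 0;
        while (true)
        {
            var charsRead = await reader.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
            if (charsRead == 0)
            {
                break; // end of input
            }
            reads++;
        }
        return reads;
    }

    public static async Task Main()
    {
        // One million characters of in-memory input (illustrative only).
        var text = new string('x', 1_000_000);
        foreach (var size in new[] { 2048, 65536 })
        {
            using var reader = new StringReader(text);
            var reads = await CountAsyncReads(reader, size);
            Console.WriteLine($"bufferSize={size}: {reads} awaited reads");
        }
    }
}
```

With an in-memory `StringReader`, every read fills the buffer until the input runs out, so the count is simply the input length divided by the buffer size, rounded up. A real file or network stream may return fewer characters per call.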

kikaragyozov · 2020-08-06T05:19:32Z

@JoshClose setting the buffer size to a higher value significantly increased the speed.

Reading 410,000 lines of CSV took ~7 seconds synchronously, and now only ~9 seconds asynchronously. This seems about right. Thoughts?

EDIT: It seems that for every 410,000 lines read on my machine, 2 seconds of overhead are added to the total when doing async IO. No matter how high I set the buffer size, I can't bring it down to the speed of sync IO. Perhaps this IS the overhead of 410,000 thread switches and context switches (if any).

TL;DR: Extrapolating linearly, if I had 5 million lines in a CSV, synchronous IO would complete about 24.39 seconds faster than asynchronous IO.
If I had 20 million lines in a CSV, synchronous IO would complete about 97.56 seconds faster than asynchronous IO.
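
The extrapolation can be reproduced with a few lines of arithmetic, assuming the overhead scales linearly with line count (2 seconds per 410,000 lines, as measured on one machine). The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class OverheadEstimate
{
    // Linear extrapolation: extra seconds = lines / measuredLines * measuredOverheadSeconds.
    // The defaults are the figures reported in the comment above.
    public static double ExtraSeconds(long lines, long measuredLines = 410_000, double measuredOverheadSeconds = 2.0)
        => (double)lines / measuredLines * measuredOverheadSeconds;

    public static void Main()
    {
        // Invariant culture keeps the decimal separator a dot on every machine.
        Console.WriteLine(ExtraSeconds(5_000_000).ToString("F2", CultureInfo.InvariantCulture));  // 24.39
        Console.WriteLine(ExtraSeconds(20_000_000).ToString("F2", CultureInfo.InvariantCulture)); // 97.56
    }
}
```

Whether the overhead really stays linear at these sizes is an assumption; only the 410,000-line case was measured.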

TonyValenti · 2020-08-21T15:45:07Z

@spiritbob I would suggest that this is expected behavior. Async methods are not designed to make things faster; they're designed to keep threads from blocking, and that does add overhead.

Based on what you've listed, my bet is that you're reading data from a local file, which is likely not a good use case for an async method. Async reading is a better fit when you're reading data from a network stream or another remote data source where delays and lags are to be expected. That would make better use of the thread pool.
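
To get a feel for the fixed cost of the async machinery itself, independent of any IO, one can enumerate the same in-memory sequence through a plain iterator and through an async iterator whose awaits always complete synchronously. This is a minimal sketch using only the base class library (C# 8 or later); it does not reproduce CsvHelper's code path, the names are hypothetical, and the timings vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class IteratorOverheadDemo
{
    // Plain synchronous iterator.
    public static IEnumerable<int> SyncItems(int count)
    {
        for (var i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Async iterator that never actually waits: the awaited task is already complete.
    public static async IAsyncEnumerable<int> AsyncItems(int count)
    {
        for (var i = 0; i < count; i++)
        {
            await Task.CompletedTask;
            yield return i;
        }
    }

    public static long SumSync(int count)
    {
        long sum = 0;
        foreach (var x in SyncItems(count)) sum += x;
        return sum;
    }

    public static async Task<long> SumAsync(int count)
    {
        long sum = 0;
        await foreach (var x in AsyncItems(count)) sum += x;
        return sum;
    }

    public static async Task Main()
    {
        const int count = 410_000; // same order of magnitude as the CSV in this thread

        var sw = Stopwatch.StartNew();
        var syncSum = SumSync(count);
        Console.WriteLine($"sync:  {sw.Elapsed.TotalMilliseconds:F1} ms (sum {syncSum})");

        sw.Restart();
        var asyncSum = await SumAsync(count);
        Console.WriteLine($"async: {sw.Elapsed.TotalMilliseconds:F1} ms (sum {asyncSum})");
    }
}
```

A single untimed warm-up run, or a harness such as BenchmarkDotNet, would give more trustworthy numbers than this one-shot Stopwatch measurement.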

kikaragyozov · 2020-08-21T16:30:25Z

@TonyValenti I'm reading an IFormFile, but I don't think it's possible to directly read it over the network. If the file is too large, I think Microsoft suggests storing it in a MemoryStream, rather than the disk.

I agree that this is the expected behavior.

joefeser · 2020-11-03T00:34:30Z

@spiritbob even if it is an IFormFile, it can be backed by any stream: network, file, and so forth. Are you reading from the disk, or are you taking this in as an HTTP request? I am going to guess there is no buffer backing it.

I would test this by using a BufferedStream and setting the buffer size to at least 64-128 KB. We drastically sped up our app that was performing a lot of reads from a network share.
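
A minimal sketch of that suggestion, assuming the input is any readable `Stream`: wrap it in a `BufferedStream` with a large buffer before handing it to a reader. The class name, the `CountLinesAsync` helper, and the 128 KB default are illustrative choices; in ASP.NET Core the source stream would typically come from `IFormFile.OpenReadStream()`, and the resulting `StreamReader` could be passed to a CSV parser instead of being read line by line.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BufferedStreamDemo
{
    // Hypothetical helper: reads all lines through a BufferedStream and counts them.
    public static async Task<int> CountLinesAsync(Stream source, int bufferSize = 128 * 1024)
    {
        // BufferedStream batches small reads against the underlying stream into larger ones.
        using var buffered = new BufferedStream(source, bufferSize);
        using var reader = new StreamReader(buffered, Encoding.UTF8);

        var lines = 0;
        while (await reader.ReadLineAsync().ConfigureAwait(false) != null)
        {
            lines++;
        }
        return lines;
    }

    public static async Task Main()
    {
        // In-memory stand-in for an uploaded file or a file on a network share.
        var bytes = Encoding.UTF8.GetBytes("Id,Name\n1,a\n2,b\n");
        using var source = new MemoryStream(bytes);
        Console.WriteLine(await CountLinesAsync(source)); // 3
    }
}
```

Wrapping a `MemoryStream` gains nothing in practice; the benefit is expected only when each read on the underlying stream is expensive, as with a network share.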

Do you have sample code that you can share?

kikaragyozov · 2020-11-03T06:25:09Z

@joefeser it's a simple HTTP request in ASP.NET Core 3.1. I believe that if the file is smaller than a certain size, it's stored in memory; otherwise it's stored on disk? I forgot the exact numbers, feel free to enlighten me.

Was your approach applied to that framework? If so, how?

joefeser · 2020-11-03T07:32:32Z

@spiritbob Network streams should never be used for performance benchmarks. There is no telling how many packet analyzers are in the stream, especially on a corporate network.

kikaragyozov · 2020-11-03T10:40:57Z

@joefeser sorry, if you were referring to my actual testing environment, I believe I read the file from the disk, but my practical use case is ASP.NET Core.

kikaragyozov added the bug label Aug 5, 2020

kikaragyozov changed the title ~~IAsyncEnumerable 5 to 10 times slower to enumerate than a regular IEnumerable?~~ IAsyncEnumerable 2 times slower to enumerate than a regular IEnumerable? Aug 5, 2020

JoshClose closed this as completed Nov 3, 2020
