
Javascript part allocates the entire file in memory at stream creation #139

Closed
jespersh opened this issue Jun 13, 2020 · 4 comments · Fixed by #140

jespersh commented Jun 13, 2020

Describe the bug
I'm trying to create a "chunk" stream for System.Net.Http.StreamContent without the browser allocating GBs of memory for the entire file. Uploading natively in the browser doesn't have this behavior, and a console application doesn't have it either.

The native test:

HttpClient httpClient = new HttpClient();
httpClient.BaseAddress = new Uri("https://localhost:5001/");
using (FileStream fs = File.Open("D:\\bigGBtest.zip", FileMode.Open, FileAccess.Read))
{
    using (var formData = new MultipartFormDataContent())
    {
        var streamContent = new StreamContent(fs);
        streamContent.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("form-data") { Name = "file", FileName = "Test.zip" };
        streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        formData.Add(streamContent);
        var resp = await httpClient.PostAsync("api/v2/version/upload", formData);
    }
}

To Reproduce
Either of the following, with a multi-GB file, allocates the entire file into memory:
Using CreateMemoryStreamAsync:

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.CreateMemoryStreamAsync(65536))
{ // Browser memory shoots up after CreateMemoryStreamAsync
  byte[] buffer = new byte[1000];
  await fileStream.ReadAsync(buffer, 0, 1000);
}

Using OpenReadAsync:

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{ // Browser memory shoots up after OpenReadAsync
  byte[] buffer = new byte[1000];
  await fileStream.ReadAsync(buffer, 0, 1000);
}

Expected behavior
The call to ReadAsync decides how much memory is allocated
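For illustration, the expectation is that a read loop like the following only ever holds one buffer's worth of file data at a time (a sketch using the same API as the repro snippets above; the 32 KB buffer size is arbitrary):

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{
    // Memory usage should stay bounded by the buffer size, not the file size.
    byte[] buffer = new byte[32 * 1024];
    int bytesRead;
    while ((bytesRead = await fileStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        // Process buffer[0..bytesRead] here.
    }
}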

Screenshots
(screenshot of browser memory usage omitted)

Project type
Client-side/CSB

Environment
Browser: new Edge with Chromium
BlazorFileReader: 1.5.0.20109
.NET SDK: 3.1.301
.NET host: 3.1.5

Additional context
A possible fix could be this: https://stackoverflow.com/a/28318964

Tewr added the bug label Jun 13, 2020
Tewr self-assigned this Jun 13, 2020
Tewr (Owner) commented Jun 13, 2020

This is a regression introduced in this commit. I think I ran a memory analysis and everything, but either I failed to recognize this error, or Chrome has changed the way the buffer is allocated.

In any case, it's a glaring bug and should be easy to fix. Nice catch.

Tewr added a commit that referenced this issue Jun 14, 2020
Tewr closed this as completed in #140 Jun 14, 2020
Tewr added a commit that referenced this issue Jun 14, 2020
Tewr (Owner) commented Jun 15, 2020

@jespersh Please let me know if you have the time to test and give feedback on this.
On an 800 MB file I've measured a ~80-100 MB bump in RAM usage, which I attribute to the slow, single-threaded GC rather than anything I can do better.

I guess it's always possible to do better; this is a tight loop. But one thing is for sure: moving away from the model that caused this bug is a 500% slow-down. Not very noticeable on small files, but quite painful to go from 1 s to 7 s for an 800 MB file.

Tewr reopened this Jun 15, 2020
jespersh (Author) commented

I'll try to dig a bit into this as soon as I can, but I'm wondering if you could reuse the FileReader between reads: since I already called OpenRead, it can be expected that one would keep it alive for some time.

How big are your read chunks? I'd test with ~32 KB.

Tewr (Owner) commented Jun 16, 2020

Finally got the time to make some tests.

My experiments show that FileReader instantiation is basically free, with no impact whatsoever; it's probably cached. Chunk size has a huge impact on speed and a small impact on RAM usage. No matter the chunk size, RAM usage is 80-150 MB over baseline during the process. I'm testing with an 800 MB file. Using a chunk size of 82 KB -> 16 s; 330 KB -> 5 s.

So my conclusion is basically that what is costly here is the asynchronous callback, which, to my knowledge, I have no way of avoiding. I could possibly implement a second level of buffering that could be configured somehow, but I'm stopping here for now in favour of other features.
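For reference, a minimal sketch of how such chunk-size timings could be reproduced (assuming the same fileReaderService/fileInputElement setup as in the repro snippets above; the chunk sizes mirror the ones discussed in this thread):

foreach (var chunkSize in new[] { 32 * 1024, 82 * 1024, 330 * 1024 })
{
    var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    await using (var fileStream = await file.OpenReadAsync())
    {
        byte[] buffer = new byte[chunkSize];
        // Drain the stream; only throughput matters for this measurement.
        while (await fileStream.ReadAsync(buffer, 0, buffer.Length) > 0)
        {
        }
    }
    Console.WriteLine($"{chunkSize / 1024} KB chunks: {stopwatch.Elapsed.TotalSeconds:F1} s");
}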

Tewr closed this as completed Jun 16, 2020