
Javascript part allocates the entire file in memory at stream creation #139

Closed
jespersh opened this issue Jun 13, 2020 · 4 comments · Fixed by #140

jespersh commented Jun 13, 2020

Describe the bug
I'm trying to create a "chunk" stream for System.Net.Http.StreamContent without the browser allocating GBs of memory for the entire file. Uploading natively in the browser doesn't have this behavior, and a console application doesn't have it either.

The native test:

HttpClient httpClient = new HttpClient();
httpClient.BaseAddress = new Uri("https://localhost:5001/");
using (FileStream fs = File.Open("D:\\bigGBtest.zip", FileMode.Open, FileAccess.Read))
{
    using (var formData = new MultipartFormDataContent())
    {
        var streamContent = new StreamContent(fs);
        streamContent.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("form-data") { Name = "file", FileName = "Test.zip" };
        streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        formData.Add(streamContent);
        var resp = await httpClient.PostAsync("api/v2/version/upload", formData);
    }
}

To Reproduce
Either of the following, with a multi-GB file, allocates the entire file into memory:
Using CreateMemoryStreamAsync:

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.CreateMemoryStreamAsync(65536))
{ // Browser memory shoots up after CreateMemoryStreamAsync
  byte[] buffer = new byte[1000];
  await fileStream.ReadAsync(buffer, 0, 1000);
}

Using OpenReadAsync:

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{ // Browser memory shoots up after OpenReadAsync
  byte[] buffer = new byte[1000];
  await fileStream.ReadAsync(buffer, 0, 1000);
}

Expected behavior
The call to ReadAsync decides how much memory is allocated
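For illustration, the expectation is that a read loop like the following only ever holds one buffer's worth of file data at a time (a sketch using the same API as the repro snippets above; the 32 KB buffer size is arbitrary):

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{
    // Memory usage should stay bounded by the buffer size, not the file size.
    byte[] buffer = new byte[32 * 1024];
    int bytesRead;
    while ((bytesRead = await fileStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        // Process buffer[0..bytesRead] here.
    }
}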

Screenshots
(screenshot of browser memory usage omitted)

Project type
Client-side/CSB

Environment
Browser: new Edge with Chromium
BlazorFileReader: 1.5.0.20109
.NET SDK: 3.1.301
.NET host: 3.1.5

Additional context
A possible fix could be this: https://stackoverflow.com/a/28318964

Tewr added the bug label Jun 13, 2020
Tewr self-assigned this Jun 13, 2020
Tewr (Owner) commented Jun 13, 2020

This is a regression introduced in this commit. I think I ran a memory analysis and everything, but either I failed to recognize this error, or Chrome has changed the way the buffer is allocated.

In any case, it's a glaring bug and should be easy to fix. Nice catch.

Tewr added a commit that referenced this issue Jun 14, 2020
Tewr closed this as completed in #140 Jun 14, 2020
Tewr added a commit that referenced this issue Jun 14, 2020
Tewr (Owner) commented Jun 15, 2020

@jespersh Please let me know if you have the time to test and give feedback on this.
On an 800 MB file I've measured a ~80-100 MB bump in RAM usage, which I attribute to the slow, single-threaded GC rather than anything I can do better.

I guess it's always possible to do better; this is a tight loop. But one thing is for sure: moving away from the model that caused this bug is a 500% slow-down. Not very noticeable on small files, but quite painful to go from 1 s to 7 s for an 800 MB file.

Tewr reopened this Jun 15, 2020
jespersh (Author) commented

I'll try to dig a bit into this as soon as I can, but I'm wondering if you could reuse the FileReader between reads: since I already called OpenRead, it can be expected that one would keep it alive for some time.

How big are your read chunks? I'd test with ~32 KB.

Tewr (Owner) commented Jun 16, 2020

Finally got the time to make some tests.

My experiments show that FileReader instantiation is basically free, with no impact whatsoever; it's probably cached. Chunk size has a huge impact on speed and a small impact on RAM usage. No matter the chunk size, RAM usage is 80-150 MB over baseline during the process. I'm testing with an 800 MB file. Using a chunk size of 82 KB -> 16 s; 330 KB -> 5 s.

So my conclusion is basically that what is costly here is the asynchronous callback, which, to my knowledge, I have no way of avoiding. I could possibly implement a second level of buffering that could be configured somehow, but I'm stopping here for now in favour of other features.
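For reference, a minimal sketch of how such chunk-size timings could be reproduced (assuming the same fileReaderService/fileInputElement setup as in the repro snippets above; the chunk sizes mirror the ones discussed in this thread):

foreach (var chunkSize in new[] { 32 * 1024, 82 * 1024, 330 * 1024 })
{
    var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    await using (var fileStream = await file.OpenReadAsync())
    {
        byte[] buffer = new byte[chunkSize];
        // Drain the stream; only throughput matters for this measurement.
        while (await fileStream.ReadAsync(buffer, 0, buffer.Length) > 0)
        {
        }
    }
    Console.WriteLine($"{chunkSize / 1024} KB chunks: {stopwatch.Elapsed.TotalSeconds:F1} s");
}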

Tewr closed this as completed Jun 16, 2020