
Blob upload results in excessive memory usage/leak #894

Open
pawboelnielsen opened this issue May 22, 2019 · 17 comments

@pawboelnielsen

commented May 22, 2019

Which service (blob, file, queue, table) does this issue concern?

Blob

Which version of the SDK was used?

Microsoft.Azure.Storage.Blob -Version 10.0.3

Which platform are you using? (ex: .NET Core 2.1)

.NET Core 2.2.300

What problem was encountered?

OutOfMemory / excessive memory usage

How can we reproduce the problem in the simplest way?

using System.IO;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

namespace ConsoleAppAzureBlobMemoryIssue
{
    class Program
    {
        static void Main(string[] args)
        {
            // Replace with a real connection string.
            CloudStorageAccount account = CloudStorageAccount.Parse("StorageConnectionString");
            CloudBlobClient serviceClient = account.CreateCloudBlobClient();

            var container = serviceClient.GetContainerReference("backup");
            container.CreateIfNotExistsAsync().Wait();

            string path = @"c:\folder\gigafile.zip";
            CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));
            blob.UploadFromFileAsync(path).Wait();
        }
    }
}

Run the program, open Task Manager, and "enjoy" how the memory usage of dotnet.exe explodes as the file gets uploaded; memory usage increases in direct proportion to the number of bytes uploaded so far. So if you attempt to upload a 50 GB file, you will need 50+ GB of free memory.

Have you found a mitigation/solution?

The last working version is WindowsAzure.Storage -Version 9.3.3; newer versions contain the issue.

@carlkenne

commented Jun 4, 2019

We're about to upgrade from v8 to v10, but then I saw this. It looks like a showstopper?

@pawboelnielsen

Author

commented Jun 4, 2019

> We're about to upgrade from v8 to v10, but then I saw this. It looks like a showstopper?

That depends on the size of the files you want to upload and the memory available. The memory is released when the upload is complete; it is only during the upload that the memory is not released.

@djj0809

commented Jun 23, 2019

I ran into the same problem when uploading a large file (> 5 GB).

.NET Core 3.0.0-preview6 & Microsoft.Azure.Storage.Blob version 10.0.3

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at Microsoft.Azure.Storage.Core.Util.StorageAsyncResult`1.End()
   at System.IO.Stream.<>c.<BeginEndWriteAsync>b__61_1(Stream stream, IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncTrimPromise`1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Storage.Blob.BlobWriteStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken token)
   at Microsoft.Azure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsyncHelper(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken token)
   at Microsoft.Azure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsync(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken cancellationToken)
   at Microsoft.Azure.Storage.Blob.CloudBlockBlob.UploadFromStreamAsyncHelper(Stream source, Nullable`1 length, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext, AggregatingProgressIncrementer progressIncrementer, CancellationToken cancellationToken)
   at Microsoft.Azure.Storage.Blob.CloudBlockBlob.UploadFromFileAsync(String path, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext, AggregatingProgressIncrementer progressIncrementer, CancellationToken cancellationToken)

@djj0809

commented Jun 27, 2019

> Have you found a mitigation/solution?
>
> In WindowsAzure.Storage -Version 9.3.3 it works; however, we cannot use that version, as some files are corrupted when uploaded.

@pawboelnielsen May I ask under what conditions files get corrupted in version 9.3.3? I think I have to go back to 9.3.3.

@pawboelnielsen

Author

commented Jun 27, 2019

@djj0809 There was an issue where, if an upload was aborted before it completed, retrying the upload with the same file name would give the error: "The specified blob or block content is invalid."
This error can be avoided by running the delete command before uploading, like this: blob.DeleteIfExistsAsync()
So it was a minor issue to fix; aside from that, 9.3.3 works fine for us.
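
For anyone hitting the same corruption-on-retry issue on 9.3.3, a minimal sketch of that workaround (UploadWithCleanupAsync is just an illustrative name):

async Task UploadWithCleanupAsync(CloudBlobContainer container, string path)
{
    CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));

    // Deleting first discards any uncommitted blocks left by an aborted upload,
    // avoiding "The specified blob or block content is invalid" on retry.
    await blob.DeleteIfExistsAsync();

    await blob.UploadFromFileAsync(path);
}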

@djj0809

commented Jun 27, 2019

> @djj0809 There was an issue where, if an upload was aborted before it completed, retrying the upload with the same file name would give the error: "The specified blob or block content is invalid."
> This error can be avoided by running the delete command before uploading, like this: blob.DeleteIfExistsAsync()
> So it was a minor issue to fix; aside from that, 9.3.3 works fine for us.

A big thank you for your really quick response :)

@markheath

commented Jun 27, 2019

I'm seeing the same issue (OutOfMemoryException) uploading a 3.5 GB file with UploadFromFileAsync in v10.0.3.

@markheath

commented Jun 27, 2019

Can't even work around it with UploadFromStreamAsync - that also seems to require enough memory to hold the entire file:

using (var s = File.OpenRead("large-file.zip"))
{
    await targetBlob.UploadFromStreamAsync(s);
}
@pawboelnielsen

Author

commented Jun 28, 2019

> Can't even work around it with UploadFromStreamAsync - that also seems to require enough memory to hold the entire file:

Internally, both methods use the same upload implementation.

@markheath

commented Jun 28, 2019

I'd be interested to know if there are any plans to fix this issue soon, or whether there is an underlying technical limitation that requires the entire file to be held in memory. I recently updated our application to the latest version of Microsoft.Azure.Storage.Blob following the advice given here, but this high memory usage is an absolute showstopper for us, and it looks like I'm going to have to roll everything back to WindowsAzure.Storage 9.3.3.

Interestingly, it doesn't seem to be a problem if the source isn't a file. I can use UploadFromStreamAsync to copy from a stream from another large blob opened with OpenReadAsync, and memory usage stays well below the overall blob size.

@djj0809

commented Jun 28, 2019

> I'd be interested to know if there are any plans to fix this issue soon, or whether there is an underlying technical limitation that requires the entire file to be held in memory. I recently updated our application to the latest version of Microsoft.Azure.Storage.Blob following the advice given here, but this high memory usage is an absolute showstopper for us, and it looks like I'm going to have to roll everything back to WindowsAzure.Storage 9.3.3.
>
> Interestingly, it doesn't seem to be a problem if the source isn't a file. I can use UploadFromStreamAsync to copy from a stream from another large blob opened with OpenReadAsync, and memory usage stays well below the overall blob size.

I browsed through some of the code (trying to find the bug); I think if we wrap the FileStream in a non-seekable stream (i.e. CanSeek = false), it may not have this problem.

Internally, the code seems to determine whether the stream is seekable and applies an optimization for seekable streams (which causes this issue).
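
For illustration, a minimal sketch of such a wrapper (an assumption to test, not a confirmed fix; NonSeekableStream is an invented name, and whether UploadFromStreamAsync accepts a non-seekable stream in v10 is untested here):

using System;
using System.IO;

// Reports CanSeek = false so the SDK takes its non-seekable upload path.
class NonSeekableStream : Stream
{
    private readonly Stream inner;
    public NonSeekableStream(Stream inner) => this.inner = inner;

    public override bool CanRead => inner.CanRead;
    public override bool CanSeek => false; // the only behavioral change
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count) => inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override void Flush() => inner.Flush();

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }
}

// Hypothetical usage:
// using (var s = new NonSeekableStream(File.OpenRead(path)))
//     await blob.UploadFromStreamAsync(s);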

@markheath

commented Jun 28, 2019

Yeah, I had a quick browse through the code, with UploadFromMultiStreamAsync and SubStream being possible places that are holding onto more memory than they should, but I couldn't spot an obvious bug.

Not 100% sure whether just setting CanSeek = false will be enough. For example, this works just fine, copying a huge file from blob to blob, even though CanSeek == true on the source stream:

async Task CopyWithUploadFromStreamAsync(CloudBlockBlob fromBlob, CloudBlockBlob toBlob)
{
    using (var sourceStream = await fromBlob.OpenReadAsync())
    {
        // n.b. sourceStream.CanSeek is true
        await toBlob.UploadFromStreamAsync(sourceStream);
    }
}
@DAllanCarr

commented Jun 28, 2019

@markheath What's the memory usage profile for the blob-to-blob sample? I would like someone on the team to officially consider this a break in functionality.

@markheath

commented Jun 28, 2019

From watching Task Manager, it slowly ramps up to about 200-300 MB during the copy, and then presumably a GC occurs and memory usage drops back down to 50 MB. I managed to copy the whole 3.5 GB blob using the technique shown above in 12 minutes, with the app's memory usage not going above about 300 MB most of the time, although towards the end of the copy it did reach 500 MB at one point. Presumably, though, that's just about when GCs occur. It's only when I try to upload the same thing from a local file that I get the memory leak issue.

@DAllanCarr

commented Jun 28, 2019

Pretty healthy then. You could try increasing the block size closer to the upper bound (100 MB) to reduce the number of parallel streams; the 4 MB default may be tuned for smaller payloads. But to be honest, the implementation seems a little sketchy.
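
For illustration, a minimal sketch of tuning those knobs on this SDK generation (the exact values are assumptions, not a verified mitigation for this leak; UploadLargeFileAsync is an invented name):

async Task UploadLargeFileAsync(CloudBlockBlob blob, string path)
{
    // StreamWriteSizeInBytes is the block size UploadFromFileAsync writes in (default 4 MB).
    blob.StreamWriteSizeInBytes = 100 * 1024 * 1024; // 100 MB, the upper bound

    var options = new BlobRequestOptions
    {
        ParallelOperationThreadCount = 1 // limit concurrent block uploads in flight
    };

    await blob.UploadFromFileAsync(path, accessCondition: null, options: options, operationContext: null);
}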

@pawboelnielsen

Author

commented Jul 2, 2019

> From watching Task Manager, it slowly ramps up to about 200-300 MB during the copy, and then presumably a GC occurs and memory usage drops back down to 50 MB. I managed to copy the whole 3.5 GB blob using the technique shown above in 12 minutes, with the app's memory usage not going above about 300 MB most of the time, although towards the end of the copy it did reach 500 MB at one point. Presumably, though, that's just about when GCs occur. It's only when I try to upload the same thing from a local file that I get the memory leak issue.

Using v9.3.3 with my simple upload code, at no point does memory consumption go above 30K, even if the file being uploaded is 50 GB. That is where we should be when this hopefully gets fixed.

@markheath

commented Jul 2, 2019

Here's a trace from DotMemory halfway through an upload of a large file. As you can see, the memory is being held onto by the HttpClientHandler RequestState and TaskCompletionSource.

[DotMemory screenshot: retained memory rooted in HttpClientHandler RequestState and TaskCompletionSource]

I'm wondering if it's this code in CloudBlockBlob.UploadFromMultiStreamAsync, which references the localBlock so it can dispose of it. I can't see anything wrong with the implementation of SubStream.Dispose, though.

Task cleanupTask = uploadTask.ContinueWith(finishedUpload =>
{
    localBlock.Dispose();
});

Update: Having explored a bit more, I'm not sure the multi-stream code is running at all during the memory leak. It looks like AsyncStreamCopier is performing the copy in this scenario. If I get time, I'll see if I can dig a bit deeper and trace through in more detail.
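
For what it's worth, one hypothetical way to trace this without a profiler (an illustrative sketch, not from the thread; TracingStream is an invented name) is to wrap the source stream so every Seek and Read the SDK makes is logged, revealing whether the seekable SubStream path or the sequential AsyncStreamCopier path is being exercised:

using System;
using System.IO;

// Logs every Seek and Read call made on the wrapped stream.
class TracingStream : Stream
{
    private readonly Stream inner;
    public TracingStream(Stream inner) => this.inner = inner;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = inner.Read(buffer, offset, count);
        Console.WriteLine($"Read {read} bytes ending at position {inner.Position}");
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        Console.WriteLine($"Seek to {offset} ({origin})");
        return inner.Seek(offset, origin);
    }

    public override bool CanRead => inner.CanRead;
    public override bool CanSeek => inner.CanSeek;
    public override bool CanWrite => false;
    public override long Length => inner.Length;
    public override long Position { get => inner.Position; set => inner.Position = value; }
    public override void Flush() => inner.Flush();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

Even if the SDK calls ReadAsync, Stream's default async implementation should ultimately fall back to the synchronous Read override, so the logging still fires.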
