Invoke-RestMethod fails on very large files #4129

Closed
ffeldhaus opened this issue Jun 28, 2017 · 22 comments · Fixed by #11095
Labels
Issue-Bug (issue has been identified as a bug in the product), WG-Cmdlets-Utility (cmdlets in the Microsoft.PowerShell.Utility module)

Comments

@ffeldhaus
Contributor

ffeldhaus commented Jun 28, 2017

It seems Invoke-RestMethod fails to download very large files, probably because it runs out of memory.

Steps to reproduce

Invoke-RestMethod -Uri http://speedtest.tele2.net/10GB.zip -OutFile /Users/ffeldhaus/Downloads/10GB.zip

Expected behavior

Very large files should be downloaded without issues.

Actual behavior

PS /Users/ffeldhaus/development> Invoke-RestMethod -Uri http://speedtest.tele2.net/10GB.zip -OutFile /Users/ffeldhaus/Downloads/10GB.zip
Invoke-RestMethod : Stream was too long.
At line:1 char:1
+ Invoke-RestMethod -Uri http://speedtest.tele2.net/10GB.zip -OutFile / ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [Invoke-RestMethod], IOException
+ FullyQualifiedErrorId : System.IO.IOException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand

Environment data

Mac OS X 10.12.5

PS /Users/ffeldhaus/development> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      6.0.0-beta
PSEdition                      Core
BuildVersion                   3.0.0.0
CLRVersion
GitCommitId                    v6.0.0-beta.2
OS                             Darwin 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64
Platform                       Unix
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

@SteveL-MSFT added the WG-Cmdlets-Utility label (cmdlets in the Microsoft.PowerShell.Utility module) Jun 28, 2017

@SteveL-MSFT
Member

I couldn't reproduce this on Win10 or Ubuntu 16.04. It might be Mac-specific. Memory usage seemed reasonable on Windows and Linux (less than 64MB on Windows and 4.3MB on Linux).

@JoelMiller74

I am using Windows 10 and PowerShell 5.1. Anything over 2 GB gives me "Invoke-RestMethod : Stream was too long.".
Memory usage should not be an issue, as I am streaming the file directly to disk. Once the download reaches 2 GB, it fails with that error.

@MaximoTrinidad

MaximoTrinidad commented Feb 13, 2018

Hmm! Shouldn't a 10GB file download be handled using the FTP protocol? I can't recall ever downloading a 10GB installation application over HTTP/HTTPS.

:)

@markekraus
Contributor

markekraus commented Feb 13, 2018

@JoelMiller74 for Windows PowerShell 5.1 issues, please use the Windows PowerShell UserVoice. This repo is for PowerShell Core (6.0.0 and newer). We use a different API and a different underlying architecture in PowerShell Core than was used in Windows PowerShell. Issues with downloads in 5.1 would likely not translate to problems in 6.0.1 (as in, the same error message may be due to different causes).

@markekraus
Contributor

@SteveL-MSFT I'm no expert on performance tuning, but it seems to me we should be calling Flush() every now and then in the code below.

do
{
    if (cmdlet != null)
    {
        ProgressRecord record = new ProgressRecord(ActivityId,
            WebCmdletStrings.WriteRequestProgressActivity,
            StringUtil.Format(WebCmdletStrings.WriteRequestProgressStatus, totalWritten));
        cmdlet.WriteProgress(record);
    }

    read = input.Read(data, 0, ChunkSize);
    if (0 < read)
    {
        output.Write(data, 0, read);
        totalWritten += read;
    }
} while (read != 0);

The chunk size is 10,000 bytes, so flushing on every iteration is probably not ideal, but waiting until the end of the 10GB file does some funky stuff with RAM. Can you ping someone who can provide me with some guidance on this?

@SteveL-MSFT
Member

@daxian-dbw would be best to provide guidance here, although he's currently on vacation.

Even though we have a separate timer now for writing progress to the screen, it seems overkill to update progress every chunk. Seems like we can solve two problems by updating progress and flushing the buffer every N milliseconds (or seconds?).
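
A minimal sketch of the idea discussed here: copy in chunks, but flush the output and report progress on a timer rather than per chunk. It is written in PowerShell for illustration and is not the cmdlet's actual implementation. The function name Copy-StreamWithTimedFlush, its parameter names, and the 200 ms default are assumptions; the 10,000-byte chunk size is taken from the comment above.

```powershell
function Copy-StreamWithTimedFlush {
    param(
        [Parameter(Mandatory)] [System.IO.Stream] $InputStream,
        [Parameter(Mandatory)] [System.IO.Stream] $OutputStream,
        [int] $ChunkSize = 10000,   # chunk size mentioned in the thread
        [int] $IntervalMs = 200     # assumed flush/progress interval
    )

    $buffer = [byte[]]::new($ChunkSize)
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    [long] $totalWritten = 0

    # Read returns 0 at end of stream, which ends the loop.
    while (($read = $InputStream.Read($buffer, 0, $buffer.Length)) -gt 0) {
        $OutputStream.Write($buffer, 0, $read)
        $totalWritten += $read

        # Flush and report progress only when the interval has elapsed.
        if ($timer.ElapsedMilliseconds -ge $IntervalMs) {
            $OutputStream.Flush()
            Write-Progress -Activity 'Copying stream' -Status "$totalWritten bytes written"
            $timer.Restart()
        }
    }

    # Final flush so nothing is left in the output buffer.
    $OutputStream.Flush()
    Write-Progress -Activity 'Copying stream' -Completed
    return $totalWritten
}
```

The function returns the number of bytes copied, so a caller can compare it against an expected length.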

@markekraus
Contributor

Thanks!
I agree, updating the progress that frequently is overkill. It does seem like a good 2-for-1 enhancement.

@daxian-dbw
Member

daxian-dbw commented Feb 21, 2018

The exception is actually thrown from BufferingStreamReader.Read at _streamBuffer.Write(_copyBuffer, 0, bytesRead).

Here is the stack trace:

at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at Microsoft.PowerShell.Commands.InvokeRestMethodCommand.BufferingStreamReader.Read(Byte[] buffer, Int32 offset, Int32 count)
at Microsoft.PowerShell.Commands.StreamHelper.WriteToStream(Stream input, Stream output, PSCmdlet cmdlet)
at Microsoft.PowerShell.Commands.StreamHelper.SaveStreamToFile(Stream stream, String filePath, PSCmdlet cmdlet)
at Microsoft.PowerShell.Commands.InvokeRestMethodCommand.ProcessResponse(HttpResponseMessage response)
at Microsoft.PowerShell.Commands.WebRequestPSCmdlet.ProcessRecord()
at System.Management.Automation.Cmdlet.DoProcessRecord()
at System.Management.Automation.CommandProcessor.ProcessRecord()

For a 64-bit program, the memory limit for a single .NET object is 2GB, unless you enable gcAllowVeryLargeObjects in the app.config file as follows:

<configuration>
    <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
    </runtime>
</configuration>

For a 32-bit program, the memory limit for a single .NET object is 512MB.

So the MemoryStream object cannot hold more than 2GB in a 64-bit program (512MB in a 32-bit program).

The Stream types BufferingStreamReader and WebResponseContentMemoryStream need to be redesigned because they both rely on a MemoryStream to cache the content from the response stream, and thus they cannot handle content that is larger than 2GB in 64-bit PowerShell.
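
As a rough illustration of why caching the whole response in a MemoryStream cannot work for this repro: MemoryStream exposes its Capacity as an Int32, so it cannot address more than about 2 GB no matter how much RAM is available. The variable names below are only for illustration.

```powershell
# MemoryStream exposes its capacity as an Int32.
$capacityType = [System.IO.MemoryStream].GetProperty('Capacity').PropertyType
$capacityType.FullName                  # System.Int32

# The largest Int32 is 2,147,483,647, just under 2 GB.
$limit = [int]::MaxValue

# The 10 GB file from the repro is about five times larger than that.
$fileSize = 10GB
$fileSize -gt $limit                    # True
[math]::Round($fileSize / $limit, 1)    # 5
```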

@iSazonov
Collaborator

When I was working on optimizing the progress bar, I found that WriteProgress itself has no overhead; the performance problem was only in writing to the screen. @daxian-dbw and @lzybkr helped me find an optimal solution: flushing on a timer every 200 ms. I don't think we can do anything else with the current API. If we want to do better, we should redesign the progress bar API. Currently we do tons of extra allocations.

As for BufferingStreamReader.Read, it looks like a bug. But I'd first look at .Net Core 2.1, where a huge number of optimizations have been done. I think we could use Spans and maybe Pipelines if we want to redesign it. When do we plan to move to 2.1?

@SteveL-MSFT
Member

@iSazonov for 6.1, we will need to move to dotnetcore 2.1 for other reasons anyways (like full ARM support...)

@ThomasNieto
Contributor

ThomasNieto commented May 3, 2019

@markekraus We talked about this issue during the PS Summit where Invoke-RestMethod fails to download a large file but Invoke-WebRequest works.

Here are my testing results:

Name                           Value
----                           -----
PSVersion                      6.2.0
PSEdition                      Core
GitCommitId                    6.2.0
OS                             Microsoft Windows 10.0.16299
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

image

Invoke-RestMethod would continually consume memory and GC ~2-3 times before erroring when it reached 4GB RAM used. This screenshot is at the moment the cmdlet errored.

image

Invoke-WebRequest memory did not increase during download.
image

Invoke-RestMethod made it to 2GB before erroring out and Invoke-WebRequest completed the 10GB file.
image

@daxian-dbw
Member

> Invoke-RestMethod made it to 2GB before erroring out and Invoke-WebRequest completed the 10GB file.

Yes, see my previous comment #4129 (comment). It depends on a MemoryStream object to cache all the content. For an x64 process, the memory limit for a single .NET object is 2GB.

@ChrisLynchHPE

ChrisLynchHPE commented Nov 13, 2019

I am very curious to understand how the behavior of Invoke-RestMethod differs from Invoke-WebRequest, specifically in handling large files (upload or download), as @tnieto88 shows. I have a PowerShell advanced function that has worked for quite a long time using .NetClientFramework 4.6 and newer APIs, in which it uses the System.Net.HttpWebRequest class to interact with a REST API. Part of its functionality is to upload very large ISO images to its web service. I have a byte buffer that I use to read in the file 8MB at a time. When uploading a file larger than 2GB, the error happens in the GetRequestStream.Write method. I can see pwsh.exe increase in memory utilization. I implemented a method to invoke .Flush() every 200ms and whenever the write buffer has consumed 100MB.

I am able to reproduce this on a Windows 10 PC:

Name                           Value
----                           -----
PSVersion                      6.2.2
PSEdition                      Core
GitCommitId                    6.2.2
OS                             Microsoft Windows 10.0.18362 
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

[PS] C:\Users\user> dotnet --list-runtimes 
Microsoft.AspNetCore.All 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 1.0.1 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.0.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

Is there any way to handle this correctly in PowerShell 6/PowerShellCore 6?

@ChrisLynchHPE

I have done some further tests, and this looks like a .NetCore issue with MemoryStream itself. When using the same code to upload or download large files (2GB or more), not only does the main process (pwsh.exe) continue to increase in memory allocation, but the thread also fails with a "Stream was too long" exception. I have even tried PowerShell 7 Preview 5 on the same Windows 10 PC as noted above, and I get the same behavior. It almost seems like either MemoryStream or its base Stream class hasn't implemented the Flush() method correctly. Why do I say that? Using the exact same code in a PowerShell 5.1 console, with .NetFramework 4.8.1, memory allocation never increases, and each buffer (FileStream when reading a file to upload, GetRequestStream for uploading the byte array) flushes correctly when either the FlushAsync() or Flush() method is used (individually, not together). So, shouldn't this be raised as an issue with the .NetCore project instead?

@iSazonov
Collaborator

I tried 7.0 Preview5 without any issue:

Invoke-WebRequest -Uri https://images-dl.endlessm.com/release/3.7.3/eos-amd64-amd64/th/eos-eos3.7-amd64-amd64.191024-140039.th.iso -OutFile C:\temp\q.txt

The file was 9.4 GB and the download took a long time, but there were no memory issues and no exceptions.

@daxian-dbw
Member

@iSazonov Invoke-WebRequest never had the problem. The problem is in Invoke-RestMethod. See #4129 (comment)

@iSazonov
Collaborator

iSazonov commented Nov 17, 2019

@daxian-dbw Thanks for the reminder! It seems the BufferingStreamReader class has an issue. I also discovered that Ctrl-C doesn't work in Invoke-RestMethod in the large file download scenario. So I opened a PR to resolve both issues.

@iSazonov added the Issue-Bug label (issue has been identified as a bug in the product) Nov 17, 2019

@ghost

ghost commented Mar 26, 2020

🎉 This issue was addressed in #11095, which has now been successfully released as v7.1.0-preview.1. 🎉

@ChrisLynchHPE

It's nice to see this is being fixed in PowerShell Core. But, considering that Invoke-RestMethod was built upon System.Net.HttpWebRequest class (which isn't async) and there are a TON of examples and working code that uses this serial class, why can the underlying issue not be fixed? Meaning, not everyone can move their code to async class like System.Net.HttpClient. What does someone do with existing code that uses the older (yes, I know legacy) class?

@iSazonov
Collaborator

@ChrisLynchHPE The .Net team moved to HttpClient in .Net Core 3.0/3.1, and PowerShell 7.0 was released with that change. I am afraid that this was an inevitable breaking change.

@ChrisLynchHPE

File transfer using APIs is very common. So it is extremely frustrating to see breaking changes with no guidance on how to handle this going forward, beyond "well, just move to an async method class like [System.Net.HttpClient]." As I stated, there are a ton of examples that use the legacy [System.Net.HttpWebRequest] class to transfer large files, and it is not trivial to use the [System.Net.HttpClient] async class within PowerShell. Not everyone has the luxury of using the Invoke-RestMethod or Invoke-WebRequest cmdlets. So does anyone have any working examples of how to use HttpClient within PowerShell to upload and download large files?
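
One possible starting point for the download half of that question, offered as a sketch and not as the change made in #11095: call System.Net.Http.HttpClient synchronously from PowerShell and stream the response body straight to a file, so the content is never cached in a MemoryStream. The function name Save-LargeDownload, its parameters, and the 80 KB copy buffer are assumptions for illustration.

```powershell
# Needed on Windows PowerShell 5.1; harmless where the assembly is already loaded.
Add-Type -AssemblyName System.Net.Http

function Save-LargeDownload {
    param(
        [Parameter(Mandatory)] [uri] $Uri,
        # Use a full path: .NET resolves relative paths against the process
        # working directory, not the current PowerShell location.
        [Parameter(Mandatory)] [string] $OutFile
    )

    $client = [System.Net.Http.HttpClient]::new()
    $response = $null
    $source = $null
    $target = $null
    try {
        # ResponseHeadersRead completes once the headers arrive, so the body
        # is not buffered in memory before we start reading it.
        $response = $client.GetAsync($Uri,
            [System.Net.Http.HttpCompletionOption]::ResponseHeadersRead).GetAwaiter().GetResult()
        [void] $response.EnsureSuccessStatusCode()

        $source = $response.Content.ReadAsStreamAsync().GetAwaiter().GetResult()
        $target = [System.IO.File]::Create($OutFile)

        # Stream.CopyTo moves the data in fixed-size chunks (80 KB here).
        $source.CopyTo($target, 81920)
    }
    finally {
        # Dispose in reverse order of creation, skipping anything never created.
        foreach ($item in $target, $source, $response, $client) {
            if ($null -ne $item) { $item.Dispose() }
        }
    }
}
```

A call might look like Save-LargeDownload -Uri http://speedtest.tele2.net/10GB.zip -OutFile C:\temp\10GB.zip, reusing the URL from the original report. Uploads are not covered by this sketch.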

@iSazonov
Collaborator

iSazonov commented Aug 11, 2020

@ChrisLGardner Please open a new issue to discuss your scenario and reference this issue.
