Using Invoke-WebRequest POST to upload a file is broken #23843
We have seen similar issues where binary data is being encoded as text, e.g. your zip file. A change in 7.4.* defaults to UTF-8. Have a look at #21604. You may have better success using .NET classes to create the body |
Verified that this works fine in |
I don't know what causes the problem, but here's some more context: Re whether or not what follows
Re the additional
However, it is RFC 6266 that specifically mentions the `filename*` parameter. This notation specifies the encoding of the file name only, and is unrelated to the content type of the form field, so the 7.4.0 change to defaulting text-based request bodies to UTF-8 does not explain the problem.
|
From the previous issues it looks like trying to pass binary data off as text does not work when the character-set encodings are not what you expected. So the solution is: don't try to do that; use the .NET API to properly package binary data in the multipart form data. |
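To illustrate why treating binary data as text is lossy, here is a minimal Python sketch (not PowerShell; it only demonstrates the decoding behavior under discussion):

```python
# Two raw bytes: 0x41 is valid ASCII/UTF-8 ('A'), but a lone 0xC4 is not
# a valid UTF-8 sequence.
data = bytes([0x41, 0xC4])

# Decoding as UTF-8 with replacement substitutes U+FFFD for the bad byte,
# so re-encoding no longer yields the original bytes: the data is corrupted.
text = data.decode("utf-8", errors="replace")
round_tripped = text.encode("utf-8")

print(text)           # 'A' followed by the replacement character
print(round_tripped)  # b'A\xef\xbf\xbd' -- not the original b'A\xc4'
```

Any binary-to-text round trip through a decoder that replaces or drops invalid sequences silently corrupts the payload, which is why packaging the bytes as an octet-stream part is the safe route.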
That is not the case here: as you can see from the screenshot, the relevant part has The case at hand is fundamentally the same as example 6 in the
|
Without a byte-by-byte hex dump of the data or a hash I cannot say they are the same. It appears a change between 7.3 and 7.4 has broken something in this area. The content type for the multi-item part does say octet-stream, but is the actual data correct? |
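One way to compare payloads without eyeballing hex dumps, as suggested above, is to hash both byte streams (a small Python sketch; the `sent`/`received` values are placeholders for the actual request and server-side bytes):

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte stream."""
    return hashlib.sha256(data).hexdigest()

sent = bytes([0x41, 0xC4])      # bytes written to the request body
received = bytes([0x41, 0xC4])  # bytes the server actually stored

# Equal digests mean the upload round-tripped byte-for-byte.
print(digest(sent) == digest(received))
```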
@rkeithhill-keysight did you by any chance try to explicitly set the encoding to see if it helps? I've encountered an extremely similar issue with Invoke-RestMethod (PUT / POST), exactly when switching from PS v7.3.x to v7.4.x, apparently due to the switch to a default encoding of UTF-8. However, in that situation, if we explicitly set the encoding to something like this, everything worked fine. Might be worth a try. |
@rhubarb-geek-nz: I inferred it from the presence of the following char. in the screenshots, which wouldn't render the same if ISO-8859-1 were used in one case vs. UTF-8 in the other:

To confirm this, I've since run a test which indicates that - at least when targeting https://postman-echo.com/post - the binary file content is correctly round-tripped, in v7.3.10, v7.4.2, and v7.5.0-preview.2:

```powershell
# Create a test file with 2 bytes that are the ISO-8859-1 encoding of 'AÄ'.
# Trying to read this file as UTF-8 replaces the invalid-as-UTF-8 byte 0xC4
# with the Unicode REPLACEMENT CHARACTER, U+FFFD, which usually renders as
# a "?" inside a diamond.
[byte[]] (0x41, 0xC4) | Set-Content test.bin -AsByteStream

$url = 'https://postman-echo.com/post'
$filePath = 'test.bin'

$form = [ordered] @{
  File      = Get-Item -LiteralPath $filePath
  Motörhead = 'Hüsker Dü'
}

# Submit via Invoke-RestMethod, which makes postman-echo.com
# return a JSON response that describes the original request.
($r =
  Invoke-RestMethod -Form $form -Uri $url -Method Post
) | Out-Host

$fileSubmission = $r.Files.'test.bin'
$base64FileContent = ($fileSubmission -split ',')[-1]

[pscustomobject] @{
  'test.bin file submission'       = $fileSubmission
  'Base64-encoding of the content' = $base64FileContent
  'Content as byte values'         = [Convert]::FromBase64String($base64FileContent).ForEach({ '0x{0:X}' -f $_ }) -join ' '
}
```

The result shows that the two bytes were round-tripped correctly:
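As a sanity check on the Base64 value the echo service returns: `QcQ=` is the Base64 form of the two test bytes 0x41 0xC4, and decoding it recovers them exactly. A quick Python illustration:

```python
import base64

# 'QcQ=' is the Base64 encoding of the two test bytes 0x41 0xC4.
decoded = base64.b64decode("QcQ=")

print([hex(b) for b in decoded])  # ['0x41', '0xc4']
```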
|
I did a similar test with a .NET minimal web api, and the binary file contents do get uploaded correctly:

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/upload", async Task<IResult> (HttpRequest request) =>
{
    IFormCollection form = await request.ReadFormAsync();
    IFormFile? file = form.Files.FirstOrDefault();
    if (file == null) return Results.BadRequest("There is no file to upload!");

    string app_name = form["APPLICATION_NAME"]!;
    string app_version = form["APPLICATION_VERSION"]!;
    Console.WriteLine($"Received file {file.FileName} ({file.Length} bytes) for {app_name} {app_version}");

    var filePath = Path.Combine(Path.GetTempPath(), file.FileName);
    using var fs = new FileStream(filePath, FileMode.OpenOrCreate);
    await file.CopyToAsync(fs);
    Console.WriteLine($" - Saved to local file: {filePath}");
    return Results.Created();
}).Accepts<IFormFile>("multipart/form-data").Produces(201);

app.Run("http://localhost:6054");
```

Again, using:

```powershell
$form = [ordered]@{
    APPLICATION_NAME    = "Contoso"
    APPLICATION_VERSION = "1.0.0"
    UPLOADED_FILE       = Get-Item $Path
}
```

I'm going to see if I can track down the source to this internal CGI website. |
Apparently that is a recognized way of encoding UTF8 filenames.
|
Good point, @rhubarb-geek-nz, though, strictly speaking, what you cite only applies to

From https://www.rfc-editor.org/rfc/rfc8187, emphasis added:
So, unless I'm misinterpreting the RFCs, in the context of the

At the very least, as the example at hand shows, v7.4+ provides the

Conversely, a file name containing non-ASCII characters causes the

This encoded-word method is also used as needed in the

As for each form-part's actual data, the so-called body area of a part: Even in 7.3

It follows from the above that even a 7.4+ switch from ISO-8859-1 to UTF-8 (if it applies to this use case at all) in the (by definition textual) metadata parts of the request body is a moot point, given that any non-ASCII characters are still (meta-)encoded in 7.4+, as they were in 7.3, so that this metadata is by definition a subset of both ISO-8859-1 and UTF-8.

That said, leaving the default character encoding aside, there are effective differences between 7.3 and 7.4+ in the case at hand:
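For concreteness, the RFC 8187 ext-value notation discussed above percent-encodes the UTF-8 octets of the file name after a `utf-8''` prefix (the optional language tag between the quotes is omitted here). A small Python sketch of that encoding:

```python
from urllib.parse import quote

def rfc8187_ext_value(filename: str) -> str:
    """Build an RFC 8187 ext-value: charset, '', then percent-encoded octets."""
    # Percent-encode the UTF-8 octets, keeping only unreserved characters.
    return "utf-8''" + quote(filename, safe="")

print(rfc8187_ext_value("Motörhead.zip"))  # utf-8''Mot%C3%B6rhead.zip
```

This is the shape behind values like `utf-8''...` seen in the captured `filename*` parameter.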
|
RE

```powershell
[CmdletBinding()]
param(
    [Parameter()]
    [string]
    $Path = "$HOME\Downloads\CascadiaCode.zip",

    [Parameter()]
    [switch]
    $UseFiddlerProxy
)

if ($UseFiddlerProxy) {
    $proxy = [System.Net.WebProxy]::new("http://127.0.0.1:8888", $false)
    $proxy.UseDefaultCredentials = $false
    $clientHandler = [System.Net.Http.HttpClientHandler]::new()
    $clientHandler.Proxy = $proxy
    $hc = [System.Net.Http.HttpClient]::new($clientHandler, $true)
}
else {
    $hc = [System.Net.Http.HttpClient]::new()
}

$fileName = Split-Path $Path -Leaf
$bytes = [System.IO.File]::ReadAllBytes($Path)
$byteContent = [System.Net.Http.ByteArrayContent]::new($bytes)

$form = [System.Net.Http.MultipartFormDataContent]::new()
$form.Add([System.Net.Http.StringContent]::new("contoso-HttpClient"), "APPLICATION_NAME")
$form.Add([System.Net.Http.StringContent]::new("2.0.0"), "APPLICATION_VERSION")
$form.Add($byteContent, "UPLOADED_FILE", "`"$fileName`"")

$t = $hc.PostAsync("http://localhost.:6054/upload", $form)
while (!$t.IsCompleted) { Start-Sleep -Milliseconds 500 }
$t.Result
```

NOTE: I'm experimenting with embedding quotes, which is why the filename value appears in quotes. If I run the

Which, I think, shows some of the differences @mklement0 refers to above. |
This was in #19467. |
Anyone interested to pull PR? I'm ready to quickly review and merge it. |
I finally tracked down the source to the internal website that breaks with PS 7.4. It's written in Perl, ugh. Guess I'll see how good Copilot is at explaining perl code.
This API is what I used above in my HttpClient example. When I examine the request form data generated by this API, I see it generates the |
If your HttpClient example works well with your web site we can use the API fearlessly. |
We can mostly fix the problem by replacing Lines 1807 to 1808 in bfa3dbe

with:

```csharp
result.Headers.ContentDisposition.FileName = $"\"{file.Name}\"";
```

This gets rid of the inappropriate

Additionally, the

must be replaced with:

```csharp
contentDisposition.Name = $"\"{LanguagePrimitives.ConvertTo<string>(fieldName)}\"";
```

This gives us 7.3 behavior back, from what I can tell (except for the 7.4+ addition of header field

Unfortunately, though - as in 7.3 - .NET invariably applies the "encoded-word" encoding method to file names with non-ASCII characters; e.g.,

If we had the choice, I think we should follow

To quote from Appendix A of the current RFC for
It is unfortunate that the .NET issue (dotnet/runtime#23761) was closed as by-design, although that was back in 2018, and there's a recommendation to create a new issue to request specific changes.
|
I tracked down the Perl code for the website. I think these are the relevant bits:

```perl
sub bUploadedDocumentation
{
    my %settings = %{ shift( @_ ) };
    my $ad = $settings{ 'AD' };
    my $bFlag;
    if( param( 'APPLICATION_NAME' ) and param( 'APPLICATION_NAME' ) =~ /\w/
        and param( 'APPLICATION_VERSION' ) and param( 'APPLICATION_VERSION' ) =~ /\w/
        and param( 'UPLOADED_FILE' ) and param( 'UPLOADED_FILE' ) =~ /\w/ ){
        $bFlag = 1;
    }
    return $bFlag;
}

...

if( $bFileTooBig ){
    print errorPage( \%settings, "The uploaded file was too big. " . sprintf( "(%3.1f GB)", $ENV{'CONTENT_LENGTH'}/(1024*1024*1024) ) . "\n" );
} elsif( bShowLog( \%settings ) ){
    $html = showLog( \%settings );
    print $html unless $bRobot;
} elsif( bUploadedDocumentation( \%settings ) && !$bFileTooBig ) { # <<< Upload test is here
    if( bValidUser( \%settings ) ){
        $html = publishDocumentation( \%settings );
        print $html unless $bRobot;
    } else {
        $error = "Invalid user credentials.";
    }
} elsif( bValidUser( \%settings ) ){
    $html = managerView( \%settings );
    print $html unless $bRobot;
} else {
    $html = welcome( \%settings ); # <<< I see this HTML returned
    print $html unless $bRobot;
}
```

I suspect the |
I missed that the

@rkeithhill-keysight, this difference - e.g. |
Hi @mklement0, can you please help with my sample? I've spent a long time getting this working, and since 7.4 it's no longer working.
What's changed in 7.4? Is it considered a bug, or do I need to adopt fixes? |
Rather than manually creating the body of the form, let PowerShell do it for you as per the example from @mklement0
It should then assemble the multipart form and manage the boundaries. |
@MMouse23, if you can, follow @rhubarb-geek-nz's advice, though note that the textual form field will be UTF-8-encoded (see below) and you won't be able to control the media type of the file submission - PowerShell invariably uses

As for what changed: In PowerShell 7.4+, the web cmdlets (

To use ISO-8859-1 encoding in v7.4+, either append

For an example of the latter technique, see #21604 (comment) |
Thanks @MMouse23 and @mklement0 for that script. That allowed me to precisely control the payload. If I take the PS 7.4 payload, recreate it with the script, and simply quote the name values, the upload works. That's with the

I'm still trying to track down whether the Perl CGI module requires the name value to be quoted. That might be a strong-ish argument for making a change, or at the very least providing some sort of parameter to enable quoting. The content size using ISO-8859-1 is about 50% larger than with curl. The curl-encoded file looks to be UTF-8, but when I try UTF-8 the content size is almost double what it is for curl. |
Dang. Spoke too soon. The site is returning the expected response now for an uploaded file, but it is not displaying it afterwards. Probably an issue with the file encoding. Will continue experimenting. |
OK, I undid the first two changes in PR #19467 that removed the quotes from the field-name values, rebuilt PowerShell, and tried IWR again. It works! Not only does it upload successfully, but the website processes the zip file contents correctly. |
The comments on the previous version of

```csharp
// .NET does not enclose field names in quotes, however, modern browsers and curl do.
contentDisposition.Name = "\"" + LanguagePrimitives.ConvertTo<string>(fieldName) + "\"";
```

Maybe it's just me, but if modern browsers AND curl quote, why would PowerShell not want to quote? Yeah, OK, so .NET doesn't seem to. But in this case I'd go with the modern browser/curl behavior. The question now is how to undo this. We could just undo the quoting change and leave the filenamestar change (that seems good). Or will that break folks who have come to depend on those values not being quoted? In which case, I suppose we would want to add a new irm/iwr parameter to control the quoting behavior. Thoughts? |
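For reference, the two header shapes under discussion look like this (a hypothetical helper, just to make the quoted/unquoted difference concrete; it does not model the extra `filename*` parameter 7.4 also emits):

```python
def content_disposition(name: str, filename: str, quote: bool) -> str:
    """Build a multipart Content-Disposition header, quoted or unquoted."""
    if quote:
        return f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'
    return f"Content-Disposition: form-data; name={name}; filename={filename}"

# What browsers and curl send (quoted), vs. what PowerShell 7.4 sends (unquoted):
print(content_disposition("UPLOADED_FILE", "docs.zip", quote=True))
print(content_disposition("UPLOADED_FILE", "docs.zip", quote=False))
```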
BTW I'm still unsure about PS

At the very least, encoding of binary files results in a larger - potentially significantly larger - payload to the server. |
There was a question about this in the dotnet repository and the answer was - open a new issue if you know a scenario where this doesn't work. |
@rkeithhill-keysight: That is curious, because in my experiments PowerShell too dumps the raw bytes into the body (which means the recipient cannot read the body as a whole as UTF-8; that is unlikely to work, given that arbitrary binary data is highly unlikely to be well-formed UTF-8). The size increase in your case is 77%, which cannot be explained by Base64 encoding. So it would be good to understand what happens in your case.
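For scale: Base64 inflates a payload by a fixed factor of 4/3 (about 33%), so a 77% increase points to something other than Base64, such as a lossy text re-encoding that expands high bytes into multi-byte sequences. A quick Python check of the Base64 overhead:

```python
import base64

raw = bytes(300)  # 300 arbitrary input bytes (here: zeros)
encoded = base64.b64encode(raw)

# Base64 maps every 3 input bytes to 4 output characters: 300 -> 400.
print(len(encoded))              # 400
print(len(encoded) / len(raw))   # ~1.33
```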
The textual (non-file-data) part of your request seems to be all-ASCII, so if the website uses ISO-8859-1 decoding, it'll work too. |
That could take a looong time. Also note that we previously already deviated from .NET's behavior, which was a positive (that was regrettably undone in 7.4). Given that the unquoted field-name change is fairly recent, and given that quoting is generally expected, my guess is that undoing the 7.4 changes amounts to a bucket-3 change. In essence, I suggest doing what is described in #23843 (comment), which looks like reverting #19467, including getting rid of the The only downsides:
|
P.S.: Another data point re unquoted field names being problematic: the popular HTTPS proxy / debugger https://mitmproxy.org/ doesn't recognize such form submissions as valid in its GUI (it complains about not being able to parse them and falls back to a raw textual view). |
Interesting curl discussion curl/curl#7789 |
@iSazonov: Interesting, yes, but that's a separate issue that points to a separate problem in PowerShell:
I haven't looked into what constraints the relevant spec places on form names. |
Just wanted to add that Sophos WAF is blocking POST requests because the request doesn't adhere to RFCs ... |
Prerequisites
Steps to reproduce
I have a PowerShell module that uploads a zip file of API documentation to a CGI-based website that has been working untouched since 2019. Unfortunately, when I upgraded our build nodes to 7.4.2, it broke the upload to this website. This might be hard to reproduce because I don't know how much of the behavior depends on the CGI website.
This silently fails (returns a 200 status code). The file does not get uploaded. When I capture this with Fiddler, this is what I see for the request headers:
What is up with the uploaded filename being specified TWICE? And what's with that second funky filename value `utf-8''Klm...`? I suspect this might be the problem. It's also interesting that curl quotes the field names, e.g. `"UPLOADED_FILE"`, whereas 7.4.2 does not. Ditto for the filename value.

Now if I try this with curl (on Windows) - surprise, it works. 🤦‍♂️
This results in these headers:
Expected behavior
I expect that using `$form["UPLOADED_FILE"] = Get-Item $zipPath` should continue to work and produce the correct multi-part form data - particularly the filename.
Actual behavior
The CGI website does not accept the uploaded file ... unless I use `curl`.
Error details
Environment data
Visuals
Bad PowerShell POST request headers:
Working curl POST request headers: