Using Invoke-WebRequest POST to upload a file is broken #23843

Open
5 tasks done
rkeithhill-keysight opened this issue May 25, 2024 · 35 comments
Labels
Needs-Triage The issue is new and needs to be triaged by a work group. Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors

Comments

@rkeithhill-keysight

rkeithhill-keysight commented May 25, 2024

Prerequisites

Steps to reproduce

I have a PowerShell module that uploads a zip file of API documentation to a CGI-based website that has been working untouched since 2019. Unfortunately, when I upgraded our build nodes to 7.4.2, it broke the upload to this website. This might be hard to reproduce because I don't know how much of the behavior depends on the CGI website.

  1. Construct the IWR args and call it like so:
$uri = "https://api.is.acme.com/cgi-bin/org/of_apps/apiManager/apiManager.cgi"

$form = [ordered]@{
    USERNAME = $UserName
    API_KEY = $ApiKey
    APPLICATION_NAME = $AppName
    APPLICATION_VERSION = $Version
    APPLICATION_VERSION_ALIAS = $VersionAlias
    DELETE_EXISTING_VERSION = $AllowClobber ? "true" : "false"
    ROBOT = "true"
    UPLOADED_FILE = Get-Item $Path # this is a zip file
}

Invoke-WebRequest -Uri $uri -Method POST -Form $form -SkipCertificateCheck

This silently fails (returns a 200 status code). The file does not get uploaded. When I capture this with Fiddler, this is what I see for the request headers:
image

What is up with the uploaded filename being specified TWICE? And what's with that second funky filename value utf-8''Klm...? I suspect this might be the problem. It's also interesting that curl quotes the field names e.g. "UPLOADED_FILE" whereas 7.4.2 does not. Ditto for the filename value.

Now if I try this with curl (on Windows) - surprise, it works. 🤦‍♂️

curl --insecure -F USERNAME=$username -F API_KEY=$apiDocsKey -F APPLICATION_NAME="KAL License Management Client API - Cplusplus" -F APPLICATION_VERSION=7.4.0-develop -F APPLICATION_VERSION_ALIAS=LATEST-develop -F UPLOADED_FILE=@C:\Temp\KlmCppReference.zip -F ROBOT=true https://api.is.keysight.com/cgi-bin/org/of_apps/apiManager/apiManager.cgi -x 127.0.0.1:8888

This results in these headers:
image

Expected behavior

I expect that using `$form["UPLOADED_FILE"] = Get-Item $zipPath` should continue to work and produce the correct multi-part form data - particularly the filename.

Actual behavior

The CGI website does not accept the uploaded file ... unless I use `curl`.

Error details

There is no error from Invoke-WebRequest. Even the HTTP status code is 200 in both failed & working cases.  The difference is that when it fails, I get the home HTML as the raw content.  When it works I get the somewhat cryptic content:

1
1
0

🤷‍♂️ I didn't write this CGI website.

Environment data

Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

Bad PowerShell POST request headers:
image

Working curl POST request headers:
image

@rkeithhill-keysight rkeithhill-keysight added the Needs-Triage The issue is new and needs to be triaged by a work group. label May 25, 2024
@rhubarb-geek-nz

We have seen similar problems where binary data, e.g. your zip file, is being encoded as text. A change in 7.4.* is that the encoding now defaults to UTF-8.

Have a look at #21604

You may have better success using .NET classes to create the body

@rkeithhill-keysight
Author

rkeithhill-keysight commented May 25, 2024

Verified that this works fine in 7.3.11. I think for now, we'll roll back to 7.3.11 on all our build nodes.

@mklement0
Contributor

mklement0 commented May 26, 2024

I don't know what causes the problem, but here's some more context:

Re whether or not what follows filename= should be double-quoted, from https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition:

Warning: The string following filename should always be put into quotes; but, for compatibility reasons, many browsers try to parse unquoted names that contain spaces.

Re the additional filename*= attribute:

The parameters filename and filename* differ only in that filename* uses the encoding defined in RFC 5987. When both filename and filename* are present in a single header field value, filename* is preferred over filename when both are understood.

However, it is RFC 6266 that specifically mentions the utf-8''... notation.

This notation specifies the encoding of the file name only, and is unrelated to the content type of the form field, so the 7.4.0 change to defaulting text-based request bodies to UTF-8 does not explain the problem.


@rhubarb-geek-nz

I don't know what causes the problem:

From the previous issues it looks like trying to pass binary data off as text does not work when the character set encodings are not what you expected. So the solution is don't try to do that, use the .NET API to properly package binary data in the multipart form data.

@mklement0
Contributor

mklement0 commented May 27, 2024

That is not the case here: as you can see from the screenshot, the relevant part has Content-Type: application/octet-stream, as in the curl case, and the binary data that follows looks the same in both screenshots.

The case at hand is fundamentally the same as example 6 in the Invoke-WebRequest help topic: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-7.4#example-6-simplified-multipart-form-data-submission


@rhubarb-geek-nz

rhubarb-geek-nz commented May 27, 2024

That is not the case here: as you can see from the screenshot, the relevant part has Content-Type: application/octet-stream, as in the curl case, and the binary data that follows looks the same in both screenshots.

Without a byte-by-byte hex dump of the data or a hash I cannot say they are the same. It appears a change between 7.3 and 7.4 has broken something in this area. The content-type for the multi-item-part does say octet-stream, but is the actual data correct?

@mariuselix

@rkeithhill-keysight did you by any chance try to explicitly set the encoding to see if it helps?

I've encountered an extremely similar issue with Invoke-RestMethod (PUT / POST), exactly when switching from PS v7.3.x to v7.4.x, apparently due to the switch to a default encoding of UTF-8. However, in that situation, if we explicitly set the encoding to something like this, everything worked fine. Might be worth a try.
Invoke-WebRequest .... -ContentType "application/octet-stream; charset=iso-8859-1"

@mklement0
Contributor

@rhubarb-geek-nz: I inferred it from the presence of the following char. in the screenshots, which wouldn't render the same if ISO-8859-1 were used in one case vs. UTF-8 in the other:
image

To confirm this, I've since run a test which indicates that - at least when targeting https://postman-echo.com/post - the binary file content is correctly round-tripped, in v7.3.10, v7.4.2, and v7.5.0-preview.2:

# Create a test file with 2 bytes that are the ISO-8859-1-encoding of 'AÄ'
# Trying to read this file as UTF-8 replaces the invalid-as-UTF-8 byte 0xC4
# with the Unicode REPLACEMENT CHARACTER, U+FFFD, which usually renders as
# a "?" inside a diamond.
[byte[]] (0x41, 0xC4) | Set-Content test.bin -AsByteStream 

$url = 'https://postman-echo.com/post'
$filePath = 'test.bin'
$form = [ordered] @{ 
  File = Get-Item -LiteralPath $filePath
  Motörhead = 'Hüsker Dü'
}

# Submit via Invoke-RestMethod, which makes postman-echo.com
# return a JSON response that describes the original request.  
($r = 
  Invoke-RestMethod -Form $form -Uri $url -Method Post
) | Out-Host

$fileSubmission = $r.Files.'test.bin'
$base64FileContent = ($fileSubmission -split ',')[-1]
[pscustomobject] @{
  'test.bin file submission' = $fileSubmission
  'Base64-encoding of the content' = $base64FileContent
  'Content as byte values' = [Convert]::FromBase64String($base64FileContent).ForEach({ '0x{0:X}' -f $_ }) -join ' '
}

The result shows that the two bytes were round-tripped correctly:

test.bin file submission                  Base64-encoding of the content Content as byte values
------------------------                  ------------------------------ ----------------------
data:application/octet-stream;base64,QcQ= QcQ=                           0x41 0xC4

@rkeithhill
Collaborator

I did a similar test with a .NET minimal web api, and the binary file contents do get uploaded correctly:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/upload", async Task<IResult> (HttpRequest request) =>
{
    IFormCollection form = await request.ReadFormAsync();
    IFormFile? file = form.Files.FirstOrDefault();
    if (file == null) return Results.BadRequest("There is no file to upload!");

    string app_name = form["APPLICATION_NAME"]!;
    string app_version = form["APPLICATION_VERSION"]!;
    Console.WriteLine($"Received file {file.FileName} ({file.Length} bytes) for {app_name} {app_version}");

    var filePath = Path.Combine(Path.GetTempPath(), file.FileName);
    using var fs = new FileStream(filePath, FileMode.OpenOrCreate);
    await file.CopyToAsync(fs);

    Console.WriteLine($" - Saved to local file: {filePath}");
    return Results.Created();
}).Accepts<IFormFile>("multipart/form-data").Produces(201); 

app.Run("http://localhost:6054");

Again, using Get-Item $Path to get the file's contents for the upload. Here's the relevant part of the iwr.ps1 script:

$form = [ordered]@{
    APPLICATION_NAME = "Contoso"
    APPLICATION_VERSION = "1.0.0"
    UPLOADED_FILE = Get-Item $Path
}

image

I'm going to see if I can track down the source to this internal CGI website.

@rhubarb-geek-nz

What is up with the uploaded filename being specified TWICE?

form-data: name=UPLOADED_FILE; filename=KlmCppReference.zip;   filename*=utf-8''KlmCppReference.zip

Apparently that is a recognized way of encoding UTF8 filenames.

Content-Disposition

The parameters filename and filename* differ only in that filename* uses the encoding defined in RFC 5987. When both filename and filename* are present in a single header field value, filename* is preferred over filename when both are understood.

@mklement0
Contributor

mklement0 commented May 28, 2024

Good point, @rhubarb-geek-nz, though, strictly speaking, what you cite only applies to Content-Disposition in HTTP header fields, not also to multipart/form-data request bodies.

From https://www.rfc-editor.org/rfc/rfc8187, emphasis added:

Note: This encoding [the utf-8''... form] does not apply to message payloads transmitted over HTTP, such as when using the media type "multipart/form-data" ([RFC7578]).

So, unless I'm misinterpreting the RFCs, in the context of the multipart/form-data media type the request body should not use filename* parameters - though in practice, recipients may honor it.

At the very least, as the example at hand shows, v7.4+ provides the filename* parameter even in cases where it is redundant, namely when the file name is composed solely of ASCII characters.

Conversely, a file name containing non-ASCII characters causes the filename parameter to use a different (meta-)encoding, even in v7.3, the so-called encoded-word method; e.g., file name tü.bin is encoded as
filename="=?utf-8?B?dMO8LmJpbg==?="
vs. - in v7.4+ only, additionally -
filename*=utf-8''t%C3%BC.bin

This encoded-word method is also used as needed in the name parameters of the form-data parts.
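
The two encodings of the tü.bin example above can be reproduced by hand. This is only a sketch of the notations themselves, not of how .NET or PowerShell builds the header internally; the variable names are illustrative, and the ü is written as `[char]0xFC` so that the snippet does not depend on the script file's own encoding.

```powershell
# Sample file name from the example above: "tü.bin" ([char]0xFC is 'ü').
$fileName  = "t$([char]0xFC).bin"
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($fileName)

# RFC 2047 "encoded-word": =?charset?B?<Base64 of the encoded bytes>?=
$encodedWord = '=?utf-8?B?{0}?=' -f [Convert]::ToBase64String($utf8Bytes)

# RFC 5987 / RFC 8187 extended value: charset'language'<percent-encoded bytes>
# (the language tag between the two single quotes is left empty)
$extValue = "utf-8''{0}" -f [uri]::EscapeDataString($fileName)

"filename=`"$encodedWord`""
"filename*=$extValue"
```

Both output lines should match the filename and filename* values quoted above, which shows that the two parameters carry the same UTF-8 bytes, just wrapped in different meta-encodings.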

As for each form-part's actual data, the so-called body area of a part:

Even in 7.3 Content-Type: text/plain; charset=utf-8 is applied to non-file-upload form parts, and Content-Type: application/octet-stream to the file-upload form part, and the fact that the file-upload part works with arbitrary binary data in 7.3 implies that the associated data was correctly encoded; similarly, non-file-upload form data containing non-ASCII characters works correctly in 7.3, which implies that UTF-8 encoding was correctly embedded.


It follows from the above that even a 7.4+ switch from ISO-8859-1 to UTF-8 (if it applies to this use case at all) in the (by definition textual) metadata parts of the request body is a moot point, given that any non-ASCII characters are still (meta-)encoded in 7.4+, as they were in 7.3, so that this metadata is by definition a subset of both ISO-8859-1 and UTF-8.


That said, leaving the default character encoding aside, there are effective differences between 7.3 and 7.4+ in the case at hand:

  • v7.4+ adds an Accept-Encoding: gzip, deflate, br header field that is missing altogether in v7.3.

  • v7.4+ inexplicably represents values of the name and filename parameters without double-quoting if they are ASCII-only, space-less strings; e.g. v7.3 filename="t.bin" becomes filename=t.bin in v7.4+

    • It is the MDN topic that @rhubarb-geek-nz linked to, https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition, that contains the following warning (emphasis added):
      • Warning: The string following filename should always be put into quotes; but, for compatibility reasons, many browsers try to parse unquoted names that contain spaces.

    • While I could not find anything in the multipart/form-specific RFC, RFC7578, the examples there use double-quoting even for space-less ASCII-only values; on the flip side, section 4.1 of RFC6266 - for use of Content-Disposition in HTTP header fields - suggests that unquoted values are acceptable too.
  • in v7.4+, as noted, a filename* parameter is always present, even though (a) it arguably shouldn't be there at all, as argued above, and (b) leaving that aside, it is conceptually only needed for values with non-ASCII chars.


@rkeithhill
Collaborator

rkeithhill commented May 28, 2024

RE filename*, see this comment on a .NET runtime issue. I think this may be more of a .NET issue (see the .NET Runtime issue comment). I wrote an HttpClient impl and it behaves the exact same way as PS 7.4.x, except for the missing Content-Type: application/octet-stream. And I don't think that is the issue because curl sets that as well.

[CmdletBinding()]
param(
    [Parameter()]
    [string]
    $Path = "$HOME\Downloads\CascadiaCode.zip",

    [Parameter()]
    [switch]
    $UseFiddlerProxy
)

if ($UseFiddlerProxy) {
    $proxy = [System.Net.WebProxy]::new("http://127.0.0.1:8888", $false)
    $proxy.UseDefaultCredentials = $false
    $clientHandler = [System.Net.Http.HttpClientHandler]::new()
    $clientHandler.Proxy = $proxy
    $hc = [System.Net.Http.HttpClient]::new($clientHandler, $true)
}
else {
    $hc = [System.Net.Http.HttpClient]::new()
}

$fileName = Split-Path $Path -Leaf
$bytes = [System.IO.File]::ReadAllBytes($Path)
$byteContent = [System.Net.Http.ByteArrayContent]::new($bytes)

$form = [System.Net.Http.MultipartFormDataContent]::new()
$form.Add([System.Net.Http.StringContent]::new("contoso-HttpClient"), "APPLICATION_NAME")
$form.Add([System.Net.Http.StringContent]::new("2.0.0"), "APPLICATION_VERSION")
$form.Add($byteContent, "UPLOADED_FILE", "`"$fileName`"")

$t = $hc.PostAsync("http://localhost.:6054/upload", $form)
while (!$t.IsCompleted) { Start-Sleep -Milliseconds 500 }
$t.Result

image

NOTE: I'm experimenting with embedding quotes which is why the filename value appears in quotes.

If I run the Invoke-WebRequest code on PS 7.3.11 (in Windows Sandbox), this is the Fiddler output I get:
image

Which, I think, shows some of the differences @mklement0 refers to above.

@iSazonov
Collaborator

  • v7.4+ inexplicably represents values of the name and filename parameters without double-quoting if they are ASCII-only, space-less strings; e.g. v7.3 filename="t.bin" becomes filename=t.bin in v7.4+

This was in #19467.
From the comments on developer.mozilla.org this was acceptable for browsers, but I missed that the problem might be on the server side.
Apparently the code was too old. It would be more correct to rely on .Net. Now we could use public void Add(HttpContent content, string name, string fileName)

@iSazonov iSazonov added the Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors label May 29, 2024
@iSazonov
Collaborator

Is anyone interested in opening a PR? I'm ready to quickly review and merge it.

@rkeithhill
Collaborator

I finally tracked down the source to the internal website that breaks with PS 7.4. It's written in Perl, ugh. Guess I'll see how good Copilot is at explaining perl code.

It would be more correct to rely on .Net. Now we could use public void Add(HttpContent content, string name, string fileName)

This API is what I used above in my HttpClient example. When I examine the request form data generated by this API, I see it generates the filename* field as well, even though the filename is ascii chars only. Also, the .NET API does not quote the form data name values either.

@iSazonov
Collaborator

Also, the .NET API does not quote the form data name values either.

If your HttpClient example works well with your web site we can use the API fearlessly.

@mklement0
Contributor

mklement0 commented May 29, 2024

We can mostly fix the problem by replacing

result.Headers.ContentDisposition.FileName = file.Name;
result.Headers.ContentDisposition.FileNameStar = file.Name;

with

            result.Headers.ContentDisposition.FileName = $"\"{file.Name}\"";

This gets rid of the inappropriate FileName* parameter, and encloses the FileName parameter value in "...".

Additionally, the name parameter values must be double-quoted (again) too:

must be replaced with:

            contentDisposition.Name = $"\"{LanguagePrimitives.ConvertTo<string>(fieldName)}\"";

This gives us 7.3 behavior back, from what I can tell (except for the 7.4+ addition of header field Accept-Encoding: gzip, deflate, br)
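
The effect of the quoting can be checked from PowerShell itself by driving the same .NET header types directly. This is a minimal sketch under assumptions, not the cmdlet's actual code path: the field name, file name, and bytes are stand-ins, and it assumes PowerShell 7, where the System.Net.Http types are already loaded.

```powershell
# Stand-in file content (hypothetical values).
[byte[]] $bytes = 0x41, 0x42
$part = [System.Net.Http.ByteArrayContent]::new($bytes)
$part.Headers.ContentType =
    [System.Net.Http.Headers.MediaTypeHeaderValue]::new('application/octet-stream')

# Embed the double quotes in the values, as in the proposed C# change,
# and leave FileNameStar unset so that no filename* parameter is emitted.
$cd = [System.Net.Http.Headers.ContentDispositionHeaderValue]::new('form-data')
$cd.Name     = '"UPLOADED_FILE"'
$cd.FileName = '"t.bin"'
$part.Headers.ContentDisposition = $cd

# Add(HttpContent) keeps a Content-Disposition header that is already set.
$form = [System.Net.Http.MultipartFormDataContent]::new()
$form.Add($part)

# Render the multipart body as text for inspection.
$bodyText = $form.ReadAsStringAsync().GetAwaiter().GetResult()
$bodyText
```

The part's Content-Disposition line in the output should show name="UPLOADED_FILE" and filename="t.bin", both quoted and without a filename* parameter.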

Unfortunately, though - as in 7.3 - .NET invariably applies the "encoded-word" encoding method to file names with non-ASCII characters; e.g., tü.bin turns into filename="=?utf-8?B?dMO8LmJpbg==?="

If we had the choice, I think we should follow curl's model, which simply uses UTF-8 encoding as-is.

To quote from Appendix A of the current RFC for multipart/form-data, https://tools.ietf.org/html/rfc7578#appendix-A, emphasis added:

The handling of non-ASCII field names has changed -- the method described in RFC 2047 is no longer recommended; instead, it is suggested that senders send UTF-8 field names directly and that file names be sent directly in the form-charset.

  • The RFC 2047 method mentioned is the "encoded-word" method that .NET, unfortunately, uses.

  • The "form-charset", in the absence of an HTML page that could provide a different charset, should default to UTF-8 too, according to section 5.1 of the same RFC, https://datatracker.ietf.org/doc/html/rfc7578#section-5.1.2. The only (unlikely) exception in our non-HTML use case is if a form has a field named _charset_, i.e. a separate part (field) whose data specifies a different character encoding (charset).

It is unfortunate that the .NET issue (dotnet/runtime#23761) was closed as by-design, although that was back in 2018, and there's a recommendation to create a new issue to request specific changes.


@rkeithhill-keysight
Author

rkeithhill-keysight commented May 30, 2024

I tracked down the Perl code for the website. I think these are the relevant bits:

sub bUploadedDocumentation
{
   my %settings = %{ shift( @_ ) };  
   my $ad = $settings{ 'AD' };
   my $bFlag;

   if( param( 'APPLICATION_NAME' ) and param( 'APPLICATION_NAME' ) =~ /\w/
   and param( 'APPLICATION_VERSION' ) and param( 'APPLICATION_VERSION' ) =~ /\w/
   and param( 'UPLOADED_FILE' ) and param( 'UPLOADED_FILE' ) =~ /\w/ ){
      $bFlag = 1;
   }

   return $bFlag;
}

...

      if( $bFileTooBig ){
         print errorPage( \%settings, "The uploaded file was too big. " . sprintf( "(%3.1f GB)", $ENV{'CONTENT_LENGTH'}/(1024*1024*1024) ) . "\n" );
      } elsif( bShowLog( \%settings ) ){
         $html = showLog( \%settings );
         print $html unless $bRobot;
      } elsif( bUploadedDocumentation( \%settings ) && !$bFileTooBig ) {  # <<<  Upload test is here
         if( bValidUser( \%settings ) ){
            $html = publishDocumentation( \%settings );
            print $html unless $bRobot;
         } else {
            $error = "Invalid user credentials.";
         }
      } elsif( bValidUser( \%settings ) ){
         $html = managerView( \%settings );
         print $html unless $bRobot;
      } else {
         $html = welcome( \%settings ); # <<< I see this HTML returned
         print $html unless $bRobot;
      }

I suspect the bUploadedDocumentation sub does not return a 1 which would cause the website to return the welcome HTML, which is exactly what I see from IWR on 7.4.

@mklement0
Contributor

I missed that the name parameter values too must be double-quoted, and I've updated my previous comment accordingly.


@rkeithhill-keysight, this difference - e.g. name=APPLICATION_VERSION (7.4+) vs. name="APPLICATION_VERSION" (7.3.x) may make the difference with respect to whether param( 'APPLICATION_VERSION' ) in the Perl code is able to retrieve the data associated with the given parameter correctly.

@MMouse23

MMouse23 commented Jun 8, 2024

Hi @mklement0,

Can you please help with my sample? I've spent a long time getting this working, and since 7.4 it's no longer working.

    write-log -message "Starting the Encoder"
  
    $Encoder = [System.Text.Encoding]::GetEncoding("ISO-8859-1")
  
    write-log -message "Reading File Bites"
  
    $FileBody = [System.IO.File]::ReadAllBytes($IconPath)
  
    write-log -message "Encoding the body"
    
    $EncodedFileBody = $Encoder.GetString($FileBody)
  
    $Lf = "`r`n" #Variable for carriage returns.
    $Boundary = (new-guid).guid
  
    $Body = (
      "--$Boundary",
      "Content-Disposition: form-data; name=`"image`"; filename=`"blob`"",
      "Content-Type: image/jpeg$Lf",
      $EncodedFileBody,
      "--$Boundary",
      "Content-Disposition: form-data; name=`"name`"$Lf",
      $IconName,
      "--$Boundary--"
    ) -join $Lf

    $RequestPayload = @{
      Uri         = "https://$($PcClusterIP):9440/api/nutanix/v3/app_icons/upload"
      Method      = "Post"
      Body        = $Body
      Headers     = $AuthHeader
      ContentType = "multipart/form-data;boundary =$Boundary"
    }

What's changed in 7.4? Is it considered a bug, or do I need to adopt fixes?

@rhubarb-geek-nz

What's changed in 7.4? Is it considered a bug, or do I need to adopt fixes?

Rather than manually creating the body of the form, let PowerShell do it for you as per the example from @mklement0

$url = 'https://postman-echo.com/post'
$filePath = 'test.bin'
$form = [ordered] @{ 
  File = Get-Item -LiteralPath $filePath
  Motörhead = 'Hüsker Dü'
}

# Submit via Invoke-RestMethod, which makes postman-echo.com
# return a JSON response that describes the original request.  
($r = 
  Invoke-RestMethod -Form $form -Uri $url -Method Post
) | Out-Host

It should then assemble the multipart form and manage the boundaries.

@mklement0
Contributor

@MMouse23, if you can, follow @rhubarb-geek-nz's advice, though note that the textual form field will be UTF-8-encoded (see below) and you won't be able to control the media type of the file submission - PowerShell invariably uses application/octet-stream

As for what changed:

In PowerShell 7.4+, the web cmdlets (Invoke-WebRequest, Invoke-RestMethod) consistently encode text-based request bodies as UTF-8, unless explicitly specified otherwise; previously, and still in Windows PowerShell, the default was ISO-8859-1, except for JSON, which has been UTF-8-encoded since v7.0.

To use ISO-8859-1 encoding in v7.4+, either append ; charset=iso-8859-1 to the -ContentType argument or manually encode the request body as ISO-8859-1 with [System.Text.Encoding]::Latin1.GetBytes() and pass the resulting [byte[]] array directly to -Body.

For an example of the latter technique, see #21604 (comment)
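
A minimal sketch of that second technique for a hand-built multipart body (this is not the code from the linked comment). Only the textual framing is encoded with Latin1; the file bytes are copied into the body untouched, so no text decoding or re-encoding of binary data takes place. The field name, file name, media type, and sample bytes are assumptions modeled on the script above, and `$uri` and `$AuthHeader` in the commented-out call are placeholders.

```powershell
$boundary = [guid]::NewGuid().ToString()
$crlf     = "`r`n"
$latin1   = [System.Text.Encoding]::Latin1

# Stand-in for [System.IO.File]::ReadAllBytes($IconPath); includes non-ASCII bytes.
[byte[]] $fileBytes = 0x41, 0xC4, 0xFF

# Textual framing around the binary part (ASCII-only here).
$header = "--$boundary$crlf" +
          'Content-Disposition: form-data; name="image"; filename="blob"' + $crlf +
          "Content-Type: image/jpeg$crlf$crlf"
$footer = "$crlf--$boundary--$crlf"

# Concatenate framing bytes and raw file bytes into one byte array.
$stream = [System.IO.MemoryStream]::new()
$chunks = @($latin1.GetBytes($header), $fileBytes, $latin1.GetBytes($footer))
foreach ($chunk in $chunks) {
    $stream.Write($chunk, 0, $chunk.Length)
}
[byte[]] $body = $stream.ToArray()

# Passing a [byte[]] to -Body sends it as-is ($uri and $AuthHeader are placeholders):
# Invoke-RestMethod -Uri $uri -Method Post -Headers $AuthHeader -Body $body `
#     -ContentType "multipart/form-data; boundary=$boundary"
```

Because the body is already a byte array, the 7.4+ change of the default text encoding should not affect it, and the payload grows only by the size of the framing.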

@rkeithhill-keysight
Author

I wonder if Copilot is correct. I'm beginning to think that the PERL sub bUploadedDocumentation I referenced earlier, is not finding the form fields because the values aren't quoted. Need to find some time to experiment.

image

@rkeithhill-keysight
Author

rkeithhill-keysight commented Jun 27, 2024

Thanks @MMouse23 and @mklement0 for that script. That allowed me to precisely control the payload. If I take the PS 7.4 payload and recreate that with script, and simply quote the name values, the upload works. That's with the charset=utf-8 and the funky filename* field and not quoting the actual filename. That is, this payload works:

image

I'm still trying to track down if the PERL CGI module requires the name value to be quoted. That might be a strong-ish argument for making a change or at the very least, providing some sort of parameter to enable quoting.

The content size using ISO-8859-1 is about 50% larger than using CURL. The CURL encoded file looks to be UTF-8 but when I try UTF-8, the content size is almost double what it is for CURL.

@rkeithhill-keysight
Author

Dang. Spoke too soon. The site is returning the expected response now for an uploaded file, but it is not displaying it afterwards. Probably an issue with the file encoding. Will continue experimenting.

@rkeithhill-keysight
Author

rkeithhill-keysight commented Jun 28, 2024

OK, I undid the first two changes in this PR #19467 that removes the quotes from the field name values, rebuilt PowerShell and tried IWR again. It works! It not only uploads successfully but the website processes the zip file contents correctly.

@rkeithhill
Collaborator

The comments on the previous version of src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/Common/WebRequestPSCmdlet.Common.cs that quotes the field name values mentions this about quoting:

            // .NET does not enclose field names in quotes, however, modern browsers and curl do.
            contentDisposition.Name = "\"" + LanguagePrimitives.ConvertTo<string>(fieldName) + "\"";

Maybe it's just me, but if modern browsers AND curl quote, why would PowerShell not want to quote? Yeah OK, so .NET doesn't seem to. But in this case, I'd go with the modern browser/curl behavior. The question now is how to undo this. We could just undo the quoting change and leave the filenamestar change (that seems good). Or will that break folks who have come to depend on those values not being quoted? In that case, I suppose we would want to add a new irm/iwr parameter to control the quoting behavior. Thoughts?

@rkeithhill-keysight
Author

BTW I'm still unsure about PS encoding a binary file for upload. I still haven't figured out how the curl content length is so much smaller 637,726 vs my modified, working PS that uses, I suppose utf-8, and has a content length of 1,129,311. FWIW the actual ZIP file's size is 636,390 which leads me to believe that CURL is just dumping the file's bytes into the body (no encoding). But then I'm left wondering how the heck did our PERL/CGI website rehydrate the UTF-8 encoded file contents correctly. 🤔

At the very least, encoding of binary files results in a larger, potentially significantly larger, payload to the server.

@iSazonov
Collaborator

Thoughts?

There was a question about this in the dotnet repository and the answer was - open a new issue if you know a scenario where this doesn't work.
So it's better to fix .Net and use it than to dance around endlessly.

@mklement0
Contributor

@rkeithhill-keysight: That is curious, because in my experiments PowerShell too dumps the raw bytes into the body (which means that a recipient's attempt to read the body as a whole as UTF-8 is unlikely to work, given that arbitrary binary data is highly unlikely to be well-formed UTF-8).

The size increase in your case is 77%, which cannot be explained with Base64 encoding.

So it would be good to understand what happens in your case.
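
For reference, the arithmetic behind the 77% figure, plus one mechanism that inflates binary data by more than Base64 does. This is only a sketch and not a diagnosis of the case at hand: it assumes the body was built as a string (as in the hand-rolled multipart script earlier in the thread) by decoding the bytes as ISO-8859-1 and then sending that string as UTF-8.

```powershell
# Reported content lengths from the thread: modified PowerShell script vs. curl.
$reportedRatio = 1129311 / 637726                                     # roughly 1.77

# Sample input: every possible byte value exactly once.
[byte[]] $raw = 0..255

# Base64 turns every 3 bytes into 4 characters.
$base64Ratio = [Convert]::ToBase64String($raw).Length / $raw.Length   # 344 / 256

# Decoding as ISO-8859-1 maps each byte to one character; re-encoding that
# string as UTF-8 then takes 2 bytes for every character at or above 0x80.
$asText    = [System.Text.Encoding]::Latin1.GetString($raw)
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($asText)
$utf8Ratio = $utf8Bytes.Length / $raw.Length                          # 384 / 256

'{0:N2} reported, {1:N2} Base64, {2:N2} Latin1-to-UTF-8' -f $reportedRatio, $base64Ratio, $utf8Ratio
```

For evenly distributed byte values the string round trip gives about 1.5x, which is larger than the Base64 ratio but still below the reported one, so it can at most be part of the explanation.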

how the heck did our PERL/CGI website rehydrate the UTF-8 encoded file contents correctly. 🤔

The textual (non-file-data) part of your request seems to be all-ASCII, so if the website uses ISO-8859-1 decoding, it'll work too.

@mklement0
Contributor

mklement0 commented Jun 28, 2024

@iSazonov

So it's better to fix .Net

That could take a looong time.

Also note that we previously already deviated from .NET's behavior, which was a positive (that was regrettably undone in 7.4).

Given that the unquoted field-name change is fairly recent, and given that quoting is generally expected, my guess is that undoing the 7.4 changes amounts to a bucket-3 change.

In essence, I suggest doing what is described in #23843 (comment), which looks like reverting #19467, including getting rid of the filename* field, which should never have been added (and .NET should never have implemented support for it, as it doesn't belong in form data).

The only downsides:

  • We'll need to keep an eye on future .NET releases, should they ever implement quoting of fields there.

  • (As before), we cannot escape .NET's unfortunate "encoded-word" double encoding for field and file names with non-ASCII characters (e.g., filename="=?utf-8?B?dMOrc3QuYmlu?=" in lieu of straight UTF-8 encoding of filename="tëst.bin").

@mklement0
Contributor

mklement0 commented Jun 28, 2024

P.S.: Another data point re unquoted field names being problematic: popular HTTPS proxy / debugger https://mitmproxy.org/ doesn't recognize such form submissions as valid in its GUI (it complains about not being able to parse them and falls back to a raw textual view).

@iSazonov
Collaborator

Interesting curl discussion curl/curl#7789

@mklement0
Contributor

mklement0 commented Jun 30, 2024

@iSazonov: Interesting, yes, but that's a separate issue that points to a separate problem in PowerShell:

  • The linked issue - in the context of the double-quoting of form field names that curl invariably performs - is about how to escape characters in the name - notably ", \r, \n (others?) - that require escaping for syntactic reasons (they all appear to be ASCII-range chars.), and curl's (relatively) new behavior is to percent-escape them. Aside from that, curl uses direct UTF-8 encoding (neither percent-encoding of non-ASCII characters nor the encoded-word technique).

    • E.g., curl -F "`"`r`n`t\Motörhead=..." ... turns into name="%22%0D%0A \Motörhead" in the form part.
  • PowerShell's web cmdlets reject such form field names (maybe at the .NET level); trying the above triggers the following error:

    • The format of value '" \Motörhead' is invalid.
    • Curiously:
      • If you remove the embedded ", it works, but only if at least one non-ASCII character is also present.
      • E.g., "`r`n`t\Motörhead" works (triggers encoded-word encoding), but "`r`n`t\Motorhead" (o instead of ö) doesn't.

I haven't looked into what constraints the relevant spec places on form names.

@Khaos66

Khaos66 commented Aug 28, 2024

Just wanted to add that Sophos WAF is blocking POST requests because the request doesn't adhere to RFCs ...
Using curl (or Postman for debugging) works ofc...

8 participants