Invoke-WebRequest should save the file name with the name in Content-Disposition #11671
Comments
Currently you must store the response in a variable, which becomes difficult when downloading large files. |
If I understand correctly, you want to specify only the target directory and use the file name from |
I agree that the following would make perfect sense and constitutes a simple enhancement to
This has been asked for twice before - #6618 and #6537 - but was shot down on what appears to me to be flawed reasoning: this is a simple enhancement to an extant feature that makes its use more convenient, not an attempt at "feature creep". @SteveL-MSFT, given that, and given that this is now being requested for the third time, can we revisit this? |
P.S.: Given that the file name would then be controlled by the server of origin, there's an even more pressing need to implement #3174 and to use |
We need a way to get the 'Content-Disposition' value; there is currently no method for that. To get it, you must store the "iwr" result in a variable, which takes up a lot of memory. Sometimes it's hard to derive a file name from the link itself, |
For reference, there is an RFC for a download cmdlet: PowerShell/PowerShell-RFC#124 |
@he852100, yes, the current workaround is both inconvenient and inefficient (I'm not sure whether the header-field parsing below is robust enough):
$url = 'https://github.com/PowerShell/Win32-OpenSSH/releases/download/v8.1.0.0p1-Beta/OpenSSH-Win32.zip'
$response = iwr $url # stores the full file content in memory
$filename = $response.Headers.'Content-Disposition' -replace '.*\bfilename=(.+)(?: |$)', '$1'
$outDir = Convert-Path $pwd
[IO.File]::WriteAllBytes("$outDir/$filename", $response.Content)
Yes, @iSazonov, but that is a separate matter - a dedicated downloading cmdlet with all the bells and whistles is also nice to have. As stated, this is simply about making an existing feature more convenient - arguably, it should have worked that way from the beginning. |
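For what it's worth, the memory cost in the workaround above comes from buffering the whole body before the header is consulted. Below is a minimal C# sketch of reading the headers first and then streaming the body to disk. `HeaderFirstDownload`, `PickFileName`, and `DownloadAsync` are hypothetical names used for illustration only; they are not part of PowerShell or of any PR discussed here, and the chosen name is only stripped of directory parts, not fully sanitized.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class HeaderFirstDownload {
    // Hypothetical helper: prefer the Content-Disposition name, else the last URL path segment.
    public static string PickFileName(HttpResponseMessage response) {
        var disposition = response.Content.Headers.ContentDisposition;
        // FileName keeps surrounding quotes when the header value was quoted, so trim them.
        string name = disposition?.FileNameStar ?? disposition?.FileName?.Trim('"');
        if (string.IsNullOrEmpty(name)) {
            name = response.RequestMessage?.RequestUri?.LocalPath ?? "";
        }
        // Drop any directory part; note that '\' is only treated as a separator on Windows.
        name = Path.GetFileName(name);
        return string.IsNullOrEmpty(name) ? "download" : name;
    }

    // Streams the body to outDir without holding the whole file in memory.
    public static async Task<string> DownloadAsync(HttpClient client, Uri url, string outDir) {
        // ResponseHeadersRead returns as soon as the headers arrive, before the body is read.
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        string path = Path.Combine(outDir, PickFileName(response));
        await using var file = File.Create(path);
        await response.Content.CopyToAsync(file);
        return path;
    }
}
```

A caller would use it roughly as `await HeaderFirstDownload.DownloadAsync(new HttpClient(), new Uri($url), outDir)`; the file name is known before any of the body has been buffered.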
@mklement0 The RFC stems from the fact that the web cmdlets are already very complex - they have over 30 parameters! And now we discuss that |
So it's a good thing that this enhancement doesn't require a new parameter.
The semantics would be the exact same as with No additional complexity, just long-established and well-known patterns. |
@iSazonov Can I transfer You just need to provide this feature; the user determines the processing method based on the variable's value.
|
I should note again that the web cmdlets are very complex. We should avoid adding more complexity. For file-processing scenarios we have @markekraus's RFC. |
As noted, enhancing |
@he852100: The existing I don't see a need for a new parameter that passes just the header through - the proposed enhancement (arguably: fix) requires no new parameters and is the most straightforward solution to the problem. |
Agree that the proposed enhancement and the RFC are not mutually exclusive and downloading the whole file just to get the filename is a pretty bad experience. Overloading |
Thanks, @SteveL-MSFT. Adding a new parameter is still preferable to not implementing this enhancement at all, but please consider this regarding overloading
|
@mklement0 I would support someone implementing it as an experimental feature and so we can get real world usage to validate the experience :) |
@CarloToso has graciously tackled an implementation, in the following PR: New information has since come to light, which requires further discussion:
My suggestion is as follows:
|
Since this is already implemented in #19007, we can merge the PR. It seems to work well without being gated as an experimental feature. |
Fair enough - the |
@mklement0 While we wait for #19007 to be merged, this is the current state of my local branch:
internal static string GetOutFilePath(HttpResponseMessage response, string _qualifiedOutFile)
{
    if (Directory.Exists(_qualifiedOutFile))
    {
        string contentDisposition = response.Content.Headers.ContentDisposition?.FileNameStar ?? response.Content.Headers.ContentDisposition?.FileName;
        string pathAndQuery = response.RequestMessage.RequestUri.PathAndQuery;

        if (!string.IsNullOrEmpty(contentDisposition))
        {
            // Get file name from Content-Disposition header if present
            return Path.Join(_qualifiedOutFile, contentDisposition);
        }
        else if (pathAndQuery != "/")
        {
            string lastUriSegment = System.Net.WebUtility.UrlDecode(response.RequestMessage.RequestUri.Segments[^1]);
            string requestUri = "CURL_URI_STRING"; // We get the last segment, must be sanitized

            // Should I use ValueTuple<bool, bool> or (bool, bool) ?
            ValueTuple<bool, bool> test = (lastUriSegment.Contains('.'), requestUri.Contains('.'));

            // Is this easily understandable?
            string result = test switch
            {
                (true, _) or
                (false, false) => lastUriSegment,
                (false, true) => requestUri,
            };

            // I need better names for test and result
            // Get file name from last segment of Uri
            return Path.Join(_qualifiedOutFile, result);
        }
        else
        {
            // Warning here?
            // File name not found, use sanitized Host name instead
            return Path.Join(_qualifiedOutFile, response.RequestMessage.RequestUri.Host.Replace('.', '_'));
        }
    }
    else
    {
        return _qualifiedOutFile;
    }
}
Could you please review? |
Thank you, @CarloToso. I'll leave the question about the specifics of the C# code to others (it looks understandable to me), but my thoughts on the logic are:
This presents two challenges:
|
@mklement0 Thank you for your thorough review.
I will wait for @SteveL-MSFT to decide the best course of action before continuing. |
Thanks, @CarloToso.
I do see your point, but with the last-input-URI-segment logic that risk is known, unlike with the other two options.
I do see the parameter-proliferation problem; but if opt-in is desirable (TBD), there's no other way. An opt-in would also allow us to report the then-unknown resulting filename by default, instead of the user needing to remember to ask for it via
Since nothing has been released yet that would allow you to pass a mere directory path to That said, with |
@SteveL-MSFT friendly ping |
I don't see a problem if we try Content-Disposition first and then fall back to the last segment. If a user is not satisfied with this, they can always specify an explicit name, as they do now. |
Oh, finally, thank you for this @CarloToso 🥇 |
@CarloToso The provided code looks more-or-less the same as what Chromium is doing, sans some sanitization: https://github.com/chromium/chromium/blob/11a147047d7ed9d429b53e536fe1ead54fad5053/net/base/filename_util_internal.cc#L219 In general, it would make sense to me to stay close to the behavior of major browsers as much as possible, since that's what most users likely expect as the default. |
@CarloToso Note that both the Content-Disposition filename value and the URL decoded last segment may contain invalid chars (especially / and \, which could lead to nasty path traversal issues, and also the usual set of forbidden characters ( The linked Chromium function has quite a lot of various checks, I'm trying to port the relevant parts to C# now (for an unrelated project), I'll post the final code if I get something usable. |
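One way to guard against the path traversal risk mentioned above, independent of how the name itself is sanitized, is to resolve the final path and verify that it still lies inside the target directory. The sketch below uses a hypothetical helper, `SafeJoin.JoinInside`; it is not taken from Chromium or from the PowerShell code base, and it complements character-level sanitization rather than replacing it.

```csharp
using System;
using System.IO;

public static class SafeJoin {
    // Hypothetical helper: joins an untrusted file name onto a directory and
    // throws if the resolved path would land outside that directory.
    public static string JoinInside(string directory, string untrustedName) {
        if (string.IsNullOrEmpty(untrustedName)) {
            throw new ArgumentException("File name must not be empty.", nameof(untrustedName));
        }

        string root = Path.GetFullPath(directory);
        // Compare against the directory plus a trailing separator so that
        // "/tmp/out" does not accidentally match "/tmp/output".
        string prefix = root.EndsWith(Path.DirectorySeparatorChar)
            ? root
            : root + Path.DirectorySeparatorChar;

        // GetFullPath collapses ".." segments; Combine returns a rooted name unchanged,
        // so an absolute name is caught by the prefix check below as well.
        string candidate = Path.GetFullPath(Path.Combine(root, untrustedName));
        if (!candidate.StartsWith(prefix, StringComparison.Ordinal)) {
            throw new InvalidOperationException("Resolved path escapes the target directory.");
        }
        return candidate;
    }
}
```

With this check in place, a server-supplied name such as `../evil.txt` is rejected instead of being written next to the target directory.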
This is what I have now (Windows-only, although only minor modifications should be needed to make it work on Linux); it seems to work reasonably intuitively in my testing:
Source code
using System;
using System.IO;
using System.Linq;
using System.Net.Http.Headers;
using System.Text.RegularExpressions;
using System.Web;

namespace Pog.Utils.Http;

/// <summary>
/// Utility class for getting the server-provided file name for a downloaded file.
/// </summary>
public static class HttpFileNameParser {
    /// <param name="resolvedUri">Resolved URI (after redirects) of the downloaded file.</param>
    /// <param name="contentDisposition">The Content-Disposition header from the server response, if any.</param>
    public static string GetDownloadedFileName(Uri resolvedUri, ContentDispositionHeaderValue? contentDisposition) {
        // link to a similar function in Chromium:
        // https://github.com/chromium/chromium/blob/11a147047d7ed9d429b53e536fe1ead54fad5053/net/base/filename_util_internal.cc#L219
        // Chromium also takes into account the mime type of the response, we're not doing that for now

        // try available names in the order of preference, use the first one found that is valid
        string? fileName = null;

        // if Content-Disposition is set and valid, use the specified filename
        fileName ??= SanitizeDownloadedFileName(GetRawDispositionFileName(contentDisposition));

        // use the last segment of the resolved URL
        var lastSegment = resolvedUri.Segments.LastOrDefault();
        fileName ??= SanitizeDownloadedFileName(HttpUtility.UrlDecode(lastSegment));

        // use the hostname
        fileName ??= SanitizeDownloadedFileName(resolvedUri.Host);

        // fallback default name
        fileName ??= "download";

        return fileName;
    }

    private static string? GetRawDispositionFileName(ContentDispositionHeaderValue? contentDisposition) {
        if (contentDisposition is not {DispositionType: "attachment"}) {
            return null;
        }

        var headerVal = contentDisposition switch {
            {FileNameStar: not null} => contentDisposition.FileNameStar,
            {FileName: not null} => contentDisposition.FileName,
            _ => null,
        };

        if (headerVal == null) {
            return null;
        }

        if (headerVal.StartsWith("\"") && headerVal.EndsWith("\"")) {
            // ContentDispositionHeaderValue parser leaves the quotes in the parsed filename, strip them
            headerVal = headerVal.Substring(1, headerVal.Length - 2);
        }
        return headerVal;
    }

    private static readonly Regex InvalidDosNameRegex = new Regex(@"^(CON|PRN|AUX|NUL|COM\d|LPT\d)(\..+)?$",
            RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    private static string? SanitizeDownloadedFileName(string? fileName) {
        // sources for relevant functions in Chromium:
        // GetFileNameFromURL: https://github.com/chromium/chromium/blob/11a147047d7ed9d429b53e536fe1ead54fad5053/net/base/filename_util_internal.cc#L119
        // GenerateSafeFileName: https://github.com/chromium/chromium/blob/bf9e98c98e8d7e79befeb057fde42b0e320d9b19/net/base/filename_util.cc#L163
        // SanitizeGeneratedFileName: https://github.com/chromium/chromium/blob/11a147047d7ed9d429b53e536fe1ead54fad5053/net/base/filename_util_internal.cc#L79
        // list of invalid filenames on Windows: https://stackoverflow.com/a/62888

        if (fileName == null) {
            return null;
        }

        // Win32 does not like trailing '.' and ' ', remove it
        fileName = fileName.TrimEnd('.', ' ');
        // replace any invalid characters with _
        fileName = string.Join("_", fileName.Split(Path.GetInvalidFileNameChars()));

        if (InvalidDosNameRegex.IsMatch(fileName)) {
            // is a DOS file name, prefix with _ to lose the special meaning
            fileName = "_" + fileName;
        }

        // if fileName is empty or only consists of invalid chars (or _), skip it
        if (fileName.All(c => c == '_')) {
            return null;
        }
        return fileName;
    }
}
Unit tests
using System;
using System.Net.Http.Headers;
using Pog.Utils.Http;
using Xunit;

namespace Pog.Tests;

// list of test cases for Content-Disposition parsing: http://test.greenbytes.de/tech/tc2231/
public class HttpFileNameParserTests {
    [Fact]
    public void TestDosNames() {
        TestSingle("_aux", "https://example.com/aux");
        TestSingle("_AUX.txt", "https://example.com/AUX.txt");
        TestSingle("_COM9", "https://example.com/COM9");
    }

    [Fact]
    public void TestPriority() {
        TestSingle("header", "https://host/segment", "attachment; filename=header");
        TestSingle("segment", "https://host/segment");
        TestSingle("segment", "https://host/segment", "attachment; filename=_");
        TestSingle("host", "https://host/");
        TestSingle("host", "https://host/_", "attachment; filename=_");
    }

    [Fact]
    public void TestSanitization() {
        // filename*
        TestSingle("hello_world.txt", "https://host/segment", "attachment; filename*=utf-8''hello%2Fworld.txt");
        // filename
        TestSingle("hello_world.txt", "https://host/segment", "attachment; filename=\"hello/world.txt\"");
        // segment
        TestSingle(".._hello_world.txt", "https://host/..%2Fhello%2Fworld.txt");
        TestSingle("invalid_chars_.txt", "https://host/invalid%2fchars%0a.txt");
    }

    private static void TestSingle(string expected, string url, string? dispositionHeader = null) {
        var disposition = dispositionHeader == null ? null : ContentDispositionHeaderValue.Parse(dispositionHeader);
        Assert.Equal(expected, HttpFileNameParser.GetDownloadedFileName(new Uri(url), disposition));
    }
}
|
This remains a valid issue that should be addressed. |
How to get a file name from the server when downloading a file?
The server provides a Content-Disposition header. How can I get it and use it when downloading files?
Steps to reproduce
Expected behavior
Actual behavior
Environment data