Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Uri] Does not allow $ in a hostname #36595

Closed
itodirel opened this issue May 16, 2020 · 35 comments
Closed

[Uri] Does not allow $ in a hostname #36595

itodirel opened this issue May 16, 2020 · 35 comments
Milestone

Comments

@itodirel
Copy link

Issue Title

System.Uri cannot parse trailing $ sign in paths

General

We want to store and load data from WSL2 rootfs projection, which is exposed into Windows as "\\wsl$", we use this UNC path internally, to construct and manipulate the path. However, System.Uri cannot parse "\\wsl$" and throws a format exception:

System.UriFormatException: 'Invalid URI: The hostname could not be parsed.'

Example code and repro

var uri = new System.Uri(@"\\wsl$");

@scalablecory scalablecory transferred this issue from dotnet/core May 16, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net untriaged New issue has not been triaged by the area owner labels May 16, 2020
@ghost
Copy link

ghost commented May 16, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@scalablecory
Copy link
Contributor

I don't think I've ever seen that format of URL before. @MihaZupan do we have a validation bug?

@jeffhandley
Copy link
Member

This seems like something we'll want to support; I recommend we triage this into the 5.0 milestone.

@danmoseley
Copy link
Member

https://devblogs.microsoft.com/commandline/whats-new-for-wsl-in-windows-10-version-1903/

Why is the WSL resource name in the filepath called wsl$? Since wsl is a short acronym we realize that some resources on networks may already have that name. So we’ve added a dollar sign, since a machine name can’t have a dollar sign in it, which ensures that the name will be accessible with any existing network configuration.

Seems we already have special hostname validation just for UNC's. https://source.dot.net/#System.Private.Uri/System/UncNameHelper.cs,68

@MihaZupan
Copy link
Member

This is an issue with that part of a UNC path being the host - $ in the host name will not be recognized.

The validation fails in IsValidDomainLabelCharacter for http Uris or UncNameHelper.IsValid for UNC paths.

This stems from the fact that System.Uri validates hostnames as a Server-based Naming Authority for DNS resolution and not by the less-strict, generic Registry-based Naming Authority rules as defined in RFC 2396 and 3986.

@MihaZupan MihaZupan changed the title System.Uri cannot parse $ sign in paths [Uri] Does not allow $ in a hostname May 18, 2020
@paulomorgado
Copy link
Contributor

paulomorgado commented May 18, 2020

So, should \\.\C$ be valid?

@MihaZupan
Copy link
Member

It won't be becuase . alone is not a valid host either.

$ itself in the path is not problematic - for example \\foo\C$ works fine.

@MihaZupan
Copy link
Member

@itodirel How come you are using System.Uri in this case over System.IO.Path, as it sounds like you are making requests to the file system?

@paulomorgado
Copy link
Contributor

paulomorgado commented May 18, 2020

I'm not going to speak for @itodirel, but I've built systems in the past that used URIs to specify actions.

  • HTTP/HTTPS schemes were to launch browser tabs with the URI.
  • FILE scheme was used to open files
  • Other application specific schemes (like CMD, SESSION, etc.) were also used.

Relative URIs were also used to add/override parameters.

@itodirel
Copy link
Author

Regarding the question about why using System.Uri. It isn't my choice, and I can't change it easily. The dependency exists in MSBuild today, MSBuild will fail to load/build projects, because it uses System.Uri internally when it tries to build a project located on \\wsl$. CPS will fail to load projects, because internally it uses MSBuild for the evaluation, and for the same reason MSBuild uses System.Uri. Same with Visual Studio, the New Project Dialog, takes a dependency on System.Uri, and fails to create projects on \\wsl$. Do we want to tell all these folk to not use System.Uri? That would be a massive change, in code that might not been changed or touched for 20 years, or in legacy code, it can be done, with effort, but I'm trying to understand why can't System.Uri support it? Is it a bug in System.Uri?

@scalablecory
Copy link
Contributor

scalablecory commented May 19, 2020

It might be reasonable to relax the hostname validation into just reg-name, at least for the file schema. I tried finding some sort of standard Windows path -> URI transform recommendation but can't see anything.

The ramification being that it would cause code to fail later rather than sooner if they try to feed a file host into a resolver, but that seems like a rare scenario?

@karelz
Copy link
Member

karelz commented May 19, 2020

Is there a specification for UNC file names and allowed characters? We should likely comply with that. If $ is allowed, then it is technically a bug.

A thing to consider for perspective is that it is a bug existing for 20 years -- either no one uses System.Uri with weird hostnames, or usage of $ in hostnames is extremely rare ... doesn't make it less severe for your scenario, but explains that it is a corner case not hit so far :(

@paulomorgado
Copy link
Contributor

paulomorgado commented May 19, 2020

In this case, the $ is not part of the hostname. It's part of the path.

`\hostname\share$' works:

AbsolutePath   : /share$
AbsoluteUri    : file://hostname/share$
LocalPath      : \\hostname\share$
Authority      : hostname
HostNameType   : Dns
IsDefaultPort  : True
IsFile         : True
IsLoopback     : False
PathAndQuery   : /share$
Segments       : {/, share$}
IsUnc          : True
Host           : hostname
Port           : -1
Query          :
Fragment       :
Scheme         : file
OriginalString : \\hostname\share$
DnsSafeHost    : hostname
IdnHost        : hostname
IsAbsoluteUri  : True
UserEscaped    : False
UserInfo       :

Also, new Uri(@"\\share$", UriKind.Relative) kind of works:

AbsolutePath   :
AbsoluteUri    :
LocalPath      :
Authority      :
HostNameType   :
IsDefaultPort  :
IsFile         :
IsLoopback     :
PathAndQuery   :
Segments       :
IsUnc          :
Host           :
Port           :
Query          :
Fragment       :
Scheme         :
OriginalString : \\share$
DnsSafeHost    :
IdnHost        :
IsAbsoluteUri  : False
UserEscaped    : False
UserInfo       :

@karelz
Copy link
Member

karelz commented May 19, 2020

I am confused. I understood the original report as \\wsl$ is a hostname. (likely a fake one like localhost, but a hostname) Did I misunderstand?
If I did, what is it then?

@scalablecory
Copy link
Contributor

I am confused. I understood the original report as '\wsl$' is a hostname. (likely a fake one like localhost, but a hostname) Did I misunderstand?

new Uri("\\wsl$") parses as an absolute URI, so it interprets the wsl$ as a hostname.

new Uri("\\wsl$", UriKind.Relative) parses as a relative, so it interprets the wsl$ as part of the path/query.

@Eilon
Copy link
Member

Eilon commented May 19, 2020

Please note that in some examples above, the "\\..." is not C# string-encoded, so people probably mean @"\\...".

@Eilon
Copy link
Member

Eilon commented May 19, 2020

Also worth noting that \\wsl$ is not a valid URI at all, so I don't think that on the face of it making System.Uri work with it is critical. I am worried about potential regressions in other System.Uri cases.

Definition of URI from the RFC: https://tools.ietf.org/html/rfc3986#section-3

  1. Syntax Components

The generic URI syntax consists of a hierarchical sequence of
components referred to as the scheme, authority, path, query, and
fragment.

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  hier-part   = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

So while it may make sense to fix this one case of a $ in a UNC path, I'm worried that this is a slippery slope.

I would suggest instead that a new factory method Uri.FromUnc(string uncPath) be added, which supports UNC paths and generates a file:// URI from it. Then MSBuild and other consumers can call this API. Or even a fancier (and far more dangerous API): Uri.MakeThisArbitraryStringIntoAUriPlease(string randomString).

@danmoseley
Copy link
Member

As noted offline, before going to far we probably want to understand whether this would help @itodirel anyway, given any change would almost surely only be in .NET Core. (He mentioned Visual Studio which still runs on .NET Framework.)

@MihaZupan
Copy link
Member

MihaZupan commented May 19, 2020

Also, new Uri("\share$", UriKind.Relative) kind of works

Yes, but relative is more of a Uri string wrapper that doesn't provide much utility.

Also worth noting that \\wsl$ is not a valid URI at all

It's not valid because of the $ being in the host - System.Uri supports implicit file as input, so \\wsl\foo without the explicit file scheme is fine.

If you don't think about it being in an implicit file form though, the format itself is not against the Uri spec - it's just a question of what kind of hostname rules are used. See #36595 (comment)

Or even a fancier (and far more dangerous API): Uri.MakeThisArbitraryStringIntoAUriPlease(string randomString)

I am against this. UriKind.Relative is kind-of this already - a string wrapper with no utility. If a string can't be parsed and understood by Uri, it can't provide info with properties either.

If we decide this is something we would like to support, the question of allowing $ in the host opens up discussion for whether we should support Registry-based Naming Authority in its entirety (that would include more characters and percent-encoding) or just make an exception for $.

@Eilon
Copy link
Member

Eilon commented May 19, 2020

I am against this.

I am too 😄

@MihaZupan MihaZupan removed the untriaged New issue has not been triaged by the area owner label May 19, 2020
@rainersigwald
Copy link
Member

Just a note that most of MSBuild's dependencies on this are inherited from System.Xml, like XmlReader.BaseURI returning a string like file:////wsl$/Ubuntu/home/raines/msbuild/src/Build/Microsoft.Build.csproj that we want to translate back into a local path (which historically used new Uri(reader.BaseURI).LocalPath).

A fix would only systematically help VS if it was in Framework, though.

@Eilon
Copy link
Member

Eilon commented May 19, 2020

I know nothing of MSBuild's use case of this scenario, but XmlReader.BaseURI is abstract, so maybe MSBuild can do some hackery to override that property and do something "smarter" (i.e. more correct) with UNC paths?

(I get that these types are usually created from factories, but those instances can maybe be wrapped in new custom XmlReader types that delegate all behavior to the broken inner instance except this one property, where it will do the "smarter" thing.)

@paulomorgado
Copy link
Contributor

Probably it's what MSBuild is feeding to the Uris that needs to be changed.

@MihaZupan
Copy link
Member

We've discussed this in triage:

  • UNC spec requires the host to be a FQDN. As $ is not a valid character for DNS hostnames, a path like \\wsl$ technically violates the spec. We should seek guidance from the spec's maintainers regarding this.
  • RFC 3986 allows such hostnames under the reg-name syntax definition, it is System.Uri that chooses to use a more-strict DNS validation for hostnames.
  • We are willing to relax this restriction for UNC paths only (both in implicit \\wsl$ and explicit file:////wsl$ form). We don't see an upside to allowing it for other schemes (such as http).

@Eilon
Copy link
Member

Eilon commented May 20, 2020

Would it be worth asking the Microsoft owners of WSL to consider changing their choice of character? I get the wsl$ is baked in now, and might have to keep being supported, but perhaps if they added another alias such as wsl_dollar it would help alleviate headaches for people using libraries/APIs/frameworks that use more strict (and arguably correct) interpretations of various specs.

@zivkan
Copy link
Member

zivkan commented May 20, 2020

File shares ending in $ appear to be called "Administrative shares", and have been supported by Windows for decades: https://en.wikipedia.org/wiki/Administrative_share

@karelz
Copy link
Member

karelz commented May 20, 2020

@zivkan share names with $ are working correctly to our best knowledge. The name in question here is hostname where it is technically not allowed per spec, although wsl$ uses it that way.

@itodirel
Copy link
Author

@Eilon I'm meeting with WSL team on Friday, that's was actually one of my questions and asks to them, if anyone here wants to join us or come, please let me me know.

@zivkan
Copy link
Member

zivkan commented May 20, 2020

@zivkan share names with $ are working correctly to our best knowledge. The name in question here is hostname where it is technically not allowed per spec, although wsl$ uses it that way.

Oops, sorry. Yes, I was totally mixed up.

@karelz
Copy link
Member

karelz commented May 20, 2020

@itodirel I will follow up with you offline on the meeting.

@karelz
Copy link
Member

karelz commented Jun 2, 2020

Based on offline discussion, WSL team is considering change of the name (as it violates spec). Closing for now, we can reopen and reconsider if things change in future.

@karelz karelz closed this as completed Jun 2, 2020
@karelz karelz added this to the 5.0 milestone Jun 2, 2020
@itodirel
Copy link
Author

itodirel commented Jun 2, 2020

Thanks @karelz. I have met with the WSL team, they are open to changing the share name to work with System.Uri, they have a candidate for a new name, and are working on telemetry to validate that the new name does not conflict with other names. I will provide more updates here once I have them.

@danmoseley
Copy link
Member

@itodirel do you have an update to share? I will unlock meantime.😀

@rlabrecque
Copy link

rlabrecque commented Mar 30, 2021

Oh looks like as of Windows 10 20175 they added support for \\wsl\ instead of \\wsl$\ with the dollar sign version being a legacy fallback.
https://docs.microsoft.com/en-us/windows/wsl/release-notes

This build should be getting generally released soon enough. (Edited: I thought this was coming in 21H1, I was wrong. Probably 21H2?)

@ghost ghost locked as resolved and limited conversation to collaborators May 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests