Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C# Uri with underscore #44719

Open
ptzremote opened this issue Nov 16, 2020 · 8 comments
Open

C# Uri with underscore #44719

ptzremote opened this issue Nov 16, 2020 · 8 comments

Comments

@ptzremote
Copy link

Description

I try to use Uri class to validate and extract host from URL.
Where are two cases:

  • Uri.TryCreate("http://_contoso._com", UriKind.Absolute, out _) returns true
  • Uri.TryCreate("http://_c._contoso._com", UriKind.Absolute, out _) returns false

Based on Top Level Domain Name Specification:

A TLD label MUST be at least two characters long and MAY be as long as 63 characters - not counting any leading or trailing periods (.). It MUST consist of only ASCII characters from the groups "letters" (A-Z), "digits" (0-9) and "hyphen" (-), and it MUST start with an ASCII "letter", and it MUST NOT end with a "hyphen". Upper and lower case MAY be mixed at random, since DNS lookups are case-insensitive.

So, looks like http://_contoso._com is invalid absolute URI.

Configuration

C# (.NET Framework 4.8)

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Nov 16, 2020
@ghost
Copy link

ghost commented Nov 16, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details
Description:

Description

I try to use Uri class to validate and extract host from URL.
Where are two cases:

  • Uri.TryCreate("http://_contoso._com", UriKind.Absolute, out _) returns true
  • Uri.TryCreate("http://_c._contoso._com", UriKind.Absolute, out _) returns false

Based on Top Level Domain Name Specification:

A TLD label MUST be at least two characters long and MAY be as long as 63 characters - not counting any leading or trailing periods (.). It MUST consist of only ASCII characters from the groups "letters" (A-Z), "digits" (0-9) and "hyphen" (-), and it MUST start with an ASCII "letter", and it MUST NOT end with a "hyphen". Upper and lower case MAY be mixed at random, since DNS lookups are case-insensitive.

So, looks like http://_contoso._com is invalid absolute URI.

Configuration

C# (.NET Framework 4.8)

Author: ptzremoute
Assignees: -
Labels:

area-System.Net, untriaged

Milestone: -

@karelz
Copy link
Member

karelz commented Nov 16, 2020

Fails also on .NET 5 - here's minimal repro poking at various variants:

    static void Main()
    {
        PrintAbsoluteUri_IsValid("http://a.com");
        PrintAbsoluteUri_IsValid("http://a._com");
        PrintAbsoluteUri_IsValid("http://a.b._com");
        PrintAbsoluteUri_IsValid("http://a.c_m");
        PrintAbsoluteUri_IsValid("http://a.b.c_m");
        PrintAbsoluteUri_IsValid("http://a.com_");
        PrintAbsoluteUri_IsValid("http://a.b.com_");
        PrintAbsoluteUri_IsValid("http://a_b.com");
        PrintAbsoluteUri_IsValid("http://_a.com");
        PrintAbsoluteUri_IsValid("http://a_.com");
        PrintAbsoluteUri_IsValid("http://a._b.com");
        PrintAbsoluteUri_IsValid("http://a.b_.com");
        PrintAbsoluteUri_IsValid("http://a.a_b.com");
        PrintAbsoluteUri_IsValid("http://_a.b.com");
    }
    static void PrintAbsoluteUri_IsValid(string uri)
    {
        Console.WriteLine(uri + " ... " + Uri.TryCreate(uri, UriKind.Absolute, out _));
    }
http://a.com ... True
http://a._com ... True
http://a.b._com ... False
http://a.c_m ... True
http://a.b.c_m ... True
http://a.com_ ... True
http://a.b.com_ ... True
http://a_b.com ... True
http://_a.com ... True
http://a_.com ... True
http://a._b.com ... True
http://a.b_.com ... True
http://a.a_b.com ... True
http://_a.b.com ... True

@MihaZupan any thoughts?

@karelz karelz added bug and removed untriaged New issue has not been triaged by the area owner labels Nov 16, 2020
@karelz karelz added this to the Future milestone Nov 16, 2020
@wfurt
Copy link
Member

wfurt commented Nov 16, 2020

While original RFC did not allow '_', it is commonly used.
The other part is fact that DNS is only one mechanism for name resolution. URL processing is not necessarily linked to that and I don't think URL is setup to enforce full DNS validation.

@MihaZupan
Copy link
Member

I agree with Tomas, '_' is historically in use, there's no point in avoiding it outright.

It isn't as simple as being accepted by Uri or not, the input examples also differ by HostNameType:

Dns   http://a.com
Basic http://a._com
      http://a.b._com
Dns   http://a.c_m
Dns   http://a.b.c_m
Dns   http://a.com_
Dns   http://a.b.com_
Dns   http://a_b.com
Basic http://_a.com
Dns   http://a_.com
Basic http://a._b.com
Dns   http://a.b_.com
Dns   http://a.a_b.com
Basic http://_a.b.com

Dns will be matched first if possible. Every input that contains a label where _ is the first character is not matched by it. Specifically because of this IsASCIILetterOrDigit check.

The rest are matched by UNC, which shows up as 'basic' in Uri.HostNameType.

The interesting case here is that a._com is matched, while a.b._com is not. I believe this specific case is a bug as it looks like we're not properly resetting the validShortName flag between labels in UNC validation.

@wfurt
Copy link
Member

wfurt commented Jan 25, 2023

Is there something we should do here @MihaZupan? If not, we should perhaps close it. While the DNS my not allow it, URL does not necessarily depend on it (e.g. it is only subset)

@pimvd
Copy link

pimvd commented Mar 17, 2023

Any updates on this? I'm facing this same issue in our use case where we are trying to create a URI object for (example) URL "http://test.test._test.pimvandijk.eu". This is an example-url, but we will use it like this a lot (the underscore at the third part of the url).

Error is: Invalid URI: The hostname could not be parsed.

You can check it via this small piece of code:
[Fact] public void Test() { string domain = "http://test.test._test.pimvandijk.eu"; var uri = new Uri(domain); }

The url works when I'm checking the txt-records via Google Dig:
image

Checked on .net 7 & C# 11

Interesting is that, as @MihaZupan mentioned, "http://test._test.test.pimvandijk.eu" is working perfect, but putting the underscore on the next part of the url fails.

@pimvd
Copy link

pimvd commented Mar 17, 2023

I found what's happening. Let's take http://test1.test2._test3.pimvandijk.eu for example.

In the UncNameHelper.IsValid it will first check "test1.". This is valid and will take us to the next for-loop. Here it checks "test2." and at the check of the . it will set validShortName to false. Coming to the _ of _test3, which will return false at

.

As mentioned in the comment on line 77 ("Subsequent segments must start with a letter or a digit") I think we have to add underscore to this check, but only if there are coming a letter or digit after the underscore.

@MullenStudio
Copy link

I recently hit this issue as well. As previous comments pointed out, the issue happens when the third or later segment of the URL starts with '_', like a.b._c.d.
Based on the code, such URL would fail DomainNameHelper.IsValidByIri first since this method checks if the first letter of any segment is a letter or digit (which is a weird check in my opinion).
Then it would run the UncNameHelper.IsValid check. In this check, the comment in the code says the first segment allows '_' but following segments don't. However, if follow the comments, the URL like a._b.c should be invalid but the method would return true, The reason is validShortName is not reset to false after checking the first segment.
However, the question is what the expected design should be (should the examples like a.b._c.d and a._b.c considered as valid or invalid?). I feel that while UncNameHelper has issue (comment and behavior are not inconsistent), DomainNameHelper may also need change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants