mapDomain doesn’t follow IDNA separator requirements #11

Closed
mathiasbynens opened this issue Sep 20, 2012 · 5 comments

Comments

@mathiasbynens
Owner

As reported by @annevk, mapDomain does not seem to follow IDNA separator requirements.

http://logbot.glob.com.au/?c=freenode%23whatwg&s=18%20Sep%202012&e=18%20Sep%202012#c722062

From http://tools.ietf.org/html/rfc3490#section-3.1 (IDNA 2003):

3.1 Requirements

   IDNA conformance means adherence to the following four requirements:

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

   2) Whenever a domain name is put into an IDN-unaware domain name slot
      (see section 2), it MUST contain only ASCII characters.  Given an
      internationalized domain name (IDN), an equivalent domain name
      satisfying this requirement can be obtained by applying the
      ToASCII operation (see section 4) to each label and, if dots are
      used as label separators, changing all the label separators to
      U+002E.

   3) ACE labels obtained from domain name slots SHOULD be hidden from
      users when it is known that the environment can handle the non-ACE
      form, except when the ACE form is explicitly requested.  When it
      is not known whether or not the environment can handle the non-ACE
      form, the application MAY use the non-ACE form (which might fail,
      such as by not being displayed properly), or it MAY use the ACE
      form (which will look unintelligle to the user).  Given an
      internationalized domain name, an equivalent domain name
      containing no ACE labels can be obtained by applying the ToUnicode
      operation (see section 4) to each label.  When requirements 2 and
      3 both apply, requirement 2 takes precedence.

   4) Whenever two labels are compared, they MUST be considered to match
      if and only if they are equivalent, that is, their ASCII forms
      (obtained by applying ToASCII) match using a case-insensitive
      ASCII comparison.  Whenever two names are compared, they MUST be
      considered to match if and only if their corresponding labels
      match, regardless of whether the names use the same forms of label
      separators.
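
Requirement 1 and the separator part of requirement 2 can be illustrated with a short sketch. The helper names `splitLabels` and `toFullStops` are hypothetical and are not part of the library's API; the sketch only covers separator handling, not ToASCII or ToUnicode.

```javascript
// The four code points that RFC 3490 section 3.1 says MUST be recognized as dots:
// U+002E, U+3002, U+FF0E, U+FF61.
const IDNA2003_SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/g;

// Hypothetical helper: split a domain name into labels on any recognized separator.
function splitLabels(domain) {
  return domain.split(IDNA2003_SEPARATORS);
}

// Hypothetical helper: rewrite every recognized separator as U+002E (full stop).
function toFullStops(domain) {
  return domain.replace(IDNA2003_SEPARATORS, '.');
}

console.log(splitLabels('www\u3002example\uFF0Ecom')); // [ 'www', 'example', 'com' ]
console.log(toFullStops('www\uFF61example.com'));      // www.example.com
```
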
@annevk

annevk commented Sep 20, 2012

I'm not entirely sure whether IDNA2008 requires this too, by the way, but it seems highly unlikely that browsers will ever move away from supporting these additional label separators, as content relies on them working.

@mathiasbynens
Owner Author

Hrm, turns out RFC 5895 (the character mapping document for IDNA2008) is rather vague on this subject:

  4.  [IDNA2008protocol] is specified such that the protocol acts on
       the individual labels of the domain name.  If an implementation
       of this mapping is also performing the step of separation of the
       parts of a domain name into labels by using the FULL STOP
       character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002)
       can be mapped to the FULL STOP before label separation occurs.
       There are other characters that are used as "full stops" that one
       could consider mapping as label separators, but their use as such
       has not been investigated thoroughly.  This step was chosen
       because some input mechanisms do not allow the user to easily
       enter proper label separators.  Only the IDEOGRAPHIC FULL STOP
       character (U+3002) is added in this mapping because the authors
       have not fully investigated the applicability of other characters
       and the environments where they should and should not be
       considered domain name label separators.
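
To make the difference concrete, here is a rough sketch comparing the two behaviours. It assumes one reading of the RFC 5895 step quoted above (map only U+3002 to U+002E, then split on U+002E); both function names are made up for illustration.

```javascript
// IDNA2003 (RFC 3490): all four code points act as label separators.
const IDNA2003_SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Hypothetical helper: IDNA2003-style label separation.
function splitLabelsIdna2003(domain) {
  return domain.split(IDNA2003_SEPARATORS);
}

// Hypothetical helper: RFC 5895-style mapping (assumed reading), where only
// IDEOGRAPHIC FULL STOP (U+3002) is mapped to FULL STOP before splitting.
function splitLabelsRfc5895(domain) {
  return domain.replace(/\u3002/g, '.').split('.');
}

// U+3002 is treated as a separator by both approaches.
console.log(splitLabelsIdna2003('example\u3002com').length); // 2
console.log(splitLabelsRfc5895('example\u3002com').length);  // 2

// U+FF0E (fullwidth full stop) separates labels only under IDNA2003.
console.log(splitLabelsIdna2003('example\uFF0Ecom').length); // 2
console.log(splitLabelsRfc5895('example\uFF0Ecom').length);  // 1
```
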

@annevk

annevk commented Sep 21, 2012

I think we want to have the same characters as IDNA2003, but we should probably figure out what the browser vendors are going to implement, and then standardize that in either UTS #46 or the URL Standard.

@mathiasbynens
Owner Author

I have a patch ready that adds support for IDNA2003 separators.
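
For readers following along, a function that maps a callback over each label while honouring the IDNA2003 separators might look roughly like the sketch below. This is not the actual patch: `mapDomainSketch` is a hypothetical stand-in, and the assumption that the output is joined with U+002E comes from requirement 2 of RFC 3490, not from the commit.

```javascript
// RFC 3490 separators: U+002E, U+3002, U+FF0E, U+FF61.
const SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Hypothetical stand-in for a mapDomain-like function: split the domain on any
// recognized separator, apply `fn` to each label, and join with U+002E.
function mapDomainSketch(domain, fn) {
  return domain.split(SEPARATORS).map(fn).join('.');
}

// Usage: lowercase every label of a name that mixes separator forms.
const result = mapDomainSketch('Www\u3002Example\uFF61COM', (label) => label.toLowerCase());
console.log(result); // www.example.com
```
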

mathiasbynens added a commit that referenced this issue Sep 28, 2012
@mathiasbynens
Owner Author

Committed: 131260b. Thanks again!

mathiasbynens added a commit that referenced this issue Jul 3, 2014