Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asciidoctor not recognizing valid email #3154

Closed
CosmicToast opened this issue Mar 17, 2019 · 13 comments
Closed

Asciidoctor not recognizing valid email #3154

CosmicToast opened this issue Mar 17, 2019 · 13 comments
Assignees
Milestone

Comments

@CosmicToast
Copy link

My developer email is toast@toastin.space.
This is not recognized by asciidoctor as an email (for automatic link generation purposes).
It also isn't showing up in the atom.io highlighter.

I end up needing to explicitly specify link:mailto:toast@toastin.space[toast@toastin.space] to get the same effect.

I suspect (but have not verified) that this is because asciidoctor keeps a list of "valid" TLDs that is incomplete (based on a different similar email address working just fine).

@ggrossetie
Copy link
Member

Actually, your tld is one character too long 😉

InlineEmailRx = %r(([\\>:/])?#{CG_WORD}(?:&|[#{CC_WORD}.%+-])*@#{CG_ALNUM}[#{CC_ALNUM}.-]*\.#{CG_ALPHA}{2,4}\b)

I believe it will be way slower to check the tld against https://publicsuffix.org/list/, right ?
In my opinion, it's a fair trade-off since you can use an explicit syntax if your email address is a bit "unusual".

We could, of course, just change the regular expression to accept tld between 2 and 5 characters but we have to be careful because we could fall on the other side of the fence (ie. recognizing sentence as email).

@ggrossetie
Copy link
Member

For reference quite a few "new" tld are longer than 4 letters:

  • photography
  • pictures
  • pizza
  • ninja
  • plumbing
  • productions
  • progressive
  • democrat
  • republican
  • restaurant
  • ...

@CosmicToast
Copy link
Author

It's perfectly valid to even have an ip address as the "domain", per the specification.
Further, https://publicsuffix.org/list/ isn't even complete, since there are lots of "alternative" DNS TLDs that exist (such as namecoin and co.).
I understand having the tradeoff, but I don't think stray @s appear in the middle of a sentence with no whitespace on either side very often.

I would simply remove any post @ restrictions except requring at least one dot-atom (as per the spec).

@mojavelinux
Copy link
Member

mojavelinux commented Mar 17, 2019

It's not Asciidoctor's intent to validate e-mail addresses. That would be a fool's errand. It's Asciidoctor's attempt to do a best effort to detect an implicit e-mail address under reasonable circumstances without performing unwanted replacements. Our job is to find that balance.

I would simply remove any post @ restrictions except requring at least one dot-atom (as per the spec).

That would start matching all kinds of things we don't want to match and break documents. When the e-mail address is too obscure for a reasonable match, you always have the option of using the formal macro (i.e., mailto:address[text]). I don't think using that prefix is too much of an inconvenience.

I don't think stray @s appear in the middle of a sentence with no whitespace on either side very often.

You would be surprised.

Btw, the link prefix and explicit text are not required. The following will work:

mailto:toast@toastin.space[]

@mojavelinux mojavelinux added this to the discussion milestone Mar 17, 2019
@mojavelinux
Copy link
Member

If we did change the regexp, I think 5 characters would be reasonable. I don't really want to go longer than that.

@mojavelinux
Copy link
Member

@graphitefriction I think we should document how Asciidoctor determines which e-mail address to match and the slightly longer workaround (from the end of my comment above) when the e-mail address doesn't match automatically.

@CosmicToast
Copy link
Author

I don't think stray @s appear in the middle of a sentence with no white-space on either side very often.

You would be surprised.

If that's the case, then I certainly understand the hesitance.
In that case, I think it's reasonable to try and determine various still-common TLDs.

The abovementioned .ninja, alongside .space are surprisingly common in practice.
While things like .technology and .republican certainly do exist, they seem to be very purpose-specific (in that you don't see them being used outside of their exact specified purpose), so perhaps 5 would be good.
For a proper conclusion someone would need to go over the list and compile some data, but I'm a tad bit too lazy (and busy) for that :)

The following will work: mailto:toast@toastin.space[]

Oh! I didn't realize (or forgot) that. Thanks a lot :D

@mojavelinux
Copy link
Member

I really don't want to get into the business of cross referencing valid TLDs. That is way overreaching, IMO. Again, we will make a best case based on a likely pattern (which I'm fine with extending to a 5-character TLD) and beyond that, the formal macro is required.

@mojavelinux mojavelinux modified the milestones: discussion, v2.0.0 Mar 18, 2019
@mojavelinux mojavelinux self-assigned this Mar 18, 2019
mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Mar 18, 2019
@mojavelinux
Copy link
Member

According to http://data.iana.org/TLD/tlds-alpha-by-domain.txt, here are the counts for each length (minus some obscure ones we wouldn't match anyway):

    248 2
    222 3
    216 4
    169 6
    165 5
    138 7
     91 8
     50 9
     37 10
     23 11
      7 14
      5 13
      4 12
      3 15
      2 18

So there are actually more 6-character TLDs than 5. Hmmm.

@mojavelinux
Copy link
Member

mojavelinux commented Mar 18, 2019

(Other good examples of 5-character TLDs are .local and .cloud)

@mojavelinux
Copy link
Member

I can't find any 6-character TLDs in use, even when scanning git commits (except for ones that are clearly fake).

@mojavelinux
Copy link
Member

On the other hand, I determined that even a TLD length up to 7 doesn't catch stuff we don't want to catch...so that at least is not a risk.

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Mar 18, 2019
mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Mar 20, 2019
@mojavelinux
Copy link
Member

Let's proceed with 5 characters for now, and we'll extend when the use cases are presented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants