Hyperlinks with double underscores don't render correctly #625

Closed
neontapir opened this Issue Sep 11, 2013 · 8 comments

@neontapir

neontapir commented Sep 11, 2013

While including a hyperlink with double underscores in an AsciiDoc file, I found that it is not rendered correctly. To reproduce, I created a document named link.asc whose sole line is:

Link: http://www.migrations.fr/la_guerre__de__sept__ans.htm

When I render it with Asciidoctor using

$ asciidoctor link.asc

...the line in question becomes:

Link: <a href="http://www.migrations.fr/la_guerre">http://www.migrations.fr/la_guerre</a><em>de</em>sept__ans.htm

Notice that the href attribute is truncated and the double underscores have been interpreted as emphasis (italics). I expect the line to render as:

Link: <a href="http://www.migrations.fr/la_guerre__de__sept__ans.htm">http://www.migrations.fr/la_guerre__de__sept__ans.htm</a>

AsciiDoc does not render this line correctly either, by the way. I originally posted this question in the Nabble discussion forum.

@mojavelinux

Member

mojavelinux commented Sep 11, 2013

This is one of the many reasons I'm researching PEG-based grammars for parsing inline markup. The problem we're facing here is that there's a limit to how much context a regular expression can see...and it messes up when the markup looks perfectly valid from a pattern-matching perspective. Most of the lightweight markup languages out there have this problem...until they make the move to a grammar-based parser.

I'm really looking forward to this improvement in Asciidoctor because it's going to make inline-formatting much more predictable...and we can get access to it in the AST.
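To make the limitation concrete, here is a minimal Python sketch of two regex passes that cannot see each other. The patterns and the `render_naive` function are simplified assumptions for illustration, not Asciidoctor's actual expressions, but they reproduce the mangled output from the report.

```python
import re

# Hypothetical, simplified stand-ins for regex-driven inline substitutions.
# These are NOT Asciidoctor's actual patterns.
EMPHASIS = re.compile(r"__(.+?)__")    # unconstrained emphasis: __text__
URL = re.compile(r"https?://[^\s<]+")  # bare URL, stops at whitespace or "<"


def render_naive(line):
    """Apply emphasis first, then link detection, each blind to the other."""
    line = EMPHASIS.sub(r"<em>\1</em>", line)
    return URL.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', line)


source = "Link: http://www.migrations.fr/la_guerre__de__sept__ans.htm"
print(render_naive(source))
```

The emphasis pass sees `__de__` as perfectly valid markup because it has no idea it is looking at the middle of a URL, so the link pass only ever receives the part of the URL before the first `<em>`.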

Fortunately, AsciiDoc provides many different ways to control substitution to work around issues like this one. I'll present the solutions in the order that I recommend using them (as not all solutions are good practice).

Solution A ::
The simplest and easiest way to get a link to behave itself is to stick it into an attribute.

:link-with-underscores: http://www.migrations.fr/la_guerre__de__sept__ans.htm

This URL has repeating underscores {link-with-underscores} but AsciiDoc won't process them.

This works because quotes are substituted before attributes, so the URL remains "hidden" while the text in the line is being formatted (strong, emphasis, monospace, etc).
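The ordering argument can be sketched in a few lines of Python. The patterns and the `substitute` function below are simplified assumptions made up for this illustration (the real processor has more passes and stricter rules). The only point is that the quotes pass runs while the URL is still hidden behind the attribute reference.

```python
import re

# Simplified patterns, assumed for illustration only.
EMPHASIS = re.compile(r"__(.+?)__")     # quotes pass: __text__ -> <em>text</em>
ATTR_REF = re.compile(r"\{([\w-]+)\}")  # attributes pass: {name} -> value


def substitute(line, attrs):
    """Run the quotes pass first, then the attributes pass."""
    line = EMPHASIS.sub(r"<em>\1</em>", line)  # the URL is not in the line yet
    return ATTR_REF.sub(lambda m: attrs.get(m.group(1), m.group(0)), line)


attrs = {"link-with-underscores": "http://www.migrations.fr/la_guerre__de__sept__ans.htm"}
line = "This URL has repeating underscores {link-with-underscores} but __this__ is still emphasized."
print(substitute(line, attrs))
```

The explicit `__this__` in the sentence is still formatted, while the underscores inside the attribute value arrive after the quotes pass has finished and are left alone.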

Solution B ::

Another way to solve formatting glitches is to explicitly specify the formatting you want to have applied to a span of text using the inline pass macro. If you want to display a URL, and have it be completely preserved, you can put it inside a pass macro and enable only macros (which is what substitutes links).

This URL has repeating underscores pass:macros[http://www.migrations.fr/la_guerre__de__sept__ans.htm] but AsciiDoc won't process them.

This works because the pass macro removes the content from the line of text while substitutions are performed, applies the explicit substitutions to that text while it's on the sidelines, then restores it to the original location.
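A rough Python sketch of that extract, process, restore cycle follows. The placeholder format, the patterns, and the `render` and `linkify` names are all assumptions for illustration. They are not taken from the Asciidoctor source.

```python
import re

# Simplified stand-ins; not Asciidoctor's real patterns or placeholder format.
PASS_MACRO = re.compile(r"pass:macros\[(.*?)\]")
EMPHASIS = re.compile(r"__(.+?)__")
URL = re.compile(r"https?://[^\s<]+")


def linkify(text):
    """The only substitution requested for the passthrough span (macros -> links)."""
    return URL.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)


def render(line):
    stash = []

    def extract(match):
        # Process the span "on the sidelines" and leave a placeholder behind.
        stash.append(linkify(match.group(1)))
        return f"\x00{len(stash) - 1}\x00"

    line = PASS_MACRO.sub(extract, line)       # 1. pull the span out of the line
    line = EMPHASIS.sub(r"<em>\1</em>", line)  # 2. normal substitutions on the rest
    line = linkify(line)
    # 3. restore the processed span at its original location
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], line)


print(render("This URL has repeating underscores "
             "pass:macros[http://www.migrations.fr/la_guerre__de__sept__ans.htm] "
             "but AsciiDoc won't process them."))
```

Because the emphasis pass only ever sees the placeholder, the double underscores in the URL never get a chance to match.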

Solution C and D ::

The final two solutions I'll mention are related, but I don't recommend using them. It's possible to escape individual characters or a range of characters inside the URL.

You can isolate the part of the URL causing problems using the double dollar escape:

This URL has repeating underscores http://www.migrations.fr/$$la_guerre__de__sept__ans$$.htm but AsciiDoc won't process them.

Like the pass macro, it pulls the text out during substitution, but it doesn't offer a way to apply substitutions to that text. You tend to use double dollar when you want to prevent the processor from detecting a URL, like:

This URL won't be recognized by the processor $$http://www.migrations.fr/la_guerre__de__sept__ans.htm$$

It's also possible to escape the underscores:

This URL won't be recognized by the processor http://www.migrations.fr/la\_guerre__de__sept__ans.htm

However, escaping is not consistent between AsciiDoc and Asciidoctor (mostly because Ruby 1.8.7, which we still support, doesn't have lookbehind support in its regex engine).
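For readers wondering why lookbehind matters here, this is a small Python sketch (Python's `re` module supports fixed-width lookbehind). The pattern and the `emphasize` function are hypothetical and only show the general technique of refusing to match a marker that is preceded by a backslash. They do not describe how either processor implements escaping.

```python
import re

# Hypothetical pattern: the opening "__" only counts as emphasis
# when it is NOT immediately preceded by a backslash.
EMPHASIS = re.compile(r"(?<!\\)__(.+?)__")


def emphasize(text):
    text = EMPHASIS.sub(r"<em>\1</em>", text)
    return text.replace("\\__", "__")  # drop the escaping backslash from the output


print(emphasize("an __emphasized__ word"))
print(emphasize(r"an \__escaped__ word"))
```

Without lookbehind, the pattern has to consume the preceding character itself to check for the backslash, which makes the matching rules harder to keep consistent.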

I think you'll be the most happy with Solution A. It's best practice to pull all your links into attributes anyway, and by doing so you get the bonus that they aren't mangled.

@neontapir

neontapir commented Sep 11, 2013

Thank you so much for your thorough reply. I'm learning a lot just by
asking questions. :)

I'm intrigued by your last statement about pulling links into attributes.
Where in the document do you recommend defining the links? Possibilities I
considered were at the end of the section where they are used and at the
end of the document. It's worth noting that in my use case, they are being
put in footnotes, in case that matters.

-- Chuck

@mojavelinux

Member

mojavelinux commented Sep 11, 2013

On Wed, Sep 11, 2013 at 12:59 PM, Chuck Durfee notifications@github.com wrote:

Thank you so much for your thorough reply. I'm learning a lot just by
asking questions. :)

That's what we like to hear. We learn from each other, as questions almost
always lead to documentation improvements, if not code and design
improvements!

I'm intrigued by your last statement about pulling links into attributes.
Where in the document do you recommend defining the links? Possibilities I
considered were at the end of the section where they are used and at the
end of the document. It's worth noting that in my use case, they are being
put in footnotes, in case that matters.

We like to put them under the document title. That's the convention we're
going to recommend in the user manual. Here are two examples I ported to
AsciiDoc from the spring.io site that demonstrate perfectly how to use
attributes for links:

https://gist.github.com/mojavelinux/6519908/raw/1cf543e3078d212de2f7362e45d3e006cfdcca2e/gs-messaging-jms.adoc
https://gist.github.com/mojavelinux/6520414/raw/23557a81a101d36024dc586f3ed01b15c4f6177f/spring-gs-accessing-twitter.adoc

Notice how attributes can build on other attributes.

It's also possible to define the links for a section underneath the section
title. The only requirement is that attributes must be defined before
they are referenced.
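The "defined before referenced" rule, and the way attributes can build on other attributes, both fall out of reading the document top to bottom. Here is a minimal Python sketch of that idea. The patterns, the `resolve` function, the `site`, `war-page`, and `late` attribute names, and the choice to leave unknown references untouched are all assumptions of this sketch, not a description of the processor's internals.

```python
import re

ATTR_ENTRY = re.compile(r"^:([\w-]+):\s*(.*)$")  # simplified ":name: value" entry
ATTR_REF = re.compile(r"\{([\w-]+)\}")           # simplified "{name}" reference


def resolve(lines):
    """Process lines top to bottom, resolving references against attributes seen so far."""
    attrs, out = {}, []

    def expand(text):
        # Unknown names are left untouched in this sketch.
        return ATTR_REF.sub(lambda m: attrs.get(m.group(1), m.group(0)), text)

    for line in lines:
        entry = ATTR_ENTRY.match(line)
        if entry:
            # A value may build on attributes defined earlier.
            attrs[entry.group(1)] = expand(entry.group(2))
        else:
            out.append(expand(line))
    return out


doc = [
    ":site: http://www.migrations.fr",
    ":war-page: {site}/la_guerre__de__sept__ans.htm",
    "See {war-page}.",
    "See {late}.",
    ":late: defined too late",
]
print(resolve(doc))
```

The `{war-page}` reference resolves because both entries precede it, while `{late}` stays unresolved because its definition only appears further down.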

In Asciidoctor 1.5.0 (or later) I'm planning to allow document attributes
(available anywhere) to be defined at the bottom of the document. I know
some people prefer to put the link definitions at the bottom of the
document so they don't clutter up the header.

...of course, you can always put the links in an include file, then just
reference it from the header like so:

= Document Title
Author Name
include::asciidoc-settings[]
include::link-definitions[]

content

There are lots of options, but as with all things, pick the way that works
best for you and go with that approach :)

-Dan

Dan Allen | http://google.com/profiles/dan.j.allen

@neontapir

neontapir commented Sep 12, 2013

I went with option A and it worked great! Thanks again!

@mojavelinux

Member

mojavelinux commented Sep 14, 2013

Excellent!

@ghost ghost assigned mojavelinux Sep 14, 2013

@imet

imet commented Apr 25, 2015

Thank you very much @mojavelinux ,

Your double dollar escape solution also worked for http://www.google.com/~example@gmail.com

For AsciiDoc, we need to escape it like this:

http://www.google.com/$$~example@gmail.com$$

Then I get one normal URL link.

@mojavelinux

Member

mojavelinux commented Apr 28, 2015

Great to hear! Note that these techniques are now documented in the user manual. See http://asciidoctor.org/docs/user-manual/#complex-urls.

@richnsoos

richnsoos commented Apr 21, 2017

Thanks, @neontapir and @mojavelinux !! Ran into this very issue today with a link that includes underscores and Solution A works great.
