Metadata syntax (colon in first line) causes confusion #16

Closed
camelpunch opened this Issue Jan 18, 2010 · 28 comments

https://gist.github.com/280125

I believe RDiscount is correct, here.

(bhollis@ edited title to point to the correct problem)

This is a pretty serious bug that prevents usage of colons in paragraphs. Would love to use Maruku but this bug is stopping us!

jarrett commented Oct 7, 2011

Same issue here. I had to switch to RDiscount, as this bug caused whole paragraphs to disappear, confusing my users.

Collaborator

distler commented Oct 10, 2011

That's a feature, not a bug.

Put a space at the beginning of the line, and it won't be confused with "email header"-style metadata.

For that matter, prepend a "\n" to the input string parsed by Maruku, and you'll totally suppress the matching of "email header" metadata (without, I think, changing the parsing of anything else).
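A minimal sketch of the newline workaround described above. The helper name `suppress_header_metadata` is hypothetical and not part of Maruku. The commented-out rendering call assumes the usual `Maruku.new(text).to_html` entry point and an installed `maruku` gem.

```ruby
# Hypothetical helper (not part of Maruku): prepend a newline so the
# document never begins with a "key: value" line.
def suppress_header_metadata(text)
  "\n" + text
end

input = "Note: colons are common in ordinary prose.\n\nSecond paragraph."
safe_input = suppress_header_metadata(input)

# With the maruku gem installed, the result would then be rendered as usual:
#   require 'maruku'
#   html = Maruku.new(safe_input).to_html
```

The original text is left untouched after the first character, so the only change the parser sees is the leading blank line.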

Interesting. I wouldn't have thought that would be a great default, but good to know Maruku can be used with a bit of tinkering!

Owner

bhollis commented Sep 16, 2012

Seems like there should at least be an option to suppress the email-header functionality (perhaps defaulting to "off").

papile commented Dec 28, 2012

This is a bad feature to have on by default. Maruku is not primarily an email parser so it really violates "the principle of least surprise" when you parse a document with a colon anywhere in the entire first paragraph and the transformation discards it. Colons are also not a special character in Markdown so I should not expect users to be escaping colons.

Maruku should have this off by default and it should be turned on either by subclassing the document parser or passing in an email option.

Owner

bhollis commented Dec 29, 2012

@papile agreed. I'd like to have it off by default.

Collaborator

distler commented Dec 29, 2012

Maruku is not primarily an email parser...

Whether on- or off-by-default, I think this comment misses the larger point.

This is not about "email". This is about document metadata.

Maruku has a couple of different syntaxes for attaching metadata to a document, and neither is part of the original Markdown syntax. Which is to say that it is easy to construct documents which will be parsed one way by the original Markdown, and a startlingly different way by Maruku. (Similar comments could, of course, be made for all of the other extensions to the traditional Markdown syntax that are supported by Maruku.)

In the case at hand, we are talking about a metadata syntax that allows you to say things like:

author: William Shakespeare
title: As You Like It

at the beginning of a document (and have the result interpreted as metadata). By contrast, the same text preceded by a blank line,


author: William Shakespeare
title: As You Like It

is interpreted as ordinary text. (I.e., it's trivial to have your text not be interpreted as metadata, if that's what you intend.)
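To make the two cases above concrete, here is a simplified sketch of "email header"-style metadata splitting. It is an illustration only, not Maruku's actual parser: the `HEADER_LINE` pattern and the `split_header_metadata` function are assumptions made up for this example.

```ruby
# Simplified illustration only; Maruku's real parser is more involved.
# A header line looks like "key: value" (the pattern is an assumption).
HEADER_LINE = /\A([A-Za-z][\w\- ]*):\s*(.*)\z/

# Returns [metadata_hash, body]. Header lines are consumed from the very
# first line until the first line that is not of the form "key: value".
def split_header_metadata(text)
  lines = text.lines.map(&:chomp)
  metadata = {}
  index = 0
  while index < lines.length && (match = HEADER_LINE.match(lines[index]))
    metadata[match[1].strip.downcase] = match[2].strip
    index += 1
  end
  # Skip the single blank line separating metadata from the body, if any.
  index += 1 if index < lines.length && lines[index].empty?
  [metadata, lines[index..-1].join("\n")]
end

play = "author: William Shakespeare\ntitle: As You Like It\n\nAll the world's a stage."

# Document starts with header lines: they become metadata.
meta, body = split_header_metadata(play)

# Same document with a leading blank line: the metadata section is empty
# and the "author:" and "title:" lines stay in the body.
no_meta, plain_body = split_header_metadata("\n" + play)

# An ordinary sentence that happens to contain a colon is swallowed too.
oops_meta, oops_body = split_header_metadata("Warning: this is just prose.\n\nNext paragraph.")
```

Under this sketch the last call shows the behavior reported in this issue: the first paragraph ends up in the metadata hash under the key `warning`, and only "Next paragraph." remains in the body.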

As a general rule, my personal preference would be to better advertise the extensions to the traditional Markdown syntax that are supported by Maruku, rather than willy-nilly disabling them. (That is, I think that the existence of these extensions is what makes Maruku attractive as a Markdown interpreter.)

Owner

bhollis commented Dec 30, 2012

@distler thanks for the additional context. I disagree that this is a generally useful feature that should be enabled by default, however. This is a tricky way to define metadata about the document that trips up people who aren't expecting it, and other tools that make use of Markdown (like Jekyll or Middleman) end up using their own, much more explicit system for adding metadata to the document. I agree that we shouldn't willy-nilly disable Maruku's extensions (even when they're unique) but instead should evaluate each one individually to decide whether it enhances everyone's experience using the library, whether it should be an option that can be enabled, or whether it should be removed entirely.

Collaborator

distler commented Dec 30, 2012

Leaving aside the business about other Maruku extensions to the Markdown syntax, I think there are two questions here.

  1. Should Maruku have a syntax for this sort of metadata?
  2. If yes, is the above syntax sub-optimal, and is there a better one?

I really dislike the idea of having the metadata syntax (or other extensions) be subject to a series of optional switches. That's really confusing for most classes of users.

If a feature is worth having, it's worth having "on" by default.

As to whether having a metadata syntax is a useful feature, I would cite two data points:

  1. Instiki uses it.
  2. The "S5" output uses it.

I am, however, fully willing to entertain the possibility that there might be a superior metadata syntax, which would be a less-confusing replacement for this one.

Owner

bhollis commented Dec 31, 2012

Jekyll/Middleman use a more "obvious" syntax called YAML frontmatter:

---
name: foo
list:
  - one
  - two
  - three
---

It simply consists of YAML, in between two sets of ---, at the top of the file.
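A rough sketch of how an application could split YAML frontmatter off a document before handing the body to a Markdown library. This is not how Jekyll or Middleman actually implement it: the `FRONT_MATTER` pattern and the `split_front_matter` name are assumptions for illustration. Only Ruby's standard `yaml` library is used.

```ruby
require 'yaml'

# Assumed pattern: a line of "---", the YAML payload, then a closing "---".
FRONT_MATTER = /\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\z)/m

# Returns [metadata, body]. Documents without frontmatter pass through
# unchanged with empty metadata.
def split_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match

  metadata = YAML.safe_load(match[1]) || {}
  [metadata, match.post_match]
end

source = <<~DOC
  ---
  name: foo
  list:
    - one
    - two
    - three
  ---
  Body text: colons here are just text.
DOC

metadata, body = split_front_matter(source)
# metadata["name"] is "foo", metadata["list"] is ["one", "two", "three"],
# and body holds only the line after the closing "---".
```

Because the metadata block must be fenced by explicit `---` lines, a colon in ordinary prose cannot trigger it by accident.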

IMO, Markdown should not have its own built-in metadata syntax at all.

papile commented Jan 1, 2013

@distler I appreciate the context as well, but currently a colon anywhere in the first paragraph causes this non-obvious behavior. Having commonly used punctuation trigger such a greedy regex is not optimal. I agree with your point that having option flags is also confusing. Option flags would be more confusing to implementors/programmers, but right now the behavior is confusing to end users writing Markdown, which in my opinion is more troublesome.

I also do not think that markdown should have metadata syntax either, but if it does it should be something that is not very easy to invoke unknowingly when end users are writing.

Collaborator

distler commented Jan 2, 2013

I also do not think that markdown should have metadata syntax either...

Having a common metadata syntax is way superior to not having one and, instead, having implementors (see the discussion of Jekyll/Middleman above) roll their own, incompatible metadata syntaxes on top of Maruku.

The latter both hurts interoperability and confuses end-users.

Still, some more obvious demarcation of the metadata from the body of the document would be desirable.

Owner

bhollis commented Jan 2, 2013

Other implementors have to roll their own metadata syntax (see Jekyll/Middleman) because everybody uses libraries like Tilt to allow Markdown libraries to be swapped out. Unless all Markdown libraries (and other template libraries, if you want to support more than just Markdown) support the same metadata syntax, you're going to end up inventing your own and applying it on top of everything.

Collaborator

distler commented Jan 2, 2013

... because everybody uses libraries like Tilt to allow Markdown libraries to be swapped out. Unless all Markdown libraries (and other template libraries, if you want to support more than just Markdown) support the same metadata syntax, ...

The same logic applies to all of the extensions, to the standard Markdown syntax, that are supported by Maruku.

...you're going to end up inventing your own and applying it on top of everything.

A very undesirable situation.

Owner

bhollis commented Jan 2, 2013

The metadata stuff is actually not comparable to the syntax extensions. For those, once a user has chosen their Markdown implementation, they may or may not be able to use certain syntax. But regardless of Markdown library (or even if they're using something other than Markdown) they need to be able to express metadata. That's why it's better that the framework provides its own metadata system on top of Markdown, rather than having it built into the format. Ideally, libraries like Maruku would just parse and render Markdown, and anything else can be handled by whatever application is using the library.

Collaborator

distler commented Jan 3, 2013

But regardless of Markdown library (or even if they're using something other than Markdown) they need to be able to express metadata. That's why it's better that the framework provides its own metadata system ...

Even better would be to have a common metadata syntax, so that documents (along with their metadata) are portable between platforms and (ideally) between Markdown processors.

The portability issue is exactly the same for extended Markdown syntaxes.

Anyway, y'all seem to be under the misapprehension that this particular metadata syntax was something Andrea Censi cooked up for Maruku. That is incorrect.

The same metadata syntax is used by MultiMarkdown, nesta and several others.

It probably doesn't need to be said, but MultiMarkdown is, likely, the most popular Markdown implementation, after Gruber's original. So I would vote for MMD compatibility.

papile commented Jan 3, 2013

The original point is that a colon anywhere in the first paragraph causes this behavior. Of course there is no de facto standard for Markdown and extended Markdown syntax, but this is a poor choice. Regardless of how it came about, it is very confusing to end users. I think this debate is turning a bit pedantic.

I contend that having a very common punctuation trigger an extended markdown function that is different amongst distributions is a very bad thing. So the question is what should be done to avoid the situation of confusing end users. Typically someone writing markdown would read something like the Daring Fireball guide to markdown syntax and I would think that most users using this do not use the metadata extension.

What if, in your previous comment, when you wrote y'all, it removed that paragraph because ' in GitHub Flavored Markdown is a special character that turns that section of text into inline metadata? But it only did this in the first paragraph. You would be mystified as to why this happens; you know Markdown. You complain to GitHub saying common punctuation breaks your post as my users are doing to me. GitHub investigates and they find that the parser they use implements much more than core Markdown by default, and there is no way to shut it off. So what am I to do?

I would think that if anything implements some sort of extended syntax that is on by default and cannot be disabled, it should be much more difficult to trigger. Having something triggered like this in common parlance is a bad thing; can we just agree on that? If you agree on that, what should be done? Our end users are not Markdown dialect experts; they just want to use common base Markdown.

Collaborator

distler commented Jan 3, 2013

You complain to GitHub saying common punctuation breaks your post as my users are doing to me. GitHub investigates and they find that the parser they use implements much more than core Markdown by default, and there is no way to shut it off. So what am I to do?

As explained above, prepend a newline character to the user's input stream. That completely disables metadata processing, and does not in the slightest change the output serialization of other text.

That's even more trivial than setting a flag.

papile commented Jan 3, 2013

Yes, I can prepend a newline, and it disables the feature because the regex no longer matches, but to me that sounds like a hack. Now that I am very familiar with this bug I could just do that, but it is still unclear to future users of this library. Why can I not do something like parser: core_markdown as an option? Then it is clear to implementors of this library what they are getting. I have a feeling you keep reading my posts and picking out one thing to contest, but I believe you are not seeing the main point: this is not clear to implementors OR end users.

Collaborator

distler commented Jan 3, 2013

Why can I not do something like parser: core_markdown as an option.

That's a perfectly reasonable suggestion: the ability to turn off all extensions to the core Markdown syntax.

Unlike the metadata parsing, the other extensions are much harder to disable. And if your users are liable to be thrown off by encountering extensions to the core Markdown syntax (as, e.g., those of MultiMarkdown), then it would be most reasonable to provide a way to turn them all off.

Collaborator

distler commented Jan 3, 2013

... but to me that sounds like a hack.

I wouldn't call it that.

The convention is that the metadata section (as in email) is separated from the body of the document by a blank line. A document which begins with a blank line is, therefore, a document with an empty metadata section.

Owner

bhollis commented Jan 4, 2013

Well, feel free to keep discussing it, but my current plan is for this to be disabled by default (opt-in via config option) for the 1.0 release.

adregan commented Jun 21, 2013

I just ran into this problem as well. A colon in the first line of a paragraph caused the whole thing to be swallowed. I chased the problem for a few hours until I found this. I'm glad this feature will be disabled in future releases.

ms-ati commented Jul 7, 2013

This is all a very interesting discussion, but bang, here I am 3 years later hunting down why my first paragraph disappears in GitHub Pages.

IMHO, a version with this fixed (in one way or another) should have been made a long time ago. Why are all parties involved allowing this to linger?

Owner

bhollis commented Jul 7, 2013

@ms-ati it's because this isn't anybody's real job, so there's only so much time that can be spent on it. Now that I'm back from a long trip, I'll be trying to concentrate on getting a new version out that fixes a bunch of bugs, and after that I'll start working on the non-backwards-compatible release that changes this behavior.

ms-ati commented Jul 7, 2013

@bhollis I hear you. I also posted to the relevant Jekyll issue to suggest they simply prepend a newline to the content passed to Maruku. It seems crazy for folks to be bumping into this even one day after tracking it down -- trivial fixes seem easily possible. But instead, folks appear to simply have migrated to Redcarpet, while leaving Maruku as the Jekyll default. A bit of a minefield ;)

Last commit on this project was 10 months ago, and last comment here 2.5 years ago. I'm closing this issue. Sad.

@camelpunch camelpunch closed this Dec 1, 2015
