Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only consider an identifier followed by a colon a tag if a newline follows the colon #6989

Merged

Conversation

andralex
Copy link
Member

@andralex andralex commented Jul 12, 2017

This solves a number of annoying problems when ddoc confuses the use of ":" in running text with the intent to insert a tag. The matter can happen after text is reflown with no intent or knowledge that a tag will be generated.

Among the most annoying: if a link http://... happens to fall at the start of a line the http is converted to a tag.

I suspect this will expose a few cases when people do want to insert tags but don't insert a newline.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @andralex!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@UplinkCoder
Copy link
Member

LGTM

@andralex
Copy link
Member Author

As I suspected that caused some problems so now the heuristic is to require identifier to start with an ASCII uppercase letter and have whitespace after the colon.

@andralex
Copy link
Member Author

andralex commented Jul 12, 2017

There's a bug in std.datetime.systime's documentation exposed by this. I changed the lowercase labels in dlang/phobos#5606.

@CyberShadow
Copy link
Member

I think requiring whitespace after the colon is completely reasonable.

I'm not entirely sure about being picky about case. It precludes starting section names with things like keywords or identifiers, or digits.

In any case, I don't feel strongly about it, but it seems like this change is non-trivial enough that it warrants a changelog entry or linked issue number.

@andralex
Copy link
Member Author

andralex commented Jul 12, 2017

Probably a better heuristic could be found. The egregious problem (which did hurt us, see the fixed tests) is that links starting with http: were transformed into tags. The space fixes that.

The second issue, which may occur in flowing text, is words followed by colons that are not intended at all to be tags. Worse, reformatting paragraphs (e.g. with emacs' M-q (fill-paragraph)) makes these unintended tags spring into existence. Consider:

Lorem ipsum dolor sit amet, nam risus, sodales orci tempus sociis vel 
lacinia, adipiscing vestibulum viverra ipsum, consequat vestibulum dolor
quam in, ante accumsan amet tristique est sit. Depending on the input,
the function frobnicate returns: (a) junk, (b) meh, or (c) something awesome.

Now the author edits the paragraph here and there and at some point the word "returns:" is liable to fall at the beginning of a line. Bam, you get it formatted as a "Returns:" heading. The uppercase requirement takes care of that.

Of course there will be situations when you do have a titlecased word followed by a colon that you really want to be part of flowing text, e.g.:

Depending on the input, the function frobnicate returns: (a) junk,
(b) meh, or (c) something awesome. Unconvinced? Consider: some
people still edit code with Notepad.

Again, "Consider:" is liable to be reflown at the start of a line and unwittingly transformed into a section heading.

This kind of surprising formatting is the stuff of nightmares for authors. Any ideas for a better heuristic that doesn't break too much with the past? I'm thinking e.g. requiring a non-keyword label followed by a colon to be followed immediately by a newline. (Keywords being the predefined labels Returns:, Authors:, Copyright: etc). Thoughts?

@andralex
Copy link
Member Author

Also, it shouldn't be forgotten that an author can always force a heading label by using DDOC_SECTION or DDOC_SECTION_H manually. But there's no obvious escape from automated formatting of words followed by colons.

@CyberShadow
Copy link
Member

Any ideas for a better heuristic that doesn't break too much with the past?

How about requiring an empty line (two consecutive newlines) before a section heading, unless the previous line is also a section heading?

@wilzbach
Copy link
Member

wilzbach commented Jul 12, 2017

I'm not entirely sure about being picky about case. It precludes starting section names with things like keywords or identifiers, or digits.

I don't think being picky about the case will be a huge issue as long as it has a changelog entry and the spec is updated as there are many libraries out there which try to emulate the ddoc spec, e.g. this is how libddoc deals with sections (it follows the current "spec"). The nice thing about this change is that its more restrictive and existing parsers will continue to work. For example: "libddoc":

  1. In its custom Lexer header is a special token - parsed as follows:
current.type = Type.word; // ascii, digit, _
current.text = text[oldOffset .. offset];
if (parseHeader && prevIsNewline(oldOffset, text) && offset < text.length
&& text[offset] == ':')
{
  current.type = Type.header;
  1. With the token stream, libbdoc parses everything between two header tags as one section::
while (!lex.empty) switch (lex.front.type)
{
case Type.header:
	finishSection(); // <-- a previous section is closed on a new header tag
	name = lex.front.text;
	lex.popFront();
	lex.stripWhitespace(); // <- libbdoc ignores whitespace - it doesn't require it
	break;
case Type.newline:
	lex.popFront();
	if (name is null && !lex.empty && lex.front.type == Type.newline)
		finishSection();
	break;
default:
	lex.popFront();
	sliceEnd = lex.offset;
	break;
}
finishSection();

(as I saw doc.d I had to show that this parsing can be done much more legible)

I'm thinking e.g. requiring a non-keyword label followed by a colon to be followed immediately by a newline. (Keywords being the predefined labels Returns:, Authors:, Copyright: etc).

Sounds reasonable, I prefer <keyword>: <newline> style anyways.
(edited) Also they standard sections are already defined in the spec.

Btw it wouldn't be too hard to add to the properly_documented_public_functions Dscanner check a condition that enforces "standard sections" in Phobos - in other words a simple comments.sections.all!(s => s.isStandard) here would do.

Any ideas for a better heuristic that doesn't break too much with the past?

FWIW I would only worry about breaking Phobos - most other well-documented projects have switched to Ddox or other documentation engines ;-)

@wilzbach
Copy link
Member

There's a bug in std.datetime.systime's documentation exposed by this. I changed the lowercase labels in dlang/phobos#5606.

Nice! For quick reference:

https://dlang.org/phobos-prerelease/std_datetime_systime.html#.SysTime.fromISOExtString

image

Copy link
Member

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, I like this change.

but it seems like this change is non-trivial enough that it warrants a changelog entry or linked issue number.

+1 (and the spec).

@andralex
Copy link
Member Author

andralex commented Aug 3, 2017

How about requiring an empty line (two consecutive newlines) before a section heading, unless the previous line is also a section heading?

That's nice but a bit involved. I suggest we go with this simpler rule until we accumulate more experience.

@CyberShadow
Copy link
Member

CyberShadow commented Aug 3, 2017

Among the most annoying: if a link http://... happens to fall at the start of a line the http is converted to a tag.

See #7043, which changed how URLs are treated outside of a macro. Not sure if URLs inside macros will still break the macro and be treated as a section heading.

In either case, this needs a rebase, tests, and a changelog entry.

@andralex andralex force-pushed the ddoc-tags-must-be-followed-by-whitespace branch from bb875b7 to 9da687d Compare August 3, 2017 18:59
@andralex
Copy link
Member Author

andralex commented Aug 3, 2017

Found one more bug exposed and fixed by this change. In std.datetime.systime:

Otherwise, the resulting `chunks` will be input ranges consuming the same
input: iterating over ...

The old tag detector would think "input:" is a section heading.

@andralex
Copy link
Member Author

andralex commented Aug 3, 2017

@CyberShadow I experimented with the "must be preceded by an empty line and found it broke a bunch of code. See e.g. core.atomic:

 * Copyright: Copyright Sean Kelly 2005 - 2016.
 * License:   $(LINK2 http://www.boost.org/LICENSE_1_0.txt, Boost License 1.0)
 * Authors:   Sean Kelly, Alex Rønne Petersen
 * Source:    $(DRUNTIMESRC core/_atomic.d)

We'd need to change it to this to make it work:

 * Copyright: Copyright Sean Kelly 2005 - 2016.
 *
 * License:   $(LINK2 http://www.boost.org/LICENSE_1_0.txt, Boost License 1.0)
 *
 * Authors:   Sean Kelly, Alex Rønne Petersen
 *
 * Source:    $(DRUNTIMESRC core/_atomic.d)

A bunch of code would be affected.

@andralex andralex force-pushed the ddoc-tags-must-be-followed-by-whitespace branch 2 times, most recently from 1459b21 to 4252f92 Compare August 3, 2017 19:25
@andralex
Copy link
Member Author

andralex commented Aug 3, 2017

I've piggybacked some tests to the tests afferent to #7043, please don't rake me over coals for this.

@CyberShadow
Copy link
Member

I experimented with the "must be preceded by an empty line and found it broke a bunch of code.

Yes, that's completely expected. This is why my suggestion also has the condition that the previous line may also be another one-line section.

@andralex
Copy link
Member Author

andralex commented Aug 3, 2017

@CyberShadow that may work indeed from what I see in the diffs; let's leave it for another day.

@CyberShadow
Copy link
Member

Please remove id.h.

@@ -4,8 +4,6 @@
/*
TEST_OUTPUT:
---
compilable/ddoc4899.d(16): Warning: Ddoc: Stray '('. This may cause incorrect Ddoc output. Use $(LPAREN) instead for unpaired left parentheses.
compilable/ddoc4899.d(16): Warning: Ddoc: Stray ')'. This may cause incorrect Ddoc output. Use $(RPAREN) instead for unpaired right parentheses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem as #7043 (comment).

I'm surprised this change isn't causing a merge conflict!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the action item here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided some suggestions at the link above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here is that previously foo:) was interpreted as a tag. But now, given that there's no whitespace between the colon and the close paren, the text is not interpreted as a tag anymore, the closing paren becomes legit, and the warning text disappears altogether. What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to repeat myself again, but, see the link above - specifically, you may want to read https://issues.dlang.org/show_bug.cgi?id=4899.

@@ -1,484 +1,506 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please either do the newline change in its own commit, or simply commit this file with the same line endings it uses currently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait I just copied the produced file over the other, so technically if I put them in different commits I create artificial steps that don't exist. So what difference does this line make if it's in a separate commit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, sorry, this is not about any particular new line, but about the newline characters in the file. If you look at the diff, you'll see that apparently every single line in the file changed - a symptom of the line endings changing (from CR/LF to LF or the other way).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what difference does this line make if it's in a separate commit?

The question still makes sense despite the above, so to answer that - because it makes it actually possible to see only the lines that actually changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I understand, thanks. Yeah that's a big and surprising change, will try to keep endlines as they were.

MoreTag:
yes the above is also a tag
*/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ew took me a bit of figuring out. Looks good now?

@CyberShadow
Copy link
Member

Ew took me a bit of figuring out. Looks good now?

Not quite. I'm not sure what the change in 640a74d is about, but it doesn't look like it's related to newlines.

ddoc4899 is still useless.

@andralex
Copy link
Member Author

andralex commented Aug 4, 2017

Not quite. I'm not sure what the change in 640a74d is about, but it doesn't look like it's related to newlines.

Oy Atom messed up again. Guess I'll get back to my trusty emacs!

@andralex andralex force-pushed the ddoc-tags-must-be-followed-by-whitespace branch from 640a74d to ae5ce88 Compare August 4, 2017 16:44
@andralex
Copy link
Member Author

andralex commented Aug 4, 2017

Oy that was unpleasant. Still have a difficult time figuring where those Windows line endings came about - I'm on Linux! @CyberShadow if you're around slack let's huddle there.

@andralex
Copy link
Member Author

andralex commented Aug 4, 2017

About ddoc4899: it now verifies that a foo:) is not detected as a tag and the parens are closed properly. No?

@andralex andralex force-pushed the ddoc-tags-must-be-followed-by-whitespace branch from 13179d9 to 42bf6fa Compare August 4, 2017 17:05

To distinguish between textual use and the use as a tag, the new rule is: a tag
is detected if (a) it is an identifier at the beginning of a line; (b) it is
followed by a colon and a whitespace; and (b) it starts with an uppercase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be (c) here :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@andralex andralex force-pushed the ddoc-tags-must-be-followed-by-whitespace branch from 42bf6fa to d1a88f0 Compare August 4, 2017 17:08
@dlang-bot dlang-bot merged commit adba6f9 into dlang:master Aug 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants