New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only consider an identifier followed by a colon a tag if a newline follows the colon #6989
Only consider an identifier followed by a colon a tag if a newline follows the colon #6989
Conversation
Thanks for your pull request, @andralex! Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. |
LGTM |
As I suspected that caused some problems so now the heuristic is to require identifier to start with an ASCII uppercase letter and have whitespace after the colon. |
7303775
to
bb875b7
Compare
There's a bug in std.datetime.systime's documentation exposed by this. I changed the lowercase labels in dlang/phobos#5606. |
I think requiring whitespace after the colon is completely reasonable. I'm not entirely sure about being picky about case. It precludes starting section names with things like keywords or identifiers, or digits. In any case, I don't feel strongly about it, but it seems like this change is non-trivial enough that it warrants a changelog entry or linked issue number. |
Probably a better heuristic could be found. The egregious problem (which did hurt us, see the fixed tests) is that links starting with The second issue, which may occur in flowing text, is words followed by colons that are not intended at all to be tags. Worse, reformatting paragraphs (e.g. with emacs' M-q (fill-paragraph)) makes these unintended tags spring into existence. Consider:
Now the author edits the paragraph here and there and at some point the word "returns:" is liable to fall at the beginning of a line. Bam, you get it formatted as a "Returns:" heading. The uppercase requirement takes care of that. Of course there will be situations when you do have a titlecased word followed by a colon that you really want to be part of flowing text, e.g.:
Again, "Consider:" is liable to be reflown at the start of a line and unwittingly transformed into a section heading. This kind of surprising formatting is the stuff of nightmares for authors. Any ideas for a better heuristic that doesn't break too much with the past? I'm thinking e.g. requiring a non-keyword label followed by a colon to be followed immediately by a newline. (Keywords being the predefined labels |
Also, it shouldn't be forgotten that an author can always force a heading label by using |
How about requiring an empty line (two consecutive newlines) before a section heading, unless the previous line is also a section heading? |
I don't think being picky about the case will be a huge issue as long as it has a changelog entry and the spec is updated as there are many libraries out there which try to emulate the ddoc spec, e.g. this is how libddoc deals with sections (it follows the current "spec"). The nice thing about this change is that its more restrictive and existing parsers will continue to work. For example: "libddoc":
current.type = Type.word; // ascii, digit, _
current.text = text[oldOffset .. offset];
if (parseHeader && prevIsNewline(oldOffset, text) && offset < text.length
&& text[offset] == ':')
{
current.type = Type.header;
while (!lex.empty) switch (lex.front.type)
{
case Type.header:
finishSection(); // <-- a previous section is closed on a new header tag
name = lex.front.text;
lex.popFront();
lex.stripWhitespace(); // <- libbdoc ignores whitespace - it doesn't require it
break;
case Type.newline:
lex.popFront();
if (name is null && !lex.empty && lex.front.type == Type.newline)
finishSection();
break;
default:
lex.popFront();
sliceEnd = lex.offset;
break;
}
finishSection(); (as I saw
Sounds reasonable, I prefer Btw it wouldn't be too hard to add to the
FWIW I would only worry about breaking Phobos - most other well-documented projects have switched to Ddox or other documentation engines ;-) |
Nice! For quick reference: https://dlang.org/phobos-prerelease/std_datetime_systime.html#.SysTime.fromISOExtString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, I like this change.
but it seems like this change is non-trivial enough that it warrants a changelog entry or linked issue number.
+1 (and the spec).
That's nice but a bit involved. I suggest we go with this simpler rule until we accumulate more experience. |
See #7043, which changed how URLs are treated outside of a macro. Not sure if URLs inside macros will still break the macro and be treated as a section heading. In either case, this needs a rebase, tests, and a changelog entry. |
bb875b7
to
9da687d
Compare
Found one more bug exposed and fixed by this change. In std.datetime.systime:
The old tag detector would think "input:" is a section heading. |
@CyberShadow I experimented with the "must be preceded by an empty line and found it broke a bunch of code. See e.g. core.atomic:
We'd need to change it to this to make it work:
A bunch of code would be affected. |
1459b21
to
4252f92
Compare
I've piggybacked some tests to the tests afferent to #7043, please don't rake me over coals for this. |
Yes, that's completely expected. This is why my suggestion also has the condition that the previous line may also be another one-line section. |
@CyberShadow that may work indeed from what I see in the diffs; let's leave it for another day. |
Please remove id.h. |
@@ -4,8 +4,6 @@ | |||
/* | |||
TEST_OUTPUT: | |||
--- | |||
compilable/ddoc4899.d(16): Warning: Ddoc: Stray '('. This may cause incorrect Ddoc output. Use $(LPAREN) instead for unpaired left parentheses. | |||
compilable/ddoc4899.d(16): Warning: Ddoc: Stray ')'. This may cause incorrect Ddoc output. Use $(RPAREN) instead for unpaired right parentheses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem as #7043 (comment).
I'm surprised this change isn't causing a merge conflict!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the action item here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I provided some suggestions at the link above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change here is that previously foo:)
was interpreted as a tag. But now, given that there's no whitespace between the colon and the close paren, the text is not interpreted as a tag anymore, the closing paren becomes legit, and the warning text disappears altogether. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to repeat myself again, but, see the link above - specifically, you may want to read https://issues.dlang.org/show_bug.cgi?id=4899.
@@ -1,484 +1,506 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please either do the newline change in its own commit, or simply commit this file with the same line endings it uses currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait I just copied the produced file over the other, so technically if I put them in different commits I create artificial steps that don't exist. So what difference does this line make if it's in a separate commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, sorry, this is not about any particular new line, but about the newline characters in the file. If you look at the diff, you'll see that apparently every single line in the file changed - a symptom of the line endings changing (from CR/LF to LF or the other way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what difference does this line make if it's in a separate commit?
The question still makes sense despite the above, so to answer that - because it makes it actually possible to see only the lines that actually changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I understand, thanks. Yeah that's a big and surprising change, will try to keep endlines as they were.
MoreTag: | ||
yes the above is also a tag | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ew took me a bit of figuring out. Looks good now?
Not quite. I'm not sure what the change in 640a74d is about, but it doesn't look like it's related to newlines. ddoc4899 is still useless. |
Oy Atom messed up again. Guess I'll get back to my trusty emacs! |
640a74d
to
ae5ce88
Compare
Oy that was unpleasant. Still have a difficult time figuring where those Windows line endings came about - I'm on Linux! @CyberShadow if you're around slack let's huddle there. |
About ddoc4899: it now verifies that a |
13179d9
to
42bf6fa
Compare
changelog/ddoc-tag-deduction.dd
Outdated
|
||
To distinguish between textual use and the use as a tag, the new rule is: a tag | ||
is detected if (a) it is an identifier at the beginning of a line; (b) it is | ||
followed by a colon and a whitespace; and (b) it starts with an uppercase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be (c)
here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
42bf6fa
to
d1a88f0
Compare
This solves a number of annoying problems when ddoc confuses the use of ":" in running text with the intent to insert a tag. The matter can happen after text is reflown with no intent or knowledge that a tag will be generated.
Among the most annoying: if a link
http://...
happens to fall at the start of a line thehttp
is converted to a tag.I suspect this will expose a few cases when people do want to insert tags but don't insert a newline.