Issue 9731: implement $(DDOC_PARAGRAPH) #5315

Closed
wants to merge 5 commits into from

andralex
Member

This feature inserts $(DDOC_PARAGRAPH_SEPARATOR) whenever encountering two newlines or more in the source text.

After some research I figured that this is the right behavior instead of wrapping each paragraph in a DDOC_PARAGRAPH macro. That approach does not work well or at all whenever nesting of macros is present, with two newlines in between. In contrast, a paragraph break is always insertable.

One interesting thing is the <P> tag is somewhat special in HTML, due to it being a "mistake" of sorts (it should be a self-closing tag, and it seems to not be due to historical accident). Anyhow HTML does not require </p> so we should be fine with paragraph separators as introduced by this feature.

Alternatively, instead of <p> we should insert a <div class="DDOC_PARAGRAPH_SEPARATOR"></div> and let the CSS choose. For now the macro is defined to nothing and is backwards compatible.

@andralex
Member Author

#4338 is the previous take on this.

@adamdruppe
Contributor

The <p> tag in HTML isn't a mistake; it is a directive inspired by SGML, which did not require all tags to be there because it could implicitly insert them. As such, it is by design and perfectly valid to use plain <p>, and it would not be right to use <p />.

I prefer the use of the closing tag, it makes parsing easier (a fair chunk of code in my dom.d could be removed if all p's were closed consistently..) and it is required in xhtml but you're right that there's no need for it in html.... but still, the other PR kinda looks better to me. I gotta look closer.

@adamdruppe
Contributor

PS I actually came close to writing this myself today. It'd be really nice to have, even if imperfect.

@adamdruppe
Contributor

I think the other PR you referenced has better behavior and left some comments there...

@CyberShadow
Member

> One interesting thing is the <P> tag is somewhat special in HTML, due to it being a "mistake" of sorts (it should be a self-closing tag, and it seems to not be due to historical accident). Anyhow HTML does not require </p> so we should be fine with paragraph separators as introduced by this feature.

Although closing some tags may be "optional" in some contexts, it makes sensibly validating the HTML much more difficult. I've spent some effort fixing this recently (see my HTML fixes PRs), please don't undo my work.

@nomad-software

> Anyhow HTML does not require </p> so we should be fine with paragraph separators as introduced by this feature.

From the specification:

> A p element’s end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is not an a element.

@andralex
Member Author

Thanks for reviewing. As I feared this is getting sidetracked into a discussion about <p>. We don't need to use it for paragraph separation - a DIV should be just fine.

@andralex
Member Author

BTW could anyone please double-check the code? I'm not sure whether I've left some corner cases uncovered.

FWIW: to the best of my knowledge this PR mimics what TeX does: the processor replaces each run of whitespace containing two or more newlines with one \par. Paragraph start is not marked, only paragraph ending.
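
The rule described here can be sketched in a few lines. This is only an illustration in Python, not the code in this PR (which lives in dmd's doc.d). The function name insert_separators is hypothetical, and unlike the real patch this sketch replaces the blank lines outright instead of leaving them in place for DDOC_BLANKLINE. It also only marks breaks between paragraphs, so nothing is emitted after the last one.

```python
import re

# Macro name taken from the PR description.
SEPARATOR = "$(DDOC_PARAGRAPH_SEPARATOR)"

# A run of whitespace that contains at least two newlines.
_BREAK = re.compile(r"[ \t\r]*\n[ \t\r]*\n\s*")


def insert_separators(text: str) -> str:
    """Collapse every paragraph break into exactly one separator macro.

    Leading and trailing whitespace is stripped first, so no separator
    is produced before the first or after the last paragraph.
    """
    return _BREAK.sub("\n" + SEPARATOR + "\n", text.strip())


if __name__ == "__main__":
    print(insert_separators("first paragraph\n\n\n\nsecond paragraph"))
```

A single newline, with or without indentation on the next line, is left untouched; only runs with two or more newlines count as a break.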

}
// We got to a non-whitespace, time to insert the paragraph
// break.
static __gshared immutable ps =
Contributor

Is there any need for all those qualifiers in new D code? You're just referring to a string literal, use ordinary immutable ps = "it";.

Member Author

ok

@andralex
Member Author

@adamdruppe: thanks, read your comments on #4338. I think #5315 is the right approach (just insert a paragraph separator). With #4338, I'm unclear on what happens e.g. if there's a macro call that has two newlines inside of it. Consider:

$(MYMACRO some text

some more text

$(ANOTHER_MACRO

and some more text

)

)

I wonder how this whole nesting could be properly wrapped in paragraphs. With the terminator there's no such issue - insert a paragraph separator wherever there's two or more newlines.
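
To make the nesting concern concrete, here is a rough Python sketch. It applies a deliberately naive wrapper, in which every blank-line-separated chunk goes into a hypothetical $(P ...) macro (this is not what #4338 actually implements), and compares it with plain separator insertion. The helper paren_balance_ok is made up for this illustration.

```python
import re


def paren_balance_ok(text: str) -> bool:
    """True if every ')' closes an earlier '(' and nothing stays open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# The nested example from the comment above.
SOURCE = (
    "$(MYMACRO some text\n\n"
    "some more text\n\n"
    "$(ANOTHER_MACRO\n\n"
    "and some more text\n\n"
    ")\n\n"
    ")"
)

chunks = re.split(r"\n{2,}", SOURCE)

# Separator approach: rejoin the chunks with one separator macro each.
separated = "\n$(DDOC_PARAGRAPH_SEPARATOR)\n".join(chunks)

# Naive wrapper approach: each chunk becomes the argument of $(P ...).
wrapped_chunks = ["$(P " + chunk + ")" for chunk in chunks]

if __name__ == "__main__":
    print("separator output balanced:", paren_balance_ok(separated))
    for chunk in wrapped_chunks:
        print(repr(chunk), "->", paren_balance_ok(chunk))
```

With the separator, the original parentheses are never touched, so the text stays balanced. With the naive wrapper, the closing paren of $(P ...) ends up pairing with a paren that belongs to MYMACRO or ANOTHER_MACRO, so only the chunks without any macro delimiters come out well-formed.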

@adamdruppe
Contributor

> As I feared this is getting sidetracked into a discussion about <p>.

We really should understand what HTML is when writing code to generate it. I'll grant that ddoc is going to be a leaky abstraction at some level, but let's try not to be gratuitously wrong just because we're too lazy to think through the semantics.

> We don't need to use it for paragraph separation - a DIV should be just fine.

That's just totally wrong. <p> works and has a standard way to parse it (if a bit messy to implement if not closed) while semantically meaning paragraph break. It is a bad idea to invent new tags in html that mean the same thing as existing tags.

What we're trying to do is to achieve a more perfect mapping to native, semantic paragraphs.

@CyberShadow
Member

Instead of a macro that is inserted in place of blank lines, how about a macro that wraps paragraphs (and is defined as $0 by default)? That solves the </p> problem, because it can be redefined as <p>$0</p>.

Edit: Oh, that's what #4338 does. I think that's the better direction to pursue then.

@CyberShadow
Member

> I wonder how this whole nesting could be properly wrapped in paragraphs.

This problem applies to this PR as well, because we'll just be starting a paragraph in the middle of ANOTHER_MACRO, i.e.: $(ANOTHER_MACRO foo $(P bar))

Some options:

  • restrict the macros to the top level - wouldn't work in practice as it prevents usage of paragraphs in tables and lists
  • require explicit usage of $(DDOC_PARAGRAPH) inside nested macros containing blank lines, but only if it has been redefined to preserve backwards compatibility
  • instantiate them automatically at the start of the contents of each macro block that contains blank lines (so $(ANOTHER_MACRO foo \n\n bar) becomes $(ANOTHER_MACRO $(P foo) $(P bar)))

Option 3 seems most practical... however, then it has the issue that given a list, the list items which contain 1 paragraph will have no padding-bottom, but ones with 2 or more paragraphs will, which will look inconsistent.

So... #5315 (this PR) does have the advantage in consistency and ease of use, however we can't use it with <p>. We need a paragraph separator tag (kinda like <br> or <hr>), which <p> isn't. #4338 would be better if we can define some way for single-paragraph macro blocks to be wrapped into DDOC_PARAGRAPH.

@andralex
Member Author

@CyberShadow it's totally fine to have a paragraph break inside a macro expansion. @adamdruppe I think you're making this quite a bit more than it really is.

@CyberShadow
Member

> @CyberShadow it's totally fine to have a paragraph break inside a macro expansion.

Why do you think so? What if <p> has a margin-left? It will look crooked.

<p> is not a paragraph break. It is not like <br>, it has to wrap the text in the paragraph.

@adamdruppe
Contributor

I'm not sure just what the other PR's code would do, but given:

$(MYMACRO some text

some more text

$(ANOTHER_MACRO

and some more text

)

)

The output I would expect is (assuming the macros are defined to eponymous tags):

<MYMACRO> some text

<p>some more text</p>

<ANOTHER_MACRO>

<p>and some more text</p>

</ANOTHER_MACRO>

</MYMACRO>

You'll notice that some text did not get wrapped. I did this because it starts on the same line as the macro opening.

My preferred rule is: naked text that begins a section or immediately follows \n\n is opened with $(DDOC_PARAGRAPH, and it is closed at the next \n\n, when a macro containing the start of this paragraph is closed, or at the end of a section, whichever comes first. At the end of input, if a paragraph is still open, it is closed.

"Naked text" is defined as any text that is not in a code example section, not in a macro definition, and not entirely contained inside a macro itself (if naked text starts with $(, the closing ) must also appear on the same line, before the final character, for it to count as an automatic paragraph). A section is defined as the beginning of the documentation comment, a code example, the description of a function parameter, or a Ddoc_defined: colon-separated section.

This "naked text" might be determined before any macro expansion is done; perhaps the insertion of paragraph macros is done in a pre-processing stage on the raw text before other transformations and is run once and only once in the transformation process. I'm not sure that's necessary but it might be.

A paragraph can never be empty or consist only of whitespace.

Finally, let us not forget that Params: is a special section in ddoc that generates a table.

For example:

   /**
         This line would be wrapped in a paragraph macro. It begins a section.

         So would this line, continuing
         into this line, since it is naked text bracketed by \n\n.

         This is also a paragraph, ending at the following sample:
         ---
            But this is not.

            Nor is this, because it is a code example.
         ---
         But this is a paragraph again, because it begins a section.

         $(Foo this is NOT an automatic paragraph, because it begins with a macro
             opening. Foo itself is not wrapped in a paragraph either.

             However, this is a paragraph again, and closed before the Foo macro is closed.

         ) <!-- note that there would be no extra paragraph at the end, despite \n\n, because a paragraph can never be empty -->

         $(LINK foo.com) IS a paragraph, despite starting with a macro, because the closing tag is obviously inline.

         $(LI but this is NOT a paragraph because the entire line is a macro again.)

         This is back to being a normal paragraph.




        And this is just ONE more, because the preceding lines, being empty, do not qualify as a paragraph. If space was intentional, use a $(SPACER) macro of some sort in there.

       Params:
               a = special text describing the param which makes a table...

                       ... and includes two paragraphs.
               b = this is a paragraph too. The generated code would be like <th>b</th><td><p>this is a paragraph too</p></td>

        Macros:
          FOO=this is not an automatic
              paragraph

              this still not a paragraph

              because automatic paragraph wrapping is never done in macro definitions.
   */
   void foo(int a, int b) {}

I'm actually not 100% convinced of the ANOTHER_MACRO / $(Foo first line case, and I might change my mind on them. If I did, I'd change the rule so that any macro whose immediate text includes \n\n counts as a section. But I think the rest of my definition here is comprehensive across ddoc's features, semantically correct, convenient and understandable for doc authors to write, and implementable in the compiler.

@CyberShadow
Member

@adamdruppe What about the case I mentioned, a list where one item has one paragraph and another item has two paragraphs?

@andralex
Member Author

@CyberShadow : by "totally fine" I mean "it won't lead to unmatched parens". It will of course look like paragraphs do because that's what the user asked for!

@adamdruppe: I understand what you're saying but the amount of analysis (parens on lines etc) required by that semantics is not matched by what you get from it.

Again: this is what we need - collect 2 or more newlines into one paragraph separator. It's the simple, meaningful, motivated solution. I'll hop on IRC to address any questions.

@CyberShadow
Member

> Again: this is what we need - collect 2 or more newlines into one paragraph separator.

I agree. Unfortunately, HTML does not have a paragraph separator.

We can invent one, but we'll be throwing semantics out the window. This might still be the best option though: define the macro as DDOC_PARAGRAPH=<div class="paragraph-separator"></div> and style it with .paragraph-separator { height: 0.5em; }

> I'll hop on IRC to address any questions.

I'm not home (in the US) right now so can't come to IRC.

@adamdruppe
Contributor

I just realized too that I forgot to address the issue of macros with multiple arguments. Ugh.

@andralex
Member Author

@CyberShadow: yes, that kind of DIV is what I had in mind.

I'm not worried about <p>. From what I can tell HTML didn't quite get paragraphs right because <p> has a long history (which I was surprised to learn about last night) and has the most exceptions associated with it (as @nomad-software mentioned). Initially there was no </p>. Javadoc requires <p> at the end of a paragraph and also requires no </p>. At least for a while, </p> closes all previously-opened <p>s. (Probably that's what browsers do today.)

As far as I can tell, the formatting that one would add to the <p> style may simply be done at the body-text level - there's no loss.

So after researching the matter I figured it's not worth trying to do some complicated processing that would support the use of <p>...</p>. What's needed (and TeX got right) is a paragraph terminator/separator, not a paragraph wrapper.

@adamdruppe
Contributor

On Tue, Dec 22, 2015 at 09:13:38AM -0800, Vladimir Panteleev wrote:

> @adamdruppe What about the case I mentioned, a list where one item has one paragraph and another item has two paragraphs?

I don't see the post even scrolling up...

But

$(LI this I just don't think should be a paragraph. the li block alone is enough)
$(LI
   but this probably should be.

   and this certainly should be.
)
$(LI $(P you could manually wrap still if you want it to have one.))

would that cover your concern?

@andralex
Member Author

@adamdruppe you need to show how to wrap general paragraphs that contain unbalanced parens.

@CyberShadow
Member

> $(LI this I just don't think should be a paragraph. the li block alone is enough)

Thing is that it has to be, otherwise the spacing between list items will be inconsistent.

@CyberShadow
Member

> @CyberShadow: yes, that kind of DIV is what I had in mind.

OK, then I'm on board with this PR.

However, the implementation seems to need more work. I can't judge the code but looking at the documentation tester's diffs, it looks like it's eating whitespace where it shouldn't. If I understand correctly, by itself this patch should not affect documentation output.

@adamdruppe
Contributor

> you need to show how to wrap general paragraphs that contain unbalanced parens.

Let's get an example:

This is a paragraph $(BLOCK with

some big

block in the middle of it)!

Putting a block in the middle of a paragraph doesn't make sense anyway... it would by nature split the paragraph into two! But there are two solutions. First, the paragraph would probably end with the $(. We could redefine section in my other comment to also include any macro that includes \n\n to cover that.

Or we could define the whole macro to be inside the one paragraph which means it would span those \n\n in the middle too. The latter kinda makes sense to me, maybe the macro is formatted across lines to be pretty in source rather than to semantically be a paragraph, since if it was meant to be one, it would surely be on its own block in the source too.

BTW

This is a paragraph <div> with
<p>
some big
<p>
block in the middle of it</div>!

is technically broken html.... but it is good enough to parse...
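
As a rough illustration of why unclosed paragraphs complicate tooling, the Python sketch below counts <p> start tags that never get an explicit </p>. It uses the standard library's html.parser, which reports tags as written and does not apply the implicit-close rules a browser would. The class and function names are hypothetical, and this is a strict counter, not a conforming HTML validator.

```python
from html.parser import HTMLParser


class OpenParagraphCounter(HTMLParser):
    """Tracks <p> start tags that have not seen an explicit </p>."""

    def __init__(self):
        super().__init__()
        self.open_p = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.open_p += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.open_p > 0:
            self.open_p -= 1


def unclosed_paragraphs(html: str) -> int:
    """Return how many <p> tags in html lack an explicit closing tag."""
    counter = OpenParagraphCounter()
    counter.feed(html)
    counter.close()
    return counter.open_p


# The snippet from the comment above.
SNIPPET = (
    "This is a paragraph <div> with\n"
    "<p>\n"
    "some big\n"
    "<p>\n"
    "block in the middle of it</div>!"
)

if __name__ == "__main__":
    print(unclosed_paragraphs(SNIPPET))
```

A tool that wants to accept such input has to reimplement the implicit-close rules itself, which is the extra parsing work mentioned earlier in the thread.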

}
if (c == '\n')
{
iLineStart = scout + 1;
Contributor

This is probably what is breaking the autotester, since iLineStart is also used for the DDOC_BLANKLINE test up on line 2150: https://github.com/andralex/dmd/blob/DDOC_PARAGRAPH_SEPARATOR/src/doc.d#L2150

Contributor

actually, playing with it, it is the buf.insert call that is hosing it. It advances i up beyond the blank lines.

@adamdruppe
Contributor

yo i forked your fork and did a simpler implementation that is more correct in my little tests. God willing, it will pass the autotester. https://github.com/D-Programming-Language/dmd/pull/5316/files

You should try it on your test cases too to make sure the paragraph separators actually go where you want them to. This is what I tried:

/++
    This is the summary section.

    This is not a new paragraph. Rather, it is the description section.

    This IS a new paragraph.




    and this too. There should be only one.

+/
void main() {}

The output was the same before and after my patch. Kewl, the blank lines were all still there (that's what was missing in your version, the buf.insert skipped over them so they wouldn't be inserted with the old blankline macro).

Now add some macro action:

    Macros:
    DDOC_PARAGRAPH_SEPARATOR=<p>

Here's the output:

<html><head>
        <!-- Generated by Ddoc from t.d -->
        <META http-equiv="content-type" content="text/html; charset=utf-8">
        <title>t</title>
        </head><body>
        <h1>t</h1>
<br><br>
<dl><dt><big><a name="main"></a>void <u>main</u>();
</big></dt>
<dd>This is the summary section.
<br><br>
This is not a new paragraph. Rather, it is the description section.
<br><br>

    <p>This IS a new paragraph.
<br><br>

<br><br>

<br><br>

<br><br>

    <p>and this too. There should be only one.

<br><br>

</dd>
</dl>

        <hr><small>Page generated by <a href="http://dlang.org/ddoc.html">Ddoc</a>. </small>
        </body></html>

The <br><br> is the result of DDOC_BLANKLINE. The <p> is the result of the paragraph separator. Notice that the description section didn't get a <p> cuz it is a whole new thing as far as ddoc is concerned.

I'm not gonna say this is correct across all possible cases, I opened my PR just to give you some code to try that hopefully saves you some time. I don't want to take ownership of this issue.

@andralex
Member Author

@adamdruppe I think your PR is better. But I have trouble figuring how it doesn't emit one paragraph separator every two blank lines. Say you have 100 blank lines, it seems to me every 2 lines it'll emit one separator then reset the blankLineRun counter and then again?

@adamdruppe
Contributor

On Tue, Dec 22, 2015 at 05:39:31PM -0800, Andrei Alexandrescu wrote:

> @adamdruppe I think your PR is better. But I have trouble figuring how it doesn't emit one paragraph separator every two blank lines. Say you have 100 blank lines, it seems to me every 2 lines it'll emit one separator then reset the blankLineRun counter and then again?

Since the output is at the default case, it won't write out a <p>
at all until it encounters some non-whitespace character. Until then,
the blank line counter is only incremented, not reset.

Though I think it might need to be done in some of the other cases
too, I'd better double check that.
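
The counter behavior described in this reply can be modeled with a small state machine. The Python sketch below is an assumption-laden paraphrase, not the code from the PR: newline_run stands in for the blankLineRun counter mentioned above, and the function name and the seen_text flag are hypothetical. The counter only grows while whitespace is being read, and the separator is written once, right before the next non-whitespace character.

```python
def emit_with_separators(text: str, separator: str = "<p>") -> str:
    """Copy text, inserting one separator before each new paragraph."""
    out = []
    newline_run = 0    # newlines seen since the last non-whitespace char
    seen_text = False  # assumption: no separator before the first paragraph
    for ch in text:
        if ch == "\n":
            # Whitespace only bumps the counter; nothing is emitted yet.
            newline_run += 1
            out.append(ch)
        elif ch.isspace():
            out.append(ch)
        else:
            # First non-whitespace character after the run: decide once.
            if seen_text and newline_run >= 2:
                out.append(separator)
            newline_run = 0
            seen_text = True
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "first" + "\n" * 100 + "second"
    print(emit_with_separators(sample).count("<p>"))
```

Because the reset happens only in the non-whitespace branch, a run of 100 newlines still produces a single separator, and trailing blank lines at the end of the input produce none.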

@adamdruppe
Contributor

And merging my PR automatically updated this one!

I'd love to see PRs against PRs become a common thing.

@dlang-bot
Contributor

Fix Bugzilla Description
15389 extern(C++) forward referencing problem
15464 Template parameter-dependent attributes

@andralex
Member Author

andralex commented Jan 7, 2016

Alrighty, this looks like the ticket.

@andralex
Member Author

andralex commented Jan 7, 2016

thx @adamdruppe !!

@adamdruppe
Contributor

The auto tester seems to be complaining about whitespace again.... though I think the pull request I did didn't delete your old code. Lines 2175 to 2201 can be removed.

Lines 2427 to 2438 are the new way that do it better.

@CyberShadow
Member

You can also look at the doc tester diffs, e.g. currently this patch is messing with code blocks' indentation.

@adamdruppe
Contributor

Yeah, it is doing that because lines 2175-2201 are still there.

@andralex
Member Author

andralex commented Jan 8, 2016

Well I did things a different way, which is simpler because it doesn't use nested loops.

@@ -2301,100 +2330,103 @@ extern (C++) void highlightText(Scope* sc, Dsymbols* a, OutBuffer* buf, size_t o
/* A line beginning with --- delimits a code section.
* inCode tells us if it is start or end of a code section.
*/
if (leadingBlank)
Member Author

I saw fit to remove this Ridiculously Long If Statement. 90 lines!

@andralex
Member Author

andralex commented Jan 8, 2016

one more empty line to fix... brb...

@andralex
Member Author

I got this to work but it's just too much for too little. I decided instead to go with #5344 and dlang/dlang.org#1186, which simplify things for everyone.

@andralex andralex closed this Jan 10, 2016