Issue 9731: implement $(DDOC_PARAGRAPH) #5315

andralex · 2015-12-22T02:56:09Z

This feature inserts $(DDOC_PARAGRAPH_SEPARATOR) whenever encountering two newlines or more in the source text.

After some research I figured that this is the right behavior instead of wrapping each paragraph in a DDOC_PARAGRAPH macro. That approach does not work well or at all whenever nesting of macros is present, with two newlines in between. In contrast, a paragraph break is always insertable.

One interesting thing is that the <p> tag is somewhat special in HTML, due to it being a "mistake" of sorts (it should be a self-closing tag, and it seems not to be due to historical accident). Anyhow, HTML does not require </p>, so we should be fine with paragraph separators as introduced by this feature.

Alternatively, instead of <p> we should insert a <div class="DDOC_PARAGRAPH_SEPARATOR"></div> and let the CSS choose. For now the macro is defined to nothing and is backwards compatible.

andralex · 2015-12-22T02:56:31Z

#4338 is the previous take on this.

andralex · 2015-12-22T03:00:39Z

Relevant: http://stackoverflow.com/questions/8460993/p-end-tag-p-is-not-needed-in-html

adamdruppe · 2015-12-22T03:08:35Z

The <p> tag in HTML isn't a mistake; it is a directive inspired by SGML, which did not require all tags to be there and could implicitly insert them. As such, it is by design and perfectly valid to use plain <p>, and it would not be right to use <p/>.

I prefer the use of the closing tag, it makes parsing easier (a fair chunk of code in my dom.d could be removed if all p's were closed consistently..) and it is required in xhtml but you're right that there's no need for it in html.... but still, the other PR kinda looks better to me. I gotta look closer.

adamdruppe · 2015-12-22T03:09:48Z

PS I actually came close to writing this myself today. It'd be really nice to have, even if imperfect.

adamdruppe · 2015-12-22T03:28:26Z

I think the other PR you referenced has better behavior and left some comments there...

CyberShadow · 2015-12-22T07:05:26Z

One interesting thing is the <p> tag is somewhat special in HTML, due to it being a "mistake" of sorts (it should be a self-closing tag, and it seems to not be due to historical accident). Anyhow HTML does not require </p> so we should be fine with paragraph separators as introduced by this feature.

Although closing some tags may be "optional" in some contexts, it makes sensibly validating the HTML much more difficult. I've spent some effort fixing this recently (see my HTML fixes PRs), please don't undo my work.

nomad-software · 2015-12-22T09:05:24Z

Anyhow HTML does not require </p> so we should be fine with paragraph separators as introduced by this feature.

From the specification

A p element’s end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is not an a element.
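The quoted rule can be restated as a small predicate. This is only an illustrative Python sketch: the function name and the convention of passing None for "no more content" are assumptions, and the element list is copied from the quote above rather than from any parser.

```python
# Elements that, per the quoted rule, implicitly end an open <p>.
P_CLOSERS = frozenset(
    "address article aside blockquote dir div dl fieldset footer form "
    "h1 h2 h3 h4 h5 h6 header hr menu nav ol p pre section table ul".split()
)


def p_end_tag_may_be_omitted(next_element, parent_element):
    """Return True if </p> may be left out, following the quoted wording.

    next_element is the tag name right after the paragraph, or None when
    there is no more content in the parent (a convention of this sketch).
    """
    if next_element is None:
        return parent_element != "a"
    return next_element in P_CLOSERS
```

Under this reading, a <div> used as a separator would implicitly end a preceding open paragraph, while inline content such as a <span> would not.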

andralex · 2015-12-22T16:13:35Z

Thanks for reviewing. As I feared this is getting sidetracked into a discussion about <p>. We don't need to use it for paragraph separation - a DIV should be just fine.

andralex · 2015-12-22T16:18:51Z

BTW could anyone please double check the code? I'm not sure whether I've left some corner cases uncovered.

FWIW: to the best of my knowledge this PR mimics what TeX does: the processor replaces each run of whitespace containing two or more newlines with one \par. Paragraph start is not marked, only paragraph ending.
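A minimal sketch of that TeX-like rule, written in Python rather than the D of src/doc.d purely for illustration. The function name and the regular expression are assumptions, not the PR's code: each run of whitespace containing two or more newlines is replaced with a single separator.

```python
import re

# Macro name taken from the PR description; its definition is up to the user.
SEPARATOR = "$(DDOC_PARAGRAPH_SEPARATOR)"

# A run of spaces/tabs/newlines that contains at least two newlines.
_PARAGRAPH_BREAK = re.compile(r"[ \t]*\n(?:[ \t]*\n)+[ \t]*")


def insert_separators(text):
    """Replace every whitespace run holding 2+ newlines with one separator."""
    # A callable replacement avoids any escape processing of the macro text.
    return _PARAGRAPH_BREAK.sub(lambda _m: "\n" + SEPARATOR + "\n", text)
```

A single newline is left alone, and a run of many blank lines still yields exactly one separator. Unlike the later revision discussed in this thread, this sketch discards the blank lines themselves.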

adamdruppe · 2015-12-22T16:24:16Z

src/doc.d

+                    }
+                    // We got to a non-whitespace, time to insert the paragraph
+                    // break.
+                    static __gshared immutable ps =


Is there any need for all those qualifiers in new D code? You're just referring to a string literal, use ordinary immutable ps = "it";.

andralex · 2015-12-22T16:24:24Z

@adamdruppe: thanks, read your comments on #4338. I think #5315 is the right approach (just insert a paragraph separator). With #4338, I'm unclear on what happens e.g. if there's a macro call that has two newlines inside of it. Consider:

$(MYMACRO some text

some more text

$(ANOTHER_MACRO

and some more text

)

)

I wonder how this whole nesting could be properly wrapped in paragraphs. With the terminator there's no such issue - insert a paragraph separator wherever there's two or more newlines.
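To make the concern concrete, here is a small Python sketch (illustrative only; the helper names are hypothetical and this is not how either PR is implemented). It splits the example above on blank lines and counts unmatched parentheses per chunk. Any chunk with a nonzero balance cannot simply be wrapped in a macro call without swallowing or orphaning a parenthesis.

```python
def split_paragraphs(text):
    """Split text on blank lines, dropping empty chunks."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks


def paren_balance(chunk):
    """Opening minus closing parentheses within one chunk."""
    return chunk.count("(") - chunk.count(")")


SOURCE = """$(MYMACRO some text

some more text

$(ANOTHER_MACRO

and some more text

)

)"""

chunks = split_paragraphs(SOURCE)
balances = [paren_balance(c) for c in chunks]
print(balances)  # [1, 0, 1, 0, -1, -1]
```

Only the two chunks with balance 0 could be wrapped naively; a separator inserted between chunks does not care about the balance at all.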

adamdruppe · 2015-12-22T16:29:47Z

As I feared this is getting sidetracked into a discussion about <p>.

We really should understand what HTML is when writing code to generate it. I'll grant that ddoc is going to be a leaky abstraction at some level, but let's try not to be gratuitously wrong just because we're too lazy to think through the semantics.

We don't need to use it for paragraph separation - a DIV should be just fine.

That's just totally wrong. <p> works and has a standard way to parse it (if a bit messy to implement if not closed) while semantically meaning paragraph break. It is a bad idea to invent new tags in html that mean the same thing as existing tags.

What we're trying to do is to achieve a more perfect mapping to native, semantic paragraphs.

CyberShadow · 2015-12-22T16:37:16Z

Instead of a macro that is inserted in place of blank lines, how about a macro that wraps paragraphs (and is defined as $0 by default)? That solves the </p> problem, because it can be redefined as <p>$0</p>.

Edit: Oh, that's what #4338 does. I think that's the better direction to pursue then.

CyberShadow · 2015-12-22T16:44:43Z

I wonder how this whole nesting could be properly wrapped in paragraphs.

This problem applies to this PR as well, because we'll just be starting a paragraph in the middle of ANOTHER_MACRO, i.e.: $(ANOTHER_MACRO foo $(P bar))

Some options:

1. restrict the macros to the top level - wouldn't work in practice as it prevents usage of paragraphs in tables and lists
2. require explicit usage of $(DDOC_PARAGRAPH) inside nested macros containing blank lines, but only if it has been redefined to preserve backwards compatibility
3. instantiate them automatically at the start of the contents of each macro block that contains blank lines (so $(ANOTHER_MACRO foo \n\n bar) becomes $(ANOTHER_MACRO $(P foo) $(P bar)))

Option 3 seems most practical... however, then it has the issue that given a list, the list items which contain 1 paragraph will have no padding-bottom, but ones with 2 or more paragraphs will, which will look inconsistent.

So... #5315 (this PR) does have the advantage in consistency and ease of use, however we can't use it with <p>. We need a paragraph separator tag (kinda like <br> or <hr>), which <p> isn't. #4338 would be better if we can define some way for single-paragraph macro blocks to be wrapped into DDOC_PARAGRAPH.

andralex · 2015-12-22T17:06:51Z

@CyberShadow it's totally fine to have a paragraph break inside a macro expansion. @adamdruppe I think you're making this quite a bit more than it really is.

CyberShadow · 2015-12-22T17:08:24Z

@CyberShadow it's totally fine to have a paragraph break inside a macro expansion.

Why do you think so? What if <p> has a margin-left? It will look crooked.

<p> is not a paragraph break. It is not like <br>, it has to wrap the text in the paragraph.

adamdruppe · 2015-12-22T17:09:11Z

I'm not sure just what the other PR's code would do, but given:

$(MYMACRO some text

some more text

$(ANOTHER_MACRO

and some more text

)

)

The output I would expect is (assuming the macros are defined to eponymous tags):

<MYMACRO> some text

<p>some more text</p>

<ANOTHER_MACRO>

<p>and some more text</p>

</ANOTHER_MACRO>

</MYMACRO>

You'll notice that some text did not get wrapped. I did this because it starts on the same line as the macro opening.

My preferred rule is: naked text that begins a section or immediately follows \n\n is opened with $(DDOC_PARAGRAPH and it is closed at the next \n\n, when a macro containing the start of this paragraph was closed, or the end of a section, whichever comes first. At the end of input, if a paragraph is opened, it is closed.

"Naked text" is defined as any text that is not in a code example section, not in a macro definition, and not entirely contained inside a macro itself (if naked text starts with $(, the closing ) must also appear on the same line, before the final character, for it to count as an automatic paragraph). A section is defined as the beginning of the documentation comment, a code example, the description of a function parameter, or a Ddoc_defined: colon-separated section.

This "naked text" might be determined before any macro expansion is done; perhaps the insertion of paragraph macros is done in a pre-processing stage on the raw text before other transformations and is run once and only once in the transformation process. I'm not sure that's necessary but it might be.

A paragraph can never be empty or consist only of whitespace.

Finally, let us not forget that Params: is a special section in ddoc that generates a table.

For example:

   /**
         This line would be wrapped in a paragraph macro. It begins a section.

         So would this line, continuing
         into this line, since it is naked text bracketed by \n\n.

         This is also a paragraph, ending at the following sample:
         ---
            But this is not.

            Nor is this, because it is a code example.
         ---
         But this is a paragraph again, because it begins a section.

         $(Foo this is NOT an automatic paragraph, because it begins with a macro
             opening. Foo itself is not wrapped in a paragraph either.

             However, this is a paragraph again, and closed before the Foo macro is closed.

         ) <!-- note that there would be no extra paragraph at the end, despite \n\n, because a paragraph can never be empty -->

         $(LINK foo.com) IS a paragraph, despite starting with a macro, because the closing tag is obviously inline.

         $(LI but this is NOT a paragraph because the entire line is a macro again.)

         This is back to being a normal paragraph.




        And this is just ONE more, because the preceding lines, being empty, do not qualify as a paragraph. If space was intentional, use a $(SPACER) macro of some sort in there.

       Params:
               a = special text describing the param which makes a table...

                       ... and includes two paragraphs.
               b = this is a paragraph too. The generated code would be like <th>b</th><td><p>this is a paragraph too</p></td>

        Macros:
          FOO=this is not an automatic
              paragraph

              this still not a paragraph

              because automatic paragraph wrapping is never done in macro definitions.
   */
   void foo(int a, int b) {}

I'm actually not 100% convinced of the ANOTHER_MACRO / $(Foo first line case, I might change my mind on them... if I would, I'd change it to be that macros spanning multiple lines be defined specifically as any macro whose immediate text includes \n\n counts as a section... but I think the rest of my definition here is comprehensive across ddoc's features, semantically correct, convenient and understandable for doc authors to write, and implementable in the compiler.

CyberShadow · 2015-12-22T17:13:33Z

@adamdruppe What about the case I mentioned, a list where one item has one paragraph and another item has two paragraphs?

andralex · 2015-12-22T17:14:14Z

@CyberShadow : by "totally fine" I mean "it won't lead to unmatched parens". It will of course look like paragraphs do because that's what the user asked for!

@adamdruppe: I understand what you're saying but the amount of analysis (parens on lines etc) required by that semantics is not matched by what you get from it.

Again: this is what we need - collect 2 or more newlines into one paragraph separator. It's the simple, meaningful, motivated solution. I'll hop on IRC to address any questions.

CyberShadow · 2015-12-22T17:16:53Z

Again: this is what we need - collect 2 or more newlines into one paragraph separator.

I agree. Unfortunately, HTML does not have a paragraph separator.

We can invent one, but we'll be throwing semantics out the window. This might still be the best option though. DDOC_PARAGRAPH=<div class="paragraph-separator"></div> / .paragraph-separator { height: 0.5em; }

I'll hop on IRC to address any questions.

I'm not home (in the US) right now so can't come to IRC.

adamdruppe · 2015-12-22T17:22:54Z

I just realized too that I forgot to address the issue of macros with multiple arguments. Ugh.

andralex · 2015-12-22T17:30:20Z

@CyberShadow: yes, that kind of DIV is what I had in mind.

I'm not worried about <p>. From what I can tell HTML didn't quite get paragraphs right because <p> has a long history (which I was surprised to learn about last night) and has the most exceptions associated with it (as @nomad-software mentioned). Initially there was no </p>. Javadoc requires <p> at the end of a paragraph and also requires no </p>. At least for a while, </p> closes all previously-opened <p>s. (Probably that's what browsers do today.)

Far as I can tell the formatting that one would add to the <p> style may be simply done at the body-text level - there's no loss.

So after researching the matter I figured it's not worth trying to do some complicated processing that would support the use of <p>...</p>. What's needed (and TeX got right) is a paragraph terminator/separator, not a paragraph wrapper.

adamdruppe · 2015-12-22T17:35:48Z

On Tue, Dec 22, 2015 at 09:13:38AM -0800, Vladimir Panteleev wrote:

@adamdruppe What about the case I mentioned, a list where one item has one paragraph and another item has two paragraphs?

I don't see the post even scrolling up...

But

$(LI this I just don't think should be a paragraph. the li block alone is enough)
$(LI
   but this probably should be.

   and this certainly should be.
)
$(LI $(P you could manually wrap still if you want it to have one.))

would that cover your concern?

andralex · 2015-12-22T17:37:20Z

@adamdruppe you need to show how to wrap general paragraphs that contain unbalanced parens.

CyberShadow · 2015-12-22T19:39:32Z

$(LI this I just don't think should be a paragraph. the li block alone is enough)

Thing is that it has to be, otherwise the spacing between list items will be inconsistent.

CyberShadow · 2015-12-22T19:42:15Z

@CyberShadow: yes, that kind of DIV is what I had in mind.

OK, then I'm on board with this PR.

However, the implementation seems to need more work. I can't judge the code but looking at the documentation tester's diffs, it looks like it's eating whitespace where it shouldn't. If I understand correctly, by itself this patch should not affect documentation output.

adamdruppe · 2015-12-23T00:20:11Z

you need to show how to wrap general paragraphs that contain unbalanced parens.

Let's get an example:

This is a paragraph $(BLOCK with

some big

block in the middle of it)!

Putting a block in the middle of a paragraph doesn't make sense anyway... it would by nature split the paragraph into two! But there's two solutions: the paragraph would probably end with the $(. We could redefine section in my other comment to also include any macro that includes \n\n to cover that.

Or we could define the whole macro to be inside the one paragraph which means it would span those \n\n in the middle too. The latter kinda makes sense to me, maybe the macro is formatted across lines to be pretty in source rather than to semantically be a paragraph, since if it was meant to be one, it would surely be on its own block in the source too.

BTW

This is a paragraph <div> with
<p>
some big
<p>
block in the middle of it</div>!

is technically broken html.... but it is good enough to parse...

adamdruppe · 2015-12-23T00:24:58Z

src/doc.d

+                    }
+                    if (c == '\n')
+                    {
+                        iLineStart = scout + 1;


This is probably what is breaking the autotester, since iLineStart is also used for the DDOC_BLANKLINE test up on line 2150: https://github.com/andralex/dmd/blob/DDOC_PARAGRAPH_SEPARATOR/src/doc.d#L2150

actually, playing with it, it is the buf.insert call that is hosing it. It advances i up beyond the blank lines.

adamdruppe · 2015-12-23T00:50:40Z

yo i forked your fork and did a simpler implementation that is more correct in my little tests. God willing, it will pass the autotester. https://github.com/D-Programming-Language/dmd/pull/5316/files

You should try it on your test cases too to make sure the paragraph separators actually go where you want them to. This is what I tried:

/++
    This is the summary section.

    This is not a new paragraph. Rather, it is the description section.

    This IS a new paragraph.




    and this too. There should be only one.

+/
void main() {

The output was the same before and after my patch. Kewl, the blank lines were all still there (that's what was missing in your version, the buf.insert skipped over them so they wouldn't be inserted with the old blankline macro).

Now add some macro action:

    Macros:
    DDOC_PARAGRAPH_SEPARATOR=<p>

Here's the code:

<html><head>
        <!-- Generated by Ddoc from t.d -->
        <META http-equiv="content-type" content="text/html; charset=utf-8">
        <title>t</title>
        </head><body>
        <h1>t</h1>
<br><br>
<dl><dt><big><a name="main"></a>void <u>main</u>();
</big></dt>
<dd>This is the summary section.
<br><br>
This is not a new paragraph. Rather, it is the description section.
<br><br>

    <p>This IS a new paragraph.
<br><br>

<br><br>

<br><br>

<br><br>

    <p>and this too. There should be only one.

<br><br>

</dd>
</dl>

        <hr><small>Page generated by <a href="http://dlang.org/ddoc.html">Ddoc</a>. </small>
        </body></html>

The <br><br> is the result of DDOC_BLANKLINE. The <p> is the result of the paragraph separator. Notice that the description section didn't get a <p> cuz it is a whole new thing as far as ddoc is concerned.

I'm not gonna say this is correct across all possible cases, I opened my PR just to give you some code to try that hopefully saves you some time. I don't want to take ownership of this issue.

andralex · 2015-12-23T01:39:25Z

@adamdruppe I think your PR is better. But I have trouble figuring how it doesn't emit one paragraph separator every two blank lines. Say you have 100 blank lines, it seems to me every 2 lines it'll emit one separator then reset the blankLineRun counter and then again?

adamdruppe · 2015-12-23T01:43:31Z

On Tue, Dec 22, 2015 at 05:39:31PM -0800, Andrei Alexandrescu wrote:

@adamdruppe I think your PR is better. But I have trouble figuring how it doesn't emit one paragraph separator every two blank lines. Say you have 100 blank lines, it seems to me every 2 lines it'll emit one separator then reset the blankLineRun counter and then again?

Since the output is at the default case, it won't write out a <p>
at all until it encounters some non-whitespace character. Until then,
the blank line counter is only incremented, not reset.

Though I think it might need to be done in some of the other cases
too, I'd better double check that.
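The counting behavior described here could be sketched as follows, in Python rather than D and under stated assumptions: blank_line_run is a hypothetical stand-in for the blankLineRun counter mentioned in the question, the input is handled line by line rather than character by character, ddoc sections are not modeled, and none of this is the code from #5316. Blank lines are passed through and only counted; a single separator is emitted when the next non-blank line arrives.

```python
SEPARATOR = "$(DDOC_PARAGRAPH_SEPARATOR)"  # macro name from the PR description


def mark_paragraphs(lines):
    """Emit one separator before a non-blank line that follows blank lines."""
    out = []
    blank_line_run = 0   # hypothetical name mirroring the counter discussed above
    seen_text = False    # no separator before the very first paragraph
    for line in lines:
        if line.strip() == "":
            blank_line_run += 1    # only count; nothing is emitted here
            out.append(line)       # blank lines themselves are preserved
            continue
        if blank_line_run >= 1 and seen_text:
            out.append(SEPARATOR)  # exactly one, however long the run was
        blank_line_run = 0         # reset only once real text is reached
        seen_text = True
        out.append(line)
    return out
```

With 100 blank lines between two paragraphs, the counter simply reaches 100 and one separator is written before the second paragraph; trailing blank lines at the end of input produce none.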

adamdruppe · 2016-01-07T17:24:28Z

And merging my PR automatically updated this one!

I'd love to see PRs against PRs become a common thing.

dlang-bot · 2016-01-07T19:10:11Z

Fix	Bugzilla	Description
✓	15389	extern(C++) forward referencing problem
✓	15464	Template parameter-dependent attributes

andralex · 2016-01-07T19:17:16Z

Alrighty, this looks like the ticket.

andralex · 2016-01-07T19:20:39Z

thx @adamdruppe !!

adamdruppe · 2016-01-07T21:41:25Z

The auto tester seems to be complaining about whitespace again.... though I think the pull request I did didn't delete your old code. Lines 2175 to 2201 can be removed.

Lines 2427 to 2438 are the new way that do it better.

CyberShadow · 2016-01-08T03:14:04Z

You can also look at the doc tester diffs, e.g. currently this patch is messing with code blocks' indentation.

adamdruppe · 2016-01-08T03:16:25Z

Yeah, it is doing that because lines 2175-2201 are still there.

andralex · 2016-01-08T16:17:45Z

Well I did things a different way, which is simpler because it doesn't use nested loops.

andralex · 2016-01-08T16:18:54Z

src/doc.d

@@ -2301,100 +2330,103 @@ extern (C++) void highlightText(Scope* sc, Dsymbols* a, OutBuffer* buf, size_t o
            /* A line beginning with --- delimits a code section.
             * inCode tells us if it is start or end of a code section.
             */
-            if (leadingBlank)


I saw fit to remove this Ridiculously Long If Statement. 90 lines!

andralex · 2016-01-08T16:42:09Z

one more empty line to fix... brb...

andralex · 2016-01-10T04:26:01Z

I got this to work but it's just too much for too little. I decided instead to go with #5344 and dlang/dlang.org#1186, which simplify things for everyone.

adamdruppe reviewed Dec 22, 2015

adamdruppe reviewed Dec 23, 2015

adamdruppe mentioned this pull request Dec 23, 2015

fix Andrei's code to keep the existing DDOC_BLANKLINEs #5316

Closed

andralex force-pushed the DDOC_PARAGRAPH_SEPARATOR branch from d571623 to 450e065 Compare January 7, 2016 19:10

andralex force-pushed the DDOC_PARAGRAPH_SEPARATOR branch from 450e065 to f4968e9 Compare January 7, 2016 19:16

andralex reviewed Jan 8, 2016

andralex and others added 5 commits January 8, 2016 11:20

Ddoc: insert whenever encountering two newlines or more in the source…

fdc2c6c

… text.

Fix a couple of errors

46fa5ae

@adamruppe review

00583d0

fix Andrei's code to keep the existing DDOC_BLANKLINEs

a6207f8

Redo DDOC_PARAGRAPH_SEPARATOR to fix its issues

33bcb40

andralex force-pushed the DDOC_PARAGRAPH_SEPARATOR branch from 09d7a40 to 33bcb40 Compare January 8, 2016 16:26

andralex mentioned this pull request Jan 10, 2016

Better DDOC_BLANKLINE works uniformly on .dd and .d files #5344

Merged

andralex closed this Jan 10, 2016
