Hint to prevent page break #38

Closed
jxxcarlson opened this issue Sep 6, 2014 · 48 comments

@jxxcarlson

It would be good to have a way of inserting hints (invisible in all asciidoctor output) that indicate when a page break is a good thing. (There may be other hints that are desirable as well, like when it is good to keep things together.) Attached is an example. At the bottom of the page is the title of the poem that appears on the next page. That poem is just five lines long. The poem and the heading should go together; sometimes one poem per page, or a page break at the end of each poem, is a good solution.

TeX (or LaTeX) does a fantastic job of controlling page breaks, even without hinting. It "knows" what things should go together, where good, bad, and terrible breaks are. There is a lot of sophisticated behind-the-scenes optimization going on. I don't know if there is an engine out there for managing page breaks that one could use.

In any case, the current form of asciidoctor-pdf is great progress!

[Attached image: dylan_poetry]

This was referenced Sep 6, 2014
@mojavelinux
Member

Thanks for the suggestion, @jxxcarlson. Asciidoctor PDF does have some support for keeping content together. There's a special block in Asciidoctor PDF called keep_together for this very purpose.

The choices of what to keep together so far have been made “as needed”. For instance, code listings are kept together. I didn't implement it for verses, but it does make sense that they would be kept together.

There's also some support for keeping the section title from being orphaned at the bottom of the page. But again, they are borderline hacks at the moment based on cases that came up.

What we'll want to do is align with well-respected style guides. That way, we can put together test cases for the scenarios that are known and make sure Asciidoctor PDF behaves that way.

@mojavelinux mojavelinux self-assigned this Sep 8, 2014
@mojavelinux
Member

I also want to mention that there's a block role in AsciiDoc called "keep-together" that was designed to control this sort of thing. We don't currently honor that role, but we could. (We also might want to think about a shorter name for that role).

@mojavelinux mojavelinux changed the title Page breaks Hint to prevent page break Dec 12, 2014
@antoyo

antoyo commented Nov 20, 2015

Hello.
Is there a way to use the keep_together block alone or do we need to use another block using it (for instance, an example block)?
For instance, I would like to prevent a page break between a source code and its callouts.
Is there a way to do this without wrapping them in an example (or quote block or another block changing the width of its content)?
Can you provide some asciidoc source code showing how to do this, please?

@mojavelinux
Member

Not at the moment. This feature is tied to specific blocks. keep_together is an API method, so if you extend the converter, you could wrap it around a call to the delegate method.

@schauder

schauder commented Nov 23, 2016

Could you provide a tiny example of how to use this? We tried various things, such as:

[keep-together]
--
paragraph that should not get split between pages
--


[keep_together]
--
paragraph that should not get split between pages
--

[role="keep_together"]
--
paragraph that should not get split between pages
--

But nothing worked :-/

@mojavelinux
Member

mojavelinux commented Nov 24, 2016

Currently, the keep together logic in Asciidoctor PDF is not configurable. The converter decides which content is kept together and which is not. The point of this issue is to discuss if and how it can be configurable.

You can force a page break between any two blocks using:

ifdef::backend-pdf[<<<]

@handcraftsman

I abuse the quote block to achieve keep-together.

[quote]
--
foo

bar
--

The caveat is that one-line titles (e.g., = foo) are treated like regular text by the quote block, so you cannot use this trick to keep a title with the next paragraph.

@mojavelinux Could you perhaps make it so that one line titles are supported within quote blocks?

@mojavelinux
Member

While I understand the need to use this hack for a quick solution, I urge you only to use it as a last and temporary resort because it violates the semantics of the document.

I'm not sure what you mean by one-line titles. Could you show a specific example?

@handcraftsman

Section 11.2 of the user guide http://www.methods.co.nz/asciidoc/userguide.html calls these "one line titles"

= Hello World =
== Level two ==

I'll happily switch to using a semantic keep-together tag/block when one becomes available. In the meantime, this reduces the pain of manually adding <<< new page hints in order to keep text with related code blocks across page breaks. When keep-with-next is added to the one-line titles, that will take care of the rest of the places where I end up adding <<< hints. A smart keep-together or option thereof that perhaps allows things to split across paired verso-recto pages but not across page flips would also be useful.

Thanks!

@mojavelinux
Member

Oh, now I see what you are saying.

You can include headings inside of a block by adding the "discrete" style.

____
bar

[discrete]
= foo
____

The "discrete" style is how you make a regular heading instead of a section title.

> A smart keep-together or option thereof that perhaps allows things to split across paired verso-recto pages but not across page flips would also be useful.

I agree. But this becomes increasingly more complicated (and slower) the more checks we have to do. If I could rewrite Prawn, I would make it a two-phase converter that first converts to an AST, then figures out where to put stuff. Unfortunately, that's not how it works today.

@mojavelinux mojavelinux added this to the v1.5.0.beta.1 milestone Dec 5, 2016
@handcraftsman

handcraftsman commented Dec 5, 2016

Unfortunately marking the title with "discrete" removes it from the TOC. (sigh)

Fixing orphaned titles and split paragraphs currently involves a laborious manual process:

repeat
    generate the document
    find the next problem
    add a manual page break

Then if you later decide to add/remove/move a couple of paragraphs in the middle it may shift enough to throw off the whole structure and have to be done again through the end of the chapter. In my case, my book has both 8x11 and 7x9 versions so this has to be done (at least) twice.

If we could only have one fix, then I would suggest a style change so that titles are marked keep-with-next by default with maybe the ability to override in the style sheet. That would resolve most users' heartache and make it behave like MSWord.

If we could only have a generic fix, then I'd suggest exposing a way to mark items keep-with-next without having to go through the API, like:

[keep-with-next], or
.keep-with-next

That way one could declaratively mark titles, paragraphs, etc, while writing. It isn't smart about verso-recto pages but I don't know that MSWord is either.
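To make the keep-with-next idea concrete, here is a minimal sketch in plain Ruby of how such a flag could drive pagination. It is a toy model, not Asciidoctor PDF code: `Block`, `paginate`, and the `keep_with_next` field are hypothetical names, heights are arbitrary units, and blocks are assumed never to split across pages.

```ruby
# Hypothetical model: a block has a name, a height, and a keep_with_next flag.
# None of these names come from Asciidoctor PDF or Prawn.
Block = Struct.new(:name, :height, :keep_with_next)

# Assigns blocks to pages. A block flagged keep_with_next is pushed to a new
# page along with its successor when the pair does not fit in the space left
# on the current page. Simplification: blocks are treated as unsplittable.
def paginate(blocks, page_height)
  pages = [[]]
  remaining = page_height
  blocks.each_with_index do |block, idx|
    needed = block.height
    following = blocks[idx + 1]
    needed += following.height if block.keep_with_next && following
    # Only break if the current page already holds something.
    if needed > remaining && !pages.last.empty?
      pages << []
      remaining = page_height
    end
    pages.last << block.name
    remaining -= block.height
  end
  pages
end

blocks = [
  Block.new('para1', 600, false),
  Block.new('title', 30, true),   # title must stay with para2
  Block.new('para2', 100, false)
]
p paginate(blocks, 700) # => [["para1"], ["title", "para2"]]
```

Without the flag on `title`, the same input leaves the title stranded at the bottom of the first page, which is the orphaned-heading case described in this thread. A real layout would only need the first few lines of the following block to fit, not the whole block.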

@Nowass

Nowass commented Dec 13, 2016

Hi,
I am quite a newbie in asciidoc and PDF generation, but I am currently dealing with exactly the same problem ... Header stays at the end of the page and the picture / other content is on next page ...

If I understand correctly, you are thinking about defining an API that would be used directly in the source text file to mark which blocks should stay together. That would be nice for special cases in the text, such as tables. But from my point of view, in the case of a header and the following paragraph, it should be defined in the template. The reason is quite simple: I want this behavior for every header, and it does not seem convenient to put a tag on every header.

I am not sure if it could be done in the PDF theme or if the Ruby-based converter class has to be changed.

Correct me if I missed something ...

Petr

@rlopez133

@mojavelinux I noticed this is labeled as improvement, is there any timeline for this?

Thanks as always for all ya do with asciidoctor :)

@elburro1887

elburro1887 commented Jul 28, 2017

Is there any update on this?

btw, issue #105 is similar.

@ghost

ghost commented Aug 4, 2017

> Currently, the keep together logic in Asciidoctor PDF is not configurable. The converter decides which content is kept together and which is not. The point of this issue is to discuss if and how it can be configurable.

I'd like to see all headings of level 2 and below kept with the following block by default. I currently have a PDF with multiple orphaned headings at the bottom of pages. From the discussion above, I still don't understand how to work around this issue with the latest version of asciidoctor-pdf.

@antoyo

antoyo commented Aug 4, 2017

@dcp65: I think I had this issue once. If I remember correctly, I had to manually insert a page break before the header.

@ghost

ghost commented Aug 4, 2017

@antoyo OK, thanks. That's not a great solution. Systems like DocBook XSL and even Word handle keep-with-next pretty well, so hopefully this will be supported soon.

@elburro1887

@dcp65 Yes, that is exactly how it should work. All headings, regardless of level, should be kept with the following block. I hope this will get implemented soon.

@hueami

hueami commented Feb 8, 2018

Figures and their captions should always be kept together, in my opinion.
We have one case where a figure (with caption below) is followed by a table (with caption above), and the page ends after the caption of the table. So at the end of the page there is a figure followed by two captions: one for the figure and one for the table.
We can add a manual page break between the figure and the table, but because the document is still under development, that will break the layout as soon as something changes above.

@owjjh

owjjh commented May 3, 2018

+1 for this feature

@dunctait

dunctait commented Jun 5, 2018

I too +1 this.

Use case:
Creating REST API docs. We want headings like "GET api/movies Sample Response" to stick with the JSON codeblock that gets shunted to the next page in the pdf because it's quite long

@themowski

I'll throw another +1 for preventing orphaned headers.

Use-case: I'm working on some documentation with the following sub-structure occurring often:

== Header

// Long table here, abridged for example
[cols="30,70",options="header"]
|===
| Property
| Description

| Many rows
| So the table breaks to the next page and orphans the header
|===

The section header gets orphaned about halfway down the PDF page and the table pops to the next page. It's not a huge deal, we can manually move the header with a page break, but obviously the cost of maintaining that as documentation changes could get pretty big.

@elburro1887

This is the only feature that is really missing for me to make this a real useful tool. Currently I have to go through my documents and manually insert page breaks, which takes ages on a 30-40 page doc.

Oh well, i suppose I will have to move to latex in the end, which i wanted to avoid, since it's just too heavy for the stuff i would need it for (simple product manuals). too bad that this project is kinda dead already, no new releases in 1yr, no timeline or any other feedback to this issue from the project owner.

@nuket
Contributor

nuket commented Aug 13, 2019

+1 for automatic widow / orphan handling.

It's one of the things that make good-looking books happen and the thought of going through 600+ pages to figure out where to place extraneous page breaks makes my stomach churn. Handling this would dramatically improve the look and readability of produced content and remove a lot of work from writers going forward.

@mojavelinux
Member

mojavelinux commented Aug 15, 2019

Automatic widow / orphan handling is almost impossible to achieve with Asciidoctor PDF. Since Prawn doesn't provide general layout facilities, doing it requires that the converter look ahead by converting blocks in advance at every single block to see whether it needs to jump to the next page before converting those blocks. It would add enormous complexity to the converter and slow it down considerably. It's just not going to happen.

What we could do is insert a page break when starting a specific type of block, such as a heading, if the cursor has crossed a certain threshold of area remaining on the current page.
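As a rough illustration of that threshold idea, the check could look something like the following plain-Ruby sketch. The method name, parameters, and the 10% default are all assumptions made for this example; nothing here is the converter's actual code.

```ruby
# Hypothetical helper: decides whether a heading should start on a new page.
# remaining_height: space left below the cursor on the current page
# page_height:      total usable height of a page
# min_fraction:     assumed threshold; break if less than this share remains
def break_before_heading?(remaining_height, page_height, min_fraction: 0.1)
  remaining_height < page_height * min_fraction
end

puts break_before_heading?(50, 700)   # only 50 of 700 units left
puts break_before_heading?(200, 700)  # plenty of room left
```

A check like this is cheap because it only looks at the cursor position; it never needs to convert the following block in advance.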

This issue has diverged in too many directions to address. I'll file a new issue for the suggestion I just made. Then, every single block for which we want to support keep together will also need its own issue. Otherwise, this is going to just stagnate.

@mojavelinux
Member

This issue is one of the reasons we need to start looking into a new strategy for PDF generation (such as asciidoctor-pdf.js). If we use a browser engine instead of Prawn, we can use either CSS rules or JavaScript to shift content around until it lays out nicely on pages (in fact, that's what paged media does). That rearranging of content is not possible with Prawn. With Prawn, you can write to the canvas once, and after that it can't be moved. We've worked around this in the past by writing to a scratch document first to figure out how much room is needed, then going back to the original document and writing it again where we want it. But that can only be used in certain circumstances, is very slow, and has side effects. We have to accept the limits we have in this converter by using Prawn (and recognize the converter has gone well beyond anything we ever expected it to be able to do).
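The scratch-document workaround described above can be sketched with a toy write-once canvas in plain Ruby. `ToyCanvas` and its methods are hypothetical stand-ins invented for this example, not Prawn's or Asciidoctor PDF's API; the point is only to show why the content block has to run twice.

```ruby
# Toy stand-in for a write-once canvas. Not Prawn's API.
class ToyCanvas
  attr_reader :page, :remaining, :log

  def initialize(page_height)
    @page_height = page_height
    @page = 1
    @remaining = page_height
    @log = []
  end

  # Writes content of the given height; once written it cannot be moved.
  def write(label, height)
    @log << [label, @page] # page on which this content starts
    while height > @remaining
      height -= @remaining
      start_new_page
    end
    @remaining -= height
  end

  def start_new_page
    @page += 1
    @remaining = @page_height
  end

  # Runs the block against a throwaway canvas to measure its total height.
  def dry_run
    scratch = ToyCanvas.new(@page_height)
    yield scratch
    (scratch.page - 1) * @page_height + (@page_height - scratch.remaining)
  end

  # Measures first, then writes for real, moving to a fresh page if needed.
  def keep_together(&block)
    needed = dry_run(&block)
    fits_on_fresh_page = needed <= @page_height
    page_has_content = @remaining < @page_height
    start_new_page if needed > @remaining && fits_on_fresh_page && page_has_content
    block.call(self)
  end
end

canvas = ToyCanvas.new(700)
canvas.write('intro', 650)
canvas.keep_together do |c|
  c.write('listing', 80)
  c.write('callouts', 40)
end
p canvas.log # => [["intro", 1], ["listing", 2], ["callouts", 2]]
```

Every `keep_together` call executes its block twice, once on the scratch canvas and once for real, so any side effect inside the block also happens twice and the cost grows with nesting. That matches the drawbacks listed above (slow, side effects, only usable in certain circumstances).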

@mojavelinux mojavelinux removed this from the v1.5.0.beta.3 milestone Aug 15, 2019
@mojavelinux
Member

> It "knows" what things should go together, where good, bad, and terrible breaks are. There is a lot of sophisticated behind-the-scenes optimization going on. I don't know if there is an engine out there for managing page breaks that one could use.

This is exactly what Prawn cannot do, yet exactly what paged media was designed to do.

@nuket
Contributor

nuket commented Aug 15, 2019

> Since Prawn doesn't provide general layout facilities, doing it requires that the converter look ahead by converting blocks in advance at every single block to see whether it needs to jump to the next page before converting those blocks. It would add enormous complexity to the converter and slow it down considerably. It's just not going to happen.

I would take that speed hit optionally, if it improved output. But I can understand that Prawn has its limitations, particularly in its imperative drawing operation.

> What we could do is insert a page break when starting a specific type of block, such as a heading, if the cursor has crossed a certain threshold of area remaining on the current page.

This currently happens with new chapters when using the book doctype, but I don't think this would ultimately solve a lot of the single-line orphans in paragraph text.

Is there no way for the Prawn span or text_box methods to not place text if it overflows the page, and instead pass back an exception so that Asciidoctor can determine how many lines made it onto both sides of the page break and adjust itself?

I get the use of the scratch document, definitely makes sense, but yeah, unwieldy.

> If we use a browser engine instead of Prawn, we can use either CSS rules or JavaScript to shift content around until it lays out nicely on pages (in fact, that's what paged media does).

I think I have deep reservations about how well the paged media CSS modules work in most browsers. At least, the last time I looked, they weren't that well implemented, and it also wasn't in the browser makers' interest to actively improve this, since Google in particular wants people spending as much time reading online as possible. Perhaps things have changed in recent times, but I tend to think any underlying issues that are encountered in those layout engines could potentially be a lot harder to resolve. The inclusion of JS to handle edge cases would be beautiful, though.

I'm trying to wrap my head around how the Asciidoctor PDF code works though my Ruby experience specifically is minimal.

@mojavelinux
Member

mojavelinux commented Aug 15, 2019

> Is there no way for the Prawn span or text_box methods to not place text if it overflows the page, and instead pass back an exception so that Asciidoctor can determine how many lines made it onto both sides of the page break and adjust itself?

No, there is not. And Prawn has pretty much stalled at this point, so the chance of getting that behavior is low. Plus, text box handling in Prawn is absurdly complicated.

> I think I have deep reservations about how well the paged media CSS modules work in most browsers.

To be clear, we would not be relying on the browser behavior. Paged media (the project) uses JavaScript to lay out the pages according to the paged media spec and working draft. So it goes way beyond what the browser does by default and provides JavaScript callbacks right out of the box. One thing it can handle is widows and orphans. And we've already proved it can work in the asciidoctor-pdf.js project.

@mojavelinux
Member

The path forward for advanced PDF generation (beyond what Asciidoctor PDF can do) is the browser. I have thought long and hard about this and I don't see any other viable option that comes anywhere close (well, except for generating LaTeX and tying into that ecosystem, but that seems far less approachable to me).

@nuket
Contributor

nuket commented Aug 15, 2019

> Paged media (the project) uses JavaScript to lay out the pages according to the paged media spec and working draft. So it goes way beyond what the browser does by default and provides JavaScript callbacks right out of the box.

Ah ok, now I understand. This sounds a lot more promising.

> except for generating LaTeX and tying into that ecosystem

Eww. Yes, please don't. I write Asciidoc so I don't have to go near LaTeX at all. 😄

@jhines0042

I will take a speed hit on generation times every day and twice on Sundays if it saves author time and results in a better product in the end.

@jstasiak

>> except for generating LaTeX and tying into that ecosystem

> Eww. Yes, please don't. I write Asciidoc so I don't have to go near LaTeX at all. 😄

Well, to be frank that'd still let other people (asciidoctor users) not go near LaTeX at all, so it could be a net benefit. :)

I'm half considering writing a custom AsciiDoc -> LaTeX translator when I have my next book publishing project, because asciidoctor-pdf was far from optimal for my needs in the past and I'd like better control over the output. (It was still a great help and a useful tool, don't get me wrong.)

@mojavelinux
Member

> I will take a speed hit on generation times every day and twice on Sundays if it saves author time and results in a better product in the end.

You're underestimating the impact. It would be prohibitively slow. As in, it may never complete, esp for large documents.

Speed isn't even the main issue. The code would become so complex it would be impossible to understand and maintain. The amount of logic would explode exponentially.

@mojavelinux
Member

> I'm half considering writing a custom AsciiDoc -> LaTeX translator when I have my next book publishing project

There's already an AsciiDoc to LaTeX converter available. See https://github.com/asciidoctor/asciidoctor-latex/. It may need quite a bit of work to make it more general purpose.

> asciidoctor-pdf was far from optimal for my needs in the past and I'd like better control over the output. (It was still a great help and a useful tool, don't get me wrong.)

This is the right way to think about Asciidoctor PDF. It fills in a void between not being able to create a PDF and doing advanced layout like LaTeX provides. This was never about high-end book publication. (That's not to say you can't try and succeed, but that wasn't the aim).

@janicemhdc
Contributor

I'm seeing this with several things. The two most annoying cases:

  1. A table title is at the bottom of one page and the actual table starts on the next
  2. A list intro is at the bottom of one page and the actual list starts on the next (i.e. "foo as follows:" -page break- "* item 1")

#1 seems like a no-brainer to me. I can see where #2 would be difficult to handle as there's no guarantee that the line above is actually attached to the list (but it would be pretty bad practice if it weren't, plus you could look for a colon at the end of the line - in general it's probably bad practice to have a page break immediately following a colon)
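The colon heuristic suggested here is simple to express. The following plain-Ruby sketch uses a hypothetical method name and only checks the trailing character; whether a check like this would be reliable enough in practice is an open question.

```ruby
# Hypothetical heuristic: a paragraph ending in a colon probably introduces
# the block that follows, so a page break right after it is undesirable.
def introduces_next_block?(text)
  text.rstrip.end_with?(':')
end

puts introduces_next_block?('foo as follows:')
puts introduces_next_block?('A plain sentence.')
```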

@mojavelinux
Member

To be honest, it's highly unlikely Asciidoctor PDF will ever support this type of page break control. It's just not something Prawn can really handle practically. (I've given it a lot of thought). This is an example of a feature we'll look to focus on in the browser-based converter (currently in prototype), which gives us fine-grained layout controls that would be extremely expensive to emulate with the low-level PDF generator.

@mrotteveel

> There's also some support for keeping the section title from being orphaned at the bottom of the page. But again, they are borderline hacks at the moment based on cases that came up.

Does this require enabling something, or adding a specific block attribute to the section? Or is this a case of 'it might happen in some situations, but not in others'?

@mojavelinux
Member

The (rather limited) orphaning logic is built-in and cannot be configured. But you can always force keep together semantics by wrapping the content in an unbreakable block.

[%unbreakable]
--
lines in here

will not be split

across pages
--

@Hamsterbau

But this mentioned "unbreakable" block cannot be used with headlines:

[%unbreakable]
--

= Broken Headline

lines in here

will not be split

across pages
--

And with discrete, like already mentioned, the headline gets removed from TOC :(

Or is there anything I can do to build an "unbreakable" block with headline and content?

@mojavelinux
Member

> Or is there anything I can do to build an "unbreakable" block with headline and content?

Not at the moment, no.

@daleghent

I landed here, attempting to find out how to keep entire sections from creating half-filled pages because they implicitly create page breaks. How about getting rid of implicit page breaks and just, well, make them explicit? Implicit formatting is a bear to figure out, especially when there's no good way to avoid it other than employing hacks such as quote

@mojavelinux
Member

That's already the plan for Asciidoctor PDF 2 (and the main thing holding up the release). There were technical reasons why we couldn't do this before, but I believe I have solved it, which is why we can tune the behavior. Trust me, it is extremely complicated.

@mojavelinux
Member

The hint to keep the section title with the first block of content is now covered by #2075.

@mojavelinux mojavelinux added this to the v2.0.0 milestone Apr 25, 2022