Hint to prevent page break #38
Thanks for the suggestion, @jxxcarlson. Asciidoctor PDF does have some support for keeping content together. There's a special block in Asciidoctor PDF called The choices of what to keep together so far have been made “as needed”. For instance, code listings are kept together. I didn't implement it for verses, but it does make sense that they would be kept together. There's also some support for keeping the section title from being orphaned at the bottom of the page. But again, they are borderline hacks at the moment based on cases that came up. What we'll want to do is align with well-respected style guides. That way, we can put together test cases for the scenarios that are known and make sure Asciidoctor PDF behaves that way. |
I also want to mention that there's a block role in AsciiDoc called "keep-together" that was designed to control this sort of thing. We don't currently honor that role, but we could. (We also might want to think about a shorter name for that role). |
Hello. |
Not at the moment. This is a feature which is linked to specific blocks. The keep_together is an API method, so if you extend the converter you could wrap it around a call to the delegate method. |
Could you provide a tiny example of how to use this? We tried various things like
But nothing worked :-/ |
Currently, the keep together logic in Asciidoctor PDF is not configurable. The converter decides which content is kept together and which is not. The point of this issue is to discuss if and how it can be configurable. You can force a page break between any two blocks using:
|
I abuse the quote block to achieve keep-together.
The caveat is one line titles. @mojavelinux Could you perhaps make it so that one line titles are supported within quote blocks? |
While I understand the need to use this hack for a quick solution, I urge you only to use it as a last and temporary resort because it violates the semantics of the document. I'm not sure what you mean by one-line titles. Could you show a specific example? |
Section 11.2 of the user guide http://www.methods.co.nz/asciidoc/userguide.html calls these "one line titles"
I'll happily switch to using a semantic keep-together tag/block when one becomes available. In the meantime, this reduces the pain of manually adding <<< new page hints in order to keep text with related code blocks across page breaks. When keep-with-next is added to the one line titles, that will take care of the rest of the places I end up adding <<< hints. A smart keep-together, or an option thereof, that perhaps allows things to split across paired verso-recto pages but not across page flips would also be useful. Thanks! |
Oh, now I see what you are saying. You can include headings inside of a block by adding the "discrete" style.
The "discrete" style is how you make a regular heading instead of a section title.
I agree. But this becomes increasingly complicated (and slower) the more checks we have to do. If I could rewrite Prawn, I would make it a two-phase converter that first converts to an AST, then figures out where to put stuff. Unfortunately, that's not how it works today. |
Unfortunately marking the title with "discrete" removes it from the TOC. (sigh) Fixing orphaned titles and split paragraphs currently involves a laborious manual process:
Then if you later decide to add/remove/move a couple of paragraphs in the middle, it may shift enough to throw off the whole structure and have to be done again through the end of the chapter. In my case, my book has both 8x11 and 7x9 versions so this has to be done (at least) twice. If we could only have one fix, then I would suggest a style change so that titles are marked keep-with-next by default, with maybe the ability to override in the style sheet. That would resolve most users' heartache and make it behave like MSWord. If we could only have a generic fix, then I'd suggest exposing a way to mark items keep-with-next without having to go through the API, like:
That way one could declaratively mark titles, paragraphs, etc, while writing. It isn't smart about verso-recto pages but I don't know that MSWord is either. |
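To make the keep-with-next idea above concrete, here is a minimal sketch in plain Ruby. It is a toy pagination model, not Asciidoctor PDF or Prawn code: the `Block` struct, the `keep_with_next` flag, and the `paginate` method are all hypothetical names, and it assumes every block is shorter than a page.

```ruby
# Hypothetical model: each block has a height (in points) and an optional
# keep_with_next flag. Nothing here is Asciidoctor PDF or Prawn API.
Block = Struct.new(:name, :height, :keep_with_next)

# Returns a hash that maps each block name to the page it lands on.
# Assumes every block is shorter than page_height.
def paginate(blocks, page_height)
  page = 1
  remaining = page_height
  placements = {}
  blocks.each_with_index do |block, idx|
    needed = block.height
    following = blocks[idx + 1]
    # A flagged block also reserves room for the block that follows it.
    needed += following.height if block.keep_with_next && following
    # Break the page when the reserved space does not fit,
    # unless the current page is still empty.
    if needed > remaining && remaining < page_height
      page += 1
      remaining = page_height
    end
    placements[block.name] = page
    remaining -= block.height
  end
  placements
end

blocks = [
  Block.new("intro", 600, false),
  Block.new("title", 30, true),   # title is marked keep-with-next
  Block.new("para", 120, false)
]
p paginate(blocks, 700)
# prints {"intro"=>1, "title"=>2, "para"=>2}
```

Without the flag, the title would fit in the 100 points left on page 1 and the paragraph would start page 2, which is the orphaned title described above. Note that the sketch needs the height of the following block before placing the title, which is the look-ahead cost raised elsewhere in this thread.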
Hi, if I understand correctly, you are thinking about having a defined API that would be used directly in the source text file to define which blocks should stay together. This would be nice for some special cases in the text, like tables and so on. But from my point of view, in the case of a header and the following paragraph, it should be defined within the template. The reason is quite simple: I want this behavior for every header, and it does not seem convenient to me to put a tag on every header. I am not sure if it could be done in the PDF theme or if the Ruby-based converter class has to be changed. Correct me if I missed something ... Petr |
@mojavelinux I noticed this is labeled as improvement, is there any timeline for this? Thanks as always for all ya do with asciidoctor :) |
Is there any update on this? btw, issue #105 is similar. |
I'd like to see all headings of level 2 and below kept with the following block by default. I currently have a PDF with multiple orphaned headings at the bottom of pages. From the discussion above, I still don't understand how to work around this issue with the latest version of asciidoctor-pdf. |
@dcp65: I think I had this issue once. If I remember correctly, I had to manually insert a page break before the header. |
@antoyo OK, thanks. That's not a great solution. Systems like DocBook XSL and even Word handle keep-with-next pretty well, so hopefully this will be supported soon. |
@dcp65 Yes, that is exactly how it should work. All headings, regardless of the level, should be kept with the following block. I hope this will get implemented soon. |
Figures and their captions should always be kept together, in my opinion. |
+1 for this feature |
I too +1 this. Use case: |
I'll throw another +1 for preventing orphaned headers. Use-case: I'm working on some documentation with the following sub-structure occurring often:

== Header

// Long table here, abridged for example
[cols="30,70",options="header"]
|===
| Property
| Description
| Many rows
| So the table breaks to the next page and orphans the header
|===

The section header gets orphaned about halfway down the PDF page and the table pops to the next page. It's not a huge deal, we can manually move the header with a page break, but obviously the cost of maintaining that as documentation changes could get pretty big. |
This is the only feature that is really missing for me to make this a really useful tool. Currently I have to go through my documents and manually insert page breaks, which takes ages on a 30-40 page doc. Oh well, I suppose I will have to move to LaTeX in the end, which I wanted to avoid, since it's just too heavy for the stuff I would need it for (simple product manuals). Too bad that this project is kinda dead already: no new releases in 1 year, no timeline or any other feedback on this issue from the project owner. |
+1 for automatic widow / orphan handling. It's one of the things that make good-looking books happen and the thought of going through 600+ pages to figure out where to place extraneous page breaks makes my stomach churn. Handling this would dramatically improve the look and readability of produced content and remove a lot of work from writers going forward. |
Automatic widow / orphan handling is almost impossible to achieve with Asciidoctor PDF. Since Prawn doesn't provide general layout facilities, doing it requires that the converter look ahead by converting blocks in advance at every single block to see whether it needs to jump to the next page before converting those blocks. It would add enormous complexity to the converter and slow it down considerably. It's just not going to happen. What we could do is insert a page break when starting a specific type of block, such as a heading, if the cursor has crossed a certain threshold of area remaining on the current page. This issue has diverged in too many directions to address. I'll file a new issue for the suggestion I just made. Then, every single block for which we want to support keep together will also need its own issue. Otherwise, this is going to just stagnate. |
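The threshold suggestion above can be sketched roughly as follows. This is plain Ruby over a toy model, not the converter's code: `layout_with_threshold`, the `[type, height]` block pairs, and the numbers are assumptions made for illustration, and every block is assumed to be shorter than a page.

```ruby
# Hypothetical model, not Asciidoctor PDF or Prawn API: blocks are
# [type, height] pairs and `remaining` plays the role of the cursor.
# Assumes every block is shorter than page_height.
def layout_with_threshold(blocks, page_height, threshold)
  page = 1
  remaining = page_height
  blocks.map do |type, height|
    # Only headings are checked, and only against a fixed threshold,
    # so no block has to be converted in advance.
    crossed = type == :heading && remaining < threshold
    if (crossed || height > remaining) && remaining < page_height
      page += 1
      remaining = page_height
    end
    remaining -= height
    [type, page]
  end
end

blocks = [[:paragraph, 620], [:heading, 30], [:paragraph, 200]]
p layout_with_threshold(blocks, 700, 100)
# prints [[:paragraph, 1], [:heading, 2], [:paragraph, 2]]
```

The trade-off is that the check never looks at what follows the heading. A heading that lands just above the threshold can still be separated from its content, and a heading followed by a one-line paragraph may be pushed to the next page when it did not need to be.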
This issue is one of the reasons we need to start looking into a new strategy for PDF generation (such as asciidoctor-pdf.js). If we use a browser engine instead of Prawn, we can use either CSS rules or JavaScript to shift content around until it lays out nicely on pages (in fact, that's what paged media does). That rearranging of content is not possible with Prawn. With Prawn, you can write to the canvas once, and after that it can't be moved. We've worked around this in the past by writing to a scratch document first to figure out how much room is needed, then going back to the original document and writing it again where we want it. But that can only be used in certain circumstances, is very slow, and has side effects. We have to accept the limits we have in this converter by using Prawn (and recognize the converter has gone well beyond anything we ever expected it to be able to do). |
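For readers unfamiliar with the scratch-document workaround described above, here is a rough sketch of the idea in plain Ruby. The `Canvas` class and its `write_together` method are hypothetical stand-ins written for this example; they are not Prawn or Asciidoctor PDF API, and the model assumes each item is shorter than a page.

```ruby
# Hypothetical stand-in for a write-once canvas. It is not Prawn; it only
# tracks a page number and the space left on the current page.
class Canvas
  attr_reader :page, :remaining, :log

  def initialize(page_height)
    @page_height = page_height
    @page = 1
    @remaining = page_height
    @log = []
  end

  def start_new_page
    @page += 1
    @remaining = @page_height
  end

  # Writes are final: once an item is logged, it cannot be moved.
  def write(name, height)
    start_new_page if height > @remaining
    @log << [name, @page]
    @remaining -= height
  end

  # First pass: run the block against a throwaway canvas to measure it.
  # Second pass: break the page if needed, then run the block for real.
  def write_together
    scratch = Canvas.new(@page_height)
    yield scratch
    fits_on_one_page = scratch.page == 1
    used = @page_height - scratch.remaining
    start_new_page if fits_on_one_page && used > @remaining
    yield self
  end
end

canvas = Canvas.new(700)
canvas.write("intro", 600)
canvas.write_together do |c|
  c.write("title", 30)
  c.write("poem", 100)
end
p canvas.log
# prints [["intro", 1], ["title", 2], ["poem", 2]]
```

The block runs twice, once on the scratch canvas and once for real. That is where the cost and the side effects mentioned above come from: anything the block does besides drawing also happens twice.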
This is exactly what Prawn cannot do, yet exactly what paged media was designed to do. |
I would take that speed hit optionally, if it improved output. But I can understand that Prawn has its limitations, particularly in its imperative drawing operation.
This currently happens with new chapters when using the book doctype, but I don't think this would ultimately solve a lot of the single-line orphans in paragraph text. Is there no way for the Prawn I get the use of the scratch document, definitely makes sense, but yeah, unwieldy.
I have deep reservations about how well the paged media CSS modules work in most browsers. At least, the last time I looked, they weren't that well implemented, and it also wasn't in the browser makers' interests to actively improve this, since Google in particular wants people spending as much time reading online as possible. Perhaps things have changed in recent times, but I tend to think any underlying issues that are encountered in those layout engines could potentially be a lot harder to resolve. The inclusion of JS to handle edge cases would be beautiful, though. I'm trying to wrap my head around how the Asciidoctor PDF code works, though my Ruby experience specifically is minimal. |
No, there is not. And Prawn has pretty much stalled at this point, so the chance of getting that behavior is low. Plus, text box handling in Prawn is absurdly complicated.
To be clear, we would not be relying on the browser behavior. Paged media (the project) uses JavaScript to lay out the pages according to the paged media spec and working draft. So it goes way beyond what the browser does by default and provides JavaScript callbacks right out of the box. One thing it can handle is widows and orphans. And we've already proved it can work in the asciidoctor-pdf.js project. |
The path forward for advanced PDF generation (beyond what Asciidoctor PDF can do) is the browser. I have thought long and hard about this and I don't see any other viable option that comes anywhere close (well, except for generating LaTeX and tying into that ecosystem, but that seems far less approachable to me). |
Ah ok, now I understand. This sounds a lot more promising.
Eww. Yes, please don't. I write Asciidoc so I don't have to go near LaTeX at all. 😄 |
I will take a speed hit on generation times every day and twice on Sundays if it saves author time and results in a better product in the end. |
Well, to be frank, that'd still let other people (asciidoctor users) not go near LaTeX at all, so it could be a net benefit. :) I'm half considering writing a custom AsciiDoc -> LaTeX translator for my next book publishing project, because asciidoctor-pdf was far from optimal for my needs in the past and I'd like better control over the output. (It was still a great help and a useful tool, don't get me wrong). |
You're underestimating the impact. It would be prohibitively slow. As in, it may never complete, especially for large documents. Speed isn't even the main issue. The code would become so complex it would be impossible to understand and maintain. The amount of logic would explode exponentially. |
There's already an AsciiDoc to LaTeX converter available. See https://github.com/asciidoctor/asciidoctor-latex/. It may need quite a bit of work to make it more general purpose.
This is the right way to think about Asciidoctor PDF. It fills in a void between not being able to create a PDF and doing advanced layout like LaTeX provides. This was never about high-end book publication. (That's not to say you can't try and succeed, but that wasn't the aim). |
I'm seeing this with several things. The two most annoying cases:
#1 seems like a no-brainer to me. I can see where #2 would be difficult to handle as there's no guarantee that the line above is actually attached to the list (but it would be pretty bad practice if it weren't, plus you could look for a colon at the end of the line - in general it's probably bad practice to have a page break immediately following a colon) |
To be honest, it's highly unlikely Asciidoctor PDF will ever support this type of page break control. It's just not something Prawn can really handle practically. (I've given it a lot of thought). This is an example of a feature we'll look to focus on in the browser-based converter (currently in prototype), which gives us fine-grained layout controls that would be extremely expensive to emulate with the low-level PDF generator. |
Does this require enabling something, or adding a specific block attribute to the section? Or is this a case of 'it might happen in some situations, but not in others'? |
The (rather limited) orphaning logic is built-in and cannot be configured. But you can always force keep together semantics by wrapping the content in an unbreakable block.
|
But this mentioned "unbreakable" block cannot be used with headlines:
And with discrete, as already mentioned, the headline gets removed from the TOC :( Or is there anything I can do to build an "unbreakable" block with headline and content? |
Not at the moment, no. |
I landed here, attempting to find out how to keep entire sections from creating half-filled pages because they implicitly create page breaks. How about getting rid of implicit page breaks and just, well, make them explicit? Implicit formatting is a bear to figure out, especially when there's no good way to avoid it other than employing hacks such as |
That's already the plan for Asciidoctor PDF 2 (and the main thing holding up the release). There were technical reasons why we couldn't do this before, but I believe I have solved it, which is why we can tune the behavior. Trust me, it is extremely complicated. |
The hint to keep the section title with the first block of content is now covered by #2075. |
It would be good to have a way of inserting hints (invisible in all asciidoctor output) that indicate when a page break is a good thing. (There may be other hints that are desirable as well, like when it is good to keep things together). Attached is an example. At the bottom of the page is the title of the poem that appears on the next page. That poem is just five lines long. The poem and the heading should go together; sometimes one poem per page, or a page break at the end of each poem, is a good solution.
TeX (or LaTeX) does a fantastic job of controlling page breaks, even without hinting. It "knows" what things should go together, and where good, bad, and terrible breaks are. There is a lot of sophisticated behind-the-scenes optimization going on. I don't know if there is an engine out there for managing page breaks that one could use.
In any case, the current form of asciidoctor-pdf is great progress!