Thread is stuck during HTML to PDF conversion with nested tables for fairly large HTML #551

Closed
swillis12 opened this issue Sep 11, 2020 · 24 comments

Comments

@swillis12

swillis12 commented Sep 11, 2020

Update 9/23/20: The first two HTML files I attached are misleading and work as expected. I found that the problem is actually occurring only when the height/max-height of the div enclosing the table is set to "auto":
image

@syjer Try the new HTML file attached; it should reproduce the issue for you as well. Sorry about the confusion with the previous test cases. FYI, running this HTML file in my minimal test application took 90,087ms, which is consistent with the results I see inside my program.
smaller_test_auto_height.txt

Original issue/info below. The thread dump applies to the "smaller_test_auto_height.txt" test case, since my program was using this "auto" height all along (which I wasn't aware of 🤦 ):

Currently our HTML target is email, so we are using a lot of nested HTML table elements, and everything uses inline styles. I suspect that I have a thread hanging due to the nested tables. Unfortunately I'm not sure what I can do to work around this issue. I've tried "table-layout: fixed" and assigning column widths as well.

Update: made sure that it is valid XHTML as well using https://validator.w3.org/: problem_html1.txt

Here is also a smaller test html that is still painfully slow (~1.5-2 minutes). It is the same content, just reduced the number of table rows to 71 for easier visibility:
smaller_test.txt

If I enable logging, I see a seemingly endless stream of messages like the following:

com.openhtmltopdf.cascade FINEST:: min-height, relative= 0.0 (0), absolute= 0.0 using base=460.0
com.openhtmltopdf.cascade FINEST:: text-indent, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: height, relative= 23.0 (23px), absolute= 460.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-height, relative= 0.0 (0), absolute= 0.0 using base=460.0
com.openhtmltopdf.cascade FINEST:: text-indent, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: min-width, relative= 0.0 (0), absolute= 0.0 using base=0.0
com.openhtmltopdf.cascade FINEST:: height, relative= 23.0 (23px), absolute= 460.0 using base=0.0

Thread dump shows this stack:

"http-bio-8080-exec-10" #204 daemon prio=5 os_prio=31 tid=0x00007fa152c59000 nid=0x13803 runnable [0x0000700014582000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.juli.ClassLoaderLogManager.getLogger(ClassLoaderLogManager.java:229)
        - locked <0x00000006c01af2f0> (a org.apache.juli.ClassLoaderLogManager)
        at java.util.logging.LogManager.demandLogger(LogManager.java:551)
        at java.util.logging.Logger.demandLogger(Logger.java:455)
        at java.util.logging.Logger.getLogger(Logger.java:502)
        at com.openhtmltopdf.util.JDKXRLogger.getLogger(JDKXRLogger.java:103)
        at com.openhtmltopdf.util.JDKXRLogger.isLogLevelEnabled(JDKXRLogger.java:75)
        at com.openhtmltopdf.util.XRLog.log(XRLog.java:122)
        at com.openhtmltopdf.util.XRLog.log(XRLog.java:113)
        at com.openhtmltopdf.css.style.derived.LengthValue.calcFloatProportionalValue(LengthValue.java:204)
        at com.openhtmltopdf.css.style.derived.LengthValue.getFloatProportionalTo(LengthValue.java:80)
        at com.openhtmltopdf.css.style.CalculatedStyle.getFloatPropertyProportionalTo(CalculatedStyle.java:437)
        at com.openhtmltopdf.css.style.CalculatedStyle.getMinHeight(CalculatedStyle.java:1174)
        at com.openhtmltopdf.render.BlockBox.getCSSMinHeight(BlockBox.java:1628)
        at com.openhtmltopdf.render.BlockBox.applyCSSMinMaxHeight(BlockBox.java:1172)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1074)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:109)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:103)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layoutCell(TableRowBox.java:452)
        at com.openhtmltopdf.newtable.TableRowBox.layoutChildren(TableRowBox.java:206)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableRowBox.layout(TableRowBox.java:95)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableSectionBox.layoutChildren(TableSectionBox.java:137)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.newtable.TableSectionBox.layout(TableSectionBox.java:278)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.newtable.TableBox.layoutChildren(TableBox.java:319)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.newtable.TableBox.layoutTable(TableBox.java:284)
        at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:243)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
        at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
        at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
        at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:1204)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1058)
        at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
        at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.layout(PdfBoxRenderer.java:344)
        at com.openhtmltopdf.pdfboxout.PdfRendererBuilder.run(PdfRendererBuilder.java:41)
...

Question is -- what exactly is the issue and what can I do to work around this if I am pretty much stuck with the current layout? It may not be feasible to get rid of the table nesting.

@syjer
Contributor

syjer commented Sep 12, 2020

hi @swillis12 ,

I'll have a look using your example (thank you!) in the profiler, maybe there is some obvious fix that can be done.

@syjer
Contributor

syjer commented Sep 12, 2020

Which version are you using?

With 1.0.4, with the smaller_test.txt example, it takes 1.2 seconds to run.

Using the following code:

package test;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class App {
    public static void main(String[] args) throws Exception {

        long start = System.currentTimeMillis();
        try (OutputStream os = new FileOutputStream("test.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();

            builder.useFastMode();
            builder.withFile(new File("smaller_test.html"));

            builder.toStream(os);
            builder.run();
        }
        System.err.println((System.currentTimeMillis() - start) + "ms");
    }
}

I got 1218ms

@swillis12
Author

swillis12 commented Sep 12, 2020

Thanks @syjer. Yes I am on 1.0.4. That is strange.. it is much slower on mine. The only difference is that I am reading the HTML from a String in memory rather than from a file like you are doing.

Update: I tried it with your code and am seeing the same results. It is very fast (I got 2685ms on the larger HTML file)! This is good news, but now I have to figure out what is going on in my program. This is the code I'm using by the way:

PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withHtmlContent(html, "");
builder.toStream(out);
builder.run();

@syjer
Contributor

syjer commented Sep 12, 2020

It may be:

  • memory related, too much pressure on the GC and thus a lot of pauses
  • something logging related? Are you using the slf4j support? I think we can still improve this part to reduce the amount of generated "garbage" if the log is not used.

To be noted, I think we can still improve the performance :).
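The logging point can be illustrated with plain java.util.logging. The thread dump in the original report shows the time being spent under Logger.getLogger, reached from a log call made during layout. The sketch below is not the library's code: the class name, the logger name "example.cascade" and the counter are all hypothetical. It only shows the general idea of looking the logger up once and skipping message construction when the level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLogging {

    // Hypothetical logger name. Looked up once, because Logger.getLogger
    // goes through the LogManager every time it is called.
    private static final Logger LOG = Logger.getLogger("example.cascade");

    // Counts how many message strings were actually built (for illustration only).
    static int messagesBuilt = 0;

    static void logLength(String property, float relative, float absolute, float base) {
        // Skip the string concatenation entirely when FINEST is disabled.
        if (LOG.isLoggable(Level.FINEST)) {
            messagesBuilt++;
            LOG.finest(property + ", relative= " + relative
                    + ", absolute= " + absolute + " using base=" + base);
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);
        for (int i = 0; i < 1_000_000; i++) {
            logLength("min-height", 0f, 0f, 460f);
        }
        System.err.println("messages built: " + messagesBuilt);
    }
}
```

With the level set to INFO, no FINEST message string is ever concatenated, so the hot loop produces no logging garbage.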

@swillis12
Author

It may be:

  • memory related, too much pressure on the GC and thus a lot of pauses
  • something logging related? Are you using the slf4j support? I think we can still improve this part to reduce the amount of generated "garbage" if the log is not used.

This is very possible. Yes I currently have the SLF4J jar added -- How do you recommend testing what you mention about reducing the amount of "garbage" generated by the logging? Should I just exclude the SLF4J jar? Note: it doesn't seem to be making that many log statements, just what's below:

com.openhtmltopdf.load INFO:: SAX XMLReader in use (parser): com.sun.org.apache.xerces.internal.parsers.SAXParser
com.openhtmltopdf.load INFO:: SAX XMLReader in use (parser): com.sun.org.apache.xerces.internal.parsers.SAXParser
com.openhtmltopdf.load.xml-entities INFO:: Entity public: -//W3C//DTD XHTML 1.1//EN, no local mapping. Returning empty entity to avoid pulling from network.
com.openhtmltopdf.load INFO:: Loaded document in ~45ms
com.openhtmltopdf.load INFO:: TIME: parse stylesheets 59ms
com.openhtmltopdf.match INFO:: media = print
com.openhtmltopdf.match INFO:: Matcher created with 162 selectors
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () Ident auto is an invalid or unsupported value for overflow at line 1. Skipping declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () text-overflow is an unrecognized CSS property at line 0. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: () Ident auto is an invalid or unsupported value for max-height at line 1. Skipping declaration.
com.openhtmltopdf.css-parse WARNING:: () Ident auto is an invalid or unsupported value for overflow at line 1. Skipping declaration.
com.openhtmltopdf.css-parse WARNING:: () Value for padding must be a length or percentage at line 1. Skipping declaration.

@syjer
Contributor

syjer commented Sep 12, 2020

How do you recommend testing

As a first step, I would add one of the following flags to the JVM: -XX:+PrintGC or -XX:+PrintGCDetails, so you can see if the issue is the GC.

You will see during the execution some lines like:

[GC (Allocation Failure) [PSYoungGen: 64512K->10746K(75264K)] 64512K->12319K(247296K), 0.0072593 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]

If they appear too often and take quite a lot of time, then that could be the issue: maybe not enough memory is given to the Java process, or some kind of memory leak is happening.

Alternatively, you can use VisualVM to visualize the GC activity. With it you can also do a first profiling pass of the application (CPU or memory) and try to pin down the root cause.

edit: for reducing the amount of garbage: well, first we need to identify what could be the issue in this library :)
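The GC check described above can also be approximated from inside the program, using the standard java.lang.management API. This is a minimal sketch, not something taken from the thread: the class GcProbe and the placeholder workload are hypothetical, and the lambda would need to be replaced by the call that runs the conversion.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcProbe {

    /** Cumulative GC counters summed over all collectors of this JVM. */
    static final class Snapshot {
        final long collections;
        final long millis;

        Snapshot(long collections, long millis) {
            this.collections = collections;
            this.millis = millis;
        }
    }

    static Snapshot snapshot() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new Snapshot(count, time);
    }

    /** Returns {wall-clock ms, number of collections, approximate GC ms} for the task. */
    static long[] measure(Runnable task) {
        Snapshot before = snapshot();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        Snapshot after = snapshot();
        return new long[] {
            wallMillis,
            after.collections - before.collections,
            after.millis - before.millis
        };
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the code that runs the PDF conversion.
        long[] r = measure(() -> {
            for (int i = 0; i < 200_000; i++) {
                String s = new String(new char[64]);
                if (s.length() != 64) throw new IllegalStateException();
            }
        });
        System.err.println("wall=" + r[0] + "ms, collections=" + r[1] + ", gcTime=" + r[2] + "ms");
    }
}
```

If the reported GC time is only a small fraction of the wall-clock time, the slowdown is more likely in the computation itself than in memory pressure.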

@swillis12
Author

I'm reading the HTML content from a string in memory as well as writing the outputstream to memory (browser Response).. So when I get a chance I'll do a quick test and first write the HTML content to a file and then have the library write it to a file as well.

Thanks for the help so far @syjer I'll try this out once I get a little more time. What do you recommend I do with this issue for now?

@syjer
Contributor

syjer commented Sep 12, 2020

@swillis12 , you can keep the issue open, maybe somebody else has/had the same issue and could provide additional feedback.

I'm still wondering how such a large difference in execution time (1-2 minutes vs. a few seconds) is possible; even in the case of a GC issue, I don't think it would be that bad.

Note that we have another issue about slow generation time, #506, but that one seems to be more font related.

@swillis12
Author

Thanks again @syjer. I ran a quick test below and got the same slow results.. I'll have to do some GC profiling as you described. I wish I could help you reproduce it :).

Yes I had scoured the issue tracker for mentions of slowness and did come across that one. I also want to note that I'm seeing the same behavior using FlyingSaucer (I swapped this library for FS to run a quick test). It seems to be equivalently slow.

Path f = Files.createTempFile("temp", ".html");
Files.write(f, html.getBytes(), StandardOpenOption.APPEND);
File f2 = f.toFile();
f2.deleteOnExit();

File f1 = File.createTempFile("output", ".pdf");
// Close the file stream before copying, so the PDF is fully written to disk.
try (OutputStream os = new FileOutputStream(f1)) {
    PdfRendererBuilder builder = new PdfRendererBuilder();
    builder.useFastMode();
    builder.withFile(f2);
    builder.toStream(os);
    builder.run();
}
Files.copy(f1.toPath(), out);
out.flush();

@syjer
Contributor

syjer commented Sep 12, 2020

btw, beware when using deleteOnExit, as it may slowly leak memory because the JVM needs to keep track of which files to remove on exit. See this sonar rule: https://rules.sonarsource.com/java/RSPEC-2308 :)

Better to delete in a finally block :)

(obviously, if it's a short running process, it's not an issue ;))
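The "delete in a finally block" advice can be sketched with the JDK alone. The helper below is hypothetical (the class and method names are made up for illustration), and the Consumer stands in for whatever code renders the PDF from the temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

final class TempFileCleanup {

    /**
     * Writes the HTML to a temporary file, hands the file to the consumer and
     * deletes it afterwards, whether or not the consumer threw.
     * The path is returned only so callers can verify that the file is gone.
     */
    static Path withTempHtml(String html, Consumer<Path> consumer) {
        try {
            Path tmp = Files.createTempFile("temp", ".html");
            try {
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                consumer.accept(tmp);
            } finally {
                // Removed right away, so nothing has to be tracked until JVM exit.
                Files.deleteIfExists(tmp);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = withTempHtml("<html><body>hi</body></html>",
                f -> System.err.println("would render " + f));
        System.err.println("still exists: " + Files.exists(p));
    }
}
```

Unlike deleteOnExit, this keeps no per-file bookkeeping in the JVM, which matters in a long-running server process.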

@swillis12
Author

swillis12 commented Sep 23, 2020

@syjer I was away last week but got another chance to look at this. My test case that I originally attached is not reproducing the actual issue. If you look at my update, the issue is seen only when the enclosing div has auto height. You can try it out and hopefully see what I am talking about.

P.S. I tested with the latest code, including your change #552. It doesn't seem to have improved things much. It may be related to a memory leak in the CSS calculations when the table height is auto. Notice the constant GC (purple line), since the new generation space is filling up over and over again:

Screen Shot 2020-09-23 at 2 15 09 PM

@syjer
Contributor

syjer commented Sep 23, 2020

hi @swillis12 , thank you for updating the example 👍 .

I'll have a look, most likely on Friday.

@syjer
Contributor

syjer commented Sep 25, 2020

hi @swillis12 , with the new template, I'm able to reproduce the reported issue.

Now, finding the main issue will not be easy :).

It does not seem to be a memory issue, but rather something in the layout algorithm that causes it to spend quite a lot of time.

@syjer
Contributor

syjer commented Sep 25, 2020

the main culprit seems to be:

tbody {
    page-break-inside: avoid;
}

Removing this CSS rule generates the PDF in ~1.4 seconds on my PC for the file smaller_test_auto_height.txt.

edit: most likely the page-break avoiding algorithm is suboptimal and spirals out of control when you have a lot of child elements.

@syjer
Contributor

syjer commented Sep 25, 2020

Looking at https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-core/src/main/java/com/openhtmltopdf/layout/BlockBoxing.java#L40 , I've got the impression the rule is dropped too late, maybe due to the peculiarity of this page (deep hierarchy).

Most likely @danfickle has a better idea than me :), I'll still try on my side to find a solution though.

@syjer
Contributor

syjer commented Sep 25, 2020

Even more interesting, the issue appears only when the first tbody has the CSS rule applied. If you apply page-break-inside: avoid only to the innermost tbody, it works without problems :).

I think I'll be able to condense the issue into a more compact HTML file.

@swillis12
Author

Good catch! Thanks for the workaround @syjer. Hopefully this helps others that may run into this too.

@swillis12
Author

@syjer do you think #506 is perhaps the same issue that we're seeing here? I noticed that its test case also has the "page-break-inside: avoid" rule applied to the body element (with a long table inside it, like mine):

image

@syjer
Contributor

syjer commented Sep 25, 2020

@swillis12 it may play a role, but I think the main cause in #506 lies more inside PDFBox.

@syjer
Contributor

syjer commented Sep 28, 2020

I've trimmed the problematic file down a bit:

issue-551-page-break-inside-avoid.txt

It's quite clear we have an O(n²) algorithm, or maybe even an exponential one, as it depends on how deep it needs to check.
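To see why depth could make things exponential, here is a purely illustrative toy model. It is not the library's layout code, and the thread does not pin down the actual mechanism. It only assumes that each nesting level lays out its child once, decides the content must not be split, and lays it out a second time.

```java
final class RelayoutModel {

    static long layoutCalls = 0;

    /**
     * Toy model (an assumption, not the real algorithm): every level lays out
     * its child, then repeats the layout once to avoid a page break.
     */
    static void layout(int depth) {
        layoutCalls++;
        if (depth == 0) {
            return;
        }
        layout(depth - 1); // first attempt
        layout(depth - 1); // retry after moving the block
    }

    static long countCalls(int depth) {
        layoutCalls = 0;
        layout(depth);
        return layoutCalls;
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 10; depth++) {
            System.err.println("depth " + depth + ": " + countCalls(depth) + " layout calls");
        }
    }
}
```

In this model a tree of depth d costs 2^(d+1) - 1 layout calls, so a single retry per level is enough to make deep nesting very expensive.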

@chubbard

chubbard commented Jul 9, 2021

I'm struggling with PDF generation where it goes out of control and eventually locks things up using this:

.page {
   page-break-after: always;
}

Would that be the same culprit as the avoid setting? Any news on a fix? I'm using v1.0.9.

@danfickle
Owner

Hi @chubbard,

I recently refactored the page-break related code. However, it is not released and is part of the footnotes work in #711. You could try building and using that branch to see if it fixes your issue.

@danfickle
Owner

P.S. There was an n squared algorithm in BlockBoxing that I replaced with the use of a TreeMap.
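As a general illustration of that kind of change (this is not the actual BlockBoxing code, and every name below is hypothetical), a lookup that scans a sorted list for each child costs O(n) per query, while TreeMap.floorKey answers the same question in O(log n):

```java
import java.util.TreeMap;

final class RangeLookup {

    /** O(n) per query: walks the sorted starts until one lies past the index. */
    static int linearFloor(int[] sortedStarts, int index) {
        int found = -1;
        for (int start : sortedStarts) {
            if (start > index) {
                break;
            }
            found = start;
        }
        return found;
    }

    /** O(log n) per query: floorKey returns the greatest key <= index, or null. */
    static int treeFloor(TreeMap<Integer, Integer> starts, int index) {
        Integer key = starts.floorKey(index);
        return key == null ? -1 : key;
    }

    public static void main(String[] args) {
        int[] sortedStarts = {0, 10, 25, 40};
        TreeMap<Integer, Integer> map = new TreeMap<>();
        for (int i = 0; i < sortedStarts.length; i++) {
            map.put(sortedStarts[i], i);
        }
        for (int child = 0; child < 50; child += 7) {
            if (linearFloor(sortedStarts, child) != treeFloor(map, child)) {
                throw new IllegalStateException("lookups disagree at " + child);
            }
        }
        System.err.println("both lookups agree");
    }
}
```

Running the lookup once per child turns the linear version into O(n²) overall, whereas the TreeMap version stays at O(n log n).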

@syjer
Contributor

syjer commented Jul 12, 2021

hi @danfickle , I've tried the issue-551-page-break-inside-avoid.txt file with the current issue_364_footnotes branch, but it still triggers the issue (I waited more than 2 minutes and it still had not finished).
