HTML optimization #104
Comments
+1, reducing the number of |
Could sub/superscripts use CSS |
Oh I didn't know it can take a length as the value. I've just checked the CSS standard, seems to be better than relative positioning. |
Yes, I agree. Reducing the number of divs should make browser reflow faster. |
I just discovered the project and would love to get involved, as I worked on similar stuff a year ago. Just a quick hint (maybe you already know this): to work around WebKit ignoring decimals in values such as letter-spacing, you can multiply all your values by X and then use a CSS transform to scale down by a factor of X, and then the decimals do work. |
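A minimal sketch of the scaling trick described above, written as a small Python helper that emits the CSS. The selector `.t`, the function names, and the assumption that the browser rounds lengths to whole pixels are all illustrative; this is not taken from the pdf2htmlEX source.

```python
def effective_px(value_px: float, factor: int) -> float:
    """Length actually rendered if the browser rounds the scaled value
    to whole pixels (an assumption made for illustration)."""
    return round(value_px * factor) / factor


def scaled_rule(selector: str, font_size_px: float,
                letter_spacing_px: float, factor: int) -> str:
    """Multiply every length by `factor`, then scale the element back down."""
    declarations = [
        f"font-size:{font_size_px * factor:g}px",
        f"letter-spacing:{letter_spacing_px * factor:g}px",
        f"transform:scale({1.0 / factor:g})",
        "transform-origin:0 0",  # scale from the top-left corner
    ]
    return selector + "{" + ";".join(declarations) + "}"


# Without scaling, a 0.3px spacing is rounded away entirely.
print(effective_px(0.3, 1))    # 0.0
# With a factor of 10, the spacing becomes 3px before the scale-down.
print(scaled_rule(".t", 12, 0.3, 10))
```

A larger factor keeps more of the fractional part, at the cost of laying out text at a larger intermediate size.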
@iclems Thanks for the message. Actually, the scaling trick has been in there since a very early version. There are still some issues marked as 'need solution' that I have not been able to solve. Perhaps you could share some of your thoughts? |
Thanks! I've been having a look at the project today and I'm now getting familiar with the way things are done. Meeting my old friend Poppler again... I remember having thought about how to properly optimize the background image, how to get a fast enough conversion, etc. Here is a good example of a small PDF (just 1.7 MB) that is very slow to convert and very big once converted: http://clement.wehrung.free.fr/scaling.pdf By the way, I'll probably be able to start focusing on some specific issues next week; do you have a priority list? |
OK, thanks for the PDF. I'll take a look tomorrow. I'm now trying to reduce the number of I think you may just pick any one you find interesting. I'd like to recommend #39, which is serious and doable for now. I'm not sure if you are familiar with clipping paths; I have no experience with them at all. I'd be happy to explain the codebase and discuss possible solutions with you. Thanks! |
Hi @iclems, I have a similar background - familiarity with Poppler, and now starting to make some small contributions to pdf2htmlEX. I'm actually working on #39 at the moment, rather slowly. This issue - reducing the number of divs - is in my opinion one of the most important because of the impact on performance. I'd recommend trying out some of your typical PDFs and seeing if any features you care about are missing - that's how I ended up adding stroked text. |
I've finished the optimization of |
Would that be DOM memory, HTML file size, or frame rate? |
Oh, it was the time for parsing and rendering the entire document (with On Tue, Apr 2, 2013 at 6:23 PM, John Hewson notifications@github.com wrote:
|
Ok. Btw - I think you should keep the un-optimized text generation mode, and have a flag |
Have you tried looking at the DOM memory in Chrome's Task Manager? |
Right, I'll add it. On Tue, Apr 2, 2013 at 6:30 PM, John Hewson notifications@github.com wrote:
|
No, let me do a comparison of the optimized and unoptimized versions. On Tue, Apr 2, 2013 at 6:31 PM, John Hewson notifications@github.com wrote:
|
Hi @jahewson, I have a few concerns for now, and will start thinking about how I could contribute today:
|
@jahewson what does |
Indeed, pdf2htmlEX is very slow at converting your sample PDF; there are too many pages for it. I've just checked One possible solution is to use multiple threads, since rendering the background image of each page is independent of the others. Fortunately, poppler has become thread-safe in a recent version.
The visibility test is indeed even harder than #39, where we may simply estimate the clipping path as a rectangle. I've been thinking about this, but have no good idea so far. Maybe we could approximate each object by its bounding box and test the visibility in the preprocessor.
About cutting the background image: that should be intuitive and useful. How did you do that? I've actually tried to dump every image object in the PDF and put them directly into the HTML, but it did not work because of clipping paths, and there may also be other drawing objects. I also tried to at least detect whether there is anything on the background (there is a bg_integrate branch, which has not been maintained for a while), which did not work well either, since a simple header/footer makes the background nonempty. In the bg_integrated path, I also attempted to employ SVG for the visibility issue, but it turned out to be too complicated for me. Crocdoc seems to support rendering in SVG now; I never succeeded in viewing the results though, as they always froze my browsers. |
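The bounding-box idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: objects arrive in paint order, each is reduced to an axis-aligned box, and the `"text"`/`"drawing"` labels, the `Box` class, and `possibly_hidden_text` are hypothetical names, not part of pdf2htmlEX or poppler.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box with x0 <= x1 and y0 <= y1."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Boxes that only touch along an edge are not counted as overlapping.
        return (self.x0 < other.x1 and other.x0 < self.x1 and
                self.y0 < other.y1 and other.y0 < self.y1)


def possibly_hidden_text(objects: List[Tuple[str, Box]]) -> List[int]:
    """Return indices of text objects overlapped by a later-drawn drawing.

    `objects` is given in paint order, so anything later in the list
    is painted on top of anything earlier.
    """
    hidden = []
    for i, (kind, box) in enumerate(objects):
        if kind != "text":
            continue
        later = objects[i + 1:]
        if any(k != "text" and box.intersects(b) for k, b in later):
            hidden.append(i)
    return hidden


page = [
    ("text", Box(10, 10, 100, 20)),    # drawn first, then painted over
    ("drawing", Box(0, 0, 200, 50)),
    ("text", Box(10, 30, 100, 40)),    # drawn after the rectangle
]
print(possibly_hidden_text(page))  # [0]
```

Bounding boxes overestimate the covered area, so this flags text that might be hidden rather than proving that it is; a real test would need the actual paths and clipping.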
Thanks for the long reply :) Could I have your email address so I can send you a link to some source? I think the visibility test is not the #1 priority. Most probably:
|
@coolwanglu the columns should be:
The most important value is Resident, which is the first column. So you're seeing a 23% reduction in RAM with your optimizations - great! (93MB -> 72MB) |
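As a quick check of that figure, using only the two resident-memory numbers quoted above (the variable names are illustrative):

```python
before_mb = 93  # resident memory, unoptimized output
after_mb = 72   # resident memory, optimized output

reduction = (before_mb - after_mb) / before_mb
print(f"{reduction:.1%}")  # 22.6%, i.e. roughly 23%
```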
@iclems My email is available in the README |
@iclems, yep these are tricky issues:
It could be done by sending all the drawing commands to a polygon clipper, and pruning any text which gets drawn over (where the text rectangle intersects the drawing polygon). It's a very big job. Alternatively, if each drawing command was rendered to a separate transparent PNG image, then the problem goes away, as does the problem below.
"per image" absolute positioning, for image objects that's fine, but what about paths? These would need to be rendered into separate images, it could be done. The simplest approach might be to keep track of the min/max x and y values used for drawing, and crop the background to that size. |
@jahewson I wonder if per-path images would introduce too much overhead. For example, why do people use CSS sprites? I think maybe we need some clustering algorithms. About the polygon clipper, do you know any light-weight geometry libraries, for example CGAL? Also, image objects can be clipped too, and thus cannot be directly dumped and inserted into HTML. |
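The simpler suggestion made earlier, tracking the min/max x and y used for drawing and cropping the background to that size, might look roughly like this. It is a sketch under assumptions: drawing commands have already been flattened to a list of device-space points, and `drawing_extent` and `crop_rect` are hypothetical helper names.

```python
import math
from typing import Iterable, Optional, Tuple

Extent = Tuple[float, float, float, float]


def drawing_extent(points: Iterable[Tuple[float, float]]) -> Optional[Extent]:
    """Return (min_x, min_y, max_x, max_y) over all points, or None if empty."""
    pts = list(points)
    if not pts:
        return None  # nothing was drawn, so no background image is needed
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def crop_rect(extent: Extent, page_w: int, page_h: int) -> Tuple[int, int, int, int]:
    """Round the extent outward to whole pixels and clamp it to the page."""
    x0, y0, x1, y1 = extent
    left = max(0, math.floor(x0))
    top = max(0, math.floor(y0))
    right = min(page_w, math.ceil(x1))
    bottom = min(page_h, math.ceil(y1))
    return (left, top, right, bottom)


extent = drawing_extent([(10.5, 20.2), (100.9, 50.0), (40.0, 80.7)])
print(crop_rect(extent, 600, 800))  # (10, 20, 101, 81)
```

The cropped image would then be positioned at (left, top) on the page. A single box per page stays cheap, but a small header plus a small footer would still produce a crop spanning nearly the whole page, which is where clustering into several boxes could help.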
There's only one way to find out...
Because they look good on retina displays, and scale well with zoom. I don't think that size or overhead are the reasons people choose CSS sprites. |
|
Looks great, but it seems that Bezier curves are not supported. Bezier curves might be used in clipping paths and drawing objects. Hmm... |
@iclems Previously I was not using the I guess this is the best poppler can do (with current parameters) |
Just realized that #64 is about the visibility test |
The first item does not seem to bring any performance improvement. Probably the only benefit is that it could prevent vertical overlapping caused by browsers rounding font sizes, which has never happened to me. I've created HTMLTextPage, which allows future optimizations, but the rest seems dull to me. The last 2 items have been implemented and do improve performance. |
Crocdoc is (once again) a good one to learn from:
- display:block and proper margin-top values
- margin-top classes than y axis
- top and relative positioning, vertical-align seems to be better