You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When profiling COOL usage - we see a lot of cost on the server associated with PDF thumbnailing it seems - @caolanm you're prolly interested.
I expect that this is all thumbnailing PDF files; we should prolly optimize these things in order:
40bn cycles in pdfi::PDFDetector::detect - that is gratuitous; and the boost::spirit on the stack doesn't encourage me - it seems to take longer to 'detect' that a file is PDF than it does to load it ;-) - we should fix that: if extension is PDF and magic is correct - do we need to take longer ?
27bn cycles FPDF_LoadPage - I -hope- that if we are thumbnailing, we only load the 1st page - and then on-demand render the PDF as needed - perhaps worth checking if whatever shortcut is used here is annotating the code-paths so we do as little work as possible.
4bn+ cycles in SdrPageWindow::RedrawAll - hopefully this draws only a single page - and not all of them (?)
Anyhow - the 1st one is (I hope) somewhat low hanging fruit =)
The text was updated successfully, but these errors were encountered:
I think this pdfi::getAdditionalStreams is to detect if this is a hybrid pdf document, and if so, get its mimetype and the contents of the embedded odf document to stash for the final import. So there is a parse of the pdf structure to get those.
I suspect the stashed embedded contents are not used by the final parse and we end up reparsing if it is hybrid (grep for "comment extracting hybrid substream twice - once during detection, second time here")
Because this flamegraph has a prominent SdPdfFilter::Import it looks like the pdfs involved are not hybrid ones in any case so the expensive parse in the detection presumably is that of a non-hybrid pdf to find out that is is non-hybrid and has no embedded document or mimetype
It might be that it might be safe to assume that the absence of a "AdditionalStreams" keyword in the last few lines or so of a pdf means that is isn't hybrid and so skip the parse during detect in such a case.
example end of a hybrid pdf document:
trailer
<</Size 43/Root 41 0 R
/Info 42 0 R
/ID [
]
/DocChecksum /9D2161D19E360801716BA882E229DC2D
/AdditionalStreams [/application#2Fvnd#2Eoasis#2Eopendocument#2Epresentation 19 0 R
]
startxref
28402
%%EOF
probably good enough to take the last few k of the document during detect to see if its a hybrid and if not elide this expensive detect-time parse: -> https://gerrit.libreoffice.org/c/core/+/157227 for that thought
When profiling COOL usage - we see a lot of cost on the server associated with PDF thumbnailing it seems - @caolanm you're prolly interested.
I expect that this is all thumbnailing PDF files; we should prolly optimize these things in order:
Anyhow - the 1st one is (I hope) somewhat low hanging fruit =)
The text was updated successfully, but these errors were encountered: