Speeding up individual page rendering of large PDFs with range query? #557
-
Hi Cantaloupe folks, I was wondering whether PDFBox could use a chunked response or range query to render individual pages from a PDF instead of having to download the whole file. The use case: a large collection of large PDF files where a user would like to quickly render an arbitrary page in an arbitrary PDF. The PDFs are stored in an object store, and downloading the whole PDF to the Cantaloupe server to render only page N is slow. Is this possible already with the right config, impossible but doable with the right PR, or impossible on a technical level using the PDFBox library? Thank you for your time and for a fantastic piece of software.
Replies: 1 comment 1 reply
-
Hi @cudevmaxwell, PDDocument wants to read the whole PDF in order to load it either fully into memory, or into some configurable ratio of memory and swap. When it reads from an InputStream, it consumes the whole stream. So the inability to do random access is a limitation of PDFBox, unfortunately.
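To illustrate the difference between the two access patterns discussed here, the following is a minimal sketch using only the Java standard library. It does not use PDFBox or Cantaloupe, and the class name `StreamVsRandomAccess`, the method names, and the 1 MB stand-in file are all hypothetical. It only shows that a sequential `InputStream` has to be consumed from the start to reach any later part, while a seekable source can jump to an offset and read a small slice, which is roughly what an HTTP range request would offer for a remote object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamVsRandomAccess {

    /** Reads a sequential stream to its end and returns how many bytes were consumed. */
    static long bytesConsumedSequentially(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    /** Seeks near the end of a file and reads at most tailLength bytes, touching nothing else. */
    static byte[] readTail(Path file, int tailLength) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            int toRead = (int) Math.min(tailLength, length);
            raf.seek(length - toRead);
            byte[] tail = new byte[toRead];
            raf.readFully(tail);
            return tail;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a 1 MB block of zero bytes stands in for a large document; it is not a real PDF.
        Path tmp = Files.createTempFile("sketch", ".bin");
        try {
            Files.write(tmp, new byte[1_000_000]);
            try (InputStream in = Files.newInputStream(tmp)) {
                System.out.println("sequential bytes read: " + bytesConsumedSequentially(in));
            }
            System.out.println("random-access bytes read: " + readTail(tmp, 1024).length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Running `main` reads all 1,000,000 bytes in the sequential case and only 1,024 in the seek-based case. Whether a PDF page could actually be rendered from such partial reads depends on the parsing library, which is the limitation described in the reply above.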