Speeding up individual page rendering of large PDFs with range query? #557

cudevmaxwell · 2022-01-22T04:08:09Z

Hi Cantaloupe Folks,

I was wondering whether PDFBox could use a chunked response or range query to render individual pages from a PDF, instead of having to download the whole PDF file.

The use case: A large collection of large PDF files where a user would like to quickly render an arbitrary page in an arbitrary PDF. The PDFs are stored in an object store and downloading the whole PDF file to the Cantaloupe server to render only page N is slow.
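To make "range query" concrete, here is a minimal sketch of how a client could ask an object store for only part of a file using the standard HTTP `Range` header. It only builds the requests and does not send them. The class name, method names, and URL are hypothetical, and it assumes the object store honours range requests; nothing here is part of Cantaloupe or PDFBox.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RangeRequestSketch {

    /** Builds a GET request for the inclusive byte range [first, last] of the object at url. */
    static HttpRequest rangeRequest(String url, long first, long last) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("invalid byte range");
        }
        return HttpRequest.newBuilder(URI.create(url))
                .header("Range", "bytes=" + first + "-" + last)
                .GET()
                .build();
    }

    /** Builds a GET request for the final n bytes of the object (a "suffix" range). */
    static HttpRequest tailRequest(String url, long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return HttpRequest.newBuilder(URI.create(url))
                .header("Range", "bytes=-" + n)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical object URL, used for illustration only.
        HttpRequest req = tailRequest("https://objects.example.org/bucket/large.pdf", 1024);
        System.out.println(req.headers().firstValue("Range").orElseThrow());
    }
}
```

Fetching bytes this way is the easy part. The open question is whether the PDF parser can work from such partial reads.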

Could PDFBox support a PdfBoxProcessor with isSeeking() true? It looks like readDocument() might already support calling PDDocument.load() from an InputStream?

Is this possible already with the right config, impossible but doable with the right PR, or impossible on a technical level using the PDFBox library?

Thank you for your time and for a fantastic piece of software.

Answered by adolski

adolski · 2022-01-24T16:44:48Z

Hi @cudevmaxwell,

PDDocument wants to read the whole PDF in order to load it either fully into memory, or into some configurable ratio of memory & swap. When it reads from an InputStream, it consumes the whole stream. So the inability to do random access is a limitation of PDFBox, unfortunately.
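As background on why a forward-only stream is awkward for this: in the PDF file format, the `startxref` keyword that points to the cross-reference table sits at the very end of the file, so a reader generally needs the tail before it can locate any page object. The sketch below is not PDFBox's parser; it is a simplified illustration with hypothetical class and method names, and the sample bytes are a placeholder rather than a valid PDF.

```java
import java.nio.charset.StandardCharsets;

public class StartXrefSketch {

    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.US_ASCII);

    /**
     * Scans backwards from the end of the data for "startxref" and returns
     * the decimal number that follows it, or -1 if none is found.
     */
    static long findXrefOffset(byte[] pdf) {
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            if (!matchesAt(pdf, i)) {
                continue;
            }
            int p = i + KEYWORD.length;
            // Skip the whitespace between the keyword and the number.
            while (p < pdf.length
                    && (pdf[p] == ' ' || pdf[p] == '\n' || pdf[p] == '\r' || pdf[p] == '\t')) {
                p++;
            }
            long offset = 0;
            boolean sawDigit = false;
            while (p < pdf.length && pdf[p] >= '0' && pdf[p] <= '9') {
                offset = offset * 10 + (pdf[p] - '0');
                sawDigit = true;
                p++;
            }
            return sawDigit ? offset : -1;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Placeholder content: the number after startxref is illustrative,
        // not a real offset into this string.
        String fake = "%PDF-1.7\n...objects...\nxref\n...\ntrailer\n<< /Root 1 0 R >>\n"
                + "startxref\n116\n%%EOF\n";
        System.out.println(findXrefOffset(fake.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

With a seekable source, this lookup touches only the last few bytes. With a plain `InputStream`, reaching those bytes means consuming everything before them, which matches the behaviour described above.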

1 reply

cudevmaxwell Jan 24, 2022
Author

OK no worries, thanks for responding to this @adolski! (And thanks for your work on Cantaloupe.)
