Working with split pdfs #26

Closed
Conaws opened this issue May 14, 2016 · 14 comments

@Conaws

Conaws commented May 14, 2016

A few questions

  1. What is the best way to save the output of split-pdf as a PDF?
    Is Java interop necessary for that, or is there another Clojure library?
    For instance, I want to turn "/sample/pdf-title.pdf" into "sample/pdf-title-pages/1.pdf", "sample/pdf-title-pages/2.pdf", and so on.
@dotemacs
Owner

Hi @Conaws

See if the test examples work for you:

https://github.com/dotemacs/pdfboxing/blob/master/test/pdfboxing/split_test.clj

I'll have a look at this a bit later.

Could you paste everything that you type in at the REPL?

Thanks

@Conaws
Author

Conaws commented May 15, 2016

Yeah, that's what I went off of. I got splitting to work fine; it's just that I'm not familiar with PDDocuments, so I don't know how to write one out as a new file.

@Conaws
Author

Conaws commented May 15, 2016

I've been using split-pdf and extract-text, but I'm curious how to save each page as a new PDF.

@dotemacs
Owner

Hey @Conaws

See this function, merge-pddocuments:
https://github.com/dotemacs/pdfboxing/blob/master/src/pdfboxing/split.clj#L36

and see how it's being used:
https://github.com/dotemacs/pdfboxing/blob/master/src/pdfboxing/split.clj#L64

(merge-pddocuments :docs [pddoc1 pddoc2 pddoc3] :output "output.pdf")

Does that help?

@Conaws
Author

Conaws commented May 19, 2016

So you're saying I can't create a PDF from a single pddoc, and can only merge multiple docs?

Or could I use merge-pddocuments with a vector of only one pddoc?

@dotemacs
Owner

Hello Conor

You can create a PDF document with merge-pddocuments where only one PDDocument is supplied in a vector.

Please have a play with it yourself.

@Conaws
Author

Conaws commented May 19, 2016

brilliant

@Conaws closed this as completed May 19, 2016
@alanmarazzi

I'm trying to read a pdf in, split it with split-pdf and then write it to disk with merge-pddocuments in this way:

(pdf-split/merge-pddocuments
  :docs (pdf-split/split-pdf :input path :start 1 :end 4)
  :output "test.pdf")

But I get this error, full stacktrace below:

Unhandled java.io.IOException
   COSStream has been closed and cannot be read. Perhaps its enclosing
   PDDocument has been closed?

            COSStream.java:   77  org.apache.pdfbox.cos.COSStream/checkClosed
            COSStream.java:  125  org.apache.pdfbox.cos.COSStream/createRawInputStream
            COSWriter.java: 1203  org.apache.pdfbox.pdfwriter.COSWriter/visitFromStream
            COSStream.java:  383  org.apache.pdfbox.cos.COSStream/accept
            COSObject.java:  158  org.apache.pdfbox.cos.COSObject/accept
            COSWriter.java:  522  org.apache.pdfbox.pdfwriter.COSWriter/doWriteObject
            COSWriter.java:  460  org.apache.pdfbox.pdfwriter.COSWriter/doWriteObjects
            COSWriter.java:  444  org.apache.pdfbox.pdfwriter.COSWriter/doWriteBody
            COSWriter.java: 1099  org.apache.pdfbox.pdfwriter.COSWriter/visitFromDocument
          COSDocument.java:  419  org.apache.pdfbox.cos.COSDocument/accept
            COSWriter.java: 1370  org.apache.pdfbox.pdfwriter.COSWriter/write
            COSWriter.java: 1257  org.apache.pdfbox.pdfwriter.COSWriter/write
           PDDocument.java: 1267  org.apache.pdfbox.pdmodel.PDDocument/save
NativeMethodAccessorImpl.java:   -2  sun.reflect.NativeMethodAccessorImpl/invoke0
NativeMethodAccessorImpl.java:   62  sun.reflect.NativeMethodAccessorImpl/invoke
DelegatingMethodAccessorImpl.java:   43  sun.reflect.DelegatingMethodAccessorImpl/invoke
               Method.java:  498  java.lang.reflect.Method/invoke
            Reflector.java:   93  clojure.lang.Reflector/invokeMatchingMethod
            Reflector.java:   28  clojure.lang.Reflector/invokeInstanceMethod
                 split.clj:   25  pdfboxing.split/pddocument->byte-array
                 split.clj:   21  pdfboxing.split/pddocument->byte-array
                 split.clj:   34  pdfboxing.split/pddocument->input-stream
                 split.clj:   32  pdfboxing.split/pddocument->input-stream
                  core.clj: 2745  clojure.core/map/fn
              LazySeq.java:   40  clojure.lang.LazySeq/sval
              LazySeq.java:   49  clojure.lang.LazySeq/seq
              LazySeq.java:  130  clojure.lang.LazySeq/toArray
            ArrayList.java:  581  java.util.ArrayList/addAll
     PDFMergerUtility.java:  215  org.apache.pdfbox.multipdf.PDFMergerUtility/addSources
                 split.clj:   41  pdfboxing.split/merge-pddocuments
                 split.clj:   36  pdfboxing.split/merge-pddocuments
               RestFn.java:  457  clojure.lang.RestFn/invoke
                      REPL:   11  pdfparse.core/eval6654
                      REPL:   11  pdfparse.core/eval6654
             Compiler.java: 7062  clojure.lang.Compiler/eval
             Compiler.java: 7025  clojure.lang.Compiler/eval
                  core.clj: 3206  clojure.core/eval
                  core.clj: 3202  clojure.core/eval
                  main.clj:  243  clojure.main/repl/read-eval-print/fn
                  main.clj:  243  clojure.main/repl/read-eval-print
                  main.clj:  261  clojure.main/repl/fn
                  main.clj:  261  clojure.main/repl
                  main.clj:  177  clojure.main/repl
               RestFn.java: 1523  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   87  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  657  clojure.core/apply
                  core.clj: 1965  clojure.core/with-bindings*
                  core.clj: 1965  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   85  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:   55  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:  222  clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
    interruptible_eval.clj:  190  clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
                  AFn.java:   22  clojure.lang.AFn/run
   ThreadPoolExecutor.java: 1149  java.util.concurrent.ThreadPoolExecutor/runWorker
   ThreadPoolExecutor.java:  624  java.util.concurrent.ThreadPoolExecutor$Worker/run
               Thread.java:  748  java.lang.Thread/run

@dotemacs
Owner

dotemacs commented Aug 1, 2018

Looking at your code:

(pdf-split/merge-pddocuments
  :docs (pdf-split/split-pdf :input path :start 1 :end 4)
  :output "test.pdf")

What does the argument to :docs look like?

I'm asking because looking at the discussion above, there is this example:

(merge-pddocuments :docs [pddoc1 pddoc2 pddoc3] :output "output.pdf")

Which looks like the argument to :docs should be a vector.

Is your result of

(pdf-split/split-pdf :input path :start 1 :end 4)

a vector?

I can see that the docstring talks of a list here:

https://github.com/dotemacs/pdfboxing/blob/master/src/pdfboxing/split.clj#L35

See which one applies in your case and let me know, because I might need to update either the docstring or the code depending on what you find out.

Thanks

@alanmarazzi

alanmarazzi commented Aug 2, 2018

It's a vector:

=> (def s (pdf-split/split-pdf :input path :start 1 :end 4))
=> (type s)
clojure.lang.PersistentVector

I'm getting the same error with:

(pdf-split/merge-pddocuments
  :docs (apply list (pdf-split/split-pdf :input path :start 1 :end 4))
  :output "test.pdf")

This is how the list looks:

=> (apply list s)
(#object[org.apache.pdfbox.pdmodel.PDDocument 0x2932d27f "org.apache.pdfbox.pdmodel.PDDocument@2932d27f"] #object[org.apache.pdfbox.pdmodel.PDDocument 0x7ce0fee0 "org.apache.pdfbox.pdmodel.PDDocument@7ce0fee0"] #object[org.apache.pdfbox.pdmodel.PDDocument 0x3aa4ccc1 "org.apache.pdfbox.pdmodel.PDDocument@3aa4ccc1"] #object[org.apache.pdfbox.pdmodel.PDDocument 0x719b3937 "org.apache.pdfbox.pdmodel.PDDocument@719b3937"])

This is a ~20MB PDF with more than 40,000 pages, in case that might be an issue.

@dotemacs
Owner

dotemacs commented Aug 2, 2018

Can you see if

(pdf-split/merge-pddocuments
  :docs [pddoc1 pddoc2 pddoc3]
  :output "test.pdf")

will work if you supply any other documents?

And can you see whether you can merge the split documents if they are saved to disk first?

Basically, just trying to see where the issue could be.

@alanmarazzi

Exact same error with another PDF, with both a vector and a list. It looks like it's somehow trying to read from the PDF, which is already closed.
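
The error message itself hints that the enclosing PDDocument has been closed, so one thing that might be worth trying is plain Java interop with the source document kept open until every page has been written. Below is a minimal sketch under some assumptions: PDFBox 2.x is on the classpath (the `org.apache.pdfbox.multipdf` package in the stacktrace suggests that), and `save-pages!` is a hypothetical helper name, not part of pdfboxing. I have not confirmed that this is what goes wrong inside pdfboxing.

```clojure
(require '[clojure.java.io :as io])
(import '(org.apache.pdfbox.pdmodel PDDocument)
        '(org.apache.pdfbox.multipdf Splitter))

(defn save-pages!
  "Hypothetical helper, not part of pdfboxing. Writes each page of `input`
  to `out-dir` as 1.pdf, 2.pdf, ... and returns the paths that were written."
  [input out-dir]
  (.mkdirs (io/file out-dir))
  ;; Assumption: the split pages still read from the source document's
  ;; streams, so `source` stays open until every page has been saved.
  (with-open [source (PDDocument/load (io/file input))]
    ;; mapv is eager, so all saves happen before with-open closes `source`.
    (mapv (fn [i ^PDDocument page]
            (let [out (io/file out-dir (str i ".pdf"))]
              (try
                (.save page out)
                (finally (.close page)))
              (.getPath out)))
          (iterate inc 1)
          (.split (Splitter.) source))))
```

Something like `(save-pages! "sample/pdf-title.pdf" "sample/pdf-title-pages")` would then cover the one-file-per-page case from the top of this thread. With a lazy `map` instead of `mapv`, the saves could run after `with-open` has already closed the source, which would bring back the same kind of error.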

@dotemacs
Owner

dotemacs commented Aug 2, 2018

Exact same error with another PDF with both a vector and a list.

OK.

It looks like it's somehow trying to read from the PDF, which is already closed.

Yeah.

I won't be able to have a look at this issue this week.
I'd appreciate it if you could at least create a new issue so that I can track it.
But a PR with a fix would be even better (hint, hint :) ).

Thanks

@alanmarazzi

Not sure I'll be able to work on it, unfortunately. I'll see what I can do.
