Problem parsing a certain pdf #15

Closed
stefanste opened this issue Mar 27, 2015 · 6 comments

@stefanste

Hi, I'm getting an error when trying to load the pdf data for a certain pdf:
https://docs.google.com/file/d/0B4AGXAJrQz1RNE5OZHFTdWIycHc/edit?pli=1

pdf = CombinePDF.new('/home/stefan/Useful_KI_Information.pdf')
didn't find reference {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>34}
Couldn't connect all values from references - didn't find reference {:DecodeParms=>{:Columns=>5, :Predictor=>12}, :Filter=>:FlateDecode, :ID=>["\xE9\xE3\xD7l\x19\v\xC9\x11\xB1\xD9\x99yM){\x1F", "'J\xB0\xCCk\xB4\xDFO\xA6\x83V\x9F\x1DM\x13\xB5"], :Index=>[35, 34], :Info=>nil, :Length=>112, :Prev=>253565, :Root=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>36}, :Size=>35, :Type=>:XRef, :W=>[1, 3, 1], :raw_stream_content=>"h\xDEbb\x00\x01&F\x86\x1D\xB5\fL\f\f\x8C\xC7\x81$\xA3\x18\x0F\x98}\eD2\x80E\xA6\xBEG\x88\x80\xD50L\x9A\x0E\"\x99\xD7\x81H&\x7F\x90\x9A\xFD%`\xF6\x15\xB0\x9AV\x10\xC9\xCD\vVs\n,\xD2\x05\"\xF9\x0E\x81\xCD\x04\xEBe\xBC\x0F\xB4\xF7\xAFR\eX\x84\x19L\xB2\x81I\x06Ft\x92\xF9/vqF$q\xA6\xFF`\x11\x06\x80\x00\x03\x00\xBD\xCF\x14\\\r", :indirect_generation_number=>0, :indirect_reference_id=>20}!!!
didn't find reference {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>34}
Couldn't connect all values from references - didn't find reference {:DecodeParms=>{:Columns=>5, :Predictor=>12}, :Filter=>:FlateDecode, :ID=>["\xE9\xE3\xD7l\x19\v\xC9\x11\xB1\xD9\x99yM){\x1F", "'J\xB0\xCCk\xB4\xDFO\xA6\x83V\x9F\x1DM\x13\xB5"], :Index=>[35, 34], :Info=>nil, :Length=>112, :Prev=>253565, :Root=>{:Metadata=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>17}, :PageLabels=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>31}, :Pages=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33}, :Type=>:Catalog, :indirect_generation_number=>0, :indirect_reference_id=>36}, :Size=>35, :Type=>:XRef, :W=>[1, 3, 1], :raw_stream_content=>"h\xDEbb\x00\x01&F\x86\x1D\xB5\fL\f\f\x8C\xC7\x81$\xA3\x18\x0F\x98}\eD2\x80E\xA6\xBEG\x88\x80\xD50L\x9A\x0E\"\x99\xD7\x81H&\x7F\x90\x9A\xFD%`\xF6\x15\xB0\x9AV\x10\xC9\xCD\vVs\n,\xD2\x05\"\xF9\x0E\x81\xCD\x04\xEBe\xBC\x0F\xB4\xF7\xAFR\eX\x84\x19L\xB2\x81I\x06Ft\x92\xF9/vqF$q\xA6\xFF`\x11\x06\x80\x00\x03\x00\xBD\xCF\x14\\\r", :indirect_generation_number=>0, :indirect_reference_id=>20}!!!
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>34, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>31, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>55, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>21, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>22, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>23, :referenced_object=>nil}

I'm using Ruby 2.1.5. It works for every PDF I've tried except this one. Also, looking through the issues, it's potentially related to #6.

boazsegev added the bug label Mar 27, 2015
@stefanste
Author

Also, in case it helps, I just found a slightly older version of the same file which DOES work. I can't spot any difference; the two files differ in size by only 9 bytes!
https://drive.google.com/file/d/0B4AGXAJrQz1ROWFNMGFSSE5kQkU/view

boazsegev pushed a commit that referenced this issue Mar 27, 2015
@boazsegev
Owner

Hi Stefan,

Thank you very much for opening this issue and helping make the combine_pdf gem even better.

I had a quick look at the issue and it seems that the PDF file you showed me uses the wrong version identifier. The PDF file states PDF version 1.3 but uses features introduced only in PDF version 1.5 ...

... for performance reasons, the parser didn't check if Object Streams existed in this PDF file (following the rule that in version 1.3 they shouldn't exist) and didn't attempt to extract the data from them.

I updated the parser so that it will always search for Object Streams and this resolved the issue on my system.
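For anyone who wants to check whether another file has the same mismatch, here is a minimal sketch. It is not part of combine_pdf: `pdf_version_mismatch?` is a hypothetical helper that only looks at the version in the file header and searches the raw bytes for the `/Type /ObjStm` and `/Type /XRef` markers (both introduced in PDF 1.5). It assumes the header sits at the very start of the file and ignores a `/Version` entry in the catalog, so treat the result as a hint rather than a verdict.

```ruby
# Hypothetical helper (not part of combine_pdf): reports whether a PDF declares
# a version below 1.5 in its header while containing object streams or
# cross-reference streams, which were introduced in PDF 1.5.
def pdf_version_mismatch?(data)
  # Work on a binary copy so arbitrary bytes in streams can't break matching.
  data = data.dup.force_encoding(Encoding::BINARY)

  # Assumption: the "%PDF-x.y" header is the first thing in the file.
  declared = data[/\A%PDF-(\d+)\.(\d+)/]
  return false unless declared
  version = declared.scan(/\d+/).map(&:to_i) # e.g. [1, 3]

  # Look for the dictionary entries that mark PDF 1.5 stream objects.
  uses_v15_streams = !(data =~ %r{/Type\s*/(?:ObjStm|XRef)}).nil?

  (version <=> [1, 5]) < 0 && uses_v15_streams
end
```

Usage would look something like `pdf_version_mismatch?(File.binread('some_file.pdf'))`, where the path is whatever file you want to inspect.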

Please install version 0.1.18. It should solve the issue for you.

Again, thank you for opening this issue.

@stefanste
Author

Yep, works like a charm now. Thanks for the amazingly quick fix!

@ChigboIO

Hi guys,

I'm seeing this issue again on v0.2.31 when trying to combine a PDF generated by CombinePDF.

Couldn't connect reference for {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>0, :referenced_object=>nil}

Do I have to update something on my system?

Thank you.

@boazsegev
Owner

Hi @andela-echigbo ,

Thanks for posting.

I'm assuming the error you're posting shows Warning at the beginning?

I'm not sure, but I believe this is expected behavior. Please let me know whether you're experiencing this as an error or a warning...

Let me explain why I suspect you're seeing expected behavior when importing CombinePDF data.

PDFs sometimes contain NULL objects (like nil in Ruby), which have no value.

In CombinePDF, the NULL object is object 0,0 (references in PDF files consist of two numbers).

These NULL objects are often (depending on the PDF specification version) marked by a reference to a non-existent object, which, according to the specification, defaults to NULL.

The final value caused by this "broken" reference is nil (in Ruby) or NULL (in PDF jargon).

HOWEVER, this is not always the case. Different PDF authoring systems designate different objects as the NULL object (or they use the null keyword and a higher PDF version number), and sometimes there's a real issue with objects and missing references... that's where the warning comes in.
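To illustrate the idea (this is a simplified sketch, not CombinePDF's actual code), assume the parsed objects live in a Hash keyed by `[object id, generation number]`. The `resolve_reference` helper and the `warnings` array are hypothetical names used only for this example. A reference that points at nothing resolves to nil, and a warning is recorded because the parser can't tell an intentional NULL from a genuinely missing object:

```ruby
# Hypothetical sketch (not CombinePDF's real implementation).
# objects:  Hash of parsed objects keyed by [object id, generation number].
# ref:      a reference Hash shaped like the ones in the log output above.
# warnings: an Array that collects warning messages instead of raising.
def resolve_reference(objects, ref, warnings)
  key = [ref[:indirect_reference_id], ref[:indirect_generation_number]]
  return objects[key] if objects.key?(key)

  # A reference to a missing object is treated as NULL (nil in Ruby).
  # It might be intentional or it might be a real problem, so only warn.
  warnings << "Warning: couldn't connect reference #{key.join(',')}"
  nil
end

objects  = { [36, 0] => { Type: :Catalog } }
warnings = []

# An existing object is returned as-is.
catalog = resolve_reference(objects, { indirect_reference_id: 36, indirect_generation_number: 0 }, warnings)

# A reference to object 0,0 resolves to nil and records a warning.
missing = resolve_reference(objects, { indirect_reference_id: 0, indirect_generation_number: 0 }, warnings)
```

With this design the load still succeeds, which matches what you'd expect if the message is only a warning.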

@ChigboIO

Yes, you're right @boazsegev, it's a warning and things still work anyway. I'm sorry I didn't state that.
Thank you.
