Adobe Reader cannot open files created with pdfPig, and when opening it on chrome, a transparent png is corrupted #395
Comments
I tried something new: extracting one page from the "corrupted file" to see what happens, and I get an exception. Code used:
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Writer;

var pdfPath = @"C:\TEST_SOURCE_FILE.pdf";
var stream = File.OpenRead(pdfPath);
var sourceDocument = PdfDocument.Open(stream);
// Copy page 1 of the source file into a new document.
var documentBuilder = new PdfDocumentBuilder();
documentBuilder.AddPage(sourceDocument, 1);
var newDoc = documentBuilder.Build();
// Re-open the newly built document and copy its page 1 again.
var targetDocument = PdfDocument.Open(newDoc);
var targetDocumentBuilder = new PdfDocumentBuilder();
targetDocumentBuilder.AddPage(targetDocument, 1); // the exception is thrown here
Exception:
The exception is thrown on the last line of code written in this comment.
This exactly describes the problem I am having as well.
I'll try to take a look at this soon. Hopefully it won't be too hard to determine the issue with the example PDFs provided.
Thanks for the heads up @plaisted! 😄
It appears this is related to string encoding and the way PdfPig parses strings. PDFs can store raw byte data in a string (colorspace data in this case), and I think PdfPig is not handling that raw byte data properly. PdfPig internally converts the byte data to a C# string and then uses the C# string when serializing it again. At a minimum this changes the encoding, but I suspect it may be corrupting the data as well. I don't think there's a fix without modifying the way PdfPig handles strings, which would take some thought.
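To illustrate the kind of corruption described above, here is a minimal standalone sketch. It is not PdfPig's actual code path: the `RawBytesRoundTrip` class, the `RoundTrip` helper, and the sample payload are all hypothetical, and the two encodings are chosen only to contrast a lossy round trip with a lossless one. It decodes raw bytes to a C# string and encodes them back, then checks whether the bytes survived.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RawBytesRoundTrip
{
    // Decode bytes to a C# string, then encode the string back to bytes
    // with the same encoding (hypothetical helper, for illustration only).
    public static byte[] RoundTrip(byte[] raw, Encoding encoding)
    {
        string asText = encoding.GetString(raw);
        return encoding.GetBytes(asText);
    }

    public static void Main()
    {
        // Hypothetical payload: binary data that is not valid UTF-8 text,
        // standing in for raw colorspace bytes stored in a PDF string.
        byte[] raw = { 0x00, 0x7F, 0x80, 0xC3, 0xFF };

        // UTF-8 replaces invalid sequences with U+FFFD while decoding,
        // so the original bytes cannot be recovered afterwards.
        byte[] viaUtf8 = RoundTrip(raw, Encoding.UTF8);

        // ISO-8859-1 maps every byte 0x00..0xFF to exactly one char,
        // so decoding and re-encoding returns the same bytes.
        byte[] viaLatin1 = RoundTrip(raw, Encoding.GetEncoding("ISO-8859-1"));

        Console.WriteLine($"UTF-8 round trip preserved bytes:      {raw.SequenceEqual(viaUtf8)}");
        Console.WriteLine($"ISO-8859-1 round trip preserved bytes: {raw.SequenceEqual(viaLatin1)}");
    }
}
```

The point of the sketch is only that a bytes-to-string-to-bytes round trip is safe when the same byte-preserving mapping is used in both directions. If the decoding and the later serialization disagree, or the encoding is lossy, binary payloads can come out changed.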
Is there a way to detect when raw byte data is stored in the PDF, and in this case storing
Conceptually yes, but string handling in PDFs is pretty complicated and I don't want to break existing functionality. Additionally, I don't think PdfPig is handling string encoding correctly overall: it treats non-Unicode strings as ISO-8859 encoded, while by the PDF spec they actually use a different encoding. I'll try to look a little more later this week. I may be able to fix the current issue and leave the ISO-8859 inconsistency as is for now.
I think there is a fix in #401 if you want to test. I copied a page from TEST_SOURCE_FILE to a new PDF and it was no longer corrupt.
I just tested it and it works with all the files I used. Thanks a lot!
I don't have the rights to do it, but you can link PR #401 to this issue.
PR #401 has just been merged to master, so I am closing this issue.
Hello,
I use PdfPig to open multi-page PDFs, split them, and recreate them.
Here is a sample of the code:
It usually works fine, but we have some files which cannot be opened with Adobe Reader after the split operation:
translation :
When I open the split file in Chrome, it works, with one slight issue: the PDF contains a transparent PNG, and in the split file the image appears to be corrupted:
Original image
Split file image
I can't send you the file since it contains sensitive data (Edit: I have since succeeded in creating another file which reproduces the behaviour; it can be found at the end), but I may have a clue:
I saw this issue from another pdf library :
parallax/jsPDF#862
They had something similar happening, and it seems the cause was something missing in the handling of the PNG predictors:
jsPDF fix
I did not manage to recreate another PDF with the same issue. (Edit: there is now a file that reproduces the issue at the end of this comment.)
Here is some information about the file, found with PDF Architect:
If I succeed in creating a PDF without sensitive information that shows the issue, I'll upload it.
If you need any other information, feel free to ask me.
EDIT
I succeeded in creating a file which has the same issue:
TEST_SOURCE_FILE.pdf
I used the same PNG (copied from the source PDF), pasted it into a Word document, and used PDFCreator to create this new PDF.
When using it with the code above, the output file is corrupted:
CORRUPTED_FILE.pdf
I saw a thread about PNG profiles, which can also cause issues, but I don't know if that is the case here:
https://legacy.imagemagick.org/discourse-server/viewtopic.php?t=32930