Error while process PDF #781
Comments
I ran the PDF through the GUI description service, which reported it as well-formed and valid with a successful event outcome. I tried the package on Ripple and got the same error. On Ripple I also edited the daitss-config.yml file and, under transform_service, changed "skip_undefined" from false to true; the package then went through the PDF steps. It did not archive because of a separate issue on Ripple involving squid, which will be handled next week.
The package on production is in the stashspace named Github_781, in the directory /var/daitss/data/stash/Github_781/ETAL9VQ5Q_V6OA41. On Ripple it's in the workspace /var/daitss/data/work/ENF28E4YI_X7LTMP, and the original package is in /var/daitss/ops/stephen/AA00038892_00002.
This package fails during PDF to PDF/A conversion with pdfaPilot. We would need to submit an issue ticket to the pdfaPilot vendor. Alternatively, you can try to get this package ingested by turning off PDF/A normalization.
The instructions are here: https://github.com/daitss/core/wiki/Turn-off-PDF-to-PDFA-normalization
Email from Carol:

Response from callas. Looks like you can fix those PDFs with pdfaPilot, though I am not sure how you want to pursue it; it seems it means the SIPs will be changed.
-Carol

Hello Carol,
as David has already mentioned, the cases have underlying issues; however, in both cases the PDF structure seems to be corrupt. Acrobat is still able to display the file, but the more thorough analysis with the PDF/A validator/converter fails. We will investigate further to make sure that this assumption is correct.
There is, however, already a known workaround for that problem: both files can actually be converted when they are first converted to PostScript and back to PDF. You can do so by using ./pdfaPilot --redistill on the command line. Would that work for you as a - at least temporary - solution?
Best regards,

--------------- Original Message ---------------
Hi Carol,
I've reproduced the problem for both files. The underlying cause appears to be different for both files; they will be looked at by development to determine what is causing this and whether anything can be done about it. I'll keep you posted!

--------------- Original Message ---------------
Hi Dietrich,
Our sys admin has installed the new version of pdfaPilot, .
Some of the problem files can now be converted, but the following two still give errors during the conversion:

http://www.fcla.edu/daitss-test/files/00004-04-2009.pdf
Progress 100 %
Errors 16660 Device process color used but no PDF/A OutputIntent
Errors 114 Font not embedded (and text rendering mode not 3)
Errors 24 Annotation has no Flags entry
Errors 24 Annotation not set to print
Errors 6280 CharSet missing for Type 1 font
Summary Corrections 72
Summary Errors 23102
Summary Warnings 0
Summary Infos 0
Duration 00:54
Error 1000 Unknown error (unknown exception)

http://www.fcla.edu/daitss-test/files/09-06-2013.pdf
Serialization This pdfaPilot instance is running with a Coldspare or Developer license and may only be used in production as a temporary replacement for a full license on another computer.
Input /home/cchou/pdfaError/GH_781/09-06-2013.pdf
Pages 32
PDFA Regular
Progress 100 %
Summary Corrections 0
Summary Errors 0
Summary Warnings 0
Summary Infos 0
Duration 00:01
Error 1010 The PDF file may be corrupt (unable to open PDF file).

Here is the pdfaPilot version the sys admin has installed for us.
2000-2016 callas software gmbh
Can you take a look again and provide us some solutions?
Thanks,
-Carol

On Mon, Oct 10, 2016 at 5:09 AM, Dietrich von Seggern <d.seggern@callassoftware.com> wrote:
What version of pdfaPilot are you using? I was not able to reproduce any issues with the current release (callas pdfaPilot CLI 6.0.245 (x64)) on a Mac. The reason may be either the font situation or the version.
Best regards,
--
Meet us at:
callas VIP Event, Berlin: November 7 - 8 (+ 9)
PDF Day Australia, Sydney: November 25
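The vendor's workaround above could be scripted roughly as follows. This is only a sketch: `--redistill` is the flag named in the email, but passing the input file as the final argument, the default binary path, and the helper names `build_redistill_command` and `redistill` are assumptions that should be checked against the pdfaPilot CLI documentation before use.

```python
import subprocess


def build_redistill_command(pdf_path, pdfapilot="./pdfaPilot"):
    """Build the argument list for the redistill workaround.

    --redistill comes from the vendor's email; placing the input file
    as the last argument is an assumption, not documented behaviour.
    """
    return [pdfapilot, "--redistill", pdf_path]


def redistill(pdf_path, pdfapilot="./pdfaPilot"):
    """Run the workaround and return (exit code, combined output)."""
    proc = subprocess.run(
        build_redistill_command(pdf_path, pdfapilot),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

Running this on a copy of one problem PDF, rather than on the file inside the SIP, would leave the original untouched while the results are evaluated.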
Do we still have the original SIPs? We may need to fix the PDFs in the original SIPs (in consultation with their owners), resubmit them, and abort the stashed SIPs with corrupt files. We'll need to discuss this.
This is worth emailing UF about, since they seem to have done multiple submissions of 3 different package names. They may need to authorize that we 'abort' some of the duplicates, and then we'll have fewer problem packages to deal with. Determine if we still have the SIPs. If we do, we should experiment with correcting one of the problem PDFs with pdfaPilot by converting to PDF/A and back to PDF. Based on the results of this investigation, decide how to proceed.
I did some validation of the PDFs remaining in the DAITSS Github_781 stashspace using description.fcla.edu. The results:
So it appears that the valid and well-formed PDFs may archive if pdfaPilot is turned off. UF may need to recreate the other two. Carol - can you confirm my conclusions?
I attempted to obtain details about the validity of the 4 remaining PDFs from Adobe Acrobat 9's Preflight feature but didn't have much success.
The original packages for this issue (AA00038892_00002, AA00047064_00008, and UF00098620_00421) are in /var/daitss/ops/exceptions/tickets/GitHub_781 on darchive.
I received the following error on a package with a PDF file:
error while processing 1(sip-files/09-06-2013.pdf): bad status
http://transform.fda.fcla.edu/transform/pdf_norm?location=file:/var/daitss/data/work/ETAL9VQ5Q_V6OA41/files/original/1/data: 500
/opt/pdfapilot/pdfaPilot /var/daitss/data/work/ETAL9VQ5Q_V6OA41/files/original/1/data --fontfolder=/usr/share/fonts/msttcorefonts/ --onlypdfa --substitute --outputfile=/var/daitss/tmp/d20160317-22104-1k0gniu/data/transformed.pdf --report=XML,IFNOPDFA,PATH=/var/daitss/tmp/d20160317-22104-1k0gniu/pdfapilot_report.xml failed, output:
Input /var/daitss/data/work/ETAL9VQ5Q_V6OA41/files/original/1/data
Pages 32
PDFA Regular
Progress 100 %
Summary Corrections 0
Summary Errors 0
Summary Warnings 0
Summary Infos 0
Duration 00:05
Error 1010 The PDF file may be corrupt (unable to open PDF file).
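When triaging several stashed packages, it may help to pull the summary counts and the final error code out of output like the above. The following is a minimal sketch; the line shapes it matches are inferred only from the output quoted in this issue, and `parse_pdfapilot_output` is a hypothetical helper, not part of DAITSS or pdfaPilot.

```python
import re

# Line shapes inferred from the console output quoted in this issue only.
SUMMARY_RE = re.compile(r"^Summary\s+(\w+)\s+(\d+)$")
ERROR_RE = re.compile(r"^Error\s+(\d+)\s+(.*)$")


def parse_pdfapilot_output(text):
    """Return the 'Summary <name> <count>' values and the final 'Error <code> <message>' line."""
    summary = {}
    error = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SUMMARY_RE.match(line)
        if match:
            summary[match.group(1)] = int(match.group(2))
            continue
        match = ERROR_RE.match(line)
        if match:
            # Per-check lines start with "Errors", so they do not match here.
            error = (int(match.group(1)), match.group(2))
    return {"summary": summary, "error": error}


sample = """Pages 32
PDFA Regular
Progress 100 %
Summary Corrections 0
Summary Errors 0
Summary Warnings 0
Summary Infos 0
Duration 00:05
Error 1010 The PDF file may be corrupt (unable to open PDF file)."""

result = parse_pdfapilot_output(sample)
print(result["error"][0], result["summary"])
```

A result with zero summary errors but error code 1010, as here, points at a file pdfaPilot could not open at all rather than one that failed individual PDF/A checks.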