PDF Extract no longer extracts all images under JDK14 #2031

Open
JamzTheMan opened this issue Jun 18, 2020 · 4 comments
Labels
bug medium Medium priority bug/enhancement

Comments

@JamzTheMan
Member

JamzTheMan commented Jun 18, 2020

Describe the bug
PDF Extractor can no longer write out certain images like jpeg2000.

To Reproduce
Steps to reproduce the behavior:

  1. Attempt to extract images from various PDFs. (I still need to find an example PDF that doesn't contain copyrighted material. For my tests I tried various Paizo PDFs and Page_13.pdf from @Phergus, but I'm hesitant to post them as they are probably copyrighted too.)

Expected behavior
All images in the PDF should be extracted, as they are with MapTool 1.7.x or TokenTool 2.1.

MapTool Info

  • Version: DEVELOP branch
  • Install: New, Upgrade [previous version], or JAR [Java Version]

Desktop (please complete the following information):

  • OS: ALL
  • Version: ALL

Additional context
I've tried catching the failed ImageIO write of a JPG and writing the image as a PNG instead. That fixes some cases, but several images are still not being written out. Further debugging needs to be done.
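
For anyone picking this up, the JPG-to-PNG fallback described above could look roughly like the sketch below. This is not the actual MapTool code: the class name `ImageWriteFallback` and the method `writeWithFallback` are made up for illustration, and it assumes the extracted image is already available as a `BufferedImage`.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only; not taken from MapTool or TokenTool.
public class ImageWriteFallback {

  /**
   * Writes the image in the preferred format and falls back to PNG when that fails.
   * Returns the file that was written, or null if neither format could be written.
   */
  public static File writeWithFallback(
      BufferedImage image, File dir, String baseName, String preferredFormat) {
    for (String format : new String[] {preferredFormat, "png"}) {
      File target = new File(dir, baseName + "." + format);
      try {
        // ImageIO.write returns false (without throwing) when no registered
        // writer can handle this image/format combination.
        if (ImageIO.write(image, format, target)) {
          return target;
        }
      } catch (IOException e) {
        // Some failures surface as exceptions instead of a false return value.
        System.err.println("Could not write " + target + ": " + e.getMessage());
      }
      // Remove any empty or partial file left behind by the failed attempt.
      target.delete();
    }
    return null;
  }
}
```

Checking the boolean returned by `ImageIO.write` matters here, because a missing writer is reported that way and not as an exception. A `null` result from this sketch would mark the images that still fail even as PNG, which are the ones that need further debugging.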

The TokenTool DEVELOP branch is a useful comparison, as it now also uses JDK 14 and virtually the same code, except that images are kept in memory (because only one page's worth of images is shown at a time, versus extracting the whole PDF).

@JamzTheMan JamzTheMan added the bug label Jun 18, 2020
@JamzTheMan JamzTheMan added this to To do in MapTool 1.8.0 via automation Jun 18, 2020
@JamzTheMan JamzTheMan added the medium Medium priority bug/enhancement label Jun 18, 2020
@Phergus Phergus added this to To do in MapTool 1.9.0 via automation Feb 7, 2021
@Phergus Phergus removed this from To do in MapTool 1.8.0 Feb 7, 2021
@Phergus
Contributor

Phergus commented Apr 16, 2021

Next release will be using Java 16. With the updated ImageIO plugins from #2495 I am seeing the correct results with PDF import on the ones I've tried. Still need to test against the Paizo files.
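
A quick way to confirm which formats the installed ImageIO plugins actually cover on a given runtime is to ask ImageIO directly. The sketch below is only a diagnostic idea, not part of MapTool; the class name `ImageIoFormatCheck` is hypothetical, and the format name a JPEG 2000 plugin registers under (for example "jpeg2000") is an assumption that may differ between plugins.

```java
import javax.imageio.ImageIO;

// Hypothetical diagnostic helper; not part of MapTool.
public class ImageIoFormatCheck {

  /** True if at least one registered ImageIO reader claims the given format name. */
  public static boolean canRead(String formatName) {
    return ImageIO.getImageReadersByFormatName(formatName).hasNext();
  }

  /** True if at least one registered ImageIO writer claims the given format name. */
  public static boolean canWrite(String formatName) {
    return ImageIO.getImageWritersByFormatName(formatName).hasNext();
  }
}
```

Calling `ImageIoFormatCheck.canRead("jpeg2000")` under both the old and new Java versions would show whether a reader for that format is registered at all, which separates a missing plugin from a bug in the extraction code itself.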

@Phergus Phergus moved this from To do to In progress in MapTool 1.9.0 Apr 21, 2021
@Phergus
Contributor

Phergus commented Apr 23, 2021

The Paizo PDFs are still an issue, and TokenTool (TT) 2.2 extracts more images from other PDFs.

@Phergus Phergus removed this from In progress in MapTool 1.9.0 Jun 7, 2021
@Zahariel

Has this been addressed at all? It still doesn't seem to work very well with Paizo's files; it manages to extract "some" images but it's not at all reliable, even with One File Per Chapter files that aren't 600 pages long. (I'm ok with it not working well with a 600 page PDF!)

@Phergus
Contributor

Phergus commented Oct 25, 2022

Nothing so far. Someone is going to have to get up to speed on the PDF extraction code and do some serious debugging.
