HSSFWorkbook.getAllEmbeddedObjects wasn't getting all embedded objects, only the top-level ones. This fixes it to get them recursively.
I may be missing something, but I can't see how your new method ends up getting called?
Also, any chance of a unit test for this (likely with a test file), which shows that some objects were missed before, but are now being fetched?
Now that you ask, I'm not sure whether the diff is complete. I'll have to check our offline copy again. In summary, the original code only returned the images at the top level, not those inside groups.
As for samples, the only one I have, I'm not allowed to share.
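For readers without a sample file, the top-level versus grouped distinction can be illustrated with a minimal, self-contained sketch. The types here (`Shape`, `ObjectShape`, `GroupShape`) are hypothetical stand-ins invented for illustration, not POI's actual classes, and this is not the patch itself. It only shows why a flat scan misses objects nested inside groups while a recursive walk finds them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmbeddedObjectWalk {
    // Hypothetical model: a shape is either an embedded object or a group of shapes.
    interface Shape {}

    static final class ObjectShape implements Shape {
        final String name;
        ObjectShape(String name) { this.name = name; }
    }

    static final class GroupShape implements Shape {
        final List<Shape> children;
        GroupShape(List<Shape> children) { this.children = children; }
    }

    // Flat scan: only looks at the list it is given, so grouped objects are skipped.
    static List<ObjectShape> topLevelOnly(List<Shape> shapes) {
        List<ObjectShape> found = new ArrayList<>();
        for (Shape s : shapes) {
            if (s instanceof ObjectShape) {
                found.add((ObjectShape) s);
            }
        }
        return found;
    }

    // Recursive walk: descends into every group, at any nesting depth.
    static List<ObjectShape> collectRecursively(List<Shape> shapes) {
        List<ObjectShape> found = new ArrayList<>();
        collect(shapes, found);
        return found;
    }

    private static void collect(List<Shape> shapes, List<ObjectShape> out) {
        for (Shape s : shapes) {
            if (s instanceof ObjectShape) {
                out.add((ObjectShape) s);
            } else if (s instanceof GroupShape) {
                collect(((GroupShape) s).children, out);
            }
        }
    }

    public static void main(String[] args) {
        // One top-level object, plus two objects nested inside groups.
        List<Shape> shapes = Arrays.<Shape>asList(
            new ObjectShape("a"),
            new GroupShape(Arrays.<Shape>asList(
                new ObjectShape("b"),
                new GroupShape(Arrays.<Shape>asList(new ObjectShape("c"))))));

        System.out.println("top-level only: " + topLevelOnly(shapes).size());
        System.out.println("recursive:      " + collectRecursively(shapes).size());
    }
}
```

In this toy input the flat scan sees only the object outside any group, which mirrors the symptom described in the report.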
If you do manage to dig out the full patch, that'd be handy. What you could then do is write a very small program that opens each test file we have in turn, and prints out the number of embedded parts. Run that with pre and post fixed jars, if one of them shows a different number then we have an existing test file we can use for the unit test!
Fixing missing call to the getAllEmbeddedObjects()
https://gist.github.com/trejkaz/5779009 shows results before and after the patch.
I guess there are more sample files than anyone expected - v3.9 only found embedded objects in 4 files and with the fix, 10 files have embedded objects.
Looks like files tend to have one kind or the other, then. I spot lots of files where we're now finding things, but none where we found objects before and now find more.
I've applied your two patches in r1493001, along with two new unit tests based on your gist, thanks!
Dummy commit to try if we can close some old github PRs via commit-comments: closes #2, closes #3, closes #4, closes #19
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1722963 13f79535-47bb-0310-9956-ffa450edef68