I'm trying to add a feature to a production document classification application that uses images extracted from Office documents.
Our team has been using oletools to extract macros from the files we examine. At first glance it looks as though oletools would support image extraction, given that it works with Microsoft Compound files, but none of the tools seem to look inside the "Data" directory within the file, where the images are held.
I was hoping that oletools could add a module that extracts all nonstandard media from Office files in a form that other tools could use. Another useful question oletools could answer is whether a document contains embedded pictures at all, without extracting them.
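To illustrate the "does it contain pictures" question, here is a rough stdlib-only sketch. It is a heuristic, not an oletools feature: it scans the raw bytes of a file for a few well-known image signatures. The names `IMAGE_SIGNATURES`, `find_image_signatures` and `contains_images` are made up for this example. It assumes the images are stored uncompressed and that a signature does not straddle a sector boundary, and short signatures such as JPEG's can produce false positives.

```python
# Heuristic sketch (not part of oletools): look for image magic bytes
# anywhere in a blob of data, e.g. the raw contents of an Office file.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def find_image_signatures(data):
    """Return {format_name: [offsets]} for every signature found in data."""
    hits = {}
    for name, magic in IMAGE_SIGNATURES.items():
        offsets = []
        pos = data.find(magic)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(magic, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits


def contains_images(data):
    """True if at least one known image signature occurs in data."""
    return bool(find_image_signatures(data))
```

Usage would be something like `contains_images(open(path, "rb").read())`, where `path` points at the document being checked. A proper solution would parse the stream structure instead of scanning bytes, so this should be treated as a quick triage check only.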
For now I do not plan to parse the internal structure of Word/Excel/PPT/etc files in oletools, as that would require a lot of work. However, if you are willing to contribute some code to do so, please do not hesitate to send me a pull request.
I did start some code in the direction of "let's understand the structure as Office does it" with the ppt_record_parser. However, there is just so much different stuff in these files, and sometimes Microsoft does not adhere to its own standards (or I misread them), so pretty early on I fell back to parsing only the types of data needed to extract macros and ignored the rest. But it should be easily extensible (at least for ppt, where everything is record-based).
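To show what "record-based" means here, below is a minimal sketch of walking a flat sequence of records. It is not the ppt_record_parser code. It assumes the 8-byte little-endian record header as I understand it from the [MS-PPT] documentation: 4 bits of version, 12 bits of instance, a 2-byte type and a 4-byte payload length. The function name `iter_records` is made up for this example.

```python
import struct

HEADER_SIZE = 8  # assumed header: uint16 ver/instance, uint16 type, uint32 length


def iter_records(data, offset=0):
    """Yield (rec_ver, rec_instance, rec_type, payload) for consecutive records."""
    while offset + HEADER_SIZE <= len(data):
        ver_instance, rec_type, rec_len = struct.unpack_from("<HHI", data, offset)
        rec_ver = ver_instance & 0x000F   # low 4 bits
        rec_instance = ver_instance >> 4  # high 12 bits
        start = offset + HEADER_SIZE
        end = start + rec_len
        if end > len(data):
            break  # truncated or malformed record, stop instead of guessing
        yield rec_ver, rec_instance, rec_type, data[start:end]
        offset = end
```

Records with a version of 0xF are containers, so their payload can be passed to `iter_records` again to descend one level. Picture extraction would then come down to recognising the record types that carry image data, which this sketch does not attempt.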