Problem with json export #20

Closed
lesouriciergris opened this issue Mar 25, 2024 · 7 comments

Comments

@lesouriciergris

Hi,

While analysing a set of more than 5,000 PDFs, we have identified another problem with some of them (sample attached). This sample is a Microsoft Excel file that was modified and exported to PDF.

essai.pdf

With peepdf -j essai.pdf :

[!] Error: Exception while generating the JSON report

Traceback (most recent call last):
  File "/home/crc/.local/lib/python3.10/site-packages/peepdf/peepdf.py", line 332, in main
    jsonReport = getPeepJSON(statsDict, VERSION)
  File "/home/crc/.local/lib/python3.10/site-packages/peepdf/PDFUtils.py", line 668, in getPeepJSON
    ids[idx] = ids[idx].split(f"Version {idx}: ")[1]
IndexError: list index out of range

The error seems to come from <id0>Version 1: in this file; the JSON export expects <id0>Version 0:
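
For illustration, the failure can be reproduced in isolation. This is only a sketch: it assumes `ids` is a list of strings shaped like the <id0> value in the XML report below, `split_by_index` mimics the failing line from the traceback, and `strip_version_prefix` is a hypothetical tolerant alternative, not the actual fix in peepdf.

```python
import re

# Value copied from the <id0> element of the XML report in this issue.
ids = [
    "Version 1: [ <54B84DA444D598449A57423BE8579728> "
    "<54B84DA444D598449A57423BE8579728> ]"
]


def split_by_index(entries):
    """Mimic the failing line: assume entry idx is labelled 'Version idx: '."""
    out = list(entries)
    for idx in range(len(out)):
        # If the label number differs from idx, split() returns a single
        # element and indexing [1] raises IndexError.
        out[idx] = out[idx].split(f"Version {idx}: ")[1]
    return out


def strip_version_prefix(entry):
    """Hypothetical helper: drop any leading 'Version N: ' label."""
    return re.sub(r"^Version \d+: ", "", entry)


try:
    split_by_index(ids)
except IndexError as exc:
    print("IndexError:", exc)

print([strip_version_prefix(entry) for entry in ids])
```

Matching the label with a pattern instead of the list position avoids assuming that the first ID always belongs to version 0.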

With peepdf -x essai.pdf everything works:

<peepdf_analysis version="3.0.3" url="https://github.com/digitalsleuth/peepdf-3" author="Jose Miguel Esparza and Corey Forman">
  <date>2024-03-25 15:14:52</date>
  <basic>
    <filename>essai.pdf</filename>
    <md5>0e65ea47e9b51b80b22f0dbb2b8b8856</md5>
    <sha1>7c20c50251e7a2789fe7d4dd4607d8f27e0f400a</sha1>
    <sha256>2f3d70ae88d35a17a8d654de8a257c687abce28b6aa376f9c0b96fc913908458</sha256>
    <size>33234</size>
    <id0>Version 1: [ &lt;54B84DA444D598449A57423BE8579728&gt; &lt;54B84DA444D598449A57423BE8579728&gt; ]</id0>
    <detection/>
    <pdf_version>1.7</pdf_version>
    <binary status="true"/>
    <linearized status="false"/>
    <encrypted status="false"/>
    <updates>1</updates>
    <num_objects>30</num_objects>
    <num_streams>5</num_streams>
    <comments>0</comments>
    <errors num="2">
      <error_message>EOL not found</error_message>
      <error_message>No indirect objects found in the body</error_message>
    </errors>
  </basic>
  <advanced>
    <version num="0" type="original">
      <catalog object_id="1"/>
      <info object_id="3"/>
      <objects num="30">
        <object id="1" compressed="false"/>
        <object id="2" compressed="false"/>
        <object id="3" compressed="false"/>
        <object id="4" compressed="false"/>
        <object id="5" compressed="false"/>
        <object id="6" compressed="false"/>
        <object id="7" compressed="false"/>
        <object id="8" compressed="false"/>
        <object id="9" compressed="false"/>
        <object id="10" compressed="true"/>
        <object id="11" compressed="true"/>
        <object id="12" compressed="true"/>
        <object id="13" compressed="true"/>
        <object id="14" compressed="true"/>
        <object id="15" compressed="true"/>
        <object id="16" compressed="true"/>
        <object id="17" compressed="true"/>
        <object id="18" compressed="true"/>
        <object id="19" compressed="true"/>
        <object id="20" compressed="true"/>
        <object id="21" compressed="false"/>
        <object id="22" compressed="true"/>
        <object id="23" compressed="true"/>
        <object id="24" compressed="true"/>
        <object id="25" compressed="true"/>
        <object id="26" compressed="false"/>
        <object id="27" compressed="false"/>
        <object id="28" compressed="false"/>
        <object id="29" compressed="false"/>
        <object id="30" compressed="false"/>
      </objects>
      <streams num="5">
        <stream id="5" xref_stream="false" object_stream="false" encoded="true"/>
        <stream id="21" xref_stream="false" object_stream="true" encoded="true"/>
        <stream id="27" xref_stream="false" object_stream="false" encoded="true"/>
        <stream id="28" xref_stream="false" object_stream="false" encoded="false"/>
        <stream id="30" xref_stream="true" object_stream="false" encoded="true"/>
      </streams>
      <js_objects/>
      <suspicious_elements>
        <triggers>
          <trigger name="/Names">
            <container_object id="13"/>
          </trigger>
        </triggers>
      </suspicious_elements>
      <suspicious_urls/>
    </version>
    <version num="1" type="update">
      <catalog object_id="1"/>
      <info object_id="3"/>
      <objects num="0"/>
      <streams num="0"/>
      <js_objects/>
      <suspicious_elements/>
      <suspicious_urls/>
    </version>
  </advanced>
</peepdf_analysis>
@digitalsleuth
Owner

Hi @lesouriciergris, thanks for identifying this. I'll look at this as well.
Cheers!

@digitalsleuth
Owner

@lesouriciergris, I've identified and resolved the issue, but the fix will only be released once the other issue you raised is also resolved.

@lesouriciergris
Author

Great, thanks a lot.

Good luck with the other issue.

@kandji-alex

I believe I've run into the same issue. Thanks for fixing it!

@digitalsleuth
Owner

Hi @kandji-alex, this issue has been identified and is resolved in the upcoming release. I'm currently doing some linting and will release it within the next 24 hours.

Cheers!

@digitalsleuth
Owner

Hi @kandji-alex and @lesouriciergris, these issues are now fixed in the latest release, v4.0.0. Sorry for the delay!

Cheers!

@digitalsleuth
Owner

Haven't received any updates on this. If the issue still exists, please open a new issue.
