Crash when scanning text files containing a 0 byte in it #155

Closed
1 task done
agateau-gg opened this issue Dec 16, 2021 · 8 comments · Fixed by #202
Labels: status:confirmed, type:bug

Comments

@agateau-gg
Collaborator

GitGuardian Shield Version

  • I can reproduce this bug in the latest version

Command executed

ggshield scan path ThirdPartyNotices.rtf

Describe the bug

ggshield crashes when scanning the attached file (attached as zip because GitHub does not support rtf files). This is because the final byte of the file is a 0, as can be seen on this hex dump:

$ tail -n 4 ThirdPartyNotices.rtf | xxd
00000000: 3d3d 3d3d 3d3d 3d3d 3d3d 3d3d 3d3d 3d3d  ================
00000010: 3d3d 3d3d 3d3d 3d3d 3d3d 3d3d 3d3d 3d3d  ================
00000020: 3d3d 3d3d 3d3d 3d3d 3d5c 7061 720d 0a45  =========\par..E
00000030: 4e44 204f 4620 5c63 6170 7320 2e4e 4554  ND OF \caps .NET
00000040: 2043 6f6d 7069 6c65 7220 506c 6174 666f   Compiler Platfo
00000050: 726d 5c63 6170 7330 2020 4e4f 5449 4345  rm\caps0  NOTICE
00000060: 5320 414e 4420 494e 464f 524d 4154 494f  S AND INFORMATIO
00000070: 4e5c 7061 720d 0a7d 0d0a 00              N\par..}...
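
The same check can be done without xxd. This is a minimal sketch, assuming the data fits in memory; `null_byte_offsets` is a hypothetical helper name and is not part of ggshield. The sample bytes are the final 11 bytes of the hex dump above.

```python
from typing import List


def null_byte_offsets(data: bytes) -> List[int]:
    """Return the offset of every 0 byte in data."""
    return [offset for offset, byte in enumerate(data) if byte == 0]


# Last bytes of the dump: "N\par", CR LF, "}", CR LF, then a single 0 byte.
tail = bytes.fromhex("4e5c7061720d0a7d0d0a00")
offsets = null_byte_offsets(tail)
print(offsets)  # [10]
```

For a real file, read it in binary mode (`open(path, "rb")`) and pass the bytes to the helper.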

Expected behavior

ggshield should either:

  • scan the file without failing
  • provide a clear error message about the problem
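
One possible reading of these two options, as a rough sketch only. The function names, the `fail_on_error` flag, and the message wording are all assumptions made for illustration; none of them come from ggshield or py-gitguardian.

```python
from typing import Optional


def unscannable_reason(filename: str, content: str) -> Optional[str]:
    """Return None if content looks scannable, else a readable explanation."""
    offset = content.find("\x00")
    if offset == -1:
        return None
    return f"{filename}: null character at offset {offset}, file cannot be scanned as text"


def scan_or_report(filename: str, content: str, fail_on_error: bool = False) -> bool:
    """Return True if the file should be scanned, False if it was skipped.

    With fail_on_error=True, raise ValueError carrying the readable
    explanation instead of skipping.
    """
    reason = unscannable_reason(filename, content)
    if reason is None:
        return True
    if fail_on_error:
        raise ValueError(reason)
    # Hypothetical behaviour: warn and carry on with the other files.
    print(f"Skipping {reason}")
    return False
```

Either branch avoids surfacing the raw `{0: {'document': [...]}}` validation dictionary to the user.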

Traceback (if available)

Traceback (most recent call last):
  File "/home/agateau/src/ggshield/ggshield/dev_scan.py", line 168, in path_cmd
    results = files.scan(
  File "/home/agateau/src/ggshield/ggshield/scan/scannable.py", line 139, in scan
    scan = future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/agateau/src/py-gitguardian/pygitguardian/client.py", line 279, in multi_content_scan
    request_obj = Document.SCHEMA.load(documents, many=True)
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/marshmallow/schema.py", line 719, in load
    return self._do_load(
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/marshmallow/schema.py", line 904, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {0: {'document': ['document has null characters']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/agateau/src/ggshield/.venv/bin/ggshield", line 33, in <module>
    sys.exit(load_entry_point('ggshield', 'console_scripts', 'ggshield')())
  File "/home/agateau/src/ggshield/ggshield/cmd.py", line 229, in cli_wrapper
    return_code: int = cli.main(standalone_mode=False)
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/agateau/src/ggshield/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/agateau/src/ggshield/ggshield/dev_scan.py", line 181, in path_cmd
    return handle_exception(error, config.verbose)
  File "/home/agateau/src/ggshield/ggshield/utils.py", line 275, in handle_exception
    raise click.ClickException(str(e))
click.exceptions.ClickException: {0: {'document': ['document has null characters']}}
agateau-gg added the type:bug, status:new and status:confirmed labels and removed the status:new label on Dec 16, 2021
agateau-gg changed the title from "Crash when scanning text files with 0 bytes in it" to "Crash when scanning text files containing a 0 byte in it" on Mar 8, 2022
@jhult

jhult commented Mar 31, 2022

I recently bumped into this. Any news on a fix?

@agateau-gg
Collaborator Author

Hi Jonathan,

We don't have anyone working on this bug right now. I am going to propose it for our next sprint.

@agateau-gg
Collaborator Author

@jhult I am looking into this. Can you share more information about the file that caused the failure? What type was it? Was it corrupted? How long was it?

agateau-gg added a commit that referenced this issue Apr 12, 2022
Extract the decoding logic used when reading files from Docker images
and reuse it when scanning files from the file system.
agateau-gg added a commit that referenced this issue Apr 15, 2022
Extract the decoding logic used when reading files from Docker images
and reuse it when scanning files from the file system.
agateau-gg added a commit that referenced this issue Apr 15, 2022
Extract the decoding logic used when reading files from Docker images
and reuse it when scanning files from the file system.
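
For readers wondering what a shared decoding step could look like, here is a minimal sketch. It is not the code from the commits above: the function names are hypothetical, and both the UTF-8-only decoding and the rule "a 0 byte means the file is binary" are assumptions made for the example.

```python
from typing import Optional


def decode_for_scan(raw: bytes) -> Optional[str]:
    """Decode raw bytes to text, or return None if the file should be skipped.

    Assumption: a 0 byte or invalid UTF-8 means the content is binary.
    """
    if b"\x00" in raw:
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def read_file_for_scan(path: str) -> Optional[str]:
    """File-system entry point that reuses the shared decoder."""
    with open(path, "rb") as f:
        return decode_for_scan(f.read())
```

A Docker layer reader could call `decode_for_scan` on each extracted member in the same way, so both paths agree on what counts as text. Note that this simple rule also rejects UTF-16 files, which legitimately contain 0 bytes; a real implementation would need encoding detection to handle them.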
@cyprianbergoniatmo

when will this be released into a version?

@agateau-gg
Collaborator Author

> when will this be released into a version?

We have a few issues to iron out in main before we release. Hopefully next week.

@rgajason
Contributor

How's that release coming? We need this fix as well.

@agateau-gg
Collaborator Author

Sorry for the delay, that one has been more complicated to wrap up. We plan to release it next Monday.

@agateau-gg
Collaborator Author

@rgajason ggshield 1.12.0 has just been released.
