rtfobj: bug with regex on Python 3 (unicode instead of bytes) #692

Closed
decalage2 opened this issue Jun 3, 2021 · 0 comments

@decalage2
Owner

Some regexes are not explicitly typed as bytes, so on Python 3 they are unicode (str) patterns. This causes an exception when scanning RTF data, which is bytes. This happens for OLE2Link objects. The same file is processed correctly on Python 2.7 but raises a TypeError on Python 3:

rtfobj 0.60 on Python 2.7.18 - http://decalage.info/python/oletools
===============================================================================
File: 'sample_with_external_link_to_doc.rtf' - size: 50810 bytes
---+----------+---------------------------------------------------------------
id |index     |OLE Object
---+----------+---------------------------------------------------------------
0  |00002A8Fh |format_id: 2 (Embedded)
   |          |class name: 'OLE2Link'
   |          |data size: 2560
   |          |MD5 = 'a8f34530b8f91fc93ef5113f4be1601a'
   |          |CLSID: 88D96A0C-F192-11D4-A65F-0040963251E5
   |          |SAX XML Reader 6.0 (msxml6.dll)
   |          |Possibly an exploit for the OLE2Link vulnerability (VU#921560,
   |          |CVE-2017-0199)
   |          |URL extracted: https://raw.githubusercontent.com/decalage2/olet
   |          |ools/master/tests/test-data/msodde/harmless-clean.doc
---+----------+---------------------------------------------------------------

c:\>py -3 rtfobj.py sample_with_external_link_to_doc.rtf
rtfobj 0.60 on Python 3.9.0 - http://decalage.info/python/oletools
===============================================================================
File: 'sample_with_external_link_to_doc.rtf' - size: 50810 bytes
---+----------+---------------------------------------------------------------
id |index     |OLE Object
---+----------+---------------------------------------------------------------
Traceback (most recent call last):
  File "rtfobj.py", line 1085, in <module>
    main()
  File "rtfobj.py", line 1080, in main
    process_file(container, filename, data, output_dir=options.output_dir,
  File "rtfobj.py", line 927, in process_file
    found_list =  re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}',data)
  File "C:\Program Files\Python39\lib\re.py", line 241, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
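The error can be reproduced outside rtfobj in a few lines. On Python 2.7, r'...' literals are byte strings (str), which is why the same call works there; on Python 3 they are unicode. The sample data below is a made-up stand-in for RTF content, not taken from the sample file:

```python
import re

# Made-up stand-in for RTF content: a 160-char hex run, as bytes
# (like the RTF data scanned by rtfobj on Python 3)
data = b"{\\object " + b"0123456789abcdef" * 10 + b"}"

error_message = None
try:
    # r'...' is a str (unicode) pattern on Python 3, but data is bytes
    re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}', data)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```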

Solution: make the regexes byte strings (e.g. br'...' instead of r'...').
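A minimal sketch of the fix, applied to the pattern from the traceback. Python 2.7 accepts the br'...' prefix but not rb'...', so br keeps the code working on both versions. The constant name HEX_RUN_RE is hypothetical (not the name used in rtfobj), and the sample data is again made up for illustration:

```python
import re

# br'...' is a raw *bytes* literal: same escaping as r'...', but the
# compiled pattern matches bytes. (Python 2.7 accepts br'', not rb''.)
# HEX_RUN_RE is a hypothetical name used only for this example.
HEX_RUN_RE = re.compile(br'[a-fA-F0-9\x0D\x0A]{128,}')

# Made-up stand-in for RTF content, as bytes
data = b"{\\object " + b"0123456789abcdef" * 10 + b"}"

found_list = HEX_RUN_RE.findall(data)
print(len(found_list), len(found_list[0]))  # 1 160
```

Compiling the pattern once also avoids re-parsing it on every call, although re caches compiled patterns internally anyway.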

@decalage2 decalage2 self-assigned this Jun 3, 2021
@decalage2 decalage2 added this to the oletools 0.60 milestone Jun 3, 2021
c-rosenberg pushed a commit to HeinleinSupport/oletools that referenced this issue Dec 2, 2021