Nuclei parser: UnicodeDecodeError on special url characters (%c0) #9201

Tlafay1 · 2023-12-20T14:28:36Z

Bug description
When importing a nuclei scan whose matched-at field contains a percent-encoded byte that is not valid UTF-8 (for example %c0), a UnicodeDecodeError is thrown (see stacktrace below). To my understanding, this happens because hyperlink.parse (dojo/models.py:2543) percent-decodes the URL and tries to interpret the resulting byte as a UTF-8 character, when the sequence should just be treated as normal characters.
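
The decoding failure described above can be reproduced without DefectDojo or hyperlink. The following is a minimal sketch using only the Python standard library; `strict_percent_decode` is a hypothetical helper written for illustration, not a function from either project. It mirrors the last step of the stacktrace, where the percent-decoded bytes are decoded as UTF-8.

```python
from urllib.parse import unquote_to_bytes


def strict_percent_decode(segment: str) -> str:
    """Percent-decode a URL segment and decode the result strictly as UTF-8.

    Hypothetical helper: it imitates the strict decoding step that fails
    in the stacktrace, using only the standard library.
    """
    raw = unquote_to_bytes(segment)  # "%c0" becomes the single byte b"\xc0"
    return raw.decode("utf-8")       # raises UnicodeDecodeError on invalid UTF-8


if __name__ == "__main__":
    # A valid UTF-8 sequence decodes without trouble.
    print(strict_percent_decode("%c3%a9"))

    # 0xc0 can never start a valid UTF-8 sequence, so decoding fails.
    try:
        strict_percent_decode("%c0")
    except UnicodeDecodeError as exc:
        print(f"decode failed: {exc}")
```

The byte 0xc0 is never valid in UTF-8, which is why appending /%c0 to any URL is enough to trigger the error.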

Steps to reproduce
Steps to reproduce the behavior:

1. Run a scan using nuclei: nuclei -target scanme.nmap.org -json-export /tmp/nuclei-poc.json.
2. In the json report, modify any matched-at field by appending /%c0 at the end of the existing url.
3. Import the json in any engagement.
4. Notice the import failure, with an error message similar to the stacktrace below.
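
The manual edit in the second step can also be scripted. This is a rough sketch that assumes the exported report is a JSON array of finding objects, each with a matched-at key; that layout is an assumption about the report, and both function names are hypothetical.

```python
import json


def append_invalid_escape(findings, suffix="/%c0"):
    """Return a copy of the findings with `suffix` appended to the first matched-at URL.

    Assumes `findings` is a list of dicts, as loaded from the JSON report.
    """
    patched = [dict(finding) for finding in findings]
    for finding in patched:
        if "matched-at" in finding:
            finding["matched-at"] += suffix
            break  # one modified finding is enough to trigger the error
    return patched


def patch_report(path):
    """Rewrite the report at `path` in place with one modified matched-at field."""
    with open(path, encoding="utf-8") as handle:
        findings = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(append_invalid_escape(findings), handle, indent=2)


if __name__ == "__main__":
    sample = [{"template-id": "demo", "matched-at": "https://example.com"}]
    print(append_invalid_escape(sample)[0]["matched-at"])
```

Calling patch_report("/tmp/nuclei-poc.json") after the scan would then produce a file ready for the import step.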

Expected behavior
Import is successful

Deployment method

Docker Compose
Kubernetes
GoDojo

Environment information

Operating System: [Linux kali 6.5.0]
DefectDojo version (see footer) or commit message: [DefectDojo/release/2.29.3]

Logs
I removed the code from the try/except block in importer.py to backtrack the issue. I also purposefully removed a second error since fixing this one fixes everything:

----------------------------------
url: https://example.com/%c0
----------------------------------

[20/Dec/2023 13:35:22] ERROR [dojo.engagement.views:983] 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
Traceback (most recent call last):
  File "/app/dojo/engagement/views.py", line 945, in post
    test, finding_count, closed_finding_count, _ = importer.import_scan(
                                                   ^^^^^^^^^^^^^^^^^^^^^
  File "/app/dojo/importers/importer/importer.py", line 456, in import_scan
    parsed_findings = parser.get_findings(scan, test)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/dojo/tools/nuclei/parser.py", line 63, in get_findings
    endpoint = Endpoint.from_uri(matched)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/dojo/models.py", line 2546, in from_uri
    url = hyperlink.parse(url=uri)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/hyperlink/_url.py", line 2447, in parse
    dec_url = DecodedURL(enc_url, lazy=lazy)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/hyperlink/_url.py", line 2046, in __init__
    self.host, self.userinfo, self.path, self.query, self.fragment
                              ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/hyperlink/_url.py", line 2177, in path
    [
  File "/usr/local/lib/python3.11/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
    _percent_decode(p, raise_subencoding_exc=True)
  File "/usr/local/lib/python3.11/site-packages/hyperlink/_url.py", line 766, in _percent_decode
    return unquoted_bytes.decode(subencoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
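
One possible way to tolerate such URLs is to fall back to the raw, still percent-encoded path when strict decoding fails. The sketch below illustrates that idea with the standard library only; it is not the change made in the linked PR, and `decode_path_tolerantly` is a hypothetical name.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def decode_path_tolerantly(uri: str) -> str:
    """Return the decoded path of `uri`, or the raw path if it is not valid UTF-8.

    Hypothetical helper: keeps escapes such as %c0 as literal characters
    instead of raising UnicodeDecodeError.
    """
    path = urlsplit(uri).path
    try:
        return unquote_to_bytes(path).decode("utf-8")
    except UnicodeDecodeError:
        # Leave the percent-encoded form untouched rather than failing the import.
        return path


if __name__ == "__main__":
    print(decode_path_tolerantly("https://example.com/a%20b"))
    print(decode_path_tolerantly("https://example.com/%c0"))
```

The trade-off is that paths are returned in two forms (decoded or raw), so callers comparing endpoints would need to account for that.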


Tlafay1 · 2023-12-20T14:31:12Z

If needed, I can fix this in a pull request relatively soon, as I understand the root cause and could find a fix fairly quickly.

manuel-sommer · 2023-12-20T17:45:14Z

@Tlafay1 could you please provide me a sample output? I will make a PR.

manuel-sommer · 2023-12-20T20:47:14Z

See PR @Tlafay1

Tlafay1 · 2023-12-21T07:45:56Z

> @Tlafay1 could you please provide me a sample output? I will make a PR.

I'm not sure what you mean by sample output. Are you talking about a nuclei scan that introduces the bug?

manuel-sommer · 2023-12-21T07:51:56Z

> > @Tlafay1 could you please provide me a sample output? I will make a PR.

> I'm not sure what you mean by sample output, are you talking about a nuclei scan that introduces the bug ?

Yes, I was talking about a scan that introduces the bug, but I was already able to reproduce it, see PR.


manuel-sommer · 2023-12-22T17:49:01Z

This can be closed.

Tlafay1 added the bug label Dec 20, 2023

manuel-sommer added a commit to manuel-sommer/django-DefectDojo that referenced this issue Dec 20, 2023

🐛 fix issue DefectDojo#9201

08107a5

manuel-sommer mentioned this issue Dec 20, 2023

🐛 fix issue #9201 #9202

Merged

Maffooch pushed a commit that referenced this issue Dec 22, 2023

🐛 fix issue #9201 (#9202)

f49910d

* 🐛 fix issue #9201 * flake8

mtesauro closed this as completed Dec 22, 2023
