extract_unencoded_url is too greedy when parsing Windows command lines #53

Closed
0x4d4c opened this issue Dec 8, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@0x4d4c

0x4d4c commented Dec 8, 2022

I'm parsing input that contains examples of PowerShell or cmd.exe command lines. When a slash-prefixed command flag follows a URL, the flag is included in the extracted URL.

Here is an example:

list(iocextract.extract_unencoded_urls("command.exe https://pypi.org/project/iocextract/ /f"))
  # => ['https://pypi.org/project/iocextract/ /f']

The trailing /f should not be included in the extracted URL.
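One possible user-side workaround is to cut each extracted match at its first whitespace character, because a plain URL cannot contain an unescaped space. The sketch below uses only Python's re module. The helper name trim_at_whitespace is hypothetical and not part of iocextract, and it assumes you never need matches that legitimately span whitespace.

```python
import re


def trim_at_whitespace(url: str) -> str:
    """Keep only the text before the first whitespace character.

    Hypothetical helper (not part of iocextract): anything after the
    first whitespace is treated as separate command-line arguments.
    """
    return re.split(r"\s", url, maxsplit=1)[0]


matches = ["https://pypi.org/project/iocextract/ /f"]
print([trim_at_whitespace(m) for m in matches])
# ['https://pypi.org/project/iocextract/']
```

The trade-off is that any match that really does span whitespace gets truncated too, so this is a stopgap rather than a fix in the extractor itself.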

@DragonistYJ

DragonistYJ commented Dec 8, 2022 via email

@battleoverflow battleoverflow self-assigned this Dec 8, 2022
@battleoverflow battleoverflow added the bug Something isn't working label Dec 8, 2022
@battleoverflow
Contributor

Hi, @0x4d4c!

I think I was able to fix the issue in a way that shouldn't disrupt normal extraction. I added a new regex to the logic behind the strip parameter; you can see an example of my solution below. Since most URLs do not contain whitespace, the new code removes anything that matches the pattern whitespace + slash or backslash + character, so something like https://example.com/f should still work.
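A minimal sketch of that kind of stripping, using Python's re module. The pattern and the strip_trailing_flags name are assumptions for illustration, not the regex iocextract actually ships: it removes everything from the first run of whitespace that is followed by a slash or backslash and at least one more non-whitespace character.

```python
import re

# Hypothetical pattern (not iocextract's actual regex): whitespace, then a
# slash or backslash, then a non-whitespace character, then the rest of
# the string (DOTALL lets ".*" run across newlines as well).
TRAILING_FLAGS = re.compile(r"\s+[/\\]\S.*", re.DOTALL)


def strip_trailing_flags(url: str) -> str:
    """Remove slash- or backslash-prefixed flags that follow a URL."""
    return TRAILING_FLAGS.sub("", url, count=1)


print(strip_trailing_flags("https://pypi.org/project/iocextract/ /f /n"))
# https://pypi.org/project/iocextract/
print(strip_trailing_flags("https://example.com/f"))
# https://example.com/f
```

Because the pattern requires whitespace before the slash, a URL that contains no whitespace never matches, which is why https://example.com/f comes back unchanged.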

If you run into any issues, feel free to let me know. I'll ping you when a new version is available from PyPI so you can test out this new addition.

Example:

import iocextract

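def locate_url():
    # Note: in a regular (non-raw) string literal, "\a" is the BEL control
    # character and "\s" is an invalid escape (a SyntaxWarning on Python 3.12+).
    data = "command.exe https://pypi.org/project/iocextract/ /f /n /a \s ///xhh /no \\\\f /d \a"
    return list(iocextract.extract_unencoded_urls(data, strip=True))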

print(locate_url()) # => ['https://pypi.org/project/iocextract/']

I'll close this issue as soon as the new release is out.

@battleoverflow battleoverflow mentioned this issue Dec 8, 2022
@battleoverflow
Contributor

You can download the new version from PyPI now.

New release: https://pypi.org/project/iocextract/1.13.2/

@0x4d4c
Author

0x4d4c commented Dec 13, 2022

Wow, that was blazing fast! I tested the new release from PyPI and my sample files are processed correctly now. Thank you very much!

3 participants