Fails to parse this url correctly #40

Closed
Ben-Steele opened this issue Mar 11, 2020 · 3 comments
Comments

@Ben-Steele

The url is:
https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>

The trailing > is always stripped off the URL even though it is part of it. When I call extract_iocs I get:
https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip

I can give the real url that I discovered this issue with, but it is malicious so I didn't want to include it here.
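
To illustrate the behavior described above, here is a minimal sketch of an extractor that strips trailing punctuation from each match. It is a simplified stand-in, not iocextract's actual pattern or code: the names `URL_RE`, `TRAILING_PUNCT`, `extract_urls_sketch`, and `keep_trailing_punct` are all hypothetical.

```python
import re

# Hypothetical, simplified pattern; NOT the regex iocextract uses.
URL_RE = re.compile(r"https?://\S+")

# Assumed set of characters treated as sentence punctuation at the end of a URL.
TRAILING_PUNCT = ".,;:!?)>]}'\""


def extract_urls_sketch(text, keep_trailing_punct=False):
    """Yield URL-like tokens, stripping trailing punctuation unless told not to."""
    for match in URL_RE.finditer(text):
        url = match.group(0)
        if not keep_trailing_punct:
            # This is the step that drops a final '>' even when it belongs to the URL.
            url = url.rstrip(TRAILING_PUNCT)
        yield url


sample = "https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>"
stripped = list(extract_urls_sketch(sample))
kept = list(extract_urls_sketch(sample, keep_trailing_punct=True))
print(stripped[0])
print(kept[0])
```

With stripping enabled the final `>` is lost; with it disabled the token comes back unchanged.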

@Ben-Steele
Author

^ This is not a valid URL, but some applications will URL-encode it and follow the link.
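
As a rough illustration of what URL-encoding does to this string, the sketch below uses Python's standard `urllib.parse.quote`. The `safe` set is an assumption chosen to leave the usual URL delimiters intact; real applications may encode differently.

```python
from urllib.parse import quote

url = "https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>"

# Assumed safe set: keep scheme, path, and query delimiters unescaped.
# '<' and '>' are not in it, so they are percent-encoded.
encoded = quote(url, safe=":/?=&~-._")
print(encoded)
```

Under that assumption, `<` becomes `%3C` and the trailing `>` becomes `%3E`, which is why the final character matters to the request that is eventually sent.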

@cmmorrow cmmorrow self-assigned this May 22, 2020
@cmmorrow cmmorrow added this to To do in Issues via automation May 22, 2020
@battleoverflow battleoverflow added the bug Something isn't working label Jan 6, 2023
@battleoverflow
Contributor

Hi, @Ben-Steele!

Control over end-punctuation stripping is now implemented.

If you are using iocextract as a library, you can remove the punctuation restriction like this:

import iocextract

def rm_punctuation():
    # open_punc=True removes the end-punctuation restriction, so the trailing '>' is kept
    for url in iocextract.extract_urls("https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>", refang=True, open_punc=True):
        print(url)

rm_punctuation()

If you're using it as a CLI, this command will do the same thing:

iocextract --input urls.txt --extract-urls --open

A new version is not available yet on PyPI. I will post another comment here once a new version is available for download.

@battleoverflow
Contributor

The new PyPI package is now available!

PyPI: https://pypi.org/project/iocextract/1.13.8/
GitHub Releases: https://github.com/InQuest/python-iocextract/releases/tag/v1.13.8
