Make sure the URL to capture is actually a URL #32

Closed
Rafiot opened this issue Jul 22, 2023 · 4 comments

Rafiot commented Jul 22, 2023

Hello there,

I would like to report a similar "bug".

I recently discovered that one of my workflows was submitting a bogus URL to pylookyloo: the URL was wrapped in single quotes inside double quotes, and pylookyloo accepted it anyway, leading to a bogus scan.

Example of my bogus workflow:

$ lookyloo --listing --query "'http://google.fr'"
https://lookyloo.circl.lu/tree/18f658c0-48b3-47ae-8b5d-35b3bd4c7fc1

But if you look at the Lookyloo scan, it does not work, as the URL was not correctly formatted.

Hence, could you improve the URL argument parsing in order to prevent bogus results?
For instance, by removing any quotes around the URL, and also by ensuring that the URL is correctly structured (using this lib?)

Cheers, and keep up the good work, CIRCL tools are awesome!

Originally posted by @maaaaz in #4 (comment)

Rafiot commented Jul 22, 2023

@maaaaz I just moved your issue here because the one you were referring to was definitely closed (it was simply to make sure the URL we submit the capture to starts with https://).

In your case, you want to make sure the thing submitted to Lookyloo is a valid URL.

I could add a few checks and strip some characters, but I feel it would quickly become unmanageable: people will pass things to capture in all kinds of weird ways, and telling apart sane but unhelpful encoding (like your quotes) from clearly broken input will be hard.

It is also important to keep in mind that the query passed to the API can be a path on the filesystem, and not necessarily a URL.

It would make more sense to do the preprocessing on your side (as you know the data you're getting) and make sure what you pass to Lookyloo is valid. What I can do on the pylookyloo side is make sure to raise an exception if the query is improperly formatted.
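
As a rough illustration of what that caller-side preprocessing could look like, here is a minimal sketch using only the Python standard library. The helper name `clean_url` is hypothetical and not part of pylookyloo, and it assumes the caller already knows the input is meant to be an http(s) URL (not a filesystem path).

```python
from urllib.parse import urlsplit


def clean_url(raw: str) -> str:
    """Hypothetical helper: strip stray quotes/whitespace, then sanity-check the URL."""
    # Remove surrounding whitespace and any wrapping single or double quotes.
    candidate = raw.strip().strip("'\"").strip()
    parts = urlsplit(candidate)
    # Require an explicit http(s) scheme and a host; anything else is rejected.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a usable URL: {raw!r}")
    return candidate


# The quoted input from the report is reduced to the bare URL.
print(clean_url("'http://google.fr'"))
```

A check this strict would also reject scheme-less input such as "google.fr", so it only fits workflows where a full URL is always expected.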

maaaaz commented Jul 22, 2023

What I can do on the pylookyloo side is make sure to raise an exception if the query is improperly formatted.

Sure, that would be the best fix: have the server return an error when bogus data is sent.

Cheers!

Rafiot commented Jul 24, 2023

Okay, I investigated further, and the fix will be completely server-side, with no validation in the client. The reason is that you can pass "google.fr" and Lookyloo will take care of adding http://, so validating the URL in pylookyloo would raise exceptions for inputs that aren't really a problem.

I'll instead improve the error message and return it to the user.
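
To make the trade-off concrete, the sketch below shows why a strict client-side check would reject "google.fr" even though the input is harmless once a scheme is added. Both function names are hypothetical, and `with_default_scheme` is only an assumed stand-in for the kind of normalization described above, not Lookyloo's actual code.

```python
from urllib.parse import urlsplit


def has_scheme_and_host(query: str) -> bool:
    """Strict check: True only when both a scheme and a host are present."""
    parts = urlsplit(query)
    return bool(parts.scheme and parts.netloc)


def with_default_scheme(query: str) -> str:
    """Hypothetical lenient normalization: prepend http:// when no scheme is given."""
    return query if "://" in query else f"http://{query}"


print(has_scheme_and_host("google.fr"))                       # False
print(with_default_scheme("google.fr"))                       # http://google.fr
print(has_scheme_and_host(with_default_scheme("google.fr")))  # True
```

The quoted input from the original report still fails the strict check after normalization, which is the kind of case where a clear error message is useful.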

Rafiot commented Jul 24, 2023

Improve error message: ail-project/LacusCore@dfeb6d6

Rafiot added a commit to Lookyloo/lookyloo that referenced this issue Jul 24, 2023
Rafiot added a commit to Lookyloo/lookyloo that referenced this issue Jul 24, 2023