Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to disallow slow queries (or warn user about the problem) #315

Merged
merged 8 commits into from
Jan 10, 2023

Conversation

msm-code
Copy link
Contributor

Your checklist for this pull request

  • I've read the contributing guideline.
  • I've tested my changes by building and running mquery, and testing changed functionality (if applicable)
  • I've added automated tests for my change (if applicable, optional)
  • I've updated documentation to reflect my change (if applicable) - not yet, I'll add a separate issue for documentation update

What is the current behaviour?

#308

Currently, people who don't understand how mquery and mwdb work behind the scene can execute very bad yara queries and it'll be (a) expensive (b) inefficient (c) take forever to run.

What is the new behaviour?

image

If configuration option query_disallow_degenerate is set to "true", users won't be able to run any queries that would end up doing a whole backend scan. Of course they can still run very inefficient queries like "aaa" | "bbb" | ccc" | "ddd" or just {00 00 00}, so this won't help in every case, but that's all we can do at this stage (others will be handled by #297).

I know this is not exactly what #308 asked for. I wonder if anyone really wants to use the "soft" mode - warn only, don't reject the query. If not, I think it's good as just a configuration option.

Closing issues

fixes #308

"Some of the rules would require a Yara scan of every indexed "
"file, and this is not allowed by this instance. "
f"Problematic rules: {degenerate_rule_names}. "
f"Read {doc_url} for more details."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link is not clickable, I guess it should :)

Copy link
Contributor Author

@msm-code msm-code Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not possible with an error message (without severe hacks, like putting HTML code into a Python exception, or - even worse - parsing the error message on the frontend).

I guess I'll do it the harder way, and put the related alert logic into the frontend. This will require some changes, so I'll address this a bit later (in this PR).

@ITAYC0HEN
Copy link
Collaborator

I wonder if anyone really wants to use the "soft" mode - warn only

Sometimes users really want to just list all the files that has a certain imphash value, and it's very important for their research. (i hope this falls under "degenerate query")

pe.imphash() == "b8bb385806b89680e13fc0cf24f4431e"

On other times, users don't really understand the problem.

Combining warn-only mode, and #297 will make sure that even if the user really wants to proceed, they can and it will not be too bad for the system (thanks to the file-limit)

@@ -335,11 +335,28 @@ def query(
rule_author=rule.author,
is_global=rule.is_global,
is_private=rule.is_private,
is_degenerate=rule.parse().is_degenerate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably call rule.parse() just once, somewhere above? In case it gets expensive in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance hit is negliglible, and even if it wasn't, it's already cached:

    def parse(self) -> UrsaExpression:
        if self.__parsed is None:
            self.__parsed = self.__parse_internal()
        return self.__parsed

I worry more about readability. There's no way to assign it to a temporary variable in a list comprehension, but I can rewrite this all to an explicit for loop if you prefer 🤔 It would look like this:

response_rules = []
for rule in rules:
    parsed_rule = rule.parse()
    response_rules.append(
        ParseResponseSchema(
            rule_name=rule.name,
            rule_author=rule.author,
            is_global=rule.is_global,
            is_private=rule.is_private,
            is_degenerate=parsed_rule.is_degenerate,
            parsed=parsed_rule.query,
        )
    )
return response_rules

I prefer the current version, but I'm not overly attached to it if you prefer this one.

src/app.py Outdated Show resolved Hide resolved
Co-authored-by: Michał Praszmo <michalpr@cert.pl>
@msm-code
Copy link
Contributor Author

(i hope this falls under "degenerate query")

It does :). No viable ngrams == "degenerate".

Combining warn-only mode, and #297 will make sure that even if the user really wants to proceed, they can and it will not be too bad for the system (thanks to the file-limit)

ACK, thanks for input. I'll rework the code then to just warn the user in the 'non-enforcing' mode.

@ITAYC0HEN
Copy link
Collaborator

So 3 modes? Allow\Ignore, Warn, Prevent\Block?

@ITAYC0HEN
Copy link
Collaborator

ITAYC0HEN commented Dec 30, 2022

it can also be nice if file-limit can be configured to work:

  1. always
  2. only in warn and block modes for degenerate queries
  3. only in block mode for degenerate queries
  4. never

@msm-code
Copy link
Contributor Author

So 3 modes? Allow\Ignore, Warn, Prevent\Block?

I was considering it, but I think just two modes - Warn and Block are enough. It will be very easy to bypass the warning, and even when we trust users absolutely it may be nice to get an information that rule is inefficient (this can for example happen when copy-pasting rules from the internet - sometimes they look good but have something like or pe.imphash == '1234' that ruins it for mquery. I have a WIP docs page about writing good yara rules).

So in short, I don't see the application for Always mode, but I'm not fundamentally against it if anyone needs it.

it can also be nice if file-limit can be configured to work:

Haven't thought of that yet. I don't immediately like the idea, because:

But this is probably discussion for #297

@msm-code
Copy link
Contributor Author

msm-code commented Jan 5, 2023

@nazywam @ITAYC0HEN can you re-review?

Changes since the last time:

  • I've changed query_disallow_degenerate to query_allow_slow. Nitpicker in me doesn't like the new name (slow queries are still possible, just harder), but it's probably easier to understand
  • If that config key is set to "false" or empty, database will reject the query outright, and link the docs (I've updated them too)
  • If that config is set to "true", database will ask the user if they're sure, the query button will change to scary red, and the search will start if the user clicks it again:

image

Also I've sneakily fixed broken CSS in the dropdowns.

Things I didn't change:

  • Link is still not clickable. After tackling that for a while, I think this would require quite a few changes in the error handling on both server and client (at least if we want to do it correctly). I don't feel it's important enough improvement to justify the number of changes (link is copy&pasteable so the user can easily copy it). Maybe we can make it a (low priority?) issue.
  • As per my previous message, there's no "Allow/Ignore" mode, only "Warn" and "Prevent/Block"

@msm-code msm-code changed the title Feature/disallow degenerate queries Make it possible to disallow slow queries (and warn the user about the problem) Jan 5, 2023
@msm-code msm-code changed the title Make it possible to disallow slow queries (and warn the user about the problem) Make it possible to disallow slow queries (or warn user about the problem) Jan 5, 2023
src/app.py Show resolved Hide resolved
@ITAYC0HEN
Copy link
Collaborator

LGTM overall. Didn't check, but I hope that this feature works smartly with API requests as well, in some intuitive way.
Or is it?

@msm-code
Copy link
Contributor Author

Didn't check, but I hope that this feature works smartly with API requests as well, in some intuitive way. Or is it?

Don't know about smartly, but it works:

  • The /api/query/ API has one new optional parameter - force_slow_queries (default is false). It works as expected - without it slow queries will be rejected. With the parameter set to true slow queries will be accepted (if mquery's configuration allows it).
  • One caveat is that mquery error responses are not particularly API-friendly - just a JSON with an error message (no error code or something similar). So the API client doesn't know (without parsing the error message) what is the error cause (slow yara rule, syntax error, real internal server error, etc).

@yankovs yankovs mentioned this pull request Jan 10, 2023
4 tasks
@msm-code msm-code merged commit 9b90874 into master Jan 10, 2023
@msm-code msm-code deleted the feature/disallow-degenerate-queries branch January 10, 2023 23:33
@msm-code msm-code mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Advise the user that the Yara query they're about to execute is inefficient
4 participants