Another specialisation for select - select with datasets. #101

Merged: 1 commit, Apr 30, 2020

msm-code
Contributor

Ok, why would anyone need this? This is intended to decrease the time to first
result in mquery. Instead of running one big query, mquery will query datasets
one by one.
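
A rough client-side sketch of the idea, not actual mquery or database code. The `query_dataset` callback, the `should_stop` hook, and the dataset ids are all hypothetical stand-ins; the point is only that results become available after the first dataset is queried, and that the client can stop between datasets.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical callback: runs `query` against ONE dataset and returns matching files.
QueryDataset = Callable[[str, str], List[str]]


def select_one_by_one(
    datasets: Iterable[str],
    query: str,
    query_dataset: QueryDataset,
    should_stop: Callable[[], bool] = lambda: False,
) -> Iterator[str]:
    """Yield matches dataset by dataset instead of waiting for one big query."""
    for dataset_id in datasets:
        # The client decides when to stop (e.g. the user cancelled the query).
        if should_stop():
            return
        yield from query_dataset(dataset_id, query)


# Stand-in for the real database call, for illustration only.
FAKE_DB = {"ds1": ["a.exe"], "ds2": [], "ds3": ["b.dll", "c.bin"]}
queried: List[str] = []


def fake_query_dataset(dataset_id: str, query: str) -> List[str]:
    queried.append(dataset_id)
    return FAKE_DB[dataset_id]


results = select_one_by_one(FAKE_DB, '"some string"', fake_query_dataset)
first = next(results)  # only "ds1" has been queried at this point
```

Because the generator is lazy, the remaining datasets are only queried when the caller asks for more results.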

Now I can hear you asking: WHY would the logic for this be in mquery? Isn't
async querying a pretty generic use-case? I agree, but implementing this
in the database opens a can of worms that I'm not ready to handle, for example:

  • The database can change during the query (solution: persistent snapshots?)
  • The database can't know when it's OK to stop streaming (for example, the
    user cancelled a query, or yara can't keep up).
  • It makes database performance even more opaque: is it idle, or are there
    tasks still busy in the background?

Because of this, I think that querying DSets one by one is a good solution for
now. The proper solution would be something like persistent snapshots +
continuable iterators... But yeah, why use a cannon when a small PR like this
will do the trick?

Bonus points: we can run queries in parallel!
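
A minimal sketch of what the parallel variant could look like on the client side, assuming the same kind of hypothetical per-dataset `query_dataset` callback (not a real mquery or database API) and using a thread pool from the Python standard library:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, List, Tuple


def select_in_parallel(
    datasets: Iterable[str],
    query: str,
    query_dataset: Callable[[str, str], List[str]],
    max_workers: int = 4,
) -> Iterator[Tuple[str, List[str]]]:
    """Query each dataset in its own task and yield results as they finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the dataset it was submitted for.
        futures = {pool.submit(query_dataset, ds, query): ds for ds in datasets}
        for future in as_completed(futures):
            yield futures[future], future.result()


# Stand-in for the real database call, for illustration only.
FAKE_DB = {"ds1": ["a.exe"], "ds2": [], "ds3": ["b.dll", "c.bin"]}


def fake_query_dataset(dataset_id: str, query: str) -> List[str]:
    return FAKE_DB[dataset_id]


matches = dict(select_in_parallel(FAKE_DB, '"some string"', fake_query_dataset))
```

Results arrive in completion order rather than dataset order, so each one is paired with its dataset id.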
msm-code force-pushed the feature/select-with-datasets branch from 01a942d to b813d0a (April 29, 2020 23:31)
msm-code merged commit 374418a into master (Apr 30, 2020)
msm-code deleted the feature/select-with-datasets branch (April 30, 2020 20:08)
msm-code linked an issue that may be closed by this pull request (May 2, 2020)
msm-code mentioned this pull request (May 2, 2020)
Linked issue: Query timeout