Cache dependencies when running queries through FlowAPI #1284
Codecov Report
@@ Coverage Diff @@
## master #1284 +/- ##
=========================================
- Coverage 94.17% 89.47% -4.7%
=========================================
Files 155 18 -137
Lines 7499 1587 -5912
Branches 697 57 -640
=========================================
- Hits 7062 1420 -5642
+ Misses 331 159 -172
+ Partials 106 8 -98
Awesome stuff!
Need to follow this up quite quickly with automatic cache shrinking I think - I'd also like just a little bit in the docs to mention this, because it could be a major trap for the unwary.
from typing import NamedTuple


class FlowmachineServerConfig(NamedTuple):
Huh. I did not know you could do that!
Co-Authored-By: Jonathan Gray <jonathan.gray@flowminder.org>
Closes #1152
I have:
Description
This pull request adds a new argument `store_dependencies` (default `False`) to the `Query.store` method. Setting `store_dependencies=True` will cause all the unstored dependencies of a query to be stored before the query itself is stored.

When running queries through the API the Flowmachine server will, by default, use `store(store_dependencies=True)` to execute and store a query. This behaviour can be switched off by setting the environment variable `FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING=true`, in which case only the top-level query will be cached.

A side-effect of this change is that the pre-cache script is no longer required in the docs build, so I have removed it.
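
For illustration, the environment-variable switch could be read into the server config roughly as follows. This is a minimal sketch and not the Flowmachine implementation: the field name `store_dependencies`, the helper `config_from_env`, and the rule that only the string "true" disables caching are all assumptions. Only the variable name `FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING` and the class name `FlowmachineServerConfig` come from this pull request.

```python
import os
from typing import Mapping, NamedTuple


class FlowmachineServerConfig(NamedTuple):
    # Hypothetical field: whether the server should call
    # store(store_dependencies=True) when running a query.
    store_dependencies: bool = True


def config_from_env(env: Mapping[str, str] = os.environ) -> FlowmachineServerConfig:
    """Build a config from environment variables (illustrative helper)."""
    raw = env.get("FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING", "false")
    # Assumption: only "true" (case-insensitive, surrounding whitespace
    # ignored) disables dependency caching; anything else leaves it on.
    disabled = raw.strip().lower() == "true"
    return FlowmachineServerConfig(store_dependencies=not disabled)
```

With the variable unset, `config_from_env()` returns a config with `store_dependencies=True`, which matches the default behaviour described above.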
The dependency storing is handled by a new method `Query._store_dependencies_and_make_sql`, which constructs a dependency graph of unstored dependencies and calls `store()` on each of them in an appropriate order, then waits for all of those to finish before returning the output of `self._make_sql`. When calling `store` with `store_dependencies=True`, `_store_dependencies_and_make_sql` is passed instead of `_make_sql` as the `ddl_ops_func` in `write_query_to_cache`.

I initially tried handling the dependency storing in an outer function which then calls `query.store()` after storing all the dependencies, but eventually decided pushing this down to run within `write_query_to_cache` is better because this way the top-level query is in "executing" state from the start, so a different thread can't start storing the same query before all the dependencies have started running.

I've added a function `store_queries_in_order` in `utils.py`, which takes a dependency graph and stores the queries in an order that ensures each query is stored after its dependencies. I've also added a method `Query._unstored_dependencies_graph`, which creates a dependency graph of only the unstored dependencies of a query (and excludes unstored dependencies of stored queries). This is a `Query` method rather than a function in `utils.py` primarily due to import issues with `QueryStateMachine`.
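
The graph-building and ordered-storing steps described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the code in this pull request: `ToyQuery` and its `dependencies` and `is_stored` attributes are hypothetical stand-ins for a real `Query`, the graph is a plain dict rather than whatever graph type Flowmachine uses, and storing happens synchronously, whereas the real method starts the stores and then waits for them to finish.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Sequence, Set


class ToyQuery:
    """Hypothetical stand-in for a Flowmachine Query."""

    def __init__(self, name: str, dependencies: Sequence["ToyQuery"] = (),
                 stored: bool = False) -> None:
        self.name = name
        self.dependencies = list(dependencies)
        self.is_stored = stored

    def store(self) -> None:
        # A real query would be written to the cache here.
        self.is_stored = True


def unstored_dependencies_graph(query: ToyQuery) -> Dict[ToyQuery, Set[ToyQuery]]:
    """Map each unstored dependency of `query` to its own unstored dependencies.

    Stored queries are skipped and never descended into, so their unstored
    dependencies are excluded. The top-level query itself is not a node.
    """
    graph: Dict[ToyQuery, Set[ToyQuery]] = {}
    seen = {query}
    stack = [query]
    while stack:
        current = stack.pop()
        unstored = {dep for dep in current.dependencies if not dep.is_stored}
        if current is not query:
            graph[current] = unstored
        for dep in unstored:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return graph


def store_queries_in_order(graph: Dict[ToyQuery, Set[ToyQuery]]) -> List[ToyQuery]:
    """Store every query in `graph` after all of its dependencies."""
    # TopologicalSorter takes a node -> predecessors mapping and yields
    # predecessors first, which is the order needed here.
    order = list(TopologicalSorter(graph).static_order())
    for q in order:
        q.store()
    return order
```

For a top-level query depending on an unstored query `b` (which itself depends on an unstored `a`) and on an already stored query, the graph contains only `a` and `b`, and `a` is stored before `b`.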