Cache dependencies when running queries through FlowAPI #1284

jc-harrison · 2019-09-18T13:38:21Z

I have:

Formatted any Python files with black
Brought the branch up to date with master
Added any relevant Github labels
Added tests for any new additions
Added or updated any relevant documentation
Added an Architectural Decision Record (ADR), if appropriate
Added an MPLv2 License Header if appropriate
Updated the Changelog

Description

This pull request adds a new argument store_dependencies (default False) to the Query.store method. Setting store_dependencies=True will cause all the unstored dependencies of a query to be stored before the query itself is stored.

When running queries through the API the Flowmachine server will, by default, use store(store_dependencies=True) to execute and store a query. This behaviour can be switched off by setting the environment variable FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING=true, in which case only the top-level query will be cached.
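
As a rough illustration of the switch described above, here is a minimal sketch of turning the environment variable into a config value. Only the variable name comes from this PR; `ServerConfigSketch`, its `store_dependencies` field, `config_from_env` and the case-insensitive parsing of "true" are assumptions made for the example, not the actual server code.

```python
import os
from typing import Mapping, NamedTuple


class ServerConfigSketch(NamedTuple):
    # Hypothetical stand-in for the server's config object; the field name is an assumption.
    store_dependencies: bool


def config_from_env(env: Mapping[str, str] = os.environ) -> ServerConfigSketch:
    # Dependency caching stays on unless the variable is set to "true".
    # Treating the value case-insensitively is an assumption of this sketch.
    raw = env.get("FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING", "false")
    disabled = raw.strip().lower() == "true"
    return ServerConfigSketch(store_dependencies=not disabled)
```

With the variable unset, `config_from_env({})` gives `store_dependencies=True`, matching the default behaviour described above.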

A side-effect of this change is that the pre-cache script is no longer required in the docs build, so I have removed it.

The dependency storing is handled by a new method Query._store_dependencies_and_make_sql, which constructs a dependency graph of unstored dependencies and calls store() on each of them in an appropriate order, then waits for all of those to finish before returning the output of self._make_sql. When calling store with store_dependencies=True, _store_dependencies_and_make_sql is passed instead of _make_sql as the ddl_ops_func in write_query_to_cache.
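
To make the ddl_ops_func swap concrete, here is a toy sketch of the idea. It is not the FlowMachine implementation: `ToyQuery`, `write_query_to_cache_sketch` and the `STORED` list are hypothetical stand-ins, plain threads replace the real executor, and simple recursion replaces the dependency graph.

```python
import threading
from concurrent.futures import Future, wait
from typing import Callable, List, Sequence

STORED: List[str] = []  # names of toy queries, in the order they finished storing
_lock = threading.Lock()


def write_query_to_cache_sketch(name: str, ddl_ops_func: Callable[[], List[str]]) -> None:
    # Toy stand-in: run the callable that produces the DDL, then mark the query stored.
    ddl_ops_func()
    with _lock:
        STORED.append(name)


class ToyQuery:
    def __init__(self, name: str, dependencies: Sequence["ToyQuery"] = ()) -> None:
        self.name = name
        self.dependencies = list(dependencies)

    def _make_sql(self) -> List[str]:
        return [f"CREATE TABLE cache_{self.name} AS ..."]

    def _store_dependencies_and_make_sql(self) -> List[str]:
        # Kick off storing of every unstored dependency, wait for all of them,
        # and only then hand back this query's own SQL.
        futures = [
            dep.store(store_dependencies=True)
            for dep in self.dependencies
            if dep.name not in STORED
        ]
        wait(futures)
        return self._make_sql()

    def store(self, store_dependencies: bool = False) -> Future:
        # The only difference between the two modes is which callable is passed on.
        ddl_ops_func = (
            self._store_dependencies_and_make_sql if store_dependencies else self._make_sql
        )
        future: Future = Future()

        def run() -> None:
            try:
                write_query_to_cache_sketch(self.name, ddl_ops_func)
                future.set_result(self.name)
            except Exception as exc:
                future.set_exception(exc)

        threading.Thread(target=run).start()
        return future
```

Because the dependencies are stored from inside the callable, the top-level query's own store() call is already in progress while its dependencies are being written.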

I initially tried handling the dependency storing in an outer function that stores all the dependencies and then calls query.store(). I eventually decided it is better to push this down to run within write_query_to_cache: that way the top-level query is in the "executing" state from the start, so a different thread can't start storing the same query before all the dependencies have started running.

I've added a function store_queries_in_order in utils.py, which takes a dependency graph and stores the queries in an order that ensures each query is stored after its dependencies. I've also added a method Query._unstored_dependencies_graph, which creates a dependency graph of only the unstored dependencies of a query (and excludes unstored dependencies of stored queries). This is a Query method rather than a function in utils.py primarily due to import issues with QueryStateMachine.
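
The ordering idea can be sketched with the standard library alone. This is an illustration under assumptions, not the code in utils.py: the graph is a plain dict mapping each query id to the ids it depends on, the root query itself is left out of the graph, `graphlib` (Python 3.9+) does the topological sort, and both function names carry a `_sketch` suffix to mark them as hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, Mapping, Set


def unstored_dependencies_graph_sketch(
    root: str,
    dependencies: Mapping[str, Iterable[str]],
    is_stored: Callable[[str], bool],
) -> Dict[str, Set[str]]:
    # Walk down from the root, following only unstored queries, so that
    # dependencies of an already-stored query never enter the graph.
    graph: Dict[str, Set[str]] = {}
    to_visit = [d for d in dependencies.get(root, ()) if not is_stored(d)]
    while to_visit:
        query = to_visit.pop()
        if query in graph:
            continue
        unstored = {d for d in dependencies.get(query, ()) if not is_stored(d)}
        graph[query] = unstored
        to_visit.extend(unstored)
    return graph


def store_queries_in_order_sketch(
    graph: Mapping[str, Set[str]], store: Callable[[str], None]
) -> None:
    # static_order() yields every node after all of the nodes it depends on.
    for query in TopologicalSorter(graph).static_order():
        store(query)
```

For example, if "top" depends on "a" and "b", both of which depend on an unstored "c", then "c" is stored before "a" and "b"; an unstored query that is only reachable through an already-stored one is skipped.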

codecov · 2019-09-18T16:28:56Z

Codecov Report

Merging #1284 into master will decrease coverage by 4.69%.
The diff coverage is 50%.

@@            Coverage Diff            @@
##           master    #1284     +/-   ##
=========================================
- Coverage   94.17%   89.47%   -4.7%     
=========================================
  Files         155       18    -137     
  Lines        7499     1587   -5912     
  Branches      697       57    -640     
=========================================
- Hits         7062     1420   -5642     
+ Misses        331      159    -172     
+ Partials      106        8     -98

Flag	Coverage Δ
#flowapi_unit_tests	`82.36% <50%> (-0.19%)`	⬇️
#flowauth_unit_tests	`93.65% <ø> (ø)`	⬆️
#flowclient_unit_tests	`78.78% <ø> (ø)`	⬆️
#flowetl_unit_tests	`?`
#flowkit_jwt_generator_unit_tests	`100% <ø> (ø)`	⬆️
#flowmachine_unit_tests	`?`
#integration_tests	`?`

Impacted Files	Coverage Δ
...it_jwt_generator/flowkit_jwt_generator/fixtures.py	`100% <ø> (ø)`	⬆️
flowapi/flowapi/query_endpoints.py	`79.71% <50%> (-9.85%)`	⬇️
flowapi/flowapi/api_spec.py	`25.53% <0%> (-68.09%)`	⬇️
flowclient/flowclient/client.py	`78.78% <0%> (-16.02%)`	⬇️
flowapi/flowapi/user_model.py	`89.28% <0%> (-5.96%)`	⬇️
...e/server/query_schemas/joined_spatial_aggregate.py
...achine/core/server/query_schemas/daily_location.py
...e/flowmachine/features/raster/raster_statistics.py
...lowmachine/flowmachine/features/spatial/circles.py
... and 124 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f4fe0d9...fceb96e.

greenape

Awesome stuff!

Need to follow this up quite quickly with automatic cache shrinking I think - I'd also like just a little bit in the docs to mention this, because it could be a major trap for the unwary.

flowmachine/flowmachine/core/dummy_query.py

greenape · 2019-09-23T08:51:59Z

flowmachine/flowmachine/core/server/server_config.py

+from typing import NamedTuple
+
+
+class FlowmachineServerConfig(NamedTuple):


Huh. I did not know you could do that!

Co-Authored-By: Jonathan Gray <jonathan.gray@flowminder.org>

jc-harrison added 30 commits August 27, 2019 17:24

Add utility function to store all dependencies in a correct order

091a296

Implement dependency-storing in FlowMachine server

59efea6

Add docstrings

a7292e7

Try removing the pre-cache script

073e608

Change default behaviour to store dependencies

6125207

Fix missing import

2345e42

Merge branch 'master' of github.com:Flowminder/FlowKit into simple-or…

ac6cf42

…chestration

Fix run_tests script

7f5d149

Fix integration tests

b9b27c4

Kill running queries before resetting cache.

57432a0

Don't kill the current connection when killing all running queries

41e9eee

Use correct application name to kill queries.

e43a886

Update CHANGELOG.md

6e06f4b

Add integration tests for content of Flowmachine cache

7b30f7e

Merge branch 'master' of github.com:Flowminder/FlowKit into simple-or…

a4848ce

…chestration

Add is_stored property to DummyQuery

f76d496

Determine DummyQuery is_stored from redis state

4742d23

Add unit tests for unstored_dependencies_graph, store_queries_in_order

90f17f7

Fix query_id property for derived queries

7fab6c5

Merge branch 'master' of github.com:Flowminder/FlowKit into simple-or…

e58f9b3

…chestration

Remove kill-all-queries query for now

022eb5f

Merge branch 'master' of github.com:Flowminder/FlowKit into simple-or…

5107939

…chestration

Add debug message

b4e1f6b

Extend token lifetime again

a08a315

Put docs pre-cache script back in

c5642a8

Take docs pre-cache script back out

f9451e9

More debug messages

2d35ade

Wait for queries to finish before ending tests

0d5f5b4

Remove some debug messages

9780fc7

Actually remove the debug messages

1529d43

jc-harrison added 2 commits September 18, 2019 17:55

Add missing import

3adede4

More logs in docs build

27d8173

greenape mentioned this pull request Sep 19, 2019

Rename md5 attribute query_id #1288

Closed

jc-harrison added 13 commits September 19, 2019 15:19

Merge branch 'master' into simple-orchestration

59fcaf4

Remove unnecessary 'environment' block

11ccc74

s/md5/query_id

e084cc1

Fix test

e627d2d

Wait for FlowMachine and FlowAPI before starting docs build

eeb7e01

Allow store_dependencies=True in ModelResult.to_sql

eac6d0b

Reduce sleep time in Circle config

fcd9e80

Use a namedtuple to pass config options

515e9a3

Type annotations

a81d4fa

More log messages

807acc7

Remove extra argument

f69afd5

Move action request loading back into try clause

2015c32

Merge branch 'master' into simple-orchestration

e17494f

greenape requested changes Sep 23, 2019

jc-harrison and others added 4 commits September 23, 2019 10:09

Update flowmachine/flowmachine/core/dummy_query.py

ff934ab

Co-Authored-By: Jonathan Gray <jonathan.gray@flowminder.org>

Add note to docs

93dd5e2

Add ModelResult test

f0f8a55

Merge branch 'master' into simple_orchestration

b3ec70f

jc-harrison mentioned this pull request Sep 23, 2019

Automatic cache management #1307

Closed

Update ModelResult test

bc82a85

greenape approved these changes Sep 23, 2019

Add server config tests

b598e5f

jc-harrison added the ready-to-merge label Sep 23, 2019

Update test

fceb96e

mergify bot merged commit 956469a into master Sep 23, 2019

mergify bot deleted the simple-orchestration branch September 23, 2019 11:51
