compute: use drop_dataflow #18442

Draft

teskje wants to merge 6 commits into main from drop-dataflow

Conversation

@teskje (Contributor) commented Mar 28, 2023

I'm using this PR to track testing of drop_dataflow for dropping dataflows in COMPUTE.

Testing

  • CI passes
    • PR checks
    • Nightlies
  • Cleanup of log events vetted
    • ComputeLog
      • DataflowCurrent (retracted in handle_allow_compaction)
      • DataflowDependency (retracted in handle_allow_compaction)
      • FrontierCurrent (retracted in handle_allow_compaction)
      • ImportFrontierCurrent (fixed by compute: make import frontier logging drop-safe #18531)
      • FrontierDelay (retracted in handle_allow_compaction)
      • PeekCurrent (retracted when peek is served or canceled)
      • PeekDuration (never retracted)
    • TimelyLog
      • Operates (shutdown event emitted in Drop impls for operators and dataflows)
      • Channels (retracted on dataflow shutdown)
      • Elapsed (retracted on operator retraction)
      • Histogram (retracted on operator retraction)
      • Addresses (retracted on operator or channel retraction)
      • Parks (never retracted)
      • MessagesSent (retracted on channel retraction)
      • MessagesReceived (retracted on channel retraction)
      • Reachability (fixed by Drop implementation for Tracker TimelyDataflow/timely-dataflow#517)
    • DifferentialLog
      • ArrangementBatches (DD emits drop events in the Spine Drop impl)
      • ArrangementRecords (DD emits drop events in the Spine Drop impl)
      • Sharing (DD emits retraction events in the TraceAgent Drop impl)
  • No resource leaks or crashes on staging

Motivation

  • This PR adds a known-desirable feature.

Advances #2392.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@teskje (Contributor, Author) commented Mar 30, 2023

For stability testing, I ran the following script against my staging env:

#!/usr/bin/env python3

from threading import Thread

import pg8000.native

def run_loop(sql):
    # Each thread opens its own connection and replays its workload forever.
    conn = pg8000.native.Connection([...])
    while True:
        try:
            for line in sql:
                conn.run(line)
        except Exception as exc:
            print(f"error: {exc}")

q1 = """
SELECT
	l_returnflag,
	l_linestatus,
	sum(l_quantity) AS sum_qty,
	sum(l_extendedprice) AS sum_base_price,
	sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
	avg(l_quantity) AS avg_qty,
	avg(l_extendedprice) AS avg_price,
	avg(l_discount) AS avg_disc,
	count(*) AS count_order
FROM
	lineitem
WHERE
	l_shipdate <= DATE '1998-12-01' - INTERVAL '60' day
GROUP BY
	l_returnflag,
	l_linestatus
ORDER BY
	l_returnflag,
	l_linestatus
"""

workloads = [
    # one-off selects
    [q1],
    # indexes
    [
        f"CREATE VIEW v_indexes AS {q1}",
        "CREATE DEFAULT INDEX on v_indexes",
        "SELECT mz_internal.mz_sleep(10)",
        "DROP VIEW v_indexes",
    ],
    # indexes fast
    [
        f"CREATE VIEW v_indexes_fast AS {q1}",
        "CREATE DEFAULT INDEX on v_indexes_fast",
        "DROP VIEW v_indexes_fast",
    ],
    # MVs
    [
        f"CREATE MATERIALIZED VIEW mv_mvs AS {q1}",
        "SELECT mz_internal.mz_sleep(10)",
        "DROP MATERIALIZED VIEW mv_mvs",
    ],
    # MVs fast
    [
        f"CREATE MATERIALIZED VIEW mv_mvs_fast AS {q1}",
        "DROP MATERIALIZED VIEW mv_mvs_fast",
    ],
    # subscribes
    [
        f"DECLARE c CURSOR FOR SUBSCRIBE ({q1}); FETCH 1 c",
    ],
]

threads = []
for sql in workloads:
    thread = Thread(target=run_loop, args=(sql,))
    thread.start()
    threads.append(thread)

for t in threads:
    t.join()

I let that run against a replica with the following configuration:

  • cpu_limit: 6
  • memory_limit: 48GiB
  • scale: 4
  • workers: 5

(This is a "small" instance but with the "scale" bumped up to ensure we also test multi-process communication.)

After 15 hours, everything looks good! None of the replica pods crashed or produced errors. Resource usage looks stable throughout the entire time frame:
[Screenshot: replica resource usage over the 15-hour test run, 2023-03-30]

After shutting the test down, there don't appear to be any logging leaks: All introspection sources only contain records from system dataflows.

@teskje (Contributor, Author) commented Mar 31, 2023

My initial test on staging had two flaws:

  • The DDL statements seemed to overwhelm environmentd and made it very slow to handle queries, which is why only 1300 dataflows were created and dropped during the 15 hours.
  • SELECT statements set an until frontier, which makes the dataflow shut down immediately after having produced the required snapshot, so drop_dataflow is not really exercised by SELECTs.

Because of these issues, I ran a second test with the following changes:

  • The load generation script runs only SELECTs, no DDL statements, in four threads (see the sketch after this list).
  • The adapter code is patched to not set the until frontier when planning peeks.
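
As a sketch of what that looks like (the exact patched script is not included here), the load generator reduces to the following, reusing the `run_loop` helper and the `q1` query from the script above:

# Four threads, each running only one-off SELECTs of q1 in a loop.
workloads = [[q1]] * 4

threads = [Thread(target=run_loop, args=(sql,)) for sql in workloads]
for t in threads:
    t.start()
for t in threads:
    t.join()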

Apart from that, I also increased the number of replica processes, to improve the odds of surfacing issues in inter-process communication. This is the new replica configuration:

  • cpu_limit: 4
  • memory_limit: 8GiB
  • scale: 16
  • workers: 3

After 18 hours, 123,000 dataflows have been created and dropped. As with the first test, none of the replica pods crashed or produced errors, and resource usage looks stable throughout the entire time frame:

[Screenshot: replica resource usage over the 18-hour test run, 2023-03-31]

(The stray green line is me switching on memory profiling.)

benesch added a commit to benesch/timely-dataflow that referenced this pull request Mar 31, 2023
Between 2673e8c and ff516c8, drop_dataflow appears to be working
well, based on testing in Materialize
(MaterializeInc/materialize#18442). So remove the "public beta" warning
from the docstring.

Fix TimelyDataflow#306.

@teskje force-pushed the drop-dataflow branch 2 times, most recently from f40e78b to a9420f1 on April 4, 2023 at 10:43

This commit moves some code around in preparation for adding support for
active dataflow cancellation. These changes are not required, but they
slightly improve readability.

* Add a `ComputeState` constructor method. This allows us to make some
  of the `ComputeState` fields immediately private.
* Factor out collection dropping code from `handle_allow_compaction`
  into a `drop_collection` method.

This commit patches timely to get drop-safety for reachability log
events (TimelyDataflow/timely-dataflow#517).

We need to revert this before we can merge.

This commit implements active dataflow cancellation in compute by
invoking timely's `drop_dataflow` method when a dataflow is allowed to
compact to the empty frontier.

This commit adds a test verifying that active dataflow cancellation
actually works. It does so by installing a divergent dataflow, dropping
it and then checking the introspection sources to ensure it doesn't
exist anymore.
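
(For illustration only, here is a hedged Python sketch of that test's idea, in the style of the load script above; the actual check is a testdrive test. The `conn` connection, the `divergent` view name, and the `mz_internal.mz_dataflows` introspection relation are assumptions made for this sketch.)

# Hedged sketch; assumes `conn` is a pg8000 connection as in the load script
# above and that installed dataflows are listed in mz_internal.mz_dataflows.

# Install a dataflow that never reaches a fixpoint.
conn.run("""
    CREATE MATERIALIZED VIEW divergent AS
    WITH MUTUALLY RECURSIVE flip (x INTEGER) AS (
        VALUES (1) EXCEPT ALL SELECT * FROM flip
    )
    SELECT * FROM flip
""")

# Drop it. With active dataflow cancellation enabled, the dataflow should be
# torn down promptly instead of spinning forever.
conn.run("DROP MATERIALIZED VIEW divergent")

# The dropped dataflow should no longer appear in introspection. (A robust
# check would retry this query, since the teardown is asynchronous.)
rows = conn.run(
    "SELECT name FROM mz_internal.mz_dataflows WHERE name LIKE '%divergent%'"
)
assert not rows, f"dataflow still present: {rows}"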
@philip-stoev (Contributor) commented:

I would like to stress test this with the RQG. @teskje, please let me know when the time is right to do this.

@teskje (Contributor, Author) commented Apr 5, 2023

@philip-stoev The time is right now, please go ahead! Note that you have to set the active_dataflow_cancellation feature flag to enable this feature.
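
For example, assuming the flag is exposed as a system variable under that name and that your session is privileged enough to set it (the exact variable name and mechanism may differ), enabling it could look like:

# Assumption only: the flag is settable via ALTER SYSTEM by a privileged
# user; the exact variable name and mechanism may differ.
conn.run("ALTER SYSTEM SET active_dataflow_cancellation = true")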

@philip-stoev (Contributor) commented:

Item no. 1: this situation does not produce a cancellation:

  1. Run:

     CREATE TABLE t1 (f1 INTEGER);
     INSERT INTO t1 WITH MUTUALLY RECURSIVE flip(x INTEGER) AS (VALUES(1) EXCEPT ALL SELECT * FROM flip) SELECT * FROM flip;

  2. Kill the psql client from the outside using killall.

@philip-stoev (Contributor) left a review comment:

I pushed extensions to your testdrive test; please take them along for the ride to main.

As a separate effort, to complement your test, I created a stress test around a divergent WMR dataflow plus INSERT ... SELECT statements and SUBSCRIBE cursors. Unfortunately:

  • INSERT ... SELECTs run only one statement at a time
  • cursors cause CRDB to just consume CPU like crazy
  • there must be some other bottleneck in the adapter because even SET queries are slow under load.

A size 8-8 cluster was used.

So I do not think I was able to drive as much concurrency as one would want. Either way, there were dozens of active subscribe dataflows in the system (and all of them would be subject to forcible cancellation). There were no panics, deadlocks, or obvious memory leaks.
