Skip to content

coord: Parallelize builtin table writes with side effects#23551

Merged
ParkMyCar merged 6 commits intoMaterializeInc:mainfrom
ParkMyCar:coord/ddl-parallelize-a-few-things
Dec 5, 2023
Merged

coord: Parallelize builtin table writes with side effects#23551
ParkMyCar merged 6 commits intoMaterializeInc:mainfrom
ParkMyCar:coord/ddl-parallelize-a-few-things

Conversation

@ParkMyCar
Copy link
Copy Markdown
Contributor

@ParkMyCar ParkMyCar commented Nov 29, 2023

This PR improves the latency of creation DDL, e.g. CREATE TABLE, by waiting for builtin table updates to complete concurrently with any side effects, e.g. creating a storage collection.

Part of this change which might be particularly controversial, is refactoring the handling of group commits, and removing the notion of blocking. Now it's up to the caller as to whether or not they want to block on a group commit finishing, by .await-ing a returned Future. This change makes it possible to wait on builtin table updates concurrently with other tasks. Specifically we no longer call self.group_commit_apply(...) directly in group_commit_initiate(...) instead we queue a GroupCommitApply task on the internal command sender. I believe this is okay to do because all internal commands are handled before user commands, so we'll still process the group commit apply before users could observe any intermediate state, e.g. a table being created but not showing up in mz_tables. Briefly I was concerned about the "late" GroupCommitApply message moving the write timestamp backwards, but this invariant is enforced in the implementations of the timestamp oracle.

Open Telemetry Trace

Screenshot 2023-11-29 at 5 53 34 PM

The above is the result of running CREATE TABLE t1 (x int, y text, z timestamp) on staging with this built deployed. You can see that "group_commit_initiate" (builtin table updates) and "insert_without_overwrite" (a side effect), get run in parallel.

Motivation

Improves https://github.com/MaterializeInc/database-issues/issues/7078

Tips for reviewer

Hiding white space is helpful since a number of things either got indented or de-dented a level.

This PR is split into two commits which can be reviewed independently:

  1. Refactoring builtin table updates
  2. Refactoring sequencing so the relevant DDLs run their side effects concurrently with the builtin table updates.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • Internal performance improvement

@ParkMyCar ParkMyCar marked this pull request as ready for review November 29, 2023 21:43
@ParkMyCar ParkMyCar requested review from a team as code owners November 29, 2023 21:43
@ParkMyCar ParkMyCar requested a review from jkosh44 November 29, 2023 21:43
@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch from a0d9c6f to 790f656 Compare November 29, 2023 22:03
Comment on lines +361 to +370
// Note: while technically we should have `GroupCommitApply` update the notifies, we
// know that internal commands get processed before any user commands, so the writes
// are still guaranteed to be observable before any user commands.
for notify in notifies {
// We don't care if the listeners have gone away.
let _ = notify.send(());
}

// Trigger a GroupCommitApply, which will run before any user commands since we're
// sending it on the internal command sender.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scares me a bit. It's not obviously correct and seems like something that's easy to accidentally break without realizing it. Why not just send the replies after we apply the timestamp?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't send the notifies after we apply the timestamp because that requires waiting for the Coordinator to run the queued GroupCommitApply command, which it won't be able to do because the current command is waiting for the notify to resolve, so we end up in a dead lock.

Joe and I talked in a Huddle, the correctness property here relies on internal commands being run before user commands. We queue a GroupCommitApply on the internal command sender before notifying any waiters, so we know the write will be applied before any user commands could observe the intermediate state.

We don't love relying on this property though, and conveniently some PlatformV2 work will soon make it possible to give an owned handle to the timestmap oracle, to this spawned task. Then we could explicitly apply the write in the task instead of relying on the queued command to get processed before user commands. Waiting to hear back on how soon the shareable timestamp oracle will be ready!


/// Submit a write to a system table.
///
/// This method will block the Coordinator on doing some initial work, and then returns a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on doing some initial work

This seems a bit vague, what exactly does it wait for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this comment to be more descriptive

})
})
.await;
.await?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we add a ?? Doesn't this always return an error making the rest of this function un-reachable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't mean to do this, was a cargo culting issue. Fortunately our tests caught the bug! I updated this and made the assertions stronger here.

@def-
Copy link
Copy Markdown
Contributor

def- commented Nov 30, 2023

The test failures look legitimate. I'll wait until it's green to run nightly/coverage.

@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch from 790f656 to f72a513 Compare November 30, 2023 16:11
@jkosh44
Copy link
Copy Markdown
Contributor

jkosh44 commented Nov 30, 2023

We should also open follow-up issues for the dependency tables.

EDIT: Oops I commented this on the wrong PR, please ignore.

@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch from 254394b to db89ed7 Compare December 1, 2023 18:40
Copy link
Copy Markdown
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

.metrics
.append_table_duration_seconds
.with_label_values(&[label]);
.with_label_values(&["false"]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realizing, is always using false here correct? Whether we block or not is determined by the caller so we don't really know if we're going to block here. Should we just remove the labels from this Histogram?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right! Removing this label would be best, I'm not sure if we can do that though/if it's forward compatible, let me check

Comment on lines +101 to +105
// Run our side effects concurrently with the table updates.
let ((), ()) = futures::future::join(side_effects_fut, table_updates).await;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own Rust learnings, how does side_effects_fut and table_updates run concurrently if they both have a mutable reference to the Coordinator?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or am I wrong about them both needing a mutable reference?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table_updates does not actually have a mutable reference to the Coordinator, which is how they're able to both run concurrently

Comment on lines +407 to +410
// Note: while technically we should have `GroupCommitApply` update the notifies, we
// know that internal commands get processed before any user commands, so the writes
// are still guaranteed to be observable before any user commands, because we
// submitted the GroupCommitApply above.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we get rid of this commit now or combine it with the one above on line 394?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm a bit confused, what do you mean by this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misreading, but it seems like the following two comments are trying to say the same thing:

// Trigger a GroupCommitApply, which will run before any user commands since we're
// sending it on the internal command sender.

// Note: while technically we should have GroupCommitApply update the notifies, we
// know that internal commands get processed before any user commands, so the writes
// are still guaranteed to be observable before any user commands, because we
// submitted the GroupCommitApply above.

So it might be a bit easier to maintain if we combined them into a single comment. That's subjective though, so feel free to ignore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ha! Updated the comment and combined the two

Comment on lines +426 to +430
let cloud_resource_controller = self
.cloud_resource_controller
.as_ref()
.cloned()
.ok_or(AdapterError::Unsupported("AWS PrivateLink connections"))?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this come from? Or am I missing something obvious?

Copy link
Copy Markdown
Contributor Author

@ParkMyCar ParkMyCar Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was split out from an error case which used to occur after the catalog transaction. But I moved this up so we'd error before the transaction. Turns out this made a bunch of tests fail, so I refactored it again to just emit warnings which is what was suggested in Slack

@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch from db89ed7 to 79845a0 Compare December 4, 2023 13:51
@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch from 79845a0 to fa01c77 Compare December 4, 2023 15:17
@shepherdlybot
Copy link
Copy Markdown

shepherdlybot bot commented Dec 4, 2023

Risk Score:81 / 100 Bug Hotspots:4 Resilience Coverage:60%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability 🔍 Detected
  • (Required) QA Review 🔍 Detected
  • Unit Test
Bug Hotspots:

What's This?

File Percentile
../sequencer/inner.rs 98
../coord/command_handler.rs 92
../src/coord.rs 100
../coord/ddl.rs 93

@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch 2 times, most recently from e308e8b to 6dff1ed Compare December 4, 2023 20:27
@ParkMyCar ParkMyCar force-pushed the coord/ddl-parallelize-a-few-things branch from 6dff1ed to 7bf7a40 Compare December 4, 2023 21:01
@ParkMyCar ParkMyCar merged commit 06d8f66 into MaterializeInc:main Dec 5, 2023
@ParkMyCar ParkMyCar deleted the coord/ddl-parallelize-a-few-things branch June 17, 2024 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants