storage: simplify upsert state management into read and write halves #18878

Merged
merged 1 commit into from
Apr 20, 2023

Conversation

guswynn
Contributor

@guswynn guswynn commented Apr 20, 2023

This PR is in service of simplifying https://github.com/MaterializeInc/materialize/pull/18810/files. For upsert, we want a single hashmap lookup per key, which the current upsert implementation does not do correctly. We also want to simplify the traitification of upsert so that the trait interface can be just a multiget and a multiput. In this PR, we do this by running the upsert logic against a temporary (but reused) hashmap.
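To make the "multiget and multiput" shape concrete, here is a minimal sketch of what such a trait interface could look like. The names `UpsertState`, `multi_get`, `multi_put`, and `InMemory` are hypothetical and chosen for illustration; they are not taken from this PR or from #18810.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical trait: the only two operations the upsert logic would need.
trait UpsertState<K, V> {
    /// Look up a batch of keys; `None` means the key is absent.
    fn multi_get(&self, keys: &[K]) -> Vec<Option<V>>;
    /// Apply a batch of writes; a `None` value deletes the key.
    fn multi_put(&mut self, puts: Vec<(K, Option<V>)>);
}

/// Illustrative in-memory implementation backed by a plain `HashMap`.
struct InMemory<K, V>(HashMap<K, V>);

impl<K: Hash + Eq, V: Clone> UpsertState<K, V> for InMemory<K, V> {
    fn multi_get(&self, keys: &[K]) -> Vec<Option<V>> {
        keys.iter().map(|k| self.0.get(k).cloned()).collect()
    }

    fn multi_put(&mut self, puts: Vec<(K, Option<V>)>) {
        for (key, value) in puts {
            match value {
                Some(value) => {
                    self.0.insert(key, value);
                }
                None => {
                    self.0.remove(&key);
                }
            }
        }
    }
}
```

With an interface this small, an IO-backed implementation would only have to provide the same two batch operations.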

This PR will cause a known regression in the upsert feature benchmarks. We consider this unimportant because upsert is already significantly faster than it has to be, we are replacing it with IO, and we prefer a simpler implementation. It's slower because we:

  • Have an additional hashmap lookup
  • Clone an extra Row (for the previous value)

Motivation

  • This PR refactors existing code.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@guswynn guswynn requested review from a team, petrosagg and moulimukherjee April 20, 2023 18:51
Contributor

@moulimukherjee moulimukherjee left a comment

(with my very limited knowledge) lgtm! And I will rebase my WIP PR after this change

Contributor

@petrosagg petrosagg left a comment

Something seems off with this implementation. I would expect that we iterate over the pending commands and for every key we get the current value, if any, from state. That would form the command_state HashMap.

Then we would run the upsert logic exactly as it was before but replace state with command_state.

Then, after we're done emitting data we would write all the changes of command_state back to state. The last part needs some care so that we preserve deletions.
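The three steps described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `src/storage/src/render/upsert.rs`: the function name `apply_batch`, the type aliases, and the shape of `commands` are hypothetical, and commands are assumed to arrive already ordered by timestamp.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real key, row, and timestamp types.
type Key = u64;
type Value = String;
type Time = u64;

/// Apply one batch of upsert commands. A command value of `None` is a delete.
/// `command_state` is a scratch map that the caller reuses across batches.
fn apply_batch(
    state: &mut HashMap<Key, Value>,
    command_state: &mut HashMap<Key, Option<Value>>,
    commands: Vec<(Key, Option<Value>, Time)>,
) -> Vec<(Value, Time, i64)> {
    // Read half: fetch the current value once per distinct key.
    command_state.clear();
    for (key, _, _) in &commands {
        command_state
            .entry(*key)
            .or_insert_with(|| state.get(key).cloned());
    }

    // Upsert logic, run against the scratch map instead of `state`.
    let mut output_updates = Vec::new();
    for (key, value, ts) in commands {
        let slot = command_state
            .get_mut(&key)
            .expect("every key was populated in the read half");
        // Retract the previous value, if any.
        if let Some(old_value) = slot.take() {
            output_updates.push((old_value, ts, -1));
        }
        // Emit and remember the new value unless this is a delete.
        if let Some(value) = value {
            output_updates.push((value.clone(), ts, 1));
            *slot = Some(value);
        }
    }

    // Write half: a `None` entry means the key must be removed from `state`.
    for (key, value) in command_state.drain() {
        match value {
            Some(value) => {
                state.insert(key, value);
            }
            None => {
                state.remove(&key);
            }
        }
    }
    output_updates
}
```

Keeping `Option<Value>` as the scratch map's value type is what lets the write half tell "delete this key" apart from "key was never touched".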

src/storage/src/render/upsert.rs Outdated
src/storage/src/render/upsert.rs Outdated
src/storage/src/render/upsert.rs Outdated
src/storage/src/render/upsert.rs Outdated
@guswynn guswynn force-pushed the upsert-simplify branch 2 times, most recently from 7f74438 to 0fa28c4 Compare April 20, 2023 20:27
let state_value = state.get(key);
commands_state
    .entry(*key)
    .or_insert_with(|| state_value.cloned());
Contributor

why not move the whole state.get(key).cloned() in the closure?
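A small sketch of what this suggestion buys: with the lookup inside the closure, `state.get` only runs when the entry is vacant, so repeated keys in a batch cost one lookup in `state` rather than one per command. The function name `read_half`, the concrete key and value types, and the `lookups` counter are assumptions added for illustration.

```rust
use std::collections::HashMap;

/// Fill `commands_state` from `state` and return how many lookups hit `state`.
fn read_half(
    state: &HashMap<u64, String>,
    commands_state: &mut HashMap<u64, Option<String>>,
    keys: &[u64],
) -> usize {
    let mut lookups = 0;
    for key in keys {
        commands_state.entry(*key).or_insert_with(|| {
            // Only reached when `key` is not yet in `commands_state`.
            lookups += 1;
            state.get(key).cloned()
        });
    }
    lookups
}
```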

  match value {
      Some(value) => {
-         if let Some(old_value) = state.insert(key, value.clone()) {
+         if let Some(old_value) = command_state.take() {
Contributor

nit: command_state.replace(value.clone()) exists in Option and does the same as the insert on the hashmap!
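For reference, `Option::replace` stores the new value and hands back the old one in a single call, which mirrors what `HashMap::insert` returns. A minimal sketch, where the helper `upsert_slot` and its signature are hypothetical:

```rust
/// Store `value` in `slot`, retracting whatever was there before.
fn upsert_slot(
    slot: &mut Option<String>,
    value: String,
    ts: u64,
    output_updates: &mut Vec<(String, u64, i64)>,
) {
    // `replace` writes the new value and returns the previous one, if any.
    if let Some(old_value) = slot.replace(value.clone()) {
        output_updates.push((old_value, ts, -1));
    }
    output_updates.push((value, ts, 1));
}
```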

    output_updates.push((old_value, ts, -1));
}
*command_state = None;
Contributor

this is superfluous
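Assuming the old value was extracted with `Option::take()` just before this line (the surrounding diff is not shown here), the slot is already `None` by the time the assignment runs. A small sketch, with the hypothetical helper `delete_slot`:

```rust
/// Handle a delete: retract the old value, if any, and leave the slot empty.
fn delete_slot(
    slot: &mut Option<String>,
    ts: u64,
    output_updates: &mut Vec<(String, u64, i64)>,
) {
    if let Some(old_value) = slot.take() {
        output_updates.push((old_value, ts, -1));
    }
    // No `*slot = None;` needed: `take()` already left `None` behind.
    debug_assert!(slot.is_none());
}
```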

Contributor

@petrosagg petrosagg left a comment

Looks good modulo these last comments

@guswynn guswynn enabled auto-merge (rebase) April 20, 2023 21:02
@guswynn guswynn merged commit b385729 into MaterializeInc:main Apr 20, 2023
@guswynn guswynn mentioned this pull request Apr 24, 2023
34 tasks
@guswynn guswynn deleted the upsert-simplify branch April 26, 2023 18:56
3 participants