Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design doc for upsert order by timestamp #18722

Conversation

moulimukherjee
Copy link
Contributor

@moulimukherjee moulimukherjee commented Apr 11, 2023

Design doc for custom ORDER BY using the TIMESTAMP metadata field for ENVELOPE UPSERT

Motivation

#16512

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@moulimukherjee moulimukherjee force-pushed the envelope-upsert-order-by-design-doc branch from 2448fa8 to 3501358 Compare April 11, 2023 21:54
@moulimukherjee moulimukherjee changed the title WIP: design doc for upsert order by Design doc for upsert order by timestamp Apr 12, 2023
@moulimukherjee moulimukherjee marked this pull request as ready for review April 12, 2023 16:48
@moulimukherjee moulimukherjee requested review from guswynn, benesch and a team April 12, 2023 18:18
@moulimukherjee moulimukherjee mentioned this pull request Apr 12, 2023
5 tasks
Copy link
Contributor

@guswynn guswynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful, thanks! I've CC'd the surfaces team for review of the SQL bits, but you may need to ping them in #team-surfaces too.


## Lifecycle

What will it take to promote this to the next stage i.e. Alpha?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to watch out for is that as soon as we roll this out to any customers, the syntax will be semi-stable, in that we'll have to be willing to write a migration if we want to change it, and also warn all the customers who were using the old syntax that the syntax has changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true after its not unsafe, right?

| Err(key3, some_error2) |
+------------------------+
```
Note: As shown in the example above, the errors are still implicitly ordered by offset as we do not persist any extra metadata for them separately. A later error with the same key will always overwrite a previous one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising, IMO! Perhaps we should be attaching the timestamp column to the errors, too? I don't actually understand in what case we can end up with Err((key, error_text))—if Avro decoding of the value fails, but not the key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Avro decoding of the value fails, but not the key?
yes!

For some context, a somewhat recent change that made errors retractable: #14209. Which also made how errors work very complex.

Errors in the upsert source are a tricky subject. In general, the current "design" (evolved through quite some iterations) assumes that you only ORDER BY the offset. That's, for example, why handling errors works the way it does, above: when a new error comes in, it can only have an offset that is larger than the error that we re-hydrated from persist, so it always takes precedence. Ordering by a custom column breaks that.

To fix that, we'd have to graft the ORDER BY field on to errors, which feel brittle enough as they are. 🙈

Galaxy-brain idea, where I'm not at all sure we should do it. We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY is used. We could fix our issues around errors and other things.

Copy link
Contributor Author

@moulimukherjee moulimukherjee Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify further, if say an Error is persisted, we sort of lose the kafka metadata, since those don't get persisted. But for the incoming kafka stream, we will actually have the metadata, so we will have some order value. So, it's more like a comparison between None (when hydrating state from persist) with Some(ts) (value coming from kafka) and the latter taking precedence.

This is also how currently upsert behaves, because we do not persist the offset information. We always let value coming from kafka with Some(offset) override previous state, but in this case it makes sense because new kafka data is always after previously persisted (Could there be scenarios where it isn't?)

Copy link
Contributor Author

@moulimukherjee moulimukherjee Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY is used.

Does it mean that the upsert operator will only have the kafka ingestion as the input and there will be no read from persist?

Or I am guessing you mean that do a separate upsert implementation where we persist the order by field on errors?

If we could persist the order by information regardless whether it's error or not, it would make things explicit for the current default order by offset too (since we don't persist it as well) and would make it easier to use offset as a tie breaker (#18722 (comment))

moulimukherjee and others added 2 commits April 13, 2023 00:00
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Copy link
Contributor

@aljoscha aljoscha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! Though there are some tough open questions. 🙈 We'll have to ponder yet some more it seems, but it's good that you brought them up!

| Err(key3, some_error2) |
+------------------------+
```
Note: As shown in the example above, the errors are still implicitly ordered by offset as we do not persist any extra metadata for them separately. A later error with the same key will always overwrite a previous one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Avro decoding of the value fails, but not the key?
yes!

For some context, a somewhat recent change that made errors retractable: #14209. Which also made how errors work very complex.

Errors in the upsert source are a tricky subject. In general, the current "design" (evolved through quite some iterations) assumes that you only ORDER BY the offset. That's, for example, why handling errors works the way it does, above: when a new error comes in, it can only have an offset that is larger than the error that we re-hydrated from persist, so it always takes precedence. Ordering by a custom column breaks that.

To fix that, we'd have to graft the ORDER BY field on to errors, which feel brittle enough as they are. 🙈

Galaxy-brain idea, where I'm not at all sure we should do it. We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY is used. We could fix our issues around errors and other things.

@moulimukherjee moulimukherjee requested a review from a team April 13, 2023 22:50
…mp.md

Co-authored-by: Ben Kirwin <ben@kirw.in>
@andrewrodriguez-m
Copy link
Contributor

LGTM. I'm implementing a similarly sounding feature (WITHIN TIMESTAMP ORDER BY in subscribe) but I think the use cases are different enough that the difference in syntax makes sense. In the subscribe case we want to be able to order by any column and can only order within timestamp efficiently.

…mp.md

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
@aljoscha
Copy link
Contributor

I might be losing track of the various resolved/un-resolved threads. 😅

Are these the issues we still have open:

  1. Ordering of errors is problematic because we store neither the offset (our default ordering) nor a timestamp (a configured ordering) with errors. This shows up when we receive errors that could replace errors that we already have in state.
  2. What to use as a tie-breaker when we have multiple updates for a key at the same timestamp.

Where 1. was not a problem before because updates that arrived later always had a higher offset, so would always take precedence. Which was also true after a restart where we re-ingested our state from persist (which happens outside the operator, though).

And 2. is only a problem now because, again, before we couldn't have two updates with the same offset. The sensible tie-breaker seems offset here, but that is problematic because we don't necessarily store the offset in our in-memory state or persist.

We could resolve 2. by always including the offset in the row, when you request ordering by timestamp. We could resolve 1. by including the requested ordering column (and the offset, for tie-breaking?!?) in the error. Not sure I like either of those. 😅

@moulimukherjee
Copy link
Contributor Author

moulimukherjee commented Apr 17, 2023

From a 1:1 with @aljoscha,

What if we always ask the user to include OFFSET in the order by to make it explicit?
cc @morsapaes @sjwiesman

Valid sql examples:

  • CREATE SOURCE ... INCLUDE TIMESTAMP, OFFSET ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP, OFFSET ) ASC )
  • CREATE SOURCE ... INCLUDE OFFSET ENVELOPE UPSERT ( ORDER BY ( OFFSET ) ) (mimicking default behaviour explicitly)
  • CREATE SOURCE ... INCLUDE TIMESTAMP AS ts , OFFSET AS o ENVELOPE UPSERT ( ORDER BY ( ts, o ) )

Invalid sql examples:

  • CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP ) ASC ) (OFFSET is not included in order by)

One of the benefits is, then the offset will always be persisted and helps with debuggability as well. If the user changes the ordering to say (OFFSET, TIMESTAMP) instead, it would still work, but the second order by of TIMESTAMP is sort of meaningless as it's just offset ordering then. Realistically anything which would come after OFFSET will not make a difference in the result, but this can still be allowed.

So, for custom order by if we always ask the user to include offset, this resolves our tie-breaker explicitly which works for valid row objects.

For ordering of errors though, the issue still remains that they'll be implicitly offset ordered, because none of this information is going to be persisted (they can be kept in in-memory state though, but we'll lose it upon restart). But I think it should be fine, because the ability to retract or override errors remains unchanged.

@moulimukherjee
Copy link
Contributor Author

@aljoscha Does it make sense to have the explicit inclusion of offset as a follow up if needed? Since this is behind an unstable flag and without an user, the implicit behavior should be low risk (it would still default to offset, but under the hood).

So, basically proposing the following:

  • For tie-break scenarios with timestamps, we fallback to the offset value implicitly (we can have an update later to always ask for an offset in the order since this is still behind unsafe flag)
  • No change in error behaviour, later error always overwrites previous same as without an order by

Copy link
Contributor

@guswynn guswynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tie-break scenarios with timestamps, we fallback to the offset value implicitly (we can have an update later to always ask for an offset in the order since this is still behind unsafe flag)

This seems reasonable to keep implicit. It also seems reasonable to enforce ORDER BY (TIMESTAMP, OFFSET) for now, especially considering this syntax will start behind unsafe mode, and does not overly complicate the sql parser. I think with those changes, all the outstanding issues that can be resolved for this limited-scope design are resolved, so we should go ahead and merge!

@aljoscha
Copy link
Contributor

Yes, I think we should go ahead and merge this!

I am slightly confused by your message, @guswynn

This seems reasonable to keep implicit. It also seems reasonable to enforce ORDER BY (TIMESTAMP, OFFSET) for now,

Enforcing people use TIMESTAMP, OFFSET means we wouldn't need the implicit offset tie-breaker. At least in my understanding of the thing. 😅

Also, can we please create a follow-up issue for figuring out ordering of errors. We say in the doc that newer ones always replace previous ones (regardless of timestamp), because we don't store the timestamp with the error. We might want to revisit that, but can go ahead with the rest of the design/impl.

@moulimukherjee
Copy link
Contributor Author

moulimukherjee commented Apr 19, 2023

Updated the design doc with these changes and created #18842 to track the error ordering issue.

And yup, if the OFFSET is included in the order by columns, it's not implicit anymore.

@moulimukherjee moulimukherjee merged commit 94cea54 into MaterializeInc:main Apr 19, 2023
@moulimukherjee moulimukherjee deleted the envelope-upsert-order-by-design-doc branch April 19, 2023 15:57
Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick post-merge syntax note!


The option will be part of the `ENVELOPE UPSERT` clause with the following grammar:

`ENVELOPE UPSERT [(ORDER BY (<expr>) [ASC])]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change to support multiple columns, this should be updated to:

ENVELOPE UPSERT [(ORDER BY (<expr> [ASC], ...))]

To match the way this works for SELECT, the ASC should be optionally attached to each expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean. I saw ORDER BY (col1, col2) ASC is legal as well in select statements, so had kept that (I guess it's the same thing as you mentioned where the entire tuple is the expression). I will change it to ordering per expression.

The `ASC` modifier is optional noise for specifying ascending ordering, for symmetry with the `ORDER BY` clause in `SELECT` statements.

Examples of valid syntax and semantics:
- `CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP, OFFSET ) ASC )`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP  ASC, OFFSET ASC) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the design doc in follow up implementation PR #18567

Copy link
Contributor Author

@moulimukherjee moulimukherjee Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I just tried out select statements in materialize. Parentheses around expressions with ordering is giving an error.

select * from texttext order by (key ASC, text DESC);
ERROR:  Expected right parenthesis, found ASC
LINE 1: select * from texttext order by (key ASC, text DESC);

The following works though

select * from texttext order by key ASC, text;

select * from texttext order by (key, text) ASC;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we lose the parentheses around the multiple order by statements with order then? Or go back to limiting it to one tuple expression with optional ASC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! The syntax for the ORDER BY clause in SELECT is:

select_order_by ::= ORDER BY <order_by_col> [, <order_by_col>]*
order_by_col := <expr> [ASC | DESC]
expr := <ident> | ( <expr> ) | ...

Whereas the syntax for the ORDER BY option in CREATE SOURCE will be:

create_source_option_order_by ::= ORDER BY (<order_by_col> [, <order_by_col>]*)
order_by_col := <expr> [ASC | DESC]
expr := <ident> | ( <expr> ) | ...

So the syntaxes have to slightly deviate, because in CREATE SOURCE we need parens around the order by columns in order to disambiguate between a new <order_by_col> and a new CREATE SOURCE option.

The equivalent of

select * from texttext order by (key, text) ASC;

would be:

CREATE SOURCE ... ENVELOPE UPSERT (ORDER BY ((key, text) ASC))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks for explaining that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants