Design doc for upsert order by timestamp #18722

moulimukherjee · 2023-04-11T21:40:12Z

Design doc for custom ORDER BY using the TIMESTAMP metadata field for ENVELOPE UPSERT

Motivation

#16512

Tips for reviewer

Checklist

- This PR has adequate test coverage / QA involvement has been duly considered.
- This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
- This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
- If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
- This PR includes the following user-facing behavior changes:

guswynn

looks good!

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

benesch

This is wonderful, thanks! I've CC'd the surfaces team for review of the SQL bits, but you may need to ping them in #team-surfaces too.

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

benesch · 2023-04-13T06:18:00Z

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

+
+## Lifecycle
+
+What will it take to promote this to the next stage i.e. Alpha?


One thing to watch out for is that as soon as we roll this out to any customers, the syntax will be semi-stable, in that we'll have to be willing to write a migration if we want to change it, and also warn all the customers who were using the old syntax that the syntax has changed.

This is only true after it's no longer unsafe, right?

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

benesch · 2023-04-13T06:30:35Z

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

+                  | Err(key3, some_error2) |
+                  +------------------------+
+```
+Note: As shown in the example above, the errors are still implicitly ordered by offset as we do not persist any extra metadata for them separately. A later error with the same key will always overwrite a previous one.


This is surprising, IMO! Perhaps we should be attaching the timestamp column to the errors, too? I don't actually understand in what case we can end up with Err((key, error_text))—if Avro decoding of the value fails, but not the key?

> if Avro decoding of the value fails, but not the key?

yes!

For some context, a somewhat recent change made errors retractable: #14209. That change also made error handling very complex.

Errors in the upsert source are a tricky subject. In general, the current "design" (evolved through quite some iterations) assumes that you only ORDER BY the offset. That's, for example, why handling errors works the way it does, above: when a new error comes in, it can only have an offset that is larger than the error that we re-hydrated from persist, so it always takes precedence. Ordering by a custom column breaks that.

To fix that, we'd have to graft the ORDER BY field on to errors, which feel brittle enough as they are. 🙈

Galaxy-brain idea, where I'm not at all sure we should do it. We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY is used. We could fix our issues around errors and other things.

To clarify further: if, say, an error is persisted, we lose the Kafka metadata, since it doesn't get persisted. But for the incoming Kafka stream we do have the metadata, so we will have some order value. So it's more like a comparison between None (when hydrating state from persist) and Some(ts) (a value coming from Kafka), with the latter taking precedence.

This is also how upsert currently behaves, because we do not persist the offset information. We always let a value coming from Kafka with Some(offset) override previous state. In that case it makes sense, because new Kafka data always comes after previously persisted data. (Could there be scenarios where it doesn't?)
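The None versus Some(ts) comparison described above can be sketched roughly as follows. This is only an illustration under assumptions: `UpsertState`, `hydrate`, and `ingest` are hypothetical names, the order key is modelled as a plain integer, and ties go to the later arrival. It is not the actual upsert operator.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class UpsertState:
    # key -> (order key or None, value). None marks state rehydrated from
    # persist, where the Kafka metadata was not stored (hypothetical model).
    entries: Dict[str, Tuple[Optional[int], str]] = field(default_factory=dict)

    def hydrate(self, key: str, value: str) -> None:
        """Load a persisted value; its order key is unknown."""
        self.entries[key] = (None, value)

    def ingest(self, key: str, order: int, value: str) -> bool:
        """Apply an update from Kafka. Returns True if it replaced the state."""
        current = self.entries.get(key)
        # An update with an order key always beats state without one (None);
        # otherwise the larger order key wins (ties go to the later arrival,
        # which is an assumption of this sketch).
        if current is None or current[0] is None or order >= current[0]:
            self.entries[key] = (order, value)
            return True
        return False


state = UpsertState()
state.hydrate("key1", "persisted")
state.ingest("key1", order=5, value="from_kafka")    # replaces: None vs Some(5)
state.ingest("key1", order=3, value="older_update")  # ignored: 3 < 5
```

Note that once the state is rehydrated again after a restart, the order key is back to `None`, so the next incoming update wins regardless of its order value.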

> We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY is used.

~~Does it mean that the upsert operator will only have the kafka ingestion as the input and there will be no read from persist?~~

Or I'm guessing you mean a separate upsert implementation where we persist the order by field on errors?

If we could persist the order by information regardless of whether it's an error or not, it would make things explicit for the current default order by offset too (since we don't persist that either), and would make it easier to use offset as a tie-breaker (#18722 (comment)).

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

aljoscha

I like it! Though there are some tough open questions. 🙈 We'll have to ponder yet some more it seems, but it's good that you brought them up!

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

aljoscha · 2023-04-13T15:21:21Z

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

andrewrodriguez-m · 2023-04-14T17:15:27Z

LGTM. I'm implementing a similar-sounding feature (WITHIN TIMESTAMP ORDER BY in subscribe), but I think the use cases are different enough that the difference in syntax makes sense. In the subscribe case we want to be able to order by any column, and can only order within a timestamp efficiently.

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

aljoscha · 2023-04-17T14:07:52Z

I might be losing track of the various resolved/un-resolved threads. 😅

Are these the issues we still have open?

1. Ordering of errors is problematic because we store neither the offset (our default ordering) nor a timestamp (a configured ordering) with errors. This shows up when we receive errors that could replace errors that we already have in state.
2. What to use as a tie-breaker when we have multiple updates for a key at the same timestamp.

Issue 1 was not a problem before because updates that arrived later always had a higher offset, so they would always take precedence. That was also true after a restart where we re-ingested our state from persist (which happens outside the operator, though).

Issue 2 is only a problem now because, again, before we couldn't have two updates with the same offset. The sensible tie-breaker here seems to be the offset, but that is problematic because we don't necessarily store the offset in our in-memory state or persist.

We could resolve issue 2 by always including the offset in the row when you request ordering by timestamp. We could resolve issue 1 by including the requested ordering column (and the offset, for tie-breaking?!?) in the error. Not sure I like either of those. 😅

moulimukherjee · 2023-04-17T16:35:33Z

From a 1:1 with @aljoscha,

What if we always ask the user to include OFFSET in the order by to make it explicit?
cc @morsapaes @sjwiesman

Valid sql examples:

- CREATE SOURCE ... INCLUDE TIMESTAMP, OFFSET ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP, OFFSET ) ASC )
- CREATE SOURCE ... INCLUDE OFFSET ENVELOPE UPSERT ( ORDER BY ( OFFSET ) ) (mimicking default behaviour explicitly)
- CREATE SOURCE ... INCLUDE TIMESTAMP AS ts , OFFSET AS o ENVELOPE UPSERT ( ORDER BY ( ts, o ) )

Invalid sql examples:

- CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP ) ASC ) (OFFSET is not included in order by)
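The valid and invalid examples above suggest a check along these lines. This is a rough sketch only: `validate_order_by` is a hypothetical helper that works on already-parsed column names rather than SQL text, and the rule that every ORDER BY column must be an included metadata column is an assumption read off the examples, not something the thread states outright.

```python
from typing import Dict, List


def validate_order_by(included: Dict[str, str], order_by: List[str]) -> None:
    """Raise ValueError unless the ORDER BY list is acceptable.

    `included` maps a metadata field (e.g. "OFFSET") to the column name it is
    exposed as, which is the alias when `AS` was used.
    """
    known_columns = set(included.values())
    for column in order_by:
        # Assumption of this sketch: only included metadata columns may be used.
        if column not in known_columns:
            raise ValueError(f"ORDER BY column {column!r} is not an included metadata column")
    offset_column = included.get("OFFSET")
    if offset_column is None or offset_column not in order_by:
        raise ValueError("ORDER BY must include the OFFSET column")


# INCLUDE TIMESTAMP AS ts, OFFSET AS o ... ORDER BY (ts, o) is accepted.
validate_order_by({"TIMESTAMP": "ts", "OFFSET": "o"}, ["ts", "o"])

# INCLUDE TIMESTAMP ... ORDER BY (TIMESTAMP) is rejected: OFFSET is missing.
try:
    validate_order_by({"TIMESTAMP": "TIMESTAMP"}, ["TIMESTAMP"])
    rejected = False
except ValueError:
    rejected = True
```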

One of the benefits is that the offset will then always be persisted, which helps with debuggability as well. If the user changes the ordering to, say, (OFFSET, TIMESTAMP) instead, it would still work, but the secondary ordering by TIMESTAMP is meaningless, as the result is just offset ordering. Realistically, anything that comes after OFFSET will not make a difference in the result, but this can still be allowed.

So if we always ask the user to include the offset in a custom order by, this resolves our tie-breaker explicitly, which works for valid row objects.

For ordering of errors, though, the issue remains that they'll be implicitly offset ordered, because none of this information is going to be persisted (it can be kept in in-memory state, but we'll lose it upon restart). But I think it should be fine, because the ability to retract or override errors remains unchanged.
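To make the tie-breaking behaviour for valid rows concrete, here is a small sketch that keeps, per key, the value with the largest ORDER BY tuple. The `upsert` function and the record layout are hypothetical, errors are left out entirely, and offsets are assumed to be unique for a given key.

```python
from typing import Dict, List, Tuple

# Each record is (key, timestamp, offset, value). Offsets are assumed unique
# per key, so a tuple that contains the offset can never tie.
Record = Tuple[str, int, int, str]


def upsert(records: List[Record], order_by: Tuple[str, ...]) -> Dict[str, str]:
    """Keep, per key, the value whose ORDER BY tuple is largest."""
    best: Dict[str, Tuple[Tuple[int, ...], str]] = {}
    for key, timestamp, offset, value in records:
        fields = {"TIMESTAMP": timestamp, "OFFSET": offset}
        # Python compares tuples lexicographically, like a multi-column ORDER BY.
        order_key = tuple(fields[name] for name in order_by)
        if key not in best or order_key > best[key][0]:
            best[key] = (order_key, value)
    return {key: value for key, (_, value) in best.items()}


records = [
    ("k", 100, 0, "a"),
    ("k", 100, 1, "b"),  # same timestamp as "a": the offset breaks the tie
    ("k", 90, 2, "c"),   # higher offset but older timestamp
]

by_ts_then_offset = upsert(records, ("TIMESTAMP", "OFFSET"))
by_offset_then_ts = upsert(records, ("OFFSET", "TIMESTAMP"))
by_offset_only = upsert(records, ("OFFSET",))
```

With (TIMESTAMP, OFFSET) the row at timestamp 100 and offset 1 wins, while (OFFSET, TIMESTAMP) gives the same result as plain offset ordering, which is why anything listed after OFFSET has no effect.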

moulimukherjee · 2023-04-18T16:56:44Z

@aljoscha Does it make sense to have the explicit inclusion of offset as a follow-up, if needed? Since this is behind an unstable flag and without a user, the implicit behavior should be low risk (it would still default to offset, but under the hood).

So, basically proposing the following:

- For tie-break scenarios with timestamps, we fall back to the offset value implicitly (we can have an update later to always ask for an offset in the order by, since this is still behind the unsafe flag).
- No change in error behaviour: a later error always overwrites a previous one, same as without an order by.

guswynn

> For tie-break scenarios with timestamps, we fall back to the offset value implicitly (we can have an update later to always ask for an offset in the order by, since this is still behind the unsafe flag).

This seems reasonable to keep implicit. It also seems reasonable to enforce ORDER BY (TIMESTAMP, OFFSET) for now, especially considering this syntax will start behind unsafe mode, and does not overly complicate the sql parser. I think with those changes, all the outstanding issues that can be resolved for this limited-scope design are resolved, so we should go ahead and merge!

aljoscha · 2023-04-19T12:39:29Z

Yes, I think we should go ahead and merge this!

I am slightly confused by your message, @guswynn:

> This seems reasonable to keep implicit. It also seems reasonable to enforce ORDER BY (TIMESTAMP, OFFSET) for now,

Enforcing that people use TIMESTAMP, OFFSET means we wouldn't need the implicit offset tie-breaker. At least in my understanding of the thing. 😅

Also, can we please create a follow-up issue for figuring out the ordering of errors? We say in the doc that newer ones always replace previous ones (regardless of timestamp), because we don't store the timestamp with the error. We might want to revisit that, but can go ahead with the rest of the design/impl.

moulimukherjee · 2023-04-19T15:48:38Z

Updated the design doc with these changes and created #18842 to track the error ordering issue.

And yup, if the OFFSET is included in the order by columns, it's not implicit anymore.

benesch

Just a quick post-merge syntax note!

benesch · 2023-04-20T06:49:57Z

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

+
+The option will be part of the `ENVELOPE UPSERT` clause with the following grammar:
+
+`ENVELOPE UPSERT [(ORDER BY (<expr>) [ASC])]`


With the change to support multiple columns, this should be updated to:

ENVELOPE UPSERT [(ORDER BY (<expr> [ASC], ...))]

To match the way this works for SELECT, the ASC should be optionally attached to each expression.

Ah, I see what you mean. I saw that ORDER BY (col1, col2) ASC is legal in select statements as well, so I had kept that (I guess it's the same thing you mentioned, where the entire tuple is the expression). I will change it to ordering per expression.

benesch · 2023-04-20T06:50:12Z

doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md

+The `ASC` modifier is optional noise for specifying ascending ordering, for symmetry with the `ORDER BY` clause in `SELECT` statements.
+
+Examples of valid syntax and semantics:
+- `CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP, OFFSET ) ASC )`


CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP ASC, OFFSET ASC) )

I will update the design doc in the follow-up implementation PR #18567.

Btw, I just tried out select statements in Materialize. Parentheses around expressions with ordering give an error:

select * from texttext order by (key ASC, text DESC);
ERROR: Expected right parenthesis, found ASC
LINE 1: select * from texttext order by (key ASC, text DESC);

The following work, though:

select * from texttext order by key ASC, text;
select * from texttext order by (key, text) ASC;

Should we lose the parentheses around the multiple order by columns with ordering then? Or go back to limiting it to one tuple expression with optional ASC?

Right! The syntax for the ORDER BY clause in SELECT is:

select_order_by ::= ORDER BY <order_by_col> [, <order_by_col>]*
order_by_col ::= <expr> [ASC | DESC]
expr ::= <ident> | ( <expr> ) | ...

Whereas the syntax for the ORDER BY option in CREATE SOURCE will be:

create_source_option_order_by ::= ORDER BY (<order_by_col> [, <order_by_col>]*)
order_by_col ::= <expr> [ASC | DESC]
expr ::= <ident> | ( <expr> ) | ...

So the syntaxes have to slightly deviate, because in CREATE SOURCE we need parens around the order by columns in order to disambiguate between a new <order_by_col> and a new CREATE SOURCE option.
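As a rough illustration of the CREATE SOURCE variant of the grammar, the sketch below parses the option text into (column, direction) pairs. It is deliberately simplified and is not Materialize's parser: `tokenize` and `parse_order_by_option` are hypothetical names, and `<expr>` is restricted to plain identifiers, so a nested form such as `((key, text) ASC)` is not handled.

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),])")
PUNCTUATION = {"(", ")", ","}


def tokenize(text: str) -> List[str]:
    """Split the option text into identifiers and punctuation."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_order_by_option(text: str) -> List[Tuple[str, str]]:
    """Parse `ORDER BY (<ident> [ASC | DESC] [, ...])` into (column, direction) pairs."""
    tokens = tokenize(text)
    pos = 0

    def expect(expected: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].upper() != expected:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise ValueError(f"expected {expected}, found {found}")
        pos += 1

    expect("ORDER")
    expect("BY")
    expect("(")  # the parens delimit the column list from other options
    columns = []
    while True:
        if pos >= len(tokens) or tokens[pos] in PUNCTUATION:
            raise ValueError("expected a column name")
        column = tokens[pos]
        pos += 1
        direction = "ASC"  # ascending is the default when no modifier is given
        if pos < len(tokens) and tokens[pos].upper() in ("ASC", "DESC"):
            direction = tokens[pos].upper()
            pos += 1
        columns.append((column, direction))
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            continue
        break
    expect(")")
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing input: {tokens[pos]}")
    return columns


parsed = parse_order_by_option("ORDER BY (ts ASC, o)")
```

Because the closing parenthesis ends the column list, a modifier placed after it, as in `ORDER BY (ts, o) ASC`, is rejected by this sketch as trailing input.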

The equivalent of

select * from texttext order by (key, text) ASC;

would be:

CREATE SOURCE ... ENVELOPE UPSERT (ORDER BY ((key, text) ASC))

Makes sense. Thanks for explaining that!

design doc for upsert order by

3501358

moulimukherjee force-pushed the envelope-upsert-order-by-design-doc branch from 2448fa8 to 3501358 Compare April 11, 2023 21:54

added unresolved question

def8ea8

moulimukherjee changed the title ~~WIP: design doc for upsert order by~~ Design doc for upsert order by timestamp Apr 12, 2023

moulimukherjee marked this pull request as ready for review April 12, 2023 16:48

moulimukherjee requested review from guswynn, benesch and a team April 12, 2023 18:18

moulimukherjee mentioned this pull request Apr 12, 2023

Upsert order by timestamp #18567

Closed

5 tasks

guswynn approved these changes Apr 12, 2023

moulimukherjee and others added 6 commits April 12, 2023 14:24

Update doc/developer/design/20230411_envelope_upsert_order_by_timesta…

b9a2693

…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>

Update doc/developer/design/20230411_envelope_upsert_order_by_timesta…

27bcb1b

…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>

Update doc/developer/design/20230411_envelope_upsert_order_by_timesta…

88962c0

…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>

Update doc/developer/design/20230411_envelope_upsert_order_by_timesta…

afadf58

…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>

Addressing review feedback

0dd7212

slight rewording, adding stops

b66f7ec

benesch reviewed Apr 13, 2023

benesch mentioned this pull request Apr 13, 2023

[Epic] Improve & Standardize SUBSCRIBE output #10593

Closed

moulimukherjee and others added 2 commits April 13, 2023 00:00

Apply suggestions from code review

6fcc5a7

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>

minor formatting corrections

d0d0f1b

aljoscha reviewed Apr 13, 2023

Mouli Mukherjee added 2 commits April 13, 2023 14:52

updating the syntax as per feedback and added more examples

c79d488

Added example of tie break using offset

5e198b1

moulimukherjee requested a review from a team April 13, 2023 22:50

correcting example

409a601

bkirwi reviewed Apr 14, 2023

Update doc/developer/design/20230411_envelope_upsert_order_by_timesta…

4cb084e

…mp.md Co-authored-by: Ben Kirwin <ben@kirw.in>

maddyblue approved these changes Apr 15, 2023

Update doc/developer/design/20230411_envelope_upsert_order_by_timesta…

5c15f5a

…mp.md Co-authored-by: Matt Jibson <matt.jibson@gmail.com>

guswynn approved these changes Apr 18, 2023

update to always require offset in order by

ddf723b

moulimukherjee mentioned this pull request Apr 19, 2023

Figure out how errors should be ordered in upsert order by #18842

Open

moulimukherjee merged commit 94cea54 into MaterializeInc:main Apr 19, 2023

moulimukherjee deleted the envelope-upsert-order-by-design-doc branch April 19, 2023 15:57

benesch reviewed Apr 20, 2023

materialize-bot mentioned this pull request Apr 21, 2023

release: v0.52.0 required reviews #18907

Closed

12 tasks
