-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design doc for upsert order by timestamp #18722
Design doc for upsert order by timestamp #18722
Conversation
2448fa8
to
3501358
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>
…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>
…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>
…mp.md Co-authored-by: Gus Wynn <guswynn@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wonderful, thanks! I've CC'd the surfaces team for review of the SQL bits, but you may need to ping them in #team-surfaces too.
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
|
||
## Lifecycle | ||
|
||
What will it take to promote this to the next stage i.e. Alpha? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to watch out for is that as soon as we roll this out to any customers, the syntax will be semi-stable, in that we'll have to be willing to write a migration if we want to change it, and also warn all the customers who were using the old syntax that the syntax has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only true after its not unsafe
, right?
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
| Err(key3, some_error2) | | ||
+------------------------+ | ||
``` | ||
Note: As shown in the example above, the errors are still implicitly ordered by offset as we do not persist any extra metadata for them separately. A later error with the same key will always overwrite a previous one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprising, IMO! Perhaps we should be attaching the timestamp column to the errors, too? I don't actually understand in what case we can end up with Err((key, error_text))
—if Avro decoding of the value fails, but not the key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Avro decoding of the value fails, but not the key?
yes!
For some context, a somewhat recent change that made errors retractable: #14209. Which also made how errors work very complex.
Errors in the upsert source are a tricky subject. In general, the current "design" (evolved through quite some iterations) assumes that you only ORDER BY
the offset. That's, for example, why handling errors works the way it does, above: when a new error comes in, it can only have an offset that is larger than the error that we re-hydrated from persist, so it always takes precedence. Ordering by a custom column breaks that.
To fix that, we'd have to graft the ORDER BY
field on to errors, which feel brittle enough as they are. 🙈
Galaxy-brain idea, where I'm not at all sure we should do it. We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY
is used. We could fix our issues around errors and other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify further, if say an Error is persisted, we sort of lose the kafka metadata, since those don't get persisted. But for the incoming kafka stream, we will actually have the metadata, so we will have some order value. So, it's more like a comparison between None
(when hydrating state from persist) with Some(ts)
(value coming from kafka) and the latter taking precedence.
This is also how currently upsert behaves, because we do not persist the offset information. We always let value coming from kafka with Some(offset)
override previous state, but in this case it makes sense because new kafka data is always after previously persisted (Could there be scenarios where it isn't?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY is used.
Does it mean that the upsert operator will only have the kafka ingestion as the input and there will be no read from persist?
Or I am guessing you mean that do a separate upsert implementation where we persist the order by field on errors?
If we could persist the order by information regardless whether it's error or not, it would make things explicit for the current default order by offset too (since we don't persist it as well) and would make it easier to use offset as a tie breaker (#18722 (comment))
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Though there are some tough open questions. 🙈 We'll have to ponder yet some more it seems, but it's good that you brought them up!
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
| Err(key3, some_error2) | | ||
+------------------------+ | ||
``` | ||
Note: As shown in the example above, the errors are still implicitly ordered by offset as we do not persist any extra metadata for them separately. A later error with the same key will always overwrite a previous one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Avro decoding of the value fails, but not the key?
yes!
For some context, a somewhat recent change that made errors retractable: #14209. Which also made how errors work very complex.
Errors in the upsert source are a tricky subject. In general, the current "design" (evolved through quite some iterations) assumes that you only ORDER BY
the offset. That's, for example, why handling errors works the way it does, above: when a new error comes in, it can only have an offset that is larger than the error that we re-hydrated from persist, so it always takes precedence. Ordering by a custom column breaks that.
To fix that, we'd have to graft the ORDER BY
field on to errors, which feel brittle enough as they are. 🙈
Galaxy-brain idea, where I'm not at all sure we should do it. We could have a new implementation of the UPSERT machinery that is only used for new sources when a custom ORDER BY
is used. We could fix our issues around errors and other things.
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
…mp.md Co-authored-by: Ben Kirwin <ben@kirw.in>
LGTM. I'm implementing a similarly sounding feature (WITHIN TIMESTAMP ORDER BY in subscribe) but I think the use cases are different enough that the difference in syntax makes sense. In the subscribe case we want to be able to order by any column and can only order within timestamp efficiently. |
doc/developer/design/20230411_envelope_upsert_order_by_timestamp.md
Outdated
Show resolved
Hide resolved
…mp.md Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
I might be losing track of the various resolved/un-resolved threads. 😅 Are these the issues we still have open:
Where 1. was not a problem before because updates that arrived later always had a higher offset, so would always take precedence. Which was also true after a restart where we re-ingested our state from persist (which happens outside the operator, though). And 2. is only a problem now because, again, before we couldn't have two updates with the same offset. The sensible tie-breaker seems offset here, but that is problematic because we don't necessarily store the offset in our in-memory state or persist. We could resolve 2. by always including the offset in the row, when you request ordering by timestamp. We could resolve 1. by including the requested ordering column (and the offset, for tie-breaking?!?) in the error. Not sure I like either of those. 😅 |
From a 1:1 with @aljoscha, What if we always ask the user to include Valid sql examples:
Invalid sql examples:
One of the benefits is, then the offset will always be persisted and helps with debuggability as well. If the user changes the ordering to say (OFFSET, TIMESTAMP) instead, it would still work, but the second order by of TIMESTAMP is sort of meaningless as it's just offset ordering then. Realistically anything which would come after OFFSET will not make a difference in the result, but this can still be allowed. So, for custom order by if we always ask the user to include offset, this resolves our tie-breaker explicitly which works for valid row objects. For ordering of errors though, the issue still remains that they'll be implicitly offset ordered, because none of this information is going to be persisted (they can be kept in in-memory state though, but we'll lose it upon restart). But I think it should be fine, because the ability to retract or override errors remains unchanged. |
@aljoscha Does it make sense to have the explicit inclusion of offset as a follow up if needed? Since this is behind an unstable flag and without an user, the implicit behavior should be low risk (it would still default to offset, but under the hood). So, basically proposing the following:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For tie-break scenarios with timestamps, we fallback to the offset value implicitly (we can have an update later to always ask for an offset in the order since this is still behind unsafe flag)
This seems reasonable to keep implicit. It also seems reasonable to enforce ORDER BY (TIMESTAMP, OFFSET)
for now, especially considering this syntax will start behind unsafe
mode, and does not overly complicate the sql parser. I think with those changes, all the outstanding issues that can be resolved for this limited-scope design are resolved, so we should go ahead and merge!
Yes, I think we should go ahead and merge this! I am slightly confused by your message, @guswynn
Enforcing people use Also, can we please create a follow-up issue for figuring out ordering of errors. We say in the doc that newer ones always replace previous ones (regardless of timestamp), because we don't store the timestamp with the error. We might want to revisit that, but can go ahead with the rest of the design/impl. |
Updated the design doc with these changes and created #18842 to track the error ordering issue. And yup, if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick post-merge syntax note!
|
||
The option will be part of the `ENVELOPE UPSERT` clause with the following grammar: | ||
|
||
`ENVELOPE UPSERT [(ORDER BY (<expr>) [ASC])]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change to support multiple columns, this should be updated to:
ENVELOPE UPSERT [(ORDER BY (<expr> [ASC], ...))]
To match the way this works for SELECT
, the ASC
should be optionally attached to each expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean. I saw ORDER BY (col1, col2) ASC
is legal as well in select statements, so had kept that (I guess it's the same thing as you mentioned where the entire tuple is the expression). I will change it to ordering per expression.
The `ASC` modifier is optional noise for specifying ascending ordering, for symmetry with the `ORDER BY` clause in `SELECT` statements. | ||
|
||
Examples of valid syntax and semantics: | ||
- `CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP, OFFSET ) ASC )` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CREATE SOURCE ... INCLUDE TIMESTAMP ENVELOPE UPSERT ( ORDER BY ( TIMESTAMP ASC, OFFSET ASC) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the design doc in follow up implementation PR #18567
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I just tried out select statements in materialize. Parentheses around expressions with ordering is giving an error.
select * from texttext order by (key ASC, text DESC);
ERROR: Expected right parenthesis, found ASC
LINE 1: select * from texttext order by (key ASC, text DESC);
The following works though
select * from texttext order by key ASC, text;
select * from texttext order by (key, text) ASC;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we lose the parentheses around the multiple order by statements with order then? Or go back to limiting it to one tuple expression with optional ASC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! The syntax for the ORDER BY
clause in SELECT
is:
select_order_by ::= ORDER BY <order_by_col> [, <order_by_col>]*
order_by_col := <expr> [ASC | DESC]
expr := <ident> | ( <expr> ) | ...
Whereas the syntax for the ORDER BY
option in CREATE SOURCE
will be:
create_source_option_order_by ::= ORDER BY (<order_by_col> [, <order_by_col>]*)
order_by_col := <expr> [ASC | DESC]
expr := <ident> | ( <expr> ) | ...
So the syntaxes have to slightly deviate, because in CREATE SOURCE
we need parens around the order by columns in order to disambiguate between a new <order_by_col>
and a new CREATE SOURCE
option.
The equivalent of
select * from texttext order by (key, text) ASC;
would be:
CREATE SOURCE ... ENVELOPE UPSERT (ORDER BY ((key, text) ASC))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks for explaining that!
Design doc for custom ORDER BY using the TIMESTAMP metadata field for ENVELOPE UPSERT
Motivation
#16512
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-proto
label.