Surface dataflow errors (indexes and sinks) in a system table/view #7804
Comments
@frankmcsherry & @benesch We discussed this a while ago in Slack and you both had opinions. The one remaining question I have is what the timeline of the error views should be. We could have one error view per timeline that we support, or we could say the error views are all in the system timeline (same as the other system views/logs/tables) and add the timestamp of the error (which is in the timeline of the index/sink) as a data column. I think the latter is easier for users, because they don't need to fiddle with different tables, and it makes the error views easily joinable with the other system views. They would roughly reflect "errors as of now", independent of where we are in the timeline of the dataflow. I find that useful because it answers the question "can I query this thing now?". What do you think?
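To make the second option concrete, here is a minimal sketch of what a row in a single system-timeline error view might carry. The struct, its field names, and the `is_queryable` helper are all hypothetical and only illustrate the shape described above; none of them are existing Materialize types.

```rust
/// One row of a hypothetical combined error view that lives in the system
/// timeline. All names here are illustrative assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataflowErrorRow {
    /// GlobalId of the index or sink, rendered as a string.
    pub global_id: String,
    /// Human-readable error message.
    pub error: String,
    /// Timestamp of the error in the timeline of the index/sink,
    /// carried as an ordinary data column.
    pub error_ts: u64,
}

/// Answers "can I query this thing now?": true when no error row is
/// currently recorded for the given id.
pub fn is_queryable(rows: &[DataflowErrorRow], global_id: &str) -> bool {
    !rows.iter().any(|row| row.global_id == global_id)
}
```

Because the dataflow timestamp is just data, the check does not depend on which timeline the index or sink lives in.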
Also, this one might be affected by whatever comes out of #8008.
If they are in the |
Can you say more about why we need to surface them? I can see "want" to surface them, but what is the actual requirement? If they are errors that happen in the course of evaluation, we capture them in an arrangement already. Is the goal to have some broad view over a timeline of all errors in things folks have written? Is it instead to get a system-wide view of the errors that are happening in the system? If it is important for the errors to be aligned with the times at which they are produced (e.g. a |
It's intended as a way to get a system-wide overview of the status of indexes and sinks (and views, in the end). Right now, the only way of figuring out whether a view/index is wedged is a) try and peek/query that one specific view/index, or b) hope that there is something in the logs. Especially in a cloud setting, I don't think trawling through logs is very ergonomic. For context:
(I'm using view and index somewhat interchangeably above because when querying a view you transitively are querying the index, and get its errors. But yes, the dataflow layer only knows indexes and sinks.)
Logging the results of an internal discussion on this:
For addressing system errors, we should add an |
Closing this one for now:
We need to surface the errors that we get in the "errors" part of a CollectionBundle, which we usually get from Context::lookup_id() during rendering. Here's an example call site, where we ignore the errors: materialize/src/dataflow/src/render/sinks.rs, line 60 in 480aa30.
I mention indexes and sinks in the title because these are the two "items" that we currently render dataflow operator graphs for, and which we can identify and join with information in the catalog.
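As a rough illustration of the difference between ignoring and surfacing the error half of a bundle, here is a minimal sketch. `Bundle` is a hypothetical stand-in with plain vectors, not the real CollectionBundle, and both functions are invented for this example.

```rust
/// Hypothetical stand-in for a rendered collection with an "ok" half and
/// an "errors" half. The real CollectionBundle is not modeled here.
pub struct Bundle<T> {
    pub oks: Vec<T>,
    pub errs: Vec<String>,
}

/// Mirrors a call site that only uses the "ok" half and drops the errors.
pub fn render_ignoring_errors<T>(bundle: Bundle<T>) -> Vec<T> {
    let Bundle { oks, errs: _ } = bundle;
    oks
}

/// Keeps the errors and tags each one with the id (as a string) of the
/// index or sink being rendered, so they can later be joined with the catalog.
pub fn render_surfacing_errors<T>(
    id: &str,
    bundle: Bundle<T>,
) -> (Vec<T>, Vec<(String, String)>) {
    let Bundle { oks, errs } = bundle;
    let tagged = errs.into_iter().map(|e| (id.to_string(), e)).collect();
    (oks, tagged)
}
```

The tagged pairs are the kind of (id, error) rows that an error view could be populated from.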
Concretely, we should have new system views (or tables, or logs) mz_index_errors and mz_sink_errors that surface the error (probably as a string) along with the GlobalId (also as a string) of the index or sink in which the errors occurred. We could also just have one view, mz_dataflow_errors. We'll see how ergonomic either option is when we get there, but this shouldn't be the hard part.
Example of a view that could be built with this (from Nikhil):
Some things to consider: