compute: tokenize error logging #18772
Conversation
Force-pushed 702204a to eccdfa0
Force-pushed 55f4749 to 30b98ee
The first four commits look good to me; I only have a minor remark about some warning messages that appear to have been shortened in some places.

I am uncertain about the last commit. The previous code organization was already in place before I started at Materialize, and I don't have a clear understanding of why these functions were not part of `Context` to begin with, or why they were not collected under some other abstraction. Perhaps we can ask someone with historical background to manifest before moving on with this change? Also, I have not given enough thought to whether there may be a better code organization than collecting all of these functions under `Context`.
```
@@ -508,8 +541,12 @@
        continue;
    }
    let message = "Non-positive multiplicity in DistinctBy Retractions";
    warn!(?row, ?count, debug_name, "[customer-data] {message}");
```
One downside to this approach is that we lose structured field logging. How ugly would it be to supply some macros that take an `ErrorLogger`?
I've not been able to come up with anything ergonomic so far. We could probably introduce something like:

```rust
log_dataflow_error!(
    self.error_logger(),
    "Non-positive multiplicity in DistinctBy Retractions",
    ?row, ?count, debug_name,
    "[customer-data] Non-positive multiplicity in DistinctBy Retractions",
);
```

Doesn't seem too bad, actually. Especially because we could make it possible to reference the message in the details format string, and make the `debug_name` be implicitly added to all logs.
```rust
log_dataflow_error!(
    self.error_logger(),
    "Non-positive multiplicity in DistinctBy Retractions",
    ?row, ?count,
    "[customer-data] {message}",
);
```
I'm just not sure whether my macro-fu is up to this task :)
In any case, I'd rather make this the scope of another PR, since I foresee some bikeshedding.
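For illustration, here is a rough sketch of what such a macro could look like. This is a hypothetical design, not Materialize code: the `ErrorLogger` below is a simplified stand-in that collects formatted lines in a buffer instead of calling `tracing`'s `warn!`, and the macro name and field handling are assumptions.

```rust
// Hypothetical sketch -- not the Materialize API. This `ErrorLogger`
// collects formatted lines in a buffer instead of calling `warn!`,
// so the example is self-contained.
struct ErrorLogger {
    shutting_down: bool,
    debug_name: String,
    sink: Vec<String>,
}

impl ErrorLogger {
    fn log(&mut self, message: &str, details: String) {
        // Suppress logging while the dataflow shuts down.
        if !self.shutting_down {
            self.sink.push(format!(
                "[customer-data] {message} ({details}, dataflow={})",
                self.debug_name
            ));
        }
    }
}

/// Captures `?field` arguments with their `Debug` representation,
/// similar to `tracing`'s field syntax, and forwards them to the logger.
macro_rules! log_dataflow_error {
    ($logger:expr, $message:expr $(, ?$field:ident)* $(,)?) => {
        $logger.log(
            $message,
            [$(format!(concat!(stringify!($field), "={:?}"), $field)),*].join(", "),
        )
    };
}
```

Note that this sketch flattens the fields into a string, which is exactly the downside discussed above; a real version would need to forward the captured fields to `tracing` to keep them structured.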
Nice! I'm also happy to do this if you don't want to deal with it.
```
        self.error_logger(),
    )
    .leave_region()
self.render_reduce_plan(reduce_plan, ok, err, key_arity)
```
I'm a fan of this change (throughout).
👍 modulo some of the details we discussed, and as mentioned, we can do a followup to this for the most bikesheddable bits.
Force-pushed 4158a99 to 4911650
This commit introduces a new `ErrorLogger` type that will be used for logging errors inside dataflows. This type wraps a dataflow shutdown token and encapsulates both shutdown-checked logging and our `warn!`/`error!` approach to logging dataflow errors.
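As a rough illustration of the idea (the field names and the explicit atomic flag are assumptions; the actual token is tied to dataflow teardown, not a manually set flag), a shutdown-checked logger could look like this:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the dataflow shutdown token.
#[derive(Clone, Default)]
struct ShutdownToken(Arc<AtomicBool>);

impl ShutdownToken {
    fn in_shutdown(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
    fn shut_down(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
}

/// Sketch of an `ErrorLogger`: checks the token before logging, so
/// errors produced by operators that are being cancelled are dropped.
struct ErrorLogger {
    token: ShutdownToken,
}

impl ErrorLogger {
    /// Returns the rendered log line, or `None` if suppressed.
    fn log(&self, message: &str, details: &str) -> Option<String> {
        if self.token.in_shutdown() {
            return None;
        }
        // The real implementation would invoke `warn!`/`error!` here.
        Some(format!("{message}: {details}"))
    }
}
```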
This commit extends the rendering `Context` by a `shutdown_token` field that can be used by rendered operators to check whether the dataflow is in the process of shutting down. The shutdown token is replaced for each entry in `objects_to_build`, to support finer-grained dataflow shutdown in a future where dataflows can have more than one export. This commit also adjusts variable naming to make it clear that the various `Context::export_*` methods receive a list of all dependency IDs, which include `objects_to_build` IDs, rather than just IDs of dataflow imports. As a consequence, there is no need to adjust the export methods to ensure that they also pick up the new object tokens.
This commit modifies the error logging behavior of compute operators to only log messages when the dataflow is not in the process of shutting down. This avoids producing noise caused by dataflow operators emitting partial update sets when they are cancelled. The shutdown checking is performed through a shutdown token and wrapped inside the new `ErrorLogger` type that gets passed around to the various operator renderers.
This commit converts all freestanding reduce and top-k rendering functions that want to do error logging into methods of the `Context` type. The benefit is that we can get rid of two parameters, `debug_name` and `error_logger`, in exchange for a new `&self` parameter and changed indentation. Note that the top-k functions were previously defined inside the scope of `Context::render_topk` (for no apparent reason). Making them methods of `Context` actually reduces indentation.
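A miniature sketch of the refactor (hypothetical names and fields, not the actual rendering code): a freestanding helper needed the logging state passed in, while the method version reads it from `&self`.

```rust
// Hypothetical types standing in for the real rendering context.
struct ErrorLogger;
struct Context {
    debug_name: String,
    error_logger: ErrorLogger,
}

// Before: logging state threaded through every call.
fn render_topk_standalone(debug_name: &str, _error_logger: &ErrorLogger) -> String {
    format!("rendering top-k for {debug_name}")
}

impl Context {
    // After: `&self` supplies both fields, dropping two parameters.
    fn render_topk(&self) -> String {
        let _ = &self.error_logger; // would be used for error logging
        format!("rendering top-k for {}", self.debug_name)
    }
}
```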
Force-pushed 4911650 to bcc7f8d
Everywhere we log something in compute rendering, we include the debug name of the dataflow in the logged details. It makes sense to build the dataflow name logging directly into the `ErrorLogger`, so we don't have to repeat ourselves at the callsites.
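A minimal sketch of that idea, with assumed field and method names: the logger carries the dataflow's debug name and appends it to every log line, so callsites no longer repeat it.

```rust
/// Hypothetical logger that owns the dataflow's debug name.
struct ErrorLogger {
    debug_name: String,
}

impl ErrorLogger {
    fn log(&self, message: &str) -> String {
        // The dataflow name is appended automatically for every log.
        format!("{message} (dataflow: {})", self.debug_name)
    }
}
```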
I asked in Slack and there doesn't seem to be a specific reason. This should be ready for another look! I've addressed the comments and added another commit that pulls the `debug_name` into the `ErrorLogger`.
Thanks for asking in Slack; the changes LGTM! I think it is quite tricky to test this at the moment, but maybe we can think about an approach when we allow `persist_source` to stop emitting early as well.
TFTRs despite the huge diff!
This PR modifies the error logging behavior of compute operators to only log messages to Sentry when the dataflow is not in the process of shutting down. This avoids producing noise caused by dataflow operators emitting partial update sets when they are cancelled. It is a precondition for enabling the faster dataflow shutdown methods described in the "Faster Dataflow Shutdown" design.
The shutdown checking is performed through a shutdown token that is wrapped inside a new `ErrorLogger` type that gets passed around to the various operator renderers. An `ErrorLogger` encapsulates both shutdown checking and our `warn!`/`error!` logging format.

Motivation
Enables #18760.
Tips for reviewer
Look at the commits and their descriptions separately!
- Moves the `MaybeValidatingRow` trait into a new `mz_compute::render::errors` module, which is also where the new `ErrorLogger` type lives.
- Introduces the new `ErrorLogger` type.
- Adds the shutdown token to the rendering `Context` and updates the top-level rendering code to make sure the token gets added to the exports depending on it.
- Changes compute operators to log errors through the `ErrorLogger`, rather than using the `warn!`/`error!` macros directly.
- Converts the freestanding reduce and top-k rendering functions into methods of the `Context` type. Rather than requiring `debug_name` and `error_logger` arguments, they can now get these fields directly from the context, reducing the number of function arguments that need to be passed.

Unfortunately, the diff for the last commit is quite hairy, since it changes not only the indentation level, but also causes rustfmt to re-wrap many of the higher-indented lines. Hiding whitespace changes helps a bit, but still produces a diff much larger than it should be. Since the last commit is a pure code organization change and doesn't affect functionality, I would be fine with dropping it if you'd rather not review it.

One thought about the `ErrorLogger` type: It would be quite easy to also wrap the `debug_name`, in addition to the token, inside it. Then we wouldn't have to pass that around in rendering anymore. The drawback is that we'd lose our current flexibility in formatting the "details" part. We'd probably need to put the dataflow name as a fixed prefix of the message, so constructs like "found invalid accumulations in dataflow {debug_name}" would no longer be possible. (Edit: This is now implemented in the sixth commit!)

If anyone has an idea how to actually test this change, I'd be happy to hear it! I couldn't come up with a way to provoke invalid retractions during dataflow shutdown (they should be impossible today), much less doing so in a reliable way.
Checklist

- This PR modifies the `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way) and therefore is tagged with a `T-proto` label.