Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc/dev: document our approach to error handling/printing #18741

Merged
merged 2 commits into from
Apr 17, 2023

Conversation

aljoscha
Copy link
Contributor

Motivation

Socialize the changes implemented in #18583 and #18632, and make sure that we have a more coherent strategy going forward.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@aljoscha
Copy link
Contributor Author

I added all (currently known to me) TLs, to let you voice concerns and/or learn about this.

correctly when needed. With [`thiserror`] this will happen automatically when
you use the `#[from]` attribute.

Generally, the `Display` impl of your error type should _not_ print the chain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the Display trait from thiserror follow this guideline? If so, this paragraph should be more obviously linked to the previous one. As a non-expert in rust errors, it's unclear to me what I need to do exactly, even if I use thiserror. I think these 3 paragraphs could be made more clear with a one sentence happy path: "use thiserror" and then have the other paragraphs maybe in a separate section documenting what to do if you can't use thiserror. That'd make it super clear to a reader what they should probably do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very good feedback! I'll try that

#### Printing errors

As mentioned above, the `Display` impl of an error should not print the chain
of source errors. Whenever you _do_ need to print an error with its chain of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, because afaik, we only ever trace/log errors or surface them to users. Are there other ways we print errors? When surfacing to users they all get converted into a hopefully structured SQL error, and (almost?) never have their chain included. I've read this paragraph a few times and don't understand when I'd want to use those functions still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have included more context here or in the PR description! This whole thing got started by https://github.com/MaterializeInc/materialize/issues/18490#issuecomment-1495385815.

As to where we print errors and where we use the _with_causes variants, I'll try and list the different "flavours" we have, with an example. The only real user facing errors should be (as you pointed out) the errors we report back via the SQL client/pgwire and the errors that we report in the error/status history collections.

Errors that we report in mz_source_status_history (and related friends and versions for sinks):

error: format!("{}", error.display_with_causes()),

Structured SQL errors which include a chain of errors in the details, or in the error itself:

storage_error.source().map(|source_error| source_error.to_string_with_causes())

and
Self::FetchingCsrSchemaFailed { cause, .. } => Some(cause.to_string_with_causes()),

A lot of persist errors, which we mostly log internally, or when panick'ing:

err.display_with_causes()

Top-level error handlers for the processes:

eprintln!("environmentd: {}", err.display_with_causes());

Some that might be pathological cases, where we already have the chain of errors in the unstructured error itself:

AdapterError::Unstructured(e) => write!(f, "{}", e.display_with_causes()),

sql_err!("{}", e.display_with_causes())

Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me.

In the compute controller, we put some effort into defining separate thiserror-based error types to reflect that different controller methods can return different sets of errors. The idea is that if a method cannot return a given error variant that variant should not appear in the method's error type. The benefit is that callers know exactly what types of errors to expect, which isn't the case when every method returns the same large error enum. The drawback is that Rust demands a bunch of boilerplate to define all the different error enums (roughly one per API method), so I'm not sure if this is something we should make a general recommendation.

@aljoscha
Copy link
Contributor Author

I changed the first paragraph to make it more obvious what you should do in the common case. Please take another look 🙏

@aljoscha aljoscha merged commit fe76628 into MaterializeInc:main Apr 17, 2023
@aljoscha aljoscha deleted the docs-style-errors branch April 17, 2023 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants