Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.0] Consistent public error types in the Rust API #745

Open
2 tasks done
cdisselkoen opened this issue Mar 25, 2024 · 6 comments
Open
2 tasks done

[4.0] Consistent public error types in the Rust API #745

cdisselkoen opened this issue Mar 25, 2024 · 6 comments
Assignees
Labels
4.0 backlog We hope to work on this in the future feature-request Request for a new feature

Comments

@cdisselkoen
Copy link
Contributor

cdisselkoen commented Mar 25, 2024

Category

Error message improvements

Describe the feature you'd like to request

Currently Cedar is inconsistent about what error data is public (accessible to Cedar consumers and thus part of our stable interface) and what error data is private (implementation details we're free to change at will, e.g., to improve error messages). In fact, at least six different patterns are currently in use in different places in the Cedar codebase, as described below. This issue requests that Cedar standardize all of its error types into Pattern C. This will be a breaking change to some APIs.

Pattern A

Methods that can return errors do not publicize what type they are returning, just impl miette::Diagnostic. Callers (Cedar consumers) can use the methods on miette::Diagnostic to get the error's message, help text, source location, etc but no other info.

Policy::to_json() currently takes this approach:

pub fn to_json(&self) -> Result<serde_json::Value, impl miette::Diagnostic>

This option is maximally flexible for Cedar developers, as Cedar can make any changes to error structures that it wants, even in patch releases, as long as whatever is returned still implements miette::Diagnostic (which will always be the case). But, this option is undesirable because it does not provide enough information to callers (Cedar consumers). Callers cannot distinguish between different error cases in a structured way, and cannot get other information about the error in a structured way -- for instance, seeing which EntityUid was responsible for an EntityDoesNotExist error, except by inspecting the error message.

Pattern B

Methods that can return errors return a named error type, but that type is a struct with private field(s). Of course, the error type implements miette::Diagnostic so provides all of those methods, to get the error's message, help text, source location, etc. The error type can also provide additional public methods if it wants.

eval_expression() (the public interface for the Cedar evaluator) currently takes this approach:

pub fn eval_expression(
    request: &Request,
    entities: &Entities,
    expr: &Expression,
) -> Result<EvalResult, EvaluationError>

where callers (Cedar consumers) just see EvaluationError as a struct with private fields in the docs.

This option is still extremely flexible for Cedar developers, as Cedar can make any changes to error structures that it wants, even in patch releases, as long as whatever is returned still implements miette::Diagnostic and any additional public methods we've already committed to providing for that error type. By choosing what public methods to provide, we have full control over what's public for each error type. However, the public methods we provide do have to "make sense" for all enum variants, assuming there is an enum under the hood. We can't have different public methods per enum variant, like we can in Pattern C. For this reason, and to improve consumers' ability to pattern-match on Cedar error types, we prefer Pattern C.

Pattern C (Proposed to standardize on)

Methods that can return errors return a public enum, but each enum variant is a single struct with private field(s). The enum itself, and also all of the member structs, implement miette::Diagnostic. The member structs can also provide additional public methods if they want, as could the enum.

TypeErrorKind currently takes something close to this approach. As you can see in its docs, it is a public enum with 13 variants, most of which are structs with private fields:

pub enum TypeErrorKind {
    UnexpectedType(UnexpectedType),
    IncompatibleTypes(IncompatibleTypes),
    ...
}

where callers (Cedar consumers) just see UnexpectedType and IncompatibleTypes as structs with private fields in the docs.

This option is somewhat less flexible for Cedar developers than Patterns A or B. Cedar cannot remove or rename enum variants in patch releases, and can't add new enum variants either unless the overall enum is marked as non_exhaustive (like TypeErrorKind is). However, Cedar can still add or remove fields from the member structs (like UnexpectedType) at will, as long as all public methods are maintained.

Callers (Cedar consumers) get significantly more structured information than they do with Patterns A or B. They get not only the methods on miette::Diagnostic and any other public methods provided by the enum, but also can match on the error kind, so they can for instance distinguish between UnexpectedType and IncompatibleTypes in a structured way, that is, without matching on the error message. And, they can use whatever variant-specific public methods Cedar chooses to provide on each member struct.

Pattern D

Methods that can return errors return a public enum, where each enum variant contains its data inline, or, contains a struct with public fields. The enum itself, and also all of the member structs (if any), implement miette::Diagnostic. The member structs (if any), as well as the public enum, can also provide additional public methods if they want, but this would be only a convenience and not expose any new information, because all of the information is already public.

SchemaError currently takes something close to this approach. As you can see in its docs, it is a public enum with 21 variants, which contain a mix of inline data related to the error and/or other error types with public fields (like EntityAttrEvaluationError).

pub enum SchemaError {
    Serde(serde::Error),
    ...
    UndeclaredEntityTypes(HashSet<String>),
    ...
    CycleInActionHierarchy(EntityUid),
    ...
    ActionAttrEval(EntityAttrEvaluationError),
    ExprEscapeUsed,
}

Callers (Cedar consumers) see detailed data, like the HashSet of entity types that were found to be undeclared in UndeclaredEntityTypes, or the public fields of EntityAttrEvaluationError. This is in contrast to Pattern C, where UndeclaredEntityTypes would contain a struct called UndeclaredEntityTypes instead of containing the HashSet directly; and the member struct UndeclaredEntityTypes would perhaps have a public method for iterating over the individual types, but would perhaps not promise that it's a HashSet (so Cedar would be free to change that implementation detail in patch releases), and would also allow Cedar to add more information to UndeclaredEntityType (like a source location) without breaking changes.

This option is much less flexible for Cedar developers than the above patterns. Like Pattern C, Cedar cannot remove or rename enum variants in patch releases, and cannot add enum variants unless the enum is marked non_exhaustive (which SchemaError is not). Unlike Pattern C, with Pattern D Cedar cannot add or remove data from the enum variants or their member structs, or change the type of any contained data. For instance, Cedar could not change the HashSet above to a BTreeSet or Vec, and could not add source locations to these error types. With Pattern C, Cedar could make those changes as long as we still provide the same public methods on each member struct. (We would write the public methods to, e.g., not expose the collection type contained in UndeclaredEntityTypes, but only allow iterating over it.)

Another complication of this option is the need to use only public-visible types in the enum variants and member structs. For instance, the EntityUid in the CycleInActionHierarchy variant is a public EntityUid, not a Core EntityUID. Of course, the validator internally needs to return Core EntityUID as it has no way to name the public-interface EntityUid (depending on cedar-policy would be circular). So, Cedar currently duplicates the entire SchemaError enum -- one declaration in the validator crate, using internal types, and another declaration in the cedar-policy crate, using public-interface types, with conversion methods between them. Patterns B and C also have to deal with converting internal types to public-interface types, but this is both easier and potentially more performant to do within the type's public methods rather than in the type itself. Pattern C would probably require us to re-declare the public enum, with wrapped member structs and wrapped public methods.

Pattern E (definitely bad)

Some Cedar errors are in weird in-between states, for instance EntitiesError (docs), which is a public enum with member structs that have public fields but are not themselves interface types, only Core types. This is unclear to callers (Cedar consumers) what they're "allowed" to match on. Depending on the syntactic situation, they might be able to access the fields of the member structs, or might have to take a dependency on cedar-policy-core to do so (because they're unable to syntactically name the type without importing it from Core). This is both confusing and bad.

Pattern F (definitely bad)

Another bad state is also exemplified by EntitiesError (sorry, EntitiesError): one of its enum variants is Duplicate which contains a Core EntityUID as inline data. To the extent that fields are public (like this one), they should only contain public-interface types. I'm not sure why Rust allows the Core type EntityUID to leak out of the public interface in this way. Regardless of whether Rust should or should not allow it, Cedar should not do this in its error types.

Describe alternatives you've considered

Patterns A, B, D are plausible alternatives (with B the most plausible of those), but this issue proposes standardizing on Pattern C.

Additional context

No response

Is this something that you'd be interested in working on?

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change
@cdisselkoen cdisselkoen added pending-triage Hasn't been triaged yet feature-request Request for a new feature backlog We hope to work on this in the future 4.0 and removed pending-triage Hasn't been triaged yet labels Mar 25, 2024
@cdisselkoen
Copy link
Contributor Author

cdisselkoen commented Mar 25, 2024

Errors to take a standardization pass over: (in alphabetical order)

Edit 4/15/24: added tentative assignments

All of these are currently public types in our public API.

@cdisselkoen
Copy link
Contributor Author

Small amendment, since Pattern C doesn't say this explicitly: It's also OK for an enum variant to be itself another enum that complies with the Pattern C guidelines. It doesn't always have to be a struct with private fields.

Unresolved question: Is it OK for an enum variant to be serde_json::Error, or do we want to wrap that? (E.g., in PolicyToJsonError.)

Another unresolved question: The current EvaluationError is a struct-with-private-fields, with a public method error_kind(&self) -> &EvaluationErrorKind where EvaluationErrorKind is an enum (that we would plan to conform to Pattern C). Is this OK, or do we want to convert to Pattern C's suggestion, requiring that EvaluationError is an enum directly (and thus each member struct contains the private advice and source_loc fields itself)? Previously we said (e.g., as referenced in this comment) that we prefer the *Kind pattern in order to avoid duplicating these fields on every enum variant. Do we want to reverse that decision for consistency in Pattern C, or allow this pattern (in addition to Pattern C)? What's most consistent for users and what leads to the best/cleanest code internally?

@khieta
Copy link
Contributor

khieta commented Mar 25, 2024

  1. I'm fine with either, but I slightly prefer a wrapped error instead of serde_json::Error.
  2. I believe the advice field could be replaced with miette's help. Is there anything in miette that we can use in place of the source_loc field? If so, I'd prefer the approach in Pattern C.

@cdisselkoen
Copy link
Contributor Author

Currently, miette's help is sourced from the advice field of this error. I will see if it's possible to refactor to remove the field and just compute the advice directly in help()

@cdisselkoen
Copy link
Contributor Author

that last suggestion became #748

@john-h-kastner-aws
Copy link
Contributor

john-h-kastner-aws commented Apr 30, 2024

UnsupportedFeature -- this is currently publically exported in api.rs but I don't think it's actually used, so we can just remove it from public API?

This should be used inside of SchemaError::UnsupportedFeature, but that currently holds a String instead. We should unexport it as part of updating SchemaError.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4.0 backlog We hope to work on this in the future feature-request Request for a new feature
Projects
Status: In Progress
Development

No branches or pull requests

3 participants