
Cedar Function Macros #61

Open: wants to merge 17 commits into main
Conversation

@aaronjeline (Contributor) commented Mar 20, 2024

Rendered

Discussion Summary (05/14/2024):
This RFC has generated a lot of discussion but hasn't produced a consensus. Proposing we track use cases that could be solved by this RFC and restart the discussion if we hit 10 use cases.

@cdisselkoen (Contributor) left a comment

Generally approve of this RFC, but withholding my final approval pending resolution of some of the comments in other reviews

Comment on lines 132 to 137
permit(principal,action,resource) when {
foo(1, "hello", principal) // Parse error
};
permit(principal,action,resource) when {
bar(1, "hello", principal) // Parse error
};
Contributor

This implies that the parser is stateful in a way it currently is not. In order to determine whether these examples are parse errors, the parser has to refer to information it has already parsed earlier in the file (the definition of foo). I'm not sure this is the design we want.

An alternative would be that calling a nonexistent function, or calling with the wrong number of arguments, is not a parse error but an evaluation error, and also detected by validation of course.

Edit: With the below stipulation that function declarations do not have to lexically precede use, this is even more problematic.

Contributor Author

The argument against making them runtime errors is that doing so reifies functions. Under the current proposal, functions can be fully erased.
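For instance, a minimal sketch of erasure (using the def syntax from the examples below; the macro name is hypothetical):

def isPrivileged(?a) ?a == Action::"admin" || ?a == Action::"audit";

permit(principal, action, resource) when {
    isPrivileged(action)
};

// After expansion, the macro is gone; what remains is ordinary Cedar:
permit(principal, action, resource) when {
    action == Action::"admin" || action == Action::"audit"
};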

Contributor

Can you not do the checking in two steps? Basically:

  1. While processing an input file, store all function defs in a global table that logs name, args, body. For each function application, also consult the table for consistency. If the function is not yet in the table, then add it with the arity and "none" for the body. If a def comes after a call, check it for consistency and replace "none".
  2. Make sure no functions in the table have "none" for the body.

Doesn't sound hard?
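For example, a hypothetical file where the def comes after the call, which the two-pass scheme above would accept:

permit(principal, action, resource) when {
    isOwner(principal, resource)  // pass 1: table records isOwner, arity 2, body "none"
};

def isOwner(?p, ?r) ?r.owner == ?p;  // def seen later: arity consistent, body replaces "none"

// Pass 2 then rejects the file only if some table entry still has "none" for its body.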

Contributor Author

I don't think it's hard, I think Craig's point is that until now we have not had to deal with any kind of use-def relationships in the parser, and this RFC changes that. If we make it a runtime error, our parser stays the way it is.
I think it's worth the change, and I think the RFC doesn't meet our needs if it's a runtime error, but the callout is correct.

3. Function application with incorrect arity -->

### Naming
Should these really be called `function`s? They are actually `macro`s.
Contributor

I'm not sure I agree that these are macros, in the current design. They are pure functions.

If we want to avoid associations with other languages' functions or macros, we could also pick a third term, like snippet, predicate, def, ....

Contributor Author

Fine with snippet or def

Contributor

I would claim these are best communicated as macros, for two reasons:

  1. Most readers are used to CBV languages. So they inherently assume (having not actually read the docs) that for functions, when you write f(1+2, principal.id like "foo"), you will evaluate 1+2 and then principal.id like "foo", and then call the function with the results. They will not imagine that inlining will happen, and that short-circuiting and whatnot can change the results.
  2. Readers are familiar with macros in C, and perhaps other languages. For macros they know that when you write f(1+2, principal.id like "foo") you are substituting the full expressions 1+2 and principal.id like "foo" into the body; you are not evaluating them first.

So I claim that by calling them macros we help users naturally go to the right semantics in their heads. Yes, it's true that what we are really (or equivalently) providing is call-by-name functions, but calling them macros is more illuminating, IMO.

In sum: I propose changing the whole RFC to be "Cedar Function Macros" and then use "macro" for short, throughout.
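A small sketch of why the distinction is observable (hypothetical macro; assumes the substitution semantics proposed here):

def permitIf(?isAdmin, ?check) ?isAdmin || ?check;

permit(principal, action, resource) when {
    // Substitution yields true || (1 + "hello"), and || short-circuits,
    // so the ill-typed second argument is never evaluated.
    // A CBV function would evaluate 1 + "hello" first and raise an error.
    permitIf(true, 1 + "hello")
};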

Contributor Author

Writing up the function vs macro discussion in the RFC...

Contributor

I also prefer macro. To me function suggests that I should be able to nest calls, whereas I can easily accept that a macro shouldn't call another macro. I'm not a fan of predicate since that suggests that the resulting expression must be boolean-typed. No preference on snippet, except that I haven't seen it used before so it feels non-standard.

Also, not a fan of "Cedar Function Macros". I found this terminology confusing on my initial read through -- just pick function or macro.

Contributor

I'm happy with macro. Maybe inline function would be another option (though it still may suggest CBV to some readers)?

@cdisselkoen cdisselkoen added the pending This RFC is pending; for definitions see the README label Mar 21, 2024
@aaronjeline aaronjeline changed the title Cedar Functions Cedar Function Macros Mar 22, 2024
@khieta (Contributor) left a comment

I left a couple comments, and there are several existing threads to address, but overall I'm in favor of this proposal.

Comment on lines 64 to 67
1. They are extremely heavyweight, requiring modifying the source of the Cedar evaluator. This means that users who want to stay on official versions of Cedar have no choice but to attempt to submit a PR and get it accepted into the mainline. This process does not scale.
   1. For data structures that are relatively standard (ex: SemVer, or OID Users as proposed in [RFC 58](https://github.com/cedar-policy/rfcs/blob/cdisselkoen/standard-library/text/0058-standard-library.md)), it’s hard to know what’s in-demand enough to be included, and how to balance that against avoiding bloat. There’s no way to naturally observe usage because the only way to “install” the extension pre-acceptance is to vend a modified version of Cedar.
   2. Users may have data structures that are totally bespoke to their systems. It makes no sense to include these in the standard Cedar distribution at all, yet users may still want some way to build abstractions.
2. They are too powerful. Extensions are implemented via arbitrary Rust code, which is essential for encoding features that cannot be represented via Cedar expressions (such as IP Addresses), but opens the door for a wide range of bugs/design issues. It’s trivial to design an extension that is difficult to validate and/or logically encode for analysis. Problematically, extension functions can potentially exhibit non-determinism, non-termination, or non-linear performance; interact with the operating system; or violate memory safety. This raises the code review burden when considering an extension function's implementation.
Contributor

This is a great summary of the current state of extension functions. This is why I'm in favor of this RFC: there is currently no way to build abstractions in Cedar outside of (built-in) extension functions.

So although I prefer "macro" to "function" for the feature in this RFC, I'd like to throw another name into the ring: User-defined extension functions

(Note that this would make the most sense if we used CBV for both types of calls.)

Contributor

User-defined extension functions

I prefer not to call them functions because they are not CBV.

@mwhicks1 (Contributor)

I've taken a pass over the whole RFC, bringing the terminology and presentation up to date with all the various discussion threads. I'm approving the current version.

@emina (Contributor) commented Mar 24, 2024

Thank you @aaronjeline and @mwhicks1 for writing this RFC!

I appreciate that the current version of the proposal is simpler than the original. In particular, by proposing macros with call-by-name semantics rather than functions with call-by-value semantics, we avoid complicating the language semantics, validation, and analysis.

Despite this, I'm not in favor of adding macros to the language at this time. The pros don’t outweigh the cons (yet) in my opinion, and this is a major addition to the language that we cannot take back. Macros will create a management problem for services, and they will introduce two significant sources of confusion and complexity into policies:

  1. Bad error messages due to inlining. Consider this example:
def foo(?a, ?b, ?c) ?c + (?a + ?b);

permit(principal,action,resource) when {
    foo(0, 9223372036854775807,  2) // overflow error
};

When this policy errors at runtime, what source location should be shown as the source of the error? Should it be the location in the macro body (?a + ?b)? Should it be just the invocation of the macro in the policy? Some combination of the two? The same problem applies to validation errors too. Without good error messages, errors like this are notoriously hard to understand and debug. And providing good error messages involves tracking source locations through macro expansions and retaining this information in the AST, and potentially the EST too. Another challenge is that source location tracking is tied to the problem of where and how macros are stored, and what source locations mean in the service setting where macros are not stored in a file. All of this is non-trivial to implement (even for the restricted macro system proposed here), and it will require non-trivial (potentially breaking) changes to the EST.
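For concreteness, the call above inlines to something like this (a sketch of the expanded policy):

permit(principal,action,resource) when {
    2 + (0 + 9223372036854775807)
};

// The addition that overflows is the outer one, i.e., the macro body's
// ?c + (?a + ?b); the literals come from the call site. Neither location
// alone tells the whole story, which is the reporting problem described above.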

  2. Bad performance due to hidden costs. Right now, Cedar enjoys the nice property that the runtime cost of a policy is evident from the policy text: it is worst-case quadratic in the size of policy text and input. This will no longer be the case with macros. Users will need to think about the runtime cost of their policies after macro expansion. And macros make it possible to write a policy whose actual expanded size is exponential in the size of the text written by the user.

As a pathological example, consider this macro and its application:

def double(?x) { left: ?x, right: ?x };

permit(principal,action,resource) when {
    double(double(double(double({})))) has left
};

This policy expands to an AST that is exponential in the size of the CST: with 4 calls to double, we've created a parse tree (and a runtime value) of size 2^4.
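Writing out the first couple of steps makes the doubling visible:

// double({}) expands to:
{ left: {}, right: {} }

// double(double({})) expands to:
{ left: { left: {}, right: {} }, right: { left: {}, right: {} } }

// Each step duplicates its argument, so the four nested calls above produce
// a record with 2^4 = 16 empty-record leaves from one short line of policy text.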

These problems aren’t present in Cedar right now because each policy is self-contained, and its meaning and cost are evident from the text. That’s the advantage of the lack of abstraction. There are downsides, of course, as outlined in this RFC. In my view, we have yet to see compelling evidence that abstraction provides enough benefits in a very restricted authorization DSL like Cedar to justify the added complexity. For that reason, I’m voting against accepting the RFC at this time.

@mwhicks1 (Contributor)

Thanks for providing this carefully thought-out critique! I am still optimistic that macros are a net win. Here are some responses.

Bad error messages due to inlining.

This seems like a surmountable problem. I presume there is much we can learn from far more interesting macro systems' handling of errors. Off the top of my head:

For run-time evaluation, you don't have to in-line the call in advance; you can in-line by creating the expression on the fly, while evaluating, basically resulting in a call-by-name function call. The in-lined body can include source locations that reflect the macro body in the right places, and the macro args in the others. The error message can include a stack trace, which a user might expect. It should be possible to prove (or test) that doing this is equivalent to in-advance in-lining. For validation, I think you can do a similar thing.

For Lean models, obviously you don't need to track the source locations. You don't even need to do on-the-fly inlining, at least at first, by relying on DRT against the Rust, and PBT within the Rust, to show equivalence. We can prove equivalence in Lean later, if we wish.

Bad performance due to hidden costs.

This is a fair point, and I worried about this, too. Here are some mitigating factors that come to mind:

  • Services are welcome to impose bounds based on the in-lined policy size, rather than the concrete policy size. Users can be made aware that this is the case, in which case both users and services benefit: Policies are more compact when stored (and read), even if they are larger when evaluated. We could imagine warning users, with a policy linter, if they stray into pathological territory.
  • In-lining on the fly, during evaluation, should help mitigate maximum in-memory size, in some cases at least.

Ultimately, I feel that the readability and reusability benefits of macros are significant; the increase in complexity to support them is small; and that we are failing to solve the problem posed by today's extension functions if we do not provide them. Yes, we cannot stop users from using them in unwise ways, but that's true in general. We can guide them with linting, documentation, etc. toward good solutions.

@emina (Contributor) commented Mar 25, 2024

Can you say more about how on-the-fly macro expansion would work?

The way I imagine it would work means we’d have to break the is_authorized API (among other breaking changes). So, is_authorized would have to take an extra environment argument, which is a binding from macro names to their bodies. The evaluator would use this environment to inline macro arguments into macro bodies on the fly. (Actually, I think the evaluator might have to expand the arguments before inlining them, for hygiene.)

In addition to breaking the API, we’d also have to extend the AST (and possibly break the EST), at least in Rust.

Doing macro expansion statically, prior to evaluation, avoids the above problems, but has the issue that constructing useful error messages becomes hard.

As for Lean, formalizing function calls (CBV or CBN) is possible but tricky. For example, it will be non-trivial to prove termination (because the size of the AST won’t be usable as a natural decreasing measure during recursive calls). If we want to formalize this, we’ll definitely want to keep the alternative formalization separate from the core semantics / validation / analysis formalization, because we don’t want to handle on-the-fly inlining in these more complicated components too.

So, we would need an additional concrete semantics for Cedar, with macros and CBN semantics, and a proof that this alternative semantics is equivalent to our existing semantics post-expansion.

This all adds a hefty engineering and maintenance burden. My concern continues to be that this burden, as well as the potential confusion / usability / performance issues, is not offset by the upsides of macros for Cedar, at least from what I've seen so far.

How to weigh these tradeoffs is, of course, subjective, and I realize that mine is a minority opinion :)

@max2me commented Mar 25, 2024

I struggle with the cognitive load of reading functions without type annotations. Having type annotations would allow the creation of tooling that makes functions easier to write and use.

Here is my naive take on questions raised regarding type annotations:

  1. Are type declarations allowed in policy sets, or just in schemas?

I would start out with just those in schemas.

  2. Are type annotations on functions enforced dynamically à la "contracts," or are they just ignored at runtime?

Similar to schemas, by default they could be ignored at runtime.

  3. Are type annotations required or optional?

I'd make them required at the beginning. We can relax the rule in the future.

  4. Will we have types to support generics, i.e., polymorphism?

I would start out without support for generics and wait for strong use cases needing them to emerge.

@aaronjeline (Contributor Author)

Are type annotations required or optional?

I'd make them required at the beginning. We can relax the rule in the future.

This raises two issues:

  1. This feels weird if you aren't using a schema at all
  2. This gives you no way to write polymorphic functions unless we extend the type annotations with parametric polymorphism (which seems like a lot)

@max2me commented Mar 25, 2024

Are type annotations required or optional?

I'd make them required at the beginning. We can relax the rule in the future.

This raises two issues:

  1. This feels weird if you aren't using a schema at all
  2. This gives you no way to write polymorphic functions unless we extend the type annotations with parametric polymorphism (which seems like a lot)

Except almost everyone should be using schemas. So.. :)

@mwhicks1 (Contributor)

Comments from Dan Wang (who doesn't have a GitHub account), plus my responses:

I would consider not calling this concept a macro or using the def syntax. Theorem provers have the concept of abbreviation, which is what this proposal smells more like. Lean has a similar concept. I find the call-by-name conversation a bit confusing too. To me this is just "expanding the parse tree".

Abbreviations in Twelf, per the link, look like type abbreviations, not term-level functions. They don't seem to have parameters. In Lean, an "abbreviation" seems to be a non-unicode way of specifying a unicode character.

It is call-by-name in the sense that you can really do the calls on the fly, rather than all in advance, as an evaluation strategy.

The fact that these cannot refer to other macros is a bit confusing. Their very second-class nature makes me want to make their invocation stand out, similar to Rust's "!" convention.

The double(double(...)) example shows that even without self-reference you get blowup, so why not allow macros to refer to other macros?

  1. Consider some name other than macro.
  2. Make the invocation of these things stand out (i.e., don't treat them like normal function calls, because they are not).
  3. Remove the self-reference restriction to make them more powerful.

It's a fair point that even without self-calls there is an opportunity for bad behavior, and thus we might relax that. Our thinking was that we can add internal calls later, if customers demand it. But maybe we should consider it now.

I assume that as a debugging aid there will be a mode to see the fully expanded view of the code.

That seems reasonable.

@emina (Contributor) commented Mar 25, 2024

Abbreviations in Twelf, per the link, look like type abbreviations, not term-level functions. They don't seem to have parameters. In Lean, an "abbreviation" seems to be a non-unicode way of specifying a unicode character.

Lean abbrev is more powerful than that. It can be used to define type abbreviations but it can also be used to define function-like macros. Something like:

abbrev foo (v : Nat) : Nat := 1 + v

@philhassey

Hi, I did some thinking here with a few folks, just some observations:

  1. This would begin a journey into having a fully featured Cedar "programming" language. I could imagine it would be very useful to be able to define constants and even types with macro-methods on them. At that point, I'd also want to be able to define packages, so I wouldn't always be populating a global namespace anytime I used these features.

  2. Related to 1, a Cedar policy is currently just a list of policy statements. This change would be a notable departure from that: a Cedar policy document would contain not only policy statements but also whatever macros, etc., appear alongside them. So a tradeoff is made in terms of understandability: you have to understand more language features, but your policy statements could become more terse.

  3. I could also see accomplishing all the same stuff through a non-Cedar templating language of choice. This would keep core Cedar just policy, and leave the meta-templating outside the scope of Cedar. The drawback being that you'd have to learn both Cedar and whatever your templating language is.

In any case, I really liked some of the comments / notes in the RFC such as:

  • Wanting to keep performance provable
  • Wanting to avoid user-made extensions because they could have non-deterministic results, which would break the ability to re-play a decision
  • The use of the word macro seems better than function for what these are

In the Go implementation, I think adding user-made extensions wouldn't be terribly difficult, but the concerns about breaking the analyzability / determinism give me pause to think.

@200sc commented Mar 25, 2024

Similar concerns are already present in comments, but:

  1. Is this problem not adequately solved by external templating today?
  2. Is this the right part of the Cedar ecosystem to insert macros into?
    2a. Given that this introduces the ability to define the same macro twice (presumably causing an error), without a packaging system macros become unusable once you reach a large enough mass of Cedar policy. Schemas already exist to contain these kinds of unique identifiers.
  3. Philosophically, is Cedar the place where users spend their time and define their logic (constants, packages and scopes, functions), or is Cedar a data format for the output of a higher-level language? Currently Cedar is trivial to understand because everything can be interpreted in the same way: forbid or permit according to these rules. From an auditing perspective Cedar is thus an ideal, if verbose, format for expressing exactly what is and is not allowed. Adding macros within Cedar itself adds a layer of indirection to Cedar's auditability; if Cedar is eventually going to become something like a Turing-complete programming language, that seems appropriate, but I'd appreciate an intentional decision stating that intended direction.

@emina (Contributor) commented Mar 26, 2024

Thank you @philhassey and @200sc for the comments!

My take is along the same lines: I want all the abstraction in a general purpose language :) but not necessarily in a DSL like Cedar, for the reasons mentioned in the discussion and in your comments.

To answer the last point raised by @200sc: Cedar won't become Turing complete. Analyzability is a key tenet of the design.

It is worth noting that macros won't compromise logical analyzability---that is, the ability to mechanically reason about policy behavior by reducing its semantics to a logical formula. But they can hurt understandability as much as they help, as all abstractions do.

In my view, the upsides of abstraction (terseness and expression reuse) are not worth the added complexity for policy writers, readers, services and applications, and implementors of Cedar.

@mwhicks1 (Contributor)

@emina @philhassey @200sc all express this sentiment: No abstractions in Cedar, which is basically a data format; abstractions can be in a general-purpose language.

I see the appeal of this position, but I fear that taking it fails to solve the problems that motivated this RFC.

Cedar is not just a data format. It was specifically designed so its policies can be authored, read, and understood by human beings. One reflection of this design is our use of a custom syntax rather than JSON (as with AWS IAM) or XML (as with XACML). Customers specifically tell us they appreciate the greater readability of Cedar’s syntax. Another reflection of this design is keeping Cedar’s mechanisms simple, which aids understanding by humans and analysis tools.

Cedar supports the “policies as code” methodology: authorization policies are written as code, in a language such as Cedar, and kept separate (as much as possible) from the application(s) that use those policies. Doing so makes policies easier to audit and maintain. As more authorization logic is kept in the application code, the less one can understand about the application’s security rules by just looking at the Cedar policies. And the harder it is to evolve those policies, since you need to evolve the application code, too.

Let’s look at the initial SemVer example. If we do nothing, then policies using SemVer are harder to write and understand. If we support macros as proposed, these tasks are made easier. The example I added yesterday is also much easier to read once you include macros. Conversely, it is easier to misunderstand or make mistakes without them. Users might like to address these difficulties. How should they do so?
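(For comparison, the macro version would read roughly like this: a sketch in the RFC's proposed def syntax, paraphrasing the C-preprocessor rendering below.)

def semver(?major, ?minor, ?patch)
  { major: ?major, minor: ?minor, patch: ?patch };

def semverGT(?lhs, ?rhs)
  if ?lhs.major == ?rhs.major then
    if ?lhs.minor == ?rhs.minor then
      ?lhs.patch > ?rhs.patch
    else
      ?lhs.minor > ?rhs.minor
  else
    ?lhs.major > ?rhs.major;

permit (principal, action == Action::"someAction", resource)
when {
    semverGT(resource.apiVersion, semver(2,1,0))
};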

One idea is to use a non-Cedar macro-supporting language. For example, I could store my Cedar policies in a text file with macro constructions from the C preprocessor:

#define semver(Major, Minor, Patch) \
  { major : Major, minor : Minor, patch : Patch }

#define semverGT(Lhs, Rhs) \ 
  if Lhs.major == Rhs.major then \
    if Lhs.minor == Rhs.minor then \
      Lhs.patch > Rhs.patch \
    else \
      Lhs.minor > Rhs.minor \
  else \
    Lhs.major > Rhs.major

permit (principal, action == Action::"someAction", resource)
when {
     semverGT(resource.apiVersion, semver(2,1,0))
};

This is not too bad. But it has some drawbacks:

  • The C macro language is tricky: it is not hygienic, and you have to remember to put everything on one line. Readers might misinterpret the effects of the macro system (see the sketch after this list). The proposed Cedar macro system is designed to be easy to understand, and harder to use incorrectly (e.g., by preventing non-termination or hygiene issues). Maybe there is an existing language that does the same.
  • You cannot validate, transform, analyze, or otherwise manipulate the policies directly; you have to do so with the post-processed policies. This adds complication, especially for policy transformations. Even without transformation, validation and analysis results might be less informative: they will be expressed in terms of the post-processed policies, not in terms of the macro abstractions, which might lead to duplication or other confusion. The Cedar macro system proposed here can help. Importantly, it is not adding any expressive power to Cedar, which means that it is not making validation or analysis complexity any worse.
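To illustrate the first drawback, a hypothetical one-liner (the precedence pitfall is the point, not this particular macro):

#define implies(A, B) !A || B

// implies(principal.isBanned, resource.isPublic) && resource.isActive
// splices, as text, to:
//     !principal.isBanned || resource.isPublic && resource.isActive
// which parses as:
//     !principal.isBanned || (resource.isPublic && resource.isActive)
// and not the intended ((!A || B) && C). A structural macro system
// substitutes into the parse tree, so a call always expands as one expression.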

I do agree with the concerns about managing macro definitions. Introducing them probably means thinking about a packaging system soon, if not immediately. We already have interest in packaging and sharing schema definitions (see RFC 58), so this would be a natural step. But I don’t feel that we need a package system before introducing macros. It can be a fast-follow.

I note that telling users to use their own templating system essentially punts the management question to them. They, too, might want to manage common abstractions by packages or names. Cedar’s reason for existence is to help users avoid tricky details about managing their authorization posture. Surfacing the need for package/definition management in Cedar means that it becomes easier for users to collaborate on building policy abstractions that others can use.

As for moving Cedar towards a general-purpose language: As @emina says, we want to keep Cedar analyzable, which means there is a firm ceiling over how general-purpose it can get.

@philhassey

Hi @mwhicks1 - just to clarify my statement:

I could imagine it would be very useful to be able to define constants and even types with macro-methods on them. At that point, I'd also want to be able to define packages, so I wouldn't always be populating a global namespace anytime I used these features.

I'm not objecting in general, but I am saying "if we do this, we'll want a few more things". Which relates well to your comment:

But I don’t feel that we need a package system before introducing macros. It can be a fast-follow.

I feel the same way about adding in constants, and types with macro-methods. At which point Cedar would be a much richer language. I do think it's reasonable at this point to consider if that's where we want to end up though. I'm not sure.

@aaronjeline (Contributor Author)

Responding to various comments:
From @philhassey

In the Go implementation, I think adding user-made extensions wouldn't be terribly difficult, but the concerns about breaking the analyzability / determinism give me pause to think.

Even worse, it's another step where we need to make sure we have the exact same behavior between the Rust and the Go, since they won't be able to share this code easily. While this is also true of the macro system, it's not true of each individual macro.

From @200sc

Is this problem not adequately solved by external templating today?

No: (a) template holes can only be ?principal and ?resource, and (b) even if we relax that restriction, template holes can't take arguments.

Given that this introduces the ability to define the same macro twice (presumably causing an error), without a packaging system macros become unusable once you reach a large enough mass of Cedar policy.

Not disagreeing with the general point; however, the proposal as stated now does allow for namespacing your macros (see the sketch below).
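A hedged sketch of what that might look like, assuming macro names can use Cedar's existing Namespace::name convention (the concrete syntax here is illustrative, not quoted from the RFC):

def PaymentsTeam::isOwner(?p, ?r) ?r.owner == ?p;

permit(principal, action, resource) when {
    PaymentsTeam::isOwner(principal, resource)
};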


Of course, these drawbacks do not necessarily speak against Cedar functions generally, but suggest that for suitably general use-cases (like decimal numbers!), an extension function might be warranted instead.

### Hidden performance costs
Contributor

To me, this (together with the general principle of avoiding redundant language features) is the most concerning part of this RFC. Since there are reasonable mitigations (e.g., the vscode plugin can display the expanded policy), I think this is ok if there is a sufficient level of interest in macros (let's say 5 independent users asking for the feature).

So far, I don't think we've reached this level. So I'm not opposed to this RFC, but I'd vote against doing it now.

@D-McAdams (Contributor)

Adding my two cents: I'm inclined to "no".

It puts a huge burden on services built around Cedar. To keep up with Cedar releases, these parties need to schedule a large engineering project for storing these fragments and utilizing them efficiently. If they don't, they fall behind the language treadmill. Or, we end up with a bunch of opt-in features for Cedar such that when someone says they are "Cedar compliant", nobody knows what that means until they read the matrix of opt-in features within the service documentation.

We also have audiences who are already concerned that Cedar may be growing too complex when their customers are trying to achieve simple things. Evolving Cedar toward something that starts looking like a full-fledged programming language is a slippery slope that runs a real risk of turning away these audiences.

I'm sympathetic to the goals, and agree with other comments that audiences could use their favorite standard templating language and preprocess the Cedar themselves. And I wouldn't be averse to vendors supporting this feature on their own timelines, as an extension. Or even to Cedar suggesting a common approach to this.

I also noted this discussion began with a request for a "SemVer" extension, which I'm also sympathetic to. Somewhat ironically, I'm wondering if it would have been faster to just implement "SemVer"!

@aaronjeline (Contributor Author) commented Mar 28, 2024

to keep up with Cedar releases, these parties need to schedule a large engineering project for storing these fragments and utilizing them efficiently

This is a good point, and suggests what others are saying about needing a higher bar for requests from users

I'm sympathetic to the goals, and agree with other comments that audiences could use their favorite standard templating language and preprocess the Cedar themselves. And, I wouldn't be adverse to vendors supporting this feature on their own timelines, as an extension

I disagree with this for two reasons:

  1. Having each service implement its own templating feature means they won't be compatible; you can't easily move between them. If people come to expect a templating feature, then you still have to solve all the service-oriented problems around storing/retrieving functions. In the worst case, we evolve towards an ad-hoc, poorly specified standard instead of an intentionally thought-out one (regardless of whether it's this RFC or another shot at it). It won't be able to integrate with features like Cedar namespaces or a potential future module system, so that will also be left for service owners to figure out ad hoc.

  2. Text-based templates are fundamentally the wrong tool for working with Cedar policies. Cedar policies are not linear sequences of characters; they are trees. A generic templating tool such as the C preprocessor is not able to reason about the actual structures users are manipulating, just their text-based shadow. In the current proposal, for example, every macro invocation is guaranteed to produce a valid Cedar expression, and it's fairly trivial to extend the current proposal to allow for type checking of macro functions in isolation. These two things are impossible in the text-based case (see the sketch below). If we worry about the cognitive complexity of reading policies with macro invocations (a fair concern!), then the overhead of reading policies with arbitrary text substitutions is going to be worse and easier to mess up. This is why we have features like policy templates and the JSON EST.
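To make the contrast concrete, a sketch (hypothetical fragments): the C preprocessor happily defines a non-expression, while a structural macro system can reject the definition itself.

// The C preprocessor accepts a body that is not a complete expression:
#define leftHalf(A) (A ||

// The call site can then supply the missing half:
//     leftHalf(principal.isAdmin) resource.isPublic)
// splices into valid text only by luck; nothing ever checked the fragment.

// Under this RFC, a def's body must itself parse as a Cedar expression,
// so the analogous definition is rejected before any policy uses it:
def leftHalf(?a) (?a ||    // parse error: incomplete expression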

for Cedar to suggest a common approach to this.

I think if we want to suggest a common approach to this, it should be something like this RFC.

@mwhicks1 (Contributor)

I also noted this discussion began with a request for a "SemVer" extension, which I'm also sympathetic to. Somewhat ironically, I'm wondering if it would have been faster to just implement "SemVer"!

To address this last point: We didn't just implement SemVer because doing so has maintenance costs which don't scale well. Each extension function we add increases the testing, validation, analysis, etc. burden on core Cedar. Instead, we wanted to explore a single mechanism that addresses them all in one fell swoop. This is more scalable for the core Cedar team, and gives more agency to Cedar users.

I am still bullish that what's proposed here is the right thing to do. But I also appreciate the words of caution. The right course is to gather more information about uses/needs, and to think about open questions like packaging/management of macro definitions (which still aren't solved by the current extension function mechanism, even if we thought that was the right approach instead).

One thought: Could we add Cedar macros as an experimental feature, and let customers use it and report back?

@cdisselkoen (Contributor)

Chiming in here to substantially agree with @aaronjeline and @mwhicks1, particularly their most recent two comments each. I think it would be a big mistake to encourage folks to use "standard" templating languages on top of Cedar instead of building something like this RFC, which can enforce Cedar macro hygiene, syntax checking, and so on. To the extent that one of the concerns raised in comments here is that this feature increases the cognitive complexity of Cedar: using a general-purpose templating language on top of Cedar would be even worse in terms of cognitive complexity, and wouldn't integrate with Cedar validation and analysis like the macros in this proposal would.

@emina (Contributor) commented Mar 29, 2024

wouldn't integrate with Cedar validation and analysis, like the macros in this proposal would.

Can you say what you mean by integration?

If an external templating system simply expands whatever the user writes into a Cedar policy, the analysis will handle the expanded policy fine and so will validation.

It’s worth noting that custom macros don’t provide any advantage for analysis, which would have to inline every invocation of the macro in order to capture its semantics at the invocation point. (That’s how we handle extension functions too—the definition is inlined at the call site, after appropriate sequencing of argument evaluation to account for errors.)

In this proposal specifically, macros aren’t helping with validation either, because they aren’t typed and cannot be type checked independently of call sites. We could consider writing a type inference algorithm that infers types for the polymorphic macros, but that’s another big endeavor, source of complexity, and engineering burden.

An advantage that this proposal has over an external templating system is the potential for better error messages, because a custom macro system (like this one) could propagate source location information through expansion. (And maybe this is what integration is referring to in the parent comment?)

Still, I strongly urge caution here, and specifically, not underestimating the burden that this feature imposes on users, services, and language implementers.

If I had to take a guess at the implementation and maintenance effort for Cedar alone, I would say we’re looking at an additional ~10K LOC, including the Rust implementation (with proper expansion tracking, error messages, etc), Lean model and proof (by having a parallel semantics with CBN and proving it equivalent to the core semantics after expansion), and diff testing.

We need to be really sure that this is the right direction for Cedar before proceeding, and not just because of the engineering effort.

In my view, this is not the right direction from a language design point of view either. As several other commenters noted, this would push Cedar toward looking more like a general-purpose language. And that’s a slippery slope, which inevitably leads to more complexity and feature creep.

@aaronjeline (Contributor Author) commented Mar 29, 2024

Can you say what you mean by integration?

Syntax highlighting, autocomplete, hover tooltips, potential for type annotations, using the Cedar namespace system, using whatever stdlib mechanism we may decide on, checking that a macro is a valid expression, rendering the macro as an EST, runtime errors that track proper source locations, type-checking errors that track proper source locations: anything that requires parsing or reasoning about the macro in isolation becomes impossible with the "generic template" answer.

As several other commenters noted, this would push Cedar toward looking more like a general-purpose language. And that’s a slippery slope, which inevitably leads to more complexity and feature creep.

This feels like a very ill-defined argument. We still have the guardrail of analyzability preventing us from expanding too far. This adds no expressive power to the language. Templates didn't draw this criticism; were we wrong not to raise it there? Action groups likewise. Surely opening up the extension mechanism so users can easily share extensions would hit this bar. I'm not saying it's an incorrect argument; it just feels vague.

I think there are three arguments happening here, and I'll try to distill them.

  1. This feature is both a large implementation lift and a large conceptual addition to the language. We have not met the bar of evidence that users want this feature to even begin to consider it.

This is fine. I may personally disagree, but I'm ok with declining for now and seeing if more evidence accumulates. If that evidence ends up accumulating, we could potentially revive this RFC or consider a different mechanism for the same goal. If it doesn't, we won't.

  2. We think users will want this feature, but we don't want to own it (for the reasons above), so we should recommend they use an external template system.

This feels very wrong to me, for all the reasons I've described above. I'll add one more here: if a macro system within Cedar brings it too close to a general-purpose language, why does a recommended macro system outside of Cedar not do that? Again, operationally, you have all the same concerns. Semantically/readability-wise, you have all the same concerns, except with fewer guardrails. Standards-wise it's a mess. To a person writing/reading policies as per our recommendations, it's equivalent complexity. (It's probably definitionally more complexity, as these macros would be text macros.) If we know people want this feature, and the feature is feasible, then it feels weird to recommend what is essentially a workaround. Is there an existing template system we can point to that we think is equivalently readable?

  3. Regardless of whether people want this feature, we shouldn't do it because we believe abstraction mechanisms like this one don't belong in Cedar. In this world, we also don't recommend using external templating systems, because we think reading/writing Cedar policies in terms of a context of non-locally-defined substitutions is not a good idea.

Obviously I disagree (I wrote the RFC :D ), but there's probably more to talk about here.

If I've misstated anyone's position, please correct me.

@max2me commented Mar 29, 2024

The more I think about this feature, the more torn I become. I see the value it provides in streamlining policies and reducing duplication. On the other hand, introducing brand-new standalone concepts like these macros feels a bit heavy-handed, with implications for integrated services.

May I share an idea to get some feedback on it?

Extension methods in C# inspired this approach. Basically, we'd allow users to define their own type extensions directly in the schema. The SemVer example could then look like this:

// Schema declaration
type SemVer = {
	major: Long,
	minor: Long,
	patch: Long,
	greaterThan(?other: SemVer): {
		if this.major == ?other.major then 
			if this.minor == ?other.minor then
				this.patch > ?other.patch
			else
				this.minor > ?other.minor
		else
			this.major > ?other.major
	}
}

// Use in policy (assuming resource.apiVersion: SemVer)
// Permit access when the api is greater than 2.1
permit (principal, action == Action::"someAction", resource)
when {
    resource.apiVersion.greaterThan({ major: 2, minor: 1, patch: 0 })
};

I understand that this goes against one of our tenets that schema is not supposed to affect authorization outcome. But I also feel like we've taken incremental steps to relax that stance with schema-based parsing of the request data and schema-based validation of the request.

@emina (Contributor) commented Mar 29, 2024

Thanks @aaronjeline for summarizing the three arguments! That's really helpful. I'm in camps (1) and (3), and below, I'll try to explain why this RFC is triggering my spidey sense, with the full acknowledgement that this opinion is biased (informed? :D) by my own programming experience (Racket, Lean, and a long time ago, Java) and design preferences (minimalist).

  1. This feature is both a large implementation lift and a large conceptual addition to the language. We have not met the bar of evidence that users want this feature to even begin to consider it.

I believe this is true. We haven't had any users (that I know of) ask for this feature. We have had many users ask for all other features that have required this level of engineering effort (e.g., partial evaluation).

We have also not considered a feature that departs this much from the original language design, which focuses on simplicity and "what you see is what you get" behavior. This focus is lost with no gain in expressiveness and with minor (in my eyes) gain in convenience. It is worth noting that this RFC doesn't solve the problem that originally inspired it, SemVer comparison, because the original ask was for the comparison function to provide a similar API to decimals and IP addresses, which requires writing an extension function to parse SemVer strings.

The effects of supporting this feature are wide-ranging and global. This feature is not only an engineering lift for us, but also for all applications and services that use Cedar. We would be forcing them to deal with macro storage, management, distribution, and updates---or risk falling behind the language treadmill as @D-McAdams's comment explains. Services and applications are our main customers, even more so than individual policy writers, and I would want to see at least 5 services / applications actively advocate for this feature before adding it to the language (borrowing 5 from @andrewmwells-amazon's comment).

  3. Regardless of whether people want this feature, we shouldn't do it because we believe abstraction mechanisms like this one don't belong in Cedar. In this world, we also don't recommend using external templating systems, because we think reading/writing Cedar policies in terms of a context of non-locally-defined substitutions is not a good idea.

I believe this is true as well, and the following is going to stray a bit into subjective preferences and design philosophy, so please bear with me :)

Cedar is not a general-purpose language, and it was designed explicitly to be analyzable and simple. The current proposal is to add a feature to the language that is extremely limited (for good reasons), and not really capable of letting you build real abstractions---which are built through composition. Cedar macros cannot call other macros (for good reasons). They are not real CBV functions (for good reasons), which is what most programmers expect from functions. They are also a very weak syntactic abstraction (for good reasons), because they are lacking all the features you find in powerful macro systems (such as Racket or Lean). In other words, they land us in the uncanny valley of language design (in my eyes), where Cedar looks sort-of general purpose without actually being so. This then creates the expectation of even more features you find in a general purpose language (as noted by @philhassey): global constants, module system, packaging system, etc. And that lands us in a state where Cedar is not beautiful (again, in my eyes) either through simplicity or power.

I cannot think of any popular, simple, standalone DSLs that have things like macros, modules, packages, etc. (I can think of some that do have these features but are considered complex, and are well out of bounds of what is analyzable.) This is likely my own blind spot, and I'd be super curious if someone can point me at some DSLs where this has been done well, where it's considered simple and ergonomic, and that we should emulate :)

Cedar is not used like most programming languages. Most programming languages are file-oriented; people write a single conceptual unit (application) that is thousands of lines of interconnected code, and in that world, things like modules and abstractions are absolutely essential. While some people may use Cedar by putting all policies in a single file, that is not the case for a large number of users who interact with Cedar through a service or an application. Here, we see two modes of use: a (relatively!) small number of static policies, or a small number of templates with thousands of instantiations. In both cases, the actual logical unit (the authz model) is not thousands of lines of hand-written Cedar code, and in my view, macros or other heavyweight abstraction mechanisms are not needed to build and maintain this logical unit.

Finally, thank you all for a really great discussion. I've found it immensely helpful in crystallizing my own understanding of my gut reaction :))

@sarahcec sarahcec added archived This RFC is pending, but no action is currently planned -- please comment if you are interested and removed pending This RFC is pending; for definitions see the README labels May 14, 2024