EBNF grammar for the text format #82

Closed
Geal opened this issue Oct 14, 2021 · 8 comments

Geal commented Oct 14, 2021

While the binary format and Datalog engines can handle fact names or strings with various kinds of characters, the text format should specify what it accepts. More generally, we need a grammar for the entire language, to make sure all implementations parse the same way.

For now, let's specify this:

  • fact and rule names begin with a-zA-Z, then the rest of accepted characters are a-zA-Z0-9_:
  • variable names begin with a $, then the rest of accepted characters are a-zA-Z0-9_
  • strings contain UTF-8 characters, without BOM
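
The three rules above can be sketched as small checks. This is only an illustration, not code from any Biscuit implementation: the function names are hypothetical, a variable is assumed to need at least one character after the `$`, and "without BOM" is read as "does not start with U+FEFF".

```python
import re

# Patterns transcribed from the three bullet points above.
# All names here are hypothetical, chosen for this sketch only.
NAME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_:]*")
VARIABLE_RE = re.compile(r"\$[a-zA-Z0-9_]+")  # assumption: "$" alone is rejected
BOM = "\ufeff"


def is_name(text):
    """Fact and rule names: a letter, then letters, digits, '_' or ':'."""
    return NAME_RE.fullmatch(text) is not None


def is_variable(text):
    """Variable names: '$', then letters, digits or '_'."""
    return VARIABLE_RE.fullmatch(text) is not None


def is_string_content(raw):
    """String contents: valid UTF-8 bytes that do not start with a BOM."""
    try:
        decoded = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not decoded.startswith(BOM)
```

For example, `is_name("user:id_1")` and `is_variable("$0")` hold, while `is_name("1right")` and `is_variable("$a:b")` do not.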

Geal commented Oct 15, 2021

Here's a first draft:

<elements> ::= (<element> | <comment> )*
<element> ::= <sp>? ( <policy> | <check> | <fact> | <rule> ) <sp>? ";" <sp>?
<comment> ::= "//" ([a-z] | [A-Z] ) ([a-z] | [A-Z] | [0-9] | "_" | ":" | " " | "\t" | "(" | ")" | "$" | "[" | "]" )* "\n"

<fact> ::= <name> "(" <sp>? <fact_term> (<sp>? "," <sp>? <fact_term> )* <sp>? ")"
<rule> ::= <predicate> <sp>? "<-" <sp>? <rule_body>
<check> ::= "check" <sp> "if" <sp> <rule_body>
<policy> ::= ("allow" | "deny") <sp> "if" <sp> <rule_body>

<rule_body> ::= <rule_body_element> <sp>? ("," <sp>? <rule_body_element> <sp>?)*
<rule_body_element> ::= <predicate> | <expression>

<predicate> ::= <name> "(" <sp>? <term> (<sp>? "," <sp>? <term> )* <sp>? ")"
<name> ::= ([a-z] | [A-Z] ) ([a-z] | [A-Z] | [0-9] | "_" | ":" )*
<term> ::= <fact_term> | <variable>
<fact_term> ::= <boolean> | <string> | <number> | <bytes> | <date> | <set>


<string> ::= "\"" ([a-z] | [A-Z] | [0-9] | "\\" | "?" | "." | "*" | "_" | " " )* "\""
<number> ::= [0-9]+
<variable> ::= "$" ([a-z] | [A-Z] | [0-9] ) ([a-z] | [A-Z] | [0-9] | "_" )*
<bytes> ::= "hex:" ([a-z] | [0-9] )+
<boolean> ::= "true" | "false"
<date> ::= [0-9]* "-" [0-9] [0-9] "-" [0-9] [0-9] "T" [0-9] [0-9] ":" [0-9] [0-9] ":" [0-9] [0-9] ( "Z" | ( "+" [0-9] [0-9] ":" [0-9] [0-9] ))
<set> ::= "[" <sp>? ( <fact_term> ( <sp>? "," <sp>? <fact_term>)* <sp>? )? "]"

<expression> ::= <expression_element> (<sp>? <operator> <sp>? <expression_element>)*
<expression_element> ::= <expression_unary> | (<expression_term> <expression_method>? ) 
<expression_unary> ::= "!" <sp>? <expression>
<expression_method> ::= "." <method_name> "(" <sp>? (<term> ( <sp>? "," <sp>? <term>)* )? <sp>? ")" 
<method_name> ::= ([a-z] | [A-Z] ) ([a-z] | [A-Z] | [0-9] | "_" )*

<expression_term> ::= <term> | ("(" <sp>? <expression> <sp>? ")")
<operator> ::= "<" | ">" | "<=" | ">=" | "==" | "&&" | "||" | "+" | "-" | "*" | "/" 

<sp> ::= (" " | "\t" | "\n")+

It can be tested online with this test code:

right("file1", "read");
check if resource($0), operation("read"), right($0, "read");

right($0, "read") <- resource($0), user_id($1), owner($1, $0);

check if time(2018-12-20T00:00:00+00:00);
allow if true;
deny if false;
check if 1 <= 1;
check if 1 + 2 * 3 - 4 / 2 == 5;
check if "aaabde".matches("a*c?.e");
check if "hello world".starts_with("hello") && "hello world".ends_with("world");
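
For trying individual terms without an online tool, the terminal rules of the draft can be transcribed into regular expressions. The sketch below is only that, a transcription of `<date>`, `<boolean>`, `<bytes>`, `<number>`, `<variable>` and `<string>` as written in the draft; `TERMINALS` and `classify` are hypothetical names, and sets and full expressions are not covered.

```python
import re

# Regular expressions transcribed from the draft's terminal rules.
# TERMINALS and classify() are illustrative names, not a Biscuit API.
TERMINALS = [
    ("date", r"[0-9]*-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
             r"(?:Z|\+[0-9]{2}:[0-9]{2})"),
    ("boolean", r"true|false"),
    ("bytes", r"hex:[a-z0-9]+"),
    ("number", r"[0-9]+"),
    ("variable", r"\$[a-zA-Z0-9][a-zA-Z0-9_]*"),
    ("string", r'"[a-zA-Z0-9\\?.*_ ]*"'),
]


def classify(text):
    """Return the name of the first terminal rule matching the whole text."""
    for rule_name, pattern in TERMINALS:
        if re.fullmatch(pattern, text):
            return rule_name
    return None
```

With the test code above, `classify("2018-12-20T00:00:00+00:00")` gives `"date"` and `classify('"a*c?.e"')` gives `"string"`. Two properties of the draft show up quickly this way: a string containing a non-ASCII letter such as `"héllo"` is not matched by any rule, and because the year is written `[0-9]*`, a date with an empty year such as `-12-20T00:00:00Z` is accepted.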


Geal commented Oct 19, 2021

I guess the last roadblock here is defining which characters are accepted in strings. Since this is a grammar for a programming language, not a serialization format, I guess we can accept any printable character, including UTF-8 characters?

divarvel commented

Looks good! That's an important part.

I have a couple questions:

  • Unicode letters are not allowed anymore in variables and fact names (compared to the current Rust implementation). Is that something we want, or a constraint from EBNF?
  • Would it make sense to allow : in variable names as well?

divarvel added a commit to biscuit-auth/biscuit-haskell that referenced this issue Oct 21, 2021
The EBNF grammar defined in biscuit-auth/biscuit#82
makes the parsing rules a bit more explicit:

- fact names can only start with a letter
- variable names can't contain a colon
- fact and variable names cannot contain non-ASCII letters or numbers

Geal commented Oct 21, 2021

I think we can allow Unicode letters in variables, excluding space characters and $.,()[]. I guess we can allow : too.
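
One possible reading of this relaxed rule, sketched as a check: after the `$`, every character must be a Unicode letter or digit, `_`, or `:`. That reading is an assumption (the function name is hypothetical too). It excludes whitespace and `$.,()[]` by construction, but it is stricter than "anything except those characters".

```python
def is_unicode_variable(text):
    """'$' followed by one or more Unicode letters/digits, '_' or ':'.

    Assumption: str.isalnum() is used as the definition of a Unicode
    letter or digit; whitespace and $.,()[] fail it and are rejected.
    """
    if not text.startswith("$") or len(text) < 2:
        return False
    return all(ch.isalnum() or ch in "_:" for ch in text[1:])
```

Under this reading `$café` and `$a:b` are variables, while `$a b`, `$a.b` and `$a[0]` are not.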


divarvel commented Oct 22, 2021

I think we should have:

<block> ::= (<block_element> | <comment> )*
<block_element> ::= <sp>? ( <check> | <fact> | <rule> ) <sp>? ";" <sp>?
<authorizer> ::= (<authorizer_element> | <comment> )*
<authorizer_element> ::= <sp>? ( <policy> | <check> | <fact> | <rule> ) <sp>? ";" <sp>?

as blocks and authorizers don't appear in the same context
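
The practical effect of the split is that a policy (`allow if` / `deny if`) is only valid in an authorizer. A rough sketch of that check follows. It is not a real parser: it splits on `;`, guesses the element kind from leading keywords, and ignores comments; `ALLOWED`, `element_kind` and `rejected_elements` are hypothetical names.

```python
import re

# Element kinds accepted in each context, following the two rules above.
ALLOWED = {
    "block": {"check", "fact", "rule"},
    "authorizer": {"policy", "check", "fact", "rule"},
}


def element_kind(element):
    """Guess the kind of one element from its leading keywords (heuristic)."""
    text = element.strip()
    if re.match(r"(allow|deny)\s+if\s", text):
        return "policy"
    if re.match(r"check\s+if\s", text):
        return "check"
    if "<-" in text:
        return "rule"
    return "fact"


def rejected_elements(source, context):
    """Return the elements of source that are not allowed in context."""
    elements = [e.strip() for e in source.split(";") if e.strip()]
    return [e for e in elements if element_kind(e) not in ALLOWED[context]]
```

Given `right("file1", "read"); check if resource($0); allow if true;`, the block context rejects only `allow if true`, and the authorizer context rejects nothing.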


Geal commented Oct 29, 2021

Right, that makes sense.


Geal commented Oct 29, 2021

The current version is at cbc7aac. I think we'll need to make it more precise in the future.

Geal mentioned this issue Oct 29, 2021
divarvel added a commit to biscuit-auth/biscuit-haskell that referenced this issue Oct 31, 2021
The EBNF grammar defined in biscuit-auth/biscuit#82
makes the parsing rules a bit more explicit:

- fact names can only start with a letter

Geal commented Feb 25, 2022

Closing this because v2 has shipped.

Geal closed this as completed Feb 25, 2022