EBNF grammar for the text format #82

Geal · 2021-10-14T20:26:20Z

While the binary format and Datalog engines can handle fact names or strings with various kinds of characters, the text format should specify what it accepts. More generally, we need a grammar for the entire language, to make sure all implementations parse the same way.

For now, let's specify this:

fact and rule names begin with `a-zA-Z`, then the rest of accepted characters are `a-zA-Z0-9_:`
variable names begin with a `$`, then the rest of accepted characters are `a-zA-Z0-9_`
strings contain UTF-8 characters, without BOM
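
The naming rules above can be sketched as regular expressions. This is only an illustration: the helper names `is_name` and `is_variable` are hypothetical, and requiring at least one character after the `$` is an assumption, since the rules above do not say whether a bare `$` is valid.

```python
import re

# Fact and rule names: one ASCII letter, then letters, digits, "_" or ":".
NAME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_:]*")

# Variable names: "$", then letters, digits or "_".
# Assumption: at least one character must follow the "$".
VARIABLE_RE = re.compile(r"\$[a-zA-Z0-9_]+")


def is_name(text: str) -> bool:
    """Return True if the whole text is a valid fact or rule name."""
    return NAME_RE.fullmatch(text) is not None


def is_variable(text: str) -> bool:
    """Return True if the whole text is a valid variable name."""
    return VARIABLE_RE.fullmatch(text) is not None
```

With these definitions, `is_name("user:id")` and `is_variable("$0")` hold, while `is_name("1right")` and `is_variable("$a:b")` do not.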


Geal · 2021-10-15T21:37:26Z

Here's a first draft:

<elements> ::= (<element> | <comment> )*
<element> ::= <sp>? ( <policy> | <check> | <fact> | <rule> ) <sp>? ";" <sp>?
<comment> ::= "//" ([a-z] | [A-Z] ) ([a-z] | [A-Z] | [0-9] | "_" | ":" | " " | "\t" | "(" | ")" | "$" | "[" | "]" )* "\n"

<fact> ::= <name> "(" <sp>? <fact_term> (<sp>? "," <sp>? <fact_term> )* <sp>? ")"
<rule> ::= <predicate> <sp>? "<-" <sp>? <rule_body>
<check> ::= "check" <sp> "if" <sp> <rule_body>
<policy> ::= ("allow" | "deny") <sp> "if" <sp> <rule_body>

<rule_body> ::= <rule_body_element> <sp>? ("," <sp>? <rule_body_element> <sp>?)*
<rule_body_element> ::= <predicate> | <expression>

<predicate> ::= <name> "(" <sp>? <term> (<sp>? "," <sp>? <term> )* <sp>? ")"
<name> ::= ([a-z] | [A-Z] ) ([a-z] | [A-Z] | [0-9] | "_" | ":" )*
<term> ::= <fact_term> | <variable>
<fact_term> ::= <boolean> | <string> | <number> | <bytes> | <date> | <set>


<string> ::= "\"" ([a-z] | [A-Z] | [0-9] | "\\" | "?" | "." | "*" | "_" | " " )* "\""
<number> ::= [0-9]+
<variable> ::= "$" ([a-z] | [A-Z] | [0-9] ) ([a-z] | [A-Z] | [0-9] | "_" )*
<bytes> ::= "hex:" ([a-z] | [0-9] )+
<boolean> ::= "true" | "false"
<date> ::= [0-9]* "-" [0-9] [0-9] "-" [0-9] [0-9] "T" [0-9] [0-9] ":" [0-9] [0-9] ":" [0-9] [0-9] ( "Z" | ( "+" [0-9] [0-9] ":" [0-9] [0-9] ))
<set> ::= "[" <sp>? ( <fact_term> ( <sp>? "," <sp>? <fact_term>)* <sp>? )? "]"

<expression> ::= <expression_element> (<sp>? <operator> <sp>? <expression_element>)*
<expression_element> ::= <expression_unary> | (<expression_term> <expression_method>? ) 
<expression_unary> ::= "!" <sp>? <expression>
<expression_method> ::= "." <method_name> "(" <sp>? (<term> ( <sp>? "," <sp>? <term>)* )? <sp>? ")" 
<method_name> ::= ([a-z] | [A-Z] ) ([a-z] | [A-Z] | [0-9] | "_" )*

<expression_term> ::= <term> | ("(" <sp>? <expression> <sp>? ")")
<operator> ::= "<" | ">" | "<=" | ">=" | "==" | "&&" | "||" | "+" | "-" | "*" | "/" 

<sp> ::= (" " | "\t" | "\n")+

It can be tested online with this test code:

right("file1", "read");
check if resource($0), operation("read"), right($0, "read");

right($0, "read") <- resource($0), user_id($1), owner($1, $0);

check if time(2018-12-20T00:00:00+00:00);
allow if true;
deny if false;
check if 1 <= 1;
check if 1 + 2 * 3 - 4 / 2 == 5;
check if "aaabde".matches("a*c?.e");
check if "hello world".starts_with("hello") && "hello world".ends_with("world");
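
For a quick local check without the online tool, the terminal rules of the draft can be transcribed into regular expressions. This is a minimal sketch, not a parser: it only covers the terminals, and the `TERMINALS` table and `matches` helper are hypothetical names introduced for illustration.

```python
import re

# Terminal rules from the draft, transcribed one-to-one into regular expressions.
TERMINALS = {
    "name": r"[a-zA-Z][a-zA-Z0-9_:]*",
    "variable": r"\$[a-zA-Z0-9][a-zA-Z0-9_]*",
    "string": r'"[a-zA-Z0-9\\?.*_ ]*"',
    "number": r"[0-9]+",
    "bytes": r"hex:[a-z0-9]+",
    "boolean": r"true|false",
    "date": (
        r"[0-9]*-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
        r"(Z|\+[0-9]{2}:[0-9]{2})"
    ),
}


def matches(rule: str, token: str) -> bool:
    """Return True if the whole token is derivable from the named terminal rule."""
    return re.fullmatch(TERMINALS[rule], token) is not None


if __name__ == "__main__":
    # Tokens taken from the test code above.
    samples = [
        ("name", "right"),
        ("variable", "$0"),
        ("string", '"a*c?.e"'),
        ("date", "2018-12-20T00:00:00+00:00"),
        ("boolean", "true"),
    ]
    for rule, token in samples:
        print(rule, token, matches(rule, token))
```

All tokens from the test code match their rule. A string such as `"hello, world"` is rejected, because the comma is not in the character class of the draft `<string>` rule.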

Geal · 2021-10-19T20:54:23Z

I guess the last roadblock here is defining which characters are accepted in strings. Since this is a grammar for a programming language, not a serialization format, I guess we can accept any printable character, including UTF-8 chars?
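
One possible reading of "any printable character" is sketched below, using Unicode printability as reported by Python's `str.isprintable`. The helper name `is_printable_body` is hypothetical, and the sketch deliberately says nothing about how quotes or backslashes inside a string would be escaped, since that is not settled here.

```python
def is_printable_body(body: str) -> bool:
    """Return True if every character of the string body is printable.

    Printability follows Python's Unicode-aware str.isprintable(): the
    ASCII space counts as printable, control characters such as tab or
    newline do not.
    """
    return all(ch.isprintable() for ch in body)
```

Under this reading, `is_printable_body("héllo wörld")` holds, while a body containing a raw tab or newline is rejected.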

divarvel · 2021-10-20T13:48:04Z

Looks good! That's an important part.

I have a couple questions:

unicode letters are not allowed anymore in variables and fact names (compared to the current rust impl). Is that something we want, or a constraint from ebnf?
would it make sense to authorize : in variable names as well?

The EBNF grammar defined in biscuit-auth/biscuit#82 makes the parsing rules a bit more explicit:
- fact names can only start with a letter
- variable names can't contain a colon
- fact and variable names cannot contain non-ASCII letters or numbers

Geal · 2021-10-21T20:41:28Z

I think we can allow unicode letters in variables, except space characters and $.,()[]. I guess we can authorize : too
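
As a rough sketch of the relaxed variable rule, the pattern below accepts a `$` followed by any characters other than whitespace and `$.,()[]`. This is one reading of the proposal, not the final grammar: it also lets through characters that are neither letters nor digits, such as `;`. The helper name `is_relaxed_variable` is hypothetical.

```python
import re

# "$", then one or more characters that are neither whitespace nor one of
# $ . , ( ) [ ]. The colon is therefore accepted, as are non-ASCII letters.
RELAXED_VARIABLE_RE = re.compile(r"\$[^\s$.,()\[\]]+")


def is_relaxed_variable(text: str) -> bool:
    """Return True if the whole text is a variable under the relaxed rule."""
    return RELAXED_VARIABLE_RE.fullmatch(text) is not None
```

For example, `is_relaxed_variable("$résumé")` and `is_relaxed_variable("$user:id")` hold, while `is_relaxed_variable("$a,b")` does not.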

divarvel · 2021-10-22T07:55:09Z

I think we should have:

<block> ::= (<block_element> | <comment> )*
<block_element> ::= <sp>? ( <check> | <fact> | <rule> ) <sp>? ";" <sp>?
<authorizer> ::= (<authorizer_element> | <comment> )*
<authorizer_element> ::= <sp>? ( <policy> | <check> | <fact> | <rule> ) <sp>? ";" <sp>?

as blocks and authorizers don't appear in the same context
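
The split can be illustrated with a small check of which element kinds each context accepts. This is a crude sketch: `element_kind` classifies a statement by its leading keyword or the presence of `<-` rather than by parsing it, and all names below are hypothetical.

```python
import re

# Element kinds accepted in each context, following the two rules above.
BLOCK_ELEMENTS = frozenset({"check", "fact", "rule"})
AUTHORIZER_ELEMENTS = BLOCK_ELEMENTS | {"policy"}
ALLOWED = {"block": BLOCK_ELEMENTS, "authorizer": AUTHORIZER_ELEMENTS}

# Same whitespace set as <sp> in the draft grammar.
SP = r"[ \t\n]+"


def element_kind(statement: str) -> str:
    """Classify a statement (without its trailing ';') by its shape."""
    text = statement.strip(" \t\n")
    if re.match(rf"(allow|deny){SP}if{SP}", text):
        return "policy"
    if re.match(rf"check{SP}if{SP}", text):
        return "check"
    if "<-" in text:
        return "rule"
    return "fact"


def allowed_in(context: str, statement: str) -> bool:
    """Return True if the statement's kind may appear in the given context."""
    return element_kind(statement) in ALLOWED[context]
```

With this, `allowed_in("authorizer", "allow if true")` holds while `allowed_in("block", "allow if true")` does not, which is the distinction the two top-level rules encode.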

Geal · 2021-10-29T20:13:43Z

right, that makes sense

Geal · 2021-10-29T20:26:03Z

The current version is at cbc7aac. I think we'll need to make it more precise in the future.


Geal · 2022-02-25T22:59:26Z

closing this because v2 has shipped

divarvel mentioned this issue Oct 21, 2021

biscuit: update parser to conform to the EBNF grammar biscuit-auth/biscuit-haskell#22

Merged

Geal mentioned this issue Oct 29, 2021

Biscuit 2.0 #72

Closed

Geal closed this as completed Feb 25, 2022
