; ABNF syntax based on RFC 5234
;
; The character encoding for Dhall is UTF-8
;
; Some notes on implementing this grammar:
;
; First, do not use a lexer to tokenize the file before parsing. Instead, treat
; the individual characters of the file as the tokens to feed into the parser.
; You should not use a lexer because Dhall's grammar supports two features which
; cannot be correctly supported by a lexer:
;
; * String interpolation (e.g. "foo ${Natural/toInteger bar} baz")
; * Nested block comments (e.g. "{- foo {- bar -} baz -}")
;
; Second, this grammar assumes that your parser can backtrack and/or try
; multiple parses simultaneously. For example, consider this expression:
;
; List ./MyType
;
; A parser might first try to parse the period as the beginning of a field
; selector, only to realize immediately afterwards that `/MyType` is not a valid
; name for a field. A conforming parser must backtrack so that the expression
; `./MyType` can instead be correctly interpreted as a relative path
;
; Third, if there are multiple valid parses then prefer the first parse
; according to the ordering of alternatives. That is, the order of evaluation
; of the alternatives is left-to-right.
;
; For example, here is a simplified grammar for single-quoted string literals
; (the full rules appear later in this file):
;
; single-quote-continue =
; "'''" single-quote-continue
; / "${" complete-expression "}" single-quote-continue
; / "''${" single-quote-continue
; / "''"
; / %x20-10FFFF single-quote-continue
; / tab single-quote-continue
; / end-of-line single-quote-continue
;
; single-quote-literal = "''" single-quote-continue
;
; ... which permits valid parses for the following code:
;
; "''''''''''''''''"
;
; If you tried to parse all alternatives then there are at least two valid
; interpretations for the above code:
;
; * A single quoted literal with four escape sequences of the form "'''"
; * i.e. "''" followed by "'''" four times in a row followed by "''"
; * Four empty single quoted literals
; * i.e. "''''" four times in a row
;
; The correct interpretation is the first one because parsing the escape
; sequence "'''" takes precedence over parsing the termination sequence "''",
; according to the order of the alternatives in the `single-quote-continue`
; rule.
;
; Some parsing libraries do not backtrack by default but allow the user to
; selectively backtrack in certain parts of the grammar. Usually parsing
; libraries do this to improve efficiency and error messages. Dhall's grammar
; takes that into account by minimizing the number of rules that require the
; parser to backtrack, and comments below highlight where you need to
; explicitly backtrack.
;
; Specifically, if you see an uninterrupted literal in a grammar rule such as:
;
; "->"
;
; ... or:
;
; %x66.6f.72.61.6c.6c
;
; ... then that string literal is parsed as a single unit, meaning that you
; should backtrack if you parse only part of the literal
;
; In all other cases you can assume that you do not need to backtrack unless
; there is a comment explicitly asking you to backtrack
;
; When parsing a repeated construct, prefer alternatives that parse as many
; repetitions as possible. Or in other words:
;
; [a] = a / ""
;
; a* = a* a / ""
;
; Note that the latter rule also specifies that repetition produces
; left-associated expressions. For example, function application is
; left-associative and all operators are left-associative when they are not
; parenthesized.
;
; Additionally, try alternatives in an order that minimizes backtracking
; according to the following rule:
;
; (a / b) (c / d) = a c / a d / b c / b d
; NOTE: There are many line endings in the wild
;
; See: https://en.wikipedia.org/wiki/Newline
;
; For simplicity this supports Unix and Windows line-endings, which are the most
; common
end-of-line =
%x0A ; "\n"
/ %x0D.0A ; "\r\n"
; This rule matches all characters that are:
;
; * not ASCII
; * not part of a surrogate pair
; * not a "non-character"
valid-non-ascii =
%x80-D7FF
; %xD800-DFFF = surrogate pairs
/ %xE000-FFFD
; %xFFFE-FFFF = non-characters
/ %x10000-1FFFD
; %x1FFFE-1FFFF = non-characters
/ %x20000-2FFFD
; %x2FFFE-2FFFF = non-characters
/ %x30000-3FFFD
; %x3FFFE-3FFFF = non-characters
/ %x40000-4FFFD
; %x4FFFE-4FFFF = non-characters
/ %x50000-5FFFD
; %x5FFFE-5FFFF = non-characters
/ %x60000-6FFFD
; %x6FFFE-6FFFF = non-characters
/ %x70000-7FFFD
; %x7FFFE-7FFFF = non-characters
/ %x80000-8FFFD
; %x8FFFE-8FFFF = non-characters
/ %x90000-9FFFD
; %x9FFFE-9FFFF = non-characters
/ %xA0000-AFFFD
; %xAFFFE-AFFFF = non-characters
/ %xB0000-BFFFD
; %xBFFFE-BFFFF = non-characters
/ %xC0000-CFFFD
; %xCFFFE-CFFFF = non-characters
/ %xD0000-DFFFD
; %xDFFFE-DFFFF = non-characters
/ %xE0000-EFFFD
; %xEFFFE-EFFFF = non-characters
/ %xF0000-FFFFD
; %xFFFFE-FFFFF = non-characters
/ %x100000-10FFFD
; %x10FFFE-10FFFF = non-characters
tab = %x09 ; "\t"
block-comment = "{-" block-comment-continue
block-comment-char =
%x20-7F
/ valid-non-ascii
/ tab
/ end-of-line
block-comment-continue =
"-}"
/ block-comment block-comment-continue
/ block-comment-char block-comment-continue
not-end-of-line = %x20-7F / valid-non-ascii / tab
; NOTE: Slightly different from Haskell-style single-line comments because this
; does not require a space after the dashes
line-comment = "--" *not-end-of-line end-of-line
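; Illustrative, non-normative examples for `line-comment`, worked out by hand
; from the rule above:
;
; * `-- a comment` followed by a line ending matches
; * `--no space needed` followed by a line ending also matches, since no space
;   is required after the dashes
;
; Because the rule ends with `end-of-line`, a comment on the final line of a
; file matches only if that line is newline-terminated.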
whitespace-chunk =
" "
/ tab
/ end-of-line
/ line-comment
/ block-comment
whsp = *whitespace-chunk
; nonempty whitespace
whsp1 = 1*whitespace-chunk
; Uppercase or lowercase ASCII letter
ALPHA = %x41-5A / %x61-7A
; ASCII digit
DIGIT = %x30-39 ; 0-9
ALPHANUM = ALPHA / DIGIT
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
; A simple label cannot be one of the reserved keywords
; listed in the `keyword` rule.
; A PEG parser could use negative lookahead to
; enforce this, e.g. as follows:
; simple-label =
; keyword 1*simple-label-next-char
; / !keyword (simple-label-first-char *simple-label-next-char)
simple-label-first-char = ALPHA / "_"
simple-label-next-char = ALPHANUM / "-" / "/" / "_"
simple-label = simple-label-first-char *simple-label-next-char
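; Illustrative, non-normative examples for `simple-label`, worked out by hand
; from the rules above:
;
; * matches: `foo`, `_bar`, `my-label`, `List/map`
; * does not match: `1abc` (starts with a digit), `-x` (starts with "-")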
quoted-label-char =
%x20-5F
; %x60 = '`'
/ %x61-7E
quoted-label = *quoted-label-char
; NOTE: Dhall does not support Unicode labels, mainly to minimize the potential
; for code obfuscation
label = ("`" quoted-label "`" / simple-label)
; A nonreserved-label cannot be any of the reserved identifiers for builtins
; (unless quoted).
; Their list can be found in the `builtin` rule.
; The only place where this restriction applies is bound variables.
; A PEG parser could use negative lookahead to avoid parsing those identifiers,
; e.g. as follows:
; nonreserved-label =
; builtin 1*simple-label-next-char
; / !builtin label
nonreserved-label = label
; An any-label is allowed to be one of the reserved identifiers (but not a keyword).
any-label = label
; Allow specifically `Some` in record and union labels.
any-label-or-some = any-label / Some
; Dhall's double-quoted strings are similar to JSON strings (RFC7159) except:
;
; * Dhall strings support string interpolation
;
; * Dhall strings also support escaping string interpolation by adding a new
; `\$` escape sequence
;
; * Dhall strings also allow Unicode escape sequences of the form `\u{XXX}`
double-quote-chunk =
interpolation
; '\' Beginning of escape sequence
/ %x5C double-quote-escaped
/ double-quote-char
double-quote-escaped =
%x22 ; '"' quotation mark U+0022
/ %x24 ; '$' dollar sign U+0024
/ %x5C ; '\' reverse solidus U+005C
/ %x2F ; '/' solidus U+002F
/ %x62 ; 'b' backspace U+0008
/ %x66 ; 'f' form feed U+000C
/ %x6E ; 'n' line feed U+000A
/ %x72 ; 'r' carriage return U+000D
/ %x74 ; 't' tab U+0009
/ %x75 unicode-escape ; 'uXXXX' / 'u{XXXX}' U+XXXX
; Valid Unicode escape sequences are as follows:
;
; * Exactly 4 hexadecimal digits without braces:
; `\uXXXX`
; * 1-6 hexadecimal digits within braces (with optional zero padding):
; `\u{XXXX}`, `\u{000X}`, `\u{XXXXX}`, `\u{00000XXXXX}`, etc.
; Any number of leading zeros are allowed within the braces preceding the 1-6
; digits specifying the codepoint.
;
; From these sequences, the parser must also reject any codepoints that are in
; the following ranges:
;
; * Surrogate pairs: `%xD800-DFFF`
; * Non-characters: `%xNFFFE-NFFFF` / `%x10FFFE-10FFFF` for `N` in `{ 0 .. F }`
;
; See the `valid-non-ascii` rule for the exact ranges that are not allowed
unicode-escape = unbraced-escape / "{" braced-escape "}"
; All valid last 4 digits for unicode codepoints (outside Plane 0): `0000-FFFD`
unicode-suffix = (DIGIT / "A" / "B" / "C" / "D" / "E") 3HEXDIG
/ "F" 2HEXDIG (DIGIT / "A" / "B" / "C" / "D")
; All 4-hex digit unicode escape sequences that are not:
;
; * Surrogate pairs (i.e. `%xD800-DFFF`)
; * Non-characters (i.e. `%xFFFE-FFFF`)
;
unbraced-escape =
(DIGIT / "A" / "B" / "C") 3HEXDIG
/ "D" ("0" / "1" / "2" / "3" / "4" / "5" / "6" / "7") HEXDIG HEXDIG
; %xD800-DFFF Surrogate pairs
/ "E" 3HEXDIG
/ "F" 2HEXDIG (DIGIT / "A" / "B" / "C" / "D")
; %xFFFE-FFFF Non-characters
; All 1-6 digit unicode codepoints that are not:
;
; * Surrogate pairs: `%xD800-DFFF`
; * Non-characters: `%xNFFFE-NFFFF` / `%x10FFFE-10FFFF` for `N` in `{ 0 .. F }`
;
; See the `valid-non-ascii` rule for the exact ranges that are not allowed
braced-codepoint =
("1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "A" / "B" / "C" / "D" / "E" / "F" / "10") unicode-suffix; (Planes 1-16)
/ unbraced-escape ; (Plane 0)
/ 1*3HEXDIG ; %x000-FFF
; Allow zero padding for braced codepoints
braced-escape = *"0" braced-codepoint
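; Illustrative, non-normative examples for `unicode-escape`, worked out by hand
; from the rules above (each is shown with its leading `\u`):
;
; * accepted: `\u0041`, `\u{41}`, `\u{0041}`, `\u{1F600}`, `\u{10FFFD}`
; * rejected: `\uD800` and `\u{D800}` (surrogate), `\uFFFE` (non-character),
;   `\u{10FFFF}` (non-character), `\u41` (the unbraced form needs exactly 4
;   hexadecimal digits)
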
; Printable characters except double quote and backslash
double-quote-char =
%x20-21
; %x22 = '"'
/ %x23-5B
; %x5C = "\"
/ %x5D-7F
/ valid-non-ascii
double-quote-literal = %x22 *double-quote-chunk %x22
; NOTE: The only way to end a single-quote string literal with a single quote is
; to either interpolate the single quote, like this:
;
; ''ABC${"'"}''
;
; ... or concatenate another string, like this:
;
; ''ABC'' ++ "'"
;
; If you try to end the string literal with a single quote then you get "'''",
; which is interpreted as an escaped pair of single quotes
single-quote-continue =
interpolation single-quote-continue
/ escaped-quote-pair single-quote-continue
/ escaped-interpolation single-quote-continue
/ "''" ; End of text literal
/ single-quote-char single-quote-continue
; Escape two single quotes (i.e. replace this sequence with "''")
escaped-quote-pair = "'''"
; Escape interpolation (i.e. replace this sequence with "${")
escaped-interpolation = "''${"
single-quote-char =
%x20-7F
/ valid-non-ascii
/ tab
/ end-of-line
single-quote-literal = "''" end-of-line single-quote-continue
interpolation = "${" complete-expression "}"
text-literal = (double-quote-literal / single-quote-literal)
; RFC 5234 interprets string literals as case-insensitive and recommends using
; hex instead for case-sensitive strings
;
; If you don't feel like reading hex, these are all the same as the rule name.
; Keywords that should never be parsed as identifiers
if = %x69.66
then = %x74.68.65.6e
else = %x65.6c.73.65
let = %x6c.65.74
in = %x69.6e
as = %x61.73
using = %x75.73.69.6e.67
merge = %x6d.65.72.67.65
missing = %x6d.69.73.73.69.6e.67
Infinity = %x49.6e.66.69.6e.69.74.79
NaN = %x4e.61.4e
Some = %x53.6f.6d.65
toMap = %x74.6f.4d.61.70
assert = %x61.73.73.65.72.74
forall-keyword = %x66.6f.72.61.6c.6c ; "forall"
forall-symbol = %x2200 ; Unicode FOR ALL
forall = forall-symbol / forall-keyword
with = %x77.69.74.68
; Unused rule that could be used as negative lookahead in the
; `simple-label` rule for parsers that support this.
keyword =
if / then / else
/ let / in
/ using / missing
/ assert / as
/ Infinity / NaN
/ merge / Some / toMap
/ forall-keyword
/ with
; Note that there is a corresponding parser test in
; `tests/parser/success/builtinsA.dhall`. Please update it when
; you modify this `builtin` rule.
builtin =
Natural-fold
/ Natural-build
/ Natural-isZero
/ Natural-even
/ Natural-odd
/ Natural-toInteger
/ Natural-show
/ Integer-toDouble
/ Integer-show
/ Integer-negate
/ Integer-clamp
/ Natural-subtract
/ Double-show
/ List-build
/ List-fold
/ List-length
/ List-head
/ List-last
/ List-indexed
/ List-reverse
/ Text-show
/ Bool
/ True
/ False
/ Optional
/ None
/ Natural
/ Integer
/ Double
/ Text
/ List
/ Type
/ Kind
/ Sort
; Reserved identifiers, needed for some special cases of parsing
Optional = %x4f.70.74.69.6f.6e.61.6c
Text = %x54.65.78.74
List = %x4c.69.73.74
Location = %x4c.6f.63.61.74.69.6f.6e
; Remainder of the reserved identifiers, needed for the `builtin` rule
Bool = %x42.6f.6f.6c
True = %x54.72.75.65
False = %x46.61.6c.73.65
None = %x4e.6f.6e.65
Natural = %x4e.61.74.75.72.61.6c
Integer = %x49.6e.74.65.67.65.72
Double = %x44.6f.75.62.6c.65
Type = %x54.79.70.65
Kind = %x4b.69.6e.64
Sort = %x53.6f.72.74
Natural-fold = %x4e.61.74.75.72.61.6c.2f.66.6f.6c.64
Natural-build = %x4e.61.74.75.72.61.6c.2f.62.75.69.6c.64
Natural-isZero = %x4e.61.74.75.72.61.6c.2f.69.73.5a.65.72.6f
Natural-even = %x4e.61.74.75.72.61.6c.2f.65.76.65.6e
Natural-odd = %x4e.61.74.75.72.61.6c.2f.6f.64.64
Natural-toInteger = %x4e.61.74.75.72.61.6c.2f.74.6f.49.6e.74.65.67.65.72
Natural-show = %x4e.61.74.75.72.61.6c.2f.73.68.6f.77
Natural-subtract = %x4e.61.74.75.72.61.6c.2f.73.75.62.74.72.61.63.74
Integer-toDouble = %x49.6e.74.65.67.65.72.2f.74.6f.44.6f.75.62.6c.65
Integer-show = %x49.6e.74.65.67.65.72.2f.73.68.6f.77
Integer-negate = %x49.6e.74.65.67.65.72.2f.6e.65.67.61.74.65
Integer-clamp = %x49.6e.74.65.67.65.72.2f.63.6c.61.6d.70
Double-show = %x44.6f.75.62.6c.65.2f.73.68.6f.77
List-build = %x4c.69.73.74.2f.62.75.69.6c.64
List-fold = %x4c.69.73.74.2f.66.6f.6c.64
List-length = %x4c.69.73.74.2f.6c.65.6e.67.74.68
List-head = %x4c.69.73.74.2f.68.65.61.64
List-last = %x4c.69.73.74.2f.6c.61.73.74
List-indexed = %x4c.69.73.74.2f.69.6e.64.65.78.65.64
List-reverse = %x4c.69.73.74.2f.72.65.76.65.72.73.65
Text-show = %x54.65.78.74.2f.73.68.6f.77
; Operators
combine = %x2227 / "/\"
combine-types = %x2A53 / "//\\"
equivalent = %x2261 / "==="
prefer = %x2AFD / "//"
lambda = %x3BB / "\"
arrow = %x2192 / "->"
complete = "::"
exponent = "e" [ "+" / "-" ] 1*DIGIT
numeric-double-literal = [ "+" / "-" ] 1*DIGIT ( "." 1*DIGIT [ exponent ] / exponent)
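; Illustrative, non-normative examples for `numeric-double-literal`, worked out
; by hand from the rule above:
;
; * matches: `2.0`, `+3.14`, `-1.5e-3`, `1e10`
; * does not match: `2` (neither fraction nor exponent), `.5` (no leading
;   digit), `2.` (no digit after the point)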
minus-infinity-literal = "-" Infinity
plus-infinity-literal = Infinity
double-literal =
; "2.0"
numeric-double-literal
; "-Infinity"
/ minus-infinity-literal
; "Infinity"
/ plus-infinity-literal
; "NaN"
/ NaN
natural-literal =
; Hexadecimal with "0x" prefix
"0" %x78 1*HEXDIG
; Decimal; leading 0 digits are not allowed
/ ("1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9") *DIGIT
; ... except for 0 itself
/ "0"
integer-literal = ( "+" / "-" ) natural-literal
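; Illustrative, non-normative examples for `natural-literal` and
; `integer-literal`, worked out by hand from the rules above:
;
; * `natural-literal`: `0`, `42`, `0x1F`
; * `integer-literal`: `+42`, `-0`, `+0x1F`
; * `042` is not a single `natural-literal`, because a decimal literal may not
;   start with "0" unless it is exactly `0`
; * an unsigned `2` is a `natural-literal`, never an `integer-literal`, because
;   the sign is mandatory
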
; If the identifier matches one of the names in the `builtin` rule, then it is a
; builtin, and should be treated as the corresponding item in the list of
; "Reserved identifiers for builtins" specified in the `standard/README.md` document.
; It is a syntax error to specify a de Bruijn index in this case.
; Otherwise, this is a variable with name and index matching the label and index.
identifier = variable / builtin
variable = nonreserved-label [ whsp "@" whsp natural-literal ]
; Printable characters other than the space and any of: " # ( ) , / < > ? [ \ ] { }
;
; Excluding those characters ensures that paths don't have to end with trailing
; whitespace most of the time
path-character =
; %x20 = " "
%x21
; %x22 = "\""
; %x23 = "#"
/ %x24-27
; %x28 = "("
; %x29 = ")"
/ %x2A-2B
; %x2C = ","
/ %x2D-2E
; %x2F = "/"
/ %x30-3B
; %x3C = "<"
/ %x3D
; %x3E = ">"
; %x3F = "?"
/ %x40-5A
; %x5B = "["
; %x5C = "\"
; %x5D = "]"
/ %x5E-7A
; %x7B = "{"
/ %x7C
; %x7D = "}"
/ %x7E
quoted-path-character =
%x20-21
; %x22 = "\""
/ %x23-2E
; %x2F = "/"
/ %x30-7F
/ valid-non-ascii
unquoted-path-component = 1*path-character
quoted-path-component = 1*quoted-path-character
path-component = "/" ( unquoted-path-component / %x22 quoted-path-component %x22 )
; The last path-component matched by this rule is referred to as "file" in the semantics,
; and the other path-components as "directory".
path = 1*path-component
local =
parent-path
/ here-path
/ home-path
; NOTE: Backtrack if parsing this alternative fails
;
; This is because the first character of this alternative will be "/", but
; if the second character is "/" or "\" then this should have been parsed
; as an operator instead of a path
/ absolute-path
parent-path = ".." path ; Relative path
here-path = "." path ; Relative path
home-path = "~" path ; Home-anchored path
absolute-path = path ; Absolute path
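; Illustrative, non-normative examples for `local`, worked out by hand from the
; rules above:
;
; * `parent-path`: `../foo/bar`
; * `here-path`: `./foo/bar`, `./"my file.dhall"` (quoted path component)
; * `home-path`: `~/foo`
; * `absolute-path`: `/foo/bar`
;
; A bare `./` does not match, because every `path` needs at least one non-empty
; `path-component`.
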
; `http[s]` URI grammar based on RFC7230 and RFC 3986 with some differences
; noted below
scheme = %x68.74.74.70 [ %x73 ] ; "http" [ "s" ]
; NOTE: This does not match the official grammar for a URI. Specifically:
;
; * this does not support fragment identifiers, which have no meaning within
; Dhall expressions and do not affect import resolution
; * the characters "(" ")" and "," are not included in the `sub-delims` rule:
; in particular, these characters can't be used in authority, path or query
; strings. This is because those characters have other meaning in Dhall
; and it would be confusing for the comma in
; [http://example.com/foo, bar]
; to be part of the URL instead of part of the list. If you need a URL
; which contains parens or a comma, you must percent-encode them.
;
; Reserved characters in quoted path components should be percent-encoded
; according to https://tools.ietf.org/html/rfc3986#section-2
http-raw = scheme "://" authority path-abempty [ "?" query ]
path-abempty = *( "/" segment )
; NOTE: Backtrack if parsing the optional user info prefix fails
authority = [ userinfo "@" ] host [ ":" port ]
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
host = IP-literal / IPv4address / domain
port = *DIGIT
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
; NOTE: Backtrack when parsing each alternative
IPv6address = 6( h16 ":" ) ls32
/ "::" 5( h16 ":" ) ls32
/ [ h16 ] "::" 4( h16 ":" ) ls32
/ [ h16 *1( ":" h16 ) ] "::" 3( h16 ":" ) ls32
/ [ h16 *2( ":" h16 ) ] "::" 2( h16 ":" ) ls32
/ [ h16 *3( ":" h16 ) ] "::" h16 ":" ls32
/ [ h16 *4( ":" h16 ) ] "::" ls32
/ [ h16 *5( ":" h16 ) ] "::" h16
/ [ h16 *6( ":" h16 ) ] "::"
h16 = 1*4HEXDIG
ls32 = h16 ":" h16 / IPv4address
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
; NOTE: Backtrack when parsing these alternatives
dec-octet = "25" %x30-35 ; 250-255
/ "2" %x30-34 DIGIT ; 200-249
/ "1" 2DIGIT ; 100-199
/ %x31-39 DIGIT ; 10-99
/ DIGIT ; 0-9
; Look in RFC3986 3.2.2 for
; "A registered name intended for lookup in the DNS"
domain = domainlabel *("." domainlabel) [ "." ]
domainlabel = 1*ALPHANUM *(1*"-" 1*ALPHANUM)
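; Illustrative, non-normative examples for `domain` and `domainlabel`, worked
; out by hand from the rules above:
;
; * `domain` matches: `example.com`, `my-host.example.org`, `example.com.`
; * `domainlabel` does not match `-foo` (leading hyphen), and in `foo-` only
;   `foo` is consumed, because each run of hyphens must be followed by at least
;   one ALPHANUM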
segment = *pchar
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
query = *( pchar / "/" / "?" )
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHANUM / "-" / "." / "_" / "~"
; This is the RFC3986 sub-delims rule, without "(", ")" or ",".
; See the comments above the `http-raw` rule.
sub-delims = "!" / "$" / "&" / "'" / "*" / "+" / ";" / "="
http = http-raw [ whsp using whsp1 import-expression ]
; Dhall supports unquoted environment variables that are Bash-compliant or
; quoted environment variables that are POSIX-compliant
env = "env:"
( bash-environment-variable
/ %x22 posix-environment-variable %x22
)
; Bash supports a restricted subset of POSIX environment variables. From the
; Bash `man` page, an environment variable name is:
;
; > A word consisting only of alphanumeric characters and under-scores, and
; > beginning with an alphabetic character or an under-score
bash-environment-variable = (ALPHA / "_") *(ALPHANUM / "_")
; The POSIX standard is significantly more flexible about legal environment
; variable names, which can contain alerts (i.e. '\a'), whitespace, or
; punctuation, for example. The POSIX standard says about environment variable
; names:
;
; > The value of an environment variable is a string of characters. For a
; > C-language program, an array of strings called the environment shall be made
; > available when a process begins. The array is pointed to by the external
; > variable environ, which is defined as:
; >
; > extern char **environ;
; >
; > These strings have the form name=value; names shall not contain the
; > character '='. For values to be portable across systems conforming to IEEE
; > Std 1003.1-2001, the value shall be composed of characters from the portable
; > character set (except NUL and as indicated below).
;
; Note that the standard does not explicitly state that the name must have at
; least one character, but `env`, which claims to be POSIX-compliant, does not
; appear to support empty names. To be safe, Dhall follows `env` and requires
; at least one character
posix-environment-variable = 1*posix-environment-variable-character
; These are all the characters from the POSIX Portable Character Set except for
; '\0' (NUL) and '='. Note that the POSIX standard does not explicitly state
; that environment variable names cannot have NUL. However, this is implicit
; in the fact that environment variables are passed to the program as
; NUL-terminated `name=value` strings, which implies that the `name` portion of
; the string cannot have NUL characters
posix-environment-variable-character =
%x5C ; '\' Beginning of escape sequence
( %x22 ; '"' quotation mark U+0022
/ %x5C ; '\' reverse solidus U+005C
/ %x61 ; 'a' alert U+0007
/ %x62 ; 'b' backspace U+0008
/ %x66 ; 'f' form feed U+000C
/ %x6E ; 'n' line feed U+000A
/ %x72 ; 'r' carriage return U+000D
/ %x74 ; 't' tab U+0009
/ %x76 ; 'v' vertical tab U+000B
)
; Printable characters except double quote, backslash and equals
/ %x20-21
; %x22 = '"'
/ %x23-3C
; %x3D = '='
/ %x3E-5B
; %x5C = "\"
/ %x5D-7E
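; Illustrative, non-normative examples for `env`, worked out by hand from the
; rules above:
;
; * `env:HOME`, `env:_private1` (unquoted, Bash-compliant names)
; * `env:"FOO BAR"`, `env:"1ABC"` (quoted, POSIX-compliant names)
; * `env:1ABC` does not match, because an unquoted name cannot start with a
;   digit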
import-type = missing / local / http / env
hash = %x73.68.61.32.35.36.3a 64HEXDIG ; "sha256:XXX...XXX"
import-hashed = import-type [ whsp1 hash ]
; "http://example.com"
; "./foo/bar"
; "env:FOO"
import = import-hashed [ whsp as whsp1 (Text / Location) ]
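; Illustrative, non-normative examples for `import-hashed` and `import`, worked
; out by hand from the rules above. The digest is a placeholder (the SHA-256 of
; the empty string), not the hash of any real Dhall expression:
;
; ./foo.dhall sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
; ./foo/bar as Text
; env:HOME as Text
; https://example.com/foo as Location
; missing
;
; The `whsp1` before `hash` is mandatory, and `hash` needs exactly 64
; hexadecimal digits.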
expression =
; "\(x : a) -> b"
lambda whsp "(" whsp nonreserved-label whsp ":" whsp1 expression whsp ")" whsp arrow whsp expression
; "if a then b else c"
/ if whsp1 expression whsp then whsp1 expression whsp else whsp1 expression
; "let x : t = e1 in e2"
; "let x = e1 in e2"
; We allow dropping the `in` between adjacent let-expressions; the following are equivalent:
; "let x = e1 let y = e2 in e3"
; "let x = e1 in let y = e2 in e3"
/ 1*let-binding in whsp1 expression
; "forall (x : a) -> b"
/ forall whsp "(" whsp nonreserved-label whsp ":" whsp1 expression whsp ")" whsp arrow whsp expression
; "a -> b"
;
; NOTE: Backtrack if parsing this alternative fails
/ operator-expression whsp arrow whsp expression
; "a with x = b"
;
; NOTE: Backtrack if parsing this alternative fails
/ with-expression
; "merge e1 e2 : t"
;
; NOTE: Backtrack if parsing this alternative fails since we can't tell
; from the keyword whether there will be a type annotation or not
/ merge whsp1 import-expression whsp1 import-expression whsp ":" whsp1 application-expression
; "[] : t"
;
; NOTE: Backtrack if parsing this alternative fails since we can't tell
; from the opening bracket whether or not this will be an empty list or
; a non-empty list
/ empty-list-literal
; "toMap e : t"
;
; NOTE: Backtrack if parsing this alternative fails since we can't tell
; from the keyword whether there will be a type annotation or not
/ toMap whsp1 import-expression whsp ":" whsp1 application-expression
; "assert : Natural/even 1 === False"
/ assert whsp ":" whsp1 expression
; "x : t"
/ annotated-expression
; Nonempty-whitespace to disambiguate `env:VARIABLE` from type annotations
annotated-expression = operator-expression [ whsp ":" whsp1 expression ]
; "let x = e1"
let-binding = let whsp1 nonreserved-label whsp [ ":" whsp1 expression whsp ] "=" whsp expression whsp
; "[] : t"
empty-list-literal =
"[" whsp [ "," whsp ] "]" whsp ":" whsp1 application-expression
with-expression =
import-expression 1*(whsp1 with whsp1 with-clause)
with-clause =
any-label-or-some *(whsp "." whsp any-label-or-some) whsp "=" whsp operator-expression
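; Illustrative, non-normative examples for `with-expression`, worked out by
; hand from the rules above (`r`, `a`, `b` and `c` are placeholder names):
;
; r with a = 1
; r with a.b = 1 with c = 2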
operator-expression = equivalent-expression
equivalent-expression = import-alt-expression *(whsp equivalent whsp import-alt-expression)
; Nonempty-whitespace after "?" to disambiguate `http://a/a?a`
import-alt-expression = or-expression *(whsp "?" whsp1 or-expression)
or-expression = plus-expression *(whsp "||" whsp plus-expression)
; Nonempty-whitespace to disambiguate `f +2`
plus-expression = text-append-expression *(whsp "+" whsp1 text-append-expression)
text-append-expression = list-append-expression *(whsp "++" whsp list-append-expression)
list-append-expression = and-expression *(whsp "#" whsp and-expression)
and-expression = combine-expression *(whsp "&&" whsp combine-expression)
combine-expression = prefer-expression *(whsp combine whsp prefer-expression)
prefer-expression = combine-types-expression *(whsp prefer whsp combine-types-expression)
combine-types-expression = times-expression *(whsp combine-types whsp times-expression)
times-expression = equal-expression *(whsp "*" whsp equal-expression)
equal-expression = not-equal-expression *(whsp "==" whsp not-equal-expression)
not-equal-expression = application-expression *(whsp "!=" whsp application-expression)
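; Illustrative, non-normative reading of the operator rules from
; `equivalent-expression` through `not-equal-expression`. The rules are listed
; from loosest-binding (`===`) to tightest-binding (`!=`), so, worked out by
; hand:
;
; * `a || b && c` parses as `a || (b && c)`
; * `1 + 2 * 3` parses as `1 + (2 * 3)`
; * `a == b != c` parses as `a == (b != c)`
;
; Each `*( ... )` repetition is left-associative, so `a + b + c` parses as
; `(a + b) + c`.
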
; Import expressions need to be separated by some whitespace, otherwise there
; would be ambiguity: `./ab` could be interpreted as "import the file `./ab`",
; or "apply the import `./a` to label `b`"
application-expression =
first-application-expression *(whsp1 import-expression)
first-application-expression =
; "merge e1 e2"
merge whsp1 import-expression whsp1 import-expression
; "Some e"
/ Some whsp1 import-expression
; "toMap e"
/ toMap whsp1 import-expression
/ import-expression
import-expression = import / completion-expression
completion-expression =
selector-expression [ whsp complete whsp selector-expression ]
; `record.field` extracts one field of a record
;
; `record.{ field0, field1, field2 }` projects out several fields of a record
;
; NOTE: Backtrack when parsing the `*("." ...)`. The reason why is that you
; can't tell from parsing just the period whether "foo." will become "foo.bar"
; (i.e. accessing field `bar` of the record `foo`) or `foo./bar` (i.e. applying
; the function `foo` to the relative path `./bar`)
selector-expression = primitive-expression *(whsp "." whsp selector)
selector = any-label / labels / type-selector
labels =
"{" whsp [ "," whsp ] [ any-label-or-some whsp *("," whsp any-label-or-some whsp) [ "," whsp ] ] "}"
type-selector = "(" whsp expression whsp ")"
; NOTE: Backtrack when parsing the first three alternatives (i.e. the numeric
; literals). This is because they share leading characters in common
primitive-expression =
; "2.0"
double-literal
; "2"
/ natural-literal
; "+2"
/ integer-literal
; '"ABC"'
/ text-literal
; "{ foo = 1 , bar = True }"
; "{ foo : Integer, bar : Bool }"
/ "{" whsp [ "," whsp ] record-type-or-literal whsp "}"
; "< Foo : Integer | Bar : Bool >"
; "< Foo | Bar : Bool >"
/ "<" whsp [ "|" whsp ] union-type whsp ">"
; "[1, 2, 3]"
/ non-empty-list-literal
; "x"
; "x@2"
/ identifier
; "( e )"
/ "(" complete-expression ")"
record-type-or-literal =
empty-record-literal
/ [non-empty-record-type-or-literal]
empty-record-literal = "=" [ whsp "," ]
non-empty-record-type-or-literal =
(non-empty-record-type / non-empty-record-literal)
non-empty-record-type =
record-type-entry *(whsp "," whsp record-type-entry) [ whsp "," ]
record-type-entry = any-label-or-some whsp ":" whsp1 expression
non-empty-record-literal =
record-literal-entry *(whsp "," whsp record-literal-entry) [ whsp "," ]
; If the `record-literal-normal-entry` is absent, that represents a punned
; record entry, such as in `{ x }`, which is a short-hand for `{ x = x }`
record-literal-entry =
any-label-or-some [record-literal-normal-entry]
record-literal-normal-entry =
*(whsp "." whsp any-label-or-some) whsp "=" whsp expression
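; If every `union-type-entry` is absent, that represents the empty union `<>`.
; If the type annotation of a `union-type-entry` is absent, that represents an
; empty union alternative, such as in `< Heads | Tails >`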
union-type =
[union-type-entry *(whsp "|" whsp union-type-entry) [ whsp "|" ]]
; x : Natural
; x
union-type-entry = any-label-or-some [ whsp ":" whsp1 expression ]
non-empty-list-literal =
"[" whsp [ "," whsp ] expression whsp *("," whsp expression whsp) [ "," whsp ] "]"
; This just adds surrounding whitespace for the top-level of the program
complete-expression = whsp expression whsp