Skip to content

Commit

Permalink
Add raw string literal ambiguity document
Browse files Browse the repository at this point in the history
  • Loading branch information
emberian committed Jul 21, 2014
1 parent 19e1f5c commit 1a1a9d5
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions src/grammar/raw-string-literal-ambiguity.md
@@ -0,0 +1,29 @@
Rust's lexical grammar is not context-free. Raw string literals are the source
of the problem. Informally, a raw string literal is an `r`, followed by `N`
hashes (where N can be zero), a quote, any characters, then a quote followed
by `N` hashes. This grammar describes this as best possible:

R -> 'r' S
S -> '"' B '"'
S -> '#' S '#'
B -> . B
B -> ε

Where `.` represents any character, and `ε` the empty string. Consider the
string `r#""#"#`. This string is not a valid raw string literal, but can be
accepted as one by the above grammar, using the derivation:

R : #""#"#
S : ""#"
S : "#
B : #
B : ε

(Where `T : U` means the rule `T` is applied, and `U` is the remainder of the
string.) The difficulty arises from the fact that it is fundamentally
context-sensitive. In particular, the context needed is the number of hashes.
I know of no way to resolve this, but also have not come up with a proof that
it is not context sensitive. Such a proof would probably use the pumping lemma
for context-free languages, but I (cmr) could not come up with a proof after
spending a few hours on it, and decided my time best spent elsewhere. Pull
request welcome!

0 comments on commit 1a1a9d5

Please sign in to comment.