Implement (contextual) keywords and use their versioning from v2 (#723)
Closes #568 

There is still one outstanding issue: `next_token` returns a
`Vec<TokenKind>`. I'd like it to return a more specialized type and
ideally pass it on the stack (2x2 bytes) rather than on the heap (an
extra 3x8 bytes for the Vec handle, plus indirection). We should name it
better and make it explicit that we can return at most 2 token kinds (a
single token kind or an identifier + kw combo).
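
A minimal sketch of what such a stack-returned type could look like. The names `ScannedToken`, `Single`, `IdentifierOrKeyword` and `accepts`, and the three `TokenKind` variants, are hypothetical and chosen only for illustration; they are not taken from this commit.

```rust
// Hypothetical token kinds; the real `TokenKind` enum is generated and much larger.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u16)]
enum TokenKind {
    Identifier,
    FromKeyword,
    Semicolon,
}

/// Hypothetical replacement for `Vec<TokenKind>`: at most two kinds,
/// stored inline and returned by value, so no heap allocation is needed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScannedToken {
    /// The lexer produced exactly one kind.
    Single(TokenKind),
    /// The text is an identifier that may also be read as an unreserved keyword.
    IdentifierOrKeyword {
        identifier: TokenKind,
        keyword: TokenKind,
    },
}

impl ScannedToken {
    /// Returns true if the parser's expected kind is one of the scanned kinds.
    fn accepts(self, expected: TokenKind) -> bool {
        match self {
            Self::Single(kind) => kind == expected,
            Self::IdentifierOrKeyword { identifier, keyword } => {
                identifier == expected || keyword == expected
            }
        }
    }
}
```

With a shape like this, the "at most 2 token kinds" invariant is carried by the type itself instead of by a convention on the length of a `Vec`.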

To do:
- [x] Return tokens from `next_token` via stack

Apart from that, I think this is a more correct approach than #598,
especially accounting for the new keyword definition format in DSL v2.

The main change is that we check the keyword trie, and additionally the
(newly introduced) compound keyword scanners, only after the token has
been lexed as an identifier. For each context, we collect the Identifier
scanners used by the keywords and attempt promotion there.
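
The identifier-first promotion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the generated lexer: `KeywordTable` is a hypothetical `HashMap`-based stand-in for the keyword trie, `lex_identifier` is a simplified identifier scanner, and mapping `Present` to "identifier + keyword" and `Reserved` to "keyword only" is an assumed reading of the `KeywordScan` variants that appear in the diff.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TokenKind {
    Identifier,
    FromKeyword,
    ReturnKeyword,
}

// Same variant names as the `KeywordScan` values emitted by the generator in this commit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum KeywordScan {
    Absent,
    Present(TokenKind),
    Reserved(TokenKind),
}

// Hypothetical stand-in for the keyword trie: keyword text -> (kind, is_reserved).
struct KeywordTable(HashMap<&'static str, (TokenKind, bool)>);

impl KeywordTable {
    fn scan(&self, ident: &str) -> KeywordScan {
        match self.0.get(ident) {
            Some(&(kind, true)) => KeywordScan::Reserved(kind),
            Some(&(kind, false)) => KeywordScan::Present(kind),
            None => KeywordScan::Absent,
        }
    }
}

// Simplified identifier scanner: [A-Za-z_$][A-Za-z0-9_$]* at the start of the input.
fn lex_identifier(input: &str) -> Option<&str> {
    let end = input
        .char_indices()
        .find(|(i, c)| {
            !(c.is_ascii_alphabetic() || *c == '_' || *c == '$' || (*i > 0 && c.is_ascii_digit()))
        })
        .map_or(input.len(), |(i, _)| i);
    if end == 0 {
        None
    } else {
        Some(&input[..end])
    }
}

// Lex an identifier first, then attempt keyword promotion on the whole lexeme.
// Returns the lexeme, its primary kind, and an optional second (keyword) kind.
fn next_token<'a>(
    input: &'a str,
    keywords: &KeywordTable,
) -> Option<(&'a str, TokenKind, Option<TokenKind>)> {
    let ident = lex_identifier(input)?;
    Some(match keywords.scan(ident) {
        // Reserved: the text can only ever be the keyword.
        KeywordScan::Reserved(kw) => (ident, kw, None),
        // Unreserved: the parser may accept either reading.
        KeywordScan::Present(kw) => (ident, TokenKind::Identifier, Some(kw)),
        KeywordScan::Absent => (ident, TokenKind::Identifier, None),
    })
}
```

Because the lookup runs on the complete identifier lexeme, an input such as `fromage` is never split into a `from` keyword followed by `age`.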

From what I've seen when running the sanctuary tests, the existing
lexing performance is not impacted. I can also verify (incl. CST tests)
that we now properly parse source that uses contextual keywords (e.g.
`from`) and that compound keywords (e.g. `ufixedMxN`) are properly
versioned.
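
To make "properly versioned" concrete, here is a small sketch of a version-gated keyword check. It follows the general `(reserved || enabled) && scanner` shape of the code emitted in `keyword_scanner_definition.rs`, but everything else is an assumption made for illustration: the `VersionedKeyword` struct, the tuple-based `Version`, the single "from this version onwards" thresholds (the real model uses `Vec<VersionQualityRange>`), and a `KeywordScan` without the token kind payload.

```rust
// Hypothetical (major, minor, patch) version; tuples compare lexicographically.
type Version = (u64, u64, u64);

// Simplified: the variants in the commit also carry a `TokenKind`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum KeywordScan {
    Absent,
    Present,
    Reserved,
}

// Hypothetical, simplified keyword definition with "from version" thresholds.
struct VersionedKeyword {
    text: &'static str,
    enabled_from: Option<Version>,
    reserved_from: Option<Version>,
}

impl VersionedKeyword {
    fn scan(&self, ident: &str, version: Version) -> KeywordScan {
        let enabled = self.enabled_from.map_or(false, |from| version >= from);
        let reserved = self.reserved_from.map_or(false, |from| version >= from);

        // Only consult the scanner when the keyword exists in some form for this version.
        if (reserved || enabled) && ident == self.text {
            if reserved {
                KeywordScan::Reserved
            } else {
                KeywordScan::Present
            }
        } else {
            KeywordScan::Absent
        }
    }
}
```

For a made-up keyword enabled from 0.5.0 and reserved from 0.7.0, this yields `Absent` before 0.5.0, `Present` between the two thresholds, and `Reserved` afterwards.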

This adapts the existing `codegen_grammar` interface, which is a
leftover from DSL v1; I did that in order to work on finishing #638.
Once this is merged and we properly parse contextual keywords, I'll move
on to cleaning it up and reducing the parser codegen indirection (right
now we go from v2 -> v1 model -> code generator -> Tera templates; I'd
like to at least cut out the v1 model and/or simplify visiting v2 from
the existing `CodeGenerator`).

Please excuse the WIP comments in the middle; the first and the last
ones should make sense when reviewing. I can simplify this a bit for
review, if needed.
Xanewok committed Jan 8, 2024
1 parent 662a672 commit b3dc6bc
Showing 93 changed files with 9,493 additions and 5,616 deletions.
5 changes: 5 additions & 0 deletions .changeset/dry-turtles-rhyme.md
@@ -0,0 +1,5 @@
---
"@nomicfoundation/slang": minor
---

Properly parse unreserved keywords in an identifier position, i.e. `from`, `emit`, `global` etc.
5 changes: 4 additions & 1 deletion crates/codegen/grammar/src/grammar.rs
@@ -4,7 +4,7 @@ use semver::Version;

use crate::parser_definition::{ParserDefinitionRef, TriviaParserDefinitionRef};
use crate::visitor::{GrammarVisitor, Visitable};
use crate::{PrecedenceParserDefinitionRef, ScannerDefinitionRef};
use crate::{KeywordScannerDefinitionRef, PrecedenceParserDefinitionRef, ScannerDefinitionRef};

pub struct Grammar {
pub name: String,
@@ -36,6 +36,7 @@ impl Grammar {
#[derive(Clone)]
pub enum GrammarElement {
ScannerDefinition(ScannerDefinitionRef),
KeywordScannerDefinition(KeywordScannerDefinitionRef),
TriviaParserDefinition(TriviaParserDefinitionRef),
ParserDefinition(ParserDefinitionRef),
PrecedenceParserDefinition(PrecedenceParserDefinitionRef),
@@ -45,6 +46,7 @@ impl GrammarElement {
pub fn name(&self) -> &'static str {
match self {
Self::ScannerDefinition(scanner) => scanner.name(),
Self::KeywordScannerDefinition(scanner) => scanner.name(),
Self::TriviaParserDefinition(trivia_parser) => trivia_parser.name(),
Self::ParserDefinition(parser) => parser.name(),
Self::PrecedenceParserDefinition(precedence_parser) => precedence_parser.name(),
@@ -80,6 +82,7 @@ impl Visitable for GrammarElement {
fn accept_visitor<V: GrammarVisitor>(&self, visitor: &mut V) {
match self {
Self::ScannerDefinition(scanner) => scanner.accept_visitor(visitor),
Self::KeywordScannerDefinition(scanner) => scanner.accept_visitor(visitor),
Self::TriviaParserDefinition(trivia_parser) => trivia_parser.accept_visitor(visitor),
Self::ParserDefinition(parser) => parser.accept_visitor(visitor),
Self::PrecedenceParserDefinition(precedence_parser) => {
7 changes: 6 additions & 1 deletion crates/codegen/grammar/src/parser_definition.rs
@@ -2,7 +2,10 @@ use std::fmt::Debug;
use std::rc::Rc;

use crate::visitor::{GrammarVisitor, Visitable};
use crate::{PrecedenceParserDefinitionRef, ScannerDefinitionRef, VersionQualityRange};
use crate::{
KeywordScannerDefinitionRef, PrecedenceParserDefinitionRef, ScannerDefinitionRef,
VersionQualityRange,
};

/// A named wrapper, used to give a name to a [`ParserDefinitionNode`].
#[derive(Clone, Debug)]
@@ -59,6 +62,7 @@ pub enum ParserDefinitionNode {
Sequence(Vec<Named<Self>>),
Choice(Named<Vec<Self>>),
ScannerDefinition(ScannerDefinitionRef),
KeywordScannerDefinition(KeywordScannerDefinitionRef),
TriviaParserDefinition(TriviaParserDefinitionRef),
ParserDefinition(ParserDefinitionRef),
PrecedenceParserDefinition(PrecedenceParserDefinitionRef),
@@ -128,6 +132,7 @@ impl Visitable for ParserDefinitionNode {
}

Self::ScannerDefinition(_)
| Self::KeywordScannerDefinition(_)
| Self::TriviaParserDefinition(_)
| Self::ParserDefinition(_)
| Self::PrecedenceParserDefinition(_) => {}
95 changes: 95 additions & 0 deletions crates/codegen/grammar/src/scanner_definition.rs
@@ -65,3 +65,98 @@ impl Visitable for ScannerDefinitionNode {
}
}
}

pub trait KeywordScannerDefinition: Debug {
fn name(&self) -> &'static str;
fn identifier_scanner(&self) -> &'static str;
fn definitions(&self) -> &[KeywordScannerDefinitionVersionedNode];
}

pub type KeywordScannerDefinitionRef = Rc<dyn KeywordScannerDefinition>;

impl Visitable for KeywordScannerDefinitionRef {
fn accept_visitor<V: GrammarVisitor>(&self, visitor: &mut V) {
visitor.keyword_scanner_definition_enter(self);
}
}

#[derive(Debug)]
pub struct KeywordScannerDefinitionVersionedNode {
// Underlying keyword scanner (i.e. identifier scanner)
pub value: KeywordScannerDefinitionNode,
/// When the keyword scanner is enabled
pub enabled: Vec<VersionQualityRange>,
/// When the keyword is reserved, i.e. can't be used in other position (e.g. as a name)
pub reserved: Vec<VersionQualityRange>,
}

#[derive(Clone, Debug)]
pub enum KeywordScannerDefinitionNode {
Optional(Box<Self>),
Sequence(Vec<Self>),
Choice(Vec<Self>),
Atom(String),
// No repeatable combinators, because keywords are assumed to be finite
}

impl From<KeywordScannerDefinitionNode> for ScannerDefinitionNode {
fn from(val: KeywordScannerDefinitionNode) -> Self {
match val {
KeywordScannerDefinitionNode::Optional(node) => {
ScannerDefinitionNode::Optional(Box::new((*node).into()))
}
KeywordScannerDefinitionNode::Sequence(nodes) => {
ScannerDefinitionNode::Sequence(nodes.into_iter().map(Into::into).collect())
}
KeywordScannerDefinitionNode::Atom(string) => ScannerDefinitionNode::Literal(string),
KeywordScannerDefinitionNode::Choice(nodes) => {
ScannerDefinitionNode::Choice(nodes.into_iter().map(Into::into).collect())
}
}
}
}

/// A [`KeywordScannerDefinitionRef`] that only has a single atom value.
///
/// The main usage for this type is to construct a keyword trie in parser generator, as trie will
/// only work with single atom values and keyword promotion needs to additionally account for
/// keyword reservation, rather than just literal presence.
#[derive(Clone)]
pub struct KeywordScannerAtomic(KeywordScannerDefinitionRef);

impl KeywordScannerAtomic {
/// Wraps the keyword scanner definition if it is a single atom value.
pub fn try_from_def(def: &KeywordScannerDefinitionRef) -> Option<Self> {
match def.definitions() {
[KeywordScannerDefinitionVersionedNode {
value: KeywordScannerDefinitionNode::Atom(_),
..
}] => Some(Self(def.clone())),
_ => None,
}
}
}

impl std::ops::Deref for KeywordScannerAtomic {
type Target = KeywordScannerDefinitionRef;

fn deref(&self) -> &Self::Target {
&self.0
}
}

impl KeywordScannerAtomic {
pub fn definition(&self) -> &KeywordScannerDefinitionVersionedNode {
let def = &self.0.definitions().get(0);
def.expect("KeywordScannerAtomic should have exactly one definition")
}
pub fn value(&self) -> &str {
match self.definition() {
KeywordScannerDefinitionVersionedNode {
value: KeywordScannerDefinitionNode::Atom(atom),
..
} => atom,
_ => unreachable!("KeywordScannerAtomic should have a single atom value"),
}
}
}
7 changes: 4 additions & 3 deletions crates/codegen/grammar/src/visitor.rs
@@ -1,14 +1,15 @@
use crate::{
Grammar, ParserDefinitionNode, ParserDefinitionRef, PrecedenceParserDefinitionNode,
PrecedenceParserDefinitionRef, ScannerDefinitionNode, ScannerDefinitionRef,
TriviaParserDefinitionRef,
Grammar, KeywordScannerDefinitionRef, ParserDefinitionNode, ParserDefinitionRef,
PrecedenceParserDefinitionNode, PrecedenceParserDefinitionRef, ScannerDefinitionNode,
ScannerDefinitionRef, TriviaParserDefinitionRef,
};

pub trait GrammarVisitor {
fn grammar_enter(&mut self, _grammar: &Grammar) {}
fn grammar_leave(&mut self, _grammar: &Grammar) {}

fn scanner_definition_enter(&mut self, _scanner: &ScannerDefinitionRef) {}
fn keyword_scanner_definition_enter(&mut self, _scanner: &KeywordScannerDefinitionRef) {}
fn trivia_parser_definition_enter(&mut self, _trivia_parser: &TriviaParserDefinitionRef) {}
fn parser_definition_enter(&mut self, _parser: &ParserDefinitionRef) {}
fn precedence_parser_definition_enter(&mut self, _parser: &PrecedenceParserDefinitionRef) {}
88 changes: 88 additions & 0 deletions crates/codegen/parser/generator/src/keyword_scanner_definition.rs
@@ -0,0 +1,88 @@
use codegen_grammar::{
KeywordScannerDefinitionNode, KeywordScannerDefinitionRef, ScannerDefinitionNode,
};
use proc_macro2::TokenStream;
use quote::{format_ident, quote};

use crate::parser_definition::VersionQualityRangeVecExtensions;
use crate::scanner_definition::ScannerDefinitionNodeExtensions;

pub trait KeywordScannerDefinitionExtensions {
fn to_scanner_code(&self) -> TokenStream;
}

impl KeywordScannerDefinitionExtensions for KeywordScannerDefinitionRef {
fn to_scanner_code(&self) -> TokenStream {
let name_ident = format_ident!("{}", self.name());
let token_kind = quote! { TokenKind::#name_ident };

let kw_scanners: Vec<_> = self
.definitions()
.iter()
.map(|versioned_kw| {
let scanner = versioned_kw.value.to_scanner_code();
let enabled_cond = versioned_kw.enabled.as_bool_expr();
let reserved_cond = versioned_kw.reserved.as_bool_expr();

// Simplify the emitted code if we trivially know that reserved or enabled is true
match (&*reserved_cond.to_string(), &*enabled_cond.to_string()) {
("true", _) => quote! {
if #scanner {
KeywordScan::Reserved(#token_kind)
} else {
KeywordScan::Absent
}
},
("false", _) => quote! {
if #enabled_cond && #scanner {
KeywordScan::Present(#token_kind)
} else {
KeywordScan::Absent
}
},
(_, "true") => quote! {
if #scanner {
if #reserved_cond {
KeywordScan::Reserved(#token_kind)
} else {
KeywordScan::Present(#token_kind)
}
} else {
KeywordScan::Absent
}
},
(_, "false") => quote! {
if #reserved_cond && #scanner {
KeywordScan::Reserved(#token_kind)
} else {
KeywordScan::Absent
}
},
_ => quote! {
if (#reserved_cond || #enabled_cond) && #scanner {
if #reserved_cond {
KeywordScan::Reserved(#token_kind)
} else {
KeywordScan::Present(#token_kind)
}
} else {
KeywordScan::Absent
}
},
}
})
.collect();

match &kw_scanners[..] {
[] => quote! { KeywordScan::Absent },
multiple => quote! { scan_keyword_choice!(input, ident, #(#multiple),*) },
}
}
}

impl KeywordScannerDefinitionExtensions for KeywordScannerDefinitionNode {
fn to_scanner_code(&self) -> TokenStream {
// This is a subset; let's reuse that
ScannerDefinitionNode::from(self.clone()).to_scanner_code()
}
}
1 change: 1 addition & 0 deletions crates/codegen/parser/generator/src/lib.rs
@@ -1,4 +1,5 @@
mod ast_model;
mod keyword_scanner_definition;
mod parser_definition;
mod precedence_parser_definition;
mod rust_generator;
43 changes: 40 additions & 3 deletions crates/codegen/parser/generator/src/parser_definition.rs
@@ -5,6 +5,7 @@ use codegen_grammar::{
use inflector::Inflector;
use proc_macro2::TokenStream;
use quote::{format_ident, quote};
use semver::Version;

pub trait ParserDefinitionExtensions {
fn to_parser_code(&self) -> TokenStream;
@@ -138,6 +139,21 @@ impl ParserDefinitionNodeExtensions for ParserDefinitionNode {
}
}

// Keyword scanner uses the promotion inside the parse_token
Self::KeywordScannerDefinition(scanner_definition) => {
let kind = format_ident!("{name}", name = scanner_definition.name());

let parse_token = if is_trivia {
format_ident!("parse_token")
} else {
format_ident!("parse_token_with_trivia")
};

quote! {
self.#parse_token::<#lex_ctx>(input, TokenKind::#kind)
}
}

Self::TriviaParserDefinition(trivia_parser_definition) => {
let function_name =
format_ident!("{}", trivia_parser_definition.name().to_snake_case());
@@ -299,13 +315,24 @@ impl ParserDefinitionNodeExtensions for ParserDefinitionNode {

pub trait VersionQualityRangeVecExtensions {
fn wrap_code(&self, if_true: TokenStream, if_false: Option<TokenStream>) -> TokenStream;
// Quotes a boolean expression that is satisfied for the given version quality ranges
fn as_bool_expr(&self) -> TokenStream;
}

impl VersionQualityRangeVecExtensions for Vec<VersionQualityRange> {
fn wrap_code(&self, if_true: TokenStream, if_false: Option<TokenStream>) -> TokenStream {
fn as_bool_expr(&self) -> TokenStream {
if self.is_empty() {
if_true
quote!(true)
} else {
// Optimize for legibility; return `false` for "never enabled"
match self.as_slice() {
[VersionQualityRange {
from,
quality: VersionQuality::Removed,
}] if from == &Version::new(0, 0, 0) => return quote!(false),
_ => {}
}

let flags = self.iter().map(|vqr| {
let flag = format_ident!(
"version_is_at_least_{v}",
@@ -317,8 +344,18 @@ impl VersionQualityRangeVecExtensions for Vec<VersionQualityRange> {
quote! { !self.#flag }
}
});
quote! { #(#flags)&&* }
}
}

fn wrap_code(&self, if_true: TokenStream, if_false: Option<TokenStream>) -> TokenStream {
if self.is_empty() {
if_true
} else {
let condition = self.as_bool_expr();

let else_part = if_false.map(|if_false| quote! { else { #if_false } });
quote! { if #(#flags)&&* { #if_true } #else_part }
quote! { if #condition { #if_true } #else_part }
}
}
}