Declarative macro expansion #573
I did consider implementing it like that, and I suppose it would be the best option if you really wanted to keep your token stream immutable, but to me it seemed like a total pain to handle multi-character tokens with that method. For example, if you have a right-shift-assign operator, that's three tokens you now have to care about and deal with, so I think the logic in the parser would be significantly complicated by that. Additionally, it would also make it much more complicated for the Pratt-parser-based expression precedence system to work with such a scheme.
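To illustrate the complication, here is a minimal C++ sketch of splitting compound `>` tokens for the parser. The token names and `split_leading_gt` helper are hypothetical, not gccrs's actual token enum or API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds; the real gccrs enum differs.
enum TokenKind { RIGHT_SHIFT_ASSIGN, RIGHT_SHIFT, GREATER_EQUAL, GREATER, OTHER };

struct Token
{
  TokenKind kind;
  std::string text;
};

// When the parser expects a single '>' (e.g. closing a generic argument
// list) but the lexer produced a compound token, split one '>' off the
// front and keep the remainder as a new, shorter token.
// Returns the tokens that replace the original.
std::vector<Token>
split_leading_gt (const Token &tok)
{
  switch (tok.kind)
    {
    case RIGHT_SHIFT: // ">>"  ->  ">" ">"
      return {{GREATER, ">"}, {GREATER, ">"}};
    case RIGHT_SHIFT_ASSIGN: // ">>=" ->  ">" ">="
      return {{GREATER, ">"}, {GREATER_EQUAL, ">="}};
    case GREATER_EQUAL: // ">="  ->  ">" "="
      return {{GREATER, ">"}, {OTHER, "="}};
    default:
      return {tok}; // nothing to split
    }
}
```

Every place that consumes a `>` has to be prepared to meet any of these compound forms, which is the "three tokens you now have to care about" problem above.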
This is the first pass at implementing macros; more testcases are needed. This does not support repetition matchers, but it supports simple declarative macros and transcribes them. The approach taken here is that we reuse our existing parser to call the appropriate functions as specified by the `MacroFragmentType` enum; if the parser has no errors parsing that item, then it must be a match. Once we match a rule, we have a map of the token begin/end offsets for each fragment match. This is then used to adjust and create a new token stream for the macro rule definition, so that when we feed it to the parser the tokens are already substituted. The resulting expression or item is then attached to the respective macro invocation, which is then name-resolved and used for HIR lowering. Fixes #17 #22 Addresses #573
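The transcription step described above - splicing matched token ranges into the rule's definition - can be sketched as follows. This is a toy model with strings standing in for real tokens; `transcribe` and `FragmentMatch` are illustrative names, not actual gccrs APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A fragment match records the begin/end token offsets (end exclusive)
// of the invocation tokens that matched one fragment, e.g. $e:expr.
struct FragmentMatch
{
  size_t begin, end;
};

// Walk the rule's definition tokens and, whenever a fragment placeholder
// like "$e" appears, splice in the invocation tokens recorded for it,
// producing a fully substituted stream ready to hand back to the parser.
std::vector<std::string>
transcribe (const std::vector<std::string> &definition,
            const std::vector<std::string> &invocation,
            const std::map<std::string, FragmentMatch> &matches)
{
  std::vector<std::string> out;
  for (const std::string &tok : definition)
    {
      auto it = matches.find (tok);
      if (it == matches.end ())
        {
          out.push_back (tok); // ordinary token: copy through
          continue;
        }
      // placeholder: substitute the matched token range
      for (size_t i = it->second.begin; i < it->second.end; i++)
        out.push_back (invocation[i]);
    }
  return out;
}
```

For example, transcribing the definition `$e + 1` against the invocation tokens `2 * x` (with `$e` matched to offsets 0..3) yields the stream `2 * x + 1`, which is then parsed normally.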
938: First pass at declarative macro expansion r=philberty a=philberty Co-authored-by: Philip Herron <philip.herron@embecosm.com>
This is now being done as part of the macros milestone; we used much of the content here to begin our approach.
This issue is designed to document the planned macro expansion system for declarative macros and to allow discussion regarding potential improvements to it.
All macro invocations are parsed in the `Parser` into either `MacroInvocation` or `MacroInvocationSemi` objects, depending on context. `MacroInvocationSemi`s can be `Item`s, associated items, or statements - basically anywhere a semicolon would be required (though curly-bracket macros do not require one). `MacroInvocation`s can be types, patterns, or expressions - basically anywhere a semicolon should never be used. This was done largely due to the Reference's separation into these two forms, and has its advantages and disadvantages.

However, in order to simplify expansion, both the `MacroInvocation` and `MacroInvocationSemi` classes have a `MacroInvocData` member variable, which stores the actual "data" of the macro (including the name of the macro and all tokens used inside). This was intended to allow the `MacroExpander` to operate by expanding instances of the data class rather than having to write two virtually identical methods for each expand method.

The basic idea for macro expansion is for the `MacroExpander` to scan through the entire AST, storing any macro definitions found (`MacroRulesDefinition` for declarative macros) and then attempting to expand any macro invocations found. The result of the expanded macro invocation (if it is possible to expand it currently) will then replace the macro invocation in the AST.

To actually expand declarative macros, it is planned for the `expand_decl_macro` method in `MacroExpander` to be used. This takes the invocation data and the rules definition, and should return an `ASTFragment`. This sounds simple enough, but the actual implementation required is quite complicated and has multiple potential approaches, as detailed in #17. In short, Rust macro definitions can (and frequently do) have several potential "match patterns", and if one does not match, then the next should be tried. The basic consensus at the time was to follow this "backtracking" approach, rather than attempt a predictive one or use a generated parser.

With the backtracking approach, the macro parser would call specific methods of a new `Parser` instance on the token stream stored in the macro - e.g. `parse_expr` when an expression non-terminal was found as the next token in the match pattern. If there was a parse error, the macro parser would abort parsing that match pattern and attempt the next. If the macro parser successfully matched the entire "match pattern" with no parse errors, then the "replacement pattern" specified by that match-replacement pair would be used to generate the AST fragment.

Since the regular `Parser` class is being reused by the `MacroParser`, it requires some sort of (wrapped) stream of tokens to operate on. However, due to how Rust uses the concept of a "token tree" (either a non-delimiter token or a token tree within delimiters) in macros (and also since the Reference models it as such), I thought it was easier to store macro tokens in a token tree form. As such, there needs to be a conversion of this token tree form into a token stream before the `Parser` uses it. However, intuition tells me that there are likely issues with doing this that I haven't thought of yet.

Another issue is that the `Parser` may, in rare cases, mutate the token stream. For example, it will split a `>>` (right-shift) token into two `>` (right angle bracket) tokens if it expects right angle brackets at the end of a generic type. However, this will theoretically apply for some "match patterns" but not others - i.e. it could be a generic close in one, but a right-shift in a different one. The best way I could think of to solve that issue (i.e. to avoid expensive copies of the entire token stream each time) was to record each time the parser modified the token stream inside the stream's wrapper, and have a "reset" function on the wrapper that reversed all changes. For example, a stack holding pairs of "token pointer" and "stream index" inside the wrapper could be used to implement this.

The last paragraph was where I got up to when thinking about the implementation. I'm sure there are other issues or implementation details that I haven't thought of yet.
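The record-and-reset idea from the last paragraph could be sketched roughly like this. It is a toy wrapper over strings; the class name, the undo-stack representation, and the real gccrs token types are all assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy token-stream wrapper: every in-place mutation (here, splitting ">>"
// into ">" ">") pushes the original token and its index onto an undo
// stack, and reset() replays the stack to restore the stream before the
// next match pattern is tried - avoiding a full copy of the stream.
class TokenStreamWrapper
{
  std::vector<std::string> tokens;
  std::vector<std::pair<size_t, std::string>> undo; // (index, original token)

public:
  explicit TokenStreamWrapper (std::vector<std::string> t)
    : tokens (std::move (t))
  {}

  const std::string &at (size_t i) const { return tokens[i]; }
  size_t size () const { return tokens.size (); }

  // Split ">>" at index i into two ">" tokens, remembering how to undo it.
  void split_right_shift (size_t i)
  {
    if (tokens[i] != ">>")
      return;
    undo.emplace_back (i, tokens[i]);
    tokens[i] = ">";
    tokens.insert (tokens.begin () + i + 1, ">");
  }

  // Reverse all recorded mutations, newest first.
  void reset ()
  {
    while (!undo.empty ())
      {
        auto [i, orig] = undo.back ();
        undo.pop_back ();
        tokens.erase (tokens.begin () + i + 1); // drop the inserted ">"
        tokens[i] = orig;                       // restore ">>"
      }
  }
};
```

After a failed match pattern, calling `reset ()` restores the exact original stream, so the next pattern can legitimately interpret `>>` as a right-shift instead.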
Notes:

- Declarative macros are defined with `macro_rules`, while procedural macros are defined differently.
- Expanded macros produce `SingleASTNode`s, where a `SingleASTNode` is a single AST node of any type - expression, statement, item, etc.
- `SingleASTNode` should be a tagged union, but due to implementation difficulties it is currently a struct and so wastes memory. There are advantages and disadvantages to this representation as well.
- The reason `AST::Token` stores a pointer to `Lexer::Token` is to prevent the requirement for potentially expensive re-conversion of tokens between the two types, while still allowing `AST::Token` to be represented heterogeneously as a kind of `TokenTree`. The re-conversions would be required if, for example, there was a macro invocation within the expression being matched.
- The reason the `Parser` splits tokens like `>>` into two `>`s in certain contexts is that it seemed like the best solution to the problem at the time. The problem is that the lexer, which has no syntactic information available (only regular information), cannot disambiguate between a right-shift and two right angle brackets without a space between them. This option best reflects the "true" token stream as interpreted by the parser, and other options seemed to require more special handling.
- The reason `Parser` is a template class is to allow the use of different "wrappers" for token streams, other than the `Lexer` class that it was originally built for. Similarly, the reason the parser calls `add_error` instead of `rust_error_at` is to allow parse errors to be discarded by the macro parser when it tries a new match pattern - `rust_error_at` cannot be recovered from.