
Macro expansion #17

Closed
NalaGinrut opened this issue May 5, 2020 · 13 comments · Fixed by #938

Comments

@NalaGinrut
Contributor

Macro expansion happens before name resolution.

This issue is opened to record this pipeline stage on the kanban board.

@NalaGinrut
Contributor Author

NalaGinrut commented May 16, 2020

Here are the related RFCs:

@NalaGinrut
Contributor Author

NalaGinrut commented May 16, 2020

The current parser returns the MacroItem as a unique_ptr, so we lose the chance to store the macro definition in a table for the expansion step. Querying its definition by traversing the AST is too slow, since the AST is not necessarily balanced.

If we just store its unique_ptr in the table and don't put the MacroItem on the AST, that could be a solution. But then we lose the chance to dump the macro definition in the AST dumper, because the expansion will only substitute the matched case.

So I think we may have to maintain 2 copies of each macro definition. This is not the best way, but it could be optimized in future refactoring.

Any better idea?
@philberty @SimplyTheOther
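A minimal sketch of the alternative discussed above, keeping ownership in the AST and storing only non-owning pointers in a lookup table, might look like this (names such as Crate and build_macro_table are illustrative, not gccrs's actual API):

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the AST's MacroItem node.
struct MacroItem {
  std::string name;
  std::string body; // simplified: a real definition holds rules, not a string
};

// The AST keeps ownership via unique_ptr; the lookup table only stores
// non-owning raw pointers, so no second copy of the definition is needed.
struct Crate {
  std::vector<std::unique_ptr<MacroItem>> items;
};

using MacroTable = std::map<std::string, const MacroItem *>;

// One pass over the (flattened) AST items fills the table; as long as the
// table does not outlive the AST, the raw pointers stay valid.
MacroTable build_macro_table(const Crate &crate) {
  MacroTable table;
  for (const auto &item : crate.items)
    table[item->name] = item.get();
  return table;
}
```

This keeps the MacroItem visible to the AST dumper while still giving the expansion step an O(log n) lookup.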

@SimplyTheOther
Member

SimplyTheOther commented May 17, 2020

Wouldn’t you be able to store a reference to the MacroItem (or rather a reference to a unique_ptr containing the MacroItem) in the table?

Alternatively, you could parse the macro definition into a different form during the expansion phase and put the results of that into the table, and maybe even remove the macro definition item from the constructed AST after that stage if it isn’t required anymore.

Or just move the MacroItem unique_ptr into the table while in the expansion phase, so that the MacroItem is still in the AST during the AST dump phase and so remains in the dump.

@NalaGinrut
Contributor Author

NalaGinrut commented May 17, 2020

> Wouldn’t you be able to store a reference to the MacroItem (or rather a reference to a unique_ptr containing the MacroItem) in the table?

Yes, I can, but it doesn't sound safe to store the unwrapped reference in the table, since it's not unique anymore.

> Alternatively, you could parse the macro definition into a different form during the expansion phase and put the results of that into the table, and maybe even remove the macro definition item from the constructed AST after that stage if it isn’t required anymore.

I think you're suggesting a pass to eliminate dead macros; yes, I can do that. But I don't think it's related to my question, which is about how to query the definition quickly. The elimination has to rely on that lookup.

> Or just move the MacroItem unique_ptr into the table while in the expansion phase, so that the MacroItem is still in the AST during the AST dump phase and so remains in the dump.

But then I have to search for the MacroItem during expansion by traversing the AST the first time.
Hmm... maybe you mean a pass that eliminates macro nodes from the AST and builds the lookup table in one go. Let me think about it...

Let me elaborate on the situation for other readers here: the parser interfaces are all unified to return unique_ptr, so I have no chance to move it into a lookup table before returning it. But I can do that during expansion, since the reference is unique there.

@SimplyTheOther
Member

Yeah, I think I misunderstood what you were trying to say. I assumed you originally meant that you'd do a "macro definition finding" pass over the AST that located all definitions and stored a parsed form in a table.
Personally, I think doing this pass would be the "cleanest" option if it isn't too slow and doesn't have other problems.

> Yes, I can, but sounds not safe to store the unwrapped reference into the table, since it's not unique anymore.

As long as the lifetime of the table is shorter than the lifetime of the AST node (which it should be, unless the macro nodes are deleted during some other process), the reference should be fine. However, I can see how this could become an issue in the future and why you'd be wary about it.

> I think you're suggesting a pass to eliminate dead macros, yes I can do that.
> But I don't think it's related to my question, my question is about how to query it quickly. The elimination has to rely on it.

> But I have to search for the MacroItem in expansion by traversing AST for the first time.
> Hmm...maybe you mean a pass to do eliminate-macro-node in AST and build lookup table in a row. Let me think about it...

It wasn't mainly centred around "eliminating" macros so much as "detecting" them, but yes, that's what I was suggesting. It would be the same kind of thing as a pass for name resolution: the AST is recursively traversed, and any macro definitions found can be parsed into a usable form and stored in a table (or possibly some kind of reverse-linked list of tables to preserve scope).
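The "reverse-linked list of tables" idea could be sketched as a chain of scopes, each holding its own definitions plus a pointer to its parent (a toy model with illustrative names; the real implementation would store parsed macro definitions rather than strings):

```cpp
#include <map>
#include <string>

// Each scope's table points back to its parent scope; lookup walks
// outward from the innermost scope, so inner definitions shadow outer
// ones, mirroring how name resolution scoping works.
struct MacroScope {
  const MacroScope *parent = nullptr;
  std::map<std::string, std::string> defs; // name -> (simplified) definition

  const std::string *lookup(const std::string &name) const {
    for (const MacroScope *s = this; s != nullptr; s = s->parent) {
      auto it = s->defs.find(name);
      if (it != s->defs.end())
        return &it->second;
    }
    return nullptr; // not defined in any enclosing scope
  }
};
```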

> Let me elaborate on the situation for other readers here: the parser interfaces are all unified to return unique_ptr, so I have no chance to move it into a lookup table before return it. But I can do it in expansion since the reference is unique.

We could take the scorched-earth approach and make everything shared_ptr if we can't find any other satisfactory options, though I'm a bit worried about the performance implications of that.

@NalaGinrut
Contributor Author

> Not even mainly centred around "eliminating" macros rather than just "detecting" them, but yes, that's what I was suggesting. It would be the same kind of thing as a pass for name resolution - the AST is recursively traversed and any macro definitions found can be parsed to a usable form and stored in a table (or possibly some kind of reverse-linked list of tables to preserve scope or something).

I think we may have to do it during name resolution, since expansion and resolution may be interleaved if there are macros within macros. Fortunately, Rust doesn't support forward declaration, so the lookup table can be built without any backward tracing.

> We could take the scorched-earth approach and make everything shared_ptr if we can't find any other satisfactory options, though I'm a bit worried about the performance implications of that.

Definitely; I just want to make the minimal change. So far I think unique_ptr is not a bad choice, and we don't have to change it if macros are the only concern.

@SimplyTheOther
Member

@NalaGinrut Are you currently working on this? If not, I was intending to have a try at creating some of the macro expansion code.

@NalaGinrut
Contributor Author

NalaGinrut commented Aug 15, 2020

Currently, no. I'm very busy with my startup. Feel free to try what you like; I'll be back ASAP.

@SimplyTheOther
Member

One issue I have discovered while attempting to start introducing macro expansion is how the parsing of macro_rules declarative macros should be implemented.

Essentially, macro_rules declarative macros work by declaring several different patterns that the invocation tokens can match, which are tried in order until one pattern "matches" the invocation. This is kind of a bad description, so it might help to look at the description of it in the spec or on Daniel Keep's website (which I personally believe gives a particularly good explanation).

Anyway, the issue I encountered was in implementing this "matching" feature. I have thought of a few approaches:

  1. Do what is stated as conceptually occurring, i.e. matching the invocation and definition tokens one at a time until they differ (or a non-terminal in the definition, e.g. an expression capture, fails to parse). This could be considered a "backtracking" approach.
    The advantages of this approach are its conceptual simplicity and probably more straightforward implementation.
    The disadvantage is inefficiency - any parsed captures will be disposed of after each rule is tried upon failure, even if captures would be in the "same" spot, requiring lots of re-parsing, which could be very inefficient and slow if there are a large number of rules and small number of captures.
    mrustc seems to have originally implemented a system similar to this, and decided to change it, according to their docs, due to efficiency considerations regarding tokens. I believe that the token efficiency consideration would not apply to gccrs due to a different lexer model (which uses shared_ptr, meaning copies should not have to occur).

  2. Take an approach similar to rustc, which is an "NFA-based regex parser" "similar to an Earley parser". A basic description of it can be found in the rustc dev guide.
    The advantage of this approach is that it seems to work well, judging by rustc's use of it and apparent lack of plans to replace it. From my preliminary research on Earley parsers (i.e. reading the Wikipedia article), it appears that they prevent the "combinatorial explosion" caused by backtracking with the first possible solution, which would increase efficiency in the worst case - it has cubic time complexity.
    The disadvantages are unknown to me at this point - I will have to do a lot more research to understand the concepts behind it and their implications. Also, rustc's use of it is apparently different to a pure Earley parser.
    I believe mrustc also may have historically used a system similar to this - the "decision tree" approach mentioned in their docs seems to be a way of attempting to use a similar concept. On the other hand, I have a very poor understanding of what an Earley parser actually is, so I may be misunderstanding here, since the "decision tree" seems much weaker than an Earley parser judging by its failure to work correctly.

  3. Take an approach similar to mrustc, which seems to use a simplified second parser for lookahead to determine whether a rule would work (i.e. only a cheaper parsing has to occur before a matching rule is found) and then uses the regular parser to actually parse the matching rule. The system is somewhat described here.
    The advantage of this approach is that the use of the second parser reduces resource usage compared to using a full parser.
    The disadvantage is that two separate parsers need to be maintained and kept "in sync", which is probably really annoying if they are not programmed in a similar "style".

  4. Attempt to do some kind of lookahead (maybe just for non-terminal tokens) in a different way. Advantages and disadvantages obviously depend on the exact method of implementation.
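As a rough illustration of option 1, a toy backtracking matcher might try each rule in order and simply discard all work on failure. This sketch only handles literal tokens plus a single-token wildcard; real matching would parse non-terminal captures (e.g. $e:expr) and repetition operators:

```cpp
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// A pattern token "$" stands in for a non-terminal capture and matches
// any single invocation token; everything else must match literally.
static bool rule_matches(const Tokens &pattern, const Tokens &invocation) {
  if (pattern.size() != invocation.size())
    return false; // no repetition operators in this sketch
  for (size_t i = 0; i < pattern.size(); ++i)
    if (pattern[i] != "$" && pattern[i] != invocation[i])
      return false;
  return true;
}

// Backtracking: rules are tried in order, and all work done for a
// failed rule is thrown away. Returns the first matching rule's index,
// or -1 if no rule matches.
int match_macro(const std::vector<Tokens> &rules, const Tokens &invocation) {
  for (size_t i = 0; i < rules.size(); ++i)
    if (rule_matches(rules[i], invocation))
      return static_cast<int>(i);
  return -1;
}
```

The inefficiency described above shows up here: a capture parsed while trying rule 0 would be re-parsed from scratch when trying rule 1, even if it sits at the same position.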

Does anyone else have any opinions or thoughts on the possible options we could take here? I'm a bit unsure on how to proceed, but I'm leaning towards the first option (at least initially).

@philberty
Member

philberty commented Jan 3, 2021

My gut feeling would be approach 1 or 3. I doubt many people have much experience with Earley parsers. Approach 3 seems reasonable and might be simpler to implement compared to approach 1, but I don't have any real opinion either way.

@NalaGinrut
Contributor Author

NalaGinrut commented Jan 4, 2021

Before I make my comment, I have to mention that we're still brainstorming, so I'm not pushing any of my suggestions.

The MBE in rustc is an NFA-based regex parser inspired by the spirit of an Earley parser. This sounds like they designed a dedicated algorithm for their case rather than using an Earley parser directly. IMO, this is a good way for a project, but it may require a lot of time.

Can we make our design flexible enough that we can replace the algorithm in the future? If so, we could use the backtracking approach in the beginning. This may save us time while we search for the best solution.
@philberty @SimplyTheOther

@SimplyTheOther
Member

> My gut feeling would be approach 1 or 3. I doubt many people have much experience with Earley parsers. Approach 3 seems reasonable and might be simpler to implement compared to approach 1, but i don't have any real opinion either way.

> Can we make our design more flexible that we can replace the algorithm in the future? If so, we may use backtracking way in the beginning. This may save time for us to find the best solution in the beginning.
> @philberty @SimplyTheOther

I was also leaning towards implementing a "backtracking"-style algorithm (i.e. option 1).

As for replacing the algorithm, I'm sure that the macro parser can be implemented in such a way that whatever calls the parse_single_macro function (or whatever it will be called) does not have to worry about the inner operations of the macro parsing algorithm. I'm not sure if this is what you mean by "replace the algorithm", or if you meant something more flexible than this.
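One possible shape for such a replaceable design, sketched here with hypothetical names (parse_single_macro is the function name floated above; MatchAlgorithm and MacroExpander are this sketch's inventions), is to hide the matching algorithm behind a single entry point so callers never depend on its internals:

```cpp
#include <functional>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// The matching algorithm is just a callable: given the rules and the
// invocation tokens, return the matched rule index or -1. A backtracking
// or NFA-based matcher can be swapped in without touching callers.
using MatchAlgorithm =
    std::function<int(const std::vector<Tokens> &, const Tokens &)>;

class MacroExpander {
public:
  explicit MacroExpander(MatchAlgorithm algo) : algo_(std::move(algo)) {}

  // The parse_single_macro-style entry point: all matching state stays
  // local to the call, which keeps the abstraction hygienic (no global
  // side effects leak out of the algorithm).
  int parse_single_macro(const std::vector<Tokens> &rules,
                         const Tokens &invocation) const {
    return algo_(rules, invocation);
  }

private:
  MatchAlgorithm algo_;
};
```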

@NalaGinrut
Contributor Author

@SimplyTheOther I think your suggestion is to hygienically abstract it into a parse_single_macro function. This is flexible enough to implement my idea of "replacing the algorithm". Let me emphasize hygiene here, in case it introduces global side effects.

@philberty philberty added this to the Macro Expansion milestone Jan 4, 2022
philberty added a commit that referenced this issue Feb 16, 2022
This is the first pass at implementing macros; more test cases are needed.

This does not support repetition matchers, but it supports simple
declarative macros and transcribes them. The approach taken here is that
we reuse our existing parser to call the appropriate functions as specified
by the MacroFragmentType enum; if the parser has no errors parsing that
item, then it must be a match.

Then, once we match a rule, we have a map of the token begin/end offsets
for each fragment match. This is then used to adjust and create a new token
stream for the macro rule definition, so that when we feed it to the parser
the tokens are already substituted. The resulting expression or item is
then attached to the respective macro invocation, which is then name
resolved and used for HIR lowering.

Fixes #17 #22
Addresses #573
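The offset-map substitution this commit describes could be sketched roughly as follows (heavily simplified: fragment names map directly to [begin, end) offsets into the invocation token stream, and tokens are plain strings; names are illustrative, not gccrs's actual code):

```cpp
#include <map>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
struct Range { size_t begin, end; }; // [begin, end) into the invocation

// Build the substituted token stream for a matched rule: literal tokens
// from the rule definition are copied through; each fragment reference
// is replaced by the invocation tokens its match covered. The result is
// fed back to the regular parser with substitutions already applied.
Tokens transcribe(const Tokens &rule_def, const Tokens &invocation,
                  const std::map<std::string, Range> &fragments) {
  Tokens out;
  for (const auto &tok : rule_def) {
    auto it = fragments.find(tok);
    if (it == fragments.end()) {
      out.push_back(tok); // literal token, copied as-is
    } else {
      const Range &r = it->second; // splice in the matched tokens
      for (size_t i = r.begin; i < r.end; ++i)
        out.push_back(invocation[i]);
    }
  }
  return out;
}
```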
bors bot added a commit that referenced this issue Feb 17, 2022
938: First pass at declarative macro expansion  r=philberty a=philberty


Co-authored-by: Philip Herron <philip.herron@embecosm.com>
@bors bors bot closed this as completed in 2c03f34 Feb 17, 2022