Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate uAST to old AST #17919

Merged
merged 30 commits into from Jun 15, 2021
Merged

Translate uAST to old AST #17919

merged 30 commits into from Jun 15, 2021

Conversation

mppf
Copy link
Member

@mppf mppf commented Jun 14, 2021

This PR adjusts the --compiler-library-parser flag to run some new
code to translate from uAST to the AST used by the production compiler.
Not all uAST classes have been translated so far but we do have calls and
function declarations.

The translation itself is implemented in the new convert-uast.cpp file.
It takes the approach of having per-type functions e.g.
convertFunction, convertVariable, ... and having these return the old
AST that they produce. Then there is a convertAST that uses the
ASTClassesList macros to dispatch to the appropriate convert* function
in the event we do not already know the type we are converting.

In working on this PR I encountered several issues with uAST that the PR
addresses:

  • this PR adjusts the astlocT type from the production compiler to be
    able to store a uAST ID which can be used to compute the source
    location. That way, the uAST frontend need not save these locations
    unless they are used & when we get to incrementally constructing the
    old AST from the uAST we can have more reuse if the source locations
    change. In order to handle this, changed the astlocT type to use
    methods for accessing the line/filename instead of fields.
  • there were problems in ID assignment and the saving of locations for
    uAST nodes. In particular the previous strategy didn't handle nested
    modules or top-level modules correctly when assigning IDs to modules.
    It was assigning the empty ID to each top-level module.
  • adjusts the very incomplete uAST dumper to store IDs in a column on
    the left to make it easier to read
  • adjusted the uAST literal support to store the text of the literal
    instead of the base
  • added some code to the compiler library to handle adding string
    escapes since the uAST strings are stored unescaped but the production
    compiler stores escaped strings
  • removed some unused code from compiler/include that was causing
    problems when compiling the uAST headers in the production compiler

Future Work:

  • add tests or a testing mode for --compiler-library-parser
  • consider renaming the --compiler-library-parser flag to, say,
    --next-frontend

Reviewed by @dlongnecke-cray - thanks!

  • full local testing

mppf added 26 commits June 8, 2021 17:07
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
We could find the filename and line number for uAST elements when
constructing the (old) AST for them. However the uAST is designed to
separate the filename and line number information from the other contents
of the source code in order to allow more incremental reuse. So it seems
unfortunate to compute the line numbers for everything when converting to
old AST.

Instead, this commit adjusts astlocT to store a chpl::ID that can be used
to retrieve the filename and line number when they are needed. Retrieving
these will set the filename and line number fields in the class (in case
caching them is useful). This allows us to put off finding the line
number and file name until there is an actual error. That should allow
more incremental reuse (once we get to reusing the old AST in some form)
and reduce the amount of lookups in the common case of correct code.

However there are still many uses of the location (especially getting the
file name) in the compiler sources that are not connected to reporting
errors. Hopefully many of these will go away as we migrate more of the
front end of the compiler to the uAST.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
This reverts commit 602d6ea.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
In order for IDs to be useful for every uAST node, need to have a good ID
for a top-level module. Additionally it is intuitive for the symbolPath
for a new function/module to include that function/module's name. So
this commit adjusts to do that.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Some uAST nodes are created temporarily during build.  These were saved
in the locations vector and their Ids were tried to be read. We were
doing that because during parsing the IDs aren't established.

To solve the problem, create a different map used during build (from ast
pointers to locations) and from there create the vector of (ID, Location)
while assigning IDs.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Since IDs are attached to modules, it's not possible to assign
IDs to all comments anyway, without complicating the IDs significantly.

In particular this case is not really possible to handle while
giving each comment a unique ID:

  someFile.chpl:
    // comment 1
    module M { }
    // comment 2
    module N { }

The problem here is that because we're not storing the file paths in the
IDs (just module paths), the comments are outside of any module. We could
imagine giving them ids 0 and 1, but that'd need to be unique per file,
and then we'd have to have a sense of file paths in (some) IDs. The
argument at this moment is that it isn't worth it to do so but we might
revisit it later.

So, instead, a tool needing to work with comments (such as a
documentation tool) should create its own maps of Locations of Comments
after parsing based upon the vector in Builder::Result.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
mppf added 2 commits June 14, 2021 18:08
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
See e.g. test/arrays/intents/out/out-domain-no-elt.chpl

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Copy link
Contributor

@dlongnecke-cray dlongnecke-cray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I had some questions.

/**
Return the name of the module containing this ID.
*/
UniqueString moduleNameForID(ID id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not work with the new strategy? Wouldn't a nested module contain itself (e.g. M.Inner would return M.Inner)? Or are we just removing this because we don't use it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're removing it because we don't use it but actually because M.Inner could be a module M containing a function Inner or a module M containing a module Inner it's not that easy. But we could pull out the top-level module name from the ID if we wanted to; I just don't know why we would need to. Additionally I've never really liked that Context.h talked about Modules because it's supposed to be query framework support (meaning I'd rather factor out Chapel-specific stuff like Modules elsewhere, like in uAST classes, if that's possible).

// Convert C escape characters into two characters: '\\' and the other character
static void addCharEscapingC(std::string& s, char c) {
switch (c) {
case '\"' :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of want to suggest using a set for this, or something. However if we don't expect to add any more...

compiler/next/test/frontend/testFrontendQueries.cpp Outdated Show resolved Hide resolved
AST_NODE(ExternBlock) // old AST: ExternBlockStmt
AST_NODE(Implements) // old AST: ImplementsStmt
AST_NODE(Import) // old AST: ImportStmt
//AST_NODE(ExternBlock) // old AST: ExternBlockStmt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have to uncomment a bunch of these if I merge #17914 first.

const Location& locate(Context* context, const ASTNode* ast) {
QUERY_BEGIN(locate, context, ast);
const Location& locateID(Context* context, ID id) {
QUERY_BEGIN(locateID, context, id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to simplify QUERY_BEGIN if we used __FUNCTION__ instead of explicitly writing out locateID (or the function name, in general)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to do that but we need to be able to re-run the query when a dependency needs it and as a result we have to be able to save the function pointer in the maps supporting the queries. __FUNCTION__ __func__ aren't sufficient for that part.

compiler/next/test/frontend/testFrontendQueries.cpp Outdated Show resolved Hide resolved
compiler/next/test/frontend/testFrontendQueries.cpp Outdated Show resolved Hide resolved
compiler/next/test/frontend/testFrontendQueries.cpp Outdated Show resolved Hide resolved
mppf added 2 commits June 15, 2021 09:33
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
@mppf mppf merged commit 4990275 into chapel-lang:master Jun 15, 2021
@mppf mppf deleted the translate-uast branch June 15, 2021 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants