New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate uAST to old AST #17919
Translate uAST to old AST #17919
Conversation
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
We could find the filename and line number for uAST elements when constructing the (old) AST for them. However the uAST is designed to separate the filename and line number information from the other contents of the source code in order to allow more incremental reuse. So it seems unfortunate to compute the line numbers for everything when converting to old AST. Instead, this commit adjusts astlocT to store a chpl::ID that can be used to retrieve the filename and line number when they are needed. Retrieving these will set the filename and line number fields in the class (in case caching them is useful). This allows us to put off finding the line number and file name until there is an actual error. That should allow more incremental reuse (once we get to reusing the old AST in some form) and reduce the amount of lookups in the common case of correct code. However there are still many uses of the location (especially getting the file name) in the compiler sources that are not connected to reporting errors. Hopefully many of these will go away as we migrate more of the front end of the compiler to the uAST. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
This reverts commit 602d6ea. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
In order for IDs to be useful for every uAST node, need to have a good ID for a top-level module. Additionally it is intuitive for the symbolPath for a new function/module to include that function/module's name. So this commit adjusts to do that. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Some uAST nodes are created temporarily during build. These were saved in the locations vector and their Ids were tried to be read. We were doing that because during parsing the IDs aren't established. To solve the problem, create a different map used during build (from ast pointers to locations) and from there create the vector of (ID, Location) while assigning IDs. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Since IDs are attached to modules, it's not possible to assign IDs to all comments anyway, without complicating the IDs significantly. In particular this case is not really possible to handle while giving each comment a unique ID: someFile.chpl: // comment 1 module M { } // comment 2 module N { } The problem here is that because we're not storing the file paths in the IDs (just module paths), the comments are outside of any module. We could imagine giving them ids 0 and 1, but that'd need to be unique per file, and then we'd have to have a sense of file paths in (some) IDs. The argument at this moment is that it isn't worth it to do so but we might revisit it later. So, instead, a tool needing to work with comments (such as a documentation tool) should create its own maps of Locations of Comments after parsing based upon the vector in Builder::Result. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
See e.g. test/arrays/intents/out/out-domain-no-elt.chpl --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I had some questions.
/** | ||
Return the name of the module containing this ID. | ||
*/ | ||
UniqueString moduleNameForID(ID id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not work with the new strategy? Wouldn't a nested module contain itself (e.g. M.Inner
would return M.Inner
)? Or are we just removing this because we don't use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're removing it because we don't use it but actually because M.Inner
could be a module M containing a function Inner
or a module M containing a module Inner
it's not that easy. But we could pull out the top-level module name from the ID if we wanted to; I just don't know why we would need to. Additionally I've never really liked that Context.h talked about Modules because it's supposed to be query framework support (meaning I'd rather factor out Chapel-specific stuff like Modules elsewhere, like in uAST
classes, if that's possible).
// Convert C escape characters into two characters: '\\' and the other character | ||
static void addCharEscapingC(std::string& s, char c) { | ||
switch (c) { | ||
case '\"' : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of want to suggest using a set
for this, or something. However if we don't expect to add any more...
AST_NODE(ExternBlock) // old AST: ExternBlockStmt | ||
AST_NODE(Implements) // old AST: ImplementsStmt | ||
AST_NODE(Import) // old AST: ImportStmt | ||
//AST_NODE(ExternBlock) // old AST: ExternBlockStmt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will have to uncomment a bunch of these if I merge #17914 first.
const Location& locate(Context* context, const ASTNode* ast) { | ||
QUERY_BEGIN(locate, context, ast); | ||
const Location& locateID(Context* context, ID id) { | ||
QUERY_BEGIN(locateID, context, id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to simplify QUERY_BEGIN
if we used __FUNCTION__
instead of explicitly writing out locateID
(or the function name, in general)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used to do that but we need to be able to re-run the query when a dependency needs it and as a result we have to be able to save the function pointer in the maps supporting the queries. __FUNCTION__
__func__
aren't sufficient for that part.
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
--- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Add performance annotations Adds annotation for a small performance regression in the compilation. PR: #17919 Plots: https://chapel-lang.org/perf/chapcs/?startdate=2021/06/03&enddate=2021/06/17&graphs=fftcompilationtimebypass,samplercompilationtimebypass,cgsparsecompilationtimebypass,compilationtime,jacobiastsize,jacobiemittedcodesize
This PR adjusts the
--compiler-library-parser
flag to run some newcode to translate from uAST to the AST used by the production compiler.
Not all uAST classes have been translated so far but we do have calls and
function declarations.
The translation itself is implemented in the new convert-uast.cpp file.
It takes the approach of having per-type functions e.g.
convertFunction
,convertVariable
, ... and having these return the oldAST that they produce. Then there is a
convertAST
that uses theASTClassesList macros to dispatch to the appropriate
convert*
functionin the event we do not already know the type we are converting.
In working on this PR I encountered several issues with uAST that the PR
addresses:
able to store a uAST ID which can be used to compute the source
location. That way, the uAST frontend need not save these locations
unless they are used & when we get to incrementally constructing the
old AST from the uAST we can have more reuse if the source locations
change. In order to handle this, changed the astlocT type to use
methods for accessing the line/filename instead of fields.
uAST nodes. In particular the previous strategy didn't handle nested
modules or top-level modules correctly when assigning IDs to modules.
It was assigning the empty ID to each top-level module.
the left to make it easier to read
instead of the base
escapes since the uAST strings are stored unescaped but the production
compiler stores escaped strings
problems when compiling the uAST headers in the production compiler
Future Work:
--compiler-library-parser
--compiler-library-parser
flag to, say,--next-frontend
Reviewed by @dlongnecke-cray - thanks!