Consider using ASTImporter to merge all translation units into a single ASTContext #551

@mattmccutchen-cci

Clang has a feature called ASTImporter (main documentation, additional technical details) that can be used to merge multiple translation unit ASTs into the same ASTContext. As long as 3C remains committed to holding all the ASTs in memory for the duration of its execution (#488), we could consider using this. Here are some potential advantages and disadvantages I can think of (I haven't seriously researched or tested any of this, so I could be completely wrong):

  • Having a single ASTContext may simplify the implementation of some parts of 3C. For example, it may allow us to use a single Rewriter for all translation units, which would fix #374 ("Merge rewrites to different #if branches of same header from different translation units").

  • The ASTImporter automatically checks for conflicting declarations, and its logic is probably more precise and comprehensive than anything we would implement in 3C (e.g., the current mergeDeclaration). However, we may find that we need custom logic in 3C. We might be able to run this logic after the AST merge, but if 3C needs to support something that ASTImporter doesn't allow, then we may not be able to use ASTImporter at all. (For example, #341, "Support codebases with different elements of the same name that are never linked together", would likely be completely incompatible with ASTImporter.)

  • Clang's AST data structure was originally designed to represent a single translation unit. So for program elements that may be defined only once in each translation unit of the original program (e.g., structs), the merged AST may have no way to represent multiple definitions that came from different translation units, and ASTImporter may just pick an arbitrary one to keep. If the definitions had different source locations, there may then be no way for 3C to get all the source locations it needs to rewrite all of the definitions. But 3C may handle that case poorly anyway (e.g., because definitions are matched by source location, as structs currently are: see #499, "structs are not global : good or bad (relates to preprocessor, in part) ?"). If we only care about the case of a single definition at a single source location that is included by multiple translation units, then we're probably fine, because the single occurrence in the merged AST will give us the common source location. However, we should note that committing to ASTImporter would make it hard to change 3C in the future to support multiple definitions of the same program element at different source locations.

    If the merged AST does represent all the definitions, I imagine the SourceLocation reported by Clang will be different for each translation unit even if the actual source location is the same, so if we want to match the definitions by source location, we'd still need to use PersistentSourceLoc to do that.
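To make the last point concrete, here is a minimal toy sketch in plain C++ of matching definitions by a location that is stable across translation units. It does not use Clang at all: `PersistentLoc`, `DefInTU`, `RawLoc`, and `groupByPersistentLoc` are hypothetical names invented for this illustration, with `PersistentLoc` standing in for the role of 3C's PersistentSourceLoc and `RawLoc` standing in for a per-translation-unit SourceLocation value. The assumption that the raw values differ per translation unit is the same guess made above, not something verified against ASTImporter.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a location that is stable across translation
// units (file, line, column), i.e. the role PersistentSourceLoc plays in 3C.
struct PersistentLoc {
  std::string File;
  unsigned Line;
  unsigned Column;
  bool operator<(const PersistentLoc &O) const {
    return std::tie(File, Line, Column) < std::tie(O.File, O.Line, O.Column);
  }
};

// Toy model of one definition as seen from one translation unit. RawLoc
// mimics an opaque per-TU location value (assumption: the same place in a
// shared header may be encoded differently in each TU).
struct DefInTU {
  std::string Name;
  int TU;
  std::uint32_t RawLoc;
  PersistentLoc Loc;
};

using Key = std::pair<std::string, PersistentLoc>;

// Group definitions by (name, persistent location), ignoring RawLoc.
std::map<Key, std::vector<DefInTU>>
groupByPersistentLoc(const std::vector<DefInTU> &Defs) {
  std::map<Key, std::vector<DefInTU>> Groups;
  for (const DefInTU &D : Defs)
    Groups[Key(D.Name, D.Loc)].push_back(D);
  return Groups;
}

// Made-up input: "struct foo" comes from one header included by two TUs;
// "struct bar" is defined separately in a.c and in b.c.
std::vector<DefInTU> exampleDefs() {
  return {
      {"struct foo", 0, 1042, {"foo.h", 3, 1}},
      {"struct foo", 1, 2210, {"foo.h", 3, 1}},
      {"struct bar", 0, 1100, {"a.c", 10, 1}},
      {"struct bar", 1, 2300, {"b.c", 7, 1}},
  };
}

// Usage: the two sightings of "struct foo" collapse into one group even
// though their RawLoc values differ; the two "struct bar" definitions stay
// separate because their persistent locations differ.
void checkExample() {
  auto Groups = groupByPersistentLoc(exampleDefs());
  assert(Groups.size() == 3);
  assert(Groups.at(Key("struct foo", PersistentLoc{"foo.h", 3, 1})).size() == 2);
}
```

In this model, the "single definition in a shared header" case yields one group with one entry per translation unit, while the "different definitions with the same name" case yields several groups that all need rewriting. That second case is the one a merged AST might not be able to represent.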

I previously brought up this idea here.
