Skip to content
This repository has been archived by the owner on Oct 6, 2020. It is now read-only.

Optimize AST per-node linter visits #28

Closed
mschwager opened this issue Sep 20, 2019 · 2 comments
Closed

Optimize AST per-node linter visits #28

mschwager opened this issue Sep 20, 2019 · 2 comments

Comments

@mschwager
Copy link
Collaborator

Instead of running each linter over the whole AST tree we should instead run each linter's visit functions as we recurse down the tree. This is possible because each linter maintains its own state, and we're not modifying the AST as we visit it. Therefore we can run all linters against every node and traverse the tree only once.

Inspiration from this comes from Instagram's static analysis post:

Our philosophy at Instagram is to “do the simple thing first”. Our first custom lint rules were implemented in a single file, with a single visitor, using shared state.
A single file, with a single visitor, using shared state.

The single visitor class had to be aware of the state and logic of all of our unrelated lint rules, and it wasn’t always clear what state corresponded with what rule. This works fine when you’ve got a few custom lint rules, but we have nearly a hundred custom lint rules, which made the single-visitor pattern unmaintainable.
It’s difficult to figure out what state and logic is associated with each check being performed.

Of course, one possible solution to this problem is to define multiple visitors and to have each visitor re-traverse the entire tree each time. However, this would incur a large performance penalty, and the linter must remain fast.
Each lint rule can re-traverse the tree, executing in sequence over the file. However, this frequent re-traversal would incur a large performance penalty.

Instead, we took inspiration from linters in other language ecosystems, like JavaScript’s ESLint, and developed a centralized visitor registry.
A centralized visitor registry. We can efficiently determine which nodes each lint rule cares about, saving time on nodes they don’t care about.

When a lint rule is initialized, all of the rule’s method overrides are stored in the registry. When we traverse the tree, we look up all of the registered visitors and call them. If a method is not implemented, we don’t need to call it.

I believe the best way to go about this will be to implement a NodeVisitor that takes in all the linters and iterates through them in the generic_visit call:

    def generic_visit(self, node):
        """Called if no explicit visitor function exists for a node."""
        for field, value in iter_fields(node):
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, AST):
                        self.visit(item)
            elif isinstance(value, AST):
                self.visit(value)

Becomes something like:

    def generic_visit(self, node):
        """Called if no explicit visitor function exists for a node."""
        for linter in ALL_LINTERS:
            for field, value in iter_fields(node):
                if isinstance(value, list):
                    for item in value:
                        if isinstance(item, AST):
                            self.visit(item)
                elif isinstance(value, AST):
                    self.visit(value)
@mschwager
Copy link
Collaborator Author

Also note we should do some performance testing before and after to ensure this is actually accomplishing what we want.

@mschwager
Copy link
Collaborator Author

Actually, now that I think about it, we'll probably want to implement this in the visit function, not the generic_visit function. Implementing in generic_visit will essentially be the same as what we already have, whereas in visit it will call each linter once for each node.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant