Optimize AST per-node linter visits #28

mschwager · 2019-09-20T20:24:26Z

Instead of running each linter over the whole AST tree we should instead run each linter's visit functions as we recurse down the tree. This is possible because each linter maintains its own state, and we're not modifying the AST as we visit it. Therefore we can run all linters against every node and traverse the tree only once.

Inspiration from this comes from Instagram's static analysis post:

Our philosophy at Instagram is to “do the simple thing first”. Our first custom lint rules were implemented in a single file, with a single visitor, using shared state.
A single file, with a single visitor, using shared state.

The single visitor class had to be aware of the state and logic of all of our unrelated lint rules, and it wasn’t always clear what state corresponded with what rule. This works fine when you’ve got a few custom lint rules, but we have nearly a hundred custom lint rules, which made the single-visitor pattern unmaintainable.
It’s difficult to figure out what state and logic is associated with each check being performed.

Of course, one possible solution to this problem is to define multiple visitors and to have each visitor re-traverse the entire tree each time. However, this would incur a large performance penalty, and the linter must remain fast.
Each lint rule can re-traverse the tree, executing in sequence over the file. However, this frequent re-traversal would incur a large performance penalty.

Instead, we took inspiration from linters in other language ecosystems, like JavaScript’s ESLint, and developed a centralized visitor registry.
A centralized visitor registry. We can efficiently determine which nodes each lint rule cares about, saving time on nodes they don’t care about.

When a lint rule is initialized, all of the rule’s method overrides are stored in the registry. When we traverse the tree, we look up all of the registered visitors and call them. If a method is not implemented, we don’t need to call it.

I believe the best way to go about this will be to implement a NodeVisitor that takes in all the linters and iterates through them in the generic_visit call:

    def generic_visit(self, node):
        """Called if no explicit visitor function exists for a node."""
        for field, value in iter_fields(node):
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, AST):
                        self.visit(item)
            elif isinstance(value, AST):
                self.visit(value)

Becomes something like:

    def generic_visit(self, node):
        """Called if no explicit visitor function exists for a node."""
        for linter in ALL_LINTERS:
            for field, value in iter_fields(node):
                if isinstance(value, list):
                    for item in value:
                        if isinstance(item, AST):
                            self.visit(item)
                elif isinstance(value, AST):
                    self.visit(value)

The text was updated successfully, but these errors were encountered:

mschwager · 2019-09-20T20:25:06Z

Also note we should do some performance testing before and after to ensure this is actually accomplishing what we want.

mschwager · 2019-09-20T20:27:48Z

Actually, now that I think about it, we'll probably want to implement this in the visit function, not the generic_visit function. Implementing in generic_visit will essentially be the same as what we already have, whereas in visit it will call each linter once for each node.

mschwager added a commit that referenced this issue Oct 10, 2019

Add benchmark testing capabilities, for #28

cdd15db

mschwager closed this as completed in 84832f9 Oct 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize AST per-node linter visits #28

Optimize AST per-node linter visits #28

mschwager commented Sep 20, 2019

mschwager commented Sep 20, 2019

mschwager commented Sep 20, 2019

Optimize AST per-node linter visits #28

Optimize AST per-node linter visits #28

Comments

mschwager commented Sep 20, 2019

mschwager commented Sep 20, 2019

mschwager commented Sep 20, 2019