
Performance regression: pointless-exception-statement #8073

Closed
cdce8p opened this issue Jan 17, 2023 · 29 comments · Fixed by #8078
Labels: Blocker 🙅 Blocks the next release · performance
Milestone: 2.16.0

@cdce8p
Member

cdce8p commented Jan 17, 2023

Thanks for the new check! I've just tested it with Home-Assistant and was able to find a handful of issues. Unfortunately, the current implementation has significant performance problems.

Comparing 5612134 with 89a20e2, a full pylint CI run takes between 6% and 43% more time. The spread is likely a result of different caches, as -j2 is used for the run.

before: 7:59
after: 8:29 - 11:35 (multiple runs)

Originally posted by @cdce8p in #7939 (review)

@Pierre-Sassoulas Pierre-Sassoulas added the Blocker 🙅 Blocks the next release label Jan 17, 2023
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.16.0 milestone Jan 17, 2023
@cdce8p
Member Author

cdce8p commented Jan 17, 2023

Cross posting possible solutions from #7939 (comment)

Inference can be expensive, especially if the lru cache has reached its maxsize and old entries are overwritten.

We should not call safe_infer for every Call node. Instead, I would recommend using a heuristic and filtering likely candidates first.

  • Almost all builtin error names contain either Exception, Error, or Warning. That likely applies to custom exceptions as well. https://docs.python.org/3/library/exceptions.html#exception-hierarchy
  • This could be further improved with a check against other predefined names. For the stdlib, those would probably be KeyboardInterrupt, StopAsyncIteration, StopIteration.
  • Lastly a new config option could be added to allow the user to specify additional names (maybe even regex).

The goal IMO should be to capture 95% of the likely errors without the huge performance impact.
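A minimal sketch of that heuristic (hypothetical names, not the eventual implementation; the suffix tuple and extra-name set come straight from the bullet points above):

```python
# Hypothetical sketch of the proposed filter: run (expensive) inference
# only when the called name already looks like an exception.
EXCEPTION_SUFFIXES = ("Exception", "Error", "Warning")

# Stdlib exception names that carry none of the usual suffixes
EXTRA_NAMES = frozenset({"KeyboardInterrupt", "StopIteration", "StopAsyncIteration"})

def looks_like_exception(name: str) -> bool:
    """Cheap string check applied before calling safe_infer."""
    return name.endswith(EXCEPTION_SUFFIXES) or name in EXTRA_NAMES
```

A user-configurable list (or regex) would simply extend EXTRA_NAMES.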

@jayaddison
Contributor

I'm not yet able to replicate the reported performance degradations locally when running the comparative versions of pylint (with and without #7939) on the astroid codebase:

The benchmark process I'm using is:

pylint.git $ python3 -m venv .venv
pylint.git $ source .venv/bin/activate
pylint.git $ pip install -r requirements_test_min.txt
pylint.git $ export PYTHONPATH=.

pylint.git $ git checkout 89a20e2ccdb2927ac370f7c6663fc83318988e10
pylint.git $ rm -rf ~/.cache/pylint/

pylint.git $ time find .venv/lib/python3.*/site-packages/astroid/ -name '*.py' -exec python3 -m pylint -j2 --output /dev/null {} +

pylint.git $ git checkout 56121344f6c9c223d8e4477c0cdb58c46c840b4b
pylint.git $ rm -rf ~/.cache/pylint/

pylint.git $ time find .venv/lib/python3.*/site-packages/astroid/ -name '*.py' -exec python3 -m pylint -j2 --output /dev/null {} +

@cdce8p
Member Author

cdce8p commented Jan 18, 2023

I'm not yet able to replicate the reported performance degradations locally when running the comparative versions of pylint (with and without #7939) on the astroid codebase

The codebases are likely too small to see any impact. We do cache the results of safe_infer but only the last 1024. I believe someone reported a memory issue some time ago which is why we limited the cache size.

If more calls are inferred and the cache is overwritten, inference needs to happen all over again. Combined with the non-deterministic behavior of -j2 (files are not linted in a consistent order), it would explain the large time increases.

You could try reducing the cache size here and see if the impact becomes noticeable:
https://github.com/PyCQA/pylint/blob/ee55ec8c17f525c0a35bae710a4b27ab591d688f/pylint/checkers/utils.py#L1350-L1357
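To illustrate the eviction effect described above (a toy example, not pylint's actual cache): once the working set exceeds maxsize, every lookup misses and the expensive work repeats from scratch.

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=2)  # deliberately tiny; pylint's safe_infer cache uses 1024
def expensive(n):
    global calls
    calls += 1  # count actual computations (i.e. cache misses)
    return n * n

for n in (1, 2, 3, 1, 2, 3):  # working set of 3 distinct keys > maxsize of 2
    expensive(n)

# With LRU eviction every call misses: 6 computations for 3 distinct inputs.
assert calls == 6
```

With maxsize=3 the second pass would hit the cache and only 3 computations would occur; this is the "inference needs to happen all over again" scenario in miniature.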

@jayaddison
Contributor

Maybe off-topic, but, while I'm experimenting with that: does increasing the cache size improve performance when pylint runs on the Home Assistant codebase?

@jayaddison
Contributor

OK, I do see some performance degradation for #7939 when compared against the previous commit on a larger codebase; I'll make some adjustments to the original pull request to see whether they improve the situation.

pylint.git $ git checkout 89a20e2ccdb2927ac370f7c6663fc83318988e10
HEAD is now at 89a20e2cc Add new check "pointless-exception-statement" (#7939)

pylint.git $ time find home-assistant/core.git/ -name '*.py' -exec python3 -m pylint -j2 --output /dev/null {} +

real    12m33.726s
user    23m25.588s
sys     0m47.887s

pylint.git $ rm -rf ~/.cache/pylint/
pylint.git $ git checkout 56121344f6c9c223d8e4477c0cdb58c46c840b4b
HEAD is now at 56121344f [pre-commit.ci] pre-commit autoupdate (#8016)

pylint.git $ time find home-assistant/core.git/ -name '*.py' -exec python3 -m pylint -j2 --output /dev/null {} +

real    11m59.741s
user    22m58.198s
sys     0m46.425s

@jayaddison
Contributor

Maybe off-topic, but, while I'm experimenting with that: does increasing the cache size improve performance when pylint runs on the Home Assistant codebase?

It doesn't appear to, from an initial inspection: with an lru_cache of maxsize=8192 on safe_infer...

$ rm -rf ~/.cache/pylint/
$ time find home-assistant/core.git/ -name '*.py' -exec python3 -m pylint -j2 --output /dev/null {} + 
real	12m2.359s
user	23m1.790s
sys	0m46.339s

@jayaddison
Contributor

A naive implementation of suffix-based name filtering before inference is applied doesn't appear to help much, if at all:

-            inferred = utils.safe_infer(expr)
+            inferred = None
+            if any(
+                expr.func.name.endswith(suffix)
+                for suffix in ("Error", "Exception", "Warning")
+            ):
+                inferred = utils.safe_infer(expr)
$ time find home-assistant/core.git/ -name '*.py' -exec python3 -m pylint -j2 --output /dev/null {} + 
real	12m0.502s
user	22m57.999s
sys	0m46.789s

@jayaddison
Contributor

@cdce8p I'm unsure how to proceed with this - I was attempting to replicate and verify that performance is better without #7939 and that performance is degraded after it was merged, but I haven't managed to achieve that.

I believe that there may be a performance regression but I'm not certain about the conditions required for it to appear.

What can we do to find it?

@jayaddison
Contributor

What can we do to find it?

I'm planning to run some more testing with the LRU cache size reduced, as suggested, to see if that helps track down the cause.

@cdce8p
Member Author

cdce8p commented Jan 18, 2023

Maybe off-topic, but, while I'm experimenting with that: does increasing the cache size improve performance when pylint runs on the Home Assistant codebase?

It doesn't appear to, from an initial inspection: with an lru_cache of maxsize=8192 on safe_infer...

Saw the same. Setting maxsize=None actually resulted in a performance hit. 8:00 -> 9:38min
Would need to investigate this, but might be astroid caching then, not pylint.

--

A naive implementation of suffix-based name filtering before inference is applied doesn't appear to help much, if at all:

-            inferred = utils.safe_infer(expr)
+            inferred = None
+            if any(
+                expr.func.name.endswith(suffix)
+                for suffix in ("Error", "Exception", "Warning")
+            ):
+                inferred = utils.safe_infer(expr)

Took me a moment to verify but a naive implementation does fix the CI issue. Time is back down to 8:00.
However, I needed to adjust your snippet a bit, as it crashed immediately for me. It turns out Attribute nodes don't have name but use attrname instead.

The change I tested: 46c8a97

It probably needs to be something like this

        if isinstance(expr, nodes.Call):
            inferred = None
            name = ""
            if isinstance(expr.func, nodes.Name):
                name = expr.func.name
            elif isinstance(expr.func, nodes.Attribute):
                name = expr.func.attrname
            if any(
                name.endswith(suffix)
                for suffix in ("Error", "Exception", "Warning")
            ):
                inferred = utils.safe_infer(expr)

@jayaddison
Contributor

Saw the same. Setting maxsize=None actually resulted in a performance hit. 8:00 -> 9:38min

Took me a moment to verify but a naive implementation does fix the CI issue. Time is back down to 8:00.

Interesting, and good to hear, respectively - thanks for checking both of those!

I'm going to spend some time experimenting with your fix - feel free to open that as a pull request, if you like?

Also, low priority: if you have time for any further testing, could you check whether #8075 also solves the problem? (It's a different approach -- and I don't think it's great -- but it could be useful to know whether it's valid.)

@jayaddison
Contributor

To explain why I'm being so slow and dragging my feet on this change: I'm very reluctant to combine string-pattern-matching with inference, because it seems to me that much of the value of type inference is that we don't need to rely on special patterns or strings. I'm trying to figure out how to reconcile that (if it's possible), and/or whether it's OK to simply accept a more straightforward solution, while anticipating that the decision might not be revisited for a long time and could cause confusion later.

@jayaddison
Contributor

Thinking aloud: could we omit the use of inference entirely for pointless-exception-statement, and rely on string matching only?

(yep, I'm still being slow. this was an early idea I had, but I dismissed it because I liked the precision of inference. now I'm navigating back towards it, because I think inference-matching-on-patterned-strings might be wasteful and not provide much value over simpler, faster matching-on-patterned-strings)

@jayaddison
Contributor

could we omit the use of inference entirely for pointless-exception-statement, and rely on string matching only?

Yes, we can. Pull request on the way soon.

@jayaddison
Contributor

could we omit the use of inference entirely for pointless-exception-statement, and rely on string matching only?

Yes, we can.

...but that may introduce some false positives, like these - so I'm navigating back towards a hybrid approach that uses string-pattern-matching first. If an expression looks like it may be an exception, astroid inference is then applied for a more precise check.

The current approach does not rely on any explicit Name or Attribute node type isinstance checks.
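A rough sketch of that hybrid shape (a hypothetical helper, not the actual pull request code), using getattr so no explicit isinstance checks on Name/Attribute are needed:

```python
def maybe_inferred_exception(expr, safe_infer):
    """String-pattern prefilter first; inference only for surviving candidates.

    ``expr`` stands in for an astroid Call node: Name funcs expose ``name``,
    Attribute funcs expose ``attrname``.
    """
    name = getattr(expr.func, "name", None) or getattr(expr.func, "attrname", "")
    if not name.endswith(("Error", "Exception", "Warning")):
        return None  # cheap path: very unlikely to be an exception
    return safe_infer(expr)  # precise path: confirm via inference
```

The expensive safe_infer call is reached only for the small fraction of calls that look exception-like, which is the performance property being chased here.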

I'm still thinking about support for configurable match patterns and would welcome more opinions on that.

@Pierre-Sassoulas
Member

Thank you for taking the time to analyse this issue in depth, much appreciated ! Could you give an example of what you mean by a configurable match pattern @jayaddison ?

@jayaddison
Contributor

Thanks @Pierre-Sassoulas - I think I created quite a lot of noise, but hopefully some value, too :)

Configurable match patterns relate to this suggestion by @cdce8p:

Lastly a new config option could be added to allow the user to specify additional names (maybe even regex).

I'm not certain whether to implement this or not - I don't think it would be difficult to implement, so it's more a question of whether it's a good idea or not.

@Pierre-Sassoulas
Member

I think if we add a config option, it would be to use inference to its full capability. Frankly, speed is not pylint's selling point, so I think our priorities are to avoid false positives, then to avoid false negatives, then to make the performance reasonable, unless performance is so bad that it makes pylint unusable. Here, clever filtering without inference first, then using inference, should be enough. What do you think @cdce8p?

@cdce8p
Member Author

cdce8p commented Jan 18, 2023

Every new check we add has some kind of performance impact. The question for me is whether it's unreasonable for the benefit it provides. That's largely subjective though. Aside from that, I agree that it's more important to avoid false positives than to catch all issues (false negatives).

To sum up, it's basically what you have implemented at the moment.

  • Use a (cheap) string filter to look for possible candidates.
  • Confirm them using inference.

About the config option, I don't know how common it is to name custom exceptions something different but you never know what people come up with. That's why I suggested it in the first place, especially since it's fairly easy to implement. It can help find issues that would otherwise go unnoticed.

@Pierre-Sassoulas
Member

Pierre-Sassoulas commented Jan 18, 2023

The question for me is whether it's unreasonable for the benefit it provides.

I think this check is very valuable, as it's catching clearly wrong code. I've had a warning at work on a ValueError where the code was silently not failing; it's been a long time since a check we added in a later version found something that wasn't a nitpick 😄

About the config option, I don't know how common it is to name custom exceptions something different but you never know what people come up with.

Yeah, clearly we're going to have false negatives on custom exceptions that don't follow the standard naming scheme if we don't implement an option and use a regex. But asking users to do configuration first is already a big pain point of pylint (while inference is the only thing that makes it stand out). If we're not using inference to its full potential because it's too costly performance-wise, what comparative advantage do we have compared to ruff or flake8? The performance is already not great anyway, so it's not like users expect pylint to be instantaneous.

@cdce8p
Member Author

cdce8p commented Jan 18, 2023

The question for me is whether it's unreasonable for the benefit it provides.

I think this check is very valuable as it's catching clearly wrong code.

I do agree it's valuable, however not worth the 10-40% (even if only for large code bases) if 95% of all errors could be detected with only 1-2% additional time.

About the config option, I don't know how common it is to name custom exceptions something different but you never know what people come up with.

Yeah, clearly we're going to have false negatives on custom exceptions that don't follow the standard naming scheme if we don't implement an option and use a regex. But asking users to do configuration first is already a big pain point of pylint (while inference is the only thing that makes it stand out).

Maybe I should have been more precise. The default config should be good enough and e.g. detect all issues with stdlib exceptions. The option would only be there for users who want to add / check more names if the default isn't good enough / doesn't work for them.

If we're not using inference to its full potential because it's too costly performance wise what comparative advantage do we have compared to ruff or flake8 ?

I don't believe flake8 would ever add a check like this. Without inference they couldn't be sure that a name actually refers to an exception. We can do that! Even if we don't catch every case.

The performance is already not great anyway, so it's not like users expect pylint to be instantaneous.

Please don't underestimate the impact of good performance. True, pylint is already slow, but that isn't a good thing. IMO it's already too slow for pre-commit, as an example: I usually stop the commit after the linters and add -n to skip pylint. I would love it if that weren't the case.
Although that's probably an unrealistic goal; even mypy is fairly slow.

@jayaddison
Contributor

however not worth the 10-40% (even if only for large code bases)

While I do think that #7939 introduced a performance penalty, I haven't been able to replicate anywhere near the 40% degradation for the Home Assistant core.git repository. I've been clearing the filesystem cache (rm -rf ~/.cache/pylint/) before each run, and I wonder if that has something to do with it.

@Pierre-Sassoulas
Member

I agree performance is important, but pylint's differentiating factor is exhaustiveness, not speed; if we sacrifice exhaustiveness for performance, pylint will be mediocre in every possible aspect.

You convinced me that in this case it's reasonable to handle all the builtin exceptions and all exceptions following the standard naming pattern by default, with an option to handle exceptions that don't follow the standard pattern.

@jayaddison
Contributor

I'm optimistic that we can agree on a fix for this, whether it's #8073 or #8084, or another approach.

That said: I do feel fairly strongly that it's better to provide thorough coverage for smaller projects, and allow performance optimizations for larger/more sophisticated cases, instead of reducing coverage for smaller codebases.

If that opinion becomes too contentious here, then I'm fine with #7939 (the original implementation) being reverted to unblock a 2.16.0 release.

@nickdrozd
Collaborator

I think Pylint could solve the halting problem and people would still complain that it was too slow. So I am in favor of the approach that flags most of the common cases as quickly as possible.

@cdce8p
Member Author

cdce8p commented Jan 19, 2023

I'll write here to try and keep the discussion together.

--
@jayaddison You asked here #8084 (comment) why I preferred an incomplete check.

Tbh, I had assumed it was good enough. While thinking about a good answer, I looked at the common exceptions from requests.exceptions and things fell apart from there. If only 2/3 of all exception names actually contain one of Error, Exception, or Warning, my idea isn't really practical, even though the stdlib might be covered.

Relying on inference seems to be the best we've got. It's just unfortunate that every function call needs to be inferred for that.

Your solution to provide a flag is probably the better approach then. I'm just wondering if there is an alternative heuristic we can use 🤔

Assuming users (for large projects) at least use PascalCase, we might be able to use a regex for that as a fallback. It would at least filter out the functions.

[^\W\da-z][^\W_]+$
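For what it's worth, that pattern seems to behave as intended on a few examples (my reading of it, not an authoritative test of the final rule):

```python
import re

# Matches names that start with an uppercase (non-digit, non-lowercase) word
# character and contain no underscores afterwards -- i.e. PascalCase-style
# class names, which is what exception names conventionally look like.
PASCAL_CASE = re.compile(r"[^\W\da-z][^\W_]+$")

assert PASCAL_CASE.match("ValueError")        # exception-like name
assert PASCAL_CASE.match("StopIteration")     # no Error/Warning suffix, still caught
assert not PASCAL_CASE.match("do_something")  # snake_case function
assert not PASCAL_CASE.match("partial")       # plain lowercase call
```

Since re.match anchors at the start and the pattern ends with $, the whole name must fit the shape; plain function calls would be filtered out before inference runs.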

@jayaddison
Contributor

Thanks, @cdce8p - good idea about PascalCase, that seems like an agreeable compromise -- catching all of the builtins, plus a larger set of real-world examples, while reducing the scope of configuration required.

I'll take a look at your updated comments (and some of the yet-to-be-resolved threads) in #8078 and will implement that suggestion there soon.

@jayaddison
Contributor

Sorry, I'm having some doubts again while looking into implementing those changes. I think it would be better to revert #7939 until we are confident of a more precise cause for the regression. (I have an idea that it could relate to non-assignment expressions that involve attribute access (foo.bar()) -- that's an as-yet-unproven theory, though.)

@cdce8p
Member Author

cdce8p commented Jan 19, 2023

Sorry, I'm having some doubts again while looking into implementing those changes.

I've been testing Home Assistant with this change: 73372d4 (with heuristic-exception-detection = True). The performance regression seems to have been mitigated.

A bit hard to tell for sure with GitHub Actions, though; it's a bit flaky at the moment. GitHub had a performance issue a few hours ago, so it might be just that. 3 out of 4 runs confirm it, though.
