Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the scope name prefix to prevent scope traversal in a tight loop #708

Merged
merged 3 commits into from Jun 21, 2022

Conversation

lpetre
Copy link
Contributor

@lpetre lpetre commented Jun 20, 2022

Summary

We don't need to recompute this name prefix every time we try to lookup a qualified name for a node. This value is immutable when the scope is constructed.

Context

There is a certain type of file that causes the scope provider to take an extraordinary amount of time.

These files tend to fit a specific pattern:

  • very long and machine generated
  • access and assignment to the same variable repeatedly in the same scope

Here is an example: https://github.com/FreeOpcUa/python-opcua/blob/0.98.11/opcua/server/standard_address_space/standard_address_space_part9.py

I've been running these w/ yappi to figure out where the scope provider might be improved.

Test Plan

Previously this file would take ~70 seconds on my laptop (Apple M1 Pro, 32GB RAM). Now it takes ~30 seconds.

Here is the top hits from yappi (sorted by tsub) before and after:

Before

Clock type: WALL
Ordered by: tsub, desc

name                                                                                                  ncall                 tsub      ttot      tavg      
scope_provider.py:219 Assignment.get_qualified_names_for                                              15449687              88.09841  300.4678  0.000019
abc.py:137 __instancecheck__                                                                          153150594             66.41100  120.6396  0.000001
scope_provider.py:518 FunctionScope.get_qualified_names_for                                           210892                23.33361  391.2841  0.001855
<string>:1 QualifiedName.__hash__                                                                     20731299              11.22755  38.06354  0.000002
enum.py:610 QualifiedNameSource.__hash__                                                              20731299              10.38797  14.70056  0.000001
scope_provider.py:168 <setcomp>                                                                       2234                  9.668366  17.53884  0.007851
scope_provider.py:555 <setcomp>                                                                       24085                 8.870387  121.0159  0.005025

After


Clock type: WALL
Ordered by: tsub, desc

name                                                                                                  ncall                 tsub      ttot      tavg      
scope_provider.py:219 Assignment.get_qualified_names_for                                              15449687              26.64612  68.59105  0.000004
scope_provider.py:508 FunctionScope.get_qualified_names_for                                           210892                21.89870  155.3687  0.000737
abc.py:137 __instancecheck__                                                                          29553100              12.04560  24.11363  0.000001
<string>:1 QualifiedName.__hash__                                                                     20731299              10.47574  36.66161  0.000002
enum.py:610 QualifiedNameSource.__hash__                                                              20731299              9.950157  14.42594  0.000001
scope_provider.py:168 <setcomp>                                                                       2234                  9.336415  17.21738  0.007707

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 20, 2022
@lpetre lpetre requested a review from zsol June 20, 2022 12:03
@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2022

Codecov Report

Merging #708 (8860e95) into main (ea8d3d5) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #708   +/-   ##
=======================================
  Coverage   94.80%   94.80%           
=======================================
  Files         247      247           
  Lines       25701    25711   +10     
=======================================
+ Hits        24365    24375   +10     
  Misses       1336     1336           
Impacted Files Coverage Δ
libcst/metadata/scope_provider.py 95.64% <100.00%> (+0.12%) ⬆️
libcst/codemod/commands/convert_type_comments.py 94.98% <0.00%> (-0.19%) ⬇️
...demod/commands/tests/test_convert_type_comments.py 97.43% <0.00%> (+0.13%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea8d3d5...8860e95. Read the comment docs.

@@ -667,6 +657,7 @@ def __init__(
self.name = name
self.node = node
self._scope_overwrites = {}
self._name_prefix = self._make_name_prefix()
Copy link
Contributor Author

@lpetre lpetre Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pyre is unhappy with this line. Any suggestions @zsol

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a bug to me, might be worth raising with Pyre folks internally

Copy link
Member

@zsol zsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why join(filter(None, ...)) all the time? I don't think any of those list items can be None. Seems easier to just use f-strings everywhere.

But this is only really important in get_qualified_names_for

return {QualifiedName(".".join(parts), QualifiedNameSource.LOCAL)}
return {
QualifiedName(
".".join(filter(None, [self.scope._name_prefix, full_name])),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither of these can be None, and this is a pretty hot function. This should be much faster

Suggested change
".".join(filter(None, [self.scope._name_prefix, full_name])),
f"{self.scope._name_prefix}.{full_name}",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filter(None,...) also removes empty strings, which is what I need here. Happy to consider other ways to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f"{a}.{b}" if a else b is about 4x faster still

@@ -667,6 +657,7 @@ def __init__(
self.name = name
self.node = node
self._scope_overwrites = {}
self._name_prefix = self._make_name_prefix()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a bug to me, might be worth raising with Pyre folks internally

@zsol
Copy link
Member

zsol commented Jun 20, 2022

Maybe worth putting a functools.lrucache on Scope.get_qualified_names_for, too. But we'd have to call self.get_qualified_names_for.cache_clear() on mutating operations in Scope. Wanna try it?

@lpetre lpetre force-pushed the fix_assignment_name_prefix branch from e81678c to 8860e95 Compare June 21, 2022 08:11
@lpetre lpetre merged commit 42164f8 into Instagram:main Jun 21, 2022
@lpetre lpetre deleted the fix_assignment_name_prefix branch June 21, 2022 09:11
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Sep 16, 2022
0.4.7 - 2022-07-12

Fixed
* Fix get_qualified_names_for matching on prefixes of the given name by @lpetre in Instagram/LibCST#719

Added
* Implement lazy loading mechanism for expensive metadata providers by @Chenguang-Zhu in Instagram/LibCST#720


0.4.6 - 2022-07-04

New Contributors
- @superbobry made their first contribution in Instagram/LibCST#702

Fixed
- convert_type_comments now preserves comments following type comments by @superbobry in Instagram/LibCST#702
- QualifiedNameProvider optimizations
  - Cache the scope name prefix to prevent scope traversal in a tight loop by @lpetre in Instagram/LibCST#708
  - Faster qualified name formatting by @lpetre in Instagram/LibCST#710
  - Prevent unnecessary work in Scope.get_qualified_names_for_ by @lpetre in Instagram/LibCST#709
- Fix parsing of parenthesized empty tuples by @zsol in Instagram/LibCST#712
- Support whitespace after ParamSlash by @zsol in Instagram/LibCST#713
- [parser] bail on deeply nested expressions by @zsol in Instagram/LibCST#718


0.4.5 - 2022-06-17

New Contributors

-   @zzl0 made their first contribution in Instagram/LibCST#704

Fixed

-   Only skip supported escaped characters in f-strings by @zsol in Instagram/LibCST#700
-   Escaping quote characters in raw string literals causes a tokenizer error by @zsol in Instagram/LibCST#668
-   Corrected a code example in the documentation by @zzl0 in Instagram/LibCST#703
-   Handle multiline strings that start with quotes by @zzl0 in Instagram/LibCST#704
-   Fixed a performance regression in libcst.metadata.ScopeProvider by @lpetre in Instagram/LibCST#698


0.4.4 - 2022-06-13

New Contributors

-   @adamchainz made their first contribution in Instagram/LibCST#688

Added

-   Add package links to PyPI by @adamchainz in Instagram/LibCST#688
-   native: add overall benchmark by @zsol in Instagram/LibCST#692
-   Add support for PEP-646 by @zsol in Instagram/LibCST#696

Updated

-   parser: use references instead of smart pointers for Tokens by @zsol in Instagram/LibCST#691
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants