perf(ngcc): only compute basePaths in TargetedEntryPointFinder when n… #36881

petebacondarwin · 2020-05-01T09:15:25Z

…eeded

Previously the basePaths were computed when the finder was instantiated.
This was a waste of effort in the case that the targeted entry-point is already
processed.

This change makes the computation of basePaths lazy, so that the work is
only done if they are actually needed.

Fixes #36874
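
For illustration, the lazy computation described above could look roughly like the sketch below. `LazyBasePathsFinder`, `computeBasePaths` and `computeCount` are hypothetical names used only for this example, not the actual ngcc code; the point is just that the expensive work is deferred to the first call of `getBasePaths()` and then cached.

```typescript
// Hypothetical stand-in for the expensive base-path computation.
let computeCount = 0;
function computeBasePaths(sourceDirectory: string, mappedPaths: string[]): string[] {
  computeCount++;
  return [sourceDirectory, ...mappedPaths];
}

class LazyBasePathsFinder {
  private readonly sourceDirectory: string;
  private readonly mappedPaths: string[];
  // `null` means "not computed yet".
  private basePaths: string[] | null = null;

  constructor(sourceDirectory: string, mappedPaths: string[]) {
    // Nothing expensive happens at construction time.
    this.sourceDirectory = sourceDirectory;
    this.mappedPaths = mappedPaths;
  }

  // Computes the base paths on first use and returns the cached value afterwards.
  getBasePaths(): string[] {
    if (this.basePaths === null) {
      this.basePaths = computeBasePaths(this.sourceDirectory, this.mappedPaths);
    }
    return this.basePaths;
  }
}
```

If the finder can bail out before anything asks for the base paths, `computeBasePaths` is never called at all.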

JoostK · 2020-05-01T10:28:19Z

If I'm not mistaken, wouldn't the TargetedEntryPointFinder.targetNeedsProcessingOrCleaning call, which determines whether an early bailout is possible, still need to compute the base paths through its call to getEntryPoint -> computePackagePath, which then uses basePaths?

petebacondarwin · 2020-05-01T11:17:43Z

Agh!

petebacondarwin · 2020-05-01T11:23:38Z

Hmm. So we have two options, I think:

1. Initially consider only the sourceDirectory as a basePath when calling targetNeedsProcessingOrCleaning, and only trigger getBasePaths if the package cannot be found without it.
   This would speed up most runs, since most entry-points are in the node_modules folder (i.e. the sourceDirectory), and it would only impact pre-processed entry-points that have to be found via the paths mappings anyway.
2. Somehow cache the computed basePaths (perhaps in the entry-point manifest file), so that they are only computed once.

I think I lean towards 2). WDYT?
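
As a rough sketch of what option 1 means in practice (all names here, `findPackage`, `resolvePackage` and `getAllBasePaths`, are hypothetical and the prefix check is a simplification of the real package lookup): try the sourceDirectory on its own first, and only fall back to the full set of base paths if that fails.

```typescript
// Hypothetical lookup: returns the first base path that contains the target, if any.
function findPackage(basePaths: string[], target: string): string | null {
  return basePaths.find(base => target.startsWith(base + '/')) ?? null;
}

function resolvePackage(
    target: string, sourceDirectory: string, getAllBasePaths: () => string[]): string | null {
  // Fast path: most entry-points live under the sourceDirectory (node_modules).
  const quick = findPackage([sourceDirectory], target);
  if (quick !== null) {
    return quick;
  }
  // Slow path: only now pay for the full base-path computation.
  return findPackage(getAllBasePaths(), target);
}
```

With this shape, only targets that are reached via paths mappings ever trigger the expensive computation.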

gkalpak

✨

JoostK · 2020-05-01T12:10:37Z

A quick look into targetNeedsProcessingOrCleaning reveals that it's mostly interested in the package.json file, which is trivial to load without base paths. Unfortunately, the compiledByAngular flag is more tricky, as it may depend on configuration which may be loaded from a package configuration, thus requiring proper package path resolution to know where to look for the package's config file. Would that rule out option 1?

Another option would be to revisit the algorithmic complexity of getBasePaths so that it would be fast even when there are over 100 paths, assuming that that is why it's so slow?

Missed Joost's valid comment :D

gkalpak · 2020-05-01T12:15:25Z

FWIW, I would lean towards option 1 (esp. given that TargetedEntryPointFinder#basePaths is only used in one place). This will always be faster than optimizing getBasePaths() 😁

If we wanted to get fancy, we could change TargetedEntryPointFinder#basePaths to a custom structure that has an iterator which returns the base path first and then lazily computes the rest of the base paths based on pathMappings if needed 🤓
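
A minimal sketch of that "fancy" idea, assuming a generator is acceptable here (`lazyBasePaths` and `computeRest` are made-up names for illustration only): the generator yields the sourceDirectory first and only runs the expensive computation if the consumer keeps iterating.

```typescript
// Yields the cheap base path first; the rest is computed only when iteration continues.
function* lazyBasePaths(
    sourceDirectory: string, computeRest: () => string[]): Generator<string> {
  yield sourceDirectory;
  // Reached only if the consumer asks for more than the first value.
  yield* computeRest();
}
```

A caller that finds the package under the first yielded path can simply stop iterating, and `computeRest` is never invoked.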

petebacondarwin · 2020-05-01T14:23:21Z

I've gone with option 1 for now. Let's see how this pans out for @n9niwas's issue.

n9niwas · 2020-05-01T15:46:38Z

@petebacondarwin I applied this change and now it takes 50ms (vs 500ms previously) on average to process each module, so much better, thanks!

However, I noticed that some modules still take 500ms.
Some of our tsconfig paths are also published through ng-packagr for internal use in other repositories.
They don't exist in the node_modules folder and shouldn't be processed by ngcc, but because they have a package.json this logic still sends them to ngcc.

I'm not sure it's in the scope of this PR, but it would be nice to ignore those too.

petebacondarwin · 2020-05-01T16:00:08Z

If you have packages imported via paths mappings then we still need to check them via ngcc, and unfortunately we have to compute the package folder of these by first computing the base paths. We could possibly get further performance improvements by caching these base paths once they are computed.

gkalpak

Missed opportunity to go fancy with iterators and generators, but works for me 😄

JoostK

What a lovely algorithm you have there 😄

I did some tests with lots of path mappings, and the algorithmic improvement to the filtering of contained paths makes a significant difference.

Test results for 400 invocations of getBasePaths with 300 path mappings:

[Screenshots: Before, After 🎉]

petebacondarwin · 2020-05-02T16:01:50Z

I also did some benchmarking... Using randomized paths, I found that the average time "per original path" was reduced to about 20% of that of the original algorithm.

JoostK · 2020-05-02T17:07:46Z

I am seeing some interesting perf characteristics after the latest fixup: flattenTree is a lot slower when using records, presumably because Object.values is not nearly as fast as iterating a Map's values. The perf of addPaths is also slightly better for Map. Other than that, the original, recursive variant of addPath is surprisingly faster (~35%) for an unknown reason.

Lastly, always having path: undefined makes all reads monomorphic, which is also a little faster.

I would definitely switch back to using Map, other than that LGTM.

petebacondarwin · 2020-05-02T19:36:12Z

Are you saying the recursive version is faster? Should we switch back to that version? What are you using to benchmark?

petebacondarwin · 2020-05-03T11:12:05Z

When I benchmarked again today running the following code 100x on each algorithm:

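    // `pathStrings`, `os`, `absoluteFrom` and `dedupePaths` come from the surrounding test setup.
    // Assumed helper: converts a `process.hrtime()` tuple of [seconds, nanoseconds] to ms.
    const milliseconds = ([s, ns]: [number, number]) => s * 1e3 + ns / 1e6;

    function benchmark(fn: typeof dedupePaths) {
      const paths = pathStrings.map(absoluteFrom);
      let deduped = 0;
      const now = milliseconds(process.hrtime());
      for (let i = 0; i < 200; i++) {
        // Call the algorithm under test rather than a fixed implementation.
        deduped = fn(paths).length;
      }
      const time = milliseconds(process.hrtime()) - now;
      // Columns: name, OS, input paths, deduped paths, total ms, ms per path, ms per deduped path.
      console.log([
        fn.name, os, `${paths.length}`, `${deduped}`, time, time / paths.length, time / deduped
      ].join('::'));
    }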

I got

Algorithm	ms per path	ms per deduped path
Map	2.415962675	2.461352285
Record	2.31117757	2.355814758

JoostK · 2020-05-03T11:55:24Z

Interesting :-)

Here are my results:

Map

dedupePaths :: OS/X :: 300 :: 200 :: 96.34445099532604 :: 0.32114816998442014 :: 0.48172225497663024
dedupePaths :: Windows :: 300 :: 200 :: 94.02359199523926 :: 0.3134119733174642 :: 0.47011795997619626
dedupePaths :: Unix :: 300 :: 200 :: 91.67663399875164 :: 0.3055887799958388 :: 0.4583831699937582
dedupePaths :: Native :: 300 :: 200 :: 90.23406199365854 :: 0.30078020664552846 :: 0.4511703099682927

Record

dedupePaths :: OS/X :: 300 :: 200 :: 150.2178319990635 :: 0.500726106663545 :: 0.7510891599953174
dedupePaths :: Windows :: 300 :: 200 :: 132.13881799578667 :: 0.4404627266526222 :: 0.6606940899789333
dedupePaths :: Unix :: 300 :: 200 :: 130.1959529966116 :: 0.4339865099887053 :: 0.650979764983058
dedupePaths :: Native :: 300 :: 200 :: 136.70849999785423 :: 0.45569499999284746 :: 0.6835424999892712

This is run from a single test in the jasmine_node_test //packages/compiler-cli/ngcc/test:test target.

petebacondarwin · 2020-05-03T13:06:10Z

I guess you are on Windows? I wonder if there is some difference there?
I ran this 100x and took the average. There were definitely some big outliers, so it might be worth running a few times.

petebacondarwin · 2020-05-03T13:07:25Z

Oh also the only difference between the two algorithms was the use of Map vs Record, right?

JoostK · 2020-05-03T15:14:03Z

Here's my tweaked setup: https://github.com/JoostK/perf-36881

yarn test 10000

On my MacBook Pro (late 2013) using NodeJS 10.15.0:

dedupePathsMap :: 300 :: 200 :: 4819.888596996665 :: 0.0016066295323322215 :: 0.0024099442984983326
dedupePathsRecord :: 300 :: 200 :: 6899.658298999071 :: 0.002299886099666357 :: 0.0034498291494995358
-30.14%

In latest NodeJS (14.1.0) the difference has become smaller:

dedupePathsMap :: 300 :: 200 :: 4455.834692999721 :: 0.001485278230999907 :: 0.0022279173464998603
dedupePathsRecord :: 300 :: 200 :: 5653.348865002394 :: 0.0018844496216674645 :: 0.002826674432501197
-21.18%

I'm using a fairly large iteration count as it shows far more stable results. Using a small iteration count, I'm seeing noticeable differences when e.g. swapping the benchmark ordering.

Anyway, this is more out of curiosity than it actually making a difference in real life.
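
For reference, the percentages quoted above are consistent with the relative difference of the Map total against the Record total. A small sketch of that arithmetic (`percentChange` is a hypothetical helper, not part of the linked benchmark repository):

```typescript
// Relative difference of `candidate` versus `baseline`, in percent.
function percentChange(candidate: number, baseline: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

// NodeJS 10.15.0 totals from above: Map vs Record.
console.log(percentChange(4819.888596996665, 6899.658298999071).toFixed(2));  // -30.14
// NodeJS 14.1.0 totals from above: Map vs Record.
console.log(percentChange(4455.834692999721, 5653.348865002394).toFixed(2));  // -21.18
```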

petebacondarwin · 2020-05-03T16:57:04Z

Oh well the differences are minimal on my side. So I'll go with the Map version.

packages/compiler-cli/ngcc/src/entry_point_finder/utils.ts

@JoostK

This function needs to deduplicate the paths that are found from the paths mappings. Previously this deduplication was not linear and also called the expensive `relative()` function many times. This commit, suggested by @JoostK, reduces the complexity of the deduplication by using a tree structure built from the segments of each path. PR Close #36881
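
To make the tree idea concrete, here is a simplified sketch of segment-based deduplication. It is an illustration under assumptions, not the code in utils.ts: it uses plain strings with '/' separators instead of AbsoluteFsPath, and `TrieNode`, `newNode` and `dedupePathsSketch` are made-up names. Each node always carries a `path` property (possibly undefined) and keeps its children in a Map, in line with the benchmarking discussion above.

```typescript
interface TrieNode {
  // Always present, so every node has the same shape; set only on leaves.
  path: string | undefined;
  children: Map<string, TrieNode>;
}

function newNode(): TrieNode {
  return {path: undefined, children: new Map<string, TrieNode>()};
}

function addPath(root: TrieNode, path: string): void {
  let node = root;
  for (const segment of path.split('/').filter(s => s.length > 0)) {
    // An ancestor is already a base path, so this path is contained in it.
    if (node.path !== undefined) return;
    let next = node.children.get(segment);
    if (next === undefined) {
      next = newNode();
      node.children.set(segment, next);
    }
    node = next;
  }
  // Mark this node as a leaf and drop any longer paths stored beneath it.
  node.path = path;
  node.children.clear();
}

function flattenTree(root: TrieNode): string[] {
  const result: string[] = [];
  const stack: TrieNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.path !== undefined) {
      result.push(node.path);
    } else {
      for (const child of node.children.values()) stack.push(child);
    }
  }
  return result;
}

function dedupePathsSketch(paths: string[]): string[] {
  const root = newNode();
  for (const p of paths) addPath(root, p);
  return flattenTree(root);
}
```

Each path is visited once, segment by segment, so the work grows with the total number of segments rather than with the number of path pairs, and no `relative()` calls are needed.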

angular-automatic-lock-bot · 2020-06-04T16:10:49Z

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

petebacondarwin added the area: performance, action: review, target: patch and comp: ngcc labels May 1, 2020

ngbot bot modified the milestone: needsTriage May 1, 2020

pullapprove bot requested a review from gkalpak May 1, 2020 09:15

googlebot added the cla: yes label May 1, 2020

petebacondarwin mentioned this pull request May 1, 2020

CLI is stuck at "0% compiling" for some time #36874 (Closed)

gkalpak previously approved these changes May 1, 2020

petebacondarwin requested review from gkalpak and JoostK May 1, 2020 14:14

gkalpak approved these changes May 1, 2020

petebacondarwin force-pushed the ngcc-lazy-getBasePaths branch from 060ee61 to 58e4e9c Compare May 1, 2020 21:57

JoostK approved these changes May 1, 2020

JoostK added the action: cleanup label and removed the action: review label May 1, 2020

petebacondarwin added 2 commits May 2, 2020 16:48

fixup! perf(ngcc): only compute basePaths in TargetedEntryPointFinder…

ca839a0

… when needed

petebacondarwin added the action: merge label and removed the action: cleanup label May 2, 2020

fixup! perf(ngcc): speed up the getBasePaths() computation

5f24a95

petebacondarwin added the action: cleanup label May 3, 2020

petebacondarwin added action: cleanup and removed action: cleanup labels May 3, 2020

gkalpak approved these changes May 4, 2020

pullapprove bot requested a review from alxhub May 4, 2020 14:56

fixup! perf(ngcc): speed up the getBasePaths() computation

b99068c

petebacondarwin force-pushed the ngcc-lazy-getBasePaths branch from 51091a9 to b99068c Compare May 4, 2020 14:58

petebacondarwin removed the request for review from alxhub May 4, 2020 14:59

alxhub closed this in ec6b9cc May 4, 2020

petebacondarwin deleted the ngcc-lazy-getBasePaths branch May 4, 2020 20:00

angular-automatic-lock-bot bot locked and limited conversation to collaborators Jun 4, 2020
