Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(ngcc): add support for parallel execution of tasks #32427

Closed
wants to merge 15 commits into from

Conversation

@gkalpak
Copy link
Member

commented Aug 30, 2019

This is a continuation of the work started in #32052 to speed up ngcc by adding support for executing tasks in parallel (when async execution is acceptable).

This PR includes numerous refactorings to clean up/simplify the code and prepare the introduction of parallel task execution. It also includes a tiny fix: only back up the original `prepublishOnly` script and not the overwritten one

See the individual commit messages for more details.


Notes on testing

I have added unit tests for all added/modified code except for the ClusterMaster class, because:

  • It mostly orchestrates other parts (that are independently tested).
  • The most important parts of its functionality are covered by integration tests.
  • I didn't want to spend too much time on testing it before a first review (in case the implementation changed drastically).

The integration tests at ngcc/test/integration/ngcc_spec.ts, which are run under bazel, don't play well with spawning multiple processes, so they are only run on one process (which means they don't exercise the ClusterExecutor).

The integration tests at integration/ngcc/test.sh do run in parallel mode (and only in that mode, since that is the only option with ivy-ngcc).

Other than that, I have manually verified that ngcc can successfully build angular.io in parallel mode (with various command-line argument combinations) and that the resulting node_modules/are identical to those produced in sync mode with the following exceptions:

  • package.json > __processed_by_ivy_ngcc__ might have properties in different order.
    This is expected, since now the order is which tests are run/completed is different and non-deterministic.
  • Some generated variable names can be different (mostly(/only?) noticed in UMD bundles). E.g. data_r50 vs data_r450.
    Not sure why that is and whether it is problematic.

Notes on performance

For processing angular.io, I have observed speed improvement between 1.8x (on my Windows laptop - 7 workers) to 2.6x (on Linux CI - 8 workers). This was a little surprizing, since task processing (which runs on multiple processes) takes up 80%-90% and all workers are busy most of the time (i.e. there are rarely idle workers).

More investigation is needed to better understand/improve those numbers, but this can happen after the PR lands.

Fun fact:
There doesn't seem to be much improvement above 4 workers. Even with up to 20 workers on CI, the duration was the same (if not worse: 29s vs 23s).


TODO

  • Get the PR reviewed.
  • Determine whether the differences in variable names (in UMD bundles?) are an issue.1
  • Determine whether we need unit tests for ClusterMaster.

1: The differences are in "shared context vars" generated via BindingScope#generateSharedContextVar() by the TemplateDefinitionBuilder instantiated in compileComponentFromMetadata(). Since these are local to the template, we don't mind having duplicate names in different entry-points or formats.

@mary-poppins

This comment has been minimized.

Copy link

commented Aug 30, 2019

@hiepxanh

This comment has been minimized.

Copy link

commented Sep 1, 2019

do it bro, we need ngc be faster and multi thread. every second you saving even 3 seconds is 1000 developers * 3 = 50 hours everytime they build their app. You are doing something is bigger meaning than you thought.

Copy link
Member

left a comment

Nothing here that I think should block the merging of this PR.
A handful of nits that I expect you can clean up before merge.
And a fairly chunky architectural suggestion, which if agreed could be done in a subsequent refactoring PR.

@gkalpak gkalpak force-pushed the gkalpak:perf-ngcc-parallelize branch from 42b152c to 0db6569 Sep 5, 2019
@gkalpak

This comment has been minimized.

Copy link
Member Author

commented Sep 5, 2019

Rebased on master and addressed feedback (either by adding fixup commits or by commenting).
@petebacondarwin, ptal.

I, also, created #32486 to capture ideas for future improvements/refactorings (as discussed here and on Slack).

@mary-poppins

This comment has been minimized.

Copy link

commented Sep 5, 2019

Copy link
Member

left a comment

reviewed the integration tests part and skimmed through the rest. one nit, otherwise lgtm

@IgorMinar

This comment has been minimized.

Copy link
Member

commented Sep 5, 2019

thanks George for implementing this feature!

@gkalpak

This comment has been minimized.

Copy link
Member Author

commented Sep 5, 2019

For reference:
Igor's nit was that we should give better error messages in integration/ngcc/test.sh were we exit 1.

@gkalpak gkalpak requested a review from angular/docs-infra as a code owner Sep 5, 2019
@mary-poppins

This comment has been minimized.

Copy link

commented Sep 5, 2019

@gkalpak gkalpak force-pushed the gkalpak:perf-ngcc-parallelize branch from ecb794c to 077a637 Sep 5, 2019
@mary-poppins

This comment has been minimized.

Copy link

commented Sep 5, 2019

gkalpak added 2 commits Aug 9, 2019
This commit addresses the review feedback from #32052, which was merged
before addressing the feedback there.
@gkalpak gkalpak removed the request for review from angular/docs-infra Sep 7, 2019
@matsko matsko closed this in bd1de32 Sep 9, 2019
matsko added a commit that referenced this pull request Sep 9, 2019
…the overwritten one (#32427)

In order to prevent `ngcc`'d packages (e.g. libraries) from getting
accidentally published, `ngcc` overwrites the `prepublishOnly` npm
script to log a warning and exit with an error. In case we want to
restore the original script (e.g. "undo" `ngcc` processing), we keep a
backup of the original `prepublishOnly` script.

Previously, running `ngcc` a second time (e.g. for a different format)
would create a backup of the overwritten `prepublishOnly` script (if
there was originally no `prepublishOnly` script). As a result, if we
ever tried to "undo" `ngcc` processing and restore the original
`prepublishOnly` script, the error-throwing script would be restored
instead.

This commit fixes it by ensuring that we only back up a `prepublishOnly`
script, iff it is not the one we created ourselves (i.e. the
error-throwing one).

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
…rface (#32427)

To persist some of its state, `ngcc` needs to update `package.json`
files (both in memory and on disk).

This refactoring abstracts these operations behind the
`PackageJsonUpdater` interface, making it easier to orchestrate them
from different contexts (e.g. when running tasks in parallel on multiple
processes).

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
…face (#32427)

This change does not alter the current behavior, but makes it easier to
introduce new types of `Executors` , for example to do the required work
in parallel (on multiple processes).

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
Previously, `ngcc`'s programmatic API would run and complete
synchronously. This was necessary for specific usecases (such as how the
`@angular/cli` invokes `ngcc` as part of the TypeScript module
resolution process), but not for others (e.g. running `ivy-ngcc` as a
`postinstall` script).

This commit adds a new option (`async`) that enables turning on
asynchronous execution. I.e. it signals that the caller is OK with the
function call to complete asynchronously, which allows `ngcc` to
potentially run in a more efficient mode.

Currently, there is no difference in the way tasks are executed in sync
vs async mode, but this change sets the ground for adding new execution
options (that require asynchronous operation), such as processing tasks
in parallel on multiple processes.

NOTE:
When using the programmatic API, the default value for `async` is
`false`, thus retaining backwards compatibility.
When running `ngcc` from the command line (i.e. via the `ivy-ngcc`
script), it runs in async mode (to be able to take advantage of future
optimizations), but that is transparent to the caller.

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
…ty processability (#32427)

In the past, a task's processability didn't use to be known in advance.
It was possible that a task would be created and added to the queue
during the analysis phase and then later (during the compilation phase)
it would be found out that the task (i.e. the associated format
property) was not processable.

As a result, certain checks had to be delayed, until a task's processing
had started or even until all tasks had been processed. Examples of
checks that had to be delayed are:
- Whether a task can be skipped due to `compileAllFormats: false`.
- Whether there were entry-points for which no format at all was
  successfully processed.

It turns out that (as made clear by the refactoring in 9537b2f), once
a task starts being processed it is expected to either complete
successfully (with the associated format being processed) or throw an
error (in which case the process will exit). In other words, a task's
processability is known in advance.

This commit takes advantage of this fact by moving certain checks
earlier in the process (e.g. in the analysis phase instead of the
compilation phase), which in turn allows avoiding some unnecessary work.
More specifically:

- When `compileAllFormats` is `false`, tasks are created _only_ for the
  first suitable format property for each entry-point, since the rest of
  the tasks would have been skipped during the compilation phase anyway.
  This has the following advantages:
  1. It avoids the slight overhead of generating extraneous tasks and
     then starting to process them (before realizing they should be
     skipped).
  2. In a potential future parallel execution mode, unnecessary tasks
     might start being processed at the same time as the first (useful)
     task, even if their output would be later discarded, wasting
     resources. Alternatively, extra logic would have to be added to
     prevent this from happening. The change in this commit avoids these
     issues.
- When an entry-point is not processable, an error will be thrown
  upfront without having to wait for other tasks to be processed before
  failing.

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
… types (#32427)

Previously, `ngcc` needed to store some metadata related to the
processing of each entry-point. This metadata was stored in a `Map`, in
the form of `EntryPointProcessingMetadata` and passed around as needed.

After some recent refactorings, it turns out that this metadata (with
its only remaining property, `hasProcessedTypings`) was no longer used,
because the relevant information was extracted from other sources (such
as the `processDts` property on `Task`s).

This commit cleans up the code by removing the unused code and types.

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
This change does not alter the current behavior, but makes it easier to
introduce `TaskQueue`s implementing different task selection algorithms,
for example to support executing multiple tasks in parallel (while
respecting interdependencies between them).

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
…32427)

This commit adds a new `TaskQueue` implementation that supports
executing multiple tasks in parallel (while respecting interdependencies
between them).

This new implementation is currently not used, thus the behavior of
`ngcc` is not affected by this change. The parallel `TaskQueue` will be
used in a subsequent commit that will introduce parallel task execution.

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
`ngcc` supports both synchronous and asynchronous execution. The default
mode when using `ngcc` programmatically (which is how `@angular/cli` is
using it) is synchronous. When running `ngcc` from the command line
(i.e. via the `ivy-ngcc` script), it runs in async mode.

Previously, the work would be executed in the same way in both modes.

This commit improves the performance of `ngcc` in async mode by
processing tasks in parallel on multiple processes. It uses the Node.js
built-in [`cluster` module](https://nodejs.org/api/cluster.html) to
launch a cluster of Node.js processes and take advantage of multi-core
systems.

Preliminary comparisons indicate a 1.8x to 2.6x speed improvement when
processing the angular.io app (apparently depending on the OS, number of
available cores, system load, etc.). Further investigation is needed to
better understand these numbers and identify potential areas of
improvement.

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd
Original design doc: https://hackmd.io/uYG9CJrFQZ-6FtKqpnYJAA?view

Jira issue: [FW-1460](https://angular-team.atlassian.net/browse/FW-1460)

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
…tions (#32427)

This gives an overview of how much time is spent in each operation/phase
and makes it easy to do rough comparisons of how different
configurations or changes affect performance.

PR Close #32427
matsko added a commit that referenced this pull request Sep 9, 2019
@gkalpak gkalpak deleted the gkalpak:perf-ngcc-parallelize branch Sep 9, 2019
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…32427)

This commit addresses the review feedback from angular#32052, which was merged
before addressing the feedback there.

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…the overwritten one (angular#32427)

In order to prevent `ngcc`'d packages (e.g. libraries) from getting
accidentally published, `ngcc` overwrites the `prepublishOnly` npm
script to log a warning and exit with an error. In case we want to
restore the original script (e.g. "undo" `ngcc` processing), we keep a
backup of the original `prepublishOnly` script.

Previously, running `ngcc` a second time (e.g. for a different format)
would create a backup of the overwritten `prepublishOnly` script (if
there was originally no `prepublishOnly` script). As a result, if we
ever tried to "undo" `ngcc` processing and restore the original
`prepublishOnly` script, the error-throwing script would be restored
instead.

This commit fixes it by ensuring that we only back up a `prepublishOnly`
script, iff it is not the one we created ourselves (i.e. the
error-throwing one).

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…rface (angular#32427)

To persist some of its state, `ngcc` needs to update `package.json`
files (both in memory and on disk).

This refactoring abstracts these operations behind the
`PackageJsonUpdater` interface, making it easier to orchestrate them
from different contexts (e.g. when running tasks in parallel on multiple
processes).

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…face (angular#32427)

This change does not alter the current behavior, but makes it easier to
introduce new types of `Executors` , for example to do the required work
in parallel (on multiple processes).

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
Previously, `ngcc`'s programmatic API would run and complete
synchronously. This was necessary for specific usecases (such as how the
`@angular/cli` invokes `ngcc` as part of the TypeScript module
resolution process), but not for others (e.g. running `ivy-ngcc` as a
`postinstall` script).

This commit adds a new option (`async`) that enables turning on
asynchronous execution. I.e. it signals that the caller is OK with the
function call to complete asynchronously, which allows `ngcc` to
potentially run in a more efficient mode.

Currently, there is no difference in the way tasks are executed in sync
vs async mode, but this change sets the ground for adding new execution
options (that require asynchronous operation), such as processing tasks
in parallel on multiple processes.

NOTE:
When using the programmatic API, the default value for `async` is
`false`, thus retaining backwards compatibility.
When running `ngcc` from the command line (i.e. via the `ivy-ngcc`
script), it runs in async mode (to be able to take advantage of future
optimizations), but that is transparent to the caller.

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…ty processability (angular#32427)

In the past, a task's processability didn't use to be known in advance.
It was possible that a task would be created and added to the queue
during the analysis phase and then later (during the compilation phase)
it would be found out that the task (i.e. the associated format
property) was not processable.

As a result, certain checks had to be delayed, until a task's processing
had started or even until all tasks had been processed. Examples of
checks that had to be delayed are:
- Whether a task can be skipped due to `compileAllFormats: false`.
- Whether there were entry-points for which no format at all was
  successfully processed.

It turns out that (as made clear by the refactoring in 9537b2f), once
a task starts being processed it is expected to either complete
successfully (with the associated format being processed) or throw an
error (in which case the process will exit). In other words, a task's
processability is known in advance.

This commit takes advantage of this fact by moving certain checks
earlier in the process (e.g. in the analysis phase instead of the
compilation phase), which in turn allows avoiding some unnecessary work.
More specifically:

- When `compileAllFormats` is `false`, tasks are created _only_ for the
  first suitable format property for each entry-point, since the rest of
  the tasks would have been skipped during the compilation phase anyway.
  This has the following advantages:
  1. It avoids the slight overhead of generating extraneous tasks and
     then starting to process them (before realizing they should be
     skipped).
  2. In a potential future parallel execution mode, unnecessary tasks
     might start being processed at the same time as the first (useful)
     task, even if their output would be later discarded, wasting
     resources. Alternatively, extra logic would have to be added to
     prevent this from happening. The change in this commit avoids these
     issues.
- When an entry-point is not processable, an error will be thrown
  upfront without having to wait for other tasks to be processed before
  failing.

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
… types (angular#32427)

Previously, `ngcc` needed to store some metadata related to the
processing of each entry-point. This metadata was stored in a `Map`, in
the form of `EntryPointProcessingMetadata` and passed around as needed.

After some recent refactorings, it turns out that this metadata (with
its only remaining property, `hasProcessedTypings`) was no longer used,
because the relevant information was extracted from other sources (such
as the `processDts` property on `Task`s).

This commit cleans up the code by removing the unused code and types.

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…32427)

This change does not alter the current behavior, but makes it easier to
introduce `TaskQueue`s implementing different task selection algorithms,
for example to support executing multiple tasks in parallel (while
respecting interdependencies between them).

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…ngular#32427)

This commit adds a new `TaskQueue` implementation that supports
executing multiple tasks in parallel (while respecting interdependencies
between them).

This new implementation is currently not used, thus the behavior of
`ngcc` is not affected by this change. The parallel `TaskQueue` will be
used in a subsequent commit that will introduce parallel task execution.

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
`ngcc` supports both synchronous and asynchronous execution. The default
mode when using `ngcc` programmatically (which is how `@angular/cli` is
using it) is synchronous. When running `ngcc` from the command line
(i.e. via the `ivy-ngcc` script), it runs in async mode.

Previously, the work would be executed in the same way in both modes.

This commit improves the performance of `ngcc` in async mode by
processing tasks in parallel on multiple processes. It uses the Node.js
built-in [`cluster` module](https://nodejs.org/api/cluster.html) to
launch a cluster of Node.js processes and take advantage of multi-core
systems.

Preliminary comparisons indicate a 1.8x to 2.6x speed improvement when
processing the angular.io app (apparently depending on the OS, number of
available cores, system load, etc.). Further investigation is needed to
better understand these numbers and identify potential areas of
improvement.

Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd
Original design doc: https://hackmd.io/uYG9CJrFQZ-6FtKqpnYJAA?view

Jira issue: [FW-1460](https://angular-team.atlassian.net/browse/FW-1460)

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
…tions (angular#32427)

This gives an overview of how much time is spent in each operation/phase
and makes it easy to do rough comparisons of how different
configurations or changes affect performance.

PR Close angular#32427
arnehoek added a commit to arnehoek/angular that referenced this pull request Sep 26, 2019
@angular-automatic-lock-bot

This comment has been minimized.

Copy link

commented Oct 10, 2019

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Oct 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.