Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ngcc): support recovering when a worker process crashes (and more) #36626

Closed
wants to merge 18 commits into from

Conversation

gkalpak
Copy link
Member

@gkalpak gkalpak commented Apr 14, 2020

Several small ngcc fixes; the main one being being able to recover when a worker process crashes.
See individual commit messages for more info.

Jira issue: FW-2008

Fixes #36278.

@gkalpak gkalpak added comp: ngcc state: WIP target: patch This PR is targeted for the next patch release type: bug/fix labels Apr 14, 2020
@ngbot ngbot bot modified the milestone: needsTriage Apr 14, 2020
@mary-poppins
Copy link

You can preview cc6daf8 at https://pr36626-cc6daf8.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 8f92a83 at https://pr36626-8f92a83.ngbuilds.io/.

@mary-poppins
Copy link

You can preview ed19d80 at https://pr36626-ed19d80.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 8335429 at https://pr36626-8335429.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 777cf8f at https://pr36626-777cf8f.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 17b4fac at https://pr36626-17b4fac.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 1f3257b at https://pr36626-1f3257b.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 03e74f2 at https://pr36626-03e74f2.ngbuilds.io/.

@gkalpak gkalpak changed the title WIP - fix(ngcc): do not fail if a worker process crashes in parallel mode fix(ngcc): support recovering when a worker process crashes (and more) Apr 22, 2020
@gkalpak gkalpak added action: review The PR is still awaiting reviews from at least one requested reviewer and removed state: WIP labels Apr 22, 2020
@gkalpak gkalpak marked this pull request as ready for review April 22, 2020 20:50
@pullapprove pullapprove bot requested a review from JoostK April 22, 2020 20:50
@petebacondarwin
Copy link
Member

One can argue that none of these commits are "feat" since they don't actually offer any new features to the end-user. Instead they are parts of a reorganisation (refactor) of the ngcc internal implementation to "fix" the problem of a lack of resilience to workers being killed.

(And features can't be cherry picked to stable).

@petebacondarwin
Copy link
Member

typo in fix(ngcc): give up re-spawing crashed worker process after 3 attempts:

spawing -> spawning

AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
…ores (#36626)

Previously, ngcc would run in parallel mode (using the
`ClusterExecutor`) when there were at least 2 CPU cores (and all other
requirements where met). On systems with just 2 CPU cores, this meant
there would only be one worker process (since one CPU core is always
reserved for the master process). In these cases, the tasks would still
be processed serially (on the one worker process), but we would also pay
the overhead of communicating between the master and worker processes.

This commit fixes this by only running in parallel mode if there are
more than 2 CPU cores (i.e. at least 2 worker processes).

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
This commit adds support for stopping processing an in-progress task
and moving it back to the list of pending tasks.

In a subsequent commit, this will be used to allow ngcc to recover when
a worker process crashes in the middle of processing a task.

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
…leted()` (#36626)

Rename the `markTaskCompleted()` method to be consistent with the other
similar methods of `TaskQueue` (`markAsFailed()` and
`markAsUnprocessed()`).

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
…iles (#36626)

This commit enhances the `CompileFn`, which is used to process each
entry-point, to support running a passed-in callback (and wait for it to
complete) before proceeding with writing the transformed files to disk.

This functionality is currently not used. In a subsequent commit, it
will be used for passing info from worker processes to the master
process that will allow ngcc to recover when a worker process crashes in
the middle of processing a task.

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
…writing (#36626)

With this commit, worker processes will notify the master process about
the transformed files they are about to write to disk before starting
writing them.

In a subsequent commit, this will be used to allow ngcc to recover when
a worker process crashes in the middle of processing a task.

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
With this commit, the master process will keep track of the transformed
files that each worker process is intending to write to disk.

In a subsequent commit, this info will be used to allow ngcc to recover
when a worker process crashes in the middle of processing a task.

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
This commit adds a `revertFile()` method to `FileWriter`, which can
revert a transformed file (and its backup - if any) written by the
`FileWriter`.

In a subsequent commit, this will be used to allow ngcc to recover
when a worker process crashes in the middle of processing a task.

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
Previously, when running in parallel mode and a worker process crashed
while processing a task, it was not possible for ngcc to continue
without risking ending up with a corrupted entry-point and therefore it
exited with an error. This, for example, could happen when a worker
process received a `SIGKILL` signal, which was frequently observed in CI
environments. This was probably the result of Docker killing processes
due to increased memory pressure.

One factor that amplifies the problem under Docker (which is often used
in CI) is that it is not possible to distinguish between the available
CPU cores on the host machine and the ones made available to Docker
containers, thus resulting in ngcc spawning too many worker processes.

This commit addresses these issues in the following ways:

1. We take advantage of the fact that files are written to disk only
   after an entry-point has been fully analyzed/compiled. The master
   process can now determine whether a worker process has not yet
   started writing files to disk (even if it was in the middle of
   processing a task) and just put the task back into the tasks queue if
   the worker process crashes.

2. The master process keeps track of the transformed files that a worker
   process will attempt to write to disk. If the worker process crashes
   while writing files, the master process can revert any changes and
   put the task back into the tasks queue (without risking corruption).

3. When a worker process crashes while processing a task (which can be a
   result of increased memory pressure or too many worker processes),
   the master process will not try to re-spawn it. This way the number
   or worker processes is gradually adjusted to a level that can be
   accomodated by the system's resources.

Examples of ngcc being able to recover after a worker process crashed:
- While idling: https://circleci.com/gh/angular/angular/682197
- While compiling: https://circleci.com/gh/angular/angular/682209
- While writing files: https://circleci.com/gh/angular/angular/682267

Jira issue: [FW-2008](https://angular-team.atlassian.net/browse/FW-2008)

Fixes #36278

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
…#36626)

Previously, when the last worker process crashed, the master process
would try to re-spawn it indefinitely. This could lead to an infinite
loop (if for some reason the worker process kept crashing).

This commit avoids this by limiting the number of re-spawn attempts to
3, after which ngcc will exit with an error.

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
When running in parallel mode, worker processes forward errors thrown
during task processing to the master process, which in turn exits with
an error.

However, there are cases where the error is not directly related to
processing the entry-point. One such case is when there is not enough
memory (for example, due to all the other tasks being processed
simultaneously).

Previously, an `ENOMEM` error thrown on a worker process would propagate
to the master process, eventually causing ngcc to exit with an error.
Example failure: https://circleci.com/gh/angular/angular/682198

This commit improves handling of these low-memory situations by
detecting `ENOMEM` errors and killing the worker process, thus allowing
the master process to decide how to handle that. The master process will
put the task back into the tasks queue and continue processing tasks
with the rest of the worker processes (and thus with lower memory
pressure).

PR Close #36626
AndrewKushnir pushed a commit that referenced this pull request Apr 29, 2020
…r dependency (#36626)

Now that `ngcc/src/ngcc_options` imports `FileWriter` type, there is a
circular dependency detected by the `ts-circular-deps:check` lint check:

```
ngcc/src/ngcc_options.ts
  → ngcc/src/writing/file_writer.ts
  → ngcc/src/packages/entry_point_bundle.ts
  → ngcc/src/ngcc_options.ts
```

This commit moves the `PathMappings` type (and related helpers) to a
separate file to avoid the circular dependency.

NOTE:
The circular dependency was only with taking types into account. There
was no circular dependency for the actual (JS) code.

PR Close #36626
@gkalpak gkalpak deleted the fix-ngcc-crashing-worker branch April 30, 2020 11:02
@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators May 31, 2020
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…a task (angular#36626)

Previously, the "Compiling <entryPoint>" log message was printed before
starting to analyze and transform files, but after creating the
`EntryPointBundle` (which includes creating the TS program).

Since creating the `EntryPointBundle` involves some work, it is more
accurate to move the log message before creating the bundle.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…ores (angular#36626)

Previously, ngcc would run in parallel mode (using the
`ClusterExecutor`) when there were at least 2 CPU cores (and all other
requirements where met). On systems with just 2 CPU cores, this meant
there would only be one worker process (since one CPU core is always
reserved for the master process). In these cases, the tasks would still
be processed serially (on the one worker process), but we would also pay
the overhead of communicating between the master and worker processes.

This commit fixes this by only running in parallel mode if there are
more than 2 CPU cores (i.e. at least 2 worker processes).

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…ar#36626)

This commit adds support for stopping processing an in-progress task
and moving it back to the list of pending tasks.

In a subsequent commit, this will be used to allow ngcc to recover when
a worker process crashes in the middle of processing a task.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…leted()` (angular#36626)

Rename the `markTaskCompleted()` method to be consistent with the other
similar methods of `TaskQueue` (`markAsFailed()` and
`markAsUnprocessed()`).

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…iles (angular#36626)

This commit enhances the `CompileFn`, which is used to process each
entry-point, to support running a passed-in callback (and wait for it to
complete) before proceeding with writing the transformed files to disk.

This functionality is currently not used. In a subsequent commit, it
will be used for passing info from worker processes to the master
process that will allow ngcc to recover when a worker process crashes in
the middle of processing a task.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…writing (angular#36626)

With this commit, worker processes will notify the master process about
the transformed files they are about to write to disk before starting
writing them.

In a subsequent commit, this will be used to allow ngcc to recover when
a worker process crashes in the middle of processing a task.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
With this commit, the master process will keep track of the transformed
files that each worker process is intending to write to disk.

In a subsequent commit, this info will be used to allow ngcc to recover
when a worker process crashes in the middle of processing a task.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…#36626)

This commit adds a `revertFile()` method to `FileWriter`, which can
revert a transformed file (and its backup - if any) written by the
`FileWriter`.

In a subsequent commit, this will be used to allow ngcc to recover
when a worker process crashes in the middle of processing a task.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…36626)

Previously, when running in parallel mode and a worker process crashed
while processing a task, it was not possible for ngcc to continue
without risking ending up with a corrupted entry-point and therefore it
exited with an error. This, for example, could happen when a worker
process received a `SIGKILL` signal, which was frequently observed in CI
environments. This was probably the result of Docker killing processes
due to increased memory pressure.

One factor that amplifies the problem under Docker (which is often used
in CI) is that it is not possible to distinguish between the available
CPU cores on the host machine and the ones made available to Docker
containers, thus resulting in ngcc spawning too many worker processes.

This commit addresses these issues in the following ways:

1. We take advantage of the fact that files are written to disk only
   after an entry-point has been fully analyzed/compiled. The master
   process can now determine whether a worker process has not yet
   started writing files to disk (even if it was in the middle of
   processing a task) and just put the task back into the tasks queue if
   the worker process crashes.

2. The master process keeps track of the transformed files that a worker
   process will attempt to write to disk. If the worker process crashes
   while writing files, the master process can revert any changes and
   put the task back into the tasks queue (without risking corruption).

3. When a worker process crashes while processing a task (which can be a
   result of increased memory pressure or too many worker processes),
   the master process will not try to re-spawn it. This way the number
   or worker processes is gradually adjusted to a level that can be
   accomodated by the system's resources.

Examples of ngcc being able to recover after a worker process crashed:
- While idling: https://circleci.com/gh/angular/angular/682197
- While compiling: https://circleci.com/gh/angular/angular/682209
- While writing files: https://circleci.com/gh/angular/angular/682267

Jira issue: [FW-2008](https://angular-team.atlassian.net/browse/FW-2008)

Fixes angular#36278

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…angular#36626)

Previously, when the last worker process crashed, the master process
would try to re-spawn it indefinitely. This could lead to an infinite
loop (if for some reason the worker process kept crashing).

This commit avoids this by limiting the number of re-spawn attempts to
3, after which ngcc will exit with an error.

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
When running in parallel mode, worker processes forward errors thrown
during task processing to the master process, which in turn exits with
an error.

However, there are cases where the error is not directly related to
processing the entry-point. One such case is when there is not enough
memory (for example, due to all the other tasks being processed
simultaneously).

Previously, an `ENOMEM` error thrown on a worker process would propagate
to the master process, eventually causing ngcc to exit with an error.
Example failure: https://circleci.com/gh/angular/angular/682198

This commit improves handling of these low-memory situations by
detecting `ENOMEM` errors and killing the worker process, thus allowing
the master process to decide how to handle that. The master process will
put the task back into the tasks queue and continue processing tasks
with the rest of the worker processes (and thus with lower memory
pressure).

PR Close angular#36626
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
…r dependency (angular#36626)

Now that `ngcc/src/ngcc_options` imports `FileWriter` type, there is a
circular dependency detected by the `ts-circular-deps:check` lint check:

```
ngcc/src/ngcc_options.ts
  → ngcc/src/writing/file_writer.ts
  → ngcc/src/packages/entry_point_bundle.ts
  → ngcc/src/ngcc_options.ts
```

This commit moves the `PathMappings` type (and related helpers) to a
separate file to avoid the circular dependency.

NOTE:
The circular dependency was only with taking types into account. There
was no circular dependency for the actual (JS) code.

PR Close angular#36626
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge The PR is ready for merge by the caretaker cla: yes target: patch This PR is targeted for the next patch release type: bug/fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ngcc crashing on every CI build
5 participants