New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ngcc): support recovering when a worker process crashes (and more) #36626
Conversation
You can preview cc6daf8 at https://pr36626-cc6daf8.ngbuilds.io/. |
cc6daf8
to
8f92a83
Compare
You can preview 8f92a83 at https://pr36626-8f92a83.ngbuilds.io/. |
8f92a83
to
ed19d80
Compare
You can preview ed19d80 at https://pr36626-ed19d80.ngbuilds.io/. |
ed19d80
to
8335429
Compare
You can preview 8335429 at https://pr36626-8335429.ngbuilds.io/. |
8335429
to
777cf8f
Compare
You can preview 777cf8f at https://pr36626-777cf8f.ngbuilds.io/. |
777cf8f
to
17b4fac
Compare
You can preview 17b4fac at https://pr36626-17b4fac.ngbuilds.io/. |
17b4fac
to
1f3257b
Compare
You can preview 1f3257b at https://pr36626-1f3257b.ngbuilds.io/. |
You can preview 03e74f2 at https://pr36626-03e74f2.ngbuilds.io/. |
One can argue that none of these commits are "feat" since they don't actually offer any new features to the end-user. Instead they are parts of a reorganisation (refactor) of the ngcc internal implementation to "fix" the problem of a lack of resilience to workers being killed. (And features can't be cherry picked to stable). |
typo in spawing -> spawning |
…ores (#36626) Previously, ngcc would run in parallel mode (using the `ClusterExecutor`) when there were at least 2 CPU cores (and all other requirements where met). On systems with just 2 CPU cores, this meant there would only be one worker process (since one CPU core is always reserved for the master process). In these cases, the tasks would still be processed serially (on the one worker process), but we would also pay the overhead of communicating between the master and worker processes. This commit fixes this by only running in parallel mode if there are more than 2 CPU cores (i.e. at least 2 worker processes). PR Close #36626
This commit adds support for stopping processing an in-progress task and moving it back to the list of pending tasks. In a subsequent commit, this will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close #36626
…iles (#36626) This commit enhances the `CompileFn`, which is used to process each entry-point, to support running a passed-in callback (and wait for it to complete) before proceeding with writing the transformed files to disk. This functionality is currently not used. In a subsequent commit, it will be used for passing info from worker processes to the master process that will allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close #36626
…writing (#36626) With this commit, worker processes will notify the master process about the transformed files they are about to write to disk before starting writing them. In a subsequent commit, this will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close #36626
With this commit, the master process will keep track of the transformed files that each worker process is intending to write to disk. In a subsequent commit, this info will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close #36626
This commit adds a `revertFile()` method to `FileWriter`, which can revert a transformed file (and its backup - if any) written by the `FileWriter`. In a subsequent commit, this will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close #36626
Previously, when running in parallel mode and a worker process crashed while processing a task, it was not possible for ngcc to continue without risking ending up with a corrupted entry-point and therefore it exited with an error. This, for example, could happen when a worker process received a `SIGKILL` signal, which was frequently observed in CI environments. This was probably the result of Docker killing processes due to increased memory pressure. One factor that amplifies the problem under Docker (which is often used in CI) is that it is not possible to distinguish between the available CPU cores on the host machine and the ones made available to Docker containers, thus resulting in ngcc spawning too many worker processes. This commit addresses these issues in the following ways: 1. We take advantage of the fact that files are written to disk only after an entry-point has been fully analyzed/compiled. The master process can now determine whether a worker process has not yet started writing files to disk (even if it was in the middle of processing a task) and just put the task back into the tasks queue if the worker process crashes. 2. The master process keeps track of the transformed files that a worker process will attempt to write to disk. If the worker process crashes while writing files, the master process can revert any changes and put the task back into the tasks queue (without risking corruption). 3. When a worker process crashes while processing a task (which can be a result of increased memory pressure or too many worker processes), the master process will not try to re-spawn it. This way the number or worker processes is gradually adjusted to a level that can be accomodated by the system's resources. Examples of ngcc being able to recover after a worker process crashed: - While idling: https://circleci.com/gh/angular/angular/682197 - While compiling: https://circleci.com/gh/angular/angular/682209 - While writing files: https://circleci.com/gh/angular/angular/682267 Jira issue: [FW-2008](https://angular-team.atlassian.net/browse/FW-2008) Fixes #36278 PR Close #36626
…#36626) Previously, when the last worker process crashed, the master process would try to re-spawn it indefinitely. This could lead to an infinite loop (if for some reason the worker process kept crashing). This commit avoids this by limiting the number of re-spawn attempts to 3, after which ngcc will exit with an error. PR Close #36626
When running in parallel mode, worker processes forward errors thrown during task processing to the master process, which in turn exits with an error. However, there are cases where the error is not directly related to processing the entry-point. One such case is when there is not enough memory (for example, due to all the other tasks being processed simultaneously). Previously, an `ENOMEM` error thrown on a worker process would propagate to the master process, eventually causing ngcc to exit with an error. Example failure: https://circleci.com/gh/angular/angular/682198 This commit improves handling of these low-memory situations by detecting `ENOMEM` errors and killing the worker process, thus allowing the master process to decide how to handle that. The master process will put the task back into the tasks queue and continue processing tasks with the rest of the worker processes (and thus with lower memory pressure). PR Close #36626
…r dependency (#36626) Now that `ngcc/src/ngcc_options` imports `FileWriter` type, there is a circular dependency detected by the `ts-circular-deps:check` lint check: ``` ngcc/src/ngcc_options.ts → ngcc/src/writing/file_writer.ts → ngcc/src/packages/entry_point_bundle.ts → ngcc/src/ngcc_options.ts ``` This commit moves the `PathMappings` type (and related helpers) to a separate file to avoid the circular dependency. NOTE: The circular dependency was only with taking types into account. There was no circular dependency for the actual (JS) code. PR Close #36626
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot. |
…a task (angular#36626) Previously, the "Compiling <entryPoint>" log message was printed before starting to analyze and transform files, but after creating the `EntryPointBundle` (which includes creating the TS program). Since creating the `EntryPointBundle` involves some work, it is more accurate to move the log message before creating the bundle. PR Close angular#36626
…ores (angular#36626) Previously, ngcc would run in parallel mode (using the `ClusterExecutor`) when there were at least 2 CPU cores (and all other requirements where met). On systems with just 2 CPU cores, this meant there would only be one worker process (since one CPU core is always reserved for the master process). In these cases, the tasks would still be processed serially (on the one worker process), but we would also pay the overhead of communicating between the master and worker processes. This commit fixes this by only running in parallel mode if there are more than 2 CPU cores (i.e. at least 2 worker processes). PR Close angular#36626
…ar#36626) This commit adds support for stopping processing an in-progress task and moving it back to the list of pending tasks. In a subsequent commit, this will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close angular#36626
…leted()` (angular#36626) Rename the `markTaskCompleted()` method to be consistent with the other similar methods of `TaskQueue` (`markAsFailed()` and `markAsUnprocessed()`). PR Close angular#36626
…iles (angular#36626) This commit enhances the `CompileFn`, which is used to process each entry-point, to support running a passed-in callback (and wait for it to complete) before proceeding with writing the transformed files to disk. This functionality is currently not used. In a subsequent commit, it will be used for passing info from worker processes to the master process that will allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close angular#36626
…writing (angular#36626) With this commit, worker processes will notify the master process about the transformed files they are about to write to disk before starting writing them. In a subsequent commit, this will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close angular#36626
With this commit, the master process will keep track of the transformed files that each worker process is intending to write to disk. In a subsequent commit, this info will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close angular#36626
…#36626) This commit adds a `revertFile()` method to `FileWriter`, which can revert a transformed file (and its backup - if any) written by the `FileWriter`. In a subsequent commit, this will be used to allow ngcc to recover when a worker process crashes in the middle of processing a task. PR Close angular#36626
…36626) Previously, when running in parallel mode and a worker process crashed while processing a task, it was not possible for ngcc to continue without risking ending up with a corrupted entry-point and therefore it exited with an error. This, for example, could happen when a worker process received a `SIGKILL` signal, which was frequently observed in CI environments. This was probably the result of Docker killing processes due to increased memory pressure. One factor that amplifies the problem under Docker (which is often used in CI) is that it is not possible to distinguish between the available CPU cores on the host machine and the ones made available to Docker containers, thus resulting in ngcc spawning too many worker processes. This commit addresses these issues in the following ways: 1. We take advantage of the fact that files are written to disk only after an entry-point has been fully analyzed/compiled. The master process can now determine whether a worker process has not yet started writing files to disk (even if it was in the middle of processing a task) and just put the task back into the tasks queue if the worker process crashes. 2. The master process keeps track of the transformed files that a worker process will attempt to write to disk. If the worker process crashes while writing files, the master process can revert any changes and put the task back into the tasks queue (without risking corruption). 3. When a worker process crashes while processing a task (which can be a result of increased memory pressure or too many worker processes), the master process will not try to re-spawn it. This way the number or worker processes is gradually adjusted to a level that can be accomodated by the system's resources. Examples of ngcc being able to recover after a worker process crashed: - While idling: https://circleci.com/gh/angular/angular/682197 - While compiling: https://circleci.com/gh/angular/angular/682209 - While writing files: https://circleci.com/gh/angular/angular/682267 Jira issue: [FW-2008](https://angular-team.atlassian.net/browse/FW-2008) Fixes angular#36278 PR Close angular#36626
…angular#36626) Previously, when the last worker process crashed, the master process would try to re-spawn it indefinitely. This could lead to an infinite loop (if for some reason the worker process kept crashing). This commit avoids this by limiting the number of re-spawn attempts to 3, after which ngcc will exit with an error. PR Close angular#36626
When running in parallel mode, worker processes forward errors thrown during task processing to the master process, which in turn exits with an error. However, there are cases where the error is not directly related to processing the entry-point. One such case is when there is not enough memory (for example, due to all the other tasks being processed simultaneously). Previously, an `ENOMEM` error thrown on a worker process would propagate to the master process, eventually causing ngcc to exit with an error. Example failure: https://circleci.com/gh/angular/angular/682198 This commit improves handling of these low-memory situations by detecting `ENOMEM` errors and killing the worker process, thus allowing the master process to decide how to handle that. The master process will put the task back into the tasks queue and continue processing tasks with the rest of the worker processes (and thus with lower memory pressure). PR Close angular#36626
…r dependency (angular#36626) Now that `ngcc/src/ngcc_options` imports `FileWriter` type, there is a circular dependency detected by the `ts-circular-deps:check` lint check: ``` ngcc/src/ngcc_options.ts → ngcc/src/writing/file_writer.ts → ngcc/src/packages/entry_point_bundle.ts → ngcc/src/ngcc_options.ts ``` This commit moves the `PathMappings` type (and related helpers) to a separate file to avoid the circular dependency. NOTE: The circular dependency was only with taking types into account. There was no circular dependency for the actual (JS) code. PR Close angular#36626
Several small ngcc fixes; the main one being being able to recover when a worker process crashes.
See individual commit messages for more info.
Jira issue: FW-2008
Fixes #36278.