Update based on latest discussions
RandomByte committed Nov 22, 2023
1 parent 9d4cb25 commit 025954c
Showing 3 changed files with 115 additions and 36 deletions.
151 changes: 115 additions & 36 deletions rfcs/0014-task-workers.md
@@ -15,7 +15,7 @@
## Summary
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

Concept for a new API provided to UI5 Tooling build tasks, enabling easy use of Node.js [Worker Threads](https://nodejs.org/api/worker_threads.html) to execute CPU intensive operations outside of the main thread.
Concept for a new API provided to UI5 Tooling tasks, enabling easy use of Node.js [Worker Threads](https://nodejs.org/api/worker_threads.html) to execute CPU intensive operations outside of the main thread.

## Motivation
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->
@@ -30,65 +30,128 @@ The pool should also be re-used when multiple projects are being built, either i
### Terminology

* **`Worker`**: A Node.js [Worker thread](https://nodejs.org/api/worker_threads.html) instance
* **`Build Task`**: A UI5 Tooling build task such as "minify" or "buildThemes" (standard tasks) or any [custom task](https://sap.github.io/ui5-tooling/stable/pages/extensibility/CustomTasks/)
* **`Task Processor`**: A module associated with a UI5 Tooling Build Task (standard or custom) that can be executed in a `Worker`
* **`Build Context`**: An already existing ui5-project module, coupled to the lifecycle of a Graph Build. It shall be extended to provide access to the `Work Dispatcher` by forwarding requests from `Build Tasks`
* **`Thread Runner`**: A ui5-project module that will be loaded in a `Worker`. It handles communication with the main thread and executes a `Task Processor` on request
* **`Work Dispatcher`**: A ui5-project singleton module which uses a library like [`workerpool`](https://github.com/josdejong/workerpool) to spawn and manage `Worker` instances in order to have them execute any `Task Processor` requested by the Build Task
- Handles the `Worker` lifecycle
* **`Task`**: A UI5 Tooling task such as `minify` or `buildThemes` (both standard tasks) or any [custom task](https://sap.github.io/ui5-tooling/stable/pages/extensibility/CustomTasks/)
* **`Task Processor`**: A module associated with a UI5 Tooling task (standard or custom) that can be executed in a worker
* **`Build Context`**: An already existing ui5-project module, coupled to the lifecycle of a Graph Build. It shall be extended to provide access to the work dispatcher by forwarding requests from tasks
* **`Thread Runner`**: A `@ui5/project` module that will be loaded in a worker. It handles communication with the main thread and executes a task processor on request
* **`Work Dispatcher`**: A `@ui5/project` singleton module which uses a library like [`workerpool`](https://github.com/josdejong/workerpool) to spawn and manage worker instances in order to have them execute any task processor requested by the task
- Handles the worker lifecycle

![](./resources/0014-task-workers/Overview.png)

### Key Design Decisions

* Task Processors shall be called with a well defined signature as described [below](#task-processor)
* A Task Processor should not be exposed to Worker-specific API
- I.e. it can be executed on the main thread as well as in a Worker
- This allows users as well as UI5 Tooling logic to control whether Workers are used or not
- For example in CI environments where only one CPU core is available to the build, Workers are expected to produce overhead
- Users might want to disable Workers to easily debug issues in Processors
- The UI5 Tooling build itself might already be running in a Worker
* The Work Dispatcher and Thread Runner modules will handle all inter-process communication
* Task processors shall be called with a defined signature as described [below](#task-processor)
* A task processor should not be exposed to Worker-specific API
- i.e. it can be executed on the main thread as well as in a Worker
- This allows UI5 Tooling to dynamically decide whether to use Workers or not
+ For example in CI environments where only one CPU core is available to the build, Workers might cause unnecessary overhead
+ Users might want to disable Workers to easily debug issues in processors
+ The UI5 Tooling build itself might already be running in a Worker
* The work dispatcher and thread runner modules will handle all inter-process communication
- This includes serializing and de-serializing `@ui5/fs/Resource` instances
* Custom tasks can opt into this feature by defining one or more "Task Processor" modules in its ui5.yaml
* A Task can only invoke its own Task Processor(s)
* Custom tasks can opt into this feature by defining one or more task processor modules in their ui5.yaml
* A task can only invoke its own task processor(s)
* The work dispatcher or thread runners have no understanding of dependencies between the workloads
- Tasks are responsible for waiting on the completion of their processors
- Task processors should be executed in a first in, first out order
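
The decision whether to use Workers at all could be sketched roughly as follows. This is only an illustration of the reasoning in the list above: the helper name `shouldUseWorkers` and its option names are hypothetical and not part of any UI5 Tooling API, and treating "already running in a Worker" as a reason to stay on the current thread is an assumption, not something this RFC prescribes.

```js
const os = require("node:os");
const {isMainThread} = require("node:worker_threads");

/**
 * Hypothetical helper (not a UI5 Tooling API): decides whether task
 * processors should be dispatched to Workers or run on the current thread.
 */
function shouldUseWorkers({
	cpuCount = os.cpus().length, // Logical CPU cores visible to Node.js
	alreadyInWorker = !isMainThread, // The build itself may run in a Worker
	disabledByUser = false // E.g. to simplify debugging of processors
} = {}) {
	if (disabledByUser) {
		return false;
	}
	if (alreadyInWorker) {
		// Assumption: avoid spawning Workers from within a Worker
		return false;
	}
	// With a single core, Workers would only add overhead
	return cpuCount > 1;
}
```

Because a task processor is not exposed to any Worker-specific API, the same processor module can be called directly whenever such a check returns `false`.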

### Assumptions

* A Task Processor is assumed to utilize a CPU thread by 90-100%
* A task processor is assumed to utilize a single CPU thread by 90-100%
- Accordingly they are also assumed to execute little to no I/O operations
- This means one Worker should never execute more than one Task Processor at the same time
* A Task Processor is stateless
* A Worker should never execute more than one task processor at a time
* Task processors are generally stateless
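
Combined with the first in, first out requirement from the key design decisions, these assumptions suggest a very simple scheduling model. The following sketch is not the proposed implementation (which would rely on a library like `workerpool`); the class name `FifoDispatcher` is hypothetical, and a "job" here is just a function standing in for "run this task processor in an idle Worker".

```js
/**
 * Hypothetical FIFO scheduler: at most `workerCount` jobs run at the same
 * time, and queued jobs are started in the order they were submitted.
 */
class FifoDispatcher {
	constructor(workerCount) {
		this.idle = workerCount; // Number of Workers currently without a job
		this.queue = []; // Pending jobs, oldest first
	}

	// Enqueues a job and resolves with its result once it has been executed
	execute(job) {
		return new Promise((resolve, reject) => {
			this.queue.push({job, resolve, reject});
			this._next();
		});
	}

	// Starts queued jobs while idle Workers are available
	_next() {
		while (this.idle > 0 && this.queue.length > 0) {
			const {job, resolve, reject} = this.queue.shift();
			this.idle--;
			Promise.resolve()
				.then(() => job())
				.then(resolve, reject)
				.finally(() => {
					this.idle++; // The Worker is free again
					this._next();
				});
		}
	}
}
```

Since the dispatcher has no notion of dependencies between jobs, a task that needs the result of one processor before starting another simply awaits the first `execute` call.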

### Task Processor

Similar to Tasks, Task Processors shall be invoked with a well defined signature:
[Processors](https://sap.github.io/ui5-tooling/stable/pages/Builder/#processors) are an established concept in UI5 Tooling but not yet exposed to custom tasks. The basic idea is that tasks act as the glue code that connects a more generic processor to UI5 Tooling. For example, UI5 Tooling processors make use of very little UI5 Tooling API, making them easily re-usable in different environments like plain Node.js scripts.

* **`resources`**: An array of `@ui5/fs/Resource` provided by the Build Task
* **`options`**: An object provided by the Build Task
* **`fs`**: An optional fs-interface provided by the Build Task
* *[To be discussed] **`workspace`**: An optional workspace __reader__ provided by the Build Task*
* *[To be discussed] **`dependencies`**: An optional dependencies reader provided by the Build Task*
* *[To be discussed] **`reader`**: An optional generic reader provided by the Build Task*
With this RFC, we extend this concept to custom tasks. A task can define one or more processors and execute them with a defined API. Their execution is managed by UI5 Tooling, which might execute them on the main thread or in a worker.

#### Input Parameters

* **`resources`**: An array of `@ui5/fs/Resource` provided by the task
* **`options`**: An object provided by the task
* **`fs`**: An optional fs-interface provided by the task
* **`resourceFactory`**: Specification-version dependent object providing helper functions to create and manage resources.
- **`resourceFactory.createResource`**: Creates a `@ui5/fs/Resource` (similar to [TaskUtil#resourceFactory.createResource](https://sap.github.io/ui5-tooling/stable/api/@ui5_project_build_helpers_TaskUtil.html#~resourceFactory))
- No other API for now and no general "ProcessorUtil" or similar, since processors should remain as UI5 Tooling independent as possible

**_Potential future additions:_**
* _**`workspace`**: An optional workspace __reader__ provided by the task_
* _**`dependencies`**: An optional dependencies reader provided by the task_
* _**`reader`**: An optional generic reader provided by the task_

#### Return Values

The allowed return values are rather generic. But since UI5 Tooling needs to serialize and de-serialize the values while transferring them back to the main thread, there are some limitations.

The thread runner shall validate that the **return value is either**:
1. A value that adheres to the requirements stated in [Serializing Data](#serializing-data)
2. A flat object (`[undefined, Object].includes(value.constructor)`, to detect `Object.create(null)` and `{}`) with property values adhering to the requirements stated in [Serializing Data](#serializing-data)
3. An array (`Array.isArray(value)`) with values adhering to the requirements stated in [Serializing Data](#serializing-data)

Note that nested objects and nested arrays shall not be allowed until we become aware of any demand for them.
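
A minimal sketch of such a validation, under a few assumptions: the function names are hypothetical, `symbol` is treated as not allowed (this is still marked as an open question in [Serializing Data](#serializing-data)), and resources are detected by duck typing on `getPath` and `getBuffer` instead of `instanceof`.

```js
// Duck-typing check, since Resource classes may differ between specification versions
function isResourceLike(value) {
	return value !== null && typeof value === "object" &&
		typeof value.getPath === "function" &&
		typeof value.getBuffer === "function";
}

// A primitive (assumption: except symbol) or a Resource-like object
function isSerializable(value) {
	if (value === null || isResourceLike(value)) {
		return true;
	}
	return !["object", "function", "symbol"].includes(typeof value);
}

// Matches `{}` as well as `Object.create(null)`, but no class instances
function isFlatObject(value) {
	return value !== null && typeof value === "object" &&
		[undefined, Object].includes(value.constructor);
}

// Throws if the value returned by a processor does not match any of the three rules
function validateReturnValue(value) {
	if (isSerializable(value)) {
		return; // Rule 1: primitive or Resource
	}
	if (Array.isArray(value)) {
		if (value.every(isSerializable)) {
			return; // Rule 3: array without nesting
		}
	} else if (isFlatObject(value) && Object.values(value).every(isSerializable)) {
		return; // Rule 2: flat object without nesting
	}
	throw new TypeError("Task processor returned a value that cannot be serialized");
}
```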

Processors should be able to return primitives and `@ui5/fs/Resource` instances directly:
```js
return createResource({
	path: "resource/path",
	string: "content"
});
```

It should also be possible to return simple objects with primitive values or `@ui5/fs/Resource` instances:

```js
return {
	code: "string",
	map: "string",
	counter: 3,
	someResource: createResource({
		path: "resource/path",
		string: "content"
	}),
};
```

Alternatively, processors might also return a list of primitives or `@ui5/fs/Resource` instances:

```js
return [
	createResource({
		path: "resource/path",
		string: "content"
	}),
	createResource({
		path: "resource/path",
		string: "content"
	}),
	//...
];
```

#### Example

```js
/**
* Task Processor example
*
* @param {Object} parameters Parameters
* @param {@ui5/fs/Resource[]} parameters.resources Array of resources provided by the build task
* @param {@ui5/fs/Resource[]} parameters.resources Array of resources provided by the task
* @param {Object} parameters.options Options provided by the calling task
* @param {@ui5/fs/fsInterface} parameters.fs [fs interface]{@link module:@ui5/fs/fsInterface}-like class that internally handles communication with the main thread
* @returns {Promise<object|@ui5/fs/Resource[]>} Promise resolving with either a flat object containing Resource instances as values, or an array of Resources
* @param {@ui5/project/ProcessorResourceFactory} parameters.resourceFactory Helper object providing functions for creating and managing resources
* @returns {Promise<object|Array|@ui5/fs/Resource|@ui5/fs/Resource[]>} Promise resolving with a single Resource, a flat object containing primitives or Resource instances as values, or an array of primitives or Resources
*/
module.exports = function({resources, options, fs}) {
module.exports = function({resources, options, fs, resourceFactory}) {
// [...]
};
```

### Task Configuration


```yaml
specVersion: "3.3"
kind: extension
@@ -101,9 +164,18 @@ task:
computePi: lib/tasks/piProcessor.js
```


### Task API

Tasks defining processors in their `ui5.yaml` configuration shall be provided with a new `processors` object, allowing them to trigger execution of the configured processors.

The `processors.execute` function shall accept the following parameters:
* `resources` _(optional)_: Array of `@ui5/fs/Resource` instances if required by the processor
* `options` _(optional)_: An object with configuration for the processor.
* `reader` _(optional)_: An instance of `@ui5/fs/AbstractReader` which will be used to read resources requested by the task processor. If supplied, the task processor will be provided with an `fs` parameter to read those resources


The `execute` function shall validate that `resources` only contains `@ui5/fs/Resource` instances and that `options` adheres to the requirements stated in [Serializing Data](#serializing-data).

```js
/**
* Custom task example
@@ -119,22 +191,29 @@ task:
*/
module.exports = async function({workspace, options, processors}) {
const res = await processors.execute("computePi", {
resources: [workspace.byPath("/already-computed.txt")]
options: {
resources: [await workspace.byPath("/already-computed.txt")], // Input resources
options: { // Processor configuration
digits: 1_000_000_000_000_000_000_000
},
fs: fsInterface(workspace) // To allow reading additional files if necessary
reader: workspace // To allow the processor to read additional files if necessary
});
await workspace.write(res);
// [...]
};
```

### Serializing Data

In order to ensure all data supplied to and returned from a processor can be serialized correctly, the following checks must be implemented:

In case of an object, all property values and in case of an array, all values must be either [**primitives**](https://developer.mozilla.org/en-US/docs/Glossary/Primitive) (except `symbol`?) or **`@ui5/fs/Resource`** instances (do not use `instanceof` checks since Resource instances might differ depending on the specification version).

Note: Instances of `@ui5/fs/Resource` might lose their original `stat` value since it is not fully serializable. Any serializable information will be preserved however.
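
To illustrate what serializing a resource could look like, here is a rough sketch. The wire format (`path` plus `buffer`) and the function names are assumptions made for this example only; the RFC does not define them. Only `getPath()` and `getBuffer()` of `@ui5/fs/Resource` and a `createResource({path, buffer})`-style factory are relied upon, and `stat` information is dropped entirely in this simplified version.

```js
/**
 * Hypothetical: turns a Resource into a plain object that can be passed
 * between threads via postMessage.
 */
async function serializeResource(resource) {
	return {
		path: resource.getPath(),
		buffer: await resource.getBuffer() // Content as a Buffer
	};
}

/**
 * Hypothetical: re-creates a Resource on the receiving side.
 * `createResource` is supplied by the caller so that the Resource class
 * matching the relevant specification version is used.
 */
function deserializeResource({path, buffer}, createResource) {
	return createResource({
		path,
		// After a thread transfer the content arrives as a plain Uint8Array
		buffer: Buffer.from(buffer)
	});
}
```

Passing the factory in explicitly keeps this logic independent of a concrete Resource class, which fits the requirement to avoid `instanceof` checks.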

## How we teach this
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

**TODO**
* Documentation for custom task developers on how to decide whether a task should use processors or not. For instance depending on their CPU demand

## Drawbacks
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->
Binary file modified rfcs/resources/0014-task-workers.graffle
Binary file not shown.
Binary file modified rfcs/resources/0014-task-workers/Overview.png