Skip to content
This repository has been archived by the owner on Aug 31, 2018. It is now read-only.

worker: initial implementation (large/base PR) #58

Closed
wants to merge 15 commits into from

Conversation

addaleax
Copy link
Contributor

@addaleax addaleax commented Sep 15, 2017

(status: currently ready for review, by anyone, including you: #40 (comment))

'use strict';
const { Worker } = require('worker');

if (process.isMainThread) {
  module.exports = async function parseJSAsync(script) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, {
        workerData: script
      });
      worker.on('message', resolve);
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0)
          reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  const { parse } = require('some-js-parsing-library');
  const script = process.workerData;
  process.postMessage(parse(script));
}

(If that gives the wrong impression: No, this does not conform the browser WebWorker API, but that should be rather easily implementable on top of this, and I’m okay with that.)

Preliminary benchmarks: https://gist.github.com/TimothyGu/7fd2fbe15537a84963c36a3e0a03bcce
Fixes: #31

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
TODO
  • maybe turn the message passing mechanisms into standard MessageChannels
  • implement ArrayBuffer transferring
  • implement SharedArrayBuffer, uh, Sharing
  • implement MessagePort transferring
  • remove the need to call .start() manually for MessagePorts
  • add tests for MessageChannels
  • improve uncaught exception serialization/deserialization
  • fix garbage collection tracking through performance (currently pending the V8 6.1 update in upstream Node.js)
  • add async hooks tests for Workers
  • figure out how to best run as many parallel/ tests as possible from Workers (did that by basically taking @petkaantonov’s approach directly from petkaantonov/io.js@ea143f7)
  • figure out whether and how this can be integrated with the inspector
  • figure out a native addon story
  • make deserialization context for MessagePorts configurable
  • look out for memory leaks created by node internals not cleaning up on environment destruction
  • integrate v8::Platform implementation with multi-isolate support

(@petkaantonov please feel free to indicate whether attributing you for code that comes from your original PR is not enough/too much/just right :) )

@addaleax addaleax mentioned this pull request Sep 15, 2017
19 tasks
@YafimK
Copy link

YafimK commented Sep 17, 2017

Hi,
I am trying to compile the branch on my local windows machine and the compilation fails -

c:...\node_perf.h(44): error C2589: '(': illegal token on right side of '::' (compiling
source file src\node_worker.cc) [c:...\node.vcxproj]
c:...\node_perf.h(44): error C2062: type 'unknown-type' unexpected (compiling source fil
e src\node_worker.cc) [c:...\node.vcxproj]
c:...\node_perf.h(44):` error C2059: syntax error: ')' (compiling source file src\node_wo
rker.cc) [c:...node.vcxproj]

I have been able to compile clean version of Node.js without any issues on same machine before
Is compiling this branch on Windows is something that should already work or is it a future goal?

on a side note, not as a requirement or anything but as a mere suggestion for help :)
if you'd like, I'd be happy to help to setup another CI basing on AppVeyor so we'd be able to have continuously windows build ready.

@addaleax
Copy link
Contributor Author

I am trying to compile the branch on my local windows machine and the compilation fails -

@YafimK Hm – I can’t quite figure out what the problem here is :/ Does it help if you just remove line 44 in src/node_perf.h? It doesn’t seem to be used anywhere…

Is compiling this branch on Windows is something that should already work or is it a future goal?

I think this isn’t really about this branch, but about Ayo in general – we don’t have much CI support yet, so talking about support for certain platforms is somewhat hard (but also see #25)

if you'd like, I'd be happy to help to setup another CI basing on AppVeyor so we'd be able to have continuously windows build ready.

If you can make that happen, that would be absolutely awesome 💙

(Also, just fyi, I rebased this against the current latest branch – I don’t think that’s going to make a huge difference, but maybe try rebuilding now…)

@addaleax
Copy link
Contributor Author

@benjamingr @Qard I’ve moved everything new off the process object, PTAL

Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to look at the SharedArrayBuffer part more, but I really like the direction this is taking. Hats off to @addaleax!

doc/api/vm.md Outdated
@@ -310,6 +310,27 @@ console.log(util.inspect(sandbox));
// { globalVar: 1024 }
```

## vm.moveMessagePortToContext(port, context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/context/\0ifiedSandbox/


Sends a JavaScript value to the receiving side of this channel.
`value` will be transferred in a way
that is compatible with the [HTML structured clone algorithm][]. In particular,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Right. I think this is still what it’s known as, though? (That it’s implemented as a serialize + deserialize step should be more or less transparent to users)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax That article is pretty outdated though (it has no mention of MessagePort or SharedArrayBuffer)... HTML structured serialization/deserialization? maybe a link to v8.Serializer/Deserializer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it has no mention of MessagePort or SharedArrayBuffer)

That’s why these are mentioned explicitly here ;)

maybe a link to v8.Serializer/Deserializer?

Yeah, that’s a good idea … that should give people an idea of how to play around with the algorithm without spinning up message channels every time. I’ve added:

+For more information on the serialization and deserialization mechanisms
+behind this API, see the [serialization API of the `v8` module][v8.serdes].

-->

The `Worker` class represents an independent JavaScript execution thread.
Most Node APIs are available inside of it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ayo or Ayo.js.

Is Worker available in a worker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Worker available in a worker?

Yes. That’s also tested and the code is generally laid out to support such a structure, in part because there’s not really anything standing in the way of it

* Returns: {undefined}

Starts receiving messages on this `MessagePort`. When using this port
as an event emitter, this will be called automatically once `message` listeners
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'message'

<!-- YAML
added: REPLACEME
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What arguments does this event get? (value, transferList)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only value. We could try to add the transferList too, if that seems useful, but all of the relevant objects should be reachable through value anyway.

I don’t know if we have some standard format for saying “this events has one parameter that can be any JS value”, so I’m adding prose for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like the following is used. Basically the same format as function parameters.

### Event: 'event'

* `value` {any} The value

This event is fired when ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks – done!

MessagePort* port;
ASSIGN_OR_RETURN_UNWRAP(&port, args.This());
if (!port->data_) {
env->ThrowError("Can not send data on closed MessagePort");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Cannot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!


The `MessageChannel` has no methods of its own. `new MessageChannel()`
yields an object with `port1` and `port2` properties, which refer to linked
[`MessagePort`][] instances.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I've been thinking is that the messaging API could well be applicable to VM contexts too. Is that a correct understanding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a correct understanding, because the ports in a channel can be moved to different contexts

doc/api/vm.md Outdated
@@ -310,6 +310,27 @@ console.log(util.inspect(sandbox));
// { globalVar: 1024 }
```

## vm.moveMessagePortToContext(port, context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just vm.transferMessagePort()? The "to context" part is already conveyed by the fact that this method is in vm module, and "transfer" is a more Proper(R) name IMO than "move".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I like about move rather than transfer is that it conveys better that the original message port object is now rendered unusable … I don’t feel strongly about it, though


ExternalSABReference ReferenceCountedSAB::ForIncomingSharedArrayBuffer(
Environment* env, Local<Context> context, Local<SharedArrayBuffer> source) {
Local<Value> lifetime_partner;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aww... ;)

Local<FunctionTemplate> templ;
templ = env->message_port_constructor_template();
if (!templ.IsEmpty())
return templ->GetFunction(context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is GetFunction() cached by context?

Copy link
Contributor Author

@addaleax addaleax Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup – I have to admit I never got the point of {Function,Object}Templates before making this PR. However, in retrospect it’s a lot clearer: It’s a Template in the sense that it can be used to create functionally equivalent functions/objects in different contexts.

void AddSharedArrayBuffer(ExternalSABReference ref);
// Internal method of Message that is called once serialization finishes
// and that transfers ownership of `data` to this message.
void AddMessagePort(std::unique_ptr<MessagePortData>&& data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would require giving the SerializerDelegate that accesses these a publicly visibly identifier, right?

@addaleax
Copy link
Contributor Author

@TimothyGu I think I got everything you commented on so far? :)

@addaleax addaleax force-pushed the workers-impl branch 2 times, most recently from 43b4725 to 5d793a4 Compare September 19, 2017 18:13
Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more feedback. Still haven't read much code yet.

added: REPLACEME
-->

* Returns: {undefined}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* `value` {any}
* `transferList` {Object[]}

<!-- YAML
added: REPLACEME
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, refer to cluster here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I intended to refer to the comment of adding parameter types. Seems like github messed up the order :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah – in that case, done :)

}

void MessagePortData::Entangle(MessagePortData* a, MessagePortData* b) {
a->sibling_ = b;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add some CHECKs or even mutexes here making sure it's thread-safe, and only works when either port is not entangled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CHECKs for making sure that the ports aren’t entangled are fine. The method isn’t really required or supposed to be thread-safe, I’ve noted that in the documentation. (It’s only called by the thread that created both message ports.)

use them for I/O, since Ayo’s built-in mechanisms for performing operations
asynchronously already treat it more efficiently than Worker threads can.

Workers can also, unlike child processes, share memory efficiently by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can even explicitly call out clusters here.

-->

The `Worker` class represents an independent JavaScript execution thread.
Most Ayo APIs are available inside of it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention message passing as the primary means of communication between the worker thread and the thread that spawned the worker. Maybe something like:

Like [Web Workers] and the [cluster module], two-way communication can be achieved through inter-thread message passing. Internally, a Worker has a built-in pair of [MessagePort]s that are already associated with each other when the Worker is created. While the MessagePort objects are not directly exposed, their functionalities are exposed through [worker.postMessage()] and the ['message' event] on the Worker object for the parent thread, and [require('worker').postMessage()] and the ['workerMessage' event] on require('worker') for the child thread.

To create custom messaging channels (which is strongly encouraged over using the default global channel for more complex tasks), users can create a MessageChannel object on either thread and pass one of the MessagePorts on that MessageChannel to the other thread through a pre-existing channel, such as the global one.

[insert example]

See [port.postMessage()] for more information on how messages are passed, and what kind of JavaScript values can be successfully transported through the thread barrier.


The `MessageChannel` has no methods of its own. `new MessageChannel()`
yields an object with `port1` and `port2` properties, which refer to linked
[`MessagePort`][] instances.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a description of what MessageChannel is before saying what it has. Maybe:

The MessageChannel class represents an asynchronous messaging channel.

doc/api/vm.md Outdated
context.

Note that the return instance is *not* an `EventEmitter`; for receiving
messages, the `.onmessage` property can be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems rather crude to have some MessagePorts being EventEmitters, and some not. I understand you are still working on this part, and that the EventEmitter is implemented in JS complicates things. But I'd rather have it all one way or the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'd rather have it all one way or the other.

Me too. :) But like with the Buffers, it’s just a problem that not all contexts have a concept of EventEmitters … this would require making some internal modules multi-context ready, which I would like to consider out of scope for this PR. :) Also, this is already quite an advanced feature on its own.

* Extends: {EventEmitter}

Instances of the `worker.MessagePort` class represent an asynchronous
communications channel. It can be used to transfer structured data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/an asynchronous communications channel/an end(or port?) of a two-way asynchronous communications channel/


`transferList` may be a list of `ArrayBuffer` and `MessagePort` objects.
After transferring, they will not be usable on the sending side of the channel
anymore.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one of the main thing that confused me was how transferList was used. So now I understand that objects present in value (no matter how deep) that are also in transferList are transferred, while ordinarily they are cloned. I wasn't aware of the "ordinarily" part.

However, what if transferList has an object that is not present in value? Will it get neutered as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m adding clarifications for both points, thanks

communications channel. It can be used to transfer structured data,
memory regions and other `MessagePort`s between different [`Worker`][]s
or [`vm` context][vm]s.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please document if/when open() and close() need to be called. An example would be best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s hard to come up with examples for these … I’ve added to the start() documentation that it doesn’t need to be used unless moving-between-VM-contexts is used. (And, as you pointed out below, ideally it wouldn’t be necessary at all).

Regarding close() … I’m not entirely sure whether MessagePorts should be unref()ed by default or not. Right now, any MessagePort keeps an event loop open unless explicitly unref()ed or close()ed …

@addaleax addaleax force-pushed the workers-impl branch 2 times, most recently from 8b087d0 to a2095eb Compare September 19, 2017 22:33
@addaleax
Copy link
Contributor Author

@TimothyGu I think I got more or less everything … the Buffer thing really annoys me as well, I’ll think about it a bit.

Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making progress...

void Finish() {
for (MessagePort* port : ports_) {
port->Close();
msg_->AddMessagePort(std::move(port->Detach()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the std::move redundant if MessagePort::Detach already returns a std::move'd smart pointer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang complains:

../src/node_messaging.cc:325:28: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
      msg_->AddMessagePort(std::move(port->Detach()));
                           ^
../src/node_messaging.cc:325:28: note: remove std::move call here
      msg_->AddMessagePort(std::move(port->Detach()));
                           ^~~~~~~~~~              ~

Will fix.

}
Context::Scope context_scope(context);
MessagePort* target =
MessagePort::New(env, context, nullptr, std::move(port->data_));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This std::move(port->data_) should be port->Detach().

@TimothyGu
Copy link
Member

@addaleax I've completed making MessagePort consistently EventEmitter using the V8 Extras approach I talked about on Discord. See bb85e11 and 62a0263. The same approach could be taken to make Buffer available everywhere as well.

While I agree that it isn't the most in scope for this PR, it shows something pretty promising IMO.

@addaleax
Copy link
Contributor Author

@TimothyGu that is, indeed, very nice. 👏 The commits generally LGTM, I’ll pull them in here

@benjamingr
Copy link
Contributor

With the changes pulled this mostly LGTM - hopefully I'll be able to get some interesting benchmarks going.

Copy link
Member

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b170b46 LGTM other than minor suggestion to have isMainThread getter rather than threadId === 0 in a bunch of places.

@addaleax
Copy link
Contributor Author

Fwiw I “fixed” the V8 extras + events + internal errors issue by manually editing the error messages and tacking on .code properties … not pretty but I think this should be fine for most practical purposes

@addaleax addaleax changed the title worker: initial implementation worker: initial implementation (largs/base PR) Oct 19, 2017
@addaleax addaleax changed the title worker: initial implementation (largs/base PR) worker: initial implementation (large/base PR) Oct 19, 2017
addaleax and others added 15 commits October 22, 2017 21:58
Taken from petkaantonov/io.js@ea143f7
and modified to fit current linter rules and coding style.
Native addons need to use flags to indicate that they are capable
of being loaded by worker threads.

Native addons are unloaded if all Environments referring to it
have been cleaned up, except if it also loaded by the main Environment.
This should help a lot with actual sandboxing of JS code.
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Only allow `.js` and `.mjs` extensions to provide future-proofing
for file type detection.
$ ./ayo benchmark/cluster/echo.js
cluster/echo.js n=100000 sendsPerBroadcast=1 payload="string" workers=1: 26,709.154114687004
cluster/echo.js n=100000 sendsPerBroadcast=10 payload="string" workers=1: 15,936.350422945854
cluster/echo.js n=100000 sendsPerBroadcast=1 payload="object" workers=1: 20,778.85550744996
cluster/echo.js n=100000 sendsPerBroadcast=10 payload="object" workers=1: 10,912.260027807712
$ ./ayo benchmark/worker/echo.js
worker/echo.js n=100000 sendsPerBroadcast=1 payload="string" workers=1: 69,787.63926344117
worker/echo.js n=100000 sendsPerBroadcast=10 payload="string" workers=1: 32,544.630210444844
worker/echo.js n=100000 sendsPerBroadcast=1 payload="object" workers=1: 48,706.90345844702
worker/echo.js n=100000 sendsPerBroadcast=10 payload="object" workers=1: 18,088.639282621873
@p3x-robot
Copy link

does it mean that we are not use thread in nodejs? or jsut this branch is deleted?

@TimothyGu
Copy link
Member

@p3x-robot A newer version of this PR was merged into Node.js' master branch.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

please develop native threads that can load native modules with require and share/lock objects
10 participants