
Worker is not defined in Node v6 #27

Closed
Zodiase opened this issue May 1, 2017 · 23 comments
Zodiase commented May 1, 2017

I'm trying to run the example code like so:

const hamsters = require('hamsters.js');

hamsters.init({
  maxThreads: 10,
  cache: false,
  debug: false,
  persistence: true,
});

const foo = () => {
  var params = {
    'array':[0,1,2,3,4,5,6,7,8,9]
  };
  hamsters.run(params, function() {
      var arr = params.array;
      arr.forEach(function(item) {
        rtn.data.push((item * 120)/10);
      });
  }, function(output) {
     return output;
  });
};
foo();

but got error:

node_modules/hamsters.js/src/es5/hamsters.js:124
          hamsters.wheel.hamsters.push(new Worker(hamsters.wheel.uri));
                                           ^

ReferenceError: Worker is not defined
    at spawnHamsters (node_modules/hamsters.js/src/es5/hamsters.js:124:44)

Zodiase commented May 1, 2017

FYI, hamsters.js@4.0.0 works fine, while 4.1.0 throws the error.

@austinksmith (Owner)

Please take a look at #23. Node.js does not ship with a native web worker implementation, which is why you are seeing the `Worker is not defined` error. The library should be working in legacy fallback mode, assuming you haven't modified the package.json to point to the ES6 version; in that case you won't see the performance benefits you're looking for, since everything will still be executed on the main thread until you add a third-party worker implementation. I'll go through the library to ensure it's working properly in legacy mode for the next release. Let me know if that clarifies things for you.


Zodiase commented May 1, 2017

For some reason I was under the impression that on Node this library would use something other than Worker, such as child_process. I guess I drank too much coffee at night...

Since I don't see Web Workers coming to Node any time soon, it would be very beneficial to state this dependency more clearly in the documentation. Pointing to a preferred Worker implementation, at least for Node users, would be even better.

The current readme, this part:

Environment Support

  • All major browsers IE9+
  • Inside of existing web workers (threads inside threads)
  • Javascript shell environments
  • React Native
  • Node.js

reads as though the "web workers" requirement does not apply to Node.

@austinksmith (Owner)

The library does not make use of child_process, since that is not relevant to many of the places the library is executed. As the project maintainer, I didn't feel it necessary to explain what Node.js does and does not support out of the box, as anyone using Node.js should already understand what it ships with. That being said, the library definitely does function within Node.js, and future documentation will try to cover Node's shortcomings when it comes to native threading. Any third-party worker implementation that adheres to the Worker spec will function with Hamsters.js.

As I mentioned in #23, the third-party library recommendation I have at the moment is this one: https://www.npmjs.com/package/webworker-threads. Just make sure you define Worker before you define hamsters. Additionally, for the sake of sanity, do NOT declare hamsters as a constant, as I don't want immutability to interfere with the library's functionality; you can freely declare hamsters using either let or var, but you should avoid const.
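A minimal sketch of the load order described above. To keep the snippet self-contained, a stub constructor stands in for `require('webworker-threads').Worker`; with the real package installed you would assign that instead.

```javascript
// Define Worker globally BEFORE loading hamsters.js. The stub below is a
// stand-in for require('webworker-threads').Worker — a real implementation
// spawns a thread for the script at `uri`.
global.Worker = function Worker(uri) {
  this.uri = uri;
};

// Only now load the library, using var/let rather than const:
// var hamsters = require('hamsters.js');
console.log(typeof global.Worker); // "function"
```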


austinksmith commented May 1, 2017

I've gone ahead and pushed out v4.1.1, please update to v4.1.1 and your issues should be resolved.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.1


Zodiase commented May 1, 2017

Same as csterritt said in #23 (comment), I'm not able to get hamsters.js@4.1.0 working with webworker-threads in Node.

There's an issue blocking the installation of webworker-threads on Node v7.7.0, so I'm using Node v6.10.2.

with these modules:

  "dependencies": {
    "hamsters.js": "^4.1.0",
    "webworker-threads": "^0.7.11"
  }

Case 1

Without webworker-threads, it predictably complains that Worker is not defined.

Case 2

With require("webworker-threads");, it still says Worker is not defined.

Case 3

With global.Worker = require("webworker-threads").Worker;, the main process runs non-stop and never ends, at least as long as I've waited (definitely longer than a minute, at which point I killed it). The job should finish in under 1 ms when run sequentially...

Do you know what might be going on? Since you claim the library supports Node, it must have worked for you. With what combination of packages and platforms did you successfully achieve parallelism?

@austinksmith (Owner)

Please create a new ticket for the new issue you're encountering, v4.1.1 solves the problem related to this ticket.

@austinksmith (Owner)

Reopening this, I mixed it up with the other one you had open.

austinksmith reopened this May 3, 2017
@austinksmith (Owner)

I'm implementing a fix, to be released in 4.1.2, which will allow this to work with any third-party implementation; the solution I had previously isn't working currently. If you use 4.1.1 for now, you can write your logic and have it in place for when 4.1.2 drops.

@austinksmith (Owner)

@Zodiase take a look at this release please and let me know if you encounter more issues.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.2

@austinksmith (Owner)

@Zodiase this issue will close within 48 hours without a response.


Zodiase commented Jun 1, 2017

Process still won't terminate or yield any result. Tested with node v6.10.2 and this project setup: https://gist.github.com/Zodiase/3c8b9d9862ec3ac9c486ec75b99f341a

@austinksmith (Owner)

@Zodiase I've gone ahead and used your example and resolved the issue you are mentioning; it looks like a change I made back in v3.9.* caused the recent incompatibilities. This is now resolved, and I've successfully tested your example with this release. Please give v4.1.3 a try and report back, thanks.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.3


Hizoul commented Jun 3, 2017

(Sorry for barging in; although I'm not Zodiase, I did encounter the same problem with previous builds, but since this issue was already open I did not report a duplicate.)
I can confirm that v4.1.3 fixed the issue of not being able to use Node web worker implementations.
I confirmed this using Zodiase's example and with custom code.
webworker-threads works, but I ended up using tiny-worker, which also works and supports require() statements.
Tested on Linux with Node v7.10.

@austinksmith (Owner)

That's great to hear @Hizoul, as I had only tested it with webworker-threads so far; I think the implementation should be fairly consistent across the board. In the future, don't worry about commenting on an open issue — the more people able to shed light on an issue, the better. If you don't mind me asking, what is your current use case for the library, and what performance are you seeing in Node? Node is the least tested and least performance-profiled platform for the library, as support has been a bit of a hassle to get right.


Hizoul commented Jun 5, 2017

Well, my use case is rather an edge case for now: just playing around with isomorphic multi-core usage in JS for some existing point calculations. In the beginning I tried hamsters.running everything, but realized very quickly that if you do not use TypedArrays, copy times are a HUGE bottleneck. Meaning single-threaded Node is much faster than calling hamsters.run 1000x with huge objects that don't take advantage of TypedArrays. The code I now have could be achieved with Node's cluster as well, but the goal was portable code that could theoretically run on web/RN as well (which I actually don't need yet) :P
So in the end, the classic arc: from "omg this library looks awesome, it will solve all performance problems there might be, beautiful site, fast benchmark, readable docs, etc." to "the lib is awesome, but my data structures and the way I process them might not be the best fit for this kind of parallelization, and the library does not auto-solve all my problems" :P
But even with non-intended usage, I did drop from 20 to 18 seconds, and I'm confident that bigger data will yield even more improvement, even with the increased copying times.

Here's an abstract of my Usage:

const users = [] // ~2k users
const userActions = [] // ~20k action objects
hamsters.run({array: users, userActions}, () => {
  for (let user of params.array) {
    rtn.data.push(makeScoreObjectForUser(user, params.userActions))
  }
}, (res) => {
  // synchronous sort + insert to db
}, hamsters.maxThreads, true)

@austinksmith (Owner)

Yes, serialization can be a pretty big bottleneck. You might be able to get around this by manually stringifying your data and parsing it within a thread; in some browsers that provides a pretty decent performance boost. In older versions I was manually creating array buffers to pass normal arrays using transferables, but I don't think that will get you past the userActions array: since it contains objects, I'm not sure it's as easy to simply pass a reference over. I might push a release today that restores that array buffer functionality, if you are willing to report back on whether or not it was a good improvement.
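The manual stringify/parse idea above can be sketched like this. It is a toy illustration, not the Hamsters.js API: the thread boundary is simulated, and the point is that a single flat string crosses it instead of a large object graph being walked by structured clone.

```javascript
// Serialize once on the main thread, pass one string across the
// worker boundary, parse once inside the thread body.
const userActions = [{ id: 1, score: 3 }, { id: 2, score: 5 }]; // stand-in data
const payload = JSON.stringify(userActions); // one flat string crosses the boundary

// ...inside the thread function you would then do:
const actions = JSON.parse(payload);
console.log(actions.length); // 2
```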

const users = new Float32Array([]) // ~2k users
const userActions = [] // ~20k action objects
hamsters.run({array: users, userActions}, () => {
  for (let user of params.array) {
    rtn.data.push(makeScoreObjectForUser(user, params.userActions))
  }
}, (res) => {
  // synchronous sort + insert to db
}, hamsters.maxThreads, true, null, null, 'ascAlpha');

You might be able to have the library do your sorting for you by making use of https://github.com/austinksmith/Hamsters.js/wiki/Sorting. And if it doesn't impact your input data, making any of your input arrays a typed array will bring significant performance improvements, even if you are not defining dataType to change your output array.

The library has some growing pains, and I'm working on improving the way it's used to make less easily parallelizable tasks more performant. On a side note, you might try switching your for (let user of params.array) into a native for (var i = 0; ...) loop, as a native for loop is an order of magnitude faster than the one you are using right now. You can also probably lower the number of threads you are scaling across, either in your function or in your initialization, to a reasonable count like 2 or 4, and see performance improvements: you will spend less time communicating between threads and more time executing logic within a thread, which helps maximize the gain you can see.
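The suggested loop rewrite, applied to the thread body from the original repro (a sketch only — `params` and `rtn` are stand-ins for the objects hamsters.js exposes inside a thread):

```javascript
// Indexed for loop instead of for..of, as suggested above.
const params = { array: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] };
const rtn = { data: [] }; // stand-ins for hamsters.js thread-scope objects
for (var i = 0; i < params.array.length; i++) {
  rtn.data.push((params.array[i] * 120) / 10);
}
console.log(rtn.data); // [0, 12, 24, 36, 48, 60, 72, 84, 96, 108]
```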


austinksmith commented Jun 5, 2017

Just to provide some more clarity: the input array in your params object is going to be split into smaller chunks, and each thread will receive a small chunk of that input array to work with. The problem is that your userActions array will not be broken into smaller pieces; it will instead be duplicated to every thread, so your serialization costs grow with every thread you add. Keeping it to 2 or 4 threads will keep that cost at a reasonable level.

So basically, with 1 thread, userActions costs 1 serialization to the thread and 1 serialization back from it. Multiply that by the number of threads: at 8 threads you are paying that serialization cost 16 times in total.
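The cost model described above can be sketched as follows. The `chunk` helper is hypothetical, written only to illustrate the split — it is not part of Hamsters.js.

```javascript
// The input array is split into per-thread chunks, while a shared payload
// like userActions is copied whole to every thread.
function chunk(array, threads) {
  const size = Math.ceil(array.length / threads);
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

const users = Array.from({ length: 8 }, (_, i) => i);
console.log(chunk(users, 4)); // [[0,1],[2,3],[4,5],[6,7]]

// Each thread pays 1 serialization in + 1 out for the shared payload:
const threads = 8;
console.log(threads * 2); // 16 total serializations of userActions
```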

@austinksmith (Owner)

This limitation is only temporary: once JavaScript adds support for shared memory and atomics, we will be able to completely avoid the cost of duplicating that data to every thread. It's an unfortunate side effect of just how poorly (in my opinion) the worker features in JavaScript were implemented, which likely stemmed from a small-minded belief system that is very much against the idea of threading in general... why people like Douglas Crockford hate multithreading and made it as difficult as possible is honestly beyond me.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics

@Hizoul
Copy link

Hizoul commented Jun 5, 2017

Wow, thank you very much for the very detailed explanation and performance improvement suggestions. I'm very happy to see such detailed help, and I agree with everything you say.
Serialization is a good pointer; I also saw the background stringify and parse methods in the docs.

Unfortunately, makeScoreObjectForUser returns a very deeply nested structure ({a: {b: {c: 3}}}) with varying keys per user. All those numbers (per deepest key) are then aggregated into an array at that deep key again, synchronously in the callback function; those small arrays are also sorted (and then tagged with a position value, which is why I sort), converted to db objects, and inserted. Since that last step in the callback only takes about 1 sec, there's no need for improvement there; the initial filtering + calculation, which I'm now splitting across 4 rather than 1 "thread", is the slow part.
I am aware of the way data splitting works, and that passing this huge userActions array is dumb for "threading" performance.
I'm iterating fully through that array for each user, filtering relevant actions, and while filtering (for speed purposes) already calculating the number into the nested structure. (I'm not doing that in the database itself because my queries were too complicated and the db became unresponsive for normal usage during production, which was not acceptable; that's why I'm trying to do everything purely in Node. There's also more than just userActions, because I also search other db collections in makeScoreObjectForUser. Like I said, my usage of the library is horrible and not at all the way it was intended, and cloning MongoDB into RAM is also something that does not scale at all and will explode in my face. I'm just splitting up at the very beginning across my CPUs, and happy I'm now reasonably fast :P)

I'm also very probably overcomplicating the data structuring and all. But it works now, and this is the fourth reimplementation, going from:

  1. ~3h
  2. ~30min
  3. ~3min
  4. ~20sec (after a few runs, thanks to the JIT, it even goes down to ~9-13 secs)

runtime for the full calculation (2 was already directly in the db, and in 3 I made all MongoDB find calls in parallel with complicated queries, which is why it locked up), so for now there's no need for further improvements :/

Regarding the growing pains, I found the library quite easy to understand and easy to apply to existing code. I have worked with web workers directly before, so that probably helped. I guess some people might be confused about:

  1. the scoping of the executed function
  2. why one has to aggregate the data into a single array (because web workers don't share a heap and use messaging), when one usually could aggregate directly into a custom structure via locking

And I do agree, JavaScript "threading" with web workers is unnecessarily complicated. It has always been a hassle, and Node's cluster also sucks because you're executing the same code twice and need to check for Cluster.isMaster x) (although I think I recently saw a webworker-like API for clustering among Node's own APIs).
I'm coming from a Java background, where threading makes sense since the heap stays the same. But I do also think that things like goroutines are better and easier to use.

And regarding atomics, thanks for the pointer; I was not aware that was coming, but it sounds like a nice feature and I'll keep an eye out for it.

Yes, the web worker API is very crippled. I think that might also be because it was invented at a time when browsers tried to limit their capabilities while still enhancing available APIs, unlike now, when they try to let the web operate as a unified operating-system API (e.g. Battery Status 🤢).


Zodiase commented Jun 5, 2017

@austinksmith v4.1.3 works yay :D.

Although I noticed one minor issue: in the example I provided previously, hamsters.init set maxThreads: 3, but the thread count argument was omitted in hamsters.run, and from examining the logs it appears only one thread is used. I'm not sure of the purpose of configuring maxThreads in hamsters.init; my assumption was that if the thread count argument in hamsters.run is omitted, maxThreads would be used. There's not much about maxThreads in the documentation. (This doesn't really prevent me from using the library, but I thought I'd report it in case it was caused by some unintended change during these patches.)

@austinksmith (Owner)

@Zodiase The maxThreads you define is a global setting limiting the maximum number of threads the library will use within its thread pool. If you set maxThreads to 4 in hamsters.init and later call hamsters.run with a thread count of 16, the library will still split your work into 16 smaller pieces, just as it would with 16 threads available, but it will queue the remaining work and keep the 4 threads busy until all of it is complete.

If you do not define a thread count in your hamsters.run call, the library defaults to a thread count of 1 for that function; it will not automatically scale a function to the maximum number of threads. The per-call thread count is an option exposed to let you change thread use at the individual function level.

This also allows the library to scale work across any number of clients regardless of their logical core count. Since maxThreads defaults to the logical core count detected on the machine if you do not modify it in hamsters.init, there is no reason to tie specific functions to that value unless you as the developer want it that way. You can define a function to run across 200 threads while limiting maxThreads to 1, and the library will pool all that work and keep that single thread fully occupied until all 200 tasks have completed.
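A toy synchronous model of the pooling behavior described above (not the library's actual scheduler): work is split into `taskCount` pieces, but at most `maxThreads` "threads" drain the queue, so even 200 tasks against a single-thread pool all complete.

```javascript
// Toy model: taskCount chunks of work drained by maxThreads workers
// in round-robin order, mimicking the pooling behavior described above.
function poolModel(taskCount, maxThreads) {
  const queue = Array.from({ length: taskCount }, (_, i) => i);
  const perThread = Array.from({ length: maxThreads }, () => []);
  let turn = 0;
  while (queue.length > 0) {
    perThread[turn % maxThreads].push(queue.shift());
    turn++;
  }
  return perThread;
}

console.log(poolModel(16, 4).map(t => t.length)); // [4, 4, 4, 4]
console.log(poolModel(200, 1)[0].length);         // 200: one thread drains everything
```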

@austinksmith (Owner)

I'm going to close this issue ticket, thanks @Zodiase and @Hizoul
