-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Help me understand how to use this. #28
Comments
Testing in my console shows that using const foo = () => {
var params = {
'array':[0,1,2,3,4,5,6,7,8,9]
};
hamsters.run(params, function() {
var arr = params.array;
arr.forEach(function(item) {
rtn.data.push((item * 120)/10);
});
}, function(output) {
console.log(output);
}, 10);
};
foo(); Is exactly what it should be which is an array containing 10 subarrays as your final output, since you have not asked the library to aggregate your results back together. If you want to have a single output you need to change your logic to const foo = () => {
var params = {
'array':[0,1,2,3,4,5,6,7,8,9]
};
hamsters.run(params, function() {
var arr = params.array;
arr.forEach(function(item) {
rtn.data.push((item * 120)/10);
});
}, function(output) {
console.log(output);
}, 10, true);
};
foo(); As far as
This isn't the case, your array assuming you are defining multiple threads WILL be split across as many threads as you have specified, the same operation you defined before will be executed across all items in the array regardless of what thread they make use of. Inspecting http://hamsters.io/performance while pasting your function into the console shows me that each thread is in fact getting a subarray equal to
That's exactly what is illustrated above, you've just forgotten to tell the library you want a single output. |
Sorry maybe I didn't make it clear in my first statement. By "some logging" I meant more const foo = () => {
var params = {
'array':[0,1,2,3,4,5,6,7,8,9]
};
hamsters.run(params, function() {
console.log('params', params);
var arr = params.array;
arr.forEach(function(item) {
console.log('run');
rtn.data.push((item * 120)/10);
});
}, function(output) {
console.log(output);
}, 10);
};
foo(); I saw |
As of the log from the output function, for me that was: [ [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ] ] while I was expecting something like: [ [ 0 ],
[ 12 ],
[ 24 ],
[ 36 ],
[ 48 ],
[ 60 ],
[ 72 ],
[ 84 ],
[ 96 ],
[ 108 ] ] |
You need to provide your entire example source for me to understand why you are seeing different behavior from both the provided benchmark example logic and jasmine tests. const foo = () => {
var params = {
'array':[0,1,2,3,4,5,6,7,8,9]
};
hamsters.run(params, function() {
var arr = params.array;
arr.forEach(function(item) {
rtn.data.push((item * 120)/10);
});
}, function(output) {
console.log(output);
}, 10, true);
}; Using As mentioned before, you are not telling your output to be aggregated into a single output, pay special attention to #5 on the how it works section of the read me.
Additionally the documentation does not say to declare hamsters as a constant, the hamsters object should never be treated as immutable because the library modifies it self during runtime. var hamsters = require('hamsters.js'); |
I'm aware of that and I didn't expect the result to be aggregated (as I plan to do the aggregation myself). What I was saying was the result contains redundant data that apparently is produced by redundant work. I'll use the aggregation flag, hopefully it's easier for you to see what I mean. I'm printing the amount of data each thread received, as well as the amount of work cycles they did. With 2 threads, the total amount of work cycle should always be the input data size, which is 10, but we are seeing 20 cycles (runs). Also, with the aggregation flag set, the output contains redundant data. |
So I've managed to reproduce this issue only under the following conditions
My debugging so far leads me to believe this is an inheritance issue and a race condition based on the time slicing behavior of In the mean time my recommendations are to use a 3rd party worker implementation with Node.js as this is only going to affect the legacy mode of the library and only within Node or browsers that do not support web workers eg. IE9. Thanks for spotting this, it's very hard to debug multithreaded logic so understand my requests for more info are because I wasn't seeing the problem until I recreated every condition of your setup. |
I'm absolutely glad to help. I was looking for easy-to-use node parallelism solutions and came across this library and I think it is very promising. By the way, could you comment on my side note there? I'd assume there's some limitation that I'm not aware of forcing you to design the API this way but wouldn't it easier for people to understand to have the thread function signature like this: function threadFunc (params, report) {
// Do work here with data provided in `params`.
var data = params.array.map((v) => v + 1);
// Send results back with the `report` function.
report(data);
} Or simply: function threadFunc (params) {
// Do work here with data provided in `params`.
var data = params.array.map((v) => v + 1);
// `return`ed data is automatically collected by the main process.
return data;
} I know in the second case it's probably harder to allow async work, in which case |
I've gone ahead and pushed out v4.1.1, please update to v4.1.1 and your issues should be resolved. https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.1 It's unfortunately not possible to do it that way unless they added that functionality as a first class part of the language. |
Using
hamsters.js@4.0.0
, I was able to run the example code. However I don't think it's making any sense to me. Also since I'm not seeing the result I'm expecting, I don't think this library is working properly.Suppose running this code:
I was expecting the workload to be distributed among these 10 threads. But after some console logging, I see they are getting the exact same array and they are doing the exact same work, just now 10 times. How does that help? Maybe the example code just doesn't explain this but I was expecting either:
Could you help me understand how this library should be used? If you need an example, say we have a big array of size
n
(n
> 10k) of positive integers and we are trying to find(x) => x * (x - 1) * (x - 2) * ... * 2 * 1
for every item in the array. Since the data items have no dependency on each other, ideally we could spawnn
threads where each one would grab one number and start calculating. Could you illustrate how this could be done with this library?Thanks a lot.
Side note: Since
params
andrtn
are special "internal" variables, why not pass them in this way so it's easier to understand?The text was updated successfully, but these errors were encountered: