Synchronous Engine is not correct. #64

Closed
thinxer opened this issue May 17, 2013 · 15 comments

@thinxer
Contributor

thinxer commented May 17, 2013

Results from the synchronous engine should be the same whether it runs single-threaded or multithreaded, since it reads node data only from the previous iteration. However, this is not the case for the pagerank test, which indicates that the synchronous engine is not correct. Please investigate this problem.
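
To make the expectation concrete, here is a minimal sketch of a synchronous PageRank step, assuming a double-buffered layout in which reads come only from the previous iteration and writes go to a separate array. The graph, the function names, and the damping value are hypothetical and not taken from the engine's code.

```python
import random

def sync_pagerank_step(in_edges, out_degree, prev_rank, order, damping=0.85):
    """One synchronous step: read only prev_rank, write only next_rank."""
    n = len(prev_rank)
    next_rank = [0.0] * n
    for v in order:
        # Gather in a fixed neighbor order, so the floating-point sum for v
        # does not depend on when v is visited.
        total = sum(prev_rank[u] / out_degree[u] for u in in_edges[v])
        next_rank[v] = (1.0 - damping) / n + damping * total
    return next_rank

# Hypothetical 4-vertex graph: in_edges[v] lists the sources of edges into v.
in_edges = [[1, 2, 3], [0], [0, 1], [2]]
out_degree = [2, 2, 2, 1]

def run(order_seed, iterations=20):
    """Run PageRank, visiting vertices in a different shuffled order each step."""
    rng = random.Random(order_seed)
    rank = [0.25] * 4
    for _ in range(iterations):
        order = list(range(4))
        rng.shuffle(order)
        rank = sync_pagerank_step(in_edges, out_degree, rank, order)
    return rank

# Different visiting orders give identical ranks under this layout.
print(run(1) == run(2))
```

If the engine follows this pattern, any difference between single-threaded and multithreaded results would point to a read or write that escapes the two-buffer discipline.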

@ghost ghost assigned wweic May 17, 2013
@wweic
Contributor

wweic commented May 17, 2013

@thinxer I can't reproduce the failure on Mac. It's weird. I ran it about 50 times manually and every run passed. Have you updated from upstream?

@thinxer
Contributor Author

thinxer commented May 17, 2013

Have you merged the pagerank test with your multithreaded code and tested that?

@wweic
Contributor

wweic commented May 17, 2013

@thinxer Yes. Weird.

@wweic
Contributor

wweic commented May 17, 2013

@thinxer Could you try testing on the multithreaded branch too?

@thinxer
Contributor Author

thinxer commented May 20, 2013

I just merged your thread_pool branch with upstream/master, and the pagerank_test wouldn't pass.

I reproduced this error today.

@wweic
Contributor

wweic commented May 20, 2013

Weird.

Could you try my branch directly? Just check out my thread-pool branch and test.


@thinxer
Contributor Author

thinxer commented May 20, 2013

Well, you don't have pagerank_test on your branch...

@wweic
Contributor

wweic commented May 20, 2013

Oops.

My master branch is up to date, is non-threaded, and has pr_test.

I ran `repeat 100 ./pagerank_test`. No errors.


@thinxer
Contributor Author

thinxer commented May 20, 2013

It's correct because it's single-threaded.

The problem can only show up with multithreading, since the execution order of vertex programs is then not deterministic.
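
As a rough illustration of why execution order matters, the sketch below compares a double-buffered update with an in-place update on a tiny graph. It is a hypothetical example, not the engine's code: the `step` function, the 3-vertex cycle, and the starting ranks are all assumptions made for the demonstration.

```python
def step(in_edges, out_degree, rank, order, in_place, damping=0.85):
    """Update every vertex once, visiting vertices in the given order."""
    n = len(rank)
    current = list(rank)                       # values written during this step
    read_from = current if in_place else rank  # in-place reads see fresh writes
    for v in order:
        total = sum(read_from[u] / out_degree[u] for u in in_edges[v])
        current[v] = (1.0 - damping) / n + damping * total
    return current

# Hypothetical 3-vertex cycle 0 -> 1 -> 2 -> 0; in_edges[v] lists sources into v.
in_edges = [[2], [0], [1]]
out_degree = [1, 1, 1]
start = [0.5, 0.3, 0.2]
forward, reverse = [0, 1, 2], [2, 1, 0]

buffered_same = (step(in_edges, out_degree, start, forward, in_place=False)
                 == step(in_edges, out_degree, start, reverse, in_place=False))
in_place_same = (step(in_edges, out_degree, start, forward, in_place=True)
                 == step(in_edges, out_degree, start, reverse, in_place=True))
print(buffered_same, in_place_same)  # prints: True False
```

A single-threaded run always uses one fixed order, so an accidental in-place read would go unnoticed there and only surface once threads reorder the vertex programs.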


@wweic
Contributor

wweic commented May 20, 2013

Yes, I'll fix that.

I suspect the cause is the implementation of thread_pool's join. All applies are executed after the gathers and before the scatters, so even a change to vertex data should be isolated from other vertices.
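
The phase ordering described here can be sketched with a standard thread pool, where each phase must finish completely before the next one starts. This is an assumption-laden illustration using Python's `concurrent.futures`, not the project's thread_pool; `run_iteration`, `pagerank`, and the graph are hypothetical names and data.

```python
from concurrent.futures import ThreadPoolExecutor

def run_iteration(pool, in_edges, out_degree, rank, damping=0.85):
    """One gather/apply iteration with a barrier between the two phases."""
    n = len(rank)

    def gather(v):
        # Reads only the previous iteration's ranks.
        return sum(rank[u] / out_degree[u] for u in in_edges[v])

    def apply(total):
        return (1.0 - damping) / n + damping * total

    # list() drains the result iterator, so it acts as the join: no apply
    # starts until every gather has finished.
    totals = list(pool.map(gather, range(n)))
    new_rank = list(pool.map(apply, totals))
    # A scatter phase would follow here, again only after all applies are done.
    return new_rank

# Hypothetical 4-vertex graph: in_edges[v] lists the sources of edges into v.
in_edges = [[1, 2, 3], [0], [0, 1], [2]]
out_degree = [2, 2, 2, 1]

def pagerank(workers, iterations=20):
    rank = [0.25] * 4
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            rank = run_iteration(pool, in_edges, out_degree, rank)
    return rank

print(pagerank(1) == pagerank(4))
```

If a join returned before all tasks of a phase had completed, an apply could overlap with a gather that is still reading, which is the kind of fault suspected above.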


@wweic
Contributor

wweic commented May 20, 2013

This branch is an attempt to fix it: https://github.com/pondering/saedb/tree/fix-syn-engine

I've now found that exeGather may have a problem, but I suspect the real issue is with the OS's memory-mapped files.

@thinxer
Contributor Author

thinxer commented May 20, 2013

It's working on Linux.


@wweic
Contributor

wweic commented May 20, 2013

What do you mean by "working"?

I commented out all the parallel code except executeInits so that I can inspect the suspicious parts one by one. I can't find a bug in the executeInits function, but when I run ./pagerank_test 5000 times, there is still 1 failure.

Do you have any idea about this issue?
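
For a failure this rare, it may help to record which runs fail rather than watch the output by hand. Below is a minimal sketch of such a runner, assuming the test binary signals failure through a non-zero exit status; `count_failures` is a hypothetical helper, and the demo uses a child Python process instead of `./pagerank_test` so that it runs anywhere.

```python
import subprocess
import sys

def count_failures(command, runs):
    """Run `command` `runs` times; return the indices of runs that exited non-zero."""
    failures = []
    for i in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(i)
    return failures

# Demo with a stand-in command: a child Python process that always exits with 1.
always_fails = [sys.executable, "-c", "raise SystemExit(1)"]
print(len(count_failures(always_fails, 3)), "of 3 runs failed")
```

For the case in this thread the call would be along the lines of `count_failures(["./pagerank_test"], 5000)`, assuming the binary is in the current directory.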


@thinxer
Contributor Author

thinxer commented May 21, 2013

You mean that even the single-threaded engine has a hard-to-reproduce bug? (1 in 5000)

@wweic
Contributor

wweic commented May 21, 2013

I'm not sure.

The single-threaded program should not have the problem, and I haven't found a failure with it so far.

But when I make only executeInits multithreaded, failures appear.
