Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor wallclock and rusage #42

Merged
merged 32 commits into from
Nov 10, 2015
Merged

Conversation

ineiti
Copy link
Member

@ineiti ineiti commented Oct 28, 2015

Removed logservers and replaced with monitor-system
Instead of logservers, nodes report directly to the 'users' on users.deterlab.net
All times are measured for wallclock and rusage simultaneously
TODO: adapt matplotlib

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

deploy -platform localhost should be the default

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

simulation/naive_multi.toml gives
Peers=12,ppm=4,machines=3 in test_data/naive_multi.csv, but should be
Peers=12,ppm=12,machines=1 for localhost-simulation

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

simulation/ntree_multi.toml has too much debugging-output in level 1 and shows connection-errors - perhaps they should only be shown when the connection is definitively not possible

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

simulation/shamir_multi.toml gives Got unknown suite

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

simulation/sign_multi.toml gives
failed to listen: listen tcp4 0.0.0.0:3000: bind: address already in use when starting 256 applications

-> perhaps needs to wait for the 128 to be finished definitively before starting 256?

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

simulation/stamp_multi.toml reports exceed the max round: terminating 10 >= 10

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

remove key.priv and key.pub and put them in .gitignore

@nikkolasg
Copy link
Contributor

deploy -platform localhost should be the default

Done

simulation/naive_multi.toml gives
Peers=12,ppm=4,machines=3 in test_data/naive_multi.csv, but should be
Peers=12,ppm=12,machines=1 for localhost-simulation

With our actual code infrastructure it is not a trivial task (by trivial one change in one function) since it is not the platforms that are starting the monitoring, so the monitoring receive the RunConfig before the platform and write its header with the initial values. In order to do that, the best way would be to change the platform so it launch itself the monitoring. But anyway, do we really need that ? We are not plotting machines or ppm neither do we plan on doing some extensive benchmarking on localhost platform. If someone use localhost, it knows it has only one machine.

@ineiti
Copy link
Member Author

ineiti commented Oct 30, 2015

You could add an argument ‘force_machines’ to monitor::NewStats which is set to 1 in the case of localhost.

On 30 Oct 2015, at 15:49, nikkolasg notifications@github.com wrote:

deploy -platform localhost should be the default

Done

simulation/naive_multi.toml gives
Peers=12,ppm=4,machines=3 in test_data/naive_multi.csv, but should be
Peers=12,ppm=12,machines=1 for localhost-simulation

With our actual code infrastructure it is not a trivial task (by trivial one change in one function) since it is not the platforms that are starting the monitoring, so the monitoring receive the RunConfig before the platform and write its header with the initial values. But anyway, do we really need that ? We are not plotting machines or ppm neither do we plan on doing some extensive benchmarking on localhost platform. If someone use localhost, it knows it has only one machine.


Reply to this email directly or view it on GitHub #42 (comment).

@nikkolasg
Copy link
Contributor

Added a hack for that but I still disagree this is needed as the peer number represents what we need in localhost, and for the above reasons as well...
For the 256 it was because it was running 256 * 8 processus starting from the port 2000 with an increment of 10 or 5, so the default monitor port of 3000 was already taken ... ;)

@nikkolasg
Copy link
Contributor

simulation/stamp_multi.toml reports exceed the max round: terminating 10 >= 10

Indeed, but apparently this issue exists as well in the development branch. I will later open another branch for that problem then.

@ineiti
Copy link
Member Author

ineiti commented Nov 4, 2015

Still remaining:

simulation/ntree_multi.toml has too much debugging-output in level 1 and shows connection-errors - perhaps they should only be shown when the connection is definitively not possible.

simulation/shamir_multi.toml gives Got unknown suite

Afterwards it looks OK to merge - that would be great!

@nikkolasg
Copy link
Contributor

Done.
For shamir, I have another error is "could not verify schnorr signature :/ Signature could not have been verified against the message".
Same in development. I will try to fix that in a new branch ( it should be trivial, I think it is because of the suite used). It is because you still have the old shamir in your crypto package and not the corrected one.

@nikkolasg
Copy link
Contributor

If it is ok for you this branch, go ahead for merging. I'll look into shamir + stamp problem

@ineiti
Copy link
Member Author

ineiti commented Nov 6, 2015

On 06 Nov 2015, at 20:33, nikkolasg notifications@github.com wrote:

Done.
For shamir, I have another error is "could not verify schnorr signature :/ Signature could not have been verified against the message".
Same in development. I will try to fix that in a new branch ( it should be trivial, I think it is because of the suite used). It is because you still have the old shamir in your crypto package and not the corrected one.

I don’t understand how the ‘old shamir in your crypto package’ can make it fail - can you be more explicit, please?

@nikkolasg
Copy link
Contributor

We just have different errors because we have different crypto versions. I dont know what your error is due to since I have the corrected version. Pull it and then we'll be on the same page !

@ineiti
Copy link
Member Author

ineiti commented Nov 6, 2015

Ok, now I understand. There's a fine line between being too concise and too long. short things can be difficult to understand, long things... TLDR...

A "regarding the bug in app/shamir, please update dedis/crypto to the latest Shamir branch and test again" could be well in the vicinity of that fine line..

Linus

Envoyé depuis un mobile Samsung

-------- Original message --------
Subject: Re: [cothority] Monitor wallclock and rusage (#42)
From: nikkolasg notifications@github.com
To: DeDiS/cothority cothority@noreply.github.com
CC: Linus Gasser linus.gasser@epfl.ch

We just have different errors because we have different crypto versions. I dont know what your error is due to since I have the corrected version. Pull it and then we'll be on the same page !


Reply to this email directly or view it on GitHub.

@ineiti ineiti mentioned this pull request Nov 7, 2015
@nikkolasg nikkolasg assigned ineiti and unassigned nikkolasg Nov 9, 2015
@ineiti ineiti assigned nikkolasg and unassigned ineiti Nov 10, 2015
ineiti added a commit that referenced this pull request Nov 10, 2015
@ineiti ineiti merged commit b41200b into development Nov 10, 2015
@ineiti ineiti deleted the replace_wallclock_with_rusage branch November 10, 2015 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants