New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes for orchestra.daemon to work with systemd #934
Conversation
@ktdreyer @athanatos once this works we can replace half of the install task with ceph-deploy and other half with ceph-ansible that will do more customer like testing for thrash testcases Currently holding on http://tracker.ceph.com/issues/17050 to be fixed as well, also have to check on what the wait function does once i can reach that stage.. |
Neat! |
@vasukulkarni i just closed http://tracker.ceph.com/issues/17050 as "rejected". |
@tchaikov thanks, that makes sense, will make changes and rerun. |
f18ab11
to
0981a7f
Compare
@zmc this is ready for merge and tested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What controls whether or not we use systemd vs. the "old way" of managing processes?
I'm a little confused about the overall use of the term 'init' - this implementation won't easily support any other init system - not that I think it needs to.
I think that instead of all the copy/pasting re: systemctl commands, it would be much nicer to e.g. use a template string for all of them and perhaps call a function with the action, role and id to get the command string.
teuthology/orchestra/daemon.py
Outdated
Start this daemon instance. | ||
""" | ||
if not self.running(): | ||
self.log.error('Restarting a running daemon') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a warning
teuthology/orchestra/daemon.py
Outdated
if self.use_init: | ||
self.log.info("using init to restart") | ||
if not self.running(): | ||
self.log.error('starting a non-running daemon') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be a warning I think
teuthology/orchestra/daemon.py
Outdated
|
||
class DaemonState(object): | ||
""" | ||
Daemon State. A daemon exists for each instance of each role. | ||
""" | ||
def __init__(self, remote, role, id_, *command_args, **command_kwargs): | ||
def __init__(self, remote, role, id_, use_init=False, *command_args, **command_kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is use_init
? Should it be use_sytemd
?
teuthology/orchestra/daemon.py
Outdated
self.log.info('Restarting daemon') | ||
self.log.info('Restarting daemon with args') | ||
if self.use_init: | ||
self.log.info("restart with args not supported in init") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be a warning - and it should mention systemd, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or raise NotImplementedError
?
teuthology/orchestra/daemon.py
Outdated
@@ -108,6 +193,16 @@ def signal(self, sig, silent=False): | |||
|
|||
:param sig: signal to send | |||
""" | |||
if self.use_init: | |||
self.log.info("using init to send signal") | |||
self.log.info("WARNING init may restart after kill signal") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- More vague usage of the term "init"
log.info()
and 'WARNING' ?- What exactly does "init may restart" mean?
teuthology/orchestra/daemon.py
Outdated
@@ -117,18 +212,35 @@ def running(self): | |||
Are we running? | |||
:return: True if remote run command value is set, False otherwise. | |||
""" | |||
if self.use_init: | |||
self.log.info("using init to send signal") | |||
self.log.info("WARNING init may restart after kill signal") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment
teuthology/orchestra/daemon.py
Outdated
return self.proc is not None | ||
|
||
def reset(self): | ||
""" | ||
clear remote run command value. | ||
""" | ||
if self.use_init: | ||
self.log.info("reset not supported with init") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need to log anything?
teuthology/orchestra/daemon.py
Outdated
self.proc = None | ||
|
||
def wait_for_exit(self): | ||
""" | ||
clear remote run command value after waiting for exit. | ||
""" | ||
if self.use_init: | ||
self.log.info("wait_for_exit not supported with init") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is logging necessary here?
teuthology/orchestra/daemon.py
Outdated
self.id = remote.shortname | ||
else: | ||
self.id = id_ | ||
self.proc_regex = '"' + self.proc_name + '.*--id ' + self.id + '"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's split out the pid-getting functionality into a separate method
teuthology/orchestra/daemon.py
Outdated
'grep', '-v', | ||
'grep', run.Raw('|'), | ||
'awk', | ||
run.Raw("{'print $2'}")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the above, could we use systemctl status
?
@vasukulkarni, any ETA on this one? |
internal task pr should be ready for monday, soon after that I will pick this one. |
@vasukulkarni and I agreed that I should help out with this PR. I'll rebase it and make changes as needed. In the meantime I will make this a DNM as it's not been tested. |
440a5aa
to
48a0322
Compare
7029e62
to
93b05b4
Compare
when cluster is setup using ceph-ansible or ceph-deploy use systemd commands to kill/revive deamons. Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
... we already had self.id_ Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
Signed-off-by: Zack Cerza <zack@redhat.com>
We had this right for systemctl commands, but not for the journalctl command. Signed-off-by: Zack Cerza <zack@redhat.com>
c79f0b6
to
36bcbaf
Compare
@vasukulkarni I've cleaned up the branch. Note:
|
@zmc awesome thanks! for 1) we can try to use config at higher level eg: overrides: disable-daemon-helper: true or something so that its easier to run suites with overrides |
These are the first 10 tests that I ran with these changes and I have checked logs for few and dont see any issues 💯 , if/when we enable the option to configure it from yaml we could starting enabling it per suite basis. http://pulpito.ceph.com/vasu-2017-09-29_22:02:40-smoke-luminous-distro-basic-ovh/
|
I will also schedule a full smoke suite today |
I think we are probably missing some setting, need to check with @jdurgin on what needs to be done for this to pass
|
I will fix the current cluster configuration to properly test with systemd without overloads and rerun, not really great run with current cluster config here http://pulpito.ceph.com/vasu-2017-10-03_18:00:35-smoke-luminous-distro-basic-ovh/ |
To use systemd, any tasks which create a DaemonGroup object must pass use_systemd=True. This could be accomplished by looking for a flag in the job config. Note that this option is mainly for regression testing; it may be removed in the future, where the default is to use systemd when possible. Signed-off-by: Zack Cerza <zack@redhat.com>
d64addb
to
51464ba
Compare
@zmc this is ready to be merged! Tested here: http://pulpito.ceph.com/vasu-2017-10-14_17:53:49-smoke-luminous-distro-basic-ovh/ I also used the qa branch at : https://github.com/zmc/ceph/tree/wip-master-init , so we should be able to merge both to master. |
@vasukulkarni is there a run that mirrors one of those but uses master branches? |
@zmc scheduled here now, just used stable branch before http://pulpito.ceph.com/vasu-2017-10-17_22:00:28-smoke-master-testing-basic-ovh/ |
@vasukulkarni that's not using master for teuthology :-/ |
@zmc its not clear what you are asking in comment, I dont understand why 'master' branch of teuthology needs to be tested? |
As we talked about last week, we need to compare otherwise-identical test
runs using master and the systemd branches in order to clearly spot
regressions.
If the tests were all green, you could certainly argue that it wasn't
necessary, but they aren't :)
…On Tue, Oct 17, 2017 at 4:27 PM, vasukulkarni ***@***.***> wrote:
@zmc <https://github.com/zmc> its not clear what you are asking in
comment, I dont understand why 'master' branch of teuthology needs to be
tested?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#934 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AADjw9s-hKXjpA4H_-pkmv1grlPBnz5Iks5stSnVgaJpZM4Jl49->
.
|
@zmc ok if you need for comparision here from the nightly runs http://pulpito.ceph.com/teuthology-2017-10-17_05:02:02-smoke-master-testing-basic-ovh/ Additional 2 rgw is due to own qa branch that does not have equivalent branch in s3tests git repo. |
@vasukulkarni ah, that's definitely helpful, thanks! |
Can t wait for this to be merged! , Please merge when you are done reviewing, I can merge the qa part. |
@vasukulkarni because of the way we've done this PR, it won't let you formally review. So if you could give a thumbs up or something, I can review and then we can merge. |
👍 👍 LGTM and is ready for merge along with qa branch at https://github.com/zmc/ceph/tree/wip-master-init |
@vasukulkarni - just filed the ceph PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zmc thanks, will merge both once the qa is green. |
Use systemd when cluster is setup with ceph-deploy or ceph-ansible, still in testing..
Signed-off-by: Vasu Kulkarni vasu@redhat.com