-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMBARI-22888] Cancel operation during package deployment causing repository manager to be broken (dgrinenko) #244
[AMBARI-22888] Cancel operation during package deployment causing repository manager to be broken (dgrinenko) #244
Conversation
…ository manager to be broken (dgrinenko)
Refer to this link for build results (access rights to CI server needed): |
pids_to_kill = sorted(all_chield_pids, reverse=True) | ||
for pid in pids_to_kill: | ||
try: | ||
if is_pid_life(pid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_pid_life
maybe is_pid_alive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice approach to solve issues with broken YUM transactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like the idea here. Will this work with non-root agents or processes that are started with non-root access (like killing namenodes and other cluster processes)?
comm_path_pattern = "/proc/{0}/comm" | ||
cmdline_path_pattern = "/proc/{0}/cmdline" | ||
|
||
def read_childrens(pid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def read_children(pid)
except Exception, e: | ||
logger.warn("Failed to kill PID %d" % (pid)) | ||
logger.warn("Reported error: " + repr(e)) | ||
def get_all_childrens(base_pid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_all_children(base_pid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncole no, actually same as previous implementation (popen didn't use ambari_sudo or sudo command)
It is another kind of problem which would need to be solved via sudo.py...
def kill_process_with_children(parent_pid): | ||
exception_list = ["apt-get", "apt", "yum", "zypper", "zypp"] | ||
signals_to_post = [signal.SIGTERM, signal.SIGKILL] | ||
all_chield_pids = [item[0] for item in get_all_childrens(parent_pid) if item[1].lower() not in exception_list and item[0] != os.getpid()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we ever end up with never-dying yum here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is rare situation that yum hangs. Most likey it can slowly download package by holding lock. Ambari able to check package manager lock and retry package installation over time. However if some serious problem happen, what cause yum to hang, manual user debugging is needed anyway.
It is lesser evil, than killing it in the middle of the work and then screw the whole system.
…ository manager to be broken (dgrinenko)
moved PR#249 for trunk to branch-2.6, to make code-base equal, now instead of os.kill using sudo.kill, which will use proper implementation depends on agent user level @ncole, @dlysnichenko please re-check code as implementation is slightly different, due to was made according to changes in trunk |
retest this please |
Refer to this link for build results (access rights to CI server needed): |
Ambari 2.x Implementation
What changes were proposed in this pull request?
Rework of the way how we killing process and it children. Rather of usage Popen and bash scriptlets, which r hardly manageable, switching to native python
os.kill
with using/proc
FS to extract system information (same as ps, pgrep util doing).Additionally, such implementation allow to add several required limits to tree killer:
How was this patch tested?
Tested using Unit Tests and on selected cluster node