[WIP] monitor container termination status info when getting pod info #22266

Open
wants to merge 1 commit into
base: master
from

Conversation

jrafanie
Member

@jrafanie jrafanie commented Dec 2, 2022

WIP...

This is what it looks like in the output of pp MiqServer.my_server.worker_manager.current_pods:

...
 "1-generic-7f6466d498-n2xh9"=>
  {:label_name=>"1-generic",
   :container_restarts=>1,
   :last_state_terminated=>true,
   :terminations=>
    [{:container_id=>
       "cri-o://c2f9c6a0d11c8befc0dfa1431a9ce8f73b22d39526b839f4d15b865d2ce46f10",
      :exit_code=>137,
      :message=>nil,
      :reason=>"OOMKilled",
      :signal=>nil,
      :started_at=>"2022-12-02T13:51:23Z",
      :finished_at=>"2022-12-02T15:22:07Z"}]},
 "1-google-cloud-event-catcher-4-596d898775-fqvf8"=>
  {:label_name=>"1-google-cloud-event-catcher-4",
   :container_restarts=>0,
   :last_state_terminated=>false,
   :terminations=>[]},
...
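
For readers who want to see how a hash shaped like the one above could be assembled, here is a minimal sketch. It is not the PR's implementation: `termination_info` is a hypothetical helper, and the sample objects are `OpenStruct` stand-ins for the pod container statuses. The accessor names assume the Kubernetes ContainerStatus field names (`restartCount`, `lastState.terminated`, `containerID`, `exitCode`, and so on).

```ruby
require "ostruct"
require "pp"

# Hypothetical helper (not from the PR): summarizes restart and termination
# details for one pod, given its list of container statuses.
def termination_info(container_statuses)
  info = {
    :container_restarts    => container_statuses.sum { |cs| cs.restartCount.to_i },
    :last_state_terminated => false,
    :terminations          => []
  }

  container_statuses.each do |cs|
    # lastState.terminated is only present when the previous run of the
    # container ended; skip containers that have never terminated.
    terminated = cs.lastState&.terminated
    next if terminated.nil?

    info[:last_state_terminated] = true
    info[:terminations] << {
      :container_id => terminated.containerID,
      :exit_code    => terminated.exitCode,
      :message      => terminated.message,
      :reason       => terminated.reason,
      :signal       => terminated.signal,
      :started_at   => terminated.startedAt,
      :finished_at  => terminated.finishedAt
    }
  end

  info
end

# Stand-in data: one container that was OOM killed once, one that never restarted.
oom_killed = OpenStruct.new(
  :restartCount => 1,
  :lastState    => OpenStruct.new(
    :terminated => OpenStruct.new(
      :containerID => "cri-o://example",
      :exitCode    => 137,
      :reason      => "OOMKilled"
    )
  )
)
healthy = OpenStruct.new(:restartCount => 0, :lastState => OpenStruct.new)

pp termination_info([oom_killed])
pp termination_info([healthy])
```

The sketch rebuilds the summary from scratch on every call, which mirrors the "clear and refill" approach discussed in the review comments.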

ch[:container_restarts] = pod.status.containerStatuses.sum { |cs| cs.restartCount.to_i }

ch[:last_state_terminated] = false
ch[:terminations] = []
pod.status.containerStatuses.each do |cs|
Member Author

The alternative way of doing this is to figure out what termination events look like and monitor just those events.

See lines 178-195 above.

ch[:container_restarts] = pod.status.containerStatuses.sum { |cs| cs.restartCount.to_i }

ch[:last_state_terminated] = false
ch[:terminations] = []
Member Author

I chose to clear the array each time through. The server code that checks the current_pods concurrent hash could miss terminations or reprocess ones it has already handled, so we might want to just monitor for termination events as mentioned 👇. I'm not sure how much work that is.
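
One way to picture the "already processed" half of that concern: a minimal sketch, assuming the server keeps a set of container IDs it has already handled. `TerminationTracker` and `unseen` are hypothetical names, not part of the PR or of ManageIQ.

```ruby
require "set"

# Hypothetical tracker (not part of the PR): remembers which terminated
# container IDs were already handled so repeated polls do not reprocess them.
class TerminationTracker
  def initialize
    @seen = Set.new
  end

  # Returns only the terminations whose :container_id has not been seen in an
  # earlier call. Set#add? returns nil when the ID is already present.
  def unseen(terminations)
    terminations.select { |t| @seen.add?(t[:container_id]) }
  end
end

tracker = TerminationTracker.new
first_poll  = [{:container_id => "cri-o://example-a", :reason => "OOMKilled"}]
second_poll = first_poll + [{:container_id => "cri-o://example-b", :reason => "Error"}]

tracker.unseen(first_poll).each  { |t| puts "new termination: #{t[:reason]}" }
tracker.unseen(second_poll).each { |t| puts "new termination: #{t[:reason]}" }
```

This only avoids handling the same termination twice. It would not recover a termination that was overwritten between polls, since a container status reports only its most recent last state, which is the case that watching termination events would address.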

@miq-bot miq-bot removed the wip label Dec 2, 2022
@Fryguy
Member

Fryguy commented Dec 2, 2022

really looking forward to this...thanks @jrafanie

@jrafanie
Member Author

jrafanie commented Dec 2, 2022

I also found this, but it assumes you can install things in the cluster, so at best we could suggest it: https://medium.com/@andrew.kaczynski/kubernetes-events-how-to-keep-historical-data-of-your-cluster-835d685cc45

@Fryguy Fryguy changed the title from "WIP, monitor container termination status info when getting pod info" to "[WIP] monitor container termination status info when getting pod info" Dec 7, 2022
@miq-bot miq-bot added the wip label Dec 7, 2022
@miq-bot miq-bot added the stale label Mar 13, 2023
@miq-bot
Member

miq-bot commented Mar 13, 2023

This pull request has been automatically marked as stale because it has not been updated for at least 3 months.

If these changes are still valid, please remove the stale label, make any changes requested by reviewers (if any), and ensure that this issue is being looked at by the assigned/reviewer(s).

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@miq-bot
Member

miq-bot commented Jun 19, 2023

This pull request has been automatically closed because it has not been updated for at least 3 months.

Feel free to reopen this pull request if these changes are still valid.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@miq-bot miq-bot closed this Jun 19, 2023
@agrare agrare reopened this Sep 11, 2023
@agrare agrare removed the stale label Sep 11, 2023
@miq-bot
Member

miq-bot commented Sep 11, 2023

Checked commit jrafanie@0aaf05b with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint
1 file checked, 1 offense detected

app/models/miq_server/worker_management/kubernetes.rb

@miq-bot miq-bot added the stale label Dec 18, 2023
@miq-bot
Member

miq-bot commented Dec 18, 2023

This pull request has been automatically marked as stale because it has not been updated for at least 3 months.

If these changes are still valid, please remove the stale label, make any changes requested by reviewers (if any), and ensure that this issue is being looked at by the assigned/reviewer(s).

@miq-bot
Member

miq-bot commented Apr 1, 2024

This pull request has been automatically marked as stale because it has not been updated for at least 3 months.

If these changes are still valid, please remove the stale label, make any changes requested by reviewers (if any), and ensure that this issue is being looked at by the assigned/reviewer(s).
