[xenmgr] Add idle shutdown specific call #197

Merged
jandryuk merged 1 commit into OpenXT:master from crogers1:idle-shutdown
Oct 25, 2022

Conversation

@crogers1
Contributor

Implement a separate function for handling the idle shutdown
case. When the system has been idle for the specified idle period
and no user input has been detected, shut down the platform. Start
a 5 minute timer that will force the host off if it has not completed
the shutdown process.

If the idle timer feature is enabled, then the expected behavior is
that the host will be shut down. This commit ensures that, when handling
an idle shutdown, the host reaches the off state within a reasonable
amount of time even if a guest VM is stuck or a process is otherwise
non-responding.

Signed-off-by: Chris Rogers <rogersc@ainfosec.com>
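
The capped shutdown described in the commit message can be pictured with a small sketch. This is not the xenmgr implementation: `shutdownWithCap` and its parameters are hypothetical names, and the graceful and forced actions are passed in as plain `IO ()` values so that the pattern (graceful path plus a watchdog thread) stands on its own.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, tryPutMVar, tryReadMVar)
import Control.Monad (void, when)
import Data.Maybe (isNothing)

-- | Start a graceful shutdown and, in parallel, a watchdog that runs the
-- forced action if the graceful one has not finished within the cap.
-- All names here are hypothetical; this is not the xenmgr code.
shutdownWithCap
  :: Int          -- ^ cap in microseconds (5 minutes would be 5 * 60 * 1000000)
  -> IO ()        -- ^ graceful shutdown (may hang on a stuck guest)
  -> IO ()        -- ^ forced power-off action
  -> IO (MVar ()) -- ^ filled once the graceful path completes
shutdownWithCap capMicros graceful forced = do
  done <- newEmptyMVar
  -- Watchdog thread: sleep for the cap, then force the host off if needed.
  void . forkIO $ do
    threadDelay capMicros
    finished <- tryReadMVar done
    when (isNothing finished) forced
  -- The graceful path runs in its own thread so the caller is not blocked.
  void . forkIO $ graceful >> void (tryPutMVar done ())
  return done
```

On a real host a completed graceful shutdown powers the machine off before the watchdog wakes, so the `done` flag mostly matters for testing the pattern in isolation.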

@crogers1 crogers1 requested a review from jandryuk October 13, 2022 20:19
@crogers1
Contributor Author

Related to OpenXT/xctools#74 and OpenXT/idl#45

Comment thread xenmgr/XenMgr/HostOps.hs
hostShutdown = (hostWhenIdleDoWithState HostShuttingDown $ executePmAction ActionShutdown) >> return ()

hostShutdownIdle :: XM ()
hostShutdownIdle = executePmAction ActionIdleShutdown >> return ()
Contributor

Above and below, the actions are prefixed with hostWhenIdleDoWithState. Is that intentional? I can see it being intentional since when ShutdownIdle is called, we don't want to wait on something else.

Contributor

Is that intentional?

Should be "Is not using hostWhenIdleDoWithState intentional?"

Contributor Author

The HostIdle xenmgr-ism is only set when going into host S3 (execute_ ActionSleep on line 345). hostWhenIdleDoWithState just does the action if the Host State is host-idle, updates the state, then immediately sets the state back to host-idle.

My guess is that the history here was some automatic Sleep request after being idle, but the current Idle system is now managed in the input part of Glass and xcpmd. We don't key off HostState for anything in this stack, so I'm not using the hostWhenIdleDoWithState method.
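
For readers unfamiliar with the helper, the behavior described above (run the action only when the host state is idle, set a busy state, then restore idle) might look roughly like the sketch below. It is an assumption-laden model rather than the real `hostWhenIdleDoWithState`: the `MVar`-based state, the two-constructor `HostState` type, and the name `whenIdleDoWithState` are all illustrative, and the check-then-set is neither atomic nor exception-safe.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar, swapMVar)
import Control.Monad (void)

-- Host states named in the thread; the real type may have more constructors.
data HostState = HostIdle | HostShuttingDown
  deriving (Eq, Show)

-- | Hypothetical model: run the action only if the host is idle, mark the
-- host busy while it runs, then immediately set the state back to idle.
whenIdleDoWithState :: MVar HostState -> HostState -> IO a -> IO (Maybe a)
whenIdleDoWithState ref busy action = do
  current <- readMVar ref
  if current /= HostIdle
    then return Nothing            -- not idle: skip the action entirely
    else do
      void (swapMVar ref busy)     -- update the state for the duration
      result <- action
      void (swapMVar ref HostIdle) -- restore host-idle afterwards
      return (Just result)
```

A caller would create the state with `newMVar HostIdle` and wrap the action, for example `whenIdleDoWithState ref HostShuttingDown doShutdown`, where `doShutdown` is a placeholder.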

Comment thread xenmgr/XenMgr/PowerManagement.hs
Contributor

@jandryuk jandryuk left a comment

I built locally and a hostShutdownIdle call worked - but it did not have to trigger the poweroff -ff

@jandryuk
Contributor

I pushed the physical power button on a machine. A Fedora VM decided to go to sleep, so the UI is spinning on Shutting Down the Fedora VM.

Oct 18 19:33:49.569854 xcpmd: Power button pressed event
Oct 18 19:33:49.571048 xenmgr: PM: detected power button press event
Oct 18 19:33:49.572637 xenmgr: PM: received pm action request: ActionShutdown
Oct 18 19:33:49.573439 xenmgr: PM: received host shutdown request

I added a shutdown idle rule to xcpmd to see poweroff -ff work. When it triggers, xenmgr shows:

Oct 18 19:42:55.125073 xcpmd: idle timeout
Oct 18 19:42:55.125778 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 18 19:42:55.126110 xenmgr: PM: but pm action ActionShutdown is currently running, so cannot do

And it keeps running. That was 20 minutes ago and the timeout has triggered multiple times. So something seems wrong with the poweroff -ff support?

@crogers1
Contributor Author

hmmm. I'll look into this, might not get to it today though.

@crogers1
Contributor Author

And it keeps running. That was 20 minutes ago and the timeout has triggered multiple times. So something seems wrong with the poweroff -ff support?

I rebased my branch and built on top of master-custom-73 (that post-xen4.16 merge build). I set the idle timeout to 1 minute, enabled idle shutdown, and restarted xcpmd. I booted an HVM guest to network and killed its stubdom. After a minute I got the expected call and logs in xenmgr, and then 4 more times since the idle timeout was 1 minute long. Finally, after 5 minutes you can see the log from the poweroff -ff block, and my box shut down.

Did you ever see the

PM: 5 minute cap reached for idle shutdown, performing immediate host shutdown

message in your case?

Oct 21 16:09:03.990075 xcpmd: DAR idle timeout expired; shutting down...
Oct 21 16:09:03.990625 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 21 16:09:03.990812 xenmgr: PM: received host idle shutdown request
Oct 21 16:09:04.008721 xenmgr: shutting down VM 03b94858-c642-4166-bdbe-9b834c1a3f81
Oct 21 16:09:04.037146 xl: [4109] Waiting for 1 domains
Oct 21 16:09:04.037422 xenmgr: vm state change 03b94858-c642-4166-bdbe-9b834c1a3f81: shutdowning
Oct 21 16:10:03.989970 xcpmd: DAR idle timeout expired; shutting down...
Oct 21 16:10:03.990278 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 21 16:10:03.990405 xenmgr: PM: but pm action ActionIdleShutdown is currently running, so cannot do
Oct 21 16:11:03.990016 xcpmd: DAR idle timeout expired; shutting down...
Oct 21 16:11:03.990331 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 21 16:11:03.990453 xenmgr: PM: but pm action ActionIdleShutdown is currently running, so cannot do
Oct 21 16:12:03.989928 xcpmd: DAR idle timeout expired; shutting down...
Oct 21 16:12:03.990237 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 21 16:12:03.990368 xenmgr: PM: but pm action ActionIdleShutdown is currently running, so cannot do
Oct 21 16:13:03.989992 xcpmd: DAR idle timeout expired; shutting down...
Oct 21 16:13:03.990300 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 21 16:13:03.990443 xenmgr: PM: but pm action ActionIdleShutdown is currently running, so cannot do
Oct 21 16:14:03.989969 xcpmd: DAR idle timeout expired; shutting down...
Oct 21 16:14:03.990210 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 21 16:14:03.990345 xenmgr: PM: but pm action ActionIdleShutdown is currently running, so cannot do
Oct 21 16:14:03.999797 xenmgr: PM: 5 minute cap reached for idle shutdown, performing immediate host shutdown

@jandryuk
Contributor

Did you ever see the

PM: 5 minute cap reached for idle shutdown, performing immediate host shutdown

message in your case?

No

The problem is roughly that a regular shutdown is in progress and hung.

Oct 18 19:33:49.572637 xenmgr: PM: received pm action request: ActionShutdown
Oct 18 19:33:49.573439 xenmgr: PM: received host shutdown request

When the idle shutdown happens:

Oct 18 19:42:55.125778 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 18 19:42:55.126110 xenmgr: PM: but pm action ActionShutdown is currently running, so cannot do

executePmAction prints its error and never calls execute_ ActionIdleShutdown supervised/shutdownIdle, so the poweroff -ff thread is not spawned.
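
The failure mode can be modeled as a pure function. The sketch below is a simplification under stated assumptions, not the actual `executePmAction`: the `Outcome` type and `requestOriginal` are hypothetical, the `PmAction` type lists only the constructors named in this thread, and the refusal text copies the log line quoted above.

```haskell
-- PM actions named in this thread; the real type may have more constructors.
data PmAction = ActionSleep | ActionShutdown | ActionIdleShutdown
  deriving (Eq, Show)

-- | Outcome of a request under the original gate (hypothetical type).
data Outcome = Started PmAction | Refused String
  deriving (Eq, Show)

-- | Original behavior as described: any action already in progress blocks
-- every new request, including an idle shutdown.
requestOriginal :: Maybe PmAction -> PmAction -> Outcome
requestOriginal Nothing requested = Started requested
requestOriginal (Just running) _ =
  Refused ("but pm action " ++ show running ++ " is currently running, so cannot do")
```

With a hung `ActionShutdown` in progress, `requestOriginal (Just ActionShutdown) ActionIdleShutdown` yields a `Refused` value, which corresponds to the idle shutdown path, and therefore the forced power-off thread, never starting.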

@crogers1
Contributor Author

Ohhhhh that makes way more sense. Ok I'll come up with a fix for that.

@crogers1
Contributor Author

Extra logic now basically allows the idle shutdown operation to always proceed no matter what PM action is in progress, other than itself (since we don't want it spawning more poweroff -ff threads).

Oct 24 14:42:46.869591 xenmgr: PM: received host shutdown request
Oct 24 14:42:46.885983 xenmgr: shutting down VM 03b94858-c642-4166-bdbe-9b834c1a3f81
Oct 24 14:42:46.913818 xl: [2747] Waiting for 1 domains
Oct 24 14:42:46.914158 xenmgr: vm state change 03b94858-c642-4166-bdbe-9b834c1a3f81: shutdowning
Oct 24 14:42:52.877727 VM uivm (3): Memory pressure relief: Total: res = 89575424/80842752/-8732672, res+swap = 88293376/88293376/0
Oct 24 14:44:08.992231 xcpmd: DAR idle timeout expired; shutting down...
Oct 24 14:44:08.992530 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 24 14:44:08.992736 xenmgr: PM: received host idle shutdown request
Oct 24 14:44:09.008773 xenmgr: shutting down VM 03b94858-c642-4166-bdbe-9b834c1a3f81
Oct 24 14:44:09.036075 xl: [3238] Waiting for 1 domains
Oct 24 14:44:09.036283 xenmgr: vm state change 03b94858-c642-4166-bdbe-9b834c1a3f81: shutdowning
Oct 24 14:45:08.992139 xcpmd: DAR idle timeout expired; shutting down...
Oct 24 14:45:08.992455 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 24 14:45:08.992582 xenmgr: PM: but pm action idle-shutdown is already running, so not doing
Oct 24 14:46:08.992092 xcpmd: DAR idle timeout expired; shutting down...
Oct 24 14:46:08.992408 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 24 14:46:08.992515 xenmgr: PM: but pm action idle-shutdown is already running, so not doing
Oct 24 14:47:08.991312 xcpmd: DAR idle timeout expired; shutting down...
Oct 24 14:47:08.991640 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 24 14:47:08.991755 xenmgr: PM: but pm action idle-shutdown is already running, so not doing
Oct 24 14:48:08.992090 xcpmd: DAR idle timeout expired; shutting down...
Oct 24 14:48:08.992436 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 24 14:48:08.992549 xenmgr: PM: but pm action idle-shutdown is already running, so not doing
Oct 24 14:49:08.992084 xcpmd: DAR idle timeout expired; shutting down...
Oct 24 14:49:08.992353 xenmgr: PM: received pm action request: ActionIdleShutdown
Oct 24 14:49:08.992456 xenmgr: PM: but pm action idle-shutdown is already running, so not doing
Oct 24 14:49:09.001882 xenmgr: PM: 5 minute cap reached for idle shutdown, performing immediate host shutdown
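
The revised admission rule shown in these logs can be summarized as a small decision function. This is a sketch of the rule as stated in the comment, not the merged code: `mayStart` is a hypothetical name, and treating other requests as still blocked by any running action is an assumption carried over from the earlier behavior.

```haskell
-- PM actions named in this thread; the real type may have more constructors.
data PmAction = ActionSleep | ActionShutdown | ActionIdleShutdown
  deriving (Eq, Show)

-- | Hypothetical admission rule: first argument is the action currently
-- running (if any), second is the newly requested action.
mayStart :: Maybe PmAction -> PmAction -> Bool
mayStart Nothing _ = True                                     -- nothing running
mayStart (Just ActionIdleShutdown) ActionIdleShutdown = False -- avoid a second forced power-off thread
mayStart (Just _) ActionIdleShutdown = True                   -- idle shutdown overrides other actions
mayStart (Just _) _ = False                                   -- assumed unchanged for other requests
```

Under this rule an idle shutdown arriving during a hung `ActionShutdown` is admitted once, and repeated idle timeouts after that are refused, which matches the sequence in the Oct 24 log.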

Contributor

@jandryuk jandryuk left a comment

Looks good - thanks for fixing. The current log messages are awkward - I made some suggestions. What do you think about those?

Comment thread xenmgr/XenMgr/PowerManagement.hs Outdated
Comment thread xenmgr/XenMgr/PowerManagement.hs Outdated
@jandryuk
Contributor

Thanks. I plan to merge the idle shutdown stuff tomorrow.

Comment thread xenmgr/XenMgr/PowerManagement.hs Outdated
@jandryuk jandryuk merged commit fd7887d into OpenXT:master Oct 25, 2022