Added HA capability to the VMs during VMware Host alert #7352

harikrishna-patnala · 2023-03-20T07:43:19Z

Description

This PR tries to fix #7320 by improving how VM HA is handled when a VMware host goes into the Alert state.
VMware hosts usually go into the Alert state when the ping times out, so it is a good idea to start the HA process for the VMs residing on the host.

Types of changes

Breaking change (fix or feature that would cause existing functionality to change)
New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Enhancement (improves an existing feature and functionality)
Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Major
Minor

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

harikrishna-patnala · 2023-03-20T07:44:50Z

@blueorangutan package

weizhouapache

Does the Alert state mean the VMware host is down?

It is very risky to start one VM on two hosts, as that might cause data corruption.

harikrishna-patnala · 2023-03-20T09:23:54Z

I get your point @weizhouapache, and that seems to be correct.
An alert could also be caused by network issues (not only by the host being completely down).

I now see why HA is not implemented for VMware: we need to check the actual status of the VM. I think we can do that by implementing CheckOnHostCommand.

cloudstack/plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java

Lines 5685 to 5687 in d04d60b

    
    protected Answer execute(CheckOnHostCommand cmd) {
        return new CheckOnHostAnswer(cmd, null, "Not Implmeneted");
    }

This command checks the VM state using the neighbouring hosts in the cluster (that is how it is implemented in KVM).

For now, I'm marking this PR as draft.

@weizhouapache please let me know if CheckOnHostCommand makes sense!
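The neighbour-based check described above could be sketched roughly as follows. This is only an illustration of the decision logic, not CloudStack code: `HostVerdict`, `NeighbourHostCheck`, `investigate` and the `probe` callback are hypothetical names, and how a neighbouring host would actually probe the suspect host on VMware is left open.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical outcome of asking neighbouring hosts about a suspect host. */
enum HostVerdict { UP, DOWN, UNKNOWN }

final class NeighbourHostCheck {

    /**
     * Asks every neighbour whether it can still see the suspect host.
     * The probe (an assumption of this sketch) returns TRUE if the neighbour
     * sees the host, FALSE if it does not, and null if it could not decide.
     */
    static HostVerdict investigate(List<String> neighbours,
                                   Function<String, Boolean> probe) {
        if (neighbours.isEmpty()) {
            // Nobody to ask, so nothing can be concluded.
            return HostVerdict.UNKNOWN;
        }
        boolean anyUndecided = false;
        for (String neighbour : neighbours) {
            Boolean seen = probe.apply(neighbour);
            if (seen == null) {
                anyUndecided = true;
            } else if (seen) {
                // One positive sighting is enough: the host is still alive.
                return HostVerdict.UP;
            }
        }
        // Report DOWN only when every neighbour gave a definite "not seen".
        return anyUndecided ? HostVerdict.UNKNOWN : HostVerdict.DOWN;
    }
}
```

The sketch deliberately returns UNKNOWN whenever any neighbour is undecided, so that HA would only be started on a definite DOWN.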

weizhouapache · 2023-03-20T11:30:28Z


@harikrishna-patnala
Yes, but the KVM implementation relies on NFS storage and a heartbeat. I do not know how to implement that for VMware.
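For readers unfamiliar with the heartbeat idea, the core of it is a staleness comparison, sketched below. The assumption here is that a host periodically records a timestamp on shared storage and that another party can read it. `HeartbeatCheck` and `isStale` are hypothetical names for this sketch, not CloudStack APIs, and how such a timestamp would be obtained for a VMware host is exactly the open question.

```java
import java.time.Duration;
import java.time.Instant;

final class HeartbeatCheck {

    /**
     * Returns true when the last recorded heartbeat is older than the
     * allowed timeout. A heartbeat that is exactly "timeout" old still
     * counts as fresh.
     */
    static boolean isStale(Instant lastHeartbeat, Instant now, Duration timeout) {
        Duration age = Duration.between(lastHeartbeat, now);
        return age.compareTo(timeout) > 0;
    }
}
```

A stale heartbeat on shared storage is stronger evidence than a failed ping, because it does not depend on the management network being reachable.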

shwstppr

Some options, in my opinion @harikrishna-patnala @weizhouapache @rohityadavcloud:

We can add a global config for a timeout after which all VMs on a host in the Alert state are migrated away.
We can allow putting such a host into maintenance, so the operator can do so manually and the VMs on the host get migrated away.
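The first option could look roughly like the sketch below. It assumes a single timeout value read from a global setting, where zero or a negative value disables the behaviour. `AlertTimeoutPolicy` and `shouldEvacuate` are hypothetical names, and no existing CloudStack setting is implied.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

final class AlertTimeoutPolicy {

    // Hypothetical global setting; zero or negative disables the policy.
    private final Duration timeout;

    AlertTimeoutPolicy(Duration timeout) {
        this.timeout = timeout;
    }

    /**
     * Decides whether the VMs on a host should be moved away.
     * alertSince is empty when the host is not in the Alert state.
     */
    boolean shouldEvacuate(Optional<Instant> alertSince, Instant now) {
        if (timeout.isZero() || timeout.isNegative()) {
            return false;
        }
        // Evacuate once the host has been in Alert for at least the timeout.
        return alertSince
                .map(since -> !now.isBefore(since.plus(timeout)))
                .orElse(false);
    }
}
```

Note that elapsed time alone says nothing about whether the VMs on the unreachable host have actually stopped, so a policy like this does not by itself prevent the same VM from running on two hosts.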

rohityadavcloud · 2023-05-08T11:52:49Z

ping @harikrishna-patnala any update on this? Thanks.

rohityadavcloud · 2023-05-22T06:23:55Z

ping @harikrishna-patnala @shwstppr any update on this? Thanks.

harikrishna-patnala · 2023-06-06T07:07:28Z

I like @shwstppr's idea of adding a global setting to attempt migrating the VMs when a host goes into the Alert state. I'll add that and update the PR.

DaanHoogland · 2023-06-09T13:36:18Z

So if I understand @shwstppr correctly, we will implement a setting for a timeout after which hosts in the Alert state are put into Maintenance. Migration of the VMs should then happen automatically, and we should not have to care about that. Is that correct, @shwstppr?

I wonder if this takes away @weizhouapache's worry. The host in the Alert state may still be up, and as it cannot be reached, the VMs on it cannot be stopped or migrated. The disks may consequently still be accessed and modified by any process running on the VM (started by cron or as a daemon, or the Windows equivalents).

Can we leverage vSphere HA for this (https://www.techtarget.com/searchvmware/definition/VMware-HA)?

DaanHoogland · 2023-06-13T13:18:19Z

I think we should not merge this. I created a doc PR at apache/cloudstack-documentation#324. If we can improve the functional description of this, we may be able to implement some added HA functionality, but relying on a timeout is error-prone. Even if the operator is sure the VMs are no longer running, CloudStack cannot be.

harikrishna-patnala · 2023-06-22T07:50:34Z

Agree with you @DaanHoogland, closing this PR for that reason.

Added HA capability to the VMs during VMware Host alert

3010dea

boring-cyborg bot added component:agent component:orchestration labels Mar 20, 2023

harikrishna-patnala added the status:needs-testing label Mar 20, 2023

harikrishna-patnala requested a review from weizhouapache March 20, 2023 07:45

weizhouapache reviewed Mar 20, 2023

harikrishna-patnala marked this pull request as draft March 20, 2023 09:24

rohityadavcloud requested a review from shwstppr April 13, 2023 09:41

shwstppr reviewed Apr 13, 2023

harikrishna-patnala closed this Jun 22, 2023

5 participants