Skip to content

Conversation

@joseflauzino
Copy link
Contributor

Description

This PR fixes the behavior of the stats collector with respect to VMs that are not running. The need for this is that if a running VM has a state change for some reason, ACS keeps showing the latest metrics stats collected.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

Using ACS UI, I performed the following steps:

  • I deployed a VM and waited for the complete boot;
  • I enabled the presentation of metric statistics in the UI;
  • I waited for ACS to collect and show metrics stats;
  • And then, I tested two different cases:
    • Case 1 - I stopped VM using ACS UI;
    • Case 2 - I simulated a VM crash by manually stopping it (directly from the hypervisor);
  • In both cases, the metrics stats for the VM were successfully cleared. As expected, when I tested Case 1, cleanup was done immediately. In Case 2, I had to wait for ACS to identify that the VM was stopped and the stats collector try to collect new metric stats.

@joseflauzino
Copy link
Contributor Author

joseflauzino commented Oct 28, 2021

@sureshanaparti and @DaanHoogland
Done.
Thanks for the suggestions!
Because removeAll() returns a boolean, I had to break into two lines.

@joseflauzino joseflauzino force-pushed the fix-metrics-stats-for-vms-not-running branch from fc07f07 to a78af60 Compare October 28, 2021 12:36
@DaanHoogland
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1646

@sureshanaparti
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-2502)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31865 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5633-t2502-kvm-centos7.zip
Smoke tests completed. 91 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

*/
protected void cleanUpVirtualMachineStats() {
List<Long> allRunningVmIds = new ArrayList<Long>();
for (UserVmVO vm : _userVmDao.listAllRunning()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VM stats are captured for running VMs only

List<UserVmVO> vms = _userVmDao.listRunningByHostId(host.getId());

removeVirtualMachineStats() removes the vm stat if the VM is stopped/destroyed after adding to the stats, so no need for this cleanup method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @sureshanaparti!
The cleanUpVirtualMachineStats() method is important to maintain consistency of the metric stats shown by ACS even when there are cases of VMs changing state unexpectedly (e.g., when a VM crashes; or when the VM is stopped directly by the hypervisor interface, etc). In these cases the removeVirtualMachineStats() method is not executed, so the cleanUpVirtualMachineStats() method does its job.

@joseflauzino
Copy link
Contributor Author

Any update about this PR?

@GutoVeronezi GutoVeronezi added this to the 4.17.0.0 milestone Nov 17, 2021
@DaanHoogland
Copy link
Contributor

code looks good, let's retest
@blueorangutan package

@blueorangutan
Copy link

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✖️ el7 ✖️ el8 ✔️ debian ✖️ suse15. SL-JID 1734

@DaanHoogland
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✖️ el8 ✔️ debian ✖️ suse15. SL-JID 1739

@joseflauzino
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@joseflauzino a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1749

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-2572)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31220 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5633-t2572-kvm-centos7.zip
Smoke tests completed. 91 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@GutoVeronezi GutoVeronezi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaanHoogland can we merge this?

@DaanHoogland
Copy link
Contributor

@GutoVeronezi yes, I think @sureshanaparti's question was answered to satisfaction as well. let's merge.

@sureshanaparti
Copy link
Contributor

@joseflauzino Is this good to go in 4.16.1? If so, please rebase with '4.16'.

@joseflauzino
Copy link
Contributor Author

@sureshanaparti for me we can keep it in the main (4.17).

@GutoVeronezi
Copy link
Contributor

Merging this based on 2 LGTMs and package and test results.

@GutoVeronezi GutoVeronezi merged commit 28385be into apache:main Dec 6, 2021
mlsorensen pushed a commit to mlsorensen/cloudstack that referenced this pull request Dec 20, 2021
* Fix metrics stats for VMs that are not running

* Improves the way to get vmIdsToRemoveStats

* Improves test

Co-authored-by: José Flauzino <jose@scclouds.com.br>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants