Skip to content

Conversation

@Spaceman1984
Copy link
Contributor

@Spaceman1984 Spaceman1984 commented Nov 5, 2020

Description

When trying to start an instance with dynamic scaling on a Xenserver Hypervisor with less than 16 cores, the VM fails to start. This is due to the default max value being set to 16. This PR limits the max vCPUs to what the host is capable of.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

Set enable.dynamic.scale.vm to true
Try to start an instance on a Xenserver host with less than 16 cores.

@Spaceman1984
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@Spaceman1984 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2336

@Spaceman1984 Spaceman1984 reopened this Nov 6, 2020
@Spaceman1984
Copy link
Contributor Author

@blueorangutan test centos7 xcpng76

@blueorangutan
Copy link

@Spaceman1984 a Trillian-Jenkins test job (centos7 mgmt + xcpng76) has been kicked to run smoke tests

to.setVcpuMaxLimit(MaxNumberOfVCPUSPerVM.valueIn(host.getClusterId()));
// Max cpu cannot be more than host capability
if (host.getCpus() < MaxNumberOfVCPUSPerVM.valueIn(host.getClusterId()))
to.setVcpuMaxLimit(host.getCpus());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think host.getCpus() get the actual CPU on the physical host, but xenserver host is capable of allocating virtual CPUs more than the actual count.
Here is the list of configuration limits for XS 7.1 version. https://docs.citrix.com/en-us/xenserver/7-1/system-requirements/configuration-limits.html
It says "Virtual CPUs per VM (Linux)" 32. So we might need to set the value according to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with a host with 8 CPUs and it failed trying to allocate 16 on XCP-ng 7.6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do some testing on Xenserver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested XCP-NG 7.6 and Xenserver 7.6, I was unable to start a VM with more vCPUs than pCPUs.
XCP-NG
Xenserver

Copy link
Contributor Author

@Spaceman1984 Spaceman1984 Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like overcommitting CPU on Xenserver was removed according to this article @harikrishna-patnala https://support.citrix.com/article/CTX236977

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Spaceman1984 got it, thanks for the article and yes it seems over committing CPU is disabled.
Do you think we have to also provide some global config "vcpu-unrestricted" to allow overcommit of CPU like xenserver does this way "xe vm-param-set uuid= platform:vcpu-unrestricted=true". This gives admins to choose whether to overcommit or not.

Otherwise If we have to go with by setting only actual pCPUs, we need to remove this configuration parameter "xen.vm.vcpu.max" as we dont need it now.

Copy link
Contributor Author

@Spaceman1984 Spaceman1984 Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the global var in to allow things like if you have a host with 32 pCPUs but want to set a lower max vCPU limit for a VM for the purpose of dynamic scaling.
As for the ability to force overcommit - This is not recommended in the article and I would think not best practice, that's why I would vote no.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay LGTM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put a log message which explans why we are setting that value, otherwise user may think why a different value is set other than "xen.vm.vcpu.max"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, awesome, on it.

@yadvr
Copy link
Member

yadvr commented Nov 6, 2020

@blueorangutan package

@yadvr yadvr requested review from DaanHoogland and yadvr November 6, 2020 09:09
@yadvr yadvr added this to the 4.15.0.0 milestone Nov 6, 2020
@yadvr
Copy link
Member

yadvr commented Nov 6, 2020

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2337

@Spaceman1984
Copy link
Contributor Author

@blueorangutan test centos7 xcpng76

@blueorangutan
Copy link

@Spaceman1984 a Trillian-Jenkins test job (centos7 mgmt + xcpng76) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3130)
Environment: xcpng76 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33232 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4447-t3130-xcpng76.zip
Intermittent failure detected: /marvin/tests/smoke/test_scale_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_usage.py
Smoke tests completed. 85 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_scale_vm Failure 10.39 test_scale_vm.py

@DaanHoogland
Copy link
Contributor

looking by the discussion here, we should move this to 4.16 and make a good functional definition of what is and isn't possible on different xen platforms.

@blueorangutan
Copy link

Trillian test result (tid-3131)
Environment: xcpng76 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33933 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4447-t3131-xcpng76.zip
Intermittent failure detected: /marvin/tests/smoke/test_router_dhcphosts.py
Intermittent failure detected: /marvin/tests/smoke/test_scale_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 85 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_scale_vm Failure 9.32 test_scale_vm.py

@DaanHoogland
Copy link
Contributor

looking by the discussion here, we should move this to 4.16 and make a good functional definition of what is and isn't possible on different xen platforms.

ping @Spaceman1984 @harikrishna-patnala . cc @rhtyd

@Spaceman1984
Copy link
Contributor Author

@DaanHoogland I have determined this PR is related to #4443, I will merge this branch into that one and then close this PR.

@harikrishna-patnala
Copy link
Member

looking by the discussion here, we should move this to 4.16 and make a good functional definition of what is and isn't possible on different xen platforms.

ping @Spaceman1984 @harikrishna-patnala . cc @rhtyd

@DaanHoogland agreed to keep it for 4.16 since this makes a substantial change in the existing flow or values assigned.

@DaanHoogland DaanHoogland modified the milestones: 4.15.0.0, 4.16.0.0 Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants