[XenServer/XCP-ng] Pass the image store NFS version on storage commands #5886

Merged
7 commits merged into apache:4.16 on Jan 31, 2022

Contributor

@nvazquez nvazquez commented Jan 21, 2022

Description

This PR fixes issues with mounting secondary storage from XCP hosts by controlling which NFS version is used.

  • The NFS version is read from the image store's secstorage.nfs.version setting; if that is not set, the global value is used
  • Mount commands append the NFS version to the mount options, for example: mount -o vers=4...
  • Whenever the value is changed, the management server must be restarted for the new NFS version to take effect
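
The resolution order and the mount option handling described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the class `NfsMountOptionsSketch` and its methods `resolveNfsVersion` and `buildMountOptions` are hypothetical names, and treating blank values as "not set" is an assumption. Only the `secstorage.nfs.version` setting and the `vers=` mount option come from the description.

```java
import java.util.Optional;

final class NfsMountOptionsSketch {

    // Hypothetical helper: prefer the image store's value, otherwise the global one.
    // Null or blank strings are treated as "not set" (an assumption of this sketch).
    static Optional<String> resolveNfsVersion(String imageStoreValue, String globalValue) {
        if (isSet(imageStoreValue)) {
            return Optional.of(imageStoreValue.trim());
        }
        if (isSet(globalValue)) {
            return Optional.of(globalValue.trim());
        }
        return Optional.empty();
    }

    // Appends vers=<version> to the existing mount options when a version is known;
    // otherwise the options are returned unchanged.
    static String buildMountOptions(String baseOptions, Optional<String> nfsVersion) {
        String base = baseOptions == null ? "" : baseOptions.trim();
        if (!nfsVersion.isPresent()) {
            return base;
        }
        String versionOption = "vers=" + nfsVersion.get();
        return base.isEmpty() ? versionOption : base + "," + versionOption;
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // The store-level value wins over the global value.
        System.out.println(buildMountOptions("ro", resolveNfsVersion("3", "4")));
        // Nothing configured: the options are left untouched.
        System.out.println(buildMountOptions("ro", resolveNfsVersion("", null)));
    }
}
```

Leaving the options untouched when neither value is set matches the third test scenario in this PR, where both settings are empty and storage operations still succeed.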

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tested with an NFS 3 server as the image store:

  • Deploy the environment
  • Set image store 'secstorage.nfs.version' = 4
  • Restart the management server
  • Deploy VM --> Failure, as secondary storage could not be mounted on the host
  • Set image store 'secstorage.nfs.version' = 3
  • Restart the management server
  • Deploy VM --> Success
  • Set image store 'secstorage.nfs.version' = '' and global 'secstorage.nfs.version' = ''
  • Restart the management server
  • Take backups, snapshots, attach ISO --> Success

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2271

@nvazquez
Contributor Author

@blueorangutan test centos7 xcpng82

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2938)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7
Total time taken: 25874 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5886-t2938-xcpng82.zip
Smoke tests completed. 69 look OK, 23 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestLBRuleUsage>:setup Error 19.48 test_usage.py
ContextSuite context=TestNatRuleUsage>:setup Error 21.45 test_usage.py
ContextSuite context=TestPublicIPUsage>:setup Error 23.50 test_usage.py
ContextSuite context=TestSnapshotUsage>:setup Error 25.47 test_usage.py
ContextSuite context=TestVmUsage>:setup Error 35.94 test_usage.py
ContextSuite context=TestVolumeUsage>:setup Error 37.98 test_usage.py
ContextSuite context=TestVpnUsage>:setup Error 39.94 test_usage.py
test_list_vms_metrics Error 1.35 test_metrics_api.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Error 8.75 test_vpc_redundant.py
test_02_redundant_VPC_default_routes Error 7.73 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Error 8.19 test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics Error 7.72 test_vpc_redundant.py
test_05_rvpc_multi_tiers Error 8.80 test_vpc_redundant.py
ContextSuite context=TestRouterDHCPHosts>:setup Error 0.00 test_router_dhcphosts.py
ContextSuite context=TestRouterDHCPOpts>:setup Error 0.00 test_router_dhcphosts.py
ContextSuite context=TestRouterDns>:setup Error 0.00 test_router_dns.py
test_02_cancel_host_maintenace_with_migration_jobs Error 87.06 test_host_maintenance.py
test_03_cancel_host_maintenace_with_migration_jobs_failure Error 1.54 test_host_maintenance.py
ContextSuite context=TestRouterDnsService>:setup Error 0.00 test_router_dnsservice.py
ContextSuite context=TestRouterServices>:setup Error 0.00 test_routers.py
test_01_isolate_network_FW_PF_default_routes_egress_true Error 1.35 test_routers_network_ops.py
test_02_isolate_network_FW_PF_default_routes_egress_false Error 1.31 test_routers_network_ops.py
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Error 4.62 test_routers_network_ops.py
test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Error 5.98 test_routers_network_ops.py
test_03_RVR_Network_check_router_state Error 6.01 test_routers_network_ops.py
test_01_deploy_vm_on_specific_host Error 2.49 test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster Error 2.38 test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod Error 2.35 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 1.36 test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod Error 2.35 test_vm_deployment_planner.py
test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 Failure 6.75 test_internal_lb.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Failure 7.82 test_internal_lb.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Failure 7.75 test_internal_lb.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Failure 7.77 test_internal_lb.py
ContextSuite context=TestScaleVm>:setup Error 0.00 test_scale_vm.py
test_01_invalid_upgrade_kubernetes_cluster Failure 4.53 test_kubernetes_clusters.py
test_02_upgrade_kubernetes_cluster Failure 3.31 test_kubernetes_clusters.py
test_03_deploy_and_scale_kubernetes_cluster Failure 4.32 test_kubernetes_clusters.py
test_04_autoscale_kubernetes_cluster Failure 4.32 test_kubernetes_clusters.py
test_05_basic_lifecycle_kubernetes_cluster Failure 3.31 test_kubernetes_clusters.py
test_06_delete_kubernetes_cluster Failure 4.28 test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster Failure 4.25 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 4.23 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 3.24 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 62.32 test_kubernetes_clusters.py
test_01_sys_vm_start Failure 0.10 test_secondary_storage.py
test_01_VPC_nics_after_destroy Error 5.77 test_vpc_router_nics.py
test_02_VPC_default_routes Error 5.74 test_vpc_router_nics.py
ContextSuite context=TestListIdsParams>:setup Error 0.00 test_list_ids_parameter.py
test_delete_account Error 16.91 test_network.py
test_delete_network_while_vm_on_it Error 2.32 test_network.py
test_delete_network_while_vm_on_it Error 2.32 test_network.py
test_deploy_vm_l2network Error 2.35 test_network.py
test_deploy_vm_l2network Error 2.35 test_network.py
test_l2network_restart Error 3.42 test_network.py
test_l2network_restart Error 3.42 test_network.py
ContextSuite context=TestL2Networks>:teardown Error 4.52 test_network.py
ContextSuite context=TestPortForwarding>:setup Error 6.25 test_network.py
ContextSuite context=TestPublicIP>:setup Error 2.35 test_network.py
test_reboot_router Error 1.75 test_network.py
test_releaseIP Error 1.73 test_network.py
ContextSuite context=TestRouterRules>:setup Error 3.45 test_network.py
test_01_deployVMInSharedNetwork Failure 1.17 test_network.py
test_02_verifyRouterIpAfterNetworkRestart Failure 1.07 test_network.py
test_03_destroySharedNetwork Failure 1.07 test_network.py
ContextSuite context=TestSharedNetwork>:teardown Error 2.17 test_network.py
ContextSuite context=TestDeployVM>:setup Error 0.00 test_vm_life_cycle.py
ContextSuite context=TestVMLifeCycle>:setup Error 1.74 test_vm_life_cycle.py
test_network_acl Error 5.29 test_network_acl.py
test_01_nic Error 65.02 test_nic.py
ContextSuite context=TestServiceOfferings>:setup Error 17.01 test_service_offerings.py
ContextSuite context=TestSnapshotRootDisk>:setup Error 0.00 test_snapshots.py
test_02_routervm_iptables_policies Error 1.25 test_routers_iptables_default_policy.py
test_01_single_VPC_iptables_policies Error 5.34 test_routers_iptables_default_policy.py

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2276

@nvazquez
Contributor Author

@blueorangutan test centos7 xcpng82

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2945)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42279 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5886-t2945-xcpng82.zip
Smoke tests completed. 91 look OK, 1 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_sys_vm_start Failure 0.10 test_secondary_storage.py

Contributor

@sureshanaparti sureshanaparti left a comment

code LGTM

private ImageStoreDao imageStoreDao;

protected void setSecondaryStorageNfsVersionToParams(Long zoneId, Map<String, Object> params) {
    ImageStoreVO imageStoreInZone = imageStoreDao.findOneByZoneAndProtocol(zoneId, "nfs");
Contributor

@nvazquez Are multiple image stores with different nfs versions supported in a zone?

Contributor Author

It is possible, since the 'secstorage.nfs.version' setting has scope = ImageStore.

Contributor

OK @nvazquez, will there be any issues if the NFS version is set based on one image store when multiple image stores (with different versions) are added?

Contributor Author

Potentially yes. Should we pick the lowest value set in this case? cc @DaanHoogland
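
For illustration, the idea raised here (picking the lowest configured value when image stores in a zone disagree) could look roughly like the sketch below. This thread does not say whether the merged change does this; the class `LowestNfsVersionSketch` and the method `lowestVersion` are hypothetical, and the sketch assumes every configured value is numeric, such as "3", "4" or "4.1".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class LowestNfsVersionSketch {

    // Hypothetical helper: returns the numerically lowest configured version,
    // ignoring stores where the setting is null or blank.
    // Assumes all remaining values parse as numbers; anything else would throw
    // NumberFormatException in this sketch.
    static Optional<String> lowestVersion(List<String> configuredVersions) {
        return configuredVersions.stream()
                .filter(v -> v != null && !v.trim().isEmpty())
                .map(String::trim)
                .min(Comparator.comparingDouble((String v) -> Double.parseDouble(v)));
    }

    public static void main(String[] args) {
        // Three image stores in one zone with different settings.
        System.out.println(lowestVersion(Arrays.asList("4", "3", "4.1")));
    }
}
```

An empty result would mean that no store in the zone has a value set, in which case the global value would presumably apply.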

Contributor

@nvazquez @sureshanaparti I do not see a problem here. Unless I am looking at the code too much in isolation and should be looking at the bigger picture, what happens here is:

a store is picked
if a store is found
- the NFS version for that store is retrieved and used
if not
- the default NFS version is used

The only thing that could go wrong here is that the operator has not set the right NFS versions on the system/stores.
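
The flow outlined in this comment can be sketched as below. This is a simplified stand-in, not the method under review: the two lookup functions replace the DAO and the configuration lookup, and the parameter key `nfsVersion` is a hypothetical name. The fallback to the default when a store has no value of its own follows the PR description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class NfsVersionParamsSketch {

    // Hypothetical parameter key; the real key name is not shown in this thread.
    static final String NFS_VERSION_PARAM = "nfsVersion";

    // findNfsStoreIdInZone: stand-in for the DAO lookup, returns null when no NFS store exists.
    // nfsVersionOfStore: stand-in for reading the store-scoped setting, may return null or blank.
    static void setNfsVersionParam(Long zoneId,
                                   Map<String, Object> params,
                                   Function<Long, Long> findNfsStoreIdInZone,
                                   Function<Long, String> nfsVersionOfStore,
                                   String defaultNfsVersion) {
        Long storeId = findNfsStoreIdInZone.apply(zoneId);
        // If a store is found, use its version; if not, use the default.
        String version = storeId != null ? nfsVersionOfStore.apply(storeId) : defaultNfsVersion;
        // A store without its own value also falls back to the default.
        if (version == null || version.trim().isEmpty()) {
            version = defaultNfsVersion;
        }
        // Only pass the parameter along when some version is actually configured.
        if (version != null && !version.trim().isEmpty()) {
            params.put(NFS_VERSION_PARAM, version.trim());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        setNfsVersionParam(1L, params, zone -> 10L, store -> "3", "4");
        System.out.println(params);
    }
}
```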

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@nvazquez nvazquez changed the title Pass image store NFS version to Xen/XCP hosts [XenServer/XCP-ng] Pass the image store NFS version on storage commands Jan 26, 2022
@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2308

@sureshanaparti
Contributor

@blueorangutan test centos7 xcpng82

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests

Contributor

@borisstoyanov borisstoyanov left a comment

LGTM, verified manually the latest changes

@blueorangutan

Trillian test result (tid-3012)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7
Total time taken: 44380 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5886-t3012-xcpng82.zip
Smoke tests completed. 87 look OK, 5 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestCreateVolume>:setup Error 0.00 test_volumes.py
ContextSuite context=TestVolumes>:setup Error 0.00 test_volumes.py
ContextSuite context=TestIsolatedNetworksPasswdServer>:setup Error 0.00 test_password_server.py
test_07_deploy_vm_with_extraconfig_xenserver Error 2.29 test_deploy_vm_extra_config_data.py
test_01_isolated_persistent_network Error 2.74 test_persistent_network.py
test_03_deploy_and_destroy_VM_and_verify_network_resources_persist Failure 4.92 test_persistent_network.py
test_03_deploy_and_destroy_VM_and_verify_network_resources_persist Error 4.93 test_persistent_network.py
ContextSuite context=TestL2PersistentNetworks>:teardown Error 4.97 test_persistent_network.py
test_01_sys_vm_start Failure 0.09 test_secondary_storage.py

@blueorangutan

Trillian test result (tid-3015)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43832 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5886-t3015-xcpng82.zip
Smoke tests completed. 91 look OK, 1 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_sys_vm_start Failure 0.13 test_secondary_storage.py

@sureshanaparti
Contributor

@blueorangutan test centos7 xcpng82

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3029)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7
Total time taken: 44272 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5886-t3029-xcpng82.zip
Smoke tests completed. 89 look OK, 3 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_volume_usage Error 98.96 test_usage.py
test_01_sys_vm_start Failure 0.10 test_secondary_storage.py
test_01_nic Error 251.29 test_nic.py

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm, needs extensive testing though

@sureshanaparti
Contributor

@blueorangutan package

@blueorangutan

@sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2357

@sureshanaparti
Contributor

@blueorangutan test centos7 xcpng82

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3047)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47956 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5886-t3047-xcpng82.zip
Smoke tests completed. 91 look OK, 1 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_sys_vm_start Failure 0.17 test_secondary_storage.py

@nvazquez
Contributor Author

@sureshanaparti it seems the same test has failed consistently across the previous runs; I’ll investigate it

@sureshanaparti
Contributor

sureshanaparti commented Jan 31, 2022

@sureshanaparti it seems the same test has failed consistently across the previous runs; I’ll investigate it

@nvazquez there is an issue connecting to the hosts, which seems to be an environment issue. I noticed this in other PR tests as well (it was not observed in the latest health check runs).

@sureshanaparti sureshanaparti merged commit 3e92a63 into apache:4.16 Jan 31, 2022