
Add managed storage pool constraints to MigrateWithVolume API method #2761

Merged
merged 11 commits into apache:master from rafaelweingartner:fixMigrateWithVolumeMethod on Sep 10, 2018

Conversation

@rafaelweingartner
Member

rafaelweingartner commented Jul 23, 2018

Description

Mike discovered that when PR #2425 was merged, it introduced a bug with managed storage. The following constraints needed to be added to ACS (see the sketch after this list):

  • If a VM is migrated across clusters and at least one of its volumes is placed in cluster-wide managed storage, the migration is not allowed. On the other hand, if the volume is placed in zone-wide managed storage, the migration can be executed;
  • A volume placed in managed storage can never (at least not via this migrateWithVolume method) be migrated out of the storage pool in which it resides;
  • When migrating a VM that does not have volumes in managed storage, it should be possible to migrate it across clusters. Therefore, we should use the storage pool allocators to find a suitable storage pool for its volumes in the target cluster.
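A minimal sketch of how the first constraint translates into a check. This is illustrative only, not the code merged in this PR; it assumes CloudStack-style accessors (StoragePoolVO.isManaged(), getScope(), getClusterId()) and a hypothetical helper name:

    // Illustrative only: reject migrations that would move a volume off cluster-wide
    // managed storage, while still allowing zone-wide managed storage from any cluster.
    protected void validateManagedStorageMigration(StoragePoolVO currentPool, Host targetHost, Volume volume) {
        if (!currentPool.isManaged()) {
            return; // non-managed volumes are handled by the storage pool allocators
        }
        if (ScopeType.ZONE.equals(currentPool.getScope())) {
            return; // zone-wide managed storage is accessible from any cluster
        }
        // Cluster-wide managed storage: the volume must stay where it is, so the target
        // host has to be in the same cluster as the volume's current storage pool.
        if (!currentPool.getClusterId().equals(targetHost.getClusterId())) {
            throw new CloudRuntimeException(String.format(
                    "Cannot migrate volume [%s] out of cluster-wide managed storage [%s].",
                    volume.getUuid(), currentPool.getUuid()));
        }
    }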

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

GitHub Issue/PRs

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
    Testing
  • I have added tests to cover my changes.
  • All relevant new and existing integration tests have passed.
  • A full integration testsuite with all test that can run on my environment has passed.

@rafaelweingartner rafaelweingartner self-assigned this Jul 23, 2018

@rafaelweingartner rafaelweingartner added this to the 4.12.0.0 milestone Jul 23, 2018

@borisstoyanov

Contributor

borisstoyanov commented Jul 24, 2018

@blueorangutan package

@blueorangutan


blueorangutan commented Jul 24, 2018

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan


blueorangutan commented Jul 24, 2018

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2203

@borisstoyanov

Contributor

borisstoyanov commented Jul 24, 2018

@blueorangutan


blueorangutan commented Jul 24, 2018

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan


blueorangutan commented Jul 24, 2018

Trillian test result (tid-2884)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 22310 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2761-t2884-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_primary_storage.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_snapshots.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 61 look OK, 7 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestDeployVirtioSCSIVM>:setup Error 0.00 test_deploy_virtio_scsi_vm.py
test_01_add_primary_storage_disabled_host Error 0.52 test_primary_storage.py
test_01_primary_storage_nfs Error 0.10 test_primary_storage.py
ContextSuite context=TestStorageTags>:setup Error 0.18 test_primary_storage.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 1138.20 test_privategw_acl.py
test_02_list_snapshots_with_removed_data_store Error 1.14 test_snapshots.py
test_01_secure_vm_migration Error 131.76 test_vm_life_cycle.py
test_02_unsecure_vm_migration Error 131.77 test_vm_life_cycle.py
test_03_secured_to_nonsecured_vm_migration Error 132.81 test_vm_life_cycle.py
test_04_nonsecured_to_secured_vm_migration Error 132.79 test_vm_life_cycle.py
test_08_migrate_vm Error 14.67 test_vm_life_cycle.py
test_01_cancel_host_maintenace_with_no_migration_jobs Failure 0.09 test_host_maintenance.py
test_02_cancel_host_maintenace_with_migration_jobs Error 1.23 test_host_maintenance.py
test_hostha_enable_ha_when_host_in_maintenance Error 2.44 test_hostha_kvm.py
@borisstoyanov

Contributor

borisstoyanov commented Jul 25, 2018

@blueorangutan


blueorangutan commented Jul 25, 2018

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan


blueorangutan commented Jul 25, 2018

Trillian test result (tid-2886)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 26964 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2761-t2886-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 65 look OK, 3 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestDeployVirtioSCSIVM>:setup Error 0.00 test_deploy_virtio_scsi_vm.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 1099.73 test_privategw_acl.py
test_01_secure_vm_migration Error 131.70 test_vm_life_cycle.py
test_02_unsecure_vm_migration Error 132.65 test_vm_life_cycle.py
test_03_secured_to_nonsecured_vm_migration Error 131.64 test_vm_life_cycle.py
test_04_nonsecured_to_secured_vm_migration Error 132.73 test_vm_life_cycle.py
    if (!currentPool.isManaged()) {
        return;
    }
    if (currentPool.getClusterId() == targetHost.getClusterId()) {


@mike-tutkowski

mike-tutkowski Jul 25, 2018

Member

Since currentPool.getClusterId() can be null (if the storage pool is zone wide), you might want to reverse these if statements (like this):

    if (ScopeType.ZONE.equals(currentPool.getScope())) {
        return;
    }
    if (currentPool.getClusterId() == targetHost.getClusterId()) {
        return;
    }

Also, since getClusterId() for currentPool and targetHost both return an Integer (not an int), you might want to perform the compare like this:

    if (targetHost.getClusterId().equals(currentPool.getClusterId())) {
        return;
    }
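For reference, a small standalone example (not from this PR) of why comparing boxed Long IDs with == is fragile: auto-boxing only guarantees caching for values between -128 and 127, so == falls back to a reference comparison for typical database IDs, and Objects.equals() additionally tolerates null (such as the missing cluster ID of a zone-wide pool):

    import java.util.Objects;

    public class BoxedLongCompare {
        public static void main(String[] args) {
            Long currentClusterId = 1000L;
            Long targetClusterId = 1000L;
            System.out.println(currentClusterId == targetClusterId);      // typically false: compares references
            System.out.println(currentClusterId.equals(targetClusterId)); // true: compares values
            System.out.println(Objects.equals(null, targetClusterId));    // false, and null-safe (no NPE)
        }
    }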


@rafaelweingartner

rafaelweingartner Jul 25, 2018

Author Member

Ok, got the first issue. If it is a zone wide storage, we do not have a cluster ID. I fixed it now.

I just do not understand why we need the equals method though. Even though both currentPool.getClusterId() and targetHost.getClusterId() return a Long object, because of Java auto boxing, everything is going to work just fine.

 * </ul>
 *
 */
-private void createVolumeToStoragePoolMappingIfNeeded(VirtualMachineProfile profile, Host targetHost, Map<Volume, StoragePool> volumeToPoolObjectMap, VolumeVO volume, StoragePoolVO currentPool) {
+protected void createVolumeToStoragePoolMappingIfNeeded(VirtualMachineProfile profile, Host targetHost, Map<Volume, StoragePool> volumeToPoolObjectMap, VolumeVO volume,
+        StoragePoolVO currentPool) {


@mike-tutkowski

mike-tutkowski Jul 25, 2018

Member

Can you do a search on "avaliable" and change it to "available"? Thanks (I found one occurrence).


@rafaelweingartner

rafaelweingartner Jul 25, 2018

Author Member

Fixed!

 * </ul>
 *
 */
-private void createVolumeToStoragePoolMappingIfNeeded(VirtualMachineProfile profile, Host targetHost, Map<Volume, StoragePool> volumeToPoolObjectMap, VolumeVO volume, StoragePoolVO currentPool) {
+protected void createVolumeToStoragePoolMappingIfNeeded(VirtualMachineProfile profile, Host targetHost, Map<Volume, StoragePool> volumeToPoolObjectMap, VolumeVO volume,


@mike-tutkowski

mike-tutkowski Jul 25, 2018

Member

Perhaps canTargetHostAccessVolumeStoragePool should be called canTargetHostAccessVolumeCurrentStoragePool. Not a big deal, but when I think about it this way, it makes it clearer to me that we are going through this list of candidate storage pools in an attempt to see if one of them is the current storage pool.

    if (_poolHostDao.findByPoolHost(targetPool.getId(), targetHost.getId()) == null) {
        throw new CloudRuntimeException(
                String.format("Cannot migrate the volume [%s] to the storage pool [%s] while migrating VM [%s] to target host [%s]. The host does not have access to the storage pool entered.",
                        volume.getUuid(), targetPool.getUuid(), profile.getUuid(), targetHost.getUuid()));
    }
    if (currentPool.getId() == targetPool.getId()) {


@mike-tutkowski

mike-tutkowski Jul 25, 2018

Member
        if (currentPool.getId() == targetPool.getId()) {
            s_logger.info(String.format("The volume [%s] is already allocated in storage pool [%s].", volume.getUuid(), targetPool.getUuid()));
        }
        volumeToPoolObjectMap.put(volume, targetPool);

The old code didn't associate the volume to the targetPool via the volumeToPoolObjectMap map if the targetPool was the same as the currentPool.


@rafaelweingartner

rafaelweingartner Jul 25, 2018

Author Member

You mean, I should not add an entry to the map if the volume is already in the "targetPool"?

I did it this way because of XenServer, for VMs with more than one volume. If one volume is in shared storage and another in local storage, we need to send Xen the complete “volume-storagePool” map; otherwise, the migration will not work. Since I am mapping the volume to its current storage pool, I expect the other hypervisors to simply ignore the entry.
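A rough sketch of the mapping behavior described above (the helper names needsToBeMigrated and findTargetPool are assumptions used only for illustration, not methods from this PR):

    // Illustrative only: every volume ends up in the map; volumes that stay put are
    // mapped to their current pool so XenServer receives the complete picture, while
    // other hypervisors can ignore such "no-op" entries.
    protected Map<Volume, StoragePool> buildCompleteVolumeToPoolMap(List<VolumeVO> volumes, Host targetHost) {
        Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<>();
        for (VolumeVO volume : volumes) {
            StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
            if (needsToBeMigrated(volume, currentPool, targetHost)) {
                volumeToPoolObjectMap.put(volume, findTargetPool(volume, targetHost));
            } else {
                volumeToPoolObjectMap.put(volume, currentPool);
            }
        }
        return volumeToPoolObjectMap;
    }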


@mike-tutkowski

mike-tutkowski Jul 26, 2018

Member

Yeah, in the old code, the volume was not added to the map (even for XenServer) unless it needed to be migrated to a new storage pool. It sounds like you are saying that logic was incorrect, though?


@rafaelweingartner

rafaelweingartner Jul 26, 2018

Author Member

Yes. It was not working with XenServer before because XenServer requires all disks to be mapped, even the ones that will not be migrated (e.g. disks in shared storage).

@@ -2282,7 +2282,7 @@ protected void migrate(final VMInstanceVO vm, final long srcHostId, final Deploy
 * Create the mapping of volumes and storage pools. If the user did not enter a mapping on her/his own, we create one using {@link #getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile, Host)}.
 * If the user provided a mapping, we use whatever the user has provided (check the method {@link #createMappingVolumeAndStoragePoolEnteredByUser(VirtualMachineProfile, Host, Map)}).
 */
-private Map<Volume, StoragePool> getPoolListForVolumesForMigration(VirtualMachineProfile profile, Host targetHost, Map<Long, Long> volumeToPool) {
+protected Map<Volume, StoragePool> getPoolListForVolumesForMigration(VirtualMachineProfile profile, Host targetHost, Map<Long, Long> volumeToPool) {


@mike-tutkowski

mike-tutkowski Jul 25, 2018

Member

The old code used to walk through each volume. It would see if the volume was specified in the volumeToPool mapping (provided by the user). If it was, it would verify that the target storage pool could, in fact, be used. If the specified target storage pool could not be used, an exception was thrown.

If the volume in question was not specified in the volumeToPool mapping, the old code would look to see if the volume could stay on the same storage pool or if it had to be migrated to a new storage pool. The new code does not perform this function. The new code only tries to migrate volumes that were specified by the user in the provided mapping. It could be that the user wants to specify that volume X gets migrated to storage pool Y, but that the user doesn't care where volume A gets migrated to (he/she is letting CloudStack decide on that one). The new code would just ignore the migration of volume A (which may not be acceptable when a VM is being migrated to a new cluster).

Instead of making an empty mapping be a special use case (the if statement in the new code's getPoolListForVolumesForMigration method), I would recommend an approach more similar to the old code: just iterate over each volume of the VM to migrate. Check the user-provided mapping for a target storage pool. If specified, make sure it can be used by the target host. If not specified, use the storage pool allocators to try to find a match. If the current storage pool is one of the matches, then there is nothing to migrate for this volume. If the volume is specified in the map and its target pool is the same as its current pool, make sure the target host can see that pool. If so, there is nothing to migrate for this volume; else, throw an exception.
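That recommendation corresponds roughly to a loop like the one below. This is a sketch, not the code that was eventually merged; _volsDao, _storagePoolDao and getCandidateStoragePools stand in for the corresponding DAOs and allocator lookup, and only _poolHostDao.findByPoolHost appears verbatim in the diff above:

    // Illustrative only: per-volume decision flow for building the volume -> pool map.
    for (VolumeVO volume : _volsDao.findUsableVolumesForInstance(profile.getId())) {
        StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
        Long requestedPoolId = volumeToPool.get(volume.getId());
        if (requestedPoolId != null) {
            StoragePoolVO targetPool = _storagePoolDao.findById(requestedPoolId);
            // The user picked a pool: the target host must be able to use it.
            if (_poolHostDao.findByPoolHost(targetPool.getId(), targetHost.getId()) == null) {
                throw new CloudRuntimeException("The target host cannot access the requested storage pool.");
            }
            volumeToPoolObjectMap.put(volume, targetPool);
        } else {
            // Nothing specified: let the storage pool allocators pick a candidate.
            // If the current pool is among the candidates, the volume stays where it is.
            List<StoragePool> candidates = getCandidateStoragePools(profile, targetHost, volume);
            if (candidates.isEmpty()) {
                throw new CloudRuntimeException("No suitable storage pool found for volume " + volume.getUuid());
            }
            volumeToPoolObjectMap.put(volume, candidates.contains(currentPool) ? currentPool : candidates.get(0));
        }
    }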


@rafaelweingartner

rafaelweingartner Jul 25, 2018

Author Member

Ok, so to summarize: if the VM has two volumes, you want to be able to define the migration for one of them, and the other one, which was not specified, should be taken care of by ACS. Is that it?


@mike-tutkowski

mike-tutkowski Jul 25, 2018

Member

If you examine the original getPoolListForVolumesForMigration method, you can see the loop I'm referring to.


@rafaelweingartner

rafaelweingartner Jul 25, 2018

Author Member

Yes, I remember the loop, but none of these requirements were documented anywhere. I will document them as well.

@mike-tutkowski

Member

mike-tutkowski commented Jul 25, 2018

Technically, this PR is about two issues:

  1. Non-Managed Storage: Migrating a VM across compute clusters (supported at least with XenServer) was no longer possible. If, say, a virtual disk resides on shared storage in the source compute cluster, we must be able to copy this virtual disk to shared storage in the destination compute cluster.

  2. Managed Storage: There is currently a constraint with zone-wide managed storage. This was not being honored by the new code.

@rafaelweingartner

Member Author

rafaelweingartner commented Jul 25, 2018

@mike-tutkowski you are right, this PR is fixing two problems.

I just pushed a commit applying your requests.



    executeManagedStorageChecks(targetHost, currentPool, volume);
    if (ScopeType.HOST.equals(currentPool.getScope()) || isStorageCrossClusterMigration(targetHost, currentPool)) {
        createVolumeToStoragePoolMappingIfPossible(profile, targetHost, volumeToPoolObjectMap, volume, currentPool);
    } else {
        volumeToPoolObjectMap.put(volume, currentPool);


@mike-tutkowski

mike-tutkowski Jul 26, 2018

Member

Per a previous comment of mine, we should decide if this line should be there. The old code didn't put the volume in the map if its current pool wasn't going to change.


@rafaelweingartner

rafaelweingartner Jul 26, 2018

Author Member

Yes, I explained why I am doing this in my reply to your other comment.

@@ -2367,7 +2437,7 @@ private void createVolumeToStoragePoolMappingIfNeeded(VirtualMachineProfile prof
 /**
  * We use {@link StoragePoolAllocator} objects to find local storage pools connected to the targetHost where we would be able to allocate the given volume.
  */
-private List<StoragePool> getCandidateStoragePoolsToMigrateLocalVolume(VirtualMachineProfile profile, Host targetHost, VolumeVO volume) {
+private List<StoragePool> getCandidateStoragePoolsToMigrateLocalVolume(VirtualMachineProfile profile, Host targetHost, Volume volume) {


@mike-tutkowski

mike-tutkowski Jul 26, 2018

Member

Not sure if we want to use the word "Local" in this method name.


@rafaelweingartner

rafaelweingartner Jul 26, 2018

Author Member

You are right!

@@ -2282,7 +2282,7 @@ protected void migrate(final VMInstanceVO vm, final long srcHostId, final Deploy
 * Create the mapping of volumes and storage pools. If the user did not enter a mapping on her/his own, we create one using {@link #getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile, Host)}.
 * If the user provided a mapping, we use whatever the user has provided (check the method {@link #createMappingVolumeAndStoragePoolEnteredByUser(VirtualMachineProfile, Host, Map)}).
 */
-private Map<Volume, StoragePool> getPoolListForVolumesForMigration(VirtualMachineProfile profile, Host targetHost, Map<Long, Long> volumeToPool) {
+protected Map<Volume, StoragePool> getPoolListForVolumesForMigration(VirtualMachineProfile profile, Host targetHost, Map<Long, Long> volumeToPool) {
     if (MapUtils.isEmpty(volumeToPool)) {


@mike-tutkowski

mike-tutkowski Jul 26, 2018

Member

Do we even need this?

    if (MapUtils.isEmpty(volumeToPool)) {
        return getDefaultMappingOfVolumesAndStoragePoolForMigration(profile, targetHost);
    }

Won't the code work just fine without it? It doesn't seem like we need a special case for the mapping being empty when it gets here.


@rafaelweingartner

rafaelweingartner Jul 26, 2018

Author Member

Now it will. I will make the changes then.

@rafaelweingartner

Member Author

rafaelweingartner commented Jul 26, 2018

@mike-tutkowski I just pushed a new commit with the changes you suggested.

@mike-tutkowski

Member

mike-tutkowski commented Jul 26, 2018

@rafaelweingartner I think we are very close. Take a look at what I did here: mike-tutkowski@07279b4 .

I made a few changes that I felt better reflected what the code used to do around checks for managed storage.

To get it to compile, I made a couple changes and commented out some code in the JUnit test for now.

@rafaelweingartner

Member Author

rafaelweingartner commented Jul 26, 2018

Thanks, Mike. I applied your suggestions. I am only a little confused by the use of executeManagedStorageChecksWhenTargetStoragePoolProvided and executeManagedStorageChecksWhenTargetStoragePoolNotProvided.

Can't I simply use executeManagedStorageChecks in both places?

@mike-tutkowski

Member

mike-tutkowski commented Jul 26, 2018

@rafaelweingartner Let's say the user provided a (target) managed storage pool for the volume in question. If that's the case, then we just want to make sure that this managed storage pool is the same as the storage pool of the volume. We don't care if it's a zone-wide managed storage pool or a cluster-wide managed storage pool in this case (either is acceptable). We just want to make sure that it's the same storage pool. Those checks are captured in executeManagedStorageChecksWhenTargetStoragePoolProvided (which does not care about the zone-wide requirement).

However, if you don't provide a storage pool for the volume in question and it is located on managed storage, then we - in case you are doing a migration of the VM from one cluster to another - need to only make sure the managed storage pool is zone wide. If it turns out that this managed storage pool is cluster wide, then the user needs to explicitly pass the volume/target storage pool in as a parameter to the API command when performing the migration of the VM (with its storage) from one cluster to another.

At least this is how it used to work.

One alternative is to keep a single managed-storage method for checking, but to drop the zone-wide requirement. In fact, I think we can get away with only calling that check method once then.

Take a look at this proposal:

mike-tutkowski@bea1dad
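The two checks described above could look roughly like this. It is a sketch of the behavior Mike describes, not the code from either linked commit; only the method names come from the discussion:

    // Illustrative only. A target pool was explicitly provided for the volume: a managed
    // volume may not change pools, but zone-wide vs. cluster-wide does not matter here.
    protected void executeManagedStorageChecksWhenTargetStoragePoolProvided(StoragePoolVO currentPool, VolumeVO volume, StoragePoolVO targetPool) {
        if (currentPool.isManaged() && currentPool.getId() != targetPool.getId()) {
            throw new CloudRuntimeException(String.format(
                    "Volume [%s] is placed in managed storage [%s] and cannot be moved to another pool.",
                    volume.getUuid(), currentPool.getUuid()));
        }
    }

    // Illustrative only. No target pool was provided: a cross-cluster migration is only
    // possible if the managed pool is zone-wide; otherwise the caller must pass an
    // explicit volume-to-pool mapping to the API command.
    protected void executeManagedStorageChecksWhenTargetStoragePoolNotProvided(Host targetHost, StoragePoolVO currentPool, VolumeVO volume) {
        if (currentPool.isManaged() && !ScopeType.ZONE.equals(currentPool.getScope())
                && !currentPool.getClusterId().equals(targetHost.getClusterId())) {
            throw new CloudRuntimeException(String.format(
                    "Volume [%s] is placed in cluster-wide managed storage [%s]; an explicit volume-to-pool mapping is required to migrate it across clusters.",
                    volume.getUuid(), currentPool.getUuid()));
        }
    }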

@mike-tutkowski

Member

mike-tutkowski commented Jul 27, 2018

By the way, I didn't change the test for mike-tutkowski@bea1dad, so it won't technically compile. I intended it more to give you an idea of where we might want to go with the code here.

@rafaelweingartner

Member Author

rafaelweingartner commented Jul 27, 2018

Ok, no problem, I got the gist.

Those changes created some confusion for me, though.

    if (ScopeType.HOST.equals(currentPool.getScope())) {
        createVolumeToStoragePoolMappingIfNeeded(profile, targetHost, volumeToPoolObjectMap, volume, currentPool);

    executeManagedStorageChecksWhenTargetStoragePoolNotProvided(targetHost, currentPool, volume);


@rafaelweingartner

rafaelweingartner Jul 27, 2018

Author Member

This is the bit where I am confused. Let's say a VM has two volumes: volume A is in managed storage at the cluster level, and volume B is in local storage. When I want to migrate this VM between hosts of the same cluster, this method is triggered.

However, an exception will be raised here, since the VM needs to migrate a volume between the hosts' local storage. Volume B is going to be mapped to local storage at the target host, and volume A can be left where it is. The problem is that this validation will raise an exception for volume A, because it is in cluster-wide managed storage and we are not executing the check between the cluster IDs of the targetHost and the current storage pool.

Do you understand what I am saying?


@mike-tutkowski

mike-tutkowski Jul 27, 2018

Member

Yes - take a look at the new code I proposed: mike-tutkowski@bea1dad .

It does not have any managed-storage verification logic in createStoragePoolMappingsForVolumes anymore.

I did have to add an if statement, though. If the volume is on managed storage, then make sure the target host can see the managed storage in question (if it can't, throw an exception).


@rafaelweingartner

rafaelweingartner Jul 27, 2018

Author Member

So, to make sure the target host can "see" the current managed storage, isn't it a matter of checking whether the target host and the current storage pool of the volume (which is managed storage) are in the same cluster?

Sorry, I feel that we need to discuss first and only then exchange code. Otherwise, it becomes trial and error, which I really dislike.


@mike-tutkowski

mike-tutkowski Jul 27, 2018

Member

If the primary storage is zone wide, then it won't have a cluster ID.


@rafaelweingartner

rafaelweingartner Jul 27, 2018

Author Member

That is exactly what I understood before. What I do not understand is why we need executeManagedStorageChecksWhenTargetStoragePoolNotProvided and executeManagedStorageChecksWhenTargetStoragePoolProvided instead of a single executeManagedStorageChecks.

Can you look at the last commit I pushed? Why would we need to divide that executeManagedStorageChecks into two other methods?

@mike-tutkowski

Member

mike-tutkowski commented Jul 27, 2018

@rafaelweingartner

Member Author

rafaelweingartner commented Jul 27, 2018

Yes, the second link has only one method. The problem with the second suggestion is that it causes the bug I mentioned. I mean, it creates a situation where the migration of VMs with volumes in cluster-wide managed storage is not allowed, even though it should be supported.

Imagine a VM that has two volumes, one placed in local storage and the other in managed storage. `migrateVirtualMachineWithVolume` is called to migrate the VM between hosts of the same cluster. The migration should be supported as long as the target host has local storage to receive one of the volumes; the other volume does not need to be migrated, as it is in cluster-wide managed storage and the target host is in the same cluster as the source host.

@mike-tutkowski

Member

mike-tutkowski commented Jul 27, 2018

@mike-tutkowski

Member

mike-tutkowski commented Jul 27, 2018

OK, I'm back at my computer now. I'm looking at this bit of code in my sample code: mike-tutkowski@bea1dad#diff-a19937c69234222505de0e6bcaad43b9R2370 .

That method is only invoked in one place and only checks the two things I mentioned (if the current pool is managed and, if so, is its ID equal to that of the target pool).

I don't think that will be a problem for the use case you specified.

@rafaelweingartner

Member Author

rafaelweingartner commented Jul 30, 2018

So, what is the state of this PR now? Is there something missing? I mean, do we need to apply some other changes?

@mike-tutkowski

Member

mike-tutkowski commented Jul 31, 2018

Hi @rafaelweingartner Yes, just a couple, though. Here is a diff of my recommended final changes: mike-tutkowski@48fb1ec.

To summarize them:

The executeManagedStorageChecks method is only called from one place. It does not require a check to make sure the managed storage pool is at the zone level. Also, instead of executeManagedStorageChecks checking whether the current pool's cluster ID is equal to the cluster ID of the target host, we just compare the IDs of the current and target pools (because we don't allow you to change the managed storage pool during this operation). I needed to change what's passed into the method to make this work. Comparing the cluster ID of the current pool to the cluster ID of the target host won't work with zone-wide primary storage.

The other area I changed was in the createStoragePoolMappingsForVolumes method. It does not need to call executeManagedStorageChecks. However, we do need to handle managed storage specially here. If the storage pool is managed, make sure the target host can see this storage pool.
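Roughly, the final shape described here is the following (a sketch under the same caveats as above; handleManagedVolumeInMappingCreation is a hypothetical name used only to frame the second point):

    // Illustrative only. (1) Single managed-storage check: compare pool IDs rather than
    // cluster IDs, which also works for zone-wide pools that have no cluster ID.
    protected void executeManagedStorageChecks(StoragePoolVO currentPool, StoragePoolVO targetPool, Volume volume) {
        if (currentPool.isManaged() && currentPool.getId() != targetPool.getId()) {
            throw new CloudRuntimeException(String.format(
                    "Volume [%s] resides on managed storage [%s] and cannot be moved to pool [%s] by this operation.",
                    volume.getUuid(), currentPool.getUuid(), targetPool.getUuid()));
        }
    }

    // Illustrative only. (2) In createStoragePoolMappingsForVolumes there is no managed-storage
    // check; the only requirement for a managed pool is that the target host can see it.
    protected void handleManagedVolumeInMappingCreation(Host targetHost, StoragePoolVO currentPool, Volume volume, Map<Volume, StoragePool> volumeToPoolObjectMap) {
        if (_poolHostDao.findByPoolHost(currentPool.getId(), targetHost.getId()) == null) {
            throw new CloudRuntimeException(String.format(
                    "Target host [%s] has no access to the managed storage pool [%s].",
                    targetHost.getUuid(), currentPool.getUuid()));
        }
        volumeToPoolObjectMap.put(volume, currentPool);
    }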

@rafaelweingartner rafaelweingartner force-pushed the rafaelweingartner:fixMigrateWithVolumeMethod branch from b2890e5 to 79fa1b3 Aug 29, 2018

@rafaelweingartner

Member Author

rafaelweingartner commented Aug 29, 2018

@borisstoyanov

Contributor

borisstoyanov commented Aug 29, 2018

@blueorangutan package

@blueorangutan


blueorangutan commented Aug 29, 2018

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan


blueorangutan commented Aug 29, 2018

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2272

@borisstoyanov

Contributor

borisstoyanov commented Aug 29, 2018

@blueorangutan


blueorangutan commented Aug 29, 2018

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@GabrielBrascher

Member

GabrielBrascher commented Aug 30, 2018

@rafaelweingartner @mike-tutkowski @borisstoyanov sorry, I have been busy. Let's wait for the results after rebasing, then. Some failures seem to be the same as in the test run against PR #2773.

@blueorangutan


blueorangutan commented Aug 30, 2018

Trillian test result (tid-2973)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 26054 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2761-t2973-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_primary_storage.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_snapshots.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 64 look OK, 5 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_add_primary_storage_disabled_host Error 0.65 test_primary_storage.py
test_01_primary_storage_nfs Error 0.16 test_primary_storage.py
ContextSuite context=TestStorageTags>:setup Error 0.27 test_primary_storage.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 1165.84 test_privategw_acl.py
test_02_list_snapshots_with_removed_data_store Error 1.19 test_snapshots.py
test_01_secure_vm_migration Error 74.09 test_vm_life_cycle.py
test_02_unsecure_vm_migration Error 98.57 test_vm_life_cycle.py
test_08_migrate_vm Error 18.96 test_vm_life_cycle.py
test_01_cancel_host_maintenace_with_no_migration_jobs Failure 0.12 test_host_maintenance.py
test_02_cancel_host_maintenace_with_migration_jobs Error 2.35 test_host_maintenance.py
@rafaelweingartner

Member Author

rafaelweingartner commented Sep 1, 2018

@borisstoyanov after the rebase, the errors that @DaanHoogland said were a problem have disappeared, but new ones appeared. Is this some sort of environment problem?

@borisstoyanov

Contributor

borisstoyanov commented Sep 3, 2018

@rafaelweingartner we should not get these errors if that's the latest master. @blueorangutan test

@blueorangutan


blueorangutan commented Sep 3, 2018

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan


blueorangutan commented Sep 3, 2018

Trillian test result (tid-2982)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 30946 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2761-t2982-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_affinity_groups_projects.py
Intermittent failure detected: /marvin/tests/smoke/test_affinity_groups.py
Intermittent failure detected: /marvin/tests/smoke/test_primary_storage.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_public_ip_range.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_usage.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 62 look OK, 7 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_DeployVmAntiAffinityGroup_in_project Error 25.15 test_affinity_groups_projects.py
test_DeployVmAntiAffinityGroup Error 7.67 test_affinity_groups.py
test_01_primary_storage_nfs Error 0.17 test_primary_storage.py
ContextSuite context=TestStorageTags>:setup Error 0.31 test_primary_storage.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 1108.16 test_privategw_acl.py
test_04_extract_template Failure 128.34 test_templates.py
ContextSuite context=TestISOUsage>:setup Error 0.00 test_usage.py
test_06_download_detached_volume Failure 136.67 test_volumes.py
@rafaelweingartner

Member Author

rafaelweingartner commented Sep 3, 2018

@borisstoyanov now the errors have changed again.
These are repeated, but they do not seem to be related to the changes developed here.

test_01_primary_storage_nfs 	Error 	0.16 	test_primary_storage.py
test_03_vpc_privategw_restart_vpc_cleanup 	Failure 	1165.84 	test_privategw_acl.py

Errors from test_01_primary_storage_nfs:

2018-09-03 08:54:41,143 - CRITICAL - EXCEPTION: test_01_primary_storage_nfs: ['Traceback (most recent call last):\n', '  File "/usr/lib64/python2.7/unittest/case.py", line 369, in run\n    testMethod()\n', '  File "/marvin/tests/smoke/test_primary_storage.py", line 105, in test_01_primary_storage_nfs\n    podid=self.pod.id\n', '  File "/usr/lib/python2.7/site-packages/marvin/lib/base.py", line 2883, in create\n    return StoragePool(apiclient.createStoragePool(cmd).__dict__)\n', '  File "/usr/lib/python2.7/site-packages/marvin/cloudstackAPI/cloudstackAPIClient.py", line 3188, in createStoragePool\n    response = self.connection.marvinRequest(command, response_type=response, method=method)\n', '  File "/usr/lib/python2.7/site-packages/marvin/cloudstackConnection.py", line 379, in marvinRequest\n    raise e\n', 'CloudstackAPIException: Execute cmd: createstoragepool failed, due to: errorCode: 530, errorText:Failed to add data store: Storage pool nfs://10.2.0.16/acs/primary/pr2761-t2982-kvm-centos7/marvin_pri1 already in use by another pod (id=1)\n']
2018-09-03 08:54:41,278 - CRITICAL - EXCEPTION: test_01_primary_storage_nfs: ['Traceback (most recent call last):\n', '  File "/usr/lib/python2.7/site-packages/nose/suite.py", line 209, in run\n    self.setUp()\n', '  File "/usr/lib/python2.7/site-packages/nose/suite.py", line 292, in setUp\n    self.setupContext(ancestor)\n', '  File "/usr/lib/python2.7/site-packages/nose/suite.py", line 315, in setupContext\n    try_run(context, names)\n', '  File "/usr/lib/python2.7/site-packages/nose/util.py", line 471, in try_run\n    return func()\n', '  File "/marvin/tests/smoke/test_primary_storage.py", line 406, in setUpClass\n    tags=cls.services["storage_tags"]["a"]\n', '  File "/usr/lib/python2.7/site-packages/marvin/lib/base.py", line 2883, in create\n    return StoragePool(apiclient.createStoragePool(cmd).__dict__)\n', '  File "/usr/lib/python2.7/site-packages/marvin/cloudstackAPI/cloudstackAPIClient.py", line 3188, in createStoragePool\n    response = self.connection.marvinRequest(command, response_type=response, method=method)\n', '  File "/usr/lib/python2.7/site-packages/marvin/cloudstackConnection.py", line 379, in marvinRequest\n    raise e\n', 'CloudstackAPIException: Execute cmd: createstoragepool failed, due to: errorCode: 530, errorText:Failed to add data store: Storage pool nfs://10.2.0.16/acs/primary/pr2761-t2982-kvm-centos7/marvin_pri1 already in use by another pod (id=1)\n']

Errors from test_03_vpc_privategw_restart_vpc_cleanup

2018-09-03 09:42:17,448 - CRITICAL - FAILED: test_03_vpc_privategw_restart_vpc_cleanup: ['Traceback (most recent call last):\n', '  File "/usr/lib64/python2.7/unittest/case.py", line 369, in run\n    testMethod()\n', '  File "/marvin/tests/smoke/test_privategw_acl.py", line 291, in test_03_vpc_privategw_restart_vpc_cleanup\n    self.performVPCTests(vpc_off, restart_with_cleanup = True)\n', '  File "/marvin/tests/smoke/test_privategw_acl.py", line 381, in performVPCTests\n    self.check_pvt_gw_connectivity(vm1, public_ip_1, [vm2.nic[0].ipaddress, vm1.nic[0].ipaddress])\n', '  File "/marvin/tests/smoke/test_privategw_acl.py", line 745, in check_pvt_gw_connectivity\n    self.fail("SSH Access failed for %s: %s" % (virtual_machine, e))\n', '  File "/usr/lib64/python2.7/unittest/case.py", line 450, in fail\n    raise self.failureException(msg)\n', 'AssertionError: SSH Access failed for <marvin.lib.base.VirtualMachine instance at 0x422a3f8>: [Errno 113] No route to host\n']
@GabrielBrascher
Member

GabrielBrascher left a comment

Code LGTM

@mike-tutkowski

Member

mike-tutkowski commented Sep 10, 2018

It looks like we have two LGTMs and tests passing within expectations. @rafaelweingartner Would you like to merge this PR?

@mike-tutkowski

Member

mike-tutkowski commented Sep 10, 2018

@rafaelweingartner Or are you guys still investigating regression-test issues?

@rafaelweingartner rafaelweingartner merged commit f550d70 into apache:master Sep 10, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@rhtyd

Member

rhtyd commented Sep 28, 2018

@rafaelweingartner Yiping Zhang reported on dev@ that PR #2425 added a regression. PR #2425 was accepted in the 4.11 branch; can you investigate and, if applicable, send a PR towards the 4.11 branch to fix the regression so we can target the fix for 4.11.2.0/4.11.3.0?

@rafaelweingartner

Member Author

rafaelweingartner commented Sep 28, 2018

@rhtyd thanks.

I have replied to the e-mail thread.

@rhtyd

Member

rhtyd commented Sep 28, 2018
