problem
When automated DRS is enabled and a cluster contains a VM with a GPU
passthrough profile, DRS plan generation fails for the entire cluster
with an unhandled InvalidParameterValueException. No migrations are planned
for any VM in the cluster, including VMs that could otherwise be migrated.
Error log:
ERROR [o.a.c.c.ClusterDrsServiceImpl] Unable to generate DRS plans for cluster
Cluster {id: "3", name: "princeton-cluster01"}
com.cloud.exception.InvalidParameterValueException: Unsupported operation,
VM uses host passthrough, cannot migrate
at org.apache.cloudstack.cluster.ClusterDrsServiceImpl.getBestMigration(ClusterDrsServiceImpl.java:453)
at org.apache.cloudstack.cluster.ClusterDrsServiceImpl.getDrsPlan(ClusterDrsServiceImpl.java:362)
at org.apache.cloudstack.cluster.ClusterDrsServiceImpl.generateDrsPlanForAllClusters(ClusterDrsServiceImpl.java:289)
at org.apache.cloudstack.cluster.ClusterDrsServiceImpl.poll(ClusterDrsServiceImpl.java:181)
versions
ACS 4.21.0.0, KVM
The steps to reproduce the bug
- Enable automated DRS on a cluster (drs.automatic.enable=true)
- Have at least one VM in the cluster using a GPU passthrough profile
- Wait for the DRS poll cycle to run
What to do about it?
In getBestMigration, catch the InvalidParameterValueException thrown by
listHostsForMigrationOfVM inside the VM loop, log the reason at debug level,
and skip that VM. The remaining eligible VMs are then still evaluated normally.
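
The catch-and-skip idea can be illustrated with a minimal, self-contained sketch. It is not the actual ClusterDrsServiceImpl code: the class DrsSkipSketch, the methods collectCandidates and lookupHosts, and the VM and host names are all hypothetical, and the nested exception is only a stand-in for com.cloud.exception.InvalidParameterValueException.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DrsSkipSketch {

    // Stand-in for com.cloud.exception.InvalidParameterValueException (assumption:
    // modelled here as an unchecked exception so the sketch compiles on its own).
    static class InvalidParameterValueException extends RuntimeException {
        InvalidParameterValueException(String message) {
            super(message);
        }
    }

    // Hypothetical helper: asks hostLookup for candidate hosts of each VM.
    // A VM whose lookup is rejected is skipped instead of aborting the whole loop.
    static Map<String, List<String>> collectCandidates(
            List<String> vms, Function<String, List<String>> hostLookup) {
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        for (String vm : vms) {
            try {
                candidates.put(vm, hostLookup.apply(vm));
            } catch (InvalidParameterValueException e) {
                // In the real service this would go to the debug log.
                System.out.println("DEBUG Skipping VM " + vm + " for DRS: " + e.getMessage());
            }
        }
        return candidates;
    }

    // Simulated lookup standing in for listHostsForMigrationOfVM:
    // VMs named "gpu-*" are treated as passthrough VMs and rejected.
    static List<String> lookupHosts(String vm) {
        if (vm.startsWith("gpu-")) {
            throw new InvalidParameterValueException(
                    "Unsupported operation, VM uses host passthrough, cannot migrate");
        }
        return List.of("host-a", "host-b");
    }

    public static void main(String[] args) {
        Map<String, List<String>> result =
                collectCandidates(List.of("web-1", "gpu-1", "db-1"), DrsSkipSketch::lookupHosts);
        // Only the VMs that were not rejected remain as migration candidates.
        System.out.println(result.keySet());
    }
}
```

With the try/catch placed inside the loop, the rejected VM only costs one iteration; if the catch were outside the loop (or absent, as in the stack trace above), the first passthrough VM would end evaluation for the whole cluster.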