CLOUDSTACK-8762: Check to confirm disk activity before starting a VM #754
cloudstack-pull-rats #423 SUCCESS
cloudstack-pull-analysis #356 UNSTABLE
I'll be happy to help with reviewing this PR, but I need help testing. Can you point me to a Marvin test that tests this functionality?
@miguelaferreira there is no Marvin test written for this. This is a specific case of host fencing where two VMs might be trying to write to the same disk; since it's a corner case, I'm not sure which existing Marvin test could be used. Unit tests are included.
@@ -3923,6 +3944,17 @@ public int compare(DiskTO arg0, DiskTO arg1) {
volPath = physicalDisk.getPath();
}

// check for disk activity, if detected we should exit because vm is running elsewhere
Wouldn't this block best be refactored into a separate method?
I'm reading through the code to see if I understand the change. The next thing is to find a way to test it. Might be difficult, but since it is a corner case (and regressions love corners), it is definitely worth it.
It seems to me that the "VM volume/disk file activity checker" is a good candidate to become a new class that can be tested independently and reused elsewhere. What do you think @bhaisaab? This would also make it easier to test in a Java integration (or unit) test by fiddling with the file system attributes, which you are already doing, btw! The same applies to #753.
@miguelaferreira okay, I'll see what I can do. I'm also exploring other ways to detect if the VM disk file has changed, using an exhaustive checksum comparison for example, as file attributes cannot be trusted for NFS partitions that were mounted with noatime. I would also like to discuss it here, if anyone has better ideas or suggestions.
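As a rough illustration of the checksum idea mentioned above (not code from this PR): a minimal Java sketch that hashes the disk file twice, some interval apart, and treats a differing digest as activity. The class name `ChecksumActivitySketch`, the method names, and the choice of SHA-256 are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not taken from the PR.
public class ChecksumActivitySketch {

    // Streams the whole file through a SHA-256 digest.
    public static byte[] digestOf(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // Takes two digests waitMillis apart; a difference means the content changed.
    public static boolean changedBetweenSamples(Path file, long waitMillis)
            throws IOException, NoSuchAlgorithmException, InterruptedException {
        byte[] before = digestOf(file);
        Thread.sleep(waitMillis);
        byte[] after = digestOf(file);
        return !Arrays.equals(before, after);
    }
}
```

Each sample reads the entire file, so the cost grows with the size of the disk image.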
@bhaisaab that's great. So if you think that in the future this implementation might change, it would be even better to create an interface to use in the When a better implementation comes along, it is easier to switch.
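A minimal sketch of what such an interface could look like, with an mtime-polling implementation behind it. All names here (`DiskActivityChecker`, `MtimeChecker`, the polling parameters) are hypothetical and are not taken from the PR; the sketch only assumes that a writer to the disk file updates its last-modified time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; names are illustrative and not taken from the PR.
public class DiskActivitySketch {

    // Callers depend on this interface, so the detection strategy can be swapped later.
    public interface DiskActivityChecker {
        boolean isActive(Path disk) throws IOException, InterruptedException;
    }

    // Reports activity if the last-modified time changes while polling.
    public static final class MtimeChecker implements DiskActivityChecker {
        private final long pollMillis;
        private final int polls;

        public MtimeChecker(long pollMillis, int polls) {
            this.pollMillis = pollMillis;
            this.polls = polls;
        }

        @Override
        public boolean isActive(Path disk) throws IOException, InterruptedException {
            long first = Files.getLastModifiedTime(disk).toMillis();
            for (int i = 0; i < polls; i++) {
                Thread.sleep(pollMillis);
                if (Files.getLastModifiedTime(disk).toMillis() != first) {
                    return true; // something touched the file during the window
                }
            }
            return false; // no change observed in polls * pollMillis milliseconds
        }
    }
}
```

A caller would hold a `DiskActivityChecker` reference and refuse to start the VM when `isActive` returns true; a different strategy could then be introduced without changing the caller.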
Seems good, but all the logging is at debug level. Isn't there something we should print at info or error here? We want to make sure that we also print useful information at info or error; not all systems should run on debug in production.
@wido no need for info there; I will fix it to use error in case of errors.
@wido just checked again; most debug messages are in a loop, and changing them to info or error would be unnecessary. In case of error, runtime exceptions are thrown that would be captured in the logs and won't allow VMs to start. @miguelaferreira I discussed the issue of changing the logic there; it seems that the mtime should be reliable enough, and I won't be adding md5 or other checksum checking logic as that would be too slow and costly in CPU and IO. Do you still want me to refactor the logic into a separate class, or move it to, say, FileUtil in the cloud-utils package?
@bhaisaab It would be great if you could refactor the logic out, since it can be an object on its own, and that class is already huge as it is (> 5k LOC!). Thanks
Implements a VM volume/disk file activity checker that checks if QCOW2 file has been changed before starting the VM. This is useful as a pessimistic approach to save VMs that were running on faulty hosts that CloudStack could try to launch on other hosts while the host was not cleanly fenced. This is optional and available only if you enable the settings in agent.properties file, on per-host basis.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
bd12614 to 711acfa
cloudstack-pull-rats #437 SUCCESS
@miguelaferreira I have moved the code to a new util class (so it can perhaps be reused by other hypervisors) ^^
cloudstack-pull-analysis #370 SUCCESS
LGTM. FYI, for RBD we can potentially fix this with the locking provided by RBD itself.
I'm not able to build this branch with Maven.
It does not seem related to this PR, but this week I've built master many times and all tests were passing. Any idea what has been merged recently that could cause this?
@wido can you build it?
@miguelaferreira I didn't build it since all checks were green :)
@wido can you please try?
@miguelaferreira from the logs, it looks like you were building the 4.5 branch (the 4.5.2 artifact); on my system the build passes as well. Are you using Java 8?
@bhaisaab I've checked out this PR, which is your
@miguelaferreira I'm not sure, but this seems environment specific; Travis is green and the build also works in my environment. Are you still able to reproduce it? @wido @wilderrodrigues @remibergsma @abhinandanprateek @kishankavala review please?
Yes, I'm still getting the same error. It could very well be an environment issue.
@bhaisaab I've created a totally new test environment, checked out your PR and started building it. I'm already past the point where it was failing before, so it was indeed an environment issue. I'll let you know if I can build it fully and all unit tests are passing.
When I build the entire project mvn always gets stuck running unit tests of some project (not always the same). I've built and unit-tested the three modules this PR touches:
👍
@miguelaferreira @wilderrodrigues @wido @jburwell @abhinandanprateek @kishankavala hi, can you help review this? Thanks
@bhaisaab There are 2 reviews already. @wido and @miguelaferreira gave 👍. I think you can merge this and #753.
Thanks @karuturi, merging now
CLOUDSTACK-8762: Check to confirm disk activity before starting a VM

Implements a VM volume/disk file activity checker that checks if QCOW2 file has been changed before starting the VM. This is useful as a pessimistic approach to save VMs that were running on faulty hosts that CloudStack could try to launch on other hosts while the host was not cleanly fenced. This is optional and available only if you enable the settings in agent.properties file, on per-host basis.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* pr/754:
  CLOUDSTACK-8762: Confirm disk activity before starting a VM

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>