Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUDSTACK-8762: Check to confirm disk activity before starting a VM #754

Merged
merged 1 commit into from
Sep 2, 2015

Conversation

rohityadavcloud
Copy link
Member

Implements a VM volume/disk file activity checker that checks if QCOW2 file
has been changed before starting the VM. This is useful as a pessimistic
approach to save VMs that were running on faulty hosts that CloudStack could
try to launch on other hosts while the host was not cleanly fenced. This is
optional and available only if you enable the settings in agent.properties
file, on per-host basis.

Signed-off-by: Rohit Yadav rohit.yadav@shapeblue.com

@asfbot
Copy link

asfbot commented Aug 27, 2015

cloudstack-pull-rats #423 SUCCESS
This pull request looks good

@asfbot
Copy link

asfbot commented Aug 27, 2015

cloudstack-pull-analysis #356 UNSTABLE
Looks like there's a problem with this pull request

@miguelaferreira
Copy link
Contributor

I'll be happy to help with reviewing this PR, but I need help testing. Can you point me to a Marvin test that tests this functionality?

@rohityadavcloud
Copy link
Member Author

@miguelaferreira there is no marvin test written for this, this is a specific case of hosts fencing where two vms might be trying to write to the same disk, since it's a corner case not sure which existing marvin test could be used; unit tests are included.

@@ -3923,6 +3944,17 @@ public int compare(DiskTO arg0, DiskTO arg1) {
volPath = physicalDisk.getPath();
}

// check for disk activity, if detected we should exit because vm is running elsewhere
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this block best be refactored into a separate method?

@miguelaferreira
Copy link
Contributor

I'm reading through the code to see if I understand the change.

The next thing is to find a way to test it. Might be difficult, but since it is a corner case (and regressions love corners), it is definitely worth it.

@miguelaferreira
Copy link
Contributor

It seems to me that the "VM volume/disk file activity checker" is a good candidate to become a new class that can be tested independently, and reused elsewhere. What do you think @bhaisaab?

This would also make it easier to test in a Java integration (or unit) test. By fiddling with the file system attributes. Which you are already doing, btw!
By easier I mean, easier to extend and maintain because it would be a separate test case, as opposed to being included in the test case for LibvirtComputingResource

BTW, the same applies to #753

@rohityadavcloud
Copy link
Member Author

@miguelaferreira okay, I'll see what I can do. I'm also exploring other ways to detech if the vm disk file has changed, using an exhaustive checksum comparison for example as file attributes cannot be trusted for nfs partitions that were mounted with noatime. I would also like to discuss perhaps here, if anyone has better ideas or suggestions.

@miguelaferreira
Copy link
Contributor

@bhaisaab that's great. So if you think that in the future this implementation might change it would even be better to create an interface to use in the LibvirtComputingResource(and others). And as first implementation it would get the code you write here to check the file attributes.

When a better implementation comes along, it is easier to switch.

@wido
Copy link
Contributor

wido commented Aug 28, 2015

Seems, good, but, all the logging is debug. Isn't there something which we have to print on info or error here?

We want to make sure that we also print useful stuff on info or error, not all systems should run on debug in production.

@rohityadavcloud
Copy link
Member Author

@wido no need for info that, will fix to use error in case of errors

@rohityadavcloud
Copy link
Member Author

@wido just checked again, most debug messages are in loop and changing them to info or errors would be unnecessary. In case of error, run time exceptions are thrown that would be captured in the logs and won't allow VMs to start.

@miguelaferreira I discussed the issue of changing logic there, it seems that the mtime should be reliable enough and I won't be adding md5 or other checksum checking logic as that will be too slow and costing on CPU and IO. Do you still want me to refactor the logic as a separate class or move to say FileUtil in the cloud-utils package?

@miguelaferreira
Copy link
Contributor

@bhaisaab It would be great if you could refactor the logic out, since it can be an object on its own, and that class is already huge as it is (> 5k LOC!). Thanks

Implements a VM volume/disk file activity checker that checks if QCOW2 file
has been changed before starting the VM. This is useful as a pessimistic
approach to save VMs that were running on faulty hosts that CloudStack could
try to launch on other hosts while the host was not cleanly fenced. This is
optional and available only if you enable the settings in agent.properties
file, on per-host basis.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
@asfbot
Copy link

asfbot commented Aug 28, 2015

cloudstack-pull-rats #437 SUCCESS
This pull request looks good

@rohityadavcloud
Copy link
Member Author

@miguelaferreira have moved the code to a new util class (so can be reusable perhaps by other hypervisors) ^^

@asfbot
Copy link

asfbot commented Aug 28, 2015

cloudstack-pull-analysis #370 SUCCESS
This pull request looks good

@wido
Copy link
Contributor

wido commented Aug 28, 2015

LGTM

Fyi, for RBD we can potentially fix this with the locking provided by RBD itself.

@miguelaferreira
Copy link
Contributor

I'm not able to build this branch with maven.

 mvn clean install -P developer,systemvm

...
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache CloudStack Core 4.5.2
[INFO] ------------------------------------------------------------------------
Running com.cloud.agent.resource.virtualnetwork.VirtualRoutingResourceTest
log4j:WARN No appenders could be found for logger (org.springframework.test.context.junit4.SpringJUnit4ClassRunner).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Tests run: 20, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.424 sec <<< FAILURE!
Running com.cloud.agent.transport.RequestTest
...
Results :

Failed tests:   testFirewallRulesCommand(com.cloud.agent.resource.virtualnetwork.VirtualRoutingResourceTest): expected:<...1.2/24:,64.10.10.10:[reverted:0:0:0:,64.10.10.10:TCP:22:80:10.10.1.1/24-10.10.1.2/24]:,> but was:<...1.2/24:,64.10.10.10:[TCP:22:80:10.10.1.1/24-10.10.1.2/24:,64.10.10.10:reverted:0:0:0]:,>
  testAggregationCommands(com.cloud.agent.resource.virtualnetwork.VirtualRoutingResourceTest): expected:<...1.2/24:,64.10.10.10:[reverted:0:0:0:,64.10.10.10:TCP:22:80:10.10.1.1/24-10.10.1.2/24]:,(..)

Tests run: 139, Failures: 2, Errors: 0, Skipped: 0
....

It does not seem related to this PR, but this week I've build master many times and all tests were passing. Any idea on what has been merged recently that could cause this?

@miguelaferreira
Copy link
Contributor

@wido can you build it?

@wido
Copy link
Contributor

wido commented Aug 28, 2015

@miguelaferreira I didn't build it since all checks were green :)

@miguelaferreira
Copy link
Contributor

@wido can you please try?

@rohityadavcloud
Copy link
Member Author

@miguelaferreira from the logs, looks like you were building 4.5 branch (the 4.5.2 artifact); on my system the build passes as well, are you using Java8?

@miguelaferreira
Copy link
Contributor

@bhaisaab I've checked out this PR, which is your shapeblue:4.5-CLOUDSTACK-8762 applied to apache:4.5. Yes, I do have Java8 installed, but the mvn config points to Java7, right?

@rohityadavcloud
Copy link
Member Author

@miguelaferreira I'm not sure, but this seems environment specific; travis is green and build also work in my environment; are you still able to reproduce.

@wido @wilderrodrigues @remibergsma @abhinandanprateek @kishankavala review please?

@miguelaferreira
Copy link
Contributor

Yes, I'm still getting the same error.

It could very well be an environment issue.
ping @wilderrodrigues @remibergsma can you guys give it a try?

@miguelaferreira
Copy link
Contributor

@bhaisaab I've created a totally new test environment, checked out your PR and started building it. I'm already past the point where it was failing before, so it was indeed an environment issue.

I'll let you know if I can build it fully and all unit-tests are passing.

@miguelaferreira
Copy link
Contributor

When I build the entire project mvn always gets stuck running unit tests of some project (not always the same).
I don't want to keep digging any further as this is not master.

I've built and unit-tested the three modules this PR touches:

mvn clean install -pl :cloud-agent
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.377s
[INFO] Finished at: Tue Sep 01 08:55:15 GMT 2015
[INFO] Final Memory: 40M/239M
[INFO] ------------------------------------------------------------------------

mvn clean install -pl :cloud-plugin-hypervisor-kvm
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21.559s
[INFO] Finished at: Tue Sep 01 08:56:00 GMT 2015
[INFO] Final Memory: 40M/244M
[INFO] ------------------------------------------------------------------------

 mvn clean install -pl :cloud-utils
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 32.421s
[INFO] Finished at: Tue Sep 01 08:57:19 GMT 2015
[INFO] Final Memory: 39M/219M
[INFO] ------------------------------------------------------------------------

👍

@rohityadavcloud
Copy link
Member Author

@miguelaferreira @wilderrodrigues @wido @jburwell @abhinandanprateek @kishankavala hi, can you help review this, thanks

@karuturi
Copy link
Member

karuturi commented Sep 2, 2015

@bhaisaab There are 2 reviews already. @wido and @miguelaferreira gave 👍 I think you can merge this and #753

@rohityadavcloud
Copy link
Member Author

Thanks @karuturi merging now

@asfgit asfgit merged commit 711acfa into apache:4.5 Sep 2, 2015
asfgit pushed a commit that referenced this pull request Sep 2, 2015
CLOUDSTACK-8762: Check to confirm disk activity before starting a VMImplements a VM volume/disk file activity checker that checks if QCOW2 file
has been changed before starting the VM. This is useful as a pessimistic
approach to save VMs that were running on faulty hosts that CloudStack could
try to launch on other hosts while the host was not cleanly fenced. This is
optional and available only if you enable the settings in agent.properties
file, on per-host basis.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* pr/754:
  CLOUDSTACK-8762: Confirm disk activity before starting a VM

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
rohityadavcloud pushed a commit that referenced this pull request Jan 20, 2021
… Domain Admin (#754)

* account: choose `User` is the default selection when the user role is Domain Admin

* renamed to userRole

* Fix the incorrect variable

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants