Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BROOKLYN-264 Stop app while VM still being provisioned: vm is left #211

Merged
merged 1 commit into from
Jul 18, 2016

Conversation

bostko
Copy link
Contributor

@bostko bostko commented Jun 21, 2016

  • Initial Live Tests

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 2 times, most recently from de8348b to f5a731f Compare June 21, 2016 18:16
@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 3 times, most recently from 8fed890 to 17a0c04 Compare June 22, 2016 13:32
Entities.destroyCatching(app);
LOG.info("Time for expunging: {}", System.currentTimeMillis() - beginTime);

NodeMetadata nodeMetadata = Iterables.getFirst(((AWSEC2ComputeService) jcloudsLocation.getComputeService()).listNodesDetailsMatching(new Predicate<ComputeMetadata>() {
Copy link
Contributor

@andreaturli andreaturli Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can simplify with something like

import static com.google.common.base.Predicates.*;
import static org.jclouds.compute.predicates.NodePredicates.*;
import static com.google.common.collect.Iterables.*;

...

Iterable<? extends NodeMetadata> nodes = filter(jcloudsLocation.getComputeService().listNodesDetailsMatching(all()), and(withGroup(group), not(TERMINATED))));

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 3 times, most recently from bd06a62 to f0c556d Compare June 23, 2016 15:36
* </ul>
*/
@Test(groups = {"Live"})
public void testExpunge() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the whole test passes for 3:45 seconds.
If we exclude the listNodes call, users will wait approximately 3 and a half minutes for expunge if they immediately click deploy and expunge.

@aledsage
Copy link
Contributor

My comments so far are bike-shedding.

Looking at the big picture, I need to think more about:

  • How should we trigger the delete, so that it safely handles success and failure of provisioning? Options include:
    • As you've done here, have stop() wait for the machine to be set; but abort if start() fails.
    • Mark the entity's internal state as "requiring machine-release", and have the start() code immediately release the machine. But we'd still need stop() to wait for that to be done, otherwise we might go on to unmanage the entity and thus cancel the tasks.
  • For speed:
    • Can we use the jclouds event bus to be told of the VM's id long before jclouds has returned from the provisioning call? @andreaturli what do you think?
    • Can we tell the JcloudsLocation instance that the provisioning has been aborted (this would allow us to avoid waiting-for-sshable, etc). Not quite sure how best to do that.

We can ignore the speed issue for now, unless there is a quick win that greatly speeds things up. The main thing is to get it functionally correct, and with clean code.

@bostko
Copy link
Contributor Author

bostko commented Jun 24, 2016

Unfortunately this change dramatically slow down unit tests :(.
Stopping EmptySoftwareProcess is a common task for most of the tests. Adding a five minute timeout on stop makes mvn clean install run for hours.

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 7 times, most recently from 3d2636c to 6e0e5ed Compare June 27, 2016 08:02
@bostko
Copy link
Contributor Author

bostko commented Jun 27, 2016

I updated the PR.

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch from 6e0e5ed to b80bf9a Compare June 27, 2016 10:10
/*Do not retry if non-jclouds location has started.*/
if (Boolean.TRUE.equals(entity().sensors().get(Sensors.newBooleanSensor("jclouds.provisioningStarted")))
&& machine.isAbsent()) {
org.jclouds.util.Predicates2.retry(new Predicate<MachineLifecycleEffectorTasks>() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using Predicates2.retry across Apache Brooklyn. Can you suggest a guava alternative or what do you think about adding such function in brooklyn-core?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like you should use our existing org.apache.brooklyn.util.repeat.Repeater. My personal opinion is that the builder pattern for the Repeater is easier to read, compared to the Predicates2.retry approach. The latter is probably only a good idea when you want to pass it to something that expects a Predicate.

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 2 times, most recently from 4990625 to 1572626 Compare June 29, 2016 17:47
@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 3 times, most recently from 19f053f to 174ad3a Compare June 29, 2016 18:00
@bostko
Copy link
Contributor Author

bostko commented Jun 29, 2016

I think I have much better assertions now.

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 5 times, most recently from 85ba98b to 9486e30 Compare June 29, 2016 18:48
* </ul>
*/
@Test(groups = {"Live"})
public void verifyJclousMachineIsExpungedWhenStoppedImmediatelyAfterStart() {
Copy link
Contributor Author

@bostko bostko Jul 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to add a test for one more case. Will see how straight it is to handle it.
When an exception occurs during deployment, in other words the execution goes here https://github.com/apache/brooklyn-server/blob/master/locations/jclouds/src/main/java/org/apache/brooklyn/location/jclouds/JcloudsLocation.java#L1062-L1105
and destroyOnFailure is false then the machine and its nodemetdata will not be persisted for later expunging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting this case will need won't be one line change.
What do you think about leaving this for a later PR?

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch from 9486e30 to 5be1b52 Compare July 13, 2016 11:11
@bostko
Copy link
Contributor Author

bostko commented Jul 13, 2016

@aledsage do you have an idea how can I tell the stop task that the provisioning started other than the sensor flag. I feel like the code will need serious refactoring in order stop task to know earlier about the jclouds node.
The entity knows only about the machine location. It doesn't know about the LocationProvisioner.

Do you thing it is reasonable assigning the location provisioner to the sensor instead true or null?

@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch 3 times, most recently from 72f2f68 to 2bbd008 Compare July 15, 2016 14:17
- put a wait in MachineLifecycleEffectorTasks.java#doStop for MachineLocations
- Live test for jclouds locations
- integration test for byon locations
@bostko bostko force-pushed the brooklyn-264_expunge_early_tests branch from 2bbd008 to 5b3c412 Compare July 15, 2016 14:23
@bostko
Copy link
Contributor Author

bostko commented Jul 15, 2016

  • I marked the sensor as "transient"
  • I added a code which removes the sensor immidiately after a machine location is added
  • I added an enum class for the jclouds.provisioning.started flag.

    However I think with this code it could stay a boolean.

    I couldn't figure out a better place for the jclouds.provisioning.started than Attributes which is visible by software-base and locations-jclouds

@aledsage
Copy link
Contributor

I've been looking at how we can do this more generally (entirely in MachineLifecycleEffectorTasks, so it's not jclouds-specific). That makes the code easier to follow and easier to test. The former because the places we set the sensors is all in the same place; the latter because we can do it all with unit tests, rather than needing to provision VMs in live tests.

I have a commit that can go on top of this commit. I'll merge yours now, and then submit mine as a new PR.

@asfgit asfgit merged commit 5b3c412 into apache:master Jul 18, 2016
asfgit pushed a commit that referenced this pull request Jul 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants