Skip to content

Conversation

@aledsage
Copy link
Contributor

No description provided.

@aledsage aledsage force-pushed the BROOKLYN-325-handle-lost-vm branch 2 times, most recently from 9b1ccb3 to 7969d22 Compare November 14, 2016 22:03
@@ -0,0 +1,52 @@
/*
* or more contributor license agreements. See the NOTICE file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First line is missing from the Licence

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, interestingly rat doesn't complain.

public interface AttributesInternal extends Attributes {
@Beta
public static final AttributeSensor<ProvisioningTaskState> INTERNAL_PROVISIONING_TASK_STATE = new BasicAttributeSensor<ProvisioningTaskState>(
TypeToken.of(ProvisioningTaskState.class),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use Sensors.newSensor(ProvisioningTaskState.class, below as well.

org.apache.brooklyn.config.ConfigInheritance$Always : org.apache.brooklyn.config.ConfigInheritance$Legacy$Always
org.apache.brooklyn.config.ConfigInheritance$Merged : org.apache.brooklyn.config.ConfigInheritance$Legacy$Merged

org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisioningTaskState : org.apache.brooklyn.core.entity.internal.AttributesInternal$ProvisioningTaskState
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment "since 0.10.0" so we can start removing entries at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neykov Agree that would be useful. Do you think we should add it in this properties file, or just on the javadoc of the class in question?

For this file, there are a bunch of entries higher up that are also "since 0.10.0" so feels like we should figure out where and put the comment on that group.

It gets a little more complicated if we want to group them with finer granularity (e.g. some downstream projects like AMP have been taking timestamped early-access releases of Brooklyn. But here we should probably just focus on the Brooklyn GA releases.

I suggest we do that separately (figuring out where the changes since 0.9.0 begin!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be useful to have a marker here as well. Agree it can be done separately. A good time to begin will be post-0.10.0 release. Then can treat all previous entries as added in 0.10.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a tangential note, we should delete all anonymous classes that already have named replacements and use this mechanism to let rebind know about them.

ServiceStateLogic.ServiceNotUpLogic.updateNotUpIndicator(
entity,
"Task aborted on rebind",
"Set to on-fire (from previous expected state "+expectedState+") because tasks aborted on rebind");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tasks get aborted during shutdown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed wording in the message.


I also looked again at how we set INTERNAL_PROVISIONING_TASK_STATE. In org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.ProvisionMachineTask, there is a try-finally block, where it sets the sensor to "done" in the finally block. If that managed to execute + get persisted, then on rebind we wouldn't know that it was still executing!

We get away with this because the managementContext.terminate() stops the rebindManager and BrooklynStorage, before it shuts down the execution manager. Therefore our thread doesn't get interrupted (triggering the finally block) until after persistence has been disabled.

protected void markTransitioningEntityOnFireOnRebind(EntityInternal entity, Lifecycle expectedState) {
LOG.warn("Entity {} being marked as on-fire because it was in state {} on rebind; indicators={}", new Object[] {entity, expectedState, entity.getAttribute(Attributes.SERVICE_NOT_UP_INDICATORS)});
ServiceStateLogic.setExpectedState(entity, Lifecycle.ON_FIRE);
ServiceStateLogic.ServiceNotUpLogic.updateNotUpIndicator(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who gets to clean these entries? Does a start/restart get the entity in a healthy state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. No, start/restart won't clear that not-up-indicator.

In brooklyn-jsgui (i.e. the web-console) there is a "reset problems" button in the advanced tab. This makes a couple of API calls for the entity. It sets service.notUp.indicators and service.problems to be empty maps.

I think that is probably the only way to do this currently!

We need to at least update the docs about this. What would be a better way? Should start() and restart() effectors clear all not-up-indicators? That feels fiddly - it might cause the entity to immediately report serviceUp=true (which is why SoftwareProcessImpl.initEnrichers() calls ServiceNotUpLogic.updateNotUpIndicator(this, SERVICE_PROCESS_IS_RUNNING, "No information yet on whether this service is running")).

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think start & restart should clear these particular values, not all. Since the errors are triggered by start, restart, stop actions - these effectors are kind of responsible. So a subsequent start will reset the state and the entity will become healthy.

@VisibleForTesting
public enum ProvisioningTaskState {
RUNNING,
DONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of a DONE? Isn't null the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; I'll delete "done" (it won't be in persisted state because INTERNAL_PROVISIONING_TASK_STATE used to have AttributeSensor.SensorPersistenceMode.NONE), and change it to use null.

@neykov
Copy link
Member

neykov commented Nov 16, 2016

Good clean up. Some questions above.

@ivanayov
Copy link
Contributor

Looks good. Tested all the cases and seems to work well.

@aledsage aledsage force-pushed the BROOKLYN-325-handle-lost-vm branch from 7969d22 to 8bef4c7 Compare November 16, 2016 15:46
@aledsage
Copy link
Contributor Author

@neykov comments addressed in the new final commit (and rebased against master).

@neykov
Copy link
Member

neykov commented Nov 16, 2016

Good to merge. I'll leave it to you to decide whether to clean the indicators in start/restart in this PR. Or leave it to the user to do it from the UI.

@aledsage
Copy link
Contributor Author

Thanks, I'll not include clearing the indicators in this PR.

@asfgit asfgit merged commit 8bef4c7 into apache:master Nov 16, 2016
asfgit pushed a commit that referenced this pull request Nov 16, 2016
@aledsage aledsage deleted the BROOKLYN-325-handle-lost-vm branch November 16, 2016 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants