-
Notifications
You must be signed in to change notification settings - Fork 53
BROOKLYN-325: alert if provisioning/termination aborted on rebind #435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9b1ccb3 to
7969d22
Compare
| @@ -0,0 +1,52 @@ | |||
| /* | |||
| * or more contributor license agreements. See the NOTICE file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First line is missing from the Licence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot, interestingly rat doesn't complain.
| public interface AttributesInternal extends Attributes { | ||
| @Beta | ||
| public static final AttributeSensor<ProvisioningTaskState> INTERNAL_PROVISIONING_TASK_STATE = new BasicAttributeSensor<ProvisioningTaskState>( | ||
| TypeToken.of(ProvisioningTaskState.class), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use Sensors.newSensor(ProvisioningTaskState.class, below as well.
| org.apache.brooklyn.config.ConfigInheritance$Always : org.apache.brooklyn.config.ConfigInheritance$Legacy$Always | ||
| org.apache.brooklyn.config.ConfigInheritance$Merged : org.apache.brooklyn.config.ConfigInheritance$Legacy$Merged | ||
|
|
||
| org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisioningTaskState : org.apache.brooklyn.core.entity.internal.AttributesInternal$ProvisioningTaskState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment "since 0.10.0" so we can start removing entries at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neykov Agree that would be useful. Do you think we should add it in this properties file, or just on the javadoc of the class in question?
For this file, there are a bunch of entries higher up that are also "since 0.10.0" so feels like we should figure out where and put the comment on that group.
It gets a little more complicated if we want to group them with finer granularity (e.g. some downstream projects like AMP have been taking timestamped early-access releases of Brooklyn. But here we should probably just focus on the Brooklyn GA releases.
I suggest we do that separately (figuring out where the changes since 0.9.0 begin!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be useful to have a marker here as well. Agree it can be done separately. A good time to begin will be post-0.10.0 release. Then can treat all previous entries as added in 0.10.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a tangential note, we should delete all anonymous classes that already have named replacements and use this mechanism to let rebind know about them.
| ServiceStateLogic.ServiceNotUpLogic.updateNotUpIndicator( | ||
| entity, | ||
| "Task aborted on rebind", | ||
| "Set to on-fire (from previous expected state "+expectedState+") because tasks aborted on rebind"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tasks get aborted during shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed wording in the message.
I also looked again at how we set INTERNAL_PROVISIONING_TASK_STATE. In org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.ProvisionMachineTask, there is a try-finally block, where it sets the sensor to "done" in the finally block. If that managed to execute + get persisted, then on rebind we wouldn't know that it was still executing!
We get away with this because the managementContext.terminate() stops the rebindManager and BrooklynStorage, before it shuts down the execution manager. Therefore our thread doesn't get interrupted (triggering the finally block) until after persistence has been disabled.
| protected void markTransitioningEntityOnFireOnRebind(EntityInternal entity, Lifecycle expectedState) { | ||
| LOG.warn("Entity {} being marked as on-fire because it was in state {} on rebind; indicators={}", new Object[] {entity, expectedState, entity.getAttribute(Attributes.SERVICE_NOT_UP_INDICATORS)}); | ||
| ServiceStateLogic.setExpectedState(entity, Lifecycle.ON_FIRE); | ||
| ServiceStateLogic.ServiceNotUpLogic.updateNotUpIndicator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who gets to clean these entries? Does a start/restart get the entity in a healthy state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. No, start/restart won't clear that not-up-indicator.
In brooklyn-jsgui (i.e. the web-console) there is a "reset problems" button in the advanced tab. This makes a couple of API calls for the entity. It sets service.notUp.indicators and service.problems to be empty maps.
I think that is probably the only way to do this currently!
We need to at least update the docs about this. What would be a better way? Should start() and restart() effectors clear all not-up-indicators? That feels fiddly - it might cause the entity to immediately report serviceUp=true (which is why SoftwareProcessImpl.initEnrichers() calls ServiceNotUpLogic.updateNotUpIndicator(this, SERVICE_PROCESS_IS_RUNNING, "No information yet on whether this service is running")).
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think start & restart should clear these particular values, not all. Since the errors are triggered by start, restart, stop actions - these effectors are kind of responsible. So a subsequent start will reset the state and the entity will become healthy.
| @VisibleForTesting | ||
| public enum ProvisioningTaskState { | ||
| RUNNING, | ||
| DONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of a DONE? Isn't null the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point; I'll delete "done" (it won't be in persisted state because INTERNAL_PROVISIONING_TASK_STATE used to have AttributeSensor.SensorPersistenceMode.NONE), and change it to use null.
|
Good clean up. Some questions above. |
|
Looks good. Tested all the cases and seems to work well. |
So that it can subsequently be used to fix BROOKLYN-325.
7969d22 to
8bef4c7
Compare
|
@neykov comments addressed in the new final commit (and rebased against master). |
|
Good to merge. I'll leave it to you to decide whether to clean the indicators in start/restart in this PR. Or leave it to the user to do it from the UI. |
|
Thanks, I'll not include clearing the indicators in this PR. |
No description provided.