Bug Description
The deploy(), start(), stop(), and undeploy() methods write to the cluster state store (putApplicationDescriptor, putApplicationState) without checking that the write succeeded and without handling exceptions. A failed write therefore goes unnoticed by the caller and leaves the cluster state inconsistent with what is actually deployed or running locally.
Location
jplatform-cluster/src/main/java/org/flossware/jplatform/cluster/ClusteredApplicationManager.java:137-138,192,221,249
Problematic Code
@Override
public synchronized void deploy(ApplicationDescriptor descriptor) throws Exception {
    String appId = descriptor.getApplicationId();
    if (clusterManager != null && clusterManager.isJoined()) {
        logger.info("[{}] Deploying application in cluster mode", appId);
        // Write descriptor to cluster state
        stateStore.putApplicationDescriptor(appId, descriptor); // Line 137 - can fail silently
        stateStore.putApplicationState(appId, ApplicationState.DEPLOYED); // Line 138 - can fail silently
        // ... continues even if state store writes failed ...
    }
}

@Override
public synchronized void start(String applicationId) throws Exception {
    if (clusterManager != null && clusterManager.isJoined() && scheduler != null) {
        // ...
        super.start(applicationId);
        // Update cluster state
        stateStore.putApplicationState(applicationId, ApplicationState.RUNNING); // Line 192 - can fail
    }
}

@Override
public synchronized void stop(String applicationId) throws Exception {
    if (clusterManager != null && clusterManager.isJoined() && scheduler != null) {
        // ...
        super.stop(applicationId);
        // Update cluster state
        stateStore.putApplicationState(applicationId, ApplicationState.STOPPED); // Line 221 - can fail
    }
}
Impact
- Application deployed locally but descriptor not in cluster state
- Other nodes don't see the application
- Application running but cluster state shows DEPLOYED or STOPPED
- Monitoring dashboards show incorrect state
- Leader makes decisions based on stale/incorrect state
- No indication to caller that operation partially failed
Example
// Hazelcast network partition occurs
ClusteredApplicationManager manager = new ClusteredApplicationManager(...);
manager.deploy(descriptor);
// putApplicationDescriptor fails due to partition
// putApplicationState fails due to partition
// Method continues, calls super.deploy()
// Application deployed locally
// Cluster state not updated
// Other nodes don't know about application
// No exception thrown
manager.start(appId);
// super.start() succeeds
// putApplicationState fails
// Cluster still shows DEPLOYED but app is RUNNING
// Leader might try to start it on another node
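The start() half of this scenario can be reproduced in isolation. The sketch below is not the project's code: FlakyStateStore and Manager are hypothetical stand-ins, and it assumes the state store surfaces the partition as an exception. If the real store swallows the failure instead, the divergence is the same, but the caller sees nothing at all.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleStateDemo {
    enum ApplicationState { DEPLOYED, RUNNING, STOPPED, FAILED }

    /** Hypothetical stand-in for the cluster state store; rejects writes while "partitioned". */
    static class FlakyStateStore {
        final Map<String, ApplicationState> states = new HashMap<>();
        boolean partitioned = false;

        void putApplicationState(String appId, ApplicationState state) {
            if (partitioned) {
                throw new IllegalStateException("state store unreachable");
            }
            states.put(appId, state);
        }
    }

    /** Hypothetical manager that mirrors the ordering in start(): local start first, cluster write second. */
    static class Manager {
        final FlakyStateStore stateStore;
        final Map<String, ApplicationState> local = new HashMap<>();

        Manager(FlakyStateStore stateStore) {
            this.stateStore = stateStore;
        }

        void start(String appId) {
            local.put(appId, ApplicationState.RUNNING); // stands in for super.start()
            stateStore.putApplicationState(appId, ApplicationState.RUNNING);
        }
    }

    /** Returns {local state, cluster state} after start() is attempted during a partition. */
    static ApplicationState[] runScenario() {
        FlakyStateStore store = new FlakyStateStore();
        store.putApplicationState("app-1", ApplicationState.DEPLOYED);
        Manager manager = new Manager(store);

        store.partitioned = true;
        try {
            manager.start("app-1");
        } catch (IllegalStateException e) {
            // The caller sees an error, but nothing has rolled the local start back.
        }
        return new ApplicationState[] { manager.local.get("app-1"), store.states.get("app-1") };
    }

    public static void main(String[] args) {
        ApplicationState[] result = runScenario();
        System.out.println("local=" + result[0] + " cluster=" + result[1]);
    }
}
```

Running main prints local=RUNNING cluster=DEPLOYED, which is the stale state a leader could act on.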
Proposed Fix
@Override
public synchronized void deploy(ApplicationDescriptor descriptor) throws Exception {
    String appId = descriptor.getApplicationId();
    if (clusterManager != null && clusterManager.isJoined()) {
        logger.info("[{}] Deploying application in cluster mode", appId);
        // Write descriptor to cluster state - must succeed before local deployment
        try {
            stateStore.putApplicationDescriptor(appId, descriptor);
            stateStore.putApplicationState(appId, ApplicationState.DEPLOYED);
        } catch (Exception e) {
            logger.error("[{}] Failed to update cluster state during deploy", appId, e);
            throw new Exception("Failed to update cluster state: " + e.getMessage(), e);
        }
        // If leader, try to assign to a node
        if (scheduler != null) {
            try {
                if (clusterManager.isLeader()) {
                    String assignedNode = scheduler.assignApplication(appId);
                    logger.info("[{}] Leader assigned application to node: {}", appId, assignedNode);
                }
            } catch (IllegalStateException e) {
                logger.debug("[{}] Lost leadership during assignment: {}", appId, e.getMessage());
            } catch (Exception e) {
                logger.error("[{}] Failed to assign application", appId, e);
                // Clean up cluster state
                try {
                    stateStore.putApplicationState(appId, ApplicationState.FAILED);
                } catch (Exception se) {
                    logger.error("[{}] Failed to update state to FAILED", appId, se);
                }
                throw new Exception("Failed to assign application: " + e.getMessage(), e);
            }
            // Check if assigned to local node
            if (scheduler.isAssignedToLocalNode(appId)) {
                logger.info("[{}] Application assigned to local node, deploying locally", appId);
                try {
                    super.deploy(descriptor);
                } catch (Exception e) {
                    // Update cluster state to reflect failure
                    try {
                        stateStore.putApplicationState(appId, ApplicationState.FAILED);
                    } catch (Exception se) {
                        logger.error("[{}] Failed to update state to FAILED", appId, se);
                    }
                    throw e;
                }
            }
        }
    } else {
        // Standalone mode
        logger.info("[{}] Deploying application in standalone mode", appId);
        super.deploy(descriptor);
    }
}
Similar fixes are needed for the start(), stop(), and undeploy() methods.
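One possible shape for the start() fix is sketched below. It is an assumption, not something decided in this report: when the cluster write fails after super.start() has succeeded, it stops the application locally again and rethrows, so the node never runs an application the cluster believes is idle. StateStore, LocalManager, and ClusteredManager are hypothetical simplified stand-ins for the real classes, and the clusterManager/scheduler guards and logging are omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class StartFixSketch {
    enum ApplicationState { DEPLOYED, RUNNING, STOPPED, FAILED }

    /** Hypothetical minimal view of the state store used by the manager. */
    interface StateStore {
        void putApplicationState(String appId, ApplicationState state) throws Exception;
    }

    /** Stand-in for the non-clustered base class that provides super.start() and super.stop(). */
    static class LocalManager {
        final Map<String, ApplicationState> local = new HashMap<>();

        public void start(String appId) throws Exception {
            local.put(appId, ApplicationState.RUNNING);
        }

        public void stop(String appId) throws Exception {
            local.put(appId, ApplicationState.STOPPED);
        }
    }

    static class ClusteredManager extends LocalManager {
        private final StateStore stateStore;

        ClusteredManager(StateStore stateStore) {
            this.stateStore = stateStore;
        }

        @Override
        public synchronized void start(String appId) throws Exception {
            super.start(appId);
            try {
                stateStore.putApplicationState(appId, ApplicationState.RUNNING);
            } catch (Exception e) {
                // Cluster state could not be updated: undo the local start (assumed policy).
                try {
                    super.stop(appId);
                } catch (Exception rollbackFailure) {
                    e.addSuppressed(rollbackFailure);
                }
                throw new Exception("Failed to update cluster state: " + e.getMessage(), e);
            }
        }
    }

    /** Attempts start() against the given store and returns the resulting local state. */
    static ApplicationState localStateAfterStart(StateStore store) {
        ClusteredManager manager = new ClusteredManager(store);
        try {
            manager.start("app-1");
        } catch (Exception e) {
            // Expected when the store rejects the write; the caller is now informed.
        }
        return manager.local.get("app-1");
    }

    public static void main(String[] args) {
        StateStore healthy = (id, state) -> { };
        StateStore partitioned = (id, state) -> {
            throw new IllegalStateException("state store unreachable");
        };
        System.out.println("healthy store: " + localStateAfterStart(healthy));
        System.out.println("partitioned store: " + localStateAfterStart(partitioned));
    }
}
```

Whether rolling back is preferable to leaving the application running and retrying the state write is a design choice for the maintainers. Either way, the write failure should reach the caller rather than being dropped.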