Describe the bug
The instance operation handling logic regenerates a DISABLE operation each time an operation such as EVACUATE is applied to an instance that has HELIX_ENABLED=false in its legacy fields, which produces an endless cycle of regenerated DISABLE operations. This blocks proper instance lifecycle transitions and causes unexpected behavior in instance swapping scenarios.
To Reproduce
- Create an instance in the cluster.
- Set DISABLE operation with USER source:
curl -X POST "http://<host>:<port>/admin/v2/clusters/<cluster>/instances/<instance>?command=setInstanceOperation&instanceOperationSource=USER&instanceOperation=DISABLE"
- Set EVACUATE operation with AUTOMATION source:
curl -X POST "http://<host>:<port>/admin/v2/clusters/<cluster>/instances/<instance>?command=setInstanceOperation&instanceOperationSource=AUTOMATION&instanceOperation=EVACUATE"
- Set EVACUATE operation again with AUTOMATION source
- Check the instance configuration in ZooKeeper
After the second EVACUATE operation, a new DISABLE operation is generated with:
- An updated timestamp (more recent than the EVACUATE timestamp)
- An added LEGACY_DISABLED_TYPE field
The cycle then repeats with each subsequent EVACUATE operation.
Expected behavior
The DISABLE operation should not be regenerated with new timestamps when setting a different operation. Instance operations should be updated consistently without creating an infinite cycle.
Additional context
RCA:
The regeneration happens because:
- getActiveInstanceOperation() returns the last operation (EVACUATE)
- EVACUATE is in INSTANCE_DISABLED_OVERRIDABLE_OPERATIONS
- This causes getInstanceOperation() to override it with a newly generated DISABLE operation
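The three steps above can be modeled with a small standalone sketch. This is not the Helix source: the classes `Entry` and `Config`, the deterministic `clock` counter, the "REGENERATED" source label, and the contents of the overridable set beyond EVACUATE are all assumptions made for illustration. Only the method names, the set name, and the override condition come from the RCA.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class DisableRegenerationSketch {
    enum Op { ENABLE, DISABLE, EVACUATE }

    // The report only confirms that EVACUATE is a member; the real set may hold more.
    static final Set<Op> INSTANCE_DISABLED_OVERRIDABLE_OPERATIONS = EnumSet.of(Op.EVACUATE);

    // Hypothetical stand-in for one stored instance operation.
    static final class Entry {
        final Op op;
        final String source;
        final long timestamp;
        final boolean legacyDisabledType;

        Entry(Op op, String source, long timestamp, boolean legacyDisabledType) {
            this.op = op;
            this.source = source;
            this.timestamp = timestamp;
            this.legacyDisabledType = legacyDisabledType;
        }
    }

    // Hypothetical stand-in for the instance configuration.
    static final class Config {
        boolean helixEnabled = true;          // legacy HELIX_ENABLED field
        final List<Entry> operations = new ArrayList<>();
        long clock = 0;                       // deterministic substitute for wall-clock time

        void setInstanceOperation(Op op, String source) {
            // Assumption: one entry per source, newest entry kept last.
            operations.removeIf(e -> e.source.equals(source));
            operations.add(new Entry(op, source, ++clock, false));
        }

        Entry getActiveInstanceOperation() {
            // Step 1 of the RCA: the last operation wins.
            return operations.get(operations.size() - 1);
        }

        Entry getInstanceOperation() {
            Entry active = getActiveInstanceOperation();
            // Steps 2 and 3: an overridable operation plus HELIX_ENABLED=false
            // yields a brand-new DISABLE with a fresh timestamp on every call.
            if (!helixEnabled && INSTANCE_DISABLED_OVERRIDABLE_OPERATIONS.contains(active.op)) {
                return new Entry(Op.DISABLE, "REGENERATED", ++clock, true);
            }
            return active;
        }
    }

    // Returns {stored EVACUATE, first resolved operation, second resolved operation}.
    static Entry[] reproduce() {
        Config cfg = new Config();
        cfg.setInstanceOperation(Op.DISABLE, "USER");
        cfg.helixEnabled = false;             // legacy field state described in the report
        cfg.setInstanceOperation(Op.EVACUATE, "AUTOMATION");
        Entry evacuate = cfg.getActiveInstanceOperation();
        Entry first = cfg.getInstanceOperation();
        Entry second = cfg.getInstanceOperation();
        return new Entry[] { evacuate, first, second };
    }

    public static void main(String[] args) {
        for (Entry e : reproduce()) {
            System.out.println(e.op + " source=" + e.source + " ts=" + e.timestamp
                + " legacyDisabledType=" + e.legacyDisabledType);
        }
    }
}
```

In this model, each resolution of the instance operation after EVACUATE produces a DISABLE entry that is newer than the stored EVACUATE, which matches the regenerated timestamps observed in ZooKeeper. How and when the real code persists the regenerated entry is not modeled here.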