Improve error handling in OneFlow scale-up operations #6545

OpenNebulaSupport · 2024-03-20T08:47:13Z

Description
The OneFlow component does not correctly handle the case where a single VM deployment fails during a role scaling operation. Instead of reporting a failure, it reports SUCCESS and leaves the VM information empty in the JSON Service body:

# Extract from the JSON Service Body
  ...
  "nodes": [ 
     {
        "deploy_id": 4,
        "vm_info": null
     }
  ]
  ...

This may result in unexpected behavior, since the VM isn't controlled by the service or any other component and its information remains empty in the JSON body of the service.
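A minimal sketch of how an affected Service could be detected from its JSON body. It only relies on the `nodes`, `deploy_id` and `vm_info` keys shown in the extract above; the surrounding `roles` list, the sample data and the function name `nodes_missing_vm_info` are assumptions made for illustration, not part of OneFlow.

```python
import json


def nodes_missing_vm_info(service_body: dict) -> list:
    """Return (role name, deploy_id) pairs for nodes whose vm_info is empty.

    Assumes the body holds a "roles" list and that each role carries a
    "nodes" list like the one in the extract above.
    """
    missing = []
    for role in service_body.get("roles", []):
        for node in role.get("nodes", []):
            # vm_info is null (None) or empty when the deployment failed
            if not node.get("vm_info"):
                missing.append((role.get("name"), node.get("deploy_id")))
    return missing


# Hypothetical sample body: one healthy node and one affected node
sample = json.loads("""
{
  "roles": [
    {
      "name": "worker",
      "nodes": [
        {"deploy_id": 3, "vm_info": {"VM": {"ID": "3"}}},
        {"deploy_id": 4, "vm_info": null}
      ]
    }
  ]
}
""")

print(nodes_missing_vm_info(sample))  # [('worker', 4)]
```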

To Reproduce

Create a OneFlow Service with the role scaling policies enabled. The following template was used to reproduce the case:

{
    "name": "test-service",
    "deployment": "straight",
    "description": "test-service template for debug purposes",
    "roles": [
        {
            "name": "master",
            "cardinality": 1,
            "vm_template": 1,
            "vm_template_contents": "",
            "min_vms": 1,
            "max_vms": 1,
            "cooldown": 5,
            "elasticity_policies": [],
            "scheduled_policies": []
        },
        {
            "name": "worker",
            "cardinality": 2,
            "vm_template": 2,
            "parents": ["master"],
            "vm_template_contents": "",
            "min_vms": 2,
            "max_vms": 10,
            "cooldown": 60,
            "elasticity_policies": [
                {
                    "type": "CHANGE",
                    "adjust": 1,
                    "expression": "TEST_ATTR > 100",
                    "period_number": 1,
                    "period": 60,
                    "cooldown": 120
                }
            ],
            "scheduled_policies": []
        }
    ]
}
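For reference, a rough sketch of what the `worker` policy in this template asks for when it triggers. It assumes a `CHANGE` policy adds `adjust` to the current cardinality and that the result is kept within `min_vms` and `max_vms`; the function name is hypothetical and this is not OneFlow's actual code.

```python
def apply_change_policy(cardinality: int, adjust: int,
                        min_vms: int, max_vms: int) -> int:
    """Return the target cardinality after a CHANGE policy fires.

    Assumption: the adjusted value is clamped to [min_vms, max_vms].
    """
    target = cardinality + adjust
    return max(min_vms, min(max_vms, target))


# Values taken from the "worker" role of the template above
new_cardinality = apply_change_policy(cardinality=2, adjust=1,
                                      min_vms=2, max_vms=10)
print(new_cardinality)  # 3
```

With these values a single scale-up step requests one extra VM, which is the deployment that fails in this report.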

Once the Service is deployed, wait for it to scale automatically (you can force this by creating an attribute on the VMs and changing its value).
To force the scaling operation to fail, DISABLE all hosts once the Service is in RUNNING state. When the Service then tries to scale, the deployment fails because no free hosts are left.
At this point, OneFlow adds the VM to the Service body with empty VM information.

Expected behavior
The Service scaling operation is cancelled and the error is reported correctly.
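The expected behavior could be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that was committed: `scale_up`, `ScaleUpError`, `deploy_vm` and `terminate_vm` are hypothetical names, and the node dictionaries only mirror the `deploy_id` and `vm_info` keys from the Service body extract.

```python
class ScaleUpError(Exception):
    """Raised when a scale-up operation has to be cancelled."""


def scale_up(nodes, count, deploy_vm, terminate_vm):
    """Try to add `count` nodes; cancel the whole operation on any failure.

    deploy_vm    -- hypothetical callable returning a node dict, or raising
    terminate_vm -- hypothetical callable taking a deploy_id to clean up
    """
    added = []
    try:
        for _ in range(count):
            node = deploy_vm()
            added.append(node)
            # Treat a node without VM information as a failed deployment
            if not node.get("vm_info"):
                raise ScaleUpError(f"VM {node.get('deploy_id')} has no vm_info")
    except Exception as exc:
        # Roll back: clean up every VM created by this operation
        for node in added:
            terminate_vm(node["deploy_id"])
        raise ScaleUpError(f"scale-up cancelled: {exc}") from exc
    # Only record the new nodes once all of them deployed correctly
    return nodes + added


# Usage: a deployment that comes back without VM information
terminated = []
try:
    scale_up([], 1, lambda: {"deploy_id": 4, "vm_info": None}, terminated.append)
except ScaleUpError as err:
    print("ERROR:", err)
```

The point of the sketch is that new nodes are appended to the Service body only after every deployment succeeded, so a failed VM never ends up recorded with an empty `vm_info`.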

Details

Affected Component: OneFlow
Hypervisor: KVM
Version: 6.8

Additional context
In some cases the VM deployment itself works correctly during the scaling operation, but other errors or unexpected messages during deployment may cause the same behavior.

Progress Status

Code committed
Testing - QA
Documentation (Release notes - resolved issues, compatibility, known issues)


OpenNebulaSupport added Type: Bug Category: Orchestration - Flow Sponsored Status: Accepted Priority: Normal labels Mar 20, 2024

tinova added this to the Release 7.0 milestone Mar 21, 2024

rsmontero modified the milestones: Release 7.0, Release 6.8.3 Apr 3, 2024

vickmp added a commit to OpenNebula/docs that referenced this issue Apr 16, 2024

B OpenNebula/one#6545: Update resolved issues

344e707

vickmp mentioned this issue Apr 16, 2024

B OpenNebula/one#6545: Update resolved issues OpenNebula/docs#2895

Merged

1 task

rsmontero pushed a commit to OpenNebula/docs that referenced this issue Apr 16, 2024

B OpenNebula/one#6545: Update resolved issues (#2895)

23d058a

rsmontero closed this as completed Apr 16, 2024
