Improve error handling in OneFlow scale-up operations #6545

Closed
3 tasks
OpenNebulaSupport opened this issue Mar 20, 2024 · 0 comments

Description
The OneFlow component does not correctly handle the case where a single VM deployment fails during a role scaling operation. Instead of reporting a failure, it reports SUCCESS and leaves the VM information empty in the JSON body of the Service:

# Extract from the JSON Service Body
  ...
  "nodes": [ 
     {
        "deploy_id": 4,
        "vm_info": null
     }
  ]
  ...

This may result in unexpected behavior, since the VM isn't controlled by the service or any other component and its information remains empty in the JSON body of the service.
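The symptom can be checked for mechanically. The following is a minimal sketch, not part of OneFlow: it assumes the Service body has already been loaded as JSON and that each role carries its own "nodes" list (the extract above only shows the "nodes" array itself, so the surrounding "roles" layout and the sample data are assumptions). The helper name `find_empty_nodes` is hypothetical.

```python
import json


def find_empty_nodes(service_body):
    """Return (role_name, deploy_id) pairs for nodes whose vm_info is empty."""
    empty = []
    for role in service_body.get("roles", []):
        for node in role.get("nodes", []):
            # A node left behind by a failed scale-up has "vm_info": null.
            if node.get("vm_info") is None:
                empty.append((role.get("name"), node.get("deploy_id")))
    return empty


# Illustrative body; only "nodes", "deploy_id" and "vm_info" come from the report.
body = json.loads("""
{
  "roles": [
    {
      "name": "worker",
      "nodes": [
        {"deploy_id": 3, "vm_info": {"VM": {"ID": "3"}}},
        {"deploy_id": 4, "vm_info": null}
      ]
    }
  ]
}
""")

print(find_empty_nodes(body))  # [('worker', 4)]
```

Any pair returned here points at a VM that the Service lists but does not actually track.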

To Reproduce

  1. Create a OneFlow Service with the role scaling policies enabled. The following template was used to reproduce the case:

    {
        "name": "test-service",
        "deployment": "straight",
        "description": "test-service template for debug purposes",
        "roles": [
            {
                "name": "master",
                "cardinality": 1,
                "vm_template": 1,
                "vm_template_contents": "",
                "min_vms": 1,
                "max_vms": 1,
                "cooldown": 5,
                "elasticity_policies": [],
                "scheduled_policies": []
            },
            {
                "name": "worker",
                "cardinality": 2,
                "vm_template": 2,
                "parents": ["master"],
                "vm_template_contents": "",
                "min_vms": 2,
                "max_vms": 10,
                "cooldown": 60,
                "elasticity_policies": [
                    {
                        "type": "CHANGE",
                        "adjust": 1,
                        "expression": "TEST_ATTR > 100",
                        "period_number": 1,
                        "period": 60,
                        "cooldown": 120
                    }
                ],
                "scheduled_policies": []
            }
        ]
    }
    
  2. Once the Service is deployed, wait for it to scale automatically. You can force this by creating an attribute on the VMs and changing its value (the policy in the template above triggers on TEST_ATTR > 100).

  3. To force the scaling operation to fail, DISABLE all hosts once the Service is in the RUNNING state. When the Service then tries to scale, the deployment fails because no free hosts are left.

  4. At this point, OneFlow adds the VM to the Service body with empty VM information ("vm_info": null).
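For orientation, the scale-up that step 2 triggers can be sketched as follows. This is an illustration under stated assumptions, not OneFlow's code: it assumes a CHANGE policy adds "adjust" to the current cardinality and that the result is bounded by "min_vms" and "max_vms". The function name `next_cardinality` is hypothetical; the values come from the "worker" role in the template.

```python
def next_cardinality(role, policy):
    """Cardinality a CHANGE policy would request, bounded by min_vms/max_vms."""
    if policy["type"] != "CHANGE":
        raise ValueError("this sketch only covers CHANGE policies")
    target = role["cardinality"] + policy["adjust"]
    # Assumption: the requested cardinality is clamped to the role limits.
    return max(role["min_vms"], min(role["max_vms"], target))


# Values taken from the "worker" role of the reproduction template.
worker = {"cardinality": 2, "min_vms": 2, "max_vms": 10}
policy = {"type": "CHANGE", "adjust": 1}

print(next_cardinality(worker, policy))  # 3
```

Under these assumptions the Service asks for one additional worker VM, which is the deployment that fails once all hosts are disabled.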

Expected behavior
The Service scaling operation is cancelled and the error is reported correctly.
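The expected behavior could look roughly like the sketch below. It is a simplified model rather than the actual fix: `scale_up`, `DeployError` and the `deploy_vm` callable are hypothetical names, and the returned state strings are only illustrative. New nodes are added to the role only when every deployment returns VM information; otherwise the role is left unchanged and the error is reported.

```python
class DeployError(Exception):
    """Raised by the (hypothetical) deploy callable when a VM cannot be created."""


def scale_up(role, count, deploy_vm):
    """Add `count` nodes to `role`, or leave it untouched and report the error."""
    new_nodes = []
    try:
        for _ in range(count):
            vm_info = deploy_vm()
            if not vm_info:
                # Treat an empty result as a failure instead of storing null.
                raise DeployError("deployment returned no VM information")
            new_nodes.append({"deploy_id": vm_info["ID"], "vm_info": vm_info})
    except DeployError as err:
        # A real implementation would also clean up VMs created before the error.
        return {"state": "FAILED_SCALING", "error": str(err)}
    role["nodes"].extend(new_nodes)
    role["cardinality"] += len(new_nodes)
    return {"state": "RUNNING", "error": None}


def no_free_hosts():
    raise DeployError("no free hosts left")


role = {"name": "worker", "cardinality": 2, "nodes": [
    {"deploy_id": 2, "vm_info": {"ID": 2}},
    {"deploy_id": 3, "vm_info": {"ID": 3}},
]}

failed = scale_up(role, 1, no_free_hosts)
print(failed["state"], len(role["nodes"]))  # FAILED_SCALING 2

ok = scale_up(role, 1, lambda: {"ID": 4})
print(ok["state"], len(role["nodes"]))  # RUNNING 3
```

The point of the sketch is that the role body is only modified after all deployments are known to have succeeded, so a node with empty "vm_info" is never stored.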

Details

  • Affected Component: OneFlow
  • Hypervisor: KVM
  • Version: 6.8

Additional context
In some cases the VM deployment itself succeeds during the scaling operation, but other errors or unexpected messages during deployment may cause the same behavior.

Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)