oneflow recover can falsely recover services in FAILED_DEPLOYING #6396

Closed
3 tasks
dann1 opened this issue Nov 13, 2023 · 1 comment

dann1 commented Nov 13, 2023

Description
If a service template is instantiated and the resulting service reaches FAILED_DEPLOYING, a subsequent recover operation can set the service to RUNNING even though no VMs back it.

To Reproduce

root@provisionengine-test-env:~# oneflow-template instantiate FAILED_DEPLOY
ID: 1059
root@provisionengine-test-env:~# oneflow list
  ID USER     GROUP    NAME                                                                                                                             STARTTIME STAT
1059 oneadmin oneadmin FAILED_DEPLOY                                                                                                               11/13 17:36:53 FAILED_DEPLOYING
root@provisionengine-test-env:~# oneflow show 1059
SERVICE 1059 INFORMATION
ID                  : 1059
NAME                : FAILED_DEPLOY
USER                : oneadmin
GROUP               : oneadmin
STRATEGY            : straight
SERVICE STATE       : FAILED_DEPLOYING
START TIME          : 11/13 17:36:53

PERMISSIONS
OWNER               : um-
GROUP               : ---
OTHER               : ---

ROLE FAAS
ROLE STATE          : FAILED_DEPLOYING
VM TEMPLATE         : 7
CARDINALITY         : 1
SHUTDOWN            : terminate-hard

NODES INFORMATION
 VM_ID NAME                     USER            GROUP

LOG MESSAGES
11/13/23 17:36 [I] New state: DEPLOYING_NETS
11/13/23 17:36 [E] Role FAAS : Instantiate failed for template 7; [one.template.instantiate] Error allocating a new virtual machine template. Cannot get IP/MAC lease from virtual network 1.
11/13/23 17:36 [I] New state: FAILED_DEPLOYING
root@provisionengine-test-env:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME
root@provisionengine-test-env:~# oneflow recover 1059
root@provisionengine-test-env:~# oneflow list
  ID USER     GROUP    NAME                                                                                                                                     STARTTIME STAT
1059 oneadmin oneadmin FAILED_DEPLOY                                                                                                                       11/13 17:36:53 RUNNING
root@provisionengine-test-env:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME
root@provisionengine-test-env:~# oneflow show 1059
SERVICE 1059 INFORMATION
ID                  : 1059
NAME                : FAILED_DEPLOY
USER                : oneadmin
GROUP               : oneadmin
STRATEGY            : straight
SERVICE STATE       : RUNNING
START TIME          : 11/13 17:36:53

PERMISSIONS
OWNER               : um-
GROUP               : ---
OTHER               : ---

ROLE FAAS
ROLE STATE          : RUNNING
VM TEMPLATE         : 7
CARDINALITY         : 1
SHUTDOWN            : terminate-hard

NODES INFORMATION
 VM_ID NAME                     USER            GROUP

LOG MESSAGES
11/13/23 17:36 [I] New state: DEPLOYING_NETS
11/13/23 17:36 [E] Role FAAS : Instantiate failed for template 7; [one.template.instantiate] Error allocating a new virtual machine template. Cannot get IP/MAC lease from virtual network 1.
11/13/23 17:36 [I] New state: FAILED_DEPLOYING
11/13/23 17:37 [E] Role FAAS : Instantiate failed for template 7; [one.template.instantiate] Error allocating a new virtual machine template. Cannot get IP/MAC lease from virtual network 1.
11/13/23 17:37 [I] New state: RUNNING

Expected behavior
After a recover, the service should remain in a failure state, because the conditions that caused the failure have not changed. The role also still reports a cardinality of 1 although no VMs back it.
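
The expected outcome can be sketched as a small state check. This is only an illustration of the rule described above, not OneFlow's actual code: the `Role` class and both function names are hypothetical, and the rule assumed here is that a role counts as recovered only when it has at least as many nodes as its cardinality.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    """Hypothetical, simplified view of a service role (not OneFlow's model)."""
    name: str
    cardinality: int
    node_vm_ids: List[int] = field(default_factory=list)


def role_state_after_recover(role: Role) -> str:
    # Assumed rule: a role is RUNNING only if enough VMs back it.
    if len(role.node_vm_ids) >= role.cardinality:
        return "RUNNING"
    return "FAILED_DEPLOYING"


def service_state_after_recover(roles: List[Role]) -> str:
    # The service is RUNNING only if every role recovered.
    if roles and all(role_state_after_recover(r) == "RUNNING" for r in roles):
        return "RUNNING"
    return "FAILED_DEPLOYING"


# The case from this report: role FAAS, cardinality 1, no VMs.
print(service_state_after_recover([Role("FAAS", 1)]))
# The same role once a VM (ID 42, made up for illustration) backs it.
print(service_state_after_recover([Role("FAAS", 1, [42])]))
```

With the data from this report the first call yields FAILED_DEPLOYING, which is the state the service should have kept.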

Additional context
There might also be a problem in the core, as it is possible to create a virtual network whose address range has size 0:

root@provisionengine-test-env:~# onevnet show 1
VIRTUAL NETWORK 1 INFORMATION
ID                       : 1
NAME                     : no_leases
USER                     : oneadmin
GROUP                    : oneadmin
LOCK                     : None
CLUSTERS                 : 0
BRIDGE                   : onebr1
STATE                    : READY
VN_MAD                   : bridge
AUTOMATIC VLAN ID        : NO
AUTOMATIC OUTER VLAN ID  : NO
USED LEASES              : 0

PERMISSIONS
OWNER                    : um-
GROUP                    : ---
OTHER                    : ---

VIRTUAL NETWORK TEMPLATE
BRIDGE="onebr1"
BRIDGE_TYPE="linux"
OUTER_VLAN_ID=""
PHYDEV=""
SECURITY_GROUPS="0"
VLAN_ID=""
VN_MAD="bridge"

ADDRESS RANGE POOL
AR 0
SIZE           : 0
LEASES         : 0

RANGE                                   FIRST                               LAST
MAC                         02:00:b9:18:c9:66                  02:00:b9:18:c9:65


LEASES
AR  OWNER        MAC    IP PORT_FORWARD   IP6

VIRTUAL ROUTERS

VIRTUAL MACHINES
UPDATED        :
OUTDATED       :
ERROR          :
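
The MAC range shown above has a LAST address one below FIRST, which is what results if the last address is computed as first + size - 1 with size 0. The sketch below reproduces that arithmetic and shows the kind of check that would reject such a range. The formula is an assumption inferred from the output, and all function names are hypothetical; this is not the core's implementation.

```python
def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address to an integer."""
    return int(mac.replace(":", ""), 16)


def int_to_mac(value: int) -> str:
    """Convert an integer back to a colon-separated MAC address."""
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def ar_last_mac(first_mac: str, size: int) -> str:
    # Assumed formula: the range covers `size` consecutive addresses.
    return int_to_mac(mac_to_int(first_mac) + size - 1)


def validate_ar_size(size: int) -> None:
    # Hypothetical guard: an address range must hold at least one lease.
    if size < 1:
        raise ValueError(f"address range size must be >= 1, got {size}")


# Size 0 gives a LAST address below FIRST, as in the output above.
print(ar_last_mac("02:00:b9:18:c9:66", 0))  # 02:00:b9:18:c9:65

try:
    validate_ar_size(0)
except ValueError as err:
    print(err)
```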

Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)

dann1 commented Nov 14, 2023

Another example, where instantiation fails because the user does not own the referenced network:

root@opennebula-frontend:~# oneflow show 576
SERVICE 576 INFORMATION
ID                  : 576
NAME                : Function
USER                : oneadmin
GROUP               : oneadmin
STRATEGY            : straight
SERVICE STATE       : FAILED_DEPLOYING
START TIME          : 11/14 02:18:40

PERMISSIONS
OWNER               : um-
GROUP               : ---
OTHER               : ---

ROLE FAAS
ROLE STATE          : FAILED_DEPLOYING
VM TEMPLATE         : 15
CARDINALITY         : 1

NODES INFORMATION
 VM_ID NAME                     USER            GROUP

LOG MESSAGES
11/14/23 02:18 [I] New state: DEPLOYING_NETS
11/14/23 02:18 [E] Role FAAS : Instantiate failed for template 15; [one.template.instantiate] Error allocating a new virtual machine template. User 0 does not own a network with name: github_actions_no_lease . Set NETWORK_UNAME or NETWORK_UID of owner in NIC.
11/14/23 02:18 [I] New state: FAILED_DEPLOYING
root@opennebula-frontend:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME
root@opennebula-frontend:~# oneflow recover 576
root@opennebula-frontend:~# oneflow list
  ID USER     GROUP    NAME                                                                                                                                     STARTTIME STAT
 576 oneadmin oneadmin Function                                                                                                                            11/14 02:18:40 RUNNING
root@opennebula-frontend:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME

rsmontero added a commit to OpenNebula/docs that referenced this issue May 22, 2024