@heyvister1
Collaborator

No description provided.

Signed-off-by: Ido Heyvi <iheyvi@nvidia.com>
Collaborator

@almaslennikov almaslennikov left a comment

small nits, otherwise LGTM

.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster.
By default, upgrade controller will use in-place mode.
``nodeMaintenanceNamePrefix`` is used to distinguish between different (operators) requestors, requesting node maintenance operations on the same node(s).
Deploying maintenance operator, as well as enabling reuestor mode, can be done through Network Operator helm ``values.yaml``:
Collaborator

typo: 'requestor'

* - Mode
- Description
* - In-place
- In-place (legacy) mode is incorporating full driver upgrade lifecycle, including nodes operations e.g. cordon, pod eviction, drain, uncordon. It also maintains an internal scheduler for performing above node operations, according to provided ``maxParallelUpgrades`` under ``UpgradePolicy``.
Collaborator

incorporates

Collaborator Author

done

* - In-place
- In-place (legacy) mode incorporates the full driver upgrade lifecycle, including node operations, e.g. cordon, pod eviction, drain, uncordon. It also maintains an internal scheduler for performing the above node operations, according to the provided ``maxParallelUpgrades`` under ``UpgradePolicy``.
* - Requestor
- New ``requestor`` upgrade mode uses NVIDIA maintenance operator (please refer to `maintenance-operator repo`_) nodeMaintenance k8s API objects, to initiate the DOCA driver upgrade process. Essentially, it will retire current upgrade controller (in-place mode) from performing the following node operations: cordon, wait for pods completion, drain, uncordon. To enable requestor mode, the following environment variable should be enabled ``MAINTENANCE_OPERATOR_ENABLED=true``.
Collaborator

Please, add a note that this environment variable could be configured via helm values

Collaborator Author

Done

…odes: inplace/requestor

Signed-off-by: Ido Heyvi <iheyvi@nvidia.com>
@heyvister1 heyvister1 requested a review from e0ne May 18, 2025 14:57
Collaborator

@e0ne e0ne left a comment

Thanks for addressing my comments, @heyvister1 !

@heyvister1 heyvister1 merged commit c0eca78 into Mellanox:main May 19, 2025
2 checks passed
3 participants