
Operator is not resilient to a short API-Server downtime #2458

Open
@qjoly

Description


Similar to #2450 (but this time, both pods are stopped)

I found the same problem as last time. After a brief unavailability of the Kubernetes API-Server, the operator can no longer update its lease.
This leads to a total stop (without the pod restarting), in addition to abnormal resource consumption.

Please note that the two pods are on two different nodes. This micro-disruption in API-Server availability only affected one pod at a time (the first replica several days ago on node-1, the other replica recently on node-2). Since a liveness probe cannot be configured, the pods never restart and remain stuck in this state.

Expected Behavior

I expected the operator to retry until its request succeeds (or, better, to stop itself cleanly).

Current Behavior

Currently, the operator's two replicas are stopped and no longer respond to anything (creating or deleting a Tenant custom resource does nothing). On top of that, each pod consumes 1000m CPU, as if the code were stuck in a busy loop it never exits.

Possible Solution

Add a retry (or fix the problem that prevents the pod from stopping cleanly when the API-Server is unavailable).

Steps to Reproduce (for bugs)

  1. Deploy a cluster with a single control plane
  2. Install the MinIO Operator (ensure neither pod is scheduled on the control plane)
  3. Restart the control plane to cause a short API-Server downtime
  4. The operator is stuck

Your Environment
