Code Review
This pull request reduces the minimum number of replicas for the DG Engine from 4 to 1 in the prod_omi_values.yaml configuration. While this change may aim for cost savings, it critically compromises the high availability of the service by creating a single point of failure in a production environment. If the single engine pod fails, the service could experience significant downtime. I have added a critical review comment recommending this value be increased to ensure service resilience.
  engine:
    # -- Minimum number of Engine replicas.
-   minReplicas: 4
+   minReplicas: 1
Reducing minReplicas to 1 in a production environment introduces a critical availability risk. With a single replica, the service has no redundancy and becomes a single point of failure. If the pod goes down, the service will be unavailable until a new one starts, which could take a significant amount of time based on your startup probe configuration (over 15 minutes). For production-grade resilience, a minimum of 2 replicas is highly recommended. Please consider reverting to the previous, safer value.
minReplicas: 4
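If cost is the motivation for this change, a middle ground may be worth considering. The fragment below is only a sketch: it assumes the `engine.minReplicas` key shown in the diff is the only setting involved, and the value 2 comes from the redundancy floor mentioned above, not from any load measurement for this service.

```yaml
engine:
  # -- Minimum number of Engine replicas.
  # Assumption: 2 is the smallest value that keeps one replica serving
  # while the other restarts; confirm that one replica can carry
  # production load during the long startup window before adopting it.
  minReplicas: 2
```

With two replicas, a single pod failure should reduce capacity instead of causing an outage, at roughly half the baseline cost of the previous value of 4.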