# Checks
If a check fails, it is reported as a finding. Each check has a remediation type: either recommended or required. A recommended remediation is advised but not mandatory.
- ⚠️ Recommended: A finding that users are encouraged to evaluate to determine whether it applies and whether to act on it. Leaving the finding unremediated does not prevent the upgrade from occurring.
- ❌ Required: A finding that must be remediated before upgrading in order to perform the upgrade and avoid downtime or disruption.
See the [symbol table](https://clowdhaus.github.io/eksup/#symbol-table) for further details on the symbols used throughout the documentation.
## Amazon
Checks that apply to AWS resources in general and are not specific to Amazon EKS or Kubernetes.
#### AWS001
!!! info "🚧 _Not yet implemented_"
**⚠️ Remediation recommended**
There is a sufficient quantity of IPs available for the nodes to support the upgrade.
If custom networking is enabled, the results represent the number of IPs available in the subnets used by the EC2 instances. Otherwise, the results represent the number of IPs available in the subnets used by both the EC2 instances and the pods.
#### AWS002
**⚠️ Remediation recommended**
There is a sufficient quantity of IPs available for the **pods** to support the upgrade.
This check is used when custom networking is enabled, because the IPs used by pods come from different subnets than those used by the EC2 instances themselves.
#### AWS003
!!! info "🚧 _Not yet implemented_"
EC2 instance service limits
#### AWS004
!!! info "🚧 _Not yet implemented_"
EBS GP2 volume service limits
#### AWS005
!!! info "🚧 _Not yet implemented_"
EBS GP3 volume service limits
---
## Amazon EKS
Checks that are specific to Amazon EKS
#### EKS001
**❌ Remediation required**
There are at least 2 subnets in different availability zones, each with at least 5 available IPs for the control plane to upgrade.
#### EKS002
**❌ Remediation required**
Control plane does not have any reported health issues.
#### EKS003
**❌ Remediation required**
EKS managed nodegroup does not have any reported health issues.
This does not include self-managed nodegroups or Fargate profiles; the AWS API does not currently report health issues for those.
#### EKS004
**❌ Remediation required**
EKS addon does not have any reported health issues.
#### EKS005
**❌ Remediation required**
EKS addon version is within the supported range.
The addon must be updated to a version that is supported by the target Kubernetes version prior to upgrading.
**⚠️ Remediation recommended**
The target Kubernetes version default addon version is newer than the current addon version.
For example, suppose the default addon version of CoreDNS for Kubernetes `v1.24` is `v1.8.7-eksbuild.3` and the current addon version is `v1.8.4-eksbuild.2`. Although the current version is supported on Kubernetes `v1.24`, it is recommended to update the addon to `v1.8.7-eksbuild.3` during the upgrade.
#### EKS006
**⚠️ Remediation recommended**
EKS managed nodegroups are using the latest launch template version and there are no pending updates for the nodegroup.
Users are encouraged to evaluate whether remediation is warranted and whether to update to the latest launch template version prior to upgrading. Pending updates could introduce additional changes to the nodegroup during the upgrade.
<!-- TODO - add the CLI command to diff the launch template versions
diff <(aws ec2 describe-launch-template-versions A ...) <(aws ec2 describe-launch-template-versions B ...) -->
<!-- TODO - consider diffing the templates and reporting the differences in the reported output -->
#### EKS007
**⚠️ Remediation recommended**
Self-managed nodegroups are using the latest launch template version and there are no pending updates for the nodegroup.
Users are encouraged to evaluate whether remediation is warranted and whether to update to the latest launch template version prior to upgrading. Pending updates could introduce additional changes to the nodegroup during the upgrade.
<!-- TODO - add the CLI command to diff the launch template versions
diff <(aws ec2 describe-launch-template-versions A ...) <(aws ec2 describe-launch-template-versions B ...) -->
<!-- TODO - consider diffing the templates and reporting the differences in the reported output -->
---
## Kubernetes
Checks that are specific to Kubernetes, regardless of the underlying platform provider.
The table below shows which checks apply to each Kubernetes resource.
| Check | Deployment | ReplicaSet | ReplicationController | StatefulSet | Job | CronJob | DaemonSet |
| :------: | :--------: | :--------: | :-------------------: | :---------: | :-: | :-----: | :-------: |
| `K8S001` | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ |
| `K8S002` | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| `K8S003` | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| `K8S004` | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| `K8S005` | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| `K8S006` | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| `K8S007` | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| `K8S008` | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| `K8S009` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| `K8S010` | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ |
| `K8S011` | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ |
#### K8S001
**❌ Remediation required**
The version skew between the control plane (API Server) and the data plane (kubelet) violates the Kubernetes version skew policy, or will violate the version skew policy after the control plane has been upgraded.
The data plane nodes must be within 1 minor version of the current control plane version before the upgrade, since upgrading the control plane adds one more minor version of skew. It is recommended to upgrade the data plane nodes to the same version as the control plane.
**⚠️ Remediation recommended**
There is a version skew between the control plane (API Server) and the data plane (kubelet).
While Kubernetes does support a version skew of n-2 between the API Server and kubelet, it is recommended to upgrade the data plane nodes to the same version as the control plane.
[Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/#supported-version-skew)
#### K8S002
**❌ Remediation required**
There are at least 3 replicas specified for the resource.
```yaml
---
spec:
  replicas: 3 # >= 3
```
Multiple replicas, along with the use of `PodDisruptionBudget`, are required to ensure high availability during the upgrade.
[EKS Best Practices - Reliability](https://aws.github.io/aws-eks-best-practices/reliability/docs/application/#run-multiple-replicas)
#### K8S003
**❌ Remediation required**
`minReadySeconds` has been set to a value greater than 0 seconds for `StatefulSet`.
You can read more about why this is necessary for `StatefulSet` [here](https://kubernetes.io/blog/2021/08/27/minreadyseconds-statefulsets/).
**⚠️ Remediation recommended**
`minReadySeconds` has been set to a value greater than 0 seconds for `Deployment`, `ReplicaSet`, `ReplicationController`.
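As a minimal sketch, `minReadySeconds` sits at the top level of the workload `spec`. The value of 10 seconds below is only an illustrative assumption, not a recommended setting; the appropriate value depends on how long the application needs to stabilize after becoming ready.

```yaml
---
spec:
  minReadySeconds: 10 # > 0; illustrative value, tune per workload
```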
#### K8S004
!!! info "🚧 _Not yet implemented_"
**❌ Remediation required**
At least one `PodDisruptionBudget` covers the workload, and at least one of `minAvailable` or `maxUnavailable` is set.
The Kubernetes eviction API is the preferred method for draining nodes for replacement during an upgrade. When a `PodDisruptionBudget` is specified, the eviction API respects it and will not evict pods if doing so would violate the budget, which preserves application availability.
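A `PodDisruptionBudget` that would cover a workload might look like the following sketch. The resource name and the `app: example` label are hypothetical; the selector must match the labels on the workload's pods. A single `PodDisruptionBudget` accepts only one of `minAvailable` or `maxUnavailable`.

```yaml
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb # hypothetical name
spec:
  maxUnavailable: 1 # alternatively, set minAvailable (only one of the two per budget)
  selector:
    matchLabels:
      app: example # hypothetical label; must match the workload's pod labels
```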
#### K8S005
**❌ Remediation required**
Either `.spec.affinity.podAntiAffinity` or `.spec.topologySpreadConstraints` is set to prevent multiple pods from the same workload from being scheduled on the same node.
`topologySpreadConstraints` are preferred over affinity, especially for larger clusters:
- [Inter-pod affinity and anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity)
> Note: Inter-pod affinity and anti-affinity require substantial amount of processing which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.
[Types of inter-pod affinity and anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#types-of-inter-pod-affinity-and-anti-affinity)
[Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/)
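As a rough sketch, a topology spread constraint that spreads a workload's pods evenly across nodes could be added to the pod template as shown here. The `app: example` label is a hypothetical placeholder and must match the pod labels of the workload; `maxSkew` and `whenUnsatisfiable` are illustrative choices rather than required values.

```yaml
---
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1 # allow at most 1 pod of difference between nodes
          topologyKey: kubernetes.io/hostname # spread across individual nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: example # hypothetical label
```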
#### K8S006
**❌ Remediation required**
A `readinessProbe` must be set to ensure traffic is not routed to pods before they are ready following their re-deployment from a node replacement.
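A minimal sketch of a `readinessProbe` on a container is shown here, assuming the application exposes an HTTP health endpoint. The container name, image, path, port, and timing values are all hypothetical and should be replaced with values appropriate for the workload.

```yaml
---
spec:
  template:
    spec:
      containers:
        - name: app # hypothetical container name
          image: example.com/app:1.0.0 # hypothetical image
          readinessProbe:
            httpGet:
              path: /healthz # hypothetical endpoint
              port: 8080 # hypothetical port
            initialDelaySeconds: 5 # illustrative values
            periodSeconds: 10
```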
#### K8S007
**❌ Remediation required**
The `StatefulSet` should not specify a `terminationGracePeriodSeconds` of 0.
- [Deployment and Scaling Guarantees](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees)
> The StatefulSet should not specify a pod.Spec.TerminationGracePeriodSeconds of 0. This practice is unsafe and strongly discouraged. For further explanation, please refer to force deleting StatefulSet Pods.
[Force Delete StatefulSet Pods](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/)
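For illustration, the field lives in the pod template of the `StatefulSet`. The value below is the Kubernetes default of 30 seconds; any value greater than 0 that gives the application time to shut down cleanly should satisfy the intent of this check.

```yaml
---
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30 # > 0; 30 is the Kubernetes default
```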
#### K8S008
Pod volumes should not mount the `docker.sock` file, because the dockershim was removed in Kubernetes `v1.24`.
**❌ Remediation required**
For clusters on Kubernetes `v1.23`
**⚠️ Remediation recommended**
For clusters on Kubernetes <=`v1.22`
[Dockershim Removal FAQ](https://kubernetes.io/blog/2022/02/17/dockershim-faq/)
[Detector for Docker Socket (DDS)](https://github.com/aws-containers/kubectl-detector-for-docker-socket)
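The kind of volume definition this check is concerned with typically looks like the following sketch, which should be removed or replaced before upgrading. The volume name is hypothetical, and the exact socket path may differ between workloads.

```yaml
---
spec:
  template:
    spec:
      volumes:
        - name: docker-sock # hypothetical volume name
          hostPath:
            path: /var/run/docker.sock # no longer served once the dockershim is removed
```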
#### K8S009
The `PodSecurityPolicy` resource is removed starting in Kubernetes `v1.25`.
**❌ Remediation required**
For clusters on Kubernetes `v1.24`
**⚠️ Remediation recommended**
For clusters on Kubernetes <=`v1.23`
[Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/)
[PodSecurityPolicy Deprecation: Past, Present, and Future](https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/)
#### K8S010
!!! info "🚧 _Not yet implemented_"
The [in-tree Amazon EBS storage provisioner](https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore) is deprecated. If you are upgrading your cluster to version `v1.23`, then you must first install the Amazon EBS CSI driver before updating your cluster. For more information, see [Amazon EBS CSI migration frequently asked questions](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi-migration-faq.html).
**❌ Remediation required**
For clusters on Kubernetes `v1.22`
**⚠️ Remediation recommended**
For clusters on Kubernetes <=`v1.21`
[Amazon EBS CSI migration frequently asked questions](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi-migration-faq.html)
[Kubernetes In-Tree to CSI Volume Migration Status Update](https://kubernetes.io/blog/2021/12/10/storage-in-tree-to-csi-migration-status-update/)
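Once the Amazon EBS CSI driver is installed, a `StorageClass` that uses it rather than the in-tree provisioner might look like the sketch below. The class name and the `gp3` volume type are illustrative assumptions; the key difference is the `provisioner` value.

```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3 # hypothetical name
provisioner: ebs.csi.aws.com # CSI driver; the in-tree provisioner is kubernetes.io/aws-ebs
parameters:
  type: gp3 # illustrative volume type
volumeBindingMode: WaitForFirstConsumer
```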
#### K8S011
**❌ Remediation required**
`kube-proxy` on an Amazon EKS cluster has the same [compatibility and skew policy as Kubernetes](https://kubernetes.io/releases/version-skew-policy/#kube-proxy):
- It must be the same minor version as kubelet on your Amazon EC2 nodes.
- It cannot be newer than the minor version of your cluster's control plane.
- Its version on your Amazon EC2 nodes can't be more than two minor versions older than your control plane. For example, if your control plane is running Kubernetes `1.25`, then the kube-proxy minor version cannot be older than `1.23`.
If you recently updated your cluster to a new Kubernetes minor version, then update your Amazon EC2 nodes (and therefore `kubelet`) to the same minor version before updating `kube-proxy` to the same minor version as your nodes. The order of operations during an upgrade is as follows:
1. Update the control plane to the new Kubernetes minor version
2. Update the nodes, which updates `kubelet`, to the new Kubernetes minor version
3. Update `kube-proxy` to the new Kubernetes minor version