
Feature/Add HA to backend controller #29

Merged: 10 commits merged into master from feature/add-ha-backend on Jul 28, 2020

Conversation

@slopezz commented on Jul 27, 2020 (Member)

Closes #8

  • Adds a Makefile target to run the operator locally
  • Adds a PodDisruptionBudget for backend-worker and backend-listener (see the PDB sketch after this list)
    • Enabled by default (ansible state present)
    • Configured by default with maxUnavailable: 1
    • The CR field can be a number or a percentage (both cases tested and working)
    • If you want to use minAvailable instead of the default maxUnavailable:
      • If you add it to the CR from the very beginning, the PDB will be created with minAvailable (ignoring the default maxUnavailable)
      • If the PDB was already created with maxUnavailable, the operator task managing the PDB will fail, because the ansible operator executes a patch operation and these 2 fields are mutually exclusive and cannot coexist on the same PDB
      • The same happens the other way around, if you have minAvailable and want to switch to maxUnavailable
      • This race condition is not easy to handle with the limited ansible operator, so it has been documented at the CR level, along with a possible workaround: either use the CR fields or delete the object manually
    • There is a task converting the CR boolean into the internal ansible state "absent" in case you want to disable it (so we guarantee that if you want to ensure it is not created, it won't be created, and if it was already enabled and you disable it, it will be deleted)
  • Adds a HorizontalPodAutoscaler for backend-worker and backend-listener (see the HPA sketch after this list)
    • Enabled by default (ansible state present)
    • Configured by default with:
      • minReplicas: 2
      • maxReplicas: 4
      • resourceName: cpu (only cpu/memory admitted at CRD level)
      • resourceUtilization: 90 (a percentage)
    • If enabled, the replicas CR field is ignored and replicas are no longer managed by the Deployments
    • There is a task converting the CR boolean into the internal ansible state "absent" in case you want to disable it (same guarantee as for the PDB: if you want to ensure it is not created, it won't be created, and if it was already enabled and you disable it, it will be deleted)
  • Adds podAntiAffinity for backend-worker and backend-listener following best practices (a sketch of the affinity block is shown in the testing section below):
    • It is a soft podAntiAffinity (using preferred instead of required)
    • With the highest priority, it tries to use hosts where there is no pod with the specific label
    • With a lower priority, it tries to use hosts from different AWS AZs where there is no pod with the specific label
    • So finally, it will try to balance pods across different AZs and across different hosts
  • Updates Backend CRD validation with the new pdb/hpa fields
  • Adds documentation for both PDB/HPA
  • Adds a backend-listener-internal Service, because:
    • The current backend-listener Service is published via an NLB with proxy-protocol enabled (both 80/443 ports)
    • The Marin3r destination ports for Service 80/443 have proxy-protocol configured (so internal communication directly to the k8s Service, instead of through the public NLB, won't work, because the NLB is the one adding proxy-protocol)
    • backend-listener needs to be accessed internally by, at least, the System component, so we require an extra Service whose marin3r port won't have proxy-protocol enabled
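For reference, this is a minimal sketch of the PDB created by default. It is illustrative only (the resource name and labels are assumptions, not the actual template from the backend role):

```yaml
# Hypothetical default PDB for backend-listener (name/labels are illustrative).
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: backend-listener
spec:
  maxUnavailable: 1          # default; the CR field also accepts a percentage, e.g. "25%"
  selector:
    matchLabels:
      app: backend-listener
# maxUnavailable and minAvailable are mutually exclusive, which is why switching
# between them on an already-created PDB needs the documented workaround.
```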

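And a similar sketch of the default HPA, again illustrative: the target Deployment name is an assumption and autoscaling/v2beta2 syntax is used here.

```yaml
# Hypothetical default HPA for backend-worker (name is illustrative).
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-worker
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu                    # only cpu/memory admitted at CRD level
        target:
          type: Utilization
          averageUtilization: 90     # resourceUtilization: 90 (a percentage)
```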
PodAntiAffinity tests

Regarding podAntiAffinity, I have done several tests to ensure it is working as expected.
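For context, the affinity block looks roughly like this (the weights and the label key are assumptions; the real values live in the deployment templates):

```yaml
# Hypothetical soft podAntiAffinity for backend-listener:
# the higher weight spreads pods across nodes, the lower one across AWS AZs.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname                    # prefer nodes without the label
          labelSelector:
            matchLabels:
              app: backend-listener
      - weight: 99
        podAffinityTerm:
          topologyKey: failure-domain.beta.kubernetes.io/zone    # then prefer other AZs
          labelSelector:
            matchLabels:
              app: backend-listener
```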

These are the worker nodes, 2 per AZ:

ip-10-65-0-91.ec2.internal    Ready     worker    82d       v1.16.2 --> us-east-1a
ip-10-65-1-200.ec2.internal   Ready     worker    132d      v1.16.2 --> us-east-1a
ip-10-65-6-169.ec2.internal   Ready     worker    132d      v1.16.2 --> us-east-1b
ip-10-65-7-70.ec2.internal    Ready     worker    82d       v1.16.2 --> us-east-1b
ip-10-65-9-40.ec2.internal    Ready     worker    132d      v1.16.2 --> us-east-1c
ip-10-65-11-2.ec2.internal    Ready     worker    82d       v1.16.2 --> us-east-1c
  • I have increased, one by one, the number of replicas per deployment, and it has always ensured the expected distribution across AZs and nodes
  • Even when I decreased from 6 to 3 replicas, it still balanced one pod per AZ

Evolution history of AZs used by each deployment depending on the number of replicas

  • Listener:

    • Initial 2 replicas: a/b
    • UP to 3 replicas: a/b/c
    • UP to 4 replicas: a/b/c/b
    • UP to 5 replicas: a/b/c/b/a
    • UP to 6 replicas: a/b/c/b/a/c
    • DOWN to 3 replicas: a/b/c
  • Worker:

    • Initial 2 replicas: c/b
    • UP to 3 replicas: c/b/a
    • UP to 4 replicas: c/b/a/b
    • UP to 5 replicas: c/b/a/b/c
    • UP to 6 replicas: c/b/a/b/c/a
    • DOWN to 3 replicas: a/b/c

@slopezz slopezz requested review from raelga and roivaz July 27, 2020 11:52
@slopezz slopezz self-assigned this Jul 27, 2020
@roivaz left a comment (Member)

Just a minor typo needs fixing. Good job!

roles/backend/tasks/main.yml (review thread, outdated, resolved)
roles/backend/tasks/main.yml (review thread, outdated, resolved)
@slopezz slopezz merged commit f2a8757 into master Jul 28, 2020
@slopezz slopezz deleted the feature/add-ha-backend branch July 28, 2020 08:29
@raelga raelga added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple sprints to complete. size/M Requires about a day to complete the PR or the issue. labels Sep 30, 2020