using --serviceType=LoadBalancer on AWS LoadBalancer never recognizes instances as healthy #1611

@alrooney

Describe the bug
I used --service-type=LoadBalancer to launch a cluster with pgo. Once the cluster was up, the AWS console showed the instances behind the load balancers for both the primary and the read replica as unhealthy. The health check on each load balancer pointed at the pgbadger port. After I changed the health check to point at the PostgreSQL port, the instances became healthy.
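
The workaround described above amounts to pointing the load balancer health check at the NodePort that backs the PostgreSQL port. As a minimal sketch, assuming a Classic Load Balancer and a Service fetched with kubectl get svc <name> -n pgo -o json, the helper below derives the health check target string. The Service contents, port names, and nodePort values are hypothetical; only the port numbers 5432 and 10000 come from the config in this report.

```python
import json

# Hypothetical Service, shaped like `kubectl get svc <name> -n pgo -o json`.
# Port names and nodePort values are made up for illustration.
SERVICE_JSON = """
{
  "spec": {
    "type": "LoadBalancer",
    "ports": [
      {"name": "pgbadger", "port": 10000, "nodePort": 30100},
      {"name": "postgres", "port": 5432, "nodePort": 31432}
    ]
  }
}
"""


def node_port_for(service, service_port):
    """Return the NodePort that the Service maps to the given service port."""
    for entry in service["spec"]["ports"]:
        if entry["port"] == service_port:
            return entry["nodePort"]
    raise LookupError(f"service exposes no port {service_port}")


def health_check_target(service, service_port=5432):
    """Build a TCP health check target in the protocol:port form."""
    return f"TCP:{node_port_for(service, service_port)}"


service = json.loads(SERVICE_JSON)
print(health_check_target(service))  # prints TCP:31432
```

The resulting string uses the protocol:port form that Classic Load Balancer health checks accept, so it can be entered on the load balancer's health check tab as a manual workaround.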

To Reproduce
Steps to reproduce the behavior:

  1. Launch the pgo operator in AWS with default settings (see the config below).
  2. Create a cluster as follows: pgo create cluster hagiscluster --ccp-image=crunchy-postgres-gis-ha --pvc-size=50Gi --service-type=LoadBalancer
  3. Run kubectl get svc -n pgo to see the provisioned load balancer.
  4. In the AWS console, go to EC2 console -> LoadBalancers, enter the DNS name of the load balancer into the search box, and select the load balancer.
  5. On the description tab of the load balancer, notice that the status shows 0 of X instances in service.
  6. On the health-check tab, notice that the health check targets the pgbadger port, not the PostgreSQL port.
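
The check in steps 3 to 6 can also be approximated from the Service definition itself. The sketch below assumes, without confirmation from this report, that the AWS cloud provider health-checks the first port listed in the Service; under that assumption it reports which port the load balancer would probe. The Service contents are hypothetical apart from the port numbers 5432 and 10000, which match Port and PGBadgerPort in the pgo config in this report.

```python
# Hypothetical Service spec; names and nodePort values are illustrative only.
example_service = {
    "spec": {
        "type": "LoadBalancer",
        "ports": [
            {"name": "pgbadger", "port": 10000, "nodePort": 30100},
            {"name": "postgres", "port": 5432, "nodePort": 31432},
        ],
    }
}


def probed_port(service):
    """Return the port entry assumed to be health-checked (the first one)."""
    ports = service["spec"]["ports"]
    if not ports:
        raise ValueError("service defines no ports")
    # ASSUMPTION: the cloud provider probes the first listed port.
    return ports[0]


def diagnose(service, postgres_port=5432):
    """Report whether the assumed health check port is the PostgreSQL port."""
    first = probed_port(service)
    status = "OK" if first["port"] == postgres_port else "MISMATCH"
    return (
        f"{status}: health check would target {first.get('name', '?')} "
        f"(port {first['port']}, nodePort {first['nodePort']})"
    )


print(diagnose(example_service))
# prints MISMATCH: health check would target pgbadger (port 10000, nodePort 30100)
```

If the first-port assumption holds for the provider in use, a MISMATCH result would correspond to the 0 of X instances in service status seen in step 5.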

Expected behavior
The load balancer should show the instances as healthy. The health check should target the port of the primary service, which is PostgreSQL.

Please tell us about your environment:

  • Operating System:
    pgo running on OSX, pgo version 4.3.2, installed using the client-setup.sh script.
  • Where is this running (Local, Cloud Provider):
    AWS, on EKS
  • Storage being used (NFS, Hostpath, Gluster, etc):
    gp2
  • Container Image Tag:
$ pgo status
Operator Start:          2020-06-09 02:41:48 +0000 UTC
Databases:               3
Claims:                  3
Total Volume Size:       200Gi     

Database Images:
                         2	registry.developers.crunchydata.com/crunchydata/pgo-backrest:centos7-4.3.2
                         2	registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis-ha:centos7-12.3-4.3.2
                         1	registry.developers.crunchydata.com/crunchydata/pgo-backrest-repo:centos7-4.3.2

Databases Not Ready:

Labels (count > 1): [count] [label]
	[4]	[vendor=crunchydata]
	[3]	[pg-cluster=hagiscluster]
	[2]	[pg-pod-anti-affinity=]
	[2]	[pgo-pg-database=true]
	[2]	[workflowid=3a83853c-08d9-42e9-8171-4c1f6b149639]
	[2]	[crunchy_collect=false]
	[2]	[service-type=LoadBalancer]
	[2]	[pgo-version=4.3.2]
	[2]	[crunchy-pgha-scope=hagiscluster]
  • PostgreSQL Version:
    12.3 (see the image tags above)
  • Platform (Docker, Kubernetes, OpenShift):
    Kubernetes on AWS EKS
  • Platform Version:
    1.14

Additional context

$ pgo show config
BackrestStorage: gp2
BackupStorage: gp2
BasicAuth: ""
Cluster:
  Backrest: true
  BackrestPort: 2022
  BackrestS3Bucket: ""
  BackrestS3Endpoint: ""
  BackrestS3Region: ""
  Badger: false
  CCPImagePrefix: registry.developers.crunchydata.com/crunchydata
  CCPImageTag: centos7-12.3-4.3.2
  Database: ""
  DefaultBackrestMemory: 48Mi
  DefaultInstanceMemory: 128Mi
  DefaultPgBouncerMemory: 24Mi
  DisableAutofail: false
  DisableFSGroup: false
  DisableReplicaStartFailReinit: false
  EnableCrunchyadm: false
  ExporterPort: "9187"
  Metrics: false
  PGBadgerPort: "10000"
  PasswordAgeDays: ""
  PasswordLength: "24"
  PgmonitorPassword: ""
  PodAntiAffinity: preferred
  PodAntiAffinityPgBackRest: ""
  PodAntiAffinityPgBouncer: ""
  Policies: ""
  Port: "5432"
  Replicas: "0"
  ServiceType: ClusterIP
  SyncReplication: false
  User: testuser
Pgo:
  Audit: false
  ConfigMapWorkerCount: null
  ControllerGroupRefreshInterval: null
  NamespaceRefreshInterval: null
  NamespaceWorkerCount: null
  PGClusterWorkerCount: null
  PGOImagePrefix: registry.developers.crunchydata.com/crunchydata
  PGOImageTag: centos7-4.3.2
  PGReplicaWorkerCount: null
  PGTaskWorkerCount: null
PrimaryStorage: gp2
ReplicaStorage: gp2
Storage:
  alternatesite:
    AccessMode: ReadWriteOnce
    MatchLabels: ""
    Size: 4G
    StorageClass: alternatesite
    StorageType: dynamic
    SupplementalGroups: ""
  gce:
    AccessMode: ReadWriteOnce
    MatchLabels: ""
    Size: 300M
    StorageClass: standard
    StorageType: dynamic
    SupplementalGroups: ""
  gp2:
    AccessMode: ReadWriteOnce
    MatchLabels: ""
    Size: 100Gi
    StorageClass: gp2
    StorageType: dynamic
    SupplementalGroups: ""
  hostpathstorage:
    AccessMode: ReadWriteMany
    MatchLabels: ""
    Size: 1G
    StorageClass: ""
    StorageType: create
    SupplementalGroups: ""
  nfsstorage:
    AccessMode: ReadWriteMany
    MatchLabels: ""
    Size: 1G
    StorageClass: ""
    StorageType: create
    SupplementalGroups: "65534"
  nfsstoragered:
    AccessMode: ReadWriteMany
    MatchLabels: ""
    Size: 1G
    StorageClass: ""
    StorageType: create
    SupplementalGroups: "65534"
  primarysite:
    AccessMode: ReadWriteOnce
    MatchLabels: ""
    Size: 4G
    StorageClass: primarysite
    StorageType: dynamic
    SupplementalGroups: ""
  replicastorage:
    AccessMode: ReadWriteMany
    MatchLabels: ""
    Size: 700M
    StorageClass: ""
    StorageType: create
    SupplementalGroups: ""
  storageos:
    AccessMode: ReadWriteOnce
    MatchLabels: ""
    Size: 5Gi
    StorageClass: fast
    StorageType: dynamic
    SupplementalGroups: ""
WALStorage: ""
