UnifiedPush Operator - Standard Operating Procedures

Overview

The following guide outlines the steps required to diagnose and resolve issues in the UnifiedPush Server, which is installed, configured, and managed via the UnifiedPush Operator.

Success Indicators

All alerts should appear as green in the Prometheus Alert Monitoring.

Prometheus Alerts Procedures

Before working through any of the following steps, check that the operator pod is running successfully. More info: UnifiedPushOperatorDown
💡
Logs can be saved by running oc logs <podname> > <filename>.log. The logs can help identify the root cause of an issue, and they should be attached when creating any issues against the project so that maintainers can review them.

Critical

UnifiedPushDown or UnifiedPushConsoleDown

The pod which runs the UnifiedPush Server is down or is not present in the same namespace as the operator.

  1. Check that the UnifiedPushServer CR or UnifiedPushServerWithBackup CR is deployed in the same namespace as the operator by running oc get UnifiedPushServer. The following is an example of the expected result.

$ oc get UnifiedPushServer
NAME                        AGE
example-unifiedpushserver   9d
ℹ️
The UnifiedPushServer should be applied in the same namespace as the operator. The operator will not be able to manage it in another namespace.
  1. Check the environment variables of the Server

    1. Run oc describe pods -l service=ups

      ⚠️
      The server pod uses the values mapped in the Secret created by the operator, which shares its name with the database pod.
  2. Check the environment variables of the Database

    1. Run oc get pods and check the database pod name. The following is an example of the expected result.

      $ oc get pods
      NAME                                           READY     STATUS    RESTARTS   AGE
      example-unifiedpushserver-1-dk8vm              2/2       Running   2          9d
      example-unifiedpushserver-postgresql-1-bw8mt   1/1       Running   1          9d
      unifiedpush-operator-58c8877fd8-g6dvr          1/1       Running   3          9d
    2. Run oc describe pods <databasepodname>. The following is an example of the expected result.

       $ oc describe pods example-unifiedpushserver-postgresql-1-bw8mt
      Name:               example-unifiedpushserver-postgresql-5d44cbb4f5-sxssg
      Namespace:          unifiedpush
      Priority:           0
      PriorityClassName:  <none>
      Node:               localhost/192.168.122.127
      Start Time:         Thu, 26 Sep 2019 11:48:32 +0300
      Labels:             app=example-unifiedpushserver
                          pod-template-hash=1800766091
                          service=example-unifiedpushserver-postgresql
      Annotations:        openshift.io/scc: restricted
      Status:             Running
      IP:                 172.17.0.9
      Controlled By:      ReplicaSet/example-unifiedpushserver-postgresql-5d44cbb4f5
      Containers:
        postgresql:
          Container ID:   docker://25372b0d08518630d16e9f9a9a49b051ed1eda57aa8b6602908396a43ca66b78
          Image:          centos/postgresql-10-centos7:1
          Image ID:       docker-pullable://docker.io/centos/postgresql-10-centos7@sha256:80894ff2dd64504acac207c6c050091698466291c9e0c8712e5edf473eb4e725
          Port:           5432/TCP
          Host Port:      0/TCP
          State:          Running
            Started:      Thu, 26 Sep 2019 11:48:44 +0300
          Ready:          True
          Restart Count:  0
          Limits:
            cpu:     1
            memory:  512Mi
          Requests:
            cpu:      250m
            memory:   256Mi
          Liveness:   tcp-socket :5432 delay=0s timeout=1s period=10s #success=1 #failure=3
          Readiness:  exec [/bin/sh -i -c psql -h 127.0.0.1 -U $POSTGRESQL_USER -q -d $POSTGRESQL_DATABASE -c 'SELECT 1'] delay=5s timeout=1s period=10s #success=1 #failure=3
          Environment:
            POSTGRESQL_USER:      <set to the key 'POSTGRES_USERNAME' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRESQL_PASSWORD:  <set to the key 'POSTGRES_PASSWORD' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRESQL_DATABASE:  <set to the key 'POSTGRES_DATABASE' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
          Mounts:
            /var/lib/pgsql/data from example-unifiedpushserver-postgresql-data (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from default-token-nmwvz (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        example-unifiedpushserver-postgresql-data:
          Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
          ClaimName:  example-unifiedpushserver-postgresql
          ReadOnly:   false
        default-token-nmwvz:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  default-token-nmwvz
          Optional:    false
      QoS Class:       Burstable
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
      Events:
        Type     Reason     Age   From                Message
        ----     ------     ----  ----                -------
        Normal   Scheduled  22m   default-scheduler   Successfully assigned unifiedpush/example-unifiedpushserver-postgresql-5d44cbb4f5-sxssg to localhost
        Normal   Pulling    22m   kubelet, localhost  pulling image "docker.io/centos/postgresql-10-centos7:1"
        Normal   Pulled     22m   kubelet, localhost  Successfully pulled image "docker.io/centos/postgresql-10-centos7:1"
        Normal   Created    22m   kubelet, localhost  Created container
        Normal   Started    22m   kubelet, localhost  Started container
        Warning  Unhealthy  21m   kubelet, localhost  Liveness probe failed: dial tcp 172.17.0.9:5432: connect: connection refused
      ℹ️
      The events and conditions shown here can help you identify the root cause of the issue.
    3. Check if the database image was pulled successfully.

  3. Check the logs of the UPS OAuth Proxy Container

    1. Get the service pod name → oc describe pods -l service=ups. The following is an example of the expected result.

      $ oc describe pods -l service=ups
      Name:               example-unifiedpushserver-78cdcd6589-2l2s5
      Namespace:          unifiedpush
      Priority:           0
      PriorityClassName:  <none>
      Node:               localhost/192.168.122.127
      Start Time:         Thu, 26 Sep 2019 12:09:48 +0300
      Labels:             app=example-unifiedpushserver
                          pod-template-hash=3478782145
                          service=ups
      Annotations:        openshift.io/scc: restricted
      Status:             Running
      IP:                 172.17.0.10
      Controlled By:      ReplicaSet/example-unifiedpushserver-78cdcd6589
      Init Containers:
        postgresql:
          Container ID:  docker://c4d3f1d6379e3c57bab1bf0b2b87342ff92393ba44513052dc774688d9a4ac15
          Image:          centos/postgresql-10-centos7:1
          Image ID:       docker-pullable://docker.io/centos/postgresql-10-centos7@sha256:80894ff2dd64504acac207c6c050091698466291c9e0c8712e5edf473eb4e725
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/sh
            -c
            source /opt/rh/rh-postgresql96/enable && until pg_isready -h $POSTGRES_SERVICE_HOST; do echo waiting for database; sleep 2; done;
          State:          Terminated
            Reason:       Completed
            Exit Code:    0
            Started:      Thu, 26 Sep 2019 12:09:52 +0300
            Finished:     Thu, 26 Sep 2019 12:09:52 +0300
          Ready:          True
          Restart Count:  0
          Environment:
            POSTGRES_SERVICE_HOST:  example-unifiedpushserver-postgresql
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from example-unifiedpushserver-token-cszr4 (ro)
      Containers:
        ups:
          Container ID:   docker://2cdc9c3703f274053e724b589ac24e1ba12db52a07158b68ffd7ddec328fcb51
          Image:          quay.io/aerogear/unifiedpush-configurable-container:2.4
          Image ID:       docker-pullable://quay.io/aerogear/unifiedpush-configurable-container@sha256:df467ea07730ad35d8255a2d0a65a1f1777a7937272ad9073953abbf3a4b8331
          Port:           8080/TCP
          Host Port:      0/TCP
          State:          Running
            Started:      Thu, 26 Sep 2019 12:09:55 +0300
          Ready:          True
          Restart Count:  0
          Limits:
            cpu:     1
            memory:  2Gi
          Requests:
            cpu:      500m
            memory:   512Mi
          Liveness:   http-get http://:8080/rest/applications delay=120s timeout=10s period=10s #success=1 #failure=3
          Readiness:  http-get http://:8080/rest/applications delay=15s timeout=2s period=10s #success=1 #failure=3
          Environment:
            POSTGRES_SERVICE_HOST:  <set to the key 'POSTGRES_HOST' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRES_SERVICE_PORT:  5432
            POSTGRES_USER:          <set to the key 'POSTGRES_USERNAME' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRES_PASSWORD:      <set to the key 'POSTGRES_PASSWORD' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRES_DATABASE:      <set to the key 'POSTGRES_DATABASE' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from example-unifiedpushserver-token-cszr4 (ro)
        ups-oauth-proxy:
          Container ID:  docker://bdecada019bc6e2542b1f732f259a349fbb7651e6dd7d1227e42ee3c0c416630
          Image:         docker.io/openshift/oauth-proxy:v1.1.0
          Image ID:      docker-pullable://docker.io/openshift/oauth-proxy@sha256:731c1fdad1de4bf68ae9eece5e99519f063fd8d9990da312082b4c995c4e4e33
          Port:          4180/TCP
          Host Port:     0/TCP
          Args:
            --provider=openshift
            --openshift-service-account=example-unifiedpushserver
            --upstream=http://localhost:8080
            --http-address=0.0.0.0:4180
            --skip-auth-regex=/rest/sender,/rest/registry/device,/rest/prometheus/metrics,/rest/auth/config
            --https-address=
            --cookie-secret=a509f22fa6224f8ea6ed663c8187cf49
          State:          Running
            Started:      Thu, 26 Sep 2019 12:09:58 +0300
          Ready:          True
          Restart Count:  0
          Limits:
            cpu:     20m
            memory:  64Mi
          Requests:
            cpu:        10m
            memory:     32Mi
          Environment:  <none>
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from example-unifiedpushserver-token-cszr4 (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        example-unifiedpushserver-token-cszr4:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  example-unifiedpushserver-token-cszr4
          Optional:    false
      QoS Class:       Burstable
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
      Events:
        Type     Reason     Age                From                Message
        ----     ------     ----               ----                -------
        Normal   Scheduled  92s                default-scheduler   Successfully assigned unifiedpush/example-unifiedpushserver-78cdcd6589-2l2s5 to localhost
        Normal   Pulling    91s                kubelet, localhost  pulling image "docker.io/centos/postgresql-10-centos7:1"
        Normal   Pulled     89s                kubelet, localhost  Successfully pulled image "docker.io/centos/postgresql-10-centos7:1"
        Normal   Created    89s                kubelet, localhost  Created container
        Normal   Started    88s                kubelet, localhost  Started container
        Normal   Pulling    87s                kubelet, localhost  pulling image "quay.io/aerogear/unifiedpush-configurable-container:2.4"
        Normal   Pulling    85s                kubelet, localhost  pulling image "docker.io/openshift/oauth-proxy:v1.1.0"
        Normal   Pulled     85s                kubelet, localhost  Successfully pulled image "quay.io/aerogear/unifiedpush-configurable-container:2.4"
        Normal   Started    85s                kubelet, localhost  Started container
        Normal   Created    85s                kubelet, localhost  Created container
        Normal   Pulled     83s                kubelet, localhost  Successfully pulled image "docker.io/openshift/oauth-proxy:v1.1.0"
        Normal   Created    83s                kubelet, localhost  Created container
        Normal   Started    82s                kubelet, localhost  Started container
        Warning  Unhealthy  48s (x3 over 68s)  kubelet, localhost  Readiness probe failed: Get http://172.17.0.10:8080/rest/applications: dial tcp 172.17.0.10:8080: connect: connection refused
        Warning  Unhealthy  16s (x3 over 36s)  kubelet, localhost  Readiness probe failed: Get http://172.17.0.10:8080/rest/applications: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
      ℹ️
      The events and conditions shown here can help you identify the root cause of the issue.
    2. Run oc logs <service-podname> -c ups-oauth-proxy. E.g oc logs example-unifiedpushserver-1-dk8vm -c ups-oauth-proxy

      Logs should include the following:

      2019/08/08 11:28:42 oauthproxy.go:201: mapping path "/" => upstream "http://localhost:8080/ "
      2019/08/08 11:28:42 oauthproxy.go:222: compiled skip-auth-regex => "/rest/sender"
      2019/08/08 11:28:42 oauthproxy.go:222: compiled skip-auth-regex => "/rest/registry/device"
      2019/08/08 11:28:42 oauthproxy.go:222: compiled skip-auth-regex => "/rest/prometheus/metrics"
      2019/08/08 11:28:42 oauthproxy.go:222: compiled skip-auth-regex => "/rest/auth/config"
      2019/08/08 11:28:42 oauthproxy.go:228: OAuthProxy configured for  Client ID: system:serviceaccount:unifiedpush:example-unifiedpushserver
      2019/08/08 11:28:42 oauthproxy.go:238: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
      2019/08/08 11:28:42 http.go:56: HTTP: listening on 0.0.0.0:4180
    3. If the logs differ from the expected output above, save them by running oc logs <service-podname> -c ups-oauth-proxy > <filename>.log

      ℹ️
      Capturing the logs is important: it provides the information maintainers need to investigate the issue.
    4. Check if the oauth-proxy image was pulled successfully.

  4. Check the logs of the UPS Container

    1. Get the service pod name → oc describe pods -l service=ups. The following is an example of the expected result.

      $ oc describe pods -l service=ups
      Name:               example-unifiedpushserver-78cdcd6589-2l2s5
      Namespace:          unifiedpush
      Priority:           0
      PriorityClassName:  <none>
      Node:               localhost/192.168.122.127
      Start Time:         Thu, 26 Sep 2019 12:09:48 +0300
      Labels:             app=example-unifiedpushserver
                          pod-template-hash=3478782145
                          service=ups
      Annotations:        openshift.io/scc: restricted
      Status:             Running
      IP:                 172.17.0.10
      Controlled By:      ReplicaSet/example-unifiedpushserver-78cdcd6589
      Init Containers:
        postgresql:
          Container ID:  docker://c4d3f1d6379e3c57bab1bf0b2b87342ff92393ba44513052dc774688d9a4ac15
          Image:          centos/postgresql-10-centos7:1
          Image ID:       docker-pullable://docker.io/centos/postgresql-10-centos7@sha256:80894ff2dd64504acac207c6c050091698466291c9e0c8712e5edf473eb4e725
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/sh
            -c
            source /opt/rh/rh-postgresql96/enable && until pg_isready -h $POSTGRES_SERVICE_HOST; do echo waiting for database; sleep 2; done;
          State:          Terminated
            Reason:       Completed
            Exit Code:    0
            Started:      Thu, 26 Sep 2019 12:09:52 +0300
            Finished:     Thu, 26 Sep 2019 12:09:52 +0300
          Ready:          True
          Restart Count:  0
          Environment:
            POSTGRES_SERVICE_HOST:  example-unifiedpushserver-postgresql
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from example-unifiedpushserver-token-cszr4 (ro)
      Containers:
        ups:
          Container ID:   docker://2cdc9c3703f274053e724b589ac24e1ba12db52a07158b68ffd7ddec328fcb51
          Image:          quay.io/aerogear/unifiedpush-configurable-container:2.4
          Image ID:       docker-pullable://quay.io/aerogear/unifiedpush-configurable-container@sha256:df467ea07730ad35d8255a2d0a65a1f1777a7937272ad9073953abbf3a4b8331
          Port:           8080/TCP
          Host Port:      0/TCP
          State:          Running
            Started:      Thu, 26 Sep 2019 12:09:55 +0300
          Ready:          True
          Restart Count:  0
          Limits:
            cpu:     1
            memory:  2Gi
          Requests:
            cpu:      500m
            memory:   512Mi
          Liveness:   http-get http://:8080/rest/applications delay=120s timeout=10s period=10s #success=1 #failure=3
          Readiness:  http-get http://:8080/rest/applications delay=15s timeout=2s period=10s #success=1 #failure=3
          Environment:
            POSTGRES_SERVICE_HOST:  <set to the key 'POSTGRES_HOST' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRES_SERVICE_PORT:  5432
            POSTGRES_USER:          <set to the key 'POSTGRES_USERNAME' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRES_PASSWORD:      <set to the key 'POSTGRES_PASSWORD' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
            POSTGRES_DATABASE:      <set to the key 'POSTGRES_DATABASE' in secret 'example-unifiedpushserver-postgresql'>  Optional: false
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from example-unifiedpushserver-token-cszr4 (ro)
        ups-oauth-proxy:
          Container ID:  docker://bdecada019bc6e2542b1f732f259a349fbb7651e6dd7d1227e42ee3c0c416630
          Image:         docker.io/openshift/oauth-proxy:v1.1.0
          Image ID:      docker-pullable://docker.io/openshift/oauth-proxy@sha256:731c1fdad1de4bf68ae9eece5e99519f063fd8d9990da312082b4c995c4e4e33
          Port:          4180/TCP
          Host Port:     0/TCP
          Args:
            --provider=openshift
            --openshift-service-account=example-unifiedpushserver
            --upstream=http://localhost:8080
            --http-address=0.0.0.0:4180
            --skip-auth-regex=/rest/sender,/rest/registry/device,/rest/prometheus/metrics,/rest/auth/config
            --https-address=
            --cookie-secret=a509f22fa6224f8ea6ed663c8187cf49
          State:          Running
            Started:      Thu, 26 Sep 2019 12:09:58 +0300
          Ready:          True
          Restart Count:  0
          Limits:
            cpu:     20m
            memory:  64Mi
          Requests:
            cpu:        10m
            memory:     32Mi
          Environment:  <none>
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from example-unifiedpushserver-token-cszr4 (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        example-unifiedpushserver-token-cszr4:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  example-unifiedpushserver-token-cszr4
          Optional:    false
      QoS Class:       Burstable
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
      Events:
        Type     Reason     Age                From                Message
        ----     ------     ----               ----                -------
        Normal   Scheduled  2m24s              default-scheduler   Successfully assigned unifiedpush/example-unifiedpushserver-78cdcd6589-2l2s5 to localhost
        Normal   Pulling    2m23s              kubelet, localhost  pulling image "docker.io/centos/postgresql-10-centos7:1"
        Normal   Pulled     2m21s              kubelet, localhost  Successfully pulled image "docker.io/centos/postgresql-10-centos7:1"
        Normal   Created    2m21s              kubelet, localhost  Created container
        Normal   Started    2m20s              kubelet, localhost  Started container
        Normal   Pulling    2m19s              kubelet, localhost  pulling image "quay.io/aerogear/unifiedpush-configurable-container:2.4"
        Normal   Pulling    2m17s              kubelet, localhost  pulling image "docker.io/openshift/oauth-proxy:v1.1.0"
        Normal   Pulled     2m17s              kubelet, localhost  Successfully pulled image "quay.io/aerogear/unifiedpush-configurable-container:2.4"
        Normal   Started    2m17s              kubelet, localhost  Started container
        Normal   Created    2m17s              kubelet, localhost  Created container
        Normal   Pulled     2m15s              kubelet, localhost  Successfully pulled image "docker.io/openshift/oauth-proxy:v1.1.0"
        Normal   Created    2m15s              kubelet, localhost  Created container
        Normal   Started    2m14s              kubelet, localhost  Started container
        Warning  Unhealthy  100s (x3 over 2m)  kubelet, localhost  Readiness probe failed: Get http://172.17.0.10:8080/rest/applications: dial tcp 172.17.0.10:8080: connect: connection refused
        Warning  Unhealthy  68s (x3 over 88s)  kubelet, localhost  Readiness probe failed: Get http://172.17.0.10:8080/rest/applications: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
      ℹ️
      The events and conditions shown here can help you identify the root cause of the issue.
    2. Save the logs by running oc logs <service-podname> -c ups > <filename>.log. E.g oc logs example-unifiedpushserver-1-dk8vm -c ups > logs.log

      ℹ️
      Capturing the logs is important: it provides the information maintainers need to investigate the issue.
    3. Review and capture the pod logs by running oc logs pod/example-unifiedpushserver-<xyz123> > <filename>.log. E.g. oc logs example-unifiedpushserver-1-dk8vm -c ups > logs.log

    4. Check if the UnifiedPush Server image was pulled successfully

  5. Check if the secret was created

    1. Run oc get secrets | grep postgresql in the namespace where the operator is installed. The following is an example of the expected result.

      $ oc get secrets | grep postgresql
      example-unifiedpushserver-postgresql        Opaque                                6         9d
      ℹ️
      The secret provides the data required by the database container, such as the user, database name, and password.
  6. Check if the values in the secret are correct. To check them you can use oc edit secret <postgresqlsecretname>. E.g oc edit secret example-unifiedpushserver-postgresql. The following is an example of the expected result.

    apiVersion: v1
    data:
      POSTGRES_DATABASE: dW5pZmllZHB1c2g=
      POSTGRES_HOST: ZXhhbXBsZS11bmlmaWVkcHVzaHNlcnZlci1wb3N0Z3Jlc3FsLnVuaWZpZWRwdXNoLnN2Yw==
      POSTGRES_PASSWORD: NzM4NDQ1Mjg1Nzc2NDc4NmIxY2FmMjRlNjdkZDYyNzY=
      POSTGRES_SUPERUSER: ZmFsc2U=
      POSTGRES_USERNAME: dW5pZmllZHB1c2g=
    kind: Secret
    ...
    ℹ️
    The values above will differ in your installation, but all of the data keys should be present, each with its respective value.
  7. Check that the operator pod is present, as it is responsible for managing the service pod, as described in UnifiedPushOperatorDown

  8. To fix it, try deploying again by running oc rollout latest dc/unifiedpush
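The base64-encoded values from the secret in step 6 can be decoded locally to verify them. Below is a minimal sketch using the sample POSTGRES_DATABASE and POSTGRES_SUPERUSER values shown above; it assumes GNU coreutils base64 (on a live cluster you would first read a value with, e.g., oc get secret example-unifiedpushserver-postgresql -o jsonpath='{.data.POSTGRES_DATABASE}').

```shell
# Decode sample values from the secret shown in step 6 (sample data, not live).
echo 'dW5pZmllZHB1c2g=' | base64 -d && echo    # POSTGRES_DATABASE
echo 'ZmFsc2U=' | base64 -d && echo            # POSTGRES_SUPERUSER
```

If a decoded value is empty or unexpected, the secret was likely created or edited incorrectly and the server will fail to reach its database.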

UnifiedPushDatabaseDown

The pod which runs the UnifiedPush Server's Database (PostgreSQL) is down or is not present in the same namespace as the operator.

  1. Check that the UnifiedPushServer CR or UnifiedPushServerWithBackup CR is deployed in the same namespace as the operator by running oc get UnifiedPushServer. The following is an example of the expected result.

    $ oc get UnifiedPushServer
    NAME                        AGE
    example-unifiedpushserver   9d
    ℹ️
    Exactly one UnifiedPushServer CR (either UnifiedPushServer or UnifiedPushServerWithBackup) should be applied in the same namespace as the operator.
  2. Check that the Database Pod is deployed in the same namespace as the operator by running oc get pods | grep postgresql. The following is an example of the expected result.

    $ oc get pods | grep postgresql
    example-unifiedpushserver-postgresql-1-bw8mt   1/1       Running   1          9d
    ℹ️
    The pod uses the values mapped in the Secret created by the operator, which shares its name with the database pod.
  3. Check the pod logs

    1. Run oc logs <database-podname>

      ℹ️
      You can save the logs by running oc logs <database-podname> > <filename>.log
  4. Check the logs for any information that points to the root cause of the issue. Capturing the logs also lets you provide the information the maintainers need, if required.

    1. Check if the Database image was pulled successfully.

  5. Check that the operator pod is present, as it is responsible for managing the service pod, as described in UnifiedPushOperatorDown

  6. To fix it, try deploying again by running oc rollout latest dc/unifiedpush-postgresql
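The pod-status checks above can be partially scripted. Here is a minimal sketch that flags any pod whose STATUS column is not Running; it runs against illustrative sample data (the CrashLoopBackOff line is hypothetical), and on a live cluster you would instead pipe oc get pods --no-headers into the same awk filter.

```shell
# Flag pods that are not in the Running state (STATUS is column 3 of `oc get pods`).
# The heredoc below is sample data; the failing pod is hypothetical.
cat <<'EOF' | awk '$3 != "Running" { print "NOT RUNNING: " $1 " (" $3 ")" }'
example-unifiedpushserver-1-dk8vm              2/2   Running            2    9d
example-unifiedpushserver-postgresql-1-bw8mt   1/1   CrashLoopBackOff   12   9d
unifiedpush-operator-58c8877fd8-g6dvr          1/1   Running            3    9d
EOF
```

Any pod printed by the filter is a candidate for the log-capture steps above.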

UnifiedPushJavaNonHeapThresholdExceeded

This alert indicates that the Service pod(s) is/are facing performance issues.

  1. Follow the To capture the logs procedure to capture the information required by the maintainers.

  2. Follow the steps in To scale the pod to try to resolve the performance issues.

UnifiedPushJavaGCTimePerMinuteScavenge

This alert indicates that the Service pod(s) is/are facing performance issues.

  1. Follow the To capture the logs procedure to capture the information required by the maintainers.

  2. Follow the steps in To scale the pod to try to resolve the performance issues.

Warning

UnifiedPushMessagesFailures

This alert indicates that the Service pod(s) have an error which is preventing them from sending the expected quantity of messages.

  1. Follow the To capture the logs procedure to capture the information required by the maintainers.

To capture the logs

  1. Capture a snapshot of the 'UnifiedPush Server' Grafana dashboard and track it over time. The metrics can be useful for identifying performance issues over time.

  2. Capture application logs for analysis.

    1. Get the pod names by running oc get pods. The following is an example of the expected result.

      $ oc get pods
      NAME                                           READY     STATUS    RESTARTS   AGE
      example-unifiedpushserver-1-dk8vm              2/2       Running   2          9d
      example-unifiedpushserver-postgresql-1-bw8mt   1/1       Running   1          9d
      unifiedpush-operator-58c8877fd8-g6dvr          1/1       Running   3          9d
    2. Save the logs by running oc logs <podname> > <filename>.log for each pod

      ℹ️
      You can get the logs from the Console (OCP UI) as well.
      Capturing this data provides the information maintainers need to investigate the issue.
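The per-pod log capture above can be scripted. The following is a minimal sketch that assumes oc is logged in to the UPS project; the pod names are the samples from the output above, and the oc logs command is echoed rather than executed so the loop can be read and tested without a cluster (on a live cluster, replace the hard-coded list with $(oc get pods -o name) and drop the echo).

```shell
# Sketch: capture logs for every pod in the project (commands are echoed, not run).
pods="pod/example-unifiedpushserver-1-dk8vm
pod/example-unifiedpushserver-postgresql-1-bw8mt
pod/unifiedpush-operator-58c8877fd8-g6dvr"

for pod in $pods; do
  name=${pod#pod/}   # strip the pod/ prefix to build the log filename
  # Multi-container pods (the server pod) also need -c <container>, e.g. -c ups.
  echo "oc logs $name > $name.log"
done
```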

To scale the pod

Currently, it is not possible to scale the UPS Server or its Database.

Validate

Installation

Follow these steps to ensure that the installation completed as expected.

  1. Switch to the UPS namespace by running oc project <namespace>. E.g oc project unifiedpush

  2. Check that the UnifiedPushServer CR or UnifiedPushServerWithBackup CR is deployed in the same namespace as the operator by running oc get UnifiedPushServer. The following is an example of the expected result.

    ℹ️
    Only one kind of UnifiedPushServer CR can be applied. If the backup service is enabled for your installation, it is using the UnifiedPushServerWithBackup CR.
    $ oc get UnifiedPushServer
    NAME                        AGE
    example-unifiedpushserver   9d
    This CR instructs the operator to install and configure the Database and the Service pods. If there are any issues with the creation of any of the following resources, check the operator logs for relevant errors.
    💡
    Logs can be saved by running oc logs <podname> > <filename>.log. The logs can help identify the root cause of an issue, and they should be attached when creating any issues against the project so that maintainers can review them.
  3. Check that there are at least 3 pods running in the namespace (the Database, Server and Operator) by running oc get pods. The following is an example of the expected result.

    $ oc get pods
    NAME                                           READY     STATUS    RESTARTS   AGE
    example-unifiedpushserver-1-dk8vm              2/2       Running   4          12d
    example-unifiedpushserver-postgresql-1-bw8mt   1/1       Running   2          12d
    unifiedpush-operator-58c8877fd8-g6dvr          1/1       Running   6          12d
  4. Check that the secret containing the Database data used by the service and its database was created, by running oc get secrets | grep postgresql. The following is an example of the expected result.

    $ oc get secrets | grep postgresql
    example-unifiedpushserver-postgresql        Opaque                                6         12d
  5. Check that the route to expose the service was created successfully by running oc get route | grep unifiedpush-proxy. The following is an example of the expected result.

    $ oc get route | grep unifiedpush-proxy
    example-unifiedpushserver-unifiedpush-proxy   example-unifiedpushserver-unifiedpush-proxy-unifiedpush.192.168.64.27.nip.io             example-unifiedpushserver-unifiedpush-proxy   <all>     edge/None     None
  6. Check that the Deployments for the Service and Database were created successfully by running oc get deployment | grep unifiedpush. The following is an example of the expected result.

    $ oc get deployment | grep unifiedpush
    example-unifiedpushserver              1         1         1            1           3m
    example-unifiedpushserver-postgresql   1         1         1            1           25m
  7. Check that the Proxy Service, which exposes the UPS Server, was created successfully by running oc get service | grep unifiedpush-proxy

    $ oc get service | grep unifiedpush-proxy
    example-unifiedpushserver-unifiedpush-proxy   ClusterIP   172.30.189.9     <none>        80/TCP     12d
  8. Check that the Service for the Database was created successfully by running oc get service | grep postgresql

    $ oc get service | grep postgresql
    example-unifiedpushserver-postgresql          ClusterIP   172.30.67.199    <none>        5432/TCP   12d
  9. Check that the Services for the UnifiedPush Server were created successfully by running oc get service | grep unifiedpushserver. The following is an example of the expected result.

    $ oc get service | grep unifiedpushserver
    example-unifiedpushserver-postgresql          ClusterIP   172.30.67.199    <none>        5432/TCP   12d
    example-unifiedpushserver-unifiedpush         ClusterIP   172.30.90.23     <none>        80/TCP     12d
    example-unifiedpushserver-unifiedpush-proxy   ClusterIP   172.30.189.9     <none>        80/TCP     12d

    The following is an example of an installation which has UPS installed without the Backup.

    $ oc get all
    NAME                                                        READY   STATUS    RESTARTS   AGE
    pod/example-unifiedpushserver-78cdcd6589-2l2s5              2/2     Running   0          4m
    pod/example-unifiedpushserver-postgresql-5d44cbb4f5-sxssg   1/1     Running   0          25m
    pod/unifiedpush-operator-fccb9d9d8-h9cz4                    1/1     Running   0          26m
    
    NAME                                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
    service/example-unifiedpushserver-postgresql          ClusterIP   172.30.71.206    <none>        5432/TCP            25m
    service/example-unifiedpushserver-unifiedpush         ClusterIP   172.30.27.116    <none>        80/TCP              25m
    service/example-unifiedpushserver-unifiedpush-proxy   ClusterIP   172.30.10.53     <none>        80/TCP              25m
    service/unifiedpush-operator-metrics                  ClusterIP   172.30.131.190   <none>        8383/TCP,8686/TCP   26m
    
    NAME                                                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/example-unifiedpushserver              1         1         1            1           4m
    deployment.apps/example-unifiedpushserver-postgresql   1         1         1            1           25m
    deployment.apps/unifiedpush-operator                   1         1         1            1           26m
    
    NAME                                                              DESIRED   CURRENT   READY   AGE
    replicaset.apps/example-unifiedpushserver-78cdcd6589              1         1         1       4m
    replicaset.apps/example-unifiedpushserver-postgresql-5d44cbb4f5   1         1         1       25m
    replicaset.apps/unifiedpush-operator-fccb9d9d8                    1         1         1       26m
    
    NAME                                                                   HOST/PORT                                                                      PATH   SERVICES                                      PORT    TERMINATION   WILDCARD
    route.route.openshift.io/example-unifiedpushserver-unifiedpush-proxy   example-unifiedpushserver-unifiedpush-proxy-unifiedpush.192.168.42.58.nip.io          example-unifiedpushserver-unifiedpush-proxy   <all>   edge/None     None
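The checks above can be sketched as a single pass. The following is a hypothetical helper which assumes the CR is named example-unifiedpushserver and that oc is logged in to the correct project; any resource reported as missing points to the step that needs attention.

```shell
# Hypothetical helper: verify in one pass that every resource the checklist
# above expects is present. Resource names assume the CR is called
# example-unifiedpushserver; adjust them to match your installation.
checked=0
missing=0
for res in \
    secret/example-unifiedpushserver-postgresql \
    route/example-unifiedpushserver-unifiedpush-proxy \
    deployment/example-unifiedpushserver \
    deployment/example-unifiedpushserver-postgresql \
    service/example-unifiedpushserver-unifiedpush-proxy \
    service/example-unifiedpushserver-postgresql \
    service/example-unifiedpushserver-unifiedpush; do
  checked=$((checked + 1))
  # `oc get` exits non-zero when the resource does not exist
  oc get "$res" >/dev/null 2>&1 || { echo "MISSING: $res"; missing=$((missing + 1)); }
done
echo "$checked resources checked, $missing missing"
```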

Optional configurations

Monitor

If the Monitoring Service (Metrics) is enabled for the installation, a Grafana Dashboard titled UnifiedPush Operator and a Prometheus Monitoring instance are created.

Backup

  1. Switch to the UPS namespace by running oc project <namespace>, e.g. oc project unifiedpush.

  2. Check that the UnifiedPushServerWithBackup CR is deployed in the same namespace as the operator by running oc get UnifiedPushServer. The following is an example of the expected result.

    $ oc get UnifiedPushServer
    NAME                        AGE
    example-unifiedpushserver   9d
    ℹ️
    Only one kind of UnifiedPushServer CR can be applied. However, if the backup service is enabled for your installation, it means that the UnifiedPushServerWithBackup CR is in use.
  3. To confirm that it is the UnifiedPushServer with the Backup, check its spec by running oc describe UnifiedPushServer.

    1. The following is an example without the Backup installed.

      $ oc describe UnifiedPushServer
      Name:         example-unifiedpushserver
      Namespace:    unifiedpush
      Labels:       <none>
      Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"push.aerogear.org/v1alpha1","kind":"UnifiedPushServer","metadata":{"annotations":{},"name":"example-unifiedpushserver","namespace":"unif...
      API Version:  push.aerogear.org/v1alpha1
      Kind:         UnifiedPushServer
      Metadata:
        Creation Timestamp:  2019-07-04T00:44:47Z
        Generation:          1
        Resource Version:    7026921
        Self Link:           /apis/push.aerogear.org/v1alpha1/namespaces/unifiedpush/unifiedpushservers/example-unifiedpushserver
        UID:                 ec430bf1-9df4-11e9-817f-beb071062273
      Status:
        Phase:  Complete
      Events:   <none>
    2. The following is an example with the Backup installed.

      $ oc describe UnifiedPushServer
      Name:         example-unifiedpushserver
      Namespace:    unifiedpush
      Labels:       <none>
      Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"push.aerogear.org/v1alpha1","kind":"UnifiedPushServer","metadata":{"annotations":{},"name":"example-unifiedpushserver","namespace":"unif...
      API Version:  push.aerogear.org/v1alpha1
      Kind:         UnifiedPushServer
      Metadata:
        Creation Timestamp:  2019-07-04T00:44:47Z
        Generation:          1
        Resource Version:    7026921
        Self Link:           /apis/push.aerogear.org/v1alpha1/namespaces/unifiedpush/unifiedpushservers/example-unifiedpushserver
        UID:                 ec430bf1-9df4-11e9-817f-beb071062273
      Status:
        Phase:  Complete
      Events:   <none>
      
      
      Name:         example-ups-with-backups
      Namespace:    unifiedpush
      Labels:       <none>
      Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"push.aerogear.org/v1alpha1","kind":"UnifiedPushServer","metadata":{"annotations":{},"name":"example-ups-with-backups","namespace":"unifi...
      API Version:  push.aerogear.org/v1alpha1
      Kind:         UnifiedPushServer
      Metadata:
        Creation Timestamp:  2019-07-16T08:51:47Z
        Generation:          1
        Resource Version:    8621940
        Self Link:           /apis/push.aerogear.org/v1alpha1/namespaces/unifiedpush/unifiedpushservers/example-ups-with-backups
        UID:                 f20c5f0b-a7a6-11e9-a6b1-beb071062273
      Spec:
        Backups:
          Backend Secret Name:              example-aws-key
          Backend Secret Namespace:         unifiedpush
          Encryption Key Secret Name:       example-encryption-key
          Encryption Key Secret Namespace:  unifiedpush
          Name:                             ups-daily-at-midnight
          Schedule:                         0 0 * * *
      Events:                               <none>
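The Schedule field uses the standard five-field cron syntax (minute, hour, day of month, month, day of week), so 0 0 * * * runs the backup daily at midnight. The following sketch splits the example value from the spec above into its fields.

```shell
# Split a cron schedule into its five fields
# (minute hour day-of-month month day-of-week).
schedule='0 0 * * *'   # value from the example spec above: daily at midnight
set -f                 # disable globbing so '*' is not expanded to filenames
set -- $schedule
set +f
fields="minute=$1 hour=$2 dom=$3 month=$4 dow=$5"
echo "$fields"   # minute=0 hour=0 dom=* month=* dow=*
```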
  4. To verify that the backup CronJob has been created successfully, run the following command in the namespace where the operator is installed.

    $ oc get cronjob.batch/example-ups-with-backups
    NAME                             SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
    example-ups-with-backups         0 * * * *     False     0         13s             12m
  5. To check the jobs that have been executed, run oc get jobs in the namespace where the operator is installed, as in the following example.

    $ oc get jobs
    NAME                                 DESIRED   SUCCESSFUL   AGE
    example-ups-with-backups-1561588320   1         0            6m
    example-ups-with-backups-1561588380   1         0            5m
    example-ups-with-backups-1561588440   1         0            4m
    example-ups-with-backups-1561588500   1         0            3m
    example-ups-with-backups-1561588560   1         0            2m
    example-ups-with-backups-1561588620   1         0            1m
    example-ups-with-backups-1561588680   1         0            43s
    ℹ️
    In the above example, the schedule was set to run this job every minute (*/1 * * * *).
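The numeric suffix of each job name is a Unix timestamp for the run the job was scheduled for (compare job example-ups-with-backups-1561589040 with the 22_46_06 dump time in the logs of the next step). The following sketch converts one of the suffixes above into a readable date; GNU date is assumed.

```shell
# Convert the epoch suffix of a backup job name to a readable UTC date.
# 1561588320 is taken from job example-ups-with-backups-1561588320 above;
# GNU coreutils `date -d` is assumed.
epoch=1561588320
when=$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')
echo "$when"   # 2019-06-26 22:32:00 UTC
```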
  6. To check the logs for troubleshooting, run oc logs <job> -f in the namespace where the operator is installed, as in the following example.

    $ oc logs job.batch/example-ups-with-backups-1561589040 -f
    dumping ups
    dumping postgres
    ==> Component data dump completed
    /tmp/intly/archives/ups.ups-22_46_06.pg_dump.gz
    WARNING: ups.ups-22_46_06.pg_dump.gz: Owner username not known. Storing UID=1001 instead.
    upload: '/tmp/intly/archives/ups.ups-22_46_06.pg_dump.gz' -> 's3://camilabkp/backups/mss/postgres/2019/06/26/ups.ups-22_46_06.pg_dump.gz'  [1 of 1]
    1213 of 1213   100% in    1s   955.54 B/s  done
    ERROR: S3 error: 403 (RequestTimeTooSkewed): The difference between the request time and the current time is too large.
    ℹ️
    The RequestTimeTooSkewed error above indicates that the clock of the pod which ran the backup is out of sync with the AWS S3 servers. Check the time synchronization (for example, NTP) on the cluster nodes to resolve it.