
Helm FlowForge install failed: context deadline exceeded #330

Closed
wadebev11 opened this issue Feb 27, 2024 · 11 comments
Labels
needs-triage Needs looking at to decide what to do

Comments

@wadebev11

Current Behavior

When running helm upgrade --atomic --install --timeout 10m flowforge flowforge/flowforge -f customization.yml, I get the following error:
Error: release flowforge failed, and has been uninstalled due to atomic being set: context deadline exceeded

Expected Behavior

Helm should be able to install FlowForge correctly.

Steps To Reproduce

  1. Following the Kubernetes Install docs, install kubectl and helm
  2. Next, following the AWS EKS-specific details, set up the AWS CLI and eksctl, then set up a domain in Route 53 and create a new AWS wildcard certificate
  3. Create the cluster with eksctl create cluster -f cluster.yml
    cluster.yml:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: FlowFuse
  region: us-east-1

nodeGroups:
  - name: management
    labels:
      role: "management"
    instanceType: t2.small
    desiredCapacity: 1
    volumeSize: 20
    ssh:
      allow: false
    iam:
      withAddonPolicies:
        efs: true
  - name: instance
    labels:
      role: "projects"
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/flowforge: "owned"
    instanceType: t2.small
    desiredCapacity: 2
    volumeSize:
    ssh:
      allow: false
  4. Install ingress-nginx with
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx --values nginx-values.yaml ingress-nginx/ingress-nginx

nginx-values.yaml:

controller:
  # publishService required to allow ELB alias for DNS registration w/ external-dns
  publishService:
    enabled: true
  tcp:
    configNameSpace: $(POD_NAMESPACE)/tcp-services
  udp:
    configNameSpace: $(POD_NAMESPACE)/udp-services
  config:
    proxy-body-size: "0"
  service:
    # AWS annotations for LoadBalancer with Certificate ARN
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <cert created above>
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "120"
    # TLS (https) terminated at ELB, so internal endpoint is 'http'
    targetPorts:
      https: http
  5. Set ingress-nginx as the default ingress class with kubectl annotate ingressclass nginx ingressclass.kubernetes.io/is-default-class=true
  6. Add the flowforge repo to helm with
helm repo add flowforge https://flowfuse.github.io/helm
helm repo update
  7. Try installing flowforge with helm upgrade --atomic --install --timeout 10m flowforge flowforge/flowforge -f customization.yml
    customization.yml:
forge:
  entryPoint: forge.<my domain>.com
  domain: <my domain>.com
  https: false
  localPostgresql: true
  cloudProvider: aws
  aws:
    IAMRole: <FlowFuse-cluster-ServiceRole>

Result:

Release "flowforge" does not exist. Installing it now.
Error: release flowforge failed, and has been uninstalled due to atomic being set: client rate limiter Wait returned an error: context deadline exceeded

Output when running with the --debug flag:

history.go:56: [debug] getting history for release flowforge
Release "flowforge" does not exist. Installing it now.
install.go:214: [debug] Original chart version: ""
install.go:231: [debug] CHART PATH: /home/wade/.cache/helm/repository/flowforge-2.1.0.tgz

client.go:142: [debug] creating 15 resource(s)
wait.go:48: [debug] beginning wait for 15 resources with timeout of 5m0s
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
[... continued for 10 minutes ...]
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
ready.go:303: [debug] Deployment is not ready: default/flowforge. 0 out of 1 expected pods are ready
install.go:488: [debug] Install failed and atomic is set, uninstalling release
uninstall.go:102: [debug] uninstall: Deleting flowforge
uninstall.go:248: [debug] uninstall: given cascade value: , defaulting to delete propagation background
client.go:486: [debug] Starting delete for "flowforge-ingress" Ingress
client.go:486: [debug] Starting delete for "flowforge-postgresql-hl" Service
client.go:486: [debug] Starting delete for "flowforge-postgresql" Service
client.go:486: [debug] Starting delete for "forge" Service
client.go:486: [debug] Starting delete for "flowforge-postgresql" StatefulSet
client.go:486: [debug] Starting delete for "flowforge" Deployment
client.go:486: [debug] Starting delete for "create-pod" RoleBinding
client.go:486: [debug] Starting delete for "create-pod" ClusterRole
client.go:486: [debug] Starting delete for "flowforge-config" ConfigMap
client.go:486: [debug] Starting delete for "flowfuse-secrets" Secret
client.go:486: [debug] Starting delete for "flowforge-postgresql" Secret
client.go:486: [debug] Starting delete for "editors" ServiceAccount
client.go:486: [debug] Starting delete for "flowforge" ServiceAccount
client.go:486: [debug] Starting delete for "flowforge-database-policy" NetworkPolicy
client.go:486: [debug] Starting delete for "flowforge" Namespace
uninstall.go:155: [debug] purge requested for flowforge
Error: release flowforge failed, and has been uninstalled due to atomic being set: client rate limiter Wait returned an error: context deadline exceeded
helm.go:84: [debug] client rate limiter Wait returned an error: context deadline exceeded
release flowforge failed, and has been uninstalled due to atomic being set
helm.sh/helm/v3/pkg/action.(*Install).failRelease
        helm.sh/helm/v3/pkg/action/install.go:496
helm.sh/helm/v3/pkg/action.(*Install).RunWithContext
        helm.sh/helm/v3/pkg/action/install.go:394
main.runInstall
        helm.sh/helm/v3/cmd/helm/install.go:306
main.newUpgradeCmd.func2
        helm.sh/helm/v3/cmd/helm/upgrade.go:146
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.8.0/command.go:1039
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:267
runtime.goexit
        runtime/asm_amd64.s:1650

I tried upping the timeout to 30 minutes and even 60 minutes, and got the same result each time.

Environment

  • FlowFuse version: v2.1.0
  • Node.js version: Unknown, whatever helm uses
  • npm version: Unknown, whatever helm uses
  • Platform/OS: Amazon Linux 2
  • Browser:
@wadebev11 wadebev11 added the needs-triage Needs looking at to decide what to do label Feb 27, 2024
@hardillb
Contributor

With only one node with the label role: projects, you have nowhere to run the Forge app, as it will expect to run on a node with the label role: management by default.

If you want to run a single-node cluster, you need to either remove both sets of labels,

forge:
...
  managementSelector:
  projectSelector:

or set them both to the same value:

forge:
...
  managementSelector:
    role: projects
  projectSelector:
    role: projects
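
You can double-check which role labels your nodes actually carry with something like:

kubectl get nodes -L role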

@wadebev11
Author

It's not a single-node cluster, though.
kubectl get nodes:

NAME                             STATUS   ROLES    AGE   VERSION
ip-192-168-39-136.ec2.internal   Ready    <none>   24h   v1.27.9-eks-5e0fdde
ip-192-168-39-66.ec2.internal    Ready    <none>   24h   v1.27.9-eks-5e0fdde
ip-192-168-5-226.ec2.internal    Ready    <none>   24h   v1.27.9-eks-5e0fdde

Looking at the describe output for these nodes, I see one with role=management and the other two with role=projects.

@hardillb
Contributor

Sorry, misread the eksctl file on my phone.

First, increase the helm timeout to 20 minutes.

Then, in a separate terminal, run kubectl get all and kubectl describe pod/flowforge-postgresql-0 after about 5 minutes.

Best guess is that you will need to provide a StorageClass for the local PostgreSQL install to use for disk space.
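
A quick way to check that (assuming the EBS setup should provide a default StorageClass):

kubectl get storageclass

If no class is marked (default), the PostgreSQL PersistentVolumeClaim will stay Pending.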

@hardillb
Contributor

You may need to add

addons:
  - name: aws-ebs-csi-driver
    version: "v1.27.0-eksbuild.1"
    resolveConflicts: overwrite

to your eksctl file to include the EBS storage class these days.
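
If you don't want to recreate the cluster, eksctl can also add the addon in place; something like this should work (untested here, using the cluster name from your cluster.yml):

eksctl create addon --cluster FlowFuse --name aws-ebs-csi-driver --force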

@hardillb
Contributor

@ppawlowski When you get a moment, can you take a look at https://flowfuse.com/docs/install/kubernetes/aws/ and see if it needs updating based on what we did for the scaling test cluster?

@wadebev11
Author

Here's the output from those commands while running the install command, but before I added the addons section to my cluster.yml file:

$ kubectl get all
NAME                                            READY   STATUS             RESTARTS      AGE
pod/flowforge-759f68f595-bsqnj                  0/1     CrashLoopBackOff   5 (88s ago)   4m28s
pod/flowforge-postgresql-0                      0/1     Pending            0             4m28s
pod/ingress-nginx-controller-7bf6cbf688-lcz4h   1/1     Running            0             2d7h

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                      AGE
service/flowforge-postgresql                 ClusterIP      10.100.217.139   <none>                                                                          5432/TCP                     4m28s
service/flowforge-postgresql-hl              ClusterIP      None             <none>                                                                          5432/TCP                     4m28s
service/forge                                ClusterIP      10.100.70.80     <none>                                                                          80/TCP                       4m28s
service/ingress-nginx-controller             LoadBalancer   10.100.219.197   <external IP>   80:30982/TCP,443:30666/TCP   2d7h
service/ingress-nginx-controller-admission   ClusterIP      10.100.75.164    <none>                                                                          443/TCP                      2d7h
service/kubernetes                           ClusterIP      10.100.0.1       <none>                                                                          443/TCP                      2d7h

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flowforge                  0/1     1            0           4m29s
deployment.apps/ingress-nginx-controller   1/1     1            1           2d7h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/flowforge-759f68f595                  1         1         0       4m29s
replicaset.apps/ingress-nginx-controller-7bf6cbf688   1         1         1       2d7h

NAME                                    READY   AGE
statefulset.apps/flowforge-postgresql   0/1     4m29s
$ kubectl describe pod/flowforge-postgresql-0
Name:             flowforge-postgresql-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app.kubernetes.io/component=primary
                  app.kubernetes.io/instance=flowforge
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=postgresql
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=flowforge-postgresql-769f6d4dfd
                  helm.sh/chart=postgresql-11.9.13
                  statefulset.kubernetes.io/pod-name=flowforge-postgresql-0
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/flowforge-postgresql
Containers:
  postgresql:
    Image:      docker.io/bitnami/postgresql:14.10.0-debian-11-r30
    Port:       5432/TCP
    Host Port:  0/TCP
    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   exec [/bin/sh -c exec pg_isready -U "forge" -d "dbname=flowforge" -h 127.0.0.1 -p 5432] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/sh -c -e exec pg_isready -U "forge" -d "dbname=flowforge" -h 127.0.0.1 -p 5432
[ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:                        false
      POSTGRESQL_PORT_NUMBER:               5432
      POSTGRESQL_VOLUME_DIR:                /bitnami/postgresql
      PGDATA:                               /bitnami/postgresql/data
      POSTGRES_USER:                        forge
      POSTGRES_POSTGRES_PASSWORD:           <set to the key 'postgres-password' in secret 'flowforge-postgresql'>  Optional: false
      POSTGRES_PASSWORD:                    <set to the key 'password' in secret 'flowforge-postgresql'>           Optional: false
      POSTGRES_DB:                          flowforge
      POSTGRESQL_ENABLE_LDAP:               no
      POSTGRESQL_ENABLE_TLS:                no
      POSTGRESQL_LOG_HOSTNAME:              false
      POSTGRESQL_LOG_CONNECTIONS:           false
      POSTGRESQL_LOG_DISCONNECTIONS:        false
      POSTGRESQL_PGAUDIT_LOG_CATALOG:       off
      POSTGRESQL_CLIENT_MIN_MESSAGES:       error
      POSTGRESQL_SHARED_PRELOAD_LIBRARIES:  pgaudit
    Mounts:
      /bitnami/postgresql from data (rw)
      /dev/shm from dshm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d78x4 (ro)
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-flowforge-postgresql-0
    ReadOnly:   false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  kube-api-access-d78x4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

I then destroyed the cluster in EKS, added the addons section you suggested to my cluster.yml file, and reran the Steps To Reproduce from the original post.

I got the same context deadline exceeded error as at the beginning, but captured some data a few minutes in while it was running.

$ kubectl get all
NAME                                            READY   STATUS             RESTARTS      AGE
pod/flowforge-59645cb6b9-8r6dk                  0/1     CrashLoopBackOff   3 (15s ago)   92s
pod/flowforge-postgresql-0                      0/1     Pending            0             92s
pod/ingress-nginx-controller-7bf6cbf688-5sgfh   1/1     Running            0             3m34s

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                      AGE
service/flowforge-postgresql                 ClusterIP      10.100.190.206   <none>                                                                          5432/TCP                     92s
service/flowforge-postgresql-hl              ClusterIP      None             <none>                                                                          5432/TCP                     92s
service/forge                                ClusterIP      10.100.57.164    <none>                                                                          80/TCP                       92s
service/ingress-nginx-controller             LoadBalancer   10.100.102.82    <external IP>   3m34s
service/ingress-nginx-controller-admission   ClusterIP      10.100.145.209   <none>                                                                          443/TCP                      3m34s
service/kubernetes                           ClusterIP      10.100.0.1       <none>                                                                          443/TCP                      15m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flowforge                  0/1     1            0           92s
deployment.apps/ingress-nginx-controller   1/1     1            1           3m34s

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/flowforge-59645cb6b9                  1         1         0       92s
replicaset.apps/ingress-nginx-controller-7bf6cbf688   1         1         1       3m34s

NAME                                    READY   AGE
statefulset.apps/flowforge-postgresql   0/1     92s
$ kubectl describe pod/flowforge-postgresql-0
Name:             flowforge-postgresql-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app.kubernetes.io/component=primary
                  app.kubernetes.io/instance=flowforge
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=postgresql
                  controller-revision-hash=flowforge-postgresql-769f6d4dfd
                  helm.sh/chart=postgresql-11.9.13
                  statefulset.kubernetes.io/pod-name=flowforge-postgresql-0
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/flowforge-postgresql
Containers:
  postgresql:
    Image:      docker.io/bitnami/postgresql:14.10.0-debian-11-r30
    Port:       5432/TCP
    Host Port:  0/TCP
    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   exec [/bin/sh -c exec pg_isready -U "forge" -d "dbname=flowforge" -h 127.0.0.1 -p 5432] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/sh -c -e exec pg_isready -U "forge" -d "dbname=flowforge" -h 127.0.0.1 -p 5432
[ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:                        false
      POSTGRESQL_PORT_NUMBER:               5432
      POSTGRESQL_VOLUME_DIR:                /bitnami/postgresql
      PGDATA:                               /bitnami/postgresql/data
      POSTGRES_USER:                        forge
      POSTGRES_POSTGRES_PASSWORD:           <set to the key 'postgres-password' in secret 'flowforge-postgresql'>  Optional: false
      POSTGRES_PASSWORD:                    <set to the key 'password' in secret 'flowforge-postgresql'>           Optional: false
      POSTGRES_DB:                          flowforge
      POSTGRESQL_ENABLE_LDAP:               no
      POSTGRESQL_ENABLE_TLS:                no
      POSTGRESQL_LOG_HOSTNAME:              false
      POSTGRESQL_LOG_CONNECTIONS:           false
      POSTGRESQL_LOG_DISCONNECTIONS:        false
      POSTGRESQL_PGAUDIT_LOG_CATALOG:       off
      POSTGRESQL_CLIENT_MIN_MESSAGES:       error
      POSTGRESQL_SHARED_PRELOAD_LIBRARIES:  pgaudit
    Mounts:
      /bitnami/postgresql from data (rw)
      /dev/shm from dshm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cwpvp (ro)
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-flowforge-postgresql-0
    ReadOnly:   false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  kube-api-access-cwpvp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

I noticed the non-Postgres pod had a status of CrashLoopBackOff, so I included that as well:

$ kubectl describe pod/flowforge-59645cb6b9-8r6dk
Name:             flowforge-59645cb6b9-8r6dk
Namespace:        default
Priority:         0
Service Account:  flowforge
Node:             ip-192-168-2-48.ec2.internal/192.168.2.48
Start Time:       Wed, 28 Feb 2024 19:31:37 -0700
Labels:           app=flowforge
                  pod-template-hash=59645cb6b9
Annotations:      checksum/config: 71d10683982950dcd0c783d3eff503e7c3f3b3ccc4b4a5ce1401133049169f76
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               192.168.28.218
IPs:
  IP:           192.168.28.218
Controlled By:  ReplicaSet/flowforge-59645cb6b9
Init Containers:
  config:
    Container ID:  containerd://6f8a7fc9953b4ec1bb5cad47dfad4c634dc08f80caafc307cf810cb798efea56
    Image:         ruby:2.7-slim
    Image ID:      docker.io/library/ruby@sha256:39f68389fd3fe8c04ed896b10cc90ba4553a42e555ae18b37b41ea8c3ce965fd
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      erb /tmpl/flowforge.yml > /config/flowforge.yml
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 28 Feb 2024 19:31:42 -0700
      Finished:     Wed, 28 Feb 2024 19:31:42 -0700
    Ready:          True
    Restart Count:  0
    Environment:
      PGPASSWORD:                   <set to the key 'password' in secret 'flowfuse-secrets'>       Optional: false
      SMTPPASSWORD:                 <set to the key 'smtp-password' in secret 'flowfuse-secrets'>  Optional: true
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           us-east-1
      AWS_REGION:                   us-east-1
      AWS_ROLE_ARN:                 arn:aws:iam::617078758495:role/eksctl-FlowFuse-cluster-ServiceRole-6yjc4bcfmKAM
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /config from configdir (rw)
      /tmpl from configtemplate (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7vwcb (ro)
Containers:
  forge:
    Container ID:   containerd://39f874682341330f471ff1df3fe008670a8211d04531c4a8120c850a64c23cf8
    Image:          flowfuse/forge-k8s:2.1.1
    Image ID:       docker.io/flowfuse/forge-k8s@sha256:2bd19321e8a5d7388121ae3e4310ad8e4608514a57c4b4e573c757a7391f6f92
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 28 Feb 2024 19:35:14 -0700
      Finished:     Wed, 28 Feb 2024 19:35:16 -0700
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:3000/ delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:      http-get http://:3000/ delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      EDITOR_SERVICE_ACCOUNT:       editors
      FLOWFORGE_CLOUD_PROVIDER:     aws
      NODE_ENV:                     production
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           us-east-1
      AWS_REGION:                   us-east-1
      AWS_ROLE_ARN:                 arn:aws:iam::617078758495:role/eksctl-FlowFuse-cluster-ServiceRole-6yjc4bcfmKAM
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /usr/src/forge/etc from configdir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7vwcb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  configdir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  configtemplate:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flowforge-config
    Optional:  false
  kube-api-access-7vwcb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              role=management
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m46s                  default-scheduler  Successfully assigned default/flowforge-59645cb6b9-8r6dk to ip-192-168-2-48.ec2.internal
  Normal   Pulling    3m45s                  kubelet            Pulling image "ruby:2.7-slim"
  Normal   Pulled     3m41s                  kubelet            Successfully pulled image "ruby:2.7-slim" in 3.819829456s (3.819839677s including waiting)
  Normal   Created    3m41s                  kubelet            Created container config
  Normal   Started    3m41s                  kubelet            Started container config
  Normal   Pulled     3m20s                  kubelet            Successfully pulled image "flowfuse/forge-k8s:2.1.1" in 20.095645619s (20.095656328s including waiting)
  Normal   Pulled     3m16s                  kubelet            Successfully pulled image "flowfuse/forge-k8s:2.1.1" in 146.118795ms (146.128364ms including waiting)
  Normal   Started    2m53s (x3 over 3m19s)  kubelet            Started container forge
  Normal   Pulled     2m53s                  kubelet            Successfully pulled image "flowfuse/forge-k8s:2.1.1" in 212.731927ms (212.7432ms including waiting)
  Warning  BackOff    2m45s (x6 over 3m14s)  kubelet            Back-off restarting failed container forge in pod flowforge-59645cb6b9-8r6dk_default(a1610784-0141-42ee-9353-64a76b11b354)
  Normal   Pulling    2m31s (x4 over 3m40s)  kubelet            Pulling image "flowfuse/forge-k8s:2.1.1"
  Normal   Created    2m31s (x4 over 3m20s)  kubelet            Created container forge
  Normal   Pulled     2m31s                  kubelet            Successfully pulled image "flowfuse/forge-k8s:2.1.1" in 200.968763ms (200.979014ms including waiting)

Sorry for the information dump; I'm new to Kubernetes and FlowFuse, so I'm not quite sure what's needed to troubleshoot this.

@ppawlowski
Contributor

pod/flowforge-postgresql-0                      0/1     Pending            0             92s

The database pod cannot be deployed, which is the main reason the setup process fails.
To determine why the pod is stuck in Pending, please:

  • run the installation with the following command
    helm upgrade --install flowforge flowforge/flowforge -f customization.yml
    (the installation will complete successfully, but the pods will not become ready)
  • after 5-10 minutes, describe the database pod as before; the reason for the Pending state should appear in the Events section at the bottom of the output
    kubectl describe pod/flowforge-postgresql-0
  • additionally, please collect some storage-related data by running:
kubectl get pv,pvc,storageclasses
kubectl describe pvc data-flowforge-postgresql-0
kubectl describe pv $(kubectl get pvc data-flowforge-postgresql-0 -o jsonpath='{.spec.volumeName}')
  • clean up by executing helm uninstall flowforge
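
If the Events section is still empty, listing recent events directly can also surface scheduling failures:

kubectl get events --sort-by=.lastTimestamp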

@hardillb
Contributor

hardillb commented Feb 29, 2024

Just to be clear, the AWS defaults have changed with newer releases of Kubernetes and the instructions have not yet been updated.

I think there are a couple more bits that need adding to the eksctl config file to enable EBS volumes.

I think the management node-group also needs ebs: true added to the withAddonPolicies section:

  iam:
    withAddonPolicies:
      efs: true
      ebs: true

But please follow @ppawlowski's instructions first before trying this.
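
For reference, with both changes in place the relevant parts of cluster.yml would look roughly like this (a sketch combining the two snippets above; check the current addon version for your Kubernetes release):

nodeGroups:
  - name: management
    ...
    iam:
      withAddonPolicies:
        efs: true
        ebs: true

addons:
  - name: aws-ebs-csi-driver
    version: "v1.27.0-eksbuild.1"
    resolveConflicts: overwrite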

@wadebev11
Author

Gave this one last hoorah, but I wasn't able to get it going. Kubernetes doesn't quite make sense for the size of the organization I'm with. We were originally looking at this as part of some R&D, but we've decided the local install option fits us better, so feel free to close the issue, as I won't be able to continue looking into it.

Thank you both for helping look into this with me @hardillb @ppawlowski! 🙏

@hardillb
Contributor

hardillb commented Mar 6, 2024

@wadebev11 We have updated the docs to cover the newer AWS EKS defaults; if you do come back to it, please try with the new docs.

@hardillb hardillb closed this as completed Mar 6, 2024
@wadebev11
Author

For posterity: I tried this with the updated documentation and it worked. Thanks again.
