Quickstart Helm Chart fails post-install #16176

Closed
kasteph opened this issue May 30, 2021 · 67 comments
Labels
area:helm-chart Airflow Helm Chart kind:bug This is a clearly a bug pending-response

Comments

@kasteph
Contributor

kasteph commented May 30, 2021

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.19

Environment:

  • Cloud provider or hardware configuration: Running kind locally
  • OS (e.g. from /etc/os-release): macOS
  • Kernel (e.g. uname -a): Darwin MacBook-Pro 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64
  • Install tools: brew
  • Others:

What happened:

The Helm chart does not deploy successfully to a kind cluster despite following the Quick Start. I tried several times: the flower, postgres, redis and statsd services run fine, but the run-airflow-migrations job fails with a CrashLoopBackOff:

  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m19s                  default-scheduler  Successfully assigned airflow/airflow-run-airflow-migrations-c9pph to kind-control-plane
  Normal   Pulled     2m43s (x5 over 5m17s)  kubelet            Container image "apache/airflow:2.0.2" already present on machine
  Normal   Created    2m43s (x5 over 5m17s)  kubelet            Created container run-airflow-migrations
  Normal   Started    2m43s (x5 over 5m17s)  kubelet            Started container run-airflow-migrations
  Warning  BackOff    9s (x18 over 4m25s)    kubelet            Back-off restarting failed container

What you expected to happen:

Successful Helm deployment.

How to reproduce it:

  1. Created a kind cluster: kind create cluster --image kindest/node:v1.18.15
  2. Added Helm chart repo: helm repo add apache-airflow https://airflow.apache.org
  3. Created kube namespace: kubectl create namespace airflow
  4. Installed chart: helm install airflow apache-airflow/airflow --namespace airflow --debug
install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /Users/stephaniesamson/Library/Caches/helm/repository/airflow-1.0.0.tgz

client.go:282: [debug] Starting delete for "airflow-broker-url" Secret
client.go:122: [debug] creating 1 resource(s)
client.go:282: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:122: [debug] creating 1 resource(s)
client.go:282: [debug] Starting delete for "airflow-redis-password" Secret
client.go:122: [debug] creating 1 resource(s)
client.go:122: [debug] creating 30 resource(s)
client.go:282: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:122: [debug] creating 1 resource(s)
client.go:491: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:519: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:558: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:519: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:558: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed post-install: timed out waiting for the condition
helm.go:81: [debug] failed post-install: timed out waiting for the condition
@kasteph kasteph added the kind:bug This is a clearly a bug label May 30, 2021
@boring-cyborg

boring-cyborg bot commented May 30, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@kaxil
Member

kaxil commented May 30, 2021

Can you provide the logs of the airflow-run-airflow-migrations job, please?
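
For anyone following along, here is a minimal sketch of one way to collect those logs, assuming the release is named airflow and lives in the airflow namespace as in the report above. The function name show_migration_logs is made up for illustration; the kubectl subcommands are standard ones.

```shell
# Print the events and logs for the chart's migration Job.
# Namespace and Job name default to the ones used in this thread.
show_migration_logs() {
  ns="${1:-airflow}"
  job="${2:-airflow-run-airflow-migrations}"
  # Events show scheduling, image pull, and back-off problems.
  kubectl describe job "$job" --namespace "$ns"
  # "job/<name>" lets kubectl pick a pod that belongs to the Job.
  kubectl logs "job/$job" --namespace "$ns" --all-containers
}

# Usage (not run here): show_migration_logs airflow
```

If the container has already restarted, adding --previous to kubectl logs on the specific pod shows the output of the crashed attempt.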

@kaxil kaxil added the area:helm-chart Airflow Helm Chart label May 30, 2021
@Dr-Denzy
Contributor

I will try to see if I can reproduce this issue.

@Dr-Denzy
Contributor

I could not reproduce this issue. Consider closing it @kaxil

...

helm install $RELEASE_NAME apache-airflow/airflow --namespace $NAMESPACE --debug
...

NOTES:
Thank you for installing Apache Airflow 2.0.2!

Your release is named airflow-release.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

Airflow Webserver:     kubectl port-forward svc/airflow-release-webserver 8080:8080 --namespace airflow-namespace
Flower dashboard:      kubectl port-forward svc/airflow-release-flower 5555:5555 --namespace airflow-namespace
Default Webserver (Airflow UI) Login credentials:
    username: admin
    password: admin
Default Postgres connection credentials:
    username: postgres
    password: postgres
    port: 5432

You can get Fernet Key value by running the following:

    echo Fernet Key: $(kubectl get secret --namespace airflow-namespace airflow-release-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

@kasteph
Contributor Author

kasteph commented May 31, 2021

Can you provide the logs of the airflow-run-airflow-migrations job, please?

❯ kubectl logs -n airflow airflow-run-airflow-migrations-hw9lz
BACKEND=postgresql
DB_HOST=airflow-postgresql.airflow
DB_PORT=5432

DB: postgresql://postgres:***@airflow-postgresql.airflow:5432/postgres?sslmode=disable
[2021-05-31 19:39:05,756] {db.py:684} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
WARNI [airflow.providers_manager] Exception when importing 'airflow.providers.google.common.hooks.leveldb.LevelDBHook' from 'apache-airflow-providers-google' package: No module named 'airflow.providers.google.common.hooks.leveldb'
WARNI [airflow.providers_manager] Exception when importing 'airflow.providers.google.common.hooks.leveldb.LevelDBHook' from 'apache-airflow-providers-google' package: No module named 'airflow.providers.google.common.hooks.leveldb'
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 171, in _catch_revision_errors
    yield
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 365, in _upgrade_revs
    revs = list(revs)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 904, in _iterate_revisions
    requested_lowers = self.get_revisions(lower)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 455, in get_revisions
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 455, in <listcomp>
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 460, in get_revisions
    for rev_id in resolved_id
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 460, in <genexpr>
    for rev_id in resolved_id
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 536, in _revision_for_ident
    resolved_id,
alembic.script.revision.ResolutionError: No such revision or branch 'a13f7613ad25'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/db_command.py", line 48, in upgradedb
    db.upgradedb()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/db.py", line 694, in upgradedb
    command.upgrade(config, 'heads')
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/command.py", line 294, in upgrade
    script.run_env()
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 490, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
    module = load_module_py(module_id, path)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/util/compat.py", line 182, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/migrations/env.py", line 108, in <module>
    run_migrations_online()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/migrations/env.py", line 102, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/runtime/migration.py", line 548, in run_migrations
    for step in self._migrations_fn(heads, self):
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/command.py", line 283, in upgrade
    return script._upgrade_revs(revision, rev)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 370, in _upgrade_revs
    for script in reversed(list(revs))
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 203, in _catch_revision_errors
    compat.raise_(util.CommandError(resolution), from_=re)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/util/compat.py", line 294, in raise_
    raise exception
alembic.util.exc.CommandError: Can't locate revision identified by 'a13f7613ad25'

@ephraimbuddy
Contributor

@stephsamson can you delete the namespace and recreate it, then run helm repo update before installing?
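
The suggestion above could be scripted roughly as follows. This is a sketch, assuming the namespace and release are both called airflow; clean_reinstall is a hypothetical helper name. Deleting a namespace removes everything in it, including any PersistentVolumeClaims, so a database left over from an earlier install goes away too. One plausible reading of the "Can't locate revision" error is that the database had been migrated by a different Airflow version than the image being run, but that is an inference, not something confirmed in this thread.

```shell
# Recreate the namespace and reinstall from a freshly updated repo index.
# WARNING: deleting a namespace removes everything in it, including
# PersistentVolumeClaims. Do not run this against data you need.
clean_reinstall() {
  ns="${1:-airflow}"
  release="${2:-airflow}"
  # Each step runs only if the previous one succeeded.
  kubectl delete namespace "$ns" --ignore-not-found &&
  kubectl create namespace "$ns" &&
  helm repo update &&
  helm install "$release" apache-airflow/airflow --namespace "$ns" --debug
}

# Usage (not run here): clean_reinstall airflow airflow
```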

@kasteph
Contributor Author

kasteph commented May 31, 2021

@ephraimbuddy thanks that worked!

@kasteph kasteph closed this as completed May 31, 2021
@niklasden

niklasden commented Oct 26, 2021

Hi everyone,

I have run into the same issue on a fresh microk8s cluster.
After running the following command:
microk8s.helm3 install airflow apache-airflow/airflow --namespace airflow --wait=false
Error: failed post-install: timed out waiting for the condition

I have tried deleting the namespace and updating the repo several times.

Anyone running into the same issues?

@ralleman-quasarsat

ralleman-quasarsat commented Nov 1, 2021

I have not been able to get airflow installed. I've tried several times, deleting the cluster on each attempt. I'm following these instructions https://marclamberti.com/blog/airflow-on-kubernetes-get-started-in-10-mins/. I'm working on a 2021 M1 Mac Air under Big Sur.

% helm install airflow apache-airflow/airflow --namespace airflow --debug
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: .../Library/Caches/helm/repository/airflow-1.2.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:328: [debug] secrets "airflow-broker-url" not found
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:328: [debug] secrets "airflow-fernet-key" not found
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:328: [debug] secrets "airflow-redis-password" not found
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 31 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:328: [debug] jobs.batch "airflow-run-airflow-migrations" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.2.1/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_arm64.s:1130

@ralleman-quasarsat

On another attempt, this gets added to the output:

client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
W1101 15:44:58.592580   29841 reflector.go:441] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: watch of *unstructured.Unstructured ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I1101 15:45:09.716362   29841 trace.go:205] Trace[2052545262]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (01-Nov-2021 15:44:59.714) (total time: 10001ms):
Trace[2052545262]: [10.001633833s] [10.001633833s] END
E1101 15:45:09.716411   29841 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:62689/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-run-airflow-migrations&resourceVersion=930": net/http: TLS handshake timeout
I1101 15:45:22.725536   29841 trace.go:205] Trace[904910366]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (01-Nov-2021 15:45:12.724) (total time: 10001ms):
Trace[904910366]: [10.001626666s] [10.001626666s] END
E1101 15:45:22.725569   29841 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:62689/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-run-airflow-migrations&resourceVersion=930": net/http: TLS handshake timeout
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED

@zambien

zambien commented Nov 8, 2021

@kaxil @Dr-Denzy please re-open this issue as multiple people are reporting it. I am able to recreate it intermittently myself. You can follow the notes here:

https://github.com/zambien/tf-eks-airflow/blob/tf_eks_extended/notes.md

deploy airflow on k8s using helm without packaged db

To keep everything simple we use the default namespace
Create the cluster and set it in kubectl

kind create cluster --name airflow --config terraform/kind/kind-config.yaml
kubectl cluster-info --context kind-airflow

Get your charts

helm repo add apache-airflow https://airflow.apache.org
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Run postgres

helm install db \
  --set postgresqlPassword=secretpassword,postgresqlDatabase=airflow \
    bitnami/postgresql

Run airflow without the included db:

helm install airflow apache-airflow/airflow --debug \
  -f terraform/kind/airflow-values.yaml \
  --set 'env[0].name=AIRFLOW__CORE__LOAD_EXAMPLES,env[0].value=True'

Sometimes this works, other times it does not. It seems that the catalyst may be the separate database.

Here is the issue I see:

helm install --debug airflow apache-airflow/airflow \
  -f terraform/kind/airflow-values.yaml \
  --set 'env[0].name=AIRFLOW__CORE__LOAD_EXAMPLES,env[0].value=True'
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: /home/adam/.cache/helm/repository/airflow-1.2.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 27 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.2.1/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_amd64.s:1371

@kaxil
Member

kaxil commented Nov 8, 2021

@zambien #18776 should allow disabling the Helm hooks, which might fix the issue for you.

Can you try it out on your local machine or dev cluster by running the following commands:

helm repo add apache-airflow-dev https://dist.apache.org/repos/dist/dev/airflow/helm-chart/1.3.0rc1/
helm repo update
helm install airflow apache-airflow-dev/airflow

1.3.0rc1 is the release candidate for the 1.3.0 release.
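
If the hooks are what is timing out, one option is to turn them off so that the migration and create-user Jobs are created as ordinary resources. Here is a sketch, assuming the chart exposes migrateDatabaseJob.useHelmHooks and createUserJob.useHelmHooks. Those value names should be checked with helm show values against the chart version actually in use. install_without_hooks is a hypothetical helper.

```shell
# Install with the migration and create-user Jobs as ordinary resources
# instead of Helm hooks. The two value names are assumptions; verify them
# with `helm show values <chart>` before relying on this.
install_without_hooks() {
  chart="${1:-apache-airflow/airflow}"
  ns="${2:-airflow}"
  helm install airflow "$chart" --namespace "$ns" \
    --set migrateDatabaseJob.useHelmHooks=false \
    --set createUserJob.useHelmHooks=false
}

# Usage (not run here): install_without_hooks apache-airflow-dev/airflow airflow
```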

@matasejem

matasejem commented Nov 16, 2021

@kaxil, I am receiving a similar error (see below). I wonder if I should also try the above commands for 1.3.0rc1, or whether it requires a different kind of fix. Thanks.

PS C:\Windows\System32> helm install airflow apache-airflow/airflow --namespace airflow --debug --timeout 10m0s
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: C:\Users\MARTIN~1\AppData\Local\Temp\helm\repository\airflow-1.3.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 33 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 10m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:299: [debug] Starting delete for "airflow-create-user" Job
client.go:328: [debug] jobs.batch "airflow-create-user" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-create-user with timeout of 10m0s
client.go:556: [debug] Add/Modify event for airflow-create-user: ADDED
client.go:595: [debug] airflow-create-user: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-create-user: MODIFIED
client.go:595: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
W1115 12:55:22.480047    9876 reflector.go:441] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: watch of *unstructured.Unstructured ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I1115 12:55:33.664716    9876 trace.go:205] Trace[300875778]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:55:23.654) (total time: 10009ms):
Trace[300875778]: [10.0095411s] [10.0095411s] END
E1115 12:55:33.665699    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:55:45.927127    9876 trace.go:205] Trace[922864028]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:55:35.908) (total time: 10018ms):
Trace[922864028]: [10.0183913s] [10.0183913s] END
E1115 12:55:45.927382    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:56:00.518389    9876 trace.go:205] Trace[1282502707]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:55:50.511) (total time: 10007ms):
Trace[1282502707]: [10.007077s] [10.007077s] END
E1115 12:56:00.521483    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:56:23.254867    9876 trace.go:205] Trace[336697707]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:56:13.237) (total time: 10017ms):
Trace[336697707]: [10.0173028s] [10.0173028s] END
E1115 12:56:23.255431    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:56:50.279920    9876 trace.go:205] Trace[1113683026]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:56:40.266) (total time: 10013ms):
Trace[1113683026]: [10.013341s] [10.013341s] END
E1115 12:56:50.281131    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:57:28.631461    9876 trace.go:205] Trace[2006327411]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:57:18.629) (total time: 10002ms):
Trace[2006327411]: [10.002029s] [10.002029s] END
E1115 12:57:28.631461    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:58:22.706208    9876 trace.go:205] Trace[365191476]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:58:12.695) (total time: 10010ms):
Trace[365191476]: [10.0109451s] [10.0109451s] END
E1115 12:58:22.706762    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:59:16.350704    9876 trace.go:205] Trace[611706561]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:59:06.341) (total time: 10009ms):
Trace[611706561]: [10.0090197s] [10.0090197s] END
E1115 12:59:16.351089    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:59:59.275048    9876 trace.go:205] Trace[1510419015]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:59:49.268) (total time: 10006ms):
Trace[1510419015]: [10.0061847s] [10.0061847s] END
E1115 12:59:59.275556    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 13:00:49.912921    9876 trace.go:205] Trace[434755033]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 13:00:39.902) (total time: 10010ms):
Trace[434755033]: [10.0104631s] [10.0104631s] END
E1115 13:00:49.913463    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 13:01:49.592632    9876 trace.go:205] Trace[958360155]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 13:01:39.576) (total time: 10016ms):
Trace[958360155]: [10.0162521s] [10.0162521s] END
E1115 13:01:49.593159    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 13:02:30.179570    9876 trace.go:205] Trace[412353598]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 13:02:20.167) (total time: 10012ms):
Trace[412353598]: [10.0121795s] [10.0121795s] END
E1115 13:02:30.180122    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.2.1/command.go:902
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
        runtime/proc.go:225
runtime.goexit
        runtime/asm_amd64.s:1371
PS C:\Windows\System32>

@pplanel

pplanel commented Nov 20, 2021

Up, facing the same problem.

@rdeteix

rdeteix commented Dec 19, 2021

up, I have the same issue

@kyp0717

kyp0717 commented Dec 24, 2021

I have the same issue.

@chyumin

chyumin commented Dec 25, 2021

I got the same issue, but installing an older version worked for me:
helm install airflow apache-airflow/airflow --namespace airflow --version 1.0.0
Probably something is wrong with the current latest chart version.

@fm-falken

fm-falken commented Dec 27, 2021

Confirming this.
helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f .\airflow\values.yaml --debug
Returns:

history.go:56: [debug] getting history for release airflow
Release "airflow" does not exist. Installing it now.
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: C:\Users\admin\AppData\Local\Temp\helm\repository\airflow-1.3.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 37 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:328: [debug] jobs.batch "airflow-run-airflow-migrations" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition

@potiuk
Member

potiuk commented Dec 27, 2021

Can someone please open a new issue with all the details? This is a closed issue, and the cause is likely different. Commenting on a closed issue from May will not resurrect it. Even if the symptoms are similar, it is likely a different issue.

@potiuk
Member

potiuk commented Dec 27, 2021

We really need more details: values, configurations, detailed logs from the "wait-for-migrations" jobs, etc.

@rdeteix

rdeteix commented Dec 30, 2021

Hi
I tried it with an increased server configuration and it worked.
It may be a memory/cpu issue.

@kyp0717

kyp0717 commented Dec 30, 2021

Perhaps someone can try installing with the "--timeout 10m0s" option. It worked for me when I used the official Apache Helm chart.
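
As a hedged illustration of the timeout suggestion above, the command line can be assembled programmatically. The sketch below only builds the argument list and prints it. The helper name `build_helm_command` and its default values are assumptions made for illustration; the flags themselves (`--namespace`, `--create-namespace`, `--timeout`, `--debug`) are the ones already used in this thread.

```python
import shlex


def build_helm_command(release="airflow", chart="apache-airflow/airflow",
                       namespace="airflow", timeout="10m0s", debug=True):
    """Return the argv list for a Helm install with a longer hook timeout.

    build_helm_command is a hypothetical helper, not part of Helm or Airflow.
    """
    argv = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--create-namespace",
        # The debug output in this thread shows a 5m0s timeout when none is given.
        "--timeout", timeout,
    ]
    if debug:
        argv.append("--debug")
    return argv


# Print a copy-pasteable command line; run it yourself or pass the list
# to subprocess.run() if you prefer to launch Helm from Python.
print(shlex.join(build_helm_command()))
```

A longer timeout can only help when the migration job is slow rather than failing, so it is worth checking the job's pod status as well.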

@veromos

veromos commented Feb 23, 2022

I'm facing the same issue, even with timeout option it's not working.

@potiuk
Member

potiuk commented Mar 8, 2022

I'm facing the same issue, even with timeout option it's not working.

Please open a new issue about this with more details (logs and a description of what you experience). It is likely a different issue.

The comment "I have the same issue" on a closed issue does not help in any meaningful way in diagnosing the issue.

@javad87

javad87 commented May 28, 2022

What the logs of your migration pods show ? https://www.digitalocean.com/community/questions/how-to-check-the-logs-of-running-and-crashed-pods-in-kubernetes

I run this command:
kubectl logs airflow-run-airflow-migrations-h7b72 -n airflow
It shows nothing, even with -f to follow the log. Meanwhile, in another terminal, the helm install command is still running and prints this --debug output to the console:

[root@localhost ~]# helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace --debug --timeout 10m0s
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
history.go:56: [debug] getting history for release airflow
Release "airflow" does not exist. Installing it now.
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: /root/.cache/helm/repository/airflow-1.6.0.tgz

client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 30 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 10m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

@potiuk
Member

potiuk commented May 28, 2022

Can you please use the k9s tool to connect to, monitor, and extract the migration logs? I have found it much better at getting to the right logs.

@potiuk
Member

potiuk commented May 28, 2022

K9s will allow you to monitor more logs in your deployment and likely find the right problem - just explore your installation with it.

@javad87

javad87 commented May 28, 2022

  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 142, in _do_get
    return self._create_connection()
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 247, in _create_connection
    return _ConnectionRecord(self)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 362, in __init__
    self.__connect(first_connect_check=True)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 599, in __connect
    connection = pool._invoke_creator(self)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/create.py", line 578, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 583, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/airflow/.local/lib/python3.7/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "airflow-postgresql.airflow" (10.152.183.179), po
        Is the server running on that host and accepting TCP/IP connections?

(Background on this error at: http://sqlalche.me/e/14/e3q8)

@potiuk
Member

potiuk commented May 28, 2022

So you have a problem with connecting to postgres then

@javad87

javad87 commented May 28, 2022

So you have a problem with connecting to postgres then

How can I resolve the connection issue?

@potiuk
Member

potiuk commented May 28, 2022

No idea. You have to debug it.
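
One possible first debugging step, sketched under assumptions: check whether the database service accepts TCP connections at all, which is exactly what the error message asks. The function `can_connect` is a hypothetical helper. The host name comes from the traceback above, while port 5432 is only PostgreSQL's default and an assumption here, because the log line is cut off before the port.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


# Example call, to be run from a pod inside the cluster (host from the
# traceback above, port assumed to be the PostgreSQL default):
#   can_connect("airflow-postgresql.airflow", 5432)
```

If this returns False from inside the cluster, the next things to look at are probably whether the postgres pod is running at all (`kubectl get pods -n airflow`) and what `kubectl describe pod` reports for it.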

@Abhinav1598

install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /home/e4338/.cache/helm/repository/airflow-1.6.0.tgz

client.go:290: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:290: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:290: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 30 resource(s)
client.go:290: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:519: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 20m0s
client.go:547: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:586: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

@Abhinav1598

I am facing the exact same error. As per the official documentation, the postgres db is itself created in a container, so there should be no connection issue; it just gets stuck at airflow-run-airflow-migrations.

Any resolution will be highly appreciated.

@potiuk
Member

potiuk commented Jun 2, 2022

I am facing the exact same error. As per the official documentation, the postgres db is itself created in a container, so there should be no connection issue; it just gets stuck at airflow-run-airflow-migrations.

Any resolution will be highly appreciated.

As mentioned, more details are the only way anyone can help you (or rather, by looking at the logs of the migration job yourself, you will likely find the reason). Without those details we are not able to help you.

Stating "I have the same problem" without providing any additional details helps no one find the root cause. If you state "I have the same problem", you need to provide more detailed logs to bring any value to the discussion here, @Abhinav1598.

@lordvcs

lordvcs commented Sep 13, 2022

K9s will allow you to monitor more logs in your deployment and likely find the right problem - just explore your installation with it.

I tried using K9s but still don't see any log output; most of the time it just says "stream logs failed container ..." for each pod/container that I check.
Any other ideas for debugging?

@potiuk
Member

potiuk commented Sep 13, 2022

kubectl? How else are you debugging other charts? Just do the same.

@Abhinav1598

It's solved. I was inside my company's VPN, so I was unable to pull the images from Docker. I pulled and pushed the images to my remote repo, and it started working. :)

@ahoodasf

I had the same issue, and there were multiple causes, so I thought I'd share:

  1. Initially I did not have enough nodes available in my k8s cluster, so the airflow-run-airflow-migrations job pod was not getting scheduled.
  2. After increasing the number of nodes, I still had the same issue, because I was using t3.micro (free tier) instances, which do not support some kind of networking.

@noah-gil

I was able to resolve this for my single-node testing cluster. Checking the kubectl describe output for the postgresql pod, I noticed that it could not bind a persistent volume claim, which had the effect of every pod failing to connect to postgres (as it could not start). I also noticed that other pods failed to bind a persistent volume claim too, namely the redis and worker pods. The solution was to create 3 persistent volumes with sufficient space (10 GB for 2 of them, and 100 GB for the 3rd) and to make sure the storage class for each was set to "" (empty string).
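
The volumes described above could be generated along these lines. This is a minimal sketch, not the exact manifests used in that cluster: the volume names, the `/mnt/data/...` paths, the `hostPath` backing and the `ReadWriteOnce` access mode are all assumptions, and the sizes are written as `10Gi`/`100Gi` rather than GB. Only the count, the approximate sizes and the empty storage class come from the comment.

```python
import json


def make_persistent_volume(name, size, host_path):
    """Build a PersistentVolume manifest as a plain dict.

    The hostPath backing is an assumption that only suits single-node test
    clusters; the comment above does not say which volume type was used.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",  # empty string, as described above
            "hostPath": {"path": host_path},
        },
    }


def make_volume_list():
    """Three volumes: two of 10Gi and one of 100Gi (names and paths are made up)."""
    sizes = {"airflow-pv-0": "10Gi", "airflow-pv-1": "10Gi", "airflow-pv-2": "100Gi"}
    items = [make_persistent_volume(n, s, f"/mnt/data/{n}") for n, s in sizes.items()]
    return {"apiVersion": "v1", "kind": "List", "items": items}


# kubectl accepts JSON, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(make_volume_list(), indent=2))
```

Whether the chart's claims actually bind to such volumes depends on the storage class the claims themselves request, so `kubectl get pvc -n airflow` is worth checking afterwards.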

@KishinNext

KishinNext commented Oct 27, 2022

@noah-gil Do you have the configuration to get the correct cluster to run Airflow? I'm using the latest Airflow Helm chart with this configuration for the cluster, but I get the same error :(

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: airflow
  region: us-east-1
  version: "1.23"

managedNodeGroups:
  - name: workers
    instanceType: t3.medium
    privateNetworking: true
    minSize: 1
    maxSize: 3
    desiredCapacity: 3
    volumeSize: 20
    ssh:
      allow: true
      publicKeyName: airflow-workstation
    labels: { role: worker }
    tags:
      nodegroup-role: worker
    iam:
      withAddonPolicies:
        ebs: true
        imageBuilder: true
        efs: true
        albIngress: true
        autoScaler: true
        cloudWatch: true
        externalDNS: true

@MarianneRay

MarianneRay commented Nov 1, 2022

I'm getting the same error. Currently debugging but will open a separate issue if I get to a standstill [ sorry to add on to the comments of this closed issue ]

  - name: Add apache helm chart
    run: |-
      helm repo add apache-airflow https://airflow.apache.org

  - name: Update helm charts
    run: |-
      helm repo update

  - name: Deploy to latest image cluster
    run: |-
      helm upgrade --install $NAMESPACE apache-airflow/airflow \
        --timeout 3m30s  --debug --force --namespace $NAMESPACE --create-namespace \
        --set images.airflow.repository="$ARTIFACT_REGISTRY/gp-ops-controller-$BRANCH_NAME/$REPO_NAME/$BRANCH_NAME/$GITHUB_SHA" \
        --set images.airflow.tag=latest \
        --set images.airflow.pullPolicy=Always \
        --set images.airflow.pullSecretName=registry-credentials \
        --set executor=CeleryExecutor \
        --set pgbouncer.enabled=true \
        --set airflowLocalSettings="" \
        --set secret_key="$AIRFLOW__WEBSERVER__SECRET_KEY" \
        --set logging.remote_logging=true \
        --set logging.remote_base_log_folder="gs://ops-controller-$BRANCH_NAME-bucket/$GITHUB_ENV/dags/logs"

I added the update step after installing the apache-airflow helm chart.
My cluster has 3 bound persistent volumes.
I increased the machine type of my node pool to e2-highmem-2 and it contains 9 nodes across 3 zones.
I have 18 vCPUs in my cluster and 144GB of memory.

Not sure how to proceed. Any suggestions welcome, thank you!

@agarwalYashBCG

agarwalYashBCG commented Nov 6, 2022

I tried deleting/uninstalling the airflow deployment and also wiped the airflow repo from local Helm. I'm still facing the same issue. I will post my progress if I'm able to fix it.


client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
upgrade.go:434: [debug] warning: Upgrade "airflow" failed: post-upgrade hooks failed: job failed: BackoffLimitExceeded
Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded
helm.go:84: [debug] post-upgrade hooks failed: job failed: BackoffLimitExceeded
UPGRADE FAILED
main.newUpgradeCmd.func2
	helm.sh/helm/v3/cmd/helm/upgrade.go:201
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.5.0/command.go:872
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.5.0/command.go:990
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.5.0/command.go:918
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1571

Command: `sudo helm install airflow apache-airflow/airflow --namespace airflow --debug`
Machine: AWS EC2 ubuntu, running on kind cluster

Helm Version

ubuntu@AMRAPCMU200050L:~/dir$ helm version
version.BuildInfo{Version:"v3.10.1", GitCommit:"9f88ccb6aee40b9a0535fcc7efea6055e1ef72c9", GitTreeState:"clean", GoVersion:"go1.18.7"}

@potiuk
Member

potiuk commented Nov 6, 2022

Without any details on why the migration job failed, I am afraid commenting on a closed issue will not help. You need to look at the logs of the job that failed and post them (ideally as a new issue, as this might be a completely different issue).

@agarwalYashBCG

Without any details on why the migration job failed, I am afraid commenting on a closed issue will not help. You need to look at the logs of the job that failed and post them (ideally as a new issue, as this might be a completely different issue).

Apologies @potiuk, I'll create a new issue with more detailed instructions to reproduce the issue. Cheers!!

@alexlightbody

To anyone else who has stumbled upon this thread: for me the issue was Docker Desktop not having enough memory. I increased it to 9 GB with 2 GB of swap, repeated the helm install process, and all was fine.

@lordvcs

lordvcs commented Nov 27, 2022

My issue was fixed when I freed up more space on the system.

@beascar

beascar commented Dec 12, 2022

Without any details on why the migration job failed, I am afraid commenting on a closed issue will not help. You need to look at the logs of the job that failed and post them (ideally as a new issue, as this might be a completely different issue).

These are the commands I'm using to check the logs of the failed job:

$ kubectl describe job airflow-run-airflow-migrations
Name:             airflow-run-airflow-migrations
Namespace:        airflow
Selector:         controller-uid=3a6f5bd7-2128-42be-a28d-7a50b215ff3f
Labels:           chart=airflow-1.7.0
                  component=run-airflow-migrations
                  heritage=Helm
                  release=airflow
                  tier=airflow
Annotations:      batch.kubernetes.io/job-tracking: 
                  helm.sh/hook: post-install,post-upgrade
                  helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
                  helm.sh/hook-weight: 1
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Mon, 12 Dec 2022 09:44:39 -0700
Pods Statuses:    1 Active (0 Ready) / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           component=run-airflow-migrations
                    controller-uid=3a6f5bd7-2128-42be-a28d-7a50b215ff3f
                    job-name=airflow-run-airflow-migrations
                    release=airflow
                    tier=airflow
  Service Account:  airflow-migrate-database-job
  Containers:
   run-airflow-migrations:
    Image:      apache/airflow:2.4.1
    Port:       <none>
    Host Port:  <none>
    Args:
      bash
      -c
      exec \
      airflow db upgrade
    Environment:
      PYTHONUNBUFFERED:                     1
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'airflow-fernet-key'>                      Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'>  Optional: false
      AIRFLOW__CELERY__BROKER_URL:          <set to the key 'connection' in secret 'airflow-broker-url'>                      Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
  Volumes:
   config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  7m16s  job-controller  Created pod: airflow-run-airflow-migrations-wtlzd

$ kubectl logs airflow-run-airflow-migrations-wtlzd
Error from server (BadRequest): container "run-airflow-migrations" in pod "airflow-run-airflow-migrations-wtlzd" is waiting to start: trying and failing to pull image

Looks like my issue is similar to the one described above by @Abhinav1598, but I'm unsure which image is causing the failure.
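
One way to narrow this down, sketched under assumptions, is to read the pod status fields that `kubectl get pods -n airflow -o json` returns and list every container that is stuck waiting together with its image. The function name `find_blocked_pods` and the sample data are hypothetical; the field names (`containerStatuses`, `state.waiting.reason`, the `PodScheduled` condition) are standard Kubernetes pod status fields.

```python
import json


def find_blocked_pods(pod_list):
    """Return a list of (pod, detail, reason) tuples for pods that look stuck."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        # Pods the scheduler could not place (e.g. too few nodes, unbound claims).
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                findings.append((name, cond.get("message", ""), cond.get("reason", "")))
        # Containers stuck waiting, e.g. ImagePullBackOff or CrashLoopBackOff.
        containers = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for cs in containers:
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                findings.append((name, cs.get("image", ""), waiting.get("reason", "")))
    return findings


# Tiny hand-written sample shaped like `kubectl get pods -o json` output.
# It is illustrative only and not taken from a real cluster.
SAMPLE = json.loads("""
{"items": [{"metadata": {"name": "example-migrations-pod"},
            "status": {"containerStatuses": [
                {"image": "apache/airflow:2.4.1",
                 "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}}]}
""")

for pod, detail, reason in find_blocked_pods(SAMPLE):
    print(f"{pod}: {reason} ({detail})")
```

With real data, replace `SAMPLE` with the parsed JSON from kubectl. Containers that have only just started will also show up with a waiting reason, so a single hit is a hint rather than a diagnosis.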

@potiuk
Member

potiuk commented Dec 22, 2022

You must check your logs on K8S. As someone who manages a k8s installation, it is absolutely normal for you to fix any problems and be able to diagnose them. You have to learn it, I am afraid, @beascar. Various tools (kubectl, helm, k9s) are useful for that, and your job is basically to master them. You chose k8s as your deployment, so as a consequence you need to understand how to diagnose various problems there.

I cannot diagnose and solve your k8s installation for you, but if you are not familiar with kubectl (you eventually should be), one useful tool is helm install --dry-run. It shows the resources that the Helm chart creates after applying all the templates; just find the right Pod/container and you will see which image it pulls. You can also check this way which resources are created by Helm. K9s is also useful for looking at your k8s installation in an "exploratory" way, and it lets you learn how k8s works much faster.

Good luck with the diagnosis.
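
To make the dry-run suggestion concrete, here is a minimal sketch that pulls the image references out of rendered chart output, for example the text printed by `helm install --dry-run` or `helm template`. The helper `list_images` and the sample text are hypothetical, and the regular expression is a simple line-based match rather than a real YAML parser, so unusual formatting may be missed.

```python
import re

# Matches lines such as `image: apache/airflow:2.4.1` or `- image: "redis:7"`.
IMAGE_LINE = re.compile(
    r'^[ \t]*(?:-[ \t]+)?image:[ \t]*["\']?([^"\'\s]+)["\']?[ \t]*$',
    re.MULTILINE,
)


def list_images(rendered_manifests):
    """Return the sorted, de-duplicated image references found in rendered YAML text."""
    return sorted(set(IMAGE_LINE.findall(rendered_manifests)))


# Hand-written sample in the shape of rendered manifests (illustrative only).
SAMPLE = """
containers:
  - name: run-airflow-migrations
    image: apache/airflow:2.4.1
    imagePullPolicy: IfNotPresent
"""

for image in list_images(SAMPLE):
    print(image)
```

Each image in the list can then be pulled by hand on a node or workstation to see which one the cluster cannot reach.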

@amorskoy

amorskoy commented Mar 8, 2023

I observed that in my case the postgres pod was in Pending state due to its claim. I hope it helps some of you above.

@Amin-Siddique

On Windows, it worked after increasing the WSL memory! https://learn.microsoft.com/en-us/windows/wsl/wsl-config#configure-global-options-with-wslconfig

@curlup
Contributor

curlup commented Sep 7, 2023

Hi

Same issue. Helm output

client.go:339: [debug] jobs.batch "airflow-run-airflow-migrations" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:310: [debug] Starting delete for "airflow-create-user" Job
client.go:339: [debug] jobs.batch "airflow-create-user" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job airflow-create-user with timeout of 5m0s
client.go:568: [debug] Add/Modify event for airflow-create-user: ADDED
client.go:607: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-create-user: MODIFIED
client.go:607: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-create-user: MODIFIED
client.go:607: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for airflow-create-user: MODIFIED
client.go:607: [debug] airflow-create-user: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
upgrade.go:434: [debug] warning: Upgrade "airflow" failed: post-upgrade hooks failed: timed out waiting for the condition
Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition
helm.go:84: [debug] post-upgrade hooks failed: timed out waiting for the condition
UPGRADE FAILED
main.newUpgradeCmd.func2
	helm.sh/helm/v3/cmd/helm/upgrade.go:201
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.5.0/command.go:872
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.5.0/command.go:990
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.5.0/command.go:918
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1594

Migration job log shows all done but never gets "success" state?


Container: run-airflow-migrations

/home/airflow/.local/lib/python3.10/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings.  Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
DB: postgresql://postgres:***@airflow-pgbouncer.hector-staging:6543/airflow-metadata?sslmode=disable
Performing upgrade with database postgresql://postgres:***@airflow-pgbouncer.hector-staging:6543/airflow-metadata?sslmode=disable
[2023-09-07T17:26:46.733+0000] {migration.py:205} INFO - Context impl PostgresqlImpl.
[2023-09-07T17:26:46.734+0000] {migration.py:208} INFO - Will assume transactional DDL.
[2023-09-07T17:26:46.751+0000] {db.py:1571} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Upgrades done
$ kubectl --insecure-skip-tls-verify  get jobs -n hector-staging airflow-run-airflow-migrations
NAME                             COMPLETIONS   DURATION   AGE
airflow-run-airflow-migrations   1/1           14s        74s
$ kubectl --insecure-skip-tls-verify  describe jobs -n hector-staging airflow-run-airflow-migrations
Name:             airflow-run-airflow-migrations
Namespace:        hector-staging
Selector:         controller-uid=11553df0-9e05-42ef-ae5a-78146ff935a7
Labels:           chart=airflow-1.9.0
                  component=run-airflow-migrations
                  heritage=Helm
                  release=airflow
                  tier=airflow
Annotations:      batch.kubernetes.io/job-tracking:
                  helm.sh/hook: post-install,post-upgrade
                  helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
                  helm.sh/hook-weight: 1
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Thu, 07 Sep 2023 13:54:28 -0400
Completed At:     Thu, 07 Sep 2023 13:54:42 -0400
Duration:         14s
Pods Statuses:    0 Active (0 Ready) / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           component=run-airflow-migrations
                    controller-uid=11553df0-9e05-42ef-ae5a-78146ff935a7
                    job-name=airflow-run-airflow-migrations
                    release=airflow
                    tier=airflow
  Service Account:  airflow-migrate-database-job
  Containers:
   run-airflow-migrations:
    Image:     airflow/master:latest
    Port:       <none>
    Host Port:  <none>
    Args:
      bash
      -c
      exec \
      airflow db upgrade
    Environment Variables from:
      airflow-auth-provider  Secret  Optional: false
    Environment:
      PYTHONUNBUFFERED:                     1
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'airflow-fernet-key'>               Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-airflow-metadata'>         Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>         Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-airflow-metadata'>         Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'airflow-webserver-key'>  Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
  Volumes:
   config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
Events:        <none>
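
The Helm debug output above can be summarized per hook job, which makes it easier to see what Helm was waiting for when it gave up. The sketch below is a hypothetical helper (`last_job_statuses`) that parses the `Jobs active: ..., jobs failed: ..., jobs succeeded: ...` lines; the sample lines are copied from the output above. In that output Helm had already moved on from airflow-run-airflow-migrations to airflow-create-user, which suggests the timeout was hit while watching the second job.

```python
import re

STATUS_LINE = re.compile(
    r"\[debug\] (?P<job>[\w.-]+): Jobs active: (?P<active>\d+), "
    r"jobs failed: (?P<failed>\d+), jobs succeeded: (?P<succeeded>\d+)"
)


def last_job_statuses(helm_debug_output):
    """Map each hook Job name to its most recently reported counters.

    Keys keep the order in which Helm first reported each job.
    """
    statuses = {}
    for match in STATUS_LINE.finditer(helm_debug_output):
        statuses[match["job"]] = {k: int(match[k]) for k in ("active", "failed", "succeeded")}
    return statuses


LOG = """\
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:607: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:607: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:607: [debug] airflow-create-user: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
"""

for job, counts in last_job_statuses(LOG).items():
    print(job, counts)
```

The last key in the result is the job Helm was watching most recently, so that is the one whose pod logs and `kubectl describe job` output are probably worth reading first.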

@potiuk
Member

potiuk commented Sep 7, 2023

Increase the timeout (look at the help of helm), or increase memory (check your resources settings). Or, if you use Argo or similar, look at our docs for the chart: https://airflow.apache.org/docs/helm-chart/stable/index.html#installing-the-chart-with-argo-cd-flux-rancher-or-terraform

@curlup
Contributor

curlup commented Sep 11, 2023

Thanks @potiuk

Can you by any chance elaborate on why one needs to set

createUserJob:
  useHelmHooks: false
  applyCustomEnv: false
migrateDatabaseJob:
  useHelmHooks: false
  applyCustomEnv: false

for Argo, Rancher etc? As in: why without this (or with this? I'm confused now) the migrations will not be run?
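
For anyone who wants to try these values without editing a values file, they can be turned into `--set` arguments. This is a small sketch with a hypothetical helper, `to_set_args`; the keys are exactly the ones quoted above, and nothing here says whether they are needed for a given setup.

```python
def to_set_args(values, prefix=""):
    """Flatten a nested dict into Helm `--set key=value` arguments."""
    args = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(to_set_args(value, path))
        else:
            # Helm parses lowercase true/false as booleans.
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={rendered}"])
    return args


ARGO_VALUES = {
    "createUserJob": {"useHelmHooks": False, "applyCustomEnv": False},
    "migrateDatabaseJob": {"useHelmHooks": False, "applyCustomEnv": False},
}

print(" ".join(to_set_args(ARGO_VALUES)))
```

The printed arguments can be appended to a `helm upgrade --install` command; a values file passed with `-f` remains the more readable option for anything longer.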

@potiuk
Member

potiuk commented Sep 11, 2023

I think that's a question for Argo and Rancher.

The current way works with standard Helm. They seem to use the hooks in a non-standard way, but maybe you can help develop better ways. We are an open-source project, so we aim to support standards, not commercial solutions that have somewhat modified them. But if you use such a solution and want to help make it better supported, cool.

Some of the initial reasoning was described in #17447, but if someone (you?) finds a better way of supporting Argo/Rancher, that's cool. We are happy to accept contributions to make it easier/better. I personally don't use Argo, so I am not able to comment more, other than that this is what someone at some point found to be a working solution. But if someone else finds a better way and can confirm it works (and keeps it working for the regular Helm Chart), that is even cooler.

Airflow is created by > 2600 contributors, and often people who miss something or find it confusing spend time fixing it and contribute back. So if you think you can help with analysing and providing a better fix, cool.
