Can not get more logs of DAGS on UI when deploy on Kubernetes #31054

Open

chuxiangfeng opened this issue May 4, 2023 · 25 comments · Fixed by #31101
Labels
affected_version:2.6 Issues Reported for 2.6 area:core area:logging kind:bug This is a clearly a bug

Comments

@chuxiangfeng

Apache Airflow version

2.6.0

What happened

After updating Airflow to 2.6.0 and apache-airflow-providers-cncf-kubernetes==6.1.0, I can no longer get live logs for DAG tasks in the UI when deployed on Kubernetes. The log view looks frozen while the task runs, although the full log does show up in the end.
Before the update, Kubernetes task logs looked like this in the UI:

{pod_manager.py:285} WARNING - Pod not yet started: t-crawler-gem-ipo-request-71zd2ngd
{pod_manager.py:285} WARNING - Pod not yet started: t-crawler-gem-ipo-request-71zd2ngd
.......
{pod_manager.py:397} INFO - Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running
{pod_manager.py:397} INFO - Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running
{pod_manager.py:397} INFO - Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running
{pod_manager.py:397} INFO - Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running
{pod_manager.py:397} INFO - Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running
........
{pod.py:724} INFO - Skipping deleting pod: t-crawler-gem-ipo-request-71zd2ngd
{taskinstance.py:1373} INFO - Marking task as SUCCESS. dag_id=d_crawler_gem_ipo, task_id=t_crawler_gem_ipo_request, execution_date=20230504T083622, start_date=20230504T083623, end_date=20230504T084155
{local_task_job_runner.py:232} INFO - Task exited with return code 0
{taskinstance.py:2674} INFO - 0 downstream tasks scheduled from follow-on schedule check

But now it looks like this:

{pod_manager.py:285} WARNING - Pod not yet started: t-crawler-gem-ipo-request-71zd2ngd
{pod_manager.py:285} WARNING - Pod not yet started: t-crawler-gem-ipo-request-71zd2ngd

.......
(At this stage there is no pod startup log and nothing more is printed while the pod is in the Running phase, so there is no way to tell from the UI that the pod is running, even though it is actually running normally on Kubernetes. Chrome's dev tools (F12) show the UI polling the log endpoint repeatedly, but no new log content ever comes back. Only at the end does the UI print "Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running"; in theory that line should have been printed while the pod was still running.)

ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp
{pod_manager.py:342} INFO - 
{pod_manager.py:397} INFO - Pod t-crawler-gem-ipo-request-71zd2ngd has phase Running

What you think should happen instead

No response

How to reproduce

Just start a KubernetesPodOperator task.
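
For illustration, a minimal DAG along these lines is enough to exercise this code path - a sketch only, where the DAG id, pod name, image and command are placeholders and the import path assumes the cncf-kubernetes 6.x provider layout:

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="kpo_log_repro",            # placeholder DAG id
    start_date=datetime(2023, 5, 1),
    schedule=None,
    catchup=False,
):
    KubernetesPodOperator(
        task_id="long_running_pod",
        name="long-running-pod",       # placeholder pod name
        namespace="default",
        image="busybox",
        # print a line every second for ~5 minutes so log streaming (or its absence) is visible
        cmds=["sh", "-c", "for i in $(seq 1 300); do echo tick $i; sleep 1; done"],
        get_logs=True,                 # stream container output into the task log
    )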

Operating System

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)" NAME="Debian GNU/Linux" VERSION_ID="11" VERSION="11 (bullseye)" VERSION_CODENAME=bullseye ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/"

Versions of Apache Airflow Providers

apache-airflow==2.6.0
apache-airflow-providers-cncf-kubernetes==6.1.0
apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-dingding==3.1.0
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-mongo==3.1.1
apache-airflow-providers-mysql==5.0.0
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sqlite==3.3.2

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


@chuxiangfeng chuxiangfeng added the area:core, kind:bug and needs-triage labels May 4, 2023
@iJanki-gr
Contributor

iJanki-gr commented May 5, 2023

I have a similar issue related to logs after upgrading to the 2.6.0 Docker image (I tried both Helm chart 1.8.0 and 1.9.0).
The task logs (KPO) are not visible in the UI; the page is completely blank. I noticed that immediately after a task starts I see logs for a fraction of a second, and then they disappear.
There are no error logs in the webserver, scheduler or worker. (I use the Celery worker.)

P.S. After a task runs I also see logs up to the
WARNING - Pod not yet started... lines,
but then the view goes blank.

@potiuk
Member

potiuk commented May 5, 2023

cc: @dstandish - it seems this is a bit more widespread than just the Celery case; we might want to take a closer look.

@chuxiangfeng @iJanki-gr, can you please describe your deployment configuration (logging configuration, whether your logs are distributed and shared among multiple components (log folder persistence), and whether you use remote logging at all)?
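
(For reference, a quick way to dump the settings in question from each component's environment - a sketch that assumes the standard [logging] configuration keys:)

from airflow.configuration import conf

# Print the options that determine where task logs are written and read from.
for section, key in [
    ("logging", "base_log_folder"),
    ("logging", "remote_logging"),
    ("logging", "remote_base_log_folder"),
    ("logging", "logging_config_class"),
]:
    print(f"{section}.{key} = {conf.get(section, key, fallback='')}")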

@potiuk
Member

potiuk commented May 5, 2023

Related: #31083, #31059

@potiuk potiuk added this to the Airflow 2.6.1 milestone May 5, 2023
@potiuk potiuk added the area:logging label and removed the needs-triage label May 5, 2023
@iJanki-gr
Contributor

iJanki-gr commented May 5, 2023

I use the official Helm chart. No remote logging; each worker has its own PVC. In this instance, though, I have only 1 worker. The logs are present in the worker pod's log directory, but the webserver is not showing them for some reason.
It was all working fine on Airflow 2.5.3 and the Kubernetes provider 5.2.2.

@potiuk
Member

potiuk commented May 5, 2023

I use the official Helm chart. No remote logging; each worker has its own PVC. In this instance, though, I have only 1 worker. The logs are present in the worker pod's log directory, but the webserver is not showing them for some reason.

Do you have log persistence enabled in the chart? Only then would the logs be present in the webserver.

@potiuk
Member

potiuk commented May 5, 2023

Enabling log persistence is a very quick workaround for the problem - I'm not sure it is the final or only solution we will suggest, but it is confirmed to resolve it.
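
For anyone looking for the concrete setting, this is roughly what it means in the official chart's values.yaml - a sketch, assuming the chart's logs.persistence values; the size and storage class below are placeholders:

logs:
  persistence:
    enabled: true
    size: 20Gi   # placeholder; pick a size that fits your log retention
    # storageClassName: <rwx-class>   # placeholder; use a class that supports ReadWriteMany
    #                                 # so the workers and the webserver can share the volume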

@potiuk potiuk added the affected_version:2.6 label May 5, 2023
@iJanki-gr
Contributor

Yes, I have log persistence. The problem is still present.

@potiuk
Member

potiuk commented May 5, 2023

Yes, I have log persistence. The problem is still present.

You mentioned that your webserver does not have the log folder - most likely persistence is not configured properly, because with log persistence enabled the logs should appear in your webserver too. Did you check that log persistence is configured correctly and that its settings are right for your Kubernetes cluster? Is the PVC for logs created, and is it properly mounted in your webserver (the same PVC volume should be mounted for both your worker and your webserver when it is configured correctly)?

Can you double-check those and see if it works for you? If you think you have log persistence enabled but you do not see the logs in the webserver's log folder, then your configuration needs another look. Once it is fixed and the logs are present in both the worker and the webserver, the problem should go away.

@iJanki-gr
Contributor

I will double-check that. But I thought the webserver connected to the worker, and the worker served the logs on port 8793.
Has something changed in Airflow 2.6.0?

@iJanki-gr
Contributor

Persistence is enabled for the workers. There is no persistence configurable for the webserver in the Helm chart. But log serving from the worker was working fine regardless.
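
As a rough way to tell whether the webserver-to-worker log-serving path is the problem, the worker's log server can be probed directly - a sketch, assuming the default worker_log_server_port (8793) and the standard per-attempt log path layout; the hostname and path below are placeholders:

import requests

worker_host = "airflow-worker-0.airflow-worker"  # placeholder worker hostname/service
rel_log_path = (
    "dag_id=d_crawler_gem_ipo/run_id=scheduled__2023-05-04T08:36:22+00:00/"
    "task_id=t_crawler_gem_ipo_request/attempt=1.log"
)  # placeholder; mirrors the path layout under the base log folder

resp = requests.get(f"http://{worker_host}:8793/log/{rel_log_path}", timeout=10)
print(resp.status_code)
print(resp.text[:500])

# Even a 403 is informative: the log server is reachable but rejected the request,
# because it expects the signed token the webserver normally attaches (derived from
# the shared webserver secret_key). A connection error or 404 points at networking
# or a missing log file instead.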

@potiuk
Member

potiuk commented May 5, 2023

Yes. This is the bug we have in 2.6.0, and it will be fixed in 2.6.1 - what I proposed was a workaround you could implement so that you do not have to wait, @iJanki-gr, in case you missed the context.

@potiuk
Member

potiuk commented May 5, 2023

I also suggest - when we release the 2.6.1rc with the fix coming in #31101 - that you take part in testing it and verifying whether the fix works. This is the least you can do to help the community, which mostly works in their free time so that you can get the software for free.

@iJanki-gr
Contributor

Cool, I didn't realise the issue was already identified. Sure I can test the new RC, thanks.

@eladkal eladkal removed this from the Airflow 2.6.2 milestone May 12, 2023
@DonchenkoNadia

I also faced this issue after upgrading to 2.6.0. Deploying 2.6.1 did not help; I still do not see logs for the tasks in the UI.
Where else can I see the logs?

@iJanki-gr
Contributor

Just mentioning that version 2.6.1 worked for me.

@DonchenkoNadia

I have to apologize...
Initially, I failed to upgrade properly from 2.6.0 to 2.6.1. Once I managed to do it, the logs became visible in the UI. Many thanks for the fix.

@chuxiangfeng
Author

I'm sorry to say that after updating to 2.6.1, the logs are still in the state described in the issue above.

@potiuk potiuk reopened this Jun 19, 2023
@potiuk
Member

potiuk commented Jun 19, 2023

Reopened. Can you please provide fresh logs/more details, @chuxiangfeng? CC: @dstandish - possibly worth taking a look.

@chuxiangfeng
Author

Reopened. Can you please provide fresh logs/more details, @chuxiangfeng? CC: @dstandish - possibly worth taking a look.

*** Found local files:
***   * /opt/airflow/logs/dag_id=d_crawler_bond_chinabond/run_id=scheduled__2023-06-19T18:00:00+00:00/task_id=t_crawler_bond_chinabond_request/attempt=1.log
[2023-06-21, 02:00:04 CST] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: d_crawler_bond_chinabond.t_crawler_bond_chinabond_request scheduled__2023-06-19T18:00:00+00:00 [queued]>
[2023-06-21, 02:00:04 CST] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: d_crawler_bond_chinabond.t_crawler_bond_chinabond_request scheduled__2023-06-19T18:00:00+00:00 [queued]>
[2023-06-21, 02:00:04 CST] {taskinstance.py:1308} INFO - Starting attempt 1 of 4
[2023-06-21, 02:00:04 CST] {taskinstance.py:1327} INFO - Executing <Task(CustomKubernetesPodOperator): t_crawler_bond_chinabond_request> on 2023-06-19 18:00:00+00:00
[2023-06-21, 02:00:04 CST] {standard_task_runner.py:57} INFO - Started process 2021068 to run task
[2023-06-21, 02:00:04 CST] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'd_crawler_bond_chinabond', 't_crawler_bond_chinabond_request', 'scheduled__2023-06-19T18:00:00+00:00', '--job-id', '36560', '--raw', '--subdir', 'DAGS_FOLDER/wudao/custom_dags/to_ods_dags/crawler/bond_chinabond/crawler_bond_chinabond.py', '--cfg-path', '/tmp/tmpgdv7ctnx']
[2023-06-21, 02:00:04 CST] {standard_task_runner.py:85} INFO - Job 36560: Subtask t_crawler_bond_chinabond_request
[2023-06-21, 02:00:04 CST] {task_command.py:410} INFO - Running <TaskInstance: d_crawler_bond_chinabond.t_crawler_bond_chinabond_request scheduled__2023-06-19T18:00:00+00:00 [running]> on host airflow-scheduler-0.airflow-scheduler.airflow-schedule.svc.cluster.local
[2023-06-21, 02:00:05 CST] {taskinstance.py:1547} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='d_crawler_bond_chinabond' AIRFLOW_CTX_TASK_ID='t_crawler_bond_chinabond_request' AIRFLOW_CTX_EXECUTION_DATE='2023-06-19T18:00:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-06-19T18:00:00+00:00'
[2023-06-21, 02:00:05 CST] {pod.py:850} INFO - Building pod t-crawler-bond-chinabond-request-86p5zmlg with labels: {'dag_id': 'd_crawler_bond_chinabond', 'task_id': 't_crawler_bond_chinabond_request', 'run_id': 'scheduled__2023-06-19T1800000000-3697f8a44', 'kubernetes_pod_operator': 'True', 'try_number': '1'}
[2023-06-21, 02:00:05 CST] {base.py:73} INFO - Using connection ID 'kubernetes_default' for task execution.
[2023-06-21, 02:00:05 CST] {pod.py:489} INFO - Found matching pod t-crawler-bond-chinabond-request-86p5zmlg with labels {'airflow_kpo_in_cluster': 'True', 'airflow_version': '2.6.1', 'dag_id': 'd_crawler_bond_chinabond', 'kubernetes_pod_operator': 'True', 'run_id': 'scheduled__2023-06-19T1800000000-3697f8a44', 'task_id': 't_crawler_bond_chinabond_request', 'try_number': '1'}
[2023-06-21, 02:00:05 CST] {pod.py:490} INFO - `try_number` of task_instance: 1
[2023-06-21, 02:00:05 CST] {pod.py:491} INFO - `try_number` of pod: 1
[2023-06-21, 02:00:05 CST] {pod_manager.py:285} WARNING - Pod not yet started: t-crawler-bond-chinabond-request-86p5zmlg
[2023-06-21, 02:00:06 CST] {pod_manager.py:285} WARNING - Pod not yet started: t-crawler-bond-chinabond-request-86p5zmlg
[2023-06-21, 02:00:07 CST] {pod_manager.py:285} WARNING - Pod not yet started: t-crawler-bond-chinabond-request-86p5zmlg
[2023-06-21, 05:40:11 CST] {pod_manager.py:397} INFO - Pod t-crawler-bond-chinabond-request-86p5zmlg has phase Running
[2023-06-21, 05:40:13 CST] {airflow_extends.py:248} INFO - Deleting pod: t-crawler-bond-chinabond-request-86p5zmlg
[2023-06-21, 05:40:13 CST] {logging_mixin.py:149} WARNING - /home/airflow/.local/lib/python3.7/site-packages/airflow/models/baseoperator.py:1453 RemovedInAirflow3Warning: Passing 'execution_date' to 'TaskInstance.xcom_push()' is deprecated.
[2023-06-21, 05:40:13 CST] {taskinstance.py:1350} INFO - Marking task as SUCCESS. dag_id=d_crawler_bond_chinabond, task_id=t_crawler_bond_chinabond_request, execution_date=20230619T180000, start_date=20230620T180004, end_date=20230620T214013
[2023-06-21, 05:40:13 CST] {local_task_job_runner.py:225} INFO - Task exited with return code 0
[2023-06-21, 05:40:14 CST] {taskinstance.py:2651} INFO - 1 downstream tasks scheduled from follow-on schedule check

This is all the log output shown in the UI. From 2023-06-21 02:00:05 CST to 02:00:07 CST the pod was starting, and it then kept running, but the UI log view stayed in the loading state the whole time. Only after the pod finished did the UI print [2023-06-21, 05:40:11 CST] {pod_manager.py:397} INFO - Pod t-crawler-bond-chinabond-request-86p5zmlg has phase Running. No logs were printed during the more than three hours the pod was running, and the UI logs were stuck loading the entire time.

@potiuk
Member

potiuk commented Jun 23, 2023

Can you please update to 2.6.2? There were a few log-related fixes in 2.6.1 and 2.6.2 that seem related (search for "logs" in the release notes); I think they should fix your issue. Closing this provisionally - we can always reopen if the upgrade does not help.

@potiuk potiuk closed this as completed Jun 23, 2023
@potiuk
Member

potiuk commented Jun 23, 2023

cc: @dstandish

@potiuk potiuk added the duplicate label Jun 23, 2023
@chuxiangfeng
Author

Sorry to say, the problem still exists after the upgrade to 2.6.3.

@potiuk potiuk reopened this Jul 13, 2023
@potiuk
Member

potiuk commented Jul 13, 2023

Can you please provide more logging information and describe exactly the circumstances you are seeing, @chuxiangfeng? That would help to narrow down the problem.

@potiuk potiuk removed the duplicate label Jul 13, 2023
@adminidever

In 2.7 the problem returned.

@potiuk
Member

potiuk commented Sep 17, 2023

I believe (again) that it does not add value to just say "it's returned", as opposed to providing more information, logs, and details of the circumstances in which it returned. (I know it takes much more time to gather all that info, but it helps those who might want to diagnose the problem and help solve it.)

But maybe that is too high an expectation for users of free software like Airflow? Not sure. Maybe they do not understand that people contribute here and help others in their free time, and that it's a good idea to help them by providing more information.
