Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator #28452

Closed
2 tasks done
chriscmorgan opened this issue Dec 19, 2022 · 7 comments · Fixed by #35677
Closed
2 tasks done

Comments

@chriscmorgan
Copy link

chriscmorgan commented Dec 19, 2022

Apache Airflow Provider(s)

docker

Versions of Apache Airflow Providers

apache-airflow-providers-celery==3.1.0
apache-airflow-providers-docker==3.3.0

Apache Airflow version

2.5.0

Operating System

centos 7

Deployment

Other Docker-based deployment

Deployment details

Running an a docker-swarm cluster deployed locally.

What happened

Same issue as #13675

With logging_enabled=True the DAG never completes and stays in running.

When using DockerSwarmOperator together with the default enable_logging=True option, tasks do not succeed and stay in state running. When checking the docker service logs I can clearly see that the container ran and ended successfully. Airflow however does not recognize that the container finished and keeps the tasks in state running.

What you think should happen instead

DAG should complete.

How to reproduce

Docker-compose deployment:

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.0/docker-compose.yaml'
docker compose up airflow-init
docker compose up -d

DAG code:

from airflow import DAG
from docker.types import Mount, SecretReference
from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator
from datetime import timedelta
from airflow.utils.dates import days_ago
from airflow.models import Variable


# Setup default args for the job
default_args = {
	'owner': 'airflow',
	'start_date': days_ago(2),
	'retries': 0
}
 
# Create the DAG
dag = DAG(
    'test_dag',         # DAG ID
    default_args=default_args,
    schedule_interval='0 0 * * *', 
    catchup=False
)

# # Create the DAG object
with dag as dag:
	  docker_swarm_task = DockerSwarmOperator(
			task_id="job_run",
			image="<any image>",
			execution_timeout=timedelta(minutes=5),
			command="<specific code>",
			api_version='auto',
			tty=True,
			enable_logging=True
		)

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@chriscmorgan chriscmorgan added area:providers kind:bug This is a clearly a bug labels Dec 19, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Dec 19, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@maxim317
Copy link

#13675

enable_logging=False,

@chriscmorgan
Copy link
Author

@maxim317, yes disabling logging is a workaround. Ideally we would like the logs though in airflow.

@potiuk
Copy link
Member

potiuk commented Jan 16, 2023

Marked it as good first issue, hopefully someone wil take a look at this one.

@atulpillai
Copy link

can I work on this?

@potiuk
Copy link
Member

potiuk commented Apr 17, 2023

Feel free, but there is a linked PR for that so better ask there.

@atulpillai
Copy link

yeah I see the pull request recently updated and active. my bad!

@atulpillai atulpillai removed their assignment May 5, 2023
stavdav143 added a commit to stavdav143/airflow that referenced this issue Nov 16, 2023
…ogging=True option in DockerSwarmOperator"

It introduces logging of Docker Swarm services which was previously not
working.
stavdav143 added a commit to stavdav143/airflow that referenced this issue Dec 3, 2023
stavdav143 added a commit to stavdav143/airflow that referenced this issue Dec 5, 2023
stavdav143 added a commit to stavdav143/airflow that referenced this issue Dec 6, 2023
potiuk pushed a commit to stavdav143/airflow that referenced this issue Dec 6, 2023
…ogging=True option in DockerSwarmOperator"

It introduces logging of Docker Swarm services which was previously not
working.
potiuk pushed a commit to stavdav143/airflow that referenced this issue Dec 6, 2023
potiuk pushed a commit to stavdav143/airflow that referenced this issue Dec 6, 2023
potiuk pushed a commit to stavdav143/airflow that referenced this issue Dec 6, 2023
potiuk pushed a commit that referenced this issue Dec 6, 2023
* Fixes #28452: "TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator"
It introduces logging of Docker Swarm services which was previously not
working.

* tty=True/False to be chosen by the user as it was the case before this fix (#28452)

* Follow=true for logs will always result in tasks not ending. This is  standard and provided upstream by the Docker API.
Therefore in DockerSwarmOperator follow is always false.

* service_logs called multiple times as we continuously poll the Docker API for logs. As we indicated in the previous commmit,
the docker client malfunctions when we try to get the logs with follow=True. Therefore we make multiple calls to the API (every 2 seconds), to fetch the new logs.

* service_logs called multiple times. In this test the tasks increase (6 instead of 5) as we check if the service has terminated (+1). As this assertion makes less sense in a situation where we do multiple calls to the Docker API (polling), we might think of removing it or replacing it with something more suitable.

* Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence.

* Formatting ruff

* Reverting as Github actions don't run this test as a swarm node:

docker.errors.APIError: 503 Server Error for http+docker://localhost/v1.43/services/create: Service Unavailable ("This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.")

Revert "Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence."

This reverts commit 048ba1e.

* Logging "since" timestamp to avoid memory issues
Fix for #28452

* Formatting - Fix for #28452
stavdav143 added a commit to stavdav143/airflow that referenced this issue Dec 8, 2023
eladkal pushed a commit that referenced this issue Dec 9, 2023
* Fixes #28452: "TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator"
It introduces logging of Docker Swarm services which was previously not
working.

* tty=True/False to be chosen by the user as it was the case before this fix (#28452)

* Follow=true for logs will always result in tasks not ending. This is  standard and provided upstream by the Docker API.
Therefore in DockerSwarmOperator follow is always false.

* service_logs called multiple times as we continuously poll the Docker API for logs. As we indicated in the previous commmit,
the docker client malfunctions when we try to get the logs with follow=True. Therefore we make multiple calls to the API (every 2 seconds), to fetch the new logs.

* service_logs called multiple times. In this test the tasks increase (6 instead of 5) as we check if the service has terminated (+1). As this assertion makes less sense in a situation where we do multiple calls to the Docker API (polling), we might think of removing it or replacing it with something more suitable.

* Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence.

* Formatting ruff

* Reverting as Github actions don't run this test as a swarm node:

docker.errors.APIError: 503 Server Error for http+docker://localhost/v1.43/services/create: Service Unavailable ("This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.")

Revert "Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence."

This reverts commit 048ba1e.

* Logging "since" timestamp to avoid memory issues
Fix for #28452

* Formatting - Fix for #28452

* Fixes #28452: "TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator"
It introduces logging of Docker Swarm services which was previously not
working.

* tty=True/False to be chosen by the user as it was the case before this fix (#28452)

* Follow=true for logs will always result in tasks not ending. This is  standard and provided upstream by the Docker API.
Therefore in DockerSwarmOperator follow is always false.

* service_logs called multiple times as we continuously poll the Docker API for logs. As we indicated in the previous commmit,
the docker client malfunctions when we try to get the logs with follow=True. Therefore we make multiple calls to the API (every 2 seconds), to fetch the new logs.

* service_logs called multiple times. In this test the tasks increase (6 instead of 5) as we check if the service has terminated (+1). As this assertion makes less sense in a situation where we do multiple calls to the Docker API (polling), we might think of removing it or replacing it with something more suitable.

* Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence.

* Formatting ruff

* Reverting as Github actions don't run this test as a swarm node:

docker.errors.APIError: 503 Server Error for http+docker://localhost/v1.43/services/create: Service Unavailable ("This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.")

Revert "Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence."

This reverts commit 048ba1e.

* Logging "since" timestamp to avoid memory issues
Fix for #28452

* Formatting - Fix for #28452

* Fix bugs (#28452): Correctly assign last_line_logged, last_timestamp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants