Skip to content

Comments

Fix Edge Worker handles orphaned PID files#43153

Merged
jscheffl merged 10 commits intoapache:mainfrom
boschglobal:bugfix/edge-worker-pid-file
Oct 22, 2024
Merged

Fix Edge Worker handles orphaned PID files#43153
jscheffl merged 10 commits intoapache:mainfrom
boschglobal:bugfix/edge-worker-pid-file

Conversation

@OliverWannenwetsch
Copy link
Contributor

@OliverWannenwetsch OliverWannenwetsch commented Oct 18, 2024

Problem description

The Edge Worker run into the following Exception if the worker was shutdown in a undefined way (No clear shutdown).

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 62, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 115, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/providers_configuration_loader.py", line 55, in wrapped_function
    return func(*args, **kwargs)

The root cause is that the current implementation does not consider, if PID file are already existing.
Also it also does not consider if PID files are created by parallel running Edge Worker instances

Bugfix description

This PR contains code to handle existing PID files for Edge workers:

  • Orphaned PID files created by crashed instances
  • PID files owned by other running Edge worker instances that have configured with the same PID file path (each instance must have its own PID file path!)

Changes in details

  • Added function _write_pid_to_pidfile to Edge Worker CLI that handles PID files in their different life cycles
  • Added pytests

@boring-cyborg
Copy link

boring-cyborg bot commented Oct 18, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@OliverWannenwetsch OliverWannenwetsch changed the title Bugfix/edge worker pid file WIP - Bugfix/edge worker pid file Oct 18, 2024
@OliverWannenwetsch OliverWannenwetsch changed the title WIP - Bugfix/edge worker pid file Fix Edge Worker handles orphaned PID files Oct 21, 2024
@OliverWannenwetsch OliverWannenwetsch marked this pull request as ready for review October 21, 2024 13:13
@jscheffl jscheffl added type:bug-fix Changelog: Bug Fixes AIP-69 Edge Executor provider:edge Edge Executor / Worker (AIP-69) / edge3 labels Oct 21, 2024
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. That is really a problem I have seen also in my dev if the worker crashed for example if API backend is gone... then the PID file blocked a restart.

Some small requests to improve, otherwise... nice improvement!

@jscheffl
Copy link
Contributor

jscheffl commented Oct 21, 2024

+1: I see a couple of static checks failing, please run either pre-commit over your changes or breeze static-checks

Note: I'd ignore most of the other errors in the pipeline, seems to be CI general in-stabiltiy)

@jscheffl
Copy link
Contributor

+1... as we are on main but not releasing the package atm because it is not ready, can you please:

  • increment the third digit of the version number in providers/src/airflow/providers/edge/provider.yaml
  • Add some notes in changelog in providers/src/airflow/providers/edge/CHANGELOG.rst ?

THANKS!

@OliverWannenwetsch
Copy link
Contributor Author

+1: I see a couple of static checks failing, please run either pre-commit over your changes or breeze static-checks

Note: I'd ignore most of the other errors in the pipeline, seems to be CI general in-stabiltiy)

Fixed in commit 1d0cb9c

@OliverWannenwetsch
Copy link
Contributor Author

+1... as we are on main but not releasing the package atm because it is not ready, can you please:

  • increment the third digit of the version number in providers/src/airflow/providers/edge/provider.yaml
  • Add some notes in changelog in providers/src/airflow/providers/edge/CHANGELOG.rst ?

THANKS!

Fixed in commit 33fb875

@jscheffl jscheffl merged commit c5776c5 into apache:main Oct 22, 2024
@boring-cyborg
Copy link

boring-cyborg bot commented Oct 22, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@OliverWannenwetsch OliverWannenwetsch deleted the bugfix/edge-worker-pid-file branch October 22, 2024 11:49
jscheffl added a commit that referenced this pull request Oct 22, 2024
* Fix Pytest caplog after PR 43153
harjeevanmaan pushed a commit to harjeevanmaan/airflow that referenced this pull request Oct 23, 2024
* initial test skeleton

* Added safe method for PID file creation

* Finished pytests and addressed reviews comments

* Fixed merging errors

* Update providers/src/airflow/providers/edge/cli/edge_command.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* Update providers/src/airflow/providers/edge/cli/edge_command.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* Increased version number of Edge Worker provider

* Fixed pytest and fixed merge problems

* Fixed findings from breeze static-check

---------

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
harjeevanmaan pushed a commit to harjeevanmaan/airflow that referenced this pull request Oct 23, 2024
* Fix Pytest caplog after PR 43153
PaulKobow7536 pushed a commit to PaulKobow7536/airflow that referenced this pull request Oct 24, 2024
* initial test skeleton

* Added safe method for PID file creation

* Finished pytests and addressed reviews comments

* Fixed merging errors

* Update providers/src/airflow/providers/edge/cli/edge_command.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* Update providers/src/airflow/providers/edge/cli/edge_command.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* Increased version number of Edge Worker provider

* Fixed pytest and fixed merge problems

* Fixed findings from breeze static-check

---------

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
PaulKobow7536 pushed a commit to PaulKobow7536/airflow that referenced this pull request Oct 24, 2024
* Fix Pytest caplog after PR 43153
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
* initial test skeleton

* Added safe method for PID file creation

* Finished pytests and addressed reviews comments

* Fixed merging errors

* Update providers/src/airflow/providers/edge/cli/edge_command.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* Update providers/src/airflow/providers/edge/cli/edge_command.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* Increased version number of Edge Worker provider

* Fixed pytest and fixed merge problems

* Fixed findings from breeze static-check

---------

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
* Fix Pytest caplog after PR 43153
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AIP-69 Edge Executor area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3 type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants