Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SubprocessHook for running commands from operators #13423

Merged
merged 20 commits into from Feb 12, 2021

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Jan 1, 2021

The bash operator has dialed in how to run bash commands from python and airflow.

There are other operators that sometimes need to do this. But for various reasons, they might not want to inherit from BashOperator. And in this case, it would be nice if there were a bash hook that would just let you run a bash command in your operator.

We can also imagine circumstances where perhaps within a single "BashOperator", maybe you want to run a series of bash commands. Well, if you have a hook that encapsulates this functionality, then this is very clean to implement.

@github-actions
Copy link

github-actions bot commented Jan 1, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

github-actions bot commented Jan 1, 2021

The Workflow run is cancelling this PR. Building image for the PR has been cancelled

Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. For few minutes I was considering making this hook stateless and keeping the state in operators but this adds unnecessary complexity.

@github-actions
Copy link

github-actions bot commented Jan 2, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 2, 2021
@dstandish
Copy link
Contributor Author

dstandish commented Jan 2, 2021

For few minutes I was considering making this hook stateless and keeping the state in operators but this adds unnecessary complexity.

Yes @turbaszek, this is one of the things I was thinking about also.

I could not think of a clean way, apart from having BashHook return a "runner" object that has methods run and send_sigterm.... but in essense this would just be another stateful hook, so might as well just leave it on BashHook, I figure.

I also considered making command env and output_encoding hook init params. But then you would not be able to use cached property in BashOperator, and this would add complexity to BashOperator as a result. And we should be able to reuse the hook multiple times in different ways.

Another thing I considered was, does it make sense to ensure that sub_process is unset at successful completion? But I think there's no benefit to aggressive unsetting. If run_command is used again, then sub_process will be overwritten with new process. We could add a check, to see if sub_process is still running (if is already set when Popen is about to be called), or at least a warning. However, these I think should probably be considered separately from this refactor.

In sum, I too think current approach is good enough and now will work on updating docs and possibly tests as necessary.

Thanks

@dstandish
Copy link
Contributor Author

dstandish commented Jan 2, 2021

one material change

in latest commit i made I made it so BashHook's run_command will use os.environ if env not supplied (and will not use os.environ otherwise)

added / updated tests

i left all but one BashOperator test untouched. only this one -- checking the args applied to Popen -- i moved to BashHook.

where appropriate, i translated the existing tests for BashHook and copied them there.

i will remove WIP designation

@dstandish dstandish changed the title WIP - add bash hook Add bash hook Jan 2, 2021
@dstandish dstandish force-pushed the add-bash-hook branch 2 times, most recently from 2a0b23d to 606b60c Compare January 2, 2021 21:58
@github-actions
Copy link

github-actions bot commented Jan 2, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

github-actions bot commented Jan 2, 2021

The Workflow run is cancelling this PR. Building image for the PR has been cancelled

@turbaszek
Copy link
Member

We could add a check, to see if sub_process is still running (if is already set when Popen is about to be called), or at least a warning. However, these I think should probably be considered separately from this refactor.

That was also my concern, but I figured that the chance that we will run two processes in parallel in one operator (using one instance of a hook) is low.

tests/hooks/test_bash.py Outdated Show resolved Hide resolved
tests/hooks/test_bash.py Outdated Show resolved Hide resolved
airflow/hooks/bash.py Outdated Show resolved Hide resolved
airflow/hooks/bash.py Outdated Show resolved Hide resolved
airflow/hooks/bash.py Outdated Show resolved Hide resolved
airflow/hooks/bash.py Outdated Show resolved Hide resolved
airflow/operators/bash.py Outdated Show resolved Hide resolved
docs/spelling_wordlist.txt Outdated Show resolved Hide resolved
@dstandish
Copy link
Contributor Author

Alright folks... updated to make it shell-agnostic, and to take List[str]

I have called it SubprocessHook

This is obvs negotiable

Options:

  • CommandHook (what kind of command?)
  • ProcessHook (what kind of process? isn't everything a process)
  • ShellHook (i kindof like it)
  • SubprocessHook (it is a subprocess, it uses subprocess module... why get creative?)

Please take a look 🙏

@dstandish dstandish requested a review from ashb January 7, 2021 06:38
@ashb
Copy link
Member

ashb commented Jan 7, 2021

SubprocessHook is good.

(ShellHook is not because that has a different meaning -- see https://docs.python.org/3/library/subprocess.html#security-considerations)

@dstandish
Copy link
Contributor Author

dstandish commented Feb 9, 2021

thanks and updated 🤞

@dstandish dstandish force-pushed the add-bash-hook branch 2 times, most recently from 96e3690 to d4336ba Compare February 10, 2021 04:15
@dstandish
Copy link
Contributor Author

ok at long last @ashb i think this should be good

there seems to be just one test failure in one run which, while i don't see how it could be flakey, does appear to be flakey. it didn't happen in other configurations and doesn't repro locally.

perhaps someone could rerun that test? i am hesitant to force push again given how fussy builds have been lately

@ashb
Copy link
Member

ashb commented Feb 12, 2021

Yeah, that's a timing based flakey test. Merging.

@ashb ashb merged commit 59e33c1 into apache:master Feb 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants