[AIRFLOW-8058] Add configurable execution context #8651
Conversation
I don't really understand it yet. Could you provide a practical example? Also, please don't place code in
So let's start with what exists today, to give users more visibility into and control over Airflow's execution flow:
Continuing this discussion: you actually have a great use case for this feature in #8432. A useful example of such a context manager could be something like the following (it could be part of the Airflow library, or defined in place by the user according to their requirements), based on the #8432 implementation.
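As a hedged illustration of the #8432 use case (this is not the actual #8432 implementation; the name `provide_gcp_conn` and the key-file handling are hypothetical), such a context manager could temporarily point Google clients at a service-account key via the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable and restore the previous environment afterwards:

```python
import os
from contextlib import contextmanager


@contextmanager
def provide_gcp_conn(key_file_path):
    # Hypothetical sketch: expose a service-account key to Google clients
    # only for the duration of the task, then restore the old environment.
    previous = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_file_path
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("GOOGLE_APPLICATION_CREDENTIALS", None)
        else:
            os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = previous
```

Because the credential change is scoped by the `with` block, nothing leaks into tasks that run afterwards in the same process.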
I still don't understand why the additional execution context is configured as a global option. In my opinion, each task should be able to have its own separate contexts. In other words, the additional context should be a task parameter, not an option in the global configuration file. This would allow tasks to have different contexts and then use an SSH tunnel or gcloud authorization. Such contexts could be parameterized, so that e.g. one task uses an SSH tunnel to connect to server A while another task connects to server B. If someone wants one global context, that is still possible by defining a cluster policy. I propose that the context be defined at the task level. An example parameterized context would look like this:

```python
from contextlib import contextmanager


def ssh_tunnel(local_port, remote_port):
    @contextmanager
    def ssh_tunnel_context():
        from sshtunnel import SSHTunnelForwarder
        server = SSHTunnelForwarder(
            'pahaz.urfuclub.ru',
            ssh_username="pahaz",
            ssh_password="secret",
            remote_bind_address=('127.0.0.1', remote_port),
            local_bind_address=('127.0.0.1', local_port),
        )
        try:
            server.start()
            yield
        finally:
            server.stop()
    return ssh_tunnel_context
```

An example task would look like this:

```python
task_a = MySQLExecuteQueryOperator(
    task_id='execute_query',
    execution_contexts=[
        ssh_tunnel(3306, 3307),
    ],
)
```

If you want to define a context that will be used by all operators, you can define a cluster policy as follows:

```python
def my_task_policy(task):
    task.execution_contexts.append(my_global_task_context())
```

What do you think about my proposal? Will it meet your requirements?
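To make the mechanics of this proposal concrete, here is a minimal, self-contained sketch of how an operator could enter a list of configured context-manager factories around the task body using `contextlib.ExitStack`. The names `run_with_contexts` and `make_context` are hypothetical, not Airflow API; they only mirror the `execution_contexts` idea above, where each list entry is a factory like `ssh_tunnel(3306, 3307)`:

```python
from contextlib import ExitStack, contextmanager


def make_context(name, log):
    # Factory returning a fresh context manager, mirroring ssh_tunnel(...).
    @contextmanager
    def ctx():
        log.append(f"enter {name}")
        try:
            yield
        finally:
            log.append(f"exit {name}")
    return ctx


def run_with_contexts(execution_contexts, task_body):
    # Hypothetical operator wrapper: enter every configured context
    # (first entry outermost), run the task, then unwind in reverse order.
    with ExitStack() as stack:
        for factory in execution_contexts:
            stack.enter_context(factory())
        return task_body()
```

`ExitStack` guarantees LIFO teardown, so an SSH tunnel opened first would be closed last, even if the task body raises.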
I would say it would be a nice option to be able to provide an execution context per operator as well; however, I assume the idea is to define a context that affects all operators, instead of changing them one by one. So sometimes a user will use a context per operator (a new feature to implement), and sometimes a user can just apply a global context, so that every operator gets e.g. the GCP connection context regardless of its type (or with some operator-type check).
@evgenyshulman I think task-level context + cluster policy can provide the same functionality as the global context manager. That way more situations would be addressed. WDYT?
@evgenyshulman My solution allows for two use cases (task context and global context). Your proposal covers only one use case: the global context. Is there any reason to keep this context as a global option? Did I miss something? I am not sure which solution is best, and I would like to understand both solutions.
```python
return subprocess.call(args=cmd, stdout=dev_null, stderr=subprocess.STDOUT)


@contextmanager
def provide_gcp_context(task_instance, execution_context):
```
Does execution_context already contain task_instance? It would be nice if the user could expect a common signature across the related methods: pre_execute, on_execute, post_execute, etc.
I am in favor of keeping the solution simple. My only concern is that the current policy implementation requires airflow_local_settings, which I don't see in use in most Airflow deployments. Also, in our use case we need it at the global level, which is why we implemented it this way. If we implement it via a policy, do you see yourself using it as a policy right away?
pre_execute - good point, we will fix it in the next commit.
Depends on #9631
Support for getting the current context at any code location that runs under the scope of the BaseOperator.execute function. This functionality is part of AIP-31.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Let's keep it open for a moment
Hello, is this something we really need to release in 2.0.0rc1? If not, can someone set the right milestone please? :)
Is anyone still working on this?
I'm afraid not; would you like to pick it up? I think the case for this change was not that strong at the time.
I'm afraid I'm going to close this as Won't Fix. It feels very broad and fragile to use a global config setting for the example you've given (GCP project). It is also not necessary for the original target issue. There is already a
Sorry to leave this PR mostly silent for most of a year only to then close it, but we can build a simpler solution.
This PR is a part of [AIP-31], and closes #8058
The goal of this issue is to detach the execution context from the executing method's signature.
Before, the only way to retrieve the execution context was via **kwargs in the function's signature. Now, it is possible to retrieve it with a simple external function call (this only works while the function runs within an execute context).
This develops AIP-31 to allow a more functional way of writing code, without being coupled to Airflow's current implementation.
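A minimal sketch of how such a "current context" lookup can work, built only on the standard library's `contextvars` module. The names below (`set_current_context`, `get_current_context`) are illustrative of the mechanism, not the actual Airflow implementation: the operator wraps `execute` in a context-setting scope, and any helper called inside that scope can fetch the context without threading it through **kwargs:

```python
import contextvars
from contextlib import contextmanager

# Holds the execution context for the current logical call stack.
_current_context = contextvars.ContextVar("execution_context")


@contextmanager
def set_current_context(ctx):
    # Entered by the framework around the operator's execute() call.
    token = _current_context.set(ctx)
    try:
        yield ctx
    finally:
        _current_context.reset(token)


def get_current_context():
    # Callable from anywhere under the execute scope; fails loudly outside it.
    try:
        return _current_context.get()
    except LookupError:
        raise RuntimeError(
            "No execution context; call this inside an operator's execute scope"
        )


def my_task_logic():
    # User code no longer needs **kwargs to see the context.
    return get_current_context()["task_id"]
```

`ContextVar.reset(token)` restores whatever value was in place before, so nested scopes and re-entrant calls unwind cleanly.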
Collaborated with: @turbaszek @casassg @evgenyshulman
Make sure to mark the boxes below before creating PR:
- [x] In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
- [x] In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
- [x] In case of backwards incompatible changes please leave a note in UPDATING.md.
- [x] Read the Pull Request Guidelines for more information.