Description
The PythonVirtualenvOperator docs state that passing/serializing `ti` / `task_instance` is not supported. However, I'm seeing a confusing behavior difference depending on `render_template_as_native_obj`:

- With `render_template_as_native_obj=False`, templating `{{ ti }}` / `{{ task_instance }}` in `op_kwargs` appears to work (the task runs and prints values).
- With `render_template_as_native_obj=True`, the task fails while serializing arguments for the subprocess with a `PicklingError` (a non-obvious error mentioning loggers/structlog).
Minimal reproduction
```python
from pendulum import datetime

from airflow.sdk import dag
from airflow.providers.standard.operators.python import PythonVirtualenvOperator


def venv_callable(ti, task_instance):
    print("ti =", ti)
    print("task_instance =", task_instance)


@dag(
    start_date=datetime(2026, 1, 1),
    schedule=None,
    catchup=False,
    render_template_as_native_obj=False,  # <-- changing only this flips behavior
)
def test_simple():
    PythonVirtualenvOperator(
        task_id="repro",
        python_callable=venv_callable,
        python_version="3.10",
        serializer="cloudpickle",
        op_kwargs={
            "ti": "{{ ti }}",
            "task_instance": "{{ task_instance }}",
        },
        requirements=["apache-airflow==3.1.6"],
        system_site_packages=False,
    )


test_simple()
```
Steps to reproduce
- Run the DAG above with `render_template_as_native_obj=False` → the task succeeds.
- Change only `render_template_as_native_obj=True` → the task fails during pickle/argument serialization.
Error / traceback (render_template_as_native_obj=True)
```
[2026-01-29 23:06:37] INFO - Use 'cloudpickle' as serializer.
[2026-01-29 23:06:37] ERROR - Task failed with exception
PicklingError: Only BytesLoggers to sys.stdout and sys.stderr can be pickled.
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/sdk/execution_time/task_runner.py", line 1004 in run
  ...
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/standard/operators/python.py", line 529 in _write_args
  File "/home/airflow/.local/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1537 in dumps
  ...
  File "/home/airflow/.local/lib/python3.10/site-packages/structlog/_output.py", line 278 in __getstate__
```
Use case/motivation
We use PythonVirtualenvOperator to run code in an isolated environment with separate dependencies, but we often still need basic task-instance context inside the callable (identifiers, try number, sometimes access to basic TI metadata / XCom).
Native template rendering (render_template_as_native_obj=True) is valuable for other templates because it renders lists/dicts/booleans as native Python types, so we don’t want to avoid it globally. The current behavior makes it easy to accidentally pass a “live” non-serializable object and then hit a confusing pickle error.
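For completeness, the workaround we currently use is to template only scalar TI attributes instead of the whole object; these render to plain strings and pickle cleanly even with native rendering. A minimal sketch of such `op_kwargs` (attribute names as commonly available on the task instance):

```python
# Sketch: pass only scalar task-instance fields, not the live object.
# Each template renders to a plain string, which serializes without issue.
op_kwargs = {
    "dag_id": "{{ ti.dag_id }}",
    "run_id": "{{ ti.run_id }}",
    "task_id": "{{ ti.task_id }}",
    "try_number": "{{ ti.try_number }}",
}
```

This works, but it is easy to forget and scatters the same boilerplate across DAGs, which is why a supported handle would help.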
I understand that calling the Airflow REST API directly from the virtualenv is a valid workaround, but it would be very helpful to have an official/supported mechanism for isolated environments:
- a small supported serializable handle/proxy (e.g., `dag_id`, `run_id`, `task_id`, `try_number`, `map_index`, etc.), and
- a minimal supported helper/client (or a limited set of operations) to safely retrieve basic TaskInstance info from within the virtualenv without passing the full object.
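To make the proposal concrete, the handle could be as small as a frozen dataclass. The name and fields below are purely illustrative, not an existing Airflow API:

```python
from dataclasses import dataclass


# Hypothetical serializable handle carrying only scalar identifiers;
# name and field set are illustrative, not an existing Airflow class.
@dataclass(frozen=True)
class TaskInstanceHandle:
    dag_id: str
    run_id: str
    task_id: str
    try_number: int
    map_index: int = -1  # -1 conventionally means "not a mapped task"
```

Because it contains only scalars, such a handle would pickle under any serializer and could be rendered/constructed by the operator itself before launching the subprocess.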
Even if a full client is out of scope, UX would improve a lot with:
- explicit validation plus a clearer error message when `TaskInstance`/`ti` ends up in `op_args`/`op_kwargs`, and/or
- documentation that this case behaves differently (and fails earlier) with native rendering enabled.
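The validation point could be a pre-flight check before the operator writes its args file. A rough sketch of the idea (the function name and message are mine, not the provider's code):

```python
import pickle


def validate_op_kwargs(op_kwargs: dict) -> None:
    """Sketch: fail fast with an actionable message if any op_kwargs value
    cannot be serialized for the virtualenv subprocess (e.g. a live
    TaskInstance rendered under render_template_as_native_obj=True)."""
    for key, value in op_kwargs.items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            raise TypeError(
                f"op_kwargs[{key!r}] of type {type(value).__name__} cannot be "
                "serialized for the virtualenv subprocess; pass scalar fields "
                "such as '{{ ti.task_id }}' instead of the live object."
            ) from exc
```

Even this simple check would turn the current structlog `PicklingError` into an error that names the offending kwarg.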
Environment
- Apache Airflow: 3.1.6
- Provider: apache-airflow-providers-standard (PythonVirtualenvOperator)
- Runner Python: 3.10
- Virtualenv Python: 3.10
- serializer: cloudpickle
Related issues
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct