SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API #27328

Closed
jtommi opened this issue Oct 27, 2022 · 14 comments · Fixed by #29068

@jtommi

jtommi commented Oct 27, 2022

Apache Airflow Provider(s)

sftp

Versions of Apache Airflow Providers

apache-airflow-providers-sftp==4.1.0

Apache Airflow version

2.4.2 Python 3.10

Operating System

Debian 11 (Official docker image)

Deployment

Docker-Compose

Deployment details

Base image is apache/airflow:2.4.2-python3.10

What happened

When combining the TaskFlow API and SFTPOperator, the DAG fails to import with an exception that didn't happen with apache-airflow-providers-sftp 4.0.0.

What you think should happen instead

The DAG should work as expected

How to reproduce

import pendulum

from airflow import DAG
from airflow.decorators import task
from airflow.providers.sftp.operators.sftp import SFTPOperator

with DAG(
    "example_sftp",
    schedule="@once",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as dag:

    @task
    def get_file_path():
        return "test.csv"

    local_filepath = get_file_path()

    upload = SFTPOperator(
        task_id=f"upload_file_to_sftp",
        ssh_conn_id="sftp_connection",
        local_filepath=local_filepath,
        remote_filepath="test.csv",
    )

Anything else

[2022-10-27T15:21:38.106+0000] {logging_mixin.py:120} INFO - [2022-10-27T15:21:38.102+0000] {dagbag.py:342} ERROR - Failed to import: /opt/airflow/dags/test.py
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/dagbag.py", line 338, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/airflow/dags/test.py", line 21, in <module>
    upload = SFTPOperator(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 408, in apply_defaults
    result = func(self, **kwargs, default_args=default_args)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/sftp/operators/sftp.py", line 116, in __init__
    if len(self.local_filepath) != len(self.remote_filepath):
TypeError: object of type 'PlainXComArg' has no len()

It looks like the offending code was introduced in commit 5f073e3
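For context, the failing check runs at DAG-parse time, while the TaskFlow return value is still an unresolved PlainXComArg. A minimal sketch (a hypothetical operator, not the provider's exact code) that raises the same TypeError:

```python
from airflow.models.baseoperator import BaseOperator


class BrokenCheckOperator(BaseOperator):
    """Hypothetical sketch of validation done in __init__ (too early)."""

    def __init__(self, *, local_filepath, remote_filepath, **kwargs):
        super().__init__(**kwargs)
        self.local_filepath = local_filepath
        self.remote_filepath = remote_filepath
        # When local_filepath comes from a @task, it is still a PlainXComArg
        # here, so len() raises:
        # TypeError: object of type 'PlainXComArg' has no len()
        if len(self.local_filepath) != len(self.remote_filepath):
            raise ValueError("local_filepath and remote_filepath must have the same length")
```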

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jtommi jtommi added area:providers kind:bug This is a clearly a bug labels Oct 27, 2022
@o-nikolas
Contributor

Thanks for the bug report @jtommi!

I see that you've checked that you're willing to submit a PR, so I have assigned the task to you 😃

@uranusjr
Member

Value checks in operators should be done in execute, not __init__.
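Something along these lines, as a hypothetical sketch rather than the actual provider fix; it assumes the paths are declared as template fields so that XComArg values are already resolved when execute() runs:

```python
from airflow.models.baseoperator import BaseOperator


class SaferTransferOperator(BaseOperator):
    # Fields listed here are rendered (Jinja and XComArg references resolved)
    # right before execute() is called.
    template_fields = ("local_filepath", "remote_filepath")

    def __init__(self, *, local_filepath, remote_filepath, **kwargs):
        super().__init__(**kwargs)
        # __init__ only stores the arguments; no validation at parse time.
        self.local_filepath = local_filepath
        self.remote_filepath = remote_filepath

    def execute(self, context):
        # By now the values are concrete strings (or lists), so the length
        # check is safe to perform.
        local = self.local_filepath if isinstance(self.local_filepath, list) else [self.local_filepath]
        remote = self.remote_filepath if isinstance(self.remote_filepath, list) else [self.remote_filepath]
        if len(local) != len(remote):
            raise ValueError("local_filepath and remote_filepath must have the same length")
        for src, dst in zip(local, remote):
            self.log.info("would transfer %s -> %s", src, dst)
```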

@jtommi
Author

jtommi commented Oct 28, 2022

@uranusjr Thanks for the initial direction on fixing this.

I drafted #27346.
I think the initial bug is fixed, but my test that checks for compatibility is failing.

def test_local_remote_file_paths_compatible_with_taskflow(self, mock_get):

The reason is `expected str, bytes or os.PathLike object, not PlainXComArg` on

local_folder = os.path.dirname(local_filepath)

The thing is that I don't understand why the code worked with version 2 of the provider package (I actually didn't check with version 4, but I didn't get import errors).
The only change from previous versions of the code is that it used to call os.path.dirname(self.local_filepath) and now calls os.path.dirname(local_filepath), but I couldn't find any code that converts a PlainXComArg into a string.
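My best guess (an assumption, not something I verified against the 4.0.0 code): nothing converts the PlainXComArg explicitly. Fields listed in template_fields are rendered right before execute() runs, and that rendering step resolves the XComArg into the upstream task's return value, which is why os.path.dirname only works at runtime and not at parse time. A toy illustration of that assumption:

```python
import os

from airflow.models.baseoperator import BaseOperator


class DirnameOperator(BaseOperator):
    # Because local_filepath is a template field, it is rendered (and any
    # XComArg resolved) before execute() is invoked.
    template_fields = ("local_filepath",)

    def __init__(self, *, local_filepath, **kwargs):
        super().__init__(**kwargs)
        # Still a PlainXComArg here when fed from a @task at parse time.
        self.local_filepath = local_filepath

    def execute(self, context):
        # A plain string by now, so os.path functions are safe.
        local_folder = os.path.dirname(self.local_filepath)
        self.log.info("local folder: %s", local_folder)
```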

@uranusjr
Member

All the checks should be moved to execute anyway, just do that and don’t worry that much about the previous logic.

@jtommi
Author

jtommi commented Oct 31, 2022

The thing is: either I broke (or didn't find) the code that made Taskflow work in version 4.0 of this package, or my test is incorrect.

@potiuk
Member

potiuk commented Dec 4, 2022

This should also be fixed regardless of the SFTP provider version by #27251

@potiuk potiuk closed this as completed Dec 4, 2022
@jtommi
Author

jtommi commented Dec 5, 2022

This should also be fixed regardless of the SFTP provider version by #27251

Great, I'll subscribe to #27251 and verify that it fixes the issue.
Thanks @potiuk for letting me know

@potiuk
Member

potiuk commented Dec 5, 2022

Yeah. It's been released in 2.5.0 so just upgrading to 2.5.0 should be enough to test it.

@jtommi
Author

jtommi commented Jan 16, 2023

@potiuk I finally upgraded to 2.5.0 (-python3.10) and the issue persists.
The version of apache-airflow-providers-sftp in that image is 4.2.0
I also tried apache-airflow-providers-sftp 4.2.1, but still the same issue.

Pinning apache-airflow-providers-sftp to 4.0.0 still fixes the issue

@potiuk potiuk reopened this Jan 16, 2023
@potiuk
Member

potiuk commented Jan 16, 2023

Yep. It looks like this change #26666 broke it (the number indicates it's a bit of a hellish change). The checks are in the constructor but should be in the execute() method.

@potiuk
Member

potiuk commented Jan 16, 2023

Would you like to continue fixing it, @jtommi?

And @uranusjr - we both approved the change. I am starting to think that we should attempt some automated prevention of similar issues - it is all too easy (and seemingly obvious) to perform such validations and conversions in the constructor rather than in the execute method. And apparently it is easy to get past the approval of both of us, so possibly that's a sign we should summon our CI to prevent such things. Though I am not sure yet how to do it.

@uranusjr
Member

Personally I think the eventual goal would be to ban any logic in __init__ for custom operators since all logic should be in execute anyway, and attribute assignment can be trivially covered by @dataclasses.dataclass or @attr.define. Would that be a viable goal? Maybe we could invest in changing most operators to use one of those two, and write a pre-commit hook to ban implementing __init__ in custom operators (with limited exceptions of course).
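A rough sketch of what such a pre-commit hook could look like, using only the standard ast module and the (assumed) convention that operator class names end in Operator; it flags any statement in __init__ other than docstrings, plain assignments, pass, and expression statements such as the super().__init__() call:

```python
import ast
import sys

# Statement types tolerated inside an operator's __init__: assignments,
# annotated assignments, bare expressions (docstring, super().__init__()
# call) and pass.
ALLOWED = (ast.Assign, ast.AnnAssign, ast.Expr, ast.Pass)


def offending_statements(source: str, filename: str):
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.ClassDef) and node.name.endswith("Operator")):
            continue
        for func in node.body:
            if isinstance(func, ast.FunctionDef) and func.name == "__init__":
                for stmt in func.body:
                    if not isinstance(stmt, ALLOWED):
                        yield filename, stmt.lineno, type(stmt).__name__


if __name__ == "__main__":
    # pre-commit passes the staged file names as arguments.
    rc = 0
    for path in sys.argv[1:]:
        with open(path) as fh:
            for fname, lineno, kind in offending_statements(fh.read(), path):
                print(f"{fname}:{lineno}: {kind} statement in an Operator __init__")
                rc = 1
    sys.exit(rc)
```

It would obviously need an allow-list for legitimate exceptions, but as a heuristic it would have caught the len() check that triggered this issue.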

@potiuk
Copy link
Member

potiuk commented Jan 18, 2023

Personally I think the eventual goal would be to ban any logic in __init__ for custom operators since all logic should be in execute anyway, and attribute assignment can be trivially covered by @dataclasses.dataclass or @attr.define. Would that be a viable goal? Maybe we could invest in changing most operators to use one of those two, and write a pre-commit hook to ban implementing __init__ in custom operators (with limited exceptions of course).

I like this. Though it will take quite some time I am afraid - unless we can do most of it semi-automatically (which I think might be quite possible).

potiuk added a commit to potiuk/airflow that referenced this issue Jan 20, 2023
The SFTP operator had logic in `__init__` that was running checks
on passed arguments, but that means that the operator would fail
when using TaskFlow.

Fixes: apache#27328
@potiuk
Member

potiuk commented Jan 20, 2023

I like this. Though it will take quite some time I am afraid - unless we can do most of it semi-automatically (which I think might be quite possible).

Created #29069

potiuk added a commit that referenced this issue Jan 20, 2023
The SFTP operator had logic in `__init__` that was running checks
on passed arguments, but that means that the operator would fail
when using TaskFlow.

Fixes: #27328
maggesssss pushed a commit to maggesssss/airflow that referenced this issue Jan 21, 2023
The SFTP operator had logic in `__init__` that was running checks
on passed arguments, but that means that the operator would fail
when using TaskFlow.

Fixes: apache#27328