Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for SFTP as file location #1481

Merged
merged 22 commits into from Jan 3, 2023
Merged

Support for SFTP as file location #1481

merged 22 commits into from Jan 3, 2023

Conversation

rajaths010494
Copy link
Contributor

@rajaths010494 rajaths010494 commented Dec 22, 2022

related-to: #969

Please describe the feature you'd like to see
It is a very common ETL operation to either push data to or load data from FTP sources

Describe the solution you'd like
SFTP should be valid location types in File

Are there any alternatives to this feature?
Implementors otherwise need to use workarounds like S3toSFTPOperator

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each of methods, classes, functions and module-level attributes (including Example DAG on how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How to use Guide for the feature (example)

@codecov
Copy link

codecov bot commented Dec 22, 2022

Codecov Report

Base: 97.43% // Head: 93.84% // Decreases project coverage by -3.58% ⚠️

Coverage data is based on head (4038b6e) compared to base (784223c).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1481      +/-   ##
==========================================
- Coverage   97.43%   93.84%   -3.59%     
==========================================
  Files          21       85      +64     
  Lines         740     4175    +3435     
  Branches        0      416     +416     
==========================================
+ Hits          721     3918    +3197     
- Misses         19      176     +157     
- Partials        0       81      +81     
Impacted Files Coverage Δ
python-sdk/src/astro/constants.py 93.75% <100.00%> (ø)
python-sdk/src/astro/exceptions.py 100.00% <100.00%> (ø)
python-sdk/src/astro/files/base.py 100.00% <100.00%> (ø)
python-sdk/src/astro/files/locations/base.py 76.76% <100.00%> (ø)
python-sdk/src/astro/files/locations/sftp.py 100.00% <100.00%> (ø)
python-sdk/src/astro/sql/operators/load_file.py 96.90% <0.00%> (ø)
python-sdk/src/astro/sql/operators/transform.py 87.09% <0.00%> (ø)
python-sdk/src/astro/custom_backend/serializer.py 54.32% <0.00%> (ø)
python-sdk/src/astro/sql/__init__.py 95.83% <0.00%> (ø)
... and 58 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@rajaths010494 rajaths010494 changed the title SFTP support for location Support for FTP as file location Dec 22, 2022
@rajaths010494 rajaths010494 marked this pull request as ready for review December 22, 2022 07:38
@rajaths010494 rajaths010494 mentioned this pull request Dec 22, 2022
9 tasks
@rajaths010494 rajaths010494 changed the title Support for FTP as file location Support for SFTP as file location Dec 22, 2022
python-sdk/example_dags/example_load_file.py Outdated Show resolved Hide resolved
python-sdk/noxfile.py Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp/sftp.py Outdated Show resolved Hide resolved
@utkarsharma2
Copy link
Collaborator

I think we should also support export_file() operator.

Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajaths010494 do you have plan to add integration tests also?

python-sdk/noxfile.py Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp/sftp.py Outdated Show resolved Hide resolved
.github/workflows/ci-python-sdk.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of common code on this PR and FTP. Do you think we would have a base class for that instead?

.github/workflows/ci-python-sdk.yaml Outdated Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp/sftp.py Outdated Show resolved Hide resolved
@rajaths010494
Copy link
Contributor Author

I think we should also support export_file() operator.

Currently smart_open doesn't support write. SFTP hook supports store_file from local_file to remote_file. For this we need a local_file to be created.
Let me explore some other ways as well.

python-sdk/noxfile.py Show resolved Hide resolved
python-sdk/pyproject.toml Show resolved Hide resolved
python-sdk/tests/files/locations/test_sftp.py Outdated Show resolved Hide resolved
@rajaths010494 rajaths010494 force-pushed the sftp_location branch 2 times, most recently from 75280e3 to b919e00 Compare December 29, 2022 07:26
Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

python-sdk/example_dags/example_load_file.py Show resolved Hide resolved
Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments

python-sdk/src/astro/files/base.py Outdated Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp.py Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp.py Outdated Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp.py Outdated Show resolved Hide resolved
python-sdk/src/astro/files/locations/sftp.py Outdated Show resolved Hide resolved
python-sdk/tests/files/locations/test_sftp.py Outdated Show resolved Hide resolved
python-sdk/tests/files/locations/test_sftp.py Outdated Show resolved Hide resolved
python-sdk/tests/files/locations/test_sftp.py Show resolved Hide resolved
@rajaths010494
Copy link
Contributor Author

Also, the deepsource is failing. It needs a fix. @rajaths010494

fixed it

@rajaths010494
Copy link
Contributor Author

Can you test this PR also on https://pypi.org/project/apache-airflow-providers-sftp/4.2.1rc2/ and make sure all the test are passing?

Tested new RC release on my local. It is working fine.

Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pankajastro
Copy link
Contributor

Awesome, thanks @rajaths010494 for addressing all the feedback.

@pankajastro pankajastro merged commit aaecb75 into main Jan 3, 2023
@pankajastro pankajastro deleted the sftp_location branch January 3, 2023 12:22
utkarsharma2 added a commit that referenced this pull request Jan 17, 2023
related-to: #969

**Please describe the feature you'd like to see**
It is a very common ETL operation to either push data to or load data
from FTP sources

**Describe the solution you'd like**
SFTP should be valid location types in `File`

**Are there any alternatives to this feature?**
Implementors otherwise need to use workarounds like `S3toSFTPOperator`

**Acceptance Criteria**

- [x] All checks and tests in the CI should pass
- [x] Unit tests (90% code coverage or more, [once
available](#191))
- [x] Integration tests (if the feature relates to a new database or
external service)
- [x] Example DAG
- [x] Docstrings in
[reStructuredText](https://peps.python.org/pep-0287/) for each of
methods, classes, functions and module-level attributes (including
Example DAG on how it should be used)
- [x] Exception handling in case of errors
- [x] Logging (are we exposing useful information to the user? e.g.
source and destination)
- [x] Improve the documentation (README, Sphinx, and any other relevant)
- [x] How to use Guide for the feature
([example](https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/operators/postgres_operator_howto_guide.html))

Co-authored-by: Utkarsh Sharma <utkarsharma2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants