Skip to content

[AIRFLOW-751 & AIRFLOW-756] Refactor SshOperator and add SftpOperator#1992

Closed
genomics-geek wants to merge 1 commit intoapache:masterfrom
genomics-geek:feature-sftp
Closed

[AIRFLOW-751 & AIRFLOW-756] Refactor SshOperator and add SftpOperator#1992
genomics-geek wants to merge 1 commit intoapache:masterfrom
genomics-geek:feature-sftp

Conversation

@genomics-geek
Copy link
Contributor

@genomics-geek genomics-geek commented Jan 13, 2017

This uses the paramiko library to add the following functionality:

  • execute commands on a remote host (SshOperator)
  • copying files to a remote host (SftpOperator)
  • copying files from a remote host (SftpOperator)

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

@codecov-io
Copy link

codecov-io commented Jan 13, 2017

Current coverage is 67.23% (diff: 100%)

Merging #1992 into master will not change coverage

@@             master      #1992   diff @@
==========================================
  Files           135        135          
  Lines         10366      10366          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           6970       6970          
  Misses         3396       3396          
  Partials          0          0          

Powered by Codecov. Last update 648bd4e...49e5bdd

@bolkedebruin
Copy link
Contributor

Nice! Please provide unit tests / integration tests. Travis has a full functioning ssh server.

@genomics-geek
Copy link
Contributor Author

@jhsenjaliya - I would assume that the SftpHook actually should be named SshHook because it returns SSHClient. So I am up for merging these. What do you think?

@jhsenjaliya
Copy link

yes, you are right, let me share my code regarding the same, and we can collaborate more on how to accommodate both into ssh operator, Thanks !

@genomics-geek
Copy link
Contributor Author

genomics-geek commented Jan 14, 2017

@jhsenjaliya I pushed some changes here. Feel free to add to it. I don't love the way I am handling put/get at the moment, so let me know if you have more thoughts about it as well!

Also, we need to update unit tests / integration tests. Any ideas?

@genomics-geek genomics-geek changed the title [AIRFLOW-751] Added SFTP file transfer functionality [AIRFLOW-756 & AIRFLOW-751] This combines the refactoring of SSH with SFTP functionality. Jan 14, 2017
@genomics-geek genomics-geek changed the title [AIRFLOW-756 & AIRFLOW-751] This combines the refactoring of SSH with SFTP functionality. [AIRFLOW-751 & AIRFLOW-756] This combines the refactoring of SSH with SFTP functionality. Jan 14, 2017
@sdikby
Copy link

sdikby commented Jan 14, 2017

@genomics-geek @jhsenjaliya Is this new implementation of a merged SSHOperator+SftpOperator will be able to open an SSH Tunnel throughout the whole DAG execution without the need to call SSHOperator each time an SSH connection is needed? If not maybe this is also an idea worth to be implemented.
I have tried but without success.

Another thing is this new operator will filter out some directories/files when copying from the server? using some regex expressions on the directory/file names or creation date,...

Overall i think its a good idea to merge the both +1

@genomics-geek genomics-geek changed the title [AIRFLOW-751 & AIRFLOW-756] This combines the refactoring of SSH with SFTP functionality. [AIRFLOW-751 & AIRFLOW-756] Refactor SshOperator and add SftpOperator Jan 14, 2017
@genomics-geek
Copy link
Contributor Author

@jhsenjaliya had the ability to open SSH tunnels before. I believe @jhsenjaliya will add it into this refactoring :). we both were moving to use the paramiko library.

This uses the paramiko library to add the following functtionality:
- execute commands on a remote host
- copying files to a remote host
- copying files from a remote host

Addresses:
https://issues.apache.org/jira/browse/AIRFLOW-751
https://issues.apache.org/jira/browse/AIRFLOW-756
@jhsenjaliya
Copy link

@genomics-geek , I actually have new set of ssh_hook along with ssh_operator, I will look into this to see how it can be merged and I will add commit here.

@sdikby,

  1. I will make sure we continue support the tunnel feature with new ssh_hooks or ssh_operator.
  2. I think its good feature to have regex for filtering while transferring files. I will look into it, Thanks for suggestion.

@genomics-geek
Copy link
Contributor Author

@jhsenjaliya, great! I took a first attempt at it. It's really basic at the moment, but accomplishes get/put and execute command on remote server. I've used it successfully. Feel free to adjust to meet your needs while merging

@jhsenjaliya
Copy link

@genomics-geek, looks like pretty much all these files will be changing due to underlying new ssh_hook and ssh_operator along with some design changes, i will create another PR to be clean. Please help review it : #1999 , i have also added/updated unit tests for all. Thanks!

@jhsenjaliya
Copy link

@sdikby, Please take a look at #1999, I have kept the functionality of ssh tunneling, please check and review if you get chance.
looks like it will be hard to add support for regex via sftp since it does not support it, I am digging into it though.

@genomics-geek
Copy link
Contributor Author

@jhsenjaliya looks awesome. Thanks for this!

@genomics-geek
Copy link
Contributor Author

Closing since #1999 is addressing this. Thanks again @jhsenjaliya !

@genomics-geek genomics-geek deleted the feature-sftp branch January 23, 2017 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants