CLI to transfer between cloud storage and HDFS #1896

Closed
heuermh opened this Issue Feb 1, 2018 · 2 comments

heuermh commented Feb 1, 2018

Was thinking that it might be useful to have a command line class in ADAM (or a transfer-only code path in existing commands) based on or adapted from Conductor that transfers between cloud storage and HDFS. Conductor is Apache 2 licensed but probably isn't written for use as a library. We could take advantage of our encrypted-HDFS-sensitive file copy mechanism for merging (e.g. -disable_fast_concat) and fix the upload issue for ADAM Parquet+Avro+metadata directories.

fnothaft commented Feb 1, 2018

I'm not against this, but it's hard to do in a platform-agnostic way, and it often requires depending on cloud vendor SDKs (which have lots of messy version compatibility issues and are often patched together in vendor distros). Also, the Hadoop FileSystem class lets implementations leave many methods unimplemented, which makes things worse still (there's an unfortunate backstory as to why I am mentioning this). My preference would be some sort of pluggable mechanism for cloud-capable concatters, but that's also a bit of a mess.
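
To make the pluggable idea concrete, here is a minimal sketch of what such a plug-in point might look like. It only assumes that a concatter can be selected by the URI scheme of the destination. The names `Concatter`, `LocalCopyConcatter`, and `ConcatterRegistry` are hypothetical and are not part of ADAM or Hadoop. The only implementation shown is a plain byte-copy fallback for local files, so no vendor SDK is involved.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plug-in point: one implementation per storage backend. */
interface Concatter {
    /** Concatenates the parts, in order, into dest. */
    void concat(List<URI> parts, URI dest) throws IOException;
}

/** Fallback that copies bytes part by part; handles file: URIs only. */
final class LocalCopyConcatter implements Concatter {
    @Override
    public void concat(List<URI> parts, URI dest) throws IOException {
        try (OutputStream out = Files.newOutputStream(Paths.get(dest),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (URI part : parts) {
                // Append the full contents of each part to the destination.
                Files.copy(Paths.get(part), out);
            }
        }
    }
}

/** Hypothetical registry that picks a Concatter by URI scheme. */
final class ConcatterRegistry {
    private final Map<String, Concatter> byScheme = new HashMap<>();

    void register(String scheme, Concatter concatter) {
        byScheme.put(scheme, concatter);
    }

    Concatter forUri(URI uri) {
        Concatter concatter = byScheme.get(uri.getScheme());
        if (concatter == null) {
            // Fail loudly rather than silently falling back to a slow or unsupported path.
            throw new UnsupportedOperationException(
                "No concatter registered for scheme: " + uri.getScheme());
        }
        return concatter;
    }
}
```

In a design like this, a cloud-specific concatter would live in its own module with its own SDK dependency and be registered under its scheme, which keeps vendor dependencies out of the core. It does not remove the version compatibility problem; it only isolates it.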

heuermh commented Feb 15, 2018

Closing as WontFix

@heuermh heuermh closed this Feb 15, 2018

@heuermh heuermh added this to the 0.24.0 milestone Mar 1, 2018
