Specifying Compression Options for Files #73

Open
ekohlwey opened this Issue Apr 5, 2012 · 1 comment

@ekohlwey

It would be nice to have a way to specify compression options for input and output files. This can currently be done via the backend properties, but that takes quite a bit of code. I'm thinking of adding a new format such as "gz-compressed-text" or "lzo-compressed-text" that would set the corresponding options for the actual Hadoop command.

@piccolbo

I like the idea, but not as a separate option to mapreduce. I think this belongs in the I/O format subsystem; see, for instance, the streaming.format option to make.input.format, which is a streaming-only option. Does compression apply to any format? If so, I think an additional compression option to make.input.format is warranted (and the same on the output side). For the local backend we would just say that this option has no effect, like the streaming.format option. If you want to send a patch for this, just follow the processing of streaming.format and it will tell you exactly what to do. The only subtlety is that streaming wants streaming-specific options and generic Hadoop options in a certain order that I can never remember. Patches should go to the dev branch, please; I unfortunately had trouble doing this right with Jeffrey's patch.
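
For reference, a rough sketch of the ordering issue mentioned above. Hadoop streaming expects generic options (such as -D property settings) to come before streaming-specific ones (such as -input, -output and -mapper). The Python below is only an illustration of how a compression setting might be turned into arguments; it is not rmr code, and the helper names compression_properties and streaming_args are hypothetical. The property names are the Hadoop 1.x ones (mapred.output.compress and mapred.output.compression.codec); newer Hadoop releases rename them. LZO is left out because its codec lives in a separate library.

```python
# Sketch only: these helpers are hypothetical and not part of rmr.

# Codec classes that ship with Hadoop itself.
CODECS = {
    "gz": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
}


def compression_properties(codec):
    """Return the generic -D options that turn on output compression."""
    if codec is None:
        return []
    props = {
        # Hadoop 1.x property names; later releases use different names.
        "mapred.output.compress": "true",
        "mapred.output.compression.codec": CODECS[codec],
    }
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


def streaming_args(input_path, output_path, mapper, compression=None):
    """Assemble arguments with generic options ahead of streaming options."""
    generic = compression_properties(compression)
    streaming = ["-input", input_path, "-output", output_path, "-mapper", mapper]
    return generic + streaming


print(" ".join(streaming_args("/in", "/out", "cat", compression="gz")))
```

With compression=None the helper emits no -D options at all, which mirrors the suggestion that the option simply has no effect where it does not apply.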
