Conversation

@rmetzger
Contributor

This is needed because the Hadoop InputFormats/OutputFormats use Hadoop's FileSystem stack, which in turn uses the security credentials passed in via the JobConf / Job class in the getSplits() method.

Note that access to secured Hadoop 1.x using Hadoop IF/OF's is not possible with this change. This limitation is due to missing methods in the old APIs.

I've also updated the version of the "de.javakaffee.kryo-serializers" dependency from 0.27 to 0.36, because a user on the mailing list recently needed a specific Kryo serializer that was not available in the old version.

For the Java and Scala APIs, I renamed the first argument: readHadoopFile(org.apache.hadoop.mapreduce.lib.input.FileInputFormat&lt;K,V&gt; mapreduceInputFormat, Class&lt;K&gt; key, Class&lt;V&gt; value, String inputPath, Job job)

This makes it easier to distinguish between the mapreduce and the mapred variants in IDE completions. (Before, the argument was always called mapredInputFormat; now we have the mapreduceInputFormat variant where applicable.)
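The credential hand-off described above can be sketched with self-contained stand-in classes (these are not the real Hadoop types — the actual code uses org.apache.hadoop.mapreduce.Job and its Credentials object; the class and method names below are purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.hadoop.mapreduce.Job: it carries the security
// credentials alongside the job configuration, as described above.
class FakeJob {
    private final List<String> credentials = new ArrayList<>();

    void addToken(String token) {
        credentials.add(token);
    }

    List<String> getCredentials() {
        return credentials;
    }
}

public class SplitsSketch {
    // Stand-in for getSplits(): the FileSystem access it performs against a
    // secured cluster only works when credentials were attached to the job.
    static int getSplits(FakeJob job, int numBlocks) {
        if (job.getCredentials().isEmpty()) {
            throw new IllegalStateException("no security credentials in job");
        }
        return numBlocks; // one split per block, for the sketch
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob();
        job.addToken("hypothetical-kerberos-token"); // hypothetical token value
        System.out.println(getSplits(job, 4));
    }
}
```

Without the addToken call, getSplits would fail — which is the symptom this change fixes for the real wrappers.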

@StephanEwen
Contributor

Looks good. The HadoopFormatBase and similar classes could use a line or two more of comments, but otherwise this seems fine.

Any way to test this? There does not seem to be any test for the format wrappers yet...

@rmetzger
Contributor Author

I actually think that there is no need for the HadoopInputFormatBases to exist.
There are two implementations and two bases for mapred and mapreduce, but they have nothing in common.

There are some tests for the non secure case in org.apache.flink.test.hadoop.

@rmetzger
Contributor Author

There might actually be a way of testing against a secured cluster: https://issues.apache.org/jira/browse/HADOOP-9848 / https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-minikdc/src/main/java/org/apache/hadoop/minikdc/MiniKdc.java
This seems to be available since Hadoop 2.3.0.
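A test against MiniKdc might start out roughly like this — a sketch only, assuming the hadoop-minikdc test artifact on the classpath; it is not part of this PR, and the file paths and principal name are made up:

```java
import java.io.File;
import java.util.Properties;
import org.apache.hadoop.minikdc.MiniKdc;

// Sketch of a secured-cluster test setup using Hadoop's MiniKdc
// (the hadoop-minikdc artifact, available since Hadoop 2.3.0).
public class SecureHadoopTestSketch {
    public static void main(String[] args) throws Exception {
        File workDir = new File("target/minikdc");   // hypothetical directory
        Properties conf = MiniKdc.createConf();
        MiniKdc kdc = new MiniKdc(conf, workDir);
        kdc.start();
        try {
            // Create a principal and keytab for the test user; a real test
            // would then log in via UserGroupInformation and exercise the
            // Hadoop IF/OF wrappers against a secured mini cluster.
            File keytab = new File(workDir, "test.keytab");
            kdc.createPrincipal(keytab, "testuser");  // hypothetical principal
            System.out.println("KDC running, realm: " + kdc.getRealm());
        } finally {
            kdc.stop();
        }
    }
}
```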

@aljoscha
Contributor

The Bases exist because there is a Java-specific and a Scala-specific version of each HadoopInputFormat.

@rmetzger
Contributor Author

Okay, that makes sense.
I'll add some comments to the classes.

…utput format wrappers

This is needed because the Hadoop IF/OF's are using Hadoop's FileSystem stack, which is using
the security credentials passed in the JobConf / Job class in the getSplits() method.

Note that access to secured Hadoop 1.x using Hadoop IF/OF's is not possible with this change.
This limitation is due to missing methods in the old APIs.
@rmetzger
Contributor Author

@mxm: I removed the comment.

@mxm
Contributor

mxm commented Aug 25, 2015

It would be great if we implemented a test case against the MiniKDC server.

@rmetzger
Contributor Author

I agree. Let's file a JIRA and do it separately, as this is probably a bigger task.

@mxm
Contributor

mxm commented Aug 25, 2015

I've opened another issue for that: https://issues.apache.org/jira/browse/FLINK-2573

Contributor

whitespace

@uce
Contributor

uce commented Aug 26, 2015

I'll address my trivial comment and merge this. Thanks!

@rmetzger
Contributor Author

Thanks a lot!

asfgit pushed a commit that referenced this pull request Aug 26, 2015
…utput format wrappers

This is needed because the Hadoop IF/OF's are using Hadoop's FileSystem stack, which is using
the security credentials passed in the JobConf / Job class in the getSplits() method.

Note that access to secured Hadoop 1.x using Hadoop IF/OF's is not possible with this change.
This limitation is due to missing methods in the old APIs.

- Add some comments & change dependency scope to test

This closes #1038.
@rmetzger
Contributor Author

I'm manually closing this pull request. It has been merged by @uce.

@rmetzger rmetzger closed this Aug 27, 2015
Shiti pushed a commit to Shiti/flink that referenced this pull request Nov 5, 2015