[FLINK-1396][FLINK-1303] Hadoop Input/Output directly in API #363
Conversation
Add the following dependency to your `pom.xml` to use the Hadoop Compatibility Layer.
Support for Hadoop Mappers and Reducers is contained in the `flink-addons`
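A minimal `pom.xml` entry for the compatibility layer might look like the following sketch; the `artifactId` and the version property are assumptions based on the module name discussed above, so check them against the Flink release you are actually using:

```xml
<!-- Sketch only: artifactId and version property are assumptions -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility</artifactId>
    <version>${flink.version}</version>
</dependency>
```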
flink-staging is the new flink-addons ;-)
Looks good. We could integrate HadoopIFs even more deeply if we overload the "regular" Flink input functions.
Looks good. I have one suggestion concerning the Hadoop dependencies: we should be able to exclude all transitive dependencies from the Hadoop dependency. Whenever we really execute Hadoop code, the Flink runtime is involved, which then has the necessary dependencies. To exclude all transitive dependencies, use
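The exact snippet from this comment is not preserved here; a sketch of what was presumably meant, using Maven's wildcard exclusion feature (supported since Maven 3.2.1), with a placeholder Hadoop artifact:

```xml
<!-- Sketch: the concrete Hadoop artifact in the comment is not preserved -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <exclusions>
        <!-- Wildcard exclusion drops all transitive dependencies (Maven 3.2.1+) -->
        <exclusion>
            <groupId>*</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```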
I addressed the comments. What do the others think about overloading readFile()? I made it like this on purpose, so that users see in the API that they are using Hadoop input formats, and that those can be used at all.
Hmm, yes. That's also a valid point. I am more leaning towards overloading, but would be fine with having separate functions as well.
I vote for keeping @aljoscha's original approach.
@StephanEwen If I add the exclusions, then users that just add flink-java as a dependency will get weird errors when using Hadoop InputFormats.
Does this occur during local execution, or collection execution? Are the dependencies not covered by the runtime dependencies?
I think when executing in an IDE the dependencies are not there, since flink-java does not depend on flink-runtime, which has the Hadoop dependencies.
Force-pushed from d6bf958 to 0f632e3
Looks good. We are getting into very long package names here ;-)
This adds methods on ExecutionEnvironment for reading with Hadoop Input/OutputFormat. This also adds support in the Scala API for Hadoop Input/OutputFormats.
Force-pushed from 0f632e3 to 8b3805b
I also added tests and updated the documentation.