
[FLINK-4315] Deprecate Hadoop dependent methods in flink-java #2637

Closed
wants to merge 4 commits

Conversation


@kenmy commented Oct 14, 2016

No description provided.

…-scala are marked as deprecated.

This change prepares for moving these methods into flink-hadoop-compatibility in the future.
Contributor

@greghogan left a comment


In addition to deprecating the ExecutionEnvironment methods, should we also now provide the replacement utility structure? This will give users time to migrate to the future API.

@@ -192,6 +192,7 @@ public static ParameterTool fromSystemProperties() {
* @throws IOException If arguments cannot be parsed by {@link GenericOptionsParser}
* @see GenericOptionsParser
*/
@Deprecated
Contributor


Why is this deprecated?

Author


Adding the org.apache.hadoop dependency to flink-java for only one method of this class doesn't look good. The method fromGenericOptionsParser will be moved, together with the other Hadoop-dependent methods, into flink-hadoop-compatibility. This method isn't used inside Flink itself. Anybody who uses it in their code should rewrite it before migrating to the next major version of Flink.
You can see the full version of flink-java without Hadoop here: https://github.com/kenmy/flink/tree/FLINK-4048

Contributor


@greghogan The GenericOptionsParser is a Hadoop utility, so it should not be directly referenced in flink-java.

@fhueske
Contributor

fhueske commented Oct 14, 2016

Hi @kenmy, thanks for your PR. Can you actually merge this PR with your work in PR #2576? We also want to add the alternatives to which users should switch. The docs of the deprecated methods should point to these alternatives.

Later (when hopefully everybody has migrated their code) we would remove the deprecated methods.

Thanks, Fabian

Contributor

@fhueske left a comment


Hi @kenmy, thanks for the update.
I made one suggestion to have static methods to build input formats rather than a new environment. What do you think?

Otherwise the PR looks good.

*
* The environment provides methods to interact with the hadoop cluster (data access).
*/
public final class FlinkHadoopEnvironment {
Contributor


Instead of adding a FlinkHadoopEnvironment, I would add a class HadoopInputs with static methods to create HadoopInputFormats.

This could be used like this:

import static org.apache.flink.hadoopcompatibility.HadoopInputs.createHadoopFileInput;

// ---

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
  createHadoopFileInput(new TextInputFormat(), LongWritable.class, Text.class, textPath));

We might be able to remove the flink-scala dependency if we go for this approach.
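
For illustration, a minimal sketch of what such a HadoopInputs class could look like. The method name and signature follow the proposal above; the JobConf plumbing mirrors the existing ExecutionEnvironment.readHadoopFile methods and is an assumption here, not a final implementation:

import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public final class HadoopInputs {

    // utility class, not meant to be instantiated
    private HadoopInputs() {}

    // Wraps a Hadoop mapred FileInputFormat into a Flink InputFormat that
    // reads Tuple2<K, V> records from the given path.
    public static <K, V> HadoopInputFormat<K, V> createHadoopFileInput(
            FileInputFormat<K, V> mapredInputFormat,
            Class<K> key,
            Class<V> value,
            String inputPath) {

        JobConf jobConf = new JobConf();
        FileInputFormat.addInputPath(jobConf, new Path(inputPath));
        return new HadoopInputFormat<>(mapredInputFormat, key, value, jobConf);
    }
}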

Author


Thanks @fhueske for the review. I agree with you, the name HadoopInputs is more suitable here; I renamed it. Moreover, I extracted paramsFromGenericOptionsParser into a new class HadoopUtils, since HadoopInputs was not a suitable place for it.
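
For reference, a hedged sketch of how paramsFromGenericOptionsParser could look in the new HadoopUtils class, assuming it simply converts the options recognized by GenericOptionsParser into a Flink ParameterTool (the exact option handling here is illustrative):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.cli.Option;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.hadoop.util.GenericOptionsParser;

public final class HadoopUtils {

    private HadoopUtils() {}

    // Parses args with Hadoop's GenericOptionsParser and returns the
    // recognized options as a Flink ParameterTool.
    public static ParameterTool paramsFromGenericOptionsParser(String[] args) throws IOException {
        Option[] options = new GenericOptionsParser(args).getCommandLine().getOptions();
        Map<String, String> map = new HashMap<String, String>();
        for (Option option : options) {
            // options of the form -D key=value; take everything after the first '='
            String[] keyValue = option.getValue().split("=", 2);
            map.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
        }
        return ParameterTool.fromMap(map);
    }
}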

@@ -1211,7 +1210,6 @@ public String toString() {
public String myString;
public Object nothing;
public List<String> countries;
public Writable interfaceTest;
Contributor


Writable just represents an interface in this test. Can you replace it with another interface to maintain the coverage of this test?

Author


Coverage decreased by only one line, but I reverted to the previous version.
I did not expect that an unused variable could affect coverage.

Contributor


The interface here checks Flink's type extraction and handling. For that, the variable does not need to be used, just declared.
However, this functionality is tested in other places as well.
So, +1 for removing this line.
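
To make the pattern concrete, a hypothetical test POJO (names invented for illustration) where the interface-typed field is declared but never used, which is enough to exercise Flink's type extraction:

import java.io.Serializable;
import java.util.List;

public class TypeExtractionTestPojo {
    public String myString;
    public Object nothing;
    public List<String> countries;
    // never assigned or read; merely declaring an interface-typed field
    // forces the TypeExtractor to handle an interface here
    public Serializable interfaceTest;
}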

* Use [[FlinkHadoopEnvironment#getHadoopEnvironment]] to get the correct environment.
*/
@Public
class FlinkHadoopEnvironment(parentEnv: ExecutionEnvironment) {
Contributor


Can you convert the Scala code into a HadoopInputs as well?

Contributor


Basically the same changes as for the Java HadoopInputs.

*
* Use [[FlinkHadoopEnvironment#getHadoopEnvironment]] to get the correct environment.
*/
@Public
Contributor

@fhueske Oct 31, 2016


The connector modules do not have API annotations (`@Public`, `@PublicEvolving`) yet. Please remove all annotations for code in `flink-hadoop-compatibility`.

* The HadoopInputs is the utility class for create {@link HadoopInputFormat}.
*
* Methods:
* createHadoopInput - create {@link org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat} or {@link org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat}
Contributor


Both classnames are the same.

import java.io.IOException;

/**
* The HadoopInputs is the utility class for create {@link HadoopInputFormat}.
Contributor


I would make the purpose of the class a bit more explicit.

HadoopInputs is a utility class to use Apache Hadoop InputFormats with Apache Flink.

It provides methods to create Flink InputFormat wrappers for Hadoop org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat.
Key value pairs produced by the Hadoop InputFormats are converted into Flink Tuple2 objects where the first field (Tuple2.f0) is the key and the second field (Tuple2.f1) is the value.
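
For example, a job using such a wrapper could unpack those Tuple2 records as follows (a sketch; the input DataSet is assumed to be built as in the earlier snippet):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// 'input' is a DataSet<Tuple2<LongWritable, Text>> produced by a wrapped Hadoop InputFormat.
DataSet<String> lines = input.map(
    new MapFunction<Tuple2<LongWritable, Text>, String>() {
        @Override
        public String map(Tuple2<LongWritable, Text> record) {
            // record.f0 is the key (the byte offset), record.f1 is the value (the line)
            return record.f1.toString();
        }
    });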

@@ -42,8 +42,8 @@ class WordCountMapredITCase extends JavaProgramTestBase {
   protected def testProgram() {
     val env = ExecutionEnvironment.getExecutionEnvironment

-    val input =
-      env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text], textPath)
+    val input = FlinkHadoopEnvironment.getHadoopEnvironment(env).
Contributor


Please include the deprecated method in the test as done with the Java tests.

@fhueske
Contributor

fhueske commented Oct 31, 2016

Thanks for the update @kenmy. We are trying to keep the Java and Scala APIs as close as possible. Could you convert the Scala FlinkHadoopEnvironment into a HadoopInputs class as well?

I also noticed that there are quite a few Hadoop-related tests in the flink-tests module. I think it would be good to move the tests from the org.apache.flink.test.hadoop and org.apache.flink.api.scala.hadoop packages of flink-tests to flink-hadoop-compatibility.

In fact, there might be a bit of overlap with other tests in flink-hadoop-compatibility. It would be great if you could check for tests with overlapping test coverage. Then we could drop some of these tests.

Thanks for your work,
Fabian

@kenmy
Author

kenmy commented Nov 2, 2016

Thanks @fhueske for the detailed review.
Everything is done except moving the Hadoop-related tests into flink-hadoop-compatibility.
I'll do that sometime later. IMO it is out of scope for the issue "Deprecate Hadoop dependent methods in flink-java", as is moving them from flink-scala. Maybe it would be better to move the work connected with the Hadoop tests from this "god issue" into a separate issue? Anyway, I am publishing the current state and will wait for any advice on how I can make my PR better.
BR, Evgeny

@fhueske
Contributor

fhueske commented Nov 2, 2016

Thanks for the update @kenmy!
+1 to merge.

Regarding moving the Hadoop tests from flink-tests to flink-hadoop-compatibility I agree. Let's do this as a separate issue. Do you want to create a JIRA issue for that?

Thanks, Fabian

fhueske pushed a commit to fhueske/flink that referenced this pull request Nov 2, 2016
… in ExecutionEnvironment as @deprecated.

- Preparation to remove Hadoop dependency from flink-java
- Alternatives for the deprecated functionality are provided in flink-hadoop-compatibility via HadoopInputs

This closes apache#2637
@fhueske
Contributor

fhueske commented Nov 2, 2016

Merging

@asfgit closed this in 7d61e1f Nov 2, 2016
liuyuzhong pushed a commit to liuyuzhong/flink that referenced this pull request Dec 5, 2016