
[FLINK-4315] Deprecate Hadoop dependent methods in flink-java #2637

Closed
wants to merge 4 commits

Conversation


@kenmy commented Oct 14, 2016

No description provided.

…-scala are marked as deprecated.

This change prepares for moving these methods into flink-hadoop-compatibility in the future.
Contributor

@greghogan left a comment


In addition to deprecating the ExecutionEnvironment methods, should we also now provide the replacement utility structure? This will give users time to migrate to the future API.

@@ -192,6 +192,7 @@ public static ParameterTool fromSystemProperties() {
* @throws IOException If arguments cannot be parsed by {@link GenericOptionsParser}
* @see GenericOptionsParser
*/
@Deprecated
Contributor


Why is this deprecated?

Author


Adding the org.apache.hadoop dependency to flink-java for only one method of this class doesn't look good. The method fromGenericOptionsParser will be moved, together with the other Hadoop-dependent methods, into flink-hadoop-compatibility. This method isn't used inside Flink itself. Anybody who uses it in their code should rewrite it before migrating to the next major version of Flink.
You can see the full version of flink-java without Hadoop here: https://github.com/kenmy/flink/tree/FLINK-4048

Contributor


@greghogan The GenericOptionsParser is a Hadoop utility, so it should not be directly referenced in flink-java.

@fhueske
Contributor

fhueske commented Oct 14, 2016

Hi @kenmy, thanks for your PR. Can you actually merge this PR with your work in PR #2576? We also want to add the alternatives to which users should switch. The docs of the deprecated methods should point to these alternatives.

Later (when hopefully everybody has migrated their code) we would remove the deprecated methods.

Thanks, Fabian

Contributor

@fhueske left a comment


Hi @kenmy, thanks for the update.
I made one suggestion to have static methods to build input formats rather than a new environment. What do you think?

Otherwise the PR looks good.

*
* The environment provides methods to interact with the hadoop cluster (data access).
*/
public final class FlinkHadoopEnvironment {
Contributor


Instead of adding a FlinkHadoopEnvironment, I would add a class HadoopInputs with static methods to create HadoopInputFormats.

This could be used like this:

import static org.apache.flink.hadoopcompatibility.HadoopInputs.createHadoopFileInput;

// ---

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
  createHadoopFileInput(new TextInputFormat(), LongWritable.class, Text.class, textPath));

We might be able to remove the flink-scala dependency if we go for this approach.
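
For illustration, a minimal sketch of what such a HadoopInputs class could look like. The method name and signature follow the proposal above; the JobConf plumbing mirrors the existing ExecutionEnvironment.readHadoopFile methods and is an assumption here, not a final implementation:

import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public final class HadoopInputs {

    // utility class, not meant to be instantiated
    private HadoopInputs() {}

    // Wraps a Hadoop mapred FileInputFormat into a Flink InputFormat that
    // reads Tuple2<K, V> records from the given path.
    public static <K, V> HadoopInputFormat<K, V> createHadoopFileInput(
            FileInputFormat<K, V> mapredInputFormat,
            Class<K> key,
            Class<V> value,
            String inputPath) {

        JobConf jobConf = new JobConf();
        FileInputFormat.addInputPath(jobConf, new Path(inputPath));
        return new HadoopInputFormat<>(mapredInputFormat, key, value, jobConf);
    }
}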

Author


Thanks @fhueske for the review. I agree with you, the name HadoopInputs is more suitable here; I renamed it. Moreover, I extracted paramsFromGenericOptionsParser into a new class HadoopUtils, since HadoopInputs was not a suitable place for it.
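
For reference, a hedged sketch of how paramsFromGenericOptionsParser could look in the new HadoopUtils class, assuming it simply converts the options recognized by GenericOptionsParser into a Flink ParameterTool (the exact option handling here is illustrative):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.cli.Option;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.hadoop.util.GenericOptionsParser;

public final class HadoopUtils {

    private HadoopUtils() {}

    // Parses args with Hadoop's GenericOptionsParser and returns the
    // recognized options as a Flink ParameterTool.
    public static ParameterTool paramsFromGenericOptionsParser(String[] args) throws IOException {
        Option[] options = new GenericOptionsParser(args).getCommandLine().getOptions();
        Map<String, String> map = new HashMap<String, String>();
        for (Option option : options) {
            // options of the form -D key=value; take everything after the first '='
            String[] keyValue = option.getValue().split("=", 2);
            map.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
        }
        return ParameterTool.fromMap(map);
    }
}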

@@ -1211,7 +1210,6 @@ public String toString() {
public String myString;
public Object nothing;
public List<String> countries;
public Writable interfaceTest;
Contributor


Writable just represents an interface in this test. Can you replace it with another interface to maintain the coverage of this test?

Author


Coverage decreased by only one line, but I reverted to the previous version.
I did not expect that an unused variable could affect coverage.

Contributor


The interface here checks Flink's type extraction and handling. For that, the variable does not need to be used, just declared.
However, this functionality is tested in other places as well.
So, +1 for removing this line.
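
To make the pattern concrete, a hypothetical test POJO (names invented for illustration) where the interface-typed field is declared but never used, which is enough to exercise Flink's type extraction:

import java.io.Serializable;
import java.util.List;

public class TypeExtractionTestPojo {
    public String myString;
    public Object nothing;
    public List<String> countries;
    // never assigned or read; merely declaring an interface-typed field
    // forces the TypeExtractor to handle an interface here
    public Serializable interfaceTest;
}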

* Use [[FlinkHadoopEnvironment#getHadoopEnvironment]] to get the correct environment.
*/
@Public
class FlinkHadoopEnvironment(parentEnv: ExecutionEnvironment) {
Contributor


Can you convert the Scala code into a HadoopInputs as well?

Contributor


Basically the same changes as for the Java HadoopInputs.

*
* Use [[FlinkHadoopEnvironment#getHadoopEnvironment]] to get the correct environment.
*/
@Public
Contributor

@fhueske Oct 31, 2016


The connector modules do not have API annotations (`@Public`, `@PublicEvolving`) yet. Please remove all annotations for code in `flink-hadoop-compatibility`.

* The HadoopInputs is the utility class for create {@link HadoopInputFormat}.
*
* Methods:
* createHadoopInput - create {@link org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat} or {@link org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat}
Contributor


Both classnames are the same.

import java.io.IOException;

/**
* The HadoopInputs is the utility class for create {@link HadoopInputFormat}.
Contributor


I would make the purpose of the class a bit more explicit.

HadoopInputs is a utility class to use Apache Hadoop InputFormats with Apache Flink.

It provides methods to create Flink InputFormat wrappers for Hadoop org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat.
Key value pairs produced by the Hadoop InputFormats are converted into Flink Tuple2 objects where the first field (Tuple2.f0) is the key and the second field (Tuple2.f1) is the value.
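
For example, a job using such a wrapper could unpack those Tuple2 records as follows (a sketch; the input DataSet is assumed to be built as in the earlier snippet):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// 'input' is a DataSet<Tuple2<LongWritable, Text>> produced by a wrapped Hadoop InputFormat.
DataSet<String> lines = input.map(
    new MapFunction<Tuple2<LongWritable, Text>, String>() {
        @Override
        public String map(Tuple2<LongWritable, Text> record) {
            // record.f0 is the key (the byte offset), record.f1 is the value (the line)
            return record.f1.toString();
        }
    });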

@@ -42,8 +42,8 @@ class WordCountMapredITCase extends JavaProgramTestBase {
   protected def testProgram() {
     val env = ExecutionEnvironment.getExecutionEnvironment

-    val input =
-      env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text], textPath)
+    val input = FlinkHadoopEnvironment.getHadoopEnvironment(env).
Contributor


Please include the deprecated method in the test as done with the Java tests.

@fhueske
Contributor

fhueske commented Oct 31, 2016

Thanks for the update @kenmy. We are trying to keep the Java and Scala APIs as close as possible. Could you convert the Scala FlinkHadoopEnvironment into a HadoopInputs class as well?

I also noticed that there are quite a few Hadoop-related tests in the flink-tests module. I think it would be good to move the tests from the org.apache.flink.test.hadoop and org.apache.flink.api.scala.hadoop packages of flink-tests to flink-hadoop-compatibility.

In fact, there might be a bit of overlap with other tests in flink-hadoop-compatibility. It would be great if you could check for tests with overlapping test coverage. Then we could drop some of these tests.

Thanks for your work,
Fabian

@kenmy
Author

kenmy commented Nov 2, 2016

Thanks @fhueske for the detailed review.
Everything is done except moving the Hadoop-related tests into flink-hadoop-compatibility.
I'll do that sometime later. IMO it is out of scope for the issue "Deprecate Hadoop dependent methods in flink-java", as is moving them from flink-scala. Maybe it would be better to move the work connected with the Hadoop tests from this "god issue" into a separate issue? Anyway, I am publishing the current state and will wait for any advice on how I can make my PR better.
BR, Evgeny

@fhueske
Contributor

fhueske commented Nov 2, 2016

Thanks for the update @kenmy!
+1 to merge.

Regarding moving the Hadoop tests from flink-tests to flink-hadoop-compatibility I agree. Let's do this as a separate issue. Do you want to create a JIRA issue for that?

Thanks, Fabian

fhueske pushed a commit to fhueske/flink that referenced this pull request Nov 2, 2016
… in ExecutionEnvironment as @deprecated.

- Preparation to remove Hadoop dependency from flink-java
- Alternatives for the deprecated functionality are provided in flink-hadoop-compatibility via HadoopInputs

This closes apache#2637
@fhueske
Contributor

fhueske commented Nov 2, 2016

Merging

@asfgit closed this in 7d61e1f Nov 2, 2016
liuyuzhong pushed a commit to liuyuzhong/flink that referenced this pull request Dec 5, 2016