[ZEPPELIN-6006] Remove command line applications when downloading applications #4746

Reamer · 2024-03-28T09:22:17Z

What is this PR for?

This pull request removes the use of command line applications when downloading Spark, Flink, Hadoop and Livy.
This pull request also updates some Apache Commons libraries, which are primarily required for decompression.
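
For illustration, a minimal sketch of what downloading without an external tool such as curl or wget can look like in plain Java. This is not the code from this pull request; the class name `JavaDownloadSketch` and the method `download` are hypothetical, and only JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Zeppelin: fetches a URL using only the JDK.
final class JavaDownloadSketch {

  /**
   * Copies the content behind sourceUrl to target and returns the number of bytes written.
   * Missing parent directories are created, and an existing target file is replaced.
   */
  static long download(URL sourceUrl, Path target) throws IOException {
    Path parent = target.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    try (InputStream in = sourceUrl.openStream()) {
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

A caller would pass the archive URL and a local path, for example `JavaDownloadSketch.download(url, Paths.get("/tmp/archive.tgz"))`, where the path is only a placeholder.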

What type of PR is it?

Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-6006

How should this be tested?

CI

Questions:

Do the license files need to be updated? Yes
Are there breaking changes for older versions? No
Does this need documentation? No

jongyoul · 2024-03-29T01:32:00Z

I have an idea. As I understand it, build-tools exists so that checkstyle runs properly. So I think we'd better create a new module for your improvement instead of using build-tools. WDYT?

Reamer · 2024-04-02T07:45:09Z

You are right, the build-tools module was set up so that the checkstyle reports work properly and we can distribute the configuration cleanly. (see https://maven.apache.org/plugins/maven-checkstyle-plugin/examples/multi-module-config.html)

I think it is unnecessary to create another module, so I have moved it in there.

It was important to me that the code is contained in a module that is independent of the Zeppelin interpreter modules and the Zeppelin-zengine/zeppelin server.

I can also put the code in a new module. What do you think?

jongyoul · 2024-04-03T07:58:18Z

First of all, I totally agree with you about decoupling the code from zeppelin-interpreter and zengine. BTW, I'm a bit worried that more test-related code will be added to that module later. So even if a new module is not strictly necessary this time, I recommend adding one, such as zeppelin-test, for the future. WDYT?

Reamer · 2024-04-03T09:50:27Z

In the long term, you are probably right. I have moved the code into a new module.

Reamer · 2024-04-09T06:31:26Z

Ready for review.

jongyoul · 2024-04-11T00:56:39Z

zeppelin-test/pom.xml

+      </plugin>
+    </plugins>
+  </build>
+</project>


nit: add a new line at the end of the file.

Thanks for the hint. I often point out this kind of untidiness to my colleagues myself. Adjusted.

jongyoul

LGTM except for a tiny comment.

pan3793 · 2024-04-11T02:14:58Z

zeppelin-test/src/main/java/org/apache/zeppelin/test/DownloadUtils.java

+          .setTaskName("Unarchiv")
+          .setUnit("MiB", 1048576) // setting the progress bar to use MiB as the unit
+          .setStyle(ProgressBarStyle.ASCII)
+          .setUpdateIntervalMillis(1000)


We may want to use a larger interval in the CI console to avoid too many log lines. I suggest making it configurable via an environment variable.

I do not see the use case, but have implemented this via the environment variable PROGRESS_BAR_UPDATE_INTERVAL.
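
A minimal sketch of how such a setting could be read, assuming the value is an interval in milliseconds and that a missing or invalid value falls back to the 1000 ms shown in the diff. Only the variable name PROGRESS_BAR_UPDATE_INTERVAL comes from this thread; the class `ProgressIntervalSketch`, the method name, and the fallback rules are hypothetical.

```java
import java.util.Map;

// Hypothetical helper: resolves the progress bar update interval from an environment map.
final class ProgressIntervalSketch {

  // Assumed default, taken from the setUpdateIntervalMillis(1000) call in the reviewed diff.
  static final int DEFAULT_INTERVAL_MILLIS = 1000;

  /** Returns the configured interval, or the default if the value is unset, blank, non-numeric, or not positive. */
  static int updateIntervalMillis(Map<String, String> env) {
    String raw = env.get("PROGRESS_BAR_UPDATE_INTERVAL");
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_INTERVAL_MILLIS;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : DEFAULT_INTERVAL_MILLIS;
    } catch (NumberFormatException e) {
      return DEFAULT_INTERVAL_MILLIS;
    }
  }
}
```

Passing `System.getenv()` as the map gives the real environment. Taking a map as a parameter keeps the lookup easy to test, and a CI job could export a larger value such as 60000 to reduce log output.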

pan3793 · 2024-04-11T02:17:10Z

zeppelin-test/src/main/java/org/apache/zeppelin/test/DownloadUtils.java

+    }
+
+
+    // download other dependencies for running flink with yarn and hive


I touched this part in fa6e3ee, could you please sync the change?

Many thanks for the hint. I have adjusted the download paths.

pan3793 · 2024-04-11T02:19:17Z

zeppelin-test/src/main/java/org/apache/zeppelin/test/DownloadUtils.java

+   * @return return livyHome
+   * @throws IOException
+   */
+  public static String downloadLivy(String livyVersion) {


Livy has had a Scala suffix since 0.8, see #4678.

As the current master uses 0.7, leaving it as is would also be fine.

I have added methods and related tests to download livy 0.8.0 in the future.

pan3793 · 2024-04-11T02:20:49Z

zeppelin-test/src/test/java/org/apache/zeppelin/test/DownloadUtilsTest.java

+
+  @Test
+  void downloadHadoop() {
+    String hadoopHome = DownloadUtils.downloadHadoop("3.4.0");


It seems this version has not been tested yet. One of the major changes in 3.4 is the switch of the AWS SDK from v1 to v2, which is something of a breaking change.

These are only test methods. It is up to the caller to decide which version is finally downloaded.

pan3793 · 2024-04-11T02:27:45Z

zeppelin-test/src/main/java/org/apache/zeppelin/test/DownloadUtils.java

+public class DownloadUtils {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DownloadUtils.class);
+
+  private static final String MIRROR_URL = "https://www.apache.org/dyn/closer.lua?preferred=true";


If possible, please make it configurable via an environment variable, e.g. APACHE_MIRROR (Spark uses it in some places, including build/mvn).

Good point. I have adapted it.

pan3793 · 2024-04-11T02:30:51Z

zeppelin-test/src/main/java/org/apache/zeppelin/test/DownloadUtils.java

+          throw new IOException("Failed to create directory " + newFile);
+        }
+      } else {
+        // fix for Windows-created archives


What's the exact issue? Do we really support running tests on Windows?

I took the code from here (link below). I don't think it's out of the question to support other operating systems in the future.
However, there is no explanatory comment in their Git repository. I will therefore also remove it.
https://github.com/eugenp/tutorials/blob/c0559cbb6d6c66c3a87898805e28310a02a52458/core-java-modules/core-java-io/src/main/java/com/baeldung/unzip/UnzipFile.java#L23-L27
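
For context, a minimal sketch of the usual `java.util.zip` extraction pattern that this kind of branch belongs to. It is not the code from this pull request; the class `UnzipSketch` is hypothetical. One common reason for creating parent directories in the file branch is that some archives list files without separate directory entries, which is an assumption about the intent of the original comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extracts a ZIP archive using only JDK classes.
final class UnzipSketch {

  static void unzip(Path zipFile, Path targetDir) throws IOException {
    Path root = targetDir.toAbsolutePath().normalize();
    try (InputStream in = Files.newInputStream(zipFile);
         ZipInputStream zis = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zis.getNextEntry()) != null) {
        Path resolved = root.resolve(entry.getName()).normalize();
        // Reject entries such as "../x" that would be written outside the target directory.
        if (!resolved.startsWith(root)) {
          throw new IOException("Entry is outside of the target directory: " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(resolved);
        } else {
          // The archive may contain no explicit directory entries, so create parents on demand.
          Files.createDirectories(resolved.getParent());
          // Reads until the end of the current entry; the stream itself stays open.
          Files.copy(zis, resolved, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }
}
```

Creating the parent directories unconditionally makes the extraction independent of how the archive was produced, so no operating-system-specific comment is needed.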

pan3793 · 2024-04-11T02:33:02Z

zeppelin-test/src/main/java/org/apache/zeppelin/test/DownloadUtils.java

+   * @param hadoopVersion
+   * @return home of Spark installation
+   */
+  public static String downloadSpark(String sparkVersion, String hadoopVersion) {


Maybe we should expose scalaVersion too. Spark 4 is going to be released in mid-2024, and it only supports Scala 2.13.

At the moment I cannot say what the download directory of Spark 4.x looks like. But I have included the Scala version (default null).
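
A minimal sketch of how an optional Scala version could feed into the archive name, assuming the naming used by Spark 3.x binary releases (for example `spark-3.4.2-bin-hadoop3.tgz` and `spark-3.4.2-bin-hadoop3-scala2.13.tgz`). The class and method names are hypothetical, and, as noted above, the layout for Spark 4.x is not known here.

```java
// Hypothetical helper: builds a Spark binary archive name from version parts.
final class SparkArchiveNameSketch {

  /**
   * Returns the archive file name. A null scalaVersion means "no Scala suffix",
   * mirroring the "default null" behaviour described in the reply above.
   */
  static String archiveName(String sparkVersion, String hadoopVersion, String scalaVersion) {
    StringBuilder name = new StringBuilder("spark-")
        .append(sparkVersion)
        .append("-bin-hadoop")
        .append(hadoopVersion);
    if (scalaVersion != null) {
      // Assumed suffix format, based on Spark 3.x release artifacts.
      name.append("-scala").append(scalaVersion);
    }
    return name.append(".tgz").toString();
  }
}
```

Keeping the Scala version nullable lets existing callers stay unchanged while newer callers opt in to a suffixed archive.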

pan3793

LGTM.

One additional suggestion: it is better to upgrade dependencies in a dedicated patch. Some of the version bumps actually fix CVEs, and a dedicated patch is easier to backport and track.

jongyoul · 2024-04-12T02:37:35Z

I approved it but I started CI again as it has 8 failed cases including a known one.

By the way, does anyone have an idea about the known failure?

Reamer · 2024-04-12T05:36:37Z

I will take a look. There is also another failure.

Reamer · 2024-04-12T10:17:04Z

> I approved it but I started CI again as it has 8 failed cases including a known one.
>
> By the way, does anyone have an idea about the known failure?

I have changed a method signature. The tests are now running again.

Reamer · 2024-04-12T10:17:53Z

> One additional suggestion: it is better to upgrade dependencies in a dedicated patch. Some of the version bumps actually fix CVEs, and a dedicated patch is easier to backport and track.

I have opened #4757. Who runs the backport?

Reamer · 2024-04-16T07:45:08Z

New approval required.

Reamer force-pushed the downloadutils branch 2 times, most recently from f8fe349 to 27dfd71 Compare March 28, 2024 10:02

Reamer force-pushed the downloadutils branch from 27dfd71 to 8dcd5ab Compare April 2, 2024 07:10

Reamer force-pushed the downloadutils branch from 8dcd5ab to a6f6072 Compare April 3, 2024 07:55

Reamer force-pushed the downloadutils branch 2 times, most recently from 532fe2a to 5e87104 Compare April 3, 2024 09:48

Reamer requested a review from jongyoul April 5, 2024 09:14

jongyoul reviewed Apr 11, 2024

jongyoul previously approved these changes Apr 11, 2024

pan3793 reviewed Apr 11, 2024

Reamer mentioned this pull request Apr 11, 2024

[ZEPPELIN-5999] Reduce instance objects from Zeppelin #4726

Merged

1 task

Reamer dismissed jongyoul’s stale review via e9859ae April 11, 2024 16:20

Reamer force-pushed the downloadutils branch 2 times, most recently from e9859ae to 354f4ee Compare April 11, 2024 16:21

pan3793 approved these changes Apr 11, 2024

jongyoul previously approved these changes Apr 12, 2024

Reamer dismissed jongyoul’s stale review via 710a0d2 April 12, 2024 08:14

Reamer added 4 commits April 12, 2024 15:54

Move Files with java

4d8ca1f

Use java to download external dependecies

29f1013

Improve code after review

a60f0ef

Correct Mirror-URL and compilation

836c289

Reamer force-pushed the downloadutils branch from 710a0d2 to 836c289 Compare April 12, 2024 13:54

Reamer requested a review from jongyoul April 14, 2024 09:41

jongyoul approved these changes Apr 17, 2024

jongyoul merged commit 67098fd into apache:master Apr 17, 2024
27 of 28 checks passed

Reamer deleted the downloadutils branch April 17, 2024 05:20
