Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-6006] Remove command line applications when downloading applications #4746

Merged
merged 4 commits into from
Apr 17, 2024

Conversation

Reamer
Copy link
Contributor

@Reamer Reamer commented Mar 28, 2024

What is this PR for?

This pull request removes the use of command line applications when downloading Spark, Flink, Hadoop and Livy.
This pull request also updates some Apache Commons libraries, which are primarily required for decompression.

What type of PR is it?

Improvement

What is the Jira issue?

How should this be tested?

  • CI

Questions:

  • Does the license files need to update? Yes
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

@Reamer Reamer force-pushed the downloadutils branch 2 times, most recently from f8fe349 to 27dfd71 Compare March 28, 2024 10:02
@jongyoul
Copy link
Member

I have an idea. In my understanding, build-tools aims to run the check style properly. So I think we'd better make a new module for your improvement, instead of using build-tools. WDYT?

@Reamer
Copy link
Contributor Author

Reamer commented Apr 2, 2024

You are right, the build-tools module was set up so that the checkstyle reports work properly and we can distribute the configuration cleanly. (see https://maven.apache.org/plugins/maven-checkstyle-plugin/examples/multi-module-config.html)

I think it is unnecessary to create another module, so I have moved it in there.

It was important to me that the code is contained in a module that is independent of the Zeppelin interpreter modules and the Zeppelin-zengine/zeppelin server.

I can also put the code in a new module. What do you think?

@jongyoul
Copy link
Member

jongyoul commented Apr 3, 2024

First of all, I totally agree with you that deciding the code from zeppelin-interpreter and zengine. BTW, I'm a bit afraid of adding more test-related code later in the module. So even if it's not necessary to add a new module this time, I recommend adding a new module like zeppelin-test and so on for the future. WDYT?

@Reamer Reamer force-pushed the downloadutils branch 2 times, most recently from 532fe2a to 5e87104 Compare April 3, 2024 09:48
@Reamer
Copy link
Contributor Author

Reamer commented Apr 3, 2024

In the long term, you will probably be right. I have stored the code in a new module.

@Reamer Reamer requested a review from jongyoul April 5, 2024 09:14
@Reamer
Copy link
Contributor Author

Reamer commented Apr 9, 2024

Ready for review.

</plugin>
</plugins>
</build>
</project>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a new line at the end of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint. I often point out this uncleanliness to my colleagues myself. Adjusted.

jongyoul
jongyoul previously approved these changes Apr 11, 2024
Copy link
Member

@jongyoul jongyoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for a tiny comment.

.setTaskName("Unarchiv")
.setUnit("MiB", 1048576) // setting the progress bar to use MiB as the unit
.setStyle(ProgressBarStyle.ASCII)
.setUpdateIntervalMillis(1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to use a large interval in CI console to avoid too many logs. suggest making it configurable via env var

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see the use case, but have implemented this via the environment variable PROGRESS_BAR_UPDATE_INTERVAL.

}


// download other dependencies for running flink with yarn and hive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I touched this part in fa6e3ee, could you please sync the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the hint. I have adjusted the download paths.

* @return return livyHome
* @throws IOException
*/
public static String downloadLivy(String livyVersion) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

livy has scala suffix since 0.8, see #4678

as the current master uses 0.7, leave it also fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added methods and related tests to download livy 0.8.0 in the future.


@Test
void downloadHadoop() {
String hadoopHome = DownloadUtils.downloadHadoop("3.4.0");
Copy link
Member

@pan3793 pan3793 Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems it has not been tested yet. one of the major changes in 3.4 is switching AWS SDK from v1 to v2, that's kind of a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are only test methods. It is up to the caller to decide which version is finally downloaded.

public class DownloadUtils {
private static final Logger LOGGER = LoggerFactory.getLogger(DownloadUtils.class);

private static final String MIRROR_URL = "https://www.apache.org/dyn/closer.lua?preferred=true";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, please make it configure via env var, i.e. APACHE_MIRROR (Spark uses it in some places, including build/mvn)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have adapted it.

throw new IOException("Failed to create directory " + newFile);
}
} else {
// fix for Windows-created archives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the exact issue? do we really support run test on windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the code from here. I don't think it's out of the question to support other operating systems in the future.
However, there is no comment in their Git. I will therefore also remove it.
https://github.com/eugenp/tutorials/blob/c0559cbb6d6c66c3a87898805e28310a02a52458/core-java-modules/core-java-io/src/main/java/com/baeldung/unzip/UnzipFile.java#L23-L27

* @param hadoopVersion
* @return home of Spark installation
*/
public static String downloadSpark(String sparkVersion, String hadoopVersion) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should expose scalaVersion too, Spark 4 is going to release in mid-2024, it only supports Scala 2.13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I cannot say what the download directory of Spark 4.x looks like. But I have included the Scala version (default null).

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

One additional suggestion, better to upgrade dep in the dedicated patch, some of the version bumping actually fixes CVEs, dedicated patch is easy to backport and track.

jongyoul
jongyoul previously approved these changes Apr 12, 2024
@jongyoul
Copy link
Member

I approved it but I started CI again as it has 8 failed cases including a known one.

By the way, Does anyone have an idea for the known failure?

@Reamer
Copy link
Contributor Author

Reamer commented Apr 12, 2024

I will take a look. There is also another failure.

@Reamer
Copy link
Contributor Author

Reamer commented Apr 12, 2024

I approved it but I started CI again as it has 8 failed cases including a known one.

By the way, Does anyone have an idea for the known failure?

I have changed a method signature. The tests are now running again.

@Reamer
Copy link
Contributor Author

Reamer commented Apr 12, 2024

One additional suggestion, better to upgrade dep in the dedicated patch, some of the version bumping actually fixes CVEs, dedicated patch is easy to backport and track.

I have opened #4757. Who runs the backport?

@Reamer
Copy link
Contributor Author

Reamer commented Apr 16, 2024

New approval required.

@jongyoul jongyoul merged commit 67098fd into apache:master Apr 17, 2024
27 of 28 checks passed
@Reamer Reamer deleted the downloadutils branch April 17, 2024 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants