@martin-g (Member) commented Sep 27, 2021

What is this PR for?

Execute basic/smoke tests on Linux ARM64 at CircleCI.

What type of PR is it?

Improvement

Todos

  • Add more builds? E.g. hadoop3, helium and/or ignite

What is the Jira issue?

How should this be tested?

  • The build should pass on CircleCI

Questions:

  • Do the license files need updating? - NO
  • Are there breaking changes for older versions? - NO
  • Does this need documentation? - Probably we should mention on the website that PRs will be tested on AMD64 at GHA and on ARM64 at CircleCI?

@martin-g (Member Author) commented

Test results can be seen at https://app.circleci.com/pipelines/github/martin-g/zeppelin?branch=zeppelin-5543-use-circle.ci-for-testing-on-linux-arm64 (you probably need to be logged in to CircleCI to see them!).

@zjffdu You will need to enable this repo at CircleCI so that the build there appears as a check for each PR. To do this, log in to CircleCI using Login via GitHub, then navigate to Projects and finally Add project (apache/zeppelin).

Currently the build fails with:

[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.0:compile (default) on project zeppelin-jupyter-interpreter: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:linux-aarch_64:3.3.0
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.3.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR] 
[ERROR]   Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]       mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.3.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR]   Path to dependency: 
[ERROR]   	1) org.apache.zeppelin:zeppelin-jupyter-interpreter:jar:0.11.0-SNAPSHOT
[ERROR]   	2) com.google.protobuf:protoc:exe:linux-aarch_64:3.3.0
[ERROR] 
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR] 
[ERROR] for artifact: 
[ERROR]   org.apache.zeppelin:zeppelin-jupyter-interpreter:jar:0.11.0-SNAPSHOT
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   apache.snapshots (http://repository.apache.org/snapshots, releases=false, snapshots=true),
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :zeppelin-jupyter-interpreter

Exited with code exit status 1

Once #4233 and #4237 are merged the build should pass!
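
Until they are merged, a possible local workaround is to install a newer protoc binary under the 3.3.0 coordinates the build asks for, since newer protoc releases (unlike 3.3.0) do publish linux-aarch_64 binaries on Maven Central. This is only a sketch; the 3.17.3 version below is an assumption, not the fix used in the linked PRs:

PROTOC_VERSION=3.17.3
# Fetch an aarch64 protoc that actually exists on Maven Central.
wget "https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-aarch_64.exe"
# Install it into the local repo under the coordinates the plugin resolves.
mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc \
    -Dversion=3.3.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe \
    -Dfile="protoc-${PROTOC_VERSION}-linux-aarch_64.exe"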

- remove -DskipTests
- delete ~/.m2/repository/org/apache/zeppelin after restoring the cache
- always cache ~/.m2, not only on success. This should improve speed
- extract the JDK version as a parameter. This way it would be easy to add a new job for JDK 11, for example (see the config sketch below)
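
A minimal CircleCI 2.1 sketch of the ideas in the commit list above (the machine image, cache keys and steps are illustrative assumptions, not the actual config of this PR):

version: 2.1

jobs:
  build:
    parameters:
      jdk:
        type: string
        default: "8"
    machine:
      image: ubuntu-2004:202101-01   # assumption: an ARM-capable machine image
    resource_class: arm.medium       # CircleCI's Linux ARM64 resource class
    steps:
      - checkout
      - restore_cache:
          keys:
            - m2-{{ checksum "pom.xml" }}
            - m2-
      # Drop cached Zeppelin artifacts so stale snapshots are not reused.
      - run: rm -rf ~/.m2/repository/org/apache/zeppelin
      - run: sudo apt-get update && sudo apt-get install -y openjdk-<< parameters.jdk >>-jdk
      - run: mvn -B install
      - save_cache:
          key: m2-{{ checksum "pom.xml" }}
          paths:
            - ~/.m2

workflows:
  linux-arm64:
    jobs:
      - build:
          name: jdk8
          jdk: "8"
      - build:
          name: jdk11
          jdk: "11"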
@martin-g (Member Author) commented Sep 28, 2021

The new build at CircleCI failed due to:

INFO [2021-09-28 06:53:04,243] ({main} DownloadUtils.java[runShellCommand]:136) - Starting shell commands: wget https://dlcdn.apache.org//spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz -P /home/circleci/.cache/spark
 WARN [2021-09-28 06:53:04,401] ({main} DownloadUtils.java[download]:113) - Failed to download spark from mirror site, fallback to use apache archive
java.io.IOException: Fail to run shell commands: wget https://dlcdn.apache.org//spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz -P /home/circleci/.cache/spark
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.runShellCommand(DownloadUtils.java:143)
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.download(DownloadUtils.java:110)
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.download(DownloadUtils.java:132)
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.downloadSpark(DownloadUtils.java:58)
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest.setUp(SparkInterpreterLauncherTest.java:55)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ...

Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 59.461 sec <<< FAILURE! - in org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest
testYarnClusterMode_1(org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest)  Time elapsed: 0.642 sec  <<< ERROR!
java.io.IOException: Fail to set additional jars for spark interpreter
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:139)
	at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launchDirectly(StandardInterpreterLauncher.java:77)
	at org.apache.zeppelin.interpreter.launcher.InterpreterLauncher.launch(InterpreterLauncher.java:110)
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest.testYarnClusterMode_1(SparkInterpreterLauncherTest.java:194)

https://dlcdn.apache.org//spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz gives a 404.
Why does it try to download Spark 2.4.4 when the active Maven profile is -Pspark-3.0?

Update: It seems the archive is actually downloaded from the alternative URL:

INFO [2021-09-28 06:53:04,405] ({main} DownloadUtils.java[runShellCommand]:136) - Starting shell commands: wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz -P /home/circleci/.cache/spark
 INFO [2021-09-28 06:53:09,454] ({Thread-2} DownloadUtils.java[run]:166) -   2350K .......... .......... .......... .......... ..........  1%  519K 7m3s
...

It seems the problem is here:

Path scalaFolder = Paths.get(zConf.getZeppelinHome(), "/interpreter/spark/scala-" + scalaVersion);
if (!scalaFolder.toFile().exists()) {
  throw new IOException("spark scala folder " + scalaFolder.toFile() + " doesn't exist");
}

CircleCI allows connecting via SSH to the build node for debugging! I will check the full stack trace and debug the issue!

It is needed by SparkInterpreterLauncherTest to download Apache Spark
@martin-g (Member Author) commented

Bad news: it seems CircleCI requires write permissions to the repo, and Apache Infra does not like this: https://issues.apache.org/jira/browse/INFRA-22367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421230#comment-17421230

Should I redo this PR for TravisCI?

@zjffdu (Contributor) commented Sep 28, 2021

@martin-g We used travis-ci before, but it only provides limited credits for open source projects. In our experience, we ran out of credits very quickly.

@zjffdu (Contributor) commented Sep 28, 2021

> -Pspark-3.0

@martin-g SparkInterpreterLauncherTest is a special case where we hardcode the Spark version it downloads. BTW, -Pspark-3.0 only affects the Zeppelin build and the Spark version used in the unit tests of the spark module. It won't affect the integration tests (e.g. SparkIntegrationTest24, SparkIntegrationTest30, SparkIntegrationTest31). Zeppelin's goal is to allow one instance of Zeppelin to work with multiple versions of Spark, so these integration tests always run no matter which Spark profile is used.

@martin-g (Member Author) commented

The credits at TravisCI are for the whole apache org. But yes, there are a few projects that consume most of them - https://infra-reports.apache.org/cistats/. Those projects have big job matrices there.
At Apache Tomcat and Apache Wicket we have just 3-4 jobs (for all supported branches) and we never had issues with the credits. Only the build wait time is longer than desired.
Another way is to use a scheduled/cron job and run it just once per day/week, but that way there would be a feedback delay.

I will contact CircleCI and ask them whether something could be done about the write access!

@martin-g (Member Author) commented

> SparkInterpreterLauncherTest

circleci@ip-172-28-26-11:~/zeppelin$ ls -la interpreter/
total 19952
drwxrwxr-x  2 circleci circleci     4096 Sep 28 07:48 .
drwxrwxr-x 84 circleci circleci     4096 Sep 28 08:03 ..
-rw-rw-r--  1 circleci circleci 20422403 Sep 28 07:48 zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar

There are no spark-x.y folders in the interpreter/ folder. Any idea why?

@zjffdu (Contributor) commented Sep 28, 2021

> SparkInterpreterLauncherTest
>
> circleci@ip-172-28-26-11:~/zeppelin$ ls -la interpreter/
> total 19952
> drwxrwxr-x  2 circleci circleci     4096 Sep 28 07:48 .
> drwxrwxr-x 84 circleci circleci     4096 Sep 28 08:03 ..
> -rw-rw-r--  1 circleci circleci 20422403 Sep 28 07:48 zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar
>
> There are no spark-x.y folders in the interpreter/ folder. Any idea why?

If you build the spark module, there will be a spark folder under interpreter/. It seems the spark interpreter module is not built. Actually, SparkInterpreterLauncherTest doesn't rely on the spark interpreter module; it lives in the zeppelin-zengine module.
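
In case it helps, a hypothetical invocation that builds just the spark interpreter module plus the modules it depends on (the spark/interpreter module path is an assumption about the repo layout):

# -pl selects the module, -am ("also make") builds its reactor dependencies too.
mvn install -pl spark/interpreter -am -DskipTests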

@martin-g (Member Author) commented

What change should I do at https://github.com/apache/zeppelin/pull/4238/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47R58 to fix it?

@zjffdu (Contributor) commented Sep 28, 2021

> What change should I do at https://github.com/apache/zeppelin/pull/4238/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47R58 to fix it?

Try to do it in 2 steps (sketched below):

  • mvn install -DskipTests
  • mvn verify
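
A minimal sketch of those two steps (the -Pspark-3.0 flag mirrors the profile mentioned earlier in this thread and is an assumption here):

# Step 1: install all modules into ~/.m2 so later invocations can resolve
# zeppelin-* artifacts even outside the full Maven reactor.
mvn clean install -DskipTests -Pspark-3.0

# Step 2: run the tests against the installed artifacts.
mvn verify -Pspark-3.0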

There is some problem with the initial build of Zeppelin. 'mvn clean install' on fresh checkout does not build due to missing zeppelin-** dependencies from the Maven reactor
EmbeddedMongo does not support Linux ARM64
@martin-g (Member Author) commented

I have tested CircleCI, CirrusCI and Drone.io - all of them ask for write permissions to the GitHub repo, something the Apache Infra team does not allow!

The next options are:

  • TravisCI - the problem with it is that there is a quota of free credits. Still, somehow it works fine for many Apache projects: Ranger, Tomcat, Wicket, Parquet, ...
  • Apache Jenkins with a Linux ARM64 agent. I could help by providing a Linux ARM64 VM for the agent!
  • JFrog Artifactory Pipelines - this is something the Apache Infra team wants to explore more, but I have no experience with it yet.

Any preferences?

@martin-g (Member Author) commented Oct 1, 2021

An update about TravisCI.
As of recently, their ARM64 nodes are absolutely free (0 credits!) - https://blog.travis-ci.com/2021-08-06-oss-equinix :

> This is free to use for OSS (build minute costs zero credits if you run a build over your open source repository) as a part of our Partner Queue Solution.
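
For reference, a minimal .travis.yml sketch targeting that free ARM64 queue (the dist, JDK and script values are illustrative assumptions, not the actual config being revived from branch-0.9):

os: linux
arch: arm64        # partner-queue ARM64 builders, 0 credits for OSS builds
dist: focal
language: java
jdk: openjdk8
script:
  - mvn -B install -DskipTests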

@Reamer (Contributor) commented Oct 4, 2021

At the moment it's a Partner Queue Solution. (https://docs.travis-ci.com/user/billing-overview/#partner-queue-solution)
We can reactivate travis-ci for ARM builds as long as they are free.

@martin-g (Member Author) commented Oct 4, 2021

> We can reactivate travis-ci for ARM builds as long as they are free.

It is free!
I've already started reviving .travis.yml from branch-0.9 (https://github.com/apache/zeppelin/compare/master...martin-g:use-travis-ci-for-linux-arm64?expand=1) and the builds do not consume my credits!

I'm facing some problems with the Conda packages, though:

+conda config --add channels conda-forge
+conda install -q numpy=1.21.2 pandas=1.3.3 matplotlib=3.4.3 pandasql=0.7.3 ipython=7.28.0 jupyter_client=5.3.4 ipykernel=6.4.1 bokeh=2.4.0
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
                                                                    
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:
Specifications:
  - jupyter_client=5.3.4 -> python[version='>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0']
  - pandasql=0.7.3 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0']
Your python: python=3.9

There is only Miniconda3 for Linux AArch64!
And I tried to use the latest versions of the packages at https://conda.anaconda.org/conda-forge/linux-aarch64/.
But some packages have no builds for Python 3.9!
Is there a way to pin Python to 3.8?
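
Pinning the interpreter is just another spec for the solver. A sketch using a fresh environment (hypothetical env name; the package pins repeat the failing command, and per the solver output above pandasql=0.7.3 only advertises py2.7/py3.6 builds, so that pin may still need to be relaxed):

# Create an environment pinned to Python 3.8 so the solver selects
# py38 builds instead of defaulting to Python 3.9.
conda create -n zeppelin-ci -c conda-forge python=3.8 \
    numpy=1.21.2 pandas=1.3.3 matplotlib=3.4.3 pandasql=0.7.3 \
    ipython=7.28.0 jupyter_client=5.3.4 ipykernel=6.4.1 bokeh=2.4.0
conda activate zeppelin-ci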

@martin-g (Member Author) commented Oct 5, 2021

Closing in favour of #4243

martin-g closed this Oct 5, 2021