@martin-g (Member) commented Sep 27, 2021

What is this PR for?

Execute basic/smoke tests on Linux ARM64 at CircleCI.

What type of PR is it?

Improvement

Todos

  • Add more builds? E.g. hadoop3, helium and/or ignite

What is the Jira issue?

How should this be tested?

  • The build should pass on CircleCI

Questions:

  • Do the license files need updating? - NO
  • Are there breaking changes for older versions? - NO
  • Does this need documentation? - Probably we should mention on the website that PRs will be tested on AMD64 at GHA and on ARM64 at CircleCI?

@martin-g (Member Author) commented

Test results can be seen at https://app.circleci.com/pipelines/github/martin-g/zeppelin?branch=zeppelin-5543-use-circle.ci-for-testing-on-linux-arm64 (you probably need to be logged in to CircleCI to see them!).

@zjffdu You will need to enable this repo at CircleCI so that the build there appears as a check for each PR. To do this, log in to CircleCI using Login via GitHub, then navigate to Projects and finally Add project (apache/zeppelin).

Currently the build fails with:

[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.0:compile (default) on project zeppelin-jupyter-interpreter: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:linux-aarch_64:3.3.0
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.3.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR] 
[ERROR]   Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]       mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.3.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR]   Path to dependency: 
[ERROR]   	1) org.apache.zeppelin:zeppelin-jupyter-interpreter:jar:0.11.0-SNAPSHOT
[ERROR]   	2) com.google.protobuf:protoc:exe:linux-aarch_64:3.3.0
[ERROR] 
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR] 
[ERROR] for artifact: 
[ERROR]   org.apache.zeppelin:zeppelin-jupyter-interpreter:jar:0.11.0-SNAPSHOT
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   apache.snapshots (http://repository.apache.org/snapshots, releases=false, snapshots=true),
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :zeppelin-jupyter-interpreter

Exited with code exit status 1

Once #4233 and #4237 are merged the build should pass!
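
Until they are merged, a possible local workaround is to install a newer protoc binary under the 3.3.0 coordinates the build asks for, since newer protoc releases (unlike 3.3.0) do publish linux-aarch_64 binaries on Maven Central. This is only a sketch; the 3.17.3 version below is an assumption, not the fix used in the linked PRs:

PROTOC_VERSION=3.17.3
# Fetch an aarch64 protoc that actually exists on Maven Central.
wget "https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-aarch_64.exe"
# Install it into the local repo under the coordinates the plugin resolves.
mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc \
    -Dversion=3.3.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe \
    -Dfile="protoc-${PROTOC_VERSION}-linux-aarch_64.exe"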

- remove -DskipTests
- delete ~/.m2/repository/org/apache/zeppelin after restoring the cache
- always cache ~/.m2, not only on success. This should improve speed
- extract the JDK version as a parameter. This way it would be easy to add a new job for JDK 11, for example (see the config sketch below)
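
A minimal CircleCI 2.1 sketch of the ideas in the commit list above (the machine image, cache keys and steps are illustrative assumptions, not the actual config of this PR):

version: 2.1

jobs:
  build:
    parameters:
      jdk:
        type: string
        default: "8"
    machine:
      image: ubuntu-2004:202101-01   # assumption: an ARM-capable machine image
    resource_class: arm.medium       # CircleCI's Linux ARM64 resource class
    steps:
      - checkout
      - restore_cache:
          keys:
            - m2-{{ checksum "pom.xml" }}
            - m2-
      # Drop cached Zeppelin artifacts so stale snapshots are not reused.
      - run: rm -rf ~/.m2/repository/org/apache/zeppelin
      - run: sudo apt-get update && sudo apt-get install -y openjdk-<< parameters.jdk >>-jdk
      - run: mvn -B install
      - save_cache:
          key: m2-{{ checksum "pom.xml" }}
          paths:
            - ~/.m2

workflows:
  linux-arm64:
    jobs:
      - build:
          name: jdk8
          jdk: "8"
      - build:
          name: jdk11
          jdk: "11"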
@martin-g (Member Author) commented Sep 28, 2021

The new build at CircleCI failed due to:

INFO [2021-09-28 06:53:04,243] ({main} DownloadUtils.java[runShellCommand]:136) - Starting shell commands: wget https://dlcdn.apache.org//spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz -P /home/circleci/.cache/spark
 WARN [2021-09-28 06:53:04,401] ({main} DownloadUtils.java[download]:113) - Failed to download spark from mirror site, fallback to use apache archive
java.io.IOException: Fail to run shell commands: wget https://dlcdn.apache.org//spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz -P /home/circleci/.cache/spark
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.runShellCommand(DownloadUtils.java:143)
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.download(DownloadUtils.java:110)
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.download(DownloadUtils.java:132)
	at org.apache.zeppelin.interpreter.integration.DownloadUtils.downloadSpark(DownloadUtils.java:58)
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest.setUp(SparkInterpreterLauncherTest.java:55)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ...

Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 59.461 sec <<< FAILURE! - in org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest
testYarnClusterMode_1(org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest)  Time elapsed: 0.642 sec  <<< ERROR!
java.io.IOException: Fail to set additional jars for spark interpreter
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:139)
	at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launchDirectly(StandardInterpreterLauncher.java:77)
	at org.apache.zeppelin.interpreter.launcher.InterpreterLauncher.launch(InterpreterLauncher.java:110)
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncherTest.testYarnClusterMode_1(SparkInterpreterLauncherTest.java:194)

https://dlcdn.apache.org//spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz gives a 404.
Why does it try to download Spark 2.4.4 when the active Maven profile is -Pspark-3.0?

Update: It seems the archive is actually downloaded from the alternative URL:

INFO [2021-09-28 06:53:04,405] ({main} DownloadUtils.java[runShellCommand]:136) - Starting shell commands: wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz -P /home/circleci/.cache/spark
 INFO [2021-09-28 06:53:09,454] ({Thread-2} DownloadUtils.java[run]:166) -   2350K .......... .......... .......... .......... ..........  1%  519K 7m3s
...

It seems the problem is here:

Path scalaFolder = Paths.get(zConf.getZeppelinHome(), "/interpreter/spark/scala-" + scalaVersion);
if (!scalaFolder.toFile().exists()) {
  throw new IOException("spark scala folder " + scalaFolder.toFile() + " doesn't exist");
}

CircleCI allows connecting via SSH to the build node for debugging! I will check the full stack trace and debug the issue!

It is needed by SparkInterpreterLauncherTest to download Apache Spark
@martin-g (Member Author) commented

Bad news: it seems CircleCI requires write permissions to the repo, and Apache Infra does not like this: https://issues.apache.org/jira/browse/INFRA-22367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421230#comment-17421230

Should I redo this PR for TravisCI?

@zjffdu (Contributor) commented Sep 28, 2021

@martin-g We used travis-ci before, but it only provides limited credits for open source projects. In our experience, we ran out of credits very quickly.

@zjffdu (Contributor) commented Sep 28, 2021

> -Pspark-3.0

@martin-g SparkInterpreterLauncherTest is a special case where we hardcode the Spark version it downloads. BTW, -Pspark-3.0 only affects the Zeppelin build and the Spark version used in the unit tests of the spark module. It won't affect the integration tests (e.g. SparkIntegrationTest24, SparkIntegrationTest30, SparkIntegrationTest31). Zeppelin's goal is to allow one instance of Zeppelin to work with multiple versions of Spark, so these integration tests always run no matter which Spark profile is used.

@martin-g (Member Author) commented

The credits at TravisCI are for the whole apache org. But yes, there are a few projects that consume most of them - https://infra-reports.apache.org/cistats/. Those projects have big job matrices there.
At Apache Tomcat and Apache Wicket we have just 3-4 jobs (for all supported branches) and we never had issues with the credits. Only the build wait time is longer than desired.
Another way is to use a scheduled/cron job and run it just once per day/week, but that way there would be a feedback delay.

I will contact CircleCI and ask them whether something could be done about the write access!

@martin-g (Member Author) commented

> SparkInterpreterLauncherTest

circleci@ip-172-28-26-11:~/zeppelin$ ls -la interpreter/
total 19952
drwxrwxr-x  2 circleci circleci     4096 Sep 28 07:48 .
drwxrwxr-x 84 circleci circleci     4096 Sep 28 08:03 ..
-rw-rw-r--  1 circleci circleci 20422403 Sep 28 07:48 zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar

There are no spark-x.y folders in the interpreter/ folder. Any idea why?

@zjffdu (Contributor) commented Sep 28, 2021

> SparkInterpreterLauncherTest
>
> circleci@ip-172-28-26-11:~/zeppelin$ ls -la interpreter/
> total 19952
> drwxrwxr-x  2 circleci circleci     4096 Sep 28 07:48 .
> drwxrwxr-x 84 circleci circleci     4096 Sep 28 08:03 ..
> -rw-rw-r--  1 circleci circleci 20422403 Sep 28 07:48 zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar
>
> There are no spark-x.y folders in the interpreter/ folder. Any idea why?

If you build the spark module, there will be a spark folder under interpreter/. It seems the spark interpreter module is not built. Actually, SparkInterpreterLauncherTest doesn't rely on the spark interpreter module; it lives in the zeppelin-zengine module.
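
In case it helps, a hypothetical invocation that builds just the spark interpreter module plus the modules it depends on (the spark/interpreter module path is an assumption about the repo layout):

# -pl selects the module, -am ("also make") builds its reactor dependencies too.
mvn install -pl spark/interpreter -am -DskipTests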

@martin-g (Member Author) commented

What change should I do at https://github.com/apache/zeppelin/pull/4238/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47R58 to fix it?

@zjffdu (Contributor) commented Sep 28, 2021

> What change should I do at https://github.com/apache/zeppelin/pull/4238/files#diff-78a8a19706dbd2a4425dd72bdab0502ed7a2cef16365ab7030a5a0588927bf47R58 to fix it?

Try to do it in 2 steps (sketched below):

  • mvn install -DskipTests
  • mvn verify
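
A minimal sketch of those two steps (the -Pspark-3.0 flag mirrors the profile mentioned earlier in this thread and is an assumption here):

# Step 1: install all modules into ~/.m2 so later invocations can resolve
# zeppelin-* artifacts even outside the full Maven reactor.
mvn clean install -DskipTests -Pspark-3.0

# Step 2: run the tests against the installed artifacts.
mvn verify -Pspark-3.0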

There is some problem with the initial build of Zeppelin. 'mvn clean install' on fresh checkout does not build due to missing zeppelin-** dependencies from the Maven reactor
EmbeddedMongo does not support Linux ARM64
@martin-g (Member Author) commented

I have tested CircleCI, CirrusCI and Drone.io - all of them ask for write permissions to the GitHub repo, something the Apache Infra team does not allow!

The next options are:

  • TravisCI - the problem with it is that there is a quota of free credits. Still, somehow it works fine for many Apache projects: Ranger, Tomcat, Wicket, Parquet, ...
  • Apache Jenkins with a Linux ARM64 agent. I could help by providing a Linux ARM64 VM for the agent!
  • JFrog Artifactory Pipelines - this is something the Apache Infra team wants to explore more, but I have no experience with it yet.

Any preferences?

@martin-g (Member Author) commented Oct 1, 2021

An update about TravisCI.
As of recently, their ARM64 nodes are absolutely free (0 credits!) - https://blog.travis-ci.com/2021-08-06-oss-equinix :

> This is free to use for OSS (build minute costs zero credits if you run a build over your open source repository) as a part of our Partner Queue Solution.
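
For reference, a minimal .travis.yml sketch targeting that free ARM64 queue (the dist, JDK and script values are illustrative assumptions, not the actual config being revived from branch-0.9):

os: linux
arch: arm64        # partner-queue ARM64 builders, 0 credits for OSS builds
dist: focal
language: java
jdk: openjdk8
script:
  - mvn -B install -DskipTests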

@Reamer (Contributor) commented Oct 4, 2021

At the moment it's a Partner Queue Solution. (https://docs.travis-ci.com/user/billing-overview/#partner-queue-solution)
We can reactivate travis-ci for ARM builds as long as they are free.

@martin-g (Member Author) commented Oct 4, 2021

> We can reactivate travis-ci for ARM builds as long as they are free.

It is free!
I've already started reviving .travis.yml from branch-0.9 (https://github.com/apache/zeppelin/compare/master...martin-g:use-travis-ci-for-linux-arm64?expand=1) and the builds do not consume my credits!

I'm facing some problems with the Conda packages, though:

+conda config --add channels conda-forge
+conda install -q numpy=1.21.2 pandas=1.3.3 matplotlib=3.4.3 pandasql=0.7.3 ipython=7.28.0 jupyter_client=5.3.4 ipykernel=6.4.1 bokeh=2.4.0
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
                                                                    
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:
Specifications:
  - jupyter_client=5.3.4 -> python[version='>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0']
  - pandasql=0.7.3 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0']
Your python: python=3.9

There is only Miniconda3 for Linux AArch64!
And I tried to use the latest versions of the packages at https://conda.anaconda.org/conda-forge/linux-aarch64/.
But some packages have no builds for Python 3.9!
Is there a way to pin Python to 3.8?
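
Pinning the interpreter is just another spec for the solver. A sketch using a fresh environment (hypothetical env name; the package pins repeat the failing command, and per the solver output above pandasql=0.7.3 only advertises py2.7/py3.6 builds, so that pin may still need to be relaxed):

# Create an environment pinned to Python 3.8 so the solver selects
# py38 builds instead of defaulting to Python 3.9.
conda create -n zeppelin-ci -c conda-forge python=3.8 \
    numpy=1.21.2 pandas=1.3.3 matplotlib=3.4.3 pandasql=0.7.3 \
    ipython=7.28.0 jupyter_client=5.3.4 ipykernel=6.4.1 bokeh=2.4.0
conda activate zeppelin-ci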

@martin-g (Member Author) commented Oct 5, 2021

Closing in favour of #4243

martin-g closed this Oct 5, 2021