Error Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding on AffineExport on h5 file #8

Closed
boazmohar opened this issue Apr 21, 2022 · 14 comments

Comments

@boazmohar

Hi @StephanPreibisch,

I am trying to run an AffineExport with Spark:

~/spark-janelia/flintstone.sh 4 \
/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-0.0.2-SNAPSHOT.jar \
net.preibisch.bigstitcher.spark.AffineFusion \
-x '/groups/mousebrainmicro/mousebrainmicro/data/Lightsheet/20210812_AG/ML_Rendering-test/aligned_data.xml' \
-o  '/nrs/svoboda/moharb/test_ML.n5' -d '/s0' 

And get this error:

2022-04-21 15:45:37,731 [task-result-getter-0] ERROR [TaskSetManager]: Task 1 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 78, 10.36.107.42, executor 0): java.lang.NoClassDefFoundError: Could not initialize class ch.systemsx.cisd.hdf5.CharacterEncoding
	at ch.systemsx.cisd.hdf5.HDF5BaseReader.<init>(HDF5BaseReader.java:143)
	at ch.systemsx.cisd.hdf5.HDF5BaseReader.<init>(HDF5BaseReader.java:126)
	at ch.systemsx.cisd.hdf5.HDF5ReaderConfigurator.reader(HDF5ReaderConfigurator.java:86)
	at ch.systemsx.cisd.hdf5.HDF5FactoryProvider$HDF5Factory.openForReading(HDF5FactoryProvider.java:54)
	at ch.systemsx.cisd.hdf5.HDF5Factory.openForReading(HDF5Factory.java:55)
	at bdv.img.hdf5.Hdf5ImageLoader.open(Hdf5ImageLoader.java:183)
	at bdv.img.hdf5.Hdf5ImageLoader.getSetupImgLoader(Hdf5ImageLoader.java:381)
	at bdv.img.hdf5.Hdf5ImageLoader.getSetupImgLoader(Hdf5ImageLoader.java:79)
	at net.preibisch.bigstitcher.spark.util.ViewUtil.getTransformedBoundingBox(ViewUtil.java:32)
	at net.preibisch.bigstitcher.spark.AffineFusion.lambda$call$7b7a6284$1(AffineFusion.java:268)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:351)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:351)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:986)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:986)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I can open it in Fiji and look at the data with BigStitcher without an issue.
The xml is in:
/groups/mousebrainmicro/mousebrainmicro/data/Lightsheet/20210812_AG/ML_Rendering-test/aligned_data.xml
Any idea what to do?
Found this, might be related.

Thanks,
Boaz

@carshadi

carshadi commented Apr 22, 2022

Hi @boazmohar and @StephanPreibisch, I'm getting the same error on a SLURM cluster running a standalone Spark cluster.

java info:

openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Zulu 8.62.0.19-CA-linux64) (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Zulu 8.62.0.19-CA-linux64) (build 25.332-b09, mixed mode)

mvn:

Maven home: /home/cameron.arshadi/opt/apache-maven-3.8.5
Java version: 1.8.0_332, vendor: Azul Systems, Inc., runtime: /allen/scratch/aindtemp/cameron.arshadi/tools/jvm/zulu8.62.0.19-ca-jdk8.0.332-linux_x64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-1160.15.2.el7.x86_64", arch: "amd64", family: "unix"

submit command:

spark-submit --master ${MASTER_URL} \
             --total-executor-cores $((SLURM_NTASKS * SLURM_CPUS_PER_TASK)) \
             --class net.preibisch.bigstitcher.spark.AffineFusion \
             --deploy-mode client \
             --verbose \
             --conf spark.executor.instances=${SLURM_NTASKS_PER_NODE} \
             --conf spark.executor.cores=${SLURM_CPUS_PER_TASK} \
             --conf spark.executor.memory=${SPARK_MEM} \
             --conf spark.default.parallelism=${PARALLELISM} \
             /allen/scratch/aindtemp/cameron.arshadi/tools/jars/BigStitcher-Spark-0.0.2-SNAPSHOT.jar \
             -x "/allen/scratch/aindtemp/data/anatomy/exm-hemi-brain/aligned_data.xml" \
             -o "/allen/scratch/aindtemp/data/anatomy/exm-hemi-brain-fused.n5" \
             -d "/ch0/s0" \
             --blockSize "256,256,256" \
             --preserveAnisotropy \
             --UINT16 \
             --minIntensity 0.0 \
             --maxIntensity 65535.0 \
             --channelId 0

Using spark-3.2.1

This didn't happen when running locally with --master local[32]

@StephanPreibisch
Contributor

@trautmane, do you have some time to look at that?

@StephanPreibisch
Contributor

@mkitti - that sounds familiar, did we discuss this? Is it the problem where HDF5 creates a local tmp directory?

@mkitti
Contributor

mkitti commented Apr 26, 2022

I will check the pom this afternoon.

@StephanPreibisch
Contributor

thanks @mkitti!

@mkitti
Contributor

mkitti commented Apr 26, 2022

ch.systemsx.cisd.hdf5.CharacterEncoding definitely does exist:
https://sissource.ethz.ch/sispub/jhdf5/-/blob/master/source/java/ch/systemsx/cisd/hdf5/CharacterEncoding.java

@mkitti
Contributor

mkitti commented Apr 26, 2022

The reported line number is slightly off. CharacterEncoding should be on line 141
https://sissource.ethz.ch/sispub/jhdf5/-/blob/master/source/java/ch/systemsx/cisd/hdf5/HDF5BaseReader.java#L141

@mkitti
Contributor

mkitti commented Apr 26, 2022

We may need to take a close look at your classpaths. Also, are either of you running on Debian or Ubuntu? Is it possible that you have an old version of the libsis-jhdf5-java Debian package installed and present on your default classpath?

@mkitti
Contributor

mkitti commented Apr 26, 2022

The current pom actually imports jhdf5 14.12.6. The above source links are for 19.04.

@mkitti
Contributor

mkitti commented Apr 26, 2022

Line 143 lines up with older jhdf5 source at https://svnsis.ethz.ch/repos/cisd/jhdf5/trunk/source/java/ch/systemsx/cisd/hdf5/HDF5BaseReader.java
https://svnsis.ethz.ch/repos/cisd/jhdf5/tags/release/14.12.x/14.12.6/jhdf5/source/java/ch/systemsx/cisd/hdf5/HDF5BaseReader.java

        this.encodingForNewDataSets =
                useUTF8CharEncoding ? CharacterEncoding.UTF8 : CharacterEncoding.ASCII;
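
As general JVM behavior, `NoClassDefFoundError: Could not initialize class ...` is reported when the static initializer of that class already failed earlier in the same JVM. The original cause is thrown only on the first attempt, typically as an `ExceptionInInitializerError` or directly as an `Error` such as `UnsatisfiedLinkError`. A minimal sketch for finding that first failure in executor logs follows; the function name is hypothetical, and `$SPARK_HOME/work` is only the assumed default log location of a standalone Spark worker.

```shell
# Print the first initializer-related failure found in each log file under
# a directory, followed by some context lines (stack trace, "Caused by:").
first_init_errors() {
  log_dir=$1
  # -r recurse, -n show line numbers, -E extended regex,
  # -m 1 stop after the first match per file, -A 15 print trailing context.
  grep -r -n -E -m 1 -A 15 \
    'ExceptionInInitializerError|UnsatisfiedLinkError' "$log_dir"
}

# Example call (assumed default location for a standalone worker):
# first_init_errors "$SPARK_HOME/work"
```

If nothing matches, `grep` exits with a non-zero status and prints nothing, which suggests the first failure was logged elsewhere or under a different exception type.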

@carshadi

Hi @mkitti,

echo $CLASSPATH returns an empty string for me.

cat /etc/os-release

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

ldconfig -p | grep libsis-jhdf5-java returns nothing on the cluster login node
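
An empty `$CLASSPATH` and an empty `ldconfig` result do not cover JAR files that Spark adds on its own, and `ldconfig` lists shared libraries rather than JARs. A rough sketch for spotting stray jhdf5 JARs by file name is shown here. The function name is hypothetical, and `$SPARK_HOME/jars` is an assumption about where a standard Spark distribution keeps the JARs it puts on the classpath.

```shell
# List JAR files whose names mention "hdf5" under the given directories.
# Directories that do not exist are skipped silently.
list_hdf5_jars() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      find "$dir" -type f -name '*hdf5*.jar'
    fi
  done
  return 0
}

# Example call (assumed locations, adjust to your cluster):
# list_hdf5_jars "$SPARK_HOME/jars" /usr/share/java
```

This only matches on file names, so it will not detect jhdf5 classes bundled inside a fat JAR under a different name.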

@boazmohar
Author

boazmohar commented Apr 26, 2022

This is @mkitti with @boazmohar:
The problem is, as @trautmane found before, that [lib]jhdf5.so gets extracted to a common temporary directory when parallel jobs are run. Multiple workers may try to extract the native shared library to the same directory at the same time and interfere with each other.

Per https://unlimited.ethz.ch/display/JHDF/JHDF5+FAQ#JHDF5FAQ-Whataretheoptionstoprovidethenativelibraries? we can provide a JVM option to point Java to a pre-extracted location of the file.

In @boazmohar's case, we prepended SUBMIT_ARGS="--conf spark.executor.extraJavaOptions=-Dnative.libpath.jhdf5=/groups/spruston/home/moharb/libjhdf5.so", which fixes this issue.

We extracted libjhdf5.so from native/jhdf5/amd64-Linux inside the jhdf5 JAR file, which you can open as a zip file.
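
The two steps described here, extracting the library once and pointing the executors at it, can be sketched as follows. The function names are hypothetical, the entry path is the one named in this comment written with the forward slashes that ZIP entries use, and `unzip` is assumed to be installed.

```shell
# Entry path inside the jhdf5 JAR for 64-bit Linux, as named in this thread.
JHDF5_ENTRY='native/jhdf5/amd64-Linux/libjhdf5.so'

# Extract the native library once, before the job starts, into a directory
# that every executor can read (for example on a shared filesystem).
# Prints the path of the extracted file.
extract_jhdf5() {
  jar_file=$1
  dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  # -j drops the directory part of the entry, -o overwrites, -q is quiet.
  unzip -j -o -q "$jar_file" "$JHDF5_ENTRY" -d "$dest_dir" || return 1
  printf '%s\n' "$dest_dir/libjhdf5.so"
}

# Build the Spark setting that points executors at the pre-extracted file.
jhdf5_executor_conf() {
  printf '%s\n' "spark.executor.extraJavaOptions=-Dnative.libpath.jhdf5=$1"
}

# Example (paths are placeholders):
# lib=$(extract_jhdf5 /path/to/jhdf5.jar /shared/tools/lib)
# spark-submit --conf "$(jhdf5_executor_conf "$lib")" ...
```

Note that `spark.executor.extraJavaOptions` is a single setting, so any other executor JVM options need to go into the same value, separated by spaces.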

@carshadi

Confirming the above also works on my end:

--conf "spark.executor.extraJavaOptions=-Dnative.libpath.jhdf5=/allen/scratch/aindtemp/cameron.arshadi/tools/lib/libjhdf5.so" 

@mkitti
Contributor

mkitti commented May 23, 2022

It may be useful to consider using native.caching.libpath here. If the jhdf5 library does not exist, then this will extract it to the specified path. If it does exist, it will check the version and refresh it if needed. If the currently extracted version is correct, it will just use that.
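
A possible way to pass this option to the executors is sketched below. It assumes that `native.caching.libpath` is set as a JVM system property in the same way as `native.libpath.jhdf5` earlier in this thread, and that its value is a path the executors can write to; neither assumption is confirmed here, so check the JHDF5 documentation before relying on it. The function name is hypothetical.

```shell
# Build the Spark setting that passes native.caching.libpath to executors.
# Assumption: the property takes a path, as described in the comment above.
jhdf5_caching_conf() {
  printf '%s\n' "spark.executor.extraJavaOptions=-Dnative.caching.libpath=$1"
}

# Example (placeholder path):
# spark-submit --conf "$(jhdf5_caching_conf /shared/tools/jhdf5-cache)" ...
```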
