
could not find XmlIoBasicImgLoader implementation for format bdv.n5 #2

Closed

boazmohar opened this issue Dec 25, 2021 · 29 comments

@boazmohar

Hi @StephanPreibisch,

Tried running this on the Janelia cluster and got this error.

mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5

I am using the latest spark-janelia from here

This is how I built the repo and ran it:
  [login1 - moharb@e05u15]~>git clone https://github.com/PreibischLab/BigStitcher-Spark.git
Cloning into 'BigStitcher-Spark'...
remote: Enumerating objects: 181, done.
remote: Counting objects: 100% (181/181), done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 181 (delta 69), reused 108 (delta 22), pack-reused 0
Receiving objects: 100% (181/181), 35.31 KiB | 2.35 MiB/s, done.
Resolving deltas: 100% (69/69), done.
[login1 - moharb@e05u15]~>cd BigStitcher-Spark/
[login1 - moharb@e05u15]~/BigStitcher-Spark>~/apache-maven-3.8.4/bin/mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< net.preibisch:BigStitcher-Spark >-------------------
[INFO] Building BigStitcher Spark 0.0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ BigStitcher-Spark ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-rules) @ BigStitcher-Spark ---
[INFO] Adding ignore: module-info
[INFO] Adding ignore: META-INF/versions/*/module-info
[INFO] Adding ignore: com.esotericsoftware.kryo.*
[INFO] Adding ignore: com.esotericsoftware.minlog.*
[INFO] Adding ignore: com.esotericsoftware.reflectasm.*
[INFO] Adding ignore: com.google.inject.*
[INFO] Adding ignore: jnr.ffi.*
[INFO] Adding ignore: org.apache.hadoop.yarn.*.package-info
[INFO] Adding ignore: org.apache.spark.unused.UnusedStubClass
[INFO] Adding ignore: org.hibernate.stat.ConcurrentStatisticsImpl
[INFO] Adding ignore: org.jetbrains.kotlin.daemon.common.*
[INFO] Adding ignore: org.junit.runner.Runner
[INFO] Adding ignore: module-info
[INFO] Adding ignore: module-info
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (sanitize-version) @ BigStitcher-Spark ---
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (guess-package) @ BigStitcher-Spark ---
[INFO]
[INFO] --- buildnumber-maven-plugin:1.4:create (default) @ BigStitcher-Spark ---
[INFO] Executing: /bin/sh -c cd '/groups/spruston/home/moharb/BigStitcher-Spark' && 'git' 'rev-parse' '--verify' 'HEAD'
[INFO] Working directory: /groups/spruston/home/moharb/BigStitcher-Spark
[INFO] Storing buildNumber: e2b676364526588195f16931e998a7a756ca778b at timestamp: 1640470297294
[INFO] Storing buildScmBranch: main
[INFO]
[INFO] --- scijava-maven-plugin:2.0.0:set-rootdir (set-rootdir) @ BigStitcher-Spark ---
[INFO] Setting rootdir: /groups/spruston/home/moharb/BigStitcher-Spark
[INFO]
[INFO] --- jacoco-maven-plugin:0.8.6:prepare-agent (jacoco-initialize) @ BigStitcher-Spark ---
[WARNING] The artifact xml-apis:xml-apis:jar:2.0.2 has been relocated to xml-apis:xml-apis:jar:1.0.b2
[INFO] argLine set to -javaagent:/groups/spruston/home/moharb/.m2/repository/org/jacoco/org.jacoco.agent/0.8.6/org.jacoco.agent-0.8.6-runtime.jar=destfile=/groups/spruston/home/moharb/BigStitcher-Spark/target/jacoco.exec
[INFO]
[INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @ BigStitcher-Spark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /groups/spruston/home/moharb/BigStitcher-Spark/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ BigStitcher-Spark ---
[INFO] Compiling 6 source files to /groups/spruston/home/moharb/BigStitcher-Spark/target/classes
[INFO]
[INFO] --- maven-resources-plugin:3.1.0:testResources (default-testResources) @ BigStitcher-Spark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /groups/spruston/home/moharb/BigStitcher-Spark/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ BigStitcher-Spark ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.22.2:test (default-test) @ BigStitcher-Spark ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:3.2.0:jar (default-jar) @ BigStitcher-Spark ---
[INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark.jar
[INFO]
[INFO] >>> maven-source-plugin:3.2.1:jar (attach-sources-jar) > generate-sources @ BigStitcher-Spark >>>
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M3:enforce (enforce-rules) @ BigStitcher-Spark ---
[INFO] Adding ignore: module-info
[INFO] Adding ignore: META-INF/versions/*/module-info
[INFO] Adding ignore: com.esotericsoftware.kryo.*
[INFO] Adding ignore: com.esotericsoftware.minlog.*
[INFO] Adding ignore: com.esotericsoftware.reflectasm.*
[INFO] Adding ignore: com.google.inject.*
[INFO] Adding ignore: jnr.ffi.*
[INFO] Adding ignore: org.apache.hadoop.yarn.*.package-info
[INFO] Adding ignore: org.apache.spark.unused.UnusedStubClass
[INFO] Adding ignore: org.hibernate.stat.ConcurrentStatisticsImpl
[INFO] Adding ignore: org.jetbrains.kotlin.daemon.common.*
[INFO] Adding ignore: org.junit.runner.Runner
[INFO] Adding ignore: module-info
[INFO] Adding ignore: module-info
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (sanitize-version) @ BigStitcher-Spark ---
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:regex-property (guess-package) @ BigStitcher-Spark ---
[INFO]
[INFO] --- buildnumber-maven-plugin:1.4:create (default) @ BigStitcher-Spark ---
[INFO]
[INFO] --- scijava-maven-plugin:2.0.0:set-rootdir (set-rootdir) @ BigStitcher-Spark ---
[INFO]
[INFO] --- jacoco-maven-plugin:0.8.6:prepare-agent (jacoco-initialize) @ BigStitcher-Spark ---
[INFO] argLine set to -javaagent:/groups/spruston/home/moharb/.m2/repository/org/jacoco/org.jacoco.agent/0.8.6/org.jacoco.agent-0.8.6-runtime.jar=destfile=/groups/spruston/home/moharb/BigStitcher-Spark/target/jacoco.exec
[INFO]
[INFO] <<< maven-source-plugin:3.2.1:jar (attach-sources-jar) < generate-sources @ BigStitcher-Spark <<<
[INFO]
[INFO]
[INFO] --- maven-source-plugin:3.2.1:jar (attach-sources-jar) @ BigStitcher-Spark ---
[INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-sources.jar
[INFO]
[INFO] --- jacoco-maven-plugin:0.8.6:report (jacoco-site) @ BigStitcher-Spark ---
[INFO] Skipping JaCoCo execution due to missing execution data file.
[INFO]
[INFO] --- maven-assembly-plugin:3.1.1:single (make-assembly) @ BigStitcher-Spark ---
[INFO] Building jar: /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
[INFO]
[INFO] --- maven-jar-plugin:3.2.0:test-jar (default) @ BigStitcher-Spark ---
[INFO] Skipping packaging of the test-jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  32.978 s
[INFO] Finished at: 2021-12-25T17:12:02-05:00
[INFO] ------------------------------------------------------------------------
[login1 - moharb@e05u15]~/BigStitcher-Spark>TERMINATE=1 RUNTIME=8:00 TMPDIR=~/tmp ~/spark-janelia/flintstone.sh 3 \
> ~/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar \
> net.preibisch.bigstitcher.spark.AffineFusion \
> -x '/nrs/svoboda/moharb/dataset.xml' \
> -o  '/nrs/svoboda/moharb/output3.n5' \
> -d '/GFP/s0' \
> --channelId 0 \
> --UINT8 \
> --minIntensity 1 \
> --maxIntensity 254

On e05u15.int.janelia.org with Python 3.6.10 :: Anaconda, Inc., running:

  /groups/spruston/home/moharb/spark-janelia/spark-janelia  --consolidate_logs --nnodes=3 --gb_per_slot=15 --driverslots=32 --worker_slots=32 --minworkers=1 --hard_runtime=8:00 --submitargs=" --verbose --conf spark.default.parallelism=270 --conf spark.executor.instances=6 --conf spark.executor.cores=5 --conf spark.executor.memory=75g --class net.preibisch.bigstitcher.spark.AffineFusion /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar -x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254" generate-and-launch-run


Created:
  /groups/spruston/home/moharb/.spark/20211225_171249/conf
  /groups/spruston/home/moharb/.spark/20211225_171249/logs
  /groups/spruston/home/moharb/.spark/20211225_171249/scripts

Running:
  /groups/spruston/home/moharb/.spark/20211225_171249/scripts/00-queue-lsf-jobs.sh

Sat Dec 25 17:12:49 EST 2021 [e05u15.int.janelia.org] submitting jobs to scheduler
This job will be billed to svoboda
Job <114095184> is submitted to default queue <local>.
This job will be billed to svoboda
Job <114095185> is submitted to default queue <short>.
This job will be billed to svoboda
Job <114095186> is submitted to default queue <local>.
This job will be billed to svoboda
Job <114095189> is submitted to default queue <local>.
This job will be billed to svoboda
Job <114095190> is submitted to default queue <short>.

Queued jobs are:
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
114095184  moharb  PEND  local      e05u15                  *171249_ma Dec 25 17:12
114095185  moharb  PEND  short      e05u15                  *171249_ur Dec 25 17:12
114095186  moharb  PEND  local      e05u15                  *249_wo[1] Dec 25 17:12
114095186  moharb  PEND  local      e05u15                  *249_wo[2] Dec 25 17:12
114095186  moharb  PEND  local      e05u15                  *249_wo[3] Dec 25 17:12
114095189  moharb  PEND  local      e05u15                  *171249_dr Dec 25 17:12
114095190  moharb  PEND  short      e05u15                  *171249_sd Dec 25 17:12


To get web user interface URL after master has started, run:
  grep "Bound MasterWebUI to" /groups/spruston/home/moharb/.spark/20211225_171249/logs/01-master.log
These are the driver logs with the error:
Sat Dec 25 17:13:34 EST 2021 [h07u13] running /misc/local/spark-3.0.1/bin/spark-submit --deploy-mode client --master spark://10.36.107.39:7077  --verbose --conf spark.default.parallelism=270 --conf spark.executor.instances=6 --conf spark.executor.cores=5 --conf spark.executor.memory=75g --class net.preibisch.bigstitcher.spark.AffineFusion /groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar -x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254
Using properties file: /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf
Adding default property: spark.storage.blockManagerHeartBeatMs=30000
Adding default property: spark.driver.maxResultSize=0
Adding default property: spark.kryoserializer.buffer.max=1024m
Adding default property: spark.rpc.askTimeout=300s
Adding default property: spark.driver.memory=479g
Adding default property: spark.submit.deployMode=cluster
Adding default property: spark.rpc.retry.wait=30s
Adding default property: spark.core.connection.ack.wait.timeout=600s
Parsed arguments:
  master                  spark://10.36.107.39:7077
  deployMode              client
  executorMemory          75g
  executorCores           5
  totalExecutorCores      null
  propertiesFile          /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf
  driverMemory            479g
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            6
  files                   null
  pyFiles                 null
  archives                null
  mainClass               net.preibisch.bigstitcher.spark.AffineFusion
  primaryResource         file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar
  name                    net.preibisch.bigstitcher.spark.AffineFusion
  childArgs               [-x /nrs/svoboda/moharb/dataset.xml -o /nrs/svoboda/moharb/output3.n5 -d /GFP/s0 --channelId 0 --UINT8 --minIntensity 1 --maxIntensity 254]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /groups/spruston/home/moharb/.spark/20211225_171249/conf/spark-defaults.conf:
  (spark.default.parallelism,270)
  (spark.driver.memory,479g)
  (spark.executor.instances,6)
  (spark.executor.memory,75g)
  (spark.rpc.askTimeout,300s)
  (spark.storage.blockManagerHeartBeatMs,30000)
  (spark.kryoserializer.buffer.max,1024m)
  (spark.submit.deployMode,cluster)
  (spark.core.connection.ack.wait.timeout,600s)
  (spark.driver.maxResultSize,0)
  (spark.rpc.retry.wait,30s)
  (spark.executor.cores,5)

    
2021-12-25 17:13:36,473 [main] WARN [NativeCodeLoader]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Main class:
net.preibisch.bigstitcher.spark.AffineFusion
Arguments:
-x
/nrs/svoboda/moharb/dataset.xml
-o
/nrs/svoboda/moharb/output3.n5
-d
/GFP/s0
--channelId
0
--UINT8
--minIntensity
1
--maxIntensity
254
Spark config:
(spark.storage.blockManagerHeartBeatMs,30000)
(spark.driver.maxResultSize,0)
(spark.jars,file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar)
(spark.kryoserializer.buffer.max,1024m)
(spark.app.name,net.preibisch.bigstitcher.spark.AffineFusion)
(spark.rpc.askTimeout,300s)
(spark.driver.memory,479g)
(spark.executor.instances,6)
(spark.submit.pyFiles,)
(spark.default.parallelism,270)
(spark.submit.deployMode,client)
(spark.master,spark://10.36.107.39:7077)
(spark.rpc.retry.wait,30s)
(spark.executor.memory,75g)
(spark.executor.cores,5)
(spark.core.connection.ack.wait.timeout,600s)
Classpath elements:
file:/groups/spruston/home/moharb/BigStitcher-Spark/target/BigStitcher-Spark-jar-with-dependencies.jar


[-x, /nrs/svoboda/moharb/dataset.xml, -o, /nrs/svoboda/moharb/output3.n5, -d, /GFP/s0, --channelId, 0, --UINT8, --minIntensity, 1, --maxIntensity, 254]
mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5
	at mpicbg.spim.data.generic.sequence.ImgLoaders.createXmlIoForFormat(ImgLoaders.java:72)
	at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:110)
	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)
	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.load(XmlIoAbstractSpimData.java:95)
	at net.preibisch.bigstitcher.spark.AffineFusion.call(AffineFusion.java:94)
	at net.preibisch.bigstitcher.spark.AffineFusion.call(AffineFusion.java:39)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1853)
	at picocli.CommandLine.access$1100(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2255)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2249)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2213)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2080)
	at picocli.CommandLine.execute(CommandLine.java:1978)
	at net.preibisch.bigstitcher.spark.AffineFusion.main(AffineFusion.java:283)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-12-25 17:13:37,007 [Thread-1] INFO [ShutdownHookManager]: Shutdown hook called
2021-12-25 17:13:37,007 [Thread-1] INFO [ShutdownHookManager]: Deleting directory /tmp/spark-fcf876a8-e1e9-4de3-bffe-d05ee3fb0a4a

------------------------------------------------------------
Sender: LSF System <lsfadmin@h07u13>
Subject: Job 114095189: <spark_moharb_20211225_171249_dr> in cluster <Janelia> Exited

Job <spark_moharb_20211225_171249_dr> was submitted from host <e05u15> by user <moharb> in cluster <Janelia> at Sat Dec 25 17:12:49 2021
Job was executed on host(s) <32*h07u13>, in queue <local>, as user <moharb> in cluster <Janelia> at Sat Dec 25 17:13:31 2021
</groups/spruston/home/moharb> was used as the home directory.
</groups/spruston/home/moharb/BigStitcher-Spark> was used as the working directory.
Started at Sat Dec 25 17:13:31 2021
Terminated at Sat Dec 25 17:13:37 2021
Results reported at Sat Dec 25 17:13:37 2021

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
/groups/spruston/home/moharb/.spark/20211225_171249/scripts/04-launch-driver.sh
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   3.96 sec.
    Max Memory :                                 11 MB
    Average Memory :                             11.00 MB
    Total Requested Memory :                     491520.00 MB
    Delta Memory :                               491509.00 MB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                5
    Run time :                                   6 sec.
    Turnaround time :                            48 sec.

The output (if any) is above this job summary.

Same happens for nonRigid, am I doing something wrong?

Thanks!
Boaz

@StephanPreibisch
Contributor

Thanks @boazmohar ... @tpietzsch we have seen this behavior before, do you remember what the reason was? Thanks :)

@StephanPreibisch
Contributor

not your fault @boazmohar ... this is about the annotations that are used to find ImgLoader implementations at runtime ...
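The annotation-driven discovery mentioned here can be sketched in a self-contained way (illustrative only; the class and method names below are hypothetical stand-ins, not the actual SciJava/mpicbg mechanics): at build time an annotation processor writes the fully-qualified names of all annotated loader implementations into an index file on the classpath, and at runtime the library reads that file and instantiates the classes. If a fat-jar build drops or clobbers the index, the runtime lookup finds nothing and fails with exactly this kind of "could not find ... implementation" error.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of annotation-index discovery, not the real library code.
class IndexDiscovery {
    // Reads a classpath index file listing one implementation class name per line,
    // e.g. the names of all XmlIoBasicImgLoader implementations found at build time.
    static List<String> readIndex(InputStream index) throws Exception {
        List<String> classNames = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(index))) {
            String line;
            while ((line = r.readLine()) != null) {
                String name = line.trim();
                if (!name.isEmpty())
                    classNames.add(name); // each entry would be Class.forName()'d at runtime
            }
        }
        return classNames;
    }
}
```

Under this model, an empty or missing index (as a broken fat-jar merge can produce) yields an empty list, so no format string can ever resolve to a loader class.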

@StephanPreibisch
Contributor

Hi @boazmohar, I found how we solved the same problem before ... I updated the pom.xml

Can you please build with

mvn clean package -P fatjar

and use the BigStitcher-Spark-0.0.1-SNAPSHOT.jar that is created? It should work ...

@boazmohar
Author

@StephanPreibisch it changed the location of the error; it is now on the workers running the task:

2021-12-26 10:06:29,223 [task-result-getter-2] ERROR [TaskSetManager]: Task 35 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 35 in stage 0.0 failed 4 times, most recent failure: Lost task 35.3 in stage 0.0 (TID 129, 10.36.107.23, executor 6): mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5
	at mpicbg.spim.data.generic.sequence.ImgLoaders.createXmlIoForFormat(ImgLoaders.java:72)
	at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:110)

@boazmohar
Author

My bad, there was an issue with access to my home folder, putting the .jar on nrs solved it!

@StephanPreibisch
Contributor

Cool, did you already try it on a bigger dataset?

@boazmohar
Author

It is failing again, same error.
I submitted a help desk ticket and asked Tim Wang to try running it.

@StephanPreibisch
Contributor

Which same error? I thought it worked? If the same error persists we need to solve it …

@boazmohar
Author

Well, it worked on the example dataset ONCE:

Saved, e.g. view with './n5-view -i /nrs/svoboda/moharb/output3.n5 -d /GFP/s0'
done, took: 8220 ms.

Full log:
04-driver.log

But the worker stdout still has this error:
stdout.txt

Here is one where the dataset is bigger and it failed. Maybe it is running out of memory and the error is unrelated?

stdout_failed.txt

@StephanPreibisch
Contributor

Hi, there is still a bug when creating the FATJAR ... I will need to examine what is going on

@StephanPreibisch
Contributor

StephanPreibisch commented Dec 27, 2021

Hi @boazmohar, can you please pull the newest version, call mvn clean package -P fatjar and try:

java -cp target/BigStitcher-Spark-0.0.1-SNAPSHOT.jar net.preibisch.bigstitcher.spark.TestDependencies -x '/Users/spreibi/Documents/Microscopy/Stitching/Truman/standard/dataset.xml'

It should load and display one of the stacks ... works for me, meaning it can find the N5 imgloader

@StephanPreibisch
Contributor

of course adjust the XML path and maybe try it locally on your machine ... and then on an interactive session on the cluster. If that all works, so should the Spark resaving (I think).

@boazmohar
Author

boazmohar commented Dec 27, 2021

It is working locally and from NoMachine on the cluster.
It is still not working on Spark.
One thing I noticed is that it is launching the tasks before loading the jar (at least in the log file):

2021-12-27 11:34:41,447 [dispatcher-Executor] INFO [ResourceUtils]: ==============================================================
2021-12-27 11:34:41,467 [dispatcher-Executor] INFO [CoarseGrainedExecutorBackend]: Successfully registered with driver
2021-12-27 11:34:41,469 [dispatcher-Executor] INFO [Executor]: Starting executor ID 0 on host 10.36.107.13
2021-12-27 11:34:41,557 [dispatcher-Executor] INFO [Utils]: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 17913.
2021-12-27 11:34:41,558 [dispatcher-Executor] INFO [NettyBlockTransferService]: Server created on 10.36.107.13:17913
2021-12-27 11:34:41,559 [dispatcher-Executor] INFO [BlockManager]: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-12-27 11:34:41,565 [dispatcher-Executor] INFO [BlockManagerMaster]: Registering BlockManager BlockManagerId(0, 10.36.107.13, 17913, None)
2021-12-27 11:34:41,572 [dispatcher-Executor] INFO [BlockManagerMaster]: Registered BlockManager BlockManagerId(0, 10.36.107.13, 17913, None)
2021-12-27 11:34:41,573 [dispatcher-Executor] INFO [BlockManager]: Initialized BlockManager: BlockManagerId(0, 10.36.107.13, 17913, None)
2021-12-27 11:34:41,648 [dispatcher-Executor] INFO [CoarseGrainedExecutorBackend]: Got assigned task 20
2021-12-27 11:34:41,651 [dispatcher-Executor] INFO [CoarseGrainedExecutorBackend]: Got assigned task 21
2021-12-27 11:34:41,652 [dispatcher-Executor] INFO [CoarseGrainedExecutorBackend]: Got assigned task 22
2021-12-27 11:34:41,652 [dispatcher-Executor] INFO [CoarseGrainedExecutorBackend]: Got assigned task 23
2021-12-27 11:34:41,653 [dispatcher-Executor] INFO [CoarseGrainedExecutorBackend]: Got assigned task 24
2021-12-27 11:34:41,655 [Executor task launch worker for task 21] INFO [Executor]: Running task 21.0 in stage 0.0 (TID 21)
2021-12-27 11:34:41,655 [Executor task launch worker for task 22] INFO [Executor]: Running task 22.0 in stage 0.0 (TID 22)
2021-12-27 11:34:41,655 [Executor task launch worker for task 24] INFO [Executor]: Running task 24.0 in stage 0.0 (TID 24)
2021-12-27 11:34:41,655 [Executor task launch worker for task 23] INFO [Executor]: Running task 23.0 in stage 0.0 (TID 23)
2021-12-27 11:34:41,655 [Executor task launch worker for task 20] INFO [Executor]: Running task 20.0 in stage 0.0 (TID 20)
2021-12-27 11:34:41,675 [Executor task launch worker for task 23] INFO [Executor]: Fetching spark://h07u21:35021/jars/BigStitcher-Spark-0.0.1-SNAPSHOT.jar with timestamp 1640622879317
2021-12-27 11:34:41,698 [Executor task launch worker for task 23] INFO [TransportClientFactory]: Successfully created connection to h07u21/10.36.107.31:35021 after 1 ms (0 ms spent in bootstraps)
2021-12-27 11:34:41,699 [Executor task launch worker for task 23] INFO [Utils]: Fetching spark://h07u21:35021/jars/BigStitcher-Spark-0.0.1-SNAPSHOT.jar to /scratch/moharb/spark-114599123-1/spark-bb5a2dd5-fed1-4caa-83df-fff14cc90256/executor-160b31f0-6981-45c2-b399-7b8db7e3d19a/spark-8e944ff4-3dcf-44f5-84cf-c9773c878a38/fetchFileTemp2825303330217301914.tmp
2021-12-27 11:34:42,182 [Executor task launch worker for task 23] INFO [Utils]: Copying /scratch/moharb/spark-114599123-1/spark-bb5a2dd5-fed1-4caa-83df-fff14cc90256/executor-160b31f0-6981-45c2-b399-7b8db7e3d19a/spark-8e944ff4-3dcf-44f5-84cf-c9773c878a38/-3846531471640622879317_cache to /groups/spruston/home/moharb/.spark/20211227_113327/logs/worker-1-dir/app-20211227113439-0000/0/./BigStitcher-Spark-0.0.1-SNAPSHOT.jar
2021-12-27 11:34:42,597 [Executor task launch worker for task 23] INFO [Executor]: Adding file:/groups/spruston/home/moharb/.spark/20211227_113327/logs/worker-1-dir/app-20211227113439-0000/0/./BigStitcher-Spark-0.0.1-SNAPSHOT.jar to class loader
2021-12-27 11:34:42,773 [Executor task launch worker for task 23] INFO [TorrentBroadcast]: Started reading broadcast variable 0 with 1 pieces (estimated total size 4.0 MiB)
2021-12-27 11:34:42,806 [Executor task launch worker for task 23] INFO [TransportClientFactory]: Successfully created connection to h07u21/10.36.107.31:36127 after 1 ms (0 ms spent in bootstraps)
2021-12-27 11:34:42,825 [Executor task launch worker for task 23] INFO [MemoryStore]: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.3 KiB, free 39.8 GiB)
2021-12-27 11:34:42,831 [Executor task launch worker for task 23] INFO [TorrentBroadcast]: Reading broadcast variable 0 took 57 ms
2021-12-27 11:34:42,904 [Executor task launch worker for task 23] INFO [MemoryStore]: Block broadcast_0 stored as values in memory (estimated size 4.3 KiB, free 39.8 GiB)
2021-12-27 11:34:43,551 [Executor task launch worker for task 21] ERROR [Executor]: Exception in task 21.0 in stage 0.0 (TID 21)
mpicbg.spim.data.SpimDataInstantiationException: could not find XmlIoBasicImgLoader implementation for format bdv.n5
	at mpicbg.spim.data.generic.sequence.ImgLoaders.createXmlIoForFormat(ImgLoaders.java:72)
	at mpicbg.spim.data.generic.sequence.XmlIoAbstractSequenceDescription.fromXml(XmlIoAbstractSequenceDescription.java:110)
	at mpicbg.spim.data.generic.XmlIoAbstractSpimData.fromXml(XmlIoAbstractSpimData.java:153)
	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:164)
	at net.preibisch.mvrecon.fiji.spimdata.XmlIoSpimData2.fromXml(XmlIoSpimData2.java:52)

Maybe it is related to some race condition?
Maybe some of this discussion is relevant.

@StephanPreibisch
Contributor

@trautmane do you have any ideas? Thanks 😊

@mzouink
Member

mzouink commented Dec 27, 2021

I think it can be solved by adding this line, but it is a hack:
ImgLoaders.registerManually(XmlIoN5ImageLoader.class);
The XmlIoN5ImageLoader should be added automatically to the BDV loaders register.
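A standalone analogue of this manual-registration fallback shows why it sidesteps the discovery failure (illustrative only: apart from the `ImgLoaders.registerManually(...)` call quoted above, every name below is hypothetical and not the mpicbg API): the loader class is put into the registry explicitly, so the broken annotation index is never consulted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry sketch, not the real ImgLoaders class.
class ManualRegistration {
    private final Map<String, Class<?>> byFormat = new HashMap<>();

    // Explicit, discovery-free registration of a loader for a format string.
    void registerManually(String format, Class<?> loader) {
        byFormat.put(format, loader);
    }

    // Lookup that fails the same way the thread's error does when no entry exists.
    Class<?> lookup(String format) {
        Class<?> c = byFormat.get(format);
        if (c == null)
            throw new IllegalStateException(
                "could not find loader implementation for format " + format);
        return c;
    }
}
```

The real hack does the same thing one level up: it seeds the loader registry before the XML is parsed, so the bdv.n5 lookup can no longer miss.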

@StephanPreibisch
Contributor

Could you give this a try @mzouink? That would be awesome ...

@mzouink
Member

mzouink commented Dec 27, 2021

Done, @boazmohar, I added the code to a new branch, fix_bdv_n5.
Can you please try again with this branch: https://github.com/PreibischLab/BigStitcher-Spark/tree/fix_bdv_n5
If it works, then we will know the reason for the bug.

@boazmohar
Author

Sorry, it is still happening, but not in all workers: some of them are working and I see the right output, while some have the same issue loading the right class as before.
[screenshots]
I think that for the smaller dataset there are enough retries to get through.

@mzouink
Member

mzouink commented Dec 27, 2021

We are making progress; now the problem is inside the jobs.
Can you please pull and try again? I added a line: https://github.com/PreibischLab/BigStitcher-Spark/tree/fix_bdv_n5

@boazmohar
Author

boazmohar commented Dec 27, 2021

Still failing in some workers.
[screenshot]
Also found a new error:
[screenshot]
Not sure if it is a separate issue or not. This is a big dataset.

@trautmane
Member

Hi @StephanPreibisch and @boazmohar - I'm looking at all of this for the first time now under the assumption that the problem still exists. Please let me know if that assumption is incorrect.

@boazmohar
Author

boazmohar commented Dec 28, 2021

Yes, still an issue, thanks for looking into it!

@boazmohar - I just left a question for you on Slack ...

@boazmohar
Author

boazmohar commented Dec 29, 2021

I can confirm this solved the issue!
THANKS @trautmane
Here is the result of the non-rigid fusion on a large dataset:
[screenshots]

Still had some memory issues, but I used 5 cores per task and N_TASKS_PER_EXECUTOR_CORE=1000.
BTW, it took 3.5 h with 2011 cores but allocating only 202 tasks at a time.
I would be happy to hear if there is a more principled way to do this (I tried 10, 30, 100, 300, 1000 until it stopped failing more than 4 times per task).

@StephanPreibisch
Contributor

StephanPreibisch commented Dec 29, 2021

Awesome, thanks @trautmane! Could you please merge this into master?

The fix is a change to the spimdata repo and I don't have access to that, so I submitted this pull request. Someone with write access to spimdata ( @tpietzsch , @StephanPreibisch , or ??? ) needs to review and approve my changes. Once a new version of spimdata is deployed, we can update this repo to use it.

I still do not understand why it worked locally with Spark though, do you know?

I did not play with a local Spark version, so I can't definitively answer this - but my guess is that with the local Spark you are just more likely to get lucky and avoid concurrent access to ImgLoaders.build(). When I was testing/debugging, I needed to use multiple worker nodes to regularly encounter the issue. Some initial tests I ran with just one worker node on a small data set failed to reproduce the problem.
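The general shape of the concurrency problem diagnosed above can be sketched in a self-contained way (illustrative only; the class and method names below are hypothetical, not the actual spim_data code): a lazily built registry must publish its map only after the one-time build step completes; otherwise a second thread can look a format up in a half-filled map and hit exactly this kind of "could not find ... implementation for format" failure even though the implementation is on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a thread-safe lazily built registry, not the real fix.
class LazyRegistry {
    private static volatile Map<String, String> formats; // format -> loader class name

    static String loaderFor(String format) {
        Map<String, String> m = formats;
        if (m == null) {
            synchronized (LazyRegistry.class) { // exactly one thread builds ...
                m = formats;
                if (m == null) {
                    m = discover();
                    formats = m;                // ... and publishes only when complete
                }
            }
        }
        String loader = m.get(format);
        if (loader == null)
            throw new IllegalStateException(
                "could not find XmlIoBasicImgLoader implementation for format " + format);
        return loader;
    }

    // Stand-in for the annotation-index scan the real library performs.
    private static Map<String, String> discover() {
        Map<String, String> m = new ConcurrentHashMap<>();
        m.put("bdv.n5", "XmlIoN5ImageLoader");
        m.put("bdv.hdf5", "XmlIoHdf5ImageLoader");
        return m;
    }
}
```

Without the synchronized block and the volatile publish, two Spark executor threads racing through the build step could each observe an incomplete registry, which matches the symptom in this thread: some tasks succeed while others on the same worker fail.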

thanks again!!

@StephanPreibisch
Contributor

You should post the image and statistics on Twitter @boazmohar 😊 Thanks so much for pushing this!!

Now only the cloud version is missing @mzouink 😊

@StephanPreibisch
Contributor

StephanPreibisch commented Dec 30, 2021

Thanks @trautmane, makes sense ... I didn't realize on my phone that it was in another repo, @tpietzsch has to approve it and cut a new release so we can distribute, I don't want to do it without him. It always creates this ripple-effect through many Fiji plugins ...

Yeah - that was a concern of mine too. @tpietzsch left a TODO comment in January 2021 about thread safety but didn't fix it then. Hopefully, he just didn't have time to fix it but it is also possible he ran into some other ripple-effect problem.

@tpietzsch
Collaborator

Yeah - that was a concern of mine too. @tpietzsch left a TODO comment in January 2021 about thread safety but didn't fix it then. Hopefully, he just didn't have time to fix it but it is also possible he ran into some other ripple-effect problem.

No, no ripple-effect expected... I just didn't have time to think it through and fix it...

@tpietzsch
Collaborator

This is now merged and released in spim_data-2.2.7

@trautmane
Member

This is fixed by 0110317.
