Hi there,

I successfully installed Cactus on our cluster and tested the general pipeline with the data from https://github.com/ComparativeGenomicsToolkit/cactus/tree/master/examples, but I had not tried the pangenome pipeline before. Recently, I ran the Cactus pangenome pipeline on 23 of my own softmasked 10x assemblies from the same species, together with one chromosome-level reference genome of about 1 GB.

Here is the pipeline I used:
#(1) Constructing the Minigraph GFA
cactus-minigraph ./test sample.cactus.seqfile sample.cactus.gfa.gz --buildHal --realTimeLogging --reference RefGenome
#(2) Mapping the Genomes Back to the Minigraph
cactus-graphmap ./test sample.cactus.seqfile sample.cactus.gfa.gz sample.cactus.paf --realTimeLogging --reference RefGenome --outputFasta sample.cactus.gfa.fa.gz
#(3) Creating the Cactus Alignment
cactus-align ./test sample.cactus.seqfile sample.cactus.paf sample.cactus.hal --pangenome --pafInput --outVG --reference RefGenome --realTimeLogging
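For context, sample.cactus.seqfile follows the usual two-column Cactus seqfile layout (one sample name and one FASTA path per line). The names and paths below are placeholders, not my real files:

```
RefGenome   /data/RefGenome.fa
sample1     /data/sample1.softmasked.fa.gz
sample2     /data/sample2.softmasked.fa.gz
...
sample23    /data/sample23.softmasked.fa.gz
```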
The first two steps ran successfully and produced the expected results, but I encountered the following error when running the third step:
[2022-07-07T15:36:28-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job 'exportHal' kind-exportHal/instance-mezjujq8 v4 used 1495.33% disk (29.9 GiB [32111943680B] used, 2.0 GiB [2147483648B] requested).
Traceback (most recent call last):
  File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/worker.py", line 405, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/job.py", line 2399, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/job.py", line 2317, in _run
    return self.run(fileStore)
  File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/job.py", line 2540, in run
    rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
  File "/path/cactus/cactus_env/lib/python3.7/site-packages/cactus/progressive/cactus_progressive.py", line 333, in exportHal
    cactus_call(parameters=["halAppendCactusSubtree"] + args)
  File "/path/cactus/cactus_env/lib/python3.7/site-packages/cactus/shared/common.py", line 866, in cactus_call
    raise RuntimeError("Command {} exited {}: {}".format(call, process.returncode, out))
RuntimeError: Command ['halAppendCactusSubtree', 'tmpiaodpy26.tmp', 'tmpn3kk183w.tmp', '(sample1:1.0,sample2:1.0,sample3:1.0,sample4:1.0,...(other19 samples)...,RefGenome:1.0,_MINIGRAPH_:1.0)Anc0;', 'tmp_alignment.hal', '--inMemory'] exited 1: stdout=None, stderr=Warning: --inMemory is obsolete, use --hdf5InMemory
Exception caught: error parsing sequence 0
[2022-07-07T15:36:28-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host cluster01
<=========
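I am not sure whether the disk warning at the top of the log is related to the failure, but it says the exportHal job used about 29.9 GiB of disk against a 2 GiB request, so one thing I am considering is resuming step (3) with more disk per job. This is only a sketch, assuming the generic Toil --defaultDisk and --restart options are passed through by cactus-align (the 64Gi value is my own guess, not a recommendation from the Cactus docs):

```
# Resume step (3) from the existing ./test job store, asking Toil for
# more disk per job (--defaultDisk is a generic Toil option).
cactus-align ./test sample.cactus.seqfile sample.cactus.paf sample.cactus.hal \
    --pangenome --pafInput --outVG --reference RefGenome --realTimeLogging \
    --defaultDisk 64Gi --restart
```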
Do you happen to know why this error arose, and what I should correct in the pipeline?

Thanks in advance!
Y.L.