Exception caught: error parsing sequence 0 #745

@Lin-Yuying

Hi there,

I successfully installed cactus on our cluster and tested the general pipeline with the data from https://github.com/ComparativeGenomicsToolkit/cactus/tree/master/examples, but had not tried the pangenome pipeline until now. Recently, I ran the cactus pangenome pipeline on my own 23 softmasked 10x assembly genomes from the same species, with one specific chromosome-level reference genome of 1GB in size.

Here is the pipeline I used:

#(1) Constructing the Minigraph GFA
cactus-minigraph ./test sample.cactus.seqfile sample.cactus.gfa.gz --buildHal --realTimeLogging --reference RefGenome
#(2) Mapping the Genomes Back to the Minigraph
cactus-graphmap ./test sample.cactus.seqfile sample.cactus.gfa.gz sample.cactus.paf --realTimeLogging --reference RefGenome --outputFasta sample.cactus.gfa.fa.gz
#(3) Creating the Cactus Alignment
cactus-align ./test sample.cactus.seqfile sample.cactus.paf sample.cactus.hal --pangenome --pafInput --outVG --reference RefGenome --realTimeLogging
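
For reference, the three steps above can be wrapped in a small script so that a failed step stops the run before the next one starts. This is only a sketch: the variable names (`JOBSTORE`, `SEQFILE`, `PREFIX`, `REF`, `DRY_RUN`) and the `run` helper are made up for illustration, and the tool names and flags are copied unchanged from the commands above. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: chains the three steps above and stops at the first failure.
# JOBSTORE, SEQFILE, PREFIX, REF and DRY_RUN are names chosen for this sketch.
set -euo pipefail

JOBSTORE=./test
SEQFILE=sample.cactus.seqfile
PREFIX=sample.cactus
REF=RefGenome
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pipeline() {
    # (1) Constructing the Minigraph GFA
    run cactus-minigraph "$JOBSTORE" "$SEQFILE" "$PREFIX.gfa.gz" \
        --buildHal --realTimeLogging --reference "$REF"
    # (2) Mapping the Genomes Back to the Minigraph
    run cactus-graphmap "$JOBSTORE" "$SEQFILE" "$PREFIX.gfa.gz" "$PREFIX.paf" \
        --realTimeLogging --reference "$REF" --outputFasta "$PREFIX.gfa.fa.gz"
    # (3) Creating the Cactus Alignment
    run cactus-align "$JOBSTORE" "$SEQFILE" "$PREFIX.paf" "$PREFIX.hal" \
        --pangenome --pafInput --outVG --reference "$REF" --realTimeLogging
}

pipeline
```

Running it with `DRY_RUN=0` would execute the steps for real; with `set -e`, a non-zero exit from any step ends the script at that step.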

The first two steps ran successfully and produced the expected results, but the third step failed with the following errors.

[2022-07-07T15:36:28-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job 'exportHal' kind-exportHal/instance-mezjujq8 v4 used 1495.33% disk (29.9 GiB [32111943680B] used, 2.0 GiB [2147483648B] requested).
        Traceback (most recent call last):
          File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/worker.py", line 405, in workerScript
            job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
          File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/job.py", line 2399, in _runner
            returnValues = self._run(jobGraph=None, fileStore=fileStore)
          File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/job.py", line 2317, in _run
            return self.run(fileStore)
          File "/path/cactus/cactus_env/lib/python3.7/site-packages/toil/job.py", line 2540, in run
            rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
          File "/path/cactus/cactus_env/lib/python3.7/site-packages/cactus/progressive/cactus_progressive.py", line 333, in exportHal
            cactus_call(parameters=["halAppendCactusSubtree"] + args)
          File "/path/cactus/cactus_env/lib/python3.7/site-packages/cactus/shared/common.py", line 866, in cactus_call
            raise RuntimeError("Command {} exited {}: {}".format(call, process.returncode, out))
        RuntimeError: Command ['halAppendCactusSubtree', 'tmpiaodpy26.tmp', 'tmpn3kk183w.tmp', '(sample1:1.0,sample2:1.0,sample3:1.0,sample4:1.0,...(other19 samples)...,RefGenome:1.0,_MINIGRAPH_:1.0)Anc0;', 'tmp_alignment.hal', '--inMemory'] exited 1: stdout=None, stderr=Warning: --inMemory is obsolete, use --hdf5InMemory
        Exception caught: error parsing sequence 0
        
        [2022-07-07T15:36:28-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host cluster01
<=========

Do you happen to know what caused this error? What should I change in the pipeline?

Thanks in advance!

Y.L.
