Error using read_type = hifi and fastq.gz file #123

Closed
davised opened this issue Aug 11, 2021 · 6 comments

davised commented Aug 11, 2021

Describe the bug

The nd.asm.fasta file is empty, and the run fails with the IndexError shown below.

Error message

The pid log did not contain any error message, but the following traceback was printed to stderr because the output files are empty:

Traceback (most recent call last):
  File "/local/cluster/bin/nextDenovo", line 856, in <module>
    main(args)
  File "/local/cluster/bin/nextDenovo", line 827, in main
    asm, stat = gather_ctg_cns_output(cfg, task.subtasks, seq_info)
  File "/local/cluster/bin/nextDenovo", line 291, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/local/cluster/NextDenovo-2.4.0/lib/kit.py", line 171, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range
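
The traceback ends at `stat[-1]`, which raises IndexError when the list of contig lengths is empty. As a rough illustration only, here is a minimal sketch of a length summary that returns early on empty input instead of indexing into it. The function name `summarize_lengths` and its return shape are hypothetical and are not NextDenovo's `cal_n50_info`.

```python
def summarize_lengths(lengths):
    """Return (n50, shortest, longest, total) for contig lengths.

    Hypothetical helper for illustration; returns None when there are
    no contigs so the caller can report that instead of crashing.
    """
    if not lengths:
        return None
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # N50: length of the contig at which the cumulative sum
        # first reaches half of the total assembly length.
        if running * 2 >= total:
            n50 = length
            break
    return n50, ordered[-1], ordered[0], total


if __name__ == "__main__":
    for sample in ([], [10, 20, 30]):
        result = summarize_lengths(sample)
        if result is None:
            print("no contigs: assembly output is empty")
        else:
            print("N50=%d min=%d max=%d total=%d" % result)
```

With a guard like this, an empty assembly could be reported with a clear message rather than surfacing as an IndexError.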

Config file

$ cat run.cfg
[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = no
rerun = 3
parallel_jobs = 20
input_type = corrected # raw, corrected
read_type = hifi # clr, ont, hifi
input_fofn = ./input.fofn
workdir = ./01_rundir
# usetempdir = /data/davised/nextDenovo

[correct_option]
read_cutoff = 1k
genome_size = 100m # estimated genome size
pa_correction = 3
sort_options = -m 20g -t 20
minimap2_options_raw =  -t 8
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8 -k17 -w17
nextgraph_options = -a 1

Input fofn

$ cat input.fofn
m64047_210502_022250.ccs.fastq.gz

Operating system

$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.2.1511 (Core)
Release:	7.2.1511
Codename:	Core

GCC

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/local/cluster/centos/devtoolset-7/root/usr/bin/../libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-7/root/usr --mandir=/opt/rh/devtoolset-7/root/usr/share/man --infodir=/opt/rh/devtoolset-7/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --with-default-libstdcxx-abi=gcc4-compatible --with-isl=/builddir/build/BUILD/gcc-7.2.1-20170829/obj-x86_64-redhat-linux/isl-install --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 7.2.1 20170829 (Red Hat 7.2.1-1) (GCC)

Python

$ python3 --version
Python 3.7.2

NextDenovo

$ nextDenovo -v
nextDenovo v2.4.0

To Reproduce (Optional)
Use a fastq.gz HiFi file as input and you will receive the same IndexError.

I think two checks would help catch this type of issue in the future: one that the split fasta files in the 01.split_seed.sh.work/split_seed0 folder are valid, and another in the 02.cns_align.sh.work folder that each output file contains something after alignment (unless zero-size files are expected in some cases).
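
A minimal sketch of the second kind of check, assuming a size threshold is a good enough signal. In the listing in this report the cns.filt.dovt.ovl files are 2 bytes rather than 0, so a plain zero-size test would miss them. The function name `find_small_outputs`, the suffix list, and the `min_bytes` threshold are all hypothetical choices, not part of NextDenovo.

```python
import os


def find_small_outputs(root, suffixes, min_bytes=1):
    """Walk `root` and return (path, size) for matching files below `min_bytes`.

    `suffixes` is a list of file-name endings to inspect, e.g. [".ovl"].
    """
    small = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(tuple(suffixes)):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size < min_bytes:
                    small.append((path, size))
    return sorted(small)


if __name__ == "__main__":
    # Example: flag overlap and assembly files smaller than 16 bytes.
    for path, size in find_small_outputs("01_rundir", [".ovl", ".fasta"], min_bytes=16):
        print("suspiciously small output: %s (%d bytes)" % (path, size))
```

Whether a small file is actually an error depends on the step, so a check like this is probably better as a warning than as a hard failure.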

Additional context (Optional)

output folder structure & sizes

01_rundir
├── [   6]  01.raw_align
│   ├── [ 234]  01.db_stat.sh
│   ├── [   0]  01.db_stat.sh.done
│   ├── [   3]  01.db_stat.sh.work
│   │   └── [   6]  db_stat0
│   │       ├── [ 504]  nextDenovo.sh
│   │       ├── [   0]  nextDenovo.sh.done
│   │       ├── [1013]  nextDenovo.sh.e
│   │       └── [  30]  nextDenovo.sh.o
│   └── [2.5K]  input.reads.stat
├── [   8]  02.cns_align
│   ├── [ 161]  01.split_seed.sh
│   ├── [   0]  01.split_seed.sh.done
│   ├── [   3]  01.split_seed.sh.work
│   │   └── [  18]  split_seed0
│   │       ├── [7.3G]  cns0.fasta
│   │       ├── [ 16K]  cns0.fasta.idx
│   │       ├── [7.0G]  cns1.fasta
│   │       ├── [ 16K]  cns1.fasta.idx
│   │       ├── [7.3G]  cns2.fasta
│   │       ├── [ 16K]  cns2.fasta.idx
│   │       ├── [7.3G]  cns3.fasta
│   │       ├── [ 16K]  cns3.fasta.idx
│   │       ├── [7.5G]  cns4.fasta
│   │       ├── [ 16K]  cns4.fasta.idx
│   │       ├── [7.1G]  cns5.fasta
│   │       ├── [ 16K]  cns5.fasta.idx
│   │       ├── [ 444]  nextDenovo.sh
│   │       ├── [   0]  nextDenovo.sh.done
│   │       ├── [1.2K]  nextDenovo.sh.e
│   │       └── [  30]  nextDenovo.sh.o
│   ├── [8.2K]  02.cns_align.sh
│   ├── [   0]  02.cns_align.sh.done
│   └── [  23]  02.cns_align.sh.work
│       ├── [   8]  cns_align00
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 675]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align01
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align02
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align03
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align04
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align05
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align06
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 675]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align07
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align08
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align09
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align10
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.2G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align11
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 675]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.3G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align12
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.3G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align13
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.3G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align14
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.3G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align15
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 675]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.0G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align16
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.0G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align17
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.0G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align18
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 675]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.5G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       ├── [   8]  cns_align19
│       │   ├── [   2]  cns.filt.dovt.ovl
│       │   ├── [   0]  cns.filt.dovt.ovl.bl
│       │   ├── [ 686]  nextDenovo.sh
│       │   ├── [   0]  nextDenovo.sh.done
│       │   ├── [7.5G]  nextDenovo.sh.e
│       │   └── [  30]  nextDenovo.sh.o
│       └── [   8]  cns_align20
│           ├── [   2]  cns.filt.dovt.ovl
│           ├── [   0]  cns.filt.dovt.ovl.bl
│           ├── [ 675]  nextDenovo.sh
│           ├── [   0]  nextDenovo.sh.done
│           ├── [7.0G]  nextDenovo.sh.e
│           └── [  30]  nextDenovo.sh.o
└── [  15]  03.ctg_graph
    ├── [2.6K]  01.ctg_graph.input.ovls
    ├── [ 732]  01.ctg_graph.input.seqs
    ├── [ 287]  01.ctg_graph.sh
    ├── [   0]  01.ctg_graph.sh.done
    ├── [   3]  01.ctg_graph.sh.work
    │   └── [   8]  ctg_graph0
    │       ├── [   0]  nd.asm.p.fasta
    │       ├── [   1]  nd.asm.p.fasta.blc
    │       ├── [ 568]  nextDenovo.sh
    │       ├── [   0]  nextDenovo.sh.done
    │       ├── [1.9K]  nextDenovo.sh.e
    │       └── [  30]  nextDenovo.sh.o
    ├── [2.3K]  02.ctg_align.sh
    ├── [   0]  02.ctg_align.sh.done
    ├── [   8]  02.ctg_align.sh.work
    │   ├── [   8]  ctg_align0
    │   │   ├── [  92]  cns2.fasta.sort.bam
    │   │   ├── [  16]  cns2.fasta.sort.bam.bai
    │   │   ├── [ 680]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [1.4K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   ├── [   8]  ctg_align1
    │   │   ├── [  92]  cns3.fasta.sort.bam
    │   │   ├── [  16]  cns3.fasta.sort.bam.bai
    │   │   ├── [ 680]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [1.4K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   ├── [   8]  ctg_align2
    │   │   ├── [  92]  cns0.fasta.sort.bam
    │   │   ├── [  16]  cns0.fasta.sort.bam.bai
    │   │   ├── [ 680]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [1.4K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   ├── [   8]  ctg_align3
    │   │   ├── [  92]  cns1.fasta.sort.bam
    │   │   ├── [  16]  cns1.fasta.sort.bam.bai
    │   │   ├── [ 680]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [1.4K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   ├── [   8]  ctg_align4
    │   │   ├── [  92]  cns4.fasta.sort.bam
    │   │   ├── [  16]  cns4.fasta.sort.bam.bai
    │   │   ├── [ 680]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [1.4K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   └── [   8]  ctg_align5
    │       ├── [  92]  cns5.fasta.sort.bam
    │       ├── [  16]  cns5.fasta.sort.bam.bai
    │       ├── [ 680]  nextDenovo.sh
    │       ├── [   0]  nextDenovo.sh.done
    │       ├── [1.4K]  nextDenovo.sh.e
    │       └── [  30]  nextDenovo.sh.o
    ├── [ 774]  03.ctg_cns.input.bams
    ├── [1.4K]  03.ctg_cns.sh
    ├── [   0]  03.ctg_cns.sh.done
    ├── [   5]  03.ctg_cns.sh.work
    │   ├── [   7]  ctg_cns0
    │   │   ├── [   0]  nd.asm.f.part000.fasta
    │   │   ├── [ 759]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [3.3K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   ├── [   7]  ctg_cns1
    │   │   ├── [   0]  nd.asm.f.part001.fasta
    │   │   ├── [ 759]  nextDenovo.sh
    │   │   ├── [   0]  nextDenovo.sh.done
    │   │   ├── [3.3K]  nextDenovo.sh.e
    │   │   └── [  30]  nextDenovo.sh.o
    │   └── [   7]  ctg_cns2
    │       ├── [   0]  nd.asm.f.part002.fasta
    │       ├── [ 759]  nextDenovo.sh
    │       ├── [   0]  nextDenovo.sh.done
    │       ├── [3.3K]  nextDenovo.sh.e
    │       └── [  30]  nextDenovo.sh.o
    └── [   0]  nd.asm.fasta

Note that the cns_align subdirectories in 02.cns_align.sh.work each contain a large nextDenovo.sh.e file, and each of those files reports warnings about an alignment length of 0.

I traced the error back to the 01.split_seed.sh command. split_cns.py expects a fasta file, so when a fastq file is provided the output is a set of invalid files. Converting my fastq to fasta resolves the error.

$ cat 01.split_seed.sh
/local/cluster/bin/python3 /local/cluster/NextDenovo-2.4.0/lib/split_cns.py  -f /nfs1/MICRO/Bartholomew_Lab/davised/pacbio/nextdenovo/./input.fofn -l 18407 -c 6
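
For anyone needing the same workaround, a minimal sketch of a fastq to fasta conversion using only the Python standard library. It assumes simple four-line FASTQ records (no wrapped sequence lines) and treats any path ending in `.gz` as gzip-compressed. The helper names `open_text` and `fastq_to_fasta` are hypothetical, and this is not how split_cns.py works internally.

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode, based on its suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def fastq_to_fasta(fastq_path, fasta_path):
    """Convert four-line FASTQ records to FASTA; return the record count."""
    count = 0
    with open_text(fastq_path) as src, open_text(fasta_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            if not header.startswith("@"):
                raise ValueError("expected '@' header, got: %r" % header[:40])
            sequence = src.readline()
            src.readline()  # '+' separator line, ignored
            src.readline()  # quality line, ignored
            dst.write(">" + header[1:].rstrip("\r\n") + "\n")
            dst.write(sequence.rstrip("\r\n") + "\n")
            count += 1
    return count


if __name__ == "__main__":
    written = fastq_to_fasta("m64047_210502_022250.ccs.fastq.gz",
                             "m64047_210502_022250.ccs.fasta")
    print("wrote %d records" % written)
```

The converted file would then be listed in input.fofn in place of the fastq.gz file.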

I'm not sure whether it makes more sense to add fastq support to split_cns.py or to disallow fastq as input to the hifi workflow.
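
Either option needs a way to tell the two formats apart before the split step runs. A minimal sketch, assuming the first non-blank character is a sufficient signal ('>' for fasta, '@' for fastq) and that gzip files can be recognized by their two-byte magic number. The names `sniff_format` and `require_fasta` are hypothetical and do not exist in NextDenovo.

```python
import gzip


def sniff_format(path):
    """Return 'fasta', 'fastq', 'unknown', or 'empty' for a plain or gzip file."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "empty"


def require_fasta(path):
    """Raise ValueError early if `path` does not look like a fasta file."""
    detected = sniff_format(path)
    if detected != "fasta":
        raise ValueError("%s looks like %s, but fasta input is required"
                         % (path, detected))
    return path
```

Failing early with a message like this would point at the input file directly, instead of at an empty assembly several steps later.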

Thanks for this software and I'm excited to compare the outputs to my other assemblies (based on what my colleagues have told me this should compare favorably).

Author

davised commented Aug 11, 2021

While I was typing up this bug report the fasta version finished.

$ cat nd.asm.fasta.stat
Type           Length (bp)            Count (#)
N10             27062032                   1
N20             27062032                   1
N30             25992854                   2
N40             25992854                   2
N50             11566870                   3
N60             11193937                   4
N70              7931554                   5
N80              7209047                   6
N90              2551250                   8

Min.               24333                   -
Max.            27062032                   -
Ave.             1747497                   -
Total          108344837                  62

Very happy with these stats! I still need to make sure everything looks OK, since my inputs are imperfect (not a monoculture), but I'm optimistic for now.

Cheers

Member

moold commented Aug 12, 2021

This is a known bug; see #119. You can convert the fastq file to a fasta file to avoid this error. I will fix it in the next release.

Author

davised commented Aug 12, 2021

Thanks for taking the time to respond. I look forward to your next release.

davised closed this as completed Aug 12, 2021

@Neato-Nick

Can I use a fasta.gz or does it need to be uncompressed fasta?

Author

davised commented Oct 1, 2021

fasta.gz should work fine. You can always test it and then gunzip the file if it fails.

@Neato-Nick

Confirmed that gzipped fasta works fine.
