read_type: hifi; File ".../nextDenovo", line 856, in <module> main(args) ... IndexError: list index out of range #119

Closed
Ural-Yunusbaev opened this issue Jul 5, 2021 · 6 comments

@Ural-Yunusbaev

Describe the bug
I ran nextDenovo under SLURM on 1 node with 60 cores and 70G RAM:
sbatch --nodes=1 --ntasks=1 --cpus-per-task=60 --mem=70G ./_NextDenovo2.4_slurm.sh

When I run nextDenovo on test.ecoli.HiFi.fastq with
read_type: hifi
it reports: File ".../nextDenovo", line 856, in <module> main(args) ... IndexError: list index out of range
Meanwhile, when I run other read types from the same organism with
read_type: clr OR ont
it runs smoothly.

I tried bacterial, insect, and plant genomes with HiFi reads and got the same error.
Meanwhile, the same organisms run smoothly with CLR or ONT reads.

Error message
log message

nextDenovo /scratch/ural/ecolHiFi/run.cfg
[INFO] 2021-07-04 17:41:09,106 NextDenovo start...
[INFO] 2021-07-04 17:41:09,239 version:v2.4.0 logfile:pid99685.log.info
[WARNING] 2021-07-04 17:41:09,240 Re-write workdir
[INFO] 2021-07-04 17:41:09,242 mkdir: /scratch/ural/ecolHiFi/01_rundir
[INFO] 2021-07-04 17:41:09,243 mkdir: /scratch/ural/ecolHiFi/01_rundir/01.raw_align
[INFO] 2021-07-04 17:41:09,244 mkdir: /scratch/ural/ecolHiFi/01_rundir/02.cns_align
[INFO] 2021-07-04 17:41:09,245 mkdir: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph
[INFO] 2021-07-04 17:41:14,259 Total jobs: 1
[INFO] 2021-07-04 17:41:14,260 Submit jobID:[99688] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/01.raw_align/01.db_stat.sh.work/db_stat0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:18,116 db_stat done
[INFO] 2021-07-04 17:41:18,119 updated options:
rerun:                        3
deltmp:                       1
rewrite:                      1
task:                         assemble
job_type:                     local
read_cutoff:                  1k
read_type:                    hifi
parallel_jobs:                2
seed_depth:                   40.0
pa_correction:                2
seed_cutfiles:                3
genome_size:                  4.8m
seed_cutoff:                  16242
input_type:                   corrected
blocksize:                    1797250571
job_prefix:                   nextDenovo
ctg_cns_options:              -sp -p 10
nextgraph_options:            -a 1 -R 0.7
minimap2_options_map:         -x asm20
sort_options:                 -m 40g -t 8 -k 40 -k 40
minimap2_options_raw:         -t 8 -x ava-hifi
workdir:                      /scratch/ural/ecolHiFi/01_rundir
input_fofn:                   /scratch/ural/ecolHiFi/input.fofn
correction_options:           -p 10 -max_lq_length 10000
raw_aligndir:                 /scratch/ural/ecolHiFi/01_rundir/01.raw_align
cns_aligndir:                 /scratch/ural/ecolHiFi/01_rundir/02.cns_align
ctg_graphdir:                 /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph
minimap2_options_cns:         -t 60 -x ava-hifi --minide 0.1 --maxhan1 1000 -f 800
[INFO] 2021-07-04 17:41:18,119 summary of input data:
file: /scratch/ural/ecolHiFi/01_rundir/01.raw_align/input.reads.stat
[Read length stat]
Types            Count (#) Length (bp)
N10                   8025   16603
N20                  16625   15795
N30                  25586   15240
N40                  34844   14792
N50                  44359   14418
N60                  54112   14089
N70                  64079   13800
N80                  74245   13543
N90                  84605   13276

Types               Count (#)           Bases (bp)  Depth (X)
Raw                     95514           1389500381     289.48
Filtered                    0                    0       0.00
Clean                   95514           1389500381     289.48

*Suggested seed_cutoff (genome size: 4.80Mb, expected seed depth: 40, real seed depth: 40.00): 16242 bp
[INFO] 2021-07-04 17:41:23,130 Total jobs: 1
[INFO] 2021-07-04 17:41:23,131 Submit jobID:[99697] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/01.split_seed.sh.work/split_seed0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:29,498 split_seed done
[INFO] 2021-07-04 17:41:29,510 Total jobs: 6
[INFO] 2021-07-04 17:41:29,511 Submit jobID:[99708] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:30,013 Submit jobID:[99713] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align1/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:33,431 Submit jobID:[99976] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align2/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:33,989 Submit jobID:[99984] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align3/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:37,261 Submit jobID:[100188] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align4/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:37,847 Submit jobID:[100254] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align5/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:41,608 cns_align done
[INFO] 2021-07-04 17:41:46,619 Total jobs: 1
[INFO] 2021-07-04 17:41:46,620 Submit jobID:[100515] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:47,627 ctg_graph done
[INFO] 2021-07-04 17:41:52,639 Total jobs: 3
[INFO] 2021-07-04 17:41:52,640 Submit jobID:[100548] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:53,141 Submit jobID:[100580] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:53,643 Submit jobID:[100612] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:54,651 ctg_align done
[INFO] 2021-07-04 17:41:59,666 Total jobs: 2
[INFO] 2021-07-04 17:41:59,667 Submit jobID:[100645] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:42:00,169 Submit jobID:[100664] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns1/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:42:01,177 ctg_cns done
[INFO] 2021-07-04 17:42:01,178 remove temporary result: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align0/cns2.fasta.sort.bam
[INFO] 2021-07-04 17:42:01,180 remove temporary result: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/cns0.fasta.sort.bam
[INFO] 2021-07-04 17:42:01,181 remove temporary result: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/cns1.fasta.sort.bam
Traceback (most recent call last):
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo", line 856, in <module>
    main(args)
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo", line 827, in main
    asm, stat = gather_ctg_cns_output(cfg, task.subtasks, seq_info)
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo", line 291, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/lib/kit.py", line 171, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range
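
The last frame indexes `stat[-1]`, and a list raises `IndexError` there only when it is empty, so the final statistics step apparently received no contig lengths. A minimal Python sketch of that failure and of a defensive variant is below. `min_line` and `safe_min_line` are hypothetical helpers written for illustration, not NextDenovo code; only the format string is copied from the traceback, and treating the last element as the minimum assumes the list is sorted in descending order.

```python
def min_line(stat):
    """Format a 'Min.' row with the format string shown in the traceback."""
    # Assumption: stat holds contig lengths sorted from longest to shortest.
    return "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')


def safe_min_line(stat):
    """Hypothetical guard: report an empty assembly instead of crashing."""
    if not stat:
        return "no contigs to summarize\n"
    return min_line(stat)


if __name__ == "__main__":
    try:
        min_line([])  # empty list: stat[-1] raises IndexError
    except IndexError as err:
        print("reproduced:", err)
    print(safe_min_line([]), end="")
    print(safe_min_line([4641652, 1200]), end="")
```

If this reading is right, the crash is a symptom of an empty assembly rather than the root cause, so the input parsing is the place to look.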

Genome characteristics
ecoli 4.8m

Input data
ecoli 4.8m from https://sra-pub-src-1.s3.amazonaws.com/SRR10971019/m54316_180808_005743.fastq.1

Config file
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = assemble # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 2 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = hifi # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 4.8m # estimated genome size

[assemble_option]
minimap2_options_cns = -t 60
nextgraph_options = -a 1 # -q, min short branch len for output, 0=disable, set 5-16 to adjust the assembly size [0]
.

Operating system
Which operating system and version are you using?
You can use the command lsb_release -a to get it.
lsb_release -a
bash: lsb_release: command not found...

GCC
What version of GCC are you using?
You can use the command gcc -v to get it.
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Python
What version of Python are you using?
You can use the command python --version to get it.
python3 --version
Python 3.6.8
head -1 /homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo
#!/usr/bin/env python3

NextDenovo
What version of NextDenovo are you using?
You can use the command nextDenovo -v to get it.
nextDenovo v2.4.0

To Reproduce (Optional)
Steps to reproduce the behavior. Providing a minimal test dataset on which we can reproduce the behavior will generally lead to quicker turnaround time!

Additional context (Optional)
Add any other context about the problem here.

@moold
Member

moold commented Jul 6, 2021

Hi, the current version has a bug: it cannot parse FASTQ files correctly for HiFi or corrected data. As a workaround, convert the FASTQ file to a FASTA file.
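
One possible way to do the FASTQ-to-FASTA conversion with only the Python standard library is sketched here. It assumes uncompressed input with strict four-line records (no wrapped sequences); `fastq_to_fasta` and `convert_file` are hypothetical names chosen for this example, not part of NextDenovo.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of 4-line FASTQ records."""
    lines = iter(fastq_lines)
    for header in lines:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: %r" % header)
        seq = next(lines, "").rstrip("\n")
        next(lines, None)  # skip the '+' separator line
        next(lines, None)  # skip the quality line
        yield ">" + header[1:]
        yield seq


def convert_file(fastq_path, fasta_path):
    """Stream one FASTQ file into a FASTA file without loading it in memory."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        for out_line in fastq_to_fasta(src):
            dst.write(out_line + "\n")
```

Calling `convert_file("test.ecoli.HiFi.fastq", "test.ecoli.HiFi.fasta")` would write the FASTA file; the new path would then presumably replace the FASTQ entry in `input.fofn`.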

@Ural-Yunusbaev
Author

Thanks!

@Tzu-Haw

Tzu-Haw commented Jul 15, 2021

Hi, I've encountered a similar issue.
nextDenovo worked with raw Nanopore reads, but when I changed the input to the corrected reads I got a similar error message. I converted the fastq file to a fasta file, but it still did not work.

Here is the error message:
File "/usr/local/bin/nextDenovo", line 856, in <module>
main(args)
File "/usr/local/bin/nextDenovo", line 609, in main
reset_cfg(cfg)
File "/usr/local/bin/nextDenovo", line 530, in reset_cfg
tcfg.update(int(g.group(1)), int(g.group(3)), float(g.group(2)))
File "/mnt/data1/bioinfo/NextDenovo/lib/config_parser.py", line 36, in update
gs = parse_num_unit(self.cfg['genome_size'])
File "/mnt/data1/bioinfo/NextDenovo/lib/kit.py", line 120, in parse_num_unit
value = float(contents[0][:-2])
ValueError: could not convert string to float: 'au'

@moold
Member

moold commented Jul 16, 2021

It seems the genome_size you set is not valid. Could you paste your config file here?

@Tzu-Haw

Tzu-Haw commented Jul 16, 2021

Hi,
You are right! I had accidentally set my genome_size to auto, and the error went away after I changed it to the estimated genome size.
Thanks for your help!

Best,
Tzu-Haw
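
The `'au'` in the traceback is what remains of `auto` after the last two characters are sliced off (`'auto'[:-2]`), which `float()` then rejects. A small pre-flight check along the following lines could catch such a value before a run starts. This is only a sketch: `parse_genome_size` is a hypothetical helper, not NextDenovo's parser, and the accepted suffixes (`k`, `m`, `g`, with an optional trailing `b`) are an assumption based on the values seen in this thread.

```python
import re

# Multipliers for size suffixes; "k" and "m" appear in the thread, "g" is assumed.
_UNITS = {"k": 1e3, "m": 1e6, "g": 1e9}


def parse_genome_size(text):
    """Return the size in bases, or raise ValueError with a readable message."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg])?b?\s*", text.lower())
    if match is None:
        raise ValueError("genome_size must look like '4.8m', got %r" % text)
    number, unit = match.groups()
    # A missing suffix means the number is already in bases.
    return int(round(float(number) * _UNITS.get(unit, 1)))


if __name__ == "__main__":
    print(parse_genome_size("4.8m"))
    try:
        parse_genome_size("auto")
    except ValueError as err:
        print("rejected:", err)
```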

@DaniPaulo

Hi @Ural-Yunusbaev,
I'm still trying to figure out how to run NextDenovo in an HPC environment using SLURM.
Would you be able to share your NextDenovo2.4_slurm.sh and run.cfg with me?
