segmentation fault after ctg_graph was done #153

Closed
HippoYI opened this issue Sep 5, 2022 · 10 comments

HippoYI commented Sep 5, 2022

Describe the bug
I am assembling a genome of about 300 Mb (0.6% heterozygosity rate) on a machine with 512 GB of memory. The ultra-long read depth is about 27X.

Error message
The program ran well and produced nd.asm.p.fasta after running ctg_graph, but then it stopped and reported "Segmentation fault (core dumped)". This means that the program failed to run "02.ctg_align" and "03.ctg_cns". I have tried many parameter settings in run.cfg and even switched to a machine with 2 TB of memory, but the error still occurs at the same point.

Input data
Total base count=8358015912bp, sequencing depth=27X, average/N50 read length=100709
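As a quick sanity check, the stated depth follows from dividing the total base count by the genome size. A minimal sketch, assuming the genome_size of 300m means 300,000,000 bp:

```python
# Figures taken from the "Input data" line above.
total_bases = 8_358_015_912

# Assumption: "300m" is read as 300,000,000 bp.
genome_size = 300_000_000

depth = total_bases / genome_size
print(f"approximate depth: {depth:.1f}X")
```

The result is a little under 28X, which matches the reported 27X if the value was truncated rather than rounded.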

Config file
[General]
job_type = local
job_prefix = nextDenovo
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 2
input_type = raw
read_type = ont
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 300m
sort_options = -m 40g -t 5
minimap2_options_raw = -t 5
pa_correction = 5
correction_options = -p 4

[assemble_option]
minimap2_options_cns = -t 5
nextgraph_options = -a 1 -q 10

Operating system
CentOS Linux release 7.9.2009

GCC

Python
Python 2.7.5 and Python 3.6.2

NextDenovo
2.5.0

The FAQ mentions that nd.asm.p.fasta contains more structural and base errors than nd.asm.fasta, so I really want to solve this. Any ideas or suggestions on how to fix this problem?

Thank you!

Member

moold commented Sep 5, 2022

Could you share the failed subtask log here?

Author

HippoYI commented Sep 5, 2022

I have attached the run log and the *.e file from the "ctg_graph1" directory, which corresponds to the last (and failed) subtask. I am not sure whether that is what you need; if not, please let me know.
nextDenovo.sh.e.txt
pid6864.log.txt

Member

moold commented Sep 5, 2022

See the instructions below:
Error message
Paste the complete log message, including the main task log and the failed subtask log.
The main task log is usually located in your working directory and is named pidXXX.log.info. Its last few lines tell you where the failed subtask log is, for example:

[ERROR] 2020-07-01 11:06:57,184 cns_align failed: please check the following logs:
[ERROR] 2020-07-01 11:06:57,185 ~/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/nextDenovo.sh.e
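The lookup described above can be sketched in a few lines of Python. This is only an illustration, not part of NextDenovo: the function name is hypothetical, and the pattern assumes the exact "[ERROR] date time message" layout of the two example lines, which may differ between versions.

```python
import re

# Matches timestamped [ERROR] lines in the layout quoted above.
# Assumption: other NextDenovo versions may format log lines differently.
ERROR_LINE = re.compile(r"^\[ERROR\]\s+\S+\s+\S+\s+(?P<msg>.*)$")


def failed_subtask_logs(main_log_text):
    """Return the subtask log paths named on [ERROR] lines of a main task log."""
    paths = []
    for line in main_log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match is None:
            continue
        msg = match.group("msg").strip()
        # Assumption: a path line is an [ERROR] message that ends in ".e".
        if msg.endswith(".e"):
            paths.append(msg)
    return paths


example = """\
[ERROR] 2020-07-01 11:06:57,184 cns_align failed: please check the following logs:
[ERROR] 2020-07-01 11:06:57,185 ~/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/nextDenovo.sh.e
"""
print(failed_subtask_logs(example))
```

In practice the text would come from reading the pidXXX.log.info file in the working directory.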

Author

HippoYI commented Sep 7, 2022

Because I did not save the screen output last time, I reran the program over the last two days. As you can see in "snapshot.jpg", the subtask did not give any error message, just "Segmentation fault (core dumped)" after ctg_graph was done.

snapshot

Member

moold commented Sep 7, 2022

Hi,
Actually, you don't have to rerun the whole process; see here for how to continue running unfinished tasks.

As for the segmentation fault, I guess it is caused by the calgs function in the file lib/kit.py, so you can replace that function with the following Python code:

def calgs(infile):
	from Bio import SeqIO
	gs = 0
	for seq_record in SeqIO.parse(infile, "fasta"):
		gs += len(seq_record.seq)
	return gs
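If Biopython is not installed in the Python environment that runs NextDenovo, the same total can be computed with the standard library alone. The sketch below is an assumption-laden alternative, not NextDenovo's own code: the names sum_fasta_lengths and calgs_stdlib are hypothetical, and it assumes a plain or gzip-compressed FASTA file.

```python
import gzip


def sum_fasta_lengths(lines):
    """Sum the sequence lengths in an iterable of FASTA lines."""
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and ">" header lines; count everything else.
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


def calgs_stdlib(infile):
    """Hypothetical stand-in for calgs: total bases in a FASTA file."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if infile.endswith(".gz") else open
    with opener(infile, "rt") as handle:
        return sum_fasta_lengths(handle)


print(sum_fasta_lengths([">ctg1", "ACGT", "AC", ">ctg2", "GG"]))  # 8
```

Both versions only add up sequence lengths, so they should return the same number for a well-formed FASTA file.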

Author

HippoYI commented Sep 7, 2022

Hi, I replaced the calgs function in kit.py and got this output:

[56473 INFO] 2022-09-07 15:27:58 skip step: db_split
[56473 INFO] 2022-09-07 15:27:58 skip step: raw_align
[56473 INFO] 2022-09-07 15:27:58 skip step: sort_align
[56473 INFO] 2022-09-07 15:27:58 skip step: seed_cns
[56473 INFO] 2022-09-07 15:27:58 seed_cns finished, and final corrected reads file:
[56473 INFO] 2022-09-07 15:27:58 /data/yixin/projects/JH_genome_analysis/New_genome_assembly_related/NextD-assembly/./01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta
[56473 INFO] 2022-09-07 15:27:58 skip step: cns_align
[56473 INFO] 2022-09-07 15:27:58 skip step: ctg_graph
Segmentation fault (core dumped)

Member

moold commented Sep 7, 2022

OK. Next, try changing two lines in the file nextDenovo. Change this line:
total_seed_len = cal_total_seed_len(get_seed_files(idx=True))
to:
total_seed_len = 1000
and change this line:
minlen = cal_minlen_from_idx(part_idx_files, len(part_idx_files), gs * mindepth - total_seed_len)
to:
minlen = 2000
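The expression gs * mindepth - total_seed_len reads like a base budget: genome size times a minimum depth, minus the bases already covered by seeds. The sketch below is only a hypothetical illustration of how a length cutoff could be derived from such a budget. It is not NextDenovo's cal_minlen_from_idx, whose actual logic is not shown in this thread, and every name in it is made up.

```python
def length_cutoff_for_budget(read_lengths, target_bases):
    """Return the largest cutoff such that reads at or above it
    together contain at least target_bases bases.

    Returns 0 if all reads together cannot reach the target.
    Hypothetical illustration only, not NextDenovo's implementation.
    """
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= target_bases:
            return length
    return 0


print(length_cutoff_for_budget([100, 80, 50, 20], 200))  # 50
```

Hard-coding the cutoff, as suggested above, skips this kind of calculation entirely.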

Author

HippoYI commented Sep 7, 2022

Wow, great! It worked after changing those two lines, and now I can finally get "nd.asm.fasta". I am just curious about the changes: will fixing the total seed length at 1000 affect the final contig correction?

Member

moold commented Sep 8, 2022

For your data, it should not.

moold closed this as completed Sep 8, 2022

Author

HippoYI commented Sep 8, 2022

Thanks so much. I really appreciate your help in resolving this!
