More details on w2rap-contigger step 5 needed - run with 100bp PE short reads (Illumina) #48

Open
wul88 opened this issue Sep 2, 2020 · 1 comment

wul88 commented Sep 2, 2020

Hi, dear development team and fellow users,

I am trying to use w2rap-contigger to assemble a mammalian genome (~2.5 Gbp in size) using 70X coverage of paired-end 100 bp short reads.
I ran this on a PBS cluster, on a node with 40 cores and 25 GB/core. I have completed the first 4 steps (step by step). Step 5 stopped when it ran out of the 48-hour walltime. The following is the command and the output from the program:

w2rap-contigger -t 30 -m 1000 -r ../trimmed_reads/kimba_fastptrim_1.fq,../trimmed_reads/kimba_fastptrim_2.fq -o contig_dir -p Kimba --min_freq 10 -d 38 -K 72 --from_step 5 --to_step 5 --dump_all 1

Welcome to w2rap-contigger
Loading reads in fastb/qualp format...
DONE!
Reading large_K clean graph and paths...
DONE!
--== Step 5: Assembling gaps ==--
Mon Aug 31 15:12:52 2020: inverting paths
Mon Aug 31 15:39:34 2020: Finding unsatisfied path clusters
Mon Aug 31 17:03:23 2020: Merging 117884534 clusters
Wed Sep 02 14:05:41 2020: 6732306 non-inverted clusters
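
For anyone budgeting walltime from a log like this, here is a minimal Python sketch that turns the timestamps into per-stage durations. It assumes each line is printed when its stage starts, so a stage's duration is the gap to the next line. That is a guess about the logging, not documented w2rap-contigger behaviour. The names `parse_log` and `stage_durations` are made up for illustration.

```python
from datetime import datetime

# Stage timestamps copied from the step 5 log above.
LOG = """\
Mon Aug 31 15:12:52 2020: inverting paths
Mon Aug 31 15:39:34 2020: Finding unsatisfied path clusters
Mon Aug 31 17:03:23 2020: Merging 117884534 clusters
Wed Sep 02 14:05:41 2020: 6732306 non-inverted clusters
"""

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = len("Mon Aug 31 15:12:52 2020")


def parse_log(text):
    """Return (timestamp, message) pairs, one per non-empty log line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The timestamp is fixed-width and is followed by ": " and the message.
        stamp, message = line[:STAMP_LENGTH], line[STAMP_LENGTH + 2:]
        entries.append((datetime.strptime(stamp, STAMP_FORMAT), message))
    return entries


def stage_durations(entries):
    """Pair each message with the time elapsed until the next log line."""
    return [
        (message, later - earlier)
        for (earlier, message), (later, _) in zip(entries, entries[1:])
    ]


durations = stage_durations(parse_log(LOG))
for message, elapsed in durations:
    print(f"{elapsed.total_seconds() / 3600:6.2f} h  {message}")
```

Under that assumption, almost all of the 48 hours went into the gap after "Merging 117884534 clusters" (about 45 hours), and the two earlier stages together took under 2 hours.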

First, is the program limited to 250 bp PE reads? Has anyone run it successfully with shorter reads?
Second, could anyone tell me how many more tasks are involved in step 5, or what walltime I should set?
Third, I checked the processes running on the node. w2rap-contigger seems to be running on one thread only; its CPU usage is at most 100%. I wonder whether I should request fewer cores with a longer walltime. Here is the process info:

Tasks: 1117 total, 3 running, 1114 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.4%us, 0.3%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1058717736k total, 644741916k used, 413975820k free, 1819956k buffers
Swap: 8388604k total, 24684k used, 8363920k free, 38329132k cached
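
As a rough cross-check of the single-thread observation, the sketch below converts the node-wide figures from `top` into busy-core equivalents and GiB. It assumes the `Cpu(s)` percentages are averaged over the 40 cores mentioned in the post and that the `k` suffix means KiB. The helper names `busy_cores` and `memory_gib` are hypothetical.

```python
import re

# Two lines copied from the top output above.
TOP_OUTPUT = """\
Cpu(s): 4.4%us, 0.3%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1058717736k total, 644741916k used, 413975820k free, 1819956k buffers
"""

NODE_CORES = 40  # from the post; assumed to be the CPU count top averages over


def busy_cores(top_text, cores):
    """Convert top's node-wide user+system percentage into busy-core equivalents."""
    user = float(re.search(r"([\d.]+)%us", top_text).group(1))
    system = float(re.search(r"([\d.]+)%sy", top_text).group(1))
    return (user + system) / 100.0 * cores


def memory_gib(top_text, field):
    """Read one field of the Mem: line (assumed to be in KiB) and return GiB."""
    kib = int(re.search(rf"Mem:.*?(\d+)k {field}", top_text).group(1))
    return kib / 1024**2


print(f"busy cores:  {busy_cores(TOP_OUTPUT, NODE_CORES):.2f}")
print(f"memory used: {memory_gib(TOP_OUTPUT, 'used'):.0f} GiB "
      f"of {memory_gib(TOP_OUTPUT, 'total'):.0f} GiB")
```

With these numbers the whole node is doing about two cores' worth of work. That is consistent with one busy w2rap-contigger thread plus background activity, and a little over 600 GiB of memory is in use.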

Any help is appreciated. Thanks in advance!

Lan

@bjclavijo
Member

Hi Lan,

Sorry about the delay, we've not been keeping a very keen eye on issues lately!

Yes, w2rap-contigger will run OK with 100bp reads as long as you set large K to a value smaller than 100 (which you have done).
Step 5 starts using all processors just after the point you reached, so I would recommend running on 16 to 32 CPUs if possible. What is a bit strange in your case is that you have WAY too many unsatisfied clusters. That points to the graph from the first steps being too disconnected. I notice you're using trimmed reads; you probably want to use untrimmed reads, and also have a look at your k-mer spectrum to check what you're up against (use KAT for that, or any other k-mer analysis tool).
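
For readers unfamiliar with the term, a k-mer spectrum is a histogram of how many distinct k-mers occur once, twice, three times, and so on in the reads. The toy Python sketch below only illustrates the idea. It is not how KAT works internally, the reads are made up, and for a genome of this size a dedicated k-mer counter is needed in practice. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Hypothetical toy reads, not taken from the data set in this thread.
toy_reads = ["ACGTACGTAC", "CGTACGTACG", "ACGTNCGTAC"]
spectrum = kmer_spectrum(toy_reads, k=4)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

On real data the spectrum is usually plotted. The position of the main peak gives a rough idea of the effective k-mer coverage, and a large low-multiplicity tail typically reflects sequencing errors.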

Best,

bj
