This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

too long time running run_filter_stage2 #372

Open
tangerzhang opened this issue May 23, 2016 · 6 comments


@tangerzhang

tangerzhang commented May 23, 2016

Hello,
I am working on a PacBio assembly of a plant genome and have 52X coverage of corrected reads.
After I fed these preads to FALCON, run_filter_stage2 has been running for more than two days and has still not finished.
I checked the las.fofn file, which contains 323036 lines. Is the long running time caused by having so many las files?
Is that normal? Any suggestions?
Thanks a lot!

### My config file looks like:
[General]
input_fofn = preads.fofn
input_type = preads
length_cutoff = 10000
length_cutoff_pr = 9000 
sge_option_da = -pe orte 8 -q all.q
sge_option_la = -pe orte 8 -q all.q
sge_option_pda = -pe orte 8 -q all.q
sge_option_pla = -pe orte 8 -q all.q
sge_option_fc = -pe orte 8 -q all.q
sge_option_cns = -pe orte 8 -q all.q
pa_concurrent_jobs = 60
cns_concurrent_jobs = 60
ovlp_concurrent_jobs = 60
pa_HPCdaligner_option =  -v -dal4 -t16 -e.70 -l1000 -s1000  
ovlp_HPCdaligner_option = -l4800 -k18 -h480 -w8 -H15000 -M32
pa_DBsplit_option = -x200 -s50
ovlp_DBsplit_option = -x200 -s50
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 3  --max_n_read 200 --n_core 6 
overlap_filtering_setting = --max_diff 100 --max_cov 80 --min_cov 2 --bestn 10 --n_core 24
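
A quick way to see which block-grouping flag each daligner option line carries is to parse the config and scan the tokens. The following is only a minimal sketch: the helper name `grouping_flag` is hypothetical, and it assumes the flag is spelled `-dal<N>` or `-B<N>`, the two spellings mentioned in the reply below.

```python
import configparser
import re


def grouping_flag(cfg_text, key="ovlp_HPCdaligner_option"):
    """Return the -dal<N> or -B<N> token in the given option line, or None.

    Assumption: the block-grouping flag is written as -dal<N> (older
    versions) or -B<N> (newer versions), as described in this thread.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    options = parser.get("General", key, fallback="")
    for token in options.split():
        if re.fullmatch(r"-(dal|B)\d+", token):
            return token
    return None


# Two option lines copied from the config above.
EXAMPLE = """
[General]
pa_HPCdaligner_option =  -v -dal4 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -l4800 -k18 -h480 -w8 -H15000 -M32
"""

print(grouping_flag(EXAMPLE, "pa_HPCdaligner_option"))    # -dal4
print(grouping_flag(EXAMPLE, "ovlp_HPCdaligner_option"))  # None
```

For the config in this post, the pre-assembly line has `-dal4` while the overlap line has no such flag.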
@pb-jchin
Contributor

Yes. You need the -dal option in the ovlp_HPCdaligner_option parameters. You have way too many small las files for the filter to go through, and the excessive number of shell processes is probably the cause of the slowness. Try "-dal128" (in newer versions, "-B128") to reduce the number of merged files in the final overlapping stage. I typically check how many merge jobs there will be by examining 1-preads_ovl/run_jobs.sh.
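
Counting the merge jobs in `1-preads_ovl/run_jobs.sh` can be scripted. The sketch below tallies the first word of each command line; it assumes that merge jobs appear as lines beginning with `LAmerge`, which may not hold for every FALCON or DALIGNER version. The sample script text is a made-up excerpt, not output from a real run.

```python
from collections import Counter


def count_commands(script_text):
    """Tally the first word of every non-empty, non-comment line."""
    counts = Counter()
    for line in script_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        counts[line.split()[0]] += 1
    return counts


# Hypothetical excerpt for illustration; a real run_jobs.sh will differ.
SAMPLE = """\
# Daligner jobs
daligner -l4800 preads.1 preads.1
daligner -l4800 preads.2 preads.1 preads.2
# Merge jobs (assumed to start with LAmerge)
LAmerge preads.1 L1.1.1 L1.1.2
LAmerge preads.2 L1.2.1 L1.2.2
"""

counts = count_commands(SAMPLE)
print(counts["LAmerge"])
```

On a real run, one would pass the file contents instead, for example `count_commands(open("1-preads_ovl/run_jobs.sh").read())`, and compare the count before and after changing the grouping flag.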

@pb-jchin
Contributor

Another note: if you already have a large number of small las files, you could merge them manually and have fc_ovlp_filter.py take the merged las files as input. However, you have to make sure there are no redundant entries in the merged files.

@tangerzhang
Author

Thanks Jason.
I have resubmitted the job with -dal128.
I will see the results. That would take too long.


@tangerzhang
Author

tangerzhang commented May 26, 2016

Hi Jason,
I tried -B128 but still have the same problem.
I think it might be a bug that appeared after I updated to the latest FALCON release.
My previous run (a successful case), in which I used FALCON v0.4, generated a las.fofn file containing only preads.*.las. The content of that las.fofn is shown below:

/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.62.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.73.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.104.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.63.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.132.las

However, the failed run (latest FALCON release) generated a las.fofn file that contains all las files, including L1.*.las, L2.*.las and preads.*.las. Part of the file is shown below:

/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.114.las
/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.207.las
...

Is this a bug, or did I do something wrong?
For now I can only use preads.*las, but I would like to know what causes this problem so that I can avoid it in the future.
Thanks!
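
The difference between the two las.fofn files can be checked by grouping entries by the first dot-separated field of the file name. This is a rough sketch with hypothetical helper names (`tally_by_prefix`, `keep_preads`); it assumes one path per line and that the final merged files are the ones named `preads.*.las`, as in the successful v0.4 run above.

```python
import posixpath
from collections import Counter


def tally_by_prefix(paths):
    """Count las files by the first dot-separated field of the file name."""
    counts = Counter()
    for path in paths:
        name = posixpath.basename(path.strip())
        if name:
            counts[name.split(".")[0]] += 1
    return counts


def keep_preads(paths):
    """Keep only entries whose file name looks like preads.*.las."""
    kept = []
    for path in paths:
        name = posixpath.basename(path.strip())
        if name.startswith("preads.") and name.endswith(".las"):
            kept.append(path.strip())
    return kept


# Paths copied from the two listings in this comment.
SAMPLE = [
    "/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.62.las",
    "/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.114.las",
    "/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.207.las",
]

print(tally_by_prefix(SAMPLE))
print(keep_preads(SAMPLE))
```

On a real file, the lines of las.fofn would be passed in place of `SAMPLE`. A large count under `L1` or `L2` would indicate that intermediate files were listed alongside the merged ones.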

@pb-jchin
Contributor

Yes, it is a bug. I have already submitted a PR; see #367.

@pb-cdunn

Could you tell us which commit you are using (run git rev-parse HEAD)? Did you simply download the latest release? I am about to issue a new release with the fix.

The good news is that you will not need to re-run everything. After updating FALCON (the tip of master is fine), simply:

rm -rf 2-*/
rm -rf 1-*/

And restart. Stage-0 should be fine.
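
For anyone who prefers to preview the cleanup before deleting anything, the two `rm -rf` commands above could be approximated as follows. This is only a sketch under the assumption that the stage directories sit directly inside the run directory and are named with the `1-` and `2-` prefixes; the function names are hypothetical, and the default is a dry run.

```python
import glob
import os
import shutil


def stage_dirs(run_dir, prefixes=("1-", "2-")):
    """List directories directly under run_dir whose names start with a prefix."""
    found = []
    for prefix in prefixes:
        pattern = os.path.join(glob.escape(run_dir), prefix + "*")
        found.extend(d for d in glob.glob(pattern) if os.path.isdir(d))
    return sorted(found)


def clean_stages(run_dir, dry_run=True):
    """Remove the stage-1 and stage-2 directories; only print them if dry_run."""
    targets = stage_dirs(run_dir)
    for directory in targets:
        if dry_run:
            print("would remove", directory)
        else:
            shutil.rmtree(directory)
    return targets
```

Calling `clean_stages(".")` from the run directory lists what would be removed; `clean_stages(".", dry_run=False)` actually deletes it. Directories with a `0-` prefix are left untouched, in line with the note that stage 0 should be fine.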
