About ld refinement on putative somatic SNVs, spend a lot of time #18
Comments
Yes, the cellScan step usually takes a long time, since we need to extract cell-level read information. Could you let me know how many cells you have included? You can select cells using the option
@jinzhuangdou thank you for your reply! The sample I ran had 8609 cells.
The

    for cell in cell_lst:
        para = "merge" + ":" + cell + ":" + args.out + ":" + args.app_path
        joblst.append(para)
    with Pool(processes=args.nthreads) as pool:
        result = pool.map(bamSplit, joblst)  # <--- bamSplit scans the whole BAM file for each cell

This means that if 8609 cells take 30 hours, 2x8609 cells would take five days and 10x8609 cells would take four months, assuming the BAM file grows in proportion to the number of cells, so that total runtime grows roughly quadratically. It could be rewritten to scan the BAM file just once, writing all the cell-specific BAM files in parallel and on the fly. That would likely reduce execution time from 30+ hours to a few minutes. It would make it possible to run Monopogen on much larger samples.
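A minimal sketch of the single-pass idea, not Monopogen's actual code. The names `split_single_pass` and `split_bam` are hypothetical. It also assumes that `pysam` is installed and that each read carries its cell barcode in the `CB` tag, which may not match how `bamSplit` identifies cells. The core routine is kept independent of `pysam` so it can be checked on plain Python data.

```python
def split_single_pass(records, barcode_of, open_writer, wanted):
    """Route every record to its cell's writer in ONE pass over `records`.

    records     -- any iterable of reads (hypothetical: a pysam iterator)
    barcode_of  -- function mapping a record to its cell barcode (or None)
    open_writer -- function mapping a barcode to an object with write()/close()
    wanted      -- set of barcodes to keep; all other records are skipped
    Returns a dict of per-cell record counts.
    """
    writers = {}
    counts = {}
    try:
        for rec in records:
            cell = barcode_of(rec)
            if cell not in wanted:
                continue
            writer = writers.get(cell)
            if writer is None:
                # Open each per-cell output lazily, the first time it is seen.
                writer = writers[cell] = open_writer(cell)
            writer.write(rec)
            counts[cell] = counts.get(cell, 0) + 1
    finally:
        # Close every output even if the scan fails part-way through.
        for writer in writers.values():
            writer.close()
    return counts


def split_bam(bam_path, out_dir, wanted, tag="CB"):
    """Hypothetical pysam wrapper: one scan of the BAM, one output per cell."""
    import pysam  # assumption: pysam is available in the environment

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return split_single_pass(
            bam.fetch(until_eof=True),  # sequential read, no index needed
            lambda read: read.get_tag(tag) if read.has_tag(tag) else None,
            lambda cell: pysam.AlignmentFile(
                f"{out_dir}/{cell}.bam", "wb", template=bam
            ),
            wanted,
        )
```

One caveat with this design: it keeps one open output file per cell, so with thousands of cells it can exceed the operating system's open-file limit. Raising that limit, or processing the cells in batches of a few hundred per pass, are two possible workarounds.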
Hello,
I am running somatic SNV calling on single-cell RNA sequencing (scRNA-seq) data. The second step (cellScan) takes a lot of time, about 30+ hours. Is there any way to improve the running speed?
The BAM file is about 8 GB, and the machine has 32 CPU cores.
Could you please provide some guidance on how to resolve this issue?
Thank you in advance for your assistance.