About ld refinement on putative somatic SNVs, spend a lot of time #18

monoplasty · 2023-09-20T09:29:07Z

Hello,
I am using single-cell sequencing data to call somatic SNVs from scRNA-seq. The second step (cellScan) takes a very long time, over 30 hours. Is there any way to speed it up?

The BAM file is about 8 GB, and the machine has 32 CPU cores.

Could you please provide some guidance on how to resolve this issue?
Thank you in advance for your assistance.

jinzhuangdou · 2023-09-20T14:48:47Z

Yes, the cellScan step usually takes a long time because it needs to extract read information for each individual cell. How many cells does your sample include? You can reduce the computational burden by selecting cells with the option --keep 0.8 (select cells with most variable reads).

monoplasty · 2023-09-22T03:12:28Z

@jinzhuangdou Thank you for your reply! The sample I ran had 8609 cells, and I left --keep at its default value of 0.8.

slinnarsson · 2023-09-22T10:48:10Z

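The cellScan step is written in such a way that its execution time grows quadratically with the number of cells. It takes the list of cell barcodes and, for each barcode, scans the entire BAM file to find the reads from that cell. Since the BAM file itself grows roughly in proportion to the number of cells, the total work is (number of cells) × (size of the BAM), i.e. proportional to the number of cells squared.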

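```python
from multiprocessing import Pool

# One job string per cell barcode: "merge:<cell>:<out dir>:<app path>"
joblst = []
for cell in cell_lst:
    para = "merge" + ":" + cell + ":" + args.out + ":" + args.app_path
    joblst.append(para)
with Pool(processes=args.nthreads) as pool:
    result = pool.map(bamSplit, joblst)  # <--- bamSplit scans the whole BAM file for each cell
```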
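To make the scaling concrete, here is a toy model, not Monopogen code: the "BAM" is a hypothetical in-memory list of `(cell_barcode, read_id)` tuples, and the helper names are made up for illustration. It counts how many records are read by the per-cell pattern above versus a single scan that routes each read to its cell's bucket.

```python
from collections import defaultdict

def make_reads(n_cells, reads_per_cell):
    # Hypothetical toy "BAM": a flat list of (cell_barcode, read_id) records.
    return [(f"cell{c}", f"r{c}_{i}") for c in range(n_cells) for i in range(reads_per_cell)]

def split_per_cell(reads, cells):
    """Mimics the quoted loop: a full scan of every record for each cell."""
    out, visits = {}, 0
    for cell in cells:
        kept = []
        for cb, rid in reads:  # the whole "BAM" is read again for this cell
            visits += 1
            if cb == cell:
                kept.append(rid)
        out[cell] = kept
    return out, visits

def split_one_scan(reads, cells):
    """Reads every record once and routes it to its cell's bucket."""
    wanted = set(cells)
    out, visits = defaultdict(list), 0
    for cb, rid in reads:
        visits += 1
        if cb in wanted:
            out[cb].append(rid)
    return dict(out), visits

for n in (10, 20, 40):
    reads = make_reads(n, reads_per_cell=5)
    cells = [f"cell{c}" for c in range(n)]
    slow_out, slow = split_per_cell(reads, cells)
    fast_out, fast = split_one_scan(reads, cells)
    assert slow_out == fast_out  # same per-cell result either way
    print(n, slow, fast)
# 10 500 50
# 20 2000 100
# 40 8000 200
```

Doubling the number of cells quadruples the records read by the per-cell loop but only doubles them for the single scan.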

This means that if 8609 cells take 30 hours, 2 × 8609 cells would take about five days (4 × 30 = 120 hours) and 10 × 8609 cells would take about four months (100 × 30 = 3,000 hours).
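The extrapolation can be checked with a few lines, assuming runtime is exactly proportional to the square of the cell count and using the 8609-cell / 30-hour run from this thread as the reference point (`projected_hours` is a hypothetical helper):

```python
def projected_hours(n_cells, ref_cells=8609, ref_hours=30.0):
    """Quadratic extrapolation: runtime scales with (n_cells / ref_cells) ** 2."""
    return ref_hours * (n_cells / ref_cells) ** 2

for factor in (2, 10):
    hours = projected_hours(factor * 8609)
    print(f"{factor}x cells: {hours:,.0f} h = {hours / 24:,.1f} days")
# 2x cells: 120 h = 5.0 days
# 10x cells: 3,000 h = 125.0 days
```

125 days is a little over four months.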

It could be rewritten to scan the BAM file just once, writing all the cell-specific BAM files in parallel and on the fly. That would likely reduce execution time from 30+ hours to a few minutes. It would make it possible to run Monopogen on much larger samples.
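A minimal sketch of the single-scan idea, not Monopogen's implementation. Assumptions: pysam is installed, the cell barcode is stored in the `CB` tag (the 10x Genomics convention; Monopogen's input may use a different tag), and all function names are hypothetical. One output handle per cell can exceed the operating system's open-file limit with thousands of cells, so barcodes are processed in batches of at most `max_open`; with 8609 cells and `max_open=1000` that is 9 scans instead of 8609. Unlike the suggestion above, this sketch writes sequentially from one process rather than in parallel.

```python
import os

def split_single_pass(open_reads, barcode_of, open_writer, wanted, max_open=1000):
    """Route each record to its cell's writer, scanning the input once per
    batch of at most `max_open` barcodes. Returns the number of scans."""
    wanted = list(dict.fromkeys(wanted))  # de-duplicate, keep order
    passes = 0
    for start in range(0, len(wanted), max_open):
        writers = {cb: open_writer(cb) for cb in wanted[start:start + max_open]}
        try:
            for rec in open_reads():  # fresh iterator over the whole input
                w = writers.get(barcode_of(rec))
                if w is not None:
                    w.write(rec)
        finally:
            for w in writers.values():
                w.close()
        passes += 1
    return passes

def split_bam(bam_path, barcodes, out_dir, tag="CB", max_open=1000):
    """Hypothetical pysam wrapper: writes <out_dir>/<barcode>.bam per cell."""
    import pysam  # imported here so split_single_pass needs only the stdlib

    with pysam.AlignmentFile(bam_path, "rb") as template:
        header = template.header.to_dict()

    def open_reads():
        bam = pysam.AlignmentFile(bam_path, "rb")
        try:
            # until_eof=True streams every record in file order, no index needed
            yield from bam.fetch(until_eof=True)
        finally:
            bam.close()

    def barcode_of(read):
        return read.get_tag(tag) if read.has_tag(tag) else None

    def open_writer(cb):
        return pysam.AlignmentFile(os.path.join(out_dir, f"{cb}.bam"), "wb", header=header)

    return split_single_pass(open_reads, barcode_of, open_writer, barcodes, max_open)
```

Keeping the routing logic separate from pysam makes it easy to test with plain Python objects; raising the open-file limit (e.g. `ulimit -n`) allows a larger `max_open` and fewer scans.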

monoplasty changed the title ~~About ld refinement on putative somatic SNVs~~ About ld refinement on putative somatic SNVs, spend a lot of time Sep 20, 2023
