0 Selected intervals and empty VCF output when running Cue on PacBio CLR #14

@LYC-vio

Hi,

Thank you for developing this excellent tool. I recently tried to use Cue to call SVs on a long-read BAM file (NA24385_Pacbio_CLR_SRX7668835, aligned to hg19 with minimap2), but got an empty VCF with no error reported.

The log contains lines indicating that no intervals were selected:

...
INFO:root:Number of bins: 108262
INFO:root:Selected 0 intervals
INFO:root:Selected 0 interval pairs out of 0 pairs
INFO:root:Processed 238694 reads
INFO:root:Generating SV predictions for chr22
INFO:root:Number of target interval pairs: 0
INFO:root:Selected 0 intervals
INFO:root:Selected 0 interval pairs out of 0 pairs
INFO:root:Processed 310571 reads
INFO:root:Generating SV predictions for chr20
INFO:root:Number of target interval pairs: 0
...

However, I have no idea what might be causing this.
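
To check whether every chromosome in the full log shows the same pattern, the log can be summarized with a short script. This is a minimal sketch that assumes the line format matches the excerpt above; `summarize_selected` and `LOG_EXCERPT` are hypothetical names introduced for illustration and are not part of Cue.

```python
import re

# Patterns based only on the log lines quoted in this issue (assumed format).
SELECTED_RE = re.compile(r"INFO:root:Selected (\d+) intervals$")
CHROM_RE = re.compile(r"INFO:root:Generating SV predictions for (\S+)")

# Lines copied from the excerpt above, used here as sample input.
LOG_EXCERPT = [
    "INFO:root:Number of bins: 108262",
    "INFO:root:Selected 0 intervals",
    "INFO:root:Selected 0 interval pairs out of 0 pairs",
    "INFO:root:Processed 238694 reads",
    "INFO:root:Generating SV predictions for chr22",
    "INFO:root:Number of target interval pairs: 0",
    "INFO:root:Selected 0 intervals",
    "INFO:root:Selected 0 interval pairs out of 0 pairs",
    "INFO:root:Processed 310571 reads",
    "INFO:root:Generating SV predictions for chr20",
    "INFO:root:Number of target interval pairs: 0",
]


def summarize_selected(lines):
    """Collect the 'Selected N intervals' counts and the chromosomes seen."""
    counts = []
    chroms = []
    for line in lines:
        line = line.strip()
        match = SELECTED_RE.match(line)
        if match:
            # The '$' anchor keeps 'Selected 0 interval pairs ...' from matching.
            counts.append(int(match.group(1)))
            continue
        match = CHROM_RE.match(line)
        if match:
            chroms.append(match.group(1))
    return counts, chroms


if __name__ == "__main__":
    counts, chroms = summarize_selected(LOG_EXCERPT)
    print("interval counts:", counts)
    print("chromosomes:", chroms)
    print("all zero:", all(c == 0 for c in counts))
```

To run it on the real log, pass the lines of main.log (for example `open(path).read().splitlines()`) instead of `LOG_EXCERPT`.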

I also noticed that you used Cue-long to run on the CLR data in your paper. Does that refer to another version of Cue, or are additional settings required in the YAML configuration for long reads?

Thank you

Best,
Yichen

Here's the detailed configuration I used in my run:

*********************************
*  cue (v0.2.2): discovery mode *
*********************************
[INFO]  ========== Model config ==========
	model_path: Softwares/cue/data/models/cue.v2.pt
	gpu_ids: []
	n_jobs_per_gpu: 1
	n_cpus: 20
	report_interval: 100
	batch_size: 16
	logging_level: INFO
	signal_set: SV_SIGNAL_SET.SHORT
	class_set: SV_CLASS_SET.BASIC5ZYG
	num_keypoints: 1
	model_architecture: HG
	image_dim: 256
	sigma: 10
	stride: 4
	heatmap_peak_threshold: 0.4
	pretrained_refinenn_path: None
	config_file: call_model.yaml
	experiment_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus
	devices: [device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu')]
	device: cpu
	log_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/logs/
	report_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/reports/
	log_file: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/logs/main.log
	classes: ['NEG', 'DEL-HOM', 'INV-HOM', 'DUP-HOM', 'DEL-HET', 'INV-HET', 'DUP-HET', 'IDUP-HOM', 'IDUP-HET']
	num_classes: 9
	n_signals: 6
[INFO] ========== Data config =========
	bam: NA24385_Pacbio_CLR_SRX7668835/minimap2_NA24385_Pacbio_CLR_SRX7668835.bam
	fai: refdata-hg19-2.1.0/fasta/genome.fa.fai
	chr_names: None
	logging_level: ERROR
	n_cpus: 1
	min_refine_buffer: 2000
	refine_buffer_frac_size: 5
	refine_pair_dist_frac_size: 2
	refine_bp_kernels: [0, 50, 500]
	refine_min_support: 2
	refine_disable: False
	min_pair_support: 2
	min_pair_distance: 4000
	max_pair_distance: 1000000
	scan_target_intervals: True
	stream: True
	view_mode: False
	store_img: False
	empty_annotation: False
	bins_per_block: 8000
	min_sv_len: 4000
	min_qual_score: 50
	bam_type: BAM_TYPE.SHORT
	signal_set: SV_SIGNAL_SET.SHORT
	signal_set_origin: SHORT
	bed: None
	blacklist_bed: None
	signal_vmax: {'RD': 600, 'RD_LOW': 800, 'RD_CLIPPED': 600, 'SM': 200, 'SR_RP': 600, 'LR': 600, 'LLRR': 100, 'RL': 100, 'LLRR_VS_LR': 1}
	signal_mapq: {'RD': 20, 'RD_LOW': 0, 'RD_CLIPPED': 20, 'SM': 20, 'SR_RP': 0, 'LR': 0, 'LLRR': 1, 'RL': 1, 'LLRR_VS_LR': 1}
	bin_size: 750
	interval_size: 150000
	step_size: 50000
	shift_size: None
	heatmap_dim: 1000
	image_dim: 256
	class_set: SV_CLASS_SET.BASIC5ZYG
	num_keypoints: 1
	bbox_padding: 0
	config_file: call_data.yaml
	dataset_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus
	info_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/info/
	image_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/images/
	annotation_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/annotations/
	annotated_images_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/annotated_images/
	classes: ['NEG', 'DEL-HOM', 'INV-HOM', 'DUP-HOM', 'DEL-HET', 'INV-HET', 'DUP-HET', 'IDUP-HOM', 'IDUP-HET']
	num_classes: 9
	num_signals: 6
	uid: 0000000000
	log_file: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/info/main.log
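
Several entries in the dump above mention SHORT (`bam_type`, `signal_set`, `signal_set_origin`), which may be relevant to the long-read question. The sketch below lists such entries from a pasted dump. It assumes each line has the form `key: value`; `find_short_read_settings` and `CONFIG_EXCERPT` are hypothetical names for illustration only. Whether these settings explain the empty output, and which other values Cue accepts for them, is not established here.

```python
# Lines copied from the "Data config" dump above, used as sample input.
CONFIG_EXCERPT = [
    "\tbam_type: BAM_TYPE.SHORT",
    "\tsignal_set: SV_SIGNAL_SET.SHORT",
    "\tsignal_set_origin: SHORT",
    "\tbin_size: 750",
    "\tinterval_size: 150000",
]


def find_short_read_settings(dump_lines):
    """Return {key: value} for dump entries whose value mentions SHORT."""
    flagged = {}
    for line in dump_lines:
        # Split on the first ': ' only, so values containing colons stay intact.
        key, sep, value = line.strip().partition(": ")
        if sep and "SHORT" in value:
            flagged[key] = value
    return flagged


if __name__ == "__main__":
    for key, value in find_short_read_settings(CONFIG_EXCERPT).items():
        print(f"{key} = {value}")
```

This is a plain string match on the printed dump, not a check against Cue's configuration schema.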

The BAM file was generated with:

minimap2 -t 30 --MD -Y -L -a -H -x map-pb refdata-hg19-2.1.0/fasta/genome.fa PacBio_CLR_ncbi-SRX7668835/SRR11008518.fastq | samtools sort -o minimap2_NA24385_Pacbio_CLR_SRX7668835.bam
