Closed
Hi,
Thank you for developing this excellent tool. I've recently tried to use Cue to call SVs on a long-read data BAM file (NA24385_Pacbio_CLR_SRX7668835, aligned to hg19 using minimap2), but got an empty VCF output with no error reported.
In the log output there were lines saying that no intervals were selected:
...
INFO:root:Number of bins: 108262
INFO:root:Selected 0 intervals
INFO:root:Selected 0 interval pairs out of 0 pairs
INFO:root:Processed 238694 reads
INFO:root:Generating SV predictions for chr22
INFO:root:Number of target interval pairs: 0
INFO:root:Selected 0 intervals
INFO:root:Selected 0 interval pairs out of 0 pairs
INFO:root:Processed 310571 reads
INFO:root:Generating SV predictions for chr20
INFO:root:Number of target interval pairs: 0
...
However, I have no idea what might cause this.
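To make the symptom easier to see across a full log, here is a minimal sketch that totals the selected intervals and processed reads. It assumes the log lines look exactly like the excerpt above; the function name `summarize_log` and the regular expressions are illustrative and not part of Cue.

```python
import re

# Patterns follow the log lines quoted above; other log formats are not covered.
SELECTED = re.compile(r"Selected (\d+) intervals$")
PROCESSED = re.compile(r"Processed (\d+) reads$")
CHROM = re.compile(r"Generating SV predictions for (\S+)$")


def summarize_log(lines):
    """Total the selected intervals and processed reads, and list the chromosomes seen."""
    summary = {"intervals": 0, "reads": 0, "chroms": []}
    for raw in lines:
        line = raw.strip()
        m = SELECTED.search(line)
        if m:
            summary["intervals"] += int(m.group(1))
            continue
        m = PROCESSED.search(line)
        if m:
            summary["reads"] += int(m.group(1))
            continue
        m = CHROM.search(line)
        if m:
            summary["chroms"].append(m.group(1))
    return summary


# Lines copied from the excerpt above.
excerpt = [
    "INFO:root:Number of bins: 108262",
    "INFO:root:Selected 0 intervals",
    "INFO:root:Selected 0 interval pairs out of 0 pairs",
    "INFO:root:Processed 238694 reads",
    "INFO:root:Generating SV predictions for chr22",
    "INFO:root:Number of target interval pairs: 0",
    "INFO:root:Selected 0 intervals",
    "INFO:root:Selected 0 interval pairs out of 0 pairs",
    "INFO:root:Processed 310571 reads",
    "INFO:root:Generating SV predictions for chr20",
    "INFO:root:Number of target interval pairs: 0",
]
print(summarize_log(excerpt))
```

In this excerpt, reads are being processed while the selected interval count stays at zero, which is the behaviour I am asking about.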
I also noticed that you used Cue-long on the CLR data in your paper. Does that refer to another version of Cue, or are additional settings required in the YAML configuration for long reads?
Thank you
Best,
Yichen
Here's the detailed configuration I used in my run:
*********************************
* cue (v0.2.2): discovery mode *
*********************************
[INFO] ========== Model config ==========
model_path: Softwares/cue/data/models/cue.v2.pt
gpu_ids: []
n_jobs_per_gpu: 1
n_cpus: 20
report_interval: 100
batch_size: 16
logging_level: INFO
signal_set: SV_SIGNAL_SET.SHORT
class_set: SV_CLASS_SET.BASIC5ZYG
num_keypoints: 1
model_architecture: HG
image_dim: 256
sigma: 10
stride: 4
heatmap_peak_threshold: 0.4
pretrained_refinenn_path: None
config_file: call_model.yaml
experiment_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus
devices: [device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu'), device(type='cpu')]
device: cpu
log_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/logs/
report_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/reports/
log_file: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/logs/main.log
classes: ['NEG', 'DEL-HOM', 'INV-HOM', 'DUP-HOM', 'DEL-HET', 'INV-HET', 'DUP-HET', 'IDUP-HOM', 'IDUP-HET']
num_classes: 9
n_signals: 6
[INFO] ========== Data config =========
bam: NA24385_Pacbio_CLR_SRX7668835/minimap2_NA24385_Pacbio_CLR_SRX7668835.bam
fai: refdata-hg19-2.1.0/fasta/genome.fa.fai
chr_names: None
logging_level: ERROR
n_cpus: 1
min_refine_buffer: 2000
refine_buffer_frac_size: 5
refine_pair_dist_frac_size: 2
refine_bp_kernels: [0, 50, 500]
refine_min_support: 2
refine_disable: False
min_pair_support: 2
min_pair_distance: 4000
max_pair_distance: 1000000
scan_target_intervals: True
stream: True
view_mode: False
store_img: False
empty_annotation: False
bins_per_block: 8000
min_sv_len: 4000
min_qual_score: 50
bam_type: BAM_TYPE.SHORT
signal_set: SV_SIGNAL_SET.SHORT
signal_set_origin: SHORT
bed: None
blacklist_bed: None
signal_vmax: {'RD': 600, 'RD_LOW': 800, 'RD_CLIPPED': 600, 'SM': 200, 'SR_RP': 600, 'LR': 600, 'LLRR': 100, 'RL': 100, 'LLRR_VS_LR': 1}
signal_mapq: {'RD': 20, 'RD_LOW': 0, 'RD_CLIPPED': 20, 'SM': 20, 'SR_RP': 0, 'LR': 0, 'LLRR': 1, 'RL': 1, 'LLRR_VS_LR': 1}
bin_size: 750
interval_size: 150000
step_size: 50000
shift_size: None
heatmap_dim: 1000
image_dim: 256
class_set: SV_CLASS_SET.BASIC5ZYG
num_keypoints: 1
bbox_padding: 0
config_file: call_data.yaml
dataset_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus
info_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/info/
image_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/images/
annotation_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/annotations/
annotated_images_dir: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/annotated_images/
classes: ['NEG', 'DEL-HOM', 'INV-HOM', 'DUP-HOM', 'DEL-HET', 'INV-HET', 'DUP-HET', 'IDUP-HOM', 'IDUP-HET']
num_classes: 9
num_signals: 6
uid: 0000000000
log_file: Cue/NA24385_Pacbio_CLR_SRX7668835_cpus/info/main.log
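Several entries in the dump above mention `SHORT` (for example `bam_type` and `signal_set`). A small sketch for listing them from a pasted dump follows; it assumes each setting is printed as `key: value` on one line, and the helper name `short_read_settings` is hypothetical. Whether these values explain the empty output is not established here.

```python
def short_read_settings(dump_lines):
    """Return (key, value) pairs from a 'key: value' dump whose value mentions SHORT."""
    hits = []
    for raw in dump_lines:
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = raw.strip().partition(":")
        if sep and "SHORT" in value:
            hits.append((key.strip(), value.strip()))
    return hits


# A few lines copied from the data config dump above.
dump = [
    "bam_type: BAM_TYPE.SHORT",
    "signal_set: SV_SIGNAL_SET.SHORT",
    "signal_set_origin: SHORT",
    "bin_size: 750",
    "interval_size: 150000",
]
for key, value in short_read_settings(dump):
    print(f"{key} = {value}")
```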
The BAM file was generated with:
minimap2 -t 30 --MD -Y -L -a -H -x map-pb refdata-hg19-2.1.0/fasta/genome.fa PacBio_CLR_ncbi-SRX7668835/SRR11008518.fastq | samtools sort -o minimap2_NA24385_Pacbio_CLR_SRX7668835.bam