Is CNVnator a software that can only run in single-thread mode? #281

Open
jingydz opened this issue Jun 29, 2023 · 10 comments

Comments

@jingydz

jingydz commented Jun 29, 2023

Can CNVnator only run in single-threaded mode?

@abyzov
Member

abyzov commented Jul 4, 2023 via email

@jingydz
Author

jingydz commented Aug 7, 2023

Sorry for the late reply, and thank you. I have solved it: adding "export OMP_NUM_THREADS=number" before running CNVnator was all that was needed.
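
For anyone launching CNVnator from a script instead of an interactive shell, the same setting can be passed through the child process environment. This is only a minimal sketch: the helper name `env_with_threads` is hypothetical, and the child process below is a stand-in Python one-liner that just echoes the variable it inherited. In practice the command would be the cnvnator invocation.

```python
import os
import subprocess
import sys


def env_with_threads(num_threads):
    """Return a copy of the current environment with OMP_NUM_THREADS set."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


if __name__ == "__main__":
    # Stand-in child process: it only prints the variable it inherited.
    # Replace this command with the real cnvnator call.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    result = subprocess.run(child, env=env_with_threads(4),
                            capture_output=True, text=True, check=True)
    print("child saw OMP_NUM_THREADS =", result.stdout.strip())
```

Copying the environment rather than editing `os.environ` in place keeps the thread count scoped to the one command that needs it.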

@jingydz
Author

jingydz commented Aug 7, 2023

In addition, I have one sample that I have run many times without ever getting a result. Could you help me figure out why?

Command

CNVnator_input=/xxx/xxx.marked.realigned.recal.bam
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -tree ${CNVnator_input} -unique
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -genome hg38 -his ${bin_size} -d $Chromosomes

Log

Error in TBasket::Create: Cannot allocate 14069 bytes for ID = position Title = chr2
Error in TTree::Fill: Failed filling branch:chr2.position, nbytes=-1, entry=44877163
This error is symptomatic of a Tree created as a memory-resident Tree
Instead of doing:
TTree *T = new TTree(...)
TFile *f = new TFile(...)
you should do:
TFile *f = new TFile(...)
TTree *T = new TTree(...)
...SysError in TFile::Seek: cannot seek to position -1971239420 in file /xxx/xxx.root, retpos=-1 (Invalid argument)
Can't find any histograms.
...

@abyzov
Member

abyzov commented Aug 8, 2023 via email

@jingydz
Author

jingydz commented Aug 8, 2023

No, there were a dozen of them. Yes, I later learned that when a BAM file is used there is no need to specify the genome; the -genome parameter is simply ignored.

The records in this BAM look like this:
ERR1347664.194830679 1123 chr1 9998 0 2S38M60S = 10005 68 AACGATAACCCTAACCCTAACCCCAACCCTAACCCTAACCATGACCCTTACGTCTACCCGAACCCCAACCCTAACCCTACCCCCCCGCCTGACACAAACT '00''''7707<7<7<000707<000''000707'7<7'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' AS:i:33 MC:Z:39S61M MQ:i:0 XA:Z:chr9,-138129062,61S34M5S,0;chr6,+147869,4S36M60S,1;chr16,+88880231,5S30M65S,0;chr12,+10242,5S35M60S,1;chr12,+10543,5S35M60S,1;chr18,+94652,5S35M60S,1;chr12,+124267493,5S35M60S,1;chr15,-101981028,65S30M5S,0;chr13,-114354167,60S35M5S,1;chr18,+63673,5S19M1D16M60S,1;chr10,-80216942,64S27M9S,0;chr12_GL877875v1_alt,+543,5S35M60S,1;chr12_GL877875v1_alt,+242,5S35M60S,1; XS:i:34 MD:Z:0N0N0N18T16 NM:i:4 RG:Z:ERR1347664

@jingydz
Author

jingydz commented Aug 8, 2023

Another question: I have seen people filter CNVnator's VCF results with the following conditions: "CNV calls were filtered using stringent criteria including P-value < 0.05 and minimum size > 1 Kb, and calls with > 50% of q0 (zero mapping quality) reads within the CNV regions were removed (q0 filter)."

The minimum size can be filtered using the SVLEN key.
But which keys do the P-value and q0 correspond to?

example:
chr1 1 CNVnator_del_1 N <DEL> . PASS END=10000;SVTYPE=DEL;SVLEN=-10000;IMPRECISE;natorRD=0;natorP1=1.59373e-11;natorP2=1.87087e-51;natorP3=1.99216e-11;natorP4=2.03817e-39;natorQ0=-1 GT:CN 1/1:0
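
A minimal sketch of how the quoted filter could be applied to such a record. It rests on several assumptions rather than on anything confirmed in this thread: natorP1 is taken as the "P-value" (as far as I know the CNVnator VCF header describes it as an e-value from a t-test, so check the ##INFO lines of your own file), natorQ0 is taken as the fraction of q0 reads, and a negative natorQ0 is treated as "not computed" so the q0 filter is skipped. The sample record is written with all eight fixed VCF columns, including an assumed ALT value of `<DEL>`. The function names are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string into a dict; flags such as IMPRECISE map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def passes_filters(vcf_line, max_p=0.05, min_size=1000, max_q0=0.5):
    """Apply the quoted filter to one VCF data line.

    Assumptions: natorP1 plays the role of the P-value, and a negative
    natorQ0 means the q0 fraction is unavailable, so that test is skipped.
    """
    fields = vcf_line.split()
    info = parse_info(fields[7])          # INFO is the 8th fixed VCF column
    size = abs(int(info["SVLEN"]))        # SVLEN is negative for deletions
    p_value = float(info["natorP1"])
    q0 = float(info["natorQ0"])
    if p_value >= max_p or size <= min_size:
        return False
    if q0 >= 0 and q0 > max_q0:
        return False
    return True


example = ("chr1 1 CNVnator_del_1 N <DEL> . PASS "
           "END=10000;SVTYPE=DEL;SVLEN=-10000;IMPRECISE;natorRD=0;"
           "natorP1=1.59373e-11;natorP2=1.87087e-51;natorP3=1.99216e-11;"
           "natorP4=2.03817e-39;natorQ0=-1 GT:CN 1/1:0")
keep = passes_filters(example)
print("keep call:", keep)
```

Whether calls with natorQ0=-1 should be kept or dropped is a judgment call; the sketch keeps them so that a missing value does not silently remove a call.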

@jingydz
Author

jingydz commented Aug 10, 2023

Error

"File is more than 2 Gigabytes"
I found that some samples whose root files are larger than 2GB fail to produce any results.
SysError in TFile::Flush: error flushing file xxx.root (File too large)

such as:
【4.1T Aug 10 01:19 SAMEA3302667.root】
【8.7T Aug 10 09:12 SAMEA3302715.root】
【5.7T Aug 10 09:46 SAMEA3302857.root】
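
A side note on the negative seek positions that appear in the logs in this thread (-1971239420 and -1938320448). A negative offset of this size is what a file position slightly above 2 GiB looks like after being squeezed into a signed 32-bit integer. Whether that is what actually happens inside ROOT or CNVnator is an assumption, not something confirmed here; the sketch below only does the arithmetic, and the helper name is hypothetical.

```python
def unwrap_int32(value):
    """Return the unsigned 32-bit value that reads as `value` when signed."""
    return value % (1 << 32)


GIB = 1 << 30  # bytes in one GiB

for logged in (-1971239420, -1938320448):
    offset = unwrap_int32(logged)
    print(f"{logged} -> {offset} bytes ({offset / GIB:.2f} GiB)")
```

Both values unwrap to offsets just past the 2 GiB mark, which would be consistent with the "File is more than 2 Gigabytes" observation above.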

@abyzov
Member

abyzov commented Aug 16, 2023 via email

@jingydz
Author

jingydz commented Aug 16, 2023

Hi,
Sorry for the late reply. I have solved this problem by running each chromosome separately (1 to 22, plus X, Y and M) with the -chrom parameter, which works.
Yes, there is nothing wrong with my BAM files; they are just fairly large, ranging from 40G to 150G. The samples that fail to produce a VCF file all have root files larger than 2GB. (The failure seems random to me: only about 1 in 10 samples with a root file larger than 2GB runs into the "File too large" problem.) They all report errors at the histogram-building step.

Error log

Parsing file /xxx/sample.marked.realigned.recal.bam ...
Allocating memory ...
Done.
Filling and saving tree for 'chr1' ...
Filling and saving tree for 'chr2' ...
Filling and saving tree for 'chr3' ...
Filling and saving tree for 'chr4' ...
Filling and saving tree for 'chr5' ...
Filling and saving tree for 'chr6' ...
Filling and saving tree for 'chr7' ...
Filling and saving tree for 'chr8' ...
Filling and saving tree for 'chr9' ...
Filling and saving tree for 'chr10' ...
Filling and saving tree for 'chr11' ...
Filling and saving tree for 'chr12' ...
Filling and saving tree for 'chr13' ...
Filling and saving tree for 'chr14' ...
Filling and saving tree for 'chr15' ...
Filling and saving tree for 'chr16' ...
Filling and saving tree for 'chr17' ...
Filling and saving tree for 'chr18' ...
Filling and saving tree for 'chr19' ...
Filling and saving tree for 'chr20' ...
Filling and saving tree for 'chr21' ...
Filling and saving tree for 'chr22' ...
Filling and saving tree for 'chrX' ...
Filling and saving tree for 'chrY' ...
Filling and saving tree for 'chrM' ...
...
Filling and saving tree for 'HLA-DRB1*15:02:01' ...
Filling and saving tree for 'HLA-DRB1*15:03:01:01' ...
Filling and saving tree for 'HLA-DRB1*15:03:01:02' ...
Filling and saving tree for 'HLA-DRB1*16:02:01' ...
Writing histograms ...
Total of 1378403710 reads were placed.

real 49m47.400s
user 40m51.651s
sys 8m30.983s
Allocating memory ...
Done.
Calculating histograms with bin size of 500 for 'chr1' ...
Making directory bin_500 ...
Making GC histogram for 'chr1' ...
SysError in TFile::Seek: cannot seek to position -1938320448 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in TFile::Seek: cannot seek to position -1297036177263337374 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in TFile::Seek: cannot seek to position -1297036177263337374 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in TFile::Seek: cannot seek to position -1297036177263337230 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in TFile::Flush: error flushing file /xxx/sample.root (File too large)
Done.
...
real 0m0.198s
user 0m0.151s
sys 0m0.042s
Reading calls ...

real 0m0.062s
user 0m0.036s
sys 0m0.018s
Processing: 0
Parsing done:
Tot DEL DUP INS INV TRA
0 0 0 0 0 0

Anyway, I've solved the problem (splitting into single chromosomes), thanks for your reply. If someone is having the same problem as me, hopefully this solution will help them.
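
The per-chromosome workaround can be sketched as follows. This only builds the command lines for the two steps quoted earlier in this thread (-tree and -his), using flags that appear in those commands plus -chrom. The function name, the file names, and the choice of one root file per chromosome are assumptions made for illustration; the remaining CNVnator steps would need the same per-chromosome treatment.

```python
def per_chromosome_commands(cnvnator, bam, out_prefix, bin_size,
                            fasta_dir, chromosomes):
    """Build the -tree and -his commands for each chromosome separately.

    Assumption: every chromosome gets its own root file, named
    <out_prefix>.<chrom>.root, so that no single file grows too large.
    """
    commands = []
    for chrom in chromosomes:
        root = f"{out_prefix}.{chrom}.root"
        commands.append([cnvnator, "-root", root, "-chrom", chrom,
                         "-tree", bam, "-unique"])
        commands.append([cnvnator, "-root", root, "-chrom", chrom,
                         "-his", str(bin_size), "-d", fasta_dir])
    return commands


# Hypothetical inputs; substitute your own paths.
chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]
commands = per_chromosome_commands("cnvnator", "sample.bam", "sample",
                                   500, "fasta_dir", chromosomes)
for cmd in commands[:2]:
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the paths are real.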

@abyzov
Member

abyzov commented Aug 22, 2023 via email
