Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FREEC cannot open .vcf.gz file #41

Open
bibbers93 opened this issue Aug 2, 2018 · 7 comments
Open

FREEC cannot open .vcf.gz file #41

bibbers93 opened this issue Aug 2, 2018 · 7 comments

Comments

@bibbers93
Copy link

Hi,

I have a WGS .BAM file that I want to interrogate for CNA and LOH.

My config file is pretty much set up for with default settings, with files specified where necessary. I'm primarily using FREEC to generate the LOH analysis but I have an error reading the SNP vcf file.

Config file:

chrLenFile = /home/HG38/Hg38_chromosomeLengthFile.txt
window = 50000
step = 3000
ploidy = 2
breakPointThreshold = 0.75
chrFiles = /home/HG38/chromosomes/
intercept = 1
outputDir = /home/FREEC-ANALYSIS
sex = XY
breakPointType = 2
coefficientOfVaration = 0.05

[sample]
mateFile = /HOME/WGS.mpileup.pileup.gz
inputFormat = pileup
mateOrientation = FR

[BAF]
SNPfile = /home/common_all_20180418.vcf
minimalCoveragePerPosition = 0
fastaFile = /home/HG38/hg38.fa

The programme runs fine, with no errors or warnings until

Starting to read common_all_20180418.vcf
Error: unable to open /home/common_all_20180418.vcf

At first I wondered if this was because the SNP file was .vcf.gz but that resulted in the same error which is why i then unzipped it.

I'm using FREEC v11.4.

Can you help?

Best,

@valeu
Copy link
Contributor

valeu commented Aug 2, 2018

This is a standard error when a file is not present. What do you get when you do
ls -l /home/common_all_20180418.vcf
Best
Valentina

@bibbers93
Copy link
Author

bibbers93 commented Aug 2, 2018

the file is present in the specified directory

-rw-rw-r--. 1 user user 1593039139 Aug  2 15:08 common_all_20180418.vcf.gz

it's also been bgzipped back to previous

@bibbers93
Copy link
Author

bibbers93 commented Aug 2, 2018

apologies, I am switching between 2 online servers and they have slightly different directory trees. there was one directory missing between /home/ and the file. Thanks for pointing that out!

I have a second question, when I don't specify a pileup file in the config as mateFile, and instead I use a BAM file,

config file looks like this:

chrLenFile = /home/HG38/Hg38_chromosomeLengthFile.txt
window = 50000
step = 3000
ploidy = 2
breakPointThreshold = 0.75
chrFiles = /home/HG38/chromosomes/
intercept = 1
outputDir = /home/FREEC-ANALYSIS
sex = XY
breakPointType = 2
coefficientOfVaration = 0.05

[sample]
mateFile = /HOME/WGS.bam
inputFormat = BAM
mateOrientation = FR

[BAF]
makePileup=/home/subdir/common_all_20180418.vcf
SNPfile = /home/subdir/common_all_20180418.vcf
minimalCoveragePerPosition = 0
fastaFile = /home/HG38/hg38.fa

I get an error telling saying

Error: /home/FREEC-ANALYSIS/WGS.bam_NewCaptureRegions.bed is an empty file.
..will use 0 SNPs for calculation of the BAF profile

Is this correct for the first iteration of FREEC whilst it makes the necessary minipileup file, or is something missing from the config file?

To get round this I made my own mpileup file with samtools mpileup, and once overcoming the previous problem, it's now running with no errors as of yet.

best,

Ll.

@valeu
Copy link
Contributor

valeu commented Aug 2, 2018

Thank you for pointing out the problem. It seems that I did not debug the "minipileup" option for WGS, but only for WES, where a target.bed file is provided with exon coordinates. I will check what is gong on.

UPDATE: it seems that your "/home/HG38/Hg38_chromosomeLengthFile.txt" is in a format that FREEC cannot read in function "createBedFileWithChromosomeLengths()". Can you try to use fa.fai format? and then check that WGS.bam_NewCaptureRegions.bed is not empty?

Also, if you use this "makePileup" option, you .BAM file with reads should contain "chr" prefixes for chromosomes.. Is it the case?

@valeu
Copy link
Contributor

valeu commented Aug 4, 2018

I will have a look at your .BAM reading issue. However, I would not recommend to use

window = 50000
step = 3000

Instead, just comment or remove these parameters and let FREEC find the best window size for your data.

If you want to keep it, rather use


window = 50000
step = 25000

BTW, parameter coefficientOfVaration = 0.05 to detect the best window size is not used, if you already provide window.

And I would also let FREEC choose the best parameters for the model:
#intercept = 1

You will also greatly improve the output if you use gemMappabilityFile to mask low mappability regions.

@jlac
Copy link

jlac commented Nov 29, 2018

Thank you for pointing out the problem. It seems that I did not debug the "minipileup" option for WGS, but only for WES, where a target.bed file is provided with exon coordinates. I will check what is gong on.

I am running into the same issue outlined above, where the WGS workflow is not properly handling a BAM file without chr prefix in the chromosome names. Has this been fixed in some way that I have missed?

UPDATE: it seems that your "/home/HG38/Hg38_chromosomeLengthFile.txt" is in a format that FREEC cannot read in function "createBedFileWithChromosomeLengths()". Can you try to use fa.fai format? and then check that WGS.bam_NewCaptureRegions.bed is not empty?

Also, if you use this "makePileup" option, you .BAM file with reads should contain "chr" prefixes for chromosomes.. Is it the case?

@valeu
Copy link
Contributor

valeu commented Dec 13, 2018

v11.5 takes .fai into account in function "createBedFileWithChromosomeLengths()". Is it the version you are using?

Yes, I think that if you create a miniPileup, .BAM files should contain "chr" prefixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants