VCF file does not contain second state when subclonal CN #16

Closed
sdentro opened this issue Jun 14, 2015 · 9 comments

@sdentro
Contributor

sdentro commented Jun 14, 2015

The generated VCF file is inconsistent with *subclones.txt: it does not report the fraction of cells carrying the first CN state, and the (optional) second state is not included. The missing values correspond to columns frac1_A, nMaj2_A, nMin2_A and frac2_A in *subclones.txt.
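
As a rough illustration of which segments are affected, the sketch below reads a *subclones.txt-style table and picks out the rows that carry a second state. It assumes a tab-delimited file with a header row and "NA" marking an absent second state; neither detail is stated in this issue, and the helper names are hypothetical.

```python
import csv
import io

# Second-state columns named in this issue. The file layout (tab-delimited,
# "NA" for a missing second state) is an assumption made for illustration.
SECOND_STATE_COLUMNS = ("nMaj2_A", "nMin2_A", "frac2_A")

def has_second_state(row):
    """Return True when every second-state column holds a value."""
    return all(row.get(col) not in (None, "NA", "") for col in SECOND_STATE_COLUMNS)

def subclonal_rows(handle):
    """Yield rows (as dicts) that describe a subclonal segment."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if has_second_state(row):
            yield row

# Made-up two-segment table, not real Battenberg output.
example = io.StringIO(
    "frac1_A\tnMaj2_A\tnMin2_A\tfrac2_A\n"
    "1\tNA\tNA\tNA\n"
    "0.87\t2\t0\t0.13\n"
)
rows = list(subclonal_rows(example))
```

Only rows selected this way would need the extra second-state values in the VCF; clonal segments would leave them empty.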

@sdentro sdentro added the bug label Jun 14, 2015
@keiranmraine
Contributor

@sdentro, should I fix this while I'm rolling the new release or do you want to leave things as they are until the pancan dataset is completed?

@sdentro
Contributor Author

sdentro commented Nov 24, 2015

No, please fix when you've got a moment!


The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

@keiranmraine keiranmraine self-assigned this Nov 24, 2015
@keiranmraine keiranmraine added this to the 1.5.0 milestone Nov 24, 2015
@keiranmraine
Contributor

I'm going to have to extend the generic copynumber module in cgpVcf to handle these. Should be relatively straightforward though.

This bit is the specific line that retrieves the data

Will need to change the call to Sanger::CGP::Vcf::VCFCNConverter to set a param for extended genotype and additional header lines.

@keiranmraine
Contributor

@sdentro, a query. Currently the CN for the wildtype is set to 2:1, but if we provide cell fraction and secondary state for the tumour we are expected to provide them for the normal too. They can all be defined as . but are the appropriate values available?

@sdentro
Contributor Author

sdentro commented Jan 4, 2016

@keiranmraine Not entirely sure what you mean. The copy number fractions (frac1_A and frac2_A) are in fraction of tumour cells. For the normal cells there is no equivalent.

There is the normal cell contamination, which is defined as (1-rho), with rho in the rho_and_psi.txt output file (use the one on the FRAC_GENOME line), but that's a different thing altogether.
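
To make the distinction concrete, here is a minimal sketch of the (1-rho) relationship described above. It does not parse rho_and_psi.txt, since the file layout is not shown in this thread; the function name is hypothetical and the input value is illustrative only.

```python
def normal_contamination(rho):
    """Return the normal cell contamination, defined above as 1 - rho."""
    # Assumption: rho is a fraction between 0 and 1 inclusive.
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1")
    return 1.0 - rho

# Illustrative value only, not taken from any real rho_and_psi.txt.
contamination = normal_contamination(0.8)
```

This is a per-sample quantity, whereas frac1_A and frac2_A are per-segment fractions of tumour cells, which is why it has no place in the per-segment normal fields.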

@keiranmraine
Contributor

@sdentro - understood, no equivalent. It's fine for it to be . but I wanted to be sure.

@keiranmraine
Contributor

@sdentro, new example output:

...
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=TCN,Number=1,Type=Integer,Description="Total copy number">
##FORMAT=<ID=MCN,Number=1,Type=Integer,Description="Minor allele copy number">
##FORMAT=<ID=FCF,Number=1,Type=Float,Description="Fraction Cells first state">
##FORMAT=<ID=TCS,Number=1,Type=Integer,Description="Total copy number second state">
##FORMAT=<ID=MCS,Number=1,Type=Integer,Description="Minor allele copy number second state">
##FORMAT=<ID=FCS,Number=1,Type=Float,Description="Fraction Cells second state">
##vcfProcessLog_20160128.1=<InputVCFSource=<battenberg_CN_to_VCF.pl>,InputVCFVer=<1.3.0>>
##SAMPLE=<ID=NORMAL,Platform=ILLUMINA,Protocol=WGS,SampleName=f393baf8-9fbc-6986-e040-11ac0d484502>
##SAMPLE=<ID=TUMOUR,Platform=ILLUMINA,Protocol=WGS,SampleName=f393baf9-2710-9203-e040-11ac0d484504>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR
1   794318  .   A   <CNV>   .   .   SVTYPE=CNV;END=36053642 GT:TCN:MCN:FCF:TCS:MCS:FCS  ./.:2:1:.:.:.:. ./.:1:0:1:.:.:.
1   36130050    .   A   <CNV>   .   .   SVTYPE=CNV;END=46007002 GT:TCN:MCN:FCF:TCS:MCS:FCS  ./.:2:1:.:.:.:. ./.:1:1:1:.:.:.
1   46008912    .   C   <CNV>   .   .   SVTYPE=CNV;END=66672298 GT:TCN:MCN:FCF:TCS:MCS:FCS  ./.:2:1:.:.:.:. ./.:2:1:1:.:.:.
1   66695422    .   T   <CNV>   .   .   SVTYPE=CNV;END=66891760 GT:TCN:MCN:FCF:TCS:MCS:FCS  ./.:2:1:.:.:.:. ./.:5:1:1:.:.:.
1   66893498    .   G   <CNV>   .   .   SVTYPE=CNV;END=87091400 GT:TCN:MCN:FCF:TCS:MCS:FCS  ./.:2:1:.:.:.:. ./.:2:1:0.873303869283368:2:2:0.126696130716632
1   87092182    .   G   <CNV>   .   .   SVTYPE=CNV;END=94021300 GT:TCN:MCN:FCF:TCS:MCS:FCS  ./.:2:1:.:.:.:. ./.:1:1:1:.:.:.
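
For anyone consuming this output, the sketch below shows one way the FORMAT column could be paired with a sample column. It is plain string handling, not the cgpVcf code; the function name is hypothetical, and treating a lone "." as a missing value follows the usual VCF convention.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values; a lone '.' becomes None."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample field lengths differ")
    return {key: (None if value == "." else value) for key, value in zip(keys, values)}

# FORMAT string and sample fields copied from the example output above.
FORMAT = "GT:TCN:MCN:FCF:TCS:MCS:FCS"
tumour = parse_sample(FORMAT, "./.:2:1:0.873303869283368:2:2:0.126696130716632")
normal = parse_sample(FORMAT, "./.:2:1:.:.:.:.")
```

Values are kept as strings so that the caller decides how to convert the Integer and Float fields declared in the header.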

@keiranmraine
Contributor

Resolved by be48cd0

@sdentro
Contributor Author

sdentro commented Jan 28, 2016

@keiranmraine I don't think the subclonal copy number is correct (start coordinate 66893498).

That is either a mixture of 2+1 and 2+0 or 2+1 and 2+2. In the VCF, the tumour field should then be one of the following.

Minor allele set to 0:

./.:2:1:0.873303869283368:2:0:0.126696130716632

or total copy number set to 4:

./.:2:1:0.873303869283368:4:2:0.126696130716632
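
A consistency check along these lines could catch this kind of record automatically. The sketch below is not part of cgpVcf or Battenberg; it assumes that TCN/TCS are total copy numbers, that MCN/MCS are minor allele copy numbers (so the minor count cannot exceed the major count), and that the two cell fractions sum to 1. The function name is hypothetical.

```python
import math

def check_states(tcn, mcn, fcf, tcs=None, mcs=None, fcs=None):
    """Return a list of problems found in one tumour sample's CN states."""
    problems = []
    # Major = total - minor; the minor allele cannot outnumber the major one.
    if mcn > tcn - mcn:
        problems.append("first state: minor exceeds major")
    if tcs is not None:
        if mcs > tcs - mcs:
            problems.append("second state: minor exceeds major")
        # Assumption: the two states together account for all tumour cells.
        if not math.isclose(fcf + fcs, 1.0, abs_tol=1e-9):
            problems.append("fractions do not sum to 1")
    return problems

FCF, FCS = 0.873303869283368, 0.126696130716632
reported = check_states(2, 1, FCF, 2, 2, FCS)      # field as first generated
minor_zero = check_states(2, 1, FCF, 2, 0, FCS)    # minor allele set to 0
total_four = check_states(2, 1, FCF, 4, 2, FCS)    # total copy number set to 4
```

Under these assumptions the field as first generated is flagged for its second state, while both corrected tumour fields pass.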

