'' instead of None for inexistent VEP values #639

Closed
jdelafon opened this issue Dec 15, 2015 · 6 comments · Fixed by #642

Comments

@jdelafon

Empty values in custom fields added through VEP (such as HGVSc and HGVSp) are stored as the empty string in the database. I think they could safely be replaced by NULL.

sqlite> select vep_hgvsc,vep_hgvsp from variants where vep_hgvsp='' limit 10;
|
|
|
...
sqlite> select vep_hgvsc,vep_hgvsp from variants where vep_hgvsp is NULL;
sqlite>
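
To illustrate why the distinction matters, here is a minimal sketch using Python's built-in `sqlite3` module on a toy in-memory table. Only the column names `vep_hgvsc` and `vep_hgvsp` come from the queries above; the table contents and the `count_rows` helper are made up for the example and are not taken from a real GEMINI database.

```python
import sqlite3


def count_rows(conn, where):
    """Count rows of the toy `variants` table that match a WHERE clause."""
    query = "SELECT COUNT(*) FROM variants WHERE " + where
    return conn.execute(query).fetchone()[0]


# Toy in-memory table (an assumption for illustration, not GEMINI's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (vep_hgvsc TEXT, vep_hgvsp TEXT)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?)",
    [
        ("ENST00000327044.6:c.26+22C>T", ""),     # missing value stored as ''
        ("", ""),                                 # both values stored as ''
        ("ENST00000379409.2:c.865-10A>C", None),  # missing value stored as NULL
    ],
)

# An equality test against '' never matches NULL, and IS NULL never matches ''.
empty_count = count_rows(conn, "vep_hgvsp = ''")
null_count = count_rows(conn, "vep_hgvsp IS NULL")
print(empty_count, null_count)
```

With the values stored as `''`, a query that filters on `IS NULL` (or `IS NOT NULL`) silently gives the wrong rows, which is the behaviour shown in the sqlite session above.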
@jxchong
Contributor

jxchong commented Dec 15, 2015

Agreed, although currently GEMINI doesn't officially support the most recent versions of VEP that do HGVSc and HGVSp. GEMINI officially only supports up to VEP v75.

@brentp Whenever you get around to officially supporting VEP v83, this is an issue for all custom VEP fields that I've tested in VEP v82 and v83.

@brentp
Collaborator

brentp commented Dec 16, 2015

ok. this is news to me. any time you find something like this, an example VCF with no genotypes will be a great help. It will likely take me longer to update vep and annotate an example than it will to fix the bug-- so, yeah examples greatly appreciated.

@jxchong
Contributor

jxchong commented Dec 17, 2015

VEP vcf made with VEP v82/v83

$ gemini load -v test.VEP.nogeno.vcf.gz -t VEP --cores 4 test.VEP.db

test.VEP.nogeno.vcf.gz

$ gemini query -q "SELECT distinct(vep_gene_pheno) from variants" test.VEP.db
$ gemini query -q "SELECT vep_hgvsc,vep_hgvsp from variants where vep_hgvsp='' limit 10" test.VEP.db

$ gemini query -q "SELECT distinct(vep_gene_pheno) from variants" test.VEP.db

None
1   
$ gemini query -q "SELECT vep_hgvsc,vep_hgvsp from variants where vep_hgvsp='' limit 10" test.VEP.db




ENST00000327044.6:c.26+22C>T
ENST00000379409.2:c.865-10A>C
ENST00000379409.2:c.865-6A>C
ENST00000491024.1:c.397+24A>G
ENST00000379389.4:c.-33T>C
ENST00000379370.2:c.5651+5C>T
ENST00000354700.5:c.1407+4C>T
$ gemini query -q "SELECT distinct(exon) from variants limit 10" test.VEP.db
1/1

None
10/14
13/14
16/19
10/19
9/19
4/12
12/12

@brentp
Collaborator

brentp commented Dec 17, 2015

@jxchong and @muraveill so, I made some changes and now have:

$ gemini query -q "SELECT distinct(vep_gene_pheno) from variants" test.VEP.db
None
1
$ gemini query -q "SELECT vep_hgvsc,vep_hgvsp from variants where vep_hgvsp='' limit 10" test.VEP.db
$ gemini query -q "SELECT vep_hgvsc,vep_hgvsp from variants where vep_hgvsp is NULL limit 10" test.VEP.db
None    None
None    None
None    None
None    None
ENST00000327044.6:c.26+22C>T    None
ENST00000379409.2:c.865-10A>C   None
ENST00000379409.2:c.865-6A>C    None
ENST00000491024.1:c.397+24A>G   None
ENST00000379389.4:c.-33T>C  None
ENST00000379370.2:c.5651+5C>T   None

Does that look sensible? I'm

@jxchong
Contributor

jxchong commented Dec 17, 2015

Works for me

@brentp
Collaborator

brentp commented Dec 17, 2015

Thanks guys, this will be in 0.18.1
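
For readers who hit the same problem in their own loaders, the general idea of the change can be sketched as follows. This is not the code from #642; `empty_to_none` is a hypothetical helper and the table is a toy in-memory one, assuming only that empty strings should be mapped to `None` before rows are inserted so that SQLite stores them as NULL.

```python
import sqlite3


def empty_to_none(value):
    """Map empty or whitespace-only strings to None; leave other values alone."""
    if isinstance(value, str) and value.strip() == "":
        return None
    return value


# Raw rows as they might come out of a parsed annotation (hypothetical data).
raw_rows = [
    ("ENST00000327044.6:c.26+22C>T", ""),
    ("", ""),
]

# Normalise every field before it reaches the database.
cleaned_rows = [tuple(empty_to_none(v) for v in row) for row in raw_rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (vep_hgvsc TEXT, vep_hgvsp TEXT)")
conn.executemany("INSERT INTO variants VALUES (?, ?)", cleaned_rows)

# After normalisation, the missing values are NULL rather than ''.
n_null = conn.execute(
    "SELECT COUNT(*) FROM variants WHERE vep_hgvsp IS NULL"
).fetchone()[0]
n_empty = conn.execute(
    "SELECT COUNT(*) FROM variants WHERE vep_hgvsp = ''"
).fetchone()[0]
print(n_null, n_empty)
```

Python's `sqlite3` module writes `None` as SQL NULL and returns NULL as `None`, which matches the `None` values printed by the queries earlier in the thread.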
