Reference mismatch #45

Closed
denisemauldin opened this issue Mar 18, 2016 · 7 comments

Comments

@denisemauldin

Hi there,

I was wondering if you'd consider implementing a specific flag for vt normalize. We're getting an error that looks like this with some of our genomes:

Variant is not consistent: chrY:59034049-59034049 - N(REF) vs A(FASTA)

This is because it's in the PAR region and the reference we're feeding it has that masked. We don't really want to use the -n flag because we would like to catch other errors if they happen, but we'd like to ignore errors where the REF is N.

Thanks,
Denise

@atks
Owner

atks commented Mar 18, 2016

The -n flag, when enabled, will still output the message that the reference is not consistent. Would it be possible to scan the logs for such warnings that do not occur in the pseudoautosomal regions?
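
A minimal sketch of such a log scan, assuming the messages keep the exact format quoted above ("Variant is not consistent: chrY:59034049-59034049 - N(REF) vs A(FASTA)"). The names `parse_mismatches` and `outside_regions` are hypothetical helpers, not part of vt, and the region coordinates in the usage example are placeholders rather than authoritative PAR boundaries.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple

# Pattern for the message format quoted in this thread (an assumption:
# other vt versions may word the message differently).
MISMATCH_RE = re.compile(
    r"Variant is not consistent: (?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)"
    r" - (?P<ref>[^\s(]+)\(REF\) vs (?P<fasta>[^\s(]+)\(FASTA\)"
)


class Mismatch(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    fasta: str


# (chromosome, first position, last position), both ends inclusive.
Region = Tuple[str, int, int]


def parse_mismatches(lines: Iterable[str]) -> Iterator[Mismatch]:
    """Yield one Mismatch per log line that matches the message format."""
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            yield Mismatch(m["chrom"], int(m["start"]), int(m["end"]),
                           m["ref"], m["fasta"])


def outside_regions(mismatches: Iterable[Mismatch],
                    regions: List[Region]) -> List[Mismatch]:
    """Return the mismatches that are not fully contained in any region."""
    def covered(mm: Mismatch) -> bool:
        return any(chrom == mm.chrom and lo <= mm.start and mm.end <= hi
                   for chrom, lo, hi in regions)
    return [mm for mm in mismatches if not covered(mm)]


if __name__ == "__main__":
    log = [
        "Variant is not consistent: chrY:59034049-59034049 - N(REF) vs A(FASTA)",
        "Variant is not consistent: chr1:100-100 - C(REF) vs G(FASTA)",
    ]
    # Placeholder region; substitute the masked intervals of your reference.
    masked = [("chrY", 59034000, 59373566)]
    for mm in outside_regions(parse_mismatches(log), masked):
        print(mm)
```

Anything the scan reports lies outside the listed regions and would still need a manual look.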

@denisemauldin
Author

Hi there,

It is possible to do that, but we have 7k+ genomes that we're processing, so having an additional process to scan the logs is additional overhead. Figured I'd inquire here whether it'd be possible to have a flag for ignoring this specific case.

Thanks,
Denise

@atks
Owner

atks commented Mar 19, 2016

I'm a bit hesitant about adding that feature, as I'd prefer that errors are captured. It is definitely possible, but I need to convince myself that it is necessary.

Is it possible for the normalization step to use a reference sequence file which does not mask those bases? I assume that the variants were called using a reference that did not mask those bases in the first place.

@denisemauldin
Author

Hi there,

I do prefer that it prints the log, but I'd like vt not to exit when it runs into a REF N situation. I'm inquiring about an additional flag. Example behaviour:

vt normalize -o output.vcf -r hg19.ref.fa input.vcf
-> dies on all reference mismatches
vt normalize -n -o output.vcf -r hg19.ref.fa input.vcf
-> warns on all reference mismatches
vt normalize -s -o output.vcf -r hg19.ref.fa input.vcf
-> warns on reference mismatches where ref is N, dies on other reference mismatches

Thanks,
Denise
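
The three behaviours requested above can be sketched as a small decision function. This is only an illustration of the requested policy, not vt's actual implementation: the `Mode` names and `action_for_mismatch` are hypothetical, and "ref is N" is interpreted here as a REF allele made up entirely of N bases.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"            # no flag: die on every reference mismatch
    WARN_ALL = "warn_all"        # like -n: warn on every reference mismatch
    WARN_MASKED = "warn_masked"  # requested flag: warn only when REF is N


def action_for_mismatch(mode: Mode, ref_allele: str) -> str:
    """Return "warn" or "die" for a variant whose REF disagrees with the FASTA."""
    if mode is Mode.WARN_ALL:
        return "warn"
    # Assumption: a masked REF consists solely of N bases (case-insensitive).
    if mode is Mode.WARN_MASKED and set(ref_allele.upper()) == {"N"}:
        return "warn"
    return "die"


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value, action_for_mismatch(mode, "N"),
              action_for_mismatch(mode, "A"))
```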

@atks
Owner

atks commented Mar 19, 2016

Ah, I understand now. I think that is a good feature.

@atks
Owner

atks commented Mar 19, 2016

Can you please pull it and try it? The option is -m.

http://genome.sph.umich.edu/wiki/Vt#Normalization

@slagelwa

FYI, it seems to be working well. Thanks.
