Improve docs for extract_snps.py #19

Closed
JPFinnigan opened this issue Dec 7, 2015 · 4 comments

Comments

@JPFinnigan

Hey Guys,

Quick question re: usage.

$ extract_snps.py -h
usage: extract_snps.py [-h] [-v] [--testset] [genome_file] [snp_file]

What exactly is a [genome_file]? I've tried running this command with FASTA files (e.g. mm10.fa) and with the mm10 genes.gtf file that I used successfully with extract_exons.py and extract_splice_sites.py, but in both cases I receive the following error:

Traceback (most recent call last):
  File "/Users/johnfinnigan/Desktop/Utilities/HISAT2/extract_snps.py", line 284, in <module>
    extract_snps(args.genome_file, args.snp_file, args.verbose, args.testset)
  File "/Users/johnfinnigan/Desktop/Utilities/HISAT2/extract_snps.py", line 242, in extract_snps
    assert len(snp_list) > 0
AssertionError

I'm not entirely sure what I'm doing wrong. Any help would be greatly appreciated.

@roryk
Contributor

roryk commented Dec 8, 2015

Hi @JPFinnigan,

This could be happening because the chromosome names in your mm10.fa and GTF file do not match. Can you check that they match?
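One way to run that check is sketched below. It is a minimal illustration, not part of HISAT2: the function names are hypothetical, and it assumes a plain-text FASTA file plus a tab-separated file (GTF or VCF) whose first column holds the chromosome name.

```python
def fasta_chrom_names(lines):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def tabular_chrom_names(lines):
    """Collect the first tab-separated column, skipping blank lines and '#' comment/header lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_chrom_names(fasta_lines, annotation_lines):
    """Report which chromosome names are shared and which appear in only one input."""
    fasta = fasta_chrom_names(fasta_lines)
    annot = tabular_chrom_names(annotation_lines)
    return {
        "shared": sorted(fasta & annot),
        "only_in_fasta": sorted(fasta - annot),
        "only_in_annotation": sorted(annot - fasta),
    }
```

Both functions accept any iterable of lines, so open file handles for mm10.fa and the GTF or VCF file can be passed directly. An empty "shared" list would point to a naming mismatch such as "chr1" versus "1".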

@JPFinnigan
Author

Hello,

The FASTA file and the snp_file use the same naming convention for chromosomes (chr_). I think the problem may stem from the fact that my snp_file is a .vcf. I was looking through some of the other posted issues, and it seems that it's not yet possible to pass a .vcf to extract_snps.py. But perhaps this is not the case?

Either way, I wanted to explain myself. I'm sure this has come up in your internal discussions, but we're excited about the potential of using HISAT2 to support alignment against custom reference graphs created from SNV/INDEL call sets derived from joint mutation calling on tumor-normal pairs (e.g., a graph reference containing the reference allele, plus germline SNPs/INDELs, plus somatic SNPs/INDELs). We (cc: @iskandr) would be eager to hear your thoughts.

@roryk
Contributor

roryk commented Dec 8, 2015

Hi @JPFinnigan,

There isn't a script to do it from a VCF file, but I think it wouldn't be too hard to write one. I think you'd have to decompose multiallelic variants and normalize them so they are all left-shifted first. You can do both of those operations with vt decompose and vt normalize (http://genome.sph.umich.edu/wiki/Vt). @infphilo, if we did that and classified each variant as SNP/insertion/deletion just by looking at length differences between ref and alt, would that be enough?
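The length-based classification proposed here could look roughly like the sketch below. It assumes the VCF has already been decomposed and normalized, so each record carries exactly one ALT allele. The function names and the returned labels are hypothetical, and nothing here is confirmed to match the format that extract_snps.py or HISAT2 expects.

```python
VALID_BASES = set("ACGTN")


def classify_variant(ref, alt):
    """Label a biallelic variant by comparing REF and ALT lengths (labels are illustrative)."""
    if "," in alt:
        raise ValueError("multiallelic record; decompose it first")
    ref, alt = ref.upper(), alt.upper()
    # Symbolic or missing alleles (e.g. <DEL>, *, .) cannot be classified by length.
    if not ref or not alt or not set(ref) <= VALID_BASES or not set(alt) <= VALID_BASES:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "other"  # same-length multi-base substitution


def classify_vcf_lines(lines):
    """Return (chrom, pos, ref, alt, label) for each data line of a VCF."""
    results = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        results.append((chrom, int(pos), ref, alt, classify_variant(ref, alt)))
    return results
```

Records that fall into "other" (same-length multi-base substitutions and symbolic alleles) are left for the caller to handle, since the thread does not say how they should be treated.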

@infphilo
Collaborator

infphilo commented May 7, 2016

HISAT2 now supports VCF files - see the HISAT2 webpage. I will update the usage information to make clear which files and formats are used.

@infphilo infphilo closed this as completed May 7, 2016
3 participants