Some questions about working on plant species #45
Comments
Hi @CrazyHsu, the SNPsplit genome preparation is designed to work with VCF files provided by the Mouse Genomes Project. SNPsplit is really designed to work with hybrid diploid genomes where the genomes of both parental strains are known. As an example, when genome 1 is called you know that the reads must have come from the maternal strain, and reads specific for genome 2 came from the paternal strain. In the scenario where you perform variant calling on the data and then use this as the basis for an N-masked genome, you might be able to call allele 1 or allele 2, but you probably cannot tell whether it is the paternal or maternal genome. All you can look at is allelic imbalance, unless you have completely phased genotypes. There are also a few more considerations:
Having said that, in theory SNPsplit should be able to handle this as well, but it will require some work in order to get there. Your options are:
Let's say you give your strain a name. Next, the SNPs will need to be contained within a folder called SNPs_maize. Inside there, the script expects one file per chromosome, e.g. like so:
So you will have to split your VCF SNP file by chromosome. And finally, these files need to adhere to the following format:
where the first line is the name of the chromosome (like in a FastA file). This is also explained in the option:
I believe column 6 is optional, but the first 5 need to be present. It is sufficient to put 1 as strand if the sequence is based relative to the top strand. So you will need to reformat your file just a little, and you should be able to run it. If the number of chromosomes in maize is different, you need to change that accordingly. And finally, you will also need to edit the
I am afraid both approaches will require some amount of tinkering...
Haven't heard back in a while, I hope all is well.
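As a rough starting point for the reformatting step, here is a minimal Python sketch that splits VCF records into one SNP file per chromosome. Only the FastA-style first line, the five required columns and the strand value of 1 come from the description above. The column order (ID, chromosome, position, strand, Ref/Alt), the `chr<name>.txt` file naming and the function name `vcf_to_snp_files` are assumptions made for illustration, so please check them against the SNPsplit documentation before relying on the output. It also keeps only simple biallelic single-base substitutions and ignores genotype and phasing information entirely.

```python
import os
from collections import defaultdict


def vcf_to_snp_files(vcf_lines, out_dir):
    """Split VCF records into one tab-delimited SNP file per chromosome.

    Hypothetical helper: column order and file naming are assumptions,
    not taken from the SNPsplit source.
    """
    per_chrom = defaultdict(list)
    for line in vcf_lines:
        # Skip VCF meta/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, snp_id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Keep only biallelic single-base substitutions (drops indels,
        # multi-allelic sites such as "A,T", and missing ALT ".").
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        # VCF uses "." for a missing ID; build a placeholder ID instead.
        if snp_id == ".":
            snp_id = f"{chrom}_{pos}"
        per_chrom[chrom].append((int(pos), snp_id, ref, alt))

    os.makedirs(out_dir, exist_ok=True)
    written = {}
    for chrom, snps in per_chrom.items():
        path = os.path.join(out_dir, f"chr{chrom}.txt")  # assumed naming
        with open(path, "w") as out:
            # First line: chromosome name, like a FastA header.
            out.write(f">{chrom}\n")
            for pos, snp_id, ref, alt in sorted(snps):
                # Strand is set to 1 (top strand), as suggested above.
                out.write(f"{snp_id}\t{chrom}\t{pos}\t1\t{ref}/{alt}\n")
        written[chrom] = path
    return written
```

You could call it with an open file handle, for example `vcf_to_snp_files(open("calls.vcf"), "SNPs_maize")`, where `calls.vcf` is a placeholder file name.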
Hello Felix,
I'm working on hybrid maize and also want to identify reads from a specific genome. Is SNPsplit suitable for this species?
I have mapped the paired-end reads to the maize genome using Hisat2, and then called the alleles (stored in VCFv4.2 format) with freebayes.
I followed the instructions in the documentation.
First, I prepared the genome:
But it turned out:
This may mean that I set an incorrect strain name. I then found "unknown" in the line
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT unknown
in the VCF file, so I ran the command again with --strain
changed from "maize" to "unknown". This exception came out:
This may mean that I should put the genome in a folder, so I reran the command as below with the genome in the maize_genome folder:
But there were many exceptions like the one below:
There are many files generated in SNPs_unknown, but they are all empty. And the content of unknown_SNP_filtering_report.txt is:
The first 10 lines of unknown_genome_preparation_report.txt are:
Is there anything wrong with how I use SNPsplit_genome_preparation? What should I do?
Best,
Feng