Phased human genomes #22
Comments
Hi Lindsay, If you have phased SNP information for human data, SNPsplit should work in pretty much the same way. There are currently two ways of arriving at the point where you can align the data to the N-masked genome, run SNPsplit, and then carry on with your downstream analyses:
I would be happy to get this to work for you if you could supply a copy of your VCF file (because they all look different...).
Once the N-masked genome has been generated, you can:
Just as a comment, the information of the phased genome is preserved in a way, because the SNP given as |
Hi,
Hi @Megha20,
If you wanted to go ahead with this, there is good and bad news. The good news is that you don't really have to deal with strains and so on, because you are essentially interested in all of the SNPs. This is, however, also a big problem, as the file you linked has more than 320 million lines! Since this has to be held in memory, such an 'all-dbSNP' approach would consume a HUGE amount of RAM (probably more than 100GB). You could either change the entire code that looks for high confidence SNPs in the VCF file or write a new script that will simply write out every SNP that has a single
Unless you have used a UMI approach for the RRBS, it is indeed recommended not to deduplicate. SNPsplit itself doesn't really care about what you feed it. I hope this helps,
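For the "write a new script" option mentioned above, here is a minimal Python sketch of what such a filter could look like. It reads the VCF one line at a time, so the filtering step itself never holds the whole file in memory. The selection rule (keep records whose REF and ALT are each a single A/C/G/T base) is an assumption made for illustration, since the criterion in the comment above is cut off. The function names and the three-column output layout are also hypothetical and are not SNPsplit's required SNP file format.

```python
import io

BASES = {"A", "C", "G", "T"}

def iter_single_base_snps(lines):
    """Yield (chrom, pos, ref, alt) for records with one-base REF and ALT.

    Assumption: 'SNP' here means a single A/C/G/T reference base and a
    single A/C/G/T alternative base; multi-allelic sites and indels are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, _snp_id, ref, alt = fields[:5]
        if ref in BASES and alt in BASES:
            yield chrom, int(pos), ref, alt

def write_snps(vcf_lines, out):
    """Stream the selected records to `out` and return how many were written.

    The output layout (chrom, position, REF/ALT) is hypothetical; adapt it
    to whatever your downstream tool expects.
    """
    count = 0
    for chrom, pos, ref, alt in iter_single_base_snps(vcf_lines):
        out.write(f"{chrom}\t{pos}\t{ref}/{alt}\n")
        count += 1
    return count
```

To use it on a real file, pass an open file handle as `vcf_lines` (for a compressed VCF, one opened with `gzip.open(path, "rt")`). Note that this only covers the filtering step; it does not change how much memory later steps need when they load the resulting SNP list.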
I was wondering how SNPsplit would handle phased human genomes. I didn't see anything in the documentation that seemed to take it into account but I could've missed it. Thanks!