Bioinformatics topic ideas #7
Comments
Variant Calling |
SeqIO in BioPerl or BioPython. My vote's for Python, but maybe it could be taught in a way that's applicable to both if people are interested in both? |
A bit more specific to people who do network analysis, but tools such as igraph in R or NetworkX in Python might be useful for some? |
Yeah, R also has a package for SeqIO. I wrote some code in Python that can work with BAM files and fastq/a. On Mon, Jun 15, 2015 at 2:56 PM, mbonsma notifications@github.com wrote:
|
This is so great. Ideas! If we even had a session where everyone who works with fasta files just talks about their life, I would be so happy. Haha. |
I've been using NetworkX in Python for a while now, to do a variety of things, particularly focused around the use of the MCL algorithm for network clustering. I'd love to learn more about visualizing networks using NetworkX + matplotlib if anyone has any expertise in that? Or just matplotlib in general, really. |
Also, I could present on de novo transcriptome assembly or phylogenomics (the two big areas I've been devoting time to lately), as well as general stuff like BLAST and variant tools, sequence alignment, building gene trees, etc, if there's interest in beginner type stuff. |
I have experience in matplotlib. It is quite nice to use a script for generating figures because you can change things systematically and easily for different journals. |
As well, I can do variant calling, bisulfite alignment, and reference-guided alignment. I seem to be doing that a lot lately. |
Oh great! We should chat. I could use some recommendations for variant calling, but in a highly specific context. |
Sure. On Wed, Jun 17, 2015 at 10:58 PM, MattStata notifications@github.com
|
Hi Matt, Did you have any questions? I am not sure how you want to communicate. On Wed, Jun 17, 2015 at 11:06 PM, Elliott Sales de Andrade <
|
Well basically, I'm looking for a program that can use RNA-seq reads against a set of coding sequences I've assembled and identify putative SNPs. Do you have any recommendations? I've never used any SNP-calling software before so I'm not really sure what's out there and what the required inputs for most are -- I would assume the majority map genomic reads against a genome, rather than RNA-seq against CDS. As for communicating, I think here is probably fine, as long as this doesn't turn into a really lengthy side-discussion and totally derail the thread. |
Hi Matt, So you already know your SNPs and have their sequence or position? What It makes a difference. Your problem reminds me of a targeted sequencing Ricardo On Thu, Jun 18, 2015 at 9:55 PM, Matt Stata notifications@github.com
|
I have no reference. I have de novo transcriptome assemblies for two plant species, from which I've extracted orthologous pairs of coding sequences. I would now like to use the original reads from three different individuals to get some idea of the genetic diversity, and in particular the degree of heterozygosity, in the interest of deciding whether we need to self the plants several times to reduce heterozygosity before starting a genome sequencing project. I could write something to do this using BLAT results for the read mapping, but an existing tool would save me some trouble. I imagine there must be something that is either intended for this or flexible enough to use in this situation? |
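To make the question above concrete, here is a minimal sketch of the counting step, assuming the reads have already been mapped to the coding sequences and summarized as per-position base strings. The `pileup` dictionary, the function names, and the `min_depth` and `min_minor_fraction` thresholds are all hypothetical choices for illustration, not settings from any existing SNP caller. RNA-seq coverage is uneven and allele-specific expression can skew allele fractions, so this is only a rough first look.

```python
from collections import Counter


def count_bases(bases):
    """Count unambiguous bases in a string of read bases at one position."""
    return Counter(b for b in bases.upper() if b in "ACGT")


def call_heterozygous_sites(pileup, min_depth=10, min_minor_fraction=0.25):
    """Return (position, major, minor, minor_fraction) for putative het sites.

    pileup maps a position in a coding sequence to the read bases observed
    there (hypothetical input format). Thresholds are illustrative only.
    """
    het_sites = []
    for pos in sorted(pileup):
        counts = count_bases(pileup[pos])
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        ranked = counts.most_common(2)
        if len(ranked) < 2:
            continue  # only one allele observed
        minor_fraction = ranked[1][1] / depth
        if minor_fraction >= min_minor_fraction:
            het_sites.append(
                (pos, ranked[0][0], ranked[1][0], round(minor_fraction, 3))
            )
    return het_sites


def heterozygosity(pileup, min_depth=10, min_minor_fraction=0.25):
    """Fraction of sufficiently covered positions that look heterozygous."""
    covered = sum(
        1 for bases in pileup.values()
        if sum(count_bases(bases).values()) >= min_depth
    )
    if covered == 0:
        return 0.0
    hets = call_heterozygous_sites(pileup, min_depth, min_minor_fraction)
    return len(hets) / covered


# Toy data for one individual: position 2 has two well-supported alleles,
# position 3 has a single discordant read, position 4 is too shallow.
toy_pileup = {
    1: "AAAAAAAAAAAA",
    2: "AAAAAAGGGGGG",
    3: "CCCCCCCCCCCT",
    4: "GGG",
}
het_calls = call_heterozygous_sites(toy_pileup)
het_rate = heterozygosity(toy_pileup)
```

Running this per individual and comparing `het_rate` across the three plants would give a crude sense of whether further selfing is worthwhile. A real caller would also weigh base and mapping qualities.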
Hi Matt, I am pretty sure BLAT would be too slow. Did you use Trinity or ABYSS for Have you looked into PAGAN? https://code.google.com/p/pagan-msa/wiki/PAGAN?tm=6 On Sat, Jun 20, 2015 at 11:02 AM, Matt Stata notifications@github.com
|
BLAT actually works quite well for mapping reads, and can be really fast with the right settings and run in parallel with GNU parallel. I've used it quite a bit for that. My pipeline, which I'm still refining, is something like this:
-Multiple assemblies (Trinity, IDBA, SOAPdenovo-Trans) combined
Basically this is all aimed at getting a good set of pairwise orthologs with predicted function, so that I can then do further downstream analysis with interspecific hybrids of these two species. But as I mentioned, we also intend to eventually sequence the genomes of the two parent species and so would like to get a rough idea of the degree of heterozygosity so as to decide whether we need to self a few more times before starting sequencing. So I'd like to see what SNPs exist in my coding regions (particularly percentage of SNPs at synonymous sites, which should be a reasonable approximation of SNPs for the other neutral parts of the genome) for each species. PAGAN sounds interesting, but I don't see how it fits my problem -- were you suggesting it just as an alternative to BLAT? |
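The "percentage of SNPs at synonymous sites" idea can be sketched as below. This assumes each coding sequence is in frame starting at position 0, that SNP positions are 0-based offsets into that sequence, and that the standard genetic code applies. The function names and the toy sequence are hypothetical, and the sketch computes the share of observed SNPs that leave the amino acid unchanged, which is only one simple reading of that statistic.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def classify_snp(cds, pos, alt):
    """Label a SNP as 'synonymous' or 'nonsynonymous'.

    cds: in-frame coding sequence (assumption: frame starts at index 0).
    pos: 0-based position of the SNP within cds.
    alt: alternative base observed at that position.
    """
    cds = cds.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


def synonymous_fraction(cds, snps):
    """Fraction of (pos, alt) SNPs in one CDS that are synonymous."""
    labels = [classify_snp(cds, pos, alt) for pos, alt in snps]
    if not labels:
        return 0.0
    return labels.count("synonymous") / len(labels)


# Toy example: ATG GCT GAA encodes Met-Ala-Glu.
toy_cds = "ATGGCTGAA"
toy_snps = [(5, "C"), (6, "A"), (8, "G")]  # GCT>GCC, GAA>AAA, GAA>GAG
toy_labels = [classify_snp(toy_cds, p, a) for p, a in toy_snps]
toy_fraction = synonymous_fraction(toy_cds, toy_snps)
```

Each SNP is evaluated against the reference codon on its own, so two SNPs in the same codon are not combined. That shortcut is reasonable for a rough estimate and would need revisiting for a careful analysis.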
I was suggesting PAGAN instead of BLAT. Have you seen this paper? It implements a statistical approach to estimate http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0020041 Is it possible to get frequencies from your data on SNPs? Ricardo On Sat, Jun 20, 2015 at 2:48 PM, Matt Stata notifications@github.com
|
Suggestions and ideas for topics we'd like to see covered at some point. This thread is for bioinformatics-related topics, or things that people working with genetic data might find useful. Posting a suggestion does not lock you in to presenting on that topic.