Introduction to High Throughput DNA Sequence Data Analysis Using R / Bioconductor
Martin Morgan, Roswell Park Cancer Institute
Tuesday March 8, 2016
Modern high-throughput genomic methods generate large primary data sets that require significant data manipulation and statistical summary before they yield biological insight. This workshop starts by outlining basic DNA sequence analysis work flows, from primary data generation to biological interpretation. We use this outline, and especially the 'RNA-seq known gene differential expression' work flow, to identify relevant data management and statistical issues. The workshop then steps through R and Bioconductor code to implement essential stages in data management and statistical analysis. We conclude with a brief overview of the resources available for further study.
Participants should have an interest in and basic exposure to biological or statistical research questions that use high-throughput genomic data, for instance microarray or RNA-seq gene-level differential expression analysis. Participants should be comfortable with R, e.g., able to install and load new packages, use the help system, and write scripts. Participants are required to bring a laptop with wireless internet capabilities, and with a recent version of Chrome, Firefox, or Safari installed.
The workshop is intended for statisticians with a basic understanding of R and an interest in high-throughput DNA sequence data, especially data from designed experiments.
Dr. Martin Morgan leads the successful open source, open development Bioconductor project (http://bioconductor.org) for the analysis and comprehension of high-throughput genomic data. Dr. Morgan's interests include statistical computation, integrative analysis of multiple 'omics data sets, and effective data comprehension.