This is a course introducing the R programming language for statistical analysis. The course is meant for beginners; there are no prerequisites. You MUST bring a laptop with WiFi access, and you must be able to install software on it. There are five one-hour sessions, from 4 to 5 pm on January 9, 10, 11, 12, and 13. Participants are expected to attend the majority of sessions. All sessions are in the Gaylord-Cary Meeting Room of the Research Studies Center (map).
The instructors are Martin Morgan, Nitesh Turaga, Lori Shepherd, Yubo Cheng.
Registration closed.
Notes: Installation instructions
Day 1 ensures that all participants have a working installation of R and RStudio. Remember to bring your WiFi-enabled laptop. This session will not involve instruction, but will instead involve instructors helping participants to download and install relevant software. Instructions are available at the following links; try to install the software yourself, and come to this session if you need further help.
We also briefly introduce
- Using RStudio
- Functions and help pages
- Scripts
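A quick way to check that the installation works is to type a few lines at the RStudio console, or to save them in a script and run it. The following is a minimal sketch using only functions that ship with base R; the numbers are arbitrary.

```r
# Call a function with arguments; sqrt() and round() are part of base R
x <- sqrt(2)
rounded <- round(x, digits = 3)
print(rounded)

# Open the help page for a function (either form works at the console)
# ?round
# help("round")

# Report the installed version of R
r_version <- R.version.string
print(r_version)
```

The help calls are commented out so the lines also run unattended as a script; remove the leading `#` to try them interactively.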
Notes: Using RStudio and R
Day 2 introduces the basics of RStudio and R.
- Using RStudio
- Vectors and lists
- Classes: data.frames and beyond
- Help!
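The topics above can be previewed with a short sketch. All values and names here (heights, sexes, person, df) are made up for illustration and are not taken from the course data.

```r
# Atomic vectors hold elements of a single type
heights <- c(1.62, 1.75, 1.80)          # numeric vector (hypothetical values)
sexes <- c("Female", "Male", "Male")    # character vector

# Vectorized arithmetic operates on every element at once
heights_cm <- heights * 100

# A list can hold elements of different types and lengths
person <- list(name = "A", scores = c(90, 85))

# A data.frame is a list of equal-length columns
df <- data.frame(sex = sexes, height = heights_cm)

class(df)      # the class of the object
dim(df)        # number of rows and columns
df$height      # extract one column as a vector
```

Help on any of these functions is available at the console with, for example, `?data.frame`.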
Download: BRFSS-subset.csv and ALL-phenoData.csv (e.g., right-click and "Save as..." ALL-phenoData.csv).
Notes: Data import and manipulation
Day 3 inputs and manipulates two data sets. The first is a subset of data collected by the CDC through its extensive Behavioral Risk Factor Surveillance System (BRFSS) telephone survey. The second is a small data set describing 128 patients from a classic microarray experiment.
- read.csv() and other R functions for data input.
- Introspection -- class(), dim(), head(), summary().
- Subsetting -- [, subset(), is.na(), %in%; $ and [[.
- table(), with(), aggregate().
- Descriptive and basic statistics: length(), mean(), median(), t.test().
- 'Formula' notation
- Visualization: plot(), boxplot(), hist()
- Working with factors: levels(), droplevels().
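The functions listed for this session fit together roughly as follows. This is a sketch under stated assumptions: the small table written to a temporary file is a hypothetical stand-in, and its column names (id, sex, age, weight) are not claimed to match the course files.

```r
# Hypothetical stand-in data, written to a temporary file so that
# read.csv() has something to read; the real course files are not used here
tmp <- tempfile(fileext = ".csv")
writeLines(c(
    "id,sex,age,weight",
    "1,Female,34,61.5",
    "2,Male,51,82.0",
    "3,Female,47,NA",
    "4,Male,29,77.3",
    "5,Female,62,70.1"
), tmp)

survey <- read.csv(tmp, stringsAsFactors = TRUE)

# Introspection
class(survey)
dim(survey)
head(survey, 3)
summary(survey)

# Subsetting: keep rows where weight is not missing, using [ and is.na()
complete <- survey[!is.na(survey$weight), ]

# subset() and %in% select rows by value
older <- subset(survey, age %in% c(47, 51, 62))

# Counting and group summaries; 'weight ~ sex' is formula notation,
# read as "weight as a function of sex"
counts <- table(survey$sex)
means <- aggregate(weight ~ sex, data = survey, FUN = mean)

# Factors: unused levels persist after subsetting until dropped
females <- survey[survey$sex == "Female", ]
levels(females$sex)                      # still lists both levels
females$sex <- droplevels(females$sex)
levels(females$sex)                      # only the level still in use
```

With the course files, the same calls would start from read.csv() on the downloaded file, for instance via file.choose() to locate it interactively.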
Download: BRFSS-subset.csv, ALL-phenoData.csv, and ALL-expression.csv (e.g., right-click and "Save as..." ALL-phenoData.csv).
Notes: Statistics
Day 4 introduces R facilities for univariate and multivariate statistical analysis. We continue to use the microarray experiment data to illustrate these concepts.
- Data cleaning -- factor(), as.matrix(), t()
- Summarizing / exploration -- summary(), mean(), plot(), hist(), ...
- Univariate -- t.test(), chisq.test(), lm(), ...
- Clustering -- dist(), cmdscale() (multi-dimensional scaling)
- Packages -- library()
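To give a feel for how these functions are called, here is a minimal sketch on simulated data. Nothing below comes from the microarray data set; the group labels, counts, and the gene-by-sample matrix are all invented for illustration.

```r
# Simulated data for illustration only; set.seed() makes it reproducible
set.seed(123)
group <- factor(rep(c("A", "B"), each = 20))
value <- rnorm(40, mean = ifelse(group == "A", 0, 1))

# Two-sample t-test, and a linear model with the same predictor
tt <- t.test(value ~ group)
fit <- lm(value ~ group)
tt$p.value
coef(fit)      # intercept (mean of A) and the B-minus-A difference

# Chi-squared test on a 2 x 2 table of hypothetical counts
counts <- matrix(c(30, 10, 20, 20), nrow = 2)
chi <- chisq.test(counts)

# A small expression-like matrix: 5 features (rows) by 6 samples (columns)
m <- matrix(rnorm(30), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# dist() computes distances between rows, so transpose with t() to
# compare samples; cmdscale() then projects them into two dimensions
d <- dist(t(m))
mds <- cmdscale(d, k = 2)
dim(mds)       # one row per sample, one column per dimension
```

The transpose matters: without t(), dist() would measure distances between features rather than between samples.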
Download: BRFSS-subset.csv
Notes: Visualization
Day 5 starts with some tips for real-world use, and then introduces two approaches to visualizing data -- base R graphics and ggplot2.
- Organizing projects: scripts/, extdata/ and data/ directories; saveRDS() / readRDS(); setwd(), source().
- Discovering, installing, and loading packages: library(), search().
- Base R's plot(), hist(), par().
- ggplot2 grammar of graphics: ggplot(), aes(), geom_*(), facet_*().
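The two plotting approaches can be compared side by side in a short sketch. The data frame below is hypothetical, temporary files stand in for a project's data/ directory, and the ggplot2 part only runs if that package is already installed.

```r
# Hypothetical data for illustration
df <- data.frame(
    age = c(25, 32, 47, 51, 62, 38),
    weight = c(60, 72, 80, 77, 69, 85),
    sex = factor(c("F", "M", "M", "F", "F", "M"))
)

# saveRDS() / readRDS() round-trip a single R object through a file;
# tempfile() is used here, a project would more likely write under data/
rds <- tempfile(fileext = ".rds")
saveRDS(df, rds)
df2 <- readRDS(rds)

# Base graphics: par() sets up a 1 x 2 layout, then two plots fill it
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
old <- par(mfrow = c(1, 2))
plot(weight ~ age, data = df2, main = "Weight by age")
hist(df2$weight, main = "Weight")
par(old)
invisible(dev.off())

# ggplot2, if installed: data + aesthetic mapping + geom + facet
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(df2, aes(x = age, y = weight)) +
        geom_point() +
        facet_wrap(~ sex)
    # print(p) would draw the plot on the current graphics device
}
```

In an interactive RStudio session the pdf() and dev.off() lines can be dropped, and the plots then appear in the Plots pane.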