Introduction to R

This is a course introducing the R programming language for statistical analysis. The course is meant for beginners; there are no prerequisites. You MUST bring a laptop with WiFi access, and you must be able to install software on it. There are five one-hour sessions, from 4 to 5 pm on January 9, 10, 11, 12, and 13. Participants are expected to attend the majority of sessions. All sessions are in the Gaylord-Cary Meeting Room of the Research Studies Center (map).

The instructors are Martin Morgan, Nitesh Turaga, Lori Shepherd, Yubo Cheng.

Registration closed.

Day 1: Installation

Notes: Installation instructions

Day 1 ensures that all participants have a working installation of R and RStudio. Remember to bring your WiFi-enabled laptop. This session is mostly hands-on help rather than formal instruction: instructors help participants download and install the relevant software. Installation instructions are in the notes linked above; try to install the software yourself, and come to this session if you need further help.

We also briefly introduce:

  • Using RStudio
  • Functions and help pages
  • Scripts

Day 2: Using R

Notes: Using RStudio and R

Day 2 introduces the basics of RStudio and R.

  • Using RStudio
  • Vectors and lists
  • Classes: data.frames and beyond
  • Help!
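
As a rough illustration of the Day 2 topics, the sketch below builds a vector, a list, and a data.frame, and shows how to open a help page. The variable names and values (`age`, `sex`, `participant`) are made up for this example and are not taken from the course notes.

```r
# Atomic vectors hold values of a single type; arithmetic is element-wise.
age <- c(23, 35, 41, 58)            # hypothetical values
sex <- c("F", "M", "F", "M")        # hypothetical values
age_months <- age * 12

# A list can hold elements of different types and lengths.
participant <- list(id = 1L, age = age[1], sessions_attended = c(1, 2, 5))
participant$age                     # extract one element by name

# A data.frame is a list of equal-length vectors, displayed as a table.
df <- data.frame(age = age, sex = sex, stringsAsFactors = FALSE)
class(df)
dim(df)                             # rows, then columns

# Help pages: uncomment either line in an interactive session.
# ?mean
# help("data.frame")
```

In RStudio, running these lines one at a time from a script (rather than typing them at the console) keeps a record of what was done.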

Day 3: Data import and manipulation

Download: BRFSS-subset.csv and ALL-phenoData.csv (e.g., right-click and "Save as..." ALL-phenoData.csv)
Notes: Data import and manipulation

Day 3 inputs and manipulates two data sets. The first is a subset of data collected by the CDC through its extensive Behavioral Risk Factor Surveillance System (BRFSS) telephone survey. The second is a small data set describing 128 patients from a classic microarray experiment.

  • read.csv() and other R functions for data input.
  • Introspection -- class(), dim(), head(), summary().
  • Subsetting -- [, subset(), is.na(), %in%; $ and [[.
  • table(), with(), aggregate().
  • Descriptive and basic statistics: length(), mean(), median(), t.test().
  • 'Formula' notation
  • Visualization: plot(), boxplot(), hist()
  • Working with factors: levels(), droplevels().
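
The sketch below walks through several of the functions listed above. Because the course files are not reproduced here, it assumes a stand-in: R's built-in `ToothGrowth` data set is written to a temporary CSV and read back, so the column names (`len`, `supp`, `dose`) belong to that data set and not to BRFSS-subset.csv or ALL-phenoData.csv.

```r
# Stand-in for a downloaded course file: write a built-in data set to a
# temporary CSV, then read it the way one would read a downloaded file.
path <- tempfile(fileext = ".csv")
write.csv(ToothGrowth, path, row.names = FALSE)
tg <- read.csv(path, stringsAsFactors = TRUE)

# Introspection
class(tg)
dim(tg)
head(tg, 3)
summary(tg)

# Subsetting: rows by logical condition, columns with $
high <- tg[tg$dose %in% c(1, 2), ]
oj <- subset(tg, supp == "OJ")
sum(is.na(tg$len))                  # number of missing values in one column

# Tabulate and summarize by group; `len ~ supp` is formula notation,
# read as "len as a function of supp".
table(tg$supp, tg$dose)
with(tg, mean(len))
aggregate(len ~ supp, data = tg, FUN = mean)
t.test(len ~ supp, data = tg)

# Factors: subsetting keeps unused levels until droplevels() removes them.
levels(oj$supp)
levels(droplevels(oj)$supp)
```

With a real course file, `path` would simply be the location where the CSV was saved, or `file.choose()` could be used to pick it interactively.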

Day 4: Statistics

Download: BRFSS-subset.csv, ALL-phenoData.csv, and ALL-expression.csv (e.g., right-click and "Save as..." ALL-phenoData.csv)
Notes: Statistics

Day 4 introduces R facilities for univariate and multivariate statistical analysis. We continue to use the microarray experiment data to illustrate these concepts.

  • Data cleaning -- factor(), as.matrix(), t()
  • Summarizing / exploration -- summary(), mean(), plot(), hist(), ...
  • Univariate -- t.test(), chisq.test(), lm(), ...
  • Clustering -- dist(), cmdscale() (multi-dimensional scaling)
  • Packages -- library()
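
A minimal sketch of the Day 4 workflow follows, assuming the built-in `iris` data set as a stand-in for the microarray files (its four numeric columns play the role of expression values, and `Species` the role of a phenotype). The specific comparisons are illustrative choices, not the analyses from the course notes.

```r
library(stats)                      # attached by default; shown to illustrate library()

# Data cleaning: coerce the numeric columns to a matrix; t() transposes it.
m <- as.matrix(iris[, 1:4])
dim(m)
dim(t(m))

# Univariate analysis: a two-group t-test and a simple linear regression.
two_species <- droplevels(subset(iris, Species != "virginica"))
t.test(Sepal.Length ~ Species, data = two_species)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)

# Clustering: pairwise Euclidean distances between rows, then classical
# multi-dimensional scaling down to two coordinates per row.
d <- dist(m)
mds <- cmdscale(d, k = 2)
head(mds)
```

Calling `plot(mds, col = iris$Species)` in an interactive session shows whether the groups separate in the reduced space.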

Day 5: Visualization

Download: BRFSS-subset.csv
Notes: Visualization

Day 5 starts with some tips for real-world use, and then introduces two approaches to visualizing data: base R graphics and ggplot2.

  • Organizing projects: scripts/, extdata/ and data/ directories; saveRDS() / readRDS(); setwd(), source().
  • Discovering, installing, and loading packages: library(), search().
  • Base R's plot(), hist(), par().
  • ggplot2 grammar of graphics: ggplot(), aes(), geom_*(), facet_*().
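
The following sketch touches on the Day 5 topics, assuming the built-in `mtcars` data set in place of BRFSS-subset.csv and temporary files in place of a project's data/ directory. The ggplot2 part runs only if that package is installed; the choice of `geom_point()` and `facet_wrap()` is one example of the `geom_*()` and `facet_*()` families.

```r
# Save an R object to disk and read it back unchanged.
rds <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds)
cars <- readRDS(rds)

# Draw into a temporary PDF so the sketch also runs non-interactively.
# In RStudio, omit pdf() and dev.off() to see the plots in the Plots pane.
pdf(tempfile(fileext = ".pdf"))

# Base graphics: par() sets the layout, then each call draws one panel.
op <- par(mfrow = c(1, 2))
hist(cars$mpg, main = "Miles per gallon", xlab = "mpg")
plot(mpg ~ wt, data = cars, xlab = "Weight (1000 lbs)", ylab = "mpg")
par(op)                             # restore the previous settings

# ggplot2: map columns to aesthetics, add a geom layer, facet by a variable.
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(cars, aes(x = wt, y = mpg)) +
        geom_point() +
        facet_wrap(~ cyl)
    print(p)
}

invisible(dev.off())
```

Base graphics draw immediately as each function is called, whereas a ggplot2 plot is an object built up in layers and drawn when printed.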