Interactive R code illustrating the basics of R, RStudio and the tidyverse for health data science, including medical research, epidemiology & related fields.
- 01. The basics (RStudio interface; using R scripts and Quarto documents; assigning values (`<-`); using functions and packages; exploring data frames)
- 02. Visualising (Making bar charts, line charts, box plots and scatterplots with `ggplot2`)
- 03. Importing data (Importing and exporting data from/to file with `readr` and from/to database with `dbplyr`)
- 04. Data summaries (Summarising continuous and categorical data with `dplyr` and `janitor`; the pipe `|>`; table 1 with `gtsummary`)
- 05. Data classes (How R represents different variable types, including numerical, categorical, dates; vectors; lists; data.frames)
- 06. Data subsets (Sorting and filtering rows and selecting columns with `dplyr`)
- 07. Data transformation (Calculating new variables, categorising continuous variables, regrouping a categorical variable, with `dplyr` and `forcats`)
- 08. Data cleaning (Deleting and renaming variables; parsing data classes; labelling and recoding values, with `dplyr` and `janitor`)
- 09. Data reshaping (Joining datasets with `dplyr`; wide and long formats; reshaping wide to long and vice versa with `tidyr`)
- 10. SQL (SQL server structures; methods to run an SQL query in R with `dbplyr` and `DBI`; SQL chunks in a qmd; commonly used SQL clauses)
- 11. Git (to be created)
- 12. Quarto (Producing automated, reproducible reports; inline code; markdown hyperlinks; chunk options; YAML headers and global options)
- 13. Workflows (How to structure analytical projects, data and output locations)
Where to start: after installing R and RStudio, download these files (click the green "Code" button, then "Download ZIP"), unzip them, then open `IntRo.Rproj` and `01_basics.qmd`. If you're familiar with Git, you can simply clone the repo.
Author: Andrea Mazzella
This course draws frequently from Hadley Wickham's excellent R for Data Science (2e) book.
The datasets used are either simulated or in the public domain.