andreamazzella/IntRo

IntRo: An introduction to R for health data

Interactive R code illustrating the basics of R, RStudio and the tidyverse for health data science, including medical research, epidemiology & related fields.

  • 01. The basics (RStudio interface; using R scripts and Quarto documents; assigning values (<-); using functions and packages; exploring data frames)
  • 02. Visualising (Making bar charts, line charts, box plots and scatterplots with ggplot2)
  • 03. Importing data (Importing and exporting data from/to files with readr and from/to databases with dbplyr)
  • 04. Data summaries (Summarising continuous and categorical data with dplyr and janitor; the pipe |>; Table 1 with gtsummary)
  • 05. Data classes (How R represents different variable types, including numerical, categorical, dates; vectors; lists; data.frames)
  • 06. Data subsets (Sorting and filtering rows and selecting columns with dplyr)
  • 07. Data transformation (Calculating new variables, categorising continuous variables, regrouping a categorical variable - with dplyr and forcats)
  • 08. Data cleaning (Deleting and renaming variables; parsing data classes; labelling and recoding values - with dplyr and janitor)
  • 09. Data reshaping (Joining datasets with dplyr; wide and long formats; reshaping wide to long and vice versa with tidyr)
  • 10. SQL (SQL server structures, methods to run an SQL query in R with dbplyr and DBI, SQL chunks in a qmd, commonly used SQL clauses)
  • 11. Git (to be created)
  • 12. Quarto (Producing automated, reproducible reports; inline code; markdown hyperlinks; chunk options; YAML headers and global options)
  • 13. Workflows (How to structure analytical projects, data and output locations)
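
As a rough taste of what topics 01, 04 and 06 cover, here is a minimal sketch using only base R. The course itself teaches these tasks with tidyverse packages such as dplyr, so this is not how the course materials do it. The `patients` data frame, its column names and its values are invented for illustration, and the native pipe `|>` assumes R 4.1 or later.

```r
# Hypothetical toy data: names and values are invented for illustration only.
patients <- data.frame(
  id  = 1:6,
  sex = c("F", "M", "F", "M", "F", "M"),
  age = c(34, 51, 47, 29, 62, 45)
)

# Topic 01: explore the structure of a data frame.
str(patients)

# Topic 04: the pipe passes the left-hand side as the first argument
# of the function on the right, so this is the same as mean(patients$age).
mean_age <- patients$age |> mean()

# Topic 06: keep rows where age is above 40 and only the id and age columns
# (a base R stand-in for dplyr's filter() and select()).
over_40 <- subset(patients, age > 40, select = c(id, age))

# Topic 04: mean age within each sex (a base R stand-in for a grouped summary).
age_by_sex <- aggregate(age ~ sex, data = patients, FUN = mean)
```

Printing `over_40` or `age_by_sex` in the console shows the resulting data frames.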

Where to start: after installing R and RStudio, download these files (click the green "Code" button, then "Download ZIP"), unzip them, then open IntRo.Rproj and 01_basics.qmd. If you're familiar with Git, you can simply clone the repo instead.

Author: Andrea Mazzella

This course draws frequently from Hadley Wickham's excellent R for Data Science (2e) book.

The datasets used are either simulated or in the public domain.
