Material for the workshop “Data Science with R in the tidyverse” (May 16–18, 2018)
1-Data transformation.Rmd
Slides Data Science R.pdf

Workshop: Data Science with R in the tidyverse

This repository contains the material for the PhD Workshop R for Data Science in the tidyverse, taught by Matteo Sostero at the Sant'Anna School of Advanced Studies (Pisa) in May 2018. The workshop is intended mainly for first-year PhD students in Economics, but all members of the Computational Modellers Society are welcome to attend!

Poster Image credit: Kevin Ku

The workshop covers topics on data science throughout a typical research project using the tidyverse. No prior knowledge of R is required! Hopefully even those of you familiar with “base R” (but not the tidyverse) will find something new.

The material is based on R for Data Science by Garrett Grolemund and Hadley Wickham. See also their book R for Data Science: Import, Tidy, Transform, Visualize, and Model Data published by O'Reilly Media, 2017.

Important Preparation ⚠️

Please bring your laptop 💻 and charger 🔌!

Preparation (10 minutes ⌛️):

  1. Install an up-to-date version of R;
  2. Install an up-to-date version of RStudio;
  3. (Windows only) Install Rtools;
  4. Install git on your system;
  5. Get a copy of the material by creating a new project in RStudio:
    1. File > New Project > Version Control > Git
    2. The repository URL is
    3. Create project as a subdirectory of your choice;
    4. ☑️ “Open in new session”
  6. In RStudio, install tidyverse and nycflights13:
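The installation in step 6 can be run from the RStudio console. A minimal sketch, assuming a CRAN mirror is reachable from your machine:

```r
# Install the two packages used in the workshop (run once, from the RStudio console)
install.packages(c("tidyverse", "nycflights13"))

# Check that both packages load without errors
library(tidyverse)
library(nycflights13)
```

If both `library()` calls complete without an error, the setup is ready for the sessions.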


| Date                 | Time        | Room              |
|----------------------|-------------|-------------------|
| Wednesday 16/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
| Thursday 17/05/2018  | 10:30–12:30 | Aula 3 Toscanelli |
| Friday 18/05/2018    | 10:30–12:30 | Aula 3 Toscanelli |

By popular demand, more sessions will be scheduled in the next few weeks.


Topics (in no particular order):

  • 📃 Data input: importing data from “messy” files, reading common and exotic formats and directory structures; preserving metadata; web scraping.
  • 📐 Data transformation: “tidying” strategies for fixing common issues with data; text processing with regular expressions; reshaping (wide-long) tabular data; merging and appending data.
  • ♻️ Automation: chaining operations with the pipe %>%; automating repetitive tasks through functional programming with purrr.
  • Tidy statistical modeling: automated approaches to estimation and diagnostics.
  • 📊 Visualization: plotting data with ggplot2; principles and recipes for visualization.
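To give a flavour of the pipe and purrr topics listed above, here is a minimal sketch. It assumes the tidyverse and nycflights13 packages are installed. The summary and the model chosen here are illustrative only and are not taken from the session notebooks.

```r
library(tidyverse)
library(nycflights13)

# The pipe chains dplyr verbs: mean arrival delay per carrier, worst first
delay_by_carrier <- flights %>%
  filter(!is.na(arr_delay)) %>%
  group_by(carrier) %>%
  summarise(mean_arr_delay = mean(arr_delay), n_flights = n()) %>%
  arrange(desc(mean_arr_delay))

# purrr: fit the same linear model separately for each origin airport
models <- flights %>%
  split(.$origin) %>%
  map(~ lm(arr_delay ~ dep_delay, data = .x))

# Extract the slope on dep_delay from each model as a named numeric vector
slopes <- map_dbl(models, ~ coef(.x)[["dep_delay"]])
```

Printing `delay_by_carrier` and `slopes` shows one row per carrier and one slope per origin airport. The same split-then-map pattern can be applied to other models or other grouping variables.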


Course slides.

Session notebooks: