Workshop: Data Science with R in the tidyverse
This repository contains the material for the PhD Workshop R for Data Science in the tidyverse, held by Matteo Sostero at Sant'Anna School of Advanced Studies (Pisa) in May 2018. The workshop is intended mainly for first-year PhD students in Economics, but all members of the Computational Modellers Society are welcome to attend!
Image credit: Kevin Ku
The workshop covers data science topics that come up throughout a typical research project, using the tidyverse. No prior knowledge of R is required! Hopefully even those of you familiar with “base R” (but not the tidyverse) will find something new.
The material is based on R for Data Science by Garrett Grolemund and Hadley Wickham. See also their book R for Data Science: Import, Tidy, Transform, Visualize, and Model Data published by O'Reilly Media, 2017.
Please bring your laptop
Preparation (10 minutes)
- Install an up-to-date version of R;
- Install an up-to-date version of RStudio;
- (Windows only) install Rtools;
- Install git on your system;
- Get a copy of the material by creating a new project in RStudio:
File > New Project > Version Control > Git
- The repository URL is https://github.com/CoMoS-SA/workshop-R-tidyverse.git
- Create project as a subdirectory of your choice;
- ☑️ “Open in new session”
- In RStudio, install
| Date | Time | Room |
|---|---|---|
| Wednesday 16/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
| Thursday 17/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
| Friday 18/05/2018 | 10:30–12:30 | Aula 3 Toscanelli |
By popular demand, more sessions will be scheduled in the next few weeks.
Topics (in no particular order):
- 📃 Data input: importing data from “messy” files, reading common and exotic formats and directory structures; preserving metadata; web scraping.
- 📐 Data transformation: “tidying” strategies for fixing common issues with data; text processing with regular expressions; reshaping (wide-long) tabular data; merging and appending data.
- ♻️ Automation: using the pipe %>%; automate tasks with functional programming with purrr.
- ✨ Tidy statistical modeling: automated approaches to estimation and diagnostics.
- 📊 Visualization: plot data with ggplot2; principles and recipes for visualization.
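To give a flavour of the automation and modeling topics, here is a minimal sketch of the pipe and purrr in action. It is only an illustration, not taken from the workshop material: the built-in mtcars dataset, the hp > 100 filter and the mpg ~ wt model are all assumptions chosen for the example, and it presumes the dplyr and purrr packages are installed.

```r
library(dplyr)
library(purrr)

# The pipe passes the result of each step to the next one.
# mtcars ships with R, so nothing needs to be downloaded (illustrative data only).
cars_summary <- mtcars %>%
  filter(hp > 100) %>%                          # keep the more powerful cars
  group_by(cyl) %>%                             # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n())      # average mileage and group size

# purrr: repeat the same task for every element of a list.
# Here, fit one linear model per cylinder group (hypothetical model choice).
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x))

# Extract the estimated slope on weight from each fitted model
slopes <- map_dbl(models, ~ coef(.x)[["wt"]])

print(cars_summary)
print(slopes)
```

The same pattern (split the data, map a function over the pieces, collect the results) extends to any repeated task, such as reading many files or estimating many specifications.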