Part 1. Introduction

1. Software

Python
Shell scripting
~~Compiled languages, with passing mention of {Rcpp}~~

2. Workflow

Dynamic documents a.k.a. notebooks (R Markdown, Quarto, Jupyter)
Versioning via Git/GitHub

Part 2. Data

3. Data manipulation

Wrangling
- Pivoting
- New {dplyr} functions, incl. complex joins and row-wise operations
- Vectorization, iteration, mapping onto lists
Special data types
- Factors and labelled data with {labelled}
- Time series with {tsibble} and windowed functions
- Simple features with {sf}
Databases
- SQL etc. with {dbplyr} and friends
- SQL from the command line, e.g. Postgres -- see this book
- Passing mentions of big data packages
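
A possible warm-up for the pivoting item above: a minimal sketch, in plain Python, of what a long-to-wide pivot does. The `pivot_wider` helper and the toy country/year data are hypothetical and only illustrate the idea; in the course itself this would be done with the dedicated packages.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per (country, year) observation.
long_rows = [
    {"country": "FR", "year": 2020, "value": 1.0},
    {"country": "FR", "year": 2021, "value": 1.5},
    {"country": "DE", "year": 2020, "value": 2.0},
    {"country": "DE", "year": 2021, "value": 2.5},
]


def pivot_wider(rows, id_col, names_from, values_from):
    """Spread the values of `names_from` into columns, one output row per `id_col`."""
    wide = defaultdict(dict)
    for row in rows:
        # Each distinct value of `names_from` becomes a column name.
        wide[row[id_col]][row[names_from]] = row[values_from]
    return [{id_col: key, **cols} for key, cols in wide.items()]


wide_rows = pivot_wider(long_rows, "country", "year", "value")
```

Each country ends up on a single row, with one column per year.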

4. APIs and Web scraping

HTTP, GET and POST, user agents
Example APIs with {httr}
Web scraping with {rvest}
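
To make the GET/POST and user agent items concrete, here is a minimal sketch using only the Python standard library. The endpoint URL, the query parameters and the user agent string are all made up for illustration, and no request is actually sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and user agent string, for illustration only.
URL = "https://api.example.org/search"
HEADERS = {"User-Agent": "course-demo/0.1 (contact: instructor@example.org)"}

# GET: parameters travel in the query string of the URL.
query = urllib.parse.urlencode({"q": "women in parliament", "page": 1})
get_req = urllib.request.Request(f"{URL}?{query}", headers=HEADERS)

# POST: parameters travel in the request body, encoded as bytes.
body = urllib.parse.urlencode({"q": "women in parliament"}).encode("utf-8")
post_req = urllib.request.Request(URL, data=body, headers=HEADERS)

# Nothing is sent until urllib.request.urlopen(...) is called on a request.
```

Setting an identifiable user agent, ideally with a contact address, is a common courtesy when querying APIs or scraping pages.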

Part 3. Models

5. Linear and nonlinear models

OLS and logit, plus ordinal, multinomial and count data, à la Long and Freese, but using R instead of Stata. Heiss 2020 covers everything.

Refresher on linear models
Refresher on logit and MLE
Ordered logit with MASS::polr
Multinomial logit with nnet::multinom and {mlogit}
Count models, Beta regression
Reminder on survey weights

6. Panel models

Fixed and random effects, standard error clustering, and other Wooldridge-type econometrics. Use Heiss 2020 again, and Zorn 2023.

Linear sandwiches and the like
FE with {fixest}
RE with {plm}

Teachable example: Swiss et al., "Does Critical Mass Matter? Women’s Political Representation and Child Health in Developing Countries", 2012

7. Multilevel models

Hierarchical data and mixed models, using Gelman et al. 2022 and Bayesian estimators.

How it works
Estimation with {lme4}
Bayesian estimation with {brms}

8. Natural experiments

A session on natural experiments and causal inference, but focused on a single technique in the workshop hour.

This course has excellent slides and lots of interesting examples in the labs (and assignments). This other one might also have more.

Notes on causal inference, DAGs, ‘natural’ experiments, survey experiments
Overview of RDD, DiD, IV designs
Demo: synthetic controls ({gsynth})

9. Time series

References:

Use Heiss 2020 again.
Hanck et al., Introduction to Econometrics with R (2023)
Hyndman and Athanasopoulos, Forecasting: Principles and Practice (2021)

Topics:

Basics, serial correlation and autoregression
Forecasting
GAMs with {gratia}
Changepoint detection
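
As a possible illustration of the serial correlation item in the topics above, the sketch below computes a sample autocorrelation by hand in plain Python. The `autocorrelation` helper and both toy series are hypothetical; the references listed above cover the proper tooling.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Cross-products of each deviation with the deviation `lag` steps earlier.
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


# Hypothetical toy series: a steady upward trend, so neighbours are alike.
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Hypothetical toy series: alternating values, so neighbours are opposite.
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r_trend = autocorrelation(trend)    # positive lag-1 autocorrelation
r_zigzag = autocorrelation(zigzag)  # negative lag-1 autocorrelation
```

A positive value means that neighbouring observations tend to move together, which is the pattern an autoregressive model tries to capture.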

Part 4. Extras

10. Networks

Data
Viz
Models (ERGMs)

11. Text

Data
Models (LDA, topic models)

12. Space

Maps
Spatial dependence

Unlisted stuff

13. Event/duration data

Event history/survival analysis, using Mills 2010 or something more recent, like this forthcoming handbook chapter that comes with its own example(s).

Possible examples:

Swiss and Fallon, "Women’s Transnational Activism, Norm Cascades, and Quota Adoption in The Developing World" (2017)
Hughes, "Windows of Political Opportunity: Institutional Instability and Gender Inequality in the World's National Legislatures" (2007)

14. Machine learning

… with tidymodels.

Could be its own course. In almost random order:

Setup
Workflow: training, CV
Learning via L1/L2 regularization
SVD and PCA
Correspondence analysis (CA, MCA)
Random forests
CART, gradient-boosted trees
KNN
SVM
...
...
...

15. JavaScript libraries

Interactive graphics, d3.js, Observable
Shiny (cool example)
Maps with Leaflet
3D renderers

More advanced topics