More advanced topics

François Briatte edited this page Sep 11, 2023 · 8 revisions

Here are some of the topics that the course does not cover due to time constraints.

Part 1. Introduction

1. Software

  • Python
  • Shell scripting
  • Compiled languages, with passing mention of {Rcpp}

2. Workflow

  • Dynamic documents a.k.a. notebooks (R Markdown, Quarto, Jupyter)
  • Versioning via Git/GitHub

Part 2. Data

3. Data manipulation

  • Wrangling
  • Special data types
    • Factors and labelled data with {labelled}
    • Time series with {tsibble} and windowed functions
    • Simple features with {sf}
  • Databases
    • SQL etc. with {dbplyr} and friends
    • SQL from the command line, e.g. Postgres (see this book)
    • Passing mentions of big data packages

4. APIs and Web scraping

  • HTTP, GET and POST, user agents
  • Example APIs with {httr}
  • Web scraping with {rvest}

Part 3. Models

5. Linear and nonlinear models

OLS and logit, plus models for ordinal, multinomial and count data, à la Long and Freese, but using R instead of Stata. Heiss 2020 covers everything.

  • Refresher on linear models
  • Refresher on logit and MLE
  • Ordered logit with MASS::polr
  • Multinomial logit with nnet::multinom and {mlogit}
  • Count models, Beta regression
  • Reminder on survey weights

6. Panel models

Fixed and random effects, standard error clustering, and other Wooldridge-type econometrics. Use Heiss 2020 again, and Zorn 2023.

Teachable example: Swiss et al., "Does Critical Mass Matter? Women’s Political Representation and Child Health in Developing Countries" (2012)

7. Multilevel models

Hierarchical data and mixed models, using Gelman et al. 2022 and Bayesian estimators.

  • How it works
  • Estimation with {lme4}
  • Bayesian estimation with {brms}

8. Natural experiments

A session on natural experiments and causal inference, but focused on a single technique in the workshop hour.

This course has excellent slides and lots of interesting examples in the labs (and assignments). This other one might also have more.

  • Notes on causal inference, DAGs, ‘natural’ experiments, survey experiments
  • Overview of RDD, DiD, IV designs
  • Demo: synthetic controls ({gsynth})

9. Time series

References:

Topics:

  • Basics, serial correlation and autoregression
  • Forecasting
  • GAMs with {gratia}
  • Changepoint detection

Part 4. Extras

10. Networks

  • Data
  • Viz
  • Models (ERGMs)

11. Text

  • Data
  • Models (topic models, e.g. LDA)

12. Space

  • Maps
  • Spatial dependence

Unlisted stuff

13. Event/duration data

Event history/survival analysis, using Mills 2010 or something more recent, like this forthcoming handbook chapter that comes with its own example(s).

Possible examples:

  • Swiss and Fallon, "Women’s Transnational Activism, Norm Cascades, and Quota Adoption in the Developing World" (2017)
  • Hughes, "Windows of Political Opportunity: Institutional Instability and Gender Inequality in the World’s National Legislatures" (2007)

14. Machine learning

… with tidymodels.

Could be its own course. In almost random order:

  1. Setup
  2. Workflow: training, CV
  3. Learning via L1/L2 regularization
  4. SVD and PCA
  5. Correspondence analysis (CA, MCA)
  6. Random forests
  7. CART, gradient-boosted trees
  8. KNN
  9. SVM
  10. ...
  11. ...
  12. ...

15. JavaScript libraries

  • Interactive graphics, d3.js, Observable
  • Shiny (cool example)
  • Maps with Leaflet
  • 3D renderers