Elsewhere

François Briatte edited this page Nov 3, 2023 · 20 revisions

Where else can you learn data science, how is it taught, and what topics are covered? A short selection of similar courses in economics, political science, statistics, and a few other disciplines, with details on the additional topics.

For a more exhaustive review of similar material, see the Awesome Computational Social Science list.

Data science

Roger Beecham, Computational Methods for Social Data Science (University of Essex, 2022)

An excellent short summer school, especially for its interesting examples of ecological and spatial models.

Jenny Bryan, Data Wrangling, Exploration, and Analysis with R (formerly University of British Columbia, c. 2019)

A course that covers the same basic topics as ours, with a few more things: Git/GitHub, R Markdown, Shiny, and Web scraping.

Mine Çetinkaya-Rundel, Intro to Data Science (Duke University, 2018)

A fairly extensive course that covers the basics and also spends some time on inference by explaining Simpson's paradox, bootstrapping, the CLT and simulation by sampling from distributions, and Bayesian inference. Also covers Web scraping, Shiny, and a quick tour of tidy text analysis.

Di Cook and Michael Lydeamore, Exploratory Data Analysis (Monash University, 2023)

As the name indicates, this is what we should all be teaching first: how to study single variables, and then relationships and comparisons, and then model-dependent formulations of what the data exhibit. Many links to great R tutorials and packages, e.g. {tsibble} for time series.

David Dreyer Lassen and Sebastian Barfort, Social Data Science (University of Copenhagen, 2015)

A course intended for economists, which covers the same topics as we do, except for one session on machine learning that covers KNN and CART. The summer school version, from 2016, does not have the very useful slides from the original.

Michael K. Freeman, Technical Foundations of Informatics (University of Washington, 2016)

A broad-ranging course that dips into APIs, Git, Plotly, R Markdown, and Shiny, all of which are relevant to our purposes, but would require far more time to cover in full. Designed as a showcase of modern data science technology rather than as a 'statistics++' course, like ours. (The instructor also has an interesting course on client-side Web development, which is definitely off-topic.)

Rafael A. Irizarry et al., Introduction to Data Science (Harvard University, 2022)

A course that follows the chapters of the Irizarry handbook that we use in the course. Covers data wrangling and visualization, statistical inference and modeling, and machine learning. The GitHub repository has lots of material, and the ml folder, in particular, goes beyond the scope of our own course by covering e.g. regression trees with {rpart}, and random forests.

Jeff T. Leek, Advanced Data Science (DataCamp, 2017)

Another broad-ranging course that covers many of the topics mentioned elsewhere on this page, including text analysis and topic models, dimension reduction, smoothing, machine learning, deep learning and more.

Grant McDermott, Data Science for Economists (University of Oregon, 2021)

Another course to check if you want to get introduced to more data science technology than just 'basic' R. Included: using the shell, data wrangling with {data.table}, Web scraping and APIs, R programming (functions and parallelisation), Docker, big data with the Google Compute Engine (GCE), databases with SQL and BigQuery, and Apache Spark.

Katherine Ognyanova, Computational Social Science (Rutgers University, 2023)

An introductory course with a very interesting reading list, and many sessions on network analysis and text mining. The other two topics that this course touches upon and that I would like to spend more time on myself are APIs and machine learning.

Albert Rapp, Yet Again: R + Data Science (Ulm University, 2021)

A course that covers the same bases as we do, with a few more things on classification (random forests) and managing R projects.

Cosma R. Shalizi, Statistical Computing (Carnegie Mellon University, 2014)

Statistical computing, i.e. data science before it was called data science. For a more recent version of the course, see the one taught by Ryan Tibshirani.

Ista Zahn et al., Data Science Workshops (Harvard University, 2021)

A series of workshops that cover the same topics as we do, but also show how to do so with Python and Stata.

Data wrangling

Friedrich Geiecke, Data for Data Scientists (London School of Economics and Political Science, 2022)

A purely data-focused course that covers text and regex, Web scraping and APIs, and databases (SQL, BigQuery, MongoDB). Many links to useful resources, e.g. Steinert-Threlkeld, Twitter as Data, Cambridge University Press, 2018, or this {rvest} tutorial for Web scraping.

Kieran J. Healy, Data Wrangling (Statistical Horizons, 2022)

A short course focused on learning {dplyr} and regular expressions. Also covers iterations via the purrr::map functions, and uses {broom} to show model manipulations.

Kate Saunders et al., Wild-caught data (Monash University, 2023)

A course that takes the approach of having students start with data: "let them eat cake (first)." Basic (and not-so-basic) data wrangling, and lots of case studies.

Data visualization

Chris Adolph, Visualizing Data and Models (University of Washington, 2023)

An advanced data visualization course, using base R, {ggplot2} and {tile}, a package coded by the main course instructor.

Jessica Cooperstone, Data Visualization in R (Ohio State University, 2023)

A course that covers R basics, visualization principles, and {ggplot2} basics, as well as correlations and principal components analysis.

Andrew Heiss, Data Visualization (Georgia State University, 2023)

A wide-ranging course on the principles of data visualization, with videos of the lectures and many, many interesting examples of how to visualize e.g. uncertainty or spatial data.

Gaston Sanchez, Data Visualization (University of California, Berkeley, n.d.)

Lots of amazingly well-designed, well-thought-out slides by Gaston, whom I met once at a conference, and who struck me as one of the nicest and most skilled instructors that I have ever met in over 15 years of meeting people interested in teaching data science in all its guises.

Machine learning

Very few links below, but some of the topics that they cover, e.g. CART, are also covered in some of the courses mentioned in the other sections.

Alison Hill and Garrett Grolemund, Introduction to Machine Learning in the tidyverse (rstudio::conf, 2020)

This workshop is cited in the syllabus when machine learning is mentioned, because it covers the fundamentals: classification, ensembling, cross-validation and so on. In my view, this is where to start. For more similar material, see Machine learning with tidymodels and the Tidy Modeling with R book by Max Kuhn and Julia Silge, 2023.

Emmanuel Flachaire and Ewen Gallic, Machine Learning and Statistical Learning (Aix-Marseille University, n.d.)

Two short courses that cover a lot of topics, using R (and sometimes Python): logistic regression, K-Nearest Neighbours (KNN), Support Vector Machines (SVM), ridge and lasso regression (L2/L1 regularisation), classification and regression trees (CART), bagging, random forests, boosting, neural networks and deep learning…

Statistics

Florian M. Hollenbach, Introduction to Political Science Research & Methods (Copenhagen Business School, 2018)

A course that will give you an idea of what basic quantitative skills are expected from political scientists. On the menu: causal inference, observational studies, measurement, prediction, linear regression, probability, uncertainty, and hypothesis testing.

Kosuke Imai, Quantitative Social Science: An Introduction (Princeton University Press, 2017)

The material from the book, which we use for a few sessions. See also the QSS-swirl exercises that come with it.

Steven V. Miller, Foundations of Social Science Research for Public Policy (Clemson University, 2021)

A course that goes deeper into regression than we do, by also covering ordinal logit, instrumental variables, regression discontinuity designs, plus Bayesian inference. For a good stats refresher, see the undergraduate Quantitative Methods in Political Science. Oh, and the author has designed his own data science course for social scientists, to be taught one day.

Danielle Navarro, Learning Statistics with R (University of Adelaide, c. 2011)

An introductory course that turned into a book, intended for psychologists: covers the same topics as we do, plus analysis of variance (ANOVA) and Bayesian statistics.

Germán Rodríguez, Generalized Linear Models (Princeton University, 2016)

Many Stata users will already be familiar with the excellent teaching material put together by Germán Rodríguez, who has also written the corresponding R code for this course, as well as his other ones on multilevel models, survival analysis and demographic methods. Each course covers much more advanced modelling strategies than we do.

Matthew Salganik, Advanced Data Analysis for the Social Sciences (Princeton University, 2015)

A good overview of the more advanced topics: Maximum Likelihood (MLE), logit and probit, Poisson regression and count models, and mixed/multilevel models. The Spring 2018 version, Advanced Social Statistics, covers slightly different ground, e.g. it ends on machine learning.

Cosma R. Shalizi, Modern Regression (Carnegie Mellon University, 2015)

A course that gave birth to The Truth About Linear Regression, a very detailed take on the topic, with many examples and breakdowns of how linear regression actually works. Coded in base R, and far too detailed for our own purposes.

Cosma R. Shalizi, Undergraduate Advanced Data Analysis (Carnegie Mellon University, 2019)

A very detailed and in-depth course that led to a very detailed and in-depth handbook: Advanced Data Analysis from an Elementary Point of View. As the name indicates, it goes back to the statistical fundamentals, which require a solid background in statistical theory to make sense.

Anton Strezhnev, Causal Inference (University of Chicago, 2023)

A full treatment of natural experiments and associated designs, including instrumental variables, differences-in-differences, and regression discontinuity designs. See also the other courses in the same department.

Omar Wasow, Applied Quantitative Analysis (Princeton University, 2021)

The same topics that we cover, plus more on missing data, matching (Coarsened Exact Matching, Propensity Score Matching, Mahalanobis Matching), panel data and fixed effects, and 'natural' experiments.

Surveys

Some material (not necessarily courses) on the topic.

Anthony Joseph Damico, Analyze Survey Data for Free (n.d.)

The absolute reference (according to no one else but myself) on using survey data in R. Initially an online book, which was then turned into an actual R package. The code is very sensitive to website changes, and has not been actively maintained since c. 2017, but the wealth of information still stands.

Andi Fugard, Intermediate Quantitative Social Research (Birkbeck, University of London, 2017-2020)

A tutorial with many well-coded examples, including some using the European Social Survey.

Federico Vegetti, Introduction to Survey Statistics (University of Heidelberg)

Another tutorial that uses the European Social Survey, although with a different weighting scheme than the one used by Andi Fugard, which in turn differs from the one used by Daniel Oberski in this script. I still need to figure out whether the results differ, and which one is most correct for each round.

Stephanie Zimmer et al., Tidy Survey Book (2023)

Draft of a forthcoming book, with many examples using the {survey} and {srvyr} packages. Based on a workshop delivered in 2022.