Note
I am no longer teaching this course, so this material does not necessarily reflect the current curriculum.
In this course you will learn how to formulate and organize practical machine learning problems; identify and estimate appropriate machine learning models for prediction and clustering; evaluate and select among different machine learning models and algorithms; and implement machine learning models and algorithms in a programming language.
The course gives you knowledge about machine learning that is used within marketing, finance, economics, textual analysis, digital humanities and social sciences. You will encounter many different forms of data, including images and text.
The course covers a number of machine learning methods with a focus on prediction. The course deals with supervised and unsupervised machine learning as well as semi-supervised and active learning. The course includes flexible regression and classification, regularization, methods for predictive model performance evaluation, Gaussian processes, clustering algorithms and mixture models.
Mattias Villani
Professor of Statistics, Stockholm and Linköping University
Probabilistic machine learning and Bayesian methods
Frank Miller
Professor of Statistics, Stockholm University
Experimental design, active learning and optimization methods
Karl Sigfrid
PhD student in Statistics, Stockholm University
The formal course description document, with all the details about grading etc., is here.
The course will use the following book as the main course literature:
- Machine Learning - a first course for engineers and scientists (MLES) by Lindholm et al. (2021). Forthcoming at Cambridge University Press. A free PDF version is available here. The previous title of the book was 'Supervised Machine Learning'.
- Additional course material linked from this page, such as articles and tutorials.
The course schedule on TimeEdit is here: Schedule.
Material under Extra is supplementary material that will help you understand the course content.
Material under Bonus is not required course material, but may be of interest to the curious student.
Lecture 1 - Introduction, k-NN and decision trees
Reading: MLES 1-2 | Slides
Bonus: Python Jupyter notebook for linear regression | Python code for nonlinear regression
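As a taste of what Lecture 1 covers, k-NN classification can be written in a few lines of base R. This is an illustrative sketch with made-up toy data, not the course's official code:

```r
# k-nearest neighbours classification in base R (illustrative sketch).
# Classify each test point by majority vote among its k closest training points.
knn_predict <- function(X_train, y_train, X_test, k = 3) {
  apply(X_test, 1, function(x) {
    # Euclidean distances from x to every training point
    d <- sqrt(rowSums(sweep(X_train, 2, x)^2))
    # Majority class among the k nearest neighbours
    nn <- y_train[order(d)[1:k]]
    names(which.max(table(nn)))
  })
}

# Toy example: two well-separated clusters around (0,0) and (4,4)
set.seed(1)
X <- rbind(matrix(rnorm(20, 0), ncol = 2), matrix(rnorm(20, 4), ncol = 2))
y <- rep(c("a", "b"), each = 10)
knn_predict(X, y, rbind(c(0, 0), c(4, 4)), k = 3)
```

Small k gives a flexible, low-bias classifier; large k gives a smoother, more stable one, which connects to the bias-variance discussion later in the course.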
Lecture 2 - Regularized non-linear regression and classification
Reading: MLES 3 | Slides
Code: Spline regression: Notebook pdf html | Spline package demo
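The core idea of Lecture 2 — fit a non-linear function by expanding the input into basis functions and shrinking the coefficients — can be sketched in base R with a polynomial basis and a ridge penalty (simulated data; the lambda value is made up for illustration):

```r
# Ridge regression on a polynomial basis expansion (illustrative sketch).
set.seed(42)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2)

# Degree-10 polynomial features: non-linear in x, but linear in the coefficients
Phi <- outer(x, 0:10, `^`)

# Closed-form ridge estimate: (Phi'Phi + lambda I)^{-1} Phi'y
lambda <- 0.01
beta <- solve(crossprod(Phi) + lambda * diag(ncol(Phi)), crossprod(Phi, y))

yhat <- Phi %*% beta  # fitted values; a larger lambda gives a smoother fit
```

The spline notebooks linked above replace the polynomial basis with spline bases, but the regularization idea is the same.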
Lecture 3 - Evaluating predictive performance and hyperparameter learning
Reading: MLES 4 | Slides
Bonus: Some slides about entropy
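The workhorse of Lecture 3 is cross-validation. A minimal base-R sketch of 5-fold cross-validation for choosing a hyperparameter (here the polynomial degree, on simulated data):

```r
# 5-fold cross-validation over polynomial degree (illustrative sketch).
set.seed(7)
x <- runif(100); y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
folds <- sample(rep(1:5, length.out = 100))  # random fold assignment

cv_mse <- sapply(1:8, function(degree) {
  mean(sapply(1:5, function(f) {
    fit <- lm(y ~ poly(x, degree), subset = folds != f)  # train on 4 folds
    pred <- predict(fit, newdata = data.frame(x = x[folds == f]))
    mean((y[folds == f] - pred)^2)  # squared error on the held-out fold
  }))
})

which.min(cv_mse)  # degree with the best estimated predictive performance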
Lecture 4 - Ensemble methods
Reading: MLES 7 | Slides
Extra: Gradient boosting visualized | Gradient boosting playground
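One of the ensemble ideas in Lecture 4, bagging, is simple enough to sketch directly: fit many trees to bootstrap resamples and average their predictions. This sketch uses the rpart package (which ships with R) and simulated data:

```r
# Bagging regression trees (illustrative sketch).
library(rpart)

set.seed(1)
n <- 200
x <- runif(n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

B <- 50
preds <- replicate(B, {
  boot <- dat[sample(n, replace = TRUE), ]  # bootstrap resample
  fit <- rpart(y ~ x, data = boot)          # one regression tree
  predict(fit, newdata = dat)               # predict on the original data
})
bagged <- rowMeans(preds)  # ensemble prediction: average over the B trees
```

Random forests add one twist on top of this: each tree split only considers a random subset of the features, which decorrelates the trees and improves the averaged prediction.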
Lecture 5 - Learning from large-scale data
Reading: MLES 5 | Slides
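The key algorithm for large-scale learning is stochastic gradient descent: instead of computing the gradient on all data, use a random mini-batch at each step. A base-R sketch for linear regression (simulated data; learning rate and batch size are made-up illustrative values):

```r
# Mini-batch stochastic gradient descent for linear regression (sketch).
set.seed(5)
n <- 10000
X <- cbind(1, rnorm(n)); beta_true <- c(2, -3)
y <- X %*% beta_true + rnorm(n)

beta <- c(0, 0); lr <- 0.1; batch <- 32
for (step in 1:2000) {
  i <- sample(n, batch)                                   # random mini-batch
  grad <- crossprod(X[i, ], X[i, ] %*% beta - y[i]) / batch
  beta <- beta - lr * as.vector(grad)                     # gradient step
}
round(beta, 1)  # should be close to the true c(2, -3)
```

Each step touches only 32 observations, so the cost per update is independent of n — the property that makes the method scale.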
Lecture 6 - Neural networks and Deep learning
Reading: MLES 6.1-6.2 | Slides
Code: Neural net MNIST in keras
Extra: Video on Neural networks | Video on learning a neural network | keras cheat sheet
Lecture 7 - Image data and convolutional neural networks
Reading: MLES 6.3-6.4 | Slides
Code: ConvNet MNIST in keras
Extra: Filter spreadsheet
Lecture 8 - Gaussian process regression and classification
Reading: MLES 9 | Slides
Extra: GP visualization
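A good way to build intuition for Lecture 8 is to draw random functions from a GP prior. A base-R sketch with a squared-exponential kernel (the hyperparameter values are made up for illustration):

```r
# Sample paths from a Gaussian process prior (illustrative sketch).
se_kernel <- function(x1, x2, ell = 0.3, sigma_f = 1) {
  sigma_f^2 * exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)
}

x <- seq(0, 1, length.out = 100)
K <- se_kernel(x, x) + 1e-8 * diag(100)  # jitter for numerical stability

# One draw from N(0, K) via the Cholesky factor
set.seed(2)
f <- t(chol(K)) %*% rnorm(100)
# plot(x, f, type = "l")  # a random smooth function from the prior
```

Shortening the length scale ell makes the sampled functions wigglier; GP regression then conditions these prior functions on the observed data.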
Lecture 9 - Unsupervised learning - mixture models and clustering
Reading: MLES 10.1-10.3 | Slides
Code: EM for univariate Gaussian mixtures | EM for multivariate Gaussian mixtures
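A compact base-R sketch of the EM iteration for a two-component univariate Gaussian mixture, on simulated data (the course notebooks linked above are the authoritative versions):

```r
# EM for a two-component univariate Gaussian mixture (illustrative sketch).
set.seed(3)
x <- c(rnorm(150, -2), rnorm(150, 3))  # simulated mixture data
mu <- c(-1, 1); sigma <- c(1, 1); pi_k <- c(0.5, 0.5)

for (iter in 1:100) {
  # E-step: responsibility of component 2 for each observation
  d1 <- pi_k[1] * dnorm(x, mu[1], sigma[1])
  d2 <- pi_k[2] * dnorm(x, mu[2], sigma[2])
  r  <- d2 / (d1 + d2)
  # M-step: responsibility-weighted parameter updates
  pi_k  <- c(mean(1 - r), mean(r))
  mu    <- c(weighted.mean(x, 1 - r), weighted.mean(x, r))
  sigma <- c(sqrt(weighted.mean((x - mu[1])^2, 1 - r)),
             sqrt(weighted.mean((x - mu[2])^2, r)))
}
round(mu, 1)  # should be close to the true means -2 and 3
```

Each iteration provably does not decrease the likelihood, which is why the simple alternation of E- and M-steps converges.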
Lecture 10 - Textual data and topic models
Reading: Multinomial-Dirichlet analysis | Topic models intro | Slides
Lecture 11 - Semi-supervised learning
Reading: MLES 10.1 | Slides
Lecture 12 - Active learning
Reading: Settles (2010), especially Sections 1, 2, 3.1, 3.5, 3.6, 7.1 | Slides
Code: Active learning - illustrating example
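The simplest active learning strategy from Settles' survey, uncertainty sampling, fits in a short base-R sketch: repeatedly query the unlabelled point the current model is least sure about (simulated data; in practice the labels of queried points come from an oracle, not from a pre-generated vector):

```r
# Uncertainty sampling for logistic regression (illustrative sketch).
set.seed(4)
x <- runif(200, -3, 3)
y <- rbinom(200, 1, plogis(2 * x))  # true labels (hidden in practice)
labelled <- sample(200, 10)         # small initial labelled pool

for (step in 1:20) {
  fit <- glm(y ~ x, family = binomial, subset = labelled)
  p <- predict(fit, newdata = data.frame(x = x), type = "response")
  pool <- setdiff(seq_along(x), labelled)
  # Most uncertain unlabelled point: predicted probability nearest 0.5
  query <- pool[which.min(abs(p[pool] - 0.5))]
  labelled <- c(labelled, query)    # "ask the oracle" for its label
}
```

The queried points end up concentrated near the decision boundary, which is where labels are most informative for this model.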
- The three computer labs are central to the course. Expect to allocate substantial time for each lab. Many of the exam questions will be computer based, so working on the labs will also help you prepare for the exam.
- R will be used as the course's programming language; see below for more info.
- The labs should be done in pairs of students.
- Each lab report should be submitted as a PDF along with the .R file with the code. Submission is done through Athena.
- There are four hours of computer time allocated to each lab. The idea is that you start working on the lab before the computer session, so that you are in a position to ask questions at the session, and then finish up the report afterwards.
Computer Lab 1 - Regularized nonlinear regression and classification.
Lab 1a: Regularized regression: R notebook | pdf version | html version
Lab 1b: Regularized classification: R notebook | pdf version | html version
Submission: Athena.
Computer Lab 2 - Neural Networks and Gaussian Processes.
Lab 2: R notebook | pdf version | html version
Submission: Athena.
Computer Lab 3 - Unsupervised, semi-supervised and active learning.
Lab 3: R notebook | pdf version | html version
Submission: Athena.
Lab assistant: Karl Sigfrid
The course examination consists of:
- Written lab reports (deadlines given in Athena)
- Computer exam
Analyzing data in R will be a big part of the course, so you need to know some R programming. The course R programming (7.5 credits) or an equivalent course is a prerequisite for this course. If you feel a little rusty on R, you can find a lot of material for studying it online, including tutorials, videos and free books. Here are some resources:
- Download R
- RStudio - probably the best software/editor for R.
- Official introduction to R
- R Cheat sheets
- The labs and exam will be done using R notebooks in RStudio.
Here are some machine learning packages in R:
- Machine learning R packages on CRAN.
- caret - a meta package for predictive ML models in R. See the Caret package vignette and a list of available models in Caret.
- keras - a package that brings Tensorflow for deep learning to R. Here is the quick start to keras.