
Note: I am no longer teaching this course, so this material does not necessarily reflect the current curriculum.

Machine Learning (ST5401), 7.5 credits


Introduction and aims

In this course you will learn how to formulate and organize practical machine learning problems, identify and estimate appropriate machine learning models for prediction and clustering, evaluate and select among different machine learning models and algorithms, and implement these models and algorithms in a programming language.

The course gives you knowledge about machine learning as it is used in marketing, finance, economics, textual analysis, digital humanities and the social sciences. You will encounter many different forms of data, including images and text.

The course covers a number of machine learning methods with a focus on prediction. It deals with supervised and unsupervised machine learning as well as semi-supervised and active learning, and includes flexible regression and classification, regularization, methods for evaluating predictive performance, Gaussian processes, clustering algorithms and mixture models.


Lecturers


Mattias Villani
Professor of Statistics, Stockholm University and Linköping University
Probabilistic machine learning and Bayesian methods


Frank Miller
Professor of Statistics, Stockholm University
Experimental design, active learning and optimization methods

Computer lab assistant


Karl Sigfrid
PhD student in Statistics, Stockholm University


Course description

The formal course description, with all the details about grading etc., is here.

Course literature

The course will use the following book as the main course literature:

  • Machine Learning - a first course for engineers and scientists (MLES) by Lindholm et al. (2021). Forthcoming from Cambridge University Press. A free PDF version is available here. The previous title of the book was 'Supervised Machine Learning'.
  • Additional course material linked from this page, such as articles and tutorials.

Schedule

The course schedule on TimeEdit is here: Schedule.


Lectures

Material under Extra is supplementary material that will help you understand the course content.
Material under Bonus is not required course material, but may be of interest to the curious student.

Lecture 1 - Introduction, k-NN and decision trees
Reading: MLES 1-2 | Slides
Bonus: Python Jupyter notebook for linear regression | Python code for nonlinear regression
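
As a taster of the k-NN method from this lecture, here is a minimal sketch in R (the course language) using the class package and the built-in iris data. It is not part of the official course code, and the train/test split and the choice of k = 5 are arbitrary illustrative assumptions.

```r
# Minimal k-NN classification sketch (illustration only, not course code)
library(class)

set.seed(1)
n <- nrow(iris)
train_idx <- sample(n, size = 100)           # random train/test split
x_train <- iris[train_idx, 1:4]
x_test  <- iris[-train_idx, 1:4]
y_train <- iris$Species[train_idx]
y_test  <- iris$Species[-train_idx]

pred <- knn(train = x_train, test = x_test, cl = y_train, k = 5)  # 5 nearest neighbours
mean(pred == y_test)                         # test accuracy
```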

Lecture 2 - Regularized non-linear regression and classification
Reading: MLES 3 | Slides
Code: Spline regression: Notebook pdf html | Spline package demo
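
The spline notebook above covers this material in detail; as a rough sketch of the idea, the R snippet below fits a non-linear regression with a natural cubic spline basis from the splines package. The simulated data and the choice of df = 8 are illustrative assumptions, not taken from the course material.

```r
# Non-linear regression via a natural cubic spline basis (sketch, not course code)
library(splines)

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)   # simulated non-linear data

fit <- lm(y ~ ns(x, df = 8))                  # linear model in a spline basis
plot(x, y, col = "grey", pch = 16)
lines(x, fitted(fit), lwd = 2)                # fitted non-linear mean function
```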

Lecture 3 - Evaluating predictive performance and hyperparameter learning
Reading: MLES 4 | Slides
Bonus: Some slides about entropy
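
A common way to learn a hyperparameter, covered in this lecture, is K-fold cross-validation. The sketch below selects the degrees of freedom of a spline regression by 10-fold cross-validation on simulated data; it is an illustrative example, not the course's own code.

```r
# K-fold cross-validation for a hyperparameter (spline df) - illustration only
library(splines)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)   # simulated data
K <- 10
fold <- sample(rep(1:K, length.out = length(y)))   # random fold assignment
df_grid <- 2:15

cv_mse <- sapply(df_grid, function(df) {
  mean(sapply(1:K, function(k) {
    fit <- lm(y ~ ns(x, df = df), subset = (fold != k))          # fit on K-1 folds
    pred <- predict(fit, newdata = data.frame(x = x[fold == k]))  # predict held-out fold
    mean((y[fold == k] - pred)^2)                                 # held-out MSE
  }))
})

df_grid[which.min(cv_mse)]   # hyperparameter with smallest cross-validated error
```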

Lecture 4 - Ensemble methods
Reading: MLES 7 | Slides
Extra: Gradient boosting visualized | Gradient boosting playground
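
As one concrete ensemble method from this lecture, the sketch below fits a bagged tree ensemble (a random forest) in R. It assumes the randomForest package is installed and uses the built-in iris data; neither is prescribed by the course material.

```r
# Random forest: an ensemble of bootstrapped decision trees (illustration only)
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)  # 500 bootstrap trees
print(rf)          # includes the out-of-bag (OOB) error estimate
importance(rf)     # variable importance aggregated over the ensemble
```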

Lecture 5 - Learning from large-scale data
Reading: MLES 5 | Slides
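
A workhorse for learning from large-scale data is stochastic gradient descent, where the parameters are updated using one randomly chosen observation at a time. The sketch below runs plain SGD for linear regression with squared loss on simulated data; the step size and number of epochs are arbitrary illustrative choices, not values from the course material.

```r
# Stochastic gradient descent for linear regression (sketch, illustrative settings)
set.seed(1)
n <- 10000
x <- cbind(1, rnorm(n))                  # design matrix with intercept
beta_true <- c(1, 2)
y <- x %*% beta_true + rnorm(n)          # simulated responses

beta <- c(0, 0)                          # initial parameter values
step <- 0.01                             # learning rate (assumed, not tuned)
for (epoch in 1:5) {
  for (i in sample(n)) {                 # one random pass over the data per epoch
    grad <- -2 * x[i, ] * as.numeric(y[i] - sum(x[i, ] * beta))  # gradient of one term
    beta <- beta - step * grad
  }
}
beta                                      # should be close to beta_true
```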

Lecture 6 - Neural networks and deep learning
Reading: MLES 6.1-6.2 | Slides
Code: Neural net MNIST in keras
Extras: Video on Neural networks | Video on learning a neural network | keras cheat sheet

Lecture 7 - Image data and convolutional neural networks
Reading: MLES 6.3-6.4 | Slides
Code: ConvNet MNIST in keras
Extras: Filter spreadsheet

Lecture 8 - Gaussian process regression and classification
Reading: MLES 9 | Slides
Extras: GP visualization
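
To give a feel for Gaussian process regression, the sketch below computes the GP posterior mean and pointwise variance under a squared exponential kernel in base R. The kernel hyperparameters and the noise level are fixed by hand and are purely illustrative.

```r
# Gaussian process regression with a squared exponential kernel (sketch)
sq_exp_kernel <- function(x1, x2, ell = 0.3, sigma_f = 1) {
  sigma_f^2 * exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)   # assumed hyperparameters
}

set.seed(1)
x <- runif(30)
y <- sin(2 * pi * x) + rnorm(30, sd = 0.2)        # simulated training data
x_star <- seq(0, 1, length.out = 200)             # prediction grid
sigma_n <- 0.2                                    # noise standard deviation (assumed)

K   <- sq_exp_kernel(x, x) + sigma_n^2 * diag(length(x))
K_s <- sq_exp_kernel(x_star, x)
post_mean <- K_s %*% solve(K, y)                  # posterior mean of f(x_star)
post_var  <- pmax(diag(sq_exp_kernel(x_star, x_star) - K_s %*% solve(K, t(K_s))), 0)

plot(x, y, pch = 16)
lines(x_star, post_mean, lwd = 2)
lines(x_star, post_mean + 1.96 * sqrt(post_var), lty = 2)  # 95% bands for f
lines(x_star, post_mean - 1.96 * sqrt(post_var), lty = 2)
```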

Lecture 9 - Unsupervised learning - mixture models and clustering
Reading: MLES 10.1-10.3 | Slides
Code: EM for univariate Gaussian mixtures | EM for multivariate Gaussian mixtures
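
The linked notebooks are the course implementations; as a bare-bones illustration of the EM iterations themselves, the following R snippet fits a two-component univariate Gaussian mixture to simulated data.

```r
# Bare-bones EM for a two-component univariate Gaussian mixture (sketch)
set.seed(1)
y <- c(rnorm(150, mean = -2, sd = 1), rnorm(100, mean = 3, sd = 0.8))  # simulated data

# initial values for the mixture weight, means and standard deviations
pi1 <- 0.5; mu <- c(-1, 1); sigma <- c(1, 1)

for (iter in 1:200) {
  # E-step: posterior probability that each observation belongs to component 1
  d1 <- pi1 * dnorm(y, mu[1], sigma[1])
  d2 <- (1 - pi1) * dnorm(y, mu[2], sigma[2])
  w <- d1 / (d1 + d2)

  # M-step: update parameters by weighted maximum likelihood
  pi1      <- mean(w)
  mu[1]    <- sum(w * y) / sum(w)
  mu[2]    <- sum((1 - w) * y) / sum(1 - w)
  sigma[1] <- sqrt(sum(w * (y - mu[1])^2) / sum(w))
  sigma[2] <- sqrt(sum((1 - w) * (y - mu[2])^2) / sum(1 - w))
}

c(pi1 = pi1, mu = mu, sigma = sigma)   # estimated mixture parameters
```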

Lecture 10 - Textual data and topic models
Reading: Multinomial-Dirichlet analysis | Topic models intro | Slides

Lecture 11 - Semi-supervised learning
Reading: MLES 10.1 | Slides

Lecture 12 - Active learning
Reading: Settles (2010), especially Sections 1, 2, 3.1, 3.5, 3.6, 7.1 | Slides
Code: Active learning - illustrating example
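
As a minimal illustration of pool-based active learning with uncertainty sampling (one of the strategies discussed in Settles, 2010), the sketch below repeatedly fits a logistic regression to the labelled set and queries the pool point whose predicted class probability is closest to 0.5. The data are simulated and the setup is not taken from the course example.

```r
# Pool-based active learning with uncertainty sampling (sketch, simulated data)
set.seed(1)
n <- 500
x <- matrix(rnorm(2 * n), ncol = 2)
y <- rbinom(n, 1, plogis(2 * x[, 1] - x[, 2]))     # simulated labels
labelled <- sample(n, 10)                           # small initial labelled set

for (round in 1:20) {
  fit <- glm(y ~ x1 + x2, family = binomial,
             data = data.frame(y = y[labelled], x1 = x[labelled, 1], x2 = x[labelled, 2]))
  pool <- setdiff(1:n, labelled)
  p <- predict(fit, newdata = data.frame(x1 = x[pool, 1], x2 = x[pool, 2]),
               type = "response")
  query <- pool[which.min(abs(p - 0.5))]           # most uncertain point in the pool
  labelled <- c(labelled, query)                    # "ask the oracle" for its label
}

length(labelled)   # 10 initial + 20 actively queried points
```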


Computer labs

  • The three computer labs are central to the course. Expect to allocate substantial time for each lab. Many of the exam questions will be computer-based, so working on the labs will also help you with the exam.

  • R will be used as the course's programming language; see the R section below for more information.

  • The labs should be done in pairs of students.

  • Each lab report should be submitted as a PDF along with the .R file with code. Submission is done through Athena.

  • There are four hours of computer time allocated to each lab. The idea is that you start working on the lab before the computer session, so that you are in a position to ask questions during the session, and then finish up the report afterwards.

Computer Lab 1 - Regularized nonlinear regression and classification.
Lab 1a: Regularized regression: R notebook | pdf version | html version
Lab 1b: Regularized classification: R notebook | pdf version | html version
Submission: Athena.

Computer Lab 2 - Neural Networks and Gaussian Processes.
Lab 2: R notebook | pdf version | html version
Submission: Athena.

Computer Lab 3 - Unsupervised, semi-supervised and active learning.
Lab 3: R notebook | pdf version | html version
Submission: Athena.

Lab assistant: Karl Sigfrid


Examination

The course examination consists of:

  • Written lab reports (deadlines given in Athena)
  • Computer exam

R


