DSCI 571: Supervised Learning I
Welcome to DSCI 571, an introductory supervised machine learning course! In this course we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be exposed to common machine learning algorithms such as decision trees, K-nearest neighbours, SVMs, naive Bayes, and logistic regression using the scikit-learn framework.
2020-21 instructor: Varada Kolhatkar
Course Learning Outcomes
By the end of the course, students are expected to be able to:
- describe supervised learning and identify what kind of tasks it is suitable for;
- explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the golden rule;
- identify when and why to apply data pre-processing techniques such as imputation, scaling, and one-hot encoding;
- describe at a high level how common machine learning algorithms work, including decision trees, K-nearest neighbours, and naive Bayes;
- use Python and the `scikit-learn` package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets and to interpret your results carefully.
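As a taste of what such a pipeline looks like, here is a minimal sketch (not course material; the synthetic dataset and column names are made up for illustration) that strings together data splitting, imputation, scaling, one-hot encoding, cross-validation, and a classifier:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy dataset with numeric, categorical, and missing values.
rng = np.random.default_rng(571)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 80, size=n).astype(float),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["Vancouver", "Toronto", "Montreal"], size=n),
})
X.loc[rng.choice(n, size=10, replace=False), "age"] = np.nan  # missing values
y = (X["income"] > 50_000).astype(int)  # toy binary target

# Split BEFORE any preprocessing, respecting the golden rule.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

numeric = ["age", "income"]
categorical = ["city"]
preprocessor = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipe = Pipeline([("preprocess", preprocessor),
                 ("clf", LogisticRegression(max_iter=1000))])

# Cross-validate on the training set only; touch the test set once at the end.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
pipe.fit(X_train, y_train)
test_score = pipe.score(X_test, y_test)
```

Because the preprocessing lives inside the pipeline, imputation and scaling statistics are re-fit on each cross-validation fold, which is exactly what prevents information from leaking out of the validation data.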
The following deliverables will determine your course grade:
| Deliverable | Weight |
|---|---|
| Lab Assignment 1 | 15% |
| Lab Assignment 2 | 15% |
| Lab Assignment 3 | 15% |
| Lab Assignment 4 | 15% |
We will be meeting three times every week: twice for lectures and once for the lab.
Lectures of this course will be a combination of pre-recorded videos and class discussions and activities. You are expected to watch the videos before the lecture. We'll spend the lecture time in group activities and Q&A sessions.
We are providing you with a `conda` environment file, which is available here. You can download this file, then create and activate the course environment as follows:

```
conda env create -f env-dsci-571.yaml
conda activate 571
```
In order to use this environment in Jupyter, you will have to install `nb_conda_kernels` in the environment where you have installed Jupyter (typically the `base` environment). You will then be able to select this new environment as a kernel in Jupyter.
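For example, assuming Jupyter lives in your `base` environment, the install step might look like this (a sketch; adjust the environment name and channel to your setup):

```shell
# Install nb_conda_kernels into the environment that runs Jupyter
# (typically base), so Jupyter can discover kernels from other envs.
conda install -n base -c conda-forge nb_conda_kernels
```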
Note that this is not a complete list of the packages we'll be using in the course; there may be a few more packages you will install with `conda install` later on. But this list is enough to get you started.
- A Course in Machine Learning (CIML) by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido.
- The Elements of Statistical Learning (ESL)
- Data Mining: Practical Machine Learning Tools and Techniques (PMLTT)
- Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
- Artificial Intelligence 2E: Foundations of Computational Agents (2017) by David Poole and Alan Mackworth (of UBC!).
- Mike's CPSC 330
Mike is currently teaching an undergraduate course on applied machine learning. Unlike DSCI 571, CPSC 330 is a semester-long course, but there is a lot of overlap and sharing of notes between the two courses. You might find it useful.
- Mike's CPSC 340
- Machine Learning (Andrew Ng's famous Coursera course)
- Foundations of Machine Learning online course from Bloomberg.
- Machine Learning Exercises In Python, Part 1 (translation of Andrew Ng's course to Python, also relevant for DSCI 561, 572, 563)
- A Visual Introduction to Machine Learning (Part 1)
- A Few Useful Things to Know About Machine Learning (an article by Pedro Domingos)
- Metacademy (sort of like a concept map for machine learning, with suggested resources)
- Machine Learning 101 (slides by Jason Mayes, engineer at Google)
Please see the general MDS policies.