DSCI 571: Supervised Learning I

Welcome to DSCI 571, an introductory supervised machine learning course! In this course we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be exposed to common machine learning algorithms such as decision trees, K-nearest neighbours, SVMs, naive Bayes, and logistic regression using the scikit-learn framework.

2020-21 instructor: Varada Kolhatkar

Course Learning Outcomes

By the end of the course, students are expected to be able to:

  • describe supervised learning and identify what kind of tasks it is suitable for;
  • explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the golden rule;
  • identify when and why to apply data pre-processing techniques such as imputation, scaling, and one-hot encoding;
  • describe at a high level how common machine learning algorithms work, including decision trees, K-nearest neighbours, and naive Bayes;
  • use Python and the scikit-learn package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets and to interpret the results carefully.
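As a minimal sketch of what such an end-to-end pipeline looks like in scikit-learn (using scikit-learn's built-in breast cancer dataset for illustration; the course works with its own datasets), touching several of the concepts above: data splitting, preprocessing inside a pipeline, and cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Split the data before doing anything else (the golden rule)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

# Chain preprocessing and the model so scaling is fit only on training folds
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Estimate generalization performance with 5-fold cross-validation
scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("Mean CV accuracy:", scores.mean())

# Fit on the full training set and score once on the held-out test set
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```

Because the scaler lives inside the pipeline, cross-validation refits it on each training fold, so no information leaks from the validation folds into preprocessing.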


Deliverables

The following deliverables will determine your course grade:

Assessment        Weight
Lab Assignment 1  15%
Lab Assignment 2  15%
Lab Assignment 3  15%
Lab Assignment 4  15%
Quiz 1            20%
Quiz 2            20%

Class Meetings

We will be meeting three times every week: twice for lectures and once for the lab.

Lecture format

Lectures in this course will combine pre-recorded videos with in-class discussions and activities. You are expected to watch the videos before each lecture; lecture time will be spent on group activities and Q&A sessions.

Lecture Schedule

Lecture  Topic                                            Datasets
         Motivation and course information                Indian Liver Patient Records; House Sales in King County; IMDB movie reviews
1        Terminology, baselines, decision trees           House Sales in King County; Canada US cities toy dataset
2        ML fundamentals                                  Canada US cities toy dataset
3        kNNs, SVM RBF                                    Canada US cities toy dataset; Spotify Song Attributes
4        Preprocessing and pipelines                      Spotify Song Attributes; California Housing
5        Categorical features and text features           The adult census dataset
6        Hyperparameter optimization, optimization bias   The adult census dataset
7        Naive Bayes                                      SMS Spam Collection Dataset; Conditional probability visualization
8        Logistic regression, multi-class classification  SMS Spam Collection Dataset

Installation

We are providing you with a conda environment file, which is available here. You can download this file, then create and activate the course's conda environment as follows:

    conda env create -f env-dsci-571.yaml
    conda activate 571

In order to use this environment in Jupyter, you will have to install nb_conda_kernels in the environment where you have installed Jupyter (typically the base environment). You will then be able to select this new environment as a kernel in Jupyter.
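As a sketch, assuming Jupyter lives in your base environment (the `-n base` and conda-forge channel flags below are assumptions about your setup, not part of the course instructions):

```shell
# Install nb_conda_kernels into the environment that runs Jupyter
conda install -n base -c conda-forge nb_conda_kernels
```

After restarting Jupyter, the 571 environment should appear in the kernel selection menu.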

Note that this is not a complete list of the packages we'll be using in the course, and there may be a few more packages you will need to install with conda install later on. But it is a good enough list to get you started.

Reference Material

Online courses

Policies

Please see the general MDS policies.