UBC-MDS/DSCI_571_sup-learn-1

In this course we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be exposed to common machine learning algorithms such as decision trees, $k$-nearest neighbours, SVMs, naive Bayes, and logistic regression using the scikit-learn framework.

LICENSE

© 2022 Varada Kolhatkar and Mike Gelbart

Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.

Important links

Course learning outcomes

By the end of the course, students are expected to be able to:

  • describe supervised learning and identify what kind of tasks it is suitable for;
  • explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the fundamental trade-off in machine learning;
  • identify when and why to apply data pre-processing techniques such as imputation, scaling, ordinal encoding, and one-hot encoding;
  • broadly describe the intuition behind common machine learning algorithms, including decision trees, $k$-nearest neighbours, naive Bayes, and logistic regression;
  • use Python and the scikit-learn package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets.
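
As a rough illustration of the last outcome, here is a minimal sketch of an end-to-end scikit-learn pipeline that combines data splitting, imputation, scaling, one-hot encoding, logistic regression, and cross-validation. The data is synthetic and the column names (`age`, `income`, `city`) are hypothetical; this is not taken from the course materials.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic toy data (hypothetical columns, for illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "age": rng.normal(40, 10, n),
        "income": rng.normal(60, 15, n),
        "city": rng.choice(["Vancouver", "Toronto", "Montreal"], n),
    }
)
# Knock out some values so that imputation has something to do.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan
# Binary target loosely driven by income, plus noise.
y = (df["income"] + rng.normal(0, 5, n) > 60).astype(int)

# Split first: the test set is not touched until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=123
)

# Numeric columns: impute, then scale. Categorical column: one-hot encode.
preprocessor = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["age", "income"]),
    (OneHotEncoder(handle_unknown="ignore"), ["city"]),
)
pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

# Cross-validation on the training portion only; preprocessing is refit per fold.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("Mean cross-validation accuracy:", cv_scores.mean())

# Final assessment on the held-out test set.
pipe.fit(X_train, y_train)
test_score = pipe.score(X_test, y_test)
print("Test accuracy:", test_score)
```

Putting the preprocessing inside the pipeline means the imputer and scaler are fit only on the training folds, so no information from the validation or test data leaks into training.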

Deliverables

The following deliverables will determine your course grade:

| Assessment | Weight | Where to submit |
| --- | --- | --- |
| Lab Assignment 1 | 15% | Gradescope |
| Lab Assignment 2 | 15% | Gradescope |
| Lab Assignment 3 | 15% | Gradescope |
| Lab Assignment 4 | 15% | Gradescope |
| Quiz 1 | 20% | Canvas |
| Quiz 2 | 20% | Canvas |

See Calendar for the due dates.
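
To see how these weights combine, here is a small sketch that computes a course grade as a weighted average. The weights come from the deliverables listed above; the example scores are hypothetical, and the sketch assumes a plain weighted average with no rounding rules, late penalties, or other adjustments.

```python
# Weights (percent of the final grade) copied from the deliverables list.
WEIGHTS = {
    "Lab Assignment 1": 15,
    "Lab Assignment 2": 15,
    "Lab Assignment 3": 15,
    "Lab Assignment 4": 15,
    "Quiz 1": 20,
    "Quiz 2": 20,
}


def course_grade(scores: dict) -> float:
    """Return the weighted course grade in percent.

    `scores` maps each assessment name to a score between 0 and 100.
    Assumes a plain weighted average (an assumption, not a stated policy).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Lab Assignment 1": 90,
    "Lab Assignment 2": 80,
    "Lab Assignment 3": 85,
    "Lab Assignment 4": 95,
    "Quiz 1": 70,
    "Quiz 2": 80,
}

print(sum(WEIGHTS.values()))         # 100
print(course_grade(example_scores))  # 82.5
```

The four labs together account for 60% of the grade and the two quizzes for the remaining 40%.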

Teaching Team

| Role | Name | Slack Handle | GHE Handle |
| --- | --- | --- | --- |
| Lecture Instructor | Varada Kolhatkar | @varada | @kvarada |
| Lab Instructor | Varada Kolhatkar | @varada | @kvarada |
| Teaching Assistant | Sana Ayromlou | | |
| Teaching Assistant | Keng Man Glenn Chang | | |
| Teaching Assistant | Colby DeLisle | | |
| Teaching Assistant | Farnoosh Hashemi | | |
| Teaching Assistant | Alireza Iranpour | | |
| Teaching Assistant | Faeze Keshavarz | | |
| Teaching Assistant | Daniel Ramandi | | |

Lectures

Format

This class will follow a semi-flipped classroom format. You will be required to watch a few pre-recorded videos (~30 to ~50 min long) before most of the classes. During lecture time we will focus on more examples, exercises, Q&A, discussions, demos, and class activities. It is optional but highly recommended that you download the datasets listed below, put them under your local lectures/data directory, run the lecture Jupyter notebooks on your own, and experiment with the code.

Lecture schedule

This course occurs during Block 2 in the 2022/23 school year.

| Lecture | Topic | Assigned videos | Resources and optional readings |
| --- | --- | --- | --- |
| | Course information | 📹 Pre-watch: 1.0 | |
| 1 | Terminology, baselines, decision trees | 📹 Pre-watch: 2.1, 2.2, 2.3, 2.4 | |
| 2 | ML fundamentals | 📹 Pre-watch: 3.1, 3.2, 3.3, 3.4 | An article by Pedro Domingos |
| 3 | $k$-NNs, SVM RBF | 📹 Pre-watch: 4.1, 4.2, 4.3, 4.4, 5.1 | |
| 4 | Preprocessing, pipelines, column transformer | 📹 Pre-watch: 5.2, 5.3, 5.4, 6.1 | |
| 5 | More preprocessing, text features | 📹 Pre-watch: 6.2 | |
| 6 | Hyperparameter optimization, optimization bias | 📹 Pre-watch: 8.1, 8.2 | |
| 7 | Naive Bayes | None | Conditional probability visualization; Naive Bayes chapter, Jurafsky and Martin |
| 8 | Logistic Regression, multi-class classification | 📹 Pre-watch: 7.1, 7.2, 7.3 | |

    Datasets

    Here is the list of Kaggle datasets we'll use in this class.

    If you want to be extra prepared, you may want to download these datasets in advance and save them under the lectures/data directory in your local copy of the repository.

    Labs

    During labs, you will be given time to work on your own or in groups. There will be plenty of opportunity for discussion and getting help during lab sessions. (Usually I enjoy labs a lot. It's also an opportunity for me to get to know you a bit better 🙂.)

    Installation

    We are providing you with a conda environment file which is available here. You can download this file and create a conda environment for the course and activate it as follows.

    conda env create -f env-dsci-571.yaml
    conda activate 571
    

    In order to use this environment in Jupyter, you will have to install nb_conda_kernels in the environment where you have installed Jupyter (typically the base environment). You will then be able to select this new environment in Jupyter. For more details, refer to the "Making environments work well with JupyterLab" section in your 521 lecture 6.

    I've only tried installing this environment file on a couple of machines, so you may run into problems with some of the packages in the yaml file when you run the commands above. This is not unusual. It often means that the given version of the package is not yet available for your operating system via conda. When this happens, you can work around it as follows:

    1. Remove the line with that package from the yaml file.
    2. Create the environment without that package.
    3. Activate the environment and install the package manually with either conda install or pip install.
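
Removing the offending package line from the environment file can also be scripted. The following is a minimal sketch using only the Python standard library. The helper `drop_package`, the package name `some-unavailable-package`, and the file contents shown are all hypothetical, chosen for illustration; the real environment file will list different packages and versions.

```python
import re


def drop_package(env_text: str, package: str) -> str:
    """Return the environment file text without the dependency line for `package`.

    Hypothetical helper: it matches list entries such as "  - name=1.2.3"
    by comparing the part before any version specifier to `package`.
    """
    kept = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            spec = stripped[2:].strip()
            # Package name is everything before '=', '<', '>', '!', '~' or whitespace.
            name = re.split(r"[=<>!~\s]", spec, maxsplit=1)[0]
            if name == package:
                continue  # skip the offending package
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical file contents, for illustration only.
example = """\
name: 571
channels:
  - conda-forge
dependencies:
  - python=3.10
  - scikit-learn=1.1.2
  - some-unavailable-package=0.1
"""

cleaned = drop_package(example, "some-unavailable-package")
print(cleaned)
```

After writing the cleaned text back to the environment file, you would create and activate the environment with the conda commands shown earlier, and then install the dropped package manually with conda install or pip install.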

    Note that this is not a complete list of the packages we'll be using in the course and there might be a few packages you will be installing using conda install later in the course. But this is a good enough list to get you started.

    Course communication

    We all are here to help you learn and succeed in the course and the program. Here is how we'll be communicating with each other during the course.

    Clarifications on the lecture notes or lab questions

    If there are any clarifications on the lecture material or lab questions, I'll open an issue in the course repository and tag you. It is your responsibility to read the messages whenever you are tagged. (I know that there are too many things for you to keep track of. You do not have to read all the messages, but please make sure to carefully read the messages in which you are tagged.)

    Questions on lecture material or labs

    If you have questions about the lecture material or lab questions please post them on the course Slack channel rather than direct messaging me or the TAs. Here are the advantages of doing so:

    • You'll get a quicker response.
    • Your classmates will benefit from the discussion.

    When you ask your question on the course channel, please avoid tagging the instructor unless the question is specifically for the instructor (e.g., if you notice a mistake in the lecture notes). If you tag a specific person, other teaching team members and your colleagues are discouraged from responding. This will decrease the response rate on the channel.

    Please use a consistent convention when you ask questions on Slack so that others, or future you, can search for them easily. For example, if you want to ask a question on Exercise 2.3 from Lab 1, start your post with the label lab1-ex2.3. Or if you have a question on lecture 2 material, start your post with the label lecture2. Once the question is answered/solved, you can add a "(solved)" tag before the label (e.g., (solved) lab1-ex2.3). Do not delete your post even if you figure out the answer on your own. The question and the discussion can still be beneficial to others.

    Questions related to grading

    For each deliverable, after I return grades, I'll let you know who has graded what in our course Slack by opening an issue in the course GitHub repository. If you have questions related to grading:

    • First, make sure your concerns are reasonable (read the "Reasonable grading concerns" policy).
    • If you believe that your request is reasonable, open a regrade request on Gradescope.
    • If you are unable to resolve the issue with the TA, send a Slack message to the instructor, including the appropriate TA in the conversation.

    Questions related to your personal situation or talking about sensitive information

    I am open to a conversation with you. If you want to talk about anything sensitive, please direct message me on Slack (and tag me) rather than posting it on the course channel. It might take a while for me to get back to you, but I'll try my best to respond as soon as possible.

    Working during the COVID-19 global pandemic

    We are working together on this course during the transition period from hybrid to in-person teaching and learning. Everyone is struggling to some extent. If you tell me you are having trouble, I am not going to judge you or think less of you. I hope you will extend me the same grace! Let's try to be open with each other and help each other.

    Here are some ground rules:

    • If you are unable to submit a deliverable on time, please reach out before the deliverable is due.
    • If you need extra support, the teaching team is here to work with you. Our goal is to help each of you succeed in the course.
    • If you are struggling with the material, getting back to in-person teaching and learning, or anything else, please reach out. I will try to find time and listen to you empathetically.
    • If I am unable to help you, I might know someone who can. UBC has some great student support resources.

    Please read Covid Campus Rules.

    Masks: This class is going to be in person. UBC no longer requires students, faculty and staff to wear non-medical masks, but continues to recommend that masks be worn in indoor public spaces.

    Your personal health: If you are ill, believe you have COVID-19 symptoms, or have been exposed to SARS-CoV-2, use the Thrive Health self-assessment tool for guidance, or download the BC COVID-19 Support App for your iOS or Android device and follow the instructions provided. Follow the advice from Public Health.

    Stay home if you have recently tested positive for COVID-19 or are required to quarantine. You can check this website to find out if you should self-isolate or self-monitor. If you are unable to submit a deliverable on time or unable to appear for an in-person quiz, check out MDS policies on academic concession and remote quiz requests.

    Your precautions will help reduce risk and keep everyone safer. In this class, the marking scheme is intended to provide flexibility so that you can prioritize your health and still be able to succeed:

    • All course notes will be provided online.
    • All homework assignments can be done and handed in online.
    • All exams will be held online.
    • Most of the class activity will be video recorded and will be made available to you.
    • Before each class, I'll also try to post some videos on YouTube to facilitate hybrid learning.
    • There will be at least a few office hours which will be held online.

    Reference Material

    Books

    Online courses

    Misc

    Policies

    Please see the general MDS policies.