This repository contains the lab instructions, demonstration scripts, Python notebooks, and data for the course Machine Learning Fundamentals Part I - Classification Models.
The Python notebooks are designed to be run using the Google Colab environment.
This five-day instructor-led course is intended for IT professionals who wish to learn the fundamental principles of machine learning classification models. Students are typically developers or data analysts who have some programming skills and a mathematical background. In this course, students will learn about classification models and the algorithms that underpin them. Students will also learn why data quality matters, how selecting the right data helps to build more accurate models, and how to test and validate models.
This course is generic and doesn’t depend on any particular platform (Azure, AWS, Google Cloud, etc.), but it does require some basic familiarity with Python. Students should also be familiar with the use of matrices and vectors in algebra, and have a basic understanding of differential calculus, probability, and statistics.
Note: This course focuses on classification models. Two further courses cover cluster models and regression models. This course does not cover deep learning or neural networks, although the elements of this course can provide a foundation for further study in those areas.
This course is intended for developers and analysts who are new to machine learning, want to understand how machine learning models work, and need to understand how to build high-quality models. Attendees must have some familiarity with Python, an understanding of matrix and vector arithmetic, and some basic familiarity with probability, statistics, and differential calculus.
After completing this course, students will be able to:
- Explain the uses of different types of machine learning models (classification, clustering, and regression).
- Understand common machine learning classification algorithms.
- Create and test a classification machine learning model.
- Create non-binary classification models.
- Summarize the statistical concepts utilized by many machine learning models.
- Select features for a classification model.
- Measure the performance of a classification model.
- Build classification models that can handle imbalanced datasets.
Before attending this course, students must have:
- Familiarity with the Python programming language.
- An understanding of basic probability and statistics.
- Ideally, some familiarity with the basics of differential calculus.
To help you prepare for this class, review the following resources:
- Python for Beginners, at https://www.python.org/about/gettingstarted
- NumPy Quickstart, at https://numpy.org/doc/stable/user/quickstart.html
- Introduction to SciKit-Learn in Python, at https://www.section.io/engineering-education/introduction-to-scikit-learn-in-python/
- Statistics, at https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(Shafer_and_Zhang)
- Calculus, at https://stats.libretexts.org/Courses/Lake_Tahoe_Community_College/Interactive_Calculus_Q1
- Introduction to Machine Learning Models
This module introduces machine learning and the classification, clustering, and regression types of machine learning model. Students will learn the purpose of these models and the types of problems to which they can be applied.
Click here for the demonstration notebooks and data files.
- Understanding Classification Algorithms
This module describes different algorithms that are commonly used to create a classification machine learning model. It provides a tour through the algorithms, summarizing their strengths and weaknesses and explaining when each is most appropriate.
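As a rough illustration of how such a tour can be made concrete, the sketch below compares a few classifiers by cross-validated accuracy. The four algorithms, the Iris dataset, and the use of scikit-learn are assumptions chosen for this example; the module itself may cover a different selection.

```python
# Illustrative comparison only: the algorithms and dataset are assumptions
# made for this sketch, not the course's own selection.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
}

# Mean accuracy over 5 cross-validation folds for each algorithm.
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Print the candidates from highest to lowest mean accuracy.
for name, score in sorted(mean_scores.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {score:.3f}")
```

Accuracy on one small dataset says little about which algorithm suits a given problem; training cost, interpretability, and sensitivity to feature scaling also matter.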
- Creating a Classification Model
This module provides an overview of the essential steps in building a machine learning model: data preparation, model construction and tuning, and testing and validation.
Click here for the demonstration notebooks and data files.
Click here for the lab instructions, notebooks, and data files.
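The steps named in this module (data preparation, model construction and tuning, testing and validation) can be sketched with scikit-learn as follows. This is a minimal illustration rather than the course's lab code: the breast cancer dataset, the logistic regression model, and the candidate values for `C` are all assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data preparation: load the data and hold back a test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. Model construction: scale the features, then fit a classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 3. Tuning: choose the regularization strength C by cross-validation,
#    using the training data only.
search = GridSearchCV(pipeline, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 4. Testing and validation: score once on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("best C:", search.best_params_["logisticregression__C"])
print(f"test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means it is refitted on each cross-validation fold, so no information from the validation folds or the test set leaks into training.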
- Understanding Binary and Non-Binary Classification
This module describes the differences between binary and multi-class (non-binary) classification and shows how to create a multi-class classification model.
Click here for the demonstration notebooks and data files.
Click here for the lab instructions, notebooks, and data files.
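One common way to build a multi-class model from a binary algorithm is the one-vs-rest strategy, sketched below with scikit-learn. The Iris dataset and the logistic regression base model are assumptions made for illustration; the module may use a different approach.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # three classes: 0, 1, 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One-vs-rest trains one binary classifier per class and predicts the
# class whose classifier is most confident.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

predictions = ovr.predict(X_test)
accuracy = ovr.score(X_test, y_test)
print("binary classifiers trained:", len(ovr.estimators_))
print(f"test accuracy: {accuracy:.3f}")
```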
- Reviewing Statistics Concepts
This module summarizes key statistical terminology and some common techniques used to analyze the distribution and scale of values in a dataset and the relationships between them. This information is essential to understanding the validity of a machine learning model.
Click here for the demonstration notebooks and data files.
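As a small worked example of distribution, scale, and relationship, the sketch below uses only the Python standard library. The data values are made up for illustration and are not taken from the course materials.

```python
import statistics

# Made-up illustrative data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

# Distribution: center and spread of each variable.
mean_hours, std_hours = statistics.mean(hours), statistics.stdev(hours)
mean_scores, std_scores = statistics.mean(scores), statistics.stdev(scores)

# Scale: z-scores put both variables on a common, unitless scale.
z_hours = [(h - mean_hours) / std_hours for h in hours]
z_scores = [(s - mean_scores) / std_scores for s in scores]

# Relationship: Pearson correlation is the sum of z-score products
# divided by n - 1 (matching the sample standard deviation).
pearson_r = sum(a * b for a, b in zip(z_hours, z_scores)) / (len(hours) - 1)

print(f"hours:  mean={mean_hours}, stdev={std_hours:.2f}")
print(f"scores: mean={mean_scores}, stdev={std_scores:.2f}")
print(f"Pearson r = {pearson_r:.3f}")
```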
- Exploring Data and Selecting Features and Algorithms
This module explains how to refine a machine learning model by selecting the most relevant features from the dataset, examining the distribution of values, investigating correlations between features, normalizing data, and removing bias. These steps improve the set of features used to create the model.
Click here for the demonstration notebooks and data files.
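The sketch below illustrates three of these ideas with NumPy: ranking features by their correlation with the label, spotting redundant features through feature-to-feature correlation, and min-max normalization. The data is synthetic and the feature names are hypothetical; correlation is only one of several possible relevance measures.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, illustrative features (not course data).
informative = rng.normal(size=n)                                # drives the label
redundant = 2.0 * informative + rng.normal(scale=0.01, size=n)  # near-copy
noise = rng.normal(size=n)                                      # unrelated
X = np.column_stack([informative, redundant, noise])
y = (informative > 0).astype(int)

# Relevance: absolute correlation of each feature with the label.
relevance = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)

# Redundancy: correlation between every pair of features.
feature_corr = np.corrcoef(X, rowvar=False)

# Normalization: rescale each feature to the range [0, 1].
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print("relevance to label:", relevance.round(2))
print("corr(informative, redundant):", round(feature_corr[0, 1], 3))
```

Here the first two features are both relevant but almost perfectly correlated with each other, so keeping only one of them loses very little information, while the third feature can be dropped as uninformative.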
- Measuring the Performance of a Classification Model
This module describes how to assess the accuracy and performance of a classification model, and how to balance precision and recall where appropriate.
Click here for the lab instructions, notebooks, and data files.
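To make the precision and recall trade-off concrete, the sketch below computes both metrics at two decision thresholds in plain Python. The labels, predicted scores, and the helper name `precision_recall` are made up for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and predicted probabilities for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.55]

for threshold in (0.3, 0.75):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.67, recall=1.00
# threshold=0.75: precision=1.00, recall=0.50
```

With the low threshold every positive is found but a third of the positive predictions are wrong; with the high threshold every positive prediction is correct but half of the true positives are missed.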
- Understanding Imbalanced Classification
This module discusses the problems that can arise when using an imbalanced dataset to create a classification model, how to recognize potential problems, and how to address them.
Click here for the lab instructions, notebooks, and data files.
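The sketch below shows, in plain Python, how an imbalanced dataset can make a useless model look accurate, and one possible remedy: weighting classes inversely to their frequency. The 90/10 label split is made up for illustration, and class weighting is only one of several techniques (resampling is another) that the module may cover.

```python
from collections import Counter

# Made-up labels: 90 negatives (0) and 10 positives (1).
labels = [0] * 90 + [1] * 10
counts = Counter(labels)

# Recognizing the problem: always predicting the majority class looks
# accurate but never finds a positive sample.
majority_class = counts.most_common(1)[0][0]
predictions = [majority_class] * len(labels)
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
positives_found = sum(p == 1 and t == 1 for p, t in zip(predictions, labels))
recall = positives_found / counts[1]

# One way to address it: weight each class inversely to its frequency,
# weight = n_samples / (n_classes * class_count).
class_weights = {
    cls: len(labels) / (len(counts) * count) for cls, count in counts.items()
}

print(f"baseline accuracy={accuracy:.2f}, recall={recall:.2f}")
print("class weights:", class_weights)
```

The weighting formula matches the "balanced" heuristic described in the scikit-learn documentation for the `class_weight` parameter, so a dictionary like this could be passed to estimators that accept that parameter.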