Introduction to Machine Learning
Optimized for | Description |
---|---|
OS | Linux* Ubuntu 20.04, Windows* 10 |
Hardware | Skylake with GEN9 or newer |
Software | Intel® AI Analytics Toolkit, Jupyter Notebooks, Intel DevCloud, seaborn (pip install seaborn) |
The Jupyter Notebooks in this training are intended to give professors and students an accessible but challenging introduction to machine learning. They enumerate and describe many commonly used Scikit-learn* algorithms that are applied daily to machine learning problems. As a secondary benefit, they demonstrate how to accelerate these algorithms on Intel CPUs using Intel® Extension for Scikit-learn*, which is part of the Intel® AI Analytics Toolkit powered by oneAPI.
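As a rough illustration of the patching idea mentioned above, the sketch below enables Intel Extension for Scikit-learn when its `sklearnex` package is importable and otherwise falls back to stock scikit-learn. The helper names (`try_patch_sklearn`, `cluster_toy_points`) and the four toy points are assumptions made for this example and are not taken from the notebooks.

```python
import importlib.util


def try_patch_sklearn():
    """Apply Intel Extension for Scikit-learn patching if sklearnex is installed."""
    if importlib.util.find_spec("sklearnex") is None:
        return False  # not installed: stock scikit-learn will be used
    from sklearnex import patch_sklearn
    patch_sklearn()  # call before importing any sklearn estimator
    return True


def cluster_toy_points():
    """Fit KMeans on four toy points; returns None if scikit-learn is absent."""
    if importlib.util.find_spec("sklearn") is None:
        return None
    import numpy as np
    from sklearn.cluster import KMeans  # imported after patching on purpose

    # Hypothetical data: two tight groups that are far apart.
    points = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    return model.fit_predict(points).tolist()


patched = try_patch_sklearn()
labels = cluster_toy_points()
print("patched:", patched, "labels:", labels)
```

The estimator code is identical with or without patching. Only the import order matters, since estimators imported before `patch_sklearn()` runs are not replaced.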
This workshop is designed to be used on the DevCloud and includes details on submitting batch jobs on the DevCloud environment.
Code samples are licensed under the MIT license. See License.txt for details. Third-party program licenses are listed in third-party-programs.txt.
- Python* Programming
- Calculus
- Linear algebra
- Statistics
- 11 Modules (18 hours)
- 11 Lab Exercises
Modules | Description | Recommended Video | Duration |
---|---|---|---|
Introduction to Machine Learning and Tools | + Classify the type of problem to be solved. + Demonstrate supervised learning algorithms. + Choose an algorithm, tune parameters, and validate a model + Explain key concepts like under- and over-fitting, regularization, and cross-validation + Apply Intel® Extension for Scikit-learn* patching to leverage underlying compute capabilities of hardware. | Introduction to Intel® Extension for Scikit-learn | 60 min |
Supervised Learning and K Nearest Neighbors | + Explain supervised learning as applied to regression and classification problems. + Apply K-Nearest Neighbor (KNN) algorithm for classification. + Apply patching to leverage underlying compute capabilities of hardware | KNearest Neighbor | 120 min |
Train Test Splits Validation Linear Regression | + Explain the difference between over-fitting and under-fitting + Describe bias-variance tradeoffs + Find the optimal training and test data set splits. + Apply cross-validation + Apply a linear regression model for supervised learning. + Apply Intel® Extension for Scikit-learn* to leverage underlying compute capabilities of hardware | Introduction to Intel® Extension for Scikit-learn | 120 min |
Regularization and Gradient Descent | + Explain cost functions, regularization, feature selection, and hyper-parameters + Summarize complex statistical optimization algorithms like gradient descent and its application to linear regression + Apply patching to leverage underlying compute capabilities of hardware | N/A | 120 min |
Logistic Regression and Classification Error Metrics | + Describe logistic regression and how it differs from linear regression + Identify metrics for classification errors and scenarios in which they can be used + Apply patching to leverage underlying compute capabilities of hardware | Logistic Regression Walkthrough | 120 min |
SVM and Kernels | + Apply support vector machines (SVMs) for classification problems + Recognize SVM similarity to logistic regression + Compute the cost function of SVMs + Apply regularization in SVMs and some tips to obtain non-linear classifications with SVMs + Apply patching to leverage underlying compute capabilities of hardware | N/A | 120 min |
Decision Trees | + Recognize decision trees and apply them for classification problems + Recognize how to identify the best split and the factors for splitting + Explain strengths and weaknesses of decision trees + Explain how regression trees help with predicting continuous values + Describe motivation for choosing Random Forest Classifier over Decision Trees + Apply patching to Random Forest Classifier | N/A | 120 min |
Bagging | + Describe bootstrapping and aggregating (aka "bagging") to reduce variance + Reduce the correlation seen in bagging using Random Forest algorithm + Apply patching to leverage underlying compute capabilities of hardware | N/A | 120 min |
Boosting and Stacking | + Explain how the boosting algorithm helps reduce variance and bias. + Apply patching to leverage underlying compute capabilities of hardware | N/A | 120 min |
Introduction to Unsupervised Learning and Clustering Methods | + Describe unsupervised learning algorithms and their application + Apply clustering + Apply dimensionality reduction + Apply patching to leverage underlying compute capabilities of hardware | KMeans Walkthrough, Introduction to Intel® Extension for Scikit-learn | 120 min |
Dimensionality Reduction and Advanced Topics | + Explain and apply Principal Component Analysis (PCA) + Explain Multidimensional Scaling (MDS) + Apply patching to leverage underlying compute capabilities of hardware | PCA Walkthrough | 120 min |
Each module folder contains a Jupyter Notebook file (*.ipynb), which can be opened in Jupyter Lab to view the training content, edit the code, and run it.
The training content can be run locally after installing the necessary tools, or accessed directly on Intel DevCloud without any installation.
To download the Jupyter Notebooks and run them on a local computer:
- Install Jupyter Lab on the local computer: Installation Guide
- Install Intel oneAPI Base Toolkit on the local computer: Installation Guide
- git clone the repo and access the Notebooks using Jupyter Lab
- Navigate to the Introduction to Machine Learning folder
- pip install -r requirements.txt
The Jupyter Notebooks are tested on Intel DevCloud and can be run there without any installation. To access them on Intel DevCloud:
- Register on Intel DevCloud
- Log in, select Get Started, and launch Jupyter Lab
- Open Terminal in Jupyter Lab
- [RECOMMENDED] Navigate to pre-populated samples folder:
cd oneAPI-samples/AI-and-Analytics/Jupyter/Introduction_to_Machine_Learning/
- [OPTIONAL] git clone the temporary patch repo and access the Notebooks: https://github.com/IntelSoftware/Introduction_to_Machine_Learning.git
- pip install -r requirements.txt
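
After running pip install -r requirements.txt, a quick check like the following sketch can confirm that the main packages are importable before opening a notebook. The package list here is an assumption based on the tools named in this README (the authoritative list is requirements.txt), and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Display name -> import name. Assumed for this sketch; requirements.txt
# remains the authoritative list of dependencies.
EXPECTED_MODULES = {
    "scikit-learn": "sklearn",
    "Intel Extension for Scikit-learn": "sklearnex",
    "seaborn": "seaborn",
}


def find_missing(modules):
    """Return the display names whose import name cannot be located."""
    return [name for name, import_name in modules.items()
            if importlib.util.find_spec(import_name) is None]


missing = find_missing(EXPECTED_MODULES)
if missing:
    print("Not importable yet:", ", ".join(missing))
else:
    print("All expected packages are importable.")
```

The check only locates each package without importing it, so it runs quickly and has no side effects on the environment.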