
Introduction to Machine Learning, with a focus on Scikit-learn* algorithms and how to accelerate those algorithms with a couple of lines of code on CPU using Intel® Extension for Scikit-learn*

IntelSoftware/Introduction_to_Machine_Learning


Title

Introduction to Machine Learning

Requirements

Optimized for   Description
OS              Linux* Ubuntu 20.04; Windows* 10
Hardware        Skylake with GEN9 or newer
Software        Intel® AI Analytics Toolkit, Jupyter Notebooks, Intel DevCloud (plus: pip install seaborn)

Purpose

The Jupyter Notebooks in this training are intended to give professors and students an accessible but challenging introduction to machine learning. The training enumerates and describes many commonly used Scikit-learn* algorithms that are used daily to address machine learning challenges. It has a secondary benefit of demonstrating how to accelerate commonly used Scikit-learn algorithms on Intel CPUs using Intel® Extension for Scikit-learn*, which is part of the Intel AI Analytics Toolkit powered by oneAPI.

This workshop is designed to be used on the DevCloud and includes details on submitting batch jobs on the DevCloud environment.
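The acceleration described above amounts to a two-line patch before importing scikit-learn estimators. A minimal sketch of the pattern (the KMeans example and data are illustrative, not taken from the course material; the try/except lets the snippet fall back to stock scikit-learn if the extension is not installed):

```python
import numpy as np

# Patch scikit-learn so supported estimators dispatch to the
# Intel Extension for Scikit-learn accelerated implementations.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # stock scikit-learn still works, just without acceleration

# Import scikit-learn estimators AFTER patching so the
# accelerated versions are picked up.
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((1000, 2))  # illustrative data

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_.shape)  # (3, 2)
```

The rest of the user code is unchanged, which is the point: patching swaps the implementation underneath the familiar scikit-learn API.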

License

Code samples are licensed under the MIT license. See License.txt for details. Third-party program licenses can be found here: third-party-programs.txt

Content Details

Pre-requisites

  • Python* Programming
  • Calculus
  • Linear algebra
  • Statistics

Syllabus

  • 11 Modules (18 hours)
  • 11 Lab Exercises

1. Introduction to Machine Learning and Tools
  • Classify the type of problem to be solved.
  • Demonstrate supervised learning algorithms.
  • Choose an algorithm, tune parameters, and validate a model.
  • Explain key concepts like under- and over-fitting, regularization, and cross-validation.
  • Apply Intel® Extension for Scikit-learn* patching to leverage underlying compute capabilities of hardware.
Recommended Video: Introduction to Intel® Extension for Scikit-learn
Duration: 60 min

2. Supervised Learning and K-Nearest Neighbors
  • Explain supervised learning as applied to regression and classification problems.
  • Apply the K-Nearest Neighbors (KNN) algorithm for classification.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: K-Nearest Neighbor
Duration: 120 min

3. Train Test Splits, Validation, and Linear Regression
  • Explain the difference between over-fitting and under-fitting.
  • Describe bias-variance tradeoffs.
  • Find the optimal training and test data set splits.
  • Apply cross-validation.
  • Apply a linear regression model for supervised learning.
  • Apply Intel® Extension for Scikit-learn* to leverage underlying compute capabilities of hardware.
Recommended Video: Introduction to Intel® Extension for Scikit-learn
Duration: 120 min

4. Regularization and Gradient Descent
  • Explain cost functions, regularization, feature selection, and hyper-parameters.
  • Summarize complex statistical optimization algorithms like gradient descent and its application to linear regression.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

5. Logistic Regression and Classification Error Metrics
  • Describe logistic regression and how it differs from linear regression.
  • Identify metrics for classification errors and scenarios in which they can be used.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: Logistic Regression Walkthrough
Duration: 120 min

6. SVM and Kernels
  • Apply support vector machines (SVMs) for classification problems.
  • Recognize SVM similarity to logistic regression.
  • Compute the cost function of SVMs.
  • Apply regularization in SVMs and some tips to obtain non-linear classifications with SVMs.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

7. Decision Trees
  • Recognize decision trees and apply them for classification problems.
  • Recognize how to identify the best split and the factors for splitting.
  • Explain strengths and weaknesses of decision trees.
  • Explain how regression trees help with classifying continuous values.
  • Describe the motivation for choosing the Random Forest classifier over decision trees.
  • Apply patching to the Random Forest classifier.
Recommended Video: N/A
Duration: 120 min

8. Bagging
  • Describe bootstrapping and aggregating (aka "bagging") to reduce variance.
  • Reduce the correlation seen in bagging using the Random Forest algorithm.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

9. Boosting and Stacking
  • Explain how the boosting algorithm helps reduce variance and bias.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

10. Introduction to Unsupervised Learning and Clustering Methods
  • Describe unsupervised learning algorithms and their application.
  • Apply clustering.
  • Apply dimensionality reduction.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Videos: KMeans Walkthrough; Introduction to Intel® Extension for Scikit-learn
Duration: 120 min

11. Dimensionality Reduction and Advanced Topics
  • Explain and apply Principal Component Analysis (PCA).
  • Explain Multidimensional Scaling (MDS).
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: PCA Walkthrough
Duration: 120 min
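Several of the modules above (KNN, train/test splits, cross-validation) share the same patch-then-train workflow. A sketch of that workflow on a synthetic dataset (the dataset, parameters, and accuracy metric here are illustrative, not taken from the course notebooks):

```python
# Optionally accelerate supported estimators on Intel CPUs;
# fall back silently to stock scikit-learn if unavailable.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification problem (illustrative).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out a test set, as covered in the train/test splits module.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# K-Nearest Neighbors classification, as covered in the KNN module.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
test_acc = knn.score(X_test, y_test)

# 5-fold cross-validation, as covered in the validation module.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(f"test accuracy: {test_acc:.2f}, CV mean: {cv_scores.mean():.2f}")
```

Because patching happens before any estimator is constructed, the same script runs unmodified with or without the extension installed.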

Content Structure

Each module folder has a Jupyter Notebook file (*.ipynb) that can be opened in Jupyter Lab to view the training content, edit the code, and run it.

Install Directions

The training content can be accessed locally after installing the necessary tools, or directly on Intel DevCloud without any installation.

Local Installation of JupyterLab and oneAPI Tools

The Jupyter Notebooks can be downloaded to a local computer and accessed as follows:

  • Install Jupyter Lab on local computer: Installation Guide
  • Install Intel oneAPI Base Toolkit on local computer: Installation Guide
  • git clone the repo and access the Notebooks using Jupyter Lab
  • Navigate to the Introduction to Machine Learning folder
  • pip install -r requirements.txt

Access using Intel DevCloud

The Jupyter notebooks are tested on and can be run on Intel DevCloud without any installation. Below are the steps to access these Jupyter notebooks on Intel DevCloud:

  1. Register on Intel DevCloud
  2. Login, Get Started and Launch Jupyter Lab
  3. Open Terminal in Jupyter Lab
  4. [RECOMMENDED] Navigate to the pre-populated samples folder:
cd oneAPI-samples/AI-and-Analytics/Jupyter/Introduction_to_Machine_Learning/
  5. [OPTIONAL] git clone the temporary patch repo and access the Notebooks: https://github.com/IntelSoftware/Introduction_to_Machine_Learning.git
  6. pip install -r requirements.txt

