
Introduction to Machine Learning, with a focus on Scikit-learn* algorithms and how to accelerate those algorithms with a couple of lines of code on CPU using Intel® Extension for Scikit-learn*

IntelSoftware/Introduction_to_Machine_Learning


Title

Introduction to Machine Learning

Requirements

Optimized for   Description
OS              Linux* Ubuntu 20.04; Windows* 10
Hardware        Skylake with GEN9 or newer
Software        Intel® AI Analytics Toolkit, Jupyter Notebooks, Intel DevCloud (plus: pip install seaborn)

Purpose

The Jupyter Notebooks in this training are intended to give professors and students an accessible but challenging introduction to machine learning. The training enumerates and describes many commonly used Scikit-learn* algorithms that are used daily to address machine learning challenges. It has a secondary benefit of demonstrating how to accelerate commonly used Scikit-learn algorithms on Intel CPUs using Intel® Extension for Scikit-learn*, which is part of the Intel AI Analytics Toolkit powered by oneAPI.

This workshop is designed to be used on the DevCloud and includes details on submitting batch jobs on the DevCloud environment.
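The acceleration described above amounts to a two-line patch before importing scikit-learn estimators. A minimal sketch of the pattern (the KMeans example and data are illustrative, not taken from the course material; the try/except lets the snippet fall back to stock scikit-learn if the extension is not installed):

```python
import numpy as np

# Patch scikit-learn so supported estimators dispatch to the
# Intel Extension for Scikit-learn accelerated implementations.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # stock scikit-learn still works, just without acceleration

# Import scikit-learn estimators AFTER patching so the
# accelerated versions are picked up.
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((1000, 2))  # illustrative data

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_.shape)  # (3, 2)
```

The rest of the user code is unchanged, which is the point: patching swaps the implementation underneath the familiar scikit-learn API.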

License

Code samples are licensed under the MIT license. See License.txt for details. Third-party program licenses can be found here: third-party-programs.txt

Content Details

Pre-requisites

  • Python* Programming
  • Calculus
  • Linear algebra
  • Statistics

Syllabus

  • 11 Modules (18 hours)
  • 11 Lab Exercises

1. Introduction to Machine Learning and Tools
  • Classify the type of problem to be solved.
  • Demonstrate supervised learning algorithms.
  • Choose an algorithm, tune parameters, and validate a model.
  • Explain key concepts like under- and over-fitting, regularization, and cross-validation.
  • Apply Intel® Extension for Scikit-learn* patching to leverage underlying compute capabilities of hardware.
Recommended Video: Introduction to Intel® Extension for Scikit-learn
Duration: 60 min

2. Supervised Learning and K-Nearest Neighbors
  • Explain supervised learning as applied to regression and classification problems.
  • Apply the K-Nearest Neighbors (KNN) algorithm for classification.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: K-Nearest Neighbor
Duration: 120 min

3. Train Test Splits, Validation, and Linear Regression
  • Explain the difference between over-fitting and under-fitting.
  • Describe bias-variance tradeoffs.
  • Find the optimal training and test data set splits.
  • Apply cross-validation.
  • Apply a linear regression model for supervised learning.
  • Apply Intel® Extension for Scikit-learn* to leverage underlying compute capabilities of hardware.
Recommended Video: Introduction to Intel® Extension for Scikit-learn
Duration: 120 min

4. Regularization and Gradient Descent
  • Explain cost functions, regularization, feature selection, and hyper-parameters.
  • Summarize complex statistical optimization algorithms like gradient descent and its application to linear regression.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

5. Logistic Regression and Classification Error Metrics
  • Describe logistic regression and how it differs from linear regression.
  • Identify metrics for classification errors and scenarios in which they can be used.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: Logistic Regression Walkthrough
Duration: 120 min

6. SVM and Kernels
  • Apply support vector machines (SVMs) for classification problems.
  • Recognize SVM similarity to logistic regression.
  • Compute the cost function of SVMs.
  • Apply regularization in SVMs and some tips to obtain non-linear classifications with SVMs.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

7. Decision Trees
  • Recognize decision trees and apply them for classification problems.
  • Recognize how to identify the best split and the factors for splitting.
  • Explain strengths and weaknesses of decision trees.
  • Explain how regression trees help with classifying continuous values.
  • Describe the motivation for choosing the Random Forest classifier over decision trees.
  • Apply patching to the Random Forest classifier.
Recommended Video: N/A
Duration: 120 min

8. Bagging
  • Describe bootstrapping and aggregating (aka "bagging") to reduce variance.
  • Reduce the correlation seen in bagging using the Random Forest algorithm.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

9. Boosting and Stacking
  • Explain how the boosting algorithm helps reduce variance and bias.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: N/A
Duration: 120 min

10. Introduction to Unsupervised Learning and Clustering Methods
  • Describe unsupervised learning algorithms and their application.
  • Apply clustering.
  • Apply dimensionality reduction.
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Videos: KMeans Walkthrough; Introduction to Intel® Extension for Scikit-learn
Duration: 120 min

11. Dimensionality Reduction and Advanced Topics
  • Explain and apply Principal Component Analysis (PCA).
  • Explain Multidimensional Scaling (MDS).
  • Apply patching to leverage underlying compute capabilities of hardware.
Recommended Video: PCA Walkthrough
Duration: 120 min
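Several of the modules above (KNN, train/test splits, cross-validation) share the same patch-then-train workflow. A sketch of that workflow on a synthetic dataset (the dataset, parameters, and accuracy metric here are illustrative, not taken from the course notebooks):

```python
# Optionally accelerate supported estimators on Intel CPUs;
# fall back silently to stock scikit-learn if unavailable.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification problem (illustrative).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out a test set, as covered in the train/test splits module.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# K-Nearest Neighbors classification, as covered in the KNN module.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
test_acc = knn.score(X_test, y_test)

# 5-fold cross-validation, as covered in the validation module.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(f"test accuracy: {test_acc:.2f}, CV mean: {cv_scores.mean():.2f}")
```

Because patching happens before any estimator is constructed, the same script runs unmodified with or without the extension installed.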

Content Structure

Each module folder has a Jupyter Notebook file (*.ipynb) that can be opened in Jupyter Lab to view the training content, edit the code, and run it.

Install Directions

The training content can be accessed locally after installing the necessary tools, or directly on Intel DevCloud without any installation.

Local Installation of JupyterLab and oneAPI Tools

The Jupyter Notebooks can be downloaded to a local computer and accessed as follows:

  • Install Jupyter Lab on local computer: Installation Guide
  • Install Intel oneAPI Base Toolkit on local computer: Installation Guide
  • git clone the repo and access the Notebooks using Jupyter Lab
  • Navigate to the Introduction to Machine Learning folder
  • pip install -r requirements.txt

Access using Intel DevCloud

The Jupyter notebooks are tested on and can be run on Intel DevCloud without any installation. Below are the steps to access these Jupyter notebooks on Intel DevCloud:

  1. Register on Intel DevCloud
  2. Login, Get Started and Launch Jupyter Lab
  3. Open Terminal in Jupyter Lab
  4. [RECOMMENDED] Navigate to the pre-populated samples folder:
cd oneAPI-samples/AI-and-Analytics/Jupyter/Introduction_to_Machine_Learning/
  5. [OPTIONAL] git clone the temporary patch repo and access the Notebooks: https://github.com/IntelSoftware/Introduction_to_Machine_Learning.git
  6. pip install -r requirements.txt

