![Erudio logo](img/erudio-logo-small.png)
---
![Sklearn logo](img/scikit-learn-logo-small.png)

# Machine Learning with scikit-learn

## Resources

This training material is available under a CC BY-NC-ND 4.0 license.  You can find it at:

> https://github.com/ErudioOne/scikit-learn

Before attending this course, please configure the environments you will need.  Within the repository, find the file `requirements.txt` to install software using `pip`, or the file `environment.yml` to install software using `conda`.

Please contact us, [Erudio LLC](http://erudio.one), for hands-on, instructor-led training, onsite or remote.

In [None]:
import sys
sys.version

In [None]:
import sklearn
sklearn.__version__

In [None]:
try:
    # Optional: the Intel extension speeds up some scikit-learn estimators
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    print("Intel accelerator not installed (not required)", file=sys.stderr)

## What Is Machine Learning?


* Difference between "Deep Learning" and other ML techniques
* Overview of techniques used in Machine Learning
* Classification vs. Regression vs. Clustering
* Dimensionality Reduction
* Feature Engineering
* Feature Selection
* Categorical vs. Ordinal vs. Continuous variables
* One-hot encoding
* Hyperparameters
* Grid Search
* Metrics

<div><a href="SKLearn-01_WhatIsML.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Exploring a Data Set

* Looking for anomalies and data integrity problems
* Cleaning data
* Massaging data format to be model-ready
* Choosing features and a target
* Train/test split

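As a rough preview of the last step above, a train/test split can be sketched as follows. The iris data set, the 25% test size, and the `random_state` value are illustrative assumptions, not necessarily what the notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small bundled data set as features X and target y (assumed example data)
X, y = load_iris(return_X_y=True)

# Hold out about 25% of the rows for testing; stratify=y keeps the class
# proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```
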
<div><a href="SKLearn-02_Exploring.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Classification

* Choosing a model
* Feature importances
* Cut points in a decision tree
* Comparing multiple classifiers

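A minimal sketch of feature importances and decision tree cut points, assuming the iris data set and a shallow `DecisionTreeClassifier` (both chosen here only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# A shallow tree keeps the list of cut points short (max_depth is an assumption)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Importances are non-negative and sum to 1 across features
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")

# Internal nodes store a feature index and a threshold (the "cut point");
# leaf nodes carry a negative feature index and are skipped here
tree = clf.tree_
cut_points = [
    (iris.feature_names[f], float(t))
    for f, t in zip(tree.feature, tree.threshold)
    if f >= 0
]
print(cut_points)
```
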
<div><a href="SKLearn-03_Classification.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Regression

* Sample data sets in scikit-learn
* Linear regressors
* Probabilistic regressors
* Other regressors

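The sketch below compares a linear regressor with a regressor that also reports predictive uncertainty. Treating `BayesianRidge` as an example of a "probabilistic regressor" is an assumption made here; the notebook may cover different models.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import train_test_split

# One of the sample data sets bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ordinary least squares; score() reports R^2 on the held-out rows
linear = LinearRegression().fit(X_train, y_train)
print("LinearRegression R^2:", linear.score(X_test, y_test))

# BayesianRidge can return a standard deviation alongside each prediction
bayes = BayesianRidge().fit(X_train, y_train)
y_mean, y_std = bayes.predict(X_test, return_std=True)
print("BayesianRidge R^2:", bayes.score(X_test, y_test))
```
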
<div><a href="SKLearn-04_Regression.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Hyperparameters

* Understanding hyperparameters
* Manual search of parameter space
* GridSearchCV
* Attributes of grid search and wrapped model

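As a hedged illustration of `GridSearchCV` and the attributes it exposes, the sketch below searches a small grid for a k-nearest-neighbors classifier. The estimator, the grid values, and the data set are assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Every combination in the grid is evaluated with 5-fold cross-validation
param_grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

# Attributes of the search itself
print(search.best_params_)
print(search.best_score_)

# The wrapped model, refit on all the data with the best parameters
print(search.best_estimator_)
```
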
<div><a href="SKLearn-05_Hyperparameters.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Clustering

* Overview of (some) clustering algorithms
* KMeans clustering
* Agglomerative clustering
* Density-based clustering: DBSCAN and HDBSCAN
* n_clusters, labels, and predictions
* Visualizing results

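The differences among `n_clusters`, labels, and predictions listed above can be sketched on synthetic blobs. The data and every parameter value below are illustrative assumptions.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (assumed example data)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# KMeans needs n_clusters up front and can assign new points with predict()
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])
print(kmeans.predict(X[:5]))

# AgglomerativeClustering also takes n_clusters but offers only labels_
agglo = AgglomerativeClustering(n_clusters=3).fit(X)
print(agglo.labels_[:10])

# DBSCAN infers the number of clusters; the label -1 marks noise points
dbscan = DBSCAN(eps=0.5, min_samples=5).fit(X)
n_found = len(set(dbscan.labels_) - {-1})
print("DBSCAN clusters found:", n_found)
```
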
<div><a href="SKLearn-06_Clustering.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Decomposition

* Principal Component Analysis (PCA)
* Non-Negative Matrix Factorization (NMF)
* Latent Dirichlet Allocation (LDA)
* Independent component analysis (ICA)

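A minimal sketch of two of the decompositions above, assuming the iris measurements as input and two components in each case:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import NMF, PCA

X, _ = load_iris(return_X_y=True)

# PCA: project the four features onto two orthogonal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)

# NMF requires non-negative input; it factors X into W (samples by
# components) and H (components by features), both non-negative
nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_
print(W.shape, H.shape)
```
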
<div><a href="SKLearn-07_Decomposition.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Feature Expansion

* A Synthetic Example
* Polynomial Features
* One-Hot Encoding
* Binning Values

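The three expansions above can be sketched on tiny hand-made arrays. All of the values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, PolynomialFeatures

# Polynomial features: columns a, b become a, b, a^2, a*b, b^2
X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X_poly)

# One-hot encoding: one indicator column per category, sorted alphabetically
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
encoder = OneHotEncoder()
X_onehot = encoder.fit_transform(colors).toarray()
print(encoder.categories_[0], X_onehot, sep="\n")

# Binning: three equal-width bins over the observed range 0 to 30
ages = np.array([[0.0], [4.0], [12.0], [18.0], [25.0], [30.0]])
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = binner.fit_transform(ages)
print(X_binned.ravel())
```
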
<div><a href="SKLearn-08_FeatureExpansion.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Feature Selection

* Scaling with:
  * StandardScaler
  * RobustScaler
  * MinMaxScaler
  * Normalizer
* Univariate Selection
* Model-driven Selection

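As a hedged sketch of the topics above: scale the features, then select a subset either by a univariate test or by a fitted model's importances. The wine data set, `k=5`, and the random forest are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_wine(return_X_y=True)

# StandardScaler: zero mean and unit variance per column
X_std = StandardScaler().fit_transform(X)
# MinMaxScaler: each column rescaled to the range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)
print(np.round(X_std.mean(axis=0), 6))

# Univariate selection: keep the 5 features with the highest ANOVA F-score
X_best = SelectKBest(f_classif, k=5).fit_transform(X_std, y)

# Model-driven selection: keep features whose importance in a fitted
# random forest exceeds the default threshold (the mean importance)
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
X_model = selector.fit_transform(X, y)
print(X_best.shape, X_model.shape)
```
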
<div><a href="SKLearn-09_FeatureSelection.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Pipelines

* Feature Selection and Engineering
* Grid search
* Model

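One way the three pieces above might fit together is sketched below: scaling and feature selection feed a model, and a grid search tunes parameters of several steps at once. The step names, grid values, and choice of `LogisticRegression` are assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Each step is a (name, estimator) pair; all but the last must be transformers
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters of a step are addressed as <step name>__<parameter name>
param_grid = {"select__k": [3, 6, 13], "model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```

Because the scaler and selector are refit inside each cross-validation fold, no information from the held-out fold leaks into preprocessing.
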
<div><a href="SKLearn-10_Pipelines.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Robust Train/Test Splits

* cross_val_score
* ShuffleSplit
* KFold, RepeatedKFold, LeaveOneOut, LeavePOut, StratifiedKFold

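A short sketch of `cross_val_score` with a few of the splitters named above. The decision tree, the data set, and the split sizes are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    KFold,
    ShuffleSplit,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Each splitter yields 5 train/test partitions, built in different ways
splitters = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
}

# One accuracy score per partition; the spread hints at how stable the model is
scores = {name: cross_val_score(model, X, y, cv=cv) for name, cv in splitters.items()}
for name, values in scores.items():
    print(f"{name}: mean={values.mean():.3f} std={values.std():.3f}")
```
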
<div><a href="SKLearn-11_TrainTest.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Specialized and Custom Metrics

* Top N recommendations

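A "top N" metric counts a prediction as correct when the true label is among the N classes the model ranks highest. The sketch below is one possible way to write such a scorer; the helper names `top_n_accuracy` and `top_2_scorer` are hypothetical, and the notebook may define its metric differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def top_n_accuracy(estimator, X, y, n=2):
    """Fraction of rows whose true label is among the n most probable classes."""
    proba = estimator.predict_proba(X)
    # argsort is ascending, so the last n columns hold the most probable classes
    top_n = estimator.classes_[np.argsort(proba, axis=1)[:, -n:]]
    return float(np.mean(np.any(top_n == np.asarray(y)[:, None], axis=1)))


def top_2_scorer(estimator, X, y):
    """Scorer with the (estimator, X, y) signature that cross_val_score accepts."""
    return top_n_accuracy(estimator, X, y, n=2)


X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores_top1 = cross_val_score(model, X, y, cv=5)  # plain accuracy
scores_top2 = cross_val_score(model, X, y, cv=5, scoring=top_2_scorer)
print(scores_top1.mean(), scores_top2.mean())
```
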
<div><a href="SKLearn-12_CustomMetrics.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>


---

Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors