# Intermediate Machine Learning with scikit-learn

## Resources

This training material is available under a CC BY-NC-SA 4.0 license.  You can find it at:

> <font size="+2">https://github.com/DavidMertz/ML-Live-Intermediate</font>

Before attending this course, please configure the environment you will need.  Within the repository, use the file `requirements.txt` to install software with `pip`, or the file `environment.yml` to install software with `conda`.

Please contact me and my training company, [KDM Training](http://kdm.training), for hands-on, instructor-led training, onsite or remote.  Our email is info@kdm.training.

## What Is Machine Learning?


The session *Beginner Machine Learning with `scikit-learn`* addresses the topics outlined here in more detail; this course gives only a quick overview of them.  That session also covers the main topics in *supervised* machine learning: classification, regression, and hyperparameters.

* Overview of techniques used in Machine Learning
* Classification vs. Regression vs. Clustering
* Dimensionality Reduction
* Feature Engineering
* Feature Selection
* Categorical vs. Ordinal vs. Continuous variables
* One-hot encoding
* Hyperparameters
* Grid Search
* Metrics
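
As a small illustration of the categorical vs. ordinal distinction and of one-hot encoding listed above, here is a minimal sketch. The toy `colors` and `sizes` arrays are invented for this example and are not taken from the course notebooks.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: one categorical column and one ordinal column.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])

# Categorical values have no inherent order, so each category
# becomes its own 0/1 indicator column.
onehot = OneHotEncoder()
color_matrix = onehot.fit_transform(colors).toarray()

# Ordinal values do have an order; we state it explicitly so that
# small < medium < large maps to 0 < 1 < 2.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(sizes)

print(onehot.categories_[0])  # learned categories, in sorted order
print(color_matrix)           # one row per sample, one column per color
print(size_codes.ravel())     # integer-like codes that respect the order
```

Encoding an unordered category with integer codes would suggest an ordering that does not exist, which is why the two kinds of variable are usually handled by different encoders.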

<div><a href="WhatIsML.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Clustering

* Overview of (some) clustering algorithms
* K-means clustering
* Agglomerative clustering
* Density-based clustering: DBSCAN and HDBSCAN
* n_clusters, labels, and predictions
* Visualizing results
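
The following sketch shows how the clustering topics above might look in code. It assumes synthetic data from `make_blobs`; the blob centers, `eps`, and `min_samples` values are arbitrary choices for illustration, not settings from the notebook. HDBSCAN is left out because it is only bundled with newer scikit-learn releases.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs (an assumption for this sketch).
X, _ = make_blobs(
    n_samples=150,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# K-means needs n_clusters up front and can label unseen points with predict().
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
kmeans_labels = kmeans.labels_
new_labels = kmeans.predict([[-5, -5], [5, -5]])

# Agglomerative clustering also takes n_clusters, but has no predict():
# it only labels the data it was fitted on.
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN infers the number of clusters from density; label -1 marks noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_found = len(set(dbscan_labels) - {-1})

print("K-means clusters:", len(set(kmeans_labels)))
print("Agglomerative clusters:", len(set(agglo_labels)))
print("DBSCAN clusters found:", n_found)
```

The cluster numbers themselves are arbitrary: two algorithms can agree on the grouping while assigning different integers to the same group.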

<div><a href="Clustering.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Feature Engineering and Feature Selection

* Principal Component Analysis (PCA)
* Non-Negative Matrix Factorization (NMF)
* Latent Dirichlet Allocation (LDA)
* Independent Component Analysis (ICA)
* SelectKBest
* Dimensionality expansion
* Polynomial Features
* One-Hot Encoding
* Scaling with StandardScaler, RobustScaler, MinMaxScaler, Normalizer, and others
* Binning values with quantiles or binarizing them
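
A few of the transformers listed above can be compared by the shape of what they return. This is a minimal sketch using the iris dataset bundled with scikit-learn; the dataset and parameter values are chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Scaling: each feature is shifted to mean 0 and scaled to unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: project onto 2 synthetic components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Feature selection: keep the 2 original features that score best
# on a univariate ANOVA F-test against the target.
X_best = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Dimensionality expansion: add a bias column, squares, and pairwise products.
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

for name, arr in [("scaled", X_scaled), ("PCA", X_pca),
                  ("SelectKBest", X_best), ("polynomial", X_poly)]:
    print(f"{name:12s} {arr.shape}")
```

Note the difference between the two reductions: PCA builds new features as combinations of the originals, while SelectKBest keeps a subset of the original columns unchanged.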

<div><a href="Features.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Pipelines

* Feature Selection and Engineering
* Grid search
* Model
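
The three pieces above can be combined as in the sketch below: a pipeline chains feature steps and a model, and a grid search tunes parameters of any step. The step names (`scale`, `select`, `model`), the choice of classifier, and the grid values are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("model", KNeighborsClassifier()),
])

# Parameters of a step are addressed as "<step name>__<parameter>".
param_grid = {
    "select__k": [2, 3, 4],
    "model__n_neighbors": [3, 5, 7],
}

# The whole pipeline is refitted inside every cross-validation fold, so
# scaling and selection never see the held-out part of that fold.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting preprocessing inside the pipeline, rather than applying it to the full dataset beforehand, is what keeps information from the validation folds out of the fitted transformers.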

<div><a href="Pipelines.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>

## Robust Train/Test Splits

* cross_val_score
* ShuffleSplit
* KFold, RepeatedKFold, LeaveOneOut, LeavePOut, StratifiedKFold
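
As a sketch of how these splitters plug into `cross_val_score`, the example below scores one model under three splitting strategies. The model, dataset, and split counts are illustrative assumptions; `RepeatedKFold`, `LeaveOneOut`, and `LeavePOut` can be passed as `cv` in the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold,
    ShuffleSplit,
    StratifiedKFold,
    cross_val_score,
)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

splitters = {
    # 5 disjoint folds; every sample is tested exactly once.
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    # Like KFold, but each fold keeps the class proportions of y.
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    # 5 independent random 80/20 splits; test sets may overlap.
    "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
}

# cross_val_score returns one score per split.
scores = {
    name: cross_val_score(model, X, y, cv=cv)
    for name, cv in splitters.items()
}

for name, s in scores.items():
    print(f"{name:16s} mean={s.mean():.3f} std={s.std():.3f}")
```

Reporting the spread across splits alongside the mean gives a rough sense of how much a single train/test split could mislead.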

<div><a href="TrainTest.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>