# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Machine Learning with Scikit Learn](#Machine-Learning-with-Scikit-Learn)
	* [API and Terminology](#API-and-Terminology)
		* [Scikit Learn modules](#Scikit-Learn-modules)
	* [Choosing an Estimator](#Chosing-an-Estimator)


# Learning Objectives:

After completion of this module, learners should be able to:

* Understand and explain estimators, models and scoring metrics
* Import scikit-learn modules

# Machine Learning with Scikit Learn

`scikit-learn` is an open-source machine learning toolkit built on NumPy and SciPy. Its methods can be used for both supervised and unsupervised learning. Among the many tasks and algorithms that `scikit-learn` supports are:

* classification
* regression
* clustering
* support vector machines
* random forests
* gradient boosting
* k-means
* DBSCAN

The [User Guide](http://scikit-learn.org/stable/user_guide.html) and [Documentation](http://scikit-learn.org/stable/documentation.html) are the best places to learn how to use the methods available in `scikit-learn`, and there are several [tutorials available online](http://scikit-learn.org/stable/tutorial/index.html).

This course provides an introduction to `sklearn` (the import name of `scikit-learn`), with a focus on how its methods work together to help you understand the performance of a given model.

## API and Terminology

While the following definitions may not be the most widely accepted in the fields of machine learning and statistics, they are useful for understanding the `sklearn` modules and API.

* **estimator**: An object that learns from data through its `fit` method, for supervised or unsupervised learning.
    * **classifier**: An estimator with a discrete response to input data. *Assign a label to each data point.* Like every estimator, a classifier implements a `fit` method; it also implements `predict`.
    * **regressor**: An estimator with a continuous response to input data. *Predict the output value of each data point.*
    * **clusterer**: An estimator that performs clustering of input data. *Discover groupings within the data set.*
    * **transformer**: An estimator that transforms input data according to a set of requirements. *Preprocess data to have zero mean and unit variance.*
* **model**: Nearly synonymous with **estimator**. A **model** may be a more concrete instance of an **estimator**, such as one that has already been fitted to data.
* **metric**: A score assigned to a **model** or **estimator** to indicate how well it performs, such as its accuracy. *Estimators for supervised learning implement a `score` member function.*
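
To make the terms above concrete, here is a minimal sketch that uses a transformer, a classifier, and a metric together. The choice of the iris data set, `StandardScaler`, and `LogisticRegression` is an assumption made for illustration; any other estimators with the same methods would work the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Small example data set (an illustrative choice, not part of the course text)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# transformer: learn the per-feature mean and variance on the training data,
# then rescale both splits to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# classifier: fit on labeled data, then assign a label to each new data point
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_scaled, y_train)
labels = clf.predict(X_test_scaled)

# metric: for classifiers, score() reports the mean accuracy,
# which is what accuracy_score computes from the predicted labels
accuracy = clf.score(X_test_scaled, y_test)
same_accuracy = accuracy_score(y_test, labels)
print(f"accuracy: {accuracy:.3f}")
```

Note that the scaler is fitted on the training split only and then reused on the test split, so no information from the test data leaks into preprocessing.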

### Scikit Learn modules

Each of the following modules must be imported individually. The modules listed here include **estimators** and higher-level methods to perform operations such as cross-validation, grid search, and pipelining.

In [None]:
import sklearn
# Show the names exported by the top-level package (mostly submodules)
sklearn.__all__

In [None]:
import sklearn.cluster
# Print the module docstring and the classes and functions it provides
help(sklearn.cluster)
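
In practice, it is common to import only the classes needed from each submodule. The sketch below imports one estimator and a few higher-level tools, then combines pipelining, grid search, and cross-validation. The data set, the step names `"scale"` and `"svc"`, and the candidate values for `C` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# pipelining: chain a transformer and a classifier into a single estimator
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC()),
])

# grid search with 5-fold cross-validation over the SVC regularization
# parameter; "svc__C" means "parameter C of the step named svc"
search = GridSearchCV(pipe, param_grid={"svc__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best mean cross-validated accuracy: {search.best_score_:.3f}")
```

Because the scaler sits inside the pipeline, it is refitted on the training folds of every cross-validation split rather than on the full data set.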

## Choosing an Estimator

See the [Scikit Learn Flowchart](http://scikit-learn.org/stable/tutorial/machine_learning_map/)

As shown in the flowchart, the algorithms in scikit-learn mainly fall into four groups:

* Classification - Predicting the label or class membership of an observation
* Dimensionality reduction - Reducing the number of features, e.g. principal component analysis or independent component analysis
* Regression - Predicting a continuous response variable rather than class membership
* Clustering - Unsupervised algorithms that group similar observations

In the scikit-learn notebooks we work with algorithms from each of these groups.
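
As a rough illustration of the four groups, the sketch below applies one estimator from each to a small built-in data set. The specific estimators (`RandomForestClassifier`, `PCA`, `Ridge`, `KMeans`), the data sets, and the parameter values are assumptions picked for brevity, not recommendations from the flowchart.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Classification: predict the class label of each observation
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf_scores = cross_val_score(clf, X, y, cv=5)  # accuracy per fold

# Dimensionality reduction: project the 4 iris features onto 2 components
X_2d = PCA(n_components=2).fit_transform(X)

# Regression: predict a continuous response variable
X_reg, y_reg = load_diabetes(return_X_y=True)
reg_scores = cross_val_score(Ridge(alpha=1.0), X_reg, y_reg, cv=5)  # R^2 per fold

# Clustering: group similar observations without using the labels
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(f"classification accuracy: {clf_scores.mean():.3f}")
print(f"regression R^2: {reg_scores.mean():.3f}")
print(f"reduced shape: {X_2d.shape}, clusters found: {len(set(cluster_ids))}")
```

The supervised estimators (classification and regression) need the target `y`, while dimensionality reduction and clustering work from `X` alone.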