# Spot-Checking Classification Algorithms

Spot-checking is a "lazy" way of discovering which ML algorithms perform well on your problem.

## Algorithm Spot-Checking

The idea is to spot-check, so the question is NOT:

    "What algorithm should I ultimately use on my dataset?"

Instead it is:

    "What algorithms should I immediately spot-check on my dataset?"

## Algorithms Overview

We are going to take a look at 6 classification algorithms that you can spot-check on your dataset. 

We start with two linear ML algorithms:

* Logistic Regression
* Linear Discriminant Analysis

Then we look at four nonlinear ML algorithms:

* k-Nearest Neighbors
* Naive Bayes
* Classification and Regression Trees
* Support Vector Machines

Each recipe is demonstrated on the Pima Indians diabetes dataset. A test harness using 10-fold cross-validation shows how to spot-check each ML algorithm, and the mean accuracy across the 10 folds indicates algorithm performance.
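To make the harness concrete, here is a minimal sketch of what `cross_val_score` does with a `KFold` splitter: fit a fresh copy of the model on nine folds, measure accuracy on the held-out fold, repeat ten times, and average. It assumes synthetic data from `make_classification` in place of the diabetes CSV so it runs on its own; `fold_scores`, `manual_mean` and `library_mean` are illustrative names.

```python
# Sketch of the 10-fold cross-validation harness, written out by hand.
# Assumption: make_classification data stands in for the diabetes CSV
# (768 rows, 8 features) so the snippet is self-contained.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, Y = make_classification(n_samples=768, n_features=8, random_state=7)

kfold = KFold(n_splits=10)  # unshuffled folds, taken in row order
model = LogisticRegression(solver='liblinear', random_state=7)

fold_scores = []
for train_idx, test_idx in kfold.split(X):
    fold_model = clone(model)                   # fresh, unfitted copy per fold
    fold_model.fit(X[train_idx], Y[train_idx])  # train on the other 9 folds
    # .score() on a classifier returns accuracy on the held-out fold
    fold_scores.append(fold_model.score(X[test_idx], Y[test_idx]))

manual_mean = np.mean(fold_scores)
library_mean = cross_val_score(model, X, Y, cv=kfold).mean()
print(manual_mean, library_mean)  # the two means should agree
```

Because `cross_val_score` uses the estimator's own `score` method when no `scoring` argument is given, the hand-written loop and the library call report the same mean accuracy.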

_DISCLAIMER: The recipes assume that you know about each ML algorithm and, more or less, how to use it. We will not go into the API or parameterization of each algorithm. If you do not know them yet, you can still proceed: take them on trust for now and study them later!_

# Linear ML Algorithms

## Logistic Regression


You can construct a logistic regression model using the LogisticRegression class, which is explained [here](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
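As a quick illustration of what the fitted model computes, the sketch below checks that logistic regression turns a linear score `w·x + b` into a class-1 probability with the sigmoid function. It assumes synthetic data from `make_classification` in place of the diabetes CSV; `linear_score` and `manual_proba` are illustrative names.

```python
# Sketch: logistic regression = linear score passed through the sigmoid.
# Assumption: make_classification data stands in for the diabetes CSV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, Y = make_classification(n_samples=200, n_features=8, random_state=7)
model = LogisticRegression(solver='liblinear', random_state=7).fit(X, Y)

linear_score = X @ model.coef_.ravel() + model.intercept_[0]  # w·x + b per row
manual_proba = 1.0 / (1.0 + np.exp(-linear_score))           # sigmoid
library_proba = model.predict_proba(X)[:, 1]                  # column for class 1

print(np.allclose(manual_proba, library_proba))  # should print True
```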

```python
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression            # <---
```

```python
# load dataset
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:, 0:8]  # the 8 input features
Y = array[:, 8]    # the class label (0 or 1)
```

```python
# Logistic Regression Classification
# unshuffled 10-fold split; newer scikit-learn rejects random_state unless shuffle=True
kfold = KFold(n_splits=10)
model = LogisticRegression(solver='liblinear')  # liblinear was the default solver before scikit-learn 0.22
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7695146958304853
```




## Linear Discriminant Analysis (LDA)

You can construct an LDA model using the LinearDiscriminantAnalysis class, which is documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html).
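For a two-class problem such as this one, LDA learns a single linear discriminant. The sketch below shows that `predict` picks class 1 exactly when the linear score from `decision_function` is positive, and that `transform` projects each row onto `n_classes - 1 = 1` axis. It assumes synthetic data from `make_classification` in place of the diabetes CSV.

```python
# Sketch: binary LDA = one linear score, thresholded at zero.
# Assumption: make_classification data stands in for the diabetes CSV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, Y = make_classification(n_samples=200, n_features=8, random_state=7)
model = LinearDiscriminantAnalysis().fit(X, Y)

scores = model.decision_function(X)     # one linear score per row
manual_pred = (scores > 0).astype(int)  # positive score -> class 1
projected = model.transform(X)          # projection onto the discriminant axis

print(np.array_equal(manual_pred, model.predict(X)), projected.shape)
```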

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis   # <---

# LDA Classification
kfold = KFold(n_splits=10)
model = LinearDiscriminantAnalysis()                                   # <---
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.773462064251538
```


# Non-linear ML Algorithms

## k-Nearest Neighbors (kNN)

You can construct a kNN model using the KNeighborsClassifier class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html).
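kNN has no training step beyond storing the data: a new row gets the majority class among its `n_neighbors` closest training rows (5 by default). The sketch below reproduces `predict` from `kneighbors` for a binary problem. It assumes synthetic data from `make_classification` in place of the diabetes CSV, and the 150/50 split is an arbitrary choice for illustration.

```python
# Sketch: kNN prediction as a majority vote among the k nearest training rows.
# Assumptions: make_classification data stands in for the diabetes CSV;
# the first 150 rows are "training", the last 50 are "new" rows.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, Y = make_classification(n_samples=200, n_features=8, random_state=7)
X_train, Y_train, X_new = X[:150], Y[:150], X[150:]

model = KNeighborsClassifier().fit(X_train, Y_train)   # n_neighbors=5
_, neighbor_idx = model.kneighbors(X_new)              # indices of the 5 nearest rows
votes_for_1 = Y_train[neighbor_idx].sum(axis=1)        # neighbours labelled 1
# with 5 voters and 2 classes, a majority means at least 3 votes
manual_pred = (votes_for_1 > model.n_neighbors // 2).astype(int)

print(np.array_equal(manual_pred, model.predict(X_new)))
```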

```python
from sklearn.neighbors import KNeighborsClassifier             # <---

# KNN Classification
kfold = KFold(n_splits=10)
model = KNeighborsClassifier()                                  # <---
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7265550239234451
```


## Naive Bayes

You can construct a Naive Bayes model using the GaussianNB class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html).

```python
from sklearn.naive_bayes import GaussianNB                     # <---

# Gaussian Naive Bayes Classification
kfold = KFold(n_splits=10)
model = GaussianNB()                                            # <---
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7551777170198223
```


## Classification and Regression Trees

You can construct a CART model using the DecisionTreeClassifier class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).
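`DecisionTreeClassifier` randomly permutes the features at each split, so tied splits can resolve differently from one run to the next. The sketch below shows that fixing `random_state` makes two spot-check runs give identical fold scores. It assumes synthetic data from `make_classification` in place of the diabetes CSV; `random_state=7` is an arbitrary choice.

```python
# Sketch: a fixed random_state makes CART spot-checks reproducible.
# Assumptions: make_classification data stands in for the diabetes CSV;
# random_state=7 is arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, Y = make_classification(n_samples=768, n_features=8, random_state=7)
kfold = KFold(n_splits=10)

run_1 = cross_val_score(DecisionTreeClassifier(random_state=7), X, Y, cv=kfold)
run_2 = cross_val_score(DecisionTreeClassifier(random_state=7), X, Y, cv=kfold)

print(run_1.mean(), np.array_equal(run_1, run_2))  # same seed, same fold scores
```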

```python
from sklearn.tree import DecisionTreeClassifier                # <---

# CART Classification
kfold = KFold(n_splits=10)
model = DecisionTreeClassifier()                                # <---
# no random_state is set, so this score can vary slightly between runs
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.6887047163362954
```


## Support Vector Machines (SVM)

You can construct an SVM model using the SVC class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).
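The `gamma` setting controls the width of SVC's default RBF kernel. According to the scikit-learn documentation, `gamma='auto'` means `1 / n_features` and `gamma='scale'` (the default since 0.22) means `1 / (n_features * X.var())`. The sketch below passes those numbers explicitly and checks that the predictions match. It assumes synthetic data from `make_classification` in place of the diabetes CSV.

```python
# Sketch: what gamma='auto' and gamma='scale' stand for in SVC.
# Assumption: make_classification data stands in for the diabetes CSV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, Y = make_classification(n_samples=200, n_features=8, random_state=7)
n_features = X.shape[1]

gamma_auto = 1.0 / n_features                  # 'auto'
gamma_scale = 1.0 / (n_features * X.var())     # 'scale'

same_auto = np.array_equal(SVC(gamma='auto').fit(X, Y).predict(X),
                           SVC(gamma=gamma_auto).fit(X, Y).predict(X))
same_scale = np.array_equal(SVC(gamma='scale').fit(X, Y).predict(X),
                            SVC(gamma=gamma_scale).fit(X, Y).predict(X))
print(gamma_auto, gamma_scale, same_auto, same_scale)
```

Because `'scale'` divides by the overall variance of `X`, it adapts to the magnitude of the inputs, while `'auto'` depends only on the number of features.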

```python
from sklearn.svm import SVC                                    # <---

# SVM Classification
kfold = KFold(n_splits=10)
# gamma='auto' was the effective default before scikit-learn 0.22 ('scale' since)
model = SVC(gamma='auto')                                       # <---
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.6510252904989747
```


## Summary

What we did:

* We discovered six ML algorithms that you can spot-check on your classification problem in Python using scikit-learn. Specifically, you learned how to spot-check two linear ML algorithms (Logistic Regression, Linear Discriminant Analysis) and four nonlinear algorithms (k-Nearest Neighbors, Naive Bayes, Classification and Regression Trees, and Support Vector Machines).
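Since all six recipes share the same harness, they can be collapsed into one loop. This is a minimal sketch, assuming synthetic data from `make_classification` in place of the diabetes CSV; the `models` dictionary and its short names are illustrative, `solver='liblinear'` and `gamma='auto'` mirror the settings used for the diabetes runs, and `random_state=7` on the tree is an added assumption for reproducibility.

```python
# Sketch: spot-check all six classifiers on identical 10-fold splits.
# Assumptions: make_classification data stands in for the diabetes CSV;
# the model names and random_state=7 for CART are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, Y = make_classification(n_samples=768, n_features=8, random_state=7)
kfold = KFold(n_splits=10)

models = {
    'LR': LogisticRegression(solver='liblinear'),
    'LDA': LinearDiscriminantAnalysis(),
    'KNN': KNeighborsClassifier(),
    'NB': GaussianNB(),
    'CART': DecisionTreeClassifier(random_state=7),
    'SVM': SVC(gamma='auto'),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, Y, cv=kfold)
    results[name] = scores.mean()
    print(f'{name}: {scores.mean():.3f} ({scores.std():.3f})')  # mean (std) accuracy
```

Scoring every model on the same folds keeps the comparison fair, and printing the standard deviation next to the mean shows how much each score moves from fold to fold.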

## What's next 

You will now discover how you can use spot-checking on REGRESSION ML problems and practice with some different regression algorithms.