# KNN Classification
1. Pick a value for K.
2. Search for the K observations in the training data that are "nearest" to the measurements of the unknown iris.
3. Use the most popular response value from the K nearest neighbors as the predicted response value for the unknown iris.
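
The three steps above can be sketched from scratch in a few lines of NumPy. This is only an illustration, not how scikit-learn implements it: the function name `knn_predict` and the toy data are made up for this example, and Euclidean distance is assumed as the meaning of "nearest".

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=1):
    """Predict the class of one observation by majority vote of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Step 2: Euclidean distance from x_new to every training observation
    distances = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 3: most common label among those neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# Tiny made-up training set: two features, two classes
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]]
y_toy = [0, 0, 0, 1, 1]

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))  # 0
print(knn_predict(X_toy, y_toy, [4.9, 5.1], k=1))  # 1
```

Step 1 (picking K) is the `k` argument. When two classes tie, this sketch returns whichever tied label it meets first among the sorted neighbors; a real implementation may break ties differently.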

In [1]:
# Load data

# import load_iris function from datasets module
from sklearn.datasets import load_iris

# save "bunch" object containing iris dataset and its attributes
iris = load_iris()

# store feature matrix in "X"
X = iris.data

# store response vector in "y"
y = iris.target

In [2]:
# Print the shapes of X and y
print(X.shape)
print(y.shape)

(150, 4)
(150,)


# Scikit-Learn Workflow

## Step 1. Import the class you plan to use

In [3]:
from sklearn.neighbors import KNeighborsClassifier

## Step 2. "Instantiate" the "estimator"
- "Estimator" is scikit-learn's term for model
- "Instantiate" means "make an instance of"

In [4]:
knn = KNeighborsClassifier(n_neighbors=1)

- The name of the object does not matter
- Tuning parameters (aka "hyperparameters") can be specified during this step
- Any parameter not specified is set to its default value
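
One way to check which values an estimator ended up with is `get_params()`, which returns them as a dictionary. The short sketch below only inspects two of the parameters; the variable name `params` is chosen for this example.

```python
from sklearn.neighbors import KNeighborsClassifier

# Only n_neighbors is set explicitly; everything else keeps its default
knn = KNeighborsClassifier(n_neighbors=1)
params = knn.get_params()

print(params["n_neighbors"])  # 1
print(params["weights"])      # uniform

# A fresh estimator with no arguments shows the library default for K
print(KNeighborsClassifier().get_params()["n_neighbors"])  # 5
```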

In [5]:
print(knn)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')


## Step 3. Fit the model with data (aka "model training")

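- The model learns the relationship between X and y
- Fitting occurs in place: the fit method updates the knn object itself, so the result does not need to be assigned to a new variable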

In [6]:
knn.fit(X, y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')
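
As a small check of the "in-place" behavior, the sketch below confirms that `fit` hands back the very same object it was called on, which is why the notebook echoes the estimator after fitting. The variable name `returned` is made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=1)
returned = knn.fit(X, y)

# fit() returns the estimator itself rather than a new object
print(returned is knn)  # True

# Attributes learned during fitting end with an underscore
print(knn.classes_)  # [0 1 2]
```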

## Step 4. Predict the response for a new observation

- New observations are called "out-of-sample" data
- The model uses what it learned during training to make the prediction

In [10]:
# predict expects a 2D array-like: one row per observation, one column per feature
knn.predict([[3, 5, 4, 2]])

array([2])

In [14]:
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]] # These are OUT OF SAMPLE observations
knn.predict(X_new)

array([2, 1])
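
The predictions are integer class codes. A minimal sketch of turning them back into species names, assuming the `target_names` attribute of the bunch object returned by `load_iris` (the names `predicted_codes` and `predicted_species` are chosen for this example):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1).fit(iris.data, iris.target)

X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
predicted_codes = knn.predict(X_new)

# target_names is an array of species names indexed by class code
print(iris.target_names.tolist())  # ['setosa', 'versicolor', 'virginica']

# Indexing with the array of codes looks up one name per prediction
predicted_species = iris.target_names[predicted_codes]
print(predicted_species)
```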

## Let's use a different value for K

In [11]:
# Instantiate the model (using the value K=5)
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the model with data
knn.fit(X, y)

# Predict the response for new observations
knn.predict(X_new)

array([1, 1])
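
With K greater than 1, it can help to look at how the neighbors voted. The sketch below uses `predict_proba`, which for a KNN classifier with the default uniform weights reports the share of the K neighbors that fall in each class. The variable name `vote_shares` is made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=5).fit(iris.data, iris.target)

X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

# One row per observation, one column per class; with uniform weights each
# entry is (number of neighbors in that class) / K
vote_shares = knn.predict_proba(X_new)
print(vote_shares)

# The predicted class is the column with the largest share
print(vote_shares.argmax(axis=1))
print(knn.predict(X_new))
```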

## Let's use a different classifier

In [13]:
from sklearn.linear_model import LogisticRegression

# Instantiate the estimator
logreg = LogisticRegression()

# Fit the model
logreg.fit(X, y)

# Predict
logreg.predict(X_new)

array([2, 0])
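
Because every estimator follows the same instantiate, fit, predict pattern, the models above can be swapped in a loop. The following is a sketch rather than part of the original notebook: the dictionary `models` and its labels are made up here, and `max_iter=1000` is an added assumption so that logistic regression converges without warnings on recent scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

# max_iter=1000 is an assumption added here; the notebook uses the defaults
models = {
    "KNN (K=1)": KNeighborsClassifier(n_neighbors=1),
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

predictions = {}
for name, model in models.items():
    # Same two calls regardless of the estimator class
    predictions[name] = model.fit(X, y).predict(X_new)
    print(name, predictions[name])
```

The predictions can differ between models, and for out-of-sample observations like these the true response is unknown, so this output alone cannot show which model is right.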