### Loading the data

In [81]:
# import load_iris function from datasets module
from sklearn.datasets import load_iris

# save "bunch" object containing iris dataset and its attributes
iris = load_iris()

# store feature matrix in "X"
X = iris.data

# store response vector in "y"
y = iris.target
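
Before modelling, it is worth checking that the data has the shape scikit-learn expects: a 2D feature matrix with one row per observation and one column per feature, plus a 1D response vector with one label per row. The sketch below reloads the data so that it runs on its own.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

# 150 observations (rows) by 4 features (columns)
print(X.shape)            # (150, 4)
# one integer class label per observation
print(y.shape)            # (150,)
# column meaning, in order
print(iris.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
```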

### Using K-Nearest Neighbors

In [82]:
from sklearn.neighbors import KNeighborsClassifier

# instantiate the model (using the value K=1)
knn = KNeighborsClassifier(n_neighbors=1)
print(knn)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')


In [83]:
# fit the model to our data (aka "training"); this modifies knn in place and also returns it
knn.fit(X, y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')
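
Fitting is fast because k-NN is a "lazy" learner. `fit` mostly stores the training data, although scikit-learn may also build a search structure such as a KD-tree or ball tree depending on `algorithm`. The real work happens at prediction time. The printed `metric='minkowski', p=2` means Euclidean distance. Below is a minimal from-scratch sketch of the idea, using a hypothetical `knn_predict` helper and toy data. It is not scikit-learn's implementation, and its tie-breaking may differ from scikit-learn's.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=1):
    """Hypothetical illustrative k-NN: Euclidean distance plus majority vote."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Euclidean (Minkowski, p=2) distance from x to every training row
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # indices of the k closest training observations, nearest first
        nearest = np.argsort(distances)[:k]
        # majority vote; on a tie, Counter keeps the label seen first (i.e. nearest)
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# toy data: two well-separated clusters labelled 0 and 1
X_toy = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_toy = [0, 0, 1, 1]
print(knn_predict(X_toy, y_toy, [[0.2, 0.4], [5.5, 4.8]], k=1))  # [0 1]
```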

In [84]:
# recall how the target classes are encoded as integers
for code, name in enumerate(iris.target_names):
    print(code, " - ", name)

0  -  setosa
1  -  versicolor
2  -  virginica


In [85]:
# predict the class of a new observation
# (sepal length, sepal width, petal length, petal width, in cm)
result = knn.predict([[3, 5, 4, 2]])

print(result, iris.target_names[result])

[2] ['virginica']
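
The double brackets are needed because `predict` expects a 2D array of shape (n_samples, n_features), even for a single observation. A flat 1D array is rejected. With NumPy, `reshape(1, -1)` builds the same one-row matrix from a flat array:

```python
import numpy as np

# one observation as a flat 1D array
single = np.array([3, 5, 4, 2])
print(single.shape)   # (4,)

# one row, with as many columns as needed (-1 lets NumPy infer 4)
as_row = single.reshape(1, -1)
print(as_row.shape)   # (1, 4)
```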


In [86]:
# predicting multiple observations at once
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
result = knn.predict(X_new)

print(result, iris.target_names[result])

[2 1] ['virginica' 'versicolor']
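
`iris.target_names[result]` decodes every prediction at once. This works because `target_names` is a NumPy array, and indexing a NumPy array with an integer array ("fancy" indexing) returns one element per index. A plain Python list would raise a `TypeError` instead. A standalone illustration:

```python
import numpy as np

target_names = np.array(['setosa', 'versicolor', 'virginica'])
result = np.array([2, 1])  # class codes, as returned by predict()

# integer-array indexing picks one name per code
print(target_names[result])  # ['virginica' 'versicolor']
```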


### Using a different value for K

In [87]:
# instantiate the model (using the value K=5)
knn = KNeighborsClassifier(n_neighbors=5)

# fit the model with data
knn.fit(X, y)

# predict the response for new observations
result = knn.predict(X_new)

print(result, iris.target_names[result])

[1 1] ['versicolor' 'versicolor']
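
With K=5 the prediction is a majority vote over the five nearest training observations. That is why it can differ from K=1, which simply copies the label of the single nearest one. `KNeighborsClassifier.kneighbors` lets you inspect those neighbors. The sketch below redoes the vote by hand. The `bincount`/`argmax` tie-breaking (toward the smaller class code) is our own choice, so compare the result with `knn.predict` rather than assuming they always match.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# distances to, and row indices of, the 5 closest training observations
distances, indices = knn.kneighbors(X_new)
neighbor_labels = y[indices]  # shape (2, 5): class codes of those neighbors
print(neighbor_labels)

# majority vote per row; argmax returns the first (smallest) code on a tie
votes = np.array([np.bincount(row, minlength=3).argmax() for row in neighbor_labels])
print(votes, knn.predict(X_new))
```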


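Simply changing `n_neighbors` from 1 to 5 changes the prediction for the first observation from virginica to versicolor. Accuracy on the training data is a poor guide to which value is better (with K=1, each training observation is its own nearest neighbor), so the best value of K has to be found by evaluating each candidate on data the model was not trained on. Other parameters, such as how neighbors are weighted (`weights`), can also have an influence. Choosing the best combination of these settings, known as hyperparameters, is called **hyperparameter tuning**.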
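
A common way to run this search in scikit-learn is `GridSearchCV`. It tries every combination in a parameter grid and scores each one with cross-validation, always on folds held out from training. The grid values below (K from 1 to 30 and both weighting schemes) and the 5 folds are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# every combination of these values is evaluated (30 x 2 = 60 candidates)
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validation: each candidate is scored on held-out folds
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print(grid.best_params_)
print(round(grid.best_score_, 3))
```

After fitting, `grid.best_estimator_` is refit on all of `X` with the winning parameters and can be used for `predict` like any other model.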

Tuning a single model is only part of the job, though: there are also many different models to choose from.

### Using an alternative model - Logistic Regression

In [88]:
from sklearn.linear_model import LogisticRegression

# instantiate the model (using the default parameters)
logreg = LogisticRegression()

# fit the model with data
logreg.fit(X, y)

# predict the response for new observations
result = logreg.predict(X_new)

print(result, iris.target_names[result])

[2 0] ['virginica' 'setosa']
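
These predictions disagree with both k-NN runs above, and `X_new` has no known labels, so it says nothing about which model is better. A fairer comparison scores each model on the same held-out folds. Below is a sketch using `cross_val_score`. The setting `max_iter=1000` is our assumption, added so the default solver in recent scikit-learn versions does not warn about convergence on this unscaled data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

models = {
    "knn (K=5)": KNeighborsClassifier(n_neighbors=5),
    # max_iter raised (assumption) to avoid convergence warnings
    "logistic regression": LogisticRegression(max_iter=1000),
}

# cv=5 on a classifier uses unshuffled stratified folds, so both models see the same splits
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```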
