## KNN with Multiple Classes

In [1]:
# Import scikit-learn's datasets module
from sklearn import datasets

# Load the wine dataset
wine = datasets.load_wine()

In [2]:
print(wine.DESCR)

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0

The dataset comprises
* 13 numeric features ('alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline'), and
* a target: the cultivar each wine comes from, encoded as one of three classes.

### Exploring Data

In [3]:
# print the names of the features
print(wine.feature_names)

['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']


In [4]:
print(wine.data.shape)

(178, 13)


In [5]:
print(wine.data[0])

[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00
 2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]


In [6]:
print(wine.target.shape)

(178,)


In [7]:
# print the class labels (class_0, class_1, class_2)
print(wine.target_names)

['class_0' 'class_1' 'class_2']


In [8]:
print(wine.target[:10])

[0 0 0 0 0 0 0 0 0 0]
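
The first ten labels are all 0 because the samples are stored in class order. As a quick check of how the 178 samples are spread over the three classes, the sketch below counts them with `numpy.bincount` (the variable name `counts` is only illustrative). The classes turn out to be unevenly sized, so the "50 in each of three classes" line in the description above should not be taken literally.

```python
import numpy as np
from sklearn import datasets

wine = datasets.load_wine()

# Count how many samples carry each integer label (0, 1, 2).
counts = np.bincount(wine.target)

for name, count in zip(wine.target_names, counts):
    print(name, count)
```

Unequal class sizes are worth keeping in mind when splitting the data and when reading a single accuracy number.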


### Splitting Data
Split the data into 70% training and 30% test.

In [9]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set: 70% training and 30% test.
# No random_state is set, so the split (and the results below) vary between runs.
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3)

In [10]:
X_train.shape

(124, 13)

In [11]:
y_train.shape

(124,)

In [12]:
X_test.shape

(54, 13)

In [13]:
y_test.shape

(54,)
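
The split above is random, so rerunning the notebook gives different training and test sets and different accuracies. One possible way to make it repeatable, sketched here as an assumption rather than what this notebook did, is to pass `random_state` (the value 42 is arbitrary) and `stratify` so that both parts keep the class proportions. The `_s` suffix on the variable names is only there to avoid overwriting the arrays used in the rest of the notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()

# random_state fixes the shuffle; stratify keeps class proportions
# roughly equal in the training and test parts.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    wine.data,
    wine.target,
    test_size=0.3,
    random_state=42,       # arbitrary seed, chosen for illustration
    stratify=wine.target,
)

print(X_train_s.shape, X_test_s.shape)
```

The shapes match the unseeded split above; only the choice of rows changes, so the outputs printed in this notebook will not be reproduced exactly.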

### Generating Model for K=5
Let's build a KNN classifier with k=5.

In [14]:
# Import the k-nearest neighbors classifier
from sklearn.neighbors import KNeighborsClassifier

# Create a KNN classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)

In [15]:
# Train the model on the training set
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [16]:
# Predict the class of each sample in the test set
y_pred = knn.predict(X_test)
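
To make clear what `predict` does conceptually, here is a minimal from-scratch sketch of k-nearest-neighbor classification: measure the Euclidean distance from the query point to every training point, take the k closest, and return the majority label. The function `knn_predict` and the toy data are hypothetical and written only for illustration. This is not scikit-learn's implementation, which uses faster neighbor-search structures and may break ties differently.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of one point x by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters, labelled 0 and 1.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_toy, y_toy, np.array([0.2, 0.2]), k=3)
pred_b = knn_predict(X_toy, y_toy, np.array([5.5, 5.5]), k=3)
print(pred_a, pred_b)
```

Because the vote is over labels rather than a yes/no decision, the same procedure handles three classes, as in the wine data, without any change.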

In [17]:
print(y_pred)

[1 1 0 0 1 0 2 1 0 2 0 0 0 0 0 0 0 1 1 2 2 2 1 0 1 1 0 2 1 1 2 2 2 2 1 2 0
 0 2 2 1 2 0 2 0 0 2 2 1 2 1 0 2 1]


In [18]:
print(y_test)

[1 1 0 2 1 0 1 1 0 2 0 2 0 0 0 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 2 1 1 1 2 0
 0 2 2 2 2 0 1 0 0 2 1 1 2 1 2 2 1]


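Ground truth: the `y_test` array above holds the true labels for the test set. Comparing it with `y_pred` position by position shows where the classifier is wrong.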

In [19]:
# Import scikit-learn's metrics module for the accuracy calculation
from sklearn import metrics

# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.7222222222222222
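
Accuracy is simply the fraction of test samples whose predicted label equals the true label. The sketch below recomputes it by hand from the two arrays printed in cells 17 and 18 (copied verbatim from those outputs), and also builds a confusion matrix to show which classes are mixed up. The variable names `correct`, `accuracy`, and `cm` are illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Arrays copied from the printed outputs of cells 17 and 18.
y_pred = np.array([
    1, 1, 0, 0, 1, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 1, 0,
    1, 1, 0, 2, 1, 1, 2, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2, 0, 2, 0, 0, 2, 2,
    1, 2, 1, 0, 2, 1,
])
y_test = np.array([
    1, 1, 0, 2, 1, 0, 1, 1, 0, 2, 0, 2, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0,
    1, 1, 0, 1, 1, 1, 1, 2, 1, 1, 1, 2, 0, 0, 2, 2, 2, 2, 0, 1, 0, 0, 2, 1,
    1, 2, 1, 2, 2, 1,
])

# Count positions where prediction and ground truth agree.
correct = int((y_pred == y_test).sum())
accuracy = correct / len(y_test)
print(correct, "of", len(y_test), "correct:", accuracy)  # 39 of 54

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

39 correct predictions out of 54 gives the 0.7222 reported above. The off-diagonal entries of the confusion matrix show where the remaining 15 test samples went.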


### Generating Model for K=7
Let's build a KNN classifier with k=7.

In [20]:
# Import the k-nearest neighbors classifier
from sklearn.neighbors import KNeighborsClassifier

# Create a KNN classifier with k=7
knn = KNeighborsClassifier(n_neighbors=7)

# Train the model on the training set
knn.fit(X_train, y_train)

# Predict the class of each sample in the test set
y_pred = knn.predict(X_test)

In [21]:
# Import scikit-learn's metrics module for the accuracy calculation
from sklearn import metrics

# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.6851851851851852
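
A single random split makes it hard to say whether k=5 is really better than k=7. In addition, the features live on very different scales (in the first sample, proline is 1065 while nonflavanoid_phenols is 0.28), so the Euclidean distance is dominated by the large-valued features. The sketch below is one possible follow-up, not part of the original analysis: it standardizes the features inside a pipeline and compares several values of k with 5-fold cross-validation. The choice of k values and of `cv=5` is an assumption made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()

scores = {}
for k in (3, 5, 7, 9):
    # Scaling sits inside the pipeline, so it is refit on each training
    # fold and never sees that fold's held-out samples.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, wine.data, wine.target, cv=5).mean()
    print(f"k={k}: mean CV accuracy = {scores[k]:.3f}")
```

Averaging over folds gives a steadier basis for choosing k than one 70/30 split, and putting the scaler in the pipeline avoids leaking test-fold statistics into training.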


### References

* KNN Classification using Scikit-learn, Avinash Navlani, August 3rd, 2018, https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn