## Image Classification With KNN

In the previous section, we exported the filtered and processed images as a pickle file. We begin by loading that file back into memory.

In [1]:
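import pickle

path = './traffic_data.pkl'

def unpickle(file):
    """Load and return the object stored in a pickle file."""
    with open(file, 'rb') as fo:
        # encoding='bytes' only affects pickles written by Python 2
        data = pickle.load(fo, encoding='bytes')
    return data

X = unpickle(path)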
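For readers who do not have the exported file at hand, the round trip can be sketched as follows. This is a minimal illustration, not the export code from the previous section: the `rows` data, the `save_pickle` and `load_pickle` helpers, and the temporary file name are all hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the real data: two tiny "images" plus a label
rows = [[230, 230, 231, 'pedestrian'], [12, 40, 66, 'bus']]

def save_pickle(obj, file):
    """Serialize obj to the given path."""
    with open(file, 'wb') as fo:
        pickle.dump(obj, fo)

def load_pickle(file):
    """Read back whatever object was stored at the given path."""
    with open(file, 'rb') as fo:
        return pickle.load(fo)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.pkl')
    save_pickle(rows, demo_path)
    restored = load_pickle(demo_path)

print(restored == rows)
```

Because unpickling can execute arbitrary code, it is safest to load only pickle files that you created yourself or otherwise trust.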

Next, we load the data into a pandas DataFrame and inspect its structure.

In [8]:
import pandas as pd

df = pd.DataFrame(X)

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 309 entries, 0 to 308
Columns: 864001 entries, 0 to 864000
dtypes: int64(864000), object(1)
memory usage: 2.0+ GB


In [10]:
df.shape

(309, 864001)
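The roughly 2 GB reported by df.info() follows from the shape: 309 rows of 864,000 pixel columns stored as 8-byte int64 values. The sketch below reproduces that arithmetic and shows how much smaller the same data would be as uint8. It assumes every pixel value lies in the range 0 to 255, which the sample rows suggest but which is not verified here; the `sample` array is a hypothetical stand-in for a few pixels.

```python
import numpy as np

# Dimensions taken from df.shape, minus the one label column
n_images, n_pixels = 309, 864_000

bytes_int64 = n_images * n_pixels * np.dtype(np.int64).itemsize
bytes_uint8 = n_images * n_pixels * np.dtype(np.uint8).itemsize

gib = 1024 ** 3
print(f"int64: {bytes_int64 / gib:.2f} GiB")
print(f"uint8: {bytes_uint8 / gib:.2f} GiB")

# Hypothetical stand-in for a few pixels; the cast is lossless
# only if all values fit in 0..255 (an assumption about the data)
sample = np.array([[230, 231, 66], [251, 205, 148]], dtype=np.int64)
compact = sample.astype(np.uint8)
print((compact == sample).all())
```

Under that assumption, casting the pixel columns to uint8 before training would cut memory use by a factor of eight.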

In [11]:
df.head()

index,0,1,2,3,4,5,6,7,8,9,...,863991,863992,863993,863994,863995,863996,863997,863998,863999,864000
0,230,230,230,230,230,230,230,230,231,231,...,66,68,79,71,73,71,68,77,69,pedestrian
1,251,205,253,211,247,209,255,201,255,200,...,148,146,145,145,145,145,146,147,148,pedestrian
2,202,202,195,196,207,207,203,208,208,206,...,72,72,73,74,75,75,75,75,74,pedestrian
3,254,254,254,254,254,254,254,254,254,254,...,67,65,65,65,65,67,70,73,74,pedestrian
4,155,155,155,155,155,155,155,155,156,156,...,89,90,84,81,86,91,90,88,87,pedestrian


Now that we have seen the data, we can extract the last column (column 864000), which holds the target class for each image. We also convert the class names from strings to integers using scikit-learn's LabelEncoder.

In [12]:
from sklearn import preprocessing
import numpy as np

# The last column (864000) holds the class name of each image
labels = pd.Series(df[864000].values)
df = df.drop(864000, axis=1)

# LabelEncoder sorts the class names alphabetically and numbers them from 0
le = preprocessing.LabelEncoder()
le.fit(['bus','bicycle','car','motorbike','pedestrian','trafficsignal','trafficlight'])

labels_encoded = le.transform(labels)
labels_encoded

array([4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
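To read the encoded array, it helps to see which integer stands for which class. The sketch below fits a fresh encoder on the same seven class names and prints the mapping; the variable names `class_names`, `encoder`, `mapping`, and `decoded` are introduced here for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

class_names = ['bus', 'bicycle', 'car', 'motorbike',
               'pedestrian', 'trafficsignal', 'trafficlight']

encoder = LabelEncoder()
encoder.fit(class_names)

# classes_ is stored in sorted order, so a name's position is its code
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform converts integer codes back to class names
decoded = list(encoder.inverse_transform([4, 5, 1, 3]))
print(decoded)
```

The codes 4, 5, 1, and 3 that appear in the array above therefore stand for pedestrian, trafficlight, bus, and motorbike.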

Now we are ready to split our data into training and test sets. We hold out 20% of the data as the test set.

In [13]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df, labels_encoded, test_size = 0.2, random_state = 123)
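As a quick check on what this split produces, the sketch below applies the same test_size and random_state to placeholder arrays with 309 rows. The `features` and `targets` arrays are hypothetical stand-ins, not the image data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 309
features = np.arange(n_samples).reshape(-1, 1)  # placeholder feature matrix
targets = np.arange(n_samples) % 3              # placeholder class labels

a_train, a_test, b_train, b_test = train_test_split(
    features, targets, test_size=0.2, random_state=123)

# The test size is rounded up: 0.2 * 309 = 61.8 becomes 62 rows
print(len(a_train), len(a_test))

# Reusing the same random_state reproduces exactly the same split
c_train, c_test, d_train, d_test = train_test_split(
    features, targets, test_size=0.2, random_state=123)
print(np.array_equal(a_test, c_test))
```

Fixing random_state makes the reported results repeatable from run to run.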

The next step is to train the KNN classifier on our training data, using k = 3 neighbors.

In [14]:
from sklearn.neighbors import KNeighborsClassifier

KNN = KNeighborsClassifier(n_neighbors=3)

KNN.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=3, p=2,
           weights='uniform')
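The printed parameters (metric='minkowski' with p=2 and weights='uniform') describe a classifier that measures Euclidean distance and gives each of the 3 nearest neighbors one equal vote. The sketch below shows that idea as a brute-force NumPy function on toy 2-D points. It illustrates the technique and is not scikit-learn's implementation, which may use tree-based search; `knn_predict` and the toy arrays are hypothetical names, and ties here go to the smallest label.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query row by majority vote among its k nearest
    training rows, using Euclidean distance and equal vote weights."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)  # must be non-negative integers
    predictions = []
    for query in X_query:
        # Euclidean distance from this query to every training row
        distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
        # Indices of the k closest training rows
        nearest = np.argsort(distances)[:k]
        # Count votes per label; argmax picks the smallest label on ties
        votes = np.bincount(y_train[nearest])
        predictions.append(int(votes.argmax()))
    return np.array(predictions)

# Toy data: two well-separated clusters labeled 0 and 1
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.5, 0.5], [10.5, 10.5]])

preds = knn_predict(X_toy, y_toy, queries, k=3)
print(preds)
```

KNN has no real training phase: fit mostly stores the data, and the distance computations happen at prediction time.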

The final step is to use our trained model to make predictions on the test set and measure the accuracy, that is, the fraction of test images whose predicted class matches the true class.

In [15]:
from sklearn import metrics

y_pred = KNN.predict(X_test)
print("Accuracy: {}".format(metrics.accuracy_score(y_test, y_pred)))

Accuracy: 0.8870967741935484
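The reported value is consistent with 55 correct predictions out of 62 test images (20% of 309, rounded up). A single accuracy figure can hide weak performance on rare classes, so a per-class view is a useful complement. The sketch below computes both by hand; the `y_true` and `y_hat` arrays are hypothetical and are not the notebook's actual predictions.

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only
y_true = np.array([3, 3, 3, 3, 1, 1, 4, 5])
y_hat = np.array([3, 3, 3, 3, 1, 3, 3, 5])

# Accuracy: fraction of predictions that match the true label
accuracy = float((y_true == y_hat).mean())
print("Accuracy:", accuracy)

# Per-class recall: of the samples in each class, how many were found
per_class_recall = {}
for label in np.unique(y_true):
    mask = y_true == label
    per_class_recall[int(label)] = float((y_hat[mask] == label).mean())
print(per_class_recall)
```

In this toy case the overall accuracy is 0.75 even though class 4 is never predicted correctly, which is the kind of gap a per-class breakdown makes visible.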


Sources:

https://stackoverflow.com/questions/34488993/creating-a-pickeled-data-file-of-image-data

https://github.com/adotg/knn-what-how-why/blob/develop/knn.ipynb

https://medium.com/@YearsOfNoLight/intro-to-image-classification-with-knn-987bc112f0c2