# K-Nearest Neighbors Classification Demo

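K-nearest neighbors (KNN) classification assigns a label to a data sample based on the labels of the training points closest to it.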
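The voting rule can be sketched in a few lines of NumPy. This is a minimal brute-force illustration, not the implementation used by either library in this demo; the helper name `knn_predict` and the toy data are assumptions made for the example, and labels are assumed to be non-negative integers.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Classify each query row by majority vote among its k nearest training rows."""
    # Squared Euclidean distances between every query and every training row,
    # shape (n_query, n_train).
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist2 = np.einsum("qtf,qtf->qt", diff, diff)
    # Column indices of the k smallest distances in each row (unordered).
    nearest = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    votes = y_train[nearest]
    # Majority vote per query; a tie goes to the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])

# Toy data (hypothetical): two well-separated clusters.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.05, 0.1], [5.0, 5.1]])

pred = knn_predict(X_train, y_train, X_query, k=3)
print(pred)  # [0 1]
```

The full distance matrix is materialized here for clarity, which is only practical for small inputs.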

## 1. Import the required packages

In [11]:
import os

import numpy as np

from sklearn.datasets import make_blobs

import pandas as pd
import cudf as gd

from sklearn.neighbors import KNeighborsClassifier as skKNC
from ncue.neighbors import KNeighborsClassifier as cuKNC

## 2. Define parameters

In [12]:
n_samples = 2**17
n_features = 40

n_query = 5000

n_neighbors = 4
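
To get a feel for the workload these parameters imply, the sketch below works out the size of the brute-force problem. It assumes the features are stored as 8-byte floats (`float64`, which is what `make_blobs` returns); the variable names `n_distances` and `train_mib` are introduced only for this illustration.

```python
n_samples = 2**17
n_features = 40
n_query = 5000

# A brute-force search compares every query point with every training point.
n_distances = n_query * n_samples

# Memory for the training features, assuming float64 (8 bytes per value).
train_mib = n_samples * n_features * 8 / 2**20

print(n_samples)    # 131072
print(n_distances)  # 655360000
print(train_mib)    # 40.0
```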

## 3. Generate test data

### Host (RAM)

In [13]:
%%time
X_host_train, y_host_train = make_blobs(
   n_samples=n_samples, n_features=n_features, centers=5, random_state=0)

X_host_train = pd.DataFrame(X_host_train)
y_host_train = pd.DataFrame(y_host_train)

CPU times: user 221 ms, sys: 20.8 ms, total: 241 ms
Wall time: 242 ms


In [14]:
%%time
X_host_test, y_host_test = make_blobs(
   n_samples=n_query, n_features=n_features, centers=5, random_state=0)

X_host_test = pd.DataFrame(X_host_test)
y_host_test = pd.DataFrame(y_host_test)

CPU times: user 9.46 ms, sys: 489 µs, total: 9.95 ms
Wall time: 9.02 ms


### Device (GPU memory)

In [15]:
# Copy the data from RAM to GPU memory so the NCUE model can use it and the final results can be compared

X_device_train = gd.DataFrame.from_pandas(X_host_train)
y_device_train = gd.DataFrame.from_pandas(y_host_train)

In [16]:
X_device_test = gd.DataFrame.from_pandas(X_host_test)
y_device_test = gd.DataFrame.from_pandas(y_host_test)

## 4. Scikit-learn model (CPU)

In [17]:
%%time
knn_sk = skKNC(algorithm="brute", n_neighbors=n_neighbors, n_jobs=-1)
knn_sk.fit(X_host_train, y_host_train)

sk_result = knn_sk.predict(X_host_test)

CPU times: user 1min 38s, sys: 14.3 s, total: 1min 53s
Wall time: 17.6 s


## 5. NCUE model (GPU)

In [18]:
%%time
knn_ncue = cuKNC(n_neighbors=n_neighbors)
knn_ncue.fit(X_device_train, y_device_train)

ncue_result = knn_ncue.predict(X_device_test)

CPU times: user 684 ms, sys: 160 ms, total: 844 ms
Wall time: 849 ms
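
The two wall times reported above can be turned into a rough speedup figure. This is only a back-of-the-envelope calculation from a single `%%time` run of each cell (fit plus predict), so the ratio should be read as indicative rather than as a benchmark result.

```python
# Wall times copied from the two %%time outputs above.
cpu_wall_s = 17.6     # scikit-learn, CPU
gpu_wall_s = 849e-3   # NCUE, GPU

speedup = cpu_wall_s / gpu_wall_s
print(f"{speedup:.1f}x")  # 20.7x
```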


## 6. Compare the results (CPU vs. GPU)

In [19]:
passed = np.array_equal(np.asarray(ncue_result.as_gpu_matrix())[:,0], sk_result)
print('compare knn: ncue vs sklearn classes %s' % ('equal' if passed else 'NOT equal'))

compare knn: ncue vs sklearn classes equal
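
Exact equality is a strict test: two implementations may break voting ties differently and still both be reasonable. As a softer check, the fraction of matching predictions can be reported instead. The sketch below uses a hypothetical helper, `agreement_rate`, and small stand-in arrays rather than the actual results from this notebook; it assumes both results have already been converted to NumPy arrays on the host.

```python
import numpy as np

def agreement_rate(a, b):
    """Return the fraction of positions where two label arrays agree."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    if a.shape != b.shape:
        raise ValueError("label arrays must have the same number of elements")
    return float(np.mean(a == b))

# Stand-in data (hypothetical): one of five predictions differs.
cpu_labels = np.array([0, 1, 2, 2, 4])
gpu_labels = np.array([[0], [1], [2], [3], [4]])  # column vector, as a one-column frame would give

rate = agreement_rate(cpu_labels, gpu_labels)
print(rate)  # 0.8
```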
