# Ex03 Gaussian Kernel-based probability density estimation method

### Outline
- Data import
- Data split
- Normalization
- Kernel density estimation
- Probability density of test data
- Prediction label
- Score
- Summary of KDE and KNN

#### Data import

In [5]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from scipy.stats import gaussian_kde

iris = datasets.load_iris()
X = iris.data
y = iris.target

#### Data split

In [6]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
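
As a quick check of what `stratify=y` does here, the sketch below repeats the same split and counts the labels on each side. The variable names `train_counts` and `test_counts` are only illustrative and are not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Iris has 50 samples per class, so a stratified 70/30 split
# keeps the three classes balanced on both sides.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts)  # [35 35 35]
print(test_counts)   # [15 15 15]
```

Because the classes stay balanced, each per-class density later in the notebook is estimated from the same number of training samples.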

#### Normalization

In [7]:
sc = StandardScaler()
sc = sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
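
The scaler is fitted on the training data only, and the test data is then transformed with those same training statistics. A minimal sketch of that behaviour, on made-up random data (the arrays `train` and `test` are assumptions for illustration, not the Iris data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 4))  # made-up training data
test = rng.normal(loc=5.0, scale=2.0, size=(40, 4))    # made-up test data

scaler = StandardScaler().fit(train)   # mean_ and scale_ come from train only
train_std = scaler.transform(train)
test_std = scaler.transform(test)

# transform() is (x - mean_) / scale_, using the training statistics.
manual = (test - scaler.mean_) / scaler.scale_
print(np.allclose(manual, test_std))

# The training data ends up with zero mean and unit variance per feature;
# the test data is only approximately standardized.
print(np.allclose(train_std.mean(axis=0), 0.0),
      np.allclose(train_std.std(axis=0), 1.0))
```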

#### Kernel density estimation

In [8]:
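# gaussian_kde expects data of shape (n_features, n_samples), hence the transpose.
# One density is fitted per class, using only that class's training samples.
kernel0 = gaussian_kde(X_train_std[y_train==0].T)
kernel1 = gaussian_kde(X_train_std[y_train==1].T)
kernel2 = gaussian_kde(X_train_std[y_train==2].T)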
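
To make clearer what each `gaussian_kde` object computes, here is a small sketch on made-up data (the array `samples` and the query point `x` are assumptions for illustration). With default settings, the bandwidth factor follows Scott's rule, and the estimated density is the average of Gaussian bumps centred on the training samples, each with the scaled sample covariance.

```python
import numpy as np
from scipy.stats import gaussian_kde, multivariate_normal

rng = np.random.default_rng(0)
samples = rng.normal(size=(35, 4))   # 35 made-up points with 4 features

kde = gaussian_kde(samples.T)        # input shape is (n_features, n_samples)
print(kde.d, kde.n)                  # 4 35

# Default bandwidth factor (Scott's rule): n ** (-1 / (d + 4))
scott = kde.n ** (-1.0 / (kde.d + 4))
print(np.isclose(kde.factor, scott))

# Density at x: mean of Gaussians centred on each sample,
# all sharing the covariance kde.covariance.
x = np.zeros(4)
manual = np.mean([multivariate_normal.pdf(x, mean=s, cov=kde.covariance)
                  for s in samples])
kde_value = kde.evaluate(x.reshape(-1, 1))[0]
print(np.isclose(manual, kde_value))
```

Every evaluation visits every training sample, which is one reason the method becomes costly on large data sets.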

#### Probability density of test data

In [9]:
# Evaluate each class density at every test sample (one value per sample).
p0s = kernel0.evaluate(X_test_std.T)
p1s = kernel1.evaluate(X_test_std.T)
p2s = kernel2.evaluate(X_test_std.T)

#### Prediction label

In [10]:
import numpy as np

# For each test sample, pick the class whose estimated density is highest.
# np.argmax returns the first maximum, so ties go to the lower label as before.
y_pred = np.argmax(np.vstack([p0s, p1s, p2s]), axis=0)

#### Score

In [11]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

1.0


#### Summary of KDE and KNN
Here, kernel density estimates are used as a supervised classifier: one density is fitted per class, and each test sample is assigned to the class whose density at that sample is highest.
In practice, this approach is not used very often.
I think its drawbacks are that it is computationally expensive and that its accuracy can degrade significantly depending on the data.
However, as this trial shows, it can classify some data sets quickly and accurately.
In this exercise, kernel density estimation achieved higher accuracy than KNN. It is difficult to say in general which method is more advantageous, since that depends on the type of data; for the Iris data set, kernel density estimation was found to be better.
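
For reference, a KNN baseline on the same split can be sketched as follows. The choice `n_neighbors=5` is an assumption made here for illustration (it is scikit-learn's default); the KNN settings behind the comparison above are not shown in this notebook, so the number printed may differ from the one that comparison was based on.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Same preprocessing as above: fit the scaler on the training data only.
scaler = StandardScaler().fit(X_train)

knn = KNeighborsClassifier(n_neighbors=5)  # k=5 is an assumed setting
knn.fit(scaler.transform(X_train), y_train)

knn_pred = knn.predict(scaler.transform(X_test))
knn_accuracy = accuracy_score(y_test, knn_pred)
print(knn_accuracy)
```

With only 45 test samples, a difference of one or two misclassified flowers already moves the accuracy by several percentage points, so a single split like this is a weak basis for ranking the two methods.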