# k-means clustering

## What is k-means?<a name="description"></a>

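- Partitions the data into k clusters according to each sample's distance from the cluster centroids (a centroid is the mean of the training samples in its cluster)
- The number of clusters k must be chosen by the user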
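
The two points above can be sketched as a plain Lloyd-style iteration in NumPy. This is only an illustrative sketch, not scikit-learn's implementation: the helper name `lloyd_kmeans`, its parameters, the random initialization (instead of k-means++), and the toy data are all assumptions made for this example.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Return, for every sample, the index of its closest centroid."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct samples picked at random (not k-means++).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: attach every sample to its nearest centroid.
        labels = nearest_centroid(points, centroids)
        # Update step: move each centroid to the mean of its samples;
        # a cluster that lost all samples keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final labels are always computed against the returned centroids.
    return centroids, nearest_centroid(points, centroids)


# Toy data (made up for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(1)
toy = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
])
centers, assignments = lloyd_kmeans(toy, k=2)
print(centers)
print(np.bincount(assignments))
```

The result depends on the starting centroids, which is one reason library implementations typically run several initializations and keep the best one.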

## Usage<a name="example"></a>

### Data preparation<a name="data"></a>

In [None]:
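from sklearn.datasets import make_blobs

# Number of clusters; also used as the number of blob centers to generate
k = 4

# Generate 2-D sample points (make_blobs defaults: 100 samples, 2 features);
# the true labels are discarded because clustering does not use them
data, _ = make_blobs(centers=k, random_state=6)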

### Training<a name="training"></a>

In [None]:
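import numpy as np
from sklearn.cluster import KMeans

# Seed NumPy's global random state so the k-means++ initialization is reproducible
# (KMeans uses the global state when random_state is not given)
np.random.seed(0)

# n_init=10 runs the algorithm from 10 initializations and keeps the lowest-inertia run
kmeans = KMeans(init='k-means++', n_clusters=k, n_init=10)
kmeans.fit(data)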
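
After fitting, the estimator exposes the learned clustering through a few attributes. The sketch below repeats the setup so that it runs on its own; passing `random_state=0` to `KMeans` is an assumption made here for reproducibility and replaces the `np.random.seed(0)` call used in the cell above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

k = 4
data, _ = make_blobs(centers=k, random_state=6)

# Assumption: random_state=0 is set on the estimator instead of seeding NumPy globally.
kmeans = KMeans(init='k-means++', n_clusters=k, n_init=10, random_state=0)
kmeans.fit(data)

# Cluster index (0 .. k-1) assigned to each of the first ten training samples
print(kmeans.labels_[:10])
# One centroid per row, in the same feature space as the data
print(kmeans.cluster_centers_)
# Sum of squared distances from each sample to its nearest centroid
print(kmeans.inertia_)
```

The inertia value is the objective that k-means minimizes, so it is the quantity compared across the `n_init` runs.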

### Visualization<a name="visualization"></a>

In [None]:
import matplotlib.pyplot as plt

margin = .5        # padding around the data range
resolution = 300   # number of grid points per axis

plt.figure(figsize=(4, 4))

# Build a grid that covers the data range plus the margin
x_min, x_max = data[:, 0].min() - margin, data[:, 0].max() + margin
y_min, y_max = data[:, 1].min() - margin, data[:, 1].max() + margin
xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution), np.linspace(y_min, y_max, resolution))

# Predict the cluster of every grid point to obtain the decision regions
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Color each region by its predicted cluster
plt.pcolormesh(xx, yy, Z, cmap='Paired')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
# Hide the axis ticks
plt.xticks(())
plt.yticks(())

# Plot the training samples as black dots
plt.scatter(data[:, 0], data[:, 1], marker='.', color='black')

# Mark the learned centroids with white crosses
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=150, linewidth=3, color='white')

plt.show()