# K-Means Clustering

In this notebook, we'll use K-Means clustering to group the observations in the seeds dataset into clusters based on the similarity of their features.

In [None]:
import pandas as pd

# Load the seeds dataset
data = pd.read_csv('Data/seeds.csv')

# Display a random sample of 10 observations (just the features)
features = data[data.columns[0:6]]
features.sample(10)

## Training the K-Means Model
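We'll use scikit-learn's **KMeans** class to group the seeds into 3 clusters. The model below uses `k-means++` to choose the starting centroids, runs the algorithm 10 times from different starting centroids (`n_init=10`) and keeps the best result, and caps each run at 100 iterations (`max_iter=100`).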

In [None]:
from sklearn.cluster import KMeans

# Create a model based on 3 clusters
model = KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=100)
# Fit to the data and predict the cluster assignments
km_clusters = model.fit_predict(features.values)
# View the cluster assignments
km_clusters
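
To make the fitting step more concrete, here is a minimal sketch of the assign-and-update loop at the heart of K-Means (Lloyd's algorithm). It is not scikit-learn's implementation, which adds `k-means++` initialization and multiple restarts. The toy points, the starting centroids, and the names `toy_kmeans`, `assign`, and `update` are made up for illustration.

```python
# Minimal sketch of the K-Means (Lloyd's) iteration on toy 2-D points.
# The data and the starting centroids are invented for illustration only.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    # Label each point with the index of its nearest centroid
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    # Move each centroid to the mean of the points assigned to it
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave the centroid of an empty cluster in place
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids

def toy_kmeans(points, centroids, max_iter=100):
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

toy_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
              (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
              (1.0, 9.0), (0.5, 8.0), (1.5, 8.5)]
start = [toy_points[0], toy_points[3], toy_points[6]]  # one hand-picked start per group

toy_labels, toy_centroids = toy_kmeans(toy_points, start)
print(toy_labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The cluster numbers themselves are arbitrary: they depend only on the order of the starting centroids, which is also why the labels returned by `fit_predict` can differ between runs.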

## Visualizing the Clusters
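The data has six features, so we can't plot it directly. We'll use Principal Component Analysis (PCA) to project the features onto two dimensions, then plot each observation colored by its cluster assignment.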

In [None]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
%matplotlib inline

# Fit PCA on the features, then project them onto the first two principal components
pca = PCA(n_components=2).fit(features)
features_2d = pca.transform(features)

def plot_clusters(samples, clusters):
    # samples: 2-D NumPy array of points; clusters: NumPy array of cluster labels
    col_dic = {0:'blue',1:'green',2:'orange'}
    mrk_dic = {0:'*',1:'x',2:'+'}
    # Draw each cluster with a single scatter call instead of one call per point
    for cluster in col_dic:
        points = samples[clusters == cluster]
        plt.scatter(points[:, 0], points[:, 1], color=col_dic[cluster], marker=mrk_dic[cluster], s=100)
    plt.xlabel('Dimension 1')
    plt.ylabel('Dimension 2')
    plt.title('Assignments')
    plt.show()

plot_clusters(features_2d, km_clusters)
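
A two-dimensional plot is only a fair picture of the clusters if the two principal components retain most of the variance in the original features. One way to check this, sketched here on synthetic data, is the `explained_variance_ratio_` attribute of a fitted `PCA` object. The names `demo_features` and `demo_pca` and the generated data are assumptions for illustration; in this notebook, the same attribute could be read from `pca`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a feature matrix (assumption: 200 rows, 6 columns).
# The six columns are noisy mixtures of two underlying variables.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 6))
demo_features = base @ mix + 0.05 * rng.normal(size=(200, 6))

demo_pca = PCA(n_components=2).fit(demo_features)

# Fraction of the total variance captured by each of the two components
ratios = demo_pca.explained_variance_ratio_
print(ratios)
print('Total variance retained:', ratios.sum())
```

A total close to 1 suggests the 2-D plot loses little information; a much lower total means that clusters which overlap in the plot may still be well separated in the full feature space.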