K-MEANS CLUSTERING
In this implementation:

- X is the input data, a 2-D array with one row per data point.
- k is the number of clusters.
- max_iters is the maximum number of iterations (optional, default 100).

The function iteratively assigns each data point to its nearest centroid and then moves each centroid to the mean of its assigned points. The process repeats until convergence (the centroids stop changing) or until the maximum number of iterations is reached.

Note: This is a basic implementation and may not be suitable for large datasets. In practice, more sophisticated versions are typically used, for example with smarter centroid initialization (such as k-means++) or cheaper update steps (such as mini-batch k-means).
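To illustrate what an initialization optimization can look like, here is a minimal sketch of k-means++ style seeding. The function name kmeans_pp_init and its rng parameter are hypothetical and are not part of the implementation in this notebook. The idea is to pick the first centroid uniformly at random and each later one with probability proportional to its squared distance from the nearest centroid already chosen.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Pick k initial centroids with k-means++ style seeding (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n_samples = X.shape[0]

    # First centroid: a data point chosen uniformly at random
    centroids = [X[rng.integers(n_samples)]]

    for _ in range(1, k):
        # Squared distance from every point to its closest already-chosen centroid
        diffs = X[:, np.newaxis] - np.array(centroids)
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)

        # Sample the next centroid with probability proportional to d2;
        # fall back to uniform sampling if all remaining distances are zero
        total = d2.sum()
        probs = d2 / total if total > 0 else None
        centroids.append(X[rng.choice(n_samples, p=probs)])

    return np.array(centroids)
```

The returned array could replace the random starting centroids in a k-means loop. Points that are already centroids have distance zero, so they are not picked again as long as enough distinct points remain.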

In [69]:
import numpy as np

def k_means(X, k, max_iters=100):
    # Randomly pick k distinct data points as the initial centroids
    centroids = X[np.random.choice(X.shape[0], k, replace=False)]

    for _ in range(max_iters):
        # Assign each data point to its nearest centroid (Euclidean distance)
        labels = np.argmin(np.linalg.norm(X[:, np.newaxis] - centroids, axis=2), axis=1)

        # Move each centroid to the mean of its assigned points;
        # keep the previous centroid if a cluster ends up empty (avoids NaN means)
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])

        # Stop early once the centroids no longer move (convergence)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    # Sort the points and their labels by cluster index
    sorted_indices = np.argsort(labels)
    sorted_labels = labels[sorted_indices]
    sorted_X = X[sorted_indices]

    # Create a dictionary with cluster indices as keys and arrays of points as values
    cluster_dict = {cluster_index: sorted_X[sorted_labels == cluster_index] for cluster_index in range(k)}

    return centroids, cluster_dict

# Example usage:
# Replace X_data and k_value with your own data and desired number of clusters (k)
X_data = np.array([[1, 2], [2, 3], [8, 9], [9, 10], [15, 16], [16, 17]])
k_value = 4

# Perform k-means clustering
centroids, cluster_result = k_means(X_data, k_value)

# Display the results
print("Final Centroids:\n", centroids)

for cluster_index, points in cluster_result.items():
    print(f"\nCluster {cluster_index + 1}:\n{points}")




Final Centroids:
 [[ 1.   2. ]
 [ 2.   3. ]
 [15.5 16.5]
 [ 8.5  9.5]]

Cluster 1:
[[1 2]]

Cluster 2:
[[2 3]]

Cluster 3:
[[15 16]
 [16 17]]

Cluster 4:
[[ 8  9]
 [ 9 10]]


REAL DATA APPLICATIONS

In [70]:

# Load the CSV as strings with np.loadtxt(); the header row is kept as row 0
Data = np.loadtxt("Mall_Customers.csv",
                 delimiter=",", dtype=str)

display(Data)
print(Data.ndim)
print(Data.shape)
print(Data.size)

array([['CustomerID', 'Genre', 'Age', 'Annual Income (k$)',
        'Spending Score (1-100)'],
       ['0001', 'Male', '19', '15', '39'],
       ['0002', 'Male', '21', '15', '81'],
       ...,
       ['0198', 'Male', '32', '126', '74'],
       ['0199', 'Male', '32', '137', '18'],
       ['0200', 'Male', '30', '137', '83']], dtype='<U22')

2
(201, 5)
1005


In [71]:
# Process data: drop the header row (row 0)
X = np.delete(Data, 0, axis=0)
# Optionally shuffle the rows:
# X = np.random.permutation(X)

In [72]:
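# Select the feature columns: 3 = Annual Income (k$), 4 = Spending Score (1-100)
X = X[:, [3, 4]]
# Convert the strings to floats (np.float_ was removed in NumPy 2.0; np.float64 is equivalent)
X_processed = X.astype(np.float64)
#X_processed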

In [73]:
# Perform k-means clustering
k_value = 5
# k_means returns the centroids and a dict mapping each cluster index to its points
centroids, cluster_points = k_means(X_processed, k_value)

# Display the results
print("Final Centroids:", centroids)
print("Cluster Labels:", cluster_points)

Final Centroids: [[86.53846154 82.12820513]
 [88.2        17.11428571]
 [26.30434783 20.91304348]
 [55.2962963  49.51851852]
 [25.72727273 79.36363636]]
Cluster Labels: {0: array([[137.,  83.],
       [ 78.,  88.],
       [ 87.,  92.],
       [ 87.,  75.],
       [ 74.,  72.],
       [ 87.,  63.],
       [ 86.,  95.],
       [ 75.,  93.],
       [ 85.,  75.],
       [ 88.,  86.],
       [ 76.,  87.],
       [ 69.,  91.],
       [ 79.,  83.],
       [ 78.,  73.],
       [ 77.,  97.],
       [ 78.,  78.],
       [ 78.,  89.],
       [ 77.,  74.],
       [ 78.,  76.],
       [ 81.,  93.],
       [ 88.,  69.],
       [ 73.,  73.],
       [103.,  85.],
       [126.,  74.],
       [120.,  79.],
       [ 70.,  77.],
       [113.,  91.],
       [ 71.,  95.],
       [103.,  69.],
       [ 73.,  88.],
       [ 71.,  75.],
       [101.,  68.],
       [ 78.,  90.],
       [ 99.,  97.],
       [ 71.,  75.],
       [ 98.,  88.],
       [ 72.,  71.],
       [ 97.,  86.],
       [ 93.,  90.]]), 1: arr

We can enhance our k-means by computing the within-cluster sum of squares (WCSS) for several values of k and using the elbow method to find the optimal number of clusters.
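A minimal sketch of how this could look is given here. It assumes a clustering function with the same return format as k_means above (centroids plus a dict mapping each cluster index to its points). The names wcss, elbow_curve, cluster_fn and n_restarts are hypothetical helpers introduced only for illustration. WCSS is the sum of squared distances from each point to the centroid of its cluster. Because the initialization is random, the sketch keeps the lowest WCSS over a few restarts for each k.

```python
import numpy as np

def wcss(centroids, clusters):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(
        float(np.sum((points - centroids[i]) ** 2))
        for i, points in clusters.items()
        if len(points) > 0  # skip empty clusters
    )

def elbow_curve(X, k_values, cluster_fn, n_restarts=5):
    """For each k, run cluster_fn several times and keep the lowest WCSS."""
    curve = {}
    for k in k_values:
        curve[k] = min(wcss(*cluster_fn(X, k)) for _ in range(n_restarts))
    return curve

# Small hand-made check: two clusters of two points each
demo_centroids = np.array([[1.5, 2.5], [8.5, 9.5]])
demo_clusters = {0: np.array([[1, 2], [2, 3]]), 1: np.array([[8, 9], [9, 10]])}
print(wcss(demo_centroids, demo_clusters))

# Possible usage with the notebook's own function and data:
# curve = elbow_curve(X_processed, range(1, 11), k_means)
# for k, value in curve.items():
#     print(k, value)
```

WCSS always decreases as k grows, so the value to look for is the "elbow": the k after which adding another cluster reduces WCSS only slightly. Plotting the curve usually makes that point easier to see.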