# K-Means Clustering: Using Original Feature Space

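This notebook demonstrates K-Means clustering **without standardizing** the features. Clusters are found and visualized using the original data values. Because K-Means relies on Euclidean distance, a feature with a larger numeric range has more influence on the resulting clusters.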
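
To illustrate why the choice of feature space matters, the sketch below compares Euclidean distances between three made-up points. The values are hypothetical and are not taken from `data.csv`; they only show how a large-range column can dominate the distance when features are left unscaled.

```python
import numpy as np

# Hypothetical points: column 0 spans thousands, column 1 spans tens
a = np.array([50_000.0, 25.0])
b = np.array([50_500.0, 65.0])  # far from a in column 1, close in column 0
c = np.array([52_000.0, 26.0])  # close to a in column 1, farther in column 0

dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

# In the original feature space the large-range column dominates,
# so a is treated as closer to b than to c
print(f"d(a, b) = {dist_ab:.1f}, d(a, c) = {dist_ac:.1f}")
```

If the features in a dataset already share a similar scale, this effect is small and clustering in the original space is a reasonable choice.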

## Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

%matplotlib inline

## Step 2: Load and Preview the Data

In [None]:
df = pd.read_csv("data.csv")
df.head()
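
Before clustering, it can help to confirm that the columns used later (`V1` and `V2`) are present and contain no missing values, since scikit-learn's `KMeans` raises an error on NaN input. The sketch below uses a small synthetic DataFrame named `demo` as a stand-in. Its values are made up, and whether `data.csv` actually contains missing values is not known.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data.csv (hypothetical values, one missing entry)
demo = pd.DataFrame({
    "V1": [1.0, 2.0, np.nan, 4.0],
    "V2": [10.0, 20.0, 30.0, 40.0],
})

features = ["V1", "V2"]

# Confirm the expected columns are present
missing_cols = [col for col in features if col not in demo.columns]
assert not missing_cols, f"Missing columns: {missing_cols}"

# Count missing values per feature, then drop incomplete rows
print(demo[features].isna().sum())
clean = demo.dropna(subset=features).reset_index(drop=True)
print(clean.shape)
```

Dropping rows is only one option. Imputing values may be preferable when many rows are incomplete.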

## Step 3: Visualize the Raw Data

In [None]:
plt.scatter(df.V1, df.V2, alpha=0.3, s=15)
plt.xlabel("V1")
plt.ylabel("V2")
plt.title("Initial Data Scatter Plot")
plt.show()

## Step 4: Determine Optimal Number of Clusters (K) using Elbow Method

In [None]:
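wcss = []  # within-cluster sum of squares for each K
K_range = range(1, 11)

for k in K_range:
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=15)
    kmeans.fit(df[['V1', 'V2']])
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model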

plt.plot(K_range, wcss, 'bo-')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Within-Cluster Sum of Squares (WCSS)')
plt.title('Elbow Method For Optimal K')
plt.show()
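
The quantity plotted on the y-axis is the sum of squared distances from each point to its assigned centroid, which scikit-learn exposes as `inertia_`. The sketch below recomputes that sum by hand and compares it with `inertia_`. The data is a synthetic stand-in (three hypothetical blobs), not the contents of `data.csv`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2-D blobs (hypothetical)
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([
    center + rng.normal(scale=1.0, size=(50, 2)) for center in blob_centers
])

model = KMeans(n_clusters=3, n_init=10, random_state=15).fit(X)

# WCSS: squared distance from each point to its assigned centroid, summed
assigned = model.cluster_centers_[model.labels_]
manual_wcss = ((X - assigned) ** 2).sum()

print(manual_wcss, model.inertia_)
```

The two printed values should agree up to floating-point error. WCSS cannot increase when K grows, which is why the elbow plot is read for a bend rather than a minimum.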

## Step 5: Apply KMeans with Optimal K

In [None]:
# Replace 3 with the K at the "elbow" of the plot above, where WCSS stops dropping sharply
optimal_k = 3
kmeans = KMeans(n_clusters=optimal_k, n_init=10, random_state=15)
df['Cluster'] = kmeans.fit_predict(df[['V1', 'V2']])
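
After assigning labels, a quick check of cluster sizes and per-cluster means can show whether the result looks sensible. The sketch below is one way to do that on a synthetic stand-in DataFrame named `demo` (hypothetical values, not `data.csv`). For a converged fit, the per-cluster means should closely match `cluster_centers_`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for df (hypothetical values): three separated blobs
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
points = np.vstack([
    c + rng.normal(scale=1.0, size=(40, 2)) for c in blob_centers
])
demo = pd.DataFrame(points, columns=["V1", "V2"])

model = KMeans(n_clusters=3, n_init=10, random_state=15)
demo["Cluster"] = model.fit_predict(demo[["V1", "V2"]])

# Number of points assigned to each cluster
sizes = demo["Cluster"].value_counts().sort_index()
print(sizes)

# Per-cluster feature means, ordered by label to line up with cluster_centers_
cluster_means = demo.groupby("Cluster")[["V1", "V2"]].mean().sort_index()
print(cluster_means)
print(model.cluster_centers_)
```

A cluster with very few points, or one whose centroid sits far from the visible groups in the scatter plot, can be a hint that a different K is worth trying.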

## Step 6: Visualize Clustered Data (in Original Space)

In [None]:
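# Color each point by its assigned cluster label
plt.scatter(df.V1, df.V2, c=df.Cluster, cmap='viridis', alpha=0.5)
# cluster_centers_ columns follow the fit order: column 0 is V1, column 1 is V2
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X', label='Centroids')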
plt.xlabel('V1')
plt.ylabel('V2')
plt.title(f'KMeans Clustering with K={optimal_k} (Original Feature Space)')
plt.legend()
plt.show()

## Conclusion
- This analysis was performed in the **original data space** without standardizing features.
- Visual inspection of the elbow plot guided the choice of K: the bend where WCSS stops dropping sharply marks a reasonable number of clusters.
- KMeans then grouped the data accordingly, and we visualized both points and cluster centers.