## **k-Means-Based Anomaly Detection**

k-Means partitions the data into k clusters, each represented by a centroid. Points that lie unusually far from their nearest centroid fit no cluster well and are flagged as anomalies.
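
To make the scoring idea concrete, here is a minimal sketch that uses only NumPy. The toy points and the two centroids are hypothetical and fixed by hand for illustration; in the notebook the centroids are learned by `KMeans`.

```python
import numpy as np

# Hypothetical toy data: two tight groups plus one far-away point.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, 0.0],
])

# Assumed centroids, set by hand here (k-Means would normally learn these).
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Euclidean distance from every point to every centroid,
# giving an array of shape (n_points, n_centroids).
all_distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Anomaly score: distance to the nearest centroid.
scores = all_distances.min(axis=1)

# The point with the largest score is the strongest anomaly candidate.
most_anomalous = int(scores.argmax())
print("Scores:", np.round(scores, 2))
print("Most anomalous point:", points[most_anomalous])
```

The six clustered points score at most 0.1, while the isolated point at (10, 0) sits roughly 7.07 away from its nearest centroid, so it stands out clearly.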



**Imports**

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


**Data Loading**

In [None]:
# Load dataset
data = pd.read_csv('your_dataset.csv')

# Select features for anomaly detection
X = data[['feature1', 'feature2']]  # Replace with relevant feature columns


**Model Building and Predictions**

In [None]:
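# Apply k-Means clustering; n_init is set explicitly so the result
# does not depend on the default of the installed scikit-learn version
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)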
data['cluster'] = kmeans.fit_predict(X)

# Compute distances to the nearest centroid
distances = pairwise_distances_argmin_min(X, kmeans.cluster_centers_)[1]

# Flag points whose distance exceeds the 95th percentile (the farthest 5% of points)
threshold = np.percentile(distances, 95)
data['anomaly'] = (distances > threshold).astype(int)

# Anomalies are labeled as 1
anomalies = data[data['anomaly'] == 1]
print("Number of anomalies detected:", len(anomalies))
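
One property of the percentile rule used above is worth seeing in isolation: it flags a fixed share of the data whether or not real outliers exist. The sketch below assumes hypothetical, synthetic distance scores drawn from a Rayleigh distribution (chosen only because it produces smooth, non-negative values) and applies the same 95th-percentile cutoff.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical distance scores for 1,000 well-behaved points (no injected outliers).
distances = rng.rayleigh(scale=1.0, size=1000)

# Same rule as in the notebook: flag everything above the 95th percentile.
threshold = np.percentile(distances, 95)
flags = distances > threshold

# Share of points flagged as anomalies.
flagged_fraction = flags.mean()
print(f"Threshold: {threshold:.3f}")
print(f"Flagged fraction: {flagged_fraction:.3f}")
```

Because the cutoff is defined relative to the data, about 5% of points are flagged by construction. The percentile therefore behaves as an assumed contamination rate and should be tuned to the dataset instead of being read as evidence that 5% of the points are truly anomalous.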


**Visualizations**

In [None]:
# Plot regular points and anomalies as separate groups
normal = data[data['anomaly'] == 0]
plt.scatter(normal['feature1'], normal['feature2'], label='Normal', c='blue', s=20)
plt.scatter(anomalies['feature1'], anomalies['feature2'], label='Anomaly', c='red', s=20)
plt.title('k-Means-Based Anomaly Detection')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
