
# 📊 Unsupervised Learning with K-Means
**INACAP**

In this notebook we work through a practical example of **unsupervised learning** using the **K-Means** algorithm.  

The workflow is:
1. Load the data.
2. Preprocess (scaling with StandardScaler).
3. Train the K-Means model.
4. Evaluate with metrics (inertia and silhouette).
5. Visualize the clusters with PCA.
6. Export the results to a CSV.


In [None]:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


## 1️⃣ Load data

In [None]:

df = pd.read_csv("customer_segmentation.csv")
df.head()


## 2️⃣ Preprocess data (scaling)

In [None]:

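# Numeric columns used to describe each customer
features = ["purchase_frequency", "average_purchase", "loyalty_points", "months_active"]

X = df[features].values

# Standardize each feature to mean 0 and unit variance so that no single
# feature dominates the Euclidean distances K-Means relies on
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)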
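
To see what the scaling step does, here is a minimal sketch on made-up numbers. The array `X_demo` is hypothetical and not taken from `customer_segmentation.csv`. It only illustrates that `StandardScaler` brings columns with very different ranges to mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_demo = np.array([[1.0, 1000.0],
                   [2.0, 3000.0],
                   [3.0, 5000.0]])

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# After scaling, each column has mean 0 and (population) standard deviation 1
col_means = X_demo_scaled.mean(axis=0)
col_stds = X_demo_scaled.std(axis=0)
print("Column means:", col_means)
print("Column stds:", col_stds)
```

Without this step, the feature with the largest numeric range would contribute most of the distance between two customers.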


## 3️⃣ Train the K-Means model

In [None]:

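k = 3  # number of clusters
# random_state fixes the centroid initialization so the run is reproducible
kmeans = KMeans(n_clusters=k, random_state=42)
# Fit the model and get the cluster index assigned to each row
labels = kmeans.fit_predict(X_scaled)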
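
The notebook fixes `k = 3`. One common way to sanity-check such a choice is to fit several values of k and compare inertia and silhouette for each. The sketch below does that on synthetic data from `make_blobs`, which stands in for the scaled customer features. The data, the candidate range 2 to 6, and the explicit `n_init=10` are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features (assumption)
X_blobs, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)
X_blobs_scaled = StandardScaler().fit_transform(X_blobs)

scan = {}
for k_candidate in range(2, 7):
    model = KMeans(n_clusters=k_candidate, n_init=10, random_state=42)
    candidate_labels = model.fit_predict(X_blobs_scaled)
    scan[k_candidate] = (
        model.inertia_,
        silhouette_score(X_blobs_scaled, candidate_labels),
    )

for k_candidate, (inertia_k, sil_k) in scan.items():
    print(f"k={k_candidate}: inertia={inertia_k:.2f}, silhouette={sil_k:.3f}")

# Candidate with the highest silhouette score
best_k = max(scan, key=lambda kk: scan[kk][1])
print("Highest silhouette at k =", best_k)
```

Inertia tends to keep falling as k grows, so it is usually read by looking for an "elbow", while silhouette can be compared directly across values of k.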


## 4️⃣ Evaluate the model

In [None]:

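inertia = kmeans.inertia_  # sum of squared distances to the nearest centroid
sil_score = silhouette_score(X_scaled, labels)  # in [-1, 1]; higher is better
sizes = np.bincount(labels, minlength=k)  # points per cluster
centroids = kmeans.cluster_centers_  # in standardized units

print(f"Inertia: {inertia:.2f}")
print(f"Silhouette Score: {sil_score:.2f}")
print("Cluster sizes:", sizes)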
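
To make the inertia value less of a black box, the following sketch recomputes it by hand as the sum of squared distances from each point to its assigned centroid, then compares it with `inertia_`. The data comes from `make_blobs` and the names `X_toy` and `km` are hypothetical. It illustrates the definition and is not a replacement for the cell above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data used only to illustrate the definition (assumption)
X_toy, _ = make_blobs(n_samples=200, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_toy)

# Look up the centroid assigned to each point, then sum the squared distances
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = float(((X_toy - assigned_centers) ** 2).sum())

print(f"Manual inertia:  {manual_inertia:.4f}")
print(f"sklearn inertia: {km.inertia_:.4f}")
```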


## 5️⃣ Visualization with PCA

In [None]:

# Project the scaled data and the centroids onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
centroids_pca = pca.transform(centroids)

plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=labels, palette="Set2", s=60, alpha=0.8)
plt.scatter(centroids_pca[:, 0], centroids_pca[:, 1], c="red", marker="X", s=200, label="Centroids")
plt.title(f"K-Means Clustering (k={k})\nSilhouette={sil_score:.2f} | Inertia={inertia:.2f}")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.show()
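
A 2D PCA plot shows only part of the structure in four-dimensional data, so it can help to check how much variance the two components retain. The sketch below reads `explained_variance_ratio_` from a fitted `PCA`. It uses synthetic data from `make_blobs` as a stand-in for the customer features, and the names `X_syn` and `pca_check` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic four-feature data standing in for the customer features (assumption)
X_syn, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)
X_syn_scaled = StandardScaler().fit_transform(X_syn)

pca_check = PCA(n_components=2).fit(X_syn_scaled)

# Fraction of the total variance captured by each component
ratios = pca_check.explained_variance_ratio_
retained = float(ratios.sum())
print(f"PC1: {ratios[0]:.1%}, PC2: {ratios[1]:.1%}, total: {retained:.1%}")
```

If the retained fraction is low, clusters that overlap in the plot may still be well separated in the full feature space.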


## 6️⃣ Export results to CSV

In [None]:

# One row per value: centroid coordinates (standardized units) and size of each cluster
rows = []
for i in range(k):
    for j, feature in enumerate(features):
        rows.append([f"cluster{i}_center_{feature}", round(centroids[i, j], 2)])
    rows.append([f"cluster{i}_size", float(sizes[i])])

rows.append(["inertia", round(inertia, 2)])
rows.append(["silhouette_score", round(sil_score, 2)])

df_out = pd.DataFrame(rows, columns=["ID", "value"])
df_out.to_csv("submission.csv", index=False)
print("✅ File 'submission.csv' generated successfully.")
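
The centroids written to the CSV live in standardized space, because the model was trained on the scaled features. For interpretation it can be handy to map them back to the original units with the scaler's `inverse_transform`. The sketch below shows the idea on hypothetical random data. `X_raw`, `scaler_demo` and `km_demo` are illustrative names, and this conversion is an optional extra rather than part of the export above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: two features measured in different units
rng = np.random.default_rng(0)
X_raw = np.column_stack([rng.normal(5, 2, 150), rng.normal(200, 50, 150)])

scaler_demo = StandardScaler()
X_std = scaler_demo.fit_transform(X_raw)
km_demo = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_std)

# Map the centroids from standardized space back to the original units
centers_original_units = scaler_demo.inverse_transform(km_demo.cluster_centers_)
print(np.round(centers_original_units, 2))
```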
