# Fuzzy Clustering with Traditional ML

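This notebook segments customers with fuzzy c-means (FCM), a traditional (non-neural) machine learning approach to fuzzy clustering in which each customer receives a degree of membership in every cluster rather than a single hard label.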

## Objectives
1. Load processed customer data
2. Apply fuzzy c-means clustering
3. Evaluate clustering quality
4. Interpret and visualize customer segments

## Setup

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import skfuzzy as fuzz  # scikit-fuzzy; provides fuzz.cluster.cmeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from pathlib import Path

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Libraries imported successfully!")

## 1. Load Processed Data

In [None]:
# Load processed data
DATA_DIR = Path('../data/processed')

# TODO: Load your processed customer data
# df = pd.read_csv(DATA_DIR / 'customers_cleaned.csv')
# print(f"Loaded {len(df)} customer records")

## 2. Data Preprocessing

Prepare features for fuzzy clustering.
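
Fuzzy c-means compares customers with Euclidean distance, so a feature on a large scale (for example, yearly spend in currency units) would otherwise dominate one on a small scale (for example, number of visits). The sketch below uses synthetic data with hypothetical column names (`annual_spend`, `visits`) to show that `StandardScaler` produces the same result as a manual z-score: each column ends up with zero mean and unit variance.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer data; column names are hypothetical.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "annual_spend": rng.normal(5000, 1500, size=200),
    "visits": rng.poisson(12, size=200).astype(float),
})

X_demo = demo_df.values
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Manual z-score; StandardScaler uses the population standard deviation (ddof=0).
X_demo_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(X_demo_scaled.mean(axis=0).round(6))  # each column mean is ~0
print(X_demo_scaled.std(axis=0).round(6))   # each column std is ~1
```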

In [None]:
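# TODO: Select features and standardize them
# features = ['feature1', 'feature2', 'feature3']  # Replace with relevant numeric columns
# X = df[features].values

# Standardize features to zero mean and unit variance so that no single
# feature dominates the Euclidean distances used by fuzzy c-means
# scaler = StandardScaler()
# X_scaled = scaler.fit_transform(X)
# print("Features standardized")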

## 3. Fuzzy C-Means Clustering

Apply the fuzzy c-means algorithm to identify customer segments.
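
For reference, the standard fuzzy c-means formulation minimizes a membership-weighted within-cluster sum of squared distances. With $N$ customers $x_j$, $c$ cluster centers $v_i$, memberships $u_{ij} \in [0, 1]$ and fuzziness exponent $m > 1$ (the `m` parameter below):

$$
J_m = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \, \lVert x_j - v_i \rVert^2,
\qquad \text{subject to } \sum_{i=1}^{c} u_{ij} = 1 \ \text{for every } j.
$$

The algorithm alternates the two updates below until the change in memberships falls below `error` or `maxiter` iterations are reached:

$$
v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} \, x_j}{\sum_{j=1}^{N} u_{ij}^{m}},
\qquad
u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}.
$$

The fuzzy partition coefficient reported after fitting is

$$
\mathrm{FPC} = \frac{1}{N} \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{2},
$$

which equals $1/c$ when every membership is $1/c$ and equals $1$ for a crisp partition.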

In [None]:
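# TODO: Apply Fuzzy C-Means clustering
# n_clusters = 5  # Adjust based on domain knowledge and the FPC
# m = 2  # Fuzziness exponent (must be > 1); larger values give softer memberships

# skfuzzy expects data with shape (n_features, n_samples), hence the transpose
# cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
#     X_scaled.T,
#     n_clusters,
#     m,
#     error=0.005,   # stop when the change in memberships falls below this
#     maxiter=1000,
#     init=None,     # random initial memberships
#     seed=42        # fix the random initialization for reproducibility
# )

# print(f"Fuzzy Partition Coefficient (FPC): {fpc:.3f}")
# print(f"Note: FPC ranges from 1/c = {1 / n_clusters:.3f} (maximally fuzzy) to 1 (crisp partition)")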
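
To make the update rules concrete, here is a minimal from-scratch NumPy sketch of fuzzy c-means. It is not scikit-fuzzy's code, and the helper name `fuzzy_cmeans` is hypothetical. Unlike `cmeans`, which takes `X_scaled.T`, it takes samples as rows, but it returns memberships `u` with shape `(c, n_samples)` so that `np.argmax(u, axis=0)` works as in the next section. It stops when the norm of the change in `u` drops below `error` and reports the FPC. The usage runs it on synthetic blobs for several values of `c`; since FPC tends to fall as `c` grows, it is better used to compare candidate cluster counts alongside domain knowledge than maximized blindly.

```python
import numpy as np


def fuzzy_cmeans(X, c, m=2.0, error=0.005, maxiter=1000, seed=None):
    """Minimal fuzzy c-means sketch (hypothetical helper, not skfuzzy's code).

    X has shape (n_samples, n_features). Returns (centers, u, fpc) where
    centers is (c, n_features) and u is (c, n_samples).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each customer's column sums to 1.
    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)

    for _ in range(maxiter):
        um = u ** m
        # Center update: membership-weighted mean of the samples.
        centers = (um @ X) / um.sum(axis=1, keepdims=True)
        # Euclidean distance from every center to every sample, shape (c, n).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, np.finfo(float).eps)  # guard against division by zero
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.linalg.norm(u_new - u) < error
        u = u_new
        if converged:
            break

    # Fuzzy partition coefficient: mean over samples of sum_i u_ij ** 2.
    fpc = np.sum(u ** 2) / n
    return centers, u, fpc


# Usage: three well-separated synthetic blobs in 2-D.
rng_demo = np.random.default_rng(42)
blobs = np.vstack([rng_demo.normal(loc, 0.3, size=(100, 2))
                   for loc in ([0, 0], [3, 3], [0, 3])])

for c in range(2, 6):
    _, _, fpc_c = fuzzy_cmeans(blobs, c, seed=0)
    print(f"c={c}: FPC={fpc_c:.3f}")

centers_demo, u_demo, fpc_demo = fuzzy_cmeans(blobs, 3, seed=0)
```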

## 4. Cluster Assignment

Assign each customer a hard label, the cluster in which it has the highest membership, and keep the full membership degrees as extra columns.
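
Taking the argmax discards how confident each assignment is. A common complement is the gap between each customer's highest and second-highest membership: a small gap marks a customer who sits between segments. The sketch below uses a small hand-written membership matrix in the `(n_clusters, n_customers)` layout that `cmeans` returns; the `0.2` threshold is an arbitrary illustrative value, not a recommendation. It also computes soft cluster sizes (total membership per cluster), which complement the hard counts from `value_counts()`.

```python
import numpy as np
import pandas as pd

# Hand-made memberships for 3 clusters x 4 customers; each column sums to 1.
u_demo = np.array([
    [0.90, 0.40, 0.10, 0.34],
    [0.05, 0.45, 0.80, 0.33],
    [0.05, 0.15, 0.10, 0.33],
])

labels_demo = np.argmax(u_demo, axis=0)        # hard assignment (defuzzification)
top_two = np.sort(u_demo, axis=0)[-2:, :]      # second-highest and highest per customer
margin_demo = top_two[1] - top_two[0]          # gap between the best two clusters

AMBIGUITY_THRESHOLD = 0.2                      # hypothetical cut-off for illustration
summary_demo = pd.DataFrame({
    "cluster": labels_demo,
    "max_membership": u_demo.max(axis=0),
    "margin": margin_demo,
    "ambiguous": margin_demo < AMBIGUITY_THRESHOLD,
})
print(summary_demo)

# Soft cluster sizes: total membership per cluster (sums to the number of customers).
print(u_demo.sum(axis=1))
```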

In [None]:
# TODO: Assign customers to clusters
# cluster_labels = np.argmax(u, axis=0)
# df['cluster'] = cluster_labels

# Add membership degrees for each cluster
# for i in range(n_clusters):
#     df[f'membership_cluster_{i}'] = u[i]

# print(df['cluster'].value_counts())

## 5. Cluster Visualization

Visualize clusters in 2D space using PCA.

In [None]:
# TODO: Visualize clusters
# pca = PCA(n_components=2)
# X_pca = pca.fit_transform(X_scaled)

# plt.figure(figsize=(12, 8))
# scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels, 
#                       cmap='viridis', alpha=0.6, s=50)
# plt.colorbar(scatter)
# plt.xlabel('First Principal Component')
# plt.ylabel('Second Principal Component')
# plt.title('Customer Segments (Fuzzy C-Means)')
# plt.tight_layout()
# plt.show()
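
The scatter above colors each point by its hard label only. One option is to scale each point's opacity by its maximum membership, so that customers between segments fade, and to project the cluster centers into the same PCA space with `pca.transform`. The sketch below uses synthetic stand-ins for `X_scaled`, `cntr` and `u` with the shapes `cmeans` returns (`cntr` is `(n_clusters, n_features)`, `u` is `(n_clusters, n_samples)`); the memberships come from the FCM membership formula with fixed centers and `m = 2` rather than from a full FCM run.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng_demo = np.random.default_rng(1)
k_demo, n_features_demo = 3, 4
# Synthetic stand-ins with the shapes cmeans returns: centers are (c, n_features).
cntr_demo = rng_demo.normal(0, 3, size=(k_demo, n_features_demo))
X_demo = np.vstack([c + rng_demo.normal(0, 1, size=(80, n_features_demo))
                    for c in cntr_demo])

# Memberships for these fixed centers via the FCM update with m = 2,
# i.e. u_ij proportional to 1 / d_ij ** 2; shape (c, n_samples).
d_demo = np.linalg.norm(X_demo[None, :, :] - cntr_demo[:, None, :], axis=2)
inv_demo = 1.0 / np.fmax(d_demo, 1e-12) ** 2
u_demo = inv_demo / inv_demo.sum(axis=0, keepdims=True)
labels_demo = np.argmax(u_demo, axis=0)

pca_demo = PCA(n_components=2)
X_pca_demo = pca_demo.fit_transform(X_demo)
centers_pca_demo = pca_demo.transform(cntr_demo)  # centers in the same 2-D space

# One RGBA color per point: hue from the hard label, opacity from max membership.
colors = plt.cm.viridis(labels_demo / max(k_demo - 1, 1))
colors[:, 3] = u_demo.max(axis=0)

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(X_pca_demo[:, 0], X_pca_demo[:, 1], c=colors, s=50)
ax.scatter(centers_pca_demo[:, 0], centers_pca_demo[:, 1], c="red",
           marker="X", s=200, label="Cluster centers")
ax.set_xlabel("First Principal Component")
ax.set_ylabel("Second Principal Component")
ax.set_title("Customer Segments (opacity = maximum membership)")
ax.legend()
fig.tight_layout()
```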

## 6. Cluster Profiling

Analyze characteristics of each customer segment.

In [None]:
# TODO: Profile clusters
# cluster_profiles = df.groupby('cluster')[features].mean()
# print("\nCluster Profiles:")
# print(cluster_profiles)
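
`groupby('cluster').mean()` averages only the customers whose highest membership falls in each cluster. A fuzzy alternative weights every customer by $u_{ij}^m$, which is the same weighting used in the FCM center update. Because standardization is an affine per-feature transform, these weighted means computed in original units equal the standardized-space centers mapped back with `scaler.inverse_transform`, so for a fitted model they should match `scaler.inverse_transform(cntr)` up to the convergence tolerance. The sketch below checks that identity on synthetic data; the column names and the Dirichlet-drawn memberships are hypothetical stand-ins, not FCM results.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng_demo = np.random.default_rng(7)
demo_features = ["annual_spend", "visits", "basket_size"]  # hypothetical columns
demo_df = pd.DataFrame(rng_demo.normal([5000, 12, 40], [1500, 4, 10], size=(150, 3)),
                       columns=demo_features)
m_demo, k_demo = 2.0, 3
# Stand-in memberships (rows = clusters, each column sums to 1), not FCM output.
u_demo = rng_demo.dirichlet(np.ones(k_demo), size=len(demo_df)).T

# Membership-weighted profile: each cluster's mean, weighted by u_ij ** m.
w_demo = u_demo ** m_demo
fuzzy_profiles = pd.DataFrame(
    (w_demo @ demo_df[demo_features].values) / w_demo.sum(axis=1, keepdims=True),
    columns=demo_features,
)
fuzzy_profiles.index.name = "cluster"
print(fuzzy_profiles.round(2))

# Same weighted centers computed in standardized space, then mapped back.
scaler_demo = StandardScaler()
X_demo_scaled = scaler_demo.fit_transform(demo_df[demo_features].values)
centers_scaled_demo = (w_demo @ X_demo_scaled) / w_demo.sum(axis=1, keepdims=True)
centers_original_demo = scaler_demo.inverse_transform(centers_scaled_demo)
```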

## 7. Save Results

Save clustering results for further analysis.

In [None]:
# TODO: Save results
# OUTPUT_DIR = Path('../data/processed')
# df.to_csv(OUTPUT_DIR / 'customers_with_clusters_fcm.csv', index=False)
# print("Clustering results saved!")

## Next Steps

1. Compare results with neural network approach in `03_fuzzy_clustering_neural_network.ipynb`
2. Validate clusters with domain experts
3. Implement customer targeting strategies based on segments