# Deep Embedded Clustering for VPCF Image Analysis

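This notebook clusters VPCF images with two deep clustering models, IDEC and DEC, to identify key structural features in ferroelectric materials. Both models are pre-trained as autoencoders and then fine-tuned for clustering, so their cluster assignments can be compared.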

## 1. Setup

In [None]:
!pip install numpy matplotlib scikit-learn tensorflow

In [None]:
import os
import sys

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import SGD

# Make the local modules in ../src importable.
sys.path.append('../src')
from IDEC import IDEC
from DEC import DEC
import metrics

# Output directories for each model's results.
os.makedirs('results/idec', exist_ok=True)
os.makedirs('results/dec', exist_ok=True)

## 2. Load Data

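The cell below reads every `.json` file in `../data/saved_vpcfs` and collects the `$array` entries into one list. Change `data_dir` to point at your own VPCF image data. The result must be a numpy array of shape `(n_samples, n_features)`, where `n_features` is the flattened size of your images.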

In [None]:
import json
import os

data_dir = '../data/saved_vpcfs'
vpcfs = []
# Sort filenames so the sample order is reproducible across runs.
for filename in sorted(os.listdir(data_dir)):
    if filename.endswith('.json'):
        filepath = os.path.join(data_dir, filename)
        with open(filepath, 'r') as f:
            data = json.load(f)
        # Each top-level key stores its samples under '$array'.
        for key in data:
            vpcfs.extend(data[key]['$array'])

x = np.array(vpcfs)
print('Data shape:', x.shape)
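
The loader above only yields the required `(n_samples, n_features)` shape if each `$array` entry is already a flat feature vector. If the images are stored as 2-D arrays instead, they need to be flattened first. The following is a minimal sketch of that step. `flatten_images` is a hypothetical helper, not part of `../src`, and the data is synthetic rather than real VPCFs.

```python
import numpy as np

def flatten_images(images):
    """Reshape a stack of images to (n_samples, n_features).

    Hypothetical helper for illustration; not part of the notebook's src package.
    """
    arr = np.asarray(images, dtype=np.float32)
    if arr.ndim < 2:
        raise ValueError("expected at least 2 dimensions: (n_samples, ...)")
    # Keep the sample axis and merge all remaining axes into one feature axis.
    return arr.reshape(arr.shape[0], -1)

# Synthetic stand-in for a stack of 8 images of size 16 x 16.
demo_images = np.random.default_rng(0).random((8, 16, 16))
demo_x = flatten_images(demo_images)
print('Flattened shape:', demo_x.shape)
```

If `x.shape` printed above already has two dimensions, this step is not needed.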

## 3. Define Model

The model is a deep autoencoder with a 4-layer encoder. The `dims` parameter defines its architecture: the first element is the input dimension, the last element is the dimension of the latent space, and the values in between are the hidden layer widths. The decoder mirrors the encoder.

In [None]:
input_dim = x.shape[1]
dims = [input_dim, 500, 500, 2000, 10] # 4-layer encoder
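
To make the role of `dims` concrete, the sketch below lists the `(inputs, outputs)` size of each layer that such a list implies. It assumes fully connected layers and a mirrored decoder, as described above. `layer_shapes` is a hypothetical helper and the input size of 256 is only an example value.

```python
def layer_shapes(dims):
    """Return (encoder, decoder) lists of (n_in, n_out) pairs for dense layers.

    Hypothetical helper for illustration; assumes the decoder mirrors the encoder.
    """
    # Consecutive entries of dims give the input and output size of each encoder layer.
    encoder = list(zip(dims[:-1], dims[1:]))
    # The decoder walks the same sizes in reverse, swapping input and output.
    decoder = [(n_out, n_in) for n_in, n_out in reversed(encoder)]
    return encoder, decoder

demo_dims = [256, 500, 500, 2000, 10]  # 256 is an assumed input size
demo_enc, demo_dec = layer_shapes(demo_dims)
print('encoder layers:', demo_enc)
print('decoder layers:', demo_dec)
```

A `dims` list with five entries therefore gives four encoder layers and four decoder layers.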

## 4. Pre-train Autoencoder for IDEC

In [None]:
idec = IDEC(dims=dims, n_clusters=4)
idec.pretrain(x, epochs=200)

## 5. Train IDEC Model

In [None]:
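# Joint objective: KL-divergence clustering loss (weight 0.1) plus MSE reconstruction loss (weight 1.0).
idec.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss=['kld', 'mse'], loss_weights=[0.1, 1.0])
y_pred_idec = idec.fit(x, tol=0.001, maxiter=int(2e4), update_interval=140)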
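
The `compile` call above combines a KL-divergence clustering loss with an MSE reconstruction loss. The NumPy sketch below shows what that objective looks like in the published DEC/IDEC formulation: a Student's t soft assignment `q`, a sharpened target distribution `p`, and a weighted sum of `KL(p || q)` and the reconstruction error. It is an illustration under the assumption that the local `IDEC` class follows that formulation. The function names, the averaging conventions, and the synthetic inputs are not taken from `../src`.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embeddings z_i to cluster centers mu_j."""
    sq_dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def joint_loss(p, q, x, x_recon, kld_weight=0.1, mse_weight=1.0):
    """Weighted sum of KL(p || q) and reconstruction MSE (averaging is an assumption)."""
    eps = 1e-10  # avoids log(0)
    kld = np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1))
    mse = np.mean((x - x_recon) ** 2)
    return kld_weight * kld + mse_weight * mse

rng = np.random.default_rng(0)
demo_z = rng.normal(size=(6, 10))        # synthetic latent vectors
demo_centers = rng.normal(size=(4, 10))  # synthetic cluster centers
demo_q = soft_assign(demo_z, demo_centers)
demo_p = target_distribution(demo_q)
demo_inputs = rng.random((6, 32))
# Passing the inputs as their own reconstruction isolates the clustering term.
demo_loss = joint_loss(demo_p, demo_q, demo_inputs, demo_inputs)
print('loss with perfect reconstruction:', demo_loss)
```

The default weights in `joint_loss` match the `loss_weights=[0.1, 1.0]` passed to `compile`, so the reconstruction term dominates and the clustering term acts as a smaller correction.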

## 6. Pre-train Autoencoder for DEC

In [None]:
dec = DEC(dims=dims, n_clusters=4)
dec.pretrain(x, epochs=200)

## 7. Train DEC Model

In [None]:
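# Same optimizer settings as IDEC; no loss is passed, so the DEC class default applies.
dec.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9))
y_pred_dec = dec.fit(x, tol=0.001, maxiter=int(2e4), update_interval=140)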
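
Both `fit` calls take `tol` and `update_interval`. In the published DEC and IDEC procedures, cluster assignments are recomputed every `update_interval` iterations, and training stops once the fraction of samples whose label changed falls below `tol`. Whether the local `DEC` and `IDEC` classes implement exactly this check is an assumption. The sketch below only illustrates the criterion, using a hypothetical helper and hand-written labels.

```python
import numpy as np

def label_change_fraction(y_prev, y_new):
    """Fraction of samples whose cluster label changed between two checks."""
    y_prev = np.asarray(y_prev)
    y_new = np.asarray(y_new)
    return float(np.mean(y_prev != y_new))

demo_prev = np.array([0, 1, 2, 3, 0, 1, 2, 3])
demo_new = np.array([0, 1, 2, 3, 0, 1, 3, 3])  # one of eight labels changed
demo_delta = label_change_fraction(demo_prev, demo_new)
demo_tol = 0.001
print('changed fraction:', demo_delta, 'stop:', demo_delta < demo_tol)
```

With `tol=0.001`, training under this criterion would stop only when fewer than 0.1% of the samples switch clusters between two consecutive checks.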