# RBM for Unlabelled Categorization with Ember ML

This notebook demonstrates how a Restricted Boltzmann Machine (RBM) can be used for unlabelled categorization or clustering. By training an RBM on unlabelled data, we can leverage the learned hidden layer activations or reconstruction error to group similar data points. This showcases RBMs as a form of unsupervised learning within the Ember ML framework.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs # For generating synthetic clustering data
from sklearn.cluster import KMeans # For simple clustering demonstration

# Import Ember ML components
from ember_ml.ops import set_backend
from ember_ml.nn import tensor
from ember_ml import ops
from ember_ml.models.rbm import RestrictedBoltzmannMachine

# Set a backend (choose 'numpy', 'torch', or 'mlx')
# You can change this to see how the code runs on different backends
set_backend('numpy')
print(f"Using backend: {ops.get_backend()}")

## 1. Generate Unlabelled Data

We'll generate a synthetic dataset with distinct clusters but without providing the cluster labels. The goal is to see if the RBM can help in identifying these underlying categories.

In [None]:
# Generate synthetic data with 3 clusters
n_samples = 300
n_features = 10
n_clusters = 3
X, _ = make_blobs(n_samples=n_samples, n_features=n_features, centers=n_clusters, random_state=42)

# Scale all values into [0, 1] using the global min and max (inputs in this range are often helpful for RBMs)
X_scaled = (X - X.min()) / (X.max() - X.min())

# Convert to EmberTensor
data_tensor = tensor.convert_to_tensor(X_scaled, dtype=tensor.float32)

print(f"Data shape: {tensor.shape(data_tensor)}")
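
The scaling above uses a single global minimum and maximum, so features with a small spread can end up squeezed into a narrow band of the [0, 1] range. A per-feature alternative can be sketched as follows. `X_demo` is a hypothetical stand-in array, not the notebook's data, and whether per-feature scaling helps this particular RBM is an assumption to test rather than a given.

```python
import numpy as np

# Hypothetical stand-in data: 300 samples, 10 features on different scales
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=0.0, scale=[1, 5, 10, 1, 2, 3, 4, 5, 6, 7], size=(300, 10))

# Global scaling: one min and one max shared by the whole array
X_global = (X_demo - X_demo.min()) / (X_demo.max() - X_demo.min())

# Per-feature scaling: each column gets its own min and max
col_min = X_demo.min(axis=0)
col_range = X_demo.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # guard against constant columns
X_per_feature = (X_demo - col_min) / col_range

print("Global scaling, per-column max:", X_global.max(axis=0).round(2))
print("Per-feature scaling, per-column max:", X_per_feature.max(axis=0).round(2))
```

With global scaling only the widest feature reaches the ends of the range; with per-feature scaling every column spans the full [0, 1] interval.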

## 2. Train an RBM on Unlabelled Data

We train an RBM in an unsupervised manner on the generated data. The RBM learns a representation of the data in its hidden layer.

In [None]:
# Define RBM parameters
n_visible = tensor.shape(data_tensor)[1] # Number of features
n_hidden = 5 # Number of hidden units (can be tuned)

# Create and train the RBM
rbm = RestrictedBoltzmannMachine(n_visible=n_visible, n_hidden=n_hidden)

print("Training RBM...")
# Train the RBM on the unlabelled data
# Note: RBM training can be sensitive to hyperparameters and data scaling.
# For this simple example, we use basic settings.
rbm.train(data_tensor, epochs=200, learning_rate=0.1)
print("RBM training complete.")
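
To give a sense of what unsupervised RBM training involves, here is a minimal NumPy sketch of one-step contrastive divergence (CD-1), a common way to train RBMs. It is not Ember ML's implementation, and whether `RestrictedBoltzmannMachine.train` uses this procedure is an assumption. The toy data `V`, the parameter names `W`, `b`, `c`, and the epoch count are all chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical toy data in [0, 1]: 300 samples, 10 features
V = rng.random((300, 10))

n_visible, n_hidden = V.shape[1], 5
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # visible-to-hidden weights
b = np.zeros(n_visible)  # visible biases
c = np.zeros(n_hidden)   # hidden biases
lr = 0.1

errors = []
for epoch in range(50):
    # Positive phase: hidden probabilities given the data, then a binary sample
    h_prob = sigmoid(V @ W + c)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: one Gibbs step down to the visible layer and back up
    v_recon = sigmoid(h_sample @ W.T + b)
    h_recon = sigmoid(v_recon @ W + c)

    # CD-1 update: data statistics minus reconstruction statistics
    n = V.shape[0]
    W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / n
    b += lr * (V - v_recon).mean(axis=0)
    c += lr * (h_prob - h_recon).mean(axis=0)

    errors.append(float(np.mean((V - v_recon) ** 2)))

# Hidden activation probabilities serve as the learned features
hidden_features_demo = sigmoid(V @ W + c)
print(f"Reconstruction MSE: first epoch {errors[0]:.4f}, last epoch {errors[-1]:.4f}")
print(f"Feature shape: {hidden_features_demo.shape}")
```

The hidden activation probabilities computed at the end play the same role as the features extracted from the trained RBM in the next section.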

## 3. Use RBM Features for Categorization

We can use the activations of the RBM's hidden layer as features for clustering. Data points that activate the hidden units similarly are likely to belong to the same category.

In [None]:
# Get the hidden layer activations (features) from the trained RBM
hidden_features = rbm.transform(data_tensor)

# Convert features to NumPy for clustering (using scikit-learn for demonstration)
hidden_features_np = tensor.to_numpy(hidden_features)

print(f"Hidden features shape: {hidden_features_np.shape}")

# Apply KMeans clustering on the hidden features
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10) # n_init to suppress warning
cluster_labels = kmeans.fit_predict(hidden_features_np)

print("\nSample cluster labels:")
print(cluster_labels[:20])
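
Printing a few labels does not show whether the clusters are any good. One way to check, sketched below with scikit-learn, is to compute a silhouette score (which needs no labels) and, because this dataset is synthetic, an adjusted Rand index against the labels that `make_blobs` returns. The block is self-contained and clusters the raw data as a stand-in; in the notebook you would pass `hidden_features_np` as `features` instead.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Regenerate the same synthetic data, keeping the labels for evaluation only
X_eval, y_true = make_blobs(n_samples=300, n_features=10, centers=3, random_state=42)

# Stand-in features: the raw data. In the notebook, use hidden_features_np here.
features = X_eval

labels = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(features)

# Silhouette: cohesion vs. separation of the clusters, in [-1, 1], no labels needed
sil = silhouette_score(features, labels)

# Adjusted Rand index: agreement with the true labels, invariant to label permutation
ari = adjusted_rand_score(y_true, labels)

print(f"Silhouette score: {sil:.3f}")
print(f"Adjusted Rand index: {ari:.3f}")
```

Running the same two metrics on the raw data and on the RBM features gives a simple comparison of whether the learned representation helps or hurts the clustering.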

## 4. Visualize Results

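The dataset has 10 features, so it cannot be plotted in full directly. As a simple view, we plot the first two features and color each point by the cluster label assigned from the RBM features. A dimensionality reduction method such as PCA or t-SNE would give a more faithful 2D picture.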

In [None]:
# The original data has 10 features, so we plot only the first two here
# For a more faithful 2D view, you might project with PCA or t-SNE first
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis')
plt.title('Unlabelled Categorization using RBM Features and KMeans')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar(scatter, label='Cluster Label')
plt.grid(True)
plt.show()
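
Because the first two of ten features may not separate the clusters well, a PCA projection is often a better 2D view. The sketch below uses scikit-learn's `PCA` on regenerated data so that it runs on its own. The names `X_vis` and `X_2d` are illustrative and do not appear in the notebook.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Regenerate the same 10-feature synthetic data used in the notebook
X_vis, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=42)

# Project onto the two directions of greatest variance
pca = PCA(n_components=2, random_state=42)
X_2d = pca.fit_transform(X_vis)

print(f"Projected shape: {X_2d.shape}")
print(f"Variance explained by 2 components: {pca.explained_variance_ratio_.sum():.2%}")
```

In the plotting cell, `X_2d[:, 0]` and `X_2d[:, 1]` would replace `X[:, 0]` and `X[:, 1]`, with the axis labels changed to name the principal components.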

## Conclusion

This notebook demonstrated how an RBM can be used in an unsupervised manner for categorization. We trained the RBM on unlabelled data and clustered its learned hidden representations with KMeans to assign each data point to a group. Because the true labels were discarded, how closely these groups match the underlying structure has to be checked separately, for example against the labels returned by `make_blobs`. The workflow illustrates the RBM's role as a feature learning model within Ember ML for unsupervised tasks.