# Anomaly Detection with Ember ML

This notebook demonstrates how to perform anomaly detection using components from the Ember ML framework. We will use a simple dataset and a Restricted Boltzmann Machine (RBM) as an example, showcasing Ember ML's backend-agnostic capabilities.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# Import Ember ML components
from ember_ml.ops import set_backend
from ember_ml.nn import tensor
from ember_ml import ops
from ember_ml.models.rbm import RestrictedBoltzmannMachine, train_rbm

# Set a backend (choose 'numpy', 'torch', or 'mlx')
# You can change this to see how the code runs on different backends
set_backend('numpy')
print(f"Using backend: {ops.get_backend()}")

## 1. Generate or Load Data

For this example, we'll generate a simple synthetic dataset with some anomalies. In a real-world scenario, you would load your data here.

In [None]:
# Generate normal data (e.g., from a normal distribution)
np.random.seed(42) # for reproducibility
normal_data = np.random.randn(100, 2) * 0.5 + tensor.convert_to_tensor([1, 1])

# Generate anomaly data (e.g., data points far from the normal distribution)
anomaly_data = np.random.randn(10, 2) * 2.0 + tensor.convert_to_tensor([-2, -2])

# Combine data
data = tensor.vstack((normal_data, anomaly_data))

# Convert to EmberTensor
data_tensor = tensor.convert_to_tensor(data, dtype=tensor.float32, device='cpu')

print(f"Data shape: {tensor.shape(data_tensor)}")

## 2. Train an Anomaly Detection Model (RBM Example)

We'll train a Restricted Boltzmann Machine (RBM) on the *normal* data. RBMs can learn the distribution of normal data, and data points that deviate significantly from this learned distribution can be considered anomalies.
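Because RBM training can be sensitive to data scaling, one option is to rescale features before training, using statistics computed from the normal data only so that anomalies do not influence the scaling. The sketch below is plain NumPy and is not part of Ember ML; `fit_min_max`, `apply_min_max`, and the demo arrays are hypothetical names introduced for illustration.

```python
import numpy as np

def fit_min_max(train):
    """Return the per-feature minimum and range, computed from training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return lo, span

def apply_min_max(x, lo, span):
    """Scale x with previously fitted statistics.

    Points outside the training range (such as anomalies) land outside [0, 1].
    """
    return (x - lo) / span

# Hypothetical demo data, shaped like the synthetic dataset in this notebook
rng = np.random.default_rng(0)
normal_demo = rng.normal(loc=1.0, scale=0.5, size=(100, 2))
outlier_demo = rng.normal(loc=-2.0, scale=2.0, size=(10, 2))

lo, span = fit_min_max(normal_demo)            # fit on normal data only
normal_scaled = apply_min_max(normal_demo, lo, span)
outlier_scaled = apply_min_max(outlier_demo, lo, span)
```

Fitting on the normal data alone keeps the scaling consistent with what the model sees during training. Whether scaling helps for a given RBM variant is something to verify empirically.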

In [None]:
# Define RBM parameters
n_visible = tensor.shape(data_tensor)[1] # Number of features
n_hidden = 10 # Number of hidden units

# Create and train the RBM on normal data
rbm = RestrictedBoltzmannMachine(visible_size=n_visible, hidden_size=n_hidden, device='cpu')

# Train the RBM (using only normal data for training)
# Note: RBM training can be sensitive to hyperparameters and data scaling.
# For this simple example, we use basic settings.
print("Training RBM...")
# Create a simple data generator that yields the entire normal data as a single batch
def data_generator():
    # Convert normal data to tensor once
    normal_data_tensor = tensor.convert_to_tensor(normal_data, dtype=tensor.float32)
    # Yield the entire dataset as a single batch
    yield normal_data_tensor

# Train the RBM using the data generator
train_rbm(rbm, data_generator(), epochs=100)
print("RBM training complete.")

## 3. Detect Anomalies

Now, we'll use the trained RBM to compute an 'anomaly score' for each data point (both normal and anomaly). Data points with high anomaly scores are likely anomalies.
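One common choice of anomaly score for an RBM is the free energy of a data point, where higher free energy corresponds to lower probability under the model. This notebook does not state how `rbm.anomaly_score` is computed, so the following is only a minimal NumPy sketch of the textbook free energy for an RBM with binary hidden units, $F(v) = -b^\top v - \sum_j \log(1 + e^{c_j + (vW)_j})$. The function name, parameter names, and random weights are assumptions made for illustration.

```python
import numpy as np

def free_energy(v, W, b, c):
    """Free energy of each row of v for an RBM with binary hidden units.

    v: (n_samples, n_visible) data
    W: (n_visible, n_hidden) weights
    b: (n_visible,) visible bias
    c: (n_hidden,) hidden bias
    """
    hidden_input = v @ W + c
    # np.logaddexp(0, x) is a numerically stable log(1 + exp(x))
    return -(v @ b) - np.logaddexp(0.0, hidden_input).sum(axis=1)

# Hypothetical, untrained parameters purely to exercise the function
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 10))
b = np.zeros(2)
c = np.zeros(10)
v = rng.random((5, 2))

scores = free_energy(v, W, b, c)  # one score per row of v
```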

In [None]:
# Compute anomaly scores for all data points
anomaly_scores = rbm.anomaly_score(data_tensor)

# Convert scores to NumPy for easier plotting/analysis
anomaly_scores_np = tensor.to_numpy(anomaly_scores)

print(f"Anomaly scores shape: {anomaly_scores_np.shape}")

## 4. Visualize Results

We can visualize the data points and their anomaly scores to see how well the RBM distinguishes anomalies.

In [None]:
# Plot the data points, colored by their anomaly score
plt.figure(figsize=(8, 6))
scatter = plt.scatter(data[:, 0], data[:, 1], c=anomaly_scores_np, cmap='viridis')
plt.colorbar(scatter, label='Anomaly Score')
plt.title('Anomaly Detection using RBM')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid(True)
plt.show()

## 5. Set a Threshold and Identify Anomalies

Based on the anomaly scores, we can set a threshold to classify data points as either normal or anomalous.
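Since the dataset here is synthetic, the true labels are known, which makes it possible to check how well a given threshold performs. The sketch below uses plain NumPy with made-up scores (not output from the RBM) to show percentile thresholding followed by a precision and recall check; `flag_by_percentile` and `precision_recall` are hypothetical helpers, not Ember ML functions.

```python
import numpy as np

def flag_by_percentile(scores, percentile=90.0):
    """Flag every score strictly above the given percentile of all scores."""
    threshold = np.percentile(scores, percentile)
    return scores > threshold, threshold

def precision_recall(flags, labels):
    """Precision and recall of boolean flags against boolean ground-truth labels."""
    true_positives = np.sum(flags & labels)
    precision = true_positives / max(np.sum(flags), 1)
    recall = true_positives / max(np.sum(labels), 1)
    return float(precision), float(recall)

# Made-up scores: 100 low values for normal points, 10 high values for anomalies
demo_scores = np.concatenate([np.linspace(0.0, 1.0, 100), np.linspace(5.0, 6.0, 10)])
demo_labels = np.concatenate([np.zeros(100, dtype=bool), np.ones(10, dtype=bool)])

flags, threshold = flag_by_percentile(demo_scores, 90.0)
precision, recall = precision_recall(flags, demo_labels)
print(f"threshold={threshold:.4f}, flagged={flags.sum()}, "
      f"precision={precision:.3f}, recall={recall:.3f}")
```

A percentile threshold fixes the fraction of points that get flagged, regardless of how many true anomalies exist. With 110 points and a 90th-percentile cut, 11 points are flagged even though only 10 are anomalous, so at least one normal point is misclassified.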

In [None]:
# Choose a threshold (this often requires tuning based on validation data)
threshold = np.percentile(anomaly_scores_np, 90) # Example: top 10% as anomalies

print(f"Anomaly threshold: {threshold:.4f}")

# Identify anomalies based on the threshold
is_anomaly = rbm.is_anomaly(data_tensor, threshold=tensor.convert_to_tensor(threshold, dtype=tensor.float32))

# Convert boolean result to NumPy for printing
is_anomaly_np = tensor.to_numpy(is_anomaly)

print("\nIdentified Anomalies (True if anomalous):")
print(is_anomaly_np)

## Conclusion

This notebook demonstrated a basic anomaly detection workflow using Ember ML and an RBM. If training succeeds, the synthetic anomalies should receive higher anomaly scores than the normal data points; check the scatter plot and the thresholded results to confirm this for your run. You can adapt this approach using different models and feature extraction techniques available in Ember ML for your specific anomaly detection tasks.