# NN Playground

This notebook experiments with variations on different neural network architectures, built from various types of cells (nodes).

Cell types include:

- Backfed Input Cell
- Input Cell
- Noisy Input Cell

- Hidden Cell
- Probabilistic Hidden Cell
- Spiking Hidden Cell

- Output Cell
- Matching Input Output Cell

- Recurrent Cell
- Memory Cell
- Different Memory Cell

- Kernel
- Convolution or Pool

# Imports

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# Base NN Class

`BaseNN` is a base class that the specialized networks later in this notebook inherit from. It stores the layer sizes and leaves `forward` to be implemented by each subclass.

In [2]:
# Base class for neural networks
class BaseNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(BaseNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

    def forward(self, x):
        raise NotImplementedError("forward method must be implemented in derived classes")

In [3]:
# Default layer sizes for the first examples (later cells redefine them)
input_size = 2
hidden_size = 3

# Basic Classes

## Perceptron (P)

**High-Level Overview**

The Perceptron represents the simplest form of a feedforward neural network, consisting of a single neuron with adjustable weights and a bias. Developed in 1957 by Frank Rosenblatt, it laid the groundwork for understanding neural networks. The Perceptron algorithm is a binary classifier that linearly separates data into two parts, making it a cornerstone in the study of machine learning for simple predictive modeling tasks.

**Data Type**

Perceptrons can process:
- Numerical data
- Binary features

Given its simplicity, it's primarily suited for linearly separable datasets where inputs can be categorized into two distinct groups.

**Task Objective**

Perceptrons are utilized for:
- Binary classification tasks
- Basic pattern recognition

Their straightforward approach allows them to make decisions by weighing input features, showcasing early neural network capabilities in distinguishing between two classes.

**Scalability**

Due to its simplicity, the scalability of a single-layer Perceptron is limited to problems that are linearly separable. For more complex datasets or non-linear problems, multi-layer networks or different algorithms are recommended.
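
The linear-separability limit can be checked directly on a toy problem. The sketch below is illustrative only: it assumes a coarse grid of candidate weights and biases (`grid` and `separable_on_grid` are hypothetical names) and tests whether any single threshold unit reproduces the AND and XOR truth tables.

```python
import itertools
import numpy as np

# The four binary input pairs and two target functions
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {"AND": [0, 0, 0, 1], "XOR": [0, 1, 1, 0]}

# Assumed search grid: -2.0, -1.5, ..., 2.0 for each weight and the bias
grid = np.linspace(-2, 2, 9)

def separable_on_grid(y):
    """Return True if some (w1, w2, b) on the grid classifies every point correctly."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        preds = (X @ np.array([w1, w2]) + b > 0).astype(int)
        if preds.tolist() == y:
            return True
    return False

results = {name: separable_on_grid(y) for name, y in targets.items()}
print(results)
```

AND is found on the grid (for example with weights 1, 1 and bias -1.5), while XOR is not. A finer grid would not help for XOR, because no single linear threshold can separate its classes.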

**Robustness to Noise**

Perceptrons can be sensitive to noise in the data: the classic learning rule updates the weights only on misclassified points rather than minimizing a smooth loss, so mislabeled or overlapping samples can keep training from converging. They perform best with clean, linearly separable datasets.

**Implementation Variants**

While the basic Perceptron is foundational, several key developments have been made to extend its utility, including:
- **Multi-layer Perceptrons (MLPs):** Comprising multiple layers of neurons to tackle non-linearly separable data.
- **Stochastic Gradient Descent:** An optimization method allowing Perceptrons and their multi-layer successors to learn from training data iteratively.

**Practical Application Guidance**

**When to Use Perceptrons:**
- For simple linear classification problems.
- As a learning tool to understand the basics of neural network architecture and linear decision boundaries.

**Considerations:**
- The Perceptron's inability to solve non-linear problems limits its application in complex real-world scenarios.
- It serves as a building block for more sophisticated networks that can handle a broader range of tasks.

### Conclusion

The Perceptron model, with its simplicity, offers a fundamental understanding of neural network principles. Although its direct applications are limited to linearly separable tasks, the Perceptron remains an essential concept in machine learning, providing a stepping stone to more advanced neural network architectures and algorithms.

In [4]:
class Perceptron:
    def __init__(self, input_size):
        # Initialize weights and bias randomly
        self.weights = np.random.rand(input_size)
        self.bias = np.random.rand()

    def activate(self, x):
        # Simple step function as activation
        return 1 if x > 0 else 0

    def forward(self, inputs):
        # Calculate the weighted sum of inputs
        weighted_sum = np.dot(inputs, self.weights) + self.bias

        # Apply the activation function
        output = self.activate(weighted_sum)

        return output

In [5]:
# Example Usage
if __name__ == "__main__":
    # Create a perceptron with 2 input cells
    perceptron = Perceptron(input_size=2)

    # Example input
    input_data = np.array([0.5, 0.8])

    # Get the output from the perceptron
    output = perceptron.forward(input_data)

    print(f"Input: {input_data}")
    print(f"Output: {output}")

Input: [0.5 0.8]
Output: 1
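
The `Perceptron` class in this section keeps its random initial weights and is never trained. A minimal sketch of the classic perceptron learning rule follows. It assumes zero initial weights, a learning rate of 1.0, and the logical AND function as toy data. `train_perceptron` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=10):
    """Perceptron learning rule: move the weights by lr * (target - prediction) * input."""
    weights = np.zeros(X.shape[1])  # zero init (assumption) keeps the run deterministic
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(X, y):
            prediction = 1 if np.dot(inputs, weights) + bias > 0 else 0
            error = target - prediction  # 0 when correct, +1 or -1 when wrong
            weights += lr * error * inputs
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so the rule is guaranteed to converge
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(X, y)
predictions = [1 if np.dot(x, weights) + bias > 0 else 0 for x in X]
print("Weights:", weights, "Bias:", bias)
print("Predictions:", predictions)
```

With these settings the rule stops making mistakes within the ten epochs, and the predictions match the AND targets.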


## Feed Forward (FF)

In [6]:
class FeedforwardNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(FeedforwardNN, self).__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.hidden_layer = nn.Linear(hidden_size, hidden_size)
        self.output_layer = nn.Linear(hidden_size, 1)  # Single output neuron

    def forward(self, x):
        x = torch.relu(self.input_layer(x))
        x = torch.relu(self.hidden_layer(x))
        x = torch.sigmoid(self.output_layer(x))
        return x

# Instantiate the neural network
input_size = 2  # Number of input features 
hidden_size = 3  # Number of neurons in the hidden layers
model = FeedforwardNN(input_size, hidden_size)

In [7]:
# Define a sample input
sample_input = torch.tensor([[0.5, 0.3]])  # Example Data

# Forward pass to get the output
output = model(sample_input)

# Print the model architecture and output
print(model)
print("Output:", output.item())

FeedforwardNN(
  (input_layer): Linear(in_features=2, out_features=3, bias=True)
  (hidden_layer): Linear(in_features=3, out_features=3, bias=True)
  (output_layer): Linear(in_features=3, out_features=1, bias=True)
)
Output: 0.4091791808605194


# NN Classes

From this point onward, each section implements a different type of neural network.

# Radial Basis Network (RBF)

In [8]:
class RadialBasisFunction:
    def __init__(self, input_size, num_centers):
        # Initialize centers and width parameters randomly
        self.centers = np.random.rand(num_centers, input_size)
        self.width = np.random.rand()
        self.weights = np.random.rand(num_centers)
    
    def gaussian(self, x, center, width):
        # Gaussian activation function
        return np.exp(-np.sum((x - center)**2) / (2 * width**2))
    
    def forward(self, inputs):
        # Calculate the activation for each center
        activations = np.array([self.gaussian(inputs, center, self.width) for center in self.centers])
        
        # Calculate the weighted sum of activations
        weighted_sum = np.dot(activations, self.weights)
        
        # Apply a threshold for binary output
        output = 1 if weighted_sum > 0.5 else 0
        
        return output

In [9]:
# Example Usage
if __name__ == "__main__":
    # Create an RBF network with 2 input cells and 3 centers
    rbf_network = RadialBasisFunction(input_size=2, num_centers=3)
    
    # Example input
    input_data = np.array([0.5, 0.8])
    
    # Get the output from the RBF network
    output = rbf_network.forward(input_data)
    
    print(f"Input: {input_data}")
    print(f"Output: {output}")

Input: [0.5 0.8]
Output: 1


# Recurrent Neural Network (RNN)

In [10]:
class SimpleRNN(BaseNN):
    def __init__(self, input_size, hidden_size):
        super(SimpleRNN, self).__init__(input_size, hidden_size, output_size=1)
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.recurrent_layer = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.output_layer = nn.Linear(hidden_size, self.output_size)

    def forward(self, x):
        x = torch.relu(self.input_layer(x))
        h_t, _ = self.recurrent_layer(x)
        output = torch.sigmoid(self.output_layer(h_t[:, -1, :]))  # Taking the output from the last time step
        return output

## Comparing FF & RNN

In [11]:
# Instantiate the models
feedforward_model = FeedforwardNN(input_size, hidden_size)
simple_rnn_model = SimpleRNN(input_size, hidden_size)

# Forward pass for the feedforward model
sample_input = torch.tensor([[0.5, 0.3]])
output_feedforward = feedforward_model(sample_input)

# Forward pass for the simple RNN model
sample_input_rnn = torch.rand((1, 4, input_size))
output_rnn = simple_rnn_model(sample_input_rnn)

# Print the model architectures and outputs
print("Feedforward Model:")
print(feedforward_model)
print("Output:", output_feedforward.item())

print("\nSimple RNN Model:")
print(simple_rnn_model)
print("Output:", output_rnn.item())

Feedforward Model:
FeedforwardNN(
  (input_layer): Linear(in_features=2, out_features=3, bias=True)
  (hidden_layer): Linear(in_features=3, out_features=3, bias=True)
  (output_layer): Linear(in_features=3, out_features=1, bias=True)
)
Output: 0.5601709485054016

Simple RNN Model:
SimpleRNN(
  (input_layer): Linear(in_features=2, out_features=3, bias=True)
  (recurrent_layer): RNN(3, 3, batch_first=True)
  (output_layer): Linear(in_features=3, out_features=1, bias=True)
)
Output: 0.4223051071166992


# Deep Feed Forward (DFF)

In [12]:
# Deep Feed Forward Neural Network
class DeepFeedforwardNN(BaseNN):
    def __init__(self, input_size, hidden_size, output_size):
        super(DeepFeedforwardNN, self).__init__(input_size, hidden_size, output_size)
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.hidden_layers = nn.ModuleList([
            nn.Linear(hidden_size, hidden_size) for _ in range(2)  # Two hidden layers with hidden_size nodes each
        ])
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = torch.relu(self.input_layer(x))
        for layer in self.hidden_layers:
            x = torch.relu(layer(x))
        x = torch.sigmoid(self.output_layer(x))
        return x

In [13]:
# Instantiate the deep feedforward neural network
input_size = 3  # Number of input features 
hidden_size = 4  # Number of nodes in each hidden layer
output_size = 2  # Number of output nodes
deep_feedforward_model = DeepFeedforwardNN(input_size, hidden_size, output_size)

# Define a sample input
sample_input = torch.tensor([[0.5, 0.3, 0.8]])  # Example Data

# Forward pass to get the output
output_deep_feedforward = deep_feedforward_model(sample_input)

# Print the model architecture and output
print(deep_feedforward_model)
print("Output:", output_deep_feedforward)

DeepFeedforwardNN(
  (input_layer): Linear(in_features=3, out_features=4, bias=True)
  (hidden_layers): ModuleList(
    (0-1): 2 x Linear(in_features=4, out_features=4, bias=True)
  )
  (output_layer): Linear(in_features=4, out_features=2, bias=True)
)
Output: tensor([[0.5978, 0.4909]], grad_fn=<SigmoidBackward0>)


# Long Short Term Memory (LSTM)

In [14]:
# LSTM Neural Network
class LSTMNN(BaseNN):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMNN, self).__init__(input_size, hidden_size, output_size)
        self.lstm_layer = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        _, (h_t, c_t) = self.lstm_layer(x)
        output = torch.sigmoid(self.output_layer(h_t[-1, :, :]))  # Final hidden state of the last layer, i.e. after the last time step
        return output

In [15]:
# Instantiate the LSTM neural network
input_size = 3  # Number of input features 
hidden_size = 3  # Number of memory cells
output_size = 4  # Number of output nodes
lstm_model = LSTMNN(input_size, hidden_size, output_size)

# Define a sample input
sample_input = torch.rand((1, 4, input_size))  # Example Data

# Forward pass to get the output
output_lstm = lstm_model(sample_input)

# Print the model architecture and output
print(lstm_model)
print("Output:", output_lstm)

LSTMNN(
  (lstm_layer): LSTM(3, 3, batch_first=True)
  (output_layer): Linear(in_features=3, out_features=4, bias=True)
)
Output: tensor([[0.5801, 0.3975, 0.6583, 0.6457]], grad_fn=<SigmoidBackward0>)


# Gated Recurrent Unit (GRU)

In [16]:
class GRUNN(BaseNN):
    def __init__(self, input_size, hidden_size, output_size=1):
        super(GRUNN, self).__init__(input_size, hidden_size, output_size)
        self.gru_layer = nn.GRU(input_size, hidden_size, batch_first=True)
        self.output_layer = nn.Linear(hidden_size, self.output_size)

    def forward(self, x):
        h_t, _ = self.gru_layer(x)
        output = torch.sigmoid(self.output_layer(h_t[:, -1, :]))  # Taking the output from the last time step
        return output

In [17]:
# Instantiate the GRU neural network
input_size = 3  # Number of input features 
hidden_size = 3  # Number of memory cells
output_size = 4  # Number of output nodes
gru_model = GRUNN(input_size, hidden_size, output_size)

# Define a sample input
sample_input = torch.rand((1, 4, input_size))  # Example Data

# Forward pass to get the output
output_gru = gru_model(sample_input)

# Print the model architecture and output
print(gru_model)
print("Output:", output_gru)

GRUNN(
  (gru_layer): GRU(3, 3, batch_first=True)
  (output_layer): Linear(in_features=3, out_features=4, bias=True)
)
Output: tensor([[0.5840, 0.5677, 0.6334, 0.6433]], grad_fn=<SigmoidBackward0>)


# Auto Encoder (AE)

Autoencoders (AEs) are designed for unsupervised learning and data compression.

They learn a compact representation of the input data.

They are used for data denoising, dimensionality reduction, and feature learning.

They are a versatile building block for larger neural networks.

In [18]:
class Autoencoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Linear(input_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        encoded = torch.relu(self.encoder(x))
        decoded = torch.sigmoid(self.decoder(encoded))
        return decoded

In [19]:
# Instantiate the autoencoder
input_size = 10  # Number of input features
hidden_size = 5  # Number of hidden nodes (compressed representation)
autoencoder = Autoencoder(input_size, hidden_size)

# Define a sample input
sample_input = torch.rand((1, input_size))

# Forward pass to get the reconstructed output
output_autoencoder = autoencoder(sample_input)

# Print the model architecture and output
print(autoencoder)
print("Output:", output_autoencoder)

Autoencoder(
  (encoder): Linear(in_features=10, out_features=5, bias=True)
  (decoder): Linear(in_features=5, out_features=10, bias=True)
)
Output: tensor([[0.5548, 0.5509, 0.5095, 0.3987, 0.4270, 0.5357, 0.4606, 0.4512, 0.5785,
         0.4374]], grad_fn=<SigmoidBackward0>)
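
The forward pass above uses untrained weights, so the output is not yet a meaningful reconstruction. A minimal training sketch follows. It assumes random placeholder data, mean squared error as the reconstruction loss, and the Adam optimizer, none of which are prescribed by the notebook. The model is rebuilt with `nn.Sequential` so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Same single-layer encoder/decoder layout as the Autoencoder class
input_size, hidden_size = 10, 5
model = nn.Sequential(
    nn.Linear(input_size, hidden_size), nn.ReLU(),     # encoder
    nn.Linear(hidden_size, input_size), nn.Sigmoid(),  # decoder
)

data = torch.rand((64, input_size))  # placeholder data (assumption), values in [0, 1]
criterion = nn.MSELoss()             # assumed reconstruction loss
optimizer = optim.Adam(model.parameters(), lr=0.01)

with torch.no_grad():
    initial_loss = criterion(model(data), data).item()

for _ in range(300):
    optimizer.zero_grad()
    loss = criterion(model(data), data)  # the training target is the input itself
    loss.backward()
    optimizer.step()

with torch.no_grad():
    final_loss = criterion(model(data), data).item()

print(f"Reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the target equals the input, no labels are needed. The 5-unit bottleneck is what forces the network to learn a compressed representation.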


# Variational AE (VAE)

**High-Level Overview**

Variational Autoencoders (VAEs) are a class of deep generative models. They learn an approximation of the probability distribution underlying the training data, which lets them generate new data points with similar properties. VAEs combine the traditional autoencoder architecture with variational inference: the encoder maps each input to a distribution over a continuous latent space (a mean and a log-variance), and the decoder generates data from samples drawn from that space.

**Data Type**

VAEs demonstrate remarkable adaptability across a range of data types, including:
- Images
- Text
- Audio
- Continuous numerical data

This versatility underscores their prominence in generative AI, making them a popular choice for a wide array of generative tasks.

**Task Objective**

Emphasizing their role in generative AI, VAEs excel in:
- Data generation
- Feature extraction and representation learning
- Dimensionality reduction
- Anomaly detection

Their deep learning capabilities enable them not only to model complex distributions but also to generate new, coherent samples, showcasing the transformative potential of generative AI.

**Scalability**

With their deep neural network architecture, VAEs scale effectively to accommodate the complexity and volume of vast datasets, further solidifying their status in generative AI for handling high-dimensional data efficiently.

**Robustness to Noise**

VAEs' proficiency in denoising and reconstructing inputs highlights their robustness, making them invaluable for applications in generative AI where data cleanliness cannot be assured.

**Implementation Variants**

Reflecting the innovation in generative AI, various VAE models have been developed to address specific challenges or improve upon the original framework, including Conditional VAEs, Beta-VAEs, and Disentangled VAEs, each offering unique advantages for controlled data generation and enhanced interpretability of latent representations.

**Practical Application Guidance**

In the realm of generative AI, VAEs are particularly suited for:
- Generating new data that mimics the properties of specific datasets.
- Unsupervised learning of complex data distributions.
- Applications requiring a nuanced understanding of data's underlying structure.

**Considerations:**

Training VAEs can present challenges, such as posterior collapse (the decoder learning to ignore the latent code) and blurry reconstructions, so balancing the reconstruction and KL terms of the loss requires care.

### Conclusion

Variational Autoencoders (VAEs) have cemented their place as a fundamental technology in generative AI, offering a sophisticated mechanism for understanding and generating data. Their broad applicability and the depth of insight they provide into data's inherent structure make them a pivotal tool in the advancement of generative modeling.

In [20]:
class VariationalAutoencoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(VariationalAutoencoder, self).__init__()

        # Encoder layers
        self.encoder_fc1 = nn.Linear(input_size, hidden_size)
        self.encoder_fc2_mean = nn.Linear(hidden_size, hidden_size)
        self.encoder_fc2_logvar = nn.Linear(hidden_size, hidden_size)

        # Decoder layers
        self.decoder_fc1 = nn.Linear(hidden_size, input_size)
        self.decoder_fc2 = nn.Linear(input_size, input_size)

    def reparameterize(self, mean, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mean + eps * std

    def forward(self, x):
        # Encoder
        x = torch.relu(self.encoder_fc1(x))
        mean = self.encoder_fc2_mean(x)
        logvar = self.encoder_fc2_logvar(x)

        # Reparameterization trick
        z = self.reparameterize(mean, logvar)

        # Decoder
        x_hat = torch.relu(self.decoder_fc1(z))
        x_hat = torch.sigmoid(self.decoder_fc2(x_hat))

        return x_hat, mean, logvar

In [21]:
# Instantiate the variational autoencoder
input_size = 4  # Number of input features
hidden_size = 4  # Number of hidden nodes in probabilistic layer
vae = VariationalAutoencoder(input_size, hidden_size)

# Define a sample input
sample_input = torch.rand((1, input_size))

# Forward pass to get the reconstructed output and latent variables
output_vae, mean, logvar = vae(sample_input)

# Print the model architecture and output
print(vae)
print("Output:", output_vae)
print("Mean:", mean)
print("Log Variance:", logvar)

VariationalAutoencoder(
  (encoder_fc1): Linear(in_features=4, out_features=4, bias=True)
  (encoder_fc2_mean): Linear(in_features=4, out_features=4, bias=True)
  (encoder_fc2_logvar): Linear(in_features=4, out_features=4, bias=True)
  (decoder_fc1): Linear(in_features=4, out_features=4, bias=True)
  (decoder_fc2): Linear(in_features=4, out_features=4, bias=True)
)
Output: tensor([[0.4106, 0.4347, 0.5420, 0.4469]], grad_fn=<SigmoidBackward0>)
Mean: tensor([[ 0.2209, -0.2880, -0.0193, -0.0295]], grad_fn=<AddmmBackward0>)
Log Variance: tensor([[-0.5089,  0.4982,  0.0301,  0.4687]], grad_fn=<AddmmBackward0>)
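
The model returns `mean` and `logvar` because the usual VAE training objective needs them. A sketch of that objective follows, assuming a binary cross-entropy reconstruction term and a standard normal prior. `vae_loss` is a hypothetical helper and the tensors are stand-ins, not outputs of the model above.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mean, logvar):
    # Reconstruction term: how well the decoder output matches the input
    reconstruction = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL divergence between N(mean, exp(logvar)) and the prior N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return reconstruction + kl, reconstruction, kl

torch.manual_seed(0)
x = torch.rand((1, 4))                      # stand-in input with values in [0, 1]
x_hat = torch.sigmoid(torch.randn((1, 4)))  # stand-in reconstruction

# Posterior equal to the prior: the KL term vanishes
_, _, kl_at_prior = vae_loss(x_hat, x, torch.zeros(1, 4), torch.zeros(1, 4))
# Posterior mean shifted to 1 in all four dimensions: KL = 0.5 * 4 = 2
_, _, kl_shifted = vae_loss(x_hat, x, torch.ones(1, 4), torch.zeros(1, 4))

print("KL at prior:", kl_at_prior.item())
print("KL with shifted mean:", kl_shifted.item())
```

The KL term pulls the encoder's distribution toward the prior, which is what makes sampling `z` from a standard normal and decoding it a reasonable way to generate new data.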


# Denoising Auto Encoder

**High-Level Overview**

Denoising Autoencoders (DAEs) are an advanced type of autoencoder designed to *remove noise from input data*. By intentionally corrupting the input data and then learning to reconstruct the original, uncorrupted data, DAEs are trained to capture the most relevant features. This process enhances the model's ability to generalize from the data, making it highly effective for tasks that require robust feature extraction and data denoising capabilities.

**Data Type**

Denoising Autoencoders are capable of processing various data types, including:
- Images
- Text
- Audio signals
- Continuous numerical data

Their adaptability makes them particularly useful for applications involving noisy or incomplete data.

**Task Objective**

Denoising Autoencoders are primarily used for:
- Data denoising
- Feature extraction and representation learning
- Dimensionality reduction
- Data generation and enhancement

By learning to ignore the "noise" in data, DAEs excel in recovering clean representations from corrupted inputs.

**Scalability**

Similar to other autoencoders, the scalability of DAEs depends on the network architecture. Modern techniques and computational resources allow DAEs to handle large datasets and complex noise patterns effectively, showcasing their scalability in practical applications.

**Robustness to Noise**

The core strength of DAEs lies in their robustness to noise. They are specifically trained to identify and ignore irrelevant features (noise), focusing on reconstructing the essential aspects of the data, which makes them exceptionally reliable for denoising tasks.

**Implementation Variants**

Several variants of DAEs have been developed to address different types of noise or to enhance specific aspects of denoising, including:
- **Gaussian Noise DAEs:** Target Gaussian noise in the data.
- **Salt-and-Pepper Noise DAEs:** Designed to remove binary noise from images.
- **Variational DAEs:** Combine denoising capabilities with variational autoencoder frameworks for improved generative properties.

**Practical Application Guidance**

**When to Use Denoising Autoencoders:**
- For cleaning noisy data before further processing or analysis.
- In feature extraction tasks where maintaining data integrity is crucial.
- As a preprocessing step to improve the performance of subsequent machine learning models.

**Considerations:**
- The effectiveness of a DAE can vary based on the noise type and level; selecting the appropriate model variant is key.
- Training DAEs requires a balance between denoising capability and preserving relevant features, necessitating careful tuning of model parameters.

### Conclusion

Denoising Autoencoders offer a powerful solution for improving data quality, with their unique training strategy enabling them to extract clean, relevant features from noisy inputs. Their versatility across different data types and robustness to various noise patterns make them an invaluable tool in the data preprocessing pipeline, enhancing the performance of machine learning and deep learning models across a wide range of applications.

In [22]:
class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(DenoisingAutoencoder, self).__init__()

        # Encoder layers
        self.encoder_fc1 = nn.Linear(input_size, hidden_size)

        # Decoder layers
        self.decoder_fc1 = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        # Encoder
        x = torch.relu(self.encoder_fc1(x))

        # Decoder
        x_hat = torch.sigmoid(self.decoder_fc1(x))

        return x_hat

In [23]:
# Instantiate the denoising autoencoder
input_size = 4  # Number of input features
hidden_size = 4  # Number of hidden nodes
dae = DenoisingAutoencoder(input_size, hidden_size)

# Define a sample input (noisy data)
noisy_input = torch.rand((1, input_size))  # Example Noisy Data

# Forward pass to get the reconstructed output
output_dae = dae(noisy_input)

# Print the model architecture and output
print(dae)
print("Noisy Input:", noisy_input)
print("Reconstructed Output:", output_dae)

DenoisingAutoencoder(
  (encoder_fc1): Linear(in_features=4, out_features=4, bias=True)
  (decoder_fc1): Linear(in_features=4, out_features=4, bias=True)
)
Noisy Input: tensor([[0.5138, 0.0090, 0.7281, 0.1610]])
Reconstructed Output: tensor([[0.5775, 0.4534, 0.5027, 0.5040]], grad_fn=<SigmoidBackward0>)
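
The class above has the same layers as a plain autoencoder. The denoising behaviour comes from how it is trained: the input is corrupted, but the loss is computed against the clean data. A minimal sketch follows, assuming additive Gaussian noise, placeholder data, MSE loss, and illustrative layer sizes. `corrupt` is a hypothetical helper.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

input_size, hidden_size = 4, 8  # illustrative sizes (assumption)
model = nn.Sequential(
    nn.Linear(input_size, hidden_size), nn.ReLU(),     # encoder
    nn.Linear(hidden_size, input_size), nn.Sigmoid(),  # decoder
)

def corrupt(x, noise_std=0.1):
    # Additive Gaussian noise, clamped back to the [0, 1] data range
    return (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)

clean = torch.rand((128, input_size))  # placeholder clean data (assumption)
eval_noisy = corrupt(clean)            # fixed noisy copy for a before/after comparison
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

with torch.no_grad():
    loss_before = criterion(model(eval_noisy), clean).item()

for _ in range(500):
    optimizer.zero_grad()
    noisy = corrupt(clean)                 # fresh corruption at every step
    loss = criterion(model(noisy), clean)  # the target is the clean input, not the noisy one
    loss.backward()
    optimizer.step()

with torch.no_grad():
    loss_after = criterion(model(eval_noisy), clean).item()

print(f"Denoising loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Swapping `corrupt` for a masking function that zeroes random entries would give a salt-and-pepper style variant with the same training loop.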


# Sparse Auto Encoder

**High-Level Overview**

Sparse Autoencoders are a specialized variant of autoencoders aimed at *unsupervised learning of compressed representations*. Sparsity constraints keep most hidden neurons inactive for any given input, which sharpens feature detection and makes the representation more efficient. This can improve generalization, making them suitable for tasks requiring robust feature extraction.

**Data Type**

Sparse Autoencoders efficiently process:
- Images
- Text
- Audio signals
- Continuous numerical data

Their versatility across different data types highlights their utility in feature extraction and data compression tasks.

**Task Objective**

Key applications include:
- Feature extraction and representation learning
- Dimensionality reduction
- Data denoising
- Pretraining for deeper neural networks

Sparsity constraints enable these models to learn higher-level features, distinguishing them from traditional autoencoders.

**Scalability**

Although sparsity helps in learning efficient representations, the network's size and depth still determine both its ability to model complex distributions and its computational requirements.

**Robustness to Noise**

They demonstrate significant robustness to noise, attributed to their focus on essential features, making them ideal for denoising and robust representation learning.

**Implementation Variants**

Variants are based on the sparsity enforcement method:
- **KL Divergence Sparse Autoencoder:** Penalizes deviations from a target sparsity level using Kullback-Leibler divergence.
- **L1 Regularization Sparse Autoencoder:** Applies L1 penalty on hidden units' activations to encourage sparsity.
- **Winner-Take-All (WTA) Sparse Autoencoder:** Only a fraction of the most active hidden units are allowed to update their weights, enhancing sparsity.
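
Two of the variants listed above can be sketched in a few lines. The helpers below are hypothetical illustrations: `l1_sparse_loss` adds an L1 penalty on the hidden activations (with an assumed MSE reconstruction term), and `keep_top_k` is a simplified winner-take-all step that zeroes all but the `k` largest activations per sample.

```python
import torch
import torch.nn.functional as F

def l1_sparse_loss(x_hat, x, encoded, l1_weight=1e-3):
    # Reconstruction error plus a penalty on the mean absolute hidden activation
    reconstruction = F.mse_loss(x_hat, x)
    sparsity = encoded.abs().mean()
    return reconstruction + l1_weight * sparsity

def keep_top_k(encoded, k):
    # Keep the k largest activations in each row and zero out the rest
    _, indices = encoded.topk(k, dim=1)
    mask = torch.zeros_like(encoded).scatter_(1, indices, 1.0)
    return encoded * mask

x = torch.tensor([[0.2, 0.4]])
x_hat = x.clone()  # perfect reconstruction, so only the penalty differs

dense_code = torch.ones((1, 4))
sparse_code = torch.tensor([[1.0, 0.0, 0.0, 0.0]])

loss_dense = l1_sparse_loss(x_hat, x, dense_code)
loss_sparse = l1_sparse_loss(x_hat, x, sparse_code)
print("Dense code loss:", loss_dense.item())
print("Sparse code loss:", loss_sparse.item())

winners = keep_top_k(torch.tensor([[0.1, 0.9, 0.5, 0.3]]), k=2)
print("Top-2 activations kept:", winners)
```

With identical reconstructions, the sparser code receives the lower loss, which is the pressure that drives most hidden units toward zero during training.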

**Practical Application Guidance**

**When to Use Sparse Autoencoders:**
- In extracting meaningful features from high-dimensional data.
- For dimensionality reduction with interpretability.
- During pretraining phases for deep learning models, providing a good initial weight set that captures useful data patterns.

**Considerations:**
- Selecting the appropriate sparsity constraint and regularization technique is critical for balancing feature selectivity and model complexity.
- Hyperparameters require careful tuning to achieve desired sparsity levels and optimal performance.

### Conclusion

Sparse Autoencoders stand out for learning efficient and interpretable data representations, with enforced sparsity offering clear advantages in feature selection and model robustness. They are invaluable in preprocessing, feature extraction, and as a pretraining step, enhancing subsequent models' performance across various data types and applications.

In [24]:
class SparseAutoencoder(BaseNN):
    def __init__(self, input_size, hidden_size, sparsity_target=0.1, sparsity_weight=0.2):
        super(SparseAutoencoder, self).__init__(input_size, hidden_size, output_size=input_size)

        # Encoder layers
        self.encoder_fc1 = nn.Linear(input_size, hidden_size)

        # Decoder layers
        self.decoder_fc1 = nn.Linear(hidden_size, input_size)

        # Sparsity parameters
        self.sparsity_target = sparsity_target
        self.sparsity_weight = sparsity_weight
        self.relu = nn.ReLU()

    def forward(self, x):
        # Encoder
        encoded = self.encoder_fc1(x)
        encoded = self.relu(encoded)

        # Decoder
        decoded = torch.sigmoid(self.decoder_fc1(encoded))

        return decoded, encoded

    def loss_function(self, x, x_hat, encoded):
        # Reconstruction loss
        reconstruction_loss = nn.functional.binary_cross_entropy(x_hat, x, reduction='mean')

        # Sparsity loss
        sparsity_loss = torch.sum(self.kl_divergence(self.sparsity_target, encoded))

        # Total loss
        total_loss = reconstruction_loss + self.sparsity_weight * sparsity_loss

        return total_loss

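    def kl_divergence(self, target, activations):
        # KL divergence between the target sparsity and each hidden unit's mean activation
        p = torch.mean(activations, dim=0)  # Average activation over the batch
        # ReLU activations are not confined to (0, 1); clamp so both logarithms stay finite
        p = torch.clamp(p, min=1e-6, max=1 - 1e-6)
        return target * torch.log(target / p) + (1 - target) * torch.log((1 - target) / (1 - p))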

In [25]:
# Instantiate the sparse autoencoder
input_size = 5  # Number of input features
hidden_size = 3  # Number of hidden nodes
sae = SparseAutoencoder(input_size, hidden_size)

# Define a sample input
sample_input = torch.rand((1, input_size))  # Example Data

# Forward pass to get the reconstructed output and encoded representation
output_sae, encoded_sae = sae(sample_input)

# Calculate the loss
loss_sae = sae.loss_function(sample_input, output_sae, encoded_sae)

# Print the model architecture, output, and loss
print(sae)
print("Input:", sample_input)
print("Reconstructed Output:", output_sae)
print("Encoded Representation:", encoded_sae)
print("Loss:", loss_sae.item())

SparseAutoencoder(
  (encoder_fc1): Linear(in_features=5, out_features=3, bias=True)
  (decoder_fc1): Linear(in_features=3, out_features=5, bias=True)
  (relu): ReLU()
)
Input: tensor([[0.4828, 0.0561, 0.2652, 0.6145, 0.1415]])
Reconstructed Output: tensor([[0.4442, 0.6020, 0.5842, 0.5638, 0.3021]], grad_fn=<SigmoidBackward0>)
Encoded Representation: tensor([[0.5875, 0.1028, 0.1637]], grad_fn=<ReluBackward0>)
Loss: 0.8145620226860046


# Markov Chain (MC)

**High-Level Overview**

Markov Chains represent a stochastic model describing a sequence of possible events where the probability of each event depends only on the state attained in the previous event. This mathematical framework is fundamental in the study of random processes and is widely applicable across various domains, including statistical mechanics, economics, and predictive modeling. Markov Chains are particularly valued for their simplicity and power in modeling the randomness of systems evolving over time.

**Data Type**

Markov Chains are applicable to:
- Discrete events or states
- Temporal or spatial sequences

Their adaptability allows them to model a wide array of processes, from simple random walks to complex decision-making scenarios.

**Task Objective**

Markov Chains excel in:
- Predicting state transitions
- Modeling random processes
- Decision making under uncertainty

Their predictive capabilities make them an essential tool for scenarios where future states depend only on the current state, with no need to track the earlier history.

**Scalability**

The cost of a Markov Chain is driven mainly by the number of states: a chain with n states has an n × n transition matrix, so storage and per-step computation grow quadratically with n. Larger state spaces increase computational demands, but advancements in algorithms and computing power have made it feasible to tackle complex chains efficiently.

**Robustness to Noise**

Given their probabilistic nature, Markov Chains naturally incorporate and manage uncertainty and noise within their models. This robustness makes them suitable for applications where data may be incomplete or inherently random.

**Implementation Variants**

Markov Chains come in various forms, including:
- **Discrete-Time Markov Chains:** Model transitions in discrete time steps.
- **Continuous-Time Markov Chains:** Allow for transitions at any point in time.
- **Hidden Markov Models (HMMs):** Extend Markov Chains by allowing observations to be a probabilistic function of the state, useful in scenarios where states are not directly observable.

**Practical Application Guidance**

**When to Use Markov Chains:**
- For modeling sequential or temporal data where future states depend on the current state.
- In decision-making processes to evaluate different strategies under uncertainty.
- When analyzing systems or processes that evolve over time according to stable transition probabilities.

**Considerations:**
- Markov Chains assume the future is independent of the past given the present state, which may not hold in systems with memory or where historical context is crucial.
- They are best applied to processes where this assumption of memorylessness (the Markov property) is reasonable or where state transitions are primarily influenced by the current state.

### Conclusion

Markov Chains offer a powerful and flexible framework for modeling random processes and making predictions based on state transitions. By understanding their structure, capabilities, and the variety of their applications, one can effectively leverage Markov Chains to gain insights into complex systems, predict future events, and make informed decisions under uncertainty.

In [26]:
class MarkovChainNN(BaseNN):
    def __init__(self, input_size, hidden_size):
        super(MarkovChainNN, self).__init__(input_size, hidden_size, output_size=input_size)
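        # Learnable "transition" weights; note these are unconstrained real values,
        # not a row-stochastic matrix of transition probabilities
        self.transition_matrix = nn.Parameter(torch.randn(input_size, hidden_size))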
        self.output_layer = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        # Apply a simple linear transformation based on the transition matrix
        x = torch.matmul(x, self.transition_matrix)
        # Apply a linear layer to get the final output
        output = self.output_layer(x)
        # You might want to apply some non-linearity here based on your specific needs
        # For example, you can use torch.relu(output) or torch.sigmoid(output) depending on the task
        return output

# Instantiate the Markov Chain neural network
input_size = 4  # Number of input features 
hidden_size = 8  # Number of hidden states
markov_chain_model = MarkovChainNN(input_size, hidden_size)

# Define a sample input
sample_input = torch.rand((1, input_size))  # Example Data

# Forward pass to get the output
output_markov_chain = markov_chain_model(sample_input)

In [27]:
# Print the model architecture and output
print(markov_chain_model)
print("Output:", output_markov_chain)

MarkovChainNN(
  (output_layer): Linear(in_features=8, out_features=4, bias=True)
)
Output: tensor([[ 0.6076, -0.3390,  1.0989, -0.3724]], grad_fn=<AddmmBackward0>)


# Hopfield Network

**High-Level Overview**

Hopfield Networks are a form of recurrent neural network with a unique structure that allows them to serve as associative memory systems. These networks are characterized by fully connected neurons with symmetric weight matrices, enabling them to converge to stable states or "memories". This architecture makes Hopfield Networks particularly adept at solving optimization and memory recall tasks, leveraging their ability to find energy minima to recall stored patterns.

**Data Type**

Hopfield Networks primarily deal with:
- Binary data
- Bipolar data

Their structure is optimized for patterns represented in these formats, making them suitable for tasks that can be encoded as binary or bipolar vectors.

**Task Objective**

Hopfield Networks are well-suited for:
- Pattern recognition
- Associative memory recall
- Optimization problems

Their ability to serve as content-addressable ("associative") memory systems allows them to recall entire patterns based on partial or noisy inputs, showcasing their strength in tasks requiring robust pattern completion and error correction.

**Scalability**

While Hopfield Networks provide powerful capabilities for pattern recognition and memory recall, their scalability is limited: the architecture is fully connected, so the number of weights grows quadratically with the number of neurons. The number of patterns a Hopfield Network can store and recall reliably is only about 14-15% of the number of neurons (roughly 0.14N patterns for N neurons), limiting the size of problems they can effectively solve without modifications or extensions.

**Robustness to Noise**

A key feature of Hopfield Networks is their robustness to noise in input patterns. They can recover original stored patterns from inputs that are partially incorrect or incomplete, making them highly effective for tasks requiring error tolerance and noise reduction in pattern recall.
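
The recovery behaviour described above can be sketched in a few lines of NumPy. This is a minimal illustration, separate from the PyTorch class in this notebook, and it assumes Hebbian (outer-product) storage with synchronous sign updates; classical Hopfield networks usually update one neuron at a time. The helper names `train_hebbian` and `recall` are hypothetical.

```python
import numpy as np

def train_hebbian(patterns):
    """Build a symmetric weight matrix from bipolar (+1/-1) patterns."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n   # sum of outer products, scaled by size
    np.fill_diagonal(W, 0.0)        # no self-connections
    return W

def recall(W, state, steps=5):
    """Synchronous sign updates; a zero input keeps the neuron's previous value."""
    state = state.copy()
    for _ in range(steps):
        field = W @ state
        new_state = np.where(field > 0, 1, np.where(field < 0, -1, state))
        if np.array_equal(new_state, state):
            break                   # reached a stable state
        state = new_state
    return state

# Store one bipolar pattern
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1]])
W = train_hebbian(stored)

# Corrupt the pattern by flipping its first bit
noisy = stored[0].copy()
noisy[0] = -noisy[0]

recovered = recall(W, noisy)
print("Noisy:    ", noisy)
print("Recovered:", recovered)
```

With a single stored pattern and one flipped bit, the update pulls the state back to the stored pattern. Storing many patterns in a small network makes recall progressively less reliable.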

**Implementation Variants**

To address scalability and efficiency, several variants of Hopfield Networks have been developed, including:
- **Continuous Hopfield Networks:** Extend the binary model to continuous values, allowing for application to a wider range of problems.
- **Stochastic Hopfield Networks:** Introduce randomness in the update rules, enhancing the network's ability to escape local minima and find better solutions for optimization problems.

**Practical Application Guidance**

**When to Use Hopfield Networks:**
- When the task involves recovering or completing partial patterns.
- For optimization problems where potential solutions can be encoded as binary or bipolar vectors.
- In applications where associative memory models offer a natural solution.

**Considerations:**
- Hopfield Networks are not well-suited for large-scale problems due to their limited storage capacity and the computational cost of fully connected networks.
- They may not be the best choice for new tasks with high-dimensional data or where deep learning approaches have demonstrated superior performance.

### Conclusion

Hopfield Networks offer a distinctive approach to associative memory and optimization problems, with their unique ability to recall stored patterns from noisy or incomplete inputs. Understanding their structure, capabilities, and limitations is crucial for leveraging their strengths in relevant applications, while recognizing when alternative neural network models might be more appropriate.

In [28]:
class HopfieldNetwork(BaseNN):
    def __init__(self, input_size):
        super(HopfieldNetwork, self).__init__(input_size, hidden_size=None, output_size=input_size)
        
        # Weight matrix for the Hopfield Network.
        # Initialised to zeros, so no patterns are stored yet and forward() returns all zeros
        self.weights = nn.Parameter(torch.zeros((input_size, input_size), dtype=torch.float))

    def forward(self, x):
        # Apply the Hopfield Network dynamics
        y = torch.sign(x @ self.weights).long()  # Convert to torch.long after applying sign
        return y


In [29]:
# Example usage
input_size = 5  # Change this based on your needs
hopfield_model = HopfieldNetwork(input_size)

# Define a sample input pattern (1 or -1)
sample_input = torch.tensor([[1, -1, 1, -1, 1]], dtype=torch.float)  # Change the data type to torch.float

# Forward pass to retrieve the output
output_hopfield = hopfield_model(sample_input)

# Print the model architecture and output
print(hopfield_model)
print("Output:", output_hopfield.numpy())

HopfieldNetwork()
Output: [[0 0 0 0 0]]


# Boltzmann Machine (BM)

A Boltzmann Machine (BM) is a stochastic recurrent neural network that learns a probability distribution over binary-valued patterns.

*Main Objective:* learn the joint probability distribution of its training data.

Unique Features:
-visible & hidden units; in a general BM any pair of units may be connected (the class below keeps only visible-hidden connections, which is the restricted, bipartite form)
-connections between units have weights, which the model learns during training
-stochastic update process for both training & inference

Common Uses:
-unsupervised learning tasks such as feature learning, dimensionality reduction, and density estimation
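
To make "a probability distribution over binary patterns" concrete, the standard textbook formulation assigns each joint state an energy and converts it into a probability. The symbols here ($s_i \in \{0, 1\}$ for unit states, $w_{ij}$ for symmetric weights, $b_i$ for biases, $Z$ for the normalizing constant) are notation introduced for this sketch, not names used in the code, and a temperature of 1 is assumed:

$$
E(s) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i b_i s_i, \qquad
P(s) = \frac{e^{-E(s)}}{Z}, \qquad
Z = \sum_{s'} e^{-E(s')}
$$

Under this formulation, a stochastic update sets unit $i$ to 1 with probability $\sigma\big(\sum_{j \neq i} w_{ij} s_j + b_i\big)$, where $\sigma$ is the logistic sigmoid. Low-energy states are the most probable ones, so learning amounts to shaping the weights so that training patterns sit at low energy.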

In [30]:
class BoltzmannMachine(nn.Module):
    def __init__(self, num_visible, num_hidden):
        super(BoltzmannMachine, self).__init__()
        self.num_visible = num_visible
        self.num_hidden = num_hidden

        # Define the parameters (weights and biases)
        self.weights = nn.Parameter(torch.randn(num_visible, num_hidden))
        self.visible_bias = nn.Parameter(torch.randn(num_visible))
        self.hidden_bias = nn.Parameter(torch.randn(num_hidden))

    def forward(self, visible_states):
        # Ensure visible_states has the correct dimensions (batch_size x num_visible)
        if visible_states.dim() == 1:
            visible_states = visible_states.view(1, -1)

        # Compute the hidden probabilities given visible states
        hidden_probabilities = F.sigmoid(F.linear(visible_states, self.weights.t(), self.hidden_bias))

        # Sample hidden states from the computed probabilities
        hidden_states = torch.bernoulli(hidden_probabilities)

        # Compute the visible probabilities given the sampled hidden states
        visible_probabilities = F.sigmoid(F.linear(hidden_states, self.weights, self.visible_bias))

        # Sample visible states from the computed probabilities
        visible_states = torch.bernoulli(visible_probabilities)

        return visible_states, hidden_states

In [31]:
# Example usage
num_visible = 5
num_hidden = 3

boltzmann_machine = BoltzmannMachine(num_visible, num_hidden)

# Define a sample visible state (binary values)
sample_visible_state = torch.tensor([1, 0, 1, 0, 1.], dtype=torch.float)

# Perform a Gibbs sampling step
sampled_visible, sampled_hidden = boltzmann_machine(sample_visible_state)

# Print the model architecture and sampled states
print(boltzmann_machine)
print("Sampled Visible State:", sampled_visible.detach().numpy())
print("Sampled Hidden State:", sampled_hidden.detach().numpy())

BoltzmannMachine()
Sampled Visible State: [[1. 1. 1. 0. 1.]]
Sampled Hidden State: [[0. 1. 1.]]


# Restricted Boltzmann Machine (RBM)

Differences from normal Boltzmann Machine:

    -No connections between units within same layer (no hidden-hidden or visible-visible connections)
    
    -Bipartite graph with one layer of visible units and one layer of hidden units
    
Objective: 

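    -Same objective as the Boltzmann Machine (learn the joint probability distribution of the training data), with connectivity restricted so that training and feature learning are more tractable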

Unique Features:

    -Widely used for feature learning and as building blocks in deep learning architectures.
    
    -Efficient training: Contrastive Divergence (CD) is often used to train RBMs

Common Uses:

    -Feature Learning; pre-training deep NN
    
    -Collaborative filtering, topic modeling, other unsupervised learning tasks.
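
Since Contrastive Divergence is mentioned above but not shown, here is a minimal NumPy sketch of a single-step variant (CD-1). It is an illustration under simplifying assumptions rather than a reference implementation: it uses hidden probabilities instead of samples in the gradient estimate and visible probabilities for the reconstruction, and the helper name `cd1_step`, the learning rate, and the toy data are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1):
    """One CD-1 update for a batch v0 of shape (batch, n_visible).

    W: (n_visible, n_hidden) weights, a: visible bias, b: hidden bias.
    Returns updated copies of (W, a, b).
    """
    # Positive phase: hidden probabilities and samples given the data
    h0_prob = sigmoid(v0 @ W + b)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)

    # Negative phase: one Gibbs step back to the visible layer and up again
    v1_prob = sigmoid(h0 @ W.T + a)
    h1_prob = sigmoid(v1_prob @ W + b)

    # Gradient estimate: data correlations minus reconstruction correlations
    batch = v0.shape[0]
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    da = (v0 - v1_prob).mean(axis=0)
    db = (h0_prob - h1_prob).mean(axis=0)

    return W + lr * dW, a + lr * da, b + lr * db

n_visible, n_hidden = 5, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)

# A single binary training pattern (toy data)
data = np.array([[1, 0, 1, 0, 1.0]])
for _ in range(200):
    W, a, b = cd1_step(data, W, a, b)

# Reconstruct the training pattern from its hidden probabilities
reconstruction = sigmoid(sigmoid(data @ W + b) @ W.T + a)
print(np.round(reconstruction, 2))
```

After training, the reconstruction probabilities should be high where the pattern is 1 and low where it is 0. Real use would train on many patterns in mini-batches.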

In [32]:
class RBM(BaseNN):
    def __init__(self, visible_size, hidden_size):
        super(RBM, self).__init__(visible_size, hidden_size, None)
        self.weights = nn.Parameter(torch.randn(visible_size, hidden_size))
        self.visible_bias = nn.Parameter(torch.zeros(visible_size))
        self.hidden_bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x):
        hidden_prob = F.sigmoid(F.linear(x, self.weights.t(), self.hidden_bias))
        hidden_state = torch.bernoulli(hidden_prob)
        reconstructed_prob = F.sigmoid(F.linear(hidden_state, self.weights, self.visible_bias))
        return hidden_state, reconstructed_prob

In [33]:
# Example usage
visible_size = 5
hidden_size = 3

# Create an RBM model
rbm_model = RBM(visible_size, hidden_size)

# Define a sample visible state (binary values)
sample_visible_state = torch.tensor([[1, 0, 1, 0, 1.]], dtype=torch.float)

# Forward pass to get the hidden states
hidden_states, _ = rbm_model(sample_visible_state)

# Print the model architecture and hidden states
print(rbm_model)
print("Sampled Visible State:", sample_visible_state.detach().numpy())
print("Hidden States:", hidden_states.detach().numpy())

RBM()
Sampled Visible State: [[1. 0. 1. 0. 1.]]
Hidden States: [[1. 0. 1.]]


# Deep Belief Network (DBN)

Objective:

    -Unsupervised learning & feature learning
    -Model complex hierarchical representations of data

Unique Features:
    
    -Multiple layers of stochastic latent variables (usually binary)
    -Stack of Restricted Boltzmann Machines (see rbm_layers in the DBN class __init__, which builds one RBM per layer)
    -Uses a layer-wise pre-training approach followed by fine-tuning using backprop
    
Common Uses:
    
    - Feature Learning
    - Generative tasks (new samples from learned distribution)

In [34]:
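class DBN(BaseNN):
    def __init__(self, visible_size, hidden_sizes):
        super(DBN, self).__init__(visible_size, None, None)
        # Chain the layers: each RBM's visible size is the previous layer's hidden size
        layer_sizes = [visible_size] + list(hidden_sizes)
        self.rbm_layers = nn.ModuleList(
            [RBM(n_in, n_out) for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])])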

    def forward(self, x):
        # Forward pass through each RBM layer
        for rbm_layer in self.rbm_layers:
            x, _ = rbm_layer(x)
        return x

In [35]:
# Example usage
visible_size = 5
hidden_sizes = [5, 1]

# Create a Deep Belief Network
dbn_model = DBN(visible_size, hidden_sizes)

# Define a sample input
sample_input = torch.rand((1, visible_size))

# Forward pass through the DBN
output_dbn = dbn_model(sample_input)

# Print the model architecture and output
print(dbn_model)
print("Output:", output_dbn.detach().numpy())

DBN(
  (rbm_layers): ModuleList(
    (0-1): 2 x RBM()
  )
)
Output: [[1.]]


# Deep Convolutional Network (DCN)

Objective:
    
    -Processing structured grid data like images
    -Excels at capturing hierarchical spatial patterns

Unique Features:
    
    -Uses convolutional layers w/ learnable filters that capture local patterns
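    -Typically includes pooling layers to reduce spatial dimensions & increase computational efficiency
    -Uses shared weights in convolutional layers, so the same filter responds to a pattern wherever it appears (translation equivariance; pooling adds a degree of translation invariance)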
    
Common uses:
    
    -Image classification
    -Feature learning in spatial data (hierarchical representations of spatial features)
    -Transfer learning (pre-trained CNNs on large datasets often fine-tuned for specific tasks)
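
To see how convolution and pooling layers change spatial dimensions, the small helper below applies the usual output-size formula, `floor((size + 2*padding - kernel_size) / stride) + 1`. The function name `conv_output_size` is hypothetical, and the layer settings (a 28x28 input, two 3x3 convolutions with padding 1, each followed by 2x2 max pooling, and 64 final channels) are assumed for illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                            # input height/width
after_conv1 = conv_output_size(size, kernel_size=3, stride=1, padding=1)
after_pool1 = conv_output_size(after_conv1, kernel_size=2, stride=2)
after_conv2 = conv_output_size(after_pool1, kernel_size=3, stride=1, padding=1)
after_pool2 = conv_output_size(after_conv2, kernel_size=2, stride=2)

# Number of features handed to the first fully connected layer (64 channels assumed)
flattened = 64 * after_pool2 * after_pool2

print(after_conv1, after_pool1, after_conv2, after_pool2)  # 28 14 14 7
print(flattened)                                           # 3136
```

Padding of 1 with a 3x3 kernel preserves the spatial size, while each 2x2 pooling step halves it, which is why a flattened size of the form `64 * (image_size // 4) * (image_size // 4)` is a natural fit for this layout.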

In [36]:
class DeepCNN(BaseNN):
    def __init__(self, input_channels, num_classes, image_size):
        hidden_size = 64

        super(DeepCNN, self).__init__(input_size=image_size, hidden_size=hidden_size, output_size=num_classes)

        self.conv1 = nn.Conv2d(in_channels=input_channels, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(64 * (image_size // 4) * (image_size // 4), hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * (self.input_size // 4) * (self.input_size // 4))
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [37]:
# Example usage
input_channels = 1  # Grayscale images
num_classes = 10
image_size = 28

deep_cnn_model = DeepCNN(input_channels, num_classes, image_size)

sample_image = torch.rand((1, input_channels, image_size, image_size))

output_scores = deep_cnn_model(sample_image)

print(deep_cnn_model)
print("Output Scores:", output_scores.detach().numpy())

DeepCNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=3136, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)
Output Scores: [[-0.11650774  0.14896058 -0.14006138 -0.06969805 -0.08172397 -0.12495886
  -0.13444492  0.15795682 -0.12663546  0.01591007]]


# Deconvolutional Network (DN)

Objective:
    
    -Reconstruction & generation of structured data (especially images)
    -specialize in capturing hierarchical spatial patterns

Unique Features:
    
    -deconvolutional layers w/ learnable filters for reconstructing spatial patterns
    -usually include unpooling layers to increase spatial dimensions while maintaining computational efficiency
    -shared weights in deconvolutional layers to introduce translation invariance during reconstruction
    
Common uses:
    
    -image reconstruction & generation
    -feature learning in spatial data w/ focus on capturing hierarchical spatial patterns
    -semantic segmentation in images
    -inverse problems (e.g. image restoration)
    -transfer learning (pre-trained on larger dataset -> fine-tuned for specific reconstruction task)

In [38]:
class DeepDeconvNet(BaseNN):
    def __init__(self, input_channels, output_channels, output_size):
        hidden_size = 64

        super(DeepDeconvNet, self).__init__(input_size=None, hidden_size=hidden_size, output_size=output_size)

        self.fc1 = nn.Linear(hidden_size, 64 * (output_size // 4) * (output_size // 4))
        self.deconv1 = nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv2 = nn.ConvTranspose2d(in_channels=32, out_channels=output_channels, kernel_size=3, stride=2, padding=1, output_padding=1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = x.view(-1, 64, (self.output_size // 4), (self.output_size // 4))
        x = F.relu(self.deconv1(x))
        x = F.sigmoid(self.deconv2(x))
        return x

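Notes on the parameters above: 

    channels: in the context of images, 1 = grayscale, 3 = RGB
    kernel_size: size of the convolutional kernel (filter), i.e. the local region considered in each convolution operation
    stride: step size with which the kernel moves across the input
    padding: in Conv2d, zero-padding added to each side of the input, which helps maintain/adjust spatial dimensions; in ConvTranspose2d, a larger padding shrinks the output instead
    output_size: height and width of the (square) output image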
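
The way these parameters combine in a transposed convolution can be checked with the output-size formula `(size - 1) * stride - 2 * padding + kernel_size + output_padding` (dilation of 1 assumed). The helper name `conv_transpose_output_size` is hypothetical; the settings mirror the two `ConvTranspose2d` layers above.

```python
def conv_transpose_output_size(size, kernel_size, stride=1, padding=0, output_padding=0):
    """Spatial size after a transposed convolution (dilation of 1)."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding

output_size = 28
start = output_size // 4   # spatial size after reshaping the fully connected output

after_deconv1 = conv_transpose_output_size(start, kernel_size=3, stride=2,
                                           padding=1, output_padding=1)
after_deconv2 = conv_transpose_output_size(after_deconv1, kernel_size=3, stride=2,
                                           padding=1, output_padding=1)

print(start, after_deconv1, after_deconv2)  # 7 14 28
```

Each layer doubles the spatial size, so two layers take a 7x7 feature map back to 28x28. Without `output_padding=1` each layer would produce an odd size (13, then 25) and the final image would fall short of 28x28.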

In [39]:
# Example usage
input_channels = 1  # Grayscale images
output_channels = 3  # Number of channels in the output image (e.g., RGB)
output_size = 28

deep_deconv_model = DeepDeconvNet(input_channels, output_channels, output_size)

sample_latent_vector = torch.rand((1, deep_deconv_model.hidden_size))

output_image = deep_deconv_model(sample_latent_vector)

print(deep_deconv_model)
print("Output Image Shape:", output_image.shape)

DeepDeconvNet(
  (fc1): Linear(in_features=64, out_features=3136, bias=True)
  (deconv1): ConvTranspose2d(64, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
  (deconv2): ConvTranspose2d(32, 3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
)
Output Image Shape: torch.Size([1, 3, 28, 28])


# Deep Convolutional Inverse Graphics Network (DCIGN)

Objective:

    -Inverting image rendering process to understand & reconstruct 3D structure from 2D images
    -Specializes network for specific tasks involving 3D object manipulation & scene understanding

Unique Features:

    -Combines convolutional layers for feature extraction w/ inverse graphics layers for 3D reconstruction
    -Capable of learning interpretable, manipulable representations of image elements
    -Adaptable architecture for varying levels of detail and types of 3D reconstruction
    
Common uses:
   
    -3D object reconstruction from 2D images in computer vision and graphics
    -Pose estimation for objects and characters in images
    -comprehensive scene understanding for robotics & autonomous navigation
    -applications in AR & VR for real-time image manipulation
    -Image restoration & completion

In [40]:
class DCIGN(BaseNN):
    def __init__(self, input_channels, input_size, output_size):
        hidden_size = 64 

        super(DCIGN, self).__init__(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

        self.conv1 = nn.Conv2d(in_channels=input_channels, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Calculate the size of the flattened output after convolutional and pooling layers
        self.flattened_size = 64 * (input_size // 4) * (input_size // 4)

        self.fc1 = nn.Linear(self.flattened_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # Flatten the output for the fully connected layers
        x = x.view(-1, self.flattened_size)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [41]:
input_channels = 3  # for RGB images
input_size = 32  # Example size, adjust as needed
output_size = 10  # Example output size

dcign = DCIGN(input_channels, input_size, output_size)

sample_input = torch.randn(1, input_channels, input_size, input_size)

output = dcign(sample_input)

print("Output Tensor:", output)
print("Output Shape:", output.shape)

Output Tensor: tensor([[ 0.0539,  0.0224,  0.2151, -0.2104, -0.0529,  0.0575, -0.0496, -0.0005,
         -0.0895, -0.1289]], grad_fn=<AddmmBackward0>)
Output Shape: torch.Size([1, 10])


# Generative Adversarial Network (GAN)

Objective:

    -Generate images from random noise through adversarial training
    -Improve generative models' performance by pitting 2 networks against each other (generator & discriminator)

Unique Features:

    -Adversarial training: Uses 2 NN's (generator/discriminator) that are trained simultaneously through an adversarial process. 
        -Generator learns to produce increasingly realistic data
        -Discriminator learns to better distinguish between real & generated data
    
    -Feedback loop: Generator is updated based on the feedback from the discriminator, guiding it to produce more realistic outputs
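
The feedback loop can be made concrete by looking at the two losses involved. The sketch below is a NumPy illustration with made-up discriminator scores, not a training loop; it assumes the common binary cross-entropy formulation with the non-saturating generator loss, and the helper name `bce` is hypothetical.

```python
import numpy as np

def bce(predictions, targets, eps=1e-12):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    p = np.clip(predictions, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

# Hypothetical discriminator outputs (probability that a sample is real)
d_on_real = np.array([0.9, 0.8])   # scores for real samples
d_on_fake = np.array([0.3, 0.1])   # scores for generated samples

# Discriminator: push real scores toward 1 and fake scores toward 0
d_loss = bce(d_on_real, np.ones(2)) + bce(d_on_fake, np.zeros(2))

# Generator (non-saturating form): push the discriminator's fake scores toward 1
g_loss = bce(d_on_fake, np.ones(2))

print(f"discriminator loss: {d_loss:.3f}")
print(f"generator loss:     {g_loss:.3f}")
```

Here the discriminator is doing well (low loss) and the generator poorly (high loss). In training, the generator's gradient comes from `g_loss`, which depends only on how the discriminator scores generated samples.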
    
Common uses:
   
    -Image generation
    -Style transfer
    -Anomaly detection
    -Super resolution

## Generator Class

In [42]:
class Generator(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Generator, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

## Discriminator Class

In [43]:
class Discriminator(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Discriminator, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

## GAN class
Utilizes the generator and discriminator

In [44]:
class GAN(BaseNN):
    def __init__(self, generator, discriminator):
        super(GAN, self).__init__(input_size=None, hidden_size=None, output_size=None)
        self.generator = generator
        self.discriminator = discriminator

    def forward(self, x):
        generated_data = self.generator(x)
        discriminator_output = self.discriminator(generated_data)
        return generated_data, discriminator_output

In [45]:
# Example usage
input_size = 10
hidden_size = 128
output_size = 10

generator = Generator(input_size, hidden_size, output_size)
discriminator = Discriminator(output_size, hidden_size, 1)

gan_model = GAN(generator, discriminator)

sample_noise = torch.randn((1, input_size))

generated_data, discriminator_output = gan_model(sample_noise)

print(gan_model)
print("Generated Data:", generated_data)
print("Discriminator Output:", discriminator_output)

GAN(
  (generator): Generator(
    (fc1): Linear(in_features=10, out_features=128, bias=True)
    (fc2): Linear(in_features=128, out_features=10, bias=True)
  )
  (discriminator): Discriminator(
    (fc1): Linear(in_features=10, out_features=128, bias=True)
    (fc2): Linear(in_features=128, out_features=1, bias=True)
  )
)
Generated Data: tensor([[0.4952, 0.4792, 0.4761, 0.3429, 0.4809, 0.4092, 0.4601, 0.4996, 0.5085,
         0.5300]], grad_fn=<SigmoidBackward0>)
Discriminator Output: tensor([[0.5585]], grad_fn=<SigmoidBackward0>)


# Spiking Neural Network (SNN)


**Objective:**

- Mimic the biological processes of the human brain more closely than traditional artificial neural networks by using neurons that fire in discrete spikes.
- Process information through the timing of these spikes, enabling the network to efficiently handle spatiotemporal data and perform dynamic pattern recognition.

**Unique Features:**

- **Biologically Inspired:** Incorporates models of neurons that generate discrete spikes, a form of communication used in the biological nervous system.
- **Temporal Dynamics:** Capable of capturing and processing temporal information inherent in the input data through the sequence and timing of spikes.
- **Energy Efficiency:** Designed to be inherently more energy-efficient for certain computations, mirroring the energy efficiency seen in the human brain.
- **Learning Through Time:** Utilizes learning mechanisms such as Spike-Timing-Dependent Plasticity (STDP), allowing the network to adapt based on the timing between spikes.
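
To illustrate how spike timing carries information, here is a minimal pure-Python sketch of a leaky integrate-and-fire (LIF) neuron in discrete time. It is a textbook-style simplification; the function name `simulate_lif` and the threshold, leak, and input values are assumptions chosen for illustration.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Leaky integrate-and-fire neuron in discrete time.

    Returns a list of 0/1 spikes, one per time step.
    """
    potential = 0.0
    spikes = []
    for current in input_current:
        potential = leak * potential + current   # leak, then integrate the input
        if potential >= threshold:
            spikes.append(1)
            potential = reset                    # reset the membrane after firing
        else:
            spikes.append(0)
    return spikes

weak = simulate_lif([0.3] * 10)     # constant weak input
strong = simulate_lif([0.6] * 10)   # constant strong input

print("weak:  ", weak)
print("strong:", strong)
```

The stronger input makes the neuron fire more often (every 2nd step instead of every 4th here), so the input's magnitude is encoded in the timing and rate of spikes instead of in a continuous activation value.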

**Common Uses:**

- **Neurobiological Research:** Offers a platform for exploring theories of brain function and the principles underlying neural computation.
- **Sensory Processing:** Applied in processing and interpreting data from sensory inputs, such as visual and auditory systems, in a manner similar to biological systems.
- **Edge Computing:** Ideal for deployment in edge devices due to their low power consumption, where they can perform real-time data analysis.
- **Pattern Recognition:** Utilized in tasks requiring the detection of patterns over time, such as speech recognition or gesture analysis.
- **Robotic Control:** Empowers robots with the ability to process sensory inputs in real-time, leading to more adaptive and responsive behaviors.

In [46]:
class SNN(BaseNN):
    def __init__(self, input_size, hidden_size, output_size):
        super(SNN, self).__init__(input_size, hidden_size, output_size)
        # Define a simple linear layer to simulate neuron connections
        self.linear = nn.Linear(input_size, hidden_size)
        # Spike function: Heaviside step with a firing threshold of 0.5
        # (non-differentiable, so no gradient flows back through it)
        self.spike_fn = lambda x: torch.heaviside(x - 0.5, torch.tensor([0.0]))

    def forward(self, x):
        x = self.linear(x)
        x = self.spike_fn(x)  # Simulate spiking behavior
        return x

In [47]:
# Example usage
input_size = 10
hidden_size = 20
output_size = 10  # Output size is not used in this simplified example but included for consistency with the class definition

# Instantiate the SNN model
snn_model = SNN(input_size, hidden_size, output_size)

# Generate a sample input (batch size, input size)
sample_input = torch.randn((1, input_size))

# Forward pass through the SNN
spiked_output = snn_model(sample_input)

print("Sample Input:", sample_input)
print("Spiked Output:", spiked_output)

Sample Input: tensor([[ 0.0568, -0.0442,  0.9503, -0.6641, -0.7917,  0.8993,  0.7960,  1.4113,
         -1.6402, -1.1859]])
Spiked Output: tensor([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
         1., 0.]], grad_fn=<NotImplemented>)


Output explanation: The output tensor is of size *hidden size* with values either 0 or 1, representing whether each neuron in the hidden layer fired (1) or did not fire (0) based on the simplified spike function.

This example demonstrates instantiation and basic usage; however, it is a highly abstracted version, simplified for practicality and to limit scope.

# Liquid State Machine (LSM)

Objective:
    
    -Process time-varying inputs
    -Utilize high-dimensional transient states (liquid states) induced by input stimuli for computation, allowing the network to perform temporal pattern recognition and time-series prediction

Unique Features:
    
    -Utilizes a network of spiking neurons to create a responsive state to input stimuli
    -Specializes in handling sequences and temporal patterns
    -Flexible readout layer interprets the reservoir's state for varied tasks
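
The split between a fixed reservoir and a trained readout can be sketched with NumPy. Note the swap: for brevity this example uses rate-based `tanh` units (closer to an echo state network) in place of spiking neurons, so it illustrates the reservoir-plus-readout idea, not a faithful LSM. The reservoir size, weight ranges, and the sine-wave task are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_reservoir = 50
W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, 1))             # fixed input weights
W_res = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))  # fixed recurrent weights
# Rescale so the largest eigenvalue magnitude is below 1 (fading memory)
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))

def run_reservoir(inputs):
    """Collect the reservoir state after each input; the reservoir is never trained."""
    state = np.zeros(n_reservoir)
    states = []
    for u in inputs:
        state = np.tanh(W_in[:, 0] * u + W_res @ state)
        states.append(state)
    return np.array(states)

# Toy task: predict the next value of a sine wave
signal = np.sin(0.2 * np.arange(301))
inputs, targets = signal[:-1], signal[1:]

states = run_reservoir(inputs)
washout = 50   # discard early states influenced by the zero initial state

# Only the linear readout is fit, here by ordinary least squares
readout, *_ = np.linalg.lstsq(states[washout:], targets[washout:], rcond=None)

predictions = states[washout:] @ readout
mse = float(np.mean((predictions - targets[washout:]) ** 2))
print("Readout training MSE:", mse)
```

Because the reservoir weights stay fixed, fitting a different task only requires a new least-squares solve for the readout over the same recorded states.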

Common Uses:
    
    -Neurobiological simulations and understanding brain functions
    -Speech and gesture recognition for interactive systems
    -Time-series forecasting in finance & weather prediction
    -Robotic sensory processing for adaptive control
    -Biometric authentication through pattern analysis

In [48]:
class LSM(SNN):
    def __init__(self, input_size, reservoir_size, output_size):
        super(LSM, self).__init__(input_size, reservoir_size, output_size)
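        # In a true LSM, the reservoir is a fixed, randomly connected network of spiking neurons.
        # Here, we simulate it with a single (trainable, non-spiking) RNN layer for simplicity.
        # Note: the linear layer and spike function inherited from SNN are not used in forward().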
        self.reservoir = nn.RNN(input_size, reservoir_size, batch_first=True)
        # The readout layer
        self.readout = nn.Linear(reservoir_size, output_size)

    def forward(self, x):
        # Process input through the simplified 'reservoir'
        reservoir_state, _ = self.reservoir(x)
        
        # Assuming the last state as the representation
        reservoir_state = reservoir_state[:, -1, :]
        output = self.readout(reservoir_state)
        return output

In [49]:
# Example usage
input_size = 10
reservoir_size = 128
output_size = 1

# Instantiate the LSM model
lsm_model = LSM(input_size, reservoir_size, output_size)

# Generate a sample input (batch size, sequence length, input size)
# Let's create a batch of 5 sequences, each of length 7 (time steps) with 10 features
sample_input = torch.randn((5, 7, input_size))

# Forward pass through the LSM
output = lsm_model(sample_input)

print("LSM Model:", lsm_model)
print("Output Shape:", output.shape)
print("Output:", output)

LSM Model: LSM(
  (linear): Linear(in_features=10, out_features=128, bias=True)
  (reservoir): RNN(10, 128, batch_first=True)
  (readout): Linear(in_features=128, out_features=1, bias=True)
)
Output Shape: torch.Size([5, 1])
Output: tensor([[ 0.0739],
        [ 0.2955],
        [-0.0266],
        [-0.0481],
        [ 0.0263]], grad_fn=<AddmmBackward0>)


### Expected Output and Understanding

- **LSM Model:** This print statement displays the structure of the LSM model: the RNN (reservoir), the readout linear layer, and the `linear` layer inherited from `SNN`, which `forward` does not use.
- **Output Shape:** Since the readout layer's output size is 1, and you're processing a batch of 5 sequences, the output shape should be `[5, 1]`, indicating that for each sequence in the batch, you get a single output value.
- **Output:** This will show the actual output values from the LSM. These values are generated by processing the synthetic sequential data through the LSM's reservoir and readout layer.

This example is a straightforward demonstration meant to illustrate how you might set up and use an LSM model with PyTorch for sequence processing tasks. The synthetic data doesn't represent a specific real-world problem, but in practice, you could adapt this setup to work on tasks like time-series forecasting, sequence classification, or any problem where understanding temporal dynamics is crucial.

# Extreme Learning Machine

**High-Level Overview**

Extreme Learning Machines (ELMs) represent an innovative class of single-hidden layer feedforward neural networks (SLFNs) that streamline the learning process by randomly assigning input weights and biases, focusing instead on analytically determining the output weights. This unique approach reduces training complexity and time, making ELMs particularly suitable for rapid prototyping and handling large or noisy datasets efficiently.
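
The idea of fixing random hidden weights and solving for the output weights analytically fits in a few lines of NumPy. This is a minimal sketch under assumptions: a `tanh` activation, a Moore-Penrose pseudoinverse solve, 30 hidden nodes, and a toy `sin(x)` regression task; the helper names `elm_fit` and `elm_predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=30):
    """Fit an ELM: random fixed hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                 # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                      # output weights via pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy regression task: y = sin(x)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X[:, 0])

W, b, beta = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print("Training MSE:", mse)
```

There is no iterative gradient descent here: training is a single linear solve, which is where the speed advantage comes from. A ridge-regularized solve is a common alternative to the plain pseudoinverse when the data is noisy.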

**Data Type**

ELMs are versatile, capable of processing:
- Numerical
- Time-series
- Images
- Continuous data

This adaptability makes them applicable across a broad spectrum of data-intensive fields.

**Task Objective**

ELMs excel in:
- Classification
- Regression
- Feature Learning

Their fast learning speed and high efficiency position them as a powerful tool for both predictive modeling and data representation tasks.

**Scalability**

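ELMs scale well in practice because training reduces to a single linear least-squares solve for the output weights. The cost grows with the number of training samples and, more steeply, with the number of hidden nodes, which can be adjusted to trade accuracy against compute. This attribute is pivotal for applications in big data, where the volume and dimensionality of data can significantly impact computational performance.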

**Robustness to Noise**

ELMs can be fairly robust to noise, particularly when the output weights are fit with regularization (for example, ridge regression), which makes them practical in real-world scenarios where data quality varies. This helps ELMs maintain reasonable performance even when data is imperfect or incomplete.

**Implementation Variants**

Several variants of ELMs have been developed to cater to specific needs, including:
- **Kernel ELM (KELM):** Offers enhanced capabilities for non-linear problem solving.
- **Online Sequential ELM (OSELM):** Ideal for dynamic environments where data is available in sequences or streams.

**Practical Application Guidance**

**When to Use ELMs:**
- Rapid model development is required.
- Dealing with large or noisy datasets.
- The task involves linear or non-linear problems where quick training is beneficial.

**Considerations:**
- While ELMs offer significant advantages in terms of speed and simplicity, they may not be the best fit for tasks requiring deep interpretability of model decisions. 
- For highly unstructured data, such as raw text or images that necessitate deep learning techniques, exploring other neural network models might yield better results.

### Conclusion

Extreme Learning Machines offer a unique combination of speed, efficiency, and versatility, making them a valuable addition to the neural network toolkit. By understanding their capabilities, implementation variants, and practical applications, researchers and practitioners can effectively leverage ELMs to address a wide range of challenges in data analysis and predictive modeling.