<a href="https://colab.research.google.com/github/GerardoMunoz/ML_2025/blob/main/Boltzmann_Machine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Boltzmann Machine: A Stochastic Hopfield Network**

## **Introduction**
A Boltzmann Machine is a type of stochastic recurrent neural network similar to a Hopfield Network, but with two major differences:

1. **Hidden Neurons** – Unlike Hopfield Networks, which consist only of visible neurons, Boltzmann Machines (BMs) can include hidden neurons, which let them model more complex data distributions.  
2. **Learning Algorithm** – Instead of a deterministic weight update rule like the one Hopfield Networks use, BMs learn by **stochastic gradient descent (SGD)** on the negative log-likelihood, with the required expectations estimated by **Gibbs sampling**.  



## **Network Structure**

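A Boltzmann Machine has two kinds of neurons:

- **Visible** neurons ($ V_i $), which represent the observed data.
- **Hidden** neurons ($ H_i $), which capture latent features.

Every neuron is connected to every other neuron, with no self-connections. The connection weights are symmetric ($ W_{ij} = W_{ji} $) and, together with the biases, determine the probability of each network state.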
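
As a minimal sketch of this connectivity, the parameters can be stored as one weight matrix over all neurons. The function name `init_weights`, the Gaussian initialization, and the `scale` value are illustrative assumptions; symmetric weights are the standard convention for Boltzmann Machines.

```python
import numpy as np

def init_weights(n_visible, n_hidden, scale=0.1, seed=0):
    """Return a symmetric weight matrix with zero diagonal and a zero bias vector."""
    rng = np.random.default_rng(seed)
    n = n_visible + n_hidden          # visible neurons first, then hidden neurons
    A = rng.normal(0.0, scale, size=(n, n))
    W = (A + A.T) / 2.0               # symmetric: W[i, j] == W[j, i]
    np.fill_diagonal(W, 0.0)          # no self-connections
    b = np.zeros(n)                   # one bias per neuron
    return W, b

W, b = init_weights(n_visible=4, n_hidden=2)
print(W.shape, b.shape)
```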

## **Energy Function**
The energy of a given state $ s $ in a Boltzmann Machine is defined as:  

$$
E(s) = -\sum_{i<j} W_{ij} s_i s_j - \sum_i b_i s_i
$$

where:
- $ W_{ij} $ are the weights between neurons,
- $ s_i $ is the state (0 or 1) of neuron $ i $,
- $ b_i $ is the bias term.
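
The energy formula can be evaluated directly with NumPy. This is a small sketch that assumes `W` is stored as a full symmetric matrix; the example values of `W`, `b`, and the state are made up for illustration.

```python
import numpy as np

def energy(s, W, b):
    """E(s) = -sum_{i<j} W_ij s_i s_j - sum_i b_i s_i."""
    s = np.asarray(s, dtype=float)
    # np.triu(W, k=1) keeps only entries with i < j, so each pair is counted once
    pair_term = s @ np.triu(W, k=1) @ s
    return -pair_term - b @ s

# Illustrative 3-neuron network (values are arbitrary)
W = np.array([[0.0, 1.0, -2.0],
              [1.0, 0.0, 0.5],
              [-2.0, 0.5, 0.0]])
b = np.array([0.1, -0.2, 0.3])

E = energy([1, 1, 0], W, b)
print(E)
```

Only the pair of active neurons (0 and 1) and their two biases contribute to this state's energy; lower energy corresponds to a more probable state.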

## **Training via Gibbs Sampling**
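Gibbs sampling draws states from the distribution the network currently defines; training uses these samples to estimate the model statistics.

- Initialize the neurons with random states.
- Pick a neuron $ i $ and compute the probability that it turns on, given the current states of all other neurons, using the **sigmoid function**:

  $$
  P(s_i = 1 \mid s_{j \neq i}) = \frac{1}{1 + e^{- (\sum_{j \neq i} W_{ij} s_j + b_i )}}
  $$

- Set $ s_i $ to 1 with that probability and to 0 otherwise.
- Repeat over all neurons until the chain reaches its equilibrium distribution.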
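
The sampling steps above can be sketched as follows. The helper names `sigmoid` and `gibbs_sweep`, the random visiting order, and the fixed count of 100 sweeps (used in place of a real convergence check) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(s, W, b, rng):
    """Resample every neuron once, in place, in random order."""
    for i in rng.permutation(len(s)):
        # W[i, i] is 0, so the sum effectively runs over j != i
        p_on = sigmoid(W[i] @ s + b[i])
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

rng = np.random.default_rng(0)
n = 5
A = rng.normal(0.0, 0.5, size=(n, n))
W = (A + A.T) / 2.0            # symmetric weights
np.fill_diagonal(W, 0.0)       # no self-connections
b = np.zeros(n)

s = rng.integers(0, 2, size=n).astype(float)  # random initial state
for _ in range(100):                          # fixed number of sweeps (assumption)
    gibbs_sweep(s, W, b, rng)
print(s)
```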

## **Probabilistic Expectation in BM Weight Updates**  
In a **Boltzmann Machine (BM)**, weights are updated using the expectation over the probability distribution of the states:

$$
\Delta W_{ij} = \eta \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right)
$$

where:  
- $ \eta $ is the learning rate.
- $ \langle s_i s_j \rangle_{\text{data}} $ is the expectation **under the data distribution**.
- $ \langle s_i s_j \rangle_{\text{model}} $ is the expectation **under the model's learned distribution**.

Instead of using the product $ s_i s_j $ from a single state, each expectation averages it over a probability distribution of states:

$$
\langle s_i s_j \rangle_{\text{data}} = \sum_{s} P_{\text{data}}(s) \cdot s_i s_j
$$

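where $ P_{\text{data}}(s) $ represents the probability of a particular state $ s $ occurring in the data.  
The model expectation is defined the same way with $ P_{\text{model}}(s) $, but the sum over all possible states is intractable for all but tiny networks, so it is approximated with samples drawn by **Gibbs Sampling**.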
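
A hedged sketch of one weight update is shown below. To keep it short it assumes a fully visible network (no hidden neurons), so the data expectation is a plain average over training vectors; the function names, the burn-in length, and the number of samples are illustrative choices, and biases are left unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_expectation(states):
    """Average of s_i * s_j over the rows of `states` (shape: samples x neurons)."""
    states = np.asarray(states, dtype=float)
    return states.T @ states / len(states)

def sample_model(W, b, n_samples, rng, burn_in=50):
    """Draw approximate model samples with Gibbs sampling."""
    n = len(b)
    s = rng.integers(0, 2, size=n).astype(float)
    samples = []
    for sweep in range(burn_in + n_samples):
        for i in range(n):
            s[i] = float(rng.random() < sigmoid(W[i] @ s + b[i]))
        if sweep >= burn_in:          # discard early sweeps
            samples.append(s.copy())
    return np.array(samples)

def weight_update(data, W, b, eta, rng, n_samples=200):
    """Delta W = eta * (<s_i s_j>_data - <s_i s_j>_model), with a zero diagonal."""
    data_term = pair_expectation(data)
    model_term = pair_expectation(sample_model(W, b, n_samples, rng))
    dW = eta * (data_term - model_term)
    np.fill_diagonal(dW, 0.0)         # no self-connections to update
    return dW

rng = np.random.default_rng(0)
data = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])  # toy dataset
W = np.zeros((3, 3))
b = np.zeros(3)
dW = weight_update(data, W, b, eta=0.1, rng=rng)
print(dW)
```

In this toy dataset neurons 0 and 1 are active together more often than the untrained model predicts, so their shared weight receives a positive update.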


---



# **Restricted Boltzmann Machine (RBM): A Simplified Boltzmann Machine**

## **RBM vs. Boltzmann Machine**
RBMs are a special type of Boltzmann Machine with one key restriction:  
 **No intra-layer connections** – meaning there are **no connections between hidden neurons** and **no connections between visible neurons**.  

This restriction makes training **much more efficient** than for general BMs, because all units in one layer can be sampled in parallel given the other layer.


---

## **RBM Structure & Energy Function**
### **Network Structure**
- **Visible Layer** ($ V $): Represents input data (e.g., pixels in an image).
- **Hidden Layer** ($ H $): Captures latent features.
- **Weights** ($ W $): Connect visible and hidden layers, but **no connections within each layer**.

### **Energy Function**  
(Same idea as BMs but simplified due to no intra-layer connections):

$$
E(V, H) = - \sum_{i,j} V_i W_{ij} H_j - \sum_i b_i V_i - \sum_j c_j H_j
$$

where:
- $ W_{ij} $ is the weight between visible neuron $ V_i $ and hidden neuron $ H_j $,
- $ b_i $ and $ c_j $ are biases for visible and hidden layers.
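
In matrix form the RBM energy is $ -V^\top W H - b^\top V - c^\top H $. A minimal sketch, assuming `W` has shape `(n_visible, n_hidden)` and using arbitrary example values:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """E(V, H) = -sum_ij V_i W_ij H_j - sum_i b_i V_i - sum_j c_j H_j."""
    v = np.asarray(v, dtype=float)
    h = np.asarray(h, dtype=float)
    return -(v @ W @ h) - b @ v - c @ h

# Illustrative RBM with 3 visible and 2 hidden units (values are arbitrary)
W = np.array([[1.0, -1.0],
              [0.5, 2.0],
              [0.0, 1.0]])
b = np.array([0.0, 1.0, -1.0])   # visible biases
c = np.array([0.5, -0.5])        # hidden biases

E = rbm_energy([1, 0, 1], [0, 1], W, b, c)
print(E)
```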

### **Training: Contrastive Divergence (CD)**
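Instead of running **Gibbs Sampling** until equilibrium, RBMs are usually trained with **Contrastive Divergence (CD)**, which runs only a few sampling steps (often just one) starting from the data:
1. **Forward Pass:** Compute the hidden activation probabilities given the visible layer and sample hidden states.
2. **Reconstruct Visible Layer:** Use the hidden states to generate a reconstructed visible layer.
3. **Update Weights:** Change each weight in proportion to the difference between the visible-hidden correlations measured on the original data and on the reconstruction.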
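
Steps 1 and 2 can be sketched as a single function. The name `reconstruct`, the choice to sample binary states at both layers, and the small random initial weights are assumptions for illustration; some implementations use the probabilities directly instead of samples.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, W, b, c, rng):
    """One step v -> h -> v'. Returns sampled hidden states and the reconstruction."""
    p_h = sigmoid(v @ W + c)                               # step 1: P(H_j = 1 | V)
    h = (rng.random(p_h.shape) < p_h).astype(float)        # sample binary hidden states
    p_v = sigmoid(h @ W.T + b)                             # step 2: P(V_i = 1 | H)
    v_recon = (rng.random(p_v.shape) < p_v).astype(float)  # sample the reconstruction
    return h, v_recon

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

v = np.array([1, 0, 1, 1, 0, 0], dtype=float)
h, v_recon = reconstruct(v, W, b, c, rng)
print(h, v_recon)
```

Because no unit depends on another unit in the same layer, each layer is sampled with one vectorized operation rather than neuron by neuron.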

---

## **Probabilistic Activation in RBM Training**  
In an **RBM**, the bipartite structure makes the units within a layer **conditionally independent** given the other layer, which simplifies the weight updates. With $ V $ the data vector and $ V' $ its reconstruction, the update rule is:

$$
\Delta W_{ij} = \eta \left( P(H_j = 1 | V) V_i - P(H_j = 1 | V') V'_i \right)
$$

where:
- $ P(H_j = 1 | V) $ is the probability of hidden unit $ H_j $ being active, given visible units:
  
  $$
  P(H_j = 1 | V) = \sigma \left( \sum_i V_i W_{ij} + c_j \right)
  $$

- $ P(V_i = 1 | H) $ is the probability of visible unit $ V_i $ being active, given hidden units:
  
  $$
  P(V_i = 1 | H) = \sigma \left( \sum_j H_j W_{ij} + b_i \right)
  $$

Here, the weight update **does not use the sampled binary hidden states** but rather their **activation probabilities**, computed once from the data $ V $ and once from the reconstruction $ V' $.
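
The update rule can be written in a few lines of NumPy. This sketch assumes the reconstruction $ V' $ has already been computed, processes a batch of row vectors, and averages the update over the batch (the averaging and the name `cd_weight_update` are assumptions); bias updates are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_weight_update(V, V_recon, W, c, eta):
    """Delta W_ij = eta * (P(H_j=1|V) V_i - P(H_j=1|V') V'_i), averaged over the batch."""
    p_h_data = sigmoid(V @ W + c)          # P(H_j = 1 | V), shape (batch, n_hidden)
    p_h_recon = sigmoid(V_recon @ W + c)   # P(H_j = 1 | V')
    positive = V.T @ p_h_data              # data term, shape (n_visible, n_hidden)
    negative = V_recon.T @ p_h_recon       # reconstruction term
    return eta * (positive - negative) / len(V)

# Illustrative values: 2 visible units, 3 hidden units, untrained weights
W = np.zeros((2, 3))
c = np.zeros(3)
V = np.array([[1.0, 0.0]])         # one data vector
V_recon = np.array([[0.0, 1.0]])   # a hypothetical reconstruction of it

dW = cd_weight_update(V, V_recon, W, c, eta=1.0)
print(dW)
```

With zero weights every hidden probability is 0.5, so the weights from the visible unit that was on in the data increase and those from the unit that was on only in the reconstruction decrease. When the reconstruction equals the data, the update is zero.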


