**The wake-sleep algorithm for unsupervised neural networks** <br>
<br>
Geoffrey E Hinton, Peter Dayan, Brendan J Frey, Radford M Neal

**Introduction** <br>
Artificial neural networks are based on simplified models of the neurons in the brain, and natural brain processes often shape not only the low-level idea of processing information via neurons but also the higher-level structure of the model. The wake-sleep algorithm for unsupervised neural networks, published in April 1995, is an example of this: its main idea is inspired by the natural wake and sleep periods and their impact on learning. What makes this algorithm especially interesting, however, is that it is based on unsupervised learning, whereas most neural networks deal with supervised learning tasks.
<br>
<br>
In the following, we reimplement and explain the ideas behind the algorithm and evaluate its performance on the widely used MNIST dataset. To do so, we start with the smallest unit, the neuron, and end with the overall network structure. 

<font color='red'>TODO: Introduction</font> <br>
**General Idea**

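- Inspiration: the alternation of wake and sleep phases. In the wake phase, the recognition connections drive the units and the generative connections learn; in the sleep phase, the generative connections produce "fantasies" and the recognition connections learn
- No teacher is needed: instead, learning minimizes the description length, i.e. the number of bits needed to communicate the hidden representation plus the number of bits needed to communicate the input given that representation
- Bottom-up (recognition model): create a hidden representation of the input
- Top-down (generative model): reconstruct the input from the hidden representation and compare it with the actual input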
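The description-length idea in the second bullet can be made concrete with a small sketch. Assuming each binary unit is communicated with a code whose cost is $-\log_2$ of the probability the model assigns to its actual state, a binary vector costs $-\sum_i [s_i \log_2 p_i + (1 - s_i)\log_2(1 - p_i)]$ bits. The function names and toy numbers below are hypothetical and only serve as an illustration:

```python
import numpy as np


def bernoulli_bits(states, probs):
    """Bits to communicate binary `states` when unit i is on with probability probs[i] (0 < probs < 1)."""
    states = np.asarray(states, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(states * np.log2(probs) + (1 - states) * np.log2(1 - probs)))


def description_length(hidden, hidden_probs, visible, visible_probs):
    """Bits for the hidden representation plus bits for the input given that representation."""
    return bernoulli_bits(hidden, hidden_probs) + bernoulli_bits(visible, visible_probs)


hidden = [1, 0]
visible = [1, 1, 0, 0]
# Uninformative model: every unit costs exactly one bit
print(description_length(hidden, [0.5, 0.5], visible, [0.5, 0.5, 0.5, 0.5]))  # 6.0
# Better reconstruction of the input from the hidden representation: fewer bits
print(description_length(hidden, [0.5, 0.5], visible, [0.9, 0.9, 0.1, 0.1]))
```

With all probabilities at 0.5, each unit costs exactly one bit; a generative model that predicts the input better from the hidden representation lowers the second term.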

**Binary Stochastic Neurons**<br>

Even though the specific type of stochastic neuron does not matter for the network structure, this implementation uses binary stochastic neurons for simplicity. Each unit takes the state 1 with the probability given by (1) and the state 0 otherwise:

$$\text{Prob}(s_v = 1) = \frac{1}{1 + \exp\left(-b_v - \sum_u s_u w_{uv}\right)} \qquad (1)$$

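In (1), $b_v$ is the bias of unit $v$, $s_u$ is the binary state of a unit $u$ feeding into $v$, and $w_{uv}$ is the weight of the connection from $u$ to $v$. In the implementation, each neuron stores its input weights, its bias, its firing probability and its state (either 0 or 1). Because the number of weights depends on the size of the input, the input size is a required argument of the initialization. <br>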
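As a quick numeric check of (1), the sketch below evaluates the firing probability of one unit for hand-picked (hypothetical) weights, bias and input states, and then samples its binary state. It uses NumPy's `Generator.binomial` with a single trial, which is a Bernoulli draw:

```python
import numpy as np


def firing_probability(bias, weights, inputs):
    """Formula (1): Prob(s_v = 1) = 1 / (1 + exp(-b_v - sum_u s_u w_uv))."""
    return 1.0 / (1.0 + np.exp(-bias - np.dot(inputs, weights)))


inputs = np.array([1, 0, 1])           # states s_u of the units feeding into v
weights = np.array([0.5, -1.0, 0.25])  # weights w_uv
bias = -0.75                           # bias b_v

p = firing_probability(bias, weights, inputs)
print(p)  # 0.5, because b_v + sum_u s_u w_uv = -0.75 + 0.5 + 0.25 = 0

rng = np.random.default_rng(0)
state = rng.binomial(1, p)               # the binary state: a single Bernoulli draw
draws = rng.binomial(1, p, size=10_000)  # many draws to check the frequency
print(state, draws.mean())               # state is 0 or 1; the mean is close to 0.5
```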
<font color='red'>TODO: Initialization + comments about initialization</font>

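```python
import numpy as np


class Stochastic_Neuron:
    def __init__(self, input_size):
        # one weight per input unit, drawn from N(0, 0.5^2)
        self.input_weights = np.random.normal(loc=0, scale=0.5, size=input_size)
        self.bias = 0.0
        self.state = 0   # binary state s_v
        self.prob = 0.0  # firing probability from formula (1)
```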

Formula (1) is the activation function for each neuron and results in a probability.

```python
    def activation_probability(self, inputs):
        # Formula (1): 1 / (1 + exp(-b_v - sum_u s_u w_uv)), computed with NumPy
        self.prob = 1.0 / (1.0 + np.exp(-self.bias - np.dot(self.input_weights, inputs)))
        return self.prob
```

However, as the neuron is binary, this probability still has to be converted into 0 or 1. This is achieved by sampling from a Bernoulli distribution (a binomial distribution with a single trial) with that probability.

```python
    def __call__(self, inputs):
        # Sample the binary state from a Bernoulli distribution with p = Prob(s_v = 1)
        self.state = np.random.binomial(1, self.activation_probability(inputs))
        return self.state
```

<font color='red'>Include examples?</font>

**Layer**
<br>
<br>
At the next level of the network, the neurons are combined into layers as usual. Each layer is a list of neurons, all initialized with the same input size. 

```python
def layer(input_size, layer_size):
    return [Stochastic_Neuron(input_size) for _ in range(layer_size)]
```
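Storing each neuron as its own object mirrors the math closely but is slow for MNIST-sized inputs. A common alternative, sketched here as an assumption rather than as part of this implementation, stores the weight vectors of all neurons of a layer as the columns of one matrix, so that (1) is evaluated for the whole layer with a single matrix product. `VectorizedLayer` is a hypothetical name:

```python
import numpy as np


class VectorizedLayer:
    """Hypothetical vectorized counterpart of layer(): column j holds neuron j's input weights."""

    def __init__(self, input_size, layer_size, rng):
        self.weights = rng.normal(loc=0.0, scale=0.5, size=(input_size, layer_size))
        self.biases = np.zeros(layer_size)
        self.rng = rng

    def probabilities(self, inputs):
        # Formula (1) for all neurons of the layer with one matrix product
        return 1.0 / (1.0 + np.exp(-self.biases - inputs @ self.weights))

    def __call__(self, inputs):
        probs = self.probabilities(inputs)
        states = self.rng.binomial(1, probs)  # one Bernoulli draw per neuron
        return states, probs


rng = np.random.default_rng(42)
demo_layer = VectorizedLayer(input_size=6, layer_size=4, rng=rng)
x = rng.binomial(1, 0.5, size=6)
states, probs = demo_layer(x)
print(states.shape, probs.shape)  # (4,) (4,)
```

Each column of `weights` plays the role of one `Stochastic_Neuron.input_weights`, so the results match a loop over neurons while avoiding Python-level iteration.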

Up to this point, the network modules are similar to those of a standard ANN. In the network class, they are put together according to the idea of wake and sleep phases:

1. Build the whole network from multiple layers, each with its own number of neurons. The layer sizes are defined in a list: the input size of each layer equals the size of the previous layer, or the input size of the network in the case of the first layer.

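```python
class Network:

    def __init__(self, layer_sizes):
        # layer_sizes[0] is the input size; layer idx maps layer_sizes[idx] inputs to layer_sizes[idx + 1] neurons
        self.layers = [layer(layer_sizes[idx], layer_sizes[idx + 1]) for idx in range(len(layer_sizes) - 1)]
```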
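To illustrate how `layer_sizes` translates into layers, the following sketch lists the (input size, number of neurons) pair of every layer. It assumes hypothetical hidden sizes of 100 and 10 for MNIST's 784 pixels, and it assumes the generative network is built from the same list in reverse order, since it maps the top layer back down to the input:

```python
def layer_shapes(layer_sizes):
    """(input size, number of neurons) of every layer, following Network.__init__."""
    return [(layer_sizes[i], layer_sizes[i + 1]) for i in range(len(layer_sizes) - 1)]


recognition_sizes = [784, 100, 10]          # hypothetical: 784 pixels -> 100 -> 10 hidden units
generative_sizes = recognition_sizes[::-1]  # assumption: same sizes, traversed top-down

rec_shapes = layer_shapes(recognition_sizes)
gen_shapes = layer_shapes(generative_sizes)
print(rec_shapes)  # [(784, 100), (100, 10)]
print(gen_shapes)  # [(10, 100), (100, 784)]

# Number of weights per network (biases excluded)
print(sum(m * n for m, n in rec_shapes))  # 79400
```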

2. Because information is processed the same way, layer by layer, in the recognition and the generative model (only the direction differs), a single Network class suffices for both. Starting with the input, the information travels through the neurons of the whole network. Since learning needs the states of all neurons, they are stored in a list. Even though the computation is based on the binary states, the probabilities are stored as well, for the loss calculation.



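```python
    def infer(self, inputs):
        # Propagate the input layer by layer and keep the binary states of every layer
        states = [inputs]
        activations = []  # firing probabilities, only needed for the loss
        for layer in self.layers:
            states.append(np.array([neuron(states[-1]) for neuron in layer]))
            activations.append([neuron.prob for neuron in layer])

        # The last layer is dropped so that the recognition and generative probabilities
        # of the same hidden layers line up in calculate_loss
        self.network_probs = activations[:-1]
        return states
```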
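The `infer` method can also be written with one weight matrix per layer. In this vectorized sketch, the function name `infer_vectorized` and the layout of `weights[i]` with shape (size of layer i, size of layer i + 1) are assumptions; it returns the sampled states of every layer (input first) together with the firing probabilities:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def infer_vectorized(weights, biases, inputs, rng):
    """Return the binary states of all layers (input first) and the firing probabilities."""
    states = [np.asarray(inputs, dtype=float)]
    probs = []
    for W, b in zip(weights, biases):
        p = sigmoid(b + states[-1] @ W)    # formula (1) for the whole layer
        probs.append(p)
        states.append(rng.binomial(1, p))  # Bernoulli sample per unit
    return states, probs


rng = np.random.default_rng(0)
sizes = [8, 5, 3]  # hypothetical: 8 input units, hidden layers of 5 and 3 units
weights = [rng.normal(0.0, 0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
x = rng.binomial(1, 0.5, size=sizes[0])

states, probs = infer_vectorized(weights, biases, x, rng)
print([s.shape for s in states])  # [(8,), (5,), (3,)]
```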

3. The learning function is the core of the wake-sleep algorithm. The updates are not based on the information of a single model: they depend on the states inferred by the other model and on the probabilities computed by the model being updated. In the wake phase, the generative weights are updated with formula (2): $$ \Delta w_{kj} = \epsilon s^\alpha_k(s^\alpha_j - p^\alpha_j)$$
<br>
and in the sleep phase, the recognition weights are updated with formula (3):
<br>
<br>
$$ \Delta w_{jk} = \epsilon s^\gamma_j(s^\gamma_k - q^\gamma_k)$$
<br>
The learning rate $\epsilon$ is the same in both phases. The superscripts $\alpha$ and $\gamma$ denote states produced in the wake phase (by the recognition model) and in the sleep phase (by the generative model). These states $s$ are inferred by the network that is not being updated: the factor outside the parentheses ($s_k$ in (2), $s_j$ in (3)) is the state of the layer that feeds into the connection, and the state inside the parentheses belongs to the receiving layer. The probabilities $p$ (generative model) and $q$ (recognition model), in contrast, are computed by the network that is being updated, given those states.

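```python
    def learning(self, other_network_states, epsilon):
        # other_network_states: states inferred by the *other* network, ordered in this
        # network's direction (e.g. reversed recognition states when the generative network learns)
        for idx, layer in enumerate(self.layers):
            prior_states = np.asarray(other_network_states[idx], dtype=float)  # outside the parentheses
            layer_states = other_network_states[idx + 1]                       # inside the parentheses

            for j, neuron in enumerate(layer):
                # p / q: probability computed by *this* network given the other network's states
                prob = neuron.activation_probability(prior_states)
                neuron.input_weights += epsilon * prior_states * (layer_states[j] - prob)
```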
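Putting formulas (2) and (3) together, one complete wake-sleep iteration can be sketched in vectorized form. This is a minimal sketch, not the notebook's implementation: it keeps separate recognition (`rec_W`, `rec_b`) and generative (`gen_W`, `gen_b`) parameters, assumes a factorial prior over the top layer with hypothetical biases `top_bias`, and, unlike `learning` above, also updates the biases by treating each one as a weight from an always-on unit. The toy data set of two 6-pixel patterns is made up for illustration:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample(p, rng):
    # Bernoulli sample for every unit, returned as floats
    return rng.binomial(1, p).astype(float)


def wake_sleep_step(rec_W, rec_b, gen_W, gen_b, top_bias, x, epsilon, rng):
    """One wake phase followed by one sleep phase; all parameters are updated in place.

    rec_W[i]: (size_i, size_{i+1}) bottom-up weights w_jk
    gen_W[i]: (size_{i+1}, size_i) top-down weights w_kj into layer i
    top_bias: generative biases of the top layer (its prior)
    """
    n_layers = len(rec_W)

    # Wake phase: the recognition model drives the states ...
    s = [np.asarray(x, dtype=float)]
    for W, b in zip(rec_W, rec_b):
        s.append(sample(sigmoid(b + s[-1] @ W), rng))
    # ... and the generative model learns to reproduce them, formula (2)
    top_bias += epsilon * (s[-1] - sigmoid(top_bias))
    for i in range(n_layers):
        p = sigmoid(gen_b[i] + s[i + 1] @ gen_W[i])  # p_j for layer i
        gen_W[i] += epsilon * np.outer(s[i + 1], s[i] - p)
        gen_b[i] += epsilon * (s[i] - p)

    # Sleep phase: the generative model produces a fantasy ...
    f = [None] * (n_layers + 1)
    f[n_layers] = sample(sigmoid(top_bias), rng)
    for i in reversed(range(n_layers)):
        f[i] = sample(sigmoid(gen_b[i] + f[i + 1] @ gen_W[i]), rng)
    # ... and the recognition model learns to recover it, formula (3)
    for i in range(n_layers):
        q = sigmoid(rec_b[i] + f[i] @ rec_W[i])  # q_k for layer i + 1
        rec_W[i] += epsilon * np.outer(f[i], f[i + 1] - q)
        rec_b[i] += epsilon * (f[i + 1] - q)


rng = np.random.default_rng(0)
sizes = [6, 4, 2]  # hypothetical toy sizes
rec_W = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
rec_b = [np.zeros(n) for n in sizes[1:]]
gen_W = [rng.normal(0.0, 0.1, size=(n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
gen_b = [np.zeros(m) for m in sizes[:-1]]
top_bias = np.zeros(sizes[-1])

data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
for _ in range(2000):
    x = data[rng.integers(len(data))]
    wake_sleep_step(rec_W, rec_b, gen_W, gen_b, top_bias, x, epsilon=0.05, rng=rng)
```

`np.outer(s_k, s_j - p_j)` builds the whole matrix of updates $\epsilon s_k (s_j - p_j)$ for all pairs $k, j$ at once, which replaces the per-neuron loop of `learning`.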

**Loss**

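```python
def calculate_loss(rec_probs, gen_probs, eps=1e-7):
    # Summed KL divergence KL(gen || rec) between the Bernoulli firing probabilities of the
    # generative and the recognition network, compared layer by layer
    loss = 0.0
    for l_rec, l_gen in zip(rec_probs, gen_probs[::-1]):
        # clip to avoid log(0) and division by zero for saturated units
        l_rec = np.clip(np.array(l_rec, dtype=float), eps, 1 - eps)
        l_gen = np.clip(np.array(l_gen, dtype=float), eps, 1 - eps)
        loss += np.sum(l_gen * np.log(l_gen / l_rec) + (1 - l_gen) * np.log((1 - l_gen) / (1 - l_rec)))
    return loss
```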
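`calculate_loss` sums, over all units, the Kullback-Leibler divergence between two Bernoulli distributions, with the generative probabilities as the first argument. The standalone sketch below (`bernoulli_kl` is a hypothetical helper) evaluates that term for plain probability vectors, so its behavior can be checked on known values:

```python
import numpy as np


def bernoulli_kl(p, q, eps=1e-7):
    """Sum over units of KL(Bernoulli(p) || Bernoulli(q)), the term used in calculate_loss."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    return float(np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))))


print(bernoulli_kl([0.2, 0.7], [0.2, 0.7]))  # 0.0: identical distributions
print(bernoulli_kl([0.5], [0.25]))           # 0.5 * ln(4/3), about 0.1438
```

The divergence is zero only when both probability vectors agree, and it is not symmetric, so swapping `rec_probs` and `gen_probs` changes the loss.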