# Manual weight extraction and NN calculations
Weights in a NN are randomly initialized. As such, if a model fails to train, reinitializing the weights and running the training steps again can sometimes remedy the issue.

Random initialization is ubiquitous, but drawing from an arbitrary fixed distribution is often an ineffective initialization. Much effort has gone into determining how best to initialize weights. One effective solution is Xavier (Glorot) initialization ([Glorot & Bengio, 2010](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)). This algorithm still uses a PRNG. However, it draws from a distribution centered on 0 (either Gaussian or uniform), and the standard deviation of that distribution depends on the number of connections into and out of the current layer, such that
$$
\sigma_i = \sqrt{\frac{2}{n_{\text{in}, i} + n_{\text{out}, i}}}
$$
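gives the standard deviation for the $i^\text{th}$ layer. Here $n_{\text{in}, i}$ and $n_{\text{out}, i}$ are the numbers of neurons feeding into and out of that layer. Note that in this context *layer* refers to a *layer of weights*, not of neurons: a typical NN with $N$ layers of neurons has $N-1$ layers of weights.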
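As a minimal NumPy sketch (not Keras's own implementation), the per-layer standard deviation can be computed for a 2-2-1 network like the XOR model used later in this section, and then used to draw each weight matrix. The uniform variant of Xavier initialization samples from $U(-l, l)$ with $l = \sqrt{6/(n_\text{in} + n_\text{out})}$. This has the same variance, since $\text{Var}(U(-l, l)) = l^2/3$. The helper names `xavier_std`, `xavier_uniform_limit` and `xavier_normal` are hypothetical, chosen for illustration.

```python
import numpy as np

def xavier_std(n_in, n_out):
    # sigma_i = sqrt(2 / (n_in + n_out)) for one layer of weights
    return np.sqrt(2.0 / (n_in + n_out))

def xavier_uniform_limit(n_in, n_out):
    # the uniform variant samples from U(-limit, limit); limit**2 / 3 == sigma**2
    return np.sqrt(6.0 / (n_in + n_out))

def xavier_normal(n_in, n_out, rng):
    # weight matrix of shape (n_in, n_out): zero mean, Xavier standard deviation
    return rng.normal(loc=0.0, scale=xavier_std(n_in, n_out), size=(n_in, n_out))

# neurons per layer for the XOR network: 2 inputs, 2 hidden, 1 output
layer_sizes = [2, 2, 1]
rng = np.random.default_rng(seed=0)

# N = 3 layers of neurons gives N - 1 = 2 layers of weights
for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
    W = xavier_normal(n_in, n_out, rng)
    print(f"weight layer {i}: shape {W.shape}, "
          f"sigma = {xavier_std(n_in, n_out):.3f}, "
          f"uniform limit = {xavier_uniform_limit(n_in, n_out):.3f}")
```

For this network the first weight layer gets $\sigma \approx 0.707$ and the second $\sigma \approx 0.816$. The narrower output layer gets the wider distribution.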

TensorFlow provides [keyword arguments](https://keras.io/api/layers/initializers/) for specifying initializers. The available algorithms in `tf.keras.initializers` include:

- `RandomNormal(mean=0.0, stddev=0.05, seed=None)`
Normal distribution.
- `RandomUniform(minval=-0.05, maxval=0.05, seed=None)`
Uniform distribution between `minval` and `maxval`.
- `TruncatedNormal(mean=0.0, stddev=0.05, seed=None)`
Like a normal distribution, except that values more than two standard deviations ($2 \sigma$) from the mean are discarded and redrawn.
- `Zeros()`
All zero.
- `Ones()`
All ones.
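- `GlorotNormal(seed=None)`
Xavier normal: truncated normal distribution centered on 0 with $\sigma = \sqrt{2 / (n_\text{in} + n_\text{out})}$.
- `GlorotUniform(seed=None)`
Xavier uniform: uniform distribution over $[-l, l]$ with $l = \sqrt{6 / (n_\text{in} + n_\text{out})}$.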
- `Identity(gain=1.0)`
Generates a 2D identity matrix, multiplied by `gain`.
- `Orthogonal(gain=1.0, seed=None)`
Random orthogonal matrix, multiplied by `gain`.
- `Constant(value=0)`
Constant value.
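The redraw rule described for `TruncatedNormal` can be sketched in NumPy as rejection sampling. This illustrates the rule only; it is not TensorFlow's implementation, and the function name `truncated_normal` is hypothetical.

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=0.05, rng=None):
    """Sample a normal distribution, redrawing any value more than
    2 standard deviations from the mean (rejection sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.normal(mean, stddev, size=shape)
    # boolean mask of values outside [mean - 2*stddev, mean + 2*stddev]
    outside = np.abs(samples - mean) > 2 * stddev
    while outside.any():
        # redraw only the rejected entries, then re-check them
        samples[outside] = rng.normal(mean, stddev, size=outside.sum())
        outside = np.abs(samples - mean) > 2 * stddev
    return samples

w = truncated_normal((3, 4), stddev=0.05, rng=np.random.default_rng(42))
print(w.shape, np.abs(w).max() <= 0.1)
```

Because the tails are cut off, the spread of the surviving values is somewhat narrower than `stddev` (roughly 0.88 times it at a $2\sigma$ cut-off).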

You can also create custom initializers with functions or classes (see the above link for documentation).


These are passed to a Keras layer, e.g.
```py
initializer = tf.keras.initializers.RandomNormal()
layer = tf.keras.layers.Dense(2, kernel_initializer=initializer)
```
Keras layers also accept a `bias_initializer` keyword argument for initializing the biases in the same way. By default, `Dense` uses `GlorotUniform` for the kernel and `Zeros` for the biases.

## Manual weight calculations
For illustrative purposes, we will train a neural network on a logic gate (XOR). This is a bit like cracking a walnut with a sledgehammer.

```py
import tensorflow as tf
import numpy as np
```

```py
# XOR truth table: inputs and expected outputs
x = np.array([
    [0, 0], [1, 0], [0, 1], [1, 1]
])

y = np.array(
    [0, 1, 1, 0]
)
```

```py
model = tf.keras.models.Sequential([
    # hidden layer: 2 neurons with ReLU activation
    tf.keras.layers.Dense(
        2,
        input_dim=x.shape[1],
        activation='relu'
    ),
    # output layer: 1 neuron; no activation argument, so it is linear
    tf.keras.layers.Dense(1)
])

model.compile(loss='mean_squared_error', optimizer='adam')
```

```py
model.fit(x, y, verbose=0, epochs=10000)
```
```
<tensorflow.python.keras.callbacks.History at 0x14c619f10>
```

```py
predictions = model.predict(x)
print(predictions)
```
```
[[8.6809308e-08]
 [9.9999988e-01]
 [9.9999994e-01]
 [2.9486669e-08]]
```


*Note:* I had to discard the first trained version of the above model and run the exact same code again to obtain a much better model. The hazards of random initialization are real.
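The model was trained to minimize the mean squared error, $\frac{1}{n}\sum_j (y_j - \hat{y}_j)^2$. As a quick sketch, that loss can be recomputed by hand from the predictions printed above (copied in as literals). The outputs can also be thresholded at 0.5 to recover the XOR truth table. The 0.5 threshold is an assumption for illustration; the model itself applies no threshold.

```python
import numpy as np

y = np.array([0, 1, 1, 0])
# predictions printed by model.predict(x) above, copied as literals
predictions = np.array([8.6809308e-08, 9.9999988e-01, 9.9999994e-01, 2.9486669e-08])

# mean squared error, the loss passed to model.compile
mse = np.mean((y - predictions) ** 2)
print(f"MSE = {mse:.2e}")

# threshold at 0.5 to turn the regression output into a 0/1 class
print((predictions > 0.5).astype(int))  # [0 1 1 0]
```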

### Dumping weights
We will now use each layer's `get_weights()` method to extract the weights and biases of each layer in the NN:

```py
extracted_weights = []
extracted_biases = []

for i, layer in enumerate(model.layers):
    # get_weights() returns [kernel, bias]; the kernel has shape (n_in, n_out)
    weights, biases = layer.get_weights()
    print(f"# -- LAYER {i} ---------------- #")

    print(" - biases")
    # bias j feeds the jth neuron in layer i+1
    for j, bias in enumerate(biases):
        print(f"{i}B   \t->\tL{i+1}N{j}: {bias:.3f}")
        extracted_biases.append(bias)

    print(" - weights")
    # kernel rows index the source neuron, columns the target neuron
    for j_from, w1 in enumerate(weights):
        for j_to, w2 in enumerate(w1):
            print(f"L{i}N{j_from} \t->\tL{i+1}N{j_to}: {w2:.3f}")
            extracted_weights.append(w2)
    print("")
```

```
# -- LAYER 0 ---------------- #
 - biases
0B   	->	L1N0: 0.000
0B   	->	L1N1: -0.000
 - weights
L0N0 	->	L1N0: 1.276
L0N0 	->	L1N1: -1.023
L0N1 	->	L1N0: -1.276
L0N1 	->	L1N1: 1.023

# -- LAYER 1 ---------------- #
 - biases
1B   	->	L2N0: 0.000
 - weights
L1N0 	->	L2N0: 0.784
L1N1 	->	L2N0: 0.977
```
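`get_weights()` returns each kernel with shape `(n_in, n_out)`, so the nested loop above walks it row by row. `extracted_weights` is therefore the row-major flattening of each kernel in turn. Within a layer's block, the weight from neuron $j$ into neuron $k$ of the next layer sits at position $j \cdot n_\text{out} + k$. The sketch below checks this with the rounded values printed above. They are typed in by hand, so they only approximate the real weights.

```python
import numpy as np

# kernels as printed above: row = source neuron, column = target neuron
W0 = np.array([[1.276, -1.023],
               [-1.276, 1.023]])
W1 = np.array([[0.784],
               [0.977]])

# same traversal as the extraction loop: for each row, for each column
flat = [w for W in (W0, W1) for row in W for w in row]

# identical to concatenating the C-order (row-major) flattenings
assert np.allclose(flat, np.concatenate([W0.ravel(), W1.ravel()]))

# weight from input neuron j=1 to hidden neuron k=0 sits at j * n_out + k = 2
print(flat[1 * 2 + 0])  # -1.276
```

In particular, the two weights feeding hidden neuron 0 are at positions 0 and 2 of `extracted_weights`, not 0 and 1.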



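We will now use the extracted weights to manually calculate the output of the NN. For the $k^\text{th}$ neuron in the next layer, the weighted sum is
$$
S_k = \left( \sum_i n_i \cdot w_{ik} \right) + b_k
$$
where $n_i$ is the value of the $i^\text{th}$ neuron in the current layer, $w_{ik}$ is the weight connecting it to neuron $k$, and $b_k$ is the bias of neuron $k$. In the hidden layer this is followed by ReLU activation
$$
m_k = \max(0, S_k)
$$
giving the value of the $k^\text{th}$ neuron in the next layer. The output layer (`Dense(1)` with no `activation` argument) is linear, so its output is simply $S_k$.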
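In matrix form, a whole hidden layer is $\mathbf{m} = \max(0, \mathbf{n} W + \mathbf{b})$. The NumPy sketch below applies this to all four XOR inputs at once, using the rounded weights printed earlier. Like the model, it applies ReLU on the hidden layer and leaves the output layer linear (the `Dense` default). Because the weights are rounded to three decimals, the results only approximate `model.predict`.

```python
import numpy as np

x = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

# rounded parameters from the dump above (biases printed as 0.000)
W0 = np.array([[1.276, -1.023], [-1.276, 1.023]])
b0 = np.array([0.0, 0.0])
W1 = np.array([[0.784], [0.977]])
b1 = np.array([0.0])

def forward(x):
    hidden = np.maximum(0, x @ W0 + b0)  # ReLU hidden layer
    return hidden @ W1 + b1              # linear output layer

print(forward(x).ravel().round(3))
```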

```py
wgt = extracted_weights  # alias to make typing easier
bis = extracted_biases

def manual_predict(x):
    i_0, i_1 = x

    # extracted_weights is row-major per kernel: for the first layer,
    # wgt[j * 2 + k] is the weight from input j to hidden neuron k
    h_0_sum = (i_0 * wgt[0]) + (i_1 * wgt[2]) + bis[0]
    h_1_sum = (i_0 * wgt[1]) + (i_1 * wgt[3]) + bis[1]

    print(f"hidden sum 0 = {h_0_sum:.3f}")
    print(f"hidden sum 1 = {h_1_sum:.3f}")

    # relu activation (hidden layer only)
    h_0 = max(0, h_0_sum)
    h_1 = max(0, h_1_sum)

    print(f"hidden 0 = {h_0:.3f}")
    print(f"hidden 1 = {h_1:.3f}")

    out_sum = (h_0 * wgt[4]) + (h_1 * wgt[5]) + bis[2]

    print(f"output sum = {out_sum:.3f}")

    # the output layer has no activation (linear), so no relu here
    output = out_sum

    print(f"-- output = {output:.3f}")

manual_predict(x[1])
```
```
hidden sum 0 = 1.276
hidden sum 1 = -1.023
hidden 0 = 1.276
hidden 1 = 0.000
output sum = 1.000
-- output = 1.000
```
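The input $(1, 1)$ is where the hidden layer does the real work. As a worked check using the rounded weights and zero biases printed above, the two contributions to each hidden sum cancel. Both ReLUs therefore output zero, and the linear output reduces to the (near-zero) output bias:
$$
\begin{aligned}
S_0 &= 1 \cdot 1.276 + 1 \cdot (-1.276) + 0 = 0, & m_0 &= \max(0, 0) = 0,\\
S_1 &= 1 \cdot (-1.023) + 1 \cdot 1.023 + 0 = 0, & m_1 &= \max(0, 0) = 0,\\
\text{output} &= 0 \cdot 0.784 + 0 \cdot 0.977 + 0 = 0.
\end{aligned}
$$
This is consistent with the prediction of about $2.9 \times 10^{-8}$ that `model.predict` gave for this input.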
