<a target="_blank" href="https://colab.research.google.com/github/davalpez/MyKaggle/blob/MyKaggle/Tensorflow%20basics/TF-architecture.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


# In depth architecture and structure for basic Neural Networks

In other notebooks of this section, we have taken a look at how neural networks help us with different regression and classification problems. I would like to go a bit in depth to see exactly in this structure we have built, to understand better how the networks evolve when we train it and comprehend better what goes behind each step of the training.

Recalling from our TF-regression notebook, the structure of a basic NN could look like this :

![image.png](attachment:image.png)

As we can see, each layer has a certain amount of neurons that is fully interconnected with the neurons from previous and posterior layers.

When in our notebooks we store a model we have created, we are saving this information about out model:

* The weight values
* The model's architecture
* The model's training configuration (what you pass to the .compile() method)
* The optimizer and its state, if any (this enables you to restart training where you left off)

Let's go through, one by one, the information we save to understand better the work we are saving here.



### Model's architecture.

It is not the first time we plot our model. One part of information we save into the keras file is the number of layers, output shape of each layer and the number of parameters on each layer.

Let's start first for bringing back some example that we have worked with. I would bring back the last model saved from regression, but to see better how the propagation works, we will recreate some data and create a model with just 3 layers and a few neurons to see how the process goes.

In [1]:
# Recreate some data from TF-regression for our experiment.
import numpy as np

def OurLinearCorrelation(x):
    '''Function to create linear correlation on a dependant variable
    '''
    return ((2*x)+3)

x_large = np.arange(-100,50,2)
y_large = OurLinearCorrelation(x_large)

We create batches for training and testing the network.

In [2]:
y_test = y_large[:16]
x_test = x_large[:16]
y_train = y_large[16:]
x_train = x_large[16:]

Let's have a look to what the data looks like. As we see large values, it is recommended to normalize these. We will try to first make the neural network work as we do in tensorflow, but keep in mind that networks work better with smaller values, so in the future we will always try to make these normalized.

In [3]:
x_train.max(),x_test.max(),y_train.max(),y_test.max()

(48, -70, 99, -137)

In [4]:
x_train.min(),x_test.min(),y_train.min(),y_test.min(),

(-68, -100, -133, -197)

Let's now build a neural network as we have done before for regression with tensorflow, to see the architecture we aim to follow and the results we expect to have. First, we import the tensorflow library.

In [5]:
import tensorflow as tf
print(tf.__version__)

2025-01-27 13:11:04.381790: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 13:11:04.417894: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 13:11:04.496568: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-27 13:11:04.582976: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-27 13:11:04.583191: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 13:11:04.670658: I tensorflow/core/platform/cpu_feature_guard.cc:

2.16.2


For this excercise, I want to show how the weight and biases change overtime. This is possible to do on Tensorflow with callbacks. If we need to obtain any other information in Tensorflow while debuggin a new network, this is a good way to verify what is happening inside our network.

In [6]:
class PrintWeightsCallback(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        print(f"Batch {batch + 1}:")
        for layer in self.model.layers:
            weights = layer.get_weights()  # Get weights and biases of the layer
            if weights: 
                print(f"Weights of {layer.name}: {weights[0]}") 
                print(f"Biases of {layer.name}: {weights[1]}") 
                print(f"output of layer of {layer.name}: {layer.output}") 


Now, we create a simple neural network as we have done before. See how this time, we are using this callback function we created in the class PrintWeightCallback to see how the weights evolve overtime in our network.

In [7]:
# Simple NN for regression

# Set random seed for weight initialization.
tf.random.set_seed(10)

# 1. Create our model

model_1 = tf.keras.Sequential([
  tf.keras.layers.Dense(2),
  tf.keras.layers.Dense(2),
  tf.keras.layers.Dense(1)
])

# 2. Compile the model

model_1.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"])

# 3. Fit the model

model_1.fit(tf.expand_dims(x_train, axis=-1), y_train, epochs=5,verbose=1,callbacks=[PrintWeightsCallback()])

Epoch 1/5


2025-01-27 13:11:09.432164: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-01-27 13:11:09.432724: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


Batch 1:
Weights of dense: [[ 0.5336266 -1.4189976]]
Biases of dense: [-0.00139105  0.00149671]
output of layer of dense: <KerasTensor shape=(None, 2), dtype=float32, sparse=False, name=keras_tensor_1>
Weights of dense_1: [[-0.8687154  -0.73066896]
 [ 1.1269513   1.3052276 ]]
Biases of dense_1: [0.00084217 0.00074181]
output of layer of dense_1: <KerasTensor shape=(None, 2), dtype=float32, sparse=False, name=keras_tensor_2>
Weights of dense_2: [[-1.4687964]
 [-1.3610137]]
Biases of dense_2: [-0.000625]
output of layer of dense_2: <KerasTensor shape=(None, 1), dtype=float32, sparse=False, name=keras_tensor_3>
[1m1/2[0m [32m━━━━━━━━━━[0m[37m━━━━━━━━━━[0m [1m0s[0m 602ms/step - loss: 27.0496 - mae: 27.0496Batch 2:
Weights of dense: [[-0.10713542 -0.45049715]]
Biases of dense: [ 0.00449521 -0.00740027]
output of layer of dense: <KerasTensor shape=(None, 2), dtype=float32, sparse=False, name=keras_tensor_1>
Weights of dense_1: [[-0.6475075  -0.52569366]
 [ 0.5387329   0.7601736 ]]
Bi

<keras.src.callbacks.history.History at 0x7fc7c25f5d30>

And this are the layers and the parameters we have in our simple neural network.

In [8]:
# Plot the shapes of the network
model_1.summary()

These parameters shown in our summary corresond to the weights and biases from our model.

* In the first dense layer, we have 2 weights + 2 biases.

* In the second layer, we have 2 weights from the previous layer multiplied by the 2 weights from that second layer, plus 2 biases from the second layer, making it 6 parameters.

* In the third layer, it is 2 weights from the previous layer multiplied by only 1 weight from that output layer, plus one bias from that output layer, making it 3 parameters.

Optimizer parameters are parameters created for the optimizer. In this case, we are using the [SGD optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD). These two parameters correspond to `learning rate` and the other one for a `step counter` maintained by the optimizer.

### Weight values with Forward and Back Propagation

As we explained in "the basics" notebook, each layer is formed by neurons. Each of these neurons are going to have weights, set as random values. We set the random seed at the beginning to ensure that these weights are still randomized but makes it easier for us to reproduce our experiments.

In [9]:
tf.random.set_seed(10)  # Set seed for reproducibility
initializer = tf.keras.initializers.GlorotUniform()
weights_1 = initializer(shape=(3, 3))

tf.random.set_seed(10)  # Reset seed
weights_2 = initializer(shape=(3, 3))

print("Random seed weights:")
print(weights_1)
print(weights_2)

Random seed weights:
tf.Tensor(
[[ 0.561846   -0.2693448  -0.0558188 ]
 [ 0.61675906 -0.9262366   0.6609192 ]
 [ 0.42180896 -0.9863157   0.13641763]], shape=(3, 3), dtype=float32)
tf.Tensor(
[[ 0.561846   -0.2693448  -0.0558188 ]
 [ 0.61675906 -0.9262366   0.6609192 ]
 [ 0.42180896 -0.9863157   0.13641763]], shape=(3, 3), dtype=float32)


These neurons output consist on the formula:

$z=W⋅x+b$
Where :

* $W$: Weight vector for the neuron.
* $x$: Input vector from the previous layer ( $x=a^{prev}  f(z)$).
* $b$: Bias for the neuron.
* $z$: Weighted sum before applying activation.

Our Z is passed through our activation function. If we don't specify anything when creating a layer, like in the some basic models we have built in our TF-regression notebook, we would have a passthrough function ( or also known as [linear function](https://www.tensorflow.org/api_docs/python/tf/keras/activations/linear)). In our multiclassification network, We have set [RelU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU).

$a=f(z)$

So, when we calculate the first predictions ( these probability matrix that will end up forming the probability array in the output layer), our NN is only computing this function of $z$ and passing values. And this is called the `Forward Pass` .

Once we have a prediction, the loss is calculated depending the function we are using ( mse or mae are some possible uses). Once we have the loss value, we continue doing the `Backwards pass` or Backproagation. 

In this step, the weights and the biases are adjusted after calculating the gradiant descends following the formula:


$W ← w - η \small\frac{\partial L}{\partial W} $

​Where:
* W is the updated weight.
* w is the previous weight.
* η is the learning rate.
 
* $ \small\frac{\partial L}{\partial W}$
is the loss gradient with respect to the weight.


Similarly, for our biases would be 

$B ← b - η \small\frac{\partial L}{\partial b} $

Let's have a glimpse of how our array of weights and biases look in the model we created in this notebook.

In [10]:
# Create some list to store the arrays of weights and biases from each layer
weights_list=[]
biases_list=[]
for layer in model_1.layers:
    weights, biases = layer.get_weights()  # get_weights() returns [weights, biases]
    weights_list.append(weights)
    biases_list.append(biases)
print("All biases : ",biases_list)
print("All weights : ",weights_list)

All biases :  [array([ 0.00366361, -0.00774363], dtype=float32), array([-0.00071091, -0.00122146], dtype=float32), array([-0.00282407], dtype=float32)]
All weights :  [array([[ 0.33067733, -0.7626276 ]], dtype=float32), array([[-0.45214713, -0.32906038],
       [ 0.55247456,  0.7017479 ]], dtype=float32), array([[-0.713707 ],
       [-0.6563617]], dtype=float32)]


### Imitating forward propagation

Let's try reproducing this ourselves to see what goes on in this stage of the training. We will take just a piece of the data an emulate how would this data go when doing  the `forward pass` stage.

For making it a bit more clear what happens with the activation function in other models as in multiclassification model, we will use [RelU](https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu) as the our activation for our layers.

Let's first create a function to simulate the weight creation depending on the input.

In [11]:
def LayerCreation(input_size=1,neurons=2):
    '''
    
    Creates weight and biases values as we would have in a dense layer
    in tensorflow.

    Args:
    input_size : size of the expected input.
    neurons : amount of neurons in the layer.

    return:
    W_tensor : tensor with weights that compose the layer.
    b_tensor : tensor with the biases that compose the layer.
    '''
    initializer = tf.keras.initializers.Zeros()
    random_tensor_w1 = tf.random.Generator.from_seed(10) 
    W_tensor =  random_tensor_w1.normal(shape=(input_size,neurons))
    random_tensor_b1 = tf.random.Generator.from_seed(11) 
    b_tensor =  initializer(shape=(neurons,))
    return W_tensor,b_tensor

LayerCreation(1,2)

(<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[-0.29604465, -0.21134205]], dtype=float32)>,
 <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 0.], dtype=float32)>)

The tensorflow library already helps us with converting the inputs to tensors and to the dtypes we already need, but to show here step by step the evolution of the weights, we will also create a function to treat the input.

In [12]:
def PrepareInput(input):
    '''
    This function prepares the input array so it is ready to be introduced
    to the input layer we are creating as a simulation of a neural network.
    
    Args:
    input : input array we will feed to our first layer.

    return:
    prepared_input: tensor that is prepared to be introduced in the first layer.
    '''
    expanded_input = (tf.expand_dims(input, axis=-1))
    prepared_input = tf.cast(expanded_input, dtype=tf.float32)
    return prepared_input

Now, let's create what would be the 3 layer model for forward propagation but without recurring to the tensorflow models, to see closely how the weights and biases interact in the model with the input data.

In [13]:
# Input data
input_example = x_train[50:]
prepared_input = PrepareInput(input_example)

# Layer 1

W1tf,b1tf = LayerCreation(1,2)

# Layer 2

W2tf,b2tf = LayerCreation(2,2)

# Layer 3

W3tf,b3tf = LayerCreation(2,1)


In [14]:
print("W1tf :",W1tf)
print("b1tf :",b1tf)

W1tf : tf.Tensor([[-0.29604465 -0.21134205]], shape=(1, 2), dtype=float32)
b1tf : tf.Tensor([0. 0.], shape=(2,), dtype=float32)


Now that we have prepared the input data, and we have our layers defined with their own weights and biases, let's try to do the forward propagation for the first epoch.

In [15]:
# Forward propagation stage

def linear_activation(x):
    '''Linear activation that returns the same value.
    '''
    return (x)

def relu_activation(x):
    '''Relu activation that returns the element-wise maximum for 0 and the input value.
    '''
    return np.maximum(0,x)

def relu_derivative(x):
    '''Relu derivative that returns 1 if the input is greater than 0 or returns 0 otherwise.
    '''
    return np.where(x > 0, 1, 0)

def linear_derivative(x):
    '''Linear derivative returns one, as the derivative of x is equal to 1 always."
    '''
    return np.ones_like(x)

def forward_propagation(input_value,W,b,activation_function=linear_activation):
    ''' Calculates the activation values and predictions forwared through the neural network.
        Args:
        input_value : Initial input or output value from the previous layer.
        w : weight values of the layer's neurons.
        b : bias values of the layer's neurons.
        Return:
        Output value from the layer.
    '''
    Z = (input_value @ W) + b 
    A = activation_function(Z)
    return A


A_layer1 = forward_propagation((prepared_input), W1tf,b1tf)

A_layer2 = forward_propagation(A_layer1,W2tf,b2tf)

Prediction = forward_propagation(A_layer2,W3tf,b3tf)

# Note : You can use variable.shape to check that matrix sizes
# matches if you are not sure about how the activation or weights
# interact :)

The first epoch here would mean that for the array of X values we passed, all the Y values corresponding to those X values would be 0.19. Of course, this is the first pass and it is very far from what we expect, and that is why now we have to proceed with the backwards propagation.

In [16]:
def calculate_mae_error(prediction,true_value):
      ''' 
      This function calculates the mean average error from the predictions we made 
      and the expected value.
      Args:
      prediction: prediction value obtained in the forward pass.
      true_value: actual value from the input we used.
      Return:
      mean_errors: mean average error.
      '''
      m = true_value.shape[0]
      loss = tf.abs(true_value - prediction)
      mean_loss = tf.reduce_sum(loss) / m

      return mean_loss


We need to prepare the shape for the y_train now. We use PrepareInput function we created before to have the (9,1) shape needed for doing later the backward propagation.

In [17]:
output_example = y_train[50:]
true_value = PrepareInput(output_example)
true_value

<tf.Tensor: shape=(9, 1), dtype=float32, numpy=
array([[67.],
       [71.],
       [75.],
       [79.],
       [83.],
       [87.],
       [91.],
       [95.],
       [99.]], dtype=float32)>

In [18]:
mae = calculate_mae_error(Prediction,true_value)
mae

<tf.Tensor: shape=(), dtype=float32, numpy=81.83067>

We see we have a big error. This is completely normal, as for the first time we run the  `forward propagation`we dont have any information in our weights to know what the prediction should look like.

### Imitating backward propagation

Now that we have calculated the error, we can proceed with the `Backwards pass`. Bringing back the mathematical formula, this is how we update the weights for our dense layers


$W ← w - η \small\frac{\partial L}{\partial W}$

​Where:
* W is the updated weight.
* w is the previous weight.
* η is the learning rate.
* $ \small\frac{\partial L}{\partial W}$ is the loss gradient with respect to the weight.

We need to clarify now what is exactly the loss gradient. The loss gradient is obtained by multiplying the computed loss by the activation output from the previous layer transposed.

$ \small\frac{\partial L}{\partial W_n} = dZ_n ⋅ A^T_{n-1} $
​

In the third layer, the computed gradient is the mean averague value error between the true value and the prediction. 

$ dZ_3=f'( y_{pred} -y_{true} ) $


And how do we calculate the local gradient from the current layer ? we do it following this expression:

$  dZ_n =(dZ_{n+1}⋅W^T_{n+1})⋅f′(Z_n) $

​Where:
* $dZ_{n+1}$ is the gradient loss with the respect the output of the posterior layer.

* $W^T_{n+1}$ is the transposed weight matrix from the third layer.

* $f′(Z_n)$ is the derivative of the activation function from the output obtained from the current layer.

It is a bit cumbersome but I hope it makes sense, and now with the code written below it might be easier to see.

We specify the learning rate, and we choose a default η= 0.01. We only need to calculate the gradient loss for each layer and apply the derivative function to the output obtained from each layer in order to update our weights.

In [19]:
def ObtainLocalGradient(dZ_post,Z,post_layer_weight,derivative_function=linear_derivative):
    '''Function to calculate the local gradient for the first and second layer.

    Args:
    dZ_post : loss gradient from the layer that goes after the current layer.
    Z : result from the activation function we had in the forward propagation
        from this layer before.
    post_layer_weight: weights from the posterior layer.

    return:
    dZ : gradient of the loss function with respect to the current laayer.
    '''

    dA = tf.matmul(dZ_post,post_layer_weight,transpose_b=True)
    f_Z = derivative_function(Z)
    dZ = dA * f_Z

    return dZ

def UpdatedWeightValue(dZ,Aminus,weight,n=0.0001):
    '''Function used to calculate the updated value of the weights.

    Args:
    dZ: loss gradient
    n: learning rate
    x: weight 

    return:
    update: updated weights.
    '''


    dW= tf.matmul(Aminus,dZ,transpose_a=True)/ tf.cast(tf.shape(Aminus)[0], tf.float32) 
    weight -= n* dW

    return weight

def UpdatedBiasValue(dZ,bias,n=0.0001):
    '''Function used to calculate the updated value of the biases.
    
    Args:
    dZ: loss gradient
    n: learning rate
    x: weight or bias

    return:
    update: updated biases.
    '''

    db = tf.reduce_sum(dZ, axis=0, keepdims=True)
    bias -= n* db
    
    return bias


For  $dZ_{3}$, we have that the local gradient is  $f′(Z_3)$.
$(Z_3 = mae)$, and because the derivative of the linear function returns. 

In [20]:
dZ3= Prediction - true_value
dZ3

<tf.Tensor: shape=(9, 1), dtype=float32, numpy=
array([[-66.064545],
       [-70.00608 ],
       [-73.94761 ],
       [-77.889145],
       [-81.83068 ],
       [-85.77222 ],
       [-89.713745],
       [-93.65528 ],
       [-97.59682 ]], dtype=float32)>

In [21]:
A_layer2

<tf.Tensor: shape=(9, 2), dtype=float32, numpy=
array([[  2.7326677,  -8.254143 ],
       [  2.9034595,  -8.770027 ],
       [  3.074251 ,  -9.285911 ],
       [  3.2450428,  -9.801795 ],
       [  3.4158344, -10.3176775],
       [  3.5866263, -10.833563 ],
       [  3.7574182, -11.349445 ],
       [  3.9282095, -11.86533  ],
       [  4.0990014, -12.381214 ]], dtype=float32)>

In [22]:
# Layer 3

new_wtf3 = UpdatedWeightValue(dZ3,A_layer2,W3tf)
new_btf3 = UpdatedBiasValue(dZ3,b3tf)
print("new weights for layer 3: ",new_wtf3)
print("new bias for layer 3: ",new_btf3)

new weights for layer 3:  tf.Tensor(
[[-0.26764387]
 [-0.2971279 ]], shape=(2, 1), dtype=float32)
new bias for layer 3:  tf.Tensor([[0.0736476]], shape=(1, 1), dtype=float32)


If we do the same for the other two layers. First, we obtain the local gradient for the second layer

In [23]:
# Local gradient for the second layer

dZ2 = ObtainLocalGradient(dZ3,A_layer2,W3tf)
dZ2

<tf.Tensor: shape=(9, 2), dtype=float32, numpy=
array([[19.558054, 13.962216],
       [20.724926, 14.795229],
       [21.891794, 15.62824 ],
       [23.058664, 16.461252],
       [24.225534, 17.294264],
       [25.392406, 18.127275],
       [26.559275, 18.960287],
       [27.726145, 19.793299],
       [28.893015, 20.626312]], dtype=float32)>

In [24]:
#Layer 2

new_wtf2 = UpdatedWeightValue(dZ2,A_layer1,W2tf)
new_btf2 = UpdatedBiasValue(dZ2,b2tf)

print("new weights for layer 2: ",new_wtf2)
print("new bias for layer 2: ",new_btf2)


new weights for layer 2:  tf.Tensor(
[[-0.2668967  -0.19053374]
 [ 0.03143832  1.5313946 ]], shape=(2, 2), dtype=float32)
new bias for layer 2:  tf.Tensor([[-0.02180298 -0.01556484]], shape=(1, 2), dtype=float32)


And doing it for the first layer finally would update the weights for the first layer.

In [25]:
# Local gradient for the first layer

dZ1 = ObtainLocalGradient(dZ2,A_layer1,W2tf)

#Layer 1

new_wtf1 = UpdatedWeightValue(dZ1,prepared_input,W1tf)
new_btf1 = UpdatedBiasValue(dZ1,b1tf)

print("new weights for layer 1: ",new_wtf1)
print("new bias for layer 1: ",new_btf1)


new weights for layer 1:  tf.Tensor([[-0.25204194 -0.3189828 ]], shape=(1, 2), dtype=float32)
new bias for layer 1:  tf.Tensor([[ 0.00974416 -0.02383646]], shape=(1, 2), dtype=float32)


Now we would have updated all weights and complete a first epoch. This is what we are doing internally with the tensorflow library when we train a neural network : we pass through the data with the weights we have and using the activation function we have prefered to use for our experiment. Then, we obtain a prediction that we use to compare to the actual results we know beforehand with the real values. After obtaining this value, we calculate de error gradient and local gradient and pass it backwards, updating the values of the weights to maintain the weights that have helped to achieve the desired results and tune the ones that get our value off from the true values.

### Putting all together, running epochs

Let's try now to build up a function to run forward propagation and back propagation progressively, and see if we can manage to obtain similar results to the ones obtained when creating a model in tensorflow.

We create a class now that contains all the functions we have been creating before, so it is more organized and it is easier to call for the forward and back propagation processes. We still use the function we have created before, we just call them inside our class.

Notice how we also save variables in the class so it is also easy to pass internally certain variables to run a full epoch, and how we also create a method to train our new neural network class! :)

In [26]:
class OurNeuralNetwork:
    def __init__(self):
        """
        Initializes the neural network with given input, hidden, and output sizes.
        """

        # Call the function to initialize weights and biases
        self.initialize_weights()

    def initialize_weights(self):
        """
        Initializes the weights and biases for the 3 layers
        """
        # Layer 1

        self.W1,self.b1 = LayerCreation(1,2)

        # Layer 2

        self.W2,self.b2 = LayerCreation(2,2)

        # Layer 3

        self.W3,self.b3 = LayerCreation(2,1)

        self.initialize_predictions()

    def self_forward_propagation(self, input):
        """
        Performs forward propagation to get the predictions.

        Args:
        input : input array with values with (n,1) shape.
        
        """

        A_layer1 = forward_propagation(input, self.W1,self.b1)
        self.A_layer1 = A_layer1

        A_layer2 = forward_propagation(A_layer1,self.W2,self.b2)
        self.A_layer2 = A_layer2

        self.y_pred = forward_propagation(A_layer2,self.W3,self.b3)
        



    def self_backward_propagation(self,input, y_true):
        """
        Performs backward propagation to update the weights and biases and calculates
        the mae.
        
        Args:
        input : input array with values with (n,1) shape.
        y_true : array with the true values for the labels.

        return:
        mae : mean average error calculated from y_prediction and y_true.

        """

        mae = calculate_mae_error(self.y_pred,y_true)

        # Update values for layer 3

        dZ3= self.y_pred - y_true

        new_w3 = UpdatedWeightValue(dZ3,self.A_layer2,self.W3)
        new_b3 = UpdatedBiasValue(dZ3,self.b3)

        # Update values for layer 2

        dZ2 = ObtainLocalGradient(dZ3,self.A_layer2,self.W3)

        new_w2 = UpdatedWeightValue(dZ2,self.A_layer1,self.W2)
        new_b2 = UpdatedBiasValue(dZ2,self.b3)

        # Update values for layer 1

        dZ1 = ObtainLocalGradient(dZ2,self.A_layer1,self.W2)

        new_w1 = UpdatedWeightValue(dZ1,input,self.W1)
        new_b1 = UpdatedBiasValue(dZ1,self.b1)

        # Update values

        self.update_self_weights(new_w3,new_b3,new_w2,new_b2,new_w1,new_b1)
        self.initialize_predictions()
        
        return mae
    
    def update_self_weights(self,new_w3,new_b3,new_w2,new_b2,new_w1,new_b1):
        """
        Stores the new values of weights and biases.

        Args:
        new_wN : new weights calculated to be updated to the weight values of the N layer.
        new_bn : new biases calculated to be updated to the bias values of the N layer.

        """
        self.W3 = new_w3
        self.b3 = new_b3
        self.W2 = new_w2
        self.b2 = new_b2
        self.W1 = new_w1
        self.b1 = new_b1
    
    def train_network(self,input,y_true,number_epoch):
        """
        Runs a certain amount of epochs to train the network.

        Args:
        input : input array with values with (n,1) shape.
        y_true : array with the true values for the labels. 
        numer_epoch: number of epochs to run to train the network with the same values introduced as input and y_true.      
        
        """
        for epoch in range(number_epoch):

            self.self_forward_propagation(input)
            mae = self.self_backward_propagation(input,y_true)
            if (epoch + 1) % 10 == 0:
                # Print the loss every 10 iterations
                print(f"Epoch {epoch+1}/{number_epoch}, Loss: {mae}")
                
    def do_prediction(self,input):

        """
        Runs a forward prop to determine the expected value for a certain input.

        Args:
        input : input array with values with (n,1) shape.

        return:
        self.y_pred =  y_prediction value stored in the class from doing a forward propagation with the specific input value passed before.
        
        """

        self.self_forward_propagation(input)
            
        return self.y_pred

    def show_weights(self):
        """
        Prints weight values
        
        """
        print("Weights from first layer : ",self.W1)
        print("Weights from second layer : ",self.W2)
        print("Weights from third layer : ",self.W3)

    def show_biases(self):
        """
        Prints bias values
        
        """
        print("Biases from first layer : ",self.b1)
        print("Biases from second layer : ",self.b2)
        print("Biases from third layer : ",self.b3)    

    def get_prediction(self):
        """
        returns prediction stored.
        
        """
        return self.y_pred
    
    def initialize_predictions(self):
        """
        Initializes prediction class variable. Must do when creating the NN and after each epoch.
        
        """
        self.y_prediction=[]


We start by initializing our network, doing an instantiation of OurNeuralNetwork class. In order to simplify and to demostrate a similar network to the one we created with tensorflow, we have the number of layers and neurons called from within the class.

In [27]:
our_network = OurNeuralNetwork()

In [28]:
our_network.show_weights()

Weights from first layer :  tf.Tensor([[-0.29604465 -0.21134205]], shape=(1, 2), dtype=float32)
Weights from second layer :  tf.Tensor(
[[-0.29604465 -0.21134205]
 [ 0.01063002  1.5165398 ]], shape=(2, 2), dtype=float32)
Weights from third layer :  tf.Tensor(
[[-0.29604465]
 [-0.21134205]], shape=(2, 1), dtype=float32)


Let's try to do a forward and backward pass. As you can see, within these three lines we are calling and repeating the same thing we did before.

In [29]:
our_network.self_forward_propagation(prepared_input)

In [30]:
our_network.self_backward_propagation(prepared_input,true_value)

<tf.Tensor: shape=(), dtype=float32, numpy=81.83067>

By calling it again, we are doing another epoch. Each time at the end of backward propagation, we are updating the weights.

In [31]:
our_network.self_forward_propagation(prepared_input)
our_network.self_backward_propagation(prepared_input,true_value)

<tf.Tensor: shape=(), dtype=float32, numpy=78.28142>

If we check the weights, we can see how they have been changing during these processes already. We are already training our network.

In [32]:
our_network.show_weights()

Weights from first layer :  tf.Tensor([[-0.21130395 -0.46642485]], shape=(1, 2), dtype=float32)
Weights from second layer :  tf.Tensor(
[[-0.24545546 -0.16673051]
 [ 0.05865001  1.561604  ]], shape=(2, 2), dtype=float32)
Weights from third layer :  tf.Tensor(
[[-0.24962954]
 [-0.43768468]], shape=(2, 1), dtype=float32)


Let's do a proper training, and see how the prediction turn to be. Let's get back to our training and testing samples, and do their preparation before feeding them into our network.

In [33]:
prep_x_train= PrepareInput(x_train)
prep_y_train= PrepareInput(y_train)

prep_x_test= PrepareInput(x_test)
prep_y_test= PrepareInput(y_test)

In [34]:
our_network.train_network(prep_x_train,prep_y_train,30)

Epoch 10/30, Loss: 2.698012113571167
Epoch 20/30, Loss: 2.2568602561950684
Epoch 30/30, Loss: 1.8855582475662231


It looks like we have achieved a good loss and the weights are trained properly in our simple neural network made from scratch. What if we try to predict how the y_test looks from x_test?

In [35]:
y_test_pred = our_network.do_prediction(prep_x_test)
y_test_pred,prep_y_test

(<tf.Tensor: shape=(16, 1), dtype=float32, numpy=
 array([[-197.42119],
        [-193.45299],
        [-189.48479],
        [-185.51656],
        [-181.54836],
        [-177.58015],
        [-173.61194],
        [-169.64374],
        [-165.67554],
        [-161.70732],
        [-157.7391 ],
        [-153.7709 ],
        [-149.8027 ],
        [-145.83447],
        [-141.86629],
        [-137.89807]], dtype=float32)>,
 <tf.Tensor: shape=(16, 1), dtype=float32, numpy=
 array([[-197.],
        [-193.],
        [-189.],
        [-185.],
        [-181.],
        [-177.],
        [-173.],
        [-169.],
        [-165.],
        [-161.],
        [-157.],
        [-153.],
        [-149.],
        [-145.],
        [-141.],
        [-137.]], dtype=float32)>)

Our model looks succesful and see how we managed to obtain such a perfect prediction model with just a few training epochs.

Let's try now to do the same with another neural network, for a similar correlation problem and see the results as well.

In [36]:
x_new = np.arange(0,10,1)
y_new = OurLinearCorrelation(x_new)
x_new

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [37]:
x_prep = PrepareInput(x_new)
y_prep = PrepareInput(y_new)

What if we tried to see how our previous model looks when predicting this new set ?

In [38]:
y_test_pred = our_network.do_prediction(x_prep)
y_test_pred,y_prep

(<tf.Tensor: shape=(10, 1), dtype=float32, numpy=
 array([[ 0.9892023],
        [ 2.9733064],
        [ 4.957411 ],
        [ 6.941515 ],
        [ 8.925619 ],
        [10.909723 ],
        [12.8938265],
        [14.877931 ],
        [16.862034 ],
        [18.846142 ]], dtype=float32)>,
 <tf.Tensor: shape=(10, 1), dtype=float32, numpy=
 array([[ 3.],
        [ 5.],
        [ 7.],
        [ 9.],
        [11.],
        [13.],
        [15.],
        [17.],
        [19.],
        [21.]], dtype=float32)>)

We can see how our model as it was training on the features of the other function, does not retrieve the same values as with the function it was trained with. We could do two things now: retrain our old model and adapt it to the new correlation function we created, or create a new one.

This is for now an example of how to navigate through neural networks, and how to understand better what is happening in between the layers when we work in Tensorflow. In future notebooks, we will try to have a deeper look on more complex networks to see how these interactions are. In the end, what we need to have clear are these rules we have seen here, as with more complex networks what we have in the end are larger matrix indexes, but the idea stays the same.