# B"H

## Single Neuron "Network"

Before applying this to a complete neural network, let’s start with a simplified forward pass with just one neuron. Rather than backpropagating from the loss function of a full neural network, let’s backpropagate through the ReLU function of a single neuron and act as if we intend to **minimize the output of this single neuron**.

This example is not something you would do in practice, where the quantity being minimized is the loss; it serves purely as a learning exercise.

![](https://drive.google.com/uc?id=1_lq5wWBDiqhtXbPaOTfvdCrZY0o7qJQz)

In [2]:
# Forward pass
x = [1.0, -2.0, 3.0]  # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0  # bias

# Multiplying inputs by weights
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]
print(xw0, xw1, xw2, b)

# Adding weighted inputs and a bias
z = xw0 + xw1 + xw2 + b
print(z)

# ReLU activation function
y = max(z, 0)
print(y)

-3.0 2.0 6.0 1.0
6.0
6.0


### Our Big Function

![](https://drive.google.com/uc?id=1JpU0VqoiiRBLoDyrV7GE1JSED14cwPxo)

<br>

Let’s rewrite our equation to the form that will allow us to determine how to calculate the derivatives more easily:

![](https://drive.google.com/uc?id=17XPz2uAdSPCmTwHn6D0b5I6BDCHZudmP)

<br>

... in pseudo-code:

```
ReLU(
    sum(
        mul(x0, w0), 
        mul(x1, w1), 
        mul(x2, w2), 
        b
    )
)
```
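
The pseudo-code above can also be written as runnable Python. This is only a minimal sketch: the helper names `relu`, `mul`, and `neuron` are made up here for illustration and are not defined anywhere else in this notebook.

```python
def relu(z):
    # ReLU: pass positive values through, clamp everything else to 0
    return max(z, 0.0)


def mul(a, b):
    # Atomic multiplication of one input by one weight
    return a * b


def neuron(x, w, b):
    # ReLU(sum(mul(x0, w0), mul(x1, w1), mul(x2, w2), b))
    weighted = [mul(xi, wi) for xi, wi in zip(x, w)]
    return relu(sum(weighted) + b)


x = [1.0, -2.0, 3.0]   # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0                # bias

print(neuron(x, w, b))  # 6.0, matching the forward pass above
```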

### Partial derivative of w0

Let's start by considering what we need to calculate for the partial derivative with respect to $\large w_0$:

![](https://drive.google.com/uc?id=1JdTSOrQXda3c6a6LOoFHZRUB2HnFP4X5)


> For legibility, we did not denote the ReLU() parameter, which is the full sum, and the sum parameters, which are all of the multiplications of inputs and weights. We excluded this because the equation would be longer and harder to read. 

This equation shows that we have to calculate the derivatives and partial derivatives of all of the atomic operations and multiply them together to obtain the impact that $\large x_0$ has on the output.
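
For reference, the chain can be written out in the notation of the pseudo-code. This is a sketch, with the function arguments again omitted for legibility. The chains for $\large w_0$ and $\large x_0$ share their first two factors and differ only in the last one:

$$
\frac{\partial}{\partial w_0}\,\mathrm{ReLU}() = \frac{d\,\mathrm{ReLU}()}{d\,\mathrm{sum}()} \cdot \frac{\partial\,\mathrm{sum}()}{\partial\,\mathrm{mul}(x_0, w_0)} \cdot \frac{\partial\,\mathrm{mul}(x_0, w_0)}{\partial w_0}
$$

$$
\frac{\partial}{\partial x_0}\,\mathrm{ReLU}() = \frac{d\,\mathrm{ReLU}()}{d\,\mathrm{sum}()} \cdot \frac{\partial\,\mathrm{sum}()}{\partial\,\mathrm{mul}(x_0, w_0)} \cdot \frac{\partial\,\mathrm{mul}(x_0, w_0)}{\partial x_0}
$$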




#### Gradient from next layer

We’ll have multiple chained layers of neurons in the neural network model, followed by the loss function. 

We'll want to know the impact of a given **weight or bias** on the loss. 

The derivative **with respect to the layer’s inputs**, as opposed to the derivative **with respect to the weights and biases**, is not used to update any parameters. Instead, it is used to **chain** to another layer (which is why we backpropagate to the previous layer in a chain).

---


For this example, let’s assume that our neuron receives a gradient of $1$ from the **next layer**. We’re making up this value for demonstration purposes: multiplying by 1 leaves every other derivative unchanged, so each step of the process is easier to follow.

We will show derivatives in red.

![](https://drive.google.com/uc?id=1kdRsvEPwknm-mTohSuPzUrv2S6MCYml-)

#### ReLU derivative

Recall that the derivative of ReLU() **with respect to its input** is 1 if the input is greater than 0, and 0 otherwise.
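
Written as a piecewise function (using $z$ for the ReLU input, as in the code cells of this section):

$$
\frac{d}{dz}\,\mathrm{ReLU}(z) =
\begin{cases}
1 & \text{if } z > 0 \\
0 & \text{if } z \le 0
\end{cases}
$$

Strictly speaking, the derivative is not defined at exactly $z = 0$; treating it as 0 there is the convention this section follows.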

The input value to the ReLU function is 6, so the derivative equals 1. 

We have to use the chain rule and multiply this derivative by the derivative received from the next layer (which we made up to be 1).

<br>

![](https://drive.google.com/uc?id=1ZBCFRJbpnniz3uZO5vEFC0C_xEPs7LXk)

This results in a derivative of 1:

![](https://drive.google.com/uc?id=1fyBKyZ3Bz-nwGfbgRvbZGFJzojt4fyok)

In [3]:
# -- ---------------------------------------
# Forward pass
x = [1.0, -2.0, 3.0]  # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0  # bias

# Multiplying inputs by weights
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]

# Adding weighted inputs and a bias
z = xw0 + xw1 + xw2 + b

# ReLU activation function
y = max(z, 0)
# -- ---------------------------------------


# -- ---------------------------------------
# Backward pass

# The derivative from the next layer
dvalue = 1.0

# Derivative of ReLU and the chain rule
drelu_dz = dvalue * (1. if z > 0 else 0.)
print(drelu_dz)
# -- ---------------------------------------



1.0


#### Sum derivative

![](https://drive.google.com/uc?id=1vKEGB1GIt0ob1U9f_UBYpYIZ8xiLm323)

<br>

---

> Note: `w.r.t` stands for "with respect to"

---

The partial derivative of a simple sum (e.g. $f(x, y, z) = x + y + z$) with respect to any one of its inputs is always 1, no matter the input values:

![](https://drive.google.com/uc?id=1CK16QPZ9QmMF-GQzC44ApNYF2Nm--s7m)
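
One way to convince yourself of this is to estimate each partial derivative numerically. The sketch below uses a central difference; the helper `partial` and the step size `h` are assumptions made for this illustration, not part of the original notebook.

```python
def f(x, y, z):
    # The simple sum from the text
    return x + y + z


def partial(func, args, i, h=1e-6):
    # Central-difference estimate of the partial derivative
    # of func with respect to args[i]
    up = list(args)
    down = list(args)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


point = (1.0, -2.0, 3.0)  # any point works; the result does not depend on it
grads = [partial(f, point, i) for i in range(3)]

# Each estimate should be very close to 1
print([round(g, 6) for g in grads])
```

Trying a different `point` should give the same estimates, up to floating-point error, which is what "no matter the input values" means here.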


In [4]:
# -- ---------------------------------------
# Forward pass
x = [1.0, -2.0, 3.0]  # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0  # bias

# Multiplying inputs by weights
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]

# Adding weighted inputs and a bias
z = xw0 + xw1 + xw2 + b

# ReLU activation function
y = max(z, 0)
# -- ---------------------------------------


# -- ---------------------------------------
# Backward pass

# -- ----------------------
# The derivative from the next layer
dvalue = 1.0
# -- ----------------------

# -- ----------------------
# Derivative of ReLU with chain rule
drelu_dz = dvalue * (1. if z > 0 else 0.)
print(drelu_dz)
# -- ----------------------

# -- ----------------------
# Partial derivatives of the sum with chain rule
dsum_dxw0 = 1
drelu_dxw0 = drelu_dz * dsum_dxw0
print(drelu_dxw0)
# -- ----------------------

# -- ---------------------------------------


1.0
1.0


![](https://drive.google.com/uc?id=1IVD0aLm49_FiBe1iodPAc97IMu4r0iRj)

Result:

![](https://drive.google.com/uc?id=1bLDvheKGVXgleqzgrIBx_tCfC-zoBxkF)

### For all weighted inputs and bias

We can then perform the same operation for the remaining weighted inputs and the bias.

![](https://drive.google.com/uc?id=13bXiryB5dBMpa9vsAJIDVBJEHOsSO1UM)

In [5]:
# -- ---------------------------------------
# Forward pass
x = [1.0, -2.0, 3.0]  # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0  # bias

# Multiplying inputs by weights
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]

# Adding weighted inputs and a bias
z = xw0 + xw1 + xw2 + b

# ReLU activation function
y = max(z, 0)
# -- ---------------------------------------


# -- ---------------------------------------
# Backward pass

# -- ----------------------
# The derivative from the next layer
dvalue = 1.0
# -- ----------------------

# -- ----------------------
# Derivative of ReLU and the chain rule
drelu_dz = dvalue * (1. if z > 0 else 0.)
# -- ----------------------

# -- ----------------------
# Partial derivatives of the sum, the chain rule
dsum_dxw0 = 1
dsum_dxw1 = 1
dsum_dxw2 = 1
dsum_db = 1

drelu_dxw0 = drelu_dz * dsum_dxw0
drelu_dxw1 = drelu_dz * dsum_dxw1
drelu_dxw2 = drelu_dz * dsum_dxw2
drelu_db = drelu_dz * dsum_db

print(drelu_dxw0, drelu_dxw1, drelu_dxw2, drelu_db)
# -- ----------------------

# -- ---------------------------------------


1.0 1.0 1.0 1.0


### Multiplication of weights and inputs

Continuing backward, the next function is the multiplication of weights and inputs. 

The partial derivative of a product with respect to one of its factors is the other factor, that is, whatever that input is being multiplied by.

Recall:

![](https://drive.google.com/uc?id=1IAQS4eIH242ZSOLRVK3929RMBosiCx7p)

<br>

---

![](https://drive.google.com/uc?id=19GdcFSnltR2iSBV2nCL7EMv_YLELhESa)

---

[Video showing the steps above](https://www.youtube.com/watch?v=_9qHQA30hys)
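
As a worked instance with this section's values $x_0 = 1$ and $w_0 = -3$ (a sketch; $f$ is just a label introduced here for the product):

$$
f(x_0, w_0) = x_0 \cdot w_0
\quad\Rightarrow\quad
\frac{\partial f}{\partial x_0} = w_0 = -3,
\qquad
\frac{\partial f}{\partial w_0} = x_0 = 1
$$

After multiplying by the incoming gradient of 1, these are the values that the chain rule should produce for `drelu_dx0` and `drelu_dw0`.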

In [6]:
# -- ---------------------------------------
# Forward pass
x = [1.0, -2.0, 3.0]  # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0  # bias

# Multiplying inputs by weights
xw0 = x[0] * w[0]
xw1 = x[1] * w[1]
xw2 = x[2] * w[2]

# Adding weighted inputs and a bias
z = xw0 + xw1 + xw2 + b

# ReLU activation function
y = max(z, 0)
# -- ---------------------------------------


# -- ---------------------------------------
# Backward pass

# -- ----------------------
# The derivative from the next layer
dvalue = 1.0
# -- ----------------------

# -- ----------------------
# Derivative of ReLU and the chain rule
drelu_dz = dvalue * (1. if z > 0 else 0.)
# -- ----------------------


# -- ----------------------
# Partial derivatives of the sum, the chain rule
dsum_dxw0 = 1
dsum_dxw1 = 1
dsum_dxw2 = 1
dsum_db = 1

drelu_dxw0 = drelu_dz * dsum_dxw0
drelu_dxw1 = drelu_dz * dsum_dxw1
drelu_dxw2 = drelu_dz * dsum_dxw2
drelu_db = drelu_dz * dsum_db
# -- ----------------------


# -- ----------------------
# Partial derivatives of the multiplication, the chain rule
dmul_dx0 = w[0]
dmul_dx1 = w[1]
dmul_dx2 = w[2]

dmul_dw0 = x[0]
dmul_dw1 = x[1]
dmul_dw2 = x[2]

drelu_dx0 = drelu_dxw0 * dmul_dx0
drelu_dw0 = drelu_dxw0 * dmul_dw0
drelu_dx1 = drelu_dxw1 * dmul_dx1
drelu_dw1 = drelu_dxw1 * dmul_dw1
drelu_dx2 = drelu_dxw2 * dmul_dx2
drelu_dw2 = drelu_dxw2 * dmul_dw2

print(drelu_dx0, drelu_dw0, drelu_dx1, drelu_dw1, drelu_dx2, drelu_dw2)
# -- ----------------------

# -- ---------------------------------------



-3.0 1.0 -1.0 -2.0 2.0 3.0
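
As a sanity check on the gradients printed above, we can compare them against a numerical estimate. This is a sketch under stated assumptions: `neuron` and `numerical_grad` are helper names invented for this example, and the central-difference step `h` is an arbitrary small value. The check relies on $z = 6$ being well away from the kink of ReLU at 0.

```python
def neuron(x, w, b):
    # Forward pass: weighted sum plus bias, followed by ReLU
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return max(z, 0.0)


def numerical_grad(func, values, h=1e-6):
    # Central-difference estimate of d func / d values[i] for every i
    grads = []
    for i in range(len(values)):
        up = list(values)
        down = list(values)
        up[i] += h
        down[i] -= h
        grads.append((func(up) - func(down)) / (2 * h))
    return grads


x = [1.0, -2.0, 3.0]   # input values
w = [-3.0, -1.0, 2.0]  # weights
b = 1.0                # bias

dx = numerical_grad(lambda xs: neuron(xs, w, b), x)
dw = numerical_grad(lambda ws: neuron(x, ws, b), w)
db = numerical_grad(lambda bs: neuron(x, w, bs[0]), [b])[0]

print([round(g, 4) for g in dx])  # approximately [-3.0, -1.0, 2.0]
print([round(g, 4) for g in dw])  # approximately [1.0, -2.0, 3.0]
print(round(db, 4))               # approximately 1.0
```

The estimates agree with the chain-rule results: the gradient with respect to each input is its weight, the gradient with respect to each weight is its input, and the gradient with respect to the bias is 1.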
