# A Single Neuron

In this module, we will learn:
1. How does a single neuron work?
2. Forward Propagation
3. Backward Propagation
4. Optimization Algorithm
5. Loss Function

***
## How does a single neuron work?

Training a single neuron involves the following steps:
1. Defining input and output
2. Forward Propagation
3. Computing Loss
4. Backward Propagation
5. Updating weights and biases

We repeat steps 2 through 5 until the loss is zero or very small.

This module discusses each step in some detail.



***
### Defining Inputs and Outputs

The input (feature) and output (target) follow this pattern:

| Example number        | Feature (x)  | Target (y)    |
| :---                  |    :----:    |          ---: |
| 1                     | 1            | 2             |
| 2                     | 2            | 4             |
| 3                     | 3            | 6             |
| 4                     | 4            | 8             |
| 5                     | 5            | 10            |
| 6                     | 6            | 12            |
| $\vdots$              | $\vdots$     | $\vdots$      |
| $\vdots$              | $\vdots$     | $\vdots$      |
| 100                   | 100          | 200           |

The task of the neuron is to find the values of $w$ and $b$ in the following equation, which relates the input $x$ to the output $y$.
\begin{equation}
y = wx + b
\end{equation}

In the above equation, $w$ and $b$ are called the weight and the bias, respectively.
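
The table above can be written out in code as a concrete starting point. This is a minimal sketch in plain Python; the names `xs` and `ys` are illustrative and do not come from the module itself.

```python
# Build the 100 training examples from the table: x = 1..100 and y = 2x.
xs = [float(i) for i in range(1, 101)]   # features
ys = [2.0 * x for x in xs]               # targets

print(xs[:3], ys[:3])   # [1.0, 2.0, 3.0] [2.0, 4.0, 6.0]
print(len(xs))          # 100
```

For this data, the relationship $y = wx + b$ holds exactly with $w = 2$ and $b = 0$, which are the values the neuron is expected to approach.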
***


### Forward Propagation

![Pic1Module2](Pic1Module2.png)

Forward propagation involves the following steps:
1. Initialize the weights and biases. **Please note that the weights and biases are initialized once, at the beginning of training, and not at every epoch.**
2. Compute $Z = W * X + b$.
3. Pass the value of $Z$ through an activation function $f$ to get $\hat{Y}$; that is,
$$\hat{Y} = f(Z).$$
In this module, we use a linear activation function; therefore, $\hat{Y} = Z.$


Please note that we will learn about activation functions in detail in later lectures.

***

#### Step 1: Initializing weights and biases

Since there is only one layer with one neuron, we have to initialize only one weight value and one bias value. Let's say that we initialize the weight and bias to $$w = 1.2$$ and $$b = 0.$$

#### Step 2: Computing $z = wx + b$

For the first example ($x = 1$, $y = 2$):
\begin{equation}
z = 1.2 * 1 + 0 = 1.2
\end{equation}



#### Step 3: Computing $\hat{y} = z$

\begin{equation}
\hat{y} = z = 1.2
\end{equation}

The computed output is $\hat{y} = 1.2$, whereas the actual output is $y = 2$. So, the next step is to calculate the loss, which measures how far the estimated output is from the actual output.
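
Steps 1 to 3 can be sketched in a few lines of Python. The function name `forward` is hypothetical; the parameter values and the example are the ones used in the text.

```python
def forward(x, w, b):
    """Return y_hat for one input using z = w*x + b and a linear activation."""
    z = w * x + b   # Step 2: weighted input plus bias
    y_hat = z       # Step 3: linear activation, f(z) = z
    return y_hat

# Step 1: initial parameters taken from the text
w, b = 1.2, 0.0

# First example from the table
x, y = 1.0, 2.0

y_hat = forward(x, w, b)
print(y_hat)  # 1.2
```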

### Computing Loss Function
We compute the loss to measure the disparity between the estimated and the actual output values. There are many loss functions, which we will discuss in detail later; here, we use the **Mean Squared Error** (MSE) loss function.

The MSE loss function is defined as:
\begin{eqnarray}
\mathcal{L}\left(y, \hat{y}\right) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2
\end{eqnarray}

Since this module uses stochastic gradient descent, we feed a single example to the model and update the parameters ($w$ and $b$) after each example. With $N = 1$, the MSE formula reduces to the squared error of that one example:

\begin{equation}
\mathcal{L}\left(y, \hat{y}\right) = \left(y-\hat{y}\right)^2
\end{equation}



#### Step 4: Compute the loss
We compute the loss for the first example as follows:
\begin{eqnarray}
\mathcal{L}\left(y, \hat{y}\right) & = & \left(2-1.2\right)^2 \\
& = & 0.64
\end{eqnarray}

A loss of $0.64$ is far from zero, which indicates that our model is not yet performing well. Therefore, in the next step, we will update $w$ and $b$.
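
The loss computation can be sketched as follows. The function names `mse` and `squared_error` are illustrative; the second is simply the first with $N = 1$, which is the form used with one example at a time.

```python
def mse(ys, y_hats):
    """Mean squared error over N examples."""
    n = len(ys)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / n

def squared_error(y, y_hat):
    """Single-example loss: the MSE with N = 1."""
    return (y - y_hat) ** 2

# First example: actual output 2, estimated output 1.2
loss = squared_error(2.0, 1.2)
print(round(loss, 2))  # 0.64
```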

### Updating the parameters

When a neural network model does not perform well, its weights and biases are updated using an optimization algorithm. The weight and bias are updated as follows:

\begin{equation}
w := w - \alpha dw \\
b := b - \alpha db
\end{equation}
In the above equations, $\alpha$ is the learning rate; in this example, we choose $\alpha = 0.09$. Furthermore, for stochastic gradient descent with the single-example loss above, $dw$ and $db$ (the gradients of the loss with respect to $w$ and $b$) are as follows:
\begin{equation}
dw  = -2(y-\hat{y})x\\
db = -2(y-\hat{y})
\end{equation}

**Please note that the expressions for $dw$ and $db$ depend on the loss function and the activation function; the ones above hold for the squared-error loss with a linear activation.**
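
The expressions for $dw$ and $db$ can be obtained with the chain rule. The short derivation below assumes the single-example squared-error loss and the linear activation $\hat{y} = wx + b$ used in this module, and reads $dw$ and $db$ as shorthand for $\partial\mathcal{L}/\partial w$ and $\partial\mathcal{L}/\partial b$.

\begin{equation}
\frac{\partial \mathcal{L}}{\partial \hat{y}} = \frac{\partial}{\partial \hat{y}}\left(y-\hat{y}\right)^2 = -2\left(y-\hat{y}\right)
\end{equation}

\begin{equation}
dw = \frac{\partial \mathcal{L}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial w} = -2\left(y-\hat{y}\right)x
\end{equation}

\begin{equation}
db = \frac{\partial \mathcal{L}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial b} = -2\left(y-\hat{y}\right)
\end{equation}

The last two lines use $\partial\hat{y}/\partial w = x$ and $\partial\hat{y}/\partial b = 1$.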

#### Step 5: Updating the weight and bias
We will update the weight and bias according to the above-mentioned equations.

First, we calculate $dw$ and $db$.

\begin{equation}
dw  = -2(y-\hat{y})x = -2*(2 - 1.2)*1 = -1.6\\
db = -2(y-\hat{y}) = -2*(2-1.2) = -1.6
\end{equation}

Second, we update $w$ and $b$ as follows:
\begin{equation}
w := w - \alpha dw = 1.2 - 0.09*(-1.6) = 1.344\\
b := b - \alpha db = 0 - 0.09*(-1.6) = 0.144
\end{equation}


Please note that if we use the updated values of $w$ and $b$ to compute the output for $x = 1$, we get $\hat{y} = 1.344 * 1 + 0.144 = 1.488$, which is closer to the target $y = 2$ than the previous estimate of $1.2$. However, we will proceed with the second example; that is, $x = 2$.
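
The update for the first example can be checked with a short sketch. The helper names `gradients` and `sgd_update` are hypothetical; the numbers are the ones used in Step 5, and the results are rounded only for display.

```python
def gradients(x, y, y_hat):
    """Gradients of (y - y_hat)**2 w.r.t. w and b for y_hat = w*x + b."""
    error = y - y_hat
    dw = -2.0 * error * x
    db = -2.0 * error
    return dw, db

def sgd_update(w, b, dw, db, alpha):
    """One gradient-descent step on both parameters."""
    return w - alpha * dw, b - alpha * db

w, b, alpha = 1.2, 0.0, 0.09   # initial parameters and learning rate
x, y = 1.0, 2.0                # first example

y_hat = w * x + b              # forward pass with a linear activation
dw, db = gradients(x, y, y_hat)
w, b = sgd_update(w, b, dw, db, alpha)

print(round(dw, 3), round(db, 3))   # -1.6 -1.6
print(round(w, 3), round(b, 3))     # 1.344 0.144
print(round(w * x + b, 3))          # 1.488
```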

### Training the network

Now we repeat steps 2 through 5 for all 100 examples; one such pass over the data makes a single epoch. We train the model for a number of epochs until the loss becomes negligible.
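
Putting the steps together, a full training loop might look like the sketch below. The function name `train` and the epoch count are illustrative. The learning rate of `1e-5` is also an assumption of this sketch rather than the module's $0.09$: with unscaled inputs as large as $x = 100$, a step of $0.09$ makes the per-example update overshoot for the larger inputs, so a much smaller value is used here to keep the loop stable.

```python
def train(xs, ys, w, b, alpha, epochs):
    """Run per-example (stochastic) gradient descent and return (w, b)."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            y_hat = w * x + b          # forward propagation (linear activation)
            error = y - y_hat
            dw = -2.0 * error * x      # gradient of the squared error w.r.t. w
            db = -2.0 * error          # gradient of the squared error w.r.t. b
            w -= alpha * dw            # parameter update
            b -= alpha * db
    return w, b

# The 100 examples from the table: x = 1..100, y = 2x
xs = [float(i) for i in range(1, 101)]
ys = [2.0 * x for x in xs]

# alpha = 1e-5 and epochs = 20 are assumptions of this sketch
w, b = train(xs, ys, w=1.2, b=0.0, alpha=1e-5, epochs=20)
print(w, b)

# Loss on the first example after training
print((ys[0] - (w * xs[0] + b)) ** 2)
```

After training, $w$ should be close to $2$ and $b$ close to $0$, so the loss on each example is small.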