## History of Deep Learning

<br>

![](CS5480_images/chrome_np99k2xLxh.png)

<br><br>

### McCulloch-Pitts Neuron

A feed-forward McCulloch-Pitts network can compute any Boolean function $f : \{0,1\}^n \rightarrow \{0,1\}$.<br>
Recurrent McCulloch-Pitts networks can simulate any Deterministic Finite Automaton (DFA).<br>
A McCulloch-Pitts neuron can be represented mathematically as shown below:

$$
y(x_1,x_2,\dots,x_{n+m},\theta) = \begin{cases}
   1 &\displaystyle \text{if } \sum_{i=1}^n x_i \ge \theta  \text{ and  } \sum_{i=n+1}^{n+m} x_i =0\\
   0 &\displaystyle \text{if } \sum_{i=1}^n x_i < \theta  \text{ or  } \sum_{i=n+1}^{n+m} x_i >0
\end{cases}
$$

- Here all signals are binary 
- Threshold $\theta \in \mathbb{N}$
- $n$ excitatory inputs $x_1,\dots,x_n$
- $m$ inhibitory inputs $x_{n+1},\dots,x_{n+m}$
- One output $y$
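
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `mp_neuron` and the thresholds chosen for the gates are assumptions made for the example, and the NOT gate assumes that $0$ is allowed as a threshold.

```python
def mp_neuron(excitatory, inhibitory, theta):
    """McCulloch-Pitts unit with binary inputs.

    Fires (returns 1) only if no inhibitory input is active and the
    number of active excitatory inputs reaches the threshold theta.
    """
    if sum(inhibitory) > 0:      # any active inhibitory input vetoes firing
        return 0
    return 1 if sum(excitatory) >= theta else 0


# Gates built from a single unit (thresholds are illustrative choices)
def AND(a, b):
    return mp_neuron([a, b], [], theta=2)


def OR(a, b):
    return mp_neuron([a, b], [], theta=1)


def NOT(a):
    # The input is wired as inhibitory; with theta = 0 the unit fires by default
    return mp_neuron([], [a], theta=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))
```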

### Rosenblatt’s Perceptron

Rosenblatt’s Perceptron can be represented mathematically as shown below:
$$
f(x) = \begin{cases}
   1 &\text{if } w\cdot x+b>0 \\
   0 &\text{otherwise } 
\end{cases}
$$

Boolean Gates using a Perceptron

![](CS5480_images/chrome_mmOWWTaZry.png)
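
As a rough sketch of how a single perceptron realizes Boolean gates, the snippet below applies the threshold rule $w\cdot x+b>0$ to binary inputs. The weights and biases in `GATES` are hypothetical values picked for this example (many other choices work) and are not necessarily the ones shown in the figure.

```python
def perceptron(x, w, b):
    """Rosenblatt perceptron: 1 if w.x + b > 0, otherwise 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical (weights, bias) pairs, chosen only for illustration
GATES = {
    "AND": ([1, 1], -1.5),
    "OR": ([1, 1], -0.5),
    "NAND": ([-1, -1], 1.5),
}


def truth_table(name):
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    w, b = GATES[name]
    return [perceptron((x1, x2), w, b) for x1 in (0, 1) for x2 in (0, 1)]


for name in GATES:
    print(name, truth_table(name))
```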

- Hebb, in his influential book *The Organization of Behavior* (1949), claimed:
    - Behavior changes are primarily due to changes in the synaptic strengths $(w_{ij})$ between neurons $i$ and $j$.
    - Hebbian learning law: $w_{ij}$ increases only when both $i$ and $j$ are "on".
    - _"Neurons that fire together, wire together. Neurons that fire out of sync, fail to link."_
    - In perceptron learning, the Hebbian law can be restated as:
        - $w_{ij}$ increases only if the outputs of both units $x_i$ and $y$ have the same sign.
    - In our simple network (one output and $n$ input units):
      $$\Delta W_{ij}=W_{ij}(\text{new})-W_{ij}(\text{old})=X_iY$$
      or, with a learning rate $\alpha$,
      $$\Delta W_{ij}=W_{ij}(\text{new})-W_{ij}(\text{old})=\alpha X_iY$$

#### Hebbian Learning:

- Initialization: $b = 0, w_i = 0, i = 1 \text{ to } n$
- For each training sample $(x,y)$, update the weights and bias as:
  $$\begin{align*}
   w_i&:=w_i+x_i\times y, \quad i=1 \text{ to } n \\
   b&:=b+y
  \end{align*}$$

Using bipolar units $(-1,1)$, Hebbian learning yields the AND function: the correct boundary $-1 +x_1+x_2=0$ is learned. <br>
With binary units $(1,0)$ and $\alpha =1$, however, it does not: the incorrect boundary $1+x_1+x_2=0$ is learned.
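
Both boundaries can be reproduced with a short sketch of the Hebbian rule, assuming $\alpha = 1$ and a single pass over the four AND samples. The helper name `hebbian_train` and the sample ordering are assumptions made for this example.

```python
def hebbian_train(samples):
    """One pass of the Hebbian rule with alpha = 1: w_i += x_i * y, b += y."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    for x, y in samples:
        w = [wi + xi * y for wi, xi in zip(w, x)]
        b += y
    return w, b


# AND truth table in bipolar (-1, 1) and binary (0, 1) encodings
bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
binary_and = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]

print(hebbian_train(bipolar_and))  # ([2, 2], -2)
print(hebbian_train(binary_and))   # ([1, 1], 1)
```

The bipolar result $-2+2x_1+2x_2=0$ is the same line as $-1+x_1+x_2=0$. In the binary case only the sample $(1,1)$ has a non-zero target, so it alone contributes to the weights, which gives $1+x_1+x_2=0$.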

#### Perceptron Learning

1. Initialization: $b = 0, w_i = 0, i = 1 \text{ to } n$
2. While the stop condition is false, do steps 3 to 5
3. For each training sample $(x,y)$, do steps 4 to 5
4. Compute the output of the perceptron, $o$
5. If $o\ne y$, update the weights and bias using the target $y$ (with bipolar targets):
   $$\begin{align*}
   w_i&:=w_i+\alpha \times x_i \times y ,\quad i= 1 \text{ to } n \\
   b&:=b+\alpha \times y
   \end{align*}
   $$

- Learning occurs only when a sample has $o\ne y$.
- There are two loops; one completion of the inner loop (each sample is used once) is called an epoch.
- Stop when
  - no weight is changed in the current epoch, or
  - a pre-determined number of epochs is reached.
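
The procedure can be sketched as follows. This is a minimal illustration under stated assumptions: targets are bipolar ($y \in \{-1, 1\}$), the output is $+1$ when $w\cdot x+b>0$ and $-1$ otherwise, and a mistake triggers the update $w_i := w_i + \alpha x_i y$, $b := b + \alpha y$. The names `predict`, `perceptron_train` and `max_epochs` are made up for the example.

```python
def predict(w, b, x):
    """Bipolar perceptron output: +1 if w.x + b > 0, else -1."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if net > 0 else -1


def perceptron_train(samples, alpha=1.0, max_epochs=100):
    """Return (w, b, epochs_run) after perceptron learning."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, y in samples:               # inner loop: one epoch
            if predict(w, b, x) != y:      # learn only on mistakes
                w = [wi + alpha * xi * y for wi, xi in zip(w, x)]
                b += alpha * y
                changed = True
        if not changed:                    # stop: no weight changed this epoch
            return w, b, epoch
    return w, b, max_epochs                # stop: epoch budget exhausted


and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, epochs = perceptron_train(and_samples)
print(w, b, epochs)
```

On this bipolar AND data the loop makes three corrections in the first epoch and none in the second, so it stops with $w = (1, 1)$, $b = -1$, which is the boundary $-1 + x_1 + x_2 = 0$.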



### Widrow-Hoff Learning Rule

- It is also called the delta learning rule.
- Developed for a perceptron.
- Learning algorithm: same as perceptron learning, except that the weight update uses the error between the target $y$ and the output $o$:
   $$\begin{align*}
   w_i&:=w_i+\alpha \times x_i \times \boxed{(y-o)} \quad i= 1 \text{ to } n \\
   b&:=b+\alpha \times \boxed{(y-o)}
   \end{align*}
   $$
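
A hedged sketch of the delta rule is given below. It assumes that $o$ is the linear output $w\cdot x+b$ before thresholding, as in the classical ADALINE formulation; the text above does not specify this, so treat it as an assumption. The learning rate, the epoch count and the name `delta_rule_train` are likewise illustrative choices.

```python
def delta_rule_train(samples, alpha=0.05, epochs=200):
    """Widrow-Hoff (delta) rule: update on every sample by alpha * (y - o) * x."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            o = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output (assumption)
            err = y - o                                   # signed error
            w = [wi + alpha * xi * err for wi, xi in zip(w, x)]
            b += alpha * err
    return w, b


and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = delta_rule_train(and_samples)

# Threshold the learned linear unit to read off class labels
labels = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
          for x, _ in and_samples]
print(w, b, labels)
```

Unlike the perceptron rule, the size of each update scales with the size of the error, so the weights keep adjusting even on samples that are already on the correct side of the boundary.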

### Perceptron convergence theorem

If there exists an exact solution (if the training data set is linearly separable), then the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps.


#### Proof

<br>

![](CS5480_images/chrome_Myg1q0YkBZ.png)

Consider the above image:

- The perceptron tries to find the boundary $W^TX=0$.
- The angle between $W$ and any point $X$ that lies on this boundary is $90^{\circ}$.
- The angle between the positive points $(p_1,p_2,p_3,\dots)$ and $W$ should be less than $90^{\circ}$.
- The angle between the negative points $(n_1,n_2,n_3,\dots)$ and $W$ should be greater than $90^{\circ}$.
- If for some positive point, say $p_3$, the angle with $W$ is more than $90^{\circ}$, the perceptron has made a mistake.
- Because of this mistake, the perceptron learning algorithm updates the weights with the rule $\overline {W}=W+ \alpha X\times Y$. Taking $\alpha$ and $Y$ to be $1$ for simplicity, this becomes $\overline W=W+  X$, where $X=p_3$.
- The cosine of the angle between $p_3$ and $\overline W$ is proportional to $\overline W^TX$:
  $$\begin{align*}
   \cos \theta &\propto \overline W^TX \\
   &\propto ( W+X)^TX \\
   &\propto  W^TX+X^TX 
  \end{align*} 
  $$

- Since $X^TX$ is positive, the value of $\cos \theta$ for $\overline  W$ and $X$ is more than that for $W$ and $X$.
- Hence the angle $\theta$ between $\overline  W$ and $X$ is less than that between $W$ and $X$.
- In other words, the angle between $p_3$ and the weight vector decreases after the weight update.
- Hence, after a certain number of iterations, the angle between $p_3$ and the weight vector becomes less than $90^{\circ}$.
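
The argument can be checked numerically with a small sketch. The starting weight vector `w` and the misclassified positive point `x` below are hypothetical values chosen for illustration; the update is $\overline W = W + X$ (that is, $\alpha = 1$, $Y = 1$), repeated while $W^TX \le 0$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_deg(u, v):
    """Angle between two non-zero vectors, in degrees."""
    return math.degrees(math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))))


w = [-1.0, -3.0]   # hypothetical current weights
x = [1.0, 1.0]     # hypothetical positive point that is misclassified

angles = [angle_deg(w, x)]
while dot(w, x) <= 0:                       # still a mistake for a positive point
    w = [wi + xi for wi, xi in zip(w, x)]   # update W <- W + X
    angles.append(angle_deg(w, x))

print([round(a, 2) for a in angles])
```

Each update shrinks the angle, and the loop ends as soon as it drops below $90^{\circ}$.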

### Perceptron and Other Linear Classifiers

- Logistic regression and a perceptron with sigmoid activation are the same.
- Linear SVMs and perceptrons are 'almost' the same. 'Almost' because an SVM finds the maximum-margin boundary, whereas the perceptron has no such notion.

### XOR Gate using a Perceptron

XOR cannot be solved using a single-layer perceptron, but it can be solved by a more complex network with hidden units (Multi-Layer Perceptron).

Consider the below picture:

![](CS5480_images/chrome_dN49ePd1Dp.png)

Below is the output of the above network for the inputs $(x_1,x_2)$ taken as $(-1,-1),(-1,1),(1,-1),(1,1)$:

In [47]:
#| code-fold: true
import numpy as np
import plotly.express as px
import plotly.io as pio
import pandas as pd
pio.renderers.default = "plotly_mimetype+notebook_connected" # required only to render HTML, not for the notebook

# The four bipolar input combinations
x_1=np.array([-1,-1,1,1])
x_2=np.array([-1,1,-1,1])
# Hidden layer: two threshold units
z_1=np.where((2*x_1-2*x_2)>=1,1,-1)
z_2=np.where((-2*x_1+2*x_2)>=1,1,-1)
# Output layer: fires (+1) if at least one hidden unit fires
y=np.where((2*z_1+2*z_2)>=0,1,-1)
df = pd.DataFrame(data=np.vstack([x_1.flatten(),x_2.flatten(),z_1.flatten(),z_2.flatten(),y.flatten()]).T,
columns=['x_1','x_2','z_1','z_2','class'])
fig = px.scatter( df,x='x_1', y='x_2', color='class')
fig.update_layout(autosize=False,width=400,height=400,coloraxis_showscale=False)
fig.show()

__In-depth analysis__

Output of Layer one for $z_1$

In [52]:
#| code-fold: true
import numpy as np
# Evaluate the same thresholded network on a 100 x 100 grid over [-5, 5] x [-5, 5]
nx, ny = (100, 100)
xs = np.linspace(-5, 5, nx)
ys = np.linspace(-5, 5, ny)
x_1, x_2 = np.meshgrid(xs, ys)
z_1=np.where((2*x_1-2*x_2)>=1,1,-1)
z_2=np.where((-2*x_1+2*x_2)>=1,1,-1)
y=np.where((2*z_1+2*z_2)>=0,1,-1)
df = pd.DataFrame(data=np.vstack([x_1.flatten(),x_2.flatten(),z_1.flatten(),z_2.flatten(),y.flatten()]).T,
columns=['x_1','x_2','z_1','z_2','class'])

In [53]:
#| code-fold: true
fig = px.scatter( df,x='x_1', y='x_2', color='z_1')
fig.update_layout(autosize=False,width=600,height=600,coloraxis_showscale=False)
fig.show()

Output of Layer one for $z_2$

In [54]:
#| code-fold: true
fig = px.scatter( df,x='x_1', y='x_2', color='z_2')
fig.update_layout(autosize=False,width=600,height=600,coloraxis_showscale=False)
fig.show()

Final Layer output  for $y$

In [55]:
#| code-fold: true
import plotly.express as px
fig = px.scatter( df,x='x_1', y='x_2', color='class')
fig.update_layout(autosize=False,width=600,height=600,coloraxis_showscale=False)
fig.show()

We can see that the perceptron with two layers is able to classify the XOR data, which means it is non-linear. To understand this more deeply, if we remove the thresholding at each neuron, we get the output below:

Output of Layer one for $z_1$

In [56]:
#| code-fold: true
import numpy as np
# Same grid and weights as before, but every unit is purely linear (no thresholding)
nx, ny = (100, 100)
xs = np.linspace(-5, 5, nx)
ys = np.linspace(-5, 5, ny)
x_1, x_2 = np.meshgrid(xs, ys)
z_1=2*x_1-2*x_2
z_2=-2*x_1+2*x_2
y=2*z_1+2*z_2
df = pd.DataFrame(data=np.vstack([x_1.flatten(),x_2.flatten(),z_1.flatten(),z_2.flatten(),y.flatten()]).T,
columns=['x_1','x_2','z_1','z_2','class'])

In [57]:
#| code-fold: true
fig = px.scatter( df,x='x_1', y='x_2', color='z_1')
fig.update_layout(autosize=False,width=600,height=600)
fig.show()

Output of Layer one for $z_2$

In [60]:
#| code-fold: true
fig = px.scatter( df,x='x_1', y='x_2', color='z_2')
fig.update_layout(autosize=False,width=600,height=600)
fig.show()

Final Layer output  for $y$

In [61]:
#| code-fold: true
import plotly.express as px
fig = px.scatter( df,x='x_1', y='x_2', color='class')
fig.update_layout(autosize=False,width=600,height=600)
fig.show()
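
The flat final plot can be verified by hand. Using the unthresholded weights from the code above ($z_1 = 2x_1 - 2x_2$, $z_2 = -2x_1 + 2x_2$, $y = 2z_1 + 2z_2$), substitution gives:

$$\begin{align*}
y &= 2z_1 + 2z_2 \\
  &= 2(2x_1 - 2x_2) + 2(-2x_1 + 2x_2) \\
  &= 4x_1 - 4x_2 - 4x_1 + 4x_2 \\
  &= 0
\end{align*}$$

The output is identically zero for every input. More generally, a stack of purely linear layers collapses to a single linear map, so without the thresholding step the network cannot separate the XOR classes.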

## Recall: Training a Neural Network

This was already discussed [here](../Machine-Learning/2022-09-24-CS5590-week5.ipynb#multi-layer-perceptrons) in Machine Learning.

<br><br><br>
$\tiny  {\textcolor{#808080}{\boxed{\text{Reference: Dr. Vineeth, IIT Hyderabad }}}}$