# CS5242 Neural Networks and Deep Learning I
## Homework 3: Code Supplement

This notebook runs an experiment to determine when the ADAM optimizer can escape a local minimum. The implementation of the ADAM optimizer follows the original paper: https://arxiv.org/abs/1412.6980

### Imports and Configuration

In [1]:
import numpy as np
import pandas as pd

In [9]:
# ADAM Configuration Parameters
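alpha = 0.001  # learning rate
beta1 = 0.9    # decay rate for the first-moment estimate
beta2 = 0.999  # decay rate for the second-moment estimate
epsilon = 0.0  # no stabiliser needed here: the gradient is never zero, so the denominator stays positive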

In [12]:
# Experiment Parameters
L = 1 #Loss Function Intercept
n = 5000 #Iterations
h_range = np.arange(alpha, alpha*4, alpha*0.1) #Range of h values for experiment

### ADAM Optimizer

The ADAM optimizer is implemented for a single scalar weight, with an `update` method that applies one optimization step.
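For reference, the update rule from the paper can be written for a single weight $w$ as follows. The symbols $g_t$, $\hat{m}_t$ and $\hat{v}_t$ are notation chosen here; they correspond to `dw`, `m_dw_corr` and `v_dw_corr` in the code, with $m_0 = v_0 = 0$ and $t$ starting at 1.

$$
\begin{array}{ll}
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, & v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t = \dfrac{m_t}{1-\beta_1^t}, & \hat{v}_t = \dfrac{v_t}{1-\beta_2^t} \\
w_t = w_{t-1} - \alpha\, \dfrac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} &
\end{array}
$$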

In [3]:
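class Adam:
    """ADAM optimizer for a single scalar weight."""

    def __init__(self, alpha, beta1, beta2, epsilon):
        self.m_dw, self.v_dw = 0, 0  # first- and second-moment estimates
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.alpha = alpha

    def update(self, t, w, dw):
        """Apply one ADAM step at time step t (starting from 1) and return the new weight."""
        # Exponential moving averages of the gradient and the squared gradient
        self.m_dw = self.beta1*self.m_dw + (1-self.beta1)*dw
        self.v_dw = self.beta2*self.v_dw + (1-self.beta2)*(dw**2)
        # Bias correction for the zero initialisation of both averages
        m_dw_corr = self.m_dw/(1-self.beta1**t)
        v_dw_corr = self.v_dw/(1-self.beta2**t)
        # Weight update
        w = w - self.alpha*(m_dw_corr/(np.sqrt(v_dw_corr)+self.epsilon))
        return w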
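As a quick illustration of the bias correction, the sketch below repeats the same arithmetic in a standalone function and applies a single step from w = 0. The function name `adam_step` and the test gradients are assumptions made for this example, not part of the notebook. With `epsilon = 0`, the first step has size approximately `alpha` regardless of the gradient magnitude, because the corrected moments reduce to the gradient and its square.

```python
import math

def adam_step(w, dw, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, epsilon=0.0):
    """One bias-corrected ADAM step for a scalar weight (hypothetical helper)."""
    m = beta1 * m + (1 - beta1) * dw
    v = beta2 * v + (1 - beta2) * dw ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v

# First step from w = 0 for three illustrative gradient values.
first_steps = {dw: adam_step(0.0, dw, 0.0, 0.0, t=1)[0] for dw in (-1.0, -0.5, -100.0)}
for dw, w in first_steps.items():
    print(f"dw={dw}: w after step 1 = {w:.6f}")
```

In this experiment the gradient keeps magnitude 1 until the weight passes L + 2h, so the weight initially advances by roughly `alpha` per step.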

### Gradient Function

The piecewise loss function is defined as follows:
 $$ f(w)=   \left\{
\begin{array}{ll}
      L - w & 0 \leq w \leq L \\
      w - L & L \leq w\leq L+h \\
      2h + L - w  & L+h \leq w \leq L+2h\\
      h + \frac{L}{2} -\frac{w}{2}  & L+2h \leq w \\
\end{array} 
\right.  $$

Away from the breakpoints, the gradient is therefore as follows. At the breakpoints, where $f$ is not differentiable, the code takes the slope of the piece to the right.

$$ \frac{d(f(w))}{dw}=  \left\{
\begin{array}{ll}
      -1 & 0 \leq w < L \\
      1 & L \leq w < L+h \\
      -1  & L+h \leq w < L+2h\\
      -\frac{1}{2} & L+2h \leq w \\
\end{array}
\right. $$


In [4]:
def grad_function(w, h):
    """Gradient of the piecewise loss at w for bump height h (uses the global L)."""
    if w < L:
        return -1
    elif w < L+h:
        return 1
    elif w < L+2*h:
        return -1
    else:
        return -0.5
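The piecewise definition can be checked numerically. The following is a minimal sketch, assuming a helper `loss_function` that is not defined in the notebook and a local copy `grad` of the gradient that takes `L` as an argument instead of reading the global. It confirms that the bump has height h and that a central finite difference agrees with the stated gradient inside each piece.

```python
def loss_function(w, h, L=1.0):
    """Piecewise loss f(w) from the definition above (hypothetical helper)."""
    if w < L:
        return L - w
    elif w < L + h:
        return w - L
    elif w < L + 2 * h:
        return 2 * h + L - w
    else:
        return h + L / 2 - w / 2

def grad(w, h, L=1.0):
    """Same branch logic as grad_function, with L passed explicitly."""
    if w < L:
        return -1.0
    elif w < L + h:
        return 1.0
    elif w < L + 2 * h:
        return -1.0
    else:
        return -0.5

h = 0.002  # illustrative bump height
bump_height = loss_function(1.0 + h, h)  # loss at the top of the bump

# Compare the analytic gradient with a central finite difference
# at one interior point of each of the four pieces.
delta = 1e-7
checks = []
for w in (0.5, 1.0 + 0.5 * h, 1.0 + 1.5 * h, 1.0 + 3 * h):
    fd = (loss_function(w + delta, h) - loss_function(w - delta, h)) / (2 * delta)
    checks.append((w, grad(w, h), fd))
    print(f"w={w:.4f}  analytic={grad(w, h):+.1f}  finite difference={fd:+.4f}")
```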

### Experiment

The experiment sweeps h over `h_range`. For each value of h, the optimizer starts from w = 0 and runs for 5000 iterations.

In [15]:
results = pd.DataFrame(columns = ['h', 'Local Minimum Peak', 'Timestamp crossed Local Minimum Peak', 'Final Weight', 'Maximum Weight Achieved'])

In [22]:
for i, h in enumerate(h_range):
    w_0 = 0      # initial weight
    max_w_0 = 0  # largest weight reached so far
    adam = Adam(alpha, beta1, beta2, epsilon)  # fresh optimizer state for each h
    crossed = False

    for t in range(1, n+1):
        dw = grad_function(w_0, h)
        w_0 = adam.update(t, w=w_0, dw=dw)
        max_w_0 = max(max_w_0, w_0)
        # Record the first step at which the weight passes the top of the bump (w = L+h)
        if w_0 > L+h and not crossed:
            print(f'For h={h:.4f} Crossed the bump at time step {t}')
            results.loc[i] = [h, L+h, t, 0, 0]  # weights are filled in after the loop
            crossed = True
    if not crossed:
        results.loc[i] = [h, L+h, np.nan, w_0, max_w_0]
        print(f'For h={h:.4f} the final weight is {w_0:.4f} and the max weight is {max_w_0:.4f}')
    else:
        results.loc[i, ['Final Weight', 'Maximum Weight Achieved']] = [w_0, max_w_0]

For h=0.0010 Crossed the bump at time step 1002
For h=0.0011 Crossed the bump at time step 1002
For h=0.0012 Crossed the bump at time step 1002
For h=0.0013 Crossed the bump at time step 1002
For h=0.0014 Crossed the bump at time step 1002
For h=0.0015 Crossed the bump at time step 1003
For h=0.0016 Crossed the bump at time step 1003
For h=0.0017 Crossed the bump at time step 1003
For h=0.0018 Crossed the bump at time step 1003
For h=0.0019 Crossed the bump at time step 1004
For h=0.0020 Crossed the bump at time step 1004
For h=0.0021 Crossed the bump at time step 1004
For h=0.0022 Crossed the bump at time step 1005
For h=0.0023 Crossed the bump at time step 1005
For h=0.0024 Crossed the bump at time step 1006
For h=0.0025 the final weight is 0.9998 and the max weight is 1.0024
For h=0.0026 the final weight is 0.9998 and the max weight is 1.0024
For h=0.0027 the final weight is 0.9998 and the max weight is 1.0024
For h=0.0028 the final weight is 0.9998 and the max weight is 1.0024
For 

In [23]:
results

| | h | Local Minimum Peak | Timestamp crossed Local Minimum Peak | Final Weight | Maximum Weight Achieved |
|---|---|---|---|---|---|
| 0 | 0.001 | 1.001 | 1002.0 | 4.351674 | 4.351674 |
| 1 | 0.0011 | 1.0011 | 1002.0 | 4.35153 | 4.35153 |
| 2 | 0.0012 | 1.0012 | 1002.0 | 4.35153 | 4.35153 |
| 3 | 0.0013 | 1.0013 | 1002.0 | 4.35153 | 4.35153 |
| 4 | 0.0014 | 1.0014 | 1002.0 | 4.351387 | 4.351387 |
| 5 | 0.0015 | 1.0015 | 1003.0 | 4.349235 | 4.349235 |
| 6 | 0.0016 | 1.0016 | 1003.0 | 4.349235 | 4.349235 |
| 7 | 0.0017 | 1.0017 | 1003.0 | 4.349235 | 4.349235 |
| 8 | 0.0018 | 1.0018 | 1003.0 | 4.349093 | 4.349093 |
| 9 | 0.0019 | 1.0019 | 1004.0 | 4.346943 | 4.346943 |


### Results

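- With a learning rate of 0.001, the ADAM optimizer escapes the local minimum when the bump height h is at most 0.0024. In the runs shown with h >= 0.0025, the weight overshoots the minimum at w = L only up to 1.0024, which is below the peak at L + h, and ends near the minimum (final weight 0.9998).
- This threshold may change with the learning rate.
- For h <= 0.0024, the weight keeps increasing after it crosses the peak, because the loss decreases monotonically for w > L + h. This is why the final weight equals the maximum weight in those runs.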
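
The dependence on the learning rate could be probed with a sweep like the sketch below. It is a self-contained re-implementation of the experiment loop under the same settings (`L = 1`, 5000 iterations, `epsilon = 0`). The helper names `crossing_step` and `largest_escaped_h`, the choice of learning rates, and the grid of h values expressed as multiples of `alpha` are all assumptions made for this example.

```python
import math

def grad(w, h, L=1.0):
    """Same branch logic as grad_function, with L passed explicitly."""
    if w < L:
        return -1.0
    elif w < L + h:
        return 1.0
    elif w < L + 2 * h:
        return -1.0
    else:
        return -0.5

def crossing_step(h, alpha=0.001, L=1.0, n=5000, beta1=0.9, beta2=0.999, epsilon=0.0):
    """Return the first step at which w exceeds L + h, or None if it never does."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, n + 1):
        dw = grad(w, h, L)
        m = beta1 * m + (1 - beta1) * dw
        v = beta2 * v + (1 - beta2) * dw ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (math.sqrt(v_hat) + epsilon)
        if w > L + h:
            return t
    return None

def largest_escaped_h(alpha, multiples):
    """Largest h = k * alpha on the scanned grid for which the bump is crossed."""
    escaped = [k * alpha for k in multiples
               if crossing_step(k * alpha, alpha=alpha) is not None]
    return max(escaped) if escaped else None

# Scan h from 1.0 * alpha to 3.9 * alpha, mirroring h_range, for a few learning rates.
multiples = [1.0 + 0.1 * i for i in range(30)]
for alpha in (0.0005, 0.001, 0.002):
    print(f"alpha={alpha}: largest escaped h on the grid = {largest_escaped_h(alpha, multiples)}")
```

Expressing h as a multiple of `alpha` keeps the grid comparable across learning rates; whether the threshold scales with `alpha` is left for the sweep to show rather than assumed here.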