### EECS 495: Optimization Techniques for Machine Learning and Deep Learning
Homework 1: Chapter 2 Assignments (2.3, 2.4, 2.9)

Northwestern University - Fall 2018 

Problem 2.3 - Implement random search to minimize: 

$$ g(w_0, w_1) = \tanh(4w_{0}+4w_1)+\max(0.4w_0^2,1)+1$$
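Bounding each term gives a sanity check on what the search should report. This is a short worked evaluation, not part of the original assignment text. At the initial point:

$$ g(2, 2) = \tanh(16) + \max(1.6,\, 1) + 1 \approx 1 + 1.6 + 1 = 3.6 $$

Because $\tanh(\cdot) > -1$ and $\max(0.4w_0^2, 1) \ge 1$, every point satisfies

$$ g(w_0, w_1) > -1 + 1 + 1 = 1. $$

The infimum $1$ is approached when $w_0 + w_1 \to -\infty$ while $0.4w_0^2 \le 1$, i.e. $|w_0| \le \sqrt{2.5} \approx 1.58$.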

Use $P = 1000$ random directions per step, step length $\alpha = 1$, at most 8 steps, and the initial point $w^0 = \begin{bmatrix} 2 \\ 2\end{bmatrix}$.
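One step $k$ of the random local search in the code below can be written as follows, with unit-length candidate directions $d^p$ and the improvement check the code applies:

$$ d^{\star} = \underset{p = 1,\dots,P}{\operatorname{argmin}}\; g\left(w^{k-1} + \alpha\, d^{p}\right), \qquad \|d^{p}\|_2 = 1 $$

$$ w^{k} = \begin{cases} w^{k-1} + \alpha\, d^{\star} & \text{if } g\left(w^{k-1} + \alpha\, d^{\star}\right) < g\left(w^{k-1}\right) \\ w^{k-1} & \text{otherwise} \end{cases} $$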

```python
# Problem 2.3 - Implement random search
import numpy as np

# Lists to store past values
weights_past = []
evals_past = []

# Variable inits
alpha = 1
w_0 = np.array([2, 2])
P = 1000
num_steps = 8

# Define g
def g(w):
    return np.tanh(4*w[0] + 4*w[1]) + np.max([0.4*w[0]**2, 1]) + 1

# Helper function for generating a random unit vector of dimension n
# Credit to Stack Overflow for this function: https://stackoverflow.com/questions/6283080/random-unit-vector-in-multi-dimensional-space
# I started programming this, but their top solution was so elegant...
def rand_unit_vec(n):
    # np.random.rand(n) is uniform on [0, 1); shift down by 0.5 and multiply by 2
    # so each component is uniform on [-1, 1).
    # Otherwise, you will spend hours wondering why w only ever moves toward larger coordinates
    vec = (np.random.rand(n) - 0.5) * 2
    return vec / np.linalg.norm(vec)

# --- Store initial values ---

# Init the weights
w = w_0

# Store the initial values
weights_past.append(w)
evals_past.append(g(w))

# --- Implement random local search ---
for i in range(num_steps):
    # Candidate directions for this step: P random unit vectors, i.e. points on
    # a circle of radius alpha = 1 around the current w
    dirs = [rand_unit_vec(2) for _ in range(P)]

    # Evaluate g at each candidate point w + alpha*d
    dir_evals = [g(w + alpha*d) for d in dirs]

    # Select the direction with the smallest function value
    idx_sm = int(np.argmin(dir_evals))
    dir_sm = dirs[idx_sm]

    # Only take the step if it actually decreases g
    if dir_evals[idx_sm] < g(w):
        # Take descent step
        w = w + alpha*dir_sm

        # Store the step and the evaluation at that step
        weights_past.append(w)
        evals_past.append(g(w))

print('Starting Points:')
print('W_0:', w_0)
print('g(w_0) = ', g(w_0))
print()
print('Ending Points:')
# weights_past[-1] is the last accepted point; a fixed index such as [7] raises
# IndexError if fewer steps are accepted and misses the final step if all 8 are
print('W_final:', weights_past[-1])
print('g(W_final) = ', g(weights_past[-1]))
```

```
Starting Points:
W_0: [2 2]
g(w_0) =  3.5999999999999748

Ending Points:
W_final: [-1.56808444 -3.52533699]
g(W_final) =  1.0
```
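For comparison, here is a minimal vectorized sketch of the same random local search. It is one possible variant, not the form the assignment requires, and it rests on a few assumptions. It draws all $P$ directions at once from a standard normal distribution using `np.random.default_rng`. Normalizing Gaussian samples gives directions distributed uniformly on the unit circle, which is not true when normalizing samples from a square. The helper name `random_search` and its `seed` parameter are hypothetical. The block redefines `g` so it runs on its own.

```python
import numpy as np

def g(w):
    # Same objective as above: tanh(4 w0 + 4 w1) + max(0.4 w0^2, 1) + 1
    return np.tanh(4 * w[0] + 4 * w[1]) + max(0.4 * w[0] ** 2, 1.0) + 1.0

def random_search(g, w0, alpha=1.0, P=1000, num_steps=8, seed=None):
    """Hypothetical helper: random local search returning the weight and cost history."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    weights, costs = [w.copy()], [g(w)]
    for _ in range(num_steps):
        # P Gaussian samples normalized to unit length -> uniform on the unit circle
        dirs = rng.standard_normal((P, w.size))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        # All candidate points w + alpha * d at once (broadcast over rows)
        candidates = w + alpha * dirs
        evals = np.array([g(c) for c in candidates])
        best = int(np.argmin(evals))
        # Accept the best candidate only if it lowers the cost
        if evals[best] < costs[-1]:
            w = candidates[best]
            weights.append(w.copy())
            costs.append(evals[best])
    return np.array(weights), np.array(costs)

weights, costs = random_search(g, [2, 2], seed=0)
print(weights[-1], costs[-1])
```

Returning the whole history rather than the last point makes it easy to plot the cost per step. Because a step is only accepted when it lowers $g$, the stored costs are strictly decreasing.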


$\textbf{Lesson Learned:}$ if the entries of $\vec{w}$ only ever increase, the random directions are probably being drawn from $[0,1)$. Every sampled direction then lies in the $1^{st}$ quadrant of the $(w_0, w_1)$ Cartesian plane, so the search can never step toward negative $w_0$ or $w_1$, where the minimum of $g$ lies, and you'll *never* converge. To fix this (and save four hours of debugging), draw each component from $[-1,1)$ instead, which $2(\text{rand} - 0.5)$ does.
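You can check the fix described in the lesson by comparing the two sampling ranges directly. In this minimal sketch, the seed, the sample count, and the helper `quadrants_hit` are arbitrary illustrative choices. Raw `np.random.rand` samples are never negative, so they only reach the first quadrant. The shifted samples cover all four quadrants.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for reproducibility
n_samples = 10000

# Unshifted: components uniform on [0, 1), so every direction has d0 >= 0 and d1 >= 0
raw = np.random.rand(n_samples, 2)
# Shifted: 2*(rand - 0.5) gives components uniform on [-1, 1)
shifted = 2 * (np.random.rand(n_samples, 2) - 0.5)

def quadrants_hit(samples):
    # Count the distinct sign patterns (quadrants) among the 2D samples
    return len({(x >= 0, y >= 0) for x, y in samples})

print("unshifted min component:", raw.min())
print("quadrants reached (unshifted):", quadrants_hit(raw))
print("quadrants reached (shifted):", quadrants_hit(shifted))
```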