In [1]:
import math
import numpy as np

# HW 5
## Linear Regression Error

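Consider a noisy target y = **w**<sup>*T</sup>x + &epsilon;, where x &isin; &reals;<sup>d</sup> (with x<sub>0</sub> = 1), y &isin; &reals;, **w**<sup>*</sup> is an unknown vector, and &epsilon; is a noise term with zero mean and variance &sigma;<sup>2</sup>. For a data set &Dscr; = {(x<sub>1</sub>, y<sub>1</sub>), ..., (x<sub>N</sub>, y<sub>N</sub>)}, the expected in-sample error of the linear regression solution w<sub>lin</sub> is: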


&#120124;<sub>&#119967;</sub>[E<sub>in</sub>(w<sub>lin</sub>)] = &sigma;<sup>2</sup>(1 - (d+1)/N)


Let the left-hand side be &tau;, then:

&tau;/(&sigma;<sup>2</sup>) = 1 - (d+1)/N

&rArr; (d+1)/N = 1-&tau;/(&sigma;<sup>2</sup>)

&rArr; (d+1)/(1-&tau;/(&sigma;<sup>2</sup>)) = N



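For &sigma; = 0.1 and d = 8, we want the smallest N for which the expected E<sub>in</sub> is greater than 0.008. The expected E<sub>in</sub> increases with N, so requiring &tau; &gt; 0.008 gives:

&rArr; 9/(1 - 0.008/(0.1<sup>2</sup>)) &lt; N

&rArr; 9/(1 - 0.8) = 9/0.2 = 45 &lt; N


Thus the smallest of the given choices for N that satisfies this is **100**.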
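
As a sanity check on the arithmetic above, the bound can be evaluated numerically. This is a minimal sketch; the function names `expected_ein` and `min_sample_bound` are illustrative assumptions, not part of the assignment.

```python
def expected_ein(sigma, d, n):
    """Expected in-sample error: sigma^2 * (1 - (d + 1) / N)."""
    return sigma ** 2 * (1.0 - (d + 1) / n)


def min_sample_bound(sigma, d, target):
    """Value of N at which the expected E_in equals `target`."""
    return (d + 1) / (1.0 - target / sigma ** 2)


bound = min_sample_bound(sigma=0.1, d=8, target=0.008)
ein_at_100 = expected_ein(sigma=0.1, d=8, n=100)
print("N must exceed %.1f" % bound)
print("Expected E_in at N = 100: %.4f" % ein_at_100)
```

Any N above the printed bound gives an expected E<sub>in</sub> greater than 0.008, which is why the check at N = 100 passes.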



## Nonlinear Transforms

**Given :** &phi;(1, x1, x2) = (1, x1<sup>2</sup>, x2<sup>2</sup>)

**Want :** weights giving a hyperbolic decision boundary



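With this transform the hypothesis is h(x) = sign(w0 + w1\*x1<sup>2</sup> + w2\*x2<sup>2</sup>), so the decision boundary is w0 + w1\*x1<sup>2</sup> + w2\*x2<sup>2</sup> = 0. Comparing with the general equation of a hyperbola, x<sup>2</sup>/a<sup>2</sup> - y<sup>2</sup>/b<sup>2</sup> = 1, the two squared terms must have opposite signs. Since we want h(x) to be negative when x1<sup>2</sup> is large, the correct weights should be **w1 < 0, w2 > 0**.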
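
To see the sign pattern concretely, the sketch below evaluates the hypothesis in the transformed space at a few points. The weight values, and in particular the bias w0 = 1, are hypothetical and chosen only for illustration; only the signs of w1 and w2 come from the argument above.

```python
import numpy as np


def phi(x1, x2):
    # Nonlinear transform from the problem: (1, x1, x2) -> (1, x1^2, x2^2)
    return np.array([1.0, x1 ** 2, x2 ** 2])


def h(w, x1, x2):
    # Sign of the linear signal computed in the transformed space
    return int(np.sign(np.dot(w, phi(x1, x2))))


w = np.array([1.0, -1.0, 1.0])  # hypothetical values with w1 < 0, w2 > 0

print(h(w, 3.0, 0.0))   # point with large x1^2
print(h(w, -3.0, 0.0))  # mirrored point, also large x1^2
print(h(w, 0.0, 3.0))   # point with large x2^2
```

The first two points are classified negative and the third positive, which matches a boundary that opens along the x1 axis.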

**Given :** &phi;<sub>4</sub>: x &rarr; (1, x1, x2, x1<sup>2</sup>, x1*x2, x2<sup>2</sup>, x1<sup>3</sup>, x1<sup>2</sup>*x2, x1*x2<sup>2</sup>, x2<sup>3</sup>, x1<sup>4</sup>, x1<sup>3</sup>*x2, x1<sup>2</sup>*x2<sup>2</sup>, x1*x2<sup>3</sup>, x2<sup>4</sup>), d = 14.

**Want :** VC dimension


Since the VC dimension of a linear model in the transformed space is at most d + 1, d<sub>VC</sub> &le; 15; thus the lowest number given that is not smaller than the VC dimension is **15**.
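
The number of features in a polynomial transform can be checked by enumerating its monomials. The sketch below lists every term x1<sup>i</sup>*x2<sup>j</sup> with total degree i + j at most a given order; the helper name `monomial_exponents` is an assumption for illustration. The count includes the constant term, so it equals d + 1, which is the upper bound on the VC dimension of a linear model in the transformed space.

```python
def monomial_exponents(order):
    """Exponent pairs (i, j) of all monomials x1^i * x2^j with i + j <= order."""
    return [(i, total - i)
            for total in range(order + 1)
            for i in range(total, -1, -1)]


terms = monomial_exponents(4)
print(len(terms))  # number of features, constant term included
print(terms)
```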

## Gradient Descent

**Given :** nonlinear error surface E(u,v) = (u*e<sup>v</sup> - 2*v*e<sup>-u</sup>)<sup>2</sup>, starting point (u,v) = (1,1), learning rate &eta; = 0.1

- &part;E/&part;u = 2*(e<sup>v</sup> + 2*v*e<sup>-u</sup>) * (u*e<sup>v</sup> - 2*v*e<sup>-u</sup>)
- &part;E/&part;v = 2*(u*e<sup>v</sup> - 2*e<sup>-u</sup>)*(u*e<sup>v</sup> - 2*v*e<sup>-u</sup>)
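
One way to gain confidence in the two partial derivatives above is to compare them against central finite differences. This is a minimal sketch; the function names and the step size `h = 1e-6` are assumptions made for the check, not part of the assignment.

```python
import math


def error(u, v):
    # E(u, v) = (u*e^v - 2*v*e^-u)^2
    return (u * math.exp(v) - 2.0 * v * math.exp(-u)) ** 2


def analytic_gradient(u, v):
    # Partial derivatives as written above; both share the inner term of E
    inner = u * math.exp(v) - 2.0 * v * math.exp(-u)
    d_u = 2.0 * (math.exp(v) + 2.0 * v * math.exp(-u)) * inner
    d_v = 2.0 * (u * math.exp(v) - 2.0 * math.exp(-u)) * inner
    return d_u, d_v


def numeric_gradient(u, v, h=1e-6):
    # Central differences approximate each partial derivative
    d_u = (error(u + h, v) - error(u - h, v)) / (2.0 * h)
    d_v = (error(u, v + h) - error(u, v - h)) / (2.0 * h)
    return d_u, d_v


print(analytic_gradient(1.0, 1.0))
print(numeric_gradient(1.0, 1.0))
```

If the two printed pairs agree to several decimal places at the starting point (1, 1), the analytic expressions are very likely correct.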


In [10]:
gd_thresh = 1e-13  # stopping threshold on E(u, v)
gd_lrate = 0.1
gd_start = np.array([1.0, 1.0])
gd_coord_iters = 15

def gd_error(cur_coords):
    # E(u, v) = (u*e^v - 2*v*e^-u)^2
    u, v = cur_coords
    return (u * math.exp(v) - 2.0 * v * math.exp(-u)) ** 2

def gd_partial(cur_coords):
    # Gradient [dE/du, dE/dv]; both partials share the inner term of E
    u, v = cur_coords
    inner = u * math.exp(v) - 2.0 * v * math.exp(-u)
    u_coord = 2.0 * (math.exp(v) + 2.0 * v * math.exp(-u)) * inner
    v_coord = 2.0 * (u * math.exp(v) - 2.0 * math.exp(-u)) * inner
    return np.array([u_coord, v_coord])

def gd_perform(coords, l_rate, threshold):
    # Gradient descent on both coordinates at once;
    # stops as soon as the error drops below the threshold
    num_iterations = 0
    cur_error = gd_error(coords)
    while cur_error >= threshold:
        num_iterations += 1
        coords = coords - l_rate * gd_partial(coords)
        cur_error = gd_error(coords)
    return coords, cur_error, num_iterations

def gd_coord_perform(coords, l_rate, iters):
    # Coordinate descent: instead of moving along both coords at once,
    # move along u, then v, then u, then v...
    cur_error = gd_error(coords)
    for cur_iter in range(iters):
        # first move along the u coord only (the v component is zeroed out)
        coords = coords - l_rate * np.array([1, 0]) * gd_partial(coords)
        # re-evaluate the gradient at the new point, then move along v only
        coords = coords - l_rate * np.array([0, 1]) * gd_partial(coords)
        cur_error = gd_error(coords)
    return cur_error


gd_coords, gd_err, gd_numiters = gd_perform(gd_start, gd_lrate, gd_thresh)
gd_cerr = gd_coord_perform(gd_start, gd_lrate, gd_coord_iters)
print("With starting coordinates (%f,%f), learning rate %f and threshold %e:" % (gd_start[0], gd_start[1], gd_lrate, gd_thresh))
print("It took %d iterations to achieve an error of %e ending at coordinates (%f,%f)." % (gd_numiters, gd_err, gd_coords[0], gd_coords[1]))
print("Iterating coordinate-wise for %d iterations, the resulting error is %f." % (gd_coord_iters, gd_cerr))


With starting coordinates (1.000000,1.000000), learning rate 0.100000 and threshold 1.000000e-13:
It took 10 iterations to achieve an error of 1.208683e-15 ending at coordinates (0.044736,0.023959).
Iterating coordinate-wise for 15 iterations, the resulting error is 0.139814.
