# Deep learning from scratch: homework 2

### General instructions

Complete the exercises listed below in this Jupyter notebook, leaving all of your code in Python cells in the notebook itself.  Feel free to add cells as needed.

In [203]:
import autograd.numpy as np
from autograd import grad

import matplotlib.pyplot as plt
%matplotlib inline

#### <span style="color:#a50e3e;">Exercise 1. </span>   Perform two-class classification on a toy dataset

Code up the two-class logistic regression / softmax cost function and minimize it using gradient descent.  Use the two-class toy dataset in the file *3d_classification_data_v2.csv*.

Create a plot with two panels: the left panel shows the number of misclassifications at each gradient descent step, and the right panel shows the value of the cost function at each gradient descent step.  You won't get perfect separation, but you should be able to separate most of the points.
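One way the two histories might be recorded and plotted is sketched below. This is only an illustration under stated assumptions: it uses synthetic two-cluster data rather than the homework file, a hand-derived gradient in plain NumPy rather than autograd, and the helper names `gradient_descent_with_history` and `plot_histories` are hypothetical.

```python
import numpy as np

def softmax(z):
    # row-wise softmax; subtracting the row maximum avoids overflow in exp
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def model(X, W):
    # first row of W is the bias, remaining rows are feature weights
    return W[0] + np.dot(X, W[1:])

def cost(W, X, Y_hot):
    # mean cross-entropy between softmax outputs and one-hot labels
    return np.mean(-np.sum(Y_hot * np.log(softmax(model(X, W))), axis=1))

def cost_gradient(W, X, Y_hot):
    # closed-form gradient of the mean cross-entropy (assumption: used here in place of autograd)
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return X_aug.T @ (softmax(model(X, W)) - Y_hot) / X.shape[0]

def count_misclassifications(W, X, y):
    return int(np.sum(np.argmax(model(X, W), axis=1) != y))

def gradient_descent_with_history(W, X, y, alpha, max_its):
    # record the cost and the misclassification count at every step
    Y_hot = np.eye(2)[y]
    cost_history = [cost(W, X, Y_hot)]
    miss_history = [count_misclassifications(W, X, y)]
    for _ in range(max_its):
        W = W - alpha * cost_gradient(W, X, Y_hot)
        cost_history.append(cost(W, X, Y_hot))
        miss_history.append(count_misclassifications(W, X, y))
    return W, cost_history, miss_history

def plot_histories(miss_history, cost_history):
    # left panel: misclassifications per step; right panel: cost per step
    import matplotlib.pyplot as plt
    fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
    ax_left.plot(miss_history)
    ax_left.set_xlabel("gradient descent step")
    ax_left.set_ylabel("number of misclassifications")
    ax_right.plot(cost_history)
    ax_right.set_xlabel("gradient descent step")
    ax_right.set_ylabel("cost")
    return fig

# synthetic stand-in data: two overlapping Gaussian clusters (not the homework dataset)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, cost_history, miss_history = gradient_descent_with_history(
    np.zeros((3, 2)), X, y, alpha=0.1, max_its=200)
```

Calling `plot_histories(miss_history, cost_history)` in a notebook cell would then draw the two panels. With the real data, the labels would first need to be mapped to integers in {0, 1}.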

In [204]:

data_1 = np.loadtxt("3d_classification_data_v2.csv",delimiter = ',')

In [264]:
x_1 = data_1[:,:-1]
y_1 = data_1[:,-1]
np.place(y_1,y_1<0,0)
y_1_hot = np.eye(2)[y_1.astype(int)]
w_1 = np.random.randn(np.shape(x_1)[1]+1,2)

In [265]:
y_1_hot.dtype

dtype('float64')

In [266]:
def softmax(z):
    """Compute row-wise softmax values for an (N, C) array of scores z."""
    # each row of z holds the class scores of one point; normalize across classes
    return (np.exp(z.T) / np.sum(np.exp(z), axis=1)).T

In [267]:
def model(X, W):
    return (W[0] + np.dot(X,W[1:]))

In [268]:
def class_predict(z):
    return z.argmax(axis=1)

In [269]:
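def misclassification(y_true,y_guess):
    # NOTE: despite its name, this returns the fraction of labels predicted
    # correctly (accuracy); the misclassification rate is 1 minus this value
    return (np.sum(y_true == y_guess))/y_true.size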

In [270]:
def cost(W,X,y_hot_true):
    output = model(X,W)
    softmax_values = softmax(output)
    return (np.mean((-np.sum(np.log(softmax_values)*y_hot_true,axis=1))))

In [271]:
def num_predict(W,X,Y):
    raw = model(X,W) 
    soft = softmax(raw)
    print(soft)
    classes = class_predict(soft)
    print(classes)
    return misclassification(Y,classes)

In [272]:
cost(w_1,x_1,y_1_hot)

0.7383567698911

In [273]:
# gradient descent function - inputs: g (cost function), alpha (steplength parameter),
# max_its (maximum number of iterations), w (initialization), x (input data), y_hot (one-hot labels)
def gradient_descent(g,alpha,max_its,w,x,y_hot):
    # compute the gradient of our input function - note this is a function too!
    
    gradient = grad(g)

    # run the gradient descent loop
    best_w = w             # weight we return, should be the one providing lowest evaluation
    best_eval = g(w,x,y_hot)       # lowest evaluation yet
    for k in range(max_its):
        # evaluate the gradient
        grad_eval = gradient(w,x,y_hot)
        #print(grad_eval)

        # take gradient descent step
        w = w - alpha*grad_eval
        
        # return only the weight providing the lowest evaluation
        test_eval = g(w,x,y_hot)
        if test_eval < best_eval:
            best_eval = test_eval
            best_w = w
            
    return best_w

In [278]:
best = gradient_descent(g=cost,alpha = .0015,max_its = 10000,w = w_1, x=x_1, y_hot=y_1_hot)
print(best)

[[-1.80406362 -0.68571936]
 [ 1.07496816  0.06015715]
 [ 1.02560338 -0.45653016]]


In [279]:
print(w_1)

[[-0.9497291  -1.54005387]
 [ 0.43414388  0.70098143]
 [ 0.3253405   0.24373272]]


In [280]:
cost(best,x_1,y_1_hot)

0.51222918208625

In [281]:
num_predict(best,x_1,y_1)

[[0.39636358 0.60363642]
 [0.40264491 0.59735509]
 [0.39169457 0.60830543]
 [0.45098398 0.54901602]
 [0.48025004 0.51974996]
 [0.33709277 0.66290723]
 [0.41421122 0.58578878]
 [0.44800038 0.55199962]
 [0.46487683 0.53512317]
 [0.37517735 0.62482265]
 [0.28754567 0.71245433]
 [0.37886017 0.62113983]
 [0.44364725 0.55635275]
 [0.48668493 0.51331507]
 [0.5006028  0.4993972 ]
 [0.40329835 0.59670165]
 [0.42616197 0.57383803]
 [0.3905395  0.6094605 ]
 [0.48581514 0.51418486]
 [0.44948252 0.55051748]
 [0.47301395 0.52698605]
 [0.48708482 0.51291518]
 [0.51450436 0.48549564]
 [0.3679408  0.6320592 ]
 [0.53594462 0.46405538]
 [0.55379418 0.44620582]
 [0.47121894 0.52878106]
 [0.36813565 0.63186435]
 [0.3322822  0.6677178 ]
 [0.46293206 0.53706794]
 [0.40761617 0.59238383]
 [0.34130147 0.65869853]
 [0.47887173 0.52112827]
 [0.55428095 0.44571905]
 [0.41409577 0.58590423]
 [0.38381911 0.61618089]
 [0.56269944 0.43730056]
 [0.48846383 0.51153617]
 [0.47845347 0.52154653]
 [0.44032364 0.55967636]


0.92

In [133]:
soft = softmax(test)

In [134]:
cost(soft,y_1_hot)

1.1825525843986018

In [98]:
first = to_classlabel(soft)
first.shape

(100,)

In [99]:
np.place(y_1,y_1<0,0)
y_1.shape
first

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [100]:
misclassification(y_1,first)

0.5

#### <span style="color:#a50e3e;">Exercise 2. </span>   Perform two-class classification on a breast cancer dataset

Use the softmax cost function to distinguish healthy from cancerous tissue using the dataset located in breast_cancer_dataset.csv (included in this homework folder).  You can examine the description of this dataset [here](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).

This dataset has $N = 8$ input dimensions: the first $N = 8$ columns of breast_cancer_dataset.csv hold the inputs, and the last column holds the associated labels.  Fit using gradient descent with a maximum of 5,000 iterations.  You should be able to reach a point on the cost surface where you misclassify fewer than 30 examples.

**Note:** Python is a great prototyping language but [it is slow](http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/), particularly when evaluating explicit for-loops.  If you are having speed issues, try rewriting the softmax cost function using as few explicit for-loops as possible (you can indeed write the entire summation in a single line of Python code, for-loop free).
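To illustrate the note above, here is a minimal sketch that compares a loop-based softmax cost with a loop-free version. The function names are hypothetical, the data is random rather than the breast cancer dataset, and the labels are assumed to be integers in {0, 1}. Both versions use the log-sum-exp form of the cost for numerical stability.

```python
import numpy as np

def softmax_cost_loop(W, X, y):
    # reference version: one explicit pass over the data points
    total = 0.0
    for p in range(X.shape[0]):
        scores = W[0] + X[p] @ W[1:]
        m = np.max(scores)
        log_norm = m + np.log(np.sum(np.exp(scores - m)))
        total += log_norm - scores[y[p]]
    return total / X.shape[0]

def softmax_cost_vectorized(W, X, y):
    # same quantity computed on the whole (N, C) score matrix at once
    scores = W[0] + X @ W[1:]
    m = np.max(scores, axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.sum(np.exp(scores - m), axis=1))
    return np.mean(log_norm - scores[np.arange(X.shape[0]), y])

# random stand-in data with 8 input dimensions (not the real dataset)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 8))
y_demo = rng.integers(0, 2, size=200)
W_demo = rng.normal(size=(9, 2))

loop_value = softmax_cost_loop(W_demo, X_demo, y_demo)
vectorized_value = softmax_cost_vectorized(W_demo, X_demo, y_demo)
```

The two values should agree up to floating point rounding; the vectorized version pushes the per-point work into NumPy instead of the Python interpreter.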

#### <span style="color:#a50e3e;">Exercise 3. </span>   Code up One-versus-All multiclass classification

Code up One-versus-All classification and test it on the toy $C = 3$ class dataset located in *3class_data.csv*.  You should be able to learn a model that perfectly separates this data, as shown in class.  You may reuse your softmax cost / gradient descent code for each of the two-class subproblems.
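A rough sketch of the One-versus-All idea follows, assuming integer class labels 0, 1, 2 and synthetic, well-separated clusters in place of *3class_data.csv*. Each two-class subproblem is solved here with a hand-written logistic regression gradient step in plain NumPy; the function names are hypothetical, and the fusion rule (pick the class whose classifier gives the largest raw score) is one common choice rather than necessarily the one shown in class.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias(X):
    # prepend a column of ones so the first weight acts as the bias
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_two_class(X, y01, alpha=0.5, max_its=2000):
    # gradient descent on the mean logistic (cross-entropy) cost, labels in {0, 1}
    X_aug = add_bias(X)
    w = np.zeros(X_aug.shape[1])
    for _ in range(max_its):
        w = w - alpha * X_aug.T @ (sigmoid(X_aug @ w) - y01) / X.shape[0]
    return w

def train_one_versus_all(X, y, num_classes):
    # one classifier per class: class c versus all the other classes
    return np.column_stack(
        [train_two_class(X, (y == c).astype(float)) for c in range(num_classes)])

def predict_one_versus_all(W, X):
    # fusion rule: assign each point to the classifier with the largest score
    return np.argmax(add_bias(X) @ W, axis=1)

# synthetic stand-in data: three tight clusters at the corners of a triangle
rng = np.random.default_rng(2)
centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])
X_ova = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in centers])
y_ova = np.repeat(np.arange(3), 30)

W_ova = train_one_versus_all(X_ova, y_ova, num_classes=3)
accuracy_ova = np.mean(predict_one_versus_all(W_ova, X_ova) == y_ova)
```

Column `c` of `W_ova` holds the bias and weights of the classifier for class `c`, so new points can be classified with a single matrix product.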

#### <span style="color:#a50e3e;">Exercise 4. </span>   A nonlinear two-class dataset

Propose a nonlinear feature transformation and integrate it into your two-class classification scheme in order to adequately classify the dataset shown below, located in the file *bricks.csv*.  With the right transformation you should be able to classify this quite well.

<p>
  <img src= 'brick_pick.png' width="40%" height="40%" alt=""/>
</p>
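The general pattern of plugging a feature transformation into a linear two-class classifier can be sketched as follows. The layout of *bricks.csv* is not described in the text, so this example uses a hypothetical ring-shaped dataset and squared features purely for illustration; the right transformation for the actual data depends on what the figure shows. All function names are assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def feature_transform(X):
    # hypothetical transformation: square each input coordinate
    return X ** 2

def train_two_class(F, y01, alpha=0.5, max_its=5000):
    # gradient descent on the mean logistic cost, applied to the transformed features F
    F_aug = np.hstack([np.ones((F.shape[0], 1)), F])
    w = np.zeros(F_aug.shape[1])
    for _ in range(max_its):
        w = w - alpha * F_aug.T @ (sigmoid(F_aug @ w) - y01) / F.shape[0]
    return w

def predict(w, F):
    F_aug = np.hstack([np.ones((F.shape[0], 1)), F])
    return (F_aug @ w > 0).astype(float)

# synthetic stand-in data: an inner disk (class 1) surrounded by an outer ring (class 0)
rng = np.random.default_rng(3)
radius = np.concatenate([rng.uniform(0.0, 0.5, 100), rng.uniform(1.0, 1.5, 100)])
angle = rng.uniform(0.0, 2.0 * np.pi, 200)
X_ring = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y_ring = np.concatenate([np.ones(100), np.zeros(100)])

# the classifier stays linear; only its inputs change
F_ring = feature_transform(X_ring)
w_ring = train_two_class(F_ring, y_ring)
accuracy_ring = np.mean(predict(w_ring, F_ring) == y_ring)
```

The design point is that the cost function and the gradient descent loop are untouched: the nonlinearity lives entirely in `feature_transform`, so different candidate transformations can be swapped in and compared.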