# Notes and code used on my UDACITY MLND Deep Learning module

## Activation functions

Code implementing the sigmoid and softmax functions.

In [None]:
import numpy as np

In [2]:
'''
SIGMOID FUNCTION
'''

def sigmoid_f(x):
    if x < 0:
        z = np.exp(x)
        return z / (1+z)
    elif x >= 0:
        z = np.exp(-x)
        return 1 / (1+z)

def score(x_1, x_2):
    return 4*x_1 + 5*x_2 - 9

print(sigmoid_f(score(1,1)))
print(sigmoid_f(score(2,4)))
print(sigmoid_f(score(5,-5)))
print(sigmoid_f(score(-4,5)))

0.5
0.999999994397
8.31528027664e-07
0.5


In [3]:
'''
SOFTMAX FUNCTION
'''
def softmax(L):
    eL = []
    softmaxL = []
    for e in L:
        eL.append(np.exp(e))
    sumeL = sum(eL)
    for v in eL:
        softmaxL.append(v/float(sumeL))
    return softmaxL

L = [2, 1, 0]
print(softmax(L))

[0.6652409557748219, 0.24472847105479764, 0.090030573170380462]


In [4]:
'''Better implementation of SOFTMAX'''
def softmax_o(L):
    expL = np.exp(L)
    return np.divide(expL, expL.sum())

L = [5, 6, 7]
print(softmax_o(L))
# [0.090030573170380462, 0.24472847105479764, 0.6652409557748219]

[ 0.09003057  0.24472847  0.66524096]


The score function represents the boundary line (a plane or hyperplane in higher dimensions) of an arbitrary neural network. The sigmoid function is a continuous, differentiable function that turns the score into the probability that a given point lies on the positive side of the classification boundary. A probability of 0.5 means the point lies exactly on the boundary.

The softmax function transforms a list of values to a list of probabilities that add up to one. In the case of having more than two classes, the softmax function gives the probabilities of each class.
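
The two functions are closely related. As a minimal sketch (the names `stable_softmax` and `sigmoid` are illustrative and not part of the course code), subtracting the maximum score before exponentiating leaves the softmax result unchanged while avoiding overflow for large scores, and for two classes the softmax of the scores `[s, 0]` gives the same probability as the sigmoid of `s`:

```python
import numpy as np

def stable_softmax(scores):
    """Softmax that subtracts the max score before exponentiating (illustrative helper)."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # shifting every score by a constant does not change the result
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

def sigmoid(x):
    """Plain sigmoid, fine for moderate values of x."""
    return 1.0 / (1.0 + np.exp(-x))

probs = stable_softmax([2, 1, 0])
print(probs, probs.sum())

# Shifting all scores by the same amount gives the same probabilities.
print(np.allclose(stable_softmax([7, 6, 5]), probs))

# Two-class case: softmax over [s, 0] matches sigmoid(s).
s = 1.5
print(np.isclose(stable_softmax([s, 0.0])[0], sigmoid(s)))

# Scores this large would overflow a naive np.exp, but the shifted version still works.
print(stable_softmax([1000.0, 999.0, 998.0]))
```

The shift invariance is also why `[5, 6, 7]` and `[2, 1, 0]` produce the same three probabilities in reverse order in the cells above.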

## Cross-Entropy

If we have a probability for each data point, we could multiply the probabilities of all data points to get a single value for a model. This is not a good approach, because the product becomes very small once there are more than a few data points. Instead, we use the property that the log of a product is the sum of the logs of its factors. The natural log is negative for numbers between 0 and 1 (ln x approaches -inf as x approaches 0, and ln 1 = 0), so we take the negative of the natural log to get positive numbers. The expression ends up being:

-Σ[i=1, m] [y_i \* ln(p_i) + (1-y_i) \* ln(1-p_i)]

The smaller the cross entropy, the better the model. 
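
To make the link between the product of probabilities and the sum of logs concrete, here is a small sketch. The labels and predictions are illustrative values, and `event_probs` is a name introduced only for this example: it holds the probability the model assigned to what actually happened for each point.

```python
import numpy as np

Y = np.array([1, 0, 1, 1])          # actual labels (illustrative)
P = np.array([0.4, 0.6, 0.1, 0.5])  # predicted probability of label 1 (illustrative)

# Probability the model assigned to the event that actually occurred:
# P where the label is 1, and 1 - P where the label is 0.
event_probs = np.where(Y == 1, P, 1 - P)

product = np.prod(event_probs)                  # shrinks quickly as points are added
sum_of_neg_logs = -np.sum(np.log(event_probs))  # the cross-entropy

print(product)
print(-np.log(product), sum_of_neg_logs)        # the two values agree
```

Working with the sum keeps the numbers in a comfortable range even when the product itself would underflow to zero.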

In [5]:
'''CROSS ENTROPY'''

def cross_entropy(Y, P):
    result = 0
    for i in range(len(Y)):
        result -=  Y[i]*np.log(P[i])+(1-Y[i])*np.log(1-P[i])
    return result

Y=[1,0,1,1]
P=[0.4,0.6,0.1,0.5]
print(cross_entropy(Y,P))
# 4.8283137373

4.8283137373


In [1]:
'''Provided solution'''
def cross_entropy(Y, P):
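    # np.float_ was removed in NumPy 2.0; np.asarray with dtype=float does the same conversion.
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)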
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))


Note that each y_i is either one or zero and there is only one vector of ys, so this is equivalent to having two classes. If we have multiple classes (n of them), the cross-entropy is:

-Σ[i=1, m] Σ[j=1, n] y_i_j \* ln(p_i_j) 

y_i_j is 0 or 1, so we only add up the events that actually occurred.
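
A minimal sketch of the multi-class formula, assuming the labels are one-hot encoded (one row per data point, one column per class). The function name `multiclass_cross_entropy` and the sample values are illustrative.

```python
import numpy as np

def multiclass_cross_entropy(Y, P):
    """Sum of -ln(p) over the class that actually occurred for each point.

    Y: one-hot labels, shape (m, n). P: predicted probabilities, shape (m, n).
    """
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P))

# Three data points, three classes (illustrative values).
Y = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.1, 0.1, 0.8]]

print(multiclass_cross_entropy(Y, P))

# Only the probabilities of the true classes contribute to the sum.
print(-(np.log(0.7) + np.log(0.5) + np.log(0.8)))
```

This sketch assumes every entry of `P` is strictly positive; a probability of exactly 0 would make `np.log` return `-inf`.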

## Error Function

The error function for one point is:

-(1-y)\*ln(1-y_pred) - y\*ln(y_pred)

If y is equal to 1, only the second term remains; if y is 0, only the first term remains. y_pred is the prediction, that is, the probability according to the model that the data point belongs to class '1': y_pred = p(y=1).

The error function for the model then is the average of the errors:

-1/m \* Σ[i=1, m] [(1-y_i)\*ln(1-y_pred_i) + y_i\*ln(y_pred_i)]

or, since y_pred = f_act(Wx+b)

-1/m \* Σ[i=1, m] [(1-y_i)\*ln(1-f_act(Wx_i+b)) + y_i\*ln(f_act(Wx_i+b))]
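
As a sketch of this last expression, assuming the activation function `f_act` is the sigmoid. The name `mean_error` and the sample points are made up for illustration; the weights and bias mirror the score function `4*x_1 + 5*x_2 - 9` used earlier.

```python
import numpy as np

def sigmoid(x):
    """Plain sigmoid, fine for moderate values of x."""
    return 1.0 / (1.0 + np.exp(-x))

def mean_error(W, b, X, y):
    """Average binary cross-entropy of the model sigmoid(XW + b)."""
    y = np.asarray(y, dtype=float)
    y_pred = sigmoid(np.dot(X, W) + b)  # predicted probability of class 1 for each row of X
    return -np.mean((1 - y) * np.log(1 - y_pred) + y * np.log(y_pred))

W = np.array([4.0, 5.0])   # weights (illustrative)
b = -9.0                   # bias (illustrative)
X = np.array([[1.0, 1.0],  # score 0: on the boundary
              [2.0, 1.0],  # score 4: positive side
              [0.0, 1.0]]) # score -4: negative side
y = np.array([1, 1, 0])

print(mean_error(W, b, X, y))
```

The point on the boundary contributes ln 2 to the sum, while the two confidently and correctly classified points contribute very little.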

Generalizing for more than two classes:

-1/m \*  Σ[i=1, m] Σ[j=1, n] y_i_j \* ln(y_pred_i_j)