#### Support Vector Machines - SVM

SVMs are ML models that can perform the following tasks:
* Linear and non-linear classification
* Regression
* Outlier detection

SVMs are well suited for classification of complex small to medium-sized datasets.
The fundamental idea behind SVM is to fit the widest possible "street" (gap) between the classes, i.e. the widest possible margin between the decision boundary that separates the two classes and the closest training instances.

SVMs are sensitive to feature scales: a feature with a much larger range dominates the margin. Scale the features before training to reduce this effect.
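A minimal sketch of scaling features before fitting a linear SVM, assuming Scikit-Learn is installed. The iris dataset, the two petal features, and the binary "is Iris virginica" target are illustrative assumptions, not part of the notes above.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                 # petal length, petal width (cm)
y = (iris.target == 2).astype(int)    # 1 if Iris virginica, else 0

# StandardScaler runs before the SVM, so both features contribute on a comparable scale
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)

train_accuracy = svm_clf.score(X, y)
print(f"training accuracy: {train_accuracy:.3f}")
```

Putting the scaler inside the pipeline means the same scaling is applied automatically at prediction time.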



##### Soft Margin Classification
When we require that all instances be off the "street" and on the correct side, this is known as __hard margin classification__.
Hard margin classification only works if the data is linearly separable, and it is very sensitive to outliers.

**Soft margin classification** tries to avoid these issues by finding a good balance between keeping the street as wide as possible and limiting the __margin violations__ (instances that end up inside the street or on the wrong side).

In Scikit-Learn's SVM classes, the `C` hyperparameter manages this balance.
A lower `C` leads to a wider street with potentially more margin violations; a higher `C` leads to a narrower street with fewer violations.

If the SVM model is overfitting, try regularizing it by reducing `C`.
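The effect of `C` on the street can be sketched as follows. This is an illustration under assumptions: the synthetic `make_blobs` data, the two `C` values, and the helper name `street_width` are all hypothetical choices. For a linear SVM the street width is 2 / ||w||, measured here in the scaled feature space.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two overlapping clusters, so some margin violations are unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

def street_width(C):
    """Fit a linear SVM with the given C; return (street width, number of support vectors)."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    model.fit(X, y)
    svc = model.named_steps["svc"]
    width = 2.0 / np.linalg.norm(svc.coef_)
    return width, svc.support_.size

width_low_C, n_sv_low_C = street_width(0.01)
width_high_C, n_sv_high_C = street_width(100)

print(f"C=0.01: width={width_low_C:.2f}, support vectors={n_sv_low_C}")
print(f"C=100 : width={width_high_C:.2f}, support vectors={n_sv_high_C}")
```

The low-`C` model should report the wider street, which is the more strongly regularized of the two.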


In [2]:
import numpy as np
a = np.random.randn(12288, 150) # a.shape = (12288, 150)
b = np.random.randn(150, 45) # b.shape = (150, 45)
c = np.dot(a,b)

In [5]:
c.shape

(12288, 45)

In [8]:
import numpy as np
a = np.random.randn(3, 4) # a.shape = (3, 4)
b = np.random.randn(4, 1) # b.shape = (4, 1)

# b.T has shape (1, 4) and is broadcast across the 3 rows of a
y = a + b.T

# Equivalent explicit loop:
# for i in range(3):
#     for j in range(4):
#         y[i][j] = a[i][j] + b[j][0]

In [10]:
y.shape

(3, 4)

In [27]:
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    # math.exp(1)**(-x) equals e**(-x); math.exp(-x) is the more direct way to write it
    s = 1/(1+math.exp(1)**(-x))
    
    ### END CODE HERE ###
    
    return s

In [28]:
basic_sigmoid(3)

0.9525741268224331

In [53]:
import numpy as np
x = np.array([1,2,3]).reshape(3,1)

x.shape
#1/(1+np.exp(1)**(-z))

(3, 1)

In [61]:
# GRADED FUNCTION: sigmoid

import numpy as np # this means you can access numpy functions by writing np.function() instead of numpy.function()

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    # np.exp is applied elementwise to arrays and also accepts plain scalars
    s = 1 / (1 + np.exp(-x))
    
    ### END CODE HERE ###
    
    return s

In [62]:
x = np.array([1, 2, 3])
sigmoid(x)


array([ 0.73105858,  0.88079708,  0.95257413])

In [49]:
x.reshape(3,1)

array([[1],
       [2],
       [3]])


In [1]:
import numpy as np
A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)
B.shape

(4, 1)

## 4 - Neural Network model

Logistic regression did not work well on the "flower dataset". You are going to train a Neural Network with a single hidden layer.

**Here is our model**:
<img src="images/classification_kiank.png" style="width:600px;height:300px;">

**Mathematically**:

For one example $x^{(i)}$:
$$z^{[1] (i)} =  W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ 
$$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$
$$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$
$$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}$$
$$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise } \end{cases}\tag{5}$$
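Equations (1) to (5) can be sketched in vectorized form, with all $m$ examples stacked as columns of `X`. This is a sketch under assumptions: the function name `forward_propagation`, the layer sizes, and the random inputs are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, W1, b1, W2, b2):
    """Apply equations (1)-(5) to every column (example) of X at once."""
    Z1 = W1 @ X + b1                        # (1): shape (n_h, m), b1 is broadcast over columns
    A1 = np.tanh(Z1)                        # (2)
    Z2 = W2 @ A1 + b2                       # (3): shape (1, m)
    A2 = sigmoid(Z2)                        # (4)
    predictions = (A2 > 0.5).astype(int)    # (5)
    return A2, predictions

# Hypothetical sizes: 2 input units, 4 hidden units, 5 examples
rng = np.random.default_rng(0)
n_x, n_h, m = 2, 4, 5
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

A2, predictions = forward_propagation(X, W1, b1, W2, b2)
print(A2.shape, predictions.shape)
```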

Given the predictions on all the examples, you can also compute the cost $J$ as follows: 
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large\left(\small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right)  \large  \right) \small \tag{6}$$
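The cost in equation (6) can be computed in a couple of lines. This is a minimal sketch; the name `compute_cost` and the toy labels and activations are assumptions for illustration.

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost of equation (6); A2 and Y both have shape (1, m)."""
    m = Y.shape[1]
    log_likelihood = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    return float(-np.sum(log_likelihood) / m)

Y = np.array([[1, 0, 1, 0]])

# A model that outputs 0.5 for every example has cost log(2)
A2_uninformed = np.full((1, 4), 0.5)
cost_uninformed = compute_cost(A2_uninformed, Y)

# Confident, correct outputs give a lower cost
A2_good = np.array([[0.9, 0.1, 0.8, 0.2]])
cost_good = compute_cost(A2_good, Y)

print(cost_uninformed, cost_good)
```

Note that `np.log` returns `-inf` if an activation is exactly 0 or 1, so a real implementation may want to clip `A2` slightly away from those values.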

**Reminder**: The general methodology to build a Neural Network is to:
    1. Define the neural network structure ( # of input units,  # of hidden units, etc). 
    2. Initialize the model's parameters
    3. Loop:
        - Implement forward propagation
        - Compute loss
        - Implement backward propagation to get the gradients
        - Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call `nn_model()`. Once you've built `nn_model()` and learnt the right parameters, you can make predictions on new data.
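The methodology above can be sketched end to end as a single function. This is one possible sketch, not the assignment's reference solution: the backward-propagation formulas are the standard gradients for a tanh hidden layer with a sigmoid output and cross-entropy cost, and the toy "inside the unit circle" dataset, the hyperparameter values, and the `predict` helper are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_model(X, Y, n_h=4, num_iterations=2000, learning_rate=0.5, seed=0):
    """Train a one-hidden-layer network; X is (n_x, m), Y is (1, m) with 0/1 labels."""
    rng = np.random.default_rng(seed)
    # 1. Define the structure
    n_x, m = X.shape
    n_y = Y.shape[0]
    # 2. Initialize parameters: small random weights, zero biases
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((n_y, n_h)) * 0.01
    b2 = np.zeros((n_y, 1))
    costs = []
    # 3. Loop
    for _ in range(num_iterations):
        # Forward propagation
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Cost
        costs.append(float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m))
        # Backward propagation
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # derivative of tanh is 1 - tanh^2
        dW1 = dZ1 @ X.T / m
        db1 = np.sum(dZ1, axis=1, keepdims=True) / m
        # Gradient descent update
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, costs

def predict(parameters, X):
    A1 = np.tanh(parameters["W1"] @ X + parameters["b1"])
    A2 = sigmoid(parameters["W2"] @ A1 + parameters["b2"])
    return (A2 > 0.5).astype(int)

# Toy data (an assumption): label is 1 when the point lies inside the unit circle
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200))
Y = (np.sum(X ** 2, axis=0, keepdims=True) < 1).astype(float)

parameters, costs = nn_model(X, Y)
predictions = predict(parameters, X)
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}")
print(f"training accuracy: {np.mean(predictions == Y):.2f}")
```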