## Cost Function
![image.png](attachment:image.png)
The squared-error cost function that we used for linear regression, combined with the sigmoid hypothesis, results in a non-convex $J(\theta)$, so gradient descent may get stuck in a local minimum instead of converging to the global minimum.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

$ Cost(h_\theta(x), y) = 0 \text{ if } h_\theta(x) = y $

## Simplified Cost Function and Gradient Descent

![image.png](attachment:image.png)
$ Cost(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x)) $
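
As a quick check, substituting the two possible labels into the combined expression recovers the two separate cases of the cost, because one of the two terms always vanishes:

$$ y = 1: \quad Cost(h_\theta(x), 1) = -\log(h_\theta(x)) $$

$$ y = 0: \quad Cost(h_\theta(x), 0) = -\log(1 - h_\theta(x)) $$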

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Vectorized implementation of the algorithm
Notice that the gradient descent update has the same form as the one we used in linear regression; only the hypothesis has changed to $h_\theta(x) = g(\theta^{T}x)$.


$ h = g(X\theta) $

$ J(\theta) = \frac{1}{m}(-y^{T}\log(h) - (1-y)^{T}\log(1-h)) $ 
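
The two vectorized formulas above can be sketched in Octave as follows. This is a minimal illustration, not the course's reference code: the toy data `X` and `y` and the helper name `sigmoid` are assumptions made for the example.

```octave
1;  % script-file marker so the function below can be defined inline

% Sigmoid g(z) = 1 / (1 + e^(-z)), applied element-wise.
function g = sigmoid(z)
    g = 1 ./ (1 + exp(-z));
end

% Hypothetical toy data: intercept column plus one feature, m = 6 examples.
X = [ones(6, 1), (1:6)'];
y = [0; 0; 1; 0; 1; 1];
theta = zeros(2, 1);
m = length(y);

h = sigmoid(X * theta);                                 % h = g(X * theta), m x 1
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % scalar cost J(theta)
printf('J(theta) = %f\n', J);
```

With $\theta = 0$ every prediction is $0.5$, so the cost equals $\log 2 \approx 0.6931$ regardless of the labels, which makes a handy sanity check.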

$ \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta) $

$ \theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} $

Vectorized implementation is:

$ \theta := \theta - \frac{\alpha}{m}X^{T}(g(X\theta) - \overrightarrow{y}) $
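
A minimal Octave sketch of this vectorized update is shown below. The toy data, the learning rate `alpha = 0.1`, and the iteration count `numIters` are assumptions chosen only to make the example runnable.

```octave
1;  % script-file marker so the functions below can be defined inline

function g = sigmoid(z)
    g = 1 ./ (1 + exp(-z));
end

% Vectorized logistic regression cost J(theta).
function J = logisticCost(theta, X, y)
    m = length(y);
    h = sigmoid(X * theta);
    J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
end

% Hypothetical toy data: intercept column plus one feature.
X = [ones(6, 1), (1:6)'];
y = [0; 0; 1; 0; 1; 1];
m = length(y);

alpha = 0.1;       % assumed learning rate
numIters = 1000;   % assumed number of iterations
theta = zeros(2, 1);
initialCost = logisticCost(theta, X, y);

for iter = 1:numIters
    h = sigmoid(X * theta);
    % Simultaneous update of every theta_j in a single matrix expression.
    theta = theta - (alpha / m) * X' * (h - y);
end

finalCost = logisticCost(theta, X, y);
printf('cost before: %f, cost after: %f\n', initialCost, finalCost);
```

Because the whole parameter vector is updated in one expression, all $\theta_j$ are updated simultaneously, as the update rule requires.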

## Advanced Optimization
![image.png](attachment:image.png)

![image.png](attachment:image.png)
The dimension of the initial theta must be greater than or equal to 2.


Note: `'100'` should be `100`. The value must be an integer, not a string.

![image.png](attachment:image.png)

First we need to provide a function that evaluates the following two quantities for a given input value of $\theta$:

$$ J(\theta) $$

$$ \frac{\partial}{\partial\theta_j}J(\theta) $$

```octave
function [jVal, gradient] = costFunction(theta)
    jVal = [...code to compute J(theta)...];
    gradient = [...code to compute derivative of J(theta)...];
end
```

```octave
% optimset function creates an object containing the options we want to send to fminunc().

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```

### Multiclass classification problem

When we have more than two categories, instead of $y \in \{0, 1\}$ we expand our definition so that $y \in \{0, 1, \dots, n\}$.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
We train one classifier per class. To predict for a new input $x$, we run all the classifiers and select the class whose classifier outputs the highest probability.
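
The selection step can be sketched in Octave as below. The parameter matrix `allTheta` (one row per class) and the inputs are made-up values for illustration; in practice each row would come from training a separate binary classifier for that class.

```octave
1;  % script-file marker so the function below can be defined inline

function g = sigmoid(z)
    g = 1 ./ (1 + exp(-z));
end

% Hypothetical parameters: one row per class (classes 0, 1, 2),
% columns are the intercept and two feature weights.
allTheta = [ 1 -2  0;
            -1  2 -1;
            -1  0  2];

% Hypothetical inputs, one example per row, with a leading intercept column.
X = [1 0 0;
     1 2 0;
     1 0 2];

% probs(i, k) is the probability that example i belongs to class k - 1.
probs = sigmoid(X * allTheta');

% Pick, per row, the classifier with the highest probability.
[maxProb, idx] = max(probs, [], 2);
prediction = idx - 1;   % Octave indices start at 1, class labels start at 0
disp(prediction');
```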