# Lecture 5. Logistic (regression) classification
---

## Regression

- Hypothesis:

$$H(X) = WX$$

- Cost : 

$$cost(W) = \frac{1}{m} \sum(WX - y)^2$$

- Gradient descent :
 - $\alpha$ : learning rate
 
$$W := W - \alpha \frac{\partial}{\partial W} cost(W)$$
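
The update rule above can be sketched in plain NumPy. This is a minimal illustration, not the lecture's own code: the data set is made up (generated from $y = 2x + 1$), the first row of `X` is assumed to be a constant 1 acting as the bias, and the helper names `linear_cost` and `linear_gradient` are hypothetical.

```python
import numpy as np

def linear_cost(W, X, y):
    """cost(W) = (1/m) * sum((W X - y)^2)."""
    residual = W @ X - y
    return np.mean(residual ** 2)

def linear_gradient(W, X, y):
    """Derivative of the cost with respect to W: (2/m) * (W X - y) X^T."""
    m = X.shape[1]
    return (2.0 / m) * (W @ X - y) @ X.T

# Made-up data: row 0 is a constant 1 (bias), row 1 is the feature x.
X = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated from y = 2x + 1

W = np.zeros(2)
alpha = 0.05  # learning rate
for _ in range(5000):
    # W := W - alpha * d cost / d W
    W = W - alpha * linear_gradient(W, X, y)

print(W, linear_cost(W, X, y))
```

With this data the weights should settle close to a bias of 1 and a slope of 2, and the cost should be close to 0.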

## Binary Classification
- Spam Email Detection: Spam or Ham
- Facebook feed: show or hide
- Credit Card Fraudulent Transaction detection: legitimate or fraud
- Radiology: Malignant tumor or Benign tumor

## Weakness of linear regression
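- If the data contains a very extreme value of the feature x, i.e. an outlier, the fitted line gets tilted
 - The skewed line can then cause classification errors
- $H(x) = Wx + b$ is limited in how well it can represent $y = 0 ~ or ~ 1$, because its output can be far below 0 or far above 1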

<img src="http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter1/stata9-1.gif" width="400">
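
The outlier effect can be reproduced with a small sketch. The data below is made up for illustration, and classifying by "line above 0.5" is an assumed decision rule; `threshold_of_linear_fit` is a hypothetical helper built on `np.polyfit`.

```python
import numpy as np

def threshold_of_linear_fit(x, y):
    """Fit H(x) = W x + b by least squares; return W, b and the x where H(x) = 0.5."""
    W, b = np.polyfit(x, y, 1)
    return W, b, (0.5 - b) / W

# Made-up 1-D data: small x -> class 0, large x -> class 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
W_clean, b_clean, t_clean = threshold_of_linear_fit(x, y)

# Same data plus one extreme point that is still correctly labeled 1.
x_out = np.append(x, 50.0)
y_out = np.append(y, 1.0)
W_out, b_out, t_out = threshold_of_linear_fit(x_out, y_out)

print("threshold without outlier:", t_clean)
print("threshold with outlier:   ", t_out)
print("H(50) with outlier:       ", W_out * 50.0 + b_out)
```

In this example the decision threshold moves from 3.5 to beyond 4, so the point at x = 4 (labeled 1) ends up on the class-0 side, and the line's value at x = 50 is above 1, outside the range of the labels.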

## Logistic Hypothesis
- Logistic function
- Sigmoid function
 - sigmoid is curved in two directions, like the letter "S", or the Greek $\varsigma$ (sigma)
$$g(z) = \frac{1}{(1 + e^{-z})}$$
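
A minimal NumPy sketch of $g(z)$ follows. The rearrangement for negative inputs is an implementation choice made here to avoid overflow in `exp`; it computes the same function.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)), evaluated so that exp never overflows."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1]
    # For z >= 0: 1 / (1 + e^-z).  For z < 0: e^z / (1 + e^z), the same value.
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))
```

The output is always between 0 and 1, with $g(0) = 0.5$, which is what makes it usable as a hypothesis for 0/1 labels.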

### Hypothesis
$$
\begin{align}
H(X) &= g(z) \\
z &= W^T X \\
H(X) &= \frac{1}{1+e^{-W^T X}}
\end{align}
$$

![](http://www.saedsayad.com/images/LogReg_1.png)

## Cost function
- cost
$$cost(W, b) = \frac{1}{m} \sum (H(x_i) - y_i)^2$$

- $H(x) = Wx + b$
 - $H(x)$ is a straight line
 - Squaring a straight line gives a smooth, bowl-shaped (convex) curve.
![](https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcTQDO-s0PxY8dprDOOp2wjTAKTp_gs1HhESbyC05Z_tuEEK80wF)

- $H(X) = \frac{1}{1+e^{-W^T X}}$
 - $H(X)$ is $S$-shaped, $(0 < H(X) < 1)$
 - Squaring it gives a wavy, non-convex curve.
 - In this case gradient descent may fail to find the global minimum and can easily get stuck in a local minimum.
 
![](https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQSo6e9_wzEYMfavVgKmqqXYzaJE7nVmRS13FuLmOISDgue3b_y)
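
The loss of convexity can be checked numerically. The sketch below is an illustration under simplifying assumptions: a single made-up training example ($x = 1$, $y = 1$) and one weight $w$ without bias. It only tests convexity via discrete second differences; it does not search for local minima.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One made-up example: x = 1, y = 1, hypothesis H = sigmoid(w * x).
w = np.linspace(-10.0, 10.0, 2001)
h = sigmoid(w * 1.0)

squared_cost = (h - 1.0) ** 2      # squared error with the sigmoid hypothesis
log_cost = np.logaddexp(0.0, -w)   # -log(sigmoid(w)), computed stably

def second_difference(f):
    """Discrete curvature f[i-1] - 2 f[i] + f[i+1]; a negative value rules out convexity."""
    return f[:-2] - 2.0 * f[1:-1] + f[2:]

squared_is_convex = bool(np.all(second_difference(squared_cost) >= 0))
log_is_convex = bool(np.all(second_difference(log_cost) >= 0))
print("squared error convex:", squared_is_convex)
print("log cost convex:     ", log_is_convex)
```

The squared-error curve flattens out toward 1 for very negative $w$, so it bends downward there and cannot be convex, while the log-based cost curves upward everywhere on this grid.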

## New cost function for logistic

$$
\begin{align}
cost(W) &= \frac{1}{m} \sum c(H(x), y) \\
c(H(x), y) &= \begin{cases} -log(H(x)) \quad y = 1 \\ -log(1 - H(x)) \quad y = 0 \end{cases}
\end{align}
$$
<br />

- $H(X) = \frac{1}{1+e^{-W^T X}}$ -> $log (H(X))$
 - Because the hypothesis contains an exponential term, we take the $log$ to cancel out the curvature it introduces.


- $z = H(X), \quad c(H, y) = g(z)$ (here $z$ and $g$ are shorthand for this cost only, not the sigmoid input and function)
 - $g(z) = -log(z), \quad when ~ y = 1$
   - As $z$ approaches $1$, $g(z)$ approaches $0$ **( correct prediction: $H(X) = 1$ -> $c(H, y) = 0$ )**
   - As $z$ approaches $0$, $g(z)$ approaches $+\infty$ **( wrong prediction: $H(X) = 0$ -> $c(H, y) = \infty$ )**

 - $g(z) = -log(1 - z), \quad when ~ y = 0$
   - As $z$ approaches $0$, $g(z)$ approaches $0$ **( correct prediction: $H(X) = 0$ -> $c(H, y) = 0$ )**
   - As $z$ approaches $1$, $g(z)$ approaches $+\infty$ **( wrong prediction: $H(X) = 1$ -> $c(H, y) = \infty$ )**

![](http://adit.io/imgs/logistic/log_graph.png)
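
To make the two branches concrete, the sketch below evaluates the per-example cost for a few hypothesis values. The values 0.99, 0.5 and 0.01 are arbitrary, and `example_cost` is a hypothetical helper name.

```python
import numpy as np

def example_cost(h, y):
    """Piecewise cost: -log(h) when y == 1, -log(1 - h) when y == 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

for h in (0.99, 0.5, 0.01):
    print("H =", h, " cost if y=1:", example_cost(h, 1), " cost if y=0:", example_cost(h, 0))
```

A confident correct prediction costs almost nothing, while a confident wrong prediction is penalized heavily, and the penalty grows without bound as $H$ approaches the wrong extreme.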

## Cost function
- Remove the case split (the `if`) and rewrite the cost as a single-line formula
 - $y = 1, \quad c = -log(H(x))$
 - $y = 0, \quad c = -log(1 - H(x))$

$$
\begin{align}
cost(W) &= \frac{1}{m} \sum c(H(x), y) \\
c(H(x), y) &= \begin{cases} -log(H(x)) \quad y = 1 \\ -log(1 - H(x)) \quad y = 0 \end{cases} \\
c(H(x), y) &= -y \cdot log(H(x)) - (1-y) \cdot log(1-H(x))
\end{align}
$$
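
A quick numerical check that the single-line formula agrees with the case split is sketched below. The sample values are made up, and the `eps` clipping is an added safeguard (an assumption, not part of the lecture formula) so that `log(0)` is never evaluated.

```python
import numpy as np

def cross_entropy(h, y, eps=1e-12):
    """Mean of -y*log(h) - (1-y)*log(1-h); eps clipping is an added safeguard."""
    h = np.clip(h, eps, 1.0 - eps)
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))

def piecewise_cost(h, y):
    """Mean of the case-split cost: -log(h) if y == 1, else -log(1 - h)."""
    return np.mean(np.where(y == 1, -np.log(h), -np.log(1.0 - h)))

# Made-up hypothesis outputs and labels.
h = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1.0, 0.0, 1.0, 0.0])

print(cross_entropy(h, y), piecewise_cost(h, y))
```

Because $y$ is either 0 or 1, exactly one of the two terms survives for each example, which is why both versions give the same number.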

## Minimize cost - Gradient descent algorithm

- Gradient descent
 - $\alpha$: learning rate

$$
\begin{align}
cost(W) &= -\frac{1}{m} \sum \left[ y \cdot log(H(x)) + (1-y) \cdot log(1-H(x)) \right] \\
W &:= W - \alpha \frac{\partial}{\partial W} cost(W)
\end{align}
$$
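
The derivative in the update rule can be worked out by hand. The steps below are the standard derivation (not spelled out in the lecture notes), using $H(x) = g(W^T x)$ and the sigmoid identity $g'(z) = g(z)(1 - g(z))$:

$$
\begin{align}
\frac{\partial}{\partial W} c(H(x), y) &= -y \frac{1}{H(x)} H(x)(1-H(x)) x + (1-y) \frac{1}{1-H(x)} H(x)(1-H(x)) x \\
&= \left( -y (1-H(x)) + (1-y) H(x) \right) x \\
&= (H(x) - y) x \\
\frac{\partial}{\partial W} cost(W) &= \frac{1}{m} \sum (H(x) - y) x
\end{align}
$$

The result has the same form as the gradient for linear regression, except that $H(x)$ is now the sigmoid output.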

- code

```
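import tensorflow as tf  # TensorFlow 1.x API

# cost function: mean cross-entropy over the m examples
# (Y is the label placeholder and hypothesis the sigmoid output, defined elsewhere)
cost = -tf.reduce_mean(Y*tf.log(hypothesis) + (1-Y)*tf.log(1-hypothesis))

# Minimize
a = tf.Variable(0.1) # learning rate
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)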
```
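
For readers without TensorFlow, the same minimization can be sketched in NumPy. This is an illustrative stand-in, not the lecture's code: it uses the closed-form gradient $\frac{1}{m} \sum (H(x) - y) x$ of the cross-entropy cost instead of automatic differentiation, the data is made up, the first row of `X` is assumed to be a constant 1 for the bias, and `eps` clipping is an added safeguard.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, X, y, eps=1e-12):
    """Mean cross-entropy; eps keeps log() away from 0."""
    h = np.clip(sigmoid(W @ X), eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def gradient(W, X, y):
    """(1/m) * (H(X) - y) X^T, the derivative of the cost with respect to W."""
    m = X.shape[1]
    return (sigmoid(W @ X) - y) @ X.T / m

# Made-up data that is not perfectly separable; row 0 is the bias input.
X = np.array([[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

W = np.zeros(2)
alpha = 0.1  # learning rate
history = []
for step in range(2001):
    if step % 500 == 0:
        history.append(cost(W, X, y))
        print(step, history[-1], W)
    W = W - alpha * gradient(W, X, y)
```

The printed cost starts at $log(2)$ (all predictions are 0.5 when $W = 0$) and should go down from there as the weights are updated.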

# Lab 5. Logistic (regression) classification
---
- $H(X) = \frac{1}{1+e^{-W^T X}}$
- $cost(W) = -\frac{1}{m} \sum \left[ y \cdot log(H(x)) + (1-y) \cdot log(1-H(x)) \right]$

```python
import tensorflow as tf  # TensorFlow 1.x API
import numpy as np

# load data
xy = np.loadtxt('./data/train_logistic.txt', unpack=True, dtype='float32')
x_data = xy[0:-1]
y_data = xy[-1]

X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

W = tf.Variable(tf.random_uniform([1, len(x_data)], -1.0, 1.0))

# Hypothesis
h = tf.matmul(W, X)
hypothesis = tf.div(1., 1. + tf.exp(-h))

# Cost func
cost = -tf.reduce_mean(y*tf.log(hypothesis) + (1-y)*tf.log(1-hypothesis))

# Minimize
a = tf.Variable(0.1) # learning rate
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)

# init
init = tf.global_variables_initializer()

# Launch
sess = tf.Session()
sess.run(init)

# fitting
for step in range(3001):
    sess.run(train, feed_dict={X:x_data, y:y_data})
    if step % 100 == 0:
        print(step, sess.run(cost, feed_dict={X:x_data, y:y_data}), sess.run(W))
```

```
0 1.06748 [[ 0.48464769  0.91838366 -0.7550177 ]]
100 0.521366 [[-0.79896683 -0.07749598  0.42639631]]
200 0.42402 [[-1.75914693 -0.05038413  0.63074201]]
300 0.367072 [[-2.50094962  0.00815814  0.74847698]]
400 0.329799 [[-3.10139132  0.05333814  0.84540093]]
500 0.30341 [[-3.6065352   0.08720482  0.93049532]]
600 0.283586 [[-4.04421854  0.11325153  1.00694621]]
700 0.268007 [[-4.43207598  0.13380995  1.0766983 ]]
800 0.255328 [[-4.78182936  0.15038267  1.14111257]]
900 0.244721 [[-5.10160685  0.16396967  1.20118403]]
1000 0.235649 [[-5.39723778  0.17526054  1.25765765]]
1100 0.227747 [[-5.67304182  0.18474858  1.31110489]]
1200 0.220761 [[-5.93228388  0.19279398  1.36197162]]
1300 0.214507 [[-6.17750168  0.19966963  1.41061103]]
1400 0.208848 [[-6.4106884   0.20558295  1.45730782]]
1500 0.203681 [[-6.63344526  0.21069677  1.50229418]]
1600 0.198927 [[-6.84707594  0.21514028  1.54576254]]
1700 0.194523 [[-7.05264521  0.2190166   1.58787191]]
1800 0.190418 [[-7.25104141  0.22240938  1.
```
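
The shapes produced by `np.loadtxt(..., unpack=True)` are easy to get wrong, so here is a small sketch. The contents of `train_logistic.txt` are not shown in these notes; the in-memory text below is a hypothetical stand-in that assumes one sample per line with columns bias, x1, x2, label (three inputs, matching the three weights printed during training).

```python
import io
import numpy as np

# Hypothetical file contents: one sample per line, columns = bias, x1, x2, label.
text = """\
1 2 1 0
1 3 2 0
1 3 4 0
1 5 5 1
1 7 5 1
"""

# unpack=True transposes the table, so each file column becomes one row of xy.
xy = np.loadtxt(io.StringIO(text), unpack=True, dtype='float32')
x_data = xy[0:-1]  # all rows except the last: shape (features, samples)
y_data = xy[-1]    # last row: the labels, shape (samples,)

print(x_data.shape, y_data.shape)
```

With this layout `len(x_data)` is the number of input rows, which is why `W` is created with shape `[1, len(x_data)]` and multiplied as `tf.matmul(W, X)`.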

## Ask the model (prediction)

```python
# reuses W, X, y, x_data and y_data from the previous cell

# hypothesis
h = tf.matmul(W, X)
hypothesis = tf.div(1., 1. + tf.exp(-h))

# cost func
cost = -tf.reduce_mean(y*tf.log(hypothesis) + (1-y)*tf.log(1-hypothesis))

# minimize
a = tf.Variable(0.1) # learning rate
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)

# init
init = tf.global_variables_initializer()

# launch
sess = tf.Session()
sess.run(init)

# fitting
for step in range(3001):
    sess.run(train, feed_dict={X:x_data, y:y_data})
    if step % 100 == 0:
        print(step, sess.run(cost, feed_dict={X:x_data, y:y_data}), sess.run(W))

print("-"*50)

# predict: each column of X is one sample, True means H(X) > 0.5
print(sess.run(hypothesis, feed_dict={X:[[1], [2], [2]]})>0.5)
print(sess.run(hypothesis, feed_dict={X:[[1], [5], [5]]})>0.5)
print(sess.run(hypothesis, feed_dict={X:[[1, 1], [4, 3], [3, 5]]})>0.5)
```

```
0 1.88748 [[-0.3623485  -0.24675807 -0.41658551]]
100 0.462074 [[-1.34712648 -0.06644846  0.54792029]]
200 0.390255 [[-2.17783165 -0.01678387  0.69671303]]
300 0.345374 [[-2.83659554  0.03409148  0.80205745]]
400 0.314653 [[-3.381675    0.07263944  0.89218622]]
500 0.292156 [[-3.84801102  0.10195407  0.97236824]]
600 0.274816 [[-4.25726557  0.12483323  1.04503465]]
700 0.260917 [[-4.62353039  0.14310825  1.11178637]]
800 0.249429 [[-4.956388    0.15798049  1.17376947]]
900 0.239698 [[-5.2626195   0.17026654  1.23183393]]
1000 0.23129 [[-5.54717159  0.18054026  1.28662479]]
1100 0.223905 [[-5.81375217  0.18921727  1.33864117]]
1200 0.217331 [[-6.06520605  0.19660683  1.38827586]]
1300 0.21141 [[-6.30376148  0.20294473  1.43584323]]
1400 0.206026 [[-6.53118563  0.20841216  1.48159826]]
1500 0.201088 [[-6.74890995  0.21315266  1.52575004]]
1600 0.196528 [[-6.9581027   0.21728082  1.5684725 ]]
1700 0.19229 [[-7.15973186  0.2208894   1.60991061]]
1800 0.18833 [[-7.35460043  0.2240523   1.65
```
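
The prediction step can be reproduced without a TensorFlow session. The sketch below is an illustration only: it plugs in the weights printed at step 1700 of the log above (an intermediate snapshot, not the final trained values) and applies the same $H(X) > 0.5$ rule; `predict` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights copied from the step-1700 line of the training log
# (an intermediate snapshot, not the final trained values).
W = np.array([[-7.15973186, 0.2208894, 1.60991061]])

def predict(W, X):
    """Return True where H(X) = sigmoid(W X) exceeds 0.5; each column of X is one sample."""
    return sigmoid(W @ np.asarray(X, dtype=float)) > 0.5

print(predict(W, [[1], [2], [2]]))
print(predict(W, [[1], [5], [5]]))
print(predict(W, [[1, 1], [4, 3], [3, 5]]))
```

With this snapshot of the weights, the sample (1, 2, 2) falls below the 0.5 threshold and (1, 5, 5) above it; in the two-sample batch, (1, 4, 3) is below and (1, 3, 5) is above. Since $g(z) > 0.5$ exactly when $z > 0$, the rule is equivalent to checking the sign of $WX$.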