# ML lec 01


## What is ML?
- Limitations of explicit programming
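- Arthur Samuel (1959): machine learning as the field of study that gives computers the ability to learn without being explicitly programmed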

## What is learning?
### supervised
- Learning from data whose labels are already given (the training set)
- The most common type of ML problem
ex) Image labeling, Email spam filter, Predicting exam score
- A training data set is required
- Divided into regression (e.g. a score from 0 to 100), binary classification (two classes), and multi-label classification

### unsupervised
- Unlabeled data; the model looks at the data and learns its structure by itself
ex) google news grouping, Word clustering

# ML lec 02 - Simple Linear Regression
## Regression
- Regression toward the mean (Francis Galton)
- The statistical principle that values tend to return (regress) toward the overall mean

## Linear Regression
- Finding the equation of the straight line that best represents the data

## Hypothesis
- H(x) = Wx+b

## Which hypothesis is better?
- Which hypothesis best represents the data?

## Cost, Cost function
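- H(x)-y: the difference between the hypothesis and the actual data
- The aim is to make the cost as small as possible
- The mean of the squared errors is commonly used as the cost function; squaring keeps positive and negative errors from cancelling out
$$Cost(W,b) = \frac{1}{m}\sum_{i=1}^m (H(x_i)-y_i)^2$$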
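
As a small illustration of the cost function above, the following pure-Python sketch compares two candidate hypotheses on the same toy data. The function name `mse_cost` and the two candidate (W, b) pairs are chosen here for illustration only.

```python
# Toy data where y = x exactly, so W = 1, b = 0 fits perfectly.
x_data = [1, 2, 3, 4, 5]
y_data = [1, 2, 3, 4, 5]

def mse_cost(W, b, xs, ys):
    """Mean of squared errors between H(x) = W * x + b and y."""
    total = 0.0
    for x, y in zip(xs, ys):
        error = (W * x + b) - y   # H(x) - y
        total += error ** 2
    return total / len(xs)

cost_a = mse_cost(1.0, 0.0, x_data, y_data)   # perfect fit
cost_b = mse_cost(2.9, 0.5, x_data, y_data)   # a poor initial guess

print(cost_a, cost_b)
```

The hypothesis with the lower cost represents the data better; here the first candidate has zero cost because it reproduces every point.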

## Goal: Minimize cost
$$\underset{W,b}{\text{minimize}}\ Cost(W,b)$$

## Build hypothesis and cost
$$H(x) = Wx + b $$
```
import tensorflow as tf

x_data = [1,2,3,4,5]
y_data = [1,2,3,4,5]

# Trainable parameters with arbitrary initial values
W = tf.Variable(2.9)
b = tf.Variable(0.5)

# hypothesis = W * x + b
hypothesis = W * x_data + b
```
$$Cost(W,b) = \frac{1}{m}\sum_{i=1}^m (H(x_i)-y_i)^2$$
```
cost = tf.reduce_mean(tf.square(hypothesis - y_data))
```

## Gradient descent
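```
learning_rate = 0.01

# Gradient descent: record the forward pass so the tape can differentiate it
with tf.GradientTape() as tape:
    hypothesis = W * x_data + b
    cost = tf.reduce_mean(tf.square(hypothesis - y_data))

# Gradients of cost with respect to the variables W and b
W_grad, b_grad = tape.gradient(cost, [W, b])

# Update in place: W -= learning_rate * W_grad, b -= learning_rate * b_grad
W.assign_sub(learning_rate * W_grad)
b.assign_sub(learning_rate * b_grad)
```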
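
To make explicit what `tape.gradient` computes in this step, the sketch below performs a single update in plain Python with hand-derived gradients. It assumes the mean-squared-error cost above, for which the derivative with respect to W is (2/m) Σ (W x_i + b − y_i) x_i and with respect to b is (2/m) Σ (W x_i + b − y_i). The helper name `gradients` is made up for this illustration.

```python
x_data = [1, 2, 3, 4, 5]
y_data = [1, 2, 3, 4, 5]

def gradients(W, b, xs, ys):
    """Analytic gradients of mean((W*x + b - y)^2) with respect to W and b."""
    m = len(xs)
    errors = [(W * x + b) - y for x, y in zip(xs, ys)]
    W_grad = 2.0 / m * sum(e * x for e, x in zip(errors, xs))
    b_grad = 2.0 / m * sum(errors)
    return W_grad, b_grad

W, b = 2.9, 0.5          # same initial values as the TensorFlow variables
learning_rate = 0.01

W_grad, b_grad = gradients(W, b, x_data, y_data)
W -= learning_rate * W_grad   # plays the role of W.assign_sub(...)
b -= learning_rate * b_grad   # plays the role of b.assign_sub(...)

print(round(W, 4), round(b, 4))  # 2.452 0.376
```

These are the same values that the full training loop reports for W and b at step 0.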

```
import tensorflow as tf
import numpy as np

# Data
x_data = [1,2,3,4,5]
y_data = [1,2,3,4,5]

W = tf.Variable(2.9)
b = tf.Variable(0.5)

learning_rate = 0.01

# W, b update
for i in range(100+1):
    # Gradient descent
    with tf.GradientTape() as tape:
        hypothesis = W * x_data + b
        cost = tf.reduce_mean(tf.square(hypothesis-y_data))
    W_grad, b_grad = tape.gradient(cost, [W,b])
    W.assign_sub(learning_rate * W_grad)
    b.assign_sub(learning_rate * b_grad)
    if i % 10 == 0:
        print("{:5}|{:10.4f}|{:10.4f}|{:10.6f}".format(i, W.numpy(), b.numpy(), cost))

print()

# predict
print( W * 5 + b )
print( W * 2.5 + b)
```

```
    0|    2.4520|    0.3760| 45.660004
   10|    1.1036|    0.0034|  0.206336
   20|    1.0128|   -0.0209|  0.001026
   30|    1.0065|   -0.0218|  0.000093
   40|    1.0059|   -0.0212|  0.000083
   50|    1.0057|   -0.0205|  0.000077
   60|    1.0055|   -0.0198|  0.000072
   70|    1.0053|   -0.0192|  0.000067
   80|    1.0051|   -0.0185|  0.000063
   90|    1.0050|   -0.0179|  0.000059
  100|    1.0048|   -0.0173|  0.000055

tf.Tensor(5.00667, shape=(), dtype=float32)
tf.Tensor(2.4946702, shape=(), dtype=float32)
```


# ML lec 03 - How to minimize cost
## Gradient descent algorithm
- An algorithm that finds the lowest point by moving down the slope
- Can be used to find the W and b that minimize the cost

### How does it work?
- Start from 0 or a random value
- Update W and b in the direction that reduces the cost
- Compute the gradient and repeat until the minimum is judged to have been reached
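
The steps above can be sketched as a short loop in plain Python. This is only an illustration under stated assumptions: a simplified hypothesis H(x) = W * x with no bias, the halved mean squared error as the cost, and values for `alpha`, `tolerance`, and `max_steps` that are picked here rather than taken from the lecture.

```python
X = [1.0, 2.0, 3.0]
Y = [1.0, 2.0, 3.0]

def gradient(W, xs, ys):
    """Derivative of mean((W*x - y)^2) / 2 with respect to W."""
    return sum((W * x - y) * x for x, y in zip(xs, ys)) / len(xs)

W = 0.0             # start from 0 (a random value would also work)
alpha = 0.1         # learning rate, an assumed value
tolerance = 1e-6    # assumed threshold for "close enough to the minimum"
max_steps = 10_000  # safety cap so the loop always terminates

steps = 0
for steps in range(1, max_steps + 1):
    grad = gradient(W, X, Y)
    if abs(grad) < tolerance:   # slope is nearly flat: stop
        break
    W -= alpha * grad           # move against the slope

print(steps, W)
```

Stopping on a small gradient is one common way to decide that the minimum has been reached; a fixed number of iterations, as in the TensorFlow loops in these notes, is another.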

## Formal definition
- Simplified hypothesis with the bias omitted: H(x) = Wx; α is the learning rate
$$Cost(W) = \frac{1}{2m}\sum_{i=1}^m (H(x_i)-y_i)^2$$
$$W := W - \alpha\frac{1}{m}\sum_{i=1}^m (Wx_i-y_i)x_i$$
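
The update rule follows from differentiating the cost. A short derivation, assuming the simplified hypothesis H(x) = Wx and with α denoting the learning rate:

$$\frac{\partial}{\partial W}Cost(W) = \frac{\partial}{\partial W}\frac{1}{2m}\sum_{i=1}^m (Wx_i-y_i)^2 = \frac{1}{2m}\sum_{i=1}^m 2(Wx_i-y_i)x_i = \frac{1}{m}\sum_{i=1}^m (Wx_i-y_i)x_i$$
$$W := W - \alpha\frac{\partial}{\partial W}Cost(W)$$

The factor 1/2 in the cost exists only to cancel the 2 produced by differentiating the square; it does not change where the minimum lies.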

## Convex function
- A function whose local minimum coincides with its global minimum
- For such a cost function, the gradient descent algorithm can be used
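
As a rough numerical illustration (not a proof), the sketch below checks the midpoint form of convexity, cost((a+b)/2) ≤ (cost(a)+cost(b))/2, for the mean squared error of H(x) = W * x on a grid of W values. The data, the grid, and the names `cost` and `is_convex_on_grid` are assumptions made for this example.

```python
X = [1, 2, 3]
Y = [1, 2, 3]

def cost(W):
    """Mean squared error of the simplified hypothesis H(x) = W * x."""
    return sum((W * x - y) ** 2 for x, y in zip(X, Y)) / len(X)

# Sample W from -3 to 5 in steps of 0.5 and test every pair of grid points.
grid = [-3 + 0.5 * k for k in range(17)]
is_convex_on_grid = all(
    cost((a + b) / 2) <= (cost(a) + cost(b)) / 2 + 1e-12  # small float slack
    for a in grid
    for b in grid
)

# Among the sampled points, the one with the smallest cost.
best_W = min(grid, key=cost)

print(is_convex_on_grid, best_W)
```

Because the cost is bowl-shaped, gradient descent started anywhere on this range heads toward the same single minimum.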

## Cost function in pure Python 

```
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([1, 2, 3])

def cost_func(W, X, Y):
    c = 0
    for i in range(len(X)):
        c += (W * X[i] - Y[i]) ** 2
    return c / len(X)

for feed_W in np.linspace(-3, 5, num=15):
    curr_cost = cost_func(feed_W, X, Y)
    print("{:6.3f} | {:10.5f}".format(feed_W, curr_cost))
```

```
-3.000 |   74.66667
-2.429 |   54.85714
-1.857 |   38.09524
-1.286 |   24.38095
-0.714 |   13.71429
-0.143 |    6.09524
 0.429 |    1.52381
 1.000 |    0.00000
 1.571 |    1.52381
 2.143 |    6.09524
 2.714 |   13.71429
 3.286 |   24.38095
 3.857 |   38.09524
 4.429 |   54.85714
 5.000 |   74.66667
```
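
The symmetry of these values around W = 1 can be seen by writing the cost in closed form. For this particular data set (X = Y = [1, 2, 3], so y_i = x_i and m = 3):

$$Cost(W) = \frac{1}{3}\sum_{i=1}^{3}(Wx_i - x_i)^2 = (W-1)^2\cdot\frac{1^2+2^2+3^2}{3} = \frac{14}{3}(W-1)^2$$

For example, W = -3 and W = 5 both give (14/3) · 16 ≈ 74.66667, and W = 1 gives 0, which agrees with the first, last, and middle rows of the printed values.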


## Cost function in TensorFlow

```
import numpy as np
import tensorflow as tf

X = np.array([1, 2, 3])
Y = np.array([1, 2, 3])

def cost_func(W, X, Y):
    hypothesis = X * W
    return tf.reduce_mean(tf.square(hypothesis - Y))

W_values = np.linspace(-3, 5, num=15)
cost_values = []

for feed_W in W_values:
    curr_cost = cost_func(feed_W, X, Y)
    cost_values.append(curr_cost)
    print("{:6.3f} | {:10.5f}".format(feed_W, curr_cost))
```

```
-3.000 |   74.66667
-2.429 |   54.85714
-1.857 |   38.09524
-1.286 |   24.38095
-0.714 |   13.71429
-0.143 |    6.09524
 0.429 |    1.52381
 1.000 |    0.00000
 1.571 |    1.52381
 2.143 |    6.09524
 2.714 |   13.71429
 3.286 |   24.38095
 3.857 |   38.09524
 4.429 |   54.85714
 5.000 |   74.66667
```


## Gradient descent

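```
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)

# The loop reads X and Y, so they are defined here explicitly.
# The printed log (W converging to 1) corresponds to this data.
X = np.array([1., 2., 3.], dtype=np.float32)
Y = np.array([1., 2., 3.], dtype=np.float32)

# Random initial W drawn from a normal distribution (mean -100, stddev 100)
W = tf.Variable(tf.random.normal((1,), -100., 100.))
alpha = 0.01  # learning rate

for step in range(300):
    hypothesis = W * X
    cost = tf.reduce_mean(tf.square(hypothesis - Y))

    # Manual gradient of the cost with respect to W, then one update step
    gradient = tf.reduce_mean(tf.multiply(tf.multiply(W, X) - Y, X))
    descent = W - tf.multiply(alpha, gradient)
    W.assign(descent)

    if step % 10 == 0:
        print('{:5} | {:10.4f} | {:10.6f}'.format(
            step, cost.numpy(), W.numpy()[0]))
```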
        

```
    0 | 11716.3086 |  48.767971
   10 |  4504.9126 |  30.619968
   20 |  1732.1364 |  19.366755
   30 |   666.0052 |  12.388859
   40 |   256.0785 |   8.062004
   50 |    98.4620 |   5.379007
   60 |    37.8586 |   3.715335
   70 |    14.5566 |   2.683725
   80 |     5.5970 |   2.044044
   90 |     2.1520 |   1.647391
  100 |     0.8275 |   1.401434
  110 |     0.3182 |   1.248922
  120 |     0.1223 |   1.154351
  130 |     0.0470 |   1.095710
  140 |     0.0181 |   1.059348
  150 |     0.0070 |   1.036801
  160 |     0.0027 |   1.022819
  170 |     0.0010 |   1.014150
  180 |     0.0004 |   1.008774
  190 |     0.0002 |   1.005441
  200 |     0.0001 |   1.003374
  210 |     0.0000 |   1.002092
  220 |     0.0000 |   1.001297
  230 |     0.0000 |   1.000804
  240 |     0.0000 |   1.000499
  250 |     0.0000 |   1.000309
  260 |     0.0000 |   1.000192
  270 |     0.0000 |   1.000119
  280 |     0.0000 |   1.000074
  290 |     0.0000 |   1.000046
```
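
The steady shrinking of W toward 1 in this log can be reproduced by hand. The sketch below assumes the data behind the log is X = Y = [1, 2, 3] (which is what convergence to W = 1 implies); in that case the gradient equals (W − 1) · mean(x²), so every update multiplies the distance W − 1 by the same constant factor. The variable names are chosen for this illustration.

```python
X = [1.0, 2.0, 3.0]
alpha = 0.01

# For Y = X the gradient is (W - 1) * mean(x^2),
# so each update scales (W - 1) by the factor below.
mean_x_sq = sum(x * x for x in X) / len(X)
factor = 1 - alpha * mean_x_sq          # about 0.9533

W = 48.767971                # W printed in the row for step 0
for _ in range(10):          # ten more updates lead to the row for step 10
    W = 1 + factor * (W - 1)

print(round(W, 3))
```

The result agrees with the value 30.619968 logged at step 10 up to rounding, and the same factor explains why a larger learning rate would approach 1 in fewer steps.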
