# 3. Gradient Descent
***

## 3.1 Gradient Descent Algorithm

### 3.1.1 What is 'learning'?
- Learning: finding the weight ($w$) that minimizes the loss ($loss$)

$$ \operatorname{argmin}_w loss(w) $$

- GD (Gradient Descent) is one method for finding that weight ($w$)
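
To make the argmin definition concrete, here is a minimal sketch that evaluates the loss on a grid of candidate weights and keeps the smallest. It assumes the linear model $\hat{y} = x \cdot w$ and the three data points used in Section 3.2; the names `total_loss` and `candidates` and the grid itself are illustrative choices, not part of the original notes.

```python
# Toy data (same values as the practice section): y = 2 * x
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def total_loss(w):
    """Sum of squared errors of the assumed model y_pred = x * w."""
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data))

# Candidate weights 0.0, 0.1, ..., 4.0 (grid chosen only for illustration)
candidates = [i / 10 for i in range(41)]

# argmin_w loss(w), restricted to the grid
best_w = min(candidates, key=total_loss)
print("best w on the grid:", best_w, "loss:", total_loss(best_w))
```

Enumerating candidates like this is only practical for one or two weights. Gradient descent searches for the same minimum by following the slope of the loss instead.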

### 3.1.2 Graph of GD

 ![3_GD](slides/이미지/3_GD.jpg)

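- Basic idea: repeatedly update the weight $w$ to move toward the global minimum of the loss
- Goal: find the global loss minimum
- Gradient: the derivative (slope) of the loss at the current weight
- How to get the gradient: differentiate the loss with respect to the weight, $\partial loss / \partial w$
- Applying GD
    1. Choose an arbitrary initial weight.
    2. Compute the gradient; its sign determines the direction of the update.
    3. Multiply the gradient by the learning rate ($\alpha$) and subtract the result from the weight: $w \leftarrow w - \alpha \, \partial loss / \partial w$.
    4. Recompute the prediction and the loss with the updated weight.
    5. Repeat the steps above.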
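
The five steps above can be sketched as a short loop. This is a minimal illustration, not the implementation used in the practice section: the function name `gradient_descent`, the example loss $(w - 3)^2$, and the parameter values are assumptions chosen for clarity.

```python
def gradient_descent(grad_fn, w_init, lr=0.1, steps=100):
    """Generic 1-D gradient descent (illustrative sketch).

    grad_fn: function returning d(loss)/dw at a given w
    w_init:  initial weight (step 1)
    lr:      learning rate alpha
    steps:   number of update iterations
    """
    w = w_init
    for _ in range(steps):
        grad = grad_fn(w)    # step 2: gradient at the current weight
        w = w - lr * grad    # step 3: move against the gradient
        # steps 4-5: the next iteration reuses the updated w
    return w

# Example loss (assumed): loss(w) = (w - 3)^2, so d(loss)/dw = 2 * (w - 3)
w_final = gradient_descent(lambda w: 2 * (w - 3), w_init=0.0)
print("w after training:", w_final)
```

With this learning rate, the distance to the minimum shrinks by a constant factor on every step, so `w_final` ends up very close to 3.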

## 3.2 Code Practice: Implementing GD

### 1. Data & Model Design

In [2]:
# Training data
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# Initial weights
w = 1.0 # arbitrary initial guess
w_2 = 1.0 # arbitrary initial guess (used in the exercise)

b = 0 # bias

# 1. Forward pass (model definition)
def forward(x):
    return x * w

# 2. Loss function: squared error
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

# 3. Gradient of the loss with respect to w
def gradient(x, y): # d_loss/d_w
    return 2 * x * (x * w - y)
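
The expression in `gradient` follows from the chain rule: for $loss = (xw - y)^2$, $\partial loss / \partial w = 2x(xw - y)$. As a sanity check, the sketch below compares that formula with a central finite difference. The helper names (`loss_at`, `analytic_grad`, `numeric_grad`) and the step size `eps` are assumptions for illustration, and the helpers take `w` as an explicit argument instead of reading the global variable.

```python
def loss_at(w, x, y):
    """Squared error of the linear model y_pred = x * w."""
    return (x * w - y) ** 2

def analytic_grad(w, x, y):
    """d(loss)/dw from the chain rule: 2 * x * (x * w - y)."""
    return 2 * x * (x * w - y)

def numeric_grad(w, x, y, eps=1e-6):
    """Central finite-difference approximation of d(loss)/dw."""
    return (loss_at(w + eps, x, y) - loss_at(w - eps, x, y)) / (2 * eps)

w0 = 1.0  # same initial weight as above
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    print(x, y, analytic_grad(w0, x, y), numeric_grad(w0, x, y))
```

Both columns should agree up to rounding. Every sample is evaluated at the same `w0` here, whereas the training loop changes `w` after each sample, so its logged gradients differ from these values after the first sample.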

### 2. Training: Updating the Weight

In [3]:
# Before training
print("predict (before training)", 4, forward(4))

# Training loop
for epoch in range(100): # epoch: number of passes over the training data
    for x_val, y_val in zip(x_data, y_data): # iterate over the training samples
        grad = gradient(x_val, y_val) # compute the gradient
        w = w - 0.01 * grad # update the weight (learning rate 0.01)
        print("\tgrad: ", x_val, y_val, grad)
        l = loss(x_val, y_val) # loss after the update

    print("progress:", epoch, "w=", w, "loss=", l) # l is the loss of the last sample only

# After training
print("predict(after training)", "4 hours", forward(4))

predict (before training) 4 4.0
	grad:  1.0 2.0 -2.0
	grad:  2.0 4.0 -7.84
	grad:  3.0 6.0 -16.2288
progress: 0 w= 1.260688 loss= 4.919240100095999
	grad:  1.0 2.0 -1.478624
	grad:  2.0 4.0 -5.796206079999999
	grad:  3.0 6.0 -11.998146585599997
progress: 1 w= 1.453417766656 loss= 2.688769240265834
	grad:  1.0 2.0 -1.093164466688
	grad:  2.0 4.0 -4.285204709416961
	grad:  3.0 6.0 -8.87037374849311
progress: 2 w= 1.5959051959019805 loss= 1.4696334962911515
	grad:  1.0 2.0 -0.8081896081960389
	grad:  2.0 4.0 -3.1681032641284723
	grad:  3.0 6.0 -6.557973756745939
progress: 3 w= 1.701247862192685 loss= 0.8032755585999681
	grad:  1.0 2.0 -0.59750427561463
	grad:  2.0 4.0 -2.3422167604093502
	grad:  3.0 6.0 -4.848388694047353
progress: 4 w= 1.7791289594933983 loss= 0.43905614881022015
	grad:  1.0 2.0 -0.44174208101320334
	grad:  2.0 4.0 -1.7316289575717576
	grad:  3.0 6.0 -3.584471942173538
progress: 5 w= 1.836707389300983 loss= 0.2399802903801062
	grad:  1.0 2.0 -0.3265852213980338
	grad:  2

	grad:  2.0 4.0 -1.8593837580738182e-10
	grad:  3.0 6.0 -3.8489211817704927e-10
progress: 81 w= 1.999999999982466 loss= 2.7669155644059242e-21
	grad:  1.0 2.0 -3.5067948545020045e-11
	grad:  2.0 4.0 -1.3746692673066718e-10
	grad:  3.0 6.0 -2.845563784603655e-10
progress: 82 w= 1.9999999999870368 loss= 1.5124150106147723e-21
	grad:  1.0 2.0 -2.5926372160256506e-11
	grad:  2.0 4.0 -1.0163070385260653e-10
	grad:  3.0 6.0 -2.1037571684701106e-10
progress: 83 w= 1.999999999990416 loss= 8.26683933105326e-22
	grad:  1.0 2.0 -1.9167778475548403e-11
	grad:  2.0 4.0 -7.51381179497912e-11
	grad:  3.0 6.0 -1.5553425214420713e-10
progress: 84 w= 1.9999999999929146 loss= 4.518126871054872e-22
	grad:  1.0 2.0 -1.4170886686315498e-11
	grad:  2.0 4.0 -5.555023108172463e-11
	grad:  3.0 6.0 -1.1499068364173581e-10
progress: 85 w= 1.9999999999947617 loss= 2.469467919185614e-22
	grad:  1.0 2.0 -1.0476508549572827e-11
	grad:  2.0 4.0 -4.106759377009439e-11
	grad:  3.0 6.0 -8.500933290633839e-11
progress: 86
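
The loop above updates `w` after every sample, which is usually called stochastic (per-sample) gradient descent, and the `loss` it prints is the loss of the last sample in each epoch. For comparison, here is a hedged sketch of a full-batch variant that averages the gradient over all samples before each update. The function `train_batch` and its parameters are hypothetical names, not part of the original notebook.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def train_batch(w, lr=0.01, epochs=100):
    """Full-batch gradient descent for the model y_pred = x * w."""
    n = len(x_data)
    history = []  # mean loss measured before each update
    for _ in range(epochs):
        # Average loss and gradient over the whole dataset
        mean_loss = sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / n
        mean_grad = sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / n
        history.append(mean_loss)
        w = w - lr * mean_grad  # one update per epoch
    return w, history

w_trained, losses = train_batch(w=1.0)
print("w =", w_trained, "first loss =", losses[0], "last loss =", losses[-1])
```

In this example the batch variant makes one update per epoch instead of three, so it approaches $w = 2$ more slowly than the per-sample loop at the same learning rate.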

### 3.2.2 Exercise

In [4]:
# Exercise 3.2.2: Compute and implement gradient

# Gradients for the quadratic model y_pred = x*x*w_2 + x*w + b
def gradient_1(x, y): # d_loss/d_w
    return 2*x*(x*x*w_2+x*w+b-y)

def gradient_2(x, y): # d_loss/d_w_2
    return 2*x*x*(x*x*w_2+x*w+b-y)
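
`gradient_1` and `gradient_2` are the partial derivatives of the squared error of a quadratic model $\hat{y} = x^2 w_2 + x w + b$: $\partial loss / \partial w = 2x(\hat{y} - y)$ and $\partial loss / \partial w_2 = 2x^2(\hat{y} - y)$. The sketch below pairs those gradients with a matching forward pass and loss, so that predictions, loss, and updates all refer to the same model. The names `forward_quad`, `loss_quad`, `total_loss`, and `train_quad` are hypothetical and do not appear in the original notebook; `b` is held at 0, as above.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def forward_quad(x, w1, w2, b=0.0):
    """Quadratic model: y_pred = x^2 * w2 + x * w1 + b."""
    return x * x * w2 + x * w1 + b

def loss_quad(x, y, w1, w2):
    """Squared error of the quadratic model on one sample."""
    return (forward_quad(x, w1, w2) - y) ** 2

def total_loss(w1, w2):
    """Sum of per-sample losses over the dataset."""
    return sum(loss_quad(x, y, w1, w2) for x, y in zip(x_data, y_data))

def train_quad(w1=1.0, w2=1.0, lr=0.01, epochs=100):
    """Per-sample gradient descent on both weights."""
    for _ in range(epochs):
        for x, y in zip(x_data, y_data):
            err = forward_quad(x, w1, w2) - y
            # Compute both gradients from the same error before updating
            g1 = 2 * x * err        # d(loss)/d(w1)
            g2 = 2 * x * x * err    # d(loss)/d(w2)
            w1 -= lr * g1
            w2 -= lr * g2
    return w1, w2

loss_before = total_loss(1.0, 1.0)
w1, w2 = train_quad()
loss_after = total_loss(w1, w2)
print("w1 =", w1, "w2 =", w2, "loss:", loss_before, "->", loss_after)
```

Because the data satisfy $y = 2x$ exactly, the loss is zero at $w = 2$, $w_2 = 0$, so training should move the weights toward those values.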

In [5]:
# Before training
# Note: w still holds the value learned in the previous cell, and forward()
# and loss() still use the linear model x * w, not the quadratic one.
print("predict (before training)", 4, forward(4))

# Training loop
for epoch in range(100):
    for x_val, y_val in zip(x_data, y_data):
        grad_1 = gradient_1(x_val, y_val)
        w = w - 0.01 * grad_1 # update w
        grad_2 = gradient_2(x_val, y_val) # computed with the already-updated w
        w_2 = w_2 - 0.01 * grad_2 # update w_2
        print("\tgrad: ", x_val, y_val, grad_1, grad_2)
        l = loss(x_val, y_val)

    print("progress:", epoch, "w=", w, "w_2=", w_2, "loss=", l)

# After training
print("predict(after training)", "4 hours", forward(4))

predict (before training) 4 7.9999999999996945
	grad:  1.0 2.0 1.9999999999998472 1.9599999999998499
	grad:  2.0 4.0 15.526399999999427 28.56857599999894
	grad:  3.0 6.0 34.359816959999414 84.52514972159857
progress: 0 w= 1.4811378303999367 w_2= -0.15053725721597355 loss= 2.4229615593787632
	grad:  1.0 2.0 -1.3387988536320736 -1.312022876559432
	grad:  2.0 4.0 -6.242465903716006 -11.486137262837453
	grad:  3.0 6.0 -9.192896810866742 -22.61452615473218
progress: 1 w= 1.648879446082085 w_2= 0.2035896057253171 loss= 1.1095707904526109
	grad:  1.0 2.0 -0.29506189638519587 -0.28916065845749195
	grad:  2.0 4.0 0.5183399173257683 0.9537454478794132
	grad:  3.0 6.0 4.274602908587514 10.515523155125265
progress: 2 w= 1.603900636786804 w_2= 0.0917885262798452 loss= 1.412052349841094
	grad:  1.0 2.0 -0.6086216738667014 -0.5964492403893673
	grad:  2.0 4.0 -1.5560568728564093 -2.863144646055794
	grad:  3.0 6.0 0.08461471836465506 0.20815220717705785
progress: 3 w= 1.6247012750703886 w_2= 0.12430294

	grad:  3.0 6.0 0.4220665407651474 1.0382836902822632
progress: 71 w= 1.850767168319643 w_2= 0.045770598905952364 loss= 0.20043394246204002
	grad:  1.0 2.0 -0.20692446554880917 -0.20278597623783323
	grad:  2.0 4.0 -0.41253335750566045 -0.7590613778104149
	grad:  3.0 6.0 0.41632135001085935 1.0241505210267245
progress: 72 w= 1.8527985330500791 w_2= 0.0451475672361676 loss= 0.19501444684987743
	grad:  1.0 2.0 -0.20410779942750645 -0.20002564343895646
	grad:  2.0 4.0 -0.40691793291625267 -0.7487289965659052
	grad:  3.0 6.0 0.41065436307897585 1.0102097331742819
progress: 73 w= 1.854802246742727 w_2= 0.0445330163044734 loss= 0.18974148795863943
	grad:  1.0 2.0 -0.20132947390559908 -0.1973028844274869
	grad:  2.0 4.0 -0.4013789457657637 -0.7385372602090037
	grad:  3.0 6.0 0.4050645154552015 0.9964587080197944
progress: 74 w= 1.8567786857848887 w_2= 0.04392683067064036 loss= 0.18461110360953278
	grad:  1.0 2.0 -0.1985889670889418 -0.1946171877471632
	grad:  2.0 4.0 -0.39591535558398405 -0.72