# Tobig's 14th Cohort, Week 2 Optimization Assignment
### Made by 이지용

# Implementing Gradient Descent

### 1) Fill in the blanks marked with "..."  
### 2) Complete the assignment by writing down what you studied about the lecture content and the code

In [1]:
import pandas as pd
import numpy as np
import random

In [2]:
data = pd.read_csv('assignment_2.csv')
data.head()

| | Label | bias | experience | salary |
|---|---|---|---|---|
| 0 | 1 | 1 | 0.7 | 48000 |
| 1 | 0 | 1 | 1.9 | 48000 |
| 2 | 1 | 1 | 2.5 | 60000 |
| 3 | 0 | 1 | 4.2 | 63000 |
| 4 | 0 | 1 | 6.0 | 76000 |


## Splitting the Data into Train and Test Sets
### A method that splits a dataset into train and test sets  
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

In [3]:
from sklearn.model_selection import train_test_split

In [4]:
X_train, X_test, y_train, y_test = train_test_split(data.iloc[:, 1:], data.iloc[:, 0], test_size=0.25, random_state = 0)

In [5]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((150, 3), (50, 3), (150,), (50,))

## Scaling  

experience and salary differ greatly in units, mean, and variance, so we use a scaler to put them on a common scale. 

In [6]:
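from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Save the constant bias column: standardizing it would turn every 1 into 0
bias_train = X_train["bias"]
bias_train = bias_train.reset_index()["bias"]
# Fit the scaler on the training data only, then restore the bias column
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X_train.columns)
X_train["bias"] = bias_train
X_train.head()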

| | bias | experience | salary |
|---|---|---|---|
| 0 | 1 | 0.187893 | -1.143335 |
| 1 | 1 | 1.185555 | 0.043974 |
| 2 | 1 | -0.310938 | -0.351795 |
| 3 | 1 | -1.629277 | -1.34122 |
| 4 | 1 | -1.3086 | 0.043974 |


Here the scaler is fit on X_train only, and that fitted scaler is then applied to X_test.  
Do not fit a scaler on X_test as well!
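The fit/transform split above can be sketched with plain NumPy. This is only an illustration on made-up numbers (the `train` and `test` arrays are hypothetical, not the assignment data); it mimics what a standard scaler does by learning the mean and standard deviation from the training rows and reusing them for the test rows.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales
# (values are made up for illustration only).
train = np.array([[0.7, 48000.0], [2.5, 60000.0], [6.0, 76000.0], [4.2, 63000.0]])
test = np.array([[1.9, 48000.0], [5.0, 70000.0]])

# "fit": learn the statistics from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the same training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # close to 0 for each column
print(test_scaled.mean(axis=0))   # generally not 0, which is expected
```

Because the test set is scaled with training statistics, its columns do not end up with exactly zero mean and unit variance. That is the intended behavior: the test data must not influence anything the model learns.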

In [7]:
bias_test = X_test["bias"]
bias_test = bias_test.reset_index()["bias"]
X_test = pd.DataFrame(scaler.transform(X_test), columns = X_test.columns)
X_test["bias"] = bias_test
X_test.head()

| | bias | experience | salary |
|---|---|---|---|
| 0 | 1 | -1.344231 | -0.615642 |
| 1 | 1 | 0.50857 | 0.307821 |
| 2 | 1 | -0.310938 | 0.571667 |
| 3 | 1 | 1.363709 | 1.956862 |
| 4 | 1 | -0.987923 | -0.747565 |


In [8]:
# number of parameters
N = len(X_train.loc[0])
N

3

In [9]:
# Set the initial parameters to random values.
parameters = np.array([random.random() for i in range(N)])
parameters

array([0.47943121, 0.77132011, 0.21115225])

### * LaTeX   

Jupyter Notebook supports entering formulas with LaTeX syntax.  
http://triki.net/apps/3466  
https://jjycjnmath.tistory.com/117

## Logistic Function

## $p = \frac{1}{1+e^{-x}}$

In [10]:
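def logistic(X, parameters):
    # z is the linear combination of the features and the parameters
    z = 0
    for i in range(len(parameters)) :
        # X is a pandas Series; .iloc makes the positional lookup explicit
        z += X.iloc[i]*parameters[i]
    # Map z to a probability in (0, 1) with the sigmoid
    p =  1.0/(1.0+np.exp(-z))
    
    return p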

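* The x in e^(-x) is the linear combination $\sum_j \beta_j x_j$, so the loop runs once per parameter and accumulates each product into z; p then holds the resulting probability, which is returned.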
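As a small sketch of the same idea, the explicit sum can also be written as a dot product. The function name `logistic_vec` and the sample values are hypothetical and only serve to show that both forms give the same probability.

```python
import numpy as np

def logistic_vec(X, parameters):
    """Sigmoid of the linear combination, for one row or a whole matrix."""
    z = np.dot(X, parameters)  # same sum as the explicit loop
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only to illustrate the call
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.2, 0.3])

# Explicit loop, mirroring the loop-based version
z_loop = 0.0
for i in range(len(theta)):
    z_loop += x[i] * theta[i]
p_loop = 1.0 / (1.0 + np.exp(-z_loop))

print(p_loop, logistic_vec(x, theta))  # the two values agree
```

Passing a 2-D array instead of a single row returns one probability per row, which avoids a Python loop over samples.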

In [11]:
logistic(X_train.iloc[1], parameters)

0.8026847080785801

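## Objective Function

Objective function: the function we want to optimize with gradient descent.  
Write the objective function for logistic regression. It is the negative log-likelihood (cross-entropy), so it is minimized.
## $l(p) = -\sum_i \left(y_i\log(p_i)+(1-y_i)\log(1-p_i)\right)$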

In [12]:
def cross_entropy_i(X, y, parameters) :
    p = logistic(X, parameters)                       # reuse the function written above
    loss = y*np.log(p)+(1-y)*np.log(1-p)              # log-likelihood of one sample
    return -loss                                      # negate so that lower is better

In [13]:
def cross_entropy(X_set, y_set, parameters) :
    loss = 0
    for i in range(X_set.shape[0]):
        X = X_set.iloc[i, :]
        y = y_set.iloc[i]
        loss += cross_entropy_i(X, y, parameters)
    return loss

* cross_entropy_i implements the objective for a single sample, and cross_entropy loops over every sample and sums those values.
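The per-sample loop can also be expressed in one vectorized call. The sketch below is an assumption-laden variant, not the notebook's code: `cross_entropy_vec` and the `eps` clipping margin are hypothetical additions, and the small `X`, `y` arrays are made up.

```python
import numpy as np

def cross_entropy_vec(X, y, parameters, eps=1e-12):
    """Summed negative log-likelihood over all rows of X."""
    p = 1.0 / (1.0 + np.exp(-(X @ parameters)))
    p = np.clip(p, eps, 1.0 - eps)  # assumed safety margin against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up data: first column is the bias term
X = np.array([[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1]])
y = np.array([1, 0, 1])

# With all-zero parameters every p is 0.5, so the loss equals n * log(2)
print(cross_entropy_vec(X, y, np.zeros(3)), 3 * np.log(2))
```

The all-zero case is a handy sanity check, since the expected value can be worked out by hand.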

In [14]:
cross_entropy(X_test, y_test, parameters)

52.78312829447642

## Gradient of Cross Entropy

## $\frac{\partial}{\partial \theta_j} l(p) = -\sum_i (y_i - p_i)\,x_{ij}$

In [15]:
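# Partial derivative of the cross entropy of one sample with respect to theta_j
def get_gradient_ij_cross_entropy(X, y, parameters, j):
    p = logistic(X, parameters)
    # X is a pandas Series; .iloc makes the positional lookup explicit
    gradient = -(y-p)*X.iloc[j]
    return gradient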

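* The minus sign comes from differentiating the negative log-likelihood; it is not the descent direction itself. Moving in the direction opposite to the gradient happens later, when the update step subtracts the gradient from the parameters.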
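One way to gain confidence in the derivative is a finite-difference check. The sketch below compares the analytic gradient $-(y-p)x$ with a central-difference estimate for a single sample; the helper names, the step size `h`, and the sample values are all hypothetical.

```python
import numpy as np

def loss_i(x, y, theta):
    """Cross entropy of one sample."""
    p = 1.0 / (1.0 + np.exp(-np.dot(x, theta)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad_i(x, y, theta):
    """Gradient from the formula -(y - p) * x."""
    p = 1.0 / (1.0 + np.exp(-np.dot(x, theta)))
    return -(y - p) * x

def numeric_grad_i(x, y, theta, h=1e-6):
    """Central-difference approximation, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (loss_i(x, y, theta + step) - loss_i(x, y, theta - step)) / (2 * h)
    return grad

# Made-up sample and parameters
x = np.array([1.0, 0.19, -1.14])
y = 1
theta = np.array([0.48, 0.77, 0.21])

print(analytic_grad_i(x, y, theta))
print(numeric_grad_i(x, y, theta))  # should agree to several decimal places
```

If the two vectors disagree noticeably, the sign or the index in the analytic formula is the first thing to check.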

In [16]:
get_gradient_ij_cross_entropy(X_train.iloc[0, :], y_train.iloc[0], parameters, 1)

-0.07617525971642503

## Batch Gradient Descent  

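Batch Gradient Descent : $x^{k} = x^{k-1} - t_k \cdot \sum_{i=1}^{m} \nabla f_i(x^{k-1}), \quad k = 1, 2, 3, \ldots$

* Each training step computes the gradient over the entire dataset
* The direction can be accurate, but the amount of computation is very large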

In [17]:
def get_gradients_bgd(X_train, y_train, parameters) :
    gradients = [0 for i in range(len(parameters))]
    
    for i in range(X_train.shape[0]):
        X = X_train.iloc[i, :]
        y = y_train.iloc[i]
        for j in range(len(parameters)):
            gradients[j] += get_gradient_ij_cross_entropy(X, y, parameters, j)
            
    return gradients

* gradients is initialized with one 0 per parameter
* For every sample, the partial derivative for each parameter is added to gradients (using get_gradient_ij_cross_entropy defined above)
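The double loop over samples and parameters is equivalent to a single matrix product, $X^\top (p - y)$. The sketch below checks this on synthetic data; the function names and the random data are hypothetical and are not part of the assignment.

```python
import numpy as np

def batch_gradient_vec(X, y, parameters):
    """Full-batch gradient as one matrix product."""
    p = 1.0 / (1.0 + np.exp(-(X @ parameters)))
    return X.T @ (p - y)  # equals -sum_i (y_i - p_i) * x_i

def batch_gradient_loop(X, y, parameters):
    """Same gradient, accumulated sample by sample and parameter by parameter."""
    gradients = np.zeros(len(parameters))
    for i in range(X.shape[0]):
        p = 1.0 / (1.0 + np.exp(-np.dot(X[i], parameters)))
        for j in range(len(parameters)):
            gradients[j] += -(y[i] - p) * X[i, j]
    return gradients

# Synthetic data: bias column plus two random features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.integers(0, 2, size=20)
theta = np.array([0.1, -0.2, 0.3])

print(batch_gradient_vec(X, y, theta))
print(batch_gradient_loop(X, y, theta))  # same values up to rounding
```

The vectorized form does the same arithmetic, but it pushes the loops into NumPy, which matters because batch gradient descent touches every sample on every step.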

In [18]:
gradients_bgd = get_gradients_bgd(X_train, y_train, parameters)
gradients_bgd

[47.45687076818456, 11.123177010732196, 38.3697934212434]

## Stochastic Gradient Descent  

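Stochastic Gradient Descent : $x^{k} = x^{k-1} - t_k \cdot \nabla f_{i_k}(x^{k-1}), \quad i_k \in \{1, 2, \ldots, m\}$

* Each training step computes the gradient for a single randomly chosen data point
* The data are shuffled before being fed in, and instead of taking samples in order, one is drawn at random to compute the gradient.
* Unlike Batch Gradient Descent, each update uses only one data point, so a step is cheap to compute, but the resulting gradient is a noisy estimate and may be inaccurate.
* For that reason, Mini-Batch gradient descent, which sits between Batch and Stochastic, is said to be widely used.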
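A minimal sketch of the mini-batch idea mentioned in the last bullet, assuming NumPy arrays rather than DataFrames: `minibatch_gradient`, `batch_size`, and the synthetic data are hypothetical names and values introduced for illustration.

```python
import numpy as np

def minibatch_gradient(X, y, parameters, batch_size, rng):
    """Gradient summed over a random subset of the rows."""
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-(Xb @ parameters)))
    return Xb.T @ (p - yb)

# Synthetic data: bias column plus two random features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = rng.integers(0, 2, size=30)
theta = np.array([0.1, -0.2, 0.3])

g_single = minibatch_gradient(X, y, theta, batch_size=1, rng=rng)   # SGD-like step
g_mini = minibatch_gradient(X, y, theta, batch_size=8, rng=rng)     # mini-batch step
g_full = minibatch_gradient(X, y, theta, batch_size=30, rng=rng)    # full batch
print(g_single, g_mini, g_full)
```

With `batch_size=1` this behaves like the stochastic variant, and with `batch_size` equal to the number of rows it reproduces the full batch gradient; values in between trade noise for cost per step.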

In [19]:
def get_gradients_sgd(X_train, y_train, parameters) :
    gradients = [0 for i in range(len(parameters))]
    # Pick one training sample uniformly at random
    r = random.randrange(X_train.shape[0])
    X = X_train.iloc[r, :]
    y = y_train.iloc[r]
        
    for j in range(len(parameters)):
        gradients[j] =  get_gradient_ij_cross_entropy(X, y, parameters, j)
        
    return gradients

In [20]:
gradients_sgd = get_gradients_sgd(X_train, y_train, parameters)
gradients_sgd

[0.27907769096593915, -0.45469495488095796, -0.2270371795417179]

## Update Parameters  

In [21]:
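def update_parameters(parameters, gradients, learning_rate) :
    # Scale the gradients by the learning rate without modifying the caller's list
    step = learning_rate * np.asarray(gradients, dtype=float)
    # Move against the gradient and return the result as a new array
    return parameters - step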

* Multiply the gradients by the learning rate and subtract the result from the parameters to update them

In [22]:
update_parameters(parameters, gradients_bgd, 0.01)

array([ 0.0048625 ,  0.66008834, -0.17254569])
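The update shown above can be reproduced by hand from the values printed earlier in the notebook. The sketch below plugs in the initial parameters, the batch gradients, and the learning rate of 0.01; the variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from the notebook outputs above
params_before = np.array([0.47943121, 0.77132011, 0.21115225])
gradients_bgd = np.array([47.45687076818456, 11.123177010732196, 38.3697934212434])
learning_rate = 0.01

# One gradient descent step: theta_new = theta - learning_rate * gradient
params_after = params_before - learning_rate * gradients_bgd
print(params_after)  # matches the array shown above up to rounding
```

The first parameter moves the most because its gradient is the largest; with this learning rate, each parameter shifts by one hundredth of its gradient.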

## Gradient Descent  

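Combine the functions written above to complete a function that runs Gradient Descent.

learning_rate = "the learning rate (step size of each update)"  
max_iter = "the maximum number of iterations"  
tolerance = "training stops once the change in loss is smaller than tolerance"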

In [23]:
def gradient_descent(X_train, y_train, learning_rate=0.01, max_iter=100000, tolerance=0.0001, optimizer="bgd") :
    count = 1
    point = 100 if optimizer == "bgd" else 10000
    N = len(X_train.iloc[0])
    parameters = np.array([random.random() for i in range(N)])
    gradients = [0 for i in range(N)]
    loss = cross_entropy(X_train, y_train, parameters)
    
    while count < max_iter :
        
        if optimizer == "bgd" :
            gradients = get_gradients_bgd(X_train, y_train, parameters)
        elif optimizer == "sgd" :
            gradients = get_gradients_sgd(X_train, y_train, parameters)
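        # Every `point` iterations, report the loss and check the stopping condition
        if count%point == 0 :
            new_loss = cross_entropy(X_train, y_train, parameters)
            print(count, "loss: ",new_loss, "params: ", parameters, "gradients: ", gradients)
            
            # Stop once the loss has changed by less than tolerance per training sample
            if abs(new_loss-loss) < tolerance*len(y_train) :
                break
            loss = new_loss
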
        parameters = update_parameters(parameters, gradients, learning_rate)
        count += 1
    return parameters

In [24]:
new_param_bgd = gradient_descent(X_train, y_train)
new_param_bgd

100 loss:  45.38015700324572 params:  [-1.62451963  3.47646664 -3.30263354] gradients:  [0.27622568162035926, -0.9089056475351965, 0.8495535155928922]
200 loss:  44.80016573469691 params:  [-1.78213122  3.99162367 -3.78159499] gradients:  [0.0831963192569357, -0.27034643851353923, 0.25001806976136465]
300 loss:  44.74021028343032 params:  [-1.83414216  4.16037091 -3.93739771] gradients:  [0.029956146053548974, -0.09704300445760279, 0.0894502013742747]
400 loss:  44.73212098733444 params:  [-1.85338472  4.22267447 -3.99479258] gradients:  [0.011387147404068182, -0.036849762270933485, 0.03392605700446663]


array([-1.85338472,  4.22267447, -3.99479258])
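To see why training stopped at iteration 400, the stopping rule can be replayed on the four losses logged above. This is a small sketch using only the printed values; the threshold is `tolerance * len(y_train)` with the default tolerance of 0.0001 and 150 training rows.

```python
# Losses printed at iterations 100, 200, 300 and 400 in the log above
logged_losses = [45.38015700324572, 44.80016573469691,
                 44.74021028343032, 44.73212098733444]
tolerance = 0.0001
n_train = 150
threshold = tolerance * n_train  # the loss is a sum, so the tolerance is scaled by n

stop_flags = []
for step, prev, curr in zip([200, 300, 400], logged_losses, logged_losses[1:]):
    change = abs(curr - prev)
    stop_flags.append(change < threshold)
    print(step, change, change < threshold)
```

Only the change between iterations 300 and 400 falls below the threshold, which is consistent with the log ending at 400. The check at iteration 100 compares against the initial loss, which is not printed, so it is left out here.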

## Hyper Parameter Tuning

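Try training with different hyperparameter values each time. You will notice differences.

* Hyperparameters of gradient descent, such as the learning rate and the number of iterations, are not obtained through training; they are values that a person sets directly. (Model parameters, by contrast, are obtained through training.)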
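A rough way to explore this is to run the same number of batch gradient descent steps with several learning rates and compare the final loss. The sketch below uses synthetic data and a vectorized loop; `run_bgd`, `true_theta`, and the chosen learning rates are hypothetical and only meant as a starting point for experiments.

```python
import numpy as np

def loss(X, y, theta):
    """Summed cross entropy, with clipping as an assumed guard against log(0)."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ theta))), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def run_bgd(X, y, learning_rate, n_iter=200):
    """Run a fixed number of full-batch steps from theta = 0 and return the final loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        theta = theta - learning_rate * (X.T @ (p - y))
    return loss(X, y, theta)

# Synthetic data generated from made-up "true" parameters
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_theta = np.array([-1.0, 2.0, -2.0])
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)

initial_loss = loss(X, y, np.zeros(3))
results = {lr: run_bgd(X, y, lr) for lr in [0.0001, 0.001, 0.01]}
for lr, final_loss in results.items():
    print(lr, final_loss)
```

On this toy problem a very small learning rate makes little progress in 200 steps, while a larger one gets closer to the minimum. A rate that is too large can overshoot, so the useful range depends on the data.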





In [25]:
new_param_sgd = gradient_descent(X_train, y_train, learning_rate=0.01, max_iter=100000, tolerance=0.0001, optimizer="sgd")
new_param_sgd

10000 loss:  46.54438391412418 params:  [-1.43493838  3.01140181 -2.91592548] gradients:  [0.004054417735083267, -0.0053056118784497565, 0.000178290673859503]
20000 loss:  45.03954202337346 params:  [-1.71109631  3.70273559 -3.52247166] gradients:  [0.28413741082912586, 0.2862403819271171, 0.23740078634082887]
30000 loss:  44.979189479329705 params:  [-1.79812758  3.88931191 -3.84217575] gradients:  [0.006815579658324907, -0.007947490943932916, -0.002397689373763393]
40000 loss:  44.87452961934882 params:  [-1.89681346  4.11813949 -4.034833  ] gradients:  [0.0957263666859769, 0.014575511906823565, 0.02315231371248387]
50000 loss:  44.75806371677091 params:  [-1.85601122  4.28758581 -3.99079496] gradients:  [0.0004795226480606663, -0.0006275034354554744, 2.108673097836912e-05]
60000 loss:  44.81538916495611 params:  [-1.76405043  4.25419405 -4.05874088] gradients:  [-0.37097497088180875, 0.18144089003230005, 0.3996785372642768]
70000 loss:  44.7809926263097 params:  [-1.816949    4.1839

array([-1.89829822,  4.24538887, -4.05240859])

## Predict Label

In [26]:
y_predict = []
for i in range(len(y_test)):
    p = logistic(X_test.iloc[i,:], new_param_bgd)
    if p> 0.5 :
        y_predict.append(1)
    else :
        y_predict.append(0)
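The prediction loop applies a 0.5 threshold to each probability. As a sketch, the same rule can be written for a whole NumPy matrix at once; `predict_labels` and the tiny one-feature example are hypothetical.

```python
import numpy as np

def predict_labels(X, parameters, threshold=0.5):
    """Label 1 when the predicted probability exceeds the threshold, else 0."""
    p = 1.0 / (1.0 + np.exp(-(X @ parameters)))
    return (p > threshold).astype(int)

# Made-up one-feature example: z is 2, -2 and 0 for the three rows
X_demo = np.array([[1.0], [-1.0], [0.0]])
theta_demo = np.array([2.0])
labels = predict_labels(X_demo, theta_demo)
print(labels)
```

A probability of exactly 0.5 is assigned to class 0, matching the `else` branch of the loop. Raising or lowering `threshold` trades precision against recall.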

## Confusion Matrix

In [27]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
tn, fp, fn, tp = confusion_matrix(y_test, y_predict).ravel()
confusion_matrix(y_test, y_predict)

array([[38,  2],
       [ 1,  9]], dtype=int64)

In [28]:
accuracy = accuracy_score(y_test, y_predict) #(tp+tn)/(tp+fn+fp+tn)
print("accuracy: ", accuracy)

accuracy:  0.94


* The accuracy is quite high: 47 of the 50 test samples are classified correctly!

In [29]:
precision = precision_score(y_test, y_predict) #(tp)/(tp+fp)
recall = recall_score(y_test,y_predict) #(tp)/(tp+fn)
specificity = tn/(tn+fp)
# Store the result under a new name so the f1_score function is not overwritten
f1 = f1_score(y_test, y_predict) #2*(precision*recall)/(precision+recall)

In [30]:
print("precision: ", precision)
print("recall: ", recall)
print("specificity: ", specificity)
print("f1_score: ", f1)

precision:  0.8181818181818182
recall:  0.9
specificity:  0.95
f1_score:  0.8571428571428572
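The reported metrics can be cross-checked directly from the confusion matrix above, using the formulas given in the code comments. The sketch below uses only the four counts from that matrix.

```python
# Counts read from the confusion matrix above: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = 38, 2, 1, 9

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, specificity, f1)
```

Since only 10 of the 50 test samples are positive, precision, recall, and F1 say more about the positive class than accuracy alone does.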


* References <br>
https://www.edwith.org/aipython/lecture/24518/ <br>
https://deepapple.tistory.com/7