### Multi-variable linear regression

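- When the hypothesis has several input variables, the number of weights grows with them, so the weights are collected into a matrix
- The prediction is computed with a dot product (matrix multiplication)
- TensorFlow form: H(X) = XW
- Shapes: X[n, m] · W[m, k] = [n, k], where · is the matrix product, not an element-wise product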
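
The shape rule above can be checked with a small NumPy sketch. The sizes (n = 5, m = 3, k = 1) and the constant values are illustrative assumptions chosen to make the result easy to verify by hand; they are not the notebook's data.

```python
import numpy as np

# Hypothetical sizes: n = 5 samples, m = 3 input variables, k = 1 output.
n, m, k = 5, 3, 1

X = np.ones((n, m), dtype=np.float32)       # one row per sample
W = np.full((m, k), 0.5, dtype=np.float32)  # one weight per input variable

# Matrix product: each row of X is dotted with the single column of W.
H = X @ W

print(H.shape)   # (5, 1)
print(H[0, 0])   # 1.5, i.e. 1*0.5 + 1*0.5 + 1*0.5
```

The inner dimension m must match on both sides; the result keeps the number of samples n and the number of outputs k.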


In [5]:
import tensorflow as tf
import numpy as np

In [6]:
data= np.array([
    #x1,x2,x3,y
    [73.,80.,75.,152.],
    [93.,88.,93.,185.],
    [89.,91.,90.,180.],
    [96.,98.,100.,196.],
    [73.,66.,70.,142.]
],dtype=np.float32)

In [7]:
#slice data
X= data[:,:-1]
y= data[:,[-1]]

In [8]:
# use the matrix form
W= tf.Variable(tf.random.normal([3,1])) # 3 input variables, 1 output
b= tf.Variable(tf.random.normal([1]))
learning_rate=0.000001

In [9]:
#hypothesis, prediction function
def predict(X):
    return tf.matmul(X,W)+b

n_epochs= 2000
for i in range(n_epochs+1):
    #record the gradient of the cost function
    with tf.GradientTape() as tape:
        cost= tf.reduce_mean((tf.square(predict(X)-y)))
        # mean of the squared errors
        
    # compute the gradients of the cost with respect to W and b
    W_grad, b_grad= tape.gradient(cost,[W,b])
    
    # apply the gradient descent update
    W.assign_sub(learning_rate*W_grad)
    b.assign_sub(learning_rate*b_grad)
    
    if i%100==0:
        print("{:5} | {:10.4f}".format(i, cost.numpy()))
    

    0 | 32941.7422
  100 |     5.5545
  200 |     1.4987
  300 |     1.4976
  400 |     1.4969
  500 |     1.4963
  600 |     1.4957
  700 |     1.4950
  800 |     1.4944
  900 |     1.4937
 1000 |     1.4931
 1100 |     1.4925
 1200 |     1.4918
 1300 |     1.4912
 1400 |     1.4905
 1500 |     1.4899
 1600 |     1.4893
 1700 |     1.4886
 1800 |     1.4880
 1900 |     1.4874
 2000 |     1.4867
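
For this mean-squared-error cost, the gradients that `tf.GradientTape` returns can also be written in closed form. The following is a minimal NumPy sketch of that idea, not the notebook's implementation: the helper names `mse_cost` and `mse_gradients` and the zero starting point `W0`, `b0` are assumptions made for illustration (the notebook starts from random values).

```python
import numpy as np

def mse_cost(X, y, W, b):
    """Mean of squared errors for the hypothesis H(X) = XW + b."""
    return np.mean((X @ W + b - y) ** 2)

def mse_gradients(X, y, W, b):
    """Closed-form gradients of the mean squared error with respect to W and b."""
    n = X.shape[0]
    error = X @ W + b - y                   # shape [n, 1]
    W_grad = (2.0 / n) * (X.T @ error)      # shape [m, 1]
    b_grad = 2.0 * np.mean(error, axis=0)   # shape [1]
    return W_grad, b_grad

# Same five rows as the notebook's data: x1, x2, x3, y
data = np.array([
    [73., 80., 75., 152.],
    [93., 88., 93., 185.],
    [89., 91., 90., 180.],
    [96., 98., 100., 196.],
    [73., 66., 70., 142.],
], dtype=np.float64)
X, y = data[:, :-1], data[:, [-1]]

# Illustrative starting point (an assumption, not the notebook's random init).
W0 = np.zeros((3, 1))
b0 = np.zeros(1)

# One gradient descent step with the notebook's learning rate.
learning_rate = 0.000001
W_grad, b_grad = mse_gradients(X, y, W0, b0)
W1 = W0 - learning_rate * W_grad
b1 = b0 - learning_rate * b_grad

print(bool(mse_cost(X, y, W1, b1) < mse_cost(X, y, W0, b0)))  # True
```

A single step already lowers the cost, which is the same behavior the training loop shows between step 0 and step 100.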


### Logistic regression
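1. What is logistic regression?
- classification: the target is a discrete class encoded as a number, conventionally 0 for negative and 1 for positive; one-hot encoding extends this to more than two classes
- logistic vs linear: logistic regression predicts which class a sample belongs to, while linear regression predicts a continuous numeric value
2. How to solve it?
- hypothesis representation: the linear value is passed through the logistic function (the g function) so that the output lies between 0 and 1
- sigmoid/logistic function: maps the real value z produced by the linear part, z = XW + b, to a value between 0 and 1: g(z) = 1 / (1 + e^(-z))
- decision boundary: an output above 0.5 is classified as 1 and an output below 0.5 as 0, so 0.5 is the threshold that separates the two classes
- cost function: measures how far the predictions are from the labels for the current parameters, which start from arbitrary values; with the sigmoid a squared-error cost is non-convex, whereas the log-based cost is convex
- optimizer (gradient descent): the procedure that adjusts the parameters to minimize the cost function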
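
The sigmoid, the 0.5 decision boundary, and the log-based cost from the notes above can be sketched in a few lines of NumPy. This is only an illustration: the function names `sigmoid` and `binary_cross_entropy` and the sample values in `z` and `y` are assumptions, not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(h, y):
    """Log-based cost: mean of -[y*log(h) + (1-y)*log(1-h)]."""
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# Hypothetical linear outputs z and their true labels y.
z = np.array([-2.0, 1.0, 3.0])
y = np.array([0.0, 1.0, 1.0])

h = sigmoid(z)                       # hypothesis values between 0 and 1
predicted = (h > 0.5).astype(float)  # decision boundary at 0.5

print(sigmoid(0.0))   # 0.5, the value exactly on the decision boundary
print(predicted)      # [0. 1. 1.]

# The cost is small when predictions agree with the labels
# and larger when the labels are flipped.
print(bool(binary_cross_entropy(h, y) < binary_cross_entropy(h, 1.0 - y)))  # True
```

Because z = 0 maps to exactly 0.5, the decision boundary in the input space is the set of points where XW + b = 0.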

In [10]:
x_train= np.array([
    [1,2],
    [2,3],
    [3,1],
    [4,3],
    [5,3],
    [6,2]
], dtype=np.float32)
y_train= np.array([
    [0],
    [0],
    [0],
    [1],
    [1],
    [1]
], dtype= np.float32)

In [11]:
x_test= np.array([[5,2]],dtype=np.float32)
y_test= np.array([[1]],dtype=np.float32)

In [12]:
dataset= tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(len(x_train))
W= tf.Variable(tf.zeros([2,1]),name='weight')
b= tf.Variable(tf.zeros([1]),name='bias')

In [13]:
# return the structure (shape and dtype) of the dataset elements
dataset.element_spec

(TensorSpec(shape=(None, 2), dtype=tf.float32, name=None),
 TensorSpec(shape=(None, 1), dtype=tf.float32, name=None))

In [14]:
def logistic_regression(features):
    hypothesis= tf.sigmoid(tf.matmul(features,W)+b)
    return hypothesis

In [15]:
def loss_fn(features, labels):
    hypothesis= logistic_regression(features)
    cost= -tf.reduce_mean(labels*tf.math.log(hypothesis)+(1-labels)*tf.math.log(1-hypothesis))
    return cost

In [16]:
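def grad(hypothesis, features, labels):
    # hypothesis is not used here: loss_fn recomputes the forward pass
    # inside the tape so that the operations are recorded for differentiation
    with tf.GradientTape() as tape:
        loss_value= loss_fn(features, labels)
    return tape.gradient(loss_value,[W,b])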

In [17]:
optimizer= tf.keras.optimizers.SGD(learning_rate=0.01)
epochs=3000

In [18]:
for step in range(epochs+1):
    for features, labels in iter(dataset):
        hypothesis= logistic_regression(features)
        grads= grad(hypothesis,features,labels)
        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step%300 == 0:
            print("iter:{}, loss:{:.4f}".format(step,loss_fn(features,labels)))

iter:0, loss:0.6874
iter:300, loss:0.5054
iter:600, loss:0.4535
iter:900, loss:0.4228
iter:1200, loss:0.3992
iter:1500, loss:0.3790
iter:1800, loss:0.3608
iter:2100, loss:0.3442
iter:2400, loss:0.3288
iter:2700, loss:0.3146
iter:3000, loss:0.3013


In [20]:
def accuracy_fn(hypothesis,labels):
    # threshold at the 0.5 decision boundary
    predicted= tf.cast(hypothesis>0.5,dtype=tf.float32)
    # cast matches to float32 so reduce_mean returns a fraction, not an integer-truncated mean
    accuracy= tf.reduce_mean(tf.cast(tf.equal(predicted,labels),dtype=tf.float32))
    return accuracy

test_acc= accuracy_fn(logistic_regression(x_test),y_test)
print('accuracy:{:.0f}%'.format(test_acc*100))

accuracy:100%


정시은