# lab05_Logistic (Regression) Classification using pytorch

### 2018.09.26(화)

$ H(X) = \frac{1}{1+e^{-W^{T}X}}$

$cost(W) = \frac{1}{m}\sum c(H(x),y)$

$c(H(x),y) = \begin{cases} -log(H(x)) & : y=1 \\ -log(1-H(x)) & : y = 0 \end{cases}$

$ c(H(x),y) = - ylog(H(x)) - (1-y)log(1-H(x))$

$\therefore\ \ cost(W) = -\frac{1}{m}\sum ylog(H(x)) + (1-y)(log(1-H(x))$

$ W := W -\alpha\frac{\delta}{\delta W}cost(W)$

In [101]:
import torch
from torch.autograd import Variable
from torch import nn, optim
import numpy as np

## Classifying Diabetes
### torch에 이미 구현된 함수를 쓸 경우

BCE loss ? ->
$ℓ(x,y)=L=\{l_1,…,l_N\}^⊤,l_n=−w_n[y_n⋅logx_n+(1−y_n)⋅log(1−x_n)]$

BCE Loss -> loss =  pos_weight \* (target \* torch.log(output)) + neg_weight\* ((1 - target) \* torch.log(1 - output))


In [124]:
xy = np.loadtxt("data-03-diabetes.csv", delimiter = ",", dtype=np.float32)
x_data = Variable(torch.tensor(xy[:,0:-1],dtype = torch.float))
y_data = Variable(torch.tensor(xy[:,[-1]],dtype = torch.float))

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(8, 1,bias = True) # 2 in and 1 out
        
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred


model = Model()
criterion = torch.nn.BCELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)



In [125]:
for t in range(10001):
    y_pred = model(x_data)
    cost = criterion(y_pred,y_data)
    
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    if t % 20 == 0:
        print(t, "\ncost : ",cost.data.numpy(), "\nweight : \n",model.linear.weight.data.numpy(), model.linear.bias.data.numpy())

0 
cost :  526.89514 
weight : 
 [[ 0.17692101  0.10496174  0.04877179 -0.0540678   0.0725209  -0.10012192
  -0.2966953  -0.33440897]] [0.0107098]
20 
cost :  423.9573 
weight : 
 [[-0.28068662 -0.7785837  -0.09737264 -0.31268698 -0.22071922 -0.48594123
  -0.5088917  -0.39792967]] [0.04148428]
40 
cost :  395.66907 
weight : 
 [[-0.49119288 -1.3897622  -0.17319518 -0.4316716  -0.32279983 -0.763597
  -0.6168252  -0.33850655]] [0.00572284]
60 
cost :  381.71912 
weight : 
 [[-0.614207   -1.8238871  -0.2020757  -0.5046725  -0.3658692  -0.9782595
  -0.710071   -0.27822793]] [0.00613062]
80 
cost :  373.91415 
weight : 
 [[-0.6907728  -2.1479914  -0.20698962 -0.5519221  -0.3815366  -1.1515254
  -0.7855916  -0.22590724]] [0.01933992]
100 
cost :  369.1727 
weight : 
 [[-0.7411129  -2.398337   -0.19875567 -0.5836246  -0.3841458  -1.2956694
  -0.84528136 -0.18321545]] [0.03644054]
120 
cost :  366.12384 
weight : 
 [[-0.7757754  -2.5963674  -0.18296488 -0.60533774 -0.3805976  -1.4182397
  -0.8

In [186]:
# Accuracy computation
# True if hypothesis>0.5 else False
correct = torch.tensor([0],dtype = torch.long)
for x,y in zip(x_data,y_data):
    outputs = model(x)
    predicted = (outputs.data > 0.45).float()
    correct += (y==predicted).long()
print("correct(Y) : " ,correct, "\naccuracy : ",(correct.numpy()/len(x_data))[0])

correct(Y) :  tensor([588]) 
accuracy :  0.7747035573122529


## Classifying Diabetes
### hypothesis와 cost function 를 직접 계산할 경우

$ H(X) = \frac{1}{1+e^{-W^{T}X}}$

$cost(W) = \frac{1}{m}\sum c(H(x),y)$

$c(H(x),y) = \begin{cases} -log(H(x)) & : y=1 \\ -log(1-H(x)) & : y = 0 \end{cases}$

$ c(H(x),y) = - ylog(H(x)) - (1-y)log(1-H(x))$

$\therefore\ \ cost(W) = -\frac{1}{m}\sum ylog(H(x)) + (1-y)(log(1-H(x))$

$ W := W -\alpha\frac{\delta}{\delta W}cost(W)$

In [387]:
xy = np.loadtxt("data-03-diabetes.csv", delimiter = ",", dtype=np.float32)
X = Variable(torch.tensor(xy[:,0:-1],dtype = torch.float))
Y = Variable(torch.tensor(xy[:,[-1]],dtype = torch.float))

W = torch.randn(8, device=torch.device("cpu"))

# learning rate
learning_rate = 0.01

[derivative of cost function for Logistic Regression
](https://math.stackexchange.com/questions/477207/derivative-of-cost-function-for-logistic-regression)

[logistic_regression](https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html#id22)

In [388]:
def predict(features, weights):
    z = np.dot(features, weights)
    return torch.sigmoid(torch.tensor(z,dtype = torch.float))

In [389]:
def cost_function(features, labels, weights):
    observations = len(labels)

    predictions = predict(features, weights)
    class1_cost = -labels*np.log(predictions)
    class2_cost = (1-labels)*np.log(1-predictions)
    cost = class1_cost - class2_cost
    cost = cost.sum()/observations

    return cost

In [390]:
def update_weights(features, labels, weights, lr):
    N = len(features)
    predictions = predict(features, weights)
    gradient = np.dot(torch.transpose(features,0,1), predictions - labels.view(N))
    gradient /= N
    gradient *= lr
    weights -= torch.mean(torch.tensor(gradient,dtype = torch.float))

    return weights

In [391]:
def train(features, labels, weights, lr, iters):
    for i in range(iters):
        weights = update_weights(features, labels, weights, lr)
        cost = cost_function(features, labels, weights)
        if i % 100 == 0:
            print("iter: ", str(i), " cost: ",cost.numpy())
    return weights

In [392]:
def accuracy(predicted_labels, actual_labels):
    diff = predicted_labels - actual_labels
    return 1.0 - (float(np.count_nonzero(diff)) / len(diff))

In [393]:
W = train(X,Y,W,0.001,10000)


iter:  0  cost:  923.3561
iter:  100  cost:  915.82733
iter:  200  cost:  908.4773
iter:  300  cost:  901.30457
iter:  400  cost:  894.3094
iter:  500  cost:  887.48883
iter:  600  cost:  880.842
iter:  700  cost:  874.3666
iter:  800  cost:  868.062
iter:  900  cost:  861.9253
iter:  1000  cost:  855.9542
iter:  1100  cost:  850.147
iter:  1200  cost:  844.501
iter:  1300  cost:  839.0136
iter:  1400  cost:  833.68176
iter:  1500  cost:  828.5041
iter:  1600  cost:  823.47626
iter:  1700  cost:  818.5965
iter:  1800  cost:  813.8615
iter:  1900  cost:  809.2682
iter:  2000  cost:  804.8136
iter:  2100  cost:  800.4949
iter:  2200  cost:  796.3085
iter:  2300  cost:  792.25244
iter:  2400  cost:  788.32245
iter:  2500  cost:  784.51587
iter:  2600  cost:  780.83014
iter:  2700  cost:  777.26135
iter:  2800  cost:  773.8069
iter:  2900  cost:  770.464
iter:  3000  cost:  767.22943
iter:  3100  cost:  764.10016
iter:  3200  cost:  761.07355
iter:  3300  cost:  758.14655
iter:  3400  cost

In [394]:
# Accuracy computation
# True if hypothesis>0.5 else False
correct = torch.tensor([0],dtype = torch.long)
for x,y in zip(x_data,y_data):
    pred = predict(X,W)
    predicted = (outputs.data > 0.5).float()
    correct += (y==predicted).long()
print("correct(Y) : " ,correct, "\naccuracy : ",(correct.numpy()/len(x_data))[0])

correct(Y) :  tensor([496]) 
accuracy :  0.6534914361001317
