# Softmax Regression
## Multi-class Classification
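+ Unlike binary classification, softmax regression is used for multi-class classification with three or more classes
    + e.g. classifying the 3 iris species in the iris data from its 4 features
+ Each class is assigned a probability such that the probabilities over all classes sum to 1
    + With 3 classes, the softmax function transforms a 3-dim. vector so that its elements sum to 1\
    $H(X) = softmax(WX+B)$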

## Softmax Function 
+ With k classes in total, softmax takes a k-dim. vector as input and estimates the probability of each class\
$p_i = \frac{e^{z_i}}{\sum_{j=1}^k e^{z_j}}$
    + $z_i$: the i-th element of the k-dim. vector
    + $p_i$: the probability that the i-th class is the correct answer
    + For k=3, with the 3-dim. vector $z=[z_1, z_2, z_3]$\
    $softmax(z) = [\frac{e^{z_1}}{\sum_{j=1}^3 e^{z_j}}, \frac{e^{z_2}}{\sum_{j=1}^3 e^{z_j}}, \frac{e^{z_3}}{\sum_{j=1}^3 e^{z_j}}] = [p_1, p_2, p_3] = \hat{y} = \text{predicted value}$
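
As a quick numerical check of the formula above, the sketch below computes softmax directly from the definition and compares it with `F.softmax`. The input vector is a made-up example chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical 3-dim. input vector z (values chosen only for illustration)
z = torch.tensor([1.0, 2.0, 3.0])

# Softmax by the definition: p_i = exp(z_i) / sum_j exp(z_j)
p_manual = torch.exp(z) / torch.exp(z).sum()

# Built-in softmax over the only dimension of z
p_builtin = F.softmax(z, dim=0)

print(p_manual)        # approximately [0.0900, 0.2447, 0.6652]
print(p_manual.sum())  # 1, up to floating-point error
```

The largest element of z receives the largest probability, and the three outputs sum to 1.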

## Cost function
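__Cross Entropy Function__\
$cost(W)=-\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^k y_j^{(i)} \log{(p_j^{(i)})}$
+ Training must move in the direction that minimizes $-\sum_{j=1}^k y_j \log{(p_j)}$, where $y_j$ is the j-th element of the one-hot label
    + When the predicted value is close to the actual value, this expression should be close to 0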
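
To make the "close to 0" behaviour concrete, here is a small sketch that evaluates the per-sample cross entropy for one good and one bad prediction. The label and both probability vectors are hypothetical values, and `cross_entropy` is a helper defined here only for illustration.

```python
import torch

# One-hot label: the correct class is index 2 (hypothetical example)
y = torch.tensor([0.0, 0.0, 1.0])

# Two hypothetical predicted distributions (each sums to 1)
p_good = torch.tensor([0.05, 0.05, 0.90])  # confident in the correct class
p_bad = torch.tensor([0.70, 0.20, 0.10])   # confident in a wrong class

def cross_entropy(y, p):
    # -sum_j y_j * log(p_j); with a one-hot y only the true class contributes
    return -(y * torch.log(p)).sum()

loss_good = cross_entropy(y, p_good)  # -log(0.90), about 0.105
loss_bad = cross_entropy(y, p_bad)    # -log(0.10), about 2.303
print(loss_good.item(), loss_bad.item())
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.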

__Cross Entropy Function in Binary classification__
+ if k=2\
$cost(w) = -\frac{1}{n} \sum_{i=1}^n [y^{(i)}\log(p^{(i)})+(1-y^{(i)})\log(1-p^{(i)})]$
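
The k=2 case can be checked by substituting into the general formula. A short sketch, writing $y_1 = y$, $y_2 = 1-y$, $p_1 = p$, $p_2 = 1-p$ (this relabeling is introduced here only for illustration):

$-\sum_{j=1}^2 y_j \log(p_j) = -[y_1 \log(p_1) + y_2 \log(p_2)] = -[y\log(p) + (1-y)\log(1-p)]$

Likewise, softmax over two elements reduces to the sigmoid of their difference:

$p_1 = \frac{e^{z_1}}{e^{z_1}+e^{z_2}} = \frac{1}{1+e^{-(z_1-z_2)}}$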


In [1]:
import torch 
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

<torch._C.Generator at 0x1ec3013faf0>

In [2]:
# data
x_train = [[1, 2, 1, 1],
           [2, 1, 3, 2],
           [3, 1, 3, 4],
           [4, 1, 5, 5],
           [1, 7, 5, 5],
           [1, 2, 5, 6],
           [1, 6, 6, 6],
           [1, 7, 7, 7]] # 8 by 4
y_train = [2, 2, 2, 1, 1, 1, 0, 0] # 8 labels, each one of 3 classes (0, 1, 2)
x_train = torch.FloatTensor(x_train)
y_train = torch.LongTensor(y_train)

In [3]:
# implement on low level

# one-hot encoding
y_one_hot = torch.zeros(8,3) # 8 samples, 3 classes
# scatter_ (in-place) writes 1 at each row's label index;
# the out-of-place scatter returns a new tensor and would leave y_one_hot all zeros
y_one_hot.scatter_(1,y_train.unsqueeze(1),1)

# initialize model 
w = torch.zeros((4,3), requires_grad=True) 
b = torch.zeros(1, requires_grad=True)

# optimizer
opt = optim.SGD([w,b], lr=0.1)

# number of epoch 
n_epoch = 1000 

for epoch in range(n_epoch+1):
    hyp = F.softmax(x_train.matmul(w)+b, dim=1)
    cost = (y_one_hot * -torch.log(hyp)).sum(dim=1).mean()
    
    opt.zero_grad()
    cost.backward()
    opt.step()
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch:4d}/{n_epoch} Cost: {cost.item():.6f}')
    

Epoch    0/1000 Cost: 1.098612
Epoch  100/1000 Cost: 0.761050
Epoch  200/1000 Cost: 0.689991
Epoch  300/1000 Cost: 0.643229
Epoch  400/1000 Cost: 0.604117
Epoch  500/1000 Cost: 0.568255
Epoch  600/1000 Cost: 0.533922
Epoch  700/1000 Cost: 0.500291
Epoch  800/1000 Cost: 0.466908
Epoch  900/1000 Cost: 0.433507
Epoch 1000/1000 Cost: 0.399962
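
The one-hot step depends on the difference between `scatter` and `scatter_`. A minimal sketch with made-up labels (all tensor names below are hypothetical): the version without the trailing underscore returns a new tensor and leaves the original untouched, so a cost computed against the untouched all-zero tensor is always 0. `F.one_hot` is shown as a built-in alternative.

```python
import torch
import torch.nn.functional as F

y = torch.LongTensor([2, 0, 1])   # hypothetical labels for 3 samples
idx = y.unsqueeze(1)              # shape (3, 1): one column index per row

out_of_place = torch.zeros(3, 3)
result = out_of_place.scatter(1, idx, 1)   # returns a new tensor; out_of_place stays all zeros

in_place = torch.zeros(3, 3)
in_place.scatter_(1, idx, 1)               # trailing underscore: modifies in_place itself

# Built-in alternative; F.one_hot returns an integer tensor, so cast to float
one_hot = F.one_hot(y, num_classes=3).float()

print(out_of_place.sum().item())  # 0.0, nothing was written
print(in_place)
```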


In [4]:
# implement on high level 

# initialize model 
w = torch.zeros((4,3), requires_grad=True) 
b = torch.zeros(1, requires_grad=True)

# optimizer
opt = optim.SGD([w,b], lr=0.1)

# number of epoch 
n_epoch = 1000 

for epoch in range(n_epoch+1):
    z = x_train.matmul(w)+b
    cost = F.cross_entropy(z, y_train)
    
    opt.zero_grad()
    cost.backward()
    opt.step()
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch:4d}/{n_epoch} Cost: {cost.item():.6f}')



Epoch    0/1000 Cost: 1.098612
Epoch  100/1000 Cost: 0.761050
Epoch  200/1000 Cost: 0.689991
Epoch  300/1000 Cost: 0.643229
Epoch  400/1000 Cost: 0.604117
Epoch  500/1000 Cost: 0.568255
Epoch  600/1000 Cost: 0.533922
Epoch  700/1000 Cost: 0.500291
Epoch  800/1000 Cost: 0.466908
Epoch  900/1000 Cost: 0.433507
Epoch 1000/1000 Cost: 0.399962
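
`F.cross_entropy` takes the raw scores z (before softmax) and integer class labels, so neither the softmax call nor the one-hot tensor is needed. The sketch below, using made-up scores and labels, checks that it matches the manual one-hot formula and the `log_softmax` plus `nll_loss` combination.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores for 2 samples and 3 classes, with their labels
z = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 0.5, -1.0]])
y = torch.tensor([2, 0])

# Manual version: one-hot labels times -log(softmax), averaged over samples
y_one_hot = F.one_hot(y, num_classes=3).float()
manual = (y_one_hot * -torch.log(F.softmax(z, dim=1))).sum(dim=1).mean()

# Two-step version: log-softmax followed by negative log-likelihood
two_step = F.nll_loss(F.log_softmax(z, dim=1), y)

# One-call version used in the training loop
built_in = F.cross_entropy(z, y)

print(manual.item(), two_step.item(), built_in.item())  # all three agree
```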


In [9]:
# implement with nn.Module 
mod = nn.Linear(4,3)
opt = optim.SGD(mod.parameters(), lr=0.1)

n_epoch = 1000

for epoch in range(n_epoch+1):
    pred = mod(x_train)
    cost = F.cross_entropy(pred, y_train)
    
    opt.zero_grad()
    cost.backward()
    opt.step()
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch:4d}/{n_epoch} Cost: {cost.item():.6f}')
    

Epoch    0/1000 Cost: 3.763306
Epoch  100/1000 Cost: 0.634510
Epoch  200/1000 Cost: 0.553141
Epoch  300/1000 Cost: 0.499127
Epoch  400/1000 Cost: 0.454847
Epoch  500/1000 Cost: 0.415295
Epoch  600/1000 Cost: 0.378128
Epoch  700/1000 Cost: 0.341821
Epoch  800/1000 Cost: 0.305279
Epoch  900/1000 Cost: 0.268723
Epoch 1000/1000 Cost: 0.242802
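
`nn.Linear(4, 3)` stores its own weight and bias, which is why no `w` or `b` is created by hand here. It is also randomly initialized, so the cost at epoch 0 differs from the zero-initialized runs, where every class gets probability 1/3 and the cost is $\log 3 \approx 1.098612$. A small sketch of what the layer holds and computes, using a hypothetical random input batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

mod = nn.Linear(4, 3)   # 4 input features -> 3 class scores
x = torch.rand(8, 4)    # hypothetical batch of 8 samples

print(mod.weight.shape)  # torch.Size([3, 4]): stored as (out_features, in_features)
print(mod.bias.shape)    # torch.Size([3]): one bias per class

# The layer computes x @ weight.T + bias
manual = x.matmul(mod.weight.t()) + mod.bias
print(torch.allclose(mod(x), manual, atol=1e-6))
```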


In [8]:
# implement with class 
# model
class SoftmaxClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4,3)
    
    def forward(self,x):
        return self.linear(x)
    
mod = SoftmaxClassifier()

In [10]:
opt = optim.SGD(mod.parameters(), lr=0.1)
n_epoch = 1000

for epoch in range(n_epoch+1):
    pred = mod(x_train)
    cost = F.cross_entropy(pred, y_train)
    
    opt.zero_grad()
    cost.backward()
    opt.step()
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch:4d}/{n_epoch} Cost: {cost.item():.6f}')

Epoch    0/1000 Cost: 0.242658
Epoch  100/1000 Cost: 0.230196
Epoch  200/1000 Cost: 0.219127
Epoch  300/1000 Cost: 0.209044
Epoch  400/1000 Cost: 0.199820
Epoch  500/1000 Cost: 0.191348
Epoch  600/1000 Cost: 0.183541
Epoch  700/1000 Cost: 0.176324
Epoch  800/1000 Cost: 0.169633
Epoch  900/1000 Cost: 0.163413
Epoch 1000/1000 Cost: 0.157618
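
Once trained, the model's raw scores can be turned into class predictions. The following is a sketch under the same data and hyperparameters as above, not part of the original notebook: it retrains a small `nn.Linear` model so that the block is self-contained, applies softmax to get probabilities, and takes the argmax as the predicted class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

x_train = torch.FloatTensor([[1, 2, 1, 1], [2, 1, 3, 2], [3, 1, 3, 4], [4, 1, 5, 5],
                             [1, 7, 5, 5], [1, 2, 5, 6], [1, 6, 6, 6], [1, 7, 7, 7]])
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

mod = nn.Linear(4, 3)
opt = optim.SGD(mod.parameters(), lr=0.1)
for _ in range(1000):
    cost = F.cross_entropy(mod(x_train), y_train)
    opt.zero_grad()
    cost.backward()
    opt.step()

with torch.no_grad():                       # no gradients needed for inference
    probs = F.softmax(mod(x_train), dim=1)  # (8, 3) class probabilities
    pred = probs.argmax(dim=1)              # most probable class per sample
    acc = (pred == y_train).float().mean()  # fraction of training samples predicted correctly

print(pred, acc.item())
```

Because softmax preserves the ordering of the scores, `mod(x_train).argmax(dim=1)` gives the same predictions; softmax is only needed when the probabilities themselves are of interest.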
