<a href="https://colab.research.google.com/github/guebin/DL2025/blob/main/posts/05wk-1.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" style="text-align: left"></a>

# 1. Lecture Video

{{<video https://youtu.be/playlist?list=PLQqh36zP38-yx6DQsLACqw8pWm0udv8Jm&si=in1eMD0-wU49y7mS >}}

# 2. Imports

In [5]:
import torch
import torchvision
import fastai.data.external
import matplotlib.pyplot as plt

In [6]:
plt.rcParams['figure.figsize'] = (4.5, 3.0)

# 3. Coding Patterns for Data Analysis

## A. train/val/test 

`-` Step1: Data preparation

In [736]:
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True)
to_tensor = torchvision.transforms.ToTensor()
X0 = torch.stack([to_tensor(img) for img, lbl in train_dataset if lbl==0])
X1 = torch.stack([to_tensor(img) for img, lbl in train_dataset if lbl==1])
X = torch.concat([X0,X1],axis=0).reshape(-1,784)
y = torch.tensor([0.0]*len(X0) + [1.0]*len(X1)).reshape(-1,1)
XX0 = torch.stack([to_tensor(img) for img, lbl in test_dataset if lbl==0])
XX1 = torch.stack([to_tensor(img) for img, lbl in test_dataset if lbl==1])
XX = torch.concat([XX0,XX1],axis=0).reshape(-1,784)
yy = torch.tensor([0.0]*len(XX0) + [1.0]*len(XX1)).reshape(-1,1)

`-` Step2: Set up the learnable objects (including the modeling step)

In [737]:
torch.manual_seed(1)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1),
    torch.nn.Sigmoid()
)
loss_fn = torch.nn.BCELoss()
optimizr = torch.optim.SGD(net.parameters())

`-` Step3: Training (= fitting)

In [738]:
for epoc in range(1,501):
    #----epoch start-----#
    # step1 
    yhat = net(X)
    # step2 
    loss = loss_fn(yhat,y)
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()
    #-----epoch end-----#
    # things to monitor at each epoch.. 
    if (epoc % 50) == 0:
        acc = ((net(X.data) > 0.5) == y.data).float().mean()
        print(f"# of epochs = {epoc},\t acc={acc.item(): .2f}")

# of epochs = 50,	 acc= 0.45
# of epochs = 100,	 acc= 0.66
# of epochs = 150,	 acc= 0.85
# of epochs = 200,	 acc= 0.94
# of epochs = 250,	 acc= 0.97
# of epochs = 300,	 acc= 0.98
# of epochs = 350,	 acc= 0.99
# of epochs = 400,	 acc= 0.99
# of epochs = 450,	 acc= 0.99
# of epochs = 500,	 acc= 0.99


`-` Step4: Prediction & result analysis 

*train acc*

In [739]:
((net(X) > 0.5)*1.0 ==  y).float().mean()

tensor(0.9936)

*test acc*

In [740]:
((net(XX) > 0.5)*1.0 ==  yy).float().mean()

tensor(0.9986)

## B. Using Dropout 

`-` Step1: Data preparation

In [741]:
pass

`-` Step2: Set up the learnable objects (including the modeling step)

In [742]:
torch.manual_seed(1)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1),
    torch.nn.Sigmoid()
)
loss_fn = torch.nn.BCELoss()
optimizr = torch.optim.SGD(net.parameters())

`-` Step3: Training (= fitting)

In [743]:
for epoc in range(1,501):
    net.train()
    #----epoch start-----#
    # step1 
    yhat = net(X)
    # step2 
    loss = loss_fn(yhat,y)
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()
    #-----epoch end-----#
    net.eval()
    # things to monitor at each epoch.. 
    if (epoc % 50) == 0:
        acc = ((net(X.data) > 0.5) == y.data).float().mean()
        print(f"# of epochs = {epoc},\t acc={acc.item(): .2f}")

# of epochs = 50,	 acc= 0.45
# of epochs = 100,	 acc= 0.66
# of epochs = 150,	 acc= 0.84
# of epochs = 200,	 acc= 0.94
# of epochs = 250,	 acc= 0.97
# of epochs = 300,	 acc= 0.98
# of epochs = 350,	 acc= 0.99
# of epochs = 400,	 acc= 0.99
# of epochs = 450,	 acc= 0.99
# of epochs = 500,	 acc= 0.99


`-` Step4: Prediction & result analysis 

*train acc*

In [744]:
((net(X) > 0.5)*1.0 ==  y).float().mean()

tensor(0.9935)

*test acc*

In [745]:
((net(XX) > 0.5)*1.0 ==  yy).float().mean()

tensor(0.9986)

## C. Also Using the GPU 

`-` Step1: Data preparation

In [746]:
pass

`-` Step2: Set up the learnable objects (including the modeling step)

In [747]:
torch.manual_seed(1)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1),
    torch.nn.Sigmoid()
).to("cuda:0")
loss_fn = torch.nn.BCELoss()
optimizr = torch.optim.SGD(net.parameters())

`-` Step3: Training (= fitting)

In [748]:
for epoc in range(1,501):
    net.train()
    #----epoch start-----#
    X = X.to("cuda:0")
    y = y.to("cuda:0")
    # step1 
    yhat = net(X)
    # step2 
    loss = loss_fn(yhat,y)
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()
    #-----epoch end-----#
    net.eval()
    # things to monitor at each epoch.. 
    if (epoc % 50) == 0:
        acc = ((net(X.data) > 0.5) == y.data).float().mean()
        print(f"# of epochs = {epoc},\t acc={acc.item(): .2f}")

# of epochs = 50,	 acc= 0.45
# of epochs = 100,	 acc= 0.66
# of epochs = 150,	 acc= 0.84
# of epochs = 200,	 acc= 0.94
# of epochs = 250,	 acc= 0.97
# of epochs = 300,	 acc= 0.98
# of epochs = 350,	 acc= 0.99
# of epochs = 400,	 acc= 0.99
# of epochs = 450,	 acc= 0.99
# of epochs = 500,	 acc= 0.99


`-` Step4: Prediction & result analysis 

*train acc*

In [749]:
((net(X) > 0.5) ==  y).float().mean()

tensor(0.9934, device='cuda:0')

*test acc*

In [750]:
#((net(XX) > 0.5) ==  yy).float().mean() -- raises an error: net is on cuda:0, but XX and yy are still on the cpu

- Option 1 -- move net back down to the cpu
- Option 2 -- keep net on cuda (= move XX, yy up to cuda)
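Both options follow the same rule: the network's parameters and the input tensors must be on the same device. As a minimal sketch, assuming a helper we call `accuracy_on_model_device` (a hypothetical name, not a PyTorch function), the data can be moved to whichever device the network currently occupies, so one call works whether the network sits on the cpu or on `cuda:0`. The toy network and random data are stand-ins, not the MNIST objects above:

```python
import torch

def accuracy_on_model_device(model, X_in, y_in, threshold=0.5):
    """Hypothetical helper: binary accuracy computed on the device where model's parameters live."""
    device = next(model.parameters()).device   # device of the network's first parameter
    model.eval()                               # turn dropout off for evaluation
    with torch.no_grad():                      # no autograd graph is needed to measure accuracy
        yhat = model(X_in.to(device))          # move the inputs to the network, not the other way round
        return ((yhat > threshold) == y_in.to(device)).float().mean().item()

# Usage with a toy network and random data (stand-ins for net, XX, yy)
torch.manual_seed(0)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
toy_net = torch.nn.Sequential(
    torch.nn.Linear(784, 32), torch.nn.Dropout(0.5), torch.nn.ReLU(),
    torch.nn.Linear(32, 1), torch.nn.Sigmoid(),
).to(device)
X_fake = torch.rand(100, 784)                      # created on the cpu
y_fake = torch.randint(0, 2, (100, 1)).float()     # created on the cpu
acc_fake = accuracy_on_model_device(toy_net, X_fake, y_fake)
print(acc_fake)
```

Wrapping the evaluation in `torch.no_grad()` is a design choice in this sketch: no gradients are needed to measure accuracy, so building an autograd graph would only cost memory.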

In [751]:
XX = XX.to("cuda:0")
yy = yy.to("cuda:0")

In [752]:
((net(XX) > 0.5) ==  yy).float().mean()

tensor(0.9986, device='cuda:0')

## D. Also Using Mini-batches 

`-` Step1: Data preparation

In [753]:
X = X.to("cpu")
y = y.to("cpu")
XX = XX.to("cpu")
yy = yy.to("cpu")

In [754]:
ds = torch.utils.data.TensorDataset(X,y)
dl = torch.utils.data.DataLoader(ds, batch_size=16)

`-` Step2: Set up the learnable objects (including the modeling step)

In [755]:
torch.manual_seed(1)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1),
    torch.nn.Sigmoid()
).to("cuda:0")
loss_fn = torch.nn.BCELoss()
optimizr = torch.optim.SGD(net.parameters())

`-` Step3: Training (= fitting)

In [756]:
for epoc in range(1,3):
    net.train()
    #----epoch start-----#
    for Xi,yi in dl:
        Xi = Xi.to("cuda:0")
        yi = yi.to("cuda:0")
        # step1 
        yi_hat = net(Xi)
        # step2 
        loss = loss_fn(yi_hat,yi)
        # step3     
        loss.backward()
        # step4 
        optimizr.step()
        optimizr.zero_grad()
    #-----epoch end-----#
    net.eval()
    # things to monitor at each epoch..
    # ## Option 1 -- move net down to the cpu
    # net.to("cpu")
    # acc = ((net(X.data) > 0.5) == y.data).float().mean()
    # print(f"# of epochs = {epoc},\t acc={acc.item(): .4f}")    
    # net.to("cuda:0")
    ## Option 2 -- keep net on cuda (count correct predictions batch by batch)
    s = 0 
    for Xi,yi in dl:
        Xi = Xi.to("cuda:0")
        yi = yi.to("cuda:0")
        yi_hat = net(Xi)
        s = s + ((yi_hat.data > 0.5) == yi.data).float().sum()
    acc = s/len(ds) # len(ds) = 12665 training images
    print(f"# of epochs = {epoc},\t acc={acc.item(): .4f}")

# of epochs = 1,	 acc= 0.9738
# of epochs = 2,	 acc= 0.9886


`-` Step4: Prediction & result analysis 

In [757]:
net.to("cpu")

Sequential(
  (0): Linear(in_features=784, out_features=32, bias=True)
  (1): Dropout(p=0.5, inplace=False)
  (2): ReLU()
  (3): Linear(in_features=32, out_features=1, bias=True)
  (4): Sigmoid()
)

*train acc*

In [758]:
((net(X) > 0.5)*1.0 ==  y).float().mean()

tensor(0.9886)

*test acc*

In [759]:
((net(XX) > 0.5)*1.0 ==  yy).float().mean()

tensor(0.9953)

> More and more **non-essential** code keeps piling up (in other words, the code is getting messy) --> this is where the concept of a Trainer comes in 
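The non-essential parts (moving batches to a device, switching between `train()` and `eval()`, the four training steps, and accuracy bookkeeping) are exactly what a Trainer hides. A minimal sketch of that idea, using a hypothetical function `fit` (our own name, not an existing PyTorch API) that bundles the pattern from section D; it assumes a binary problem with a sigmoid output and falls back to the cpu when CUDA is unavailable:

```python
import torch
import torch.utils.data

def fit(model, loss_fn, optimizr, dl, epochs, device="cpu"):
    """Hypothetical Trainer-like helper bundling the pattern of section D (binary case, sigmoid output)."""
    model.to(device)
    history = []
    for epoc in range(1, epochs + 1):
        model.train()                                    # dropout on while learning
        for Xi, yi in dl:
            Xi, yi = Xi.to(device), yi.to(device)
            loss = loss_fn(model(Xi), yi)                # step1 + step2
            loss.backward()                              # step3
            optimizr.step()                              # step4
            optimizr.zero_grad()
        model.eval()                                     # dropout off while measuring
        correct = 0.0
        with torch.no_grad():
            for Xi, yi in dl:
                Xi, yi = Xi.to(device), yi.to(device)
                correct += ((model(Xi) > 0.5) == yi).float().sum().item()
        history.append(correct / len(dl.dataset))        # accuracy over the whole dataset
    return history

# Usage on synthetic data (a stand-in for X, y): the label is 1 when the first feature is positive
torch.manual_seed(1)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
X_syn = torch.randn(200, 5)
y_syn = (X_syn[:, [0]] > 0).float()
dl_syn = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(X_syn, y_syn), batch_size=16)
toy_net = torch.nn.Sequential(
    torch.nn.Linear(5, 8), torch.nn.Dropout(0.5), torch.nn.ReLU(),
    torch.nn.Linear(8, 1), torch.nn.Sigmoid(),
).to(device)                                             # move before building the optimizer
toy_opt = torch.optim.Adam(toy_net.parameters())
history = fit(toy_net, torch.nn.BCELoss(), toy_opt, dl_syn, epochs=20, device=device)
print(history[-1])
```

Full trainer classes in higher-level libraries add much more (callbacks, logging, validation sets). This sketch only shows which pieces of the section D code such a class takes over.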

# 4. Coding Patterns for Binary Classification 

In [199]:
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True)
to_tensor = torchvision.transforms.ToTensor()
X0_train = torch.stack([to_tensor(Xi) for Xi, yi in train_dataset if yi==0])
X1_train = torch.stack([to_tensor(Xi) for Xi, yi in train_dataset if yi==1])
X0_test = torch.stack([to_tensor(Xi) for Xi, yi in test_dataset if yi==0])
X1_test = torch.stack([to_tensor(Xi) for Xi, yi in test_dataset if yi==1])
X = torch.concat([X0_train,X1_train],axis=0).reshape(-1,784)
y = torch.tensor([0.0]*len(X0_train) + [1.0]*len(X1_train)).reshape(-1,1)
XX = torch.concat([X0_test,X1_test],axis=0).reshape(-1,784)
yy = torch.tensor([0.0]*len(X0_test) + [1.0]*len(X1_test)).reshape(-1,1)

## A. Implementing the Sigmoid by Hand 

In [271]:
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1)
)
optimizr = torch.optim.Adam(net.parameters())

In [272]:
for epoc in range(100):
    # step1 
    netout = net(X) 
    yhat = torch.exp(netout) / (1 + torch.exp(netout))
    # step2
    loss = loss_fn(yhat,y) # loss_fn = torch.nn.BCELoss()
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()

In [266]:
(((net(XX) > 0) == yy)*1.0).mean()

tensor(0.9995)
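Two remarks on the hand-written sigmoid, shown as a small sketch on made-up logits. First, thresholding the raw network output at 0 (as in the accuracy cell) gives the same decisions as thresholding the sigmoid at 0.5, because the sigmoid is increasing and $\text{sig}(0)=0.5$. Second, $e^u/(1+e^u)$ overflows for large $u$ in float32 (`torch.exp` returns `inf`, and `inf/inf` is `nan`), whereas `torch.sigmoid` handles the same input:

```python
import torch

logits = torch.tensor([-3.0, -0.1, 0.0, 0.1, 3.0, 100.0])

manual = torch.exp(logits) / (1 + torch.exp(logits))   # formula used in the training loop above
builtin = torch.sigmoid(logits)                         # PyTorch's built-in sigmoid

print(manual)    # last entry is nan: exp(100.) overflows to inf in float32
print(builtin)   # last entry is 1.0

# Thresholding logits at 0 gives the same decisions as thresholding probabilities at 0.5
print(torch.equal(logits > 0, builtin > 0.5))
```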

## B. `BCEWithLogitsLoss`

In [375]:
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1),
)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizr = torch.optim.Adam(net.parameters())

In [376]:
for epoc in range(100):
    # step1 
    netout = net(X)
    # step2
    loss = loss_fn(netout,y)
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()

In [377]:
(((net(XX) > 0) == yy)*1.0).mean()

tensor(0.9995)
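`BCEWithLogitsLoss` fuses the sigmoid and the BCE loss into one function, which is why the network above has no `Sigmoid()` layer and why the accuracy cell thresholds the raw output at 0. A small sketch on made-up logits: for moderate values the two routes agree, but for an extreme, confidently wrong logit, `BCELoss` applied to `sigmoid(u)` runs into its documented clamp of the log at $-100$, while `BCEWithLogitsLoss` still returns $\log(1+e^{200}) \approx 200$:

```python
import torch

bce_prob = torch.nn.BCELoss()             # expects probabilities
bce_logits = torch.nn.BCEWithLogitsLoss() # expects raw logits, applies the sigmoid internally

# Moderate logits: sigmoid + BCELoss and BCEWithLogitsLoss agree
logits = torch.tensor([[-1.0], [0.0], [2.0]])
target = torch.tensor([[0.0], [1.0], [1.0]])
print(bce_prob(torch.sigmoid(logits), target), bce_logits(logits, target))

# Extreme logit with the wrong label: sigmoid(200.) rounds to 1.0 in float32,
# so BCELoss sees log(1 - 1) = log(0) and clamps it to -100
u_big = torch.tensor([[200.0]])
y_zero = torch.tensor([[0.0]])
print(bce_prob(torch.sigmoid(u_big), y_zero))   # clamped: 100
print(bce_logits(u_big, y_zero))                # exact: 200
```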

## C. `log_sigmoid`

In [378]:
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,1),
)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizr = torch.optim.Adam(net.parameters())

In [379]:
for epoc in range(100):
    # step1 
    netout = net(X)
    # step2
    loss = loss_fn(netout,y)
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()

In [377]:
(((net(XX) > 0) == yy)*1.0).mean()

tensor(0.9995)

In [381]:
torch.exp(torch.nn.functional.logsigmoid(netout))

tensor([[7.5931e-04],
        [9.4517e-04],
        [2.2128e-03],
        ...,
        [9.7442e-01],
        [9.7977e-01],
        [9.3523e-01]], grad_fn=<ExpBackward0>)

In [386]:
torch.nn.functional.nll_loss(torch.exp(torch.nn.functional.logsigmoid(netout)),y.flatten().long())

IndexError: Target 1 is out of bounds.
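The error arises because `nll_loss` expects an `(n, C)` tensor of log-probabilities together with class indices in `0..C-1`, while `netout` has a single column (C = 1), so target 1 is out of range. The argument passed was also `exp(logsigmoid(...))`, i.e. probabilities rather than log-probabilities. A sketch of how `logsigmoid` can express the BCE loss, assuming made-up logits: since $1-\text{sig}(u)=\text{sig}(-u)$, stacking `logsigmoid(-u)` and `logsigmoid(u)` yields a two-column log-probability matrix, and both the hand-written BCE and `nll_loss` then reproduce `BCEWithLogitsLoss`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(6, 1)                       # made-up logits, shape (n, 1)
target = torch.tensor([[0.], [1.], [1.], [0.], [1.], [0.]])

# BCE written with logsigmoid: log sig(u) for y=1, log(1 - sig(u)) = log sig(-u) for y=0
loss_manual = -(target * F.logsigmoid(logits) + (1 - target) * F.logsigmoid(-logits)).mean()

# nll_loss needs (n, C) log-probabilities: column 0 = log P(y=0), column 1 = log P(y=1)
logp = torch.cat([F.logsigmoid(-logits), F.logsigmoid(logits)], dim=1)
loss_nll = F.nll_loss(logp, target.flatten().long())

loss_ref = torch.nn.BCEWithLogitsLoss()(logits, target)
print(loss_manual, loss_nll, loss_ref)           # all three agree up to rounding
```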

## D. Overparameterization (2 Output Units)

In [302]:
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,2),
)
bce = torch.nn.BCEWithLogitsLoss()
ce = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())

In [303]:
netout = net(X)

In [304]:
1 / (1+torch.exp(-netout)) # attempt 1: element-wise sigmoid, so the two columns do not sum to 1

tensor([[0.4998, 0.4781],
        [0.4855, 0.4858],
        [0.5133, 0.4771],
        ...,
        [0.5123, 0.4821],
        [0.5066, 0.4721],
        [0.5238, 0.4855]], grad_fn=<MulBackward0>)

In [345]:
yhat = torch.exp(netout)/ (torch.exp(netout).sum(axis=1)).reshape(-1,1)
yhat

tensor([[0.5216, 0.4784],
        [0.4997, 0.5003],
        [0.5362, 0.4638],
        ...,
        [0.5302, 0.4698],
        [0.5345, 0.4655],
        [0.5383, 0.4617]], grad_fn=<DivBackward0>)

In [374]:
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,2),
)
bce = torch.nn.BCELoss()
ce = torch.nn.CrossEntropyLoss()
netout = net(X)
yhat = torch.exp(netout)/ (torch.exp(netout).sum(axis=1)).reshape(-1,1)
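# NOTE: yhat[:,[0]] is the probability of class 0, so pairing it with y (=1 for class 1) swaps the labels; this is why the two printed values differ
print((bce(yhat[:,[0]],y) + bce(yhat[:,[1]],(y == 0)*1.0))/2)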
#print(ce(netout,torch.nn.functional.one_hot(y.flatten().long()).float()))
print(ce(netout,y.flatten().long()))

tensor(0.6868, grad_fn=<DivBackward0>)
tensor(0.7154, grad_fn=<NllLossBackward0>)
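With two softmax columns, `yhat[:,[1]]` is the probability of class 1 and `yhat[:,[0]] = 1 - yhat[:,[1]]`. Pairing column 1 with `y` (and column 0 with `1-y`) makes the averaged BCE coincide with `CrossEntropyLoss`; the cell above pairs column 0 with `y`, which swaps the labels. A self-contained check on made-up logits and labels:

```python
import torch

torch.manual_seed(0)
logits2 = torch.randn(8, 2)                               # made-up 2-column logits
target = torch.tensor([0., 1., 1., 0., 1., 0., 0., 1.]).reshape(-1, 1)

probs = torch.softmax(logits2, dim=1)                     # rows sum to 1
bce_fn = torch.nn.BCELoss()
ce_fn = torch.nn.CrossEntropyLoss()

paired = (bce_fn(probs[:, [1]], target) + bce_fn(probs[:, [0]], 1 - target)) / 2   # column 1 <-> class 1
swapped = (bce_fn(probs[:, [0]], target) + bce_fn(probs[:, [1]], 1 - target)) / 2  # labels swapped
ce_loss = ce_fn(logits2, target.flatten().long())

print(paired, ce_loss)   # equal up to rounding
print(swapped)           # generally a different number
```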


In [364]:
torch.nn.functional.one_hot(y.flatten().long()).float()

tensor([[1., 0.],
        [1., 0.],
        [1., 0.],
        ...,
        [0., 1.],
        [0., 1.],
        [0., 1.]])

In [363]:
netout

tensor([[-0.2014,  0.0705],
        [-0.2163,  0.1049],
        [-0.1852,  0.1187],
        ...,
        [-0.1563,  0.1046],
        [-0.1777,  0.1110],
        [-0.1451,  0.1024]], grad_fn=<AddmmBackward0>)

In [176]:
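optimizr = torch.optim.Adam(net.parameters()) # net was re-created above, so the optimizer must be re-created too
for epoc in range(100):
    # step1 
    netout = net(X)
    # step2: 2-column logits + class labels -> CrossEntropyLoss
    loss = ce(netout,y.flatten().long())
    # step3     
    loss.backward()
    # step4 
    optimizr.step()
    optimizr.zero_grad()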

In [182]:
((net(XX).argmax(axis=1) == yy.flatten())*1.0).mean()

tensor(0.9995)

# 5. Multiclass Classification

## A. Conclusion (just memorize this)

`-` What if we need to distinguish $k$ classes instead of 2? 

***General concepts*** 

- Loss function: BCE loss $\to$ Cross Entropy loss 
- Linear transform of the last layer: torch.nn.Linear(?,1) $\to$ torch.nn.Linear(?,k) 
- **Activation of the last layer: sig $\to$ softmax**

***PyTorch-specific*** 

- **Form of y: either an (n,) vector of integer (int64) class labels, or an (n,k) one-hot encoded matrix of float type**
- Loss function: torch.nn.BCEWithLogitsLoss $\to$ torch.nn.CrossEntropyLoss
- Linear transform of the last layer: torch.nn.Linear(?,1) $\to$ torch.nn.Linear(?,k) 
- **Activation of the last layer: None $\to$ None (the loss function already includes the last layer's activation)**
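The PyTorch-specific recipe can be sketched with random stand-in data (the sizes `n`, `k` and the hidden width are arbitrary assumptions): the last layer outputs `k` raw scores, no activation is added, and `y` may be either an `(n,)` tensor of int64 class labels or an `(n,k)` one-hot float matrix, both giving the same loss:

```python
import torch

torch.manual_seed(0)
n, k = 10, 3
X_demo = torch.randn(n, 784)
y_demo = torch.randint(0, k, (n,))                 # (n,) class labels, dtype int64

net_demo = torch.nn.Sequential(
    torch.nn.Linear(784, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, k),                        # k outputs, no Softmax layer
)
ce_demo = torch.nn.CrossEntropyLoss()              # softmax is applied inside the loss

netout_demo = net_demo(X_demo)                     # shape (n, k)
loss_idx = ce_demo(netout_demo, y_demo)
loss_onehot = ce_demo(netout_demo, torch.nn.functional.one_hot(y_demo, num_classes=k).float())
pred = netout_demo.argmax(dim=1)                   # predicted class per observation
print(netout_demo.shape, loss_idx, loss_onehot)
```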

## B. Practice: Distinguishing 3 Classes 

`-` Cleaned-up code 1: code written by someone who is good at statistics but not at PyTorch 

In [None]:
## Step1: Data preparation 
path = fastai.data.external.untar_data('https://s3.amazonaws.com/fast-ai-imageclas/mnist_png.tgz')
X0 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/0').ls()])
X1 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/1').ls()])
X2 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/2').ls()])
X = torch.concat([X0,X1,X2]).reshape(-1,1*28*28)/255
y = torch.nn.functional.one_hot(torch.tensor([0]*len(X0) + [1]*len(X1)+ [2]*len(X2))).float()
## Step2: Create the learnable objects
torch.manual_seed(43052)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,3),
#    torch.nn.Softmax()
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())
## Step3: Fit 
for epoc in range(100):
    ## step1 
    netout = net(X)
    ## step2 
    loss = loss_fn(netout,y)
    ## step3 
    loss.backward()
    ## step4 
    optimizr.step()
    optimizr.zero_grad()
    
## Step4: Check the fit (accuracy on the training data)
(netout.argmax(axis=1) == y.argmax(axis=1)).float().mean()

tensor(0.9827)

`-` Cleaned-up code 2: code written by someone who is good at PyTorch 

In [None]:
## Step1: Data preparation 
path = fastai.data.external.untar_data('https://s3.amazonaws.com/fast-ai-imageclas/mnist_png.tgz')
X0 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/0').ls()])
X1 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/1').ls()])
X2 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/2').ls()])
X = torch.concat([X0,X1,X2]).reshape(-1,1*28*28)/255
#y = torch.nn.functional.one_hot(torch.tensor([0]*len(X0) + [1]*len(X1)+ [2]*len(X2))).float()
y = torch.tensor([0]*len(X0) + [1]*len(X1)+ [2]*len(X2))
## Step2: Create the learnable objects
torch.manual_seed(43052)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,3),
#    torch.nn.Softmax()
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())
## Step3: Fit 
for epoc in range(100):
    ## step1 
    netout = net(X)
    ## step2 
    loss = loss_fn(netout,y)
    ## step3 
    loss.backward()
    ## step4 
    optimizr.step()
    optimizr.zero_grad()
## Step4: Check the fit (accuracy on the training data)
(netout.argmax(axis=1) == y).float().mean()

tensor(0.9827)

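- It is exactly the same code; only the form of `y` differs (class labels instead of a one-hot matrix), and the accuracy is identical. 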

## C. Softmax 

`-` Observation: the numbers right before softmax is applied have shape (n,k). Each observation has k numbers, and one of them is noticeably larger than the others. 

In [None]:
net(X)

tensor([[ 4.4836, -4.5924, -3.4632],
        [ 1.9839, -3.4456,  0.3030],
        [ 5.9082, -7.5250, -0.7634],
        ...,
        [-0.8089, -0.8294,  0.6012],
        [-2.1901, -0.4458,  0.7465],
        [-1.6856, -2.2825,  5.1892]], grad_fn=<AddmmBackward0>)

In [None]:
y

tensor([0, 0, 0,  ..., 2, 2, 2])
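In the rows shown, the position of the largest score matches `y`. Since `exp` is increasing and the softmax denominator is shared within a row, the largest raw score is also the largest softmax probability, so `argmax` can be applied directly to `netout`, as the accuracy cells in section B do. A check on the six rows printed above:

```python
import torch

# First three and last three rows of net(X) as printed above
scores = torch.tensor([[ 4.4836, -4.5924, -3.4632],
                       [ 1.9839, -3.4456,  0.3030],
                       [ 5.9082, -7.5250, -0.7634],
                       [-0.8089, -0.8294,  0.6012],
                       [-2.1901, -0.4458,  0.7465],
                       [-1.6856, -2.2825,  5.1892]])
labels = torch.tensor([0, 0, 0, 2, 2, 2])   # the matching entries of the printed y

probs = torch.softmax(scores, dim=1)
print(scores.argmax(dim=1))                                    # predicted class from raw scores
print(torch.equal(scores.argmax(dim=1), probs.argmax(dim=1)))  # softmax does not change the argmax
```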

`-` Formulas 

- $\text{sig}(u)=\frac{e^u}{1+e^u}$
- $\text{softmax}({\boldsymbol u})=\text{softmax}([u_1,u_2,\dots,u_k])=\big[ \frac{e^{u_1}}{e^{u_1}+\dots+e^{u_k}},\dots,\frac{e^{u_k}}{e^{u_1}+\dots+e^{u_k}}\big]$
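The formula translates directly into code. A sketch (the function name `softmax_rows` is our own, not a PyTorch API) that normalizes each row; subtracting the row maximum first leaves the result unchanged, because the common factor $e^{-\max}$ cancels in the ratio, but it keeps `exp` from overflowing. It also checks that the sigmoid is the 2-class special case, $\text{sig}(u)=\frac{e^u}{e^0+e^u}=\text{softmax}([0,u])_2$:

```python
import torch

def softmax_rows(u):
    """Row-wise softmax of an (n, k) tensor, computed from the formula."""
    u = u - u.max(dim=1, keepdim=True).values   # stability shift; cancels in the ratio
    e = torch.exp(u)
    return e / e.sum(dim=1, keepdim=True)       # each row now sums to 1

scores = torch.tensor([[-2.0, -2.0, 0.0],
                       [3.14, 3.14, 3.14],
                       [1000.0, 0.0, 0.0]])      # last row would overflow without the shift
p = softmax_rows(scores)
print(p)
print(torch.allclose(p, torch.softmax(scores, dim=1)))

# sigmoid as a 2-class softmax with the first score fixed at 0
u = torch.tensor([-1.5, 0.0, 2.0])
two_col = torch.stack([torch.zeros_like(u), u], dim=1)
print(torch.allclose(softmax_rows(two_col)[:, 1], torch.sigmoid(u)))
```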

`-` Computing torch.nn.Softmax() by hand 

(Example 1) -- computed the wrong way (dim=0 normalizes each column across observations) 

In [64]:
softmax = torch.nn.Softmax(dim=0)

In [65]:
netout = torch.tensor([[-2.0,-2.0,0.0],
                        [3.14,3.14,3.14],
                        [0.0,0.0,2.0],
                        [2.0,2.0,4.0],
                        [0.0,0.0,0.0]])
netout

tensor([[-2.0000, -2.0000,  0.0000],
        [ 3.1400,  3.1400,  3.1400],
        [ 0.0000,  0.0000,  2.0000],
        [ 2.0000,  2.0000,  4.0000],
        [ 0.0000,  0.0000,  0.0000]])

In [66]:
softmax(netout) 

tensor([[0.0041, 0.0041, 0.0115],
        [0.7081, 0.7081, 0.2653],
        [0.0306, 0.0306, 0.0848],
        [0.2265, 0.2265, 0.6269],
        [0.0306, 0.0306, 0.0115]])

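(Example 2) -- this is the correct computation (dim=1 normalizes across the k scores of each observation, so each row sums to 1) 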

In [67]:
softmax = torch.nn.Softmax(dim=1)

In [68]:
netout

tensor([[-2.0000, -2.0000,  0.0000],
        [ 3.1400,  3.1400,  3.1400],
        [ 0.0000,  0.0000,  2.0000],
        [ 2.0000,  2.0000,  4.0000],
        [ 0.0000,  0.0000,  0.0000]])

In [69]:
softmax(netout)

tensor([[0.1065, 0.1065, 0.7870],
        [0.3333, 0.3333, 0.3333],
        [0.1065, 0.1065, 0.7870],
        [0.1065, 0.1065, 0.7870],
        [0.3333, 0.3333, 0.3333]])

(Example 3) -- if the dimension is not specified, the result is computed correctly but a warning is issued 

In [70]:
softmax = torch.nn.Softmax()

In [71]:
netout

tensor([[-2.0000, -2.0000,  0.0000],
        [ 3.1400,  3.1400,  3.1400],
        [ 0.0000,  0.0000,  2.0000],
        [ 2.0000,  2.0000,  4.0000],
        [ 0.0000,  0.0000,  0.0000]])

In [72]:
softmax(netout)


tensor([[0.1065, 0.1065, 0.7870],
        [0.3333, 0.3333, 0.3333],
        [0.1065, 0.1065, 0.7870],
        [0.1065, 0.1065, 0.7870],
        [0.3333, 0.3333, 0.3333]])

(Example 4) -- truly by hand 

In [73]:
netout 

tensor([[-2.0000, -2.0000,  0.0000],
        [ 3.1400,  3.1400,  3.1400],
        [ 0.0000,  0.0000,  2.0000],
        [ 2.0000,  2.0000,  4.0000],
        [ 0.0000,  0.0000,  0.0000]])

In [74]:
torch.exp(netout)

tensor([[ 0.1353,  0.1353,  1.0000],
        [23.1039, 23.1039, 23.1039],
        [ 1.0000,  1.0000,  7.3891],
        [ 7.3891,  7.3891, 54.5981],
        [ 1.0000,  1.0000,  1.0000]])

In [75]:
0.1353/(0.1353 + 0.1353 + 1.0000), 0.1353/(0.1353 + 0.1353 + 1.0000), 1.0000/(0.1353 + 0.1353 + 1.0000) # first obs

(0.10648512513773022, 0.10648512513773022, 0.7870297497245397)

In [76]:
torch.exp(netout[1])/torch.exp(netout[1]).sum() # second obs 

tensor([0.3333, 0.3333, 0.3333])

## D. CrossEntropyLoss

`-` Formulas 
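For reference, the losses computed by the "Method 1" cells in this section can be written as follows, with $n$ observations, a logit $u_i$ per observation in the binary case, a score vector ${\boldsymbol u}_i$ with one-hot labels $y_{ij}$ in the $k$-class case. This is a summary consistent with those cells, not a formula taken from elsewhere in the notes:

$$
\text{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\log \hat y_i + (1-y_i)\log(1-\hat y_i)\Big],\qquad \hat y_i = \text{sig}(u_i)
$$

$$
\text{CE} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{k} y_{ij}\log \hat y_{ij},\qquad \hat{\boldsymbol y}_i = \text{softmax}({\boldsymbol u}_i)
$$

With $k=2$, the CE term for each observation is exactly the BCE term with $\hat y_i=\hat y_{i2}$.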

***`# 2 categories`***

`-` Example 1: BCELoss vs BCEWithLogitsLoss

In [77]:
y = torch.tensor([0,0,1]).reshape(-1,1).float()
netout = torch.tensor([-1, 0, 1]).reshape(-1,1).float()
y,netout

(tensor([[0.],
         [0.],
         [1.]]),
 tensor([[-1.],
         [ 0.],
         [ 1.]]))

In [78]:
# Method 1: the memorized formula
sig = torch.nn.Sigmoid()
yhat = sig(netout)
- torch.sum(torch.log(yhat)*y + torch.log(1-yhat)*(1-y))/3

tensor(0.4399)

In [79]:
# Method 2: using torch.nn.BCELoss()
sig = torch.nn.Sigmoid()
yhat = sig(netout)
loss_fn = torch.nn.BCELoss()
loss_fn(yhat,y)

tensor(0.4399)

In [80]:
# Method 3: using torch.nn.BCEWithLogitsLoss()
loss_fn = torch.nn.BCEWithLogitsLoss()
loss_fn(netout,y)

tensor(0.4399)

`-` Example 2: BCEWithLogitsLoss vs CrossEntropyLoss

In [81]:
torch.concat([sig(netout),1-sig(netout)],axis=1)

tensor([[0.2689, 0.7311],
        [0.5000, 0.5000],
        [0.7311, 0.2689]])

In [82]:
netout = torch.tensor([[3,2],[2,2],[5,6]]).float()
y = torch.tensor([[1,0],[1,0],[0,1]]).float()
y,netout #,netout[:,[1]]-netout[:,[0]]

(tensor([[1., 0.],
         [1., 0.],
         [0., 1.]]),
 tensor([[3., 2.],
         [2., 2.],
         [5., 6.]]))

In [83]:
softmax(netout)

tensor([[0.7311, 0.2689],
        [0.5000, 0.5000],
        [0.2689, 0.7311]])

In [84]:
# Method 1: the memorized formula
-torch.sum(torch.log(softmax(netout))*y)/3

tensor(0.4399)

In [85]:
# Method 2: using torch.nn.CrossEntropyLoss() + y arranged as one-hot
loss_fn = torch.nn.CrossEntropyLoss()
loss_fn(netout,y)

tensor(0.4399)

In [86]:
# Method 3: using torch.nn.CrossEntropyLoss() + y arranged as class labels 0,1
loss_fn = torch.nn.CrossEntropyLoss()
loss_fn(netout,y.argmax(axis=1))

tensor(0.4399)

`#`

***`# 3 categories`***

In [87]:
y = torch.tensor([2,1,2,2,0])
y_onehot = torch.nn.functional.one_hot(y)
netout = torch.tensor(
    [[-2.0000, -2.0000,  0.0000],
     [ 3.1400,  3.1400,  3.1400],
     [ 0.0000,  0.0000,  2.0000],
     [ 2.0000,  2.0000,  4.0000],
     [ 0.0000,  0.0000,  0.0000]]
)
y,y_onehot

(tensor([2, 1, 2, 2, 0]),
 tensor([[0, 0, 1],
         [0, 1, 0],
         [0, 0, 1],
         [0, 0, 1],
         [1, 0, 0]]))

In [88]:
## Method 1 -- not recommended
loss_fn = torch.nn.CrossEntropyLoss()
loss_fn(netout,y_onehot.float())

tensor(0.5832)

In [89]:
## Method 2 -- recommended
loss_fn = torch.nn.CrossEntropyLoss()
loss_fn(netout,y)

tensor(0.5832)

In [90]:
## Method 3 -- the formula.. (surely nobody actually uses this?)
softmax = torch.nn.Softmax(dim=1) 
- torch.sum(torch.log(softmax(netout))*y_onehot)/5

tensor(0.5832)

`#`

`-` Knowing the formula matters, but it is more important to confirm that torch.nn.CrossEntropyLoss() already includes the softmax activation. 

`-` Honestly, a name along the lines of torch.nn.CEWithSoftmaxLoss() would describe torch.nn.CrossEntropyLoss() better. 
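One way to confirm that the softmax is already inside the loss: `CrossEntropyLoss` gives the same value as `log_softmax` followed by `nll_loss` (negative log-likelihood of the true class), while putting an extra softmax in front of it applies softmax twice and changes the result. A sketch reusing the 3-category example values:

```python
import torch
import torch.nn.functional as F

logits3 = torch.tensor([[-2.0, -2.0, 0.0],
                        [3.14, 3.14, 3.14],
                        [0.0, 0.0, 2.0],
                        [2.0, 2.0, 4.0],
                        [0.0, 0.0, 0.0]])
labels3 = torch.tensor([2, 1, 2, 2, 0])

ce_value = torch.nn.CrossEntropyLoss()(logits3, labels3)
two_step = F.nll_loss(F.log_softmax(logits3, dim=1), labels3)   # softmax -> log -> pick true class -> mean
print(ce_value, two_step)      # both match the 0.5832 printed above

# A Softmax layer in front of CrossEntropyLoss means softmax is applied twice
double = torch.nn.CrossEntropyLoss()(torch.softmax(logits3, dim=1), labels3)
print(double)                  # a different, larger number
```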

## E. Minor Topic: Binary Classification and CrossEntropy

`-` Couldn't we also use CrossEntropy when there are only 2 classes? 

In [91]:
## Step1: Data preparation 
path = fastai.data.external.untar_data(fastai.data.external.URLs.MNIST)
X0 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/0').ls()])
X1 = torch.stack([torchvision.io.read_image(str(fname)) for fname in (path/'training/1').ls()])
X = torch.concat([X0,X1]).reshape(-1,1*28*28)/255
y = torch.tensor([0]*len(X0) + [1]*len(X1))
## Step2: Create the learnable objects
torch.manual_seed(43052)
net = torch.nn.Sequential(
    torch.nn.Linear(784,32),
    torch.nn.ReLU(),
    torch.nn.Linear(32,2),
    #torch.nn.Softmax()
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())
## Step3: fit  
for epoc in range(70): 
    ## 1 
    ## 2 
    loss= loss_fn(net(X),y) 
    ## 3 
    loss.backward()
    ## 4 
    optimizr.step()
    optimizr.zero_grad() 
## Step4: Predict 
softmax = torch.nn.Softmax()
(net(X).argmax(axis=1) == y).float().mean()


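`-` Binary classification = predicting "y=0 or y=1" = predicting success or failure = estimating the probability of success and the probability of failure 

`-` softmax, sigmoid

- softmax: the output has the form (probability of failure, probability of success) // softmax estimates both the probability of failure and the probability of success. 
- sigmoid: the output has the form (probability of success) // sigmoid estimates only the probability of success. 

`-` But since "probability of failure = 1 - probability of success", the two are effectively estimating the same thing (once the probability of success is estimated, the probability of failure follows automatically). 

`-` In other words, the following two models have the same expressive power. 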

![](https://raw.githubusercontent.com/guebin/DL2024/cdbdf23589efc2198260ab9c749f1757f67a128d/posts/05wk-1-fig2.svg)

![](https://raw.githubusercontent.com/guebin/DL2024/cdbdf23589efc2198260ab9c749f1757f67a128d/posts/05wk-1-fig1.svg)

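`-` The two models have the same expressive power, but the sigmoid version has fewer parameters to learn $\to$ the sigmoid model is cheaper and just as effective $\to$ for binary classification specifically, use sigmoid rather than softmax. 

- This may suddenly make softmax look quite bad, but softmax has one clear advantage: it extends easily to $k$ classes, whereas sigmoid does not extend to a probability distribution over $k$ classes.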
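The "same expressive power, fewer parameters" claim can be checked directly. A sketch, assuming the 784 → 32 → output architecture used in this section: the 2-output network has 33 more parameters, and its class-1 probability depends only on the difference of its two scores, since $\text{softmax}([u_1,u_2])_2 = \text{sig}(u_2-u_1)$. Copying the shared hidden layer and using the difference of the two output rows as the single output row reproduces the softmax model exactly with a sigmoid model:

```python
import torch

def n_params(model):
    return sum(p.numel() for p in model.parameters())

sig_net = torch.nn.Sequential(torch.nn.Linear(784, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
soft_net = torch.nn.Sequential(torch.nn.Linear(784, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
print(n_params(sig_net), n_params(soft_net))   # 25153 vs 25186

# The 2-class softmax only depends on the difference of the two scores
torch.manual_seed(0)
u = torch.randn(5, 2)
p_softmax = torch.softmax(u, dim=1)[:, 1]
p_sigmoid = torch.sigmoid(u[:, 1] - u[:, 0])
print(torch.allclose(p_softmax, p_sigmoid))

# Copy soft_net into sig_net: same hidden layer, output row = (row for class 1) - (row for class 0)
with torch.no_grad():
    sig_net[0].weight.copy_(soft_net[0].weight)
    sig_net[0].bias.copy_(soft_net[0].bias)
    W, b = soft_net[2].weight, soft_net[2].bias
    sig_net[2].weight.copy_(W[[1]] - W[[0]])
    sig_net[2].bias.copy_(b[[1]] - b[[0]])
x = torch.rand(4, 784)
print(torch.allclose(torch.sigmoid(sig_net(x)), torch.softmax(soft_net(x), dim=1)[:, [1]], atol=1e-5))
```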

## F. Summary 

`-` Conclusion 

1. Softmax is a generalization of sigmoid. 
2. With 2 classes, use the (Sigmoid, BCELoss) combination; with more than 2 classes, use (Softmax, CrossEntropyLoss). 


`-` That said, using (Softmax, CrossEntropyLoss) with 2 classes is not a big problem. (It just feels a bit inefficient, like printing a black-and-white image with color ink.) 

***Note***

|$y$|Distributional assumption|Activation of the last layer|Loss function|
|:--:|:--:|:--:|:--:|
|3.45, 4.43, ... (continuous) |Normal distribution|None (or Identity)|MSE|
|0 or 1|Binomial with $n=1$ (= Bernoulli) |Sigmoid| BCE|
|[0,0,1], [0,1,0], [1,0,0]| Multinomial with $n=1$|Softmax| Cross Entropy |
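Each row of the table pairs a distributional assumption with a loss that is, up to constants, its negative log-likelihood. A sketch using `torch.distributions` on made-up numbers: the Bernoulli and Categorical negative log-likelihoods match BCE and Cross Entropy exactly (Categorical is the class-label form of the multinomial with $n=1$), and the Normal one with standard deviation 1 equals half the MSE plus the constant $\tfrac12\log(2\pi)$:

```python
import math
import torch
from torch.distributions import Bernoulli, Categorical, Normal

torch.manual_seed(0)

# Continuous y, Normal with sd 1: NLL = 0.5 * MSE + 0.5 * log(2*pi)
y_cont, mu = torch.randn(6), torch.randn(6)
nll_normal = -Normal(mu, 1.0).log_prob(y_cont).mean()
mse = torch.nn.MSELoss()(mu, y_cont)
print(torch.allclose(nll_normal, 0.5 * mse + 0.5 * math.log(2 * math.pi), atol=1e-6))

# 0/1 y, Bernoulli parameterized by logits: NLL = BCE (with logits)
u = torch.randn(6)
y01 = torch.tensor([0., 1., 1., 0., 1., 0.])
nll_bern = -Bernoulli(logits=u).log_prob(y01).mean()
print(torch.allclose(nll_bern, torch.nn.BCEWithLogitsLoss()(u, y01), atol=1e-6))

# class-label y, Categorical parameterized by logits: NLL = Cross Entropy
scores = torch.randn(6, 3)
y_cls = torch.tensor([2, 1, 0, 2, 1, 0])
nll_cat = -Categorical(logits=scores).log_prob(y_cls).mean()
print(torch.allclose(nll_cat, torch.nn.CrossEntropyLoss()(scores, y_cls), atol=1e-6))
```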