# 7. Wide & Deep (Wider and Deeper)
***

- Wide: predict the output ($y$) from more inputs ($x$ values)
- Deep: predict the output ($y$) through more linear layers and activation functions

## 7.1 Wide (Going Wider)

Why go wide: to make the output prediction more accurate and reliable, the model needs to consider more variables (independent variables, $x$ values, inputs).  
- Matrix multiplication: $XW = \hat{Y}$
```python
import torch

x_data = torch.tensor([[2.1, 0.1],   # 4x2 input matrix
                       [4.2, 0.8],
                       [3.1, 0.9],
                       [3.3, 0.2]])
y_data = torch.tensor([[0.0],        # 4x1 output matrix
                       [1.0],
                       [0.0],
                       [1.0]])
linear = torch.nn.Linear(2, 1)  # predict 1 output from 2 inputs
y_pred = linear(x_data)         # shape: 4x1
```
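To make the shapes in $XW = \hat{Y}$ concrete, here is a minimal sketch of the same multiplication in plain Python. The weight values and the bias `b` are hypothetical, chosen only for illustration; `torch.nn.Linear` initializes its own parameters randomly, stores its weight as an (out, in) matrix, and adds a bias term that the formula above leaves out.

```python
# Hypothetical weights and bias, chosen only for illustration.
X = [[2.1, 0.1],
     [4.2, 0.8],
     [3.1, 0.9],
     [3.3, 0.2]]          # 4x2 input matrix
W = [[0.5],
     [-1.0]]              # 2x1 weight matrix
b = 0.1                   # scalar bias, added to every row


def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# (4x2) times (2x1) gives a 4x1 matrix: one prediction per input row.
Y_hat = [[v + b for v in row] for row in matmul(X, W)]

print(len(Y_hat), len(Y_hat[0]))  # 4 1
```

Adding a third input column to `X` would only require a third row in `W`; the output stays 4x1, which is what "going wide" means here.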


## 7.2 Deep (Going Deeper)

Why go deep: stacking more layers can make the output prediction more accurate and reliable.
- Add multiple linear layers and activation layers, and vary the types used

```python
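import torch

x = torch.as_tensor(x_data, dtype=torch.float32)  # 4x2 input from section 7.1
sigmoid = torch.nn.Sigmoid()
l1 = torch.nn.Linear(2, 4)
l2 = torch.nn.Linear(4, 3)
l3 = torch.nn.Linear(3, 1)     # three linear layers (each a matrix multiplication)
out1 = sigmoid(l1(x))
out2 = sigmoid(l2(out1))
y_pred = sigmoid(l3(out2))     # layers are chained through activation functions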
```

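- Sigmoid: prone to the vanishing gradient problem. Its derivative never exceeds 0.25, so gradients shrink as they are propagated back through many sigmoid layers.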
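
The vanishing gradient effect can be illustrated with a rough sketch in plain Python. It looks only at the derivative of the sigmoid itself and ignores the weight matrices, so the numbers are an upper bound on the activation factor rather than the actual gradient of any model in this chapter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)


max_grad = sigmoid_grad(0.0)   # the derivative peaks at z = 0
for depth in (1, 3, 7):
    # Best-case product of sigmoid derivatives across `depth` layers
    print(depth, max_grad ** depth)
```

Even in the best case the factor falls below 0.0001 after seven sigmoid layers, and it is smaller still when the pre-activations are far from zero. This is one reason activations such as ReLU are often used in hidden layers.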

## 7.3 Code Practice: Wide & Deep

### 0. Loading the Data (Classifying Diabetes)

```python
# Step 0: Import and define our data

import numpy as np
import torch
import torch.nn as nn

xy = np.loadtxt('diabetes.csv', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, 0:-1])  # all columns except the last: features
y_data = torch.from_numpy(xy[:, [-1]])  # last column: label

print(x_data.shape) # torch.Size([759, 8])
print(y_data.shape) # torch.Size([759, 1])

print(x_data[:10])
```

```
torch.Size([759, 8])
torch.Size([759, 1])
tensor([[-0.2941,  0.4874,  0.1803, -0.2929,  0.0000,  0.0015, -0.5312, -0.0333],
        [-0.8824, -0.1457,  0.0820, -0.4141,  0.0000, -0.2072, -0.7669, -0.6667],
        [-0.0588,  0.8392,  0.0492,  0.0000,  0.0000, -0.3055, -0.4927, -0.6333],
        [-0.8824, -0.1055,  0.0820, -0.5354, -0.7778, -0.1624, -0.9240,  0.0000],
        [ 0.0000,  0.3769, -0.3443, -0.2929, -0.6028,  0.2846,  0.8873, -0.6000],
        [-0.4118,  0.1658,  0.2131,  0.0000,  0.0000, -0.2370, -0.8950, -0.7000],
        [-0.6471, -0.2161, -0.1803, -0.3535, -0.7920, -0.0760, -0.8548, -0.8333],
        [ 0.1765,  0.1558,  0.0000,  0.0000,  0.0000,  0.0522, -0.9522, -0.7333],
        [-0.7647,  0.9799,  0.1475, -0.0909,  0.2837, -0.0909, -0.9317,  0.0667],
        [-0.0588,  0.2563,  0.5738,  0.0000,  0.0000,  0.0000, -0.8685,  0.1000]])
```


### 1. Defining the Model Class

```python
# Step 1: Design our model

class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules.
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6) # Wide: 8 inputs
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1) # Deep: 3 layers

        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a tensor of input data and must return a tensor of output data. We can use modules defined in the constructor as well as arbitrary operators on tensors.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2)) # Deep: one activation after each of the 3 linear layers
        return y_pred

model = Model()
```
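
As a sanity check on the 8 → 6 → 4 → 1 architecture, the number of learnable parameters can be worked out by hand. This is a small sketch in plain Python; `linear_param_count` is a helper name introduced here for illustration, and it assumes each linear layer has a bias, which is the default for `torch.nn.Linear`.

```python
layer_sizes = [(8, 6), (6, 4), (4, 1)]  # (in_features, out_features) per layer


def linear_param_count(n_in, n_out):
    # One weight per input-output pair, plus one bias per output
    return n_in * n_out + n_out


counts = [linear_param_count(i, o) for i, o in layer_sizes]
total = sum(counts)
print(counts, total)  # [54, 28, 5] 87
```

The same total should be obtainable from the model itself by summing `p.numel()` over `model.parameters()`.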

### 2. Constructing the Loss Function and Optimizer

```python
# Step 2: Construct loss and optimizer

# model.parameters() passes the learnable parameters of the three
# nn.Linear modules in the model to the SGD optimizer.
criterion = torch.nn.BCELoss(reduction='mean')  # replaces the deprecated size_average=True
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```
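
To show what the binary cross-entropy criterion measures, here is a minimal sketch of the mean BCE formula in plain Python. The function name `bce_mean` and the sample predictions are made up for illustration; the library implementation also guards against `log(0)`, which this sketch does not.

```python
import math


def bce_mean(y_pred, y_true):
    """Mean binary cross-entropy over paired predictions and 0/1 labels."""
    losses = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
              for p, t in zip(y_pred, y_true)]
    return sum(losses) / len(losses)


labels = [0.0, 1.0, 0.0, 1.0]
uninformed = bce_mean([0.5, 0.5, 0.5, 0.5], labels)   # equals ln 2
confident = bce_mean([0.1, 0.9, 0.2, 0.8], labels)    # mostly correct predictions
print(uninformed, confident)
```

A model that outputs 0.5 for every sample scores ln 2 ≈ 0.693. That is close to the first loss values in the training log of step 3, which suggests the freshly initialized network starts out roughly uninformed.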



### 3. Running the Training Loop (Forward, Backward, Update)

```python
# Step 3: Training loop

for epoch in range(100):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data)

    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

```
0 tensor(0.6961)
1 tensor(0.6912)
2 tensor(0.6869)
3 tensor(0.6829)
4 tensor(0.6793)
5 tensor(0.6761)
6 tensor(0.6732)
7 tensor(0.6706)
8 tensor(0.6682)
9 tensor(0.6660)
10 tensor(0.6641)
11 tensor(0.6623)
12 tensor(0.6607)
13 tensor(0.6593)
14 tensor(0.6580)
15 tensor(0.6568)
16 tensor(0.6557)
17 tensor(0.6547)
18 tensor(0.6538)
19 tensor(0.6530)
20 tensor(0.6523)
21 tensor(0.6516)
22 tensor(0.6510)
23 tensor(0.6505)
24 tensor(0.6500)
25 tensor(0.6496)
26 tensor(0.6491)
27 tensor(0.6488)
28 tensor(0.6484)
29 tensor(0.6481)
30 tensor(0.6478)
31 tensor(0.6476)
32 tensor(0.6474)
33 tensor(0.6471)
34 tensor(0.6469)
35 tensor(0.6468)
36 tensor(0.6466)
37 tensor(0.6465)
38 tensor(0.6463)
39 tensor(0.6462)
40 tensor(0.6461)
41 tensor(0.6460)
42 tensor(0.6459)
43 tensor(0.6458)
44 tensor(0.6458)
45 tensor(0.6457)
46 tensor(0.6456)
47 tensor(0.6456)
48 tensor(0.6455)
49 tensor(0.6455)
50 tensor(0.6454)
51 tensor(0.6454)
52 tensor(0.6453)
53 tensor(0.6453)
54 tensor(0.6453)
55 tensor(0.6452)
56
```
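
The three steps of the training loop (forward, backward, update) can be traced by hand on a much smaller problem. The sketch below is not the diabetes model: it assumes a hypothetical one-weight logistic model on four made-up data points, and computes the gradient analytically where PyTorch would use `loss.backward()`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: one feature, binary label.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]

w, lr = 0.0, 0.1        # initial weight; lr matches the optimizer setting above
history = []
for epoch in range(100):
    # Forward: predictions and mean binary cross-entropy
    preds = [sigmoid(w * x) for x in xs]
    loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, ys)) / len(xs)
    history.append(loss)
    # Backward: for sigmoid + BCE, d(loss)/dw is mean((p - y) * x)
    grad = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # Update: plain gradient descent step, as optimizer.step() does for SGD
    w -= lr * grad

print(history[0], history[-1], w)
```

As in the log above, the loss starts at about 0.693 and decreases with every step, although this toy problem is far easier than the real dataset.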

## 7.4 Exercise: Classifying Diabetes with deep nets

```python
# Step 1: Design our model

class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate seven nn.Linear modules.
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 7) # Wide: 8 inputs
        self.l2 = torch.nn.Linear(7, 6)
        self.l3 = torch.nn.Linear(6, 5)
        self.l4 = torch.nn.Linear(5, 4)
        self.l5 = torch.nn.Linear(4, 3)
        self.l6 = torch.nn.Linear(3, 2)
        self.l7 = torch.nn.Linear(2, 1) # Deep: 7 layers

        self.sigmoid = torch.nn.Sigmoid()
        self.relu = torch.nn.ReLU()
        self.tanh = torch.nn.Tanh() # additional activation functions

    def forward(self, x):
        """
        In the forward function we accept a tensor of input data and must return a tensor of output data. We can use modules defined in the constructor as well as arbitrary operators on tensors.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.relu(self.l2(out1))
        out3 = self.relu(self.l3(out2))
        out4 = self.tanh(self.l4(out3))
        out5 = self.sigmoid(self.l5(out4))
        out6 = self.relu(self.l6(out5))
        y_pred = self.sigmoid(self.l7(out6)) # Deep: one activation after each of the 7 linear layers
        return y_pred

model = Model()
```
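
The three activation functions used in this model differ mainly in their output range. The following plain-Python sketch evaluates each at a few points; it uses the standard definitions, not code from the exercise.

```python
import math


def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    # Passes positive values through, clips negatives to 0
    return max(0.0, z)


# math.tanh squashes any real number into (-1, 1)
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), relu(z), math.tanh(z))
```

Whatever is used in the hidden layers, the final activation stays a sigmoid here because `BCELoss` expects predictions between 0 and 1.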

```python
# Step 2: Construct loss and optimizer

# model.parameters() passes the learnable parameters of the seven
# nn.Linear modules in the model to the SGD optimizer.
criterion = torch.nn.BCELoss(reduction='mean')  # replaces the deprecated size_average=True
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```

```python
# Step 3: Training loop

for epoch in range(100):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data)

    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

```
0 tensor(0.7142)
1 tensor(0.7106)
2 tensor(0.7072)
3 tensor(0.7039)
4 tensor(0.7009)
5 tensor(0.6980)
6 tensor(0.6953)
7 tensor(0.6927)
8 tensor(0.6902)
9 tensor(0.6879)
10 tensor(0.6858)
11 tensor(0.6837)
12 tensor(0.6817)
13 tensor(0.6799)
14 tensor(0.6781)
15 tensor(0.6765)
16 tensor(0.6749)
17 tensor(0.6734)
18 tensor(0.6720)
19 tensor(0.6707)
20 tensor(0.6694)
21 tensor(0.6682)
22 tensor(0.6670)
23 tensor(0.6660)
24 tensor(0.6649)
25 tensor(0.6640)
26 tensor(0.6630)
27 tensor(0.6622)
28 tensor(0.6613)
29 tensor(0.6605)
30 tensor(0.6598)
31 tensor(0.6591)
32 tensor(0.6584)
33 tensor(0.6577)
34 tensor(0.6571)
35 tensor(0.6566)
36 tensor(0.6560)
37 tensor(0.6555)
38 tensor(0.6550)
39 tensor(0.6545)
40 tensor(0.6541)
41 tensor(0.6536)
42 tensor(0.6532)
43 tensor(0.6528)
44 tensor(0.6525)
45 tensor(0.6521)
46 tensor(0.6518)
47 tensor(0.6515)
48 tensor(0.6512)
49 tensor(0.6509)
50 tensor(0.6506)
51 tensor(0.6504)
52 tensor(0.6501)
53 tensor(0.6499)
54 tensor(0.6497)
55 tensor(0.6495)
56
```