# PyTorch

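[PyTorch](https://pytorch.org/) also works with tensors and supports automatic differentiation, and defining a model is somewhat simpler than in TF. Models are mainly defined through class inheritance.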

In [1]:
import h5py
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch.autograd import Variable
from sklearn.model_selection import train_test_split

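### 1 Convert data to tensors & create tensors

**Only tensors support automatic differentiation, so we need to convert our data to tensors.** Setting requires_grad=True marks a tensor as one whose gradient should be computed.

#### 1.1 NumPy to tensor
We can convert NumPy data to a tensor with:
```python
torch.from_numpy
```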

In [2]:
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
targets = np.array([[1],[0],[0],[1],[0]], dtype='float32').T

In [4]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print('inputs shape:',inputs.shape)
print('targets shape:',targets.shape)

inputs shape: torch.Size([5, 3])
targets shape: torch.Size([1, 5])


In [5]:
torch.typename(inputs)

'torch.FloatTensor'

With that, the NumPy data has been converted to tensors.
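One detail worth keeping in mind is that `torch.from_numpy` does not copy the data: the tensor and the source array share memory. The sketch below uses a throwaway array named `arr` (not part of the notebook) to show this, with `torch.tensor(arr)` as the copying alternative.

```python
import numpy as np
import torch

arr = np.array([1., 2., 3.], dtype='float32')  # illustrative array
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # independent copy of the data

arr[0] = 10.                     # modify the NumPy array in place
print(shared)  # the change is visible through the shared tensor
print(copied)  # the copy still holds the original values
```

If the array will be modified later and the tensor should stay unchanged, the copying form is the safer choice.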

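#### 1.2 Create tensor

To convert a constant (float), a vector, a matrix, or an n-dimensional array to a tensor, use:

```python
torch.tensor(data,requires_grad=True)
```

Here requires_grad=True tells autograd to record the operations applied to the tensor so that its gradient can be computed later; the conversion itself is done by torch.tensor.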

In [6]:
float_ = torch.tensor(1.,requires_grad=True)
vector_ = torch.tensor(np.array([1.,2.,3.]),requires_grad=True)
matrix = torch.tensor(np.matrix([[1.,2.],[3.,4.]]),requires_grad=True)
print('float_ type:',torch.typename(float_))
print('vector_ type:',torch.typename(vector_))
print('matrix type:',torch.typename(matrix))

float_ type: torch.FloatTensor
vector_ type: torch.DoubleTensor
matrix type: torch.DoubleTensor
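
To make the role of `requires_grad=True` concrete, here is a minimal sketch with an illustrative scalar `x` and a second tensor `c` (both names are made up for this example). It only shows the flag in action and is not part of the notebook's model.

```python
import torch

x = torch.tensor(3., requires_grad=True)  # autograd records operations on x
y = x ** 2 + 2 * x                        # y = x^2 + 2x
y.backward()                              # computes dy/dx and stores it in x.grad
print(x.grad)                             # dy/dx = 2x + 2, evaluated at x = 3

c = torch.tensor(3.)                      # requires_grad defaults to False
print(c.requires_grad, (c ** 2).requires_grad)
```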


To create a higher-dimensional tensor filled with values drawn from the standard normal distribution, use:
```python
torch.randn
```

In [7]:
W = torch.randn(1,3,requires_grad=True)
b = torch.randn(1,requires_grad=True)
print(W)
print(b)

tensor([[ 0.0266,  0.2513, -0.8446]], requires_grad=True)
tensor([-0.0826], requires_grad=True)


### 2 Build LR model

#### 2.1 create sigmoid model

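The sigmoid function is available as torch.sigmoid.

```python
torch.sigmoid
```

Some common operations in PyTorch:

(1) Matrix multiplication: torch.mm, or the "@" operator

(2) Matrix transpose: X.t()

(3) Matrix addition: torch.add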

...
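
The operations listed above can be tried on two small matrices. `A` and `B` below are illustrative values, not data from the notebook.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[0., 1.], [1., 0.]])

prod_mm = torch.mm(A, B)   # matrix product
prod_at = A @ B            # same product with the @ operator
A_t = A.t()                # transpose of a 2-D tensor
total = torch.add(A, B)    # element-wise sum, same as A + B

print(prod_mm)
print(A_t)
print(total)
```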

In [8]:
def model(X):
    return torch.sigmoid(torch.mm(W,X.t()) + b)

In [9]:
preds = model(inputs)
preds

tensor([[2.2197e-08, 1.3900e-13, 2.0821e-06, 1.8292e-08, 3.6429e-15]],
       grad_fn=<SigmoidBackward>)

### 3. Define loss

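PyTorch differentiates the loss we define automatically, but the loss has to be computed from tensors with torch operations. If part of the computation leaves the tensor graph (for example, by going through NumPy), autograd cannot apply the chain rule through it.

**Ps:**

Our loss function contains a log. When the argument of the log reaches 0, which happens once a prediction saturates at exactly 0 or 1,

torch.log returns -inf, the loss becomes inf or nan, and that value then propagates into every later update.

So we add a very small constant inside the log to keep the result from becoming nan.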
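
The failure mode is easy to reproduce in isolation. The sketch below uses a made-up target `t` and a prediction `p` that has saturated at exactly 1; `eps` plays the role of the small constant mentioned above.

```python
import torch

t = torch.tensor([1.])   # target
p = torch.tensor([1.])   # a saturated prediction, exactly 1
eps = 1e-10

log_zero = torch.log(torch.tensor(0.))    # -inf
raw = (1 - t) * torch.log(1 - p)          # 0 * (-inf) evaluates to nan
safe = (1 - t) * torch.log(1 - p + eps)   # 0 * log(1e-10) stays finite

print(log_zero)
print(raw)
print(safe)
```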

In [10]:
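def cost(t1,t2):
    # t1: targets, t2: predicted probabilities
    epsilon = 1e-10 # keeps log() away from log(0) = -inf
    part_1 = t1*torch.log(t2+epsilon)
    part_2 = (1-t1)*torch.log(1-t2+epsilon)
    # average over the number of samples instead of a hard-coded 5;
    # constants are plain numbers and do not need requires_grad
    loss_ = -torch.sum(part_1+part_2)/t1.numel()
    return loss_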

In [11]:
loss = cost(targets,preds)
loss

tensor(7.0860, grad_fn=<DivBackward0>)

Call
```python
loss.backward()
```
to run backpropagation (compute the gradients).

In [12]:
loss.backward()

### 4 Update

Use

```python
.grad
```

to inspect the gradients computed by the backward pass (dW, db).

In [13]:
print(W)
print(W.grad)

tensor([[ 0.0266,  0.2513, -0.8446]], requires_grad=True)
tensor([[-34.8236, -21.8931, -15.9212]])


In [14]:
print(b)
print(b.grad)

tensor([-0.0826], requires_grad=True)
tensor([-0.3980])


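After every parameter update we must call

```python
.grad.zero_()
```

to reset the gradients to 0.

PyTorch accumulates gradients: each call to backward() adds the new gradient to whatever is already stored in .grad. Without the reset, the next gradient would be the sum of the old and new values rather than the gradient at the current parameters.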

In [15]:
W.grad.zero_()
b.grad.zero_()
print(W.grad)
print(b.grad)

tensor([[0., 0., 0.]])
tensor([0.])
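
The accumulation behaviour described above can be seen directly on a single illustrative parameter `w` (a made-up name, unrelated to the model's `W`):

```python
import torch

w = torch.tensor(2., requires_grad=True)  # illustrative parameter

(3 * w).backward()
first = w.grad.clone()     # d(3w)/dw = 3

(3 * w).backward()         # no reset in between
second = w.grad.clone()    # 3 + 3 = 6, the gradients were added

w.grad.zero_()             # reset before the next backward pass
(3 * w).backward()
third = w.grad.clone()     # back to 3
print(first, second, third)
```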


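Putting it all together:

torch.no_grad():

This disables gradient tracking for the code inside the block. A parameter update is not part of the computation we want to differentiate, so it should not be recorded by autograd.

See [torch.no_grad](https://pytorch.org/docs/stable/torch.html?highlight=torch%20no_grad) for details.

The official documentation explains it as follows:

Without torch.no_grad(), once a node has requires_grad set to True, every node that depends on it also has requires_grad set to True, so the update itself would be tracked.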
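
As a small sketch of what `torch.no_grad()` changes, the example below uses an illustrative parameter `w` and a hypothetical step of 0.1. It also shows that an in-place update of a leaf tensor that requires grad is rejected outside the block.

```python
import torch

w = torch.tensor([1., 2.], requires_grad=True)  # illustrative parameter

tracked = w * 2
print(tracked.requires_grad)       # the result depends on w, so it is tracked

with torch.no_grad():
    untracked = w * 2
print(untracked.requires_grad)     # nothing was recorded inside the block

# Outside torch.no_grad(), updating w in place raises a RuntimeError.
try:
    w -= 0.1
    raised = False
except RuntimeError:
    raised = True
print(raised)

with torch.no_grad():
    w -= 0.1                       # allowed: a plain parameter update
print(w)
```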

In [16]:
preds = model(inputs)
loss = cost(targets,preds)
loss.backward()
with torch.no_grad():
    W -= W.grad 
    b -= b.grad 
    W.grad.zero_()
    b.grad.zero_()
print(W)
print(b)

tensor([[34.8501, 22.1444, 15.0766]], requires_grad=True)
tensor([0.3154], requires_grad=True)


In [17]:
preds = model(inputs)
loss = cost(targets,preds)
loss

tensor(13.8155, grad_fn=<DivBackward0>)

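We can see that W and b changed after the gradient step, and so did the loss. Note that the loss rose from 7.0860 to 13.8155 here: the update subtracts the raw gradient, which amounts to a step size of 1 and is apparently too large for these inputs.

Because $W,b$ are initialized randomly in this example, the gradients may also turn out very small, so that the update barely changes the parameters. This is normal.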
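
One likely cause of very small gradients is a saturated sigmoid, which is easy to trigger when raw features are in the tens or hundreds. As a sketch under assumptions that are not part of the notebook (standardized inputs, zero initialization, a hypothetical learning rate of 0.1 and 100 steps), the same manual loop behaves more smoothly:

```python
import torch

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[1., 0., 0., 1., 0.]])

# Assumption: standardize each feature column (not done in the notebook).
X = (inputs - inputs.mean(dim=0)) / inputs.std(dim=0)

W = torch.zeros(1, 3, requires_grad=True)   # assumption: zero initialization
b = torch.zeros(1, requires_grad=True)
lr = 0.1                                    # hypothetical learning rate

losses = []
for _ in range(100):
    preds = torch.sigmoid(W @ X.t() + b)
    loss = -torch.mean(targets * torch.log(preds + 1e-10)
                       + (1 - targets) * torch.log(1 - preds + 1e-10))
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad       # scaled gradient step
        b -= lr * b.grad
        W.grad.zero_()         # reset the accumulated gradients
        b.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1])
```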

### LR Pytorch

#### Load the dataset
The dataset is stored in h5 files, so we use the h5py library to read the image data.

In [18]:
def load_data():
    '''
    create train set and test set
    make sure you have .h5 file in your dataset
    
    Returns:
    -------
        train_set_x_orig: original train set shape is (209, 64, 64, 3) 
        train_set_y_orig: original train label shape is (209,)
        test_set_x_orig: original test set shape is (50, 64, 64, 3)
        test_set_y_orig: original test label shape is (50,)
        classes: cat or non-cat.
        
        
    Note:
    ----
        (209, 64, 64, 3): 209 picture,64 width,64 height,3 channel.
    '''
    train_dataset = h5py.File('../data_set/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('../data_set/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

In [19]:
train_x_orig, train_y_orig, test_x_orig, test_y_orig, classes = load_data()

In [20]:
train_x = train_x_orig.reshape(train_x_orig.shape[0],-1) / 255 
train_y = train_y_orig.reshape(1,-1)
test_y  = test_y_orig.reshape(1,-1)
test_x = test_x_orig.reshape(test_x_orig.shape[0],-1) / 255
print('Train_x\'s shape:{}'.format(train_x.shape))
print('Test_x\'s shape:{}'.format(test_x.shape))
print("Train_y's shape:{}".format(train_y.shape))
print("Test_y's shape:{}".format(test_y.shape))

Train_x's shape:(209, 12288)
Test_x's shape:(50, 12288)
Train_y's shape:(1, 209)
Test_y's shape:(1, 50)
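
The reshaping step can be tried without the h5 files. The sketch below uses a random stand-in array named `images` (hypothetical, 4 images instead of 209) to show how each image is flattened into one row and scaled to the range [0, 1].

```python
import numpy as np

# Hypothetical stand-in for the image array: 4 RGB images of 64x64 pixels.
images = np.random.randint(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)

flat = images.reshape(images.shape[0], -1) / 255   # (4, 12288), values in [0, 1]
labels = np.array([1, 0, 0, 1]).reshape(1, -1)     # (1, 4) row vector

print(flat.shape, flat.dtype)
print(labels.shape)
```

The division produces a float64 array, so the conversion to tensors still has to settle on a tensor dtype such as float32.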


#### 1 Convert the data type
Convert the NumPy arrays to tensors.

In [21]:
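# plain tensors support autograd directly; the Variable wrapper is deprecated
train_x_tensor = torch.tensor(train_x, dtype=torch.float32)
test_x_tensor = torch.tensor(test_x, dtype=torch.float32)
train_y_tensor = torch.tensor(train_y, dtype=torch.float32)
test_y_tensor = torch.tensor(test_y, dtype=torch.float32)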

#### 2 初始化参数

Here ```torch.manual_seed(1)``` sets the random seed, so the initial parameters are the same on every run.

In [22]:
def initial(n):
    """
    Implementation initialization parameters
    
    Parameters:
    ----------
        n: number of features
    Return:
    ------
        W: weights
        b: bias
    """
    torch.manual_seed(1)
    W = torch.randn(1,n,requires_grad=True) 
    b = torch.randn(1,requires_grad=True)
    return W,b
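
The effect of the seed in `initial` can be checked in a few lines. The tensor names below are illustrative.

```python
import torch

torch.manual_seed(1)
first = torch.randn(1, 3)

torch.manual_seed(1)        # reset the generator to the same state
second = torch.randn(1, 3)

third = torch.randn(1, 3)   # no reseed: the generator has moved on
print(torch.equal(first, second))
print(torch.equal(first, third))
```

Because `initial` reseeds on every call, it returns the same starting `W` and `b` each time for a given `n`.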

#### 3 Build the PyTorch LR

In [23]:
def LR_pytorch(X,y,learning_rate,Iter):
    """
    Build Pytorch of LR.
    
    Parameters:
    ----------
        X: training data
        y: training labels
        learning_rate: learning rate
        Iter: number of iterations
    Returns:
    -------
        W: best weights
        b: best bias
        
    """
    m,n = X.shape # m samples, n features
    W,b = initial(n)
    # constants do not need requires_grad; plain Python numbers are enough
    epsilon = 1e-10
    for iter_ in range(Iter):
        A = torch.sigmoid(torch.mm(W,X.t()) + b) # compute A
        loss_ = -torch.sum(y*torch.log(A + epsilon)+(1-y)*torch.log(1-A + epsilon)) /m # compute loss
        loss_.backward() # start backward propagation
        if iter_ % 100 ==0:
            print("after iter {} cost :{}".format(iter_,loss_.item()))
        
        # start update
        with torch.no_grad():
            W -= W.grad * learning_rate
            b -= b.grad * learning_rate
            W.grad.zero_()
            b.grad.zero_()
            
    return W,b

In [24]:
W_,b_ = LR_pytorch(train_x_tensor,train_y_tensor,0.1,1200)

after iter 0 cost :8.221567153930664
after iter 100 cost :4.841607570648193
after iter 200 cost :4.030020713806152
after iter 300 cost :2.172579050064087
after iter 400 cost :2.6039559841156006
after iter 500 cost :5.026844501495361
after iter 600 cost :1.133457899093628
after iter 700 cost :1.0972813367843628
after iter 800 cost :0.9333294034004211
after iter 900 cost :0.8244034051895142
after iter 1000 cost :0.8027365803718567
after iter 1100 cost :0.7889816164970398


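The cost shows an overall downward trend, although it is not monotonic (it rises again around iterations 400 and 500).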

#### 4 correct rate

In [25]:
def predict(X,y,W,b):
    m,n = X.shape
    predict = torch.round(torch.sigmoid(W@X.t() + b))
    accurate = torch.sum((predict == y)).item() / m
    return accurate

In [26]:
accurate = predict(test_x_tensor,test_y_tensor,W_,b_)
print('test data correct rate is:',accurate)

test data correct rate is: 0.64


In [27]:
accurate = predict(train_x_tensor,train_y_tensor,W_,b_)
print('train data correct rate is:',accurate)

train data correct rate is: 0.9617224880382775
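
The accuracy computation in `predict` can be checked by hand on a tiny example. The probabilities and labels below are made up for illustration.

```python
import torch

probs = torch.tensor([[0.9, 0.2, 0.6, 0.4]])   # made-up sigmoid outputs
labels = torch.tensor([[1., 0., 0., 0.]])      # made-up labels

predicted = torch.round(probs)                  # threshold at 0.5
accuracy = torch.sum(predicted == labels).item() / labels.shape[1]
print(predicted)
print(accuracy)
```

Three of the four predictions match, so the accuracy is 0.75. In the notebook's own output, the gap between the train and test rates suggests the model fits the training set much better than it generalizes.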


That is how to build LR by hand with PyTorch. For LR, though, using the functions and classes that PyTorch already provides is more convenient.

-------------

### PyTorch & Model

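In PyTorch, a neural network (here, LR) is usually built by subclassing.

Defining a logistic regression model in PyTorch generally takes the following steps:

1. Build the subclass and define its forward function.

2. Define the loss function (loss) and the optimizer (optimizer).

3. Train:

3.1 (forward)

3.2 (backward)

3.3 (update)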

#### 1 torch.nn.Module
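We first subclass [torch.nn.Module](https://pytorch.org/docs/stable/nn.html?highlight=torch%20nn%20module#module-torch.nn); torch.nn provides many ready-made building blocks.

Since we are building LR, we again need a linear function and the sigmoid function:

(1) [torch.nn.Linear(in_features, out_features,...)](https://pytorch.org/docs/stable/nn.html#linear-functions)

in_features: the number of input features

out_features: the number of output features

See the function's help for details on usage.

(2) [torch.sigmoid](https://pytorch.org/docs/stable/torch.html?highlight=torch%20sigmoid#torch.sigmoid)

The sigmoid function in torch.

**Ps:**

We only need to define the forward function in the class and write out the forward-propagation steps one by one.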
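
To see what `torch.nn.Linear` holds and returns, the sketch below builds a layer with 3 input features and 1 output feature and compares it with the explicit formula. The sizes and the random input `X` are illustrative.

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(3, 1)      # in_features=3, out_features=1
X = torch.randn(5, 3)               # 5 illustrative samples, 3 features

out = torch.sigmoid(linear(X))      # forward pass through the layer
# the same computation written out: sigmoid(X W^T + b)
manual = torch.sigmoid(X @ linear.weight.t() + linear.bias)

print(linear.weight.shape, linear.bias.shape)
print(out.shape)
```

Note that the layer expects samples as rows and returns one row per sample, which is why the later training code transposes the prediction before comparing it with labels of shape (1, m).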

In [28]:
class Logistic_regression(torch.nn.Module):
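    """
    Logistic regression as a torch.nn.Module.

    n: number of input features
    m: number of samples (unused; kept so the existing call still works)
    """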
    def __init__(self,n,m):
        torch.nn.Module.__init__(self)
        self.linear = torch.nn.Linear(n,1)
        
    def forward(self,X):
       
        return torch.sigmoid(self.linear(X))

#### 2 build model

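Once the subclass and its forward function are defined, we can start building the model.

(1) Define the loss criterion. Here we use [torch.nn.BCELoss](https://pytorch.org/docs/stable/nn.html?highlight=torch%20nn%20bceloss#torch.nn.BCELoss), that is, Binary Cross Entropy.

(2) Define the optimizer. Here we use [torch.optim.SGD](https://pytorch.org/docs/stable/optim.html?highlight=torch%20optim%20sgd)

(3) Start iterating:

(3.1) Predict with the class we defined.

(3.2) Compute the loss.

(3.3) Reset the gradients held by the optimizer to 0 first, because PyTorch accumulates gradients.

    optimizer.zero_grad()
    
(3.4) Run backpropagation to compute the gradients with loss.backward().

(3.5) Update all parameters with [optimizer.step()](https://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-optim/)

(4) Return the linear model and read the parameters from it:

(4.1) The first option is to return parameters() of the class. It is an iterator, so the values have to be pulled out with a loop.

(4.2) The second option is to return the linear layer and read the parameters with .weight.data or .bias.data.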
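
The steps above can be sketched end to end on toy data. Everything specific here is an assumption made for illustration: the class name `TinyLR`, the random features, the labelling rule, the learning rate of 0.1 and the 50 iterations.

```python
import torch

torch.manual_seed(0)
X = torch.randn(20, 3)                  # toy features (assumption)
y = (X[:, :1] > 0).float()              # toy labels, shape (20, 1)

class TinyLR(torch.nn.Module):          # hypothetical name
    def __init__(self, n):
        super().__init__()
        self.linear = torch.nn.Linear(n, 1)

    def forward(self, X):
        return torch.sigmoid(self.linear(X))

model = TinyLR(3)
criterion = torch.nn.BCELoss()                            # (1) loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # (2) optimizer

losses = []
for _ in range(50):                     # (3) iterate
    loss = criterion(model(X), y)       # (3.1) predict, (3.2) compute loss
    optimizer.zero_grad()               # (3.3) clear accumulated gradients
    loss.backward()                     # (3.4) backpropagation
    optimizer.step()                    # (3.5) update the parameters
    losses.append(loss.item())

# (4.1) loop over the parameters by name
names = [name for name, _ in model.named_parameters()]
# (4.2) read them from the linear layer directly
weight, bias = model.linear.weight.data, model.linear.bias.data
print(names, weight.shape, bias.shape)
```

`named_parameters()` is used here instead of `parameters()` only so that the loop also shows which tensor is which.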

In [29]:
def lr_pytorch(X,y,test_X,test_y,alpha,Iter):
    
    m,n = X.shape
    
    # define model
    model = Logistic_regression(n,m)
    
    # define the loss criterion (binary cross entropy)
    criterion = torch.nn.BCELoss(reduction='mean')
    # define optimizer
    optimizer = torch.optim.SGD(model.parameters(),lr=alpha)
    
    # train loop
    for iter_ in range(Iter):
        # forward propagation
        # predict y, shape (m, 1)
        y_predict = model(X)
        
        # compute loss; y has shape (1, m), so transpose the prediction
        loss = criterion(y_predict.t(),y)
        if iter_ % 100 == 0:
            print('after iter {} cost {}'.format(iter_,loss.item()))
            
        # start backward propagation
        # clear the accumulated gradients
        optimizer.zero_grad()
        # backward propagation
        loss.backward()
        # update the parameters
        optimizer.step()
    
    
    # compute correct rate
    
    predict = torch.round(model.forward(test_X))
    accuracy = (predict.t() == test_y).sum().item() / test_y.shape[1]
    print('-----------accuracy------------')
    print("The test correct rate is {}".format(accuracy))
    
    predict = torch.round(model.forward(X))
    accuracy = (predict.t() == y).sum().item() / y.shape[1]
    print("The train correct rate is {}".format(accuracy))
    
    return model.linear

In [30]:
parameters = lr_pytorch(train_x_tensor,train_y_tensor,test_x_tensor,test_y_tensor,0.01,1000)

after iter 0 cost 0.730829119682312
after iter 100 cost 2.446681022644043
after iter 200 cost 0.6464812159538269
after iter 300 cost 1.020178198814392
after iter 400 cost 0.7230757474899292
after iter 500 cost 0.46212637424468994
after iter 600 cost 0.2490411400794983
after iter 700 cost 0.154551163315773
after iter 800 cost 0.1355052888393402
after iter 900 cost 0.12510743737220764
-----------accuracy------------
The test correct rate is 0.68
The train correct rate is 0.9856459330143541


The final results are similar to the earlier ones.
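
One reason the two versions behave alike is that `torch.nn.BCELoss` with `reduction='mean'` computes the same averaged cross-entropy as the hand-written loss, apart from how log(0) is handled. A small check on made-up values, offered as a sketch and not as part of the notebook:

```python
import torch

p = torch.tensor([[0.9, 0.2, 0.7]])   # made-up probabilities
y = torch.tensor([[1., 0., 1.]])      # made-up labels

# hand-written mean binary cross-entropy (no epsilon needed for these values)
manual = -torch.mean(y * torch.log(p) + (1 - y) * torch.log(1 - p))
builtin = torch.nn.BCELoss(reduction='mean')(p, y)
print(manual.item(), builtin.item())
```

According to the PyTorch documentation, BCELoss clamps its log outputs at -100 instead of adding a small constant, so the two can differ slightly once predictions saturate.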

Finally, we extract the parameters $W,b$:

In [31]:
parameters.weight.data

tensor([[ 0.0138, -0.0284, -0.0065,  ..., -0.0102, -0.0351,  0.0193]])

In [32]:
parameters.bias.data

tensor([-0.0088])

# Summary
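Overall, PyTorch is more convenient than TF, which is also why the number of PyTorch users is gradually approaching that of TF (at the time of writing). While writing this I did run into a few places where PyTorch falls short, but they are not serious and should improve as new versions are released.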