## Advanced PyTorch Model Training Techniques
- Custom loss functions
- Dynamic learning rate adjustment


Typical case: the loss oscillates up and down
![image-2.png](attachment:image-2.png)

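### 1. Custom loss functions
- 1. PyTorch already provides many common loss functions, but some less general ones are not built in, for example DiceLoss (HuberLoss was also missing from older releases; recent releases ship it as `nn.HuberLoss`).
- 2. If the loss still oscillates after the dataset or hyperparameters have been adjusted, a less general or custom loss function may work better for that particular model.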

For example, DiceLoss is a loss function commonly used in medical image segmentation. It is defined as follows:
![image-2.png](attachment:image-2.png)

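- The Dice coefficient is a set-similarity measure, typically used to compute the similarity of two samples (its value lies in [0, 1]):
- |X∩Y| is the intersection of X and Y, and |X| and |Y| are the numbers of elements in X and Y. The factor 2 in the numerator is there because the denominator counts the elements shared by X and Y twice.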
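
For reference, the standard definition that matches the description above can be written out as follows. The loss form `1 - Dice` is an assumption taken from the implementation in this notebook, which additionally adds a smoothing term to numerator and denominator.

$$
\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}, \qquad \mathrm{DiceLoss}(X, Y) = 1 - \mathrm{Dice}(X, Y)
$$

As a small worked example, take $X = \{1, 2, 3\}$ and $Y = \{2, 3, 4\}$. Then $|X \cap Y| = 2$ and $|X| = |Y| = 3$, so $\mathrm{Dice} = \frac{2 \cdot 2}{3 + 3} = \frac{2}{3}$ and the loss is $\frac{1}{3}$.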

In [25]:
import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR
from torch.optim.lr_scheduler import StepLR
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

In [26]:
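# DiceLoss: the loss function used by the V-Net medical image segmentation model
class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        # weight and size_average are accepted but not used
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        # inputs are raw logits; map them to probabilities in [0, 1]
        inputs = torch.sigmoid(inputs)  # F.sigmoid is deprecated
        # flatten predictions and targets to 1-D
        inputs = inputs.reshape(-1)
        targets = targets.reshape(-1)
        intersection = (inputs * targets).sum()
        # smooth avoids division by zero when both masks are empty
        dice_loss = 1 - (2.*intersection + smooth)/(inputs.sum() + targets.sum() + smooth)

        return dice_loss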
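
A quick sanity check can make the behaviour of this loss more concrete. The sketch below redefines a minimal binary Dice loss with the same formula and evaluates it on two hand-made inputs. The tensors `good_logits` and `bad_logits` are illustrative assumptions, not data from this notebook.

```python
import torch
import torch.nn as nn


class DiceLoss(nn.Module):
    """Binary Dice loss on raw logits (same formula as the cell above)."""

    def forward(self, inputs, targets, smooth=1.0):
        probs = torch.sigmoid(inputs).reshape(-1)
        targets = targets.reshape(-1)
        intersection = (probs * targets).sum()
        return 1 - (2.0 * intersection + smooth) / (probs.sum() + targets.sum() + smooth)


criterion = DiceLoss()
targets = torch.tensor([1.0, 1.0, 0.0, 0.0])

# Confident, correct logits: the predicted mask matches the target mask.
good_logits = torch.tensor([20.0, 20.0, -20.0, -20.0])
# Confident, wrong logits: the predicted mask is the complement of the target.
bad_logits = -good_logits

good_loss = criterion(good_logits, targets).item()
bad_loss = criterion(bad_logits, targets).item()
print(f"good: {good_loss:.4f}, bad: {bad_loss:.4f}")
```

With a perfect overlap the loss is close to 0. With no overlap only the smoothing term remains in the numerator, so the loss approaches 1 as the masks grow.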

In [27]:
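# Custom loss for multi-class classification:
# cross_entropy + L2 regularization
class MyLoss(torch.nn.Module):
    def __init__(self, weight_decay=0.01, model=None):
        super(MyLoss, self).__init__()
        self.weight_decay = weight_decay
        # A loss module has no weights of its own, so the model to regularize
        # must be passed in; with model=None only cross-entropy is returned.
        self.model = model

    def forward(self, inputs, targets):
        ce_loss = F.cross_entropy(inputs, targets)
        if self.model is None:
            return ce_loss
        # scalar zero on the same device and dtype as the logits
        l2_loss = inputs.new_zeros(())
        for name, param in self.model.named_parameters():
            if 'weight' in name:
                # out-of-place add keeps the autograd graph intact
                l2_loss = l2_loss + torch.norm(param)
        loss = ce_loss + self.weight_decay * l2_loss
        return loss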
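
The following sketch illustrates the idea of adding an L2 penalty to cross-entropy outside of any class. The tiny `nn.Linear` model and the random batch are hypothetical stand-ins for a real network and dataset. It also shows that a bare loss module has no parameters of its own, so the penalty has to be computed from the model's parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 3)               # hypothetical stand-in for the real network
inputs = torch.randn(8, 4)            # random batch, for illustration only
targets = torch.randint(0, 3, (8,))
weight_decay = 0.01

logits = model(inputs)
ce_loss = F.cross_entropy(logits, targets)

# Sum the L2 norms of every model parameter whose name contains "weight".
l2_loss = sum(torch.norm(p) for name, p in model.named_parameters() if "weight" in name)
loss = ce_loss + weight_decay * l2_loss
loss.backward()

# A bare loss module owns no parameters, so iterating over it finds nothing.
n_loss_params = len(list(nn.CrossEntropyLoss().named_parameters()))
print(n_loss_params, ce_loss.item(), loss.item())
```

Note that `torch.optim` optimizers also accept a `weight_decay` argument, which is a common alternative to adding the penalty to the loss by hand.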


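Notes:
- When a custom loss function involves mathematical operations, it is best to use PyTorch's tensor operations throughout.
- That way the loss stays in the autograd graph and PyTorch can compute its gradients automatically.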
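
As a minimal sketch of why this matters, the example below writes a mean squared error by hand using only torch operations and compares the autograd gradient with the analytic one. The function name `my_mse` and the sample values are assumptions made for illustration.

```python
import torch

pred = torch.tensor([0.5, 2.0, -1.0], requires_grad=True)
target = torch.tensor([1.0, 2.0, 0.0])


def my_mse(pred, target):
    # Only torch operations: the result stays connected to `pred`.
    return torch.mean((pred - target) ** 2)


loss = my_mse(pred, target)
loss.backward()

# Analytic gradient of the mean squared error: 2 * (pred - target) / N
expected_grad = 2 * (pred.detach() - target) / pred.numel()
print(pred.grad, expected_grad)

# Leaving the tensor API (here via .detach().numpy()) cuts the graph,
# so no gradient can flow back from this value.
detached = torch.tensor(((pred.detach().numpy() - target.numpy()) ** 2).mean())
print(detached.requires_grad)
```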

In [28]:
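# Hyperparameters
# Batch size
batch_size = 16  # 32, 64, or 128 are also options
# Learning rate for the optimizer
lr = 1e-4
# Number of epochs to run
max_epochs = 2
# Option 2: define a "device", then call .to(device) on anything that should live on the GPU
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")  # use GPU 1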

In [29]:
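# Data loading
# Using the CIFAR-10 dataset as an example of how to build a Dataset
from torchvision import datasets

# "data_transform" applies transformations to the images, such as flipping, cropping, or normalization; define it as needed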
data_transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
                   ])


train_cifar_dataset = datasets.CIFAR10('cifar10',train=True, download=False,transform=data_transform)
test_cifar_dataset = datasets.CIFAR10('cifar10',train=False, download=False,transform=data_transform)

# Once the Dataset is built, use DataLoader to read the data in batches
train_loader = torch.utils.data.DataLoader(train_cifar_dataset, 
                                           batch_size=batch_size, num_workers=4, 
                                           shuffle=True, drop_last=True)

test_loader = torch.utils.data.DataLoader(test_cifar_dataset, 
                                         batch_size=batch_size, num_workers=4, 
                                         shuffle=False)



In [30]:
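# ResNet-50, pretrained
Resnet50 = torchvision.models.resnet50(pretrained=True)
# Assigning to fc.out_features does not resize the layer's weights;
# replace the final layer so the model outputs the 10 CIFAR-10 classes.
Resnet50.fc = nn.Linear(Resnet50.fc.in_features, 10)
print(Resnet50)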



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [16]:
# Training & validation

# Define the loss function and optimizer
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Loss function: the custom loss
criterion = MyLoss()
# Optimizer
optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)
epoch = max_epochs
Resnet50 = Resnet50.to(device)
total_step = len(train_loader)
train_all_loss = []
test_all_loss = []

for i in range(epoch):
    Resnet50.train()
    train_total_loss = 0
    train_total_num = 0
    train_total_correct = 0

    for iter, (images,labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = Resnet50(images)
        loss = criterion(outputs,labels)
        train_total_correct += (outputs.argmax(1) == labels).sum().item()
        
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_total_num += labels.shape[0]
        train_total_loss += loss.item()
        print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:4f}".format(i+1,epoch,iter+1,total_step,loss.item()/labels.shape[0]))
    
    Resnet50.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for iter,(images,labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet50(images)
            loss = criterion(outputs,labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100, test_total_loss / test_total_num, test_total_correct / test_total_num * 100
    
    ))
    train_all_loss.append(np.round(train_total_loss / train_total_num,4))
    test_all_loss.append(np.round(test_total_loss / test_total_num,4))


Epoch [1/10], Iter [1/3125], train_loss:0.710159
Epoch [1/10], Iter [2/3125], train_loss:0.761919
Epoch [1/10], Iter [3/3125], train_loss:0.748266
Epoch [1/10], Iter [4/3125], train_loss:0.777146
Epoch [1/10], Iter [5/3125], train_loss:0.699766
Epoch [1/10], Iter [6/3125], train_loss:0.741773
Epoch [1/10], Iter [7/3125], train_loss:0.687201
Epoch [1/10], Iter [8/3125], train_loss:0.618017
Epoch [1/10], Iter [9/3125], train_loss:0.653016
Epoch [1/10], Iter [10/3125], train_loss:0.690120
Epoch [1/10], Iter [11/3125], train_loss:0.648009
Epoch [1/10], Iter [12/3125], train_loss:0.694650
Epoch [1/10], Iter [13/3125], train_loss:0.502452
Epoch [1/10], Iter [14/3125], train_loss:0.538519
Epoch [1/10], Iter [15/3125], train_loss:0.596250
Epoch [1/10], Iter [16/3125], train_loss:0.607648
Epoch [1/10], Iter [17/3125], train_loss:0.574751
Epoch [1/10], Iter [18/3125], train_loss:0.584658
Epoch [1/10], Iter [19/3125], train_loss:0.428719
Epoch [1/10], Iter [20/3125], train_loss:0.530868
Epoch [1/

Epoch [1/10], Iter [164/3125], train_loss:0.096563
Epoch [1/10], Iter [165/3125], train_loss:0.076501
Epoch [1/10], Iter [166/3125], train_loss:0.147476
Epoch [1/10], Iter [167/3125], train_loss:0.177934
Epoch [1/10], Iter [168/3125], train_loss:0.121549
Epoch [1/10], Iter [169/3125], train_loss:0.124102
Epoch [1/10], Iter [170/3125], train_loss:0.097225
Epoch [1/10], Iter [171/3125], train_loss:0.104199
Epoch [1/10], Iter [172/3125], train_loss:0.150368
Epoch [1/10], Iter [173/3125], train_loss:0.098011
Epoch [1/10], Iter [174/3125], train_loss:0.131318
Epoch [1/10], Iter [175/3125], train_loss:0.120925
Epoch [1/10], Iter [176/3125], train_loss:0.120460
Epoch [1/10], Iter [177/3125], train_loss:0.106729
Epoch [1/10], Iter [178/3125], train_loss:0.161727
Epoch [1/10], Iter [179/3125], train_loss:0.169705
Epoch [1/10], Iter [180/3125], train_loss:0.142939
Epoch [1/10], Iter [181/3125], train_loss:0.120374
Epoch [1/10], Iter [182/3125], train_loss:0.120579
Epoch [1/10], Iter [183/3125], 

Epoch [1/10], Iter [325/3125], train_loss:0.095060
Epoch [1/10], Iter [326/3125], train_loss:0.090416
Epoch [1/10], Iter [327/3125], train_loss:0.068069
Epoch [1/10], Iter [328/3125], train_loss:0.110763
Epoch [1/10], Iter [329/3125], train_loss:0.060889
Epoch [1/10], Iter [330/3125], train_loss:0.110807
Epoch [1/10], Iter [331/3125], train_loss:0.122002
Epoch [1/10], Iter [332/3125], train_loss:0.115815
Epoch [1/10], Iter [333/3125], train_loss:0.067004
Epoch [1/10], Iter [334/3125], train_loss:0.063815
Epoch [1/10], Iter [335/3125], train_loss:0.120017
Epoch [1/10], Iter [336/3125], train_loss:0.104086
Epoch [1/10], Iter [337/3125], train_loss:0.091577
Epoch [1/10], Iter [338/3125], train_loss:0.084077
Epoch [1/10], Iter [339/3125], train_loss:0.113410
Epoch [1/10], Iter [340/3125], train_loss:0.061866
Epoch [1/10], Iter [341/3125], train_loss:0.101881
Epoch [1/10], Iter [342/3125], train_loss:0.107144
Epoch [1/10], Iter [343/3125], train_loss:0.142906
Epoch [1/10], Iter [344/3125], 

Epoch [1/10], Iter [486/3125], train_loss:0.100552
Epoch [1/10], Iter [487/3125], train_loss:0.087380
Epoch [1/10], Iter [488/3125], train_loss:0.121468
Epoch [1/10], Iter [489/3125], train_loss:0.097617
Epoch [1/10], Iter [490/3125], train_loss:0.104743
Epoch [1/10], Iter [491/3125], train_loss:0.078716
Epoch [1/10], Iter [492/3125], train_loss:0.098265
Epoch [1/10], Iter [493/3125], train_loss:0.082094
Epoch [1/10], Iter [494/3125], train_loss:0.087327
Epoch [1/10], Iter [495/3125], train_loss:0.069399
Epoch [1/10], Iter [496/3125], train_loss:0.066200
Epoch [1/10], Iter [497/3125], train_loss:0.068601
Epoch [1/10], Iter [498/3125], train_loss:0.126001
Epoch [1/10], Iter [499/3125], train_loss:0.085090
Epoch [1/10], Iter [500/3125], train_loss:0.109014
Epoch [1/10], Iter [501/3125], train_loss:0.106699
Epoch [1/10], Iter [502/3125], train_loss:0.082973
Epoch [1/10], Iter [503/3125], train_loss:0.095683
Epoch [1/10], Iter [504/3125], train_loss:0.113937
Epoch [1/10], Iter [505/3125], 

Epoch [1/10], Iter [647/3125], train_loss:0.081452
Epoch [1/10], Iter [648/3125], train_loss:0.097151
Epoch [1/10], Iter [649/3125], train_loss:0.070104
Epoch [1/10], Iter [650/3125], train_loss:0.094944
Epoch [1/10], Iter [651/3125], train_loss:0.056059
Epoch [1/10], Iter [652/3125], train_loss:0.065773
Epoch [1/10], Iter [653/3125], train_loss:0.087860
Epoch [1/10], Iter [654/3125], train_loss:0.088647
Epoch [1/10], Iter [655/3125], train_loss:0.074508
Epoch [1/10], Iter [656/3125], train_loss:0.078260
Epoch [1/10], Iter [657/3125], train_loss:0.068859
Epoch [1/10], Iter [658/3125], train_loss:0.080638
Epoch [1/10], Iter [659/3125], train_loss:0.101420
Epoch [1/10], Iter [660/3125], train_loss:0.084931
Epoch [1/10], Iter [661/3125], train_loss:0.066806
Epoch [1/10], Iter [662/3125], train_loss:0.105629
Epoch [1/10], Iter [663/3125], train_loss:0.084870
Epoch [1/10], Iter [664/3125], train_loss:0.071970
Epoch [1/10], Iter [665/3125], train_loss:0.087836
Epoch [1/10], Iter [666/3125], 

Epoch [1/10], Iter [808/3125], train_loss:0.094673
Epoch [1/10], Iter [809/3125], train_loss:0.058413
Epoch [1/10], Iter [810/3125], train_loss:0.068775
Epoch [1/10], Iter [811/3125], train_loss:0.082067
Epoch [1/10], Iter [812/3125], train_loss:0.069499
Epoch [1/10], Iter [813/3125], train_loss:0.046804
Epoch [1/10], Iter [814/3125], train_loss:0.052497
Epoch [1/10], Iter [815/3125], train_loss:0.039903
Epoch [1/10], Iter [816/3125], train_loss:0.075335
Epoch [1/10], Iter [817/3125], train_loss:0.118900
Epoch [1/10], Iter [818/3125], train_loss:0.095827
Epoch [1/10], Iter [819/3125], train_loss:0.080276
Epoch [1/10], Iter [820/3125], train_loss:0.078976
Epoch [1/10], Iter [821/3125], train_loss:0.067389
Epoch [1/10], Iter [822/3125], train_loss:0.039839
Epoch [1/10], Iter [823/3125], train_loss:0.084257
Epoch [1/10], Iter [824/3125], train_loss:0.086442
Epoch [1/10], Iter [825/3125], train_loss:0.067308
Epoch [1/10], Iter [826/3125], train_loss:0.065607
Epoch [1/10], Iter [827/3125], 

Epoch [1/10], Iter [969/3125], train_loss:0.064269
Epoch [1/10], Iter [970/3125], train_loss:0.078429
Epoch [1/10], Iter [971/3125], train_loss:0.053220
Epoch [1/10], Iter [972/3125], train_loss:0.059810
Epoch [1/10], Iter [973/3125], train_loss:0.061482
Epoch [1/10], Iter [974/3125], train_loss:0.059918
Epoch [1/10], Iter [975/3125], train_loss:0.095541
Epoch [1/10], Iter [976/3125], train_loss:0.066343
Epoch [1/10], Iter [977/3125], train_loss:0.063362
Epoch [1/10], Iter [978/3125], train_loss:0.049746
Epoch [1/10], Iter [979/3125], train_loss:0.076230
Epoch [1/10], Iter [980/3125], train_loss:0.085253
Epoch [1/10], Iter [981/3125], train_loss:0.055329
Epoch [1/10], Iter [982/3125], train_loss:0.073866
Epoch [1/10], Iter [983/3125], train_loss:0.090456
Epoch [1/10], Iter [984/3125], train_loss:0.065264
Epoch [1/10], Iter [985/3125], train_loss:0.094808
Epoch [1/10], Iter [986/3125], train_loss:0.083755
Epoch [1/10], Iter [987/3125], train_loss:0.100000
Epoch [1/10], Iter [988/3125], 

Epoch [1/10], Iter [1128/3125], train_loss:0.080108
Epoch [1/10], Iter [1129/3125], train_loss:0.088471
Epoch [1/10], Iter [1130/3125], train_loss:0.062608
Epoch [1/10], Iter [1131/3125], train_loss:0.029030
Epoch [1/10], Iter [1132/3125], train_loss:0.102873
Epoch [1/10], Iter [1133/3125], train_loss:0.044108
Epoch [1/10], Iter [1134/3125], train_loss:0.062481
Epoch [1/10], Iter [1135/3125], train_loss:0.070823
Epoch [1/10], Iter [1136/3125], train_loss:0.056807
Epoch [1/10], Iter [1137/3125], train_loss:0.086398
Epoch [1/10], Iter [1138/3125], train_loss:0.070901
Epoch [1/10], Iter [1139/3125], train_loss:0.057244
Epoch [1/10], Iter [1140/3125], train_loss:0.084820
Epoch [1/10], Iter [1141/3125], train_loss:0.060651
Epoch [1/10], Iter [1142/3125], train_loss:0.050026
Epoch [1/10], Iter [1143/3125], train_loss:0.051782
Epoch [1/10], Iter [1144/3125], train_loss:0.078317
Epoch [1/10], Iter [1145/3125], train_loss:0.101919
Epoch [1/10], Iter [1146/3125], train_loss:0.066825
Epoch [1/10]

Epoch [1/10], Iter [1286/3125], train_loss:0.041231
Epoch [1/10], Iter [1287/3125], train_loss:0.072158
Epoch [1/10], Iter [1288/3125], train_loss:0.037460
Epoch [1/10], Iter [1289/3125], train_loss:0.052904
Epoch [1/10], Iter [1290/3125], train_loss:0.051290
Epoch [1/10], Iter [1291/3125], train_loss:0.076521
Epoch [1/10], Iter [1292/3125], train_loss:0.045308
Epoch [1/10], Iter [1293/3125], train_loss:0.077797
Epoch [1/10], Iter [1294/3125], train_loss:0.050401
Epoch [1/10], Iter [1295/3125], train_loss:0.054285
Epoch [1/10], Iter [1296/3125], train_loss:0.071456
Epoch [1/10], Iter [1297/3125], train_loss:0.069530
Epoch [1/10], Iter [1298/3125], train_loss:0.063551
Epoch [1/10], Iter [1299/3125], train_loss:0.060730
Epoch [1/10], Iter [1300/3125], train_loss:0.054880
Epoch [1/10], Iter [1301/3125], train_loss:0.049532
Epoch [1/10], Iter [1302/3125], train_loss:0.069171
Epoch [1/10], Iter [1303/3125], train_loss:0.061904
Epoch [1/10], Iter [1304/3125], train_loss:0.047012
Epoch [1/10]

Epoch [1/10], Iter [1444/3125], train_loss:0.031921
Epoch [1/10], Iter [1445/3125], train_loss:0.082468
Epoch [1/10], Iter [1446/3125], train_loss:0.066029
Epoch [1/10], Iter [1447/3125], train_loss:0.079104
Epoch [1/10], Iter [1448/3125], train_loss:0.050547
Epoch [1/10], Iter [1449/3125], train_loss:0.070847
Epoch [1/10], Iter [1450/3125], train_loss:0.066685
Epoch [1/10], Iter [1451/3125], train_loss:0.062502
Epoch [1/10], Iter [1452/3125], train_loss:0.039792
Epoch [1/10], Iter [1453/3125], train_loss:0.074898
Epoch [1/10], Iter [1454/3125], train_loss:0.082731
Epoch [1/10], Iter [1455/3125], train_loss:0.051062
Epoch [1/10], Iter [1456/3125], train_loss:0.081949
Epoch [1/10], Iter [1457/3125], train_loss:0.048781
Epoch [1/10], Iter [1458/3125], train_loss:0.031672
Epoch [1/10], Iter [1459/3125], train_loss:0.081797
Epoch [1/10], Iter [1460/3125], train_loss:0.043624
Epoch [1/10], Iter [1461/3125], train_loss:0.042655
Epoch [1/10], Iter [1462/3125], train_loss:0.065425
Epoch [1/10]

Epoch [1/10], Iter [1602/3125], train_loss:0.035398
Epoch [1/10], Iter [1603/3125], train_loss:0.082975
Epoch [1/10], Iter [1604/3125], train_loss:0.069643
Epoch [1/10], Iter [1605/3125], train_loss:0.074299
Epoch [1/10], Iter [1606/3125], train_loss:0.036288
Epoch [1/10], Iter [1607/3125], train_loss:0.089655
Epoch [1/10], Iter [1608/3125], train_loss:0.052850
Epoch [1/10], Iter [1609/3125], train_loss:0.103227
Epoch [1/10], Iter [1610/3125], train_loss:0.021318
Epoch [1/10], Iter [1611/3125], train_loss:0.053062
Epoch [1/10], Iter [1612/3125], train_loss:0.064742
Epoch [1/10], Iter [1613/3125], train_loss:0.041883
Epoch [1/10], Iter [1614/3125], train_loss:0.046411
Epoch [1/10], Iter [1615/3125], train_loss:0.058942
Epoch [1/10], Iter [1616/3125], train_loss:0.044977
Epoch [1/10], Iter [1617/3125], train_loss:0.041410
Epoch [1/10], Iter [1618/3125], train_loss:0.084004
Epoch [1/10], Iter [1619/3125], train_loss:0.064973
Epoch [1/10], Iter [1620/3125], train_loss:0.083455
Epoch [1/10]

Epoch [1/10], Iter [1760/3125], train_loss:0.039222
Epoch [1/10], Iter [1761/3125], train_loss:0.071271
Epoch [1/10], Iter [1762/3125], train_loss:0.043728
Epoch [1/10], Iter [1763/3125], train_loss:0.060507
Epoch [1/10], Iter [1764/3125], train_loss:0.072506
Epoch [1/10], Iter [1765/3125], train_loss:0.056758
Epoch [1/10], Iter [1766/3125], train_loss:0.043773
Epoch [1/10], Iter [1767/3125], train_loss:0.053143
Epoch [1/10], Iter [1768/3125], train_loss:0.092098
Epoch [1/10], Iter [1769/3125], train_loss:0.027869
Epoch [1/10], Iter [1770/3125], train_loss:0.057473
Epoch [1/10], Iter [1771/3125], train_loss:0.060365
Epoch [1/10], Iter [1772/3125], train_loss:0.040789
Epoch [1/10], Iter [1773/3125], train_loss:0.064049
Epoch [1/10], Iter [1774/3125], train_loss:0.063056
Epoch [1/10], Iter [1775/3125], train_loss:0.051557
Epoch [1/10], Iter [1776/3125], train_loss:0.054645
Epoch [1/10], Iter [1777/3125], train_loss:0.039127
Epoch [1/10], Iter [1778/3125], train_loss:0.024407
Epoch [1/10]

Epoch [1/10], Iter [1918/3125], train_loss:0.062682
Epoch [1/10], Iter [1919/3125], train_loss:0.073875
Epoch [1/10], Iter [1920/3125], train_loss:0.059812
Epoch [1/10], Iter [1921/3125], train_loss:0.049579
Epoch [1/10], Iter [1922/3125], train_loss:0.111791
Epoch [1/10], Iter [1923/3125], train_loss:0.076176
Epoch [1/10], Iter [1924/3125], train_loss:0.049307
Epoch [1/10], Iter [1925/3125], train_loss:0.037029
Epoch [1/10], Iter [1926/3125], train_loss:0.078327
Epoch [1/10], Iter [1927/3125], train_loss:0.073983
Epoch [1/10], Iter [1928/3125], train_loss:0.071034
Epoch [1/10], Iter [1929/3125], train_loss:0.072575
Epoch [1/10], Iter [1930/3125], train_loss:0.035677
Epoch [1/10], Iter [1931/3125], train_loss:0.078652
Epoch [1/10], Iter [1932/3125], train_loss:0.050624
Epoch [1/10], Iter [1933/3125], train_loss:0.061268
Epoch [1/10], Iter [1934/3125], train_loss:0.030012
Epoch [1/10], Iter [1935/3125], train_loss:0.064447
Epoch [1/10], Iter [1936/3125], train_loss:0.067326
Epoch [1/10]

Epoch [1/10], Iter [2076/3125], train_loss:0.060907
Epoch [1/10], Iter [2077/3125], train_loss:0.055302
Epoch [1/10], Iter [2078/3125], train_loss:0.063130
Epoch [1/10], Iter [2079/3125], train_loss:0.041546
Epoch [1/10], Iter [2080/3125], train_loss:0.079889
Epoch [1/10], Iter [2081/3125], train_loss:0.059205
Epoch [1/10], Iter [2082/3125], train_loss:0.077855
Epoch [1/10], Iter [2083/3125], train_loss:0.040796
Epoch [1/10], Iter [2084/3125], train_loss:0.063951
Epoch [1/10], Iter [2085/3125], train_loss:0.060815
Epoch [1/10], Iter [2086/3125], train_loss:0.105773
Epoch [1/10], Iter [2087/3125], train_loss:0.055865
Epoch [1/10], Iter [2088/3125], train_loss:0.058389
Epoch [1/10], Iter [2089/3125], train_loss:0.085886
Epoch [1/10], Iter [2090/3125], train_loss:0.037964
Epoch [1/10], Iter [2091/3125], train_loss:0.037571
Epoch [1/10], Iter [2092/3125], train_loss:0.051286
Epoch [1/10], Iter [2093/3125], train_loss:0.072742
Epoch [1/10], Iter [2094/3125], train_loss:0.027918
Epoch [1/10]

Epoch [1/10], Iter [2234/3125], train_loss:0.046060
Epoch [1/10], Iter [2235/3125], train_loss:0.073936
Epoch [1/10], Iter [2236/3125], train_loss:0.048040
Epoch [1/10], Iter [2237/3125], train_loss:0.044033
Epoch [1/10], Iter [2238/3125], train_loss:0.058578
Epoch [1/10], Iter [2239/3125], train_loss:0.046442
Epoch [1/10], Iter [2240/3125], train_loss:0.070717
Epoch [1/10], Iter [2241/3125], train_loss:0.057559
Epoch [1/10], Iter [2242/3125], train_loss:0.071514
Epoch [1/10], Iter [2243/3125], train_loss:0.072684
Epoch [1/10], Iter [2244/3125], train_loss:0.071098
Epoch [1/10], Iter [2245/3125], train_loss:0.029106
Epoch [1/10], Iter [2246/3125], train_loss:0.047889
Epoch [1/10], Iter [2247/3125], train_loss:0.074630
Epoch [1/10], Iter [2248/3125], train_loss:0.039345
Epoch [1/10], Iter [2249/3125], train_loss:0.076240
Epoch [1/10], Iter [2250/3125], train_loss:0.046938
Epoch [1/10], Iter [2251/3125], train_loss:0.051236
Epoch [1/10], Iter [2252/3125], train_loss:0.060951
Epoch [1/10]

Epoch [1/10], Iter [2392/3125], train_loss:0.038500
Epoch [1/10], Iter [2393/3125], train_loss:0.043009
Epoch [1/10], Iter [2394/3125], train_loss:0.045287
Epoch [1/10], Iter [2395/3125], train_loss:0.052948
Epoch [1/10], Iter [2396/3125], train_loss:0.096492
Epoch [1/10], Iter [2397/3125], train_loss:0.084607
Epoch [1/10], Iter [2398/3125], train_loss:0.018984
Epoch [1/10], Iter [2399/3125], train_loss:0.058866
Epoch [1/10], Iter [2400/3125], train_loss:0.054521
Epoch [1/10], Iter [2401/3125], train_loss:0.035970
Epoch [1/10], Iter [2402/3125], train_loss:0.083726
Epoch [1/10], Iter [2403/3125], train_loss:0.040679
Epoch [1/10], Iter [2404/3125], train_loss:0.065046
Epoch [1/10], Iter [2405/3125], train_loss:0.094652
Epoch [1/10], Iter [2406/3125], train_loss:0.059551
Epoch [1/10], Iter [2407/3125], train_loss:0.065810
Epoch [1/10], Iter [2408/3125], train_loss:0.050208
Epoch [1/10], Iter [2409/3125], train_loss:0.066216
Epoch [1/10], Iter [2410/3125], train_loss:0.058400
Epoch [1/10]

Epoch [1/10], Iter [2550/3125], train_loss:0.078102
Epoch [1/10], Iter [2551/3125], train_loss:0.022630
Epoch [1/10], Iter [2552/3125], train_loss:0.032897
Epoch [1/10], Iter [2553/3125], train_loss:0.050063
Epoch [1/10], Iter [2554/3125], train_loss:0.053164
Epoch [1/10], Iter [2555/3125], train_loss:0.033120
Epoch [1/10], Iter [2556/3125], train_loss:0.046334
Epoch [1/10], Iter [2557/3125], train_loss:0.068456
Epoch [1/10], Iter [2558/3125], train_loss:0.070154
Epoch [1/10], Iter [2559/3125], train_loss:0.036025
Epoch [1/10], Iter [2560/3125], train_loss:0.070635
Epoch [1/10], Iter [2561/3125], train_loss:0.052198
Epoch [1/10], Iter [2562/3125], train_loss:0.043804
Epoch [1/10], Iter [2563/3125], train_loss:0.067197
Epoch [1/10], Iter [2564/3125], train_loss:0.080402
Epoch [1/10], Iter [2565/3125], train_loss:0.071421
Epoch [1/10], Iter [2566/3125], train_loss:0.044109
Epoch [1/10], Iter [2567/3125], train_loss:0.063801
Epoch [1/10], Iter [2568/3125], train_loss:0.075022
Epoch [1/10]

Epoch [1/10], Iter [2708/3125], train_loss:0.042753
Epoch [1/10], Iter [2709/3125], train_loss:0.054325
Epoch [1/10], Iter [2710/3125], train_loss:0.029269
Epoch [1/10], Iter [2711/3125], train_loss:0.056201
Epoch [1/10], Iter [2712/3125], train_loss:0.032027
Epoch [1/10], Iter [2713/3125], train_loss:0.041384
Epoch [1/10], Iter [2714/3125], train_loss:0.042245
Epoch [1/10], Iter [2715/3125], train_loss:0.049180
Epoch [1/10], Iter [2716/3125], train_loss:0.071382
Epoch [1/10], Iter [2717/3125], train_loss:0.053056
Epoch [1/10], Iter [2718/3125], train_loss:0.076437
Epoch [1/10], Iter [2719/3125], train_loss:0.036449
Epoch [1/10], Iter [2720/3125], train_loss:0.037378
Epoch [1/10], Iter [2721/3125], train_loss:0.056445
Epoch [1/10], Iter [2722/3125], train_loss:0.070102
Epoch [1/10], Iter [2723/3125], train_loss:0.032661
Epoch [1/10], Iter [2724/3125], train_loss:0.045753
Epoch [1/10], Iter [2725/3125], train_loss:0.051136
Epoch [1/10], Iter [2726/3125], train_loss:0.048787
Epoch [1/10]

Epoch [1/10], Iter [2866/3125], train_loss:0.054262
Epoch [1/10], Iter [2867/3125], train_loss:0.032128
Epoch [1/10], Iter [2868/3125], train_loss:0.070486
Epoch [1/10], Iter [2869/3125], train_loss:0.050579
Epoch [1/10], Iter [2870/3125], train_loss:0.048929
Epoch [1/10], Iter [2871/3125], train_loss:0.059329
Epoch [1/10], Iter [2872/3125], train_loss:0.059987
Epoch [1/10], Iter [2873/3125], train_loss:0.038087
Epoch [1/10], Iter [2874/3125], train_loss:0.042215
Epoch [1/10], Iter [2875/3125], train_loss:0.037359
Epoch [1/10], Iter [2876/3125], train_loss:0.064945
Epoch [1/10], Iter [2877/3125], train_loss:0.032644
Epoch [1/10], Iter [2878/3125], train_loss:0.035471
Epoch [1/10], Iter [2879/3125], train_loss:0.054034
Epoch [1/10], Iter [2880/3125], train_loss:0.055840
Epoch [1/10], Iter [2881/3125], train_loss:0.040988
Epoch [1/10], Iter [2882/3125], train_loss:0.076851
Epoch [1/10], Iter [2883/3125], train_loss:0.084683
Epoch [1/10], Iter [2884/3125], train_loss:0.052963
Epoch [1/10]

Epoch [1/10], Iter [3024/3125], train_loss:0.068196
Epoch [1/10], Iter [3025/3125], train_loss:0.039287
Epoch [1/10], Iter [3026/3125], train_loss:0.052125
Epoch [1/10], Iter [3027/3125], train_loss:0.025400
Epoch [1/10], Iter [3028/3125], train_loss:0.066438
Epoch [1/10], Iter [3029/3125], train_loss:0.038479
Epoch [1/10], Iter [3030/3125], train_loss:0.057109
Epoch [1/10], Iter [3031/3125], train_loss:0.034795
Epoch [1/10], Iter [3032/3125], train_loss:0.027901
Epoch [1/10], Iter [3033/3125], train_loss:0.050128
Epoch [1/10], Iter [3034/3125], train_loss:0.032854
Epoch [1/10], Iter [3035/3125], train_loss:0.053708
Epoch [1/10], Iter [3036/3125], train_loss:0.088014
Epoch [1/10], Iter [3037/3125], train_loss:0.075370
Epoch [1/10], Iter [3038/3125], train_loss:0.075677
Epoch [1/10], Iter [3039/3125], train_loss:0.063172
Epoch [1/10], Iter [3040/3125], train_loss:0.076501
Epoch [1/10], Iter [3041/3125], train_loss:0.058156
Epoch [1/10], Iter [3042/3125], train_loss:0.061623
Epoch [1/10]

NameError: name 'test_loader' is not defined

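### 2. Dynamic learning rate adjustment
#### 2.1 torch.optim.lr_scheduler
Problems with choosing a learning rate:
- 1. If the learning rate is too small, convergence slows down drastically and training takes longer.
- 2. If the learning rate is too large, the parameters may oscillate back and forth around the optimum.

Both problems come from a learning rate setting that does not match what training needs. The solution:
- PyTorch provides learning rate schedulers.

Dynamic learning rate schedulers available in the official `torch.optim.lr_scheduler` API: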
- [lr_scheduler.LambdaLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.LambdaLR.html#torch.optim.lr_scheduler.LambdaLR)

- [lr_scheduler.MultiplicativeLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.MultiplicativeLR.html#torch.optim.lr_scheduler.MultiplicativeLR)

- [lr_scheduler.StepLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html#torch.optim.lr_scheduler.StepLR)

- [lr_scheduler.MultiStepLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.MultiStepLR.html#torch.optim.lr_scheduler.MultiStepLR)

- lr_scheduler.ExponentialLR

- lr_scheduler.CosineAnnealingLR

- lr_scheduler.ReduceLROnPlateau

- lr_scheduler.CyclicLR

- lr_scheduler.OneCycleLR

- lr_scheduler.CosineAnnealingWarmRestarts
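
To show how these schedulers are typically driven, here is a minimal sketch using `StepLR`. It assumes a single dummy parameter instead of a real model, and the values `lr=0.1`, `step_size=2`, and `gamma=0.1` are arbitrary choices for illustration.

```python
import torch
from torch.optim.lr_scheduler import StepLR

param = torch.nn.Parameter(torch.zeros(1))   # dummy parameter, no real model
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = StepLR(optimizer, step_size=2, gamma=0.1)

lrs = []
for epoch in range(6):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()      # a real loop would train for one epoch here
    scheduler.step()      # then advance the schedule once per epoch

print(lrs)
```

The learning rate stays constant for `step_size` epochs and is then multiplied by `gamma`. Calling `scheduler.step()` after `optimizer.step()` is the order the PyTorch documentation recommends.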

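#### 2.2 torch.optim.lr_scheduler.LambdaLR

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

```python
# How LambdaLR computes the learning rate
lr_lambda = f(epoch)
new_lr = lr_lambda * init_lr
```

Idea: the initial learning rate is multiplied by a factor. Because the factor is always applied to the initial learning rate rather than to the current one, it is usually a function of the epoch.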

```python
# Pseudocode: train(...) and validate(...) are placeholders
lambda1 = lambda epoch: 1 / (epoch + 1)
scheduler = LambdaLR(optimizer, lr_lambda=lambda1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```
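
A runnable version of the same idea might look like the sketch below. It assumes a dummy parameter and an initial learning rate of 0.1, neither of which comes from this notebook, and it records the learning rate used in each epoch.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

init_lr = 0.1                                 # assumed value for illustration
param = torch.nn.Parameter(torch.zeros(1))    # dummy parameter, no real model
optimizer = torch.optim.SGD([param], lr=init_lr)
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 1 / (epoch + 1))

lambda_lrs = []
for epoch in range(4):
    lambda_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()      # placeholder for one epoch of training
    scheduler.step()

print(lambda_lrs)
```

Each recorded value equals `init_lr / (epoch + 1)`, which confirms that the factor is applied to the initial learning rate and not to the previous one.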


![image-2.png](attachment:image-2.png)

#### MultiplicativeLR
torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

Unlike LambdaLR, this scheduler multiplies the previous learning rate by lr_lambda, so the lr_lambda function usually does not need to depend on the epoch.

```python
new_lr = lr_lambda * old_lr

```
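
The contrast with LambdaLR can be checked with a short sketch. It assumes a dummy parameter, an initial learning rate of 0.1, and a constant factor of 0.5, all chosen only for illustration.

```python
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

init_lr = 0.1                                 # assumed value for illustration
param = torch.nn.Parameter(torch.zeros(1))    # dummy parameter, no real model
optimizer = torch.optim.SGD([param], lr=init_lr)
# A constant factor: every step halves the *current* learning rate.
scheduler = MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.5)

mult_lrs = []
for epoch in range(4):
    mult_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()      # placeholder for one epoch of training
    scheduler.step()

print(mult_lrs)
```

Because the factor compounds, a constant `lr_lambda` already yields an exponential decay of the form `init_lr * 0.5 ** epoch`.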

![image.png](attachment:image.png)

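#### 2.3 Custom scheduler
What if none of the official dynamic learning rate APIs meets our needs?

##### We can change the `lr` value in each param_group with a custom function, adjust_learning_rate

- 1. None of the official APIs meets the requirement.
- 2. We implement the learning rate schedule ourselves in adjust_learning_rate.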

```python
# Call the learning rate function during training
optimizer = torch.optim.SGD(model.parameters(),lr = args.lr,momentum = 0.9)
for epoch in range(10):
    train(...)
    validate(...)
    adjust_learning_rate(optimizer,epoch)
```

In [31]:
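# Step decay: every `size` (default 10) epochs, counting from epoch index 0,
# set the learning rate to the initial lr times gamma ** ((epoch + 1) // size)
def adjust_learning_rate(optim, epoch, size=10, gamma=0.1):
    if (epoch + 1) % size == 0:
        power = (epoch + 1) // size
        # lr is the initial learning rate from the hyperparameter cell
        new_lr = lr * np.power(gamma, power)
        for param_group in optim.param_groups:
            param_group['lr'] = new_lr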
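
The sketch below exercises such a function on a dummy optimizer. To stay self-contained it passes the initial learning rate as an explicit `base_lr` argument, which is an assumption of this example and differs from the notebook version that reads a global variable.

```python
import torch


def adjust_learning_rate(optimizer, epoch, base_lr, size=10, gamma=0.1):
    """Every `size` epochs set lr = base_lr * gamma ** ((epoch + 1) // size)."""
    if (epoch + 1) % size == 0:
        new_lr = base_lr * gamma ** ((epoch + 1) // size)
        for param_group in optimizer.param_groups:
            param_group["lr"] = new_lr


base_lr = 0.1                                 # assumed value for illustration
param = torch.nn.Parameter(torch.zeros(1))    # dummy parameter, no real model
optimizer = torch.optim.SGD([param], lr=base_lr, momentum=0.9)

history = {}
for epoch in range(30):
    optimizer.step()                          # placeholder for train/validate
    adjust_learning_rate(optimizer, epoch, base_lr)
    history[epoch] = optimizer.param_groups[0]["lr"]

print(history[8], history[9], history[19], history[29])
```

Writing to `param_group["lr"]` takes effect on the next `optimizer.step()`, so no scheduler object is needed for this approach.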

#### Code example
- lr_scheduler.LambdaLR
- adjust_learning_rate

In [None]:

# Training & validation
writer = SummaryWriter("../train_skills")
# Define the loss function and optimizer
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimizer
optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)

# LambdaLR scheduler with a custom lambda
scheduler_my = LambdaLR(optimizer, lr_lambda=lambda epoch: 1/(epoch+1),verbose = True)
print("Initial learning rate:", optimizer.defaults['lr'])

epoch = max_epochs
Resnet50 = Resnet50.to(device)
total_step = len(train_loader)
train_all_loss = []
test_all_loss = []

for i in range(epoch):
    Resnet50.train()
    train_total_loss = 0
    train_total_num = 0
    train_total_correct = 0

    for iter, (images,labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = Resnet50(images)
        loss = criterion(outputs,labels)
        train_total_correct += (outputs.argmax(1) == labels).sum().item()
        
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
       
        
        train_total_num += labels.shape[0]
        train_total_loss += loss.item()
        print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:4f}".format(i+1,epoch,iter+1,total_step,loss.item()/labels.shape[0]))
    
    writer.add_scalar("lr", optimizer.param_groups[0]['lr'], i)
    
    print("Learning rate for epoch %d: %f" % (i + 1, optimizer.param_groups[0]['lr']))
    scheduler_my.step()  # update the learning rate once per epoch
    # Alternative: adjust the lr with the custom function
#     adjust_learning_rate(optimizer, i)
    
    Resnet50.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for iter,(images,labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet50(images)
            loss = criterion(outputs,labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100, test_total_loss / test_total_num, test_total_correct / test_total_num * 100
    
    ))
    train_all_loss.append(np.round(train_total_loss / train_total_num,4))
    test_all_loss.append(np.round(test_total_loss / test_total_num,4))
writer.close()

Adjusting learning rate of group 0 to 1.0000e-04.
Initial learning rate: 0.0001
Epoch [1/2], Iter [1/3125], train_loss:0.777986
Epoch [1/2], Iter [2/3125], train_loss:0.662992
Epoch [1/2], Iter [3/3125], train_loss:0.767887
Epoch [1/2], Iter [4/3125], train_loss:0.748286
Epoch [1/2], Iter [5/3125], train_loss:0.686887
Epoch [1/2], Iter [6/3125], train_loss:0.675070
Epoch [1/2], Iter [7/3125], train_loss:0.655532
Epoch [1/2], Iter [8/3125], train_loss:0.713970
Epoch [1/2], Iter [9/3125], train_loss:0.675706
Epoch [1/2], Iter [10/3125], train_loss:0.665308
Epoch [1/2], Iter [11/3125], train_loss:0.670263
Epoch [1/2], Iter [12/3125], train_loss:0.597091
Epoch [1/2], Iter [13/3125], train_loss:0.541138
Epoch [1/2], Iter [14/3125], train_loss:0.471112
Epoch [1/2], Iter [15/3125], train_loss:0.570017
Epoch [1/2], Iter [16/3125], train_loss:0.569556
Epoch [1/2], Iter [17/3125], train_loss:0.552114
Epoch [1/2], Iter [18/3125], train_loss:0.569929
Epoch [1/2], Iter [19/3125], train_loss:0.524716
...

Epoch [1/2], Iter [166/3125], train_loss:0.122599
Epoch [1/2], Iter [167/3125], train_loss:0.108223
Epoch [1/2], Iter [168/3125], train_loss:0.157398
Epoch [1/2], Iter [169/3125], train_loss:0.112632
Epoch [1/2], Iter [170/3125], train_loss:0.092063
Epoch [1/2], Iter [171/3125], train_loss:0.092099
Epoch [1/2], Iter [172/3125], train_loss:0.143247
Epoch [1/2], Iter [173/3125], train_loss:0.107952
Epoch [1/2], Iter [174/3125], train_loss:0.150982
Epoch [1/2], Iter [175/3125], train_loss:0.154513
Epoch [1/2], Iter [176/3125], train_loss:0.122460
Epoch [1/2], Iter [177/3125], train_loss:0.130054
Epoch [1/2], Iter [178/3125], train_loss:0.075364
Epoch [1/2], Iter [179/3125], train_loss:0.092844
Epoch [1/2], Iter [180/3125], train_loss:0.131176
Epoch [1/2], Iter [181/3125], train_loss:0.089559
Epoch [1/2], Iter [182/3125], train_loss:0.137490
Epoch [1/2], Iter [183/3125], train_loss:0.148960
Epoch [1/2], Iter [184/3125], train_loss:0.088713
Epoch [1/2], Iter [185/3125], train_loss:0.098040


Epoch [1/2], Iter [330/3125], train_loss:0.123866
Epoch [1/2], Iter [331/3125], train_loss:0.139623
Epoch [1/2], Iter [332/3125], train_loss:0.097267
Epoch [1/2], Iter [333/3125], train_loss:0.087837
Epoch [1/2], Iter [334/3125], train_loss:0.079422
Epoch [1/2], Iter [335/3125], train_loss:0.085209
Epoch [1/2], Iter [336/3125], train_loss:0.147867
Epoch [1/2], Iter [337/3125], train_loss:0.149562
Epoch [1/2], Iter [338/3125], train_loss:0.107306
Epoch [1/2], Iter [339/3125], train_loss:0.114367
Epoch [1/2], Iter [340/3125], train_loss:0.075745
Epoch [1/2], Iter [341/3125], train_loss:0.081646
Epoch [1/2], Iter [342/3125], train_loss:0.114543
Epoch [1/2], Iter [343/3125], train_loss:0.107771
Epoch [1/2], Iter [344/3125], train_loss:0.091723
Epoch [1/2], Iter [345/3125], train_loss:0.085628
Epoch [1/2], Iter [346/3125], train_loss:0.069710
Epoch [1/2], Iter [347/3125], train_loss:0.080913
Epoch [1/2], Iter [348/3125], train_loss:0.078024
Epoch [1/2], Iter [349/3125], train_loss:0.132719


Epoch [1/2], Iter [494/3125], train_loss:0.061046
Epoch [1/2], Iter [495/3125], train_loss:0.103997
Epoch [1/2], Iter [496/3125], train_loss:0.109734
Epoch [1/2], Iter [497/3125], train_loss:0.070913
Epoch [1/2], Iter [498/3125], train_loss:0.069599
Epoch [1/2], Iter [499/3125], train_loss:0.078603
Epoch [1/2], Iter [500/3125], train_loss:0.133940
Epoch [1/2], Iter [501/3125], train_loss:0.072970
Epoch [1/2], Iter [502/3125], train_loss:0.075337
Epoch [1/2], Iter [503/3125], train_loss:0.094221
Epoch [1/2], Iter [504/3125], train_loss:0.091344
Epoch [1/2], Iter [505/3125], train_loss:0.085541
Epoch [1/2], Iter [506/3125], train_loss:0.089418
Epoch [1/2], Iter [507/3125], train_loss:0.066250
Epoch [1/2], Iter [508/3125], train_loss:0.112804
Epoch [1/2], Iter [509/3125], train_loss:0.084062
Epoch [1/2], Iter [510/3125], train_loss:0.087550
Epoch [1/2], Iter [511/3125], train_loss:0.073422
Epoch [1/2], Iter [512/3125], train_loss:0.089989
Epoch [1/2], Iter [513/3125], train_loss:0.056597


Epoch [1/2], Iter [658/3125], train_loss:0.080975
Epoch [1/2], Iter [659/3125], train_loss:0.084412
Epoch [1/2], Iter [660/3125], train_loss:0.081507
Epoch [1/2], Iter [661/3125], train_loss:0.106032
Epoch [1/2], Iter [662/3125], train_loss:0.044990
Epoch [1/2], Iter [663/3125], train_loss:0.071733
Epoch [1/2], Iter [664/3125], train_loss:0.068678
Epoch [1/2], Iter [665/3125], train_loss:0.060852
Epoch [1/2], Iter [666/3125], train_loss:0.061496
Epoch [1/2], Iter [667/3125], train_loss:0.099616
Epoch [1/2], Iter [668/3125], train_loss:0.043187
Epoch [1/2], Iter [669/3125], train_loss:0.042735
Epoch [1/2], Iter [670/3125], train_loss:0.063698
Epoch [1/2], Iter [671/3125], train_loss:0.054137
Epoch [1/2], Iter [672/3125], train_loss:0.122349
Epoch [1/2], Iter [673/3125], train_loss:0.045259
Epoch [1/2], Iter [674/3125], train_loss:0.096469
Epoch [1/2], Iter [675/3125], train_loss:0.058725
Epoch [1/2], Iter [676/3125], train_loss:0.092602
Epoch [1/2], Iter [677/3125], train_loss:0.066935


Epoch [1/2], Iter [822/3125], train_loss:0.060332
Epoch [1/2], Iter [823/3125], train_loss:0.069837
Epoch [1/2], Iter [824/3125], train_loss:0.081108
Epoch [1/2], Iter [825/3125], train_loss:0.064217
Epoch [1/2], Iter [826/3125], train_loss:0.077845
Epoch [1/2], Iter [827/3125], train_loss:0.062394
Epoch [1/2], Iter [828/3125], train_loss:0.078574
Epoch [1/2], Iter [829/3125], train_loss:0.077207
Epoch [1/2], Iter [830/3125], train_loss:0.052881
Epoch [1/2], Iter [831/3125], train_loss:0.105506
Epoch [1/2], Iter [832/3125], train_loss:0.085921
Epoch [1/2], Iter [833/3125], train_loss:0.062045
Epoch [1/2], Iter [834/3125], train_loss:0.078639
Epoch [1/2], Iter [835/3125], train_loss:0.091643
Epoch [1/2], Iter [836/3125], train_loss:0.070230
Epoch [1/2], Iter [837/3125], train_loss:0.061350
Epoch [1/2], Iter [838/3125], train_loss:0.100740
Epoch [1/2], Iter [839/3125], train_loss:0.085829
Epoch [1/2], Iter [840/3125], train_loss:0.060633
Epoch [1/2], Iter [841/3125], train_loss:0.071548


Epoch [1/2], Iter [986/3125], train_loss:0.037591
Epoch [1/2], Iter [987/3125], train_loss:0.067814
Epoch [1/2], Iter [988/3125], train_loss:0.079620
Epoch [1/2], Iter [989/3125], train_loss:0.061091
Epoch [1/2], Iter [990/3125], train_loss:0.059746
Epoch [1/2], Iter [991/3125], train_loss:0.053879
Epoch [1/2], Iter [992/3125], train_loss:0.072848
Epoch [1/2], Iter [993/3125], train_loss:0.079221
Epoch [1/2], Iter [994/3125], train_loss:0.057892
Epoch [1/2], Iter [995/3125], train_loss:0.063789
Epoch [1/2], Iter [996/3125], train_loss:0.049382
Epoch [1/2], Iter [997/3125], train_loss:0.027435
Epoch [1/2], Iter [998/3125], train_loss:0.045928
Epoch [1/2], Iter [999/3125], train_loss:0.048198
Epoch [1/2], Iter [1000/3125], train_loss:0.092894
Epoch [1/2], Iter [1001/3125], train_loss:0.083124
Epoch [1/2], Iter [1002/3125], train_loss:0.089966
Epoch [1/2], Iter [1003/3125], train_loss:0.068988
Epoch [1/2], Iter [1004/3125], train_loss:0.077688
...

Epoch [1/2], Iter [1147/3125], train_loss:0.057350
Epoch [1/2], Iter [1148/3125], train_loss:0.082387
Epoch [1/2], Iter [1149/3125], train_loss:0.049732
Epoch [1/2], Iter [1150/3125], train_loss:0.062127
Epoch [1/2], Iter [1151/3125], train_loss:0.059988
Epoch [1/2], Iter [1152/3125], train_loss:0.046885
Epoch [1/2], Iter [1153/3125], train_loss:0.063260
Epoch [1/2], Iter [1154/3125], train_loss:0.076795
Epoch [1/2], Iter [1155/3125], train_loss:0.039343
Epoch [1/2], Iter [1156/3125], train_loss:0.044740
Epoch [1/2], Iter [1157/3125], train_loss:0.079429
Epoch [1/2], Iter [1158/3125], train_loss:0.080212
Epoch [1/2], Iter [1159/3125], train_loss:0.072169
Epoch [1/2], Iter [1160/3125], train_loss:0.065028
Epoch [1/2], Iter [1161/3125], train_loss:0.062723
Epoch [1/2], Iter [1162/3125], train_loss:0.058256
Epoch [1/2], Iter [1163/3125], train_loss:0.069095
Epoch [1/2], Iter [1164/3125], train_loss:0.047539
Epoch [1/2], Iter [1165/3125], train_loss:0.083530
...

Epoch [1/2], Iter [1308/3125], train_loss:0.069067
Epoch [1/2], Iter [1309/3125], train_loss:0.058699
Epoch [1/2], Iter [1310/3125], train_loss:0.065380
Epoch [1/2], Iter [1311/3125], train_loss:0.075818
Epoch [1/2], Iter [1312/3125], train_loss:0.052236
Epoch [1/2], Iter [1313/3125], train_loss:0.082141
Epoch [1/2], Iter [1314/3125], train_loss:0.069464
Epoch [1/2], Iter [1315/3125], train_loss:0.091378
Epoch [1/2], Iter [1316/3125], train_loss:0.064676
Epoch [1/2], Iter [1317/3125], train_loss:0.067352
Epoch [1/2], Iter [1318/3125], train_loss:0.057192
Epoch [1/2], Iter [1319/3125], train_loss:0.074985
Epoch [1/2], Iter [1320/3125], train_loss:0.067657
Epoch [1/2], Iter [1321/3125], train_loss:0.040115
Epoch [1/2], Iter [1322/3125], train_loss:0.076123
Epoch [1/2], Iter [1323/3125], train_loss:0.043271
Epoch [1/2], Iter [1324/3125], train_loss:0.053576
Epoch [1/2], Iter [1325/3125], train_loss:0.040913
Epoch [1/2], Iter [1326/3125], train_loss:0.059898
...

Epoch [1/2], Iter [1469/3125], train_loss:0.049064
Epoch [1/2], Iter [1470/3125], train_loss:0.046045
Epoch [1/2], Iter [1471/3125], train_loss:0.044542
Epoch [1/2], Iter [1472/3125], train_loss:0.071227
Epoch [1/2], Iter [1473/3125], train_loss:0.091338
Epoch [1/2], Iter [1474/3125], train_loss:0.045170
Epoch [1/2], Iter [1475/3125], train_loss:0.066202
Epoch [1/2], Iter [1476/3125], train_loss:0.094935
Epoch [1/2], Iter [1477/3125], train_loss:0.062110
Epoch [1/2], Iter [1478/3125], train_loss:0.054103
Epoch [1/2], Iter [1479/3125], train_loss:0.061626
Epoch [1/2], Iter [1480/3125], train_loss:0.041887
Epoch [1/2], Iter [1481/3125], train_loss:0.069576
Epoch [1/2], Iter [1482/3125], train_loss:0.059234
Epoch [1/2], Iter [1483/3125], train_loss:0.054864
Epoch [1/2], Iter [1484/3125], train_loss:0.034114
Epoch [1/2], Iter [1485/3125], train_loss:0.090967
Epoch [1/2], Iter [1486/3125], train_loss:0.050541
Epoch [1/2], Iter [1487/3125], train_loss:0.066801
...

Epoch [1/2], Iter [1630/3125], train_loss:0.063961
Epoch [1/2], Iter [1631/3125], train_loss:0.060558
Epoch [1/2], Iter [1632/3125], train_loss:0.080888
Epoch [1/2], Iter [1633/3125], train_loss:0.057318
Epoch [1/2], Iter [1634/3125], train_loss:0.061407
Epoch [1/2], Iter [1635/3125], train_loss:0.057165
Epoch [1/2], Iter [1636/3125], train_loss:0.061597
Epoch [1/2], Iter [1637/3125], train_loss:0.054902
Epoch [1/2], Iter [1638/3125], train_loss:0.054311
Epoch [1/2], Iter [1639/3125], train_loss:0.030938
Epoch [1/2], Iter [1640/3125], train_loss:0.049676
Epoch [1/2], Iter [1641/3125], train_loss:0.066437
Epoch [1/2], Iter [1642/3125], train_loss:0.076029
Epoch [1/2], Iter [1643/3125], train_loss:0.056621
Epoch [1/2], Iter [1644/3125], train_loss:0.063559
Epoch [1/2], Iter [1645/3125], train_loss:0.062448
Epoch [1/2], Iter [1646/3125], train_loss:0.049044
Epoch [1/2], Iter [1647/3125], train_loss:0.035747
Epoch [1/2], Iter [1648/3125], train_loss:0.072715
...

Epoch [1/2], Iter [1791/3125], train_loss:0.081263
Epoch [1/2], Iter [1792/3125], train_loss:0.034454
Epoch [1/2], Iter [1793/3125], train_loss:0.046444
Epoch [1/2], Iter [1794/3125], train_loss:0.075485
Epoch [1/2], Iter [1795/3125], train_loss:0.046044
Epoch [1/2], Iter [1796/3125], train_loss:0.052903
Epoch [1/2], Iter [1797/3125], train_loss:0.088825
Epoch [1/2], Iter [1798/3125], train_loss:0.056114
Epoch [1/2], Iter [1799/3125], train_loss:0.090675
Epoch [1/2], Iter [1800/3125], train_loss:0.033013
Epoch [1/2], Iter [1801/3125], train_loss:0.038212
Epoch [1/2], Iter [1802/3125], train_loss:0.044110
Epoch [1/2], Iter [1803/3125], train_loss:0.047596
Epoch [1/2], Iter [1804/3125], train_loss:0.043175
Epoch [1/2], Iter [1805/3125], train_loss:0.073386
Epoch [1/2], Iter [1806/3125], train_loss:0.066542
Epoch [1/2], Iter [1807/3125], train_loss:0.046511
Epoch [1/2], Iter [1808/3125], train_loss:0.020690
Epoch [1/2], Iter [1809/3125], train_loss:0.028680
...