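PyTorch optimizer: manages and updates the values of a model's learnable parameters so that the model output moves closer to the ground-truth labels  
Derivative: the rate of change of a function along a given coordinate axis (a partial derivative, for multivariable functions)  
Gradient: a vector that points in the direction in which the directional derivative is largest; its magnitude equals that maximum rate of change  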
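
To make the gradient definition concrete, here is a minimal sketch that uses autograd on an illustrative function f(x, y) = x^2 + 3y (the function, the evaluation point, and the helper `directional_derivative` are assumptions chosen for this example). It compares the rate of change along the two axes with the rate along the gradient direction.

```python
import torch

# Illustrative function f(x, y) = x^2 + 3y, evaluated at (1, 2).
# Its analytic gradient there is (2x, 3) = (2, 3).
point = torch.tensor([1.0, 2.0], requires_grad=True)
value = point[0] ** 2 + 3 * point[1]
value.backward()
grad = point.grad.clone()


def directional_derivative(direction):
    """Rate of change of f along `direction` (normalised to unit length)."""
    unit = direction / direction.norm()
    return torch.dot(grad, unit).item()


along_x = directional_derivative(torch.tensor([1.0, 0.0]))  # partial derivative in x
along_y = directional_derivative(torch.tensor([0.0, 1.0]))  # partial derivative in y
along_grad = directional_derivative(grad)                   # equals the gradient norm

print(grad, along_x, along_y, along_grad)
```

The rate along the gradient direction equals the gradient's norm, which is at least as large as the rate along any single axis.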

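Basic attributes  
* defaults: the optimizer's hyperparameters
* state: per-parameter cached values, such as the momentum buffer
* param_groups: the parameter groups being managed
* _step_count: counts the number of updates; used by learning-rate scheduling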
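
The attributes listed above can be inspected directly. The following is a small sketch, assuming an SGD optimizer with momentum over a single tensor; the variable names are illustrative only.

```python
import torch
import torch.optim as optim

w = torch.randn((2, 2), requires_grad=True)
opt = optim.SGD([w], lr=0.1, momentum=0.9)

# defaults: hyperparameters passed to (or defaulted by) the constructor
defaults_lr = opt.defaults["lr"]

# param_groups: a list of dicts; each holds its parameters plus its hyperparameters
n_groups = len(opt.param_groups)
same_object = opt.param_groups[0]["params"][0] is w  # a reference, not a copy

# state: empty until the first step, then keyed by the parameter tensor
state_before = len(opt.state)
w.grad = torch.ones_like(w)
opt.step()
state_after = len(opt.state)
has_buffer = "momentum_buffer" in opt.state[w]

print(defaults_lr, n_groups, same_object, state_before, state_after, has_buffer)
```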

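Basic methods:  
* zero_grad(): clears the gradients of all managed parameters. PyTorch does not reset tensor gradients automatically; they accumulate across backward() calls. Since PyTorch 2.0 the default sets the gradients to None rather than to zero.
* step(): performs a single update
* add_param_group(): adds a parameter group
* state_dict(): returns a dictionary with the optimizer's current state
* load_state_dict(): loads a state dictionary. Saving the state regularly lets training resume after an unexpected interruption.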
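
A typical way these methods fit together is the zero_grad / backward / step cycle. The sketch below fits a single weight to the toy relation y = 2x; the data, learning rate, and iteration count are assumptions picked for illustration.

```python
import torch
import torch.optim as optim

# Toy data for y = 2x (illustrative, not from the notes above)
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2.0 * x

w = torch.zeros((1, 1), requires_grad=True)
opt = optim.SGD([w], lr=0.5)

losses = []
for _ in range(50):
    opt.zero_grad()                      # drop gradients from the previous iteration
    loss = ((x @ w - y) ** 2).mean()     # mean squared error
    loss.backward()                      # populate w.grad
    opt.step()                           # w <- w - lr * w.grad
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

Calling zero_grad() at the start of each iteration keeps gradients from one batch from leaking into the next.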

In [17]:
import os
import torch
import torch.optim as optim
weight = torch.randn((2, 2), requires_grad=True)
weight.grad = torch.ones((2, 2))
optimizer = optim.SGD([weight], lr=0.1)

In [18]:
# weights before the update
weight.data 

tensor([[-0.9556, -0.0451],
        [-0.4985,  0.6649]])

In [19]:
optimizer.step()
weight.data

tensor([[-1.0556, -0.1451],
        [-0.5985,  0.5649]])
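
The output above decreases every entry by 0.1, which is lr times the gradient of ones. A minimal sketch that checks this rule for plain SGD (no momentum, no weight decay) against a hand-computed update:

```python
import torch
import torch.optim as optim

w = torch.randn((2, 2), requires_grad=True)
w.grad = torch.ones((2, 2))
lr = 0.1
opt = optim.SGD([w], lr=lr)

# Plain SGD rule: w_new = w_old - lr * grad
expected = w.detach().clone() - lr * w.grad
opt.step()

matches = torch.allclose(w.detach(), expected)
print(matches)
```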

In [20]:
#zero_grad
weight.data

tensor([[-1.0556, -0.1451],
        [-0.5985,  0.5649]])

In [21]:
# the optimizer stores a reference to the same tensor object, so the two ids match
optimizer.step()
weight.data, id(optimizer.param_groups[0]['params'][0]), id(weight)

(tensor([[-1.1556, -0.2451],
         [-0.6985,  0.4649]]),
 4758795120,
 4758795120)

In [22]:
weight.grad

tensor([[1., 1.],
        [1., 1.]])

In [23]:
optimizer.zero_grad()
weight.grad

tensor([[0., 0.],
        [0., 0.]])
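
Why zero_grad() matters: backward() adds to an existing .grad instead of overwriting it. The sketch below shows the accumulation on an illustrative tensor; the final check accepts either zeros or None, because the behaviour of zero_grad() depends on the PyTorch version.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()          # gradient of 3 * sum(w) is 3 for every element

(w * 3).sum().backward()
accumulated = w.grad.clone()    # second backward adds on top of the first

opt = torch.optim.SGD([w], lr=0.1)
opt.zero_grad()
# Older releases fill .grad with zeros; newer ones set it to None by default.
cleared = w.grad is None or bool((w.grad == 0).all())

print(first, accumulated, cleared)
```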

In [24]:
#add_param_group
optimizer.param_groups

[{'params': [tensor([[-1.1556, -0.2451],
           [-0.6985,  0.4649]], requires_grad=True)],
  'lr': 0.1,
  'momentum': 0,
  'dampening': 0,
  'weight_decay': 0,
  'nesterov': False}]

In [25]:
w2 = torch.randn((3, 3), requires_grad=True)
optimizer.add_param_group({"params": w2, 'lr': 0.0001})
optimizer.param_groups

[{'params': [tensor([[-1.1556, -0.2451],
           [-0.6985,  0.4649]], requires_grad=True)],
  'lr': 0.1,
  'momentum': 0,
  'dampening': 0,
  'weight_decay': 0,
  'nesterov': False},
 {'params': [tensor([[ 0.3344, -0.1886, -0.2457],
           [ 1.4255,  0.6913, -0.3347],
           [ 1.0530,  0.7267, -1.8213]], requires_grad=True)],
  'lr': 0.0001,
  'momentum': 0,
  'dampening': 0,
  'weight_decay': 0,
  'nesterov': False}]
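
Each parameter group is updated with its own hyperparameters, and anything not given in the new group falls back to the optimizer's defaults. A minimal sketch, assuming two illustrative tensors named `backbone` and `head`:

```python
import torch
import torch.optim as optim

backbone = torch.ones((2, 2), requires_grad=True)
head = torch.ones((3, 3), requires_grad=True)

opt = optim.SGD([backbone], lr=0.1)
opt.add_param_group({"params": [head], "lr": 0.0001})  # momentum etc. come from defaults

backbone.grad = torch.ones_like(backbone)
head.grad = torch.ones_like(head)
opt.step()

lrs = [group["lr"] for group in opt.param_groups]
print(lrs, backbone[0, 0].item(), head[0, 0].item())
```

With the same gradient, `backbone` moves by 0.1 while `head` moves by only 0.0001. Adding a tensor that already belongs to a group raises a ValueError.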

In [26]:
#state_dict
optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
opt_state_dict = optimizer.state_dict()
opt_state_dict

{'state': {},
 'param_groups': [{'lr': 0.1,
   'momentum': 0.9,
   'dampening': 0,
   'weight_decay': 0,
   'nesterov': False,
   'params': [4758795120]}]}

In [28]:
# weight.grad was zeroed in cell In [23], so the momentum buffer created here stays all zeros
for i in range(10):
    optimizer.step()
optimizer.state_dict()

{'state': {4758795120: {'momentum_buffer': tensor([[0., 0.],
           [0., 0.]])}},
 'param_groups': [{'lr': 0.1,
   'momentum': 0.9,
   'dampening': 0,
   'weight_decay': 0,
   'nesterov': False,
   'params': [4758795120]}]}
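
The buffer above is all zeros only because the gradient was zero. With a non-zero gradient the buffer accumulates; the sketch below assumes a constant gradient of ones and the default dampening of 0, for which SGD sets the buffer to the gradient on the first step and then uses buf = momentum * buf + grad.

```python
import torch
import torch.optim as optim

w = torch.zeros((2, 2), requires_grad=True)
w.grad = torch.ones((2, 2))          # held constant across steps for illustration
opt = optim.SGD([w], lr=0.1, momentum=0.9)

for _ in range(3):
    opt.step()

# Buffer after each step: 1.0, then 0.9 * 1.0 + 1 = 1.9, then 0.9 * 1.9 + 1 = 2.71
buf = opt.state[w]["momentum_buffer"]
print(buf, w.detach())
```

The weight has moved by lr times the sum of the three buffer values, 0.1 * (1 + 1.9 + 2.71) = 0.561.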

In [29]:
torch.save(optimizer.state_dict(), "optimizer_state_dict.pkl")

In [30]:
# load_state_dict: a freshly built optimizer starts with an empty state
optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
state_dict = torch.load("optimizer_state_dict.pkl")
optimizer.state_dict()

{'state': {},
 'param_groups': [{'lr': 0.1,
   'momentum': 0.9,
   'dampening': 0,
   'weight_decay': 0,
   'nesterov': False,
   'params': [4758795120]}]}

In [31]:
optimizer.load_state_dict(state_dict)
optimizer.state_dict()

{'state': {4758795120: {'momentum_buffer': tensor([[0., 0.],
           [0., 0.]])}},
 'param_groups': [{'lr': 0.1,
   'momentum': 0.9,
   'dampening': 0,
   'weight_decay': 0,
   'nesterov': False,
   'params': [4758795120]}]}
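
In practice the optimizer state is usually saved together with the parameters so that training can resume where it stopped. The following is a hedged sketch of such a round trip; the checkpoint keys `"weight"` and `"optimizer"` and the temporary file path are assumptions made for this example.

```python
import os
import tempfile

import torch
import torch.optim as optim

# "Training" run: take one momentum step so the optimizer has state worth saving
w = torch.zeros((2, 2), requires_grad=True)
w.grad = torch.ones((2, 2))
opt = optim.SGD([w], lr=0.1, momentum=0.9)
opt.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save({"weight": w.detach().clone(), "optimizer": opt.state_dict()}, path)

# "Resumed" run: rebuild the tensor and the optimizer, then restore both
ckpt = torch.load(path)
w_resumed = ckpt["weight"].clone().requires_grad_(True)
opt_resumed = optim.SGD([w_resumed], lr=0.1, momentum=0.9)
opt_resumed.load_state_dict(ckpt["optimizer"])

restored = opt_resumed.state[w_resumed]["momentum_buffer"]
print(w_resumed.detach(), restored)
```

The optimizer must be rebuilt with the same parameter groups, in the same order, before load_state_dict() is called, since the saved state is matched to parameters by position.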