## state_dict in PyTorch

A **state_dict** is central to saving and loading models in PyTorch. A state_dict is simply a Python dictionary that maps each name to its tensor, so it can easily be saved, edited, and restored. Only layers with learnable parameters (convolutional layers, linear layers, and so on) and registered buffers (such as batchnorm's running mean) have entries in a model's state_dict. Optimizer objects also have a state_dict, which contains the optimizer's state and its hyperparameters.
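
To make the point about parameters and buffers concrete, here is a minimal sketch. The three-layer `demo` model is a hypothetical example chosen for illustration and is not the network built later in this notebook; it only shows which layers contribute entries.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only to inspect its state_dict keys
demo = nn.Sequential(
    nn.Linear(4, 3),     # learnable weight and bias
    nn.BatchNorm1d(3),   # learnable weight and bias, plus registered buffers
    nn.ReLU(),           # no parameters or buffers, so no entries
)

keys = list(demo.state_dict().keys())
for key in keys:
    print(key)
```

The linear and batchnorm layers should each contribute entries (including batchnorm's `running_mean` buffer), while the ReLU layer contributes none.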

### Let's load the PyTorch libraries

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # provides F.relu, used in Net.forward
import torch.optim as optim

### Create a simple neural network with convolution and linear layers

In [2]:
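class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Two convolution layers (5x5 kernels) sharing one 2x2 max-pooling layer
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Three fully connected layers; 16*5*5 is the flattened conv output,
        # which corresponds to 3x32x32 inputs
        self.fc1 = nn.Linear(16 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten the feature maps before the linear layers
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x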

In [4]:
net = Net()
print(net)

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
)


### Initialize the optimizer
Let's use SGD with momentum.

In [5]:
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

### Model and optimizer state_dict
Let's look at the state_dict of the model and of the optimizer.

In [8]:
# Print each entry of the model's state_dict with its tensor shape
print("Model's state_dict:")
for param_tensor in net.state_dict():
    print(param_tensor, ' ==> ', net.state_dict()[param_tensor].size())

Model's state_dict:
conv1.weight  ==>  torch.Size([6, 3, 5, 5])
conv1.bias  ==>  torch.Size([6])
conv2.weight  ==>  torch.Size([16, 6, 5, 5])
conv2.bias  ==>  torch.Size([16])
fc1.weight  ==>  torch.Size([128, 400])
fc1.bias  ==>  torch.Size([128])
fc2.weight  ==>  torch.Size([64, 128])
fc2.bias  ==>  torch.Size([64])
fc3.weight  ==>  torch.Size([10, 64])
fc3.bias  ==>  torch.Size([10])
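
Since the state_dict is what usually gets written to disk, a save-and-restore round trip can be sketched as follows. This is a minimal illustration under stated assumptions: the small `nn.Linear` model and the file name `model_state.pt` are hypothetical stand-ins, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model; the same calls apply to any nn.Module
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pt")  # illustrative file name

    # Save only the state_dict, not the whole model object
    torch.save(model.state_dict(), path)

    # Build a model with the same architecture, then load the saved tensors
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

restored.eval()  # switch to inference mode before evaluating

# Check that every tensor survived the round trip unchanged
same = all(
    torch.equal(model.state_dict()[k], restored.state_dict()[k])
    for k in model.state_dict()
)
print(same)
```

Note that `load_state_dict` expects a dictionary, so the file has to be read with `torch.load` first rather than passing the path directly.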


In [9]:
# Print each entry of the optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, ' ==> ', optimizer.state_dict()[var_name])

Optimizer's state_dict:
state  ==>  {}
param_groups  ==>  [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140483105481664, 140483103054336, 140483103054720, 140483103060032, 140483103062464, 140483103062784, 140483189793216, 140483103061248, 140483103062208, 140485032015872]}]
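
The `state` entry above is empty because no optimization step has been taken yet. The sketch below, which uses a hypothetical small `nn.Linear` model and random input rather than the network from this notebook, suggests how the entry fills in after one step of SGD with momentum. The exact form of the keys under `state` and `params` may differ between PyTorch versions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in model and optimizer
model = nn.Linear(4, 2)
opt = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Before any step, the optimizer holds no per-parameter state
before = len(opt.state_dict()["state"])
print("entries before step:", before)

# One forward/backward pass on random data, then one update
loss = model(torch.randn(8, 4)).sum()
loss.backward()
opt.step()

# After the step, each parameter has an entry holding its momentum buffer
state = opt.state_dict()["state"]
for key, entry in state.items():
    print(key, ' ==> ', list(entry.keys()))
```

Saving `opt.state_dict()` alongside the model's state_dict is what lets training resume with the momentum buffers intact.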
