# [SAVING AND LOADING MODELS](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # provides F.relu used in forward()
import torch.optim as optim
import os

In [3]:
# Define model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model
model = TheModelClass()

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Print model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Print optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

Model's state_dict:
conv1.weight 	 torch.Size([6, 3, 5, 5])
conv1.bias 	 torch.Size([6])
conv2.weight 	 torch.Size([16, 6, 5, 5])
conv2.bias 	 torch.Size([16])
fc1.weight 	 torch.Size([120, 400])
fc1.bias 	 torch.Size([120])
fc2.weight 	 torch.Size([84, 120])
fc2.bias 	 torch.Size([84])
fc3.weight 	 torch.Size([10, 84])
fc3.bias 	 torch.Size([10])
Optimizer's state_dict:
state 	 {}
param_groups 	 [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [4907663936, 4907664008, 4907664080, 4907664152, 4907664224, 4907664296, 4907664368, 4907664440, 4907664512, 4907664584]}]


## Save/Load state_dict (Recommended)

In [11]:
state_dict_dir = "./state_dicts"
os.makedirs(state_dict_dir, exist_ok=True)
# state_dicts are conventionally saved with a .pt or .pth extension
model_dict_PATH = os.path.join(state_dict_dir, "sample_model_state.pt")
optimizer_dict_PATH = os.path.join(state_dict_dir, "sample_optimizer_state.pt")
torch.save(model.state_dict(), model_dict_PATH)
torch.save(optimizer.state_dict(), optimizer_dict_PATH)

In [12]:
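# model = TheModelClass(*args, **kwargs)
model = TheModelClass()
model.load_state_dict(torch.load(model_dict_PATH))
model.eval()
# Rebuild the optimizer so it tracks the new model's parameters before restoring its state
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
optimizer.load_state_dict(torch.load(optimizer_dict_PATH))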

*Remember to call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference.*
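
As a rough illustration of why the mode matters, the sketch below runs the same input twice through a small network with a dropout layer. The network `net` and the input `x` are hypothetical stand-ins, not the tutorial's model. In evaluation mode dropout is disabled, so repeated forward passes agree; in training mode the random mask usually makes them differ.

```
import torch
import torch.nn as nn

# Hypothetical toy network with a dropout layer (not the tutorial's model).
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

net.eval()  # dropout is disabled in evaluation mode
with torch.no_grad():
    eval_first = net(x)
    eval_second = net(x)
same_in_eval = torch.equal(eval_first, eval_second)
print("eval mode, identical outputs:", same_in_eval)

net.train()  # dropout is active again in training mode
with torch.no_grad():
    train_first = net(x)
    train_second = net(x)
# Usually False, because each pass draws a fresh random dropout mask.
print("train mode, identical outputs:", torch.equal(train_first, train_second))
```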

## Saving & Loading a General Checkpoint for Inference and/or Resuming Training
Save:

```
torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            ...
            }, PATH)
```

Load:
```
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()

```

When saving a general checkpoint, to be used for either inference or resuming training, you must save more than just the model’s state_dict. It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains. Other items that you may want to save are the epoch you left off on, the latest recorded training loss, external torch.nn.Embedding layers, etc.

To save multiple components, organize them in a dictionary and use torch.save() to serialize the dictionary. A common PyTorch convention is to save these checkpoints using the .tar file extension.

Remember that you must call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results. If you wish to resume training, call `model.train()` to ensure these layers are in training mode.
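
The save and load templates above can be made concrete with a minimal runnable sketch. It assumes a hypothetical one-layer model, a single training step on random data, and a temporary directory for the `.tar` file; none of these come from the tutorial. The loss is stored as a plain float so the checkpoint holds only tensors and basic Python types.

```
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical tiny model and optimizer, used only for illustration.
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step so the optimizer holds momentum buffers worth saving.
inputs, targets = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save everything needed to resume in a single dictionary.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.tar")
torch.save({
    "epoch": 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": loss.item(),
}, checkpoint_path)

# Rebuild the objects first, then restore their states from the checkpoint.
restored_model = nn.Linear(4, 2)
restored_optimizer = optim.SGD(restored_model.parameters(), lr=0.01, momentum=0.9)
checkpoint = torch.load(checkpoint_path)
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # continue from the next epoch

restored_model.train()  # resuming training; use .eval() for inference instead
```

Note that the optimizer is constructed from the restored model's parameters before its state is loaded, so the momentum buffers attach to the right tensors.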

## Saving Multiple Models in One File
How to save a model and its optimizer in a single file, or several models and several optimizers in a single file. (This should have been stated up front.)

Save:
```
torch.save({
            'modelA_state_dict': modelA.state_dict(),  # pass a dict as the first argument
            'modelB_state_dict': modelB.state_dict(),
            'optimizerA_state_dict': optimizerA.state_dict(),
            'optimizerB_state_dict': optimizerB.state_dict(),
            ...
            }, PATH)
```

Load:
```
modelA = TheModelAClass(*args, **kwargs)
modelB = TheModelBClass(*args, **kwargs)
optimizerA = TheOptimizerAClass(*args, **kwargs)
optimizerB = TheOptimizerBClass(*args, **kwargs)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()
```

## Warmstarting Model Using Parameters from a Different Model
Useful, for example, when loading another model's parameters for transfer learning:
```
modelB.load_state_dict(torch.load(PATH), strict=False)
```
Passing `strict=False` like this makes `load_state_dict` ignore missing or unexpected keys instead of raising an error.
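
A small sketch of this behavior, assuming two hypothetical models that share a `features` layer but have differently named heads (`old_head` and `new_head` are made-up names). The value returned by `load_state_dict` reports which keys were skipped. `strict=False` only relaxes key matching: a key that exists in both models with a different tensor shape still raises an error.

```
from collections import OrderedDict

import torch
import torch.nn as nn

# Hypothetical source and target models sharing only the "features" layer.
source = nn.Sequential(OrderedDict([
    ("features", nn.Linear(4, 8)),
    ("old_head", nn.Linear(8, 2)),
]))
target = nn.Sequential(OrderedDict([
    ("features", nn.Linear(4, 8)),
    ("new_head", nn.Linear(8, 3)),
]))

# Matching keys are copied; non-matching keys are reported rather than raising.
result = target.load_state_dict(source.state_dict(), strict=False)
print("missing in source:", result.missing_keys)
print("unexpected in source:", result.unexpected_keys)
```

Here `new_head` keeps its freshly initialized weights, which is typically what a transfer-learning setup wants for the replaced layer.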

## Saving & Loading Model Across Devices
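It is also possible to save on a GPU and load on a CPU, and so on. For a model wrapped in DataParallel, saving the `state_dict()` of `model.module` is enough. See the tutorial page for details.

When the model is saved and loaded on different devices, it seems you need to pass `map_location` at load time, as in `model.load_state_dict(torch.load(PATH, map_location="cuda:0"))`. To run on a GPU, call `model.to(device)` afterwards (with `device` set to the GPU).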
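
A minimal sketch of both points, assuming a hypothetical one-layer model and a temporary file. It saves the inner module's state_dict from a DataParallel wrapper, then loads it with `map_location` set to the CPU so the example runs whether or not a GPU is present.

```
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical tiny model wrapped in DataParallel, used only for illustration.
model = nn.Linear(4, 2)
wrapped = nn.DataParallel(model)

# The wrapper prefixes every key with "module."; the inner module does not.
wrapped_keys = list(wrapped.state_dict().keys())
plain_keys = list(wrapped.module.state_dict().keys())
print(wrapped_keys, plain_keys)

# Save the inner module's state_dict so it loads into an unwrapped model.
path = os.path.join(tempfile.mkdtemp(), "model_state.pt")
torch.save(wrapped.module.state_dict(), path)

# map_location remaps the saved tensors onto the requested device at load time.
state_dict = torch.load(path, map_location=torch.device("cpu"))

restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)

# Move the model to the GPU afterwards if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
restored.to(device)
```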

### Save on GPU, Load on GPU
Save: (omitted)

Load:
```
device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model
```