The following additional libraries are needed to run this
notebook. Note that running on Colab is experimental; please report a GitHub
issue if you run into any problems.

In [None]:
!pip install -U mxnet-cu101mkl==1.6.0  # updating mxnet to at least v1.6


# File I/O

So far we discussed how to process data and how 
to build, train, and test deep learning models. 
However, at some point, we will hopefully be happy enough
with the learned models that we will want 
to save the results for later use in various contexts
(perhaps even to make predictions in deployment). 
Additionally, when running a long training process,
it is best practice to periodically save intermediate results (checkpointing)
so that we do not lose several days' worth of computation
if we trip over the power cord of our server.
Thus it is time we learned how to load and store 
both individual weight vectors and entire models. 
This section addresses both issues.

## Loading and Saving `ndarray`s

For individual `ndarray`s, we can directly 
invoke the `load` and `save` functions 
to read and write them respectively. 
Both functions require that we supply a file name,
and `save` additionally requires the variable to be saved.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-1-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-1-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-1-0">
```

In [1]:
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

x = np.arange(4)
npx.save('x-file', x)

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-1-1">
```

In [1]:
import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
torch.save(x, 'x-file')

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```

We can now read this data from the stored file back into memory.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-3-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-3-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-3-0">
```

In [2]:
x2 = npx.load('x-file')
x2

[array([0., 1., 2., 3.])]

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-3-1">
```

In [2]:
x2 = torch.load("x-file")
x2

tensor([0, 1, 2, 3])

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```

We can store a list of `ndarray`s and read them back into memory.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-5-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-5-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-5-0">
```

In [3]:
y = np.zeros(4)
npx.save('x-files', [x, y])
x2, y2 = npx.load('x-files')
(x2, y2)

(array([0., 1., 2., 3.]), array([0., 0., 0., 0.]))

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-5-1">
```

In [3]:
y = torch.zeros(4)
torch.save([x, y], 'x-files')
x2, y2 = torch.load('x-files')
(x2, y2)

(tensor([0, 1, 2, 3]), tensor([0., 0., 0., 0.]))

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```

We can even write and read a dictionary that maps 
from strings to `ndarray`s. 
This is convenient when we want 
to read or write all the weights in a model.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-7-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-7-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-7-0">
```

In [4]:
mydict = {'x': x, 'y': y}
npx.save('mydict', mydict)
mydict2 = npx.load('mydict')
mydict2

{'x': array([0., 1., 2., 3.]), 'y': array([0., 0., 0., 0.])}

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-7-1">
```

In [4]:
mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
mydict2

{'x': tensor([0, 1, 2, 3]), 'y': tensor([0., 0., 0., 0.])}

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```
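
To make the connection to model weights concrete, here is a minimal sketch in PyTorch, assuming a single `nn.Linear` layer as a stand-in for a model. Its `state_dict()` is exactly such a mapping from strings to tensors, so it can be written and read back like `mydict` above. The file name `layer-state` and the temporary directory are illustrative choices, not part of the original notebook.

```python
import os
import tempfile

import torch
from torch import nn

# A single fully connected layer stands in for a model (illustrative choice).
layer = nn.Linear(20, 256)

# state_dict() maps parameter names (strings) to the corresponding tensors.
state = layer.state_dict()
shapes = {name: tuple(tensor.shape) for name, tensor in state.items()}
print(shapes)

# Write the dictionary to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'layer-state')  # hypothetical file name
    torch.save(state, path)
    restored = torch.load(path)

# Every restored tensor should match the original element by element.
same = all(torch.equal(state[name], restored[name]) for name in state)
print(same)
```

The keys (`weight` and `bias` here) are what later allow each stored tensor to be matched to the right parameter when the dictionary is loaded into a model.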

## Model Parameters

Saving individual weight vectors (or other `ndarray` tensors) is useful, 
but it gets very tedious if we want to save 
(and later load) an entire model.
After all, we might have hundreds of 
parameter groups sprinkled throughout. 
For this reason the framework provides built-in functionality 
to load and save entire networks.
An important detail to note is that this 
saves model *parameters* and not the entire model. 
For example, if we have a 3-layer MLP,
we need to specify the *architecture* separately. 
The reason for this is that the models themselves can contain arbitrary code,
hence they cannot be serialized as naturally as their parameters.
Thus, in order to reinstate a model, we need 
to generate the architecture in code 
and then load the parameters from disk. 
Let us start with our familiar MLP.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-9-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-9-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-9-0">
```

In [5]:
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Dense(256, activation='relu')
        self.output = nn.Dense(10)

    def forward(self, x):
        return self.output(self.hidden(x))

net = MLP()
net.initialize()
x = np.random.uniform(size=(2, 20))
y = net(x)

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-9-1">
```

In [5]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
        
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
x = torch.randn(size=(2, 20))
y = net(x)

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```

Next, we store the parameters of the model as a file with the name `mlp.params`.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-11-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-11-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-11-0">
```

In [6]:
net.save_parameters('mlp.params')

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-11-1">
```

In [6]:
torch.save(net.state_dict(), 'mlp.params')

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```

To recover the model, we instantiate a clone 
of the original MLP model.
Instead of randomly initializing the model parameters, 
we read the parameters stored in the file directly.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-13-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-13-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-13-0">
```

In [7]:
clone = MLP()
clone.load_parameters('mlp.params')

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-13-1">
```

In [7]:
clone = MLP()
clone.load_state_dict(torch.load("mlp.params"))
clone.eval()

MLP(
  (hidden): Linear(in_features=20, out_features=256, bias=True)
  (output): Linear(in_features=256, out_features=10, bias=True)
)

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```
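
Because only the parameters are stored, the architecture built in code has to match the one that produced the file. The following PyTorch sketch illustrates this under some assumptions: `make_mlp` and `try_load` are hypothetical helpers introduced here, and the model is an `nn.Sequential` variant of the MLP rather than the class defined above. Loading succeeds when the hidden width matches and fails with a `RuntimeError` when it does not.

```python
import os
import tempfile

import torch
from torch import nn


def make_mlp(num_hidden):
    """Hypothetical helper: an MLP with 20 inputs and 10 outputs."""
    return nn.Sequential(
        nn.Linear(20, num_hidden), nn.ReLU(), nn.Linear(num_hidden, 10))


def try_load(path, num_hidden):
    """Return True if the stored parameters fit an MLP of this width."""
    model = make_mlp(num_hidden)
    try:
        model.load_state_dict(torch.load(path))
        return True
    except RuntimeError:
        # load_state_dict reports shape mismatches as a RuntimeError.
        return False


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mlp.params')
    # Store the parameters of an MLP with 256 hidden units.
    torch.save(make_mlp(256).state_dict(), path)

    matching_ok = try_load(path, 256)    # same architecture
    mismatched_ok = try_load(path, 128)  # different hidden width

print(matching_ok, mismatched_ok)
```

The file itself carries no description of the layers, which is why the mismatch only surfaces when the stored shapes are compared against the freshly constructed model.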

Since both instances have the same model parameters, 
the outputs computed for the same input `x` should be identical.
Let us verify this.

```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-15-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-15-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-15-0">
```

In [8]:
yclone = clone(x)
yclone == y

array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]])

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-15-1">
```

In [8]:
yclone = clone(x)
yclone == y

tensor([[True, True, True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True]])

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```
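
The same mechanism supports the periodic checkpointing mentioned at the start of this section. Below is a minimal PyTorch sketch, assuming a toy linear regression problem on random data. The helpers `save_checkpoint` and `load_checkpoint`, the dictionary keys, and the interval of two epochs are illustrative choices, not a prescribed API. Besides the model parameters, the checkpoint stores the optimizer state and the epoch counter so that training can resume where it stopped.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    """Hypothetical helper: bundle everything needed to resume training."""
    torch.save({'epoch': epoch,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, path)


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the next epoch to run."""
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint['epoch'] + 1


torch.manual_seed(0)
X, Y = torch.randn(64, 20), torch.randn(64, 10)  # toy data
model = nn.Linear(20, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'checkpoint')  # hypothetical file name
    for epoch in range(6):
        optimizer.zero_grad()
        loss_fn(model(X), Y).backward()
        optimizer.step()
        if (epoch + 1) % 2 == 0:  # save after every second epoch
            save_checkpoint(path, model, optimizer, epoch)

    # Simulate a restart: rebuild the objects in code, then restore state.
    resumed_model = nn.Linear(20, 10)
    resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1)
    next_epoch = load_checkpoint(path, resumed_model, resumed_optimizer)

print(next_epoch)
```

In a real training run, the saving interval trades disk traffic against the amount of computation that could be lost between two checkpoints.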

## Summary

* The `save` and `load` functions can be used to perform file I/O for `ndarray` objects.
* We can save and load the entire set of parameters for a network via a parameter dictionary.
* The architecture has to be specified in code; it is not saved along with the parameters.

## Exercises

1. Even if there is no need to deploy trained models to a different device, what are the practical benefits of storing model parameters?
1. Assume that we want to reuse only parts of a network by incorporating them into a network of a *different* architecture. How would you go about using, say, the first two layers from a previous network in a new network?
1. How would you go about saving network architecture and parameters? What restrictions would you impose on the architecture?


```eval_rst

.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-17-0" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-17-1" class="mdl-tabs__tab ">pytorch</a></div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel is-active" id="mxnet-17-0">
```

[Discussions](https://discuss.d2l.ai/t/60)

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    <div class="mdl-tabs__panel " id="pytorch-17-1">
```

[Discussions](https://discuss.d2l.ai/t/61)

```eval_rst
.. raw:: html

    </div>
```

```eval_rst
.. raw:: html

    </div>
```