# Training a Matrix Product State for MNIST

This notebook walks you through training a Matrix Product State (MPS) to recognise digits in MNIST. The training algorithm is described in [Supervised Learning with Quantum-Inspired Tensor Networks](https://arxiv.org/abs/1605.05775).

## Setup
### Setting up the datasource
Before creating the model, we must load the MNIST data. trMPS provides a class, MNISTDatasource, that loads MNIST in the correct format:

In [2]:
import MNISTpreprocessing

permuted = False
shuffled = True
shrink = True
data_source = MNISTpreprocessing.MNISTDatasource(shrink=shrink, permuted=permuted, shuffled=shuffled)

The permuted flag controls whether the pixels of each image are permuted, which can be used to test how well the MPS picks up very long-range correlations. The shuffled flag controls whether the order of the examples in the dataset is shuffled. This shouldn't matter for MPS training, especially if we feed in the whole dataset at once, but the flag exists for the case where the dataset is fed in bit by bit. The shrink flag controls whether the images are max-pooled before being fed into the MPS: with shrink set to True each image is fed in as 14x14, otherwise as 28x28.
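To make the shrink and permuted options concrete, here is a minimal NumPy sketch of max-pooling and of a fixed pixel permutation. It illustrates the idea only and is not the MNISTDatasource implementation: the helper names max_pool_2x2 and permute_pixels are hypothetical, and the 2x2 pooling window is an assumption inferred from the change in size from 28x28 to 14x14.

```python
import numpy as np


def max_pool_2x2(image):
    """Downsample a 2D image by taking the max over non-overlapping 2x2 blocks."""
    height, width = image.shape
    # Split each axis into (blocks, 2) and reduce over the two within-block axes.
    blocks = image.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))


def permute_pixels(flat_image, permutation):
    """Reorder the pixels of a flattened image with a fixed permutation."""
    return flat_image[permutation]


rng = np.random.default_rng(0)
image = rng.random((28, 28))      # random stand-in for one MNIST digit

pooled = max_pool_2x2(image)      # shape (14, 14)
flat = pooled.reshape(-1)         # 196 pixels, one per MPS site

# The same permutation would have to be applied to every image in the dataset.
permutation = rng.permutation(flat.size)
scrambled = permute_pixels(flat, permutation)
```

Permuting the pixels keeps all the information in the image but destroys the spatial neighbourhood structure, which is why it is a useful stress test for a one-dimensional model such as an MPS.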

### Setting up the Matrix Product State
We can then initialise a Matrix Product State as follows:

In [None]:
from mps import MPS

d_feature = 2
d_output = 10
input_size = 784  # 28x28 pixels
lin_reg_learning_rate = 10**(-4)
if shrink:
    input_size = 196  # 14x14 pixels after max-pooling
    
network = MPS(d_feature, d_output, input_size)
network.prepare(data_source=data_source, learning_rate=lin_reg_learning_rate)
# network.prepare()

These parameters determine the shape of the Matrix Product State: d_feature is the size of each input (the length of the vector that represents a single pixel), input_size is the number of inputs and therefore the length of the Matrix Product State, and d_output is the number of classes. Before doing anything else with the MPS, we must call its prepare method. If a data source is passed to prepare, the MPS's initial weights are set from the weights of a linear regression, which leads to shorter training times. To train the MPS from scratch instead, comment out the current network.prepare line and uncomment the one below it.
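How these three parameters fix the tensor shapes can be sketched in plain NumPy. This is an illustration under assumptions, not the trMPS MPS class: the feature map [cos(πx/2), sin(πx/2)] is the one used in the paper linked above, while the bond dimension of 4, the placement of the label tensor in the middle of the chain, the boundary vectors, and the helper names feature_map and mps_output are all chosen here for the example.

```python
import numpy as np

d_feature, d_output, input_size = 2, 10, 196
bond_dim = 4                   # assumed bond dimension, not a trMPS parameter
label_site = input_size // 2   # assumed position of the tensor with the label index


def feature_map(pixels):
    """Map each pixel in [0, 1] to a vector of length d_feature = 2."""
    angles = np.pi * pixels / 2
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)


rng = np.random.default_rng(0)

# One tensor per pixel, indexed as (feature, left bond, right bond).
# Scaled identities plus small noise keep the product of 196 matrices
# within floating-point range.
site_tensors = [
    np.eye(bond_dim) / np.sqrt(2)
    + 0.01 * rng.standard_normal((d_feature, bond_dim, bond_dim))
    for _ in range(input_size)
]
label_tensor = rng.standard_normal((d_output, bond_dim, bond_dim))
left_boundary = np.ones(bond_dim) / np.sqrt(bond_dim)
right_boundary = np.ones(bond_dim) / np.sqrt(bond_dim)


def mps_output(pixels):
    """Contract the chain left to right and return one score per class."""
    phi = feature_map(pixels)          # shape (input_size, d_feature)
    vector = left_boundary             # shape (bond_dim,)
    result = None
    for site, tensor in enumerate(site_tensors):
        if site == label_site:
            # Open the label index: result has shape (d_output, bond_dim).
            result = np.einsum('a,lab->lb', vector, label_tensor)
        # Contract the pixel's feature vector into its site tensor.
        matrix = np.einsum('f,fab->ab', phi[site], tensor)
        if result is None:
            vector = vector @ matrix
        else:
            result = result @ matrix
    return result @ right_boundary     # shape (d_output,)


scores = mps_output(rng.random(input_size))
prediction = int(np.argmax(scores))
```

The predicted class is simply the index of the largest of the d_output scores; with random tensors, as here, the prediction is of course meaningless until the model is trained.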

## Training the Matrix Product State
To train the Matrix Product State, we use an MPSOptimizer.

In [None]:
from optimizer import MPSOptimizer, MPSTrainingParameters, MPSOptimizerParameters

# Optimizer parameters
max_size = 30
min_singular_value = 0.001  # note: not passed on to the optimizer in this cell

lr_reg = 0.0

verbosity = 0

optimizer_parameters = MPSOptimizerParameters(lr_reg=lr_reg,
                                              verbosity=verbosity)
optimizer = MPSOptimizer(network, max_size, optimizer_parameters)

The max_size parameter limits how large the Matrix Product State's constituent tensors can grow, and the minimum singular value limits this too. Together, these two parameters determine whether the Matrix Product State is fast (to train and to predict) but less accurate, or slower but more accurate. The faster, less accurate models also take up less space when saved. The lr_reg parameter controls how much the learning rate decreases as you train. The verbosity parameter controls how much logging is printed during training: 0 prints nothing, a positive value n prints the first n logs, and a negative value prints everything. An MPSOptimizerParameters object is created with some of the available parameters (there are more!), and this is then used to create an MPSOptimizer.
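The trade-off between max_size and the minimum singular value can be illustrated with a truncated SVD, the step in the paper's sweeping algorithm where a merged two-site tensor is split back into two tensors. The sketch below is a generic NumPy version under assumptions: truncated_svd is a hypothetical helper rather than part of trMPS, and treating the threshold as an absolute cut-off on the singular values is a guess about how the library interprets it.

```python
import numpy as np


def truncated_svd(matrix, max_size, min_singular_value):
    """Split matrix into left @ right, keeping only the largest singular values."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    # s is sorted in descending order, so the kept values form a prefix.
    keep = min(max_size, int(np.sum(s > min_singular_value)))
    keep = max(keep, 1)  # always keep at least one singular value
    left = u[:, :keep]
    # Absorb the singular values into the right factor.
    right = s[:keep, None] * vh[:keep, :]
    return left, right


rng = np.random.default_rng(0)
# A 60x60 matrix of rank 40 stands in for a reshaped two-site tensor.
bond_matrix = rng.standard_normal((60, 40)) @ rng.standard_normal((40, 60))

left, right = truncated_svd(bond_matrix, max_size=30, min_singular_value=0.001)
new_bond_size = left.shape[1]
relative_error = (np.linalg.norm(bond_matrix - left @ right)
                  / np.linalg.norm(bond_matrix))
```

Here max_size caps the new bond at 30 even though 40 singular values exceed the threshold, so some information is discarded and relative_error is non-zero. A smaller max_size or a larger threshold gives smaller, faster tensors at the cost of a larger error.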

Below, we create an MPSTrainingParameters object in a similar way (its parameters are fairly self-explanatory) and then train the MPS.

In [None]:
rate_of_change = 5 * 10 ** (-4)
batch_size = 2000
n_step = 6

training_parameters = MPSTrainingParameters(rate_of_change=rate_of_change)
optimizer.train(data_source, batch_size, n_step,
                training_parameters)

## Next Steps

There's lots more that can be done with Matrix Product States, and the [documentation](http://trmps.readthedocs.io/en/latest/) is a good place to start.

You may have noticed that the training above took quite a long time. The code above uses two-site DMRG. Switching to single-site DMRG, by importing SingleSiteMPSOptimizer from singlesiteOptimizer and using it in place of MPSOptimizer, should give a dramatic improvement in speed. Using a squared-distance cost function, as implemented by the sqMPS and sqMPSOptimizer classes in the squaredDistanceMPS file, will also be faster. However, we find that two-site DMRG as above, with a larger batch size than the one used in this notebook, gives more accurate results.

Finally, if you want to apply this to other datasets, the documentation provides some insights, and the [GitHub repo](https://github.com/TrMPS/MPS-MNIST) includes example scripts showing how.
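For reference, the squared-distance cost mentioned above can be written in a few lines of NumPy. This is a sketch of the quadratic cost from the paper linked at the top of this notebook, not the code in squaredDistanceMPS; the function name squared_distance_cost and the toy numbers are made up for the example.

```python
import numpy as np


def squared_distance_cost(outputs, one_hot_labels):
    """Half the sum of squared differences between outputs and one-hot labels."""
    differences = outputs - one_hot_labels
    return 0.5 * np.sum(differences ** 2)


# Two toy examples with three classes each (rows are examples).
outputs = np.array([[0.9, 0.1, 0.0],
                    [0.2, 0.3, 0.5]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
cost = squared_distance_cost(outputs, labels)
```

The cost is zero only when every output matches its one-hot label exactly, and it grows quadratically with the size of each mismatch.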