# MNIST TensorNet example
This is an example of building neural networks with a _Tensor Train layer_ (_TT-layer_).

In short, a TT-layer is a fully-connected layer whose weight matrix is parametrized as a TT-matrix. This makes the layer much more compact and lets it use many hidden units without slowing down training or inference.
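
To make the parametrization concrete, here is a minimal pure-Python sketch of how a single entry of a TT-matrix can be computed from its cores. It assumes the TT-matrix format described in the paper, where core k has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1. The helper names `random_core` and `tt_element` and the tiny 6 x 6 example are illustrative assumptions, not part of t3f.

```python
import random

def random_core(r_in, m, n, r_out, rng):
    """Random TT core stored as nested lists with shape (r_in, m, n, r_out)."""
    return [[[[rng.gauss(0.0, 1.0) for _ in range(r_out)]
              for _ in range(n)]
             for _ in range(m)]
            for _ in range(r_in)]

def tt_element(cores, row_idx, col_idx):
    """Entry W[(i_1..i_d), (j_1..j_d)] as a product of the slices G_k[:, i_k, j_k, :]."""
    vec = [1.0]  # 1 x r_0 row vector, with r_0 = 1
    for core, i, j in zip(cores, row_idx, col_idx):
        r_out = len(core[0][i][j])
        # Multiply the running row vector by the r_in x r_out slice of this core.
        vec = [sum(vec[a] * core[a][i][j][b] for a in range(len(vec)))
               for b in range(r_out)]
    return vec[0]  # r_d = 1, so a single number remains

rng = random.Random(0)
# Hypothetical 6 x 6 matrix: row modes (2, 3), column modes (2, 3), TT-rank 2.
cores = [random_core(1, 2, 2, 2, rng), random_core(2, 3, 3, 1, rng)]

# Entry with row multi-index (1, 2) and column multi-index (0, 1).
value = tt_element(cores, (1, 2), (0, 1))
print(value)
```

In this sketch only the cores are stored. Any entry of the full matrix is recovered on demand from a short chain of small matrix products.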

For more information, see the following paper:

Tensorizing Neural Networks  
Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, Dmitry Vetrov; In _Advances in Neural Information Processing Systems 28_ (NIPS-2015) [[arXiv](http://arxiv.org/abs/1509.06569)].

In [1]:
import tensorflow as tf
import t3f

In [2]:
# Load the MNIST data.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [3]:
# Build placeholders for the data.
x_pl = tf.placeholder(tf.float32, shape=[None, 28*28])
y_pl = tf.placeholder(tf.float32, shape=[None, 10])

In [4]:
# Resize the images from 28 x 28 to 32 x 32 so that the number of pixels
# (32*32 = 1024) factors into the tensor shape (4, 4, 4, 4, 4).
x = tf.reshape(x_pl, [-1, 28, 28, 1])
x = tf.image.resize_images(x, [32, 32])
x = tf.reshape(x, [-1, 32*32])
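
The reason for the resize can be checked with a little arithmetic. The sketch below (the helper `prime_factors` is an illustrative name, not from the notebook) factors both pixel counts: 28*28 contains factors of 7, so it only splits into uneven modes such as (4, 4, 7, 7), while 32*32 is a pure power of two and splits into five equal modes of size 4.

```python
import math

def prime_factors(n):
    """Return the prime factorization of n as a sorted list of primes with repetition."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

original = prime_factors(28 * 28)  # factors of the original pixel count
resized = prime_factors(32 * 32)   # factors of the resized pixel count

print(original)
print(resized)

# Ten factors of 2 pair up into five modes of size 4.
assert math.prod((4, 4, 4, 4, 4)) == 32 * 32
```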

In [5]:
# A dense first layer would be created like this:
# W1 = tf.get_variable("W1", shape=[32*32, 32*32],
#            initializer=tf.contrib.layers.xavier_initializer())
# Instead, generate a random TT-matrix of size 1024 x 1024 and make it a variable.
W1 = t3f.get_variable("W1",
             initializer=t3f.random_matrix(((4, 4, 4, 4, 4), (4, 4, 4, 4, 4)), tt_rank=10))
b1 = tf.Variable(tf.zeros([32*32]))
# Use t3f.matmul to multiply the dense matrix x by the TT-matrix W1.
h1 = tf.nn.relu(t3f.matmul(x, W1) + b1)
W2 = tf.get_variable("W2", shape=[32*32, 10],
           initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.zeros([10]))
y = tf.matmul(h1, W2) + b2

In [6]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_pl, logits=y))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
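
For readers unfamiliar with the loss used in this cell, here is a pure-Python sketch of softmax cross-entropy for a single example with a one-hot label. It is only meant to illustrate the quantity being averaged. The function name and the toy logits are assumptions, and this is not how TensorFlow implements the op.

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits) for one example."""
    shift = max(logits)  # subtract the max for numerical stability
    log_norm = shift + math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - log_norm for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))

# Uniform logits over 3 classes: the loss equals log(3).
loss_uniform = softmax_cross_entropy([0.0, 0.0, 0.0], [0, 1, 0])
# A large logit on the correct class drives the loss toward zero.
loss_confident = softmax_cross_entropy([0.0, 10.0, 0.0], [0, 1, 0])
print(loss_uniform, loss_confident)
```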

## Train the model
Note that training takes a few minutes.

In [7]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

In [8]:
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x_pl: batch[0], y_pl: batch[1]})

## Test accuracy

In [9]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_pl, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x_pl: mnist.test.images, y_pl: mnist.test.labels}))

0.9085
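
The accuracy above is the fraction of test examples whose highest-scoring class matches the one-hot label. A minimal pure-Python sketch of the same computation, using made-up logits and labels purely for illustration:

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(logits_batch, one_hot_labels):
    """Fraction of examples where the predicted class equals the labeled class."""
    correct = [argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, one_hot_labels)]
    return sum(correct) / len(correct)

# Hypothetical 3-class logits for four examples.
logits_batch = [[2.0, 0.1, -1.0],
                [0.3, 0.2, 0.9],
                [1.5, 1.7, 0.0],
                [0.0, -0.5, -0.1]]
labels = [[1, 0, 0],
          [0, 0, 1],
          [1, 0, 0],
          [1, 0, 0]]

acc = accuracy(logits_batch, labels)
print(acc)
```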


The accuracy is not top-notch, but hey, we've used a plain old two-layer fully-connected network. Also notice that, thanks to the TT-layer, the number of parameters is really low: the first (and largest) fully-connected layer, of size 1024 x 1024, uses only 5120 parameters to represent the TT-matrix, whereas a dense matrix of that size would need 1024 * 1024 = 1,048,576.
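
The parameter count can be worked out by hand from the shape and TT-ranks of `W1`. The sketch below assumes that core k of a TT-matrix stores r_{k-1} * m_k * n_k * r_k numbers. The function name `tt_matrix_num_params` is illustrative and is not a t3f function.

```python
def tt_matrix_num_params(row_modes, col_modes, tt_ranks):
    """Sum of the core sizes r_{k-1} * m_k * n_k * r_k over all cores."""
    return sum(tt_ranks[k] * m * n * tt_ranks[k + 1]
               for k, (m, n) in enumerate(zip(row_modes, col_modes)))

modes = (4, 4, 4, 4, 4)         # underlying tensor shape of the rows and columns
ranks = (1, 10, 10, 10, 10, 1)  # TT-ranks of W1

tt_params = tt_matrix_num_params(modes, modes, ranks)
dense_params = 1024 * 1024

print(tt_params, dense_params, dense_params / tt_params)
```

The two boundary cores contribute 160 numbers each and the three middle cores 1600 each, so the TT-matrix is roughly 200 times smaller than its dense counterpart.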

In [10]:
str(W1)

'A TT-Matrix variable of size 1024 x 1024, underlying tensor shape: (4, 4, 4, 4, 4) x (4, 4, 4, 4, 4), TT-ranks: (1, 10, 10, 10, 10, 1)'

In [11]:
t3f.utils.number_of_params(W1)

AttributeError: 'module' object has no attribute 'number_of_params'

In [12]:
num_params = 0
for core in W1.tt_cores:
    num_params += sess.run(tf.size(core))
num_params

5120