In [1]:
from __future__ import absolute_import, print_function, division

In [2]:
import random
import numpy as np
import tensorflow as tf

## Synthetic data

We create a set of *boards* of size $5 \times 5$ with 3 channels for each position. The label at each position $(j,k)$ is computed as a function of the channel values $v_{jkl}$ at that position.

$$
    L_{jk} = \sum_{l=0}^2 a_l \cdot (v_{jkl})^{l+1}
$$

Here, the $a_l$ are arbitrary coefficients defined below. Note that this function is the same for every position. A $1 \times 1$ kernel applies the same weights at every position, so each position of each training board acts as a separate training example for the same function. Thus, a sufficiently deep convolutional network with only $1 \times 1$ kernels should easily learn it.
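As a quick sanity check of the formula, the label of a single position can be computed by hand. The sketch below is only an illustration: the helper name `label_at` is hypothetical and not part of the notebook, and the default coefficients are the ones used in `create_data`.

```python
def label_at(v, a=(0.9, 0.3, -0.2)):
    """Label for one board position: sum over channels l of a[l] * v[l]**(l+1).

    `label_at` is a hypothetical helper for illustration only.
    """
    return sum(a_l * v_l ** (l + 1) for l, (a_l, v_l) in enumerate(zip(a, v)))

# Example: all three channels equal to 0.5, i.e.
# 0.9*0.5 + 0.3*0.5**2 - 0.2*0.5**3 = 0.45 + 0.075 - 0.025
example = label_at([0.5, 0.5, 0.5])
print(example)
```

Up to floating-point rounding, the result for this input is $0.5$.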

In [3]:
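def create_data(N):
    """Return N random boards (N,5,5,3) and their per-position labels (N,5,5,1)."""
    batch = np.zeros([N,5,5,3])
    labels = np.zeros([N,5,5,1])
    a=[.9, .3, -.2]  # coefficients a_l of the label function
    for i in range(N):
        for x in range(5):
            for y in range(5):
                for l in range(3):
                    v = 2*(random.random()-0.5)  # channel value, uniform in [-1, 1)
                    batch[i][x][y][l] = v
                    labels[i][x][y][0] += a[l] * v**(l+1)
    return batch,labels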
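
The nested loops above are easy to follow but slow for large `N`. A vectorized variant could look like the sketch below. It is an assumption-laden alternative, not the notebook's own code: the name `create_data_vectorized` and the `seed` parameter are hypothetical, and it draws from NumPy's random generator instead of Python's `random` module, so the sampled values differ from those shown in this notebook.

```python
import numpy as np

def create_data_vectorized(N, a=(0.9, 0.3, -0.2), seed=None):
    """Hypothetical vectorized counterpart of create_data (illustration only)."""
    rng = np.random.RandomState(seed)
    # Channel values, uniform in [-1, 1), for N boards of size 5x5 with 3 channels
    batch = rng.uniform(-1.0, 1.0, size=(N, 5, 5, 3))
    # Exponents l+1 for the channels l = 0, 1, 2; broadcast over the last axis
    powers = np.arange(1, 4)
    # Sum a_l * v_l**(l+1) over the channel axis, keeping a trailing axis of size 1
    labels = np.sum(np.asarray(a) * batch ** powers, axis=-1, keepdims=True)
    return batch, labels

demo_batch, demo_labels = create_data_vectorized(4, seed=0)
print(demo_batch.shape, demo_labels.shape)
```

Broadcasting applies the exponent and coefficient of channel $l$ to the whole last axis at once, which replaces the four nested loops.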

In [4]:
N = 100
batch, labels = create_data(N)
batch_t, labels_t = create_data(N)

In [5]:
batch.shape

(100, 5, 5, 3)

Moving the channel axis to the front shows the first *board* of the batch as three $5 \times 5$ channels.

In [6]:
print(np.rollaxis(batch[0], 2, 0))

[[[ 0.35850863 -0.50650643  0.70127042  0.10181383  0.24201709]
  [ 0.32239252  0.11228678 -0.82959599  0.81922063  0.44812617]
  [ 0.95869282 -0.61259188 -0.09248631  0.63458556  0.85511342]
  [ 0.70357736  0.93989487 -0.4975352   0.11490962  0.06208572]
  [-0.239458    0.94670246 -0.16017788 -0.2543227   0.85040315]]

 [[-0.31278101  0.11198577  0.74601011 -0.48412776 -0.40929415]
  [-0.6502192  -0.0682995   0.30129259 -0.95519985  0.17725096]
  [ 0.27154885  0.71561505 -0.81219297 -0.51167167  0.36633658]
  [-0.75669388  0.67317356  0.5935581  -0.26156191 -0.03661875]
  [ 0.70679617 -0.48727793  0.18582762  0.18072797  0.09590351]]

 [[-0.32884058 -0.57291667  0.15772602  0.06486667 -0.00777459]
  [-0.73416199 -0.28142968  0.7019069  -0.92166543  0.24770141]
  [-0.72763667 -0.57381429  0.39340685  0.3592027   0.1685061 ]
  [-0.17761214 -0.37058863  0.06300717 -0.2637818  -0.71707643]
  [ 0.86579486 -0.07786734 -0.25712719 -0.60855109 -0.69856288]]]


In [7]:
_inputs = tf.placeholder(tf.float32, [None, 5, 5, 3])
_labels = tf.placeholder(tf.float32, [None, 5, 5, 1])

### A special CNN
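Because every kernel is $1 \times 1$, the convolutional network below applies the same computation to each position independently. It can therefore be regarded as a single convolutional layer whose kernel is itself a 5-layer feed-forward NN with layer widths $[3, 32, 128, 32, 1]$.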
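To see why a $1 \times 1$ convolution amounts to a dense layer applied at every position, the following NumPy sketch compares the two on a random board. It is an illustration only: the weights `W` and bias `b` are random stand-ins rather than the trained TensorFlow variables, and the output width of 8 is an arbitrary choice.

```python
import numpy as np

rng = np.random.RandomState(0)
board = rng.uniform(-1.0, 1.0, size=(5, 5, 3))  # one board with 3 channels
W = rng.normal(size=(3, 8))                     # stand-in 1x1 kernel: 3 -> 8 channels
b = rng.normal(size=8)                          # stand-in bias

# 1x1 convolution: contract the channel axis at all positions at once
conv_out = np.einsum('jkc,cf->jkf', board, W) + b

# The same weights used as a dense layer, one position at a time
dense_out = np.empty((5, 5, 8))
for j in range(5):
    for k in range(5):
        dense_out[j, k] = np.dot(board[j, k], W) + b

print(np.allclose(conv_out, dense_out))
```

Stacking several such layers with nonlinearities in between gives a small feed-forward network that is shared across all positions.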

In [8]:
conv1 = tf.layers.conv2d(inputs=_inputs, filters=32, kernel_size=[1,1], strides=[1,1], padding='VALID', activation=tf.nn.elu)
conv2 = tf.layers.conv2d(inputs=conv1, filters=128, kernel_size=[1,1], strides=[1,1], padding='VALID', activation=tf.nn.elu)
conv3 = tf.layers.conv2d(inputs=conv2, filters=32, kernel_size=[1,1], strides=[1,1], padding='VALID', activation=tf.nn.elu)
conv4 = tf.layers.conv2d(inputs=conv3, filters=1, kernel_size=[1,1], strides=[1,1], padding='VALID')

loss = tf.losses.mean_squared_error(_labels,conv4)
optimizer = tf.train.AdamOptimizer(learning_rate=3e-4).minimize(loss)

Instructions for updating:
Use keras.layers.conv2d instead.
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.


### Training
We train the network on the full training set for 3001 steps and print the training loss and the test loss every 1000 steps.

In [10]:
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(3001):
        _ = session.run(optimizer, feed_dict={_inputs: batch, _labels: labels})
        if i % 1000 == 0:
            l = session.run(loss, feed_dict={_inputs: batch, _labels: labels})
            l_t = session.run(loss, feed_dict={_inputs: batch_t, _labels: labels_t})
            print(l, l_t)

0.05490766 0.053159654
0.0002807435 0.000308975
0.00010162818 0.00011235645
5.0289535e-05 5.325704e-05


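The test loss falls together with the training loss, from about $5 \times 10^{-2}$ to about $5 \times 10^{-5}$. This indicates that the network has learned a close approximation of our label function rather than memorizing the training boards.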