> How can we generate polygons that segment an image, starting from a hidden representation of that very information?

For now, let's set aside the question of how we are going to train the network and just think about producing polygonal representations.

In [1]:
import tensorflow as tf
import sonnet as snt

In [2]:
max_pts = 30
classes = 1  # should be able to consider the classes independently.
sizes = [128, 128]
max_instances = 10
batch = 50

In [3]:
sess = tf.InteractiveSession()
h = tf.random_normal([batch, 4, 4, 128])  # hidden states from somewhere

We could use CNNs and fully connected layers, and interpret their output as the points of a polygon. The ith pair of outputs is the ith point in the polygon, which gives us the ordering we need.

In [4]:
with tf.variable_scope('semantic_segmentation'):
    net = [f for s in sizes for f in (snt.Conv2D(s, 3, stride=2), tf.nn.relu)]
    cnn = snt.Sequential(net)
    y = cnn(h)
    y = tf.reshape(y, [batch, 128])
    points = snt.Linear(2*max_pts*classes)(y)
    print(points)

Tensor("semantic_segmentation/linear/add:0", shape=(50, 60), dtype=float32)
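To make that interpretation concrete, here is a minimal NumPy sketch (not part of the original notebook) of how a flat output of `2*max_pts` values could be read as an ordered list of (x, y) vertices. The helper name `to_polygon` and the sigmoid squashing into the unit square are assumptions made purely for illustration.

```python
import numpy as np

max_pts = 30

def to_polygon(flat, max_pts=max_pts):
    """Read a [batch, 2*max_pts] array as [batch, max_pts, 2] ordered vertices."""
    # Consecutive pairs of outputs become consecutive (x, y) vertices.
    pts = flat.reshape(flat.shape[0], max_pts, 2)
    # Assumption: squash into (0, 1) so vertices are relative image coordinates.
    return 1.0 / (1.0 + np.exp(-pts))

rng = np.random.RandomState(0)
flat = rng.randn(50, 2 * max_pts)  # stand-in for the network output
polys = to_polygon(flat)
print(polys.shape)
```

Multiplying the result by the image width and height would give pixel coordinates.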


Cool. What about multiple polygons? We could similarly partition the output, one partition per polygon instance, and interpret the ith partition as the ith polygon.

In [5]:
with tf.variable_scope('instance_segmentation'):
    net = [f for s in sizes for f in (snt.Conv2D(s, 3, stride=2), tf.nn.relu)]
    cnn = snt.Sequential(net)
    y = cnn(h)
    y = tf.reshape(y, [batch, 128])
    points = snt.Linear(2*max_pts*classes*max_instances)(y)
    print(points)

Tensor("instance_segmentation/linear/add:0", shape=(50, 600), dtype=float32)
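As a rough sketch of the partitioning (in NumPy, with a single class, and with the hypothetical helper name `split_instances`), the 600 outputs can be reshaped so that each instance owns a contiguous block of `2*max_pts` values:

```python
import numpy as np

max_pts, max_instances = 30, 10

def split_instances(flat, max_instances=max_instances, max_pts=max_pts):
    """Read [batch, max_instances*max_pts*2] as [batch, max_instances, max_pts, 2]."""
    return flat.reshape(flat.shape[0], max_instances, max_pts, 2)

# Use a counting array so it is easy to see which outputs land where.
flat = np.arange(2 * 600, dtype=np.float32).reshape(2, 600)
instances = split_instances(flat)
print(instances.shape)
print(instances[0, 1, 0])  # first vertex of the second polygon
```

With this layout the second polygon starts at output index 60, directly after the 30 points of the first.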


In [None]:
tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'instance_segmentation/linear')

Hmm. The number of outputs is starting to become prohibitive (though still a lot smaller than producing images).
However, CNNs factorise their parameters so that they are shared across many spatial positions, which makes predicting masks more parameter-efficient. We need a similar kind of sharing across the generation of different polygons.
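A quick back-of-the-envelope count shows how the final linear layer grows. This sketch assumes a 128-dimensional input and a bias term (the default for `snt.Linear`); the helper `linear_params` is hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out

max_pts = 30
hidden = 128  # size of the flattened CNN output above

counts = {}
for max_instances in (1, 10, 100):
    n_out = 2 * max_pts * max_instances
    counts[max_instances] = linear_params(hidden, n_out)
    print(max_instances, n_out, counts[max_instances])
```

The count grows linearly with `max_instances` because nothing is shared between polygons: 7,740 parameters for one polygon, 77,400 for ten.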

In [None]:
### TODO. actually draw some polygons generated by these nets

In [21]:
h = tf.reshape(h, [batch, -1, 128])  # flatten the 4x4 grid into a sequence of 16 steps

In [26]:
# Stack GRU cores (each followed by a relu) and unroll them over the 16 steps.
net = [f for s in sizes for f in (snt.GRU(hidden_size=s), tf.nn.relu)]
rnn = snt.DeepRNN(net, skip_connections=False)
init_state = rnn.initial_state(batch)
output, next_state = tf.nn.dynamic_rnn(rnn, h, initial_state=init_state)
print(output)

points = snt.Linear(2*max_pts)(output[:, -1, :])  # just use the final output

Tensor("rnn_1/transpose:0", shape=(50, 16, 128), dtype=float32)


Awesome. Now how do we train it...

In [33]:
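rnn = snt.GRU(hidden_size=128)
# Wrap the GRU in adaptive computation time: output size 128, halting
# threshold 1-1e-4, and the identity to pick out the state used for halting.
rnn = snt.ACTCore(rnn, 128, 1-1e-4, lambda x: x)
init_state = rnn.initial_state(batch)
output, next_state = tf.nn.dynamic_rnn(rnn, h, initial_state=init_state)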

In [34]:
output

(<tf.Tensor 'rnn_4/transpose:0' shape=(50, 16, 128) dtype=float32>,
 (<tf.Tensor 'rnn_4/transpose_1:0' shape=(50, 16, 1) dtype=float32>,
  <tf.Tensor 'rnn_4/transpose_2:0' shape=(50, 16, 1) dtype=float32>))

Alternatively, we could represent a polygon as a sequence of vectors, starting from a seed point. We would expect the start and end to be at the same position, and we could regularise for this.
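A minimal NumPy sketch of this idea, under the assumption that each vector is an offset from the previous vertex: the vertices are the running sum of the offsets added to the seed, and one possible regulariser is the squared distance between the start and end points. The names `polygon_from_offsets` and `closure_penalty` are hypothetical.

```python
import numpy as np

def polygon_from_offsets(seed, offsets):
    """seed: [batch, 2]; offsets: [batch, n, 2]. Returns vertices [batch, n+1, 2]."""
    steps = np.cumsum(offsets, axis=1)        # running sum of the vectors
    walk = seed[:, None, :] + steps           # positions after each step
    return np.concatenate([seed[:, None, :], walk], axis=1)

def closure_penalty(offsets):
    """Squared distance between the start and the end of the walk."""
    gap = offsets.sum(axis=1)                 # end point minus seed point
    return (gap ** 2).sum(axis=-1)

seed = np.zeros((1, 2))
square = np.array([[[1., 0.], [0., 1.], [-1., 0.], [0., -1.]]])
open_path = np.array([[[1., 0.], [0., 1.]]])

verts = polygon_from_offsets(seed, square)
print(verts[0])
print(closure_penalty(square), closure_penalty(open_path))
```

The penalty is zero exactly when the offsets sum to zero, so adding it to a training loss would push the network towards closed polygons without forcing closure outright.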