# Canton

The Canton library is a lightweight wrapper around TensorFlow, focused on **parameter sharing**. The author of this library is a deep learning guy who has worked with Torch, TensorFlow and Keras.

Canton is named after the city of Guangzhou. The French came a long time ago and called this city "Canton", which sounds like "Guangdong" when pronounced in French. Guangdong is actually the name of the province, not the city. Since then, westerners have used the word Canton. The Yue language, a dialect of Chinese commonly used in Guangzhou and the United States, is known as "Cantonese" in English for this reason.

## The Canton Philosophy

- The network units, and the weights associated with them, should be tied together as one and not separated.
- Obtaining the weight tensors of any given action (or a set of actions bound together) should be as easy as calling `some_action.get_weights()`, not `tf.very_long_method_name(some_collection).some_other_method(some_name_prefixes)`.
- You should by default be able to use a unit everywhere, while maintaining only one set of weights for that unit.
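
The three points above can be pictured with a small pure-Python sketch. It is not Canton's implementation: the `Unit`, `Scale` and `Chain` classes are hypothetical, and a one-element list stands in for a weight tensor. It only illustrates a unit that owns its weights, a `get_weights()` that collects them from bound sub-units, and a unit reused in two places with a single set of weights.

```python
class Unit:
    """Hypothetical base unit: owns its weights and tracks sub-units."""

    def __init__(self):
        self.weights = []    # weights owned directly by this unit
        self.subunits = []   # child units bound to this one

    def get_weights(self):
        # Collect own weights, then recurse into sub-units.
        # A weight reached twice (shared unit) is listed only once.
        collected = list(self.weights)
        for unit in self.subunits:
            for w in unit.get_weights():
                if not any(w is seen for seen in collected):
                    collected.append(w)
        return collected


class Scale(Unit):
    """Toy unit with a single weight: multiplies its input by a factor."""

    def __init__(self, factor):
        super().__init__()
        self.weights = [[factor]]  # one-element list stands in for a tensor

    def __call__(self, x):
        return x * self.weights[0][0]


class Chain(Unit):
    """Binds several units together and applies them in order."""

    def __init__(self, *units):
        super().__init__()
        self.subunits = list(units)

    def __call__(self, x):
        for unit in self.subunits:
            x = unit(x)
        return x


a, b = Scale(2.0), Scale(3.0)
model = Chain(a, b, b)             # b is used twice but holds one set of weights
result = model(1.0)                # 1 * 2 * 3 * 3
all_weights = model.get_weights()  # two entries: a's weight and b's weight
```

Because `b` appears twice in the chain but is a single object, changing its weight changes both applications at once, which is the behaviour the philosophy asks for.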

## Story Behind

TensorFlow is cool in general, but some of its designs are disastrous. The official way to share variables (weights) between copies of networks is to use `tf.variable_scope(scopename, reuse=True)` and `tf.get_variable(name)`. It then becomes the programmer's responsibility to specify (and keep track of) the scope names, variable names and flags. As a programmer, I soon realized that *There are only two hard things in Computer Science: cache invalidation and naming things.*

TensorFlow is from Google, where CS PhDs write all the code, so that must not be a problem for them. In order to deal with my own incompetence, I wrote this library.

> Keras also wraps the quirks and weirdness of TensorFlow and allows for rapid prototyping, but if you want to introduce your own calculation and/or manipulation operations into the model, you must first inherit from Keras' Layer class, then (in some cases) specify a shape inference function, which is boring and inefficient. Besides that, Keras does not support anything other than the Kaggle-styled, input-to-output-chained, one-loss-updates-everything architecture. I tried various methods to wrap around Keras (in order to add my own functionality), but the internal complexity of Keras continuously freaked me out (I read almost every page of its documentation and half its code).

> Other learning frameworks have also made various attempts at solving the same problem, using fancy descriptions like "imperative vs declarative". Well, maybe they do need a lot of PhDs to solve the *second hardest thing in Computer Science...*

## Install

pip install canton

## Usage

Import the essentials:

In [1]:
import tensorflow as tf
import canton as ct
import numpy as np

Define our input:

In [2]:
input_variable = tf.Variable(np.random.normal(loc=0,scale=1,size=[1,256,256,3]
    ).astype('float32'))

Then feed it through three 2-D convolutional layers, where:
- conv_0 has its own weights
- conv_1 and conv_2 share weights

In order to do this we first create 2 convolutional layers, each with its own set of weights:

In [3]:
conv = ct.Conv2D(3,16,3)
shared_conv = ct.Conv2D(16,16,3)
print(conv.weights)
print(shared_conv.weights)

[<tensorflow.python.ops.variables.Variable object at 0x00000000092C3908>, <tensorflow.python.ops.variables.Variable object at 0x00000000092E7550>]
[<tensorflow.python.ops.variables.Variable object at 0x00000000092FFB38>, <tensorflow.python.ops.variables.Variable object at 0x00000000092FAB70>]
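
Each layer prints two variables. As a rough sanity check, the sketch below counts parameters under two assumptions that the text above does not state explicitly: that the arguments of `ct.Conv2D(3,16,3)` are (input channels, output channels, kernel size), and that the two variables are a square kernel and a per-channel bias. The helper `conv2d_param_count` is hypothetical and not part of Canton.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size):
    """Parameter count of a square 2-D convolution with a bias (assumed layout)."""
    kernel = kernel_size * kernel_size * in_channels * out_channels
    bias = out_channels
    return kernel + bias


conv_params = conv2d_param_count(3, 16, 3)      # 432 kernel + 16 bias
shared_params = conv2d_param_count(16, 16, 3)   # 2304 kernel + 16 bias
print(conv_params, shared_params)               # prints "448 2320"
```

Under these assumptions, applying `shared_conv` any number of times leaves the total at 448 + 2320 parameters, since no new weights are created per application.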


Then simply apply the second layer twice:

In [4]:
i = conv(input_variable)
i = shared_conv(i)
out = shared_conv(i)
print(out)

# define loss
loss = tf.reduce_mean(out**2.)

Tensor("add_2:0", shape=(1, 256, 256, 16), dtype=float32)


Now let's assume you only want to train the shared layer's weights (keeping the first `conv` layer's weights frozen). Instead of using `tf.get_collection(some_keys_you_have_to_remember)`, or `get_layer('some_name').trainable = False`, you simply pick the weights you want to train and throw them into `optimizer.minimize()`:

In [5]:
# define optimizer
opt = tf.train.AdamOptimizer(1e-3)
# define train op
train_step = opt.minimize(loss,var_list=shared_conv.get_weights())

This seems stupid (doing more work than Keras) at first glance, but it is super handy if you happen to be training GANs or anything NOT for Kaggle competitions.
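
What passing a `var_list` amounts to can be sketched without TensorFlow. The toy below is an illustration under stated assumptions, not what `opt.minimize()` does internally: it uses plain gradient descent instead of Adam, two scalar weights instead of convolution kernels, and hypothetical names (`loss_fn`, `grad_w_shared`, `weights`). The point is only that weights left out of the list never change.

```python
def loss_fn(w_frozen, w_shared, x=1.0):
    """Loss of a two-stage toy model: out = w_shared * (w_frozen * x)."""
    out = w_shared * (w_frozen * x)
    return out ** 2


def grad_w_shared(w_frozen, w_shared, x=1.0):
    """Derivative of loss_fn with respect to w_shared."""
    return 2.0 * w_shared * (w_frozen * x) ** 2


weights = {"frozen": 0.5, "shared": 0.8}
var_list = ["shared"]   # only the weights named here are updated
learning_rate = 0.1
losses = []

for _ in range(5):
    losses.append(loss_fn(weights["frozen"], weights["shared"]))
    grads = {"shared": grad_w_shared(weights["frozen"], weights["shared"])}
    for name in var_list:
        weights[name] -= learning_rate * grads[name]
```

After the loop, `weights["frozen"]` is still 0.5 while the loss has gone down, because only `weights["shared"]` was moved.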

Now you can train it the TensorFlow way:

In [6]:
sess = ct.get_session() # just the TF Session
sess.run(tf.global_variables_initializer()) # initialize all weights
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={}) # feed your inputs here if you have any
    print('loss:',res[1])

loss: 0.266927
loss: 0.251622
loss: 0.237126
loss: 0.223424
loss: 0.210489
loss: 0.198296
loss: 0.186811
loss: 0.176009
loss: 0.165861
loss: 0.156332


OK, the loss is decreasing, which means the weights are getting trained. Now let's assume you like this "2Conv1Weight" idea very much, and want to apply this layer two more times to your model:

In [7]:
out = shared_conv(out)
out = shared_conv(out)

# redefine loss
loss = tf.reduce_mean(out**2.)
# redefine train op (Note: do not redefine the optimizer; doing so produces an error due to variable scope clashing)
train_step = opt.minimize(loss,var_list=shared_conv.get_weights())

We don't have to reinitialize all the variables, since the previous session is still open:

In [8]:
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 0.131151
loss: 0.118935
loss: 0.107218
loss: 0.0963017
loss: 0.0863248
loss: 0.0773284
loss: 0.0692928
loss: 0.0621648
loss: 0.0558718
loss: 0.0503311


As you can see from the loss values, the weights are not lost between two runs. Now let's assume you want to save the weights to a file (in numpy format) for future use:

In [9]:
shared_conv.save_weights('shared_conv.npy')

weights obtained.
successfully saved to shared_conv.npy


True

Now you train the model for some more steps:

In [10]:
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 0.0454626
loss: 0.041187
loss: 0.0374314
loss: 0.0341301
loss: 0.031224
loss: 0.0286602
loss: 0.0263941
loss: 0.0243854
loss: 0.0226004
loss: 0.0210091


Now suppose you decide the loss is too low (say you suspect overfitting) and you want to revert your weights to the last checkpoint:

In [11]:
shared_conv.load_weights('shared_conv.npy')

successfully loaded from shared_conv.npy
2 weights assigned.


True
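
The save and load round trip can be pictured with a standard-library sketch. It makes no claim about how Canton stores weights: JSON stands in for the numpy format, plain lists stand in for weight tensors, and the `save_weights` and `load_weights` functions here are hypothetical stand-ins for the methods used above. The detail worth noticing is that loading assigns values into the existing weight objects, so anything already holding a reference to them sees the restored values.

```python
import json
import os
import tempfile


def save_weights(weights, path):
    """Write a list of weight lists to disk (JSON stands in for .npy here)."""
    with open(path, "w") as f:
        json.dump(weights, f)
    return True


def load_weights(weights, path):
    """Assign saved values into the existing weight lists, in place."""
    with open(path) as f:
        saved = json.load(f)
    if len(saved) != len(weights):
        raise ValueError("checkpoint holds a different number of weights")
    for current, stored in zip(weights, saved):
        current[:] = stored   # overwrite contents, keep the same object
    return True


kernel, bias = [0.5, -0.25], [0.1]
weights = [kernel, bias]

path = os.path.join(tempfile.mkdtemp(), "shared_conv.json")
save_weights(weights, path)

kernel[0] = 9.0               # pretend further training changed the weight
load_weights(weights, path)   # revert to the checkpoint
```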

Now train the model again. As you can see, the first loss value is back at the level of our previous checkpoint. (However, the Adam optimizer's internal state was not reset when the weights were reloaded, so the following values are not exactly identical to the earlier run.)

In [12]:
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 0.0454626
loss: 0.0422226
loss: 0.0391709
loss: 0.0363211
loss: 0.0336778
loss: 0.0312398
loss: 0.0290012
loss: 0.0269524
loss: 0.025082
loss: 0.0233781
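
Why the two runs start from the same loss and then drift apart can be reproduced with a toy scalar version of Adam. This is a minimal sketch under stated assumptions: the `Adam` class below is written from the standard Adam update rule and is not TensorFlow's `AdamOptimizer`, the model is a single weight with loss `w**2`, and the learning rate is chosen for illustration. Restoring the weight rewinds the model, but the optimizer's moment estimates `m`, `v` and step counter `t` keep their later values.

```python
import math


class Adam:
    """Minimal scalar Adam; m, v and t are the optimizer's own state."""

    def __init__(self, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = 0.0
        self.v = 0.0
        self.t = 0

    def step(self, w, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)   # bias-corrected first moment
        v_hat = self.v / (1 - self.b2 ** self.t)   # bias-corrected second moment
        return w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def train(w, opt, steps):
    """Minimize loss = w**2; return the final weight and the loss at each step."""
    losses = []
    for _ in range(steps):
        losses.append(w * w)
        w = opt.step(w, 2.0 * w)
    return w, losses


opt = Adam()
w, _ = train(1.0, opt, 5)
checkpoint = w                              # plays the role of save_weights
w, first_run = train(w, opt, 5)             # train on from the checkpoint
w, second_run = train(checkpoint, opt, 5)   # restore the weight, same optimizer
```

Here `first_run[0]` and `second_run[0]` are equal because both start from the same weight, while the later entries differ because the optimizer carried its state across the restore.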
