## Install

Python 3 only:
```bash
pip install canton
```

## Usage

Import the essentials:

In [1]:
import tensorflow as tf
import canton as ct
import numpy as np

Define our input:

In [2]:
input_variable = tf.Variable(
    np.random.normal(loc=0, scale=1, size=[1, 256, 256, 3]).astype('float32'))

Then feed it through three 2-D convolutional layers, where:
- conv_0 has its own weights
- conv_1 and conv_2 share weights

In order to do this we first create 2 convolutional layers, each with its own set of weights:

In [3]:
conv = ct.Conv2D(3,16,3)
shared_conv = ct.Conv2D(16,16,3)
print(conv.weights)
print(shared_conv.weights)

[<tensorflow.python.ops.variables.Variable object at 0x0000000009F81780>, <tensorflow.python.ops.variables.Variable object at 0x0000000009F95BE0>]
[<tensorflow.python.ops.variables.Variable object at 0x0000000009FA4358>, <tensorflow.python.ops.variables.Variable object at 0x0000000009FBA940>]


Then simply apply the second layer twice:

In [4]:
i = conv(input_variable)
i = shared_conv(i)
out = shared_conv(i)
print(out)

# define loss
loss = tf.reduce_mean(out**2.)

Tensor("add_2:0", shape=(1, 256, 256, 16), dtype=float32)


Now let's assume you only want to train the shared layer's weights and keep the first `conv` layer's weights frozen. Instead of using `tf.get_collection(some_keys_you_have_to_remember)` or `get_layer('some_name').trainable = False`, you simply pick the weights you want to train and pass them to `optimizer.minimize()` via `var_list`:

In [5]:
# define optimizer
opt = tf.train.AdamOptimizer(1e-3)
# define train op
train_step = opt.minimize(loss,var_list=shared_conv.get_weights())

This seems stupid (doing more work than Keras) at first glance, but it is super handy if you happen to be training GANs or anything NOT for Kaggle competitions.
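
To make the idea of a `var_list` concrete without TensorFlow, here is a minimal pure-Python sketch. The names `loss_and_grads`, `sgd_step`, `params` and the toy loss are hypothetical and are not part of canton or TensorFlow. The only point is that an optimizer step touches the variables you hand it and nothing else:

```python
# Toy model: out = w_shared * w_frozen * x, loss = out ** 2.
# All names here are illustrative; nothing below is canton or TensorFlow API.

def loss_and_grads(params, x):
    """Return the loss and its gradient w.r.t. every parameter."""
    out = params["w_shared"] * params["w_frozen"] * x
    loss = out ** 2
    grads = {
        "w_frozen": 2 * out * params["w_shared"] * x,
        "w_shared": 2 * out * params["w_frozen"] * x,
    }
    return loss, grads

def sgd_step(params, grads, var_list, lr=0.1):
    """Update only the parameters named in var_list; leave the rest untouched."""
    for name in var_list:
        params[name] -= lr * grads[name]

params = {"w_frozen": 0.5, "w_shared": 1.0}
losses = []
for _ in range(5):
    loss, grads = loss_and_grads(params, x=1.0)
    losses.append(loss)
    sgd_step(params, grads, var_list=["w_shared"])  # train the shared weight only

print(params["w_frozen"])  # still 0.5: it was never in var_list
```

For a GAN you would keep two such lists, one per sub-network, and pass each to its own update step.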

Now you can train it the TensorFlow way:

In [6]:
sess = ct.get_session() # just the TF Session
sess.run(tf.global_variables_initializer()) # initialize all weights
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={}) # feed your inputs here if you have any
    print('loss:',res[1])

loss: 3.80577
loss: 3.61594
loss: 3.43501
loss: 3.26269
loss: 3.09886
loss: 2.94324
loss: 2.79556
loss: 2.65554
loss: 2.52299
loss: 2.39752


OK, the loss is decreasing, which means the weights are getting trained. Now let's assume you like this "2Conv1Weight" idea very much, and wanna apply this layer two more times in your model:

In [7]:
out = shared_conv(out)
out = shared_conv(out)

# redefine loss
loss = tf.reduce_mean(out**2.)
# redefine train op (note: reuse the existing optimizer; creating a new one produces an error due to clashing variable scopes)
train_step = opt.minimize(loss,var_list=shared_conv.get_weights())

We don't have to reinitialize all the variables, since the previous session is still open:

In [8]:
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 5.11723
loss: 4.72265
loss: 4.32541
loss: 3.94039
loss: 3.57698
loss: 3.24042
loss: 2.9328
loss: 2.65436
loss: 2.40401
loss: 2.18001


As you can see from the loss values, the weights are not lost between two runs. Now let's assume you wanna save the weights to a file (in numpy format) for future use:

In [9]:
shared_conv.save_weights('shared_conv.npy')

2 weights (and variables) obtained.
successfully saved to shared_conv.npy


True

Now you train the model for some more steps:

In [10]:
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 1.98021
loss: 1.80226
loss: 1.6439
loss: 1.50296
loss: 1.37744
loss: 1.26553
loss: 1.16558
loss: 1.07618
loss: 0.996039
loss: 0.924061


Now suppose the loss looks too low and you suspect overfitting. Assume you want to revert your weights to the last checkpoint:

In [11]:
shared_conv.load_weights('shared_conv.npy')

successfully loaded from shared_conv.npy
2 weights assigned.


True

Now train the model again. As you can see, the loss jumps back up to its value at our previous checkpoint. (However, the Adam optimizer's internal state was not reverted along with the weights, so the subsequent losses are not exactly identical to those of the earlier run.)

In [12]:
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 1.98021
loss: 1.83788
loss: 1.70334
loss: 1.57729
loss: 1.46005
loss: 1.35165
loss: 1.2519
loss: 1.16046
loss: 1.07685
loss: 1.00061
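
The remark about Adam can be illustrated with a small pure-Python sketch. The `Adam` class below is a textbook scalar implementation written for this example (it is not TensorFlow's optimizer), and `w`, `checkpoint` and the toy loss `w ** 2` are hypothetical. It shows that restoring a weight while leaving the optimizer's moment estimates and step counter untouched gives a different next update than the one taken right after the checkpoint was saved:

```python
import math

class Adam:
    """Textbook Adam for a single scalar parameter (illustrative only)."""
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # step counter

    def step(self, w, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def grad(w):
    return 2 * w  # gradient of the toy loss w ** 2

opt = Adam(lr=0.1)
w = 1.0
for _ in range(5):
    w = opt.step(w, grad(w))

checkpoint = w                            # plays the role of save_weights()
first_after_save = opt.step(w, grad(w))   # step 6, taken right after saving
w = first_after_save
for _ in range(9):
    w = opt.step(w, grad(w))              # keep training

w = checkpoint                            # plays the role of load_weights();
                                          # opt.m, opt.v and opt.t are NOT reset
first_after_load = opt.step(w, grad(w))   # same weight, different optimizer state

print(first_after_save, first_after_load)
```

The two printed values start from the same weight but differ, because the second update uses moment estimates accumulated over ten extra steps.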


## Concept of Cans

`Can` (`from canton import Can`) is the base class for the Conv2D layer above.

A Can is basically a container of an action and its associated weights.

When a Can is initialized, all its weight variables are created (but not initialized of course).

Every Can is callable after initialization. By calling a Can on a tensor, for example `i = shared_conv(i)`, you extend the computation graph and obtain a result tensor, just as you would with plain TensorFlow. However, no new weights are created during the call: the weights are **shared** among all calls of the same Can.
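
As a rough mental model (not canton's actual implementation), the sharing behavior can be sketched in plain Python. `ToyScale` is a hypothetical stand-in for a Can whose single weight is created in `__init__` and merely reused in `__call__`:

```python
class ToyScale:
    """Illustrative stand-in for a Can: weights are created once, in __init__."""
    def __init__(self, w):
        self.weights = [w]              # created here, never inside __call__

    def __call__(self, x):
        # Calling only uses the existing weight; no new weight is created.
        return self.weights[0] * x

layer = ToyScale(0.5)
first = layer(layer(8.0))       # 0.5 * (0.5 * 8.0) = 2.0
layer.weights[0] = 2.0          # e.g. a training step changed the weight
second = layer(layer(8.0))      # 2.0 * (2.0 * 8.0) = 32.0

print(first, second, len(layer.weights))  # 2.0 32.0 1
```

Changing the one weight affects every place the layer is applied, which is what weight sharing means here.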

As seen above, you can very easily save or restore the weights of a Can, or retrieve them as tensors. So, why not represent bigger building blocks, or even the whole network, as a Can? That way we could build networks of arbitrary complexity, and train them in interesting ways (like adding adversarial loss), without ever having to memorize all those variable names and scopes...

Yes, you can create Cans consisting of other Cans: that creates a Can Hierarchy.

## Can Hierarchy

Assume you came up with a new idea: create two convolutional layers A and B, and apply them one after another to the input N times:

- `i = B(A(i))` for N=1;

- `i = B(A(B(A(i))))` for N=2;

So why not combine A and B into one Can, and call that N times on the input? Then we only have to call `get_weights()` once to train with the optimizer, and call `save_weights()` once to save the parameters.

Here's the default class inheritance approach:

In [13]:
class DoubleConv(ct.Can):
    def __init__(self):
        super().__init__() # init base class
        self.convs = [ct.Conv2D(3,16,3),ct.Conv2D(16,3,3)] # define conv2d cans
        self.incan(self.convs) # add as subcans
    def __call__(self,i):
        i = self.convs[0](i)
        i = self.convs[1](i)
        return i

> Note: I know it's verbose. You don't always have to do that. Just keep reading.

By calling `self.incan(cans)`, you add one or more Can(s) as the **SubCan(s)** of the Can. You can access the list of a Can's SubCans via its **subcans** property.

In [14]:
dc = DoubleConv()
print(dc.subcans)

[<canton.cans.Conv2D object at 0x000000000A1DC9B0>, <canton.cans.Conv2D object at 0x000000000A1BA748>]


You can of course get its weights: `get_weights()` traverses the hierarchy tree and collects the weight tensors from its SubCans.

In [15]:
print(dc.get_weights())

[<tensorflow.python.ops.variables.Variable object at 0x000000000A1DC9E8>, <tensorflow.python.ops.variables.Variable object at 0x000000000A1DCA58>, <tensorflow.python.ops.variables.Variable object at 0x000000000A1BA898>, <tensorflow.python.ops.variables.Variable object at 0x000000000A1DD6A0>]


That's right, 2 convolutions need 4 variables (2 weights and 2 biases).
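
The traversal can be pictured with a small pure-Python sketch. `ToyCan` is a hypothetical class written for this example, and the depth-first, own-weights-first order is an assumption rather than a statement about canton's internals:

```python
class ToyCan:
    """Illustrative container: own weights plus a list of child containers."""
    def __init__(self, weights=None):
        self.weights = list(weights or [])
        self.subcans = []

    def incan(self, cans):
        """Register a list of children (hypothetical mirror of the call above)."""
        self.subcans.extend(cans)

    def get_weights(self):
        """Collect own weights, then recurse depth-first into every child."""
        collected = list(self.weights)
        for sub in self.subcans:
            collected.extend(sub.get_weights())
        return collected

conv_a = ToyCan(weights=["A.W", "A.b"])   # strings stand in for weight tensors
conv_b = ToyCan(weights=["B.W", "B.b"])
double = ToyCan()                         # the parent owns no weights itself
double.incan([conv_a, conv_b])

print(double.get_weights())  # ['A.W', 'A.b', 'B.W', 'B.b']
```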

And yes, you can call it and train it, just like before:

In [16]:
i = dc(input_variable)
out = dc(i) # N=2

loss = tf.reduce_mean(out**2.)
train_step = opt.minimize(loss, var_list=dc.get_weights())

sess.run(tf.global_variables_initializer()) # init and re-init all the weights (mainly for the optimizer)
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 7.93796
loss: 7.23227
loss: 6.59103
loss: 6.00942
loss: 5.48248
loss: 5.00526
loss: 4.57313
loss: 4.18187
loss: 3.82753
loss: 3.50663


Again we can save and restore the Can:

In [17]:
dc.save_weights('test.npy')
dc.load_weights('test.npy')

4 weights (and variables) obtained.
successfully saved to test.npy
successfully loaded from test.npy
4 weights assigned.


True

## Alternative Facts

Class inheritance is boring. Are there any better ways to assemble a Can? Well, you may use a closure:

In [18]:
def DoubleConv2():
    can = ct.Can()
    convs = [ct.Conv2D(3,16,3),ct.Conv2D(16,3,3)]
    def call(i):
        i = convs[0](i)
        i = convs[1](i)
        return i
    can.incan(convs)
    can.set_function(call)
    return can

dc2 = DoubleConv2()
out = dc2(input_variable)

loss = tf.reduce_mean(out**2.)
train_step = opt.minimize(loss, var_list=dc2.get_weights())
sess.run(tf.global_variables_initializer()) # init and re-init all the weights (mainly for the optimizer)
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 2.59758
loss: 2.4894
loss: 2.38474
loss: 2.28357
loss: 2.18589
loss: 2.09167
loss: 2.00088
loss: 1.9135
loss: 1.82947
loss: 1.74873


In [19]:
dc2.load_weights('test.npy')

successfully loaded from test.npy
4 weights assigned.


True

## Which is still unintuitive, ugly and verbose

Especially if you don't need parameter sharing inside the newly created Can. Well that's the price for all its convenience! Here's another solution if your model is simply a chain of Cans:

In [20]:
def DoubleConv3():
    c = ct.Can()
    c.add(ct.Conv2D(3,16,3))
    c.add(ct.Conv2D(16,3,3))
    c.chain()
    return c

Which is very close to what you would do with Keras.

`c.add()` is equivalent to `c.incan()`, except that it returns the added Can. `c.chain()` builds the `__call__` function for a Can from all its SubCans, so you don't have to call `set_function()` yourself.
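
A minimal pure-Python sketch of what such an `add()`/`chain()` pair could look like is shown below. `ToyChainCan` is hypothetical and is not canton's implementation; it assumes that the chain applies the SubCans in the order they were added:

```python
class ToyChainCan:
    """Illustrative container; not canton's actual implementation."""
    def __init__(self, func=None):
        self.subcans = []
        self.func = func

    def add(self, can):
        self.subcans.append(can)
        return can                      # return the child so the caller can keep it

    def chain(self):
        def call(x):
            for sub in self.subcans:    # feed each child's output into the next
                x = sub(x)
            return x
        self.func = call

    def __call__(self, x):
        return self.func(x)

c = ToyChainCan()
c.add(ToyChainCan(lambda x: x + 1))
double_it = c.add(ToyChainCan(lambda x: x * 2))  # add() hands the child back
c.chain()

print(c(3))          # (3 + 1) * 2 = 8
print(double_it(5))  # 10
```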

Everything still works:

In [21]:
dc3 = DoubleConv3()
out = dc3(input_variable)

loss = tf.reduce_mean(out**2.)
train_step = opt.minimize(loss, var_list=dc3.get_weights())
sess.run(tf.global_variables_initializer()) # init and re-init all the variables (mainly for the optimizer)
for i in range(10):
    res = sess.run([train_step,loss],feed_dict={})
    print('loss:',res[1])

loss: 2.62191
loss: 2.52024
loss: 2.42178
loss: 2.3265
loss: 2.23437
loss: 2.14533
loss: 2.05935
loss: 1.97639
loss: 1.8964
loss: 1.81933


## Implement a new Can for your own need

Please refer to `canton/cans.py`. Here's a simple example:

In [22]:
# you know, MLP
class Dense(ct.Can):
    def __init__(self,num_inputs,num_outputs):
        super().__init__()
        self.W = self.make_weight([num_inputs,num_outputs])
        self.b = self.make_bias([num_outputs])
    def __call__(self,i):
        W,b = self.W,self.b
        d = tf.matmul(i,W)+b
        return d
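
To see which shapes `Dense` works with, here is the same computation, `i` times `W` plus `b`, written with plain Python lists. The helper `dense` and the sample values are made up for illustration; the real layer uses `tf.matmul` on tensors:

```python
def dense(i, W, b):
    """Compute i @ W + b for a batch `i` of row vectors, using plain lists."""
    return [
        [sum(x * W[k][j] for k, x in enumerate(row)) + b[j] for j in range(len(b))]
        for row in i
    ]

i = [[1.0, 2.0]]                 # batch of 1, num_inputs = 2
W = [[1.0, 0.0, -1.0],
     [0.5, 1.0,  2.0]]           # shape [num_inputs, num_outputs] = [2, 3]
b = [0.0, 1.0, 0.0]              # shape [num_outputs]

print(dense(i, W, b))            # [[2.0, 3.0, 3.0]]
```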