# Tensor Regression Networks

Main idea: CNN architectures naturally operate on tensors, since their inputs are images; with RGB images a batch has shape (N, 84, 84, 3), for example. Eventually, though, the convolutional layers end and their output has to feed into fully connected layers, which requires flattening the activations into shape (N, D) for some D. Intuitively this throws away structure, so the authors propose a tensor regression layer that replaces the flattening step (and the fully connected layers that follow). Clever!

This has two benefits:

- Leveraging the tensor structure instead of throwing it away.
- Reducing the number of parameters. Recall that fully connected layers typically hold the bulk of a CNN's parameters, making them the most inefficient layers (see the rough count below).
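
To make the second point concrete, here's a back-of-the-envelope count with my own toy shapes and ranks (not numbers from the paper):

```python
# Rough parameter count: flatten + FC vs. a low-rank (Tucker-factored) regression weight.
# All shapes and ranks here are illustrative toys, not taken from the paper.
import numpy as np

C, H, W, n_out = 256, 7, 7, 1000            # final activation shape and number of classes
fc_params = C * H * W * n_out + n_out       # dense layer applied after flattening
print(f"flatten + FC : {fc_params:,}")      # 12,545,000

ranks = (32, 4, 4, 32)                      # hypothetical Tucker ranks (r_C, r_H, r_W, r_out)
core = int(np.prod(ranks))
factors = C * ranks[0] + H * ranks[1] + W * ranks[2] + n_out * ranks[3]
trl_params = core + factors + n_out         # core tensor + factor matrices + bias
print(f"low-rank TRL : {trl_params:,}")     # 57,632
```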

Instead of flattening to get the final layer output (i.e. predictions), they express the regression weights with a low-rank tensor decomposition. The "low rank" part must be what captures the "essence" of the tensor.
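
Here's roughly how I picture the regression step, as a NumPy sketch with made-up shapes and a Tucker-style factorization of the weight tensor; the paper's exact factorization and notation may differ:

```python
import numpy as np

def trl_forward(X, core, U_c, U_h, U_w, U_out, b):
    """One plausible reading of a tensor regression layer: regress outputs
    directly on the activation tensor X of shape (N, C, H, W), with the
    regression weight held in Tucker form (core plus one factor matrix per
    mode) and never materialized as a full dense matrix."""
    Z = np.einsum('nchw,cr->nrhw', X, U_c)   # contract the channel mode
    Z = np.einsum('nrhw,hs->nrsw', Z, U_h)   # contract the height mode
    Z = np.einsum('nrsw,wt->nrst', Z, U_w)   # contract the width mode
    Z = np.einsum('nrst,rstk->nk', Z, core)  # contract with the core tensor
    return Z @ U_out.T + b                   # (N, n_out) predictions

# Toy shapes, purely for illustration.
N, C, H, W, n_out = 8, 16, 5, 5, 10
rc, rh, rw, rk = 4, 3, 3, 6
X = np.random.randn(N, C, H, W)
core = np.random.randn(rc, rh, rw, rk)
U_c, U_h, U_w = np.random.randn(C, rc), np.random.randn(H, rh), np.random.randn(W, rw)
U_out, b = np.random.randn(n_out, rk), np.zeros(n_out)
print(trl_forward(X, core, U_c, U_h, U_w, U_out, b).shape)  # (8, 10)
```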

The main technical part is showing how this can be done while keeping the network differentiable. This is in the same style as many papers that introduce new network components: they have to be made differentiable (to pass the peer review stage, of course). In particular, performing a tensor contraction to get lower-dimensional representations (formally, a Tensor Contraction Layer) requires knowing the projection factor matrices V^(k); these are learned end-to-end with gradient backpropagation. The second layer they use is the Tensor Regression Layer.
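
And my rough picture of the contraction step, again as a toy NumPy sketch; the factor matrices V^(k) are random stand-ins here for what would actually be learned by backprop:

```python
import numpy as np

def tcl_forward(X, V_c, V_h, V_w):
    """Sketch of a tensor contraction layer: project each non-batch mode of
    X (N, C, H, W) with a factor matrix V^(k), producing a smaller tensor
    (N, C', H', W') instead of flattening to a vector."""
    Z = np.einsum('nchw,ca->nahw', X, V_c)   # shrink the channel mode
    Z = np.einsum('nahw,hb->nabw', Z, V_h)   # shrink the height mode
    Z = np.einsum('nabw,wd->nabd', Z, V_w)   # shrink the width mode
    return Z

X = np.random.randn(8, 16, 5, 5)
V_c, V_h, V_w = np.random.randn(16, 6), np.random.randn(5, 3), np.random.randn(5, 3)
print(tcl_forward(X, V_c, V_h, V_w).shape)   # (8, 6, 3, 3)
```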

Thus, they've got TCL and TRL layers. Note that contraction can (and probably should) happen before regression.
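
Putting the two toy sketches together, contraction before regression (purely illustrative; it reuses the functions defined above, and all shapes are still made up):

```python
# Contract first (TCL), then regress on the smaller tensor (TRL).
X = np.random.randn(8, 16, 5, 5)
V_c, V_h, V_w = np.random.randn(16, 6), np.random.randn(5, 3), np.random.randn(5, 3)
Z = tcl_forward(X, V_c, V_h, V_w)                    # (8, 6, 3, 3) after contraction
core = np.random.randn(4, 2, 2, 5)
U_c, U_h, U_w = np.random.randn(6, 4), np.random.randn(3, 2), np.random.randn(3, 2)
U_out, b = np.random.randn(10, 5), np.zeros(10)
y = trl_forward(Z, core, U_c, U_h, U_w, U_out, b)    # (8, 10) class scores
```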

Benchmarks: they show that they can reduce the number of parameters in VGG and ResNet while still obtaining state-of-the-art performance. Models were implemented with the MXNet library, which might be better suited than TensorFlow for this purpose. And, um, MXNet is affiliated with Amazon, like these authors.

I think deeply understanding this requires more knowledge of tensors than I have, but I can see the rationale.