## Motivation

In [Novikov et al. 2015](http://papers.nips.cc/paper/5787-tensorizing-neural-networks.pdf) they use the tensor-train representation to construct a weight matrix. However, the tensor train constructs a high-dimensional tensor, and they simply reshape it into a matrix. I thought this was interesting/weird and wanted to investigate.

Specifically, I was interested in how parameters are shared across the constructed weight matrix. Weight tying is an important part of network design, and I want to understand the relationship between parameter-tying schemes, tensor networks, and reshaping.

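The motivating example is that a convolution can be written as a parameter-sharing scheme in matrix form, constructed using a circulant matrix (for a circular convolution): each row is a cyclic shift of the row above, so every kernel weight is reused along a (wrapped) diagonal.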
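A minimal sketch of that parameter-sharing view, assuming a 1-D *circular* convolution in the deep-learning (cross-correlation) convention, so the weight matrix is $W_{ij} = w_{(j-i) \bmod n}$ when $(j-i) \bmod n < k$ and 0 otherwise. The helper names `param_ids` and `conv_matrix` are hypothetical. Each entry of the $n \times n$ matrix records which kernel weight it reuses, or `-1` where the entry is fixed at zero.

```python
import numpy as np

def param_ids(n, k):
    """For an n x n circular-convolution matrix with a length-k kernel,
    return which kernel weight each entry uses (-1 means fixed at zero)."""
    ids = np.full((n, n), -1)
    for i in range(n):
        for j in range(n):
            offset = (j - i) % n  # same offset => same (tied) parameter
            if offset < k:
                ids[i, j] = offset
    return ids

def conv_matrix(kernel, n):
    """Dense W with (W @ x)[i] = sum_m kernel[m] * x[(i + m) % n]."""
    kernel = np.asarray(kernel, dtype=float)
    ids = param_ids(n, len(kernel))
    # kernel[-1] is looked up for the -1 entries, but np.where zeroes them
    return np.where(ids >= 0, kernel[ids], 0.0)

# each row is the row above shifted one step right, wrapping around
print(param_ids(5, 3))

x = np.arange(6.0)
w = np.array([1.0, -2.0, 0.5])
direct = np.array([sum(w[m] * x[(i + m) % 6] for m in range(3)) for i in range(6)])
assert np.allclose(conv_matrix(w, 6) @ x, direct)
```

Only $k$ free parameters fill all $n^2$ entries; the tying pattern is entirely in `param_ids`.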

```python
import sympy as sym
```

```python
letters = 'abcdefghijkl'

def construct_core(name, n):
    """An n x n x n array of symbols; entry [i][j][k] is named name_ijk."""
    return sym.Array([[[sym.Symbol('{}_{}{}{}'.format(name, i, j, k))
                        for k in range(n)]
                       for j in range(n)]
                      for i in range(n)])

def construct_cores(N, n):
    """N symbolic cores, one letter per core."""
    return [construct_core(letters[i], n) for i in range(N)]

x = construct_cores(5, 2)
x
```

```
[[[[a_000, a_001], [a_010, a_011]], [[a_100, a_101], [a_110, a_111]]],
 [[[b_000, b_001], [b_010, b_011]], [[b_100, b_101], [b_110, b_111]]],
 [[[c_000, c_001], [c_010, c_011]], [[c_100, c_101], [c_110, c_111]]],
 [[[d_000, d_001], [d_010, d_011]], [[d_100, d_101], [d_110, d_111]]],
 [[[e_000, e_001], [e_010, e_011]], [[e_100, e_101], [e_110, e_111]]]]
```

```python
# Chain the cores: contract the last axis of the running tensor (the bond
# index) with the first axis of the next core.
t = x[0]
for core in x[1:]:
    r = t.rank()
    t = sym.tensorcontraction(sym.tensorproduct(t, core), (r - 1, r))
# axes: (left boundary bond, five core indices, right boundary bond)
t.shape  # TODO: need a way to visualise t itself!
```

```
(2, 2, 2, 2, 2, 2, 2)
```

```python
help(sym.tensorcontraction)
```

```
Help on function tensorcontraction in module sympy.tensor.array.arrayop:

tensorcontraction(array, *contraction_axes)
    Contraction of an array-like object on the specified axes.
    
    Examples
    
    >>> from sympy import Array, tensorcontraction
    >>> from sympy import Matrix, eye
    >>> tensorcontraction(eye(3), (0, 1))
    3
    >>> A = Array(range(18), (3, 2, 3))
    >>> A
    [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]], [[12, 13, 14], [15, 16, 17]]]
    >>> tensorcontraction(A, (0, 2))
    [21, 30]
    
    Matrix multiplication may be emulated with a proper combination of
    ``tensorcontraction`` and ``tensorproduct``
    
    >>> from sympy import tensorproduct
    >>> from sympy.abc import a,b,c,d,e,f,g,h
    >>> m1 = Matrix([[a, b], [c, d]])
    >>> m2 = Matrix([[e, f], [g, h]])
    >>> p = tensorproduct(m1, m2)
    >>> p
    [[[[a*e, a*f], [a*g, a*h]], [[b*e, b*f], [b*g, b*h]]], [[[c*e, c*f], [c*g, c*h]], [[d*e, d*f], [d*g, d*h]]]]
    >>> tensorcontraction(
```
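The matrix-multiplication hint in the docstring settles which axes to contract when chaining cores: in `tensorproduct(t, core)` the axes of `t` come first (numbered `0 .. r-1`) and the axes of `core` follow (starting at `r`), so the bond contraction is `(r - 1, r)`. A small sketch checking this; `tt_step` is a hypothetical helper name, and the cores `A`, `B` are arbitrary 2x2x2 symbol arrays.

```python
import sympy as sym

def tt_step(t, core):
    """Attach `core` to the train: contract the last axis of `t`
    with the first (left-bond) axis of `core`."""
    r = t.rank()  # t's axes are 0..r-1; core's axes start at r in the product
    return sym.tensorcontraction(sym.tensorproduct(t, core), (r - 1, r))

# For two matrices, contracting axes (1, 2) is exactly matrix multiplication.
a, b, c, d, e, f, g, h = sym.symbols('a b c d e f g h')
m1 = sym.Array([[a, b], [c, d]])
m2 = sym.Array([[e, f], [g, h]])
assert sym.Matrix(tt_step(m1, m2).tolist()) == sym.Matrix([[a, b], [c, d]]) * sym.Matrix([[e, f], [g, h]])

# Two 2x2x2 cores: result[i, j, l, m] = sum_k A[i, j, k] * B[k, l, m]
A = sym.Array(sym.symbols('A0:8')).reshape(2, 2, 2)
B = sym.Array(sym.symbols('B0:8')).reshape(2, 2, 2)
t2 = tt_step(A, B)
print(t2.shape)  # (2, 2, 2, 2)
```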



Reshape. What I want:
- some properties that I can measure!?
- some visualisations! (What happens when I reshape?)
- better intuition... I need a concrete example to play with

## Neighborhoods

I already have a picture of this: each element's neighbours, and where they end up after the reshape.


## Connectedness (the dual of neighborhoods?)

What about the graph point of view?

## How is reshape like a convolution?

For example, this is what we do to compute a convolution as a single matrix multiplication (the im2col trick): construct a tensor of patches with shape (examples, X, Y, kernel, kernel), reshape it into an (examples × X × Y, kernel × kernel) matrix, and multiply it by the flattened kernel.
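A sketch of that im2col trick for a batch of single-channel 2-D images, assuming a "valid" (no padding, stride 1) cross-correlation, which is what deep-learning libraries call convolution. It assumes NumPy 1.20 or later for `sliding_window_view`; `conv2d_im2col` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_im2col(images, kernel):
    """'Valid' 2-D cross-correlation of a batch of single-channel images."""
    n = images.shape[0]
    k = kernel.shape[0]                      # square k x k kernel assumed
    # (examples, X, Y, kernel, kernel) tensor of patches (a strided view)
    patches = sliding_window_view(images, (k, k), axis=(1, 2))
    X, Y = patches.shape[1], patches.shape[2]
    # reshape to an (examples * X * Y, kernel * kernel) matrix ...
    cols = patches.reshape(n * X * Y, k * k)
    # ... so the whole convolution is one matrix-vector product
    return (cols @ kernel.reshape(k * k)).reshape(n, X, Y)

rng = np.random.default_rng(0)
images = rng.standard_normal((2, 5, 6))
kernel = rng.standard_normal((3, 3))
direct = np.array([[[np.sum(images[b, i:i + 3, j:j + 3] * kernel)
                     for j in range(4)] for i in range(3)] for b in range(2)])
assert np.allclose(conv2d_im2col(images, kernel), direct)
```

Unlike a pure reshape, building the patch tensor copies each pixel into up to kernel × kernel rows, so that step is not a permutation of the entries; only the second step is a reshape.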

## SVD

And what about all the reshaping (unfolding tensors into matrices) going on in HSVD and HOSVD?
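A minimal sketch of where the reshaping happens in the HOSVD: each factor is the left singular vectors of one mode-$n$ unfolding. The unfolding below uses `moveaxis` plus a C-order reshape, so its columns are ordered differently from the Kolda and Bader convention; a column permutation leaves $XX^\top$, and hence the left singular vectors, unchanged. The helper names are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: that axis becomes the rows, all others are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """T x_mode M: multiply every mode-`mode` fibre of T by the matrix M."""
    return np.moveaxis(np.tensordot(M, T, axes=([1], [mode])), 0, mode)

def hosvd(T):
    # factor for each mode = left singular vectors of that mode's unfolding
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0] for n in range(T.ndim)]
    core = T
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)   # project each mode onto its factor
    return core, factors

def tucker_to_full(core, factors):
    T = core
    for n, U in enumerate(factors):
        T = mode_product(T, U, n)
    return T

T = np.random.default_rng(0).standard_normal((3, 4, 5))
core, factors = hosvd(T)
# untruncated, and each mode size <= product of the others, so this is exact
assert np.allclose(tucker_to_full(core, factors), T)
```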


## Parameter sharing?

That is, parameter-sharing schemes. Write out the reshaped, constructed tensor and show the receptive field of each original parameter (the entries of the weight matrix that depend on it).
- Are the receptive fields local? Which tensor networks/reshapings give local receptive fields?

Is this idea orthogonal to reshaping, with reshaping just a nice way to visualise it?
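A sketch of the receptive-field picture for the simplest case, assuming a plain tensor train $T[i_1,\dots,i_d] = G_1[:,i_1,:]\cdots G_d[:,i_d,:]$ with generic (nonzero) cores and a C-order reshape into a matrix. This is not the TT-matrix format of Novikov et al., which pairs row and column indices inside each core; `receptive_field` is a hypothetical helper. Under these assumptions a parameter in slice $G_k[:, i, :]$ appears in exactly the entries with $i_k = i$, so its receptive field is a slab of the tensor, and we only need to see where the reshape sends that slab.

```python
import numpy as np

def receptive_field(tensor_shape, k, i, matrix_shape):
    """Boolean mask of the matrix entries that depend on the parameters in
    slice G_k[:, i, :] (hypothetical helper; plain TT, generic cores)."""
    mask = np.zeros(tensor_shape, dtype=bool)
    index = [slice(None)] * len(tensor_shape)
    index[k] = i                       # the slab i_k == i
    mask[tuple(index)] = True
    return mask.reshape(matrix_shape)  # same C-order reshape as the weights

for k in range(4):
    print(f"core {k}, slice 0:")
    print(receptive_field((2, 2, 2, 2), k, 0, (4, 4)).astype(int))
```

With four modes of size 2 reshaped to 4×4, the first core's slices select whole rows {0, 1} or {2, 3}, the second core's select the interleaved rows {0, 2} or {1, 3}, and the last two cores select sets of whole columns in the same way; none of these is a local patch.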


$$\begin{aligned}
&= \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \\
a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \\
\end{bmatrix} \\
&\text{(each column becomes a 2-by-3 block, filled with the first index varying fastest)}\\
&= \begin{bmatrix}
\begin{bmatrix}
a_{11} &  a_{31} & a_{51}\\
a_{21} & a_{41} & a_{61}\\
\end{bmatrix} & 
\begin{bmatrix}
a_{12} &  a_{32} & a_{52}\\
a_{22} & a_{42} & a_{62}\\
\end{bmatrix}\\
\begin{bmatrix}
a_{13} &  a_{33} & a_{53}\\
a_{23} & a_{43} & a_{63}\\
\end{bmatrix} & 
\begin{bmatrix}
a_{14} &  a_{34} & a_{54}\\
a_{24} & a_{44} & a_{64}\\
\end{bmatrix} \\
\begin{bmatrix}
a_{15} &  a_{35} & a_{55}\\
a_{25} & a_{45} & a_{65}\\
\end{bmatrix} & 
\begin{bmatrix}
a_{16} &  a_{36} & a_{56}\\
a_{26} & a_{46} & a_{66}\\
\end{bmatrix} \\
\end{bmatrix}\end{aligned}$$

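Distances are not preserved. Originally $a_{33}$ is one index away from
$a_{32}, a_{34}, a_{23}, a_{43}$. After the reshaping, treating the result as a
4-D array indexed by (block row, block column, row within block, column within block),
the elements at distance 1 from $a_{33}$ are $a_{13}, a_{53}, a_{43}, a_{31}, a_{35}, a_{34}$.
Mapping these back into the original matrix, the ‘range’ of the indices spreads out:
only $a_{34}$ and $a_{43}$ are still neighbours, $a_{13}$ and $a_{53}$ are two rows away,
$a_{31}$ and $a_{35}$ are two columns away, and $a_{33}$ now has six neighbours instead of
four. What does this mean?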
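A small sketch that reproduces this neighbourhood computation, so other elements and other reshapes can be checked the same way. It assumes "distance 1" means one index changed by ±1, and rebuilds the block layout above as a 4-D NumPy array of labels; `neighbours`, `A`, and `T4` are hypothetical names.

```python
import numpy as np

def neighbours(arr, label):
    """Labels at distance 1 (one index changed by +-1) from `label` in `arr`."""
    pos = tuple(np.argwhere(arr == label)[0])
    found = set()
    for axis in range(arr.ndim):
        for step in (-1, 1):
            q = list(pos)
            q[axis] += step
            if 0 <= q[axis] < arr.shape[axis]:
                found.add(str(arr[tuple(q)]))
    return found

# 6x6 matrix of labels a11 ... a66 (1-based, as in the text)
A = np.array([[f"a{i}{j}" for j in range(1, 7)] for i in range(1, 7)])

# The block layout above as a 4-D array (block row, block col, row, col):
# 0-based column j of A becomes block (j // 2, j % 2), and its six entries
# fill a 2 x 3 block with the first index varying fastest.
T4 = A.T.reshape(3, 2, 3, 2).transpose(0, 1, 3, 2)

print(sorted(neighbours(A, "a33")))   # ['a23', 'a32', 'a34', 'a43']
print(sorted(neighbours(T4, "a33")))  # ['a13', 'a31', 'a34', 'a35', 'a43', 'a53']
```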


Is reshaping a linear operation!?
Does it commute, associate, distribute, ...?
Firstly, it's a unary operation, so I'm not sure what to do with that... Below, $\varrho$ denotes the reshape, and $v$, $w$ already have the reshaped shape.
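A quick numerical check that reshaping is linear, $\varrho(au + bv) = a\varrho(u) + b\varrho(v)$: every output entry is exactly one input entry, so scaling and adding commute with moving entries around. The particular reshape `rho`, the shapes, and the scalars below are arbitrary choices for illustration.

```python
import numpy as np

def rho(t):
    """The reshape being studied; here an arbitrary (2, 3, 4) -> (6, 4) example."""
    return t.reshape(6, 4)

rng = np.random.default_rng(0)
u, v = rng.standard_normal((2, 2, 3, 4))   # two random (2, 3, 4) tensors
a, b = 2.5, -1.0

# additivity and homogeneity together: rho(a u + b v) == a rho(u) + b rho(v)
assert np.allclose(rho(a * u + b * v), a * rho(u) + b * rho(v))
```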

### Associativity

$\varrho(u) + (v + w) = (\varrho(u) + v) + w$

### Commutativity

$\varrho(u) + v = v + \varrho(u)$


### Distributivity

$a(\varrho(u) + v) = \varrho(au) + av$



Reshaping is a permutation of the bases?
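One way to make this concrete: any operation that only moves entries around is linear, so it has a matrix acting on the flattened array, and that matrix is a permutation matrix. A sketch assuming row-major (C-order) flattening; `as_matrix`, `plain`, and `blocks` are hypothetical names, and `blocks` reproduces the 6×6 block layout above. For a plain reshape in a fixed memory order, the flat vector does not change at all (the matrix is the identity; only the index labels change). Once the order of the indices changes, as in the block layout, it becomes a non-trivial permutation.

```python
import numpy as np

def as_matrix(op, in_shape):
    """Matrix P with op(x).ravel() == P @ x.ravel(), for an `op` that only
    moves entries around (reshape / transpose, no copying or dropping)."""
    n = int(np.prod(in_shape))
    # push the flat positions 0..n-1 through op to see where each entry goes
    src = op(np.arange(n).reshape(in_shape)).ravel()
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), src] = 1   # output entry r comes from input entry src[r]
    return P

def plain(t):
    """C-order reshape only."""
    return t.reshape(3, 2, 2, 3)

def blocks(t):
    """The 6x6 block layout above: (block row, block col, row, col)."""
    return t.T.reshape(3, 2, 3, 2).transpose(0, 1, 3, 2)

I = np.eye(36, dtype=int)
print(np.array_equal(as_matrix(plain, (6, 6)), I))   # True: only the index labels change
print(np.array_equal(as_matrix(blocks, (6, 6)), I))  # False: entries are permuted
```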