# Kronecker Product & Decomposition


For any two matrices $\mathbf{X} \in \mathbb{R}^{m \times n}$ and $\mathbf{Y} \in \mathbb{R}^{m \times n}$, the Hadamard product $\mathbf{X} \circ \mathbf{Y}$ is defined as
$$
\begin{equation}
\mathbf{X} \circ \mathbf{Y} = \begin{bmatrix} 
    \mathbf{X}_{11}\mathbf{Y}_{11} & \dots  & \mathbf{X}_{1n}\mathbf{Y}_{1n} \\
    \vdots & \ddots & \vdots \\
    \mathbf{X}_{m1}\mathbf{Y}_{m1} & \dots  & \mathbf{X}_{mn}\mathbf{Y}_{mn} 
    \end{bmatrix} \in \mathbb{R}^{m\times n},
\end{equation}
$$
where $\mathbf{X}_{ij}$ **only interacts** with $\mathbf{Y}_{ij}$. 
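
As a minimal sketch of the definition above (the matrix values are arbitrary and chosen only for illustration), the Hadamard product of two equally shaped PyTorch tensors can be computed with the `*` operator:

```python
import torch

# Two matrices of the same shape (2 x 3); the values are arbitrary.
X = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
Y = torch.tensor([[10., 20., 30.],
                  [40., 50., 60.]])

# On equally shaped tensors, `*` multiplies entry (i, j) of X with entry (i, j) of Y.
H = X * Y

print(H.shape)
print(H)
```

The result has the same shape as its inputs, and swapping the operands does not change it.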



For any matrix $\mathbf{Z} \in \mathbb{R}^{p \times q}$, the Kronecker product (KP) $\mathbf{X} \otimes \mathbf{Z}$ is a block matrix:
$$
\begin{equation}
\mathbf{X} \otimes \mathbf{Z} = \begin{bmatrix} 
    \mathbf{X}_{11}\mathbf{Z} & \dots  & \mathbf{X}_{1n}\mathbf{Z} \\
    \vdots & \ddots & \vdots \\
    \mathbf{X}_{m1}\mathbf{Z} & \dots  & \mathbf{X}_{mn}\mathbf{Z} 
    \end{bmatrix} \in \mathbb{R}^{mp \times nq},
\end{equation}
$$
where **every element** of $\mathbf{X}$ **interacts with every element** of $\mathbf{Z}$.
In contrast to the Hadamard product, the Kronecker product is not commutative, i.e., $\mathbf{X} \otimes \mathbf{Z} \neq \mathbf{Z} \otimes \mathbf{X}$ in general. For more details, we refer to Van Loan (2000) and Graham (2018).
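
The shape rule and the lack of commutativity can be checked on a small case. The following is a sketch using `torch.kron`; the two matrices are arbitrary examples, not values from the text:

```python
import torch

X = torch.tensor([[1., 2.],
                  [3., 4.]])       # m x n = 2 x 2
Z = torch.tensor([[0., 1., 2.]])   # p x q = 1 x 3

XZ = torch.kron(X, Z)  # shape (m*p, n*q) = (2, 6); block (i, j) is X[i, j] * Z
ZX = torch.kron(Z, X)  # same shape in this case, but block (i, j) is Z[i, j] * X

print(XZ)
print(ZX)
print(torch.equal(XZ, ZX))  # the two products contain the same numbers in a different order
```

Both products contain the same set of pairwise products of entries; only their arrangement differs.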

Numerous works have shown that the Kronecker product can be used to decompose a large matrix into two smaller matrices. A large weight matrix $\mathbf{W}\in \mathbb R^{mp\times nq}$ can be parameterized as $\mathbf{X} \otimes \mathbf{Z}$ with $\mathbf{X} \in \mathbb{R}^{m_1 \times n_1}$ and $\mathbf{Z} \in \mathbb{R}^{\frac{mp}{m_1} \times \frac{nq}{n_1}}$, for any $m_1$ that divides $mp$ and any $n_1$ that divides $nq$. Different compression factors can be achieved through different shape configurations of the smaller matrices. Tahaei et al. have shown that the following formulation can be used to compute the linear transformation encoded in a weight matrix $\mathbf{W}$ without forming $\mathbf{W}$ explicitly:
$$
\begin{equation}
\Big( \mathbf{X}\otimes \mathbf{Z} \Big)x=\mathcal{V} \Big(\mathbf{Z} \; \mathcal{R}_{\frac{nq}{n_1} \times n_1}(x) \mathbf{X}^\top \Big), \label{eq:implicit_kronecker_decomposition}
\end{equation}
$$
where $x \in \mathbb{R}^{nq}$ represents the input feature vector, $\mathcal{V}: \mathbb{R}^{\frac{mp}{m_1} \times m_1} \to \mathbb{R}^{mp}$ flattens an input matrix into a vector by stacking its columns, and $\mathcal{R}_{\frac{nq}{n_1} \times n_1}$ converts $x$ into a $\frac{nq}{n_1}$ by $n_1$ matrix by dividing the vector into columns of size $\frac{nq}{n_1}$ and concatenating the resulting columns. With $m_2 = \frac{mp}{m_1}$ and $n_2 = \frac{nq}{n_1}$, a dense product $\mathbf{W}x$ needs $2 n_1 n_2 - 1$ operations for each of its $m_1 m_2$ outputs, so this computation reduces the number of floating point operations from $(2 n_1 n_2 - 1) m_1 m_2$ to $\min\big((2n_2 -1)m_2 n_1 + (2n_1 -1) m_2 m_1, (2 n_1 -1)n_2 m_1 + (2 n_2 -1) m_2 m_1\big)$.
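
To make the effect of the shape configuration concrete, the sketch below counts parameters and floating point operations for a given factor shape. The helper `kronecker_costs` and the 512 x 512 example are hypothetical and not part of the original text; the counts assume that a dot product of length $k$ costs $2k - 1$ operations.

```python
def kronecker_costs(rows, cols, m1, n1):
    """Parameter and FLOP counts for a rows x cols matrix W = X kron Z,
    with X of shape (m1, n1) and Z of shape (rows // m1, cols // n1)."""
    if rows % m1 or cols % n1:
        raise ValueError("m1 must divide rows and n1 must divide cols")
    m2, n2 = rows // m1, cols // n1
    # Dense W @ x: `rows` outputs, each a dot product of length `cols`.
    dense_flops = (2 * cols - 1) * rows
    # Implicit form: evaluate (Z @ R) @ X^T or Z @ (R @ X^T), whichever is cheaper.
    left_first = (2 * n2 - 1) * m2 * n1 + (2 * n1 - 1) * m2 * m1
    right_first = (2 * n1 - 1) * n2 * m1 + (2 * n2 - 1) * m2 * m1
    return {
        "dense_params": rows * cols,
        "kron_params": m1 * n1 + m2 * n2,
        "dense_flops": dense_flops,
        "kron_flops": min(left_first, right_first),
    }

# Hypothetical 512 x 512 weight matrix with three different shapes for X.
for m1, n1 in [(2, 2), (16, 16), (32, 16)]:
    print((m1, n1), kronecker_costs(512, 512, m1, n1))
```

Balanced factor shapes tend to give the largest savings, since the parameter count $m_1 n_1 + m_2 n_2$ is dominated by the larger of the two factors.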


**Disclaimer: Please refer to our research paper [Kronecker Decomposition for Knowledge Graph Embeddings](https://arxiv.org/abs/2205.06560) for more details and references.**

# Kronecker product


1. $W \in \mathbb{R}^{m \times n}$

2. $X \in \mathbb{R}^{m_1 \times n_1}$

3. $Z \in \mathbb{R}^{m_2 \times n_2}$

4. $m_2 = m / m_1$ and $n_2 = n / n_1$

For any matrices $X \in \mathbb{R}^{m_1 \times n_1}$ and $Z \in \mathbb{R}^{m_2 \times n_2}$, the Kronecker product $X \otimes Z$ is a block matrix:

\begin{align*}
W=X \otimes Z = \begin{bmatrix} 
x_{11}Z & \dots  & x_{1n_1}Z \\
\vdots & \ddots & \vdots \\
x_{m_11}Z & \dots  & x_{m_1n_1}Z 
\end{bmatrix} \in \mathbb{R}^{m \times n},
\end{align*}

where $x_{ij}$ is the element of $X$ in its $i^{\text{th}}$ row and $j^{\text{th}}$ column.

In [1]:
import torch
m, n = 6,4
m1, n1 = 3, 2
m2, n2 = m//m1, n//n1
A=torch.randn(m1,n1)
B=torch.randn(m2,n2)
W=torch.kron(A,B)
W.shape

torch.Size([6, 4])
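
The block structure in the definition can be verified directly. The following is a sketch with the same shapes as the cell above; the seed is arbitrary and only there for reproducibility:

```python
import torch

torch.manual_seed(0)  # arbitrary seed
m1, n1, m2, n2 = 3, 2, 2, 2
A = torch.randn(m1, n1)
B = torch.randn(m2, n2)
W = torch.kron(A, B)

# Block (i, j) of W occupies rows i*m2:(i+1)*m2 and columns j*n2:(j+1)*n2
# and should equal the scalar A[i, j] times the whole matrix B.
blocks_match = all(
    torch.allclose(W[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2], A[i, j] * B)
    for i in range(m1)
    for j in range(n1)
)
print(W.shape, blocks_match)
```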

# Implicit Linear Transformation via KP

In [2]:
def V(M):
    # Flatten a matrix into a vector by stacking its columns (column-major order).
    return M.T.flatten()
def R(x,n2,n1):
    # Split x into n1 chunks of size n2 and use them as the columns of an n2 x n1 matrix.
    return x.reshape(n1,n2).T
def implicit_linear_transformation(A,B,x):
    # Computes (A kron B) @ x as V(B @ R(x) @ A^T) without forming the Kronecker product.
    _,n2 = B.shape
    _,n1 = A.shape    
    return V(B @ R(x,n2,n1)@A.T)

In [3]:
x=torch.randn(n)*1.
print(x,x.dtype)
print(W@x)
print(torch.allclose(W@x, implicit_linear_transformation(A,B,x), atol=1e-5))

tensor([-0.7271, -0.4136,  0.2200,  0.0842]) torch.float32
tensor([ 1.5665, -2.2056, -0.1857,  0.2525,  0.2263, -0.3015])
True


In [4]:
x=torch.ones(n)* 3.1
print(x,x.dtype)
print(W@x)
print(torch.allclose(W@x, implicit_linear_transformation(A,B,x), atol=1e-5))

tensor([3.1000, 3.1000, 3.1000, 3.1000]) torch.float32
tensor([-1.6691,  1.9497,  1.9045, -2.2247, -3.5096,  4.0996])
True
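
The identity $(\mathbf{X} \otimes \mathbf{Z})x = \mathcal{V}(\mathbf{Z}\,\mathcal{R}(x)\,\mathbf{X}^\top)$ is stated for column-wise reshaping and flattening, whereas `reshape` and `flatten` in PyTorch work row by row. Under that row-major convention the roles of the two factors swap. The sketch below illustrates this equivalent form; the function name `implicit_kron_matvec_row_major` is hypothetical and the seed is arbitrary.

```python
import torch

def implicit_kron_matvec_row_major(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product,
    using PyTorch's default row-major reshape and flatten."""
    _, n1 = A.shape
    _, n2 = B.shape
    # Row j of the (n1, n2) matrix holds the chunk of x that meets column block j of A kron B.
    return (A @ x.reshape(n1, n2) @ B.T).flatten()

torch.manual_seed(0)  # arbitrary seed
A = torch.randn(3, 2)
B = torch.randn(2, 2)
x = torch.randn(4)

explicit = torch.kron(A, B) @ x
implicit = implicit_kron_matvec_row_major(A, B, x)
print(torch.allclose(explicit, implicit, atol=1e-5))
```

Either convention works as long as the reshape of the input and the flattening of the output use the same ordering.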


In [5]:
batch_x=torch.randn(100).reshape(5,1,20)

# Batch Kronecker Product Implementations

In [6]:
def kron(a, b):
    """
    Kronecker product of matrices a and b with leading batch dimensions.
    Batch dimensions are broadcast against each other.
    :type a: torch.Tensor
    :type b: torch.Tensor
    :rtype: torch.Tensor
    """
    # Shape of each output matrix: (rows_a * rows_b, cols_a * cols_b).
    siz1 = (a.shape[-2] * b.shape[-2], a.shape[-1] * b.shape[-1])
    # a -> (..., m, 1, n, 1) and b -> (..., 1, p, 1, q), so the product has shape (..., m, p, n, q).
    res = a.unsqueeze(-1).unsqueeze(-3) * b.unsqueeze(-2).unsqueeze(-4)
    siz0 = res.shape[:-4]
    return res.reshape(*siz0, *siz1)

In [7]:
kron(batch_x,batch_x).shape

torch.Size([5, 1, 400])

In [8]:
def kronecker_product_einsum_batched(A: torch.Tensor, B: torch.Tensor):
    """
    Batched Version of Kronecker Products
    :param A: has shape (b, a, c)
    :param B: has shape (b, k, p)
    :return: (b, ak, cp)
    """
    assert A.dim() == 3 and B.dim() == 3

    res = torch.einsum('bac,bkp->bakcp', A, B).view(A.size(0),
                                                    A.size(1) * B.size(1),
                                                    A.size(2) * B.size(2)
                                                    )
    return res

In [9]:
kronecker_product_einsum_batched(batch_x,batch_x).shape

torch.Size([5, 1, 400])
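
Matching output shapes do not by themselves show that a batched implementation is correct. As a sketch of a stronger check, the block below compares an einsum-based batched product against `torch.kron` applied to each batch element separately. The function `batched_kron` is a compact stand-in written for this example, and the input shapes and seed are arbitrary.

```python
import torch

def batched_kron(A, B):
    """Batched Kronecker product: (b, a, c) and (b, k, p) -> (b, a*k, c*p)."""
    b, a, c = A.shape
    _, k, p = B.shape
    # Entry [b, a, k, c, p] is A[b, a, c] * B[b, k, p]; merging (a, k) and (c, p)
    # yields the Kronecker block layout.
    return torch.einsum('bac,bkp->bakcp', A, B).reshape(b, a * k, c * p)

torch.manual_seed(0)  # arbitrary seed
A = torch.randn(5, 2, 3)
B = torch.randn(5, 4, 2)

out = batched_kron(A, B)
# Reference: apply torch.kron to each pair of matrices in the batch.
reference = torch.stack([torch.kron(A[i], B[i]) for i in range(A.size(0))])
print(out.shape, torch.allclose(out, reference))
```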