---
layout: single
toc: true
published: true 
header:
  teaser: /assets/images/ML/feedforward_train.png
categories:
  - Programming
tags:
  - Einsum
  - Tensors
excerpt: "All ironic clickbait aside, the einsum function is super useful."
--- 

> “Tensor products [are] some sort of product. . . of tensors, I believe.” - Andrew Gelman

# Motivation

Suppose you want to take the derivative of an $n \times m$ matrix $A \in \mathbb{R}^{n \times m}$ with respect to a length-$k$ vector $v \in \mathbb{R}^k$. (If you have ever implemented a neural net from scratch, you already know this is something you might want to do.) This is a bit tricky, however: in ordinary multivariate calculus, we represent the derivative of an $n$-length vector with respect to an $m$-length vector as an $n \times m$ matrix. Here we are taking the derivative of a matrix with respect to a vector, so we need to calculate $n \cdot m \cdot k$ separate values, one for each pairing of an entry of $A$ with an entry of $v$, and those no longer fit in a matrix.

The solution is to use a **tensor**, which is a higher-dimensional generalization of a matrix. In particular, a vector is a 1D tensor, since it has just one axis, and a matrix is a 2D tensor, since it has two axes. More formally, if $A \in \mathbb{R}^{n \times m}, v \in \mathbb{R}^k$, then we write
$$ \frac{\partial A}{\partial v} \in \mathbb{R}^{n \times m \times k} $$
Similarly, if $B$ is a $k \times d$ matrix, i.e. $B \in \mathbb{R}^{k \times d}$, then we could easily take the derivative of $A$ with respect to $B$: 
$$ \frac{\partial A}{\partial B} \in \mathbb{R}^{n \times m \times k \times d} $$
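
As a quick check on these shapes, here is a minimal sketch that estimates $\frac{\partial A}{\partial v}$ by finite differences. The function `A`, the weights `W`, and the helper `numerical_derivative` are hypothetical names made up for this illustration, and placing the axis of $v$ last is an assumed layout convention that matches the shapes above.

```python
import numpy as np

n, m, k = 4, 3, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((n, m, k))  # hypothetical weights, for illustration only

def A(v):
    """Toy matrix-valued function: A(v)[i, j] = sum_l W[i, j, l] * v[l]."""
    return W @ v  # shape (n, m)

def numerical_derivative(f, v, eps=1e-6):
    """Finite-difference derivative of f at v, with the axis of v placed last."""
    base = f(v)
    slices = []
    for l in range(v.size):
        step = np.zeros_like(v)
        step[l] = eps  # nudge one coordinate of v at a time
        slices.append((f(v + step) - base) / eps)
    return np.stack(slices, axis=-1)

v = rng.standard_normal(k)
dAdv = numerical_derivative(A, v)
print(dAdv.shape)  # (4, 3, 2)
```

Because this toy `A` is linear in `v`, the estimate should agree with `W` itself up to floating-point error, which makes it easy to sanity-check.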

Unfortunately, working with tensors can be a bit annoying. Suppose you were actually trying to calculate the derivative of $A$ with respect to $B$ in Python using the chain rule. That is, $A$ depends on $B$ only through some intermediate $V \in \mathbb{R}^{k \times d}$, and we know
$$ \frac{\partial A}{\partial B} = \frac{\partial A}{\partial V} \cdot \frac{\partial V}{\partial B}$$
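
Written out entry by entry, this product sums over the two axes of $V$. The index names $i, j, a, b, c, e$ are introduced here only for illustration, and the layout (numerator axes first, denominator axes last) is an assumption consistent with the shapes above:
$$ \left( \frac{\partial A}{\partial B} \right)_{ijab} = \sum_{c=1}^{k} \sum_{e=1}^{d} \left( \frac{\partial A}{\partial V} \right)_{ijce} \left( \frac{\partial V}{\partial B} \right)_{ceab} $$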

Implementing this tensor product would look something like this:

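```python
# Initializing tensors
import numpy as np
n, m, k, d = 4, 3, 2, 6
dAdV = np.random.randn(n, m, k, d)
dVdB = np.random.randn(k, d, k, d)

# Now we compute the tensor product. np.matmul contracts only a single axis,
# so flatten the (k, d) axes being summed over, multiply, then reshape back.
intermediary1 = dAdV.reshape(n * m, k * d)
intermediary2 = dVdB.reshape(k * d, k * d)
dAdB = np.matmul(intermediary1, intermediary2).reshape(n, m, k, d)
print(dAdB.shape)  # (4, 3, 2, 6)
```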
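
To make explicit what the tensor product in the chain rule computes, here is a rough sketch that spells out the sum with plain loops and compares it against a flatten-and-multiply version. It assumes the product contracts over the two axes of $V$ (the last two axes of `dAdV` and the first two of `dVdB`). The names `dAdB_loops` and `dAdB_matmul` are made up for this example.

```python
import numpy as np

n, m, k, d = 4, 3, 2, 6
rng = np.random.default_rng(0)
dAdV = rng.standard_normal((n, m, k, d))
dVdB = rng.standard_normal((k, d, k, d))

# Explicit loops: for each output entry, sum over the axes (c, e) of V
dAdB_loops = np.zeros((n, m, k, d))
for i in range(n):
    for j in range(m):
        for a in range(k):
            for b in range(d):
                for c in range(k):
                    for e in range(d):
                        dAdB_loops[i, j, a, b] += dAdV[i, j, c, e] * dVdB[c, e, a, b]

# Same contraction: flatten the summed axes, multiply as matrices, reshape back
dAdB_matmul = (dAdV.reshape(n * m, k * d) @ dVdB.reshape(k * d, k * d)).reshape(n, m, k, d)

print(np.allclose(dAdB_loops, dAdB_matmul))  # True
```

Six nested loops for one product is a good illustration of why this bookkeeping gets tedious quickly.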