The following additional libraries are needed to run this
notebook. Note that running on Colab is experimental; please open a GitHub
issue if you run into any problems.

In [None]:
!pip install https://tvm-repo.s3-us-west-2.amazonaws.com/cuda10.0-llvm6.0/tvm-0.6.dev0-cp36-cp36m-linux_x86_64.whl
!pip install https://tvm-repo.s3-us-west-2.amazonaws.com/cuda10.0-llvm6.0/topi-0.6.dev0-py3-none-any.whl
!pip install git+https://github.com/d2l-ai/d2l-tvm


# Matrix Multiplication

Matrix multiplication is one of the most widely used operators in scientific computing. Let's implement its computation in this chapter.

Given $A\in\mathbb R^{n\times l}$, and $B \in\mathbb R^{l\times m}$, if $C=AB$ then $C \in\mathbb R^{n\times m}$ and

$$C_{i,j} = \sum_{k=1}^l A_{i,k} B_{k,j}.$$

The elements accessed to compute $C_{i,j}$ are illustrated in :numref:`fig_matmul_default`.

![Compute $C_{x,y}$ in matrix multiplication.](http://tvm.d2l.ai/_images/matmul_default.svg)

:label:`fig_matmul_default`
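
As a point of reference, the formula can be written directly as three nested loops in plain NumPy. This is only a minimal sketch to make the indexing concrete, not the TVM implementation; the helper name `matmul_reference` is hypothetical, and the code uses 0-based indices where the formula uses 1-based ones.

```python
import numpy as np

def matmul_reference(a, b):
    """Naive triple-loop matrix multiplication (hypothetical helper).

    a : n x l array
    b : l x m array
    returns the n x m array c with c[i, j] = sum_k a[i, k] * b[k, j]
    """
    n, l = a.shape
    l2, m = b.shape
    assert l == l2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):          # row index of C
        for j in range(m):      # column index of C
            for k in range(l):  # reduction over the shared dimension
                c[i, j] += a[i, k] * b[k, j]
    return c

# Small integer-valued inputs, so the comparison with NumPy is exact.
a = np.arange(6, dtype=np.float64).reshape(2, 3)   # 2 x 3
b = np.arange(12, dtype=np.float64).reshape(3, 4)  # 3 x 4
c = matmul_reference(a, b)
assert np.array_equal(c, a @ b)
```

Each element of $C$ reads one row of $A$ and one column of $B$, which matches the access pattern shown in the figure.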


The following function returns the computing expression of matrix multiplication.

In [1]:
import d2ltvm
import numpy as np
import tvm

# Save to the d2ltvm package
def matmul(n, m, l):
    """Return the computing expression of matrix multiplication
    A : n x l matrix
    B : l x m matrix
    C : n x m matrix with C = A B
    """
    # Reduction axis: k ranges over [0, l)
    k = tvm.reduce_axis((0, l), name='k')
    A = tvm.placeholder((n, l), name='A')
    B = tvm.placeholder((l, m), name='B')
    # C[x, y] is the sum over k of A[x, k] * B[k, y]
    C = tvm.compute((n, m),
                    lambda x, y: tvm.sum(A[x, k] * B[k, y], axis=k),
                    name='C')
    return A, B, C

Let's compile a module for a square matrix multiplication.

In [2]:
n = 100
A, B, C = matmul(n, n, n)
s = tvm.create_schedule(C.op)
mod = tvm.build(s, [A, B, C])

And then verify the results. Note that NumPy may use multi-threading to accelerate its computation, which can change the order in which floating-point products are accumulated and so produce slightly different results. We therefore use `assert_allclose` with a relatively large error tolerance to test correctness.

In [3]:
a, b, c = d2ltvm.get_abc((100, 100), tvm.nd.array)
mod(a, b, c)
np.testing.assert_allclose(np.dot(a.asnumpy(), b.asnumpy()),
                           c.asnumpy(), atol=1e-5)
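
To see why an exact comparison would be too strict, the following minimal sketch accumulates the same `float32` dot product in two different orders using plain NumPy. The helper `dot_sequential` and the input values are assumptions made for illustration only; they are not part of `d2ltvm`, and the tolerance is chosen for this example rather than taken from the test above.

```python
import numpy as np

rs = np.random.RandomState(0)
x = rs.rand(100).astype(np.float32)  # positive values in [0, 1)
y = rs.rand(100).astype(np.float32)

def dot_sequential(x, y, order):
    """Accumulate x[i] * y[i] in float32, visiting indices in `order`."""
    acc = np.float32(0.0)
    for i in order:
        acc = np.float32(acc + x[i] * y[i])
    return acc

forward = dot_sequential(x, y, range(100))
backward = dot_sequential(x, y, reversed(range(100)))

# The two results may differ in the last few bits, because floating-point
# addition is not associative. A tolerance-based check accepts both.
print(forward, backward)
np.testing.assert_allclose(forward, backward, rtol=1e-4)
```

A multi-threaded or vectorized implementation effectively picks its own accumulation order, so comparing against it with `==` could fail even when both results are correct to within rounding error.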

## Summary