# Getting Tensor Comprehensions

```shell
$ conda install -y -c pytorch -c tensorcomp tensor_comprehensions
```
Note: This won't work on a Mac; the cells below were run on my Ubuntu server.

In [1]:
import tensor_comprehensions as tc
import torch

In [2]:
lang = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
  output(i, j) +=! A(i, kk) * B(kk, j)
}
"""

In [3]:
matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)



In [4]:
out

Variable containing:
-0.6847 -2.0118 -1.4697 -0.2940  0.6265
-2.6762  3.0610 -5.1735 -3.3560  0.6988
-0.4217 -2.4664 -2.6840  0.7718 -0.1402
[torch.cuda.FloatTensor of size 3x5 (GPU 0)]
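For readers new to the notation, the comprehension above can be read as a loop nest: `+=!` initializes each output entry to zero and then accumulates over `kk`, the index that appears only on the right-hand side. The following is a minimal pure-Python sketch of that reading, not how TC actually executes the kernel. `matmul_reference` is a hypothetical helper name used only for illustration.

```python
def matmul_reference(A, B):
    """Loop-nest reading of: output(i, j) +=! A(i, kk) * B(kk, j)."""
    M, N = len(A), len(A[0])
    if len(B) != N:
        raise ValueError("inner dimensions must match")
    K = len(B[0])
    # "+=!" means: start every output entry at zero, then accumulate.
    output = [[0.0] * K for _ in range(M)]
    for i in range(M):
        for j in range(K):
            # kk appears only on the right-hand side, so it is reduced over.
            for kk in range(N):
                output[i][j] += A[i][kk] * B[kk][j]
    return output


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_reference(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

The plain lists stand in for tensors so the sketch runs without a GPU.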

## PyTorch layers in Tensor Comprehensions 

### Using mapping options

Default mappings: We provide several default mapping options; pick the one whose name most closely matches the kind of kernel you are running. The defaults provided are:

- `pointwise`: if the kernel resembles a pointwise operation
- `mlp`: if the kernel resembles a Linear layer operation
- `conv`: if the kernel resembles a convolution operation
- `group_conv`: if the kernel resembles a group convolution operation
- `naive`: if none of the above apply, choose the naive default (this is why we get the warning when no options are passed)
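One way to keep this choice explicit in your own code is a small lookup keyed by the kind of kernel, falling back to `naive`. This is only an illustrative sketch: `DEFAULT_MAPPINGS` and `pick_default_mapping` are hypothetical names, not part of the TC API, and the keys are made up for the example. The returned string is what you would pass to `tc.Options(...)`.

```python
# Hypothetical lookup from a kernel "kind" to one of the default option names
# listed above. Neither the dict nor the function exists in TC itself.
DEFAULT_MAPPINGS = {
    "pointwise": "pointwise",
    "linear": "mlp",
    "conv": "conv",
    "group_conv": "group_conv",
}


def pick_default_mapping(kernel_kind):
    """Return a default option name, falling back to "naive"."""
    return DEFAULT_MAPPINGS.get(kernel_kind, "naive")


print(pick_default_mapping("linear"))   # mlp
print(pick_default_mapping("softmax"))  # naive
```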

In [5]:
# Specifying mapping options
matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(100, 400).cuda(), torch.randn(400, 500).cuda()
out2 = matmul(mat1, mat2, options=tc.Options("mlp"))

In [6]:
out2

Variable containing:
 47.1385  26.1682  43.3114  ...  -18.0712 -19.8111   6.4440
-14.8453  23.7215   5.1744  ...  -11.0910 -42.5594  -6.9655
 22.0294  -0.2061  28.8685  ...   -3.6386  10.6883  17.2220
           ...               ⋱              ...            
  7.3602  22.4268  -5.8600  ...   28.8245  43.6945  -4.3526
  3.8995  -6.6547  -0.7039  ...  -10.2438  34.0169   2.0041
 -7.2508 -13.1540  12.9912  ...   -4.8052   5.5984 -16.6713
[torch.cuda.FloatTensor of size 100x500 (GPU 0)]

In [7]:
# Using reduction operators
# providing different input sizes for the same comprehension

matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)

# different input sizes
mat3, mat4 = torch.randn(100, 400).cuda(), torch.randn(400, 500).cuda()
out2 = matmul(mat3, mat4)
print(out)
print(out2)

Variable containing:
 1.0609 -4.4141 -0.5586 -0.1936  3.4578
 0.7662  1.1169  0.4982  0.1863  0.2116
-0.5136 -1.8249 -1.7730 -1.2935 -1.5260
[torch.cuda.FloatTensor of size 3x5 (GPU 0)]

Variable containing:
 2.8782e+01 -6.1384e+00  1.2374e+01  ...   2.5832e+01 -6.8648e+00  4.2939e+00
 1.1138e+01 -1.8560e+01  5.5666e+00  ...   2.0697e+01  3.6589e-01 -7.2668e+00
-2.6165e+01  8.2968e+00 -3.1742e+01  ...  -1.0001e+01 -1.1940e+01 -4.2678e+00
                ...                   ⋱                   ...                
-6.7878e+00 -1.5808e+01  1.2729e+01  ...   4.0239e+00  3.8240e+01  4.7867e+00
 7.6880e+00  6.0249e-01  1.7772e+01  ...  -9.8221e+00  1.1662e+01  7.4712e+00
 1.9437e+00  3.7990e+01  3.3812e+00  ...   9.0278e+00 -2.3637e+01  1.8666e+01
[torch.cuda.FloatTensor of size 100x500 (GPU 0)]
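The same `matmul` comprehension accepts both input pairs because `M`, `N`, and `K` are symbolic sizes that are bound from the actual tensors at call time; only the shared dimension `N` has to agree. A rough sketch of that size check in plain Python, assuming nothing about TC internals (`infer_matmul_shape` is a hypothetical helper, not a TC function):

```python
def infer_matmul_shape(a_shape, b_shape):
    """Bind M, N, K from the input shapes of matmul(float(M,N) A, float(N,K) B)."""
    M, N = a_shape
    N_from_b, K = b_shape
    if N != N_from_b:
        raise ValueError(
            "N is bound to %d by A but to %d by B" % (N, N_from_b)
        )
    # output(i, j) ranges over i in [0, M) and j in [0, K).
    return (M, K)


print(infer_matmul_shape((3, 4), (4, 5)))          # (3, 5)
print(infer_matmul_shape((100, 400), (400, 500)))  # (100, 500)
```

The two results match the sizes reported in the outputs above (3x5 and 100x500).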



### Multiple TC definitions

Let’s say you want to define all of your TCs in one string and later use that string to run the different operations it defines. You can do so easily: define a `lang` variable that holds the TC definitions for all your operations. Each time you want to run a different operation, call `tc.define` on the `lang` variable, pass the `name` of the operation's definition, and you get the TC layer for it. Below is an example of how to do this:

In [8]:
lang = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
  output(i, j) +=! A(i, kk) * B(kk, j)
}
def abs(float(M, N) A) -> (O1) {
  O1(m, n) = fabs(A(m, n))
}
"""
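# Select the "matmul" definition from the shared string
matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)

# Select the "abs" definition from the same string
# (bound to tc_abs so Python's built-in abs is not shadowed)
tc_abs = tc.define(lang, name="abs")
A = torch.randn(3, 4).cuda()
out = tc_abs(A)  # overwrites the matmul result above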



In [9]:
out

Variable containing:
 0.2958  0.0401  0.8931  2.1679
 1.2662  0.9898  0.6153  0.2392
 0.7058  0.0430  0.0851  0.8811
[torch.cuda.FloatTensor of size 3x4 (GPU 0)]
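Since each `tc.define` call picks a definition by `name`, it can help to see which names a `lang` string contains. The regex below is a rough illustration rather than a TC parser; `list_tc_names` is a hypothetical helper and not part of the library.

```python
import re

lang = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
  output(i, j) +=! A(i, kk) * B(kk, j)
}
def abs(float(M, N) A) -> (O1) {
  O1(m, n) = fabs(A(m, n))
}
"""


def list_tc_names(source):
    """Return the names following 'def' at the start of a line, in order."""
    return re.findall(r"^[ \t]*def[ \t]+(\w+)[ \t]*\(", source, flags=re.MULTILINE)


print(list_tc_names(lang))  # ['matmul', 'abs']
```

Each returned name is a candidate value for the `name` argument of `tc.define(lang, name=...)`.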