# Tutorial 1

In this notebook, I collect several small examples that clarify the tricky points of PyTorch.

In [1]:
import torch

TOC 

1. Shape changing

   - 1.1. The `unsqueeze` function
   - 1.2. Shape change for broadcasting operations
   - 1.3. Stacking tensors
   - 1.4. Concatenating tensors
   - 1.5. Slicing operations
   - 1.6. Permuting tensors
  
2. Basic operations

   - 2.1. Batch arithmetics
   - 2.2. Einsum notation
   - 2.3. Broadcasting matrix operations
   
3. Automatic differentiation

   - 3.1. Simple 1D example
   - 3.2. 2D example
   - 3.3. More interesting case
   - 3.4. Computing Hessian

4. Misc functions

   - 4.1. Safely extract the key-value pairs from dictionaries
   - 4.2. Generating tensors

## 1. Shape changing

### 1.1. The `unsqueeze` function

`unsqueeze` inserts a new axis of size one at the given position. Here is how it works:

In [2]:
# Create a tensor
x = torch.tensor([1, 2, 3])
print("Original tensor:", x)
print("Shape:", x.shape)

# Use unsqueeze to add a dimension at position 0
x_unsqueeze_0 = torch.unsqueeze(x, 0)
print("unsqueeze(x, 0):", x_unsqueeze_0)
print("Shape:", x_unsqueeze_0.shape)

# Use unsqueeze to add a dimension at position 1
x_unsqueeze_1 = torch.unsqueeze(x, 1)
print("unsqueeze(x, 1):", x_unsqueeze_1)
print("Shape:", x_unsqueeze_1.shape)

Original tensor: tensor([1, 2, 3])
Shape: torch.Size([3])
unsqueeze(x, 0): tensor([[1, 2, 3]])
Shape: torch.Size([1, 3])
unsqueeze(x, 1): tensor([[1],
        [2],
        [3]])
Shape: torch.Size([3, 1])
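
As a small complement to the cell above, the sketch below shows a few equivalent spellings of the same operation and its inverse, `squeeze`. The variable names (`v`, `row_a`, `col`, ...) are chosen only for this illustration.

```python
import torch

v = torch.tensor([1, 2, 3])

# Three equivalent ways to add a leading axis: shape [3] -> [1, 3]
row_a = torch.unsqueeze(v, 0)
row_b = v.unsqueeze(0)
row_c = v[None, :]

# A negative index counts from the end: -1 appends a trailing axis, [3] -> [3, 1]
col = v.unsqueeze(-1)

# squeeze(dim) removes a size-one axis again, undoing the unsqueeze
restored = col.squeeze(-1)

print(row_a.shape, row_b.shape, row_c.shape)
print(col.shape, restored.shape)
```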


### 1.2. Shape change for broadcasting operations

Consider two tensors of different rank, `A` of shape `[2, 4, 4]` and `B` of shape `[2]`. It is not possible to divide 
one by the other directly because the number of elements along the last dimension (axis) differs between the two tensors: 
for `A` it is 4 (axis = 2) and for `B` it is 2 (axis = 0). Since broadcasting aligns the shapes starting from the last dimension,
this won't work. Moreover, this is not what we actually want. We want to align axis = 0 of `B` with 
axis = 0 of `A` and broadcast over the remaining dimensions. 

To do this, we need to change the shape of `B` so that it has the same rank as `A`. In doing so, the dimensions over which the 
broadcasting is done get a size of one (so-called singletons), i.e. `B` gets the shape `[2, 1, 1]`. 
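
To make the failure mode concrete, here is a minimal sketch. It uses `torch.ones` instead of random numbers, and the names `A_demo`, `B_demo`, and `failed` are chosen only for this illustration.

```python
import torch

A_demo = torch.ones(2, 4, 4)       # shape [2, 4, 4]
B_demo = torch.tensor([2.0, 4.0])  # shape [2]

# Shapes are compared from the last axis backwards: 4 (A_demo) vs 2 (B_demo) do not match
try:
    A_demo / B_demo
    failed = False
except RuntimeError as err:
    failed = True
    print("Broadcasting failed:", err)

# With singleton axes, B_demo has shape [2, 1, 1] and broadcasts over the last two axes
C_demo = A_demo / B_demo.view(2, 1, 1)
print(C_demo.shape)
print(C_demo[0, 0, 0].item(), C_demo[1, 0, 0].item())
```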

Here are several ways to do it:

* Using `None` indexing, as in `B[:, None, None]`

* Using `unsqueeze(-1)` to append a new last dimension. We do this repeatedly.

* Using `unsqueeze(n)` to insert a dimension at position `n`. We do this twice, with different values of `n`.

* Using `view` to explicitly reshape the tensor to the expected shape. In the first form, we explicitly
  define the sizes along each axis. In the second form, we first build a suitable tuple and pass it to `view`.

In [3]:
A = torch.randn(2, 4, 4)      # shape [2, 4, 4]
B = torch.tensor([2.0, 4.0])  # shape [2]

print(A)
print(B)

print("=====Case 1=======")
C = A / B[:, None, None]      # shape [2, 4, 4]
print(C)

print("=====Case 2=======")
C = A/ B.unsqueeze(-1).unsqueeze(-1)
print(C)

print("=====Case 3=======")
C = A/ B.unsqueeze(1).unsqueeze(2)
print(C)

print("=====Case 4=======")
C = A/B.view(2,1,1)
print(C)

x = [2]; x.extend([1 for _ in range(2)])
x = tuple(x)
print("=====Case 5=======")
C = A/B.view(x)
print(C)

tensor([[[-1.8173,  1.7630, -0.9214,  0.6545],
         [ 0.4654, -0.0473,  0.4240,  0.1103],
         [ 0.0247, -0.9844, -0.7084,  0.8355],
         [-0.5720,  0.0171, -1.1509,  1.2556]],

        [[-0.6683, -1.7170, -0.6805, -1.7154],
         [-0.5060, -1.0532, -1.6286,  1.2230],
         [ 1.5697, -0.6841,  0.4466,  0.7155],
         [-0.1470, -0.3523, -0.4768, -0.2535]]])
tensor([2., 4.])
tensor([[[-0.9086,  0.8815, -0.4607,  0.3272],
         [ 0.2327, -0.0236,  0.2120,  0.0552],
         [ 0.0123, -0.4922, -0.3542,  0.4178],
         [-0.2860,  0.0086, -0.5754,  0.6278]],

        [[-0.1671, -0.4293, -0.1701, -0.4288],
         [-0.1265, -0.2633, -0.4071,  0.3058],
         [ 0.3924, -0.1710,  0.1117,  0.1789],
         [-0.0367, -0.0881, -0.1192, -0.0634]]])
tensor([[[-0.9086,  0.8815, -0.4607,  0.3272],
         [ 0.2327, -0.0236,  0.2120,  0.0552],
         [ 0.0123, -0.4922, -0.3542,  0.4178],
         [-0.2860,  0.0086, -0.5754,  0.6278]],

        [[-0.1671, -0.4293, -0.17

Analogously, we can use the `reshape` function:

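* It returns a tensor with the requested shape and leaves the shape of the original one unchanged (the data may still be shared, since `reshape` returns a view when possible)

* Use `-1` for at most one dimension to have its size determined automatically from the total number of elements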

In [4]:
x = torch.rand(2, 2, 2)
print("x =", x, x.shape)

z = x.reshape( 4, 2, 1 )
print("z =", z, z.shape)
print("x =", x, x.shape)

z = x.reshape(8,1,-1)
print("z =", z, z.shape)

z = x.reshape(-1,1,8)
print("z =", z, z.shape)

x = tensor([[[0.1093, 0.5209],
         [0.1620, 0.1344]],

        [[0.9223, 0.9554],
         [0.6976, 0.6609]]]) torch.Size([2, 2, 2])
z = tensor([[[0.1093],
         [0.5209]],

        [[0.1620],
         [0.1344]],

        [[0.9223],
         [0.9554]],

        [[0.6976],
         [0.6609]]]) torch.Size([4, 2, 1])
x = tensor([[[0.1093, 0.5209],
         [0.1620, 0.1344]],

        [[0.9223, 0.9554],
         [0.6976, 0.6609]]]) torch.Size([2, 2, 2])
z = tensor([[[0.1093]],

        [[0.5209]],

        [[0.1620]],

        [[0.1344]],

        [[0.9223]],

        [[0.9554]],

        [[0.6976]],

        [[0.6609]]]) torch.Size([8, 1, 1])
z = tensor([[[0.1093, 0.5209, 0.1620, 0.1344, 0.9223, 0.9554, 0.6976, 0.6609]]]) torch.Size([1, 1, 8])
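
One subtlety worth checking is whether the reshaped tensor shares memory with the original. The sketch below assumes a contiguous input, for which `reshape` returns a view; the names `base`, `flat`, and `copy` are illustrative only.

```python
import torch

base = torch.arange(8.0).reshape(2, 2, 2)  # contiguous tensor
flat = base.reshape(4, 2)                  # a view in this case, not a copy

# Same underlying storage: writing through `flat` is visible in `base`
flat[0, 0] = 100.0
print(base[0, 0, 0])
print(flat.data_ptr() == base.data_ptr())

# clone() gives an independent copy, so `base` is not affected here
copy = base.reshape(4, 2).clone()
copy[0, 0] = -1.0
print(base[0, 0, 0])
```

If an independent tensor is needed, adding `.clone()` as in the last lines is one safe option.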


We can expand the tensor along multiple axes in the following ways:

In [5]:
z = x.reshape(2,2,2,*[1]*3)
print("z = ", z, z.shape)

z = x.reshape(2,2,2,-1)
print("z = ", z, z.shape)

z =  tensor([[[[[[0.1093]]],


          [[[0.5209]]]],



         [[[[0.1620]]],


          [[[0.1344]]]]],




        [[[[[0.9223]]],


          [[[0.9554]]]],



         [[[[0.6976]]],


          [[[0.6609]]]]]]) torch.Size([2, 2, 2, 1, 1, 1])
z =  tensor([[[[0.1093],
          [0.5209]],

         [[0.1620],
          [0.1344]]],


        [[[0.9223],
          [0.9554]],

         [[0.6976],
          [0.6609]]]]) torch.Size([2, 2, 2, 1])


### 1.3. Stacking tensors

In [6]:
a = torch.rand([2,2])
print(a)
print(a.shape)
b = torch.stack([a,a])
print(b)
print(b.shape)

tensor([[0.3352, 0.4766],
        [0.5680, 0.5764]])
torch.Size([2, 2])
tensor([[[0.3352, 0.4766],
         [0.5680, 0.5764]],

        [[0.3352, 0.4766],
         [0.5680, 0.5764]]])
torch.Size([2, 2, 2])


### 1.4. Concatenating tensors

In [7]:
a = torch.rand([2,2])
print(a)
print(a.shape)
b = torch.cat([a,a], dim=0)
print(b)
print(b.shape)
c = torch.cat([a,a], dim=1)
print(c)
print(c.shape)

tensor([[0.3205, 0.3908],
        [0.4805, 0.2954]])
torch.Size([2, 2])
tensor([[0.3205, 0.3908],
        [0.4805, 0.2954],
        [0.3205, 0.3908],
        [0.4805, 0.2954]])
torch.Size([4, 2])
tensor([[0.3205, 0.3908, 0.3205, 0.3908],
        [0.4805, 0.2954, 0.4805, 0.2954]])
torch.Size([2, 4])
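
The difference between the last two sections can be summarized in a short sketch: `torch.stack` creates a new axis, while `torch.cat` joins along an existing one. The tensor `t` and its shape are chosen only for this illustration.

```python
import torch

t = torch.rand(2, 4)

# stack inserts a new axis at position `dim`, so the rank grows by one
s0 = torch.stack([t, t, t], dim=0)   # shape [3, 2, 4]
s1 = torch.stack([t, t, t], dim=1)   # shape [2, 3, 4]
s2 = torch.stack([t, t, t], dim=2)   # shape [2, 4, 3]

# cat joins along an existing axis, so the rank stays the same
c0 = torch.cat([t, t, t], dim=0)     # shape [6, 4]

# stack is equivalent to unsqueezing each tensor and concatenating
s1_alt = torch.cat([t.unsqueeze(1)] * 3, dim=1)

print(s0.shape, s1.shape, s2.shape, c0.shape)
print(torch.equal(s1, s1_alt))
```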


### 1.5. Slicing operations

In [8]:
a = torch.zeros(2,2,2)
cnt = 0
for i in range(2):
    for j in range(2):
        for k in range(2):
            a[i,j,k] = cnt
            cnt += 1
print(a)

tensor([[[0., 1.],
         [2., 3.]],

        [[4., 5.],
         [6., 7.]]])


Slicing left-to-right works as expected

In [9]:
print(a[0])
print(a[0,:,:])

tensor([[0., 1.],
        [2., 3.]])
tensor([[0., 1.],
        [2., 3.]])


In [10]:
print(a[0,0])
print(a[0,0,:])

tensor([0., 1.])
tensor([0., 1.])


In [11]:
print(a[0,0,0])

tensor(0.)


But indexing the trailing dimensions while keeping the leading ones is less obvious:

In [12]:
print(a[:, 0])
print(a[:, :, 0])

tensor([[0., 1.],
        [4., 5.]])
tensor([[0., 2.],
        [4., 6.]])


Let's assume `a.shape == (nbatch, ntraj, ndof)`, i.e. a 3D tensor. Then:

* `a[:, 0]` takes all `nbatch`, picks `ntraj = 0`, and keeps all `ndof`. Result shape: `(nbatch, ndof)`

* `a[:, :, 0]` takes all `nbatch` and all `ntraj`, and picks `ndof = 0`. Result shape: `(nbatch, ntraj)`

To index the last dimensions without spelling out every leading `:`, use the ellipsis `...`:

In [13]:
print(a)
print(a[..., 0])
print(a[..., 0, 0])

tensor([[[0., 1.],
         [2., 3.]],

        [[4., 5.],
         [6., 7.]]])
tensor([[0., 2.],
        [4., 6.]])
tensor([0., 4.])
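
The sketch below checks that the ellipsis form agrees with the explicit form and keeps working when more leading axes are added. The tensors `t3` and `t4` are illustrative stand-ins, with `t3` holding the same values as `a` above.

```python
import torch

t3 = torch.arange(8.0).reshape(2, 2, 2)       # same values as `a` above
t4 = torch.arange(16.0).reshape(2, 2, 2, 2)   # one extra leading axis

# For a 3D tensor, the ellipsis stands for the two leading axes
same_3d = torch.equal(t3[..., 0], t3[:, :, 0])

# The same expression keeps working when more leading axes are added
last0 = t4[..., 0]                             # shape [2, 2, 2]
same_4d = torch.equal(last0, t4[:, :, :, 0])

# The ellipsis can also sit in the middle: fix the first and the last index
same_mid = torch.equal(t3[0, ..., 1], t3[0, :, 1])

print(same_3d, same_4d, same_mid, last0.shape)
```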


### 1.6. Permuting tensors

In [14]:
x = torch.rand(1, 2, 3)
print("x = ", x, x.shape)

y = x.permute(0, 1, 2)
print("identity permutation = ", y, y.shape)

y = x.permute(0, 2, 1)
print("0->0, 1->2, 2->1 permutation = ", y, y.shape)

y = x.permute(1, 0, 2)
print("0->1, 1->0, 2->2 permutation = ", y, y.shape)

x =  tensor([[[0.0868, 0.0858, 0.0277],
         [0.8058, 0.9812, 0.5894]]]) torch.Size([1, 2, 3])
identity permutation =  tensor([[[0.0868, 0.0858, 0.0277],
         [0.8058, 0.9812, 0.5894]]]) torch.Size([1, 2, 3])
0->0, 1->2, 2->1 permutation =  tensor([[[0.0868, 0.8058],
         [0.0858, 0.9812],
         [0.0277, 0.5894]]]) torch.Size([1, 3, 2])
0->1, 1->0, 2->2 permutation =  tensor([[[0.0868, 0.0858, 0.0277]],

        [[0.8058, 0.9812, 0.5894]]]) torch.Size([2, 1, 3])
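
A short sketch of what `permute` does to individual elements and to the memory layout may help here. The names `p`, `r`, and `r_c` are chosen only for this illustration.

```python
import torch

p = torch.arange(6.0).reshape(1, 2, 3)
r = p.permute(0, 2, 1)   # axis k of the result is axis dims[k] of the input

# Element (i, j, k) of the result is element (i, k, j) of the input
print(r.shape)
print(r[0, 2, 1] == p[0, 1, 2])

# Swapping two axes can also be written with transpose
print(torch.equal(r, p.transpose(1, 2)))

# permute only changes strides, so the result is generally not contiguous;
# contiguous() makes a copy with the standard memory layout
print(r.is_contiguous())
r_c = r.contiguous()
print(r_c.is_contiguous())
```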


## 2. Basic operations

### 2.1. Batch arithmetics

In [15]:
a = torch.zeros(3,2)  # 3 trajectories with 2 DOFs
b = torch.zeros(2)    # 1 point with 2 DOF

print(a, b)
a[0,0], a[0,1] = 1, -1  # 0-th traj 
a[1,0], a[1,1] = 0,  1  # 1-st traj
a[2,0], a[2,1] = 2,  0  # 2-nd traj
print(a.shape)

b[0], b[1] = 0, 1
print(a, b)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]]) tensor([0., 0.])
torch.Size([3, 2])
tensor([[ 1., -1.],
        [ 0.,  1.],
        [ 2.,  0.]]) tensor([0., 1.])


In [16]:
print(a - 2*b)   # b is repeated for each trajectory: shapes are aligned from the last axis
                 # (axis = 1 of a with axis = 0 of b), and b is broadcast along axis 0 of a

tensor([[ 1., -3.],
        [ 0., -1.],
        [ 2., -2.]])


One can explicitly change the shape of `b` with the `view` function to make it consistent with the shape of `a`:

In [17]:
print(b.view(1,2))
print(a - 2*b.view(1,2))

tensor([[0., 1.]])
tensor([[ 1., -3.],
        [ 0., -1.],
        [ 2., -2.]])


In this case the explicit reshape is not needed, since broadcasting already does the right thing.

So, for instance, we can also square all the elements:

In [18]:
print( (a - 2*b)**2 )  # repeat for each trajectory

tensor([[1., 9.],
        [0., 1.],
        [4., 4.]])


Now, we can compute products or sums along different axes:

In [19]:
prd = torch.prod( (a - 2*b)**2, 1,  True) # product over all DOFs, keep dimension
print( prd.shape )
print( prd )  

torch.Size([3, 1])
tensor([[ 9.],
        [ 0.],
        [16.]])


In [20]:
prd = torch.prod( (a - 2*b)**2, 1,  False)  # product over all DOFs
                                            # Here, we don't keep dimension, so the 
                                            # rank of the resulting tensor is reduced
print(prd.shape)
print(prd)

torch.Size([3])
tensor([ 9.,  0., 16.])


Likewise, we can do a summation over different axes while keeping the dimension or not:

In [21]:
print(a.shape)
print(a)

print(a.sum(1, True))
print(torch.sum(a, 1, True))

print(a.sum(0, True))
print(torch.sum(a, 0, True))

torch.Size([3, 2])
tensor([[ 1., -1.],
        [ 0.,  1.],
        [ 2.,  0.]])
tensor([[0.],
        [1.],
        [2.]])
tensor([[0.],
        [1.],
        [2.]])
tensor([[3., 0.]])
tensor([[3., 0.]])


Or a slightly more complicated expression, which is more useful in the computational chemistry context:

In [22]:
print (torch.sum(torch.prod( (a - 2*b)**2, 1,  False) ) )  # reduce dimension and sum over all trajectories

tensor(25.)
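
The positional arguments `1, True` used above correspond to the keywords `dim` and `keepdim`. The sketch below repeats the same reductions with keywords, which may be easier to read; the names `pts`, `ref`, and `sq` are illustrative only.

```python
import torch

pts = torch.tensor([[1.0, -1.0], [0.0, 1.0], [2.0, 0.0]])  # [ntraj, ndof]
ref = torch.tensor([0.0, 1.0])                              # [ndof]

sq = (pts - 2 * ref) ** 2

# Product over DOFs (axis 1); the reduced axis is dropped by default
per_traj = torch.prod(sq, dim=1)                      # shape [3]

# keepdim=True keeps the reduced axis as a singleton
per_traj_kept = torch.prod(sq, dim=1, keepdim=True)   # shape [3, 1]

# Sum over all trajectories
total = per_traj.sum()

print(per_traj, per_traj_kept.shape, total)
```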


### 2.2. Einsum notation

Consider 4 points, each carrying a 2-dimensional vector.

Let's say that for each point we want to compute the outer product of the corresponding vector with itself (a 2 x 2 matrix).

It is very convenient to use the ellipsis `...` notation for the dimensions we don't care about but still want to 
broadcast over.

This can be done this way:

In [23]:
psi = torch.tensor([ [1.0, 1.0], 
                      [-1.0, 2.0], 
                      [1.0, 0.0], 
                      [-1.0, 0.0] ])
print(psi.shape)

outer = torch.einsum("...i, ...j -> ...ij", psi, psi)

print("outer = ", outer, outer.shape)

torch.Size([4, 2])
outer =  tensor([[[ 1.,  1.],
         [ 1.,  1.]],

        [[ 1., -2.],
         [-2.,  4.]],

        [[ 1.,  0.],
         [ 0.,  0.]],

        [[ 1., -0.],
         [-0.,  0.]]]) torch.Size([4, 2, 2])


Alternatively, we could have permuted the input so that the batch dimension comes last:

In [24]:
psi2 = psi.permute(1,0)
print(psi2, psi2.shape)

outer = torch.einsum("i..., j... -> ij...", psi2, psi2)
print("outer = ", outer, outer.shape)

outer2 = outer.permute(2, 0, 1)
print("outer2 = ", outer2, outer2.shape)

tensor([[ 1., -1.,  1., -1.],
        [ 1.,  2.,  0.,  0.]]) torch.Size([2, 4])
outer =  tensor([[[ 1.,  1.,  1.,  1.],
         [ 1., -2.,  0., -0.]],

        [[ 1., -2.,  0., -0.],
         [ 1.,  4.,  0.,  0.]]]) torch.Size([2, 2, 4])
outer2 =  tensor([[[ 1.,  1.],
         [ 1.,  1.]],

        [[ 1., -2.],
         [-2.,  4.]],

        [[ 1.,  0.],
         [ 0.,  0.]],

        [[ 1., -0.],
         [-0.,  0.]]]) torch.Size([4, 2, 2])
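
As a cross-check, the same batched outer product can be written with plain broadcasting. This is only a sketch for comparison; the names `vecs`, `outer_einsum`, `outer_bcast`, and `batched` are chosen for this illustration.

```python
import torch

vecs = torch.tensor([[1.0, 1.0], [-1.0, 2.0], [1.0, 0.0], [-1.0, 0.0]])

# einsum form, as in the cells above
outer_einsum = torch.einsum("...i, ...j -> ...ij", vecs, vecs)

# Broadcasting form: [4, 2, 1] * [4, 1, 2] -> [4, 2, 2]
outer_bcast = vecs[..., :, None] * vecs[..., None, :]

print(torch.allclose(outer_einsum, outer_bcast))

# Thanks to the ellipsis, extra leading axes need no change in the equation
batched = torch.rand(5, 4, 2)
outer_batched = torch.einsum("...i, ...j -> ...ij", batched, batched)
print(outer_batched.shape)
```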


### 2.3. Broadcasting matrix operations

In this example, we have 3 grid points, each with 1 degree of freedom, so the input is a 3 x 1 tensor.

We want to compute a 2 x 2 matrix for each point such that the diagonal elements of the matrix are set to
the value and the squared value of the first component of the corresponding point.

Here, we use the ellipsis `...` notation to refer to any leading dimensions that are not explicitly indexed:

In [25]:
def x2(x):
    # x[npoints, ndims]
    print("x.shape = ", x.shape)
    v = torch.zeros((*x.shape[:-1], 2, 2), dtype=torch.float)
    print("v.shape = ", v.shape)
    v[..., 0, 0] = x[..., 0]; 
    v[..., 1, 1] = x[..., 0]**2
    return v

q = torch.tensor([ [1], [2], [3] ])

Vmat = x2(q)

print(q)
print(Vmat)

x.shape =  torch.Size([3, 1])
v.shape =  torch.Size([3, 2, 2])
tensor([[1],
        [2],
        [3]])
tensor([[[1., 0.],
         [0., 1.]],

        [[2., 0.],
         [0., 4.]],

        [[3., 0.],
         [0., 9.]]])
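
For purely diagonal matrices, one possible alternative to the indexed assignment is `torch.diag_embed`, which builds a batch of diagonal matrices from a batch of diagonals. This is a sketch rather than a replacement for `x2`; the names `pts`, `diag`, and `mats` are illustrative, and the input is created as floating point here.

```python
import torch

pts = torch.tensor([[1.0], [2.0], [3.0]])   # [npoints, ndims]

# Collect the diagonal entries along a new last axis: [npoints, 2]
diag = torch.stack([pts[..., 0], pts[..., 0] ** 2], dim=-1)

# Turn every row of `diag` into a 2 x 2 diagonal matrix: [npoints, 2, 2]
mats = torch.diag_embed(diag)

print(mats.shape)
print(mats)
```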


## 3. Automatic differentiation

### 3.1. Simple 1D example

Here, we compute the derivative of the quadratic function:

$$ \frac{d}{dx} x^2 = 2 x $$

and use this derivative to advance the $x$ variable.

Note:
* the function that generates the tensor to be differentiated should return a "scalar" (single value)
  
* if we want to differentiate with respect to the `x` variable, it should be created with `requires_grad=True`

In [26]:
def mysq(x):
    return torch.sum(x**2)

x = torch.tensor([1.0], requires_grad=True)
for i in range(3):
    y = mysq(x)
    [z] = torch.autograd.grad(y, x)
    print(x, y, z)
    x = x + z

tensor([1.], requires_grad=True) tensor(1., grad_fn=<SumBackward0>) tensor([2.])
tensor([3.], grad_fn=<AddBackward0>) tensor(9., grad_fn=<SumBackward0>) tensor([6.])
tensor([9.], grad_fn=<AddBackward0>) tensor(81., grad_fn=<SumBackward0>) tensor([18.])
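
For comparison, the same derivative can also be obtained with `backward()`, which stores the result in the `.grad` attribute of the leaf tensor instead of returning it. The sketch below uses the illustrative names `w`, `loss`, and `g_func`.

```python
import torch

w = torch.tensor([3.0], requires_grad=True)
loss = torch.sum(w ** 2)

# Functional form: returns the gradient; retain_graph=True keeps the graph
# alive so that it can be differentiated once more below
[g_func] = torch.autograd.grad(loss, w, retain_graph=True)

# backward() form: accumulates the gradient into w.grad
loss.backward()

print(g_func, w.grad)
```

Note that `backward()` accumulates into `.grad` on repeated calls, so the functional form used in this notebook avoids having to reset it.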


### 3.2. 2D example 

In [27]:
def mysq(x,y):
    return torch.sum(x**2 + y**2)

x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([-1.0], requires_grad=True)

f = mysq(x, y)
[dfdx, dfdy] = torch.autograd.grad(f, [x, y])
print(x)
print(y)
print(f)
print(dfdx)
print(dfdy)

tensor([1.], requires_grad=True)
tensor([-1.], requires_grad=True)
tensor(2., grad_fn=<SumBackward0>)
tensor([2.])
tensor([-2.])


### 3.3. More interesting case

Now, imagine we have a function that takes the `q` tensor with the last dimension representing 
different DOFs, e.g. `x = q[:, 0]` and `y = q[:, 1]`, for instance 

$$f = x^2 y$$

$$\frac{df}{dx} = 2 x y$$
$$\frac{df}{dy} = x^2$$

In [28]:
def my_func1(q):
    # Note that we use the ellipsis [...] to make it work for any number
    # of leading dimensions
    return q[...,0]**2 * q[...,1]

Now, let's say we need the derivatives of this function at a number of points (either for plotting PES or for different 
trajectories):

In [29]:
q = torch.tensor([ [0.0, 0.5],
                   [-1.0, 1.0],
                   [2.0, -2.0],
                  ], requires_grad=True
                )

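Note that the function returns one value per point (a tensor over the outer dimensions), while 
`torch.autograd.grad` expects a scalar output unless `grad_outputs` is supplied, so we sum everything up first: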

In [30]:
z = torch.sum(my_func1(q))
print(q)
print(z)

[der1] = torch.autograd.grad(z, [ q ]);
print(der1)

tensor([[ 0.0000,  0.5000],
        [-1.0000,  1.0000],
        [ 2.0000, -2.0000]], requires_grad=True)
tensor(-7., grad_fn=<SumBackward0>)
tensor([[ 0.,  0.],
        [-2.,  1.],
        [-8.,  4.]])


But what if we have two batches of trajectories? The same function works without changes:

In [31]:
q2 = torch.stack([q,q])
q2.shape
print(q2)

z2 = torch.sum(my_func1(q2))
print(z2)

[der1_2] = torch.autograd.grad(z2, [ q2 ]);
print(der1_2)

tensor([[[ 0.0000,  0.5000],
         [-1.0000,  1.0000],
         [ 2.0000, -2.0000]],

        [[ 0.0000,  0.5000],
         [-1.0000,  1.0000],
         [ 2.0000, -2.0000]]], grad_fn=<StackBackward0>)
tensor(-14., grad_fn=<SumBackward0>)
tensor([[[ 0.,  0.],
         [-2.,  1.],
         [-8.,  4.]],

        [[ 0.,  0.],
         [-2.,  1.],
         [-8.,  4.]]])
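
One point that is easy to miss: the stacked tensor is a non-leaf tensor built from `q`, so one can differentiate with respect to either of them, with different results. The sketch below uses the illustrative names `f_xy`, `pts`, and `batched` in place of `my_func1`, `q`, and `q2`.

```python
import torch

def f_xy(q):
    # f = x^2 * y, with x and y stored along the last axis
    return q[..., 0] ** 2 * q[..., 1]

pts = torch.tensor([[0.0, 0.5], [-1.0, 1.0], [2.0, -2.0]], requires_grad=True)
batched = torch.stack([pts, pts])     # non-leaf tensor, shape [2, 3, 2]

total = torch.sum(f_xy(batched))

# Gradient w.r.t. the stacked tensor and w.r.t. the original leaf, in one call
g_batched, g_leaf = torch.autograd.grad(total, [batched, pts])

print(g_batched.shape)   # one gradient per batch entry
print(g_leaf)            # contributions of both copies of `pts` are summed
```

Since `pts` enters the stacked tensor twice, the gradient with respect to `pts` is the sum of the two per-batch gradients.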


### 3.4. Computing Hessian

First, define the function:

In [32]:
def my_func2(q, params):
    """
    q[..., itraj, idof]
    k[idof]
    q_min[idof]
    """
    k, q_min = params["k"], params["q_min"]

    ntraj, ndof = q.shape[-2], q.shape[-1]  # taking the last two dimensions
    res = torch.zeros(())
    for n in range(ntraj):
        for i in range(ndof):
            res = res + k[i] * (q[..., n, i] - q_min[i])**2 + k[i]**2 * q[..., n, i]**3 - k[i] * q[..., n, 0] * q[..., n, 1]
    return res

Test the function:

In [33]:
q = torch.tensor([ [0.0, 0.0],
                   [1.0, 0.0],
                   [-1.0, 1.0]
                 ], requires_grad=True)
q_min = torch.tensor( [ 0.0, 0.5] )
k = torch.tensor( [1.0, 2.0])
params = {"k":k, "q_min":q_min}

f = my_func2(q, params)
print(f)

tensor(10.5000, grad_fn=<SubBackward0>)


Now define the function to compute the gradients:

In [34]:
def compute_derivatives(q, function, function_params):
    ntraj, ndof = q.shape[-2], q.shape[-1]

    # Compute the function itself
    f = function(q, function_params)

    # Compute the first gradients
    [grad] = torch.autograd.grad(f, q, create_graph=True)

    return f, grad

And compute them:

In [35]:
f, grad = compute_derivatives(q, my_func2, params)

print(f)
print(grad)

tensor(10.5000, grad_fn=<SubBackward0>)
tensor([[ 0., -2.],
        [ 5., -5.],
        [-2., 17.]], grad_fn=<AddBackward0>)


Now define the function to compute the Hessian:

In [36]:
def compute_derivatives_hess(q, function, function_params):
    ntraj, ndof = q.shape[-2], q.shape[-1]

    # Compute the function itself
    f = function(q, function_params)

    # Compute the first gradients
    [grad] = torch.autograd.grad(f, q, create_graph=True)
    #print(grad.shape)

    # Compute the second gradients
    hess = torch.zeros( (ntraj, ndof, ndof) )
    for k in range(ntraj):
        for i in range(ndof):
            [ d2f ] = torch.autograd.grad( grad[k, i], q, create_graph=True, retain_graph=True)
            #print(d2f.shape)
            hess[k, i, :] = d2f[k, :]

    return f, grad, hess

Now compute all, including Hessian:

In [37]:
f, grad, hess = compute_derivatives_hess(q, my_func2, params)

print("f = ", f)
print("grad = ", grad)
print("hess = ", hess)

f =  tensor(10.5000, grad_fn=<SubBackward0>)
grad =  tensor([[ 0., -2.],
        [ 5., -5.],
        [-2., 17.]], grad_fn=<AddBackward0>)
hess =  tensor([[[ 2., -3.],
         [-3.,  4.]],

        [[ 8., -3.],
         [-3.,  4.]],

        [[-4., -3.],
         [-3., 28.]]], grad_fn=<CopySlices>)
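
As a cross-check of the loop-based Hessian, one could use `torch.autograd.functional.hessian`. The sketch below assumes `ndof = 2` and uses a hypothetical vectorized helper `potential` that reproduces `my_func2` for this case; `k_vec`, `q_ref`, `pts`, `full`, and `blocks` are illustrative names. The function returns the full Hessian of shape `[ntraj, ndof, ntraj, ndof]`, from which the per-trajectory blocks are extracted.

```python
import torch
from torch.autograd.functional import hessian

k_vec = torch.tensor([1.0, 2.0])
q_ref = torch.tensor([0.0, 0.5])

def potential(q):
    # Hypothetical vectorized form of my_func2, valid for ndof = 2 and 2D q
    own = torch.sum(k_vec * (q - q_ref) ** 2 + k_vec ** 2 * q ** 3)
    cross = torch.sum(k_vec) * torch.sum(q[..., 0] * q[..., 1])
    return own - cross

pts = torch.tensor([[0.0, 0.0], [1.0, 0.0], [-1.0, 1.0]])

# Full Hessian with respect to all elements of pts: shape [3, 2, 3, 2]
full = hessian(potential, pts)

# Per-trajectory 2 x 2 blocks: shape [3, 2, 2]
blocks = torch.stack([full[n, :, n, :] for n in range(pts.shape[0])])

print(potential(pts))
print(blocks)
```

The off-diagonal blocks such as `full[0, :, 1, :]` vanish because different trajectories are not coupled in this function.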


**Exercise 1**: Generalize the above differentiation functions to the case of multiple batches

## 4. Misc functions

### 4.1. Safely extract the key-value pairs from dictionaries

If a key doesn't exist in the dictionary, extracting the corresponding value with `prms["key"]` 
raises a `KeyError`. One can use the `get` method instead, which returns a default value when the key is missing.
Let's see how it works:

In [38]:
def set_vars(prms):
    x = prms.get("x", 2.0)
    y = prms.get("y", 0.0)
    print(x,y)

As you can see below:
- the value of the `x` variable is extracted from the input dictionary
- the `y` key is not defined in the dictionary, so the default value is used

Also, note that the input dictionary `prms` isn't modified by the `get` method.

In [39]:
# Now use it
prms = {"x":-1.0}
print(prms)
set_vars(prms)
print(prms)

{'x': -1.0}
-1.0 0.0
{'x': -1.0}
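
The sketch below contrasts the three standard ways of reading a possibly missing key: plain indexing, `get`, and `setdefault`. The dictionary `prms_demo` and the other names are illustrative only.

```python
prms_demo = {"x": -1.0}

# Plain indexing raises KeyError for a missing key
try:
    prms_demo["y"]
    raised = False
except KeyError:
    raised = True

# get returns the default and leaves the dictionary unchanged
y_default = prms_demo.get("y", 0.0)
keys_after_get = list(prms_demo.keys())

# Without an explicit default, get returns None
y_none = prms_demo.get("y")

# setdefault also returns the default, but inserts the key into the dictionary
y_set = prms_demo.setdefault("y", 0.0)

print(raised, y_default, y_none, y_set, prms_demo)
```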


### 4.2. Generating tensors

In [40]:
torch.tensor([[-2]]*5)

tensor([[-2],
        [-2],
        [-2],
        [-2],
        [-2]])

In [41]:
torch.tensor([[[-2]]*2]*2)

tensor([[[-2],
         [-2]],

        [[-2],
         [-2]]])
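
For constant tensors like these, `torch.full` is an alternative that takes the shape directly instead of nested Python lists. In this sketch the dtype is set explicitly to match the integer tensors above; the variable names are illustrative.

```python
import torch

# Nested-list form, as in the cells above
a_list = torch.tensor([[-2]] * 5)             # shape [5, 1]
b_list = torch.tensor([[[-2]] * 2] * 2)       # shape [2, 2, 1]

# Shape-based form
a_full = torch.full((5, 1), -2, dtype=torch.int64)
b_full = torch.full((2, 2, 1), -2, dtype=torch.int64)

print(torch.equal(a_list, a_full), torch.equal(b_list, b_full))
```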