# PyTorch Tensors:
Tensors are the central data abstraction in PyTorch. This interactive notebook provides an in-depth introduction to the `torch.Tensor` class.

First, we import the PyTorch modules. We also import Python's math module for some comparative examples:

In [1]:
import torch
import math

## Creating Tensors:
The simplest way to create a tensor is with the `torch.empty()` call, which allocates a tensor with the specified dimensions:

In [2]:
x = torch.empty(3, 4)

print (type(x))
print (x)

<class 'torch.Tensor'>
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [2.8026e-45, 0.0000e+00, 1.1210e-44, 0.0000e+00]])


Let's unpack the code:
- We created a tensor using one of the factory methods attached to the `torch` module.
- The tensor itself is 2-dimensional, with 3 rows and 4 columns.
- The type of the object returned is `torch.Tensor`, which is an alias for `torch.FloatTensor`; by default, PyTorch tensors are populated with 32-bit floating point numbers.
- The printed values may differ from run to run because the `torch.empty()` call only allocates memory for the tensor and does not initialize it; the values shown are simply whatever was in memory at the time of allocation.

A brief note about tensors, their number of dimensions, and terminology:
- A 1-dimensional tensor is sometimes called a _vector_.
- A 2-dimensional tensor is sometimes called a _matrix_.
- Anything with more than 2 dimensions is generally just called a _tensor_.
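
As a quick illustration of this terminology, the sketch below builds one tensor of each kind and prints its number of dimensions. The variable names are chosen here for illustration only, and the 0-dimensional scalar case is included as an addition to the list above.

```python
import torch

scalar = torch.tensor(3.14)          # 0 dimensions: a single value
vector = torch.tensor([1., 2., 3.])  # 1 dimension: a vector
matrix = torch.zeros(3, 4)           # 2 dimensions: a matrix
cube = torch.zeros(2, 3, 4)          # 3 dimensions: generally just "a tensor"

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions; shape lists the extent of each one
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```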

More often than not, tensors are initialized with some values. We can initialize them with different values using different calls:

In [3]:
zeros = torch.zeros(3, 4)

ones = torch.ones(3, 4)

torch.manual_seed(1729)
rand = torch.rand(3, 4)

## Zeros matrix
print (f"This is a zeros matrix:")
print (zeros)
print ("\n")

print (f"This is a ones matrix:")
print (ones)
print ("\n")

print (f"This is a random matrix:")
print (rand)
print ("\n")

identity = torch.eye(4)

print (f"This is an identity matrix:")
print (identity)
print ("\n")

This is a zeros matrix:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


This is a ones matrix:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


This is a random matrix:
tensor([[0.3126, 0.3791, 0.3087, 0.0736],
        [0.4216, 0.0691, 0.2332, 0.4047],
        [0.2162, 0.9927, 0.4128, 0.5938]])


This is an identity matrix:
tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])




The factory methods do what we expect: `torch.zeros` and `torch.ones` create tensors of the specified shape filled with zeros and ones respectively. `torch.rand` populates the tensor with random numbers between 0 and 1. `torch.eye` takes one integer, `k`, as a parameter and returns a `k x k` identity matrix.

For `torch.rand` we can also specify seeds to facilitate reproducibility:

In [4]:
torch.manual_seed(1234)
r1 = torch.rand(4, 4)

r2 = torch.rand(4, 4)

torch.manual_seed(1234)
r3 = torch.rand(4, 4)

r4 = torch.rand(4, 4)

print ("This is a random 4x4 matrix with seed = 1234:")
print (r1,'\n')

print ("This is a random 4x4 matrix with no given seed:")
print (r2,'\n')

print ("This is another matrix but with the same seed = 1234:")
print (r3,'\n')

print ("This is another matrix without the seed:")
print (r4,'\n')

This is a random 4x4 matrix with seed = 1234:
tensor([[0.0290, 0.4019, 0.2598, 0.3666],
        [0.0583, 0.7006, 0.0518, 0.4681],
        [0.6738, 0.3315, 0.7837, 0.5631],
        [0.7749, 0.8208, 0.2793, 0.6817]]) 

This is a random 4x4 matrix with no given seed:
tensor([[0.2837, 0.6567, 0.2388, 0.7313],
        [0.6012, 0.3043, 0.2548, 0.6294],
        [0.9665, 0.7399, 0.4517, 0.4757],
        [0.7842, 0.1525, 0.6662, 0.3343]]) 

This is another matrix but with the same seed = 1234:
tensor([[0.0290, 0.4019, 0.2598, 0.3666],
        [0.0583, 0.7006, 0.0518, 0.4681],
        [0.6738, 0.3315, 0.7837, 0.5631],
        [0.7749, 0.8208, 0.2793, 0.6817]]) 

This is another matrix without the seed:
tensor([[0.2837, 0.6567, 0.2388, 0.7313],
        [0.6012, 0.3043, 0.2548, 0.6294],
        [0.9665, 0.7399, 0.4517, 0.4757],
        [0.7842, 0.1525, 0.6662, 0.3343]]) 



As we can see, `r1` and `r3` hold the same values, as do `r2` and `r4`. Manually resetting the seed before creating `r3` to the value used for `r1` makes the subsequent random draws return identical results.

More information on PyTorch reproducibility is available [here](https://pytorch.org/docs/stable/notes/randomness.html).
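
As an alternative to reseeding the global random state, the sketch below uses separate `torch.Generator` objects, which `torch.rand` accepts through its `generator` argument. The names `g1` and `g2` are illustrative; this is one possible approach, not something the cells above rely on.

```python
import torch

# Two independent generators seeded with the same value
g1 = torch.Generator().manual_seed(1234)
g2 = torch.Generator().manual_seed(1234)

a = torch.rand(2, 2, generator=g1)  # first draw from g1
b = torch.rand(2, 2, generator=g2)  # first draw from g2
c = torch.rand(2, 2, generator=g1)  # second draw from g1

print(torch.equal(a, b))  # same seed and same position in the stream
print(torch.equal(a, c))  # later draw from the same generator
```

Because each generator carries its own state, drawing from `g1` does not disturb the sequence produced by `g2` or by the global seed.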

## Tensor Shape:
Often when performing operations on two or more tensors, they need to be of compatible _shape_ - that is, having the same shape for additions and subtractions, or the right dimensions for matrix multiplications or dot products. To create tensors with the same shape as an existing one, we have the `torch.*_like()` methods.

In [5]:
x = torch.empty(2, 2, 3)
print (x.shape)
print (x)

empty_like_x = torch.empty_like(x)
print (empty_like_x.shape)
print (empty_like_x)

zeros_like_x = torch.zeros_like(x)
print (zeros_like_x.shape)
print (zeros_like_x)

ones_like_x = torch.ones_like(x)
print (ones_like_x.shape)
print (ones_like_x)

rand_like_x = torch.rand_like(x)
print (rand_like_x.shape)
print (rand_like_x)


torch.Size([2, 2, 3])
tensor([[[0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.]]])
torch.Size([2, 2, 3])
tensor([[[0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.]]])
torch.Size([2, 2, 3])
tensor([[[0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.]]])
torch.Size([2, 2, 3])
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])
tensor([[[0.7893, 0.3216, 0.5247],
         [0.6688, 0.8436, 0.4265]],

        [[0.9561, 0.0770, 0.4108],
         [0.0014, 0.5414, 0.6419]]])


In the above cell, we first defined the shape of an empty tensor `x`. We then used `torch.empty_like(x)`, `torch.zeros_like(x)`,  `torch.ones_like(x)` and `torch.rand_like(x)` to make tensors with the same dimensions.

Another way to create tensors is to specify the data directly from a Python collection:

In [6]:
some_constants = torch.tensor([[3.1415926, 2.71828], [1.61803, 0.0072897]])
print (some_constants)

some_integers = torch.tensor((2, 3, 5, 7, 11, 13, 17, 19))
print (some_integers)

more_integers = torch.tensor(((2, 4, 6), [3, 6, 9]))
print (more_integers)

tensor([[3.1416, 2.7183],
        [1.6180, 0.0073]])
tensor([ 2,  3,  5,  7, 11, 13, 17, 19])
tensor([[2, 4, 6],
        [3, 6, 9]])


Using `torch.tensor()` is the most straightforward way to create a tensor if we already have a Python tuple or list.

Note: `torch.tensor()` creates a copy of the data; later changes to the source collection do not affect the tensor.
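
A minimal sketch of the copy behaviour, using an illustrative nested list named `data`: after the tensor is created, modifying the list leaves the tensor untouched.

```python
import torch

data = [[1, 2], [3, 4]]
t = torch.tensor(data)   # the values are copied into the tensor's own memory

data[0][0] = 99          # change the original Python list
print(data)              # the list shows the new value
print(t)                 # the tensor still holds the original values
```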

## Tensor Data Types:
Setting the tensor datatype is possible in a number of ways:

In [7]:
a = torch.ones((2, 3), dtype=torch.int16)

b = torch.rand((2, 3), dtype=torch.float64) * 20

c = b.to(torch.int32)

d = torch.rand(2, 3)

print ("Specify datatype as 16 bit integer:")
print (a)
print (a.dtype)

print ("\nSpecify datatype as 64 bit floating point number:")
print (b)
print (b.dtype)

print ("\nSpecify datatype as conversion to 32 bit integer:")
print (c)
print (c.dtype)

print ("\nDefault datatype (32 bit floating point number):")
print (d)
print (d.dtype)


Specify datatype as 16 bit integer:
tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)
torch.int16

Specify datatype as 64 bit floating point number:
tensor([[7.6155, 7.0313, 1.4186],
        [0.4486, 9.4176, 6.3867]], dtype=torch.float64)
torch.float64

Specify datatype as conversion to 32 bit integer:
tensor([[7, 7, 1],
        [0, 9, 6]], dtype=torch.int32)
torch.int32

Default datatype (32 bit floating point number):
tensor([[0.1100, 0.2541, 0.4333],
        [0.4451, 0.4966, 0.7865]])
torch.float32


The simplest way to set datatypes is with an optional argument at creation time. In `a` we define `dtype=torch.int16`, resulting in the values being `1` instead of `1.`.

While we group the dimensions of the tensor as a tuple, this is not strictly necessary. PyTorch takes the series of unlabeled integers as arguments for tensor shape, but adding brackets as formatting facilitates reading.

Another way to set the datatype is with the `.to()` method. In the cell above, we create a random 64-bit float tensor `b` and convert the values to 32 bit integers with the `.to()` method.

If unspecified, PyTorch sets the datatypes to 32-bit float as in `d`.

Available datatypes include:
- `torch.bool`
- `torch.int8`
- `torch.int16`
- `torch.int32`
- `torch.int64`
- `torch.half`
- `torch.float`
- `torch.double`
- `torch.bfloat16`
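
Several of these names are aliases for explicitly sized types. The sketch below checks the aliases and prints the storage size per element for each type, using `Tensor.element_size()`; the loop and its formatting are for illustration only.

```python
import torch

# Short names are aliases for the explicitly sized floating point types
print(torch.half == torch.float16)
print(torch.float == torch.float32)
print(torch.double == torch.float64)

dtypes = (torch.bool, torch.int8, torch.int16, torch.int32, torch.int64,
          torch.float16, torch.bfloat16, torch.float32, torch.float64)

for dtype in dtypes:
    t = torch.zeros(1, dtype=dtype)
    # element_size() reports the number of bytes used by one element
    print(f"{str(dtype):16s} {t.element_size()} byte(s) per element")
```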

## Math and Logic with PyTorch Tensors
After creating tensors, we need to manipulate them. For demonstration, we create some tensors using operations:

In [8]:
ones = torch.zeros(2,2) + 1
twos = torch.ones(2,2) * 2
threes = (torch.ones(2,2) * 7 - 1)/2
fours = twos ** 2
sqrt2s = twos ** 0.5

print (ones)
print (twos)
print (threes)
print (fours)
print (sqrt2s)

alt_threes = ones + twos
alt_fours = threes + (twos/twos)

print (alt_threes)
print (alt_fours)

tensor([[1., 1.],
        [1., 1.]])
tensor([[2., 2.],
        [2., 2.]])
tensor([[3., 3.],
        [3., 3.]])
tensor([[4., 4.],
        [4., 4.]])
tensor([[1.4142, 1.4142],
        [1.4142, 1.4142]])
tensor([[3., 3.],
        [3., 3.]])
tensor([[4., 4.],
        [4., 4.]])


As demonstrated, arithmetic operations between tensors and scalars, such as addition, multiplication, division, and exponentiation, are distributed over all elements of the tensor. Similar operations between two tensors are applied element by element, as we would intuitively expect.

It is important to note that the tensors in these examples are all of the same shape. Performing arithmetic operations on tensors of dissimilar shapes generally produces errors.

The exception to the same-shape rule is with _tensor broadcasting_. Here is an example:

In [9]:
rand = torch.rand(2, 4)

doubled  = rand * (torch.ones(1, 4)*2)

print (rand)
print (doubled)

tensor([[0.6604, 0.1303, 0.3498, 0.3824],
        [0.8043, 0.3186, 0.2908, 0.4196]])
tensor([[1.3208, 0.2606, 0.6996, 0.7648],
        [1.6086, 0.6372, 0.5816, 0.8392]])


The trick to broadcasting is:
- Each tensor must have at least 1 dimension.
- Comparing the dimension sizes of the two tensors, _going from last to first_
    - Each dimension must be equal, _or_
    - One of the dimensions must be size 1, _or_
    - The dimension does not exist in one of the tensors

Tensors of identical shape are trivially "broadcastable", as in the earlier examples.
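
The rules above can be sketched as a small pure-Python function. `broadcast_shape` is a hypothetical helper written for this illustration, not part of PyTorch; its result is compared against the library's own `torch.broadcast_shapes`, assuming a PyTorch version recent enough to provide it.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or None if incompatible."""
    result = []
    # Compare dimension sizes going from last to first; a dimension that
    # does not exist in the shorter shape is treated as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None  # sizes differ and neither is 1
    return tuple(reversed(result))

print(broadcast_shape((2, 4), (1, 4)))
print(broadcast_shape((4, 3, 2), (3, 1)))
print(broadcast_shape((4, 3, 2), (4, 3)))   # incompatible shapes

# Cross-check one case against PyTorch itself
print(tuple(torch.broadcast_shapes((4, 3, 2), (3, 1))))
```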

Here are some other examples of broadcasting:

In [10]:
a = torch.ones(4, 3, 2)

b = a * torch.rand(3, 2) # second and third dimensions are identical to a, and dim 1 absent

c = a * torch.rand(3, 1) # third dimension is 1, second dimension identical to a, and dim 1 absent

d = a * torch.rand(1, 2) # third dimension is identical to a, second dimension is 1, and dim 1 absent

print (a.shape)
print (b.shape)
print (c)
print (d.shape)

torch.Size([4, 3, 2])
torch.Size([4, 3, 2])
tensor([[[0.1880, 0.1880],
         [0.5174, 0.5174],
         [0.7849, 0.7849]],

        [[0.1880, 0.1880],
         [0.5174, 0.5174],
         [0.7849, 0.7849]],

        [[0.1880, 0.1880],
         [0.5174, 0.5174],
         [0.7849, 0.7849]],

        [[0.1880, 0.1880],
         [0.5174, 0.5174],
         [0.7849, 0.7849]]])
torch.Size([4, 3, 2])


If we look at the values of each tensor above:
- The multiplication operation that created `b` was broadcast over every "layer" of `a`. (dim 1)
- For `c`, the operation was broadcast over every layer and row of `a` - every 3-element column is identical. (dim 1, 3)
- For `d`, the operation was switched around, making every row identical. (dim 1, 2)

For more information on broadcasting, more detailed documentation is available [here](https://pytorch.org/docs/stable/notes/broadcasting.html).

Here are some examples where broadcasting will fail:

In [11]:
## Uncomment to see examples:
# a = torch.ones(4, 3, 2)

# b = a * torch.rand(4, 3) # dimensions must match last to first

# c = a * torch.rand(2, 3) # both 3rd and 2nd dimensions differ

# d = a * torch.rand((0, )) # empty tensors cannot be broadcast
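
To see these failures without interrupting the notebook, one option is to wrap each multiplication in a `try`/`except`. This is a sketch: the `failures` list is an illustrative name, and the exact error message text may vary between PyTorch versions, so only the exception type is printed.

```python
import torch

a = torch.ones(4, 3, 2)
failures = []

for shape in [(4, 3), (2, 3), (0,)]:
    try:
        a * torch.rand(shape)
    except RuntimeError as err:
        # Incompatible shapes are reported as a RuntimeError
        failures.append(shape)
        print(f"shape {shape} failed with {type(err).__name__}")
```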

## More Math with Tensors
PyTorch provides over three hundred operations for tensor manipulation. A more detailed inventory is available [here](https://pytorch.org/docs/stable/torch.html#math-operations).

Here are some sample operations often encountered:

In [12]:
# Common functions:
a = torch.rand(2, 4) * 2 - 1
print ("\nCommon functions:")
print (torch.abs(a))
print (torch.ceil(a))
print (torch.floor(a))
print (torch.clamp(a, -0.5, 0.5))

# Trigonometric functions and their inverse
angles = torch.tensor([0, math.pi/4, math.pi/2, 3*math.pi/4])
sines = torch.sin(angles)
inverses = torch.asin(sines)
print ("\nSine and Arcsine:")
print (angles)
print (sines)
print (inverses)

# Comparisons:
print ("\nBroadcasted, element-wise equality comparison:")
d = torch.tensor(([1., 2.], [3., 4.]))
e = torch.ones(1, 2) # many comparison operations support broadcasting
print (torch.eq(d, e)) # returns a tensor with bool type

# Reductions:
print ("\nReduction operations:")
print (torch.max(d))           # returns a single-element tensor
print (torch.max(d).item())    # extracts the value from the returned tensor
print (torch.mean(d))          # average
print (torch.std(d))           # standard deviation
print (torch.prod(d))          # product of all numbers
print (torch.unique(torch.tensor([1,2,1,2,1,2,1,2,1,2]))) # filters for unique numbers

# Vector and linear algebra operations:
v1 = torch.tensor([1., 0., 0.])
v2 = torch.tensor([0., 1., 0.])
m1 = torch.rand(2,2)
m2 = torch.eye(2)*3

print("\nVectors & matrices:")
print (torch.cross(v2, v1)) # negative of z unit vector (v1 x v2 == -v2 x v1)
print (m1)
m3 = torch.matmul(m1, m2)
print (m3) # 3 times m1
print (torch.svd(m3)) # singular value decomposition


Common functions:
tensor([[0.4183, 0.6450, 0.1114, 0.7541],
        [0.9275, 0.5390, 0.9243, 0.5521]])
tensor([[1., -0., -0., -0.],
        [1., 1., -0., -0.]])
tensor([[ 0., -1., -1., -1.],
        [ 0.,  0., -1., -1.]])
tensor([[ 0.4183, -0.5000, -0.1114, -0.5000],
        [ 0.5000,  0.5000, -0.5000, -0.5000]])

Sine and Arcsine:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 0.7854])

Broadcasted, element-wise equality comparison:
tensor([[ True, False],
        [False, False]])

Reduction operations:
tensor(4.)
4.0
tensor(2.5000)
tensor(1.2910)
tensor(24.)
tensor([1, 2])

Vectors & matrices:
tensor([ 0.,  0., -1.])
tensor([[0.6772, 0.5274],
        [0.6325, 0.0910]])
tensor([[2.0317, 1.5822],
        [1.8975, 0.2729]])
torch.return_types.svd(
U=tensor([[-0.8142, -0.5805],
        [-0.5805,  0.8142]]),
S=tensor([3.1125, 0.7864]),
V=tensor([[-0.8854,  0.4648],
        [-0.4648, -0.8854]]))


## Altering Tensors in Place:
Most binary operations on tensors will return a third new tensor. When we say `c = a * b` (where `a` and `b` are tensors), the new tensor `c` will occupy a region of memory distinct from the other tensors.

There are times, though, that we may wish to alter tensors in place - for example, if we do element-wise computation where we can discard intermediate values. For this, most of the math functions have a version with an appended underscore (\_) that will alter the tensor in place.

In [13]:
a = torch.tensor([0, math.pi/4, math.pi/2, 3*math.pi/4])
print ('a:')
print (a)
print (torch.sin(a)) # this operation creates a new tensor in memory
print ('After operation:')
print (a) # a is unchanged

b = torch.tensor([0, math.pi/4, math.pi/2, 3*math.pi/4])
print ('b:')
print (b)
print (torch.sin_(b)) # this operation modifies the old tensor in-place
print ('After operation:')
print (b) # b is changed

a:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
After operation:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
b:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
After operation:
tensor([0.0000, 0.7071, 1.0000, 0.7071])


For arithmetic operations, there are functions that behave similarly:

In [14]:
a = torch.ones(2,2)
b = torch.rand(2,2)

print ('Before:')
print (a, '\n', b)
print ('\nAfter adding:')
print (a.add_(b)) # in-place addition of b to a
print (a, '\n', b)
print ('\nAfter multiplying:')
print (a.mul_(b)) # in-place multiplication of b to a
print (a, '\n', b)

Before:
tensor([[1., 1.],
        [1., 1.]]) 
 tensor([[0.2323, 0.7269],
        [0.1187, 0.3951]])

After adding:
tensor([[1.2323, 1.7269],
        [1.1187, 1.3951]])
tensor([[1.2323, 1.7269],
        [1.1187, 1.3951]]) 
 tensor([[0.2323, 0.7269],
        [0.1187, 0.3951]])

After multiplying:
tensor([[0.2862, 1.2552],
        [0.1329, 0.5512]])
tensor([[0.2862, 1.2552],
        [0.1329, 0.5512]]) 
 tensor([[0.2323, 0.7269],
        [0.1187, 0.3951]])


Note that these in-place arithmetic functions are methods on the `torch.Tensor` object, not attached to the `torch` module like many other functions (e.g. `torch.sin()`). As we can see from `a.add_(b)`, _the calling tensor_ is the one that gets changed in place.
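
One way to confirm that an in-place operation reuses the calling tensor's memory is to compare memory addresses with `Tensor.data_ptr()`. The sketch below is illustrative; `ptr_before` is a name introduced here.

```python
import torch

a = torch.ones(2, 2)
b = torch.rand(2, 2)

ptr_before = a.data_ptr()   # address of the first element of a

c = a + b                   # out-of-place: the result lives in new memory
a.add_(b)                   # in-place: the result is written into a

print(c.data_ptr() == ptr_before)  # c does not share a's memory
print(a.data_ptr() == ptr_before)  # a still occupies the same memory
print(torch.equal(a, c))           # both hold the same values
```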

There is another option for placing the results of a computation in an existing, allocated tensor. Many of the methods and functions we've seen so far - including creation methods - have an `out` argument that allows specification of an output tensor. If the `out` tensor is the correct shape and `dtype`, this can happen without new memory allocation.

In [15]:
a = torch.rand(2, 2)
b = torch.rand(2, 2)
c = torch.zeros(2, 2)
old_id = id(c)

print(c)
d = torch.matmul(a, b, out=c)
print(c)                # contents of c have changed

assert c is d           # test c & d are same object, not just containing equal values
assert id(c) == old_id  # make sure that our new c is the same object as the old one

torch.rand(2, 2, out=c) # works for creation too!
print(c)                # c has changed again
assert id(c) == old_id  # still the same object!

tensor([[0., 0.],
        [0., 0.]])
tensor([[0.7960, 0.9876],
        [0.6180, 0.8033]])
tensor([[0.9874, 0.7316],
        [0.2814, 0.0651]])


## Copying Tensors:
As with any object in Python, assigning a tensor to a variable makes the variable a _label_ of the tensor, and does not copy it. For example:

In [16]:
a = torch.ones(2,2)
b = a

a[0][1] = 561 # changes made to a are also seen through b
print (b) # b is only a label for the same tensor object, not an independent tensor

tensor([[  1., 561.],
        [  1.,   1.]])


To create a separate copy, the `clone()` method can be used:

In [17]:
a = torch.ones(2,2)
b = a.clone()

assert b is not a

a[1][1] = 200
print ('a:')
print (a)
print ('b:')
print (b)

a:
tensor([[  1.,   1.],
        [  1., 200.]])
b:
tensor([[1., 1.],
        [1., 1.]])


__An important note on the__ `clone()` __method.__ If the source tensor has autograd enabled, then so will the clone. In many cases, this will be beneficial. For example, if the model has multiple computation paths in the `forward()` method, and _both_ the source and clone contribute to the model output, then enabling autograd in both is necessary for learning. If the source already has autograd enabled, then enabling autograd for the clone is trivial.

On the other hand, if we're doing a computation where _neither_ the original tensor nor its clone needs to track gradients, then as long as the source tensor has autograd turned off, there are no problems.

Issues arise when we compute using the model's `forward()` function, where gradients are turned on by default, but want to pull out values mid-stream to generate metrics. In this case, we _don't_ want to turn off autograd for the source, but need autograd to be off for the cloned tensors. For this, we can use the `.detach()` method on the source tensor:

In [18]:
a = torch.rand(2, 2, requires_grad = True) # turn on autograd
print (a)

b = a.clone()
print (b)

c = a.detach().clone()
print (c)

print (a)

tensor([[0.0065, 0.5035],
        [0.3082, 0.3742]], requires_grad=True)
tensor([[0.0065, 0.5035],
        [0.3082, 0.3742]], grad_fn=<CloneBackward0>)
tensor([[0.0065, 0.5035],
        [0.3082, 0.3742]])
tensor([[0.0065, 0.5035],
        [0.3082, 0.3742]], requires_grad=True)


What's happening here?
- We create `a`, with `requires_grad=True` turned on. 
- When we print `a`, it informs us that the property `requires_grad=True` - this means that autograd and computation history tracking are turned on
- We clone `a` and label it `b`. When we print `b`, we can see that it's tracking its computation history - it has inherited `a`'s autograd settings, and added to the computation history.
- We clone `a` to `c`, but we call `.detach()` first.
- Printing `c`, we see no computation history, and no `requires_grad=True`.

The `detach()` method _detaches the tensor from its computation history_. It says, "do what comes next as if autograd was off". It does this _without_ changing the original properties, so when we print `a` again after cloning `c`, we see that `requires_grad=True` remains unchanged.
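
The same behaviour can be checked programmatically rather than by reading the printed output. The sketch below inspects `requires_grad` and uses `data_ptr()` to show that `detach()` on its own shares memory with the source, which is why `clone()` follows it when an independent copy is wanted.

```python
import torch

a = torch.rand(2, 2, requires_grad=True)
b = a.clone()            # tracked by autograd
c = a.detach().clone()   # independent copy with no computation history

print(b.requires_grad)   # the clone inherits autograd tracking
print(c.requires_grad)   # the detached clone does not

# detach() alone returns a tensor that shares memory with a
print(a.detach().data_ptr() == a.data_ptr())
print(c.data_ptr() == a.data_ptr())

# Gradients flow back through the clone to the source tensor
b.sum().backward()
print(a.grad)
```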

## Moving to GPU:
One of the major advantages of PyTorch is its robust acceleration on CUDA-compatible Nvidia GPUs. ("CUDA" stands for _Compute Unified Device Architecture_, which is Nvidia's platform for parallel computing.) So far, everything has been done on the CPU. For heavier computations, moving to a GPU is more efficient.

First, we need to check if a CUDA-compatible GPU is available:

In [19]:
if torch.cuda.is_available():
    print ('CUDA device available.')
else:
    print ('No CUDA device.')

CUDA device available.


Once we've determined that one or more GPUs are available, we need to move our data to where the GPU can see it. CPU computations access data in RAM. GPUs have dedicated memory attached. Data must be moved from RAM to GPU memory before it can be processed by the GPU.

There are multiple ways to get data onto the target device. We can do it at creation time:

In [20]:
if torch.cuda.is_available():
    gpu_rand = torch.rand(2,2, device='cuda')
    print (gpu_rand)
else:
    print ('No CUDA device.')

tensor([[0.1272, 0.8167],
        [0.5440, 0.6601]], device='cuda:0')


By default, new tensors are created on the CPU, so we have to specify when we want to create our tensor on the GPU with the optional `device` argument. You can see when we print the new tensor, PyTorch informs us which device it's on (if not on CPU).

We can query the number of GPUs available with `torch.cuda.device_count()`. If we have more than one GPU, we can specify them by index: `device = 'cuda:0'`, `device = 'cuda:1'`, etc.

As a coding practice, specifying our devices everywhere with string constants is pretty fragile. In an ideal world, code would perform robustly whether on a CPU or a GPU hardware. We can do this by creating a device handle that can be passed to tensors instead of a string:

In [21]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print (f'Device is "{device}".')

x = torch.rand(2,2, device=device)
print (x)

Device is "cuda".
tensor([[0.6208, 0.0276],
        [0.3255, 0.1114]], device='cuda:0')


If we have an existing tensor already on one device, we can move it to another using the `to()` method. The following code creates a tensor at the default CPU device, then moves it to the device handle we defined in the previous cell.

In [22]:
y = torch.rand(2, 2)
print (y)
y = y.to(device)
print (f'y device is : {y.device}')

y_cpu = y.clone()
y_cpu = y_cpu.to("cpu")
print (f'y_cpu device is: {y_cpu.device}')

tensor([[0.4297, 0.9729],
        [0.9739, 0.4533]])
y device is : cuda:0
y_cpu device is: cpu


It is important to note that computations can only be done if the device has access to both tensors in memory. That is, both tensors have to be on the same CUDA device, or both on the CPU, to interact with each other.
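
A minimal sketch of keeping operands on the same device: before combining two tensors, the one created on the CPU is moved to wherever the other lives. On a machine without a GPU both tensors are already on the CPU and the move is skipped. The variable names are illustrative.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.rand(2, 2, device=device)  # lives on the GPU if one is available
y = torch.rand(2, 2)                 # created on the CPU by default

# Move y next to x before combining them; mixing devices in one
# operation raises a RuntimeError
if y.device != x.device:
    y = y.to(x.device)

z = x + y
print(z.device)
```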

## Manipulating Tensor Shapes:
Sometimes, we need to change the shapes of our tensors. Below, we'll look at a few common cases, and how to handle them.

### Changing the Number of Dimensions:
One case where we might need to change the number of dimensions is passing a single instance of input to the model; one picture from a batch, or one example from a set. PyTorch models generally expect _batches_ of inputs.

For example, imagine a model that works on 226x226 pixel colour images. When we load and transform one, we get a tensor of shape `(3, 226, 226)` for each picture fed into the model. The model, however, expects the batch to have the shape `(N, 3, 226, 226)`, where `N` is the number of items in the batch. To make a batch of one, we use the `unsqueeze` method:

In [23]:
a = torch.rand(3, 226, 226)
b = a.unsqueeze(0)

print (a.shape)
print (b.shape)

torch.Size([3, 226, 226])
torch.Size([1, 3, 226, 226])


The `unsqueeze()` method adds a dimension of extent 1. Specifying `unsqueeze(0)` adds it as a new dimension zero, pushing the rest back by one. This gives us a batch with one item of 3x226x226.

So if that is _unsqueezing_, what is _squeezing_? `squeeze` takes advantage of the fact that any dimension of extent 1 _does not_ change the number of elements in the tensor.

In [24]:
c = torch.rand(1,1,1,1,1,1)
print (c)

tensor([[[[[[0.5017]]]]]])
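
To make this concrete, the sketch below counts the elements of such a tensor and then calls `squeeze()` with no arguments, which removes every dimension of extent 1. The name `squeezed` is illustrative.

```python
import torch

c = torch.rand(1, 1, 1, 1, 1, 1)
print(c.shape)
print(c.numel())                    # six dimensions, but only one element

squeezed = c.squeeze()              # drop all dimensions of extent 1
print(squeezed.shape)               # a 0-dimensional tensor remains
print(squeezed.item() == c.item())  # the stored value is unchanged
```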


Continuing the example above, let's say the model's output is a 20-element vector for each input. We would then expect the output to have a shape `(N, 20)`, where `N` is the number of instances in the input batch. That means that for our single-input batch, we'll get an output shape of `(1, 20)`.

What if we want to do some _non-batched_ computation with a single output - operations that expect a 20-element vector?

In [25]:
a = torch.rand(1, 20)
print (f'shape of a is: {a.shape}')
print ('a:\n', a)

b = a.squeeze(0)
print (f'shape of b is: {b.shape}')
print ('b:\n', b)

c = torch.rand(2, 2)
print (f'shape of c is: {c.shape}')

d = c.squeeze(0)
print (f'shape of d is: {d.shape}')

shape of a is: torch.Size([1, 20])
a:
 tensor([[0.8647, 0.8954, 0.4120, 0.2145, 0.1577, 0.3815, 0.1463, 0.5738, 0.1817,
         0.0133, 0.2362, 0.0667, 0.4781, 0.5967, 0.2972, 0.1862, 0.7488, 0.0809,
         0.3406, 0.7557]])
shape of b is: torch.Size([20])
b:
 tensor([0.8647, 0.8954, 0.4120, 0.2145, 0.1577, 0.3815, 0.1463, 0.5738, 0.1817,
        0.0133, 0.2362, 0.0667, 0.4781, 0.5967, 0.2972, 0.1862, 0.7488, 0.0809,
        0.3406, 0.7557])
shape of c is: torch.Size([2, 2])
shape of d is: torch.Size([2, 2])


We can see from the shape that the 2-dimensional tensor `a` is now a 1-dimensional tensor `b`. The square brackets `[]` in the printed output denote the dimensions, with `a` having an extra set that `b` lacks.

`squeeze()` is only usable with dimensions of extent 1. Calling the `squeeze()` method on a dimension of size 2 in `c` returns a tensor with the original shape. 

Another way to use the `unsqueeze()` method is to ease broadcasting:

In [26]:
a = torch.ones(4,3,2)

c = a * torch.rand(3, 1)
print (c)

tensor([[[0.1740, 0.1740],
         [0.6526, 0.6526],
         [0.0788, 0.0788]],

        [[0.1740, 0.1740],
         [0.6526, 0.6526],
         [0.0788, 0.0788]],

        [[0.1740, 0.1740],
         [0.6526, 0.6526],
         [0.0788, 0.0788]],

        [[0.1740, 0.1740],
         [0.6526, 0.6526],
         [0.0788, 0.0788]]])


In [27]:
a = torch.ones(4, 3, 2)
b = torch.rand(3)
print (a.shape)
print (b.shape)

b = b.unsqueeze(1)
print (b.shape)

c = a*b
print (c)

torch.Size([4, 3, 2])
torch.Size([3])
torch.Size([3, 1])
tensor([[[0.1818, 0.1818],
         [0.0603, 0.0603],
         [0.5349, 0.5349]],

        [[0.1818, 0.1818],
         [0.0603, 0.0603],
         [0.5349, 0.5349]],

        [[0.1818, 0.1818],
         [0.0603, 0.0603],
         [0.5349, 0.5349]],

        [[0.1818, 0.1818],
         [0.0603, 0.0603],
         [0.5349, 0.5349]]])


The `squeeze()` and `unsqueeze()` methods have in-place versions `squeeze_()` and `unsqueeze_()`.

In [28]:
x = torch.rand(3, 3)
x.unsqueeze_(0)
print (x.shape)

torch.Size([1, 3, 3])


Sometimes we'll want to more radically change the shapes of tensors, while still preserving the number of elements and their contents. One case where this happens is at the interface between a convolutional layer and the linear layer of a model; this is common in image classification systems. A convolutional kernel yields an output tensor of shape _features x width x height_, but the following linear layer expects a 1-dimensional input. The `reshape()` method is the solution, provided that the dimensions we request yield the same number of elements as the input tensors:

In [29]:
output3d = torch.rand(3, 50, 50)
print (output3d.shape)

input1d = output3d.reshape(3*50*50)
print (input1d.shape)

print (torch.reshape(output3d, (3*50*50,)).shape)

torch.Size([3, 50, 50])
torch.Size([7500])
torch.Size([7500])


Note: the `(3*50*50,)` argument in the last line of the cell above is written that way because PyTorch expects a __tuple__ when specifying a tensor shape. When the shape is the first argument of a method, we can cheat and pass a series of integers instead. Here we add the parentheses and comma to make it clear to the function that this is a one-element tuple.
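
When a batch dimension is present, the same flattening is usually done per item. The sketch below assumes a hypothetical batch of 8 convolutional outputs and uses `-1` so that PyTorch infers the flattened size; `torch.flatten` with `start_dim=1` is shown as an equivalent alternative.

```python
import torch

batch = torch.rand(8, 3, 50, 50)           # hypothetical batch of 8 outputs

# Keep the batch dimension and let -1 stand for "whatever size is left"
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)

# Equivalent: flatten every dimension from index 1 onwards
print(torch.flatten(batch, start_dim=1).shape)
```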

When it can, `reshape()` will return a _view_ on the tensor being changed - that is, a separate tensor object looking at the same underlying region of the memory. _This is important_: That means any change made to the source tensor will be reflected in the view on that tensor, unless we `clone()` it.

There _are_ conditions beyond the scope of this introduction where `reshape()` has to return a tensor carrying a copy of the data. More information can be found [here](https://pytorch.org/docs/stable/torch.html#torch.reshape).
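
The difference between a view and a copy can be observed with `data_ptr()`. In the sketch below, reshaping a contiguous tensor yields a view, while reshaping a transposed (non-contiguous) tensor is one case where a copy is returned. This is an illustration of the behaviour, not a complete account of when copies occur.

```python
import torch

x = torch.zeros(2, 3)

view = x.reshape(6)                     # contiguous source: a view
x[0, 0] = 1.
print(view[0])                          # the change to x is visible in the view
print(view.data_ptr() == x.data_ptr())  # same underlying memory

transposed = x.t()                      # non-contiguous layout over x's memory
copied = transposed.reshape(6)          # cannot be a view, so data is copied
print(copied.data_ptr() == x.data_ptr())

x[0, 0] = 2.
print(view[0])                          # the view follows the source
print(copied[0])                        # the copy keeps the old value
```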

## NumPy Bridge:
PyTorch's broadcasting semantics are compatible with NumPy's - but the kinship between PyTorch and NumPy goes even deeper than that.

If we have existing ML or scientific code with data stored in NumPy ndarrays, we may wish to express that same data as PyTorch tensors, whether to take advantage of PyTorch's GPU acceleration or of its efficient abstractions for building ML models. It's easy to switch between ndarrays and PyTorch tensors:

In [30]:
import numpy as np

numpy_array = np.ones((2,3))
print (numpy_array)

pytorch_tensor = torch.from_numpy(numpy_array)
print (pytorch_tensor)

[[1. 1. 1.]
 [1. 1. 1.]]
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)


PyTorch creates a tensor of the same shape and containing the same data as the NumPy array, going as far as to keep NumPy's default 64-bit float data type.

The conversion can just as easily go the other way:

In [31]:
pytorch_rand = torch.rand(2, 3)
print (pytorch_rand)

numpy_rand = pytorch_rand.numpy()
print (numpy_rand)

tensor([[0.8216, 0.8794, 0.8226],
        [0.7592, 0.7540, 0.8187]])
[[0.82161367 0.87936646 0.82257426]
 [0.7592339  0.75399923 0.818666  ]]


It is important to know that these converted objects are using _the same underlying memory_ as their source objects, meaning that changes to one are reflected in the other:

In [32]:
numpy_array[1,1] = 23
print (pytorch_tensor)

pytorch_rand[1,1] = 17
print (numpy_rand)

tensor([[ 1.,  1.,  1.],
        [ 1., 23.,  1.]], dtype=torch.float64)
[[ 0.82161367  0.87936646  0.82257426]
 [ 0.7592339  17.          0.818666  ]]
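
When shared memory is not wanted, an explicit copy breaks the link. The sketch below uses `clone()` on the tensor side and NumPy's `copy()` on the array side; the names `shared`, `independent`, and `arr_copy` are illustrative.

```python
import numpy as np
import torch

numpy_array = np.ones((2, 3))
shared = torch.from_numpy(numpy_array)               # shares memory
independent = torch.from_numpy(numpy_array).clone()  # owns its own memory

numpy_array[1, 1] = 23
print(shared[1, 1].item())       # follows the change to the array
print(independent[1, 1].item())  # unaffected

t = torch.rand(2, 3)
arr_copy = t.numpy().copy()      # copy of the data as an ndarray
t[0, 0] = 17
print(arr_copy[0, 0] == 17)      # the copy did not follow the tensor
```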
