# Getting started with tensors in PyTorch

In [1]:
import torch

# Data representation in neural networks

Modern machine-learning systems use *tensors* as their basic data structure. Tensors are fundamentally containers for data. You will already have used matrices, which are examples of two-dimensional tensors. A tensor's *rank* is its number of axes (a matrix, for example, has two).

# Tensors

*  Scalars (0D tensors). A tensor that contains only one number and has no axes.
*  Vectors (1D tensors). An array of numbers is a vector or 1D tensor, and has one axis.
*  Matrices (2D tensors). The two axes are referred to as *rows* and *columns*.
*  3D tensors and higher. Packing 2D matrices into a new array gives you a 3D tensor (and so on to higher dimensions). You will end up spending a lot of time making sure your tensors are the right shape when debugging your deep networks... :-)

* Manipulating tensors in NumPy
* Data batches
* Examples of training data used, in each case with $N=$ `samples` examples in the set:
   * Vector data - 2D tensors shaped `(samples, features)`
   * Time-series/sequence data - 3D tensors shaped `(samples, timesteps, features)`
   
   ![alt text](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRrnL1abPz6Nkugh29zNvaq-L_KqAvJAIBJQdr_dpi4Khpnrn_sww)
   * Image data -- 4D tensors shaped `(samples, height, width, channels)`
   
   ![alt text](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRbVhpi2ZiSb7fl0Q8YMXBhMQ7Ny-rwHG82TEpjgIm8yKCfiV0r)
   * Video data - 5D tensors shaped `(samples, frames, height, width, channels)`. Think about the memory requirements of these tensors. How much memory would you need for a 60 second video at 256x256 sampled 30 times a second?
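
As a rough worked answer to the question above, the sketch below counts the values in one such clip. It assumes three colour channels and no compression, and compares 8-bit integers with 32-bit floats; none of these assumptions comes from the text, so treat the numbers as an order-of-magnitude estimate.

```python
# Rough memory estimate for one video clip shaped (frames, height, width, channels).
# Assumptions (not from the text): 3 colour channels, uncompressed storage.
seconds, fps = 60, 30
height, width, channels = 256, 256, 3

frames = seconds * fps                       # number of frames in the clip
values = frames * height * width * channels  # total number of stored values

bytes_uint8 = values * 1                     # one byte per value
bytes_float32 = values * 4                   # four bytes per value

print(f"values:  {values:,}")
print(f"uint8:   {bytes_uint8 / 2**20:.1f} MiB")
print(f"float32: {bytes_float32 / 2**30:.2f} GiB")
```
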
---
# Tensor operations

* Element-wise operations - e.g. the activation function `relu()`, when implemented as `torch.clamp(z, min=0.0)`, is applied independently to each entry of the tensor (rather than using a `for` loop to run through each entry). This is usually much more efficient than implementing loops. In a number of cases these element-wise operations differ from the usual operations on matrices. For example, element-wise multiplication corresponds to the Hadamard product $z = x \odot y$ (sometimes written $x \circ y$), where corresponding elements are multiplied together to get each new element, rather than the usual matrix multiplication.
* Broadcasting - when the shapes of two tensors differ and there is no ambiguity, the smaller tensor can be *broadcast* to match the shape of the larger tensor.
     * this has two steps:
     1. axes of size 1 are added to the smaller tensor to match the `ndim` of the larger (from the left-hand side)
     2. the smaller tensor is repeated along every axis where it has size 1 to match the full shape of the larger tensor. Along every other axis the two sizes must already be equal.

     E.g.

     `x = torch.rand(64,3,32,10)`

     `y = torch.rand(3, 32, 1)`

     `z = torch.max(x,y)` - z has shape (64,3,32,10) like `x`
    
* Tensor dot - aka *tensor product*. For two vectors, use `z = torch.dot(x, y)` in torch; in mathematical notation $z = x \cdot y$. Note this is different from the element-wise multiplication `z = x * y`. The dot product of two vectors (which have to be the same size) is a scalar. The dot product of a matrix $x$ and a vector $y$ is a vector whose coefficients are the dot products between $y$ and the rows of $x$; since `torch.dot` only accepts 1D tensors, use `torch.matmul(x, y)` (or `x @ y`) in that case.
* Tensor reshaping
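
The operations listed above can be sketched on small tensors as follows. The names `x`, `y` and `m` and their values are made up for illustration; the ReLU is written with `torch.clamp`, which is one of several equivalent ways to express it.

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])
y = torch.tensor([4.0, 5.0, -6.0])

# Element-wise ReLU: each entry is clamped at zero independently.
relu_x = torch.clamp(x, min=0.0)      # tensor([1., 0., 3.])

# Hadamard (element-wise) product: same shape as the inputs.
hadamard = x * y                      # tensor([  4., -10., -18.])

# Dot product of two 1D tensors: a single scalar.
dot = torch.dot(x, y)                 # tensor(-24.)

# Matrix-vector product: one dot product per row of the matrix.
m = torch.tensor([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])
mv = m @ y                            # tensor([ 4., -1.])

print(relu_x, hadamard, dot, mv)
```

Note how `x * y` keeps the shape of its inputs, while `torch.dot` collapses them to a zero-dimensional tensor.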


In [2]:
x=torch.ones(64, 3, 32, 10)
y=torch.ones(3, 32, 1)
z = torch.max(x, y)
print(z)
print(z.size())

tensor([[[[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.]],

         [[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.]],

         [[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.]]],


        [[[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
        

Note that when an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values.
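
The note above most likely refers to `torch.empty`, which allocates memory without writing to it; the cell that created such a tensor is not shown here, so the sketch below is an assumption about what was meant.

```python
import torch

# torch.empty allocates storage but does not initialise it, so the values
# are whatever happened to be in memory and can differ between runs.
x = torch.empty(5, 3)
print(x.shape)

# Fill the tensor explicitly before relying on its contents.
x.fill_(0.0)
print(x)
```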

An alternative is to explicitly set things to be random:

In [3]:
x = torch.rand(5, 3)
print(x)

tensor([[0.3317, 0.1188, 0.5048],
        [0.8529, 0.5217, 0.7226],
        [0.3889, 0.4374, 0.6927],
        [0.3136, 0.0951, 0.8732],
        [0.2574, 0.3719, 0.9289]])


Construct a matrix filled with zeros and of dtype long:

In [4]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


Let us take a list of three numbers in Python:

In [5]:
a = [1.0, 2.0, 1.0]

We can access the first element of the list using the index 0:

In [6]:
a[0]

1.0

In [7]:
a[2] = 3.0
a

[1.0, 2.0, 3.0]

We can create a PyTorch tensor

In [8]:
import torch
a = torch.ones(3)
a

tensor([1., 1., 1.])

The first entry isn't a plain Python number; it is a tensor with one element in it:

In [9]:
a[0]

tensor(1.)

but you can easily convert it to a float:

In [10]:
float(a[0])

1.0

In [11]:
a[0].detach().numpy()

array(1., dtype=float32)

In [12]:
a[1] = 2.0
a

tensor([1., 2., 1.])

We can construct a tensor directly from data, by passing a Python list to the constructor:

In [13]:
points = torch.tensor([5.5, 3, 6, 10])
print(points)

tensor([ 5.5000,  3.0000,  6.0000, 10.0000])


This has the same result as the code below

In [14]:
points = torch.zeros(4)  # allocate a 1D tensor holding four zeros
points[0] = 5.5          # overwrite each entry in place
points[1] = 3.0
points[2] = 6.0
points[3] = 10.0
print(points)

tensor([ 5.5000,  3.0000,  6.0000, 10.0000])


Alternatively, we can create a tensor based on an existing tensor. These methods reuse properties of the input tensor, e.g. its dtype, unless new values are provided by the user:

In [15]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes

print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!

print(x)                                      # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 0.3235, -1.6789, -0.3105],
        [ 0.7018,  0.2412,  0.0907],
        [ 0.6303,  0.9010, -0.3965],
        [-1.0251, -1.1379,  0.1549],
        [ 1.1804, -2.6709, -1.6864]])


In [16]:
print(x.size())

torch.Size([5, 3])


Python lists or tuples of numbers are collections of Python objects that are individually allocated in memory. PyTorch tensors or NumPy arrays, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects. For instance, each `float32` element is a 32-bit (4-byte) IEEE floating point value. This means that a 1D tensor of 1,000,000 float numbers requires exactly 4,000,000 contiguous bytes of storage, plus a small overhead for the metadata (e.g. dimensions, numeric type).
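
The byte count quoted above can be checked directly from a tensor. This is a minimal sketch; the variable name `t` is illustrative.

```python
import torch

t = torch.zeros(1_000_000)             # default dtype is float32

bytes_per_element = t.element_size()   # size of one element in bytes
n_elements = t.nelement()              # total number of elements
total_bytes = bytes_per_element * n_elements

print(bytes_per_element, n_elements, total_bytes)
```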

What if we want to refer to 2D points? We can create a 2D tensor:

In [17]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

and you can check the shape of a tensor:

In [18]:
points.shape

torch.Size([3, 2])

You can also use zeros or ones to initialise the tensor, giving it a specific shape

In [19]:
points = torch.zeros(3, 2)
points

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [20]:
points = torch.ones(3,2)
points

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])

By default, tensors are created using datatype torch.float32 (32-bit floating point numbers). However, different datatypes for the elements can be specified with the dtype argument, or there are also methods that create tensors with a specific datatype.

In [21]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], dtype=torch.float32)
print(points)
print(points.dtype)

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
torch.float32


In [22]:
points = torch.FloatTensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points)
print(points.dtype)

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
torch.float32


If we wanted the $y$ coordinate of the 0th point:

In [23]:
points[0, 1]

tensor(1.)

or just get the full 2D coordinates of the 0th point:

In [24]:
points[0]

tensor([4., 1.])

***Views on storage***
Values in tensors are allocated in contiguous chunks of memory, managed by `torch.Storage` instances. A storage is a one-dimensional array of numerical data, i.e. a contiguous block of memory containing numbers of a given type, such as `float` (32 bits representing a floating point number) or `int64` (64 bits representing an integer). A PyTorch `Tensor` is a view over such a `Storage` that is capable of indexing into that storage using an offset and per-dimension strides.

Multiple tensors can index the same storage, even if they index into the data differently. When we requested `points[0]` above, what we got back was another tensor that indexes the same storage as the `points` tensor, just not all of it and with different dimensionality (1D vs 2D). The underlying memory is allocated only once, however, so creating alternate tensor views on the data can be done quickly, no matter the size of the data managed by the `Storage` instance.
    

**Indexing into storage** Let’s see how indexing into the storage works in practice with our 2D points. The storage for a given tensor is accessible using the `.storage()` method:

In [25]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points.storage()


 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]

Even though the tensor reports itself as having 3 rows and 2 columns, the storage under the hood is a contiguous array of size 6. In this sense, the tensor just knows how to translate a pair of indices into a location in the storage. We can also index into a storage manually, for instance:

In [26]:
points_storage = points.storage()
print(points_storage[0])
print(points.storage()[1])

4.0
1.0


If you have a one element tensor, use .item() to get the value as a Python number

In [27]:
print(points[0,1].item())

1.0


We can’t index a storage of a 2D tensor using two indices. The layout of a storage is always one-dimensional, irrespective of the dimensionality of any and all tensors that might refer to it.

Changing the value of a storage leads to changing the content of its referring tensor:

In [28]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_storage = points.storage()
points_storage[0] = 2.0
points

tensor([[2., 1.],
        [5., 3.],
        [2., 1.]])

In order to index into a storage, tensors rely on a few pieces of information, which, together with their storage, unequivocally define them: size, storage offset and stride.  The size (or shape, in NumPy parlance) is a tuple indicating how many elements across each dimension the tensor represents. The storage offset is the index in the storage corresponding to the first element in the tensor. Stride is the number of elements in the storage that need to be skipped over to obtain the next element along each dimension.

In [29]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]
print(second_point.storage_offset())

second_point.size(), second_point.shape

2


(torch.Size([2]), torch.Size([2]))

The resulting tensor has offset 2 in the storage (since we need to skip the first point, which has two items), and the size is an instance of the `Size` class containing one element, since the tensor is one-dimensional. Note that this is the same information as contained in the `shape` property of tensor objects.

Last, stride is a tuple indicating the number of elements in the storage that have to be skipped when the index is increased by 1 in each dimension. For instance, our `points` tensor has a stride of `(2, 1)`:

In [30]:
points.stride()

(2, 1)
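
To make the roles of offset and stride concrete, the sketch below computes by hand where element `(i, j)` of a 2D tensor sits in the flat storage. The helper `storage_index` is hypothetical (it is not part of PyTorch), and `flatten()` is used as a stand-in for the storage, which is valid here only because `points` is contiguous.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

def storage_index(t, i, j):
    """Hypothetical helper: flat storage location of element (i, j) of a 2D tensor."""
    stride_row, stride_col = t.stride()
    return t.storage_offset() + i * stride_row + j * stride_col

# For a contiguous tensor, flatten() lists elements in storage order.
flat = points.flatten()

idx = storage_index(points, 2, 0)
print(idx, flat[idx].item(), points[2, 0].item())
```

The same formula applied to `points[1]` would start from its storage offset of 2 rather than 0.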

This indirection between a tensor and its storage makes some operations, like transposing a tensor or extracting a sub-tensor, inexpensive, as they do not lead to memory reallocations; instead they consist of allocating a new tensor object with a different value for size, storage offset or stride.

Let’s try transposing now. Let’s take our `points` tensor, which has individual points in the rows and x and y coordinates in the columns, and turn it around so that individual points are along the columns. We take this opportunity to introduce the `t` function, a shorthand alternative to `transpose` for two-dimensional tensors:


In [31]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [32]:
points_t = points.t()
points_t

tensor([[4., 5., 2.],
        [1., 3., 1.]])

We can easily verify that the two tensors share the same storage:


In [33]:
# compare the memory addresses of the two underlying storages
print(points.storage().data_ptr() == points_t.storage().data_ptr())
points.storage()

True
 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]

and that they differ only in the shape and stride


In [34]:
points.stride()

(2, 1)

In [35]:
points_t.stride()

(1, 2)

Transposing in PyTorch is not limited to matrices. We can transpose a multidimensional array by specifying the two dimensions along which transposing (i.e. flipping shape and stride) should occur

In [36]:
some_t = torch.ones(3, 4, 5)
transpose_t = some_t.transpose(0, 2)
some_t.shape

torch.Size([3, 4, 5])

In [37]:
transpose_t.shape

torch.Size([5, 4, 3])

In [38]:
some_t.stride()

(20, 5, 1)

In [39]:
transpose_t.stride()

(1, 5, 20)

In [40]:
points.is_contiguous()

True

In [41]:
points_t.is_contiguous()

False

In [42]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()
points_t

tensor([[4., 5., 2.],
        [1., 3., 1.]])

In [43]:
points_t.storage()

 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]

In [44]:
points_t.stride()

(1, 2)

The `contiguous()` method returns a tensor whose data has been copied into newly allocated memory, so that the elements are laid out contiguously in the order given by the tensor's shape.

In [45]:
points_t_cont = points_t.contiguous()
points_t_cont

tensor([[4., 5., 2.],
        [1., 3., 1.]])

In [46]:
points_t_cont.stride()

(3, 1)

In [47]:
points_t_cont.storage()

 4.0
 5.0
 2.0
 1.0
 3.0
 1.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]
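
The difference between a view and a contiguous copy can also be checked by comparing memory addresses. This is a minimal sketch using `data_ptr()`, which returns the address of a tensor's first element.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()

# The transpose is a view: same memory, different strides.
same_as_view = points_t.data_ptr() == points.data_ptr()

# contiguous() on a non-contiguous tensor copies the data to new memory.
points_t_cont = points_t.contiguous()
same_as_copy = points_t_cont.data_ptr() == points.data_ptr()

# On an already contiguous tensor no copy is needed.
same_when_contiguous = points.contiguous().data_ptr() == points.data_ptr()

print(same_as_view, same_as_copy, same_when_contiguous)
```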

Tensors of different datatypes can be allocated by specifying the `dtype` argument.

In [48]:
double_points = torch.ones(10, 2, dtype=torch.double)
short_points = torch.tensor([[1, 2], [3, 4]], dtype=torch.short)

In [49]:
print(double_points.dtype)
print(short_points.dtype)

torch.float64
torch.int16


Alternatively, methods such as `double()` can be used to convert the datatype of a current tensor.

In [50]:
double_points = torch.zeros(5, 2).double()
short_points = torch.ones(5, 2).short()
print(double_points)
print(short_points)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], dtype=torch.float64)
tensor([[1, 1],
        [1, 1],
        [1, 1],
        [1, 1],
        [1, 1]], dtype=torch.int16)


The `to()` method can also be used to convert tensors to a particular torch datatype.

In [51]:
double_points = torch.zeros(5, 2).to(torch.double)
short_points = torch.ones(5, 2).to(dtype=torch.short)
print(double_points)
print(short_points)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], dtype=torch.float64)
tensor([[1, 1],
        [1, 1],
        [1, 1],
        [1, 1],
        [1, 1]], dtype=torch.int16)


And finally the method `type()` can also be used to convert tensors to a particular type.

In [52]:
points = torch.randn(5, 2)
print(points)
short_points = points.type(torch.short)
print(short_points)

tensor([[ 0.8742,  1.4732],
        [ 0.5044,  0.4933],
        [-1.2388,  0.9348],
        [-0.3704,  1.6001],
        [ 0.9247,  1.0760]])
tensor([[ 0,  1],
        [ 0,  0],
        [-1,  0],
        [ 0,  1],
        [ 0,  1]], dtype=torch.int16)


In [53]:
# reset points back to original value
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

***Indexing Tensors***
What if we need to obtain a tensor containing all points but the first? That’s easy using range indexing notation, the same that applies to standard Python lists, which we quickly recall. (Add print statements to the other entries if you are unsure what they will produce as output.)

In [54]:
some_list = list(range(6))
print(some_list)
some_list[:]            # <1>
some_list[1:4]          # <2>
some_list[1:]           # <3>
some_list[:4]           # <4>
some_list[:-1]          # <5>
print(some_list[1:4:2]) # <6>

[0, 1, 2, 3, 4, 5]
[1, 3]


1. all elements in the list
2. from element 1 inclusive to element 4 exclusive
3. from element 1 inclusive to the end of the list
4. from the start of the list to element 4 exclusive
5. from the start of the list to one before the last element
6. from element 1 inclusive to element 4 exclusive in steps of 2

To achieve our goal we can use the same notation for PyTorch tensors, with the added benefit that, just like in NumPy and in other Python scientific libraries, we can use range indexing for each of the dimensions of the tensor:

(Again add print statements if you are uncertain what values are produced by any of these statements.)

In [55]:
print(points)
points[1:]          # <1>
points[1:, :]       # <2>
points[1:, 0]       # <3>
print(points[None]) # <4>

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
tensor([[[4., 1.],
         [5., 3.],
         [2., 1.]]])


1. All rows after first, implicitly all columns
2. All rows after first, all columns
3. All rows after first, first column
4. Add dimension of size one, just like unsqueeze (note the extra square brackets printed compared to just points)
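
The shapes produced by the indexing expressions above can be inspected as follows. This sketch also checks the claim that indexing with `None` behaves like `unsqueeze`, here with dimension 0.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

rest = points[1:]             # all rows after the first, all columns
first_col = points[1:, 0]     # first column of those rows
print(rest.shape, first_col.shape)

# Indexing with None and unsqueeze(0) both add a leading axis of size one.
with_axis = points[None]
print(with_axis.shape)
print(torch.equal(with_axis, points.unsqueeze(0)))
```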

***Named Tensors***
The dimensions (or axes) of our Tensors usually index something like pixel locations or color channels. This means that when we want to index into our Tensor, we need to remember the ordering of the dimensions and write our indexing accordingly. As data is transformed through multiple tensors, keeping track of which dimension contains what data can be error-prone.
To make things concrete, imagine that we have a 3D Tensor like `img_t` (we will use dummy data for simplicity here) and want to convert it to grayscale. We looked up typical weights for the colors to derive a single brightness value:

In [56]:
img_t = torch.randn(3, 5, 5) # shape [channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])


We also often want our code to generalize - for example from grayscale images represented as 2D Tensors with height and width dimensions to color images adding a third channel dimension (as in RGB) or from a single image to a batch of images.

In [57]:
batch_t = torch.randn(2, 3, 5, 5) # shape [batch, channels, rows, columns]

So sometimes the RGB channels are in dimension 0 and sometimes in dimension 1. But we can generalize by counting from the end: They are always in dimension -3, the third from the end. The lazy, unweighted mean would thus be written as follows:


In [58]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)

img_gray_naive.shape, batch_gray_naive.shape

(torch.Size([5, 5]), torch.Size([2, 5, 5]))

But now we have the weights, too. PyTorch will allow us to multiply things that are of the same shape, but also of shapes where one operand is of size one in a given dimension. It also prepends leading dimensions of size one automatically. This is a feature called *broadcasting*. We see that our `batch_t` of shape (2, 3, 5, 5) gets multiplied with the `unsqueezed_weights` of shape (3, 1, 1) to give a tensor of shape (2, 3, 5, 5), from which we can then sum over the third dimension from the end (the 3 channels).

Make sure you understand how the `unsqueeze()` method is working from the printed outputs (count the square brackets on the output!), and then also how the broadcasting is working when doing multiplications.

In [59]:
print(f"Shape of weights: {weights.shape}")
print(f"Value: {weights} \n")

unsqueezed_weights = weights.unsqueeze(-1).unsqueeze(-1)
print(f"Shape of unsqueezed weights: {unsqueezed_weights.shape}")
print(f"Value: {unsqueezed_weights}\n")

img_weights = (img_t * unsqueezed_weights)
batch_weights = (batch_t * unsqueezed_weights)
img_gray_weighted = img_weights.sum(-3)
batch_gray_weighted = batch_weights.sum(-3)

print(f"Shape of batch_gray_weighted: {batch_gray_weighted.shape}")


Shape of weights: torch.Size([3])
Value: tensor([0.2126, 0.7152, 0.0722]) 

Shape of unsqueezed weights: torch.Size([3, 1, 1])
Value: tensor([[[0.2126]],

        [[0.7152]],

        [[0.0722]]])

Shape of batch_gray_weighted: torch.Size([2, 5, 5])


Because this gets messy quickly (and for efficiency), there is even a PyTorch function, `einsum` (adapted from NumPy), that specifies an indexing mini-language giving index names to dimensions for sums of such products. As often in Python, broadcasting (a form of summarizing unnamed things) is done using three dots `...`:

In [60]:
img_gray_weighted_fancy   = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_weighted_fancy = torch.einsum('...chw,c->...hw', batch_t, weights)

batch_gray_weighted_fancy.shape

torch.Size([2, 5, 5])
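
As a sanity check, the `einsum` expression can be compared against the explicit broadcasting version. The sketch below regenerates dummy data with a fixed seed (an assumption made so the snippet is self-contained) and compares the two results up to floating point rounding.

```python
import torch

torch.manual_seed(0)
batch_t = torch.randn(2, 3, 5, 5)   # shape [batch, channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

# Broadcasting version: reshape weights to (3, 1, 1), multiply, sum over channels.
via_broadcast = (batch_t * weights.unsqueeze(-1).unsqueeze(-1)).sum(-3)

# einsum version: index 'c' appears in both inputs but not in the output,
# so it is multiplied and summed over.
via_einsum = torch.einsum('...chw,c->...hw', batch_t, weights)

print(via_einsum.shape)
print(torch.allclose(via_broadcast, via_einsum, atol=1e-5))
```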

PyTorch 1.3 added named tensors as an experimental feature. Tensor factory functions such as `tensor` or `rand` take a `names` argument. The names should be a sequence of strings:

In [61]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])

weights_named



tensor([0.2126, 0.7152, 0.0722], names=('channels',))

When we already have a tensor and want to add names (but not change existing ones), we can call the `refine_names` method on it. Similar to indexing, the ellipsis `...` allows you to leave out any number of dimensions. With the `rename` sibling method you can also overwrite or drop (by passing in `None`) existing names.

In [62]:
img_named =  img_t.refine_names(..., 'channels', 'rows', 'columns')
batch_named = batch_t.refine_names(..., 'channels', 'rows', 'columns')

print("img named:", img_named.shape, img_named.names)
print("batch named:", batch_named.shape, batch_named.names)


img named: torch.Size([3, 5, 5]) ('channels', 'rows', 'columns')
batch named: torch.Size([2, 3, 5, 5]) (None, 'channels', 'rows', 'columns')


For operations with two inputs, in addition to the usual dimension checks, i.e. that sizes are either the same or one is 1 and can be broadcast to the other, PyTorch will now check the names for us. So far, it does not automatically align dimensions, so we need to do this explicitly. The method  `align_as` returns a tensor with missing dimensions added and existing ones permuted to the right order

In [63]:
weights_aligned = weights_named.align_as(img_named)
weights_aligned.shape, weights_aligned.names

(torch.Size([3, 1, 1]), ('channels', 'rows', 'columns'))

Functions accepting dimension arguments, like `sum`, also take named dimensions. A nice feature for robustness is that if you try to combine dimensions with different names, you get an error.  Named tensors have the potential of eliminating many sources of alignment errors which are a frequent source of debugging problems.

In [64]:
gray_named = (img_named * weights_aligned).sum('channels')

gray_named.shape, gray_named.names

(torch.Size([5, 5]), ('rows', 'columns'))

The code below should create an error...

In [65]:
gray_named = (img_named[..., :3] * weights_named).sum('channels')

RuntimeError: Error when attempting to broadcast dims ['channels', 'rows', 'columns'] and dims ['channels']: dim 'columns' and dim 'channels' are at the same position from the right but do not match.

***Tensor Element Types***
Python numeric types can be sub-optimal for several reasons:
* Numbers in Python are full-fledged objects. While a floating point number might only take, for instance, 32 bits to be represented on a computer, Python will convert it into a full-fledged Python object with reference counting, etc. This operation, called boxing, is not a problem if we need to store a small number of them, but allocating millions of such numbers gets very inefficient.
* Lists in Python are meant for sequential collections of objects. There are no operations defined for, say, efficiently taking the dot product of two vectors, or summing vectors together. Also, Python lists have no way of optimizing the layout of their content in memory, as they are indexable collections of pointers to Python objects (of any kind, not just numbers). Last, Python lists are one-dimensional, and while one can create lists of lists, this is again very inefficient.
* The Python interpreter is slow compared to optimized, compiled code. Performing mathematical operations on large collections of numerical data can be much faster using optimized code written in a compiled, low-level language like C.


For these reasons, data science libraries rely on NumPy, or introduce dedicated data structures like PyTorch tensors, that provide efficient low-level implementations of numerical data structures and related operations on them, wrapped in a convenient high-level API. To enable this, the objects within a tensor must be all numbers of the same type and PyTorch must keep track of this numeric type.

The `dtype` argument to tensor constructors (that is, functions like `tensor`, `zeros`, `ones`) specifies the numerical data (d) type that will be contained in the tensor. The data type specifies the possible values the tensor can hold (integers vs. floating point numbers) and the number of bytes per value.  The `dtype` argument is deliberately similar to the standard NumPy argument of the same name.

Computations happening in neural networks are typically executed in 32-bit floating point precision. Higher precision, like 64-bit, will not buy us improvements in the accuracy of a model and will require more memory and computing time. The 16-bit floating point, half precision data type is not present natively in standard CPUs, but it is offered on modern GPUs. It is possible to switch to half-precision to decrease the footprint of a neural network model if needed, with minor impact on accuracy.
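
The effect of precision on memory footprint can be seen by allocating the same number of elements at each floating point width. This is a small sketch; the tensor sizes and the name `weights32` are arbitrary choices for illustration.

```python
import torch

# Bytes needed to store 1000 elements at each floating point precision.
sizes = {}
for dtype in (torch.float64, torch.float32, torch.float16):
    t = torch.ones(1000, dtype=dtype)
    sizes[dtype] = t.element_size() * t.nelement()
    print(dtype, sizes[dtype], "bytes")

# Converting an existing float32 tensor to half precision halves its footprint.
weights32 = torch.randn(256, 256)
weights16 = weights32.half()
print(weights16.dtype, weights16.element_size())
```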


PyTorch tensors can be converted to NumPy arrays and vice versa very efficiently. By doing so, we can leverage the huge swath of functionality in the wider Python ecosystem that has built up around the NumPy array type. This zero-copy interoperability with NumPy arrays is due to the storage system working with the Python buffer protocol. Converting tensors to NumPy arrays is very straightforward:

In [67]:
points = torch.ones(3, 4)
points_np = points.numpy()
points_np

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], dtype=float32)

which will return a NumPy multidimensional array of the right size, shape and numerical type. Interestingly, the returned array shares the same underlying buffer with the tensor storage. This means that the `numpy` method can be effectively executed at basically no cost, as long as the data  sits in CPU RAM. It also means that modifying the NumPy array will lead to a change in the originating tensor.
If the tensor is allocated on the GPU, it has to be copied to the CPU first, e.g. with `points.cpu().numpy()`; calling `numpy()` directly on a GPU tensor raises an error.
Conversely, we can obtain a PyTorch tensor from a NumPy array this way:

In [68]:
points = torch.from_numpy(points_np)
points

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

While the default numeric type in PyTorch is 32-bit floating point, for NumPy it is 64-bit.

In [69]:
import numpy as np

a = torch.ones(5)
print(a, a.dtype)
b = a.numpy()
print(b, b.dtype)

c = np.ones(5)
print(c, c.dtype)


tensor([1., 1., 1., 1., 1.]) torch.float32
[1. 1. 1. 1. 1.] float32
[1. 1. 1. 1. 1.] float64


Note how the `numpy` array changes value if we change the `tensor`

In [70]:
print(a)
print(b)
a.add_(1)
print(a)
print(b)

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]
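
The sharing also works in the other direction. The sketch below contrasts `torch.from_numpy`, which shares memory with the array, with `torch.tensor`, which copies the data; the variable names are illustrative.

```python
import numpy as np
import torch

arr = np.ones(3)                  # NumPy default dtype is float64

shared = torch.from_numpy(arr)    # shares memory with arr, keeps float64
copied = torch.tensor(arr)        # copies the data into new storage

arr[0] = 5.0                      # modify the NumPy array in place

print(shared)                     # reflects the change to arr
print(copied)                     # unaffected by the change
print(shared.dtype)
```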


***Serialising tensors***

Creating a tensor on the fly is all well and fine, but if the data inside is of any value to us, we will want to save it to a file and load it back at some point. After all, we don’t want to have to retrain a model from scratch every time we start running our program! PyTorch uses `pickle` under the hood to serialize the tensor object, plus dedicated serialization code for the storage. Here’s how we can save our `points` tensor to an `ourpoints.t` file: (note that this won't work as is in `colab` but would be fine if you ran the notebook locally on your machine -- we'll cover colab file issues later).

In [72]:
torch.save(points, './drive/../data/p1ch3/ourpoints.t')

RuntimeError: Parent directory ./drive/../data/p1ch3 does not exist.

One way to save and load data from and to Colab is to mount a Google Drive (you will get some space with your account if you created it for Colab).

In [73]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


With the drive mounted, we can save the tensor by passing a filename:

In [None]:
torch.save(points, '/content/drive/MyDrive/ourpoints.t')

As an alternative, we can pass a file object in lieu of the filename:

In [None]:
with open('/content/drive/MyDrive/ourpoints.t','wb') as f:
   torch.save(points, f)

In [None]:
points = torch.load('/content/drive/MyDrive/ourpoints.t')
points

In [None]:
with open('/content/drive/MyDrive/ourpoints.t','rb') as f:
   points = torch.load(f)

points

While this is a way we can quickly save tensors in case we only want to load them with PyTorch, the file format itself is not interoperable. We can’t read the tensor with software other than PyTorch. Depending on the use case, this may or may not be a limitation, but we should learn how to save tensors interoperably for those times it is.

***Serialising to HDF5 with h5py***

For those cases, you can use the HDF5 format and library. HDF5 is a portable and widely supported format for representing serialized multidimensional arrays, organized in a nested key-value dictionary. Python supports HDF5 through the `h5py` library, which accepts and returns data in the form of NumPy arrays.


In [None]:
import h5py

f = h5py.File('/content/drive/MyDrive/ourpoints.hdf5', 'w')
dset = f.create_dataset('coords', data=points.numpy())
f.close()

Here `coords` is a key into the HDF5 file. We can have other keys, even nested ones. One of the interesting things in HDF5 is that we can index the dataset while on disk and access only the elements we are interested in. Let us suppose we want to load just the last two points in our dataset:


In [None]:
f = h5py.File('/content/drive/MyDrive/ourpoints.hdf5', 'r')
dset = f['coords']
last_points = dset[-2:]
last_points

What happened here is that the data was not loaded when the file was opened or the dataset was requested. Rather, the data stayed on disk until we requested the last two rows in the dataset. At that point, `h5py` accessed those two rows and returned a NumPy array-like object encapsulating that region of the dataset, which behaves like a NumPy array and has the same API.
Owing to this fact, we can pass the returned object to the `torch.from_numpy` function to obtain a tensor directly. Note that in this case the data is copied over to the tensor’s storage.

In [None]:
last_points = torch.from_numpy(dset[-2:])
f.close()
last_points
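The ideas above (nested keys, partial reads, and conversion to a tensor) can be sketched together in one self-contained cell. This is only an illustration: it writes to a temporary directory rather than Google Drive, and the nested key `extra/scaled` is a made-up name, not something from the original dataset.

```python
import os
import tempfile

import h5py
import numpy as np
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ourpoints.hdf5')

    # Write two datasets. 'extra/scaled' is a hypothetical nested key;
    # h5py creates the intermediate group 'extra' for us.
    with h5py.File(path, 'w') as f:
        f.create_dataset('coords', data=points.numpy())
        f.create_dataset('extra/scaled', data=(2 * points).numpy())

    with h5py.File(path, 'r') as f:
        dset = f['coords']
        on_disk_shape = dset.shape    # metadata only, no array data read
        last_rows = dset[-2:]         # reads just the last two rows
        scaled_first = torch.from_numpy(f['extra/scaled'][:1])

# Both results are in memory, so they stay usable after the file is closed.
print(on_disk_shape, type(last_rows).__name__, scaled_first)
```

Using `with` blocks here is a design choice that closes the file even if an error occurs; the explicit `f.close()` calls in the cells above achieve the same thing when nothing goes wrong.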

***Moving tensors to the GPU***

Every PyTorch tensor can be transferred to (one of) the GPU(s) in order to perform massively parallel, fast computations. All operations performed on the tensor are then carried out using the GPU-specific routines that come with PyTorch. In addition to a `dtype`, a PyTorch `Tensor` also has a notion of `device`, which is where on the computer the tensor data is placed. Here is how we can create a tensor on the GPU by specifying the corresponding argument to the constructor:
(Note: if you are running this in Colab, you must first change the runtime to 'GPU' for this to work, using the Edit/Notebook Settings menu item.)

In [None]:
import torch # Doing this again since your Notebook would have been reset if you changed settings.

points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')
points_gpu

We could instead copy a tensor created on the CPU onto the GPU using the `to()` method:

In [None]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cpu')
points_gpu = points.to(device='cuda')

points_gpu

Doing so returns a new tensor that has the same numerical data, but stored in the RAM of the GPU, rather than in regular system RAM. Now that the data is stored locally on the GPU, we’ll start to see the speedups mentioned earlier when performing mathematical operations on the tensor. In almost all cases, CPU- and GPU-based tensors expose the same user-facing API, making it much easier to write code that is agnostic to where, exactly, the heavy number crunching is running.
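One common way to write such device-agnostic code is to pick the device once and pass it around. The sketch below assumes we want to fall back to the CPU when no GPU is visible; the names `device`, `points_dev` and `doubled` are illustrative.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_dev = points.to(device)

# The same line runs unchanged on either device.
doubled = 2 * points_dev
print(doubled.device.type, doubled.cpu().tolist())
```

With this pattern the cell also runs on a Colab runtime without a GPU, just without the speedup.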

If our machine has more than one GPU, we can also choose which GPU the tensor is allocated on by passing a zero-based integer identifying the GPU on the machine, such as:


In [None]:
points_gpu = points.to(device='cuda:0')
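To see which device indices are valid on a given machine, we can ask PyTorch how many GPUs it can see. The following is a small sketch under the assumption that we want one copy of the tensor per visible GPU, or a single CPU copy when there are none; `device_names` and `copies` are illustrative names.

```python
import torch

n_gpus = torch.cuda.device_count()   # 0 on a CPU-only machine

# Build 'cuda:0', 'cuda:1', ... and fall back to 'cpu' if the list is empty.
device_names = [f'cuda:{i}' for i in range(n_gpus)] or ['cpu']

points = torch.tensor([[4.0, 1.0]])
copies = [points.to(name) for name in device_names]

print(device_names, [c.device.type for c in copies])
```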

In [None]:
points = 2 * points                         # <1>
points_gpu = 2 * points.to(device='cuda')   # <2>

1. Multiplication performed on the CPU.
2. Multiplication performed on the GPU.

Note that the `points_gpu` tensor is not brought back to the CPU once the result has been computed. What happened in the line above is that

1) the `points` tensor has been copied to the GPU;
2) a new tensor has been allocated on the GPU and used to store the result of the multiplication;
3) a handle to that GPU tensor is returned.

Therefore, if we also add a constant to the result

In [None]:
points_gpu = points_gpu + 4

the addition is still performed on the GPU, and no information flows to the CPU (unless we print or access the resulting tensor). To move the tensor back to the CPU, we need to provide a `cpu` argument to the `to` method, such as

In [None]:
points_cpu = points_gpu.to(device='cpu')
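A tensor's `device` attribute tells us where it currently lives, which is handy when checking that a result stayed where we expected. The sketch below falls back to the CPU when no GPU is available (an assumption made so the cell runs anywhere) and also shows one case where moving back is required: converting to NumPy.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_dev = 2 * points.to(device) + 4   # computed on `device`
points_cpu = points_dev.to('cpu')        # back in regular system RAM

print(points_dev.device, points_cpu.device)

# NumPy arrays live in CPU memory, so move the tensor to the CPU first.
as_array = points_dev.cpu().numpy()
```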

There are also shorthand methods, `cuda()` and `cpu()`, for transferring tensors between the CPU and GPU(s):

In [None]:
points_gpu = points.cuda()
points_gpu = points.cuda(0)
points_cpu = points_gpu.cpu()

***The tensor API***

Let's get a feel for the tensor operations that PyTorch offers. The vast majority of operations on and between tensors are available under the `torch` module (https://pytorch.org/docs/stable/torch.html) and can also be called as methods of a tensor object. For instance, the `transpose` function can be used from the `torch` module:

In [None]:
a = torch.ones(3, 2)
a_t = torch.transpose(a, 0, 1)

a.shape, a_t.shape

or, for exactly the same result, as a method of the `a` tensor:

In [None]:
a = torch.ones(3, 2)
a_t = a.transpose(0, 1)

a.shape, a_t.shape
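To convince ourselves that the two spellings agree, we can compare their results directly. The sketch below uses a tensor with distinct values (rather than all ones) so the comparison is meaningful, and it also illustrates that `transpose` returns a view sharing storage with the original tensor.

```python
import torch

a = torch.arange(6.0).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

via_function = torch.transpose(a, 0, 1)
via_method = a.transpose(0, 1)
same = torch.equal(via_function, via_method)

# The transpose is a view of `a`: writing through it changes `a` as well.
via_method[0, 1] = 100.0
print(same, a[1, 0].item())
```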

For more details, look at the online docs at http://pytorch.org/docs. They are exhaustive and well organized, with the tensor operations divided into groups:
* Creation ops — functions for constructing a tensor, like `ones`  and `from_numpy`
* Indexing, slicing, joining, mutating ops — functions for changing the shape, stride or content of a tensor, like `transpose`
* Math ops — functions for manipulating the content of the tensor through computations
    * Pointwise ops — functions for obtaining a new tensor by applying a function to each element independently, like `abs` and  `cos`
    * Reduction ops — functions for computing aggregate values by iterating through tensors, like `mean`, `std` and `norm`
    * Comparison ops — functions for evaluating numerical predicates over tensors, like `equal` and `max`
    * Spectral ops — functions for transforming in and operating in the frequency domain, like `stft` and `hamming_window`
    * Other operations — special functions operating on vectors, like `cross`, or matrices, like `trace`
    * BLAS and LAPACK operations — functions following the BLAS (Basic Linear Algebra Subprograms) specification for scalar, vector-vector, matrix-vector and matrix-matrix operations
* Random sampling — functions for generating values by drawing randomly from probability distributions, like `randn` and `normal`
* Serialization — functions for saving and loading tensors, like `load` and `save`
* Parallelism — functions for controlling the number of threads for parallel CPU execution, like `set_num_threads`
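To make a few of these groups concrete, here is a small sketch with one call from several of them. The choice of functions is ours, not a canonical list, and the seed is fixed only so the random draw is repeatable.

```python
import torch

torch.manual_seed(0)                     # make the random draw repeatable

x = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])

created = torch.zeros(2, 2)              # creation op
pointwise = torch.abs(x)                 # pointwise op, applied per element
reduced = x.mean()                       # reduction op, one aggregate value
compared = torch.equal(x, x.clone())     # comparison op, returns a bool
product = torch.mm(x, torch.eye(2))      # matrix-matrix product
sampled = torch.randn(2, 2)              # random sampling op

print(pointwise.tolist(), reduced.item(), compared, sampled.shape)
```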


***Summary***

* Neural networks transform floating point representations into other floating point representations, with the starting and ending representations typically being human-interpretable. The intermediate representations are less so.
* These floating point representations are stored in tensors.
* Tensors are multidimensional arrays; they are the basic data structure in PyTorch.
* PyTorch has a comprehensive standard library for tensor creation, manipulation and mathematical operations.
* Tensors can be serialized to disk and loaded back.
* Almost all tensor operations in PyTorch can execute on the CPU as well as on the GPU, with no change in the code beyond choosing the device.
* PyTorch uses a trailing underscore to indicate that a function operates in-place on a tensor (e.g. `Tensor.sqrt_`).
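
The last point can be illustrated with a short sketch comparing `sqrt` with its in-place counterpart `sqrt_`. The values are chosen to have exact square roots, and `shares_memory` is an illustrative name.

```python
import torch

a = torch.tensor([4.0, 9.0, 16.0])

b = a.sqrt()              # out-of-place: a new tensor, `a` is unchanged
unchanged = a.tolist()

c = a.sqrt_()             # in-place: overwrites the contents of `a`
shares_memory = c.data_ptr() == a.data_ptr()

print(unchanged, a.tolist(), shares_memory)
```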