
In [2]:
import torch
a = torch.ones(3)
a

tensor([1., 1., 1.])

In [3]:
a[1]

tensor(1.)

In [4]:
float(a[1])

1.0

In [5]:
a[2] = 2.0
a

tensor([1., 1., 2.])

In [6]:
points = torch.zeros(6)
points[0] = 4.0
points[1] = 1.0
points[2] = 5.0
points[3] = 3.0
points[4] = 2.0
points[5] = 1.0

In [7]:
points

tensor([4., 1., 5., 3., 2., 1.])

In [8]:
points = torch.tensor([4.0, 1, 5.0, 3.0, 2.0, 1.0])
points

tensor([4., 1., 5., 3., 2., 1.])

In [9]:
float(points[0]), float(points[1])

(4.0, 1.0)

In [10]:
points = torch.zeros(3,2)
points

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [11]:
# 2-D tensor
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [12]:
points.shape

torch.Size([3, 2])

In [13]:
points[0,1]

tensor(1.)

In [14]:
points[1,1]

tensor(3.)

In [15]:
points[0]

tensor([4., 1.])

In [16]:
# all rows after the first; implicitly all columns
points[1:]

tensor([[5., 3.],
        [2., 1.]])

In [17]:
points[1:, :]

tensor([[5., 3.],
        [2., 1.]])

In [18]:
# all rows after the first; first column
points[1:, 0]

tensor([5., 2.])

In [19]:
# adds a dimension of size 1, just like unsqueeze
a = points[None]

In [20]:
a.shape

torch.Size([1, 3, 2])
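As a small sketch of the comment above, indexing with `None` can be compared directly against `unsqueeze`. The names `front` and `back` are illustrative only and do not appear elsewhere in the notebook.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Indexing with None inserts a new axis of size 1 at that position.
front = points[None]       # equivalent to points.unsqueeze(0)
back = points[..., None]   # equivalent to points.unsqueeze(-1)

print(front.shape, points.unsqueeze(0).shape)
print(back.shape, points.unsqueeze(-1).shape)
```

Both forms return views of the same data, so the choice is mostly a matter of readability.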

In [21]:
img_t = torch.randn(3, 5, 5)  # shape [channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

In [22]:
batch_t = torch.randn(2, 3, 5, 5) # shape [batch, channels, rows, columns]

In [23]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)
img_gray_naive.shape, batch_gray_naive.shape

(torch.Size([5, 5]), torch.Size([2, 5, 5]))

In [24]:
print(weights)
print(weights.shape)
unsqueezed_weights = weights.unsqueeze(1).unsqueeze_(-1)
print(unsqueezed_weights)
print(unsqueezed_weights.shape)

tensor([0.2126, 0.7152, 0.0722])
torch.Size([3])
tensor([[[0.2126]],

        [[0.7152]],

        [[0.0722]]])
torch.Size([3, 1, 1])


In [25]:
img_weights = (img_t * unsqueezed_weights)
batch_weights = (batch_t * unsqueezed_weights)
# print(img_weights)
# print(batch_weights)
print(img_weights.shape)
print(batch_weights.shape)

torch.Size([3, 5, 5])
torch.Size([2, 3, 5, 5])


In [26]:
img_gray_weighted = img_weights.sum(-3)
batch_gray_weighted = batch_weights.sum(-3)
batch_weights.shape, batch_t.shape, unsqueezed_weights.shape

(torch.Size([2, 3, 5, 5]), torch.Size([2, 3, 5, 5]), torch.Size([3, 1, 1]))
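The multiply-then-sum pattern above can also be written with `torch.einsum`, which names each dimension by a letter. The following is a sketch under the same shapes as the cells above; the index letters `c`, `h`, `w` and the variable names ending in `_einsum` are chosen here for illustration.

```python
import torch

img_t = torch.randn(3, 5, 5)        # [channels, rows, columns]
batch_t = torch.randn(2, 3, 5, 5)   # [batch, channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

# Broadcasting version: reshape weights to [3, 1, 1], multiply, sum channels.
w = weights.unsqueeze(-1).unsqueeze(-1)
img_gray_weighted = (img_t * w).sum(-3)
batch_gray_weighted = (batch_t * w).sum(-3)

# einsum version: '...' covers any leading (batch) dimensions, and the
# index c, which is absent from the output, is summed over.
img_gray_einsum = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_einsum = torch.einsum('...chw,c->...hw', batch_t, weights)

print(img_gray_einsum.shape, batch_gray_einsum.shape)
```

The two versions should agree up to floating-point rounding; the einsum form avoids having to count how many trailing dimensions to add to `weights`.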

## Named Tensor

In [27]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])
weights_named

tensor([0.2126, 0.7152, 0.0722], names=('channels',))

In [28]:
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns') 
batch_named = batch_t.refine_names(..., 'channels', 'rows', 'columns') 
print("img named:", img_named.shape, img_named.names) 
print("batch named:", batch_named.shape, batch_named.names)

img named: torch.Size([3, 5, 5]) ('channels', 'rows', 'columns')
batch named: torch.Size([2, 3, 5, 5]) (None, 'channels', 'rows', 'columns')


In [29]:
weights_aligned = weights_named.align_as(img_named)
weights_aligned.shape, weights_aligned.names

(torch.Size([3, 1, 1]), ('channels', 'rows', 'columns'))

In [30]:
gray_named = (img_named * weights_aligned).sum('channels')
gray_named.shape, gray_named.names

(torch.Size([5, 5]), ('rows', 'columns'))

In [31]:
gray_plain = gray_named.rename(None)
gray_plain.shape, gray_plain.names

(torch.Size([5, 5]), (None, None))

## Data-Types

In [32]:
double_points = torch.ones(10, 2, dtype=torch.double)
short_points = torch.tensor([[1,2], [3, 4]], dtype=torch.short)

In [33]:
short_points.dtype

torch.int16

In [34]:
# casting 
double_points = torch.zeros(10, 2).double()
short_points = torch.ones(10, 2).short()

In [35]:
# more convenient method
double_points = torch.zeros(10, 2).to(torch.double)
short_points = torch.ones(10, 2).to(dtype=torch.short)

In [36]:
# mixing input types in operations
points_64 = torch.rand(5, dtype=torch.double)
points_short = points_64.to(torch.short)
points_64 * points_short

tensor([0., 0., 0., 0., 0.], dtype=torch.float64)
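Two things happen in the cell above, and a short sketch may help separate them: the operands are promoted to the larger dtype, and the earlier cast from a float in [0, 1) to `short` truncates every value to zero. The variable names below are illustrative.

```python
import torch

points_64 = torch.rand(5, dtype=torch.double)
points_short = torch.ones(5, dtype=torch.short)

# Mixed-dtype operations promote to the larger of the two types.
product = points_64 * points_short
print(product.dtype)                                    # torch.float64
print(torch.promote_types(torch.int16, torch.float64))  # torch.float64

# Casting a float to an integer type truncates toward zero, which is why
# the product in the cell above is all zeros.
truncated = torch.tensor([0.25, 0.99]).to(torch.short)
print(truncated)
```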

## The Tensor API

In [37]:
a = torch.ones(3,2)
a_t = torch.transpose(a, 0, 1)
a.shape, a_t.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

In [38]:
# method of a tensor
a = torch.ones(3, 2)
a_t = a.transpose(0, 1)
a.shape, a_t.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

## Indexing into Storage

In [39]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points.storage()

 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.FloatStorage of size 6]

Even though the tensor reports itself as having three rows and two columns, the storage under the hood is a contiguous array of size 6. In this sense, the tensor just knows how to translate a pair of indices into a location in the storage. 

In [40]:
# indexing into storage
points_storage = points.storage()
points_storage[0]

4.0

In [41]:
points.storage()[1]

1.0

In [42]:
# changing the value of a tensor
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_storage = points.storage()
points_storage[0] = 2.0
points

tensor([[2., 1.],
        [5., 3.],
        [2., 1.]])

## Modifying stored values: In-place operations

A small number of operations exist only as methods of the Tensor object. They are recognizable from a trailing underscore in their name, like zero_, which indicates that the method operates in place by modifying the input instead of creating a new output tensor and returning it. For instance, the zero_ method zeros out all the elements of the input. Any method without the trailing underscore leaves the source tensor unchanged and instead returns a new tensor: 

In [43]:
a = torch.ones(3,2)
a.zero_()
a

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
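To contrast the two flavors side by side, here is a minimal sketch using `add` and `add_`. These are chosen only as a convenient pair; any method with a trailing-underscore twin would behave the same way.

```python
import torch

a = torch.ones(3, 2)

# Out of place: returns a new tensor and leaves a untouched.
b = a.add(1.0)
a_after_add = a.clone()   # snapshot of a before the in-place call

# In place: modifies a and returns a itself.
same = a.add_(1.0)

print(a_after_add[0, 0].item(), b[0, 0].item(), a[0, 0].item())
print(same is a)
```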

## Tensor metadata: Size, offset, and stride

In order to index into a storage, tensors rely on a few pieces of information that, together with their storage, unequivocally define them: size, offset, and stride.

The size (or shape, in NumPy parlance) is a tuple indicating how many elements across each dimension the tensor represents. The storage offset is the index in the storage corresponding to the first element in the tensor. The stride is the number of elements in the storage that need to be skipped over to obtain the next element along each dimension.
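To illustrate that these three values really do define a tensor over a given storage, the sketch below rebuilds a row view by hand with `torch.as_strided`. A flat tensor named `base` stands in for the storage here; that is an assumption made for illustration, not how the notebook creates its tensors.

```python
import torch

# A flat 1-D tensor playing the role of the underlying storage.
base = torch.tensor([4.0, 1.0, 5.0, 3.0, 2.0, 1.0])
points = base.view(3, 2)

second_point = points[1]
print(second_point.size(), second_point.storage_offset(), second_point.stride())

# Rebuild the same view from size, stride, and offset alone.
rebuilt = torch.as_strided(base, size=(2,), stride=(1,), storage_offset=2)
print(rebuilt)
```

`as_strided` is a low-level tool and is rarely needed in practice; it is used here only to make the metadata explicit.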

### Views of another tensor's storage

In [44]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]
second_point.storage_offset()

2

In [45]:
second_point.size()

torch.Size([2])

In [46]:
points.stride()

(2, 1)

In [47]:
second_point = points[1]
second_point.size()

torch.Size([2])

In [48]:
second_point.storage_offset()

2

In [49]:
second_point.stride()

(1,)

In [50]:
points.stride()

(2, 1)

The stride is a tuple indicating the number of elements in the storage that have to be skipped when the index is increased by 1 in each dimension. For instance, our points tensor has a stride of (2, 1): moving to the next row skips two elements in the storage, and moving to the next column skips one.

Accessing an element i, j in a 2D tensor results in accessing the storage_offset + stride[0] * i + stride[1] * j element in the storage. The offset will usually be zero; if this tensor is a view of a storage created to hold a larger tensor, the offset might be a positive value. This indirection between Tensor and Storage makes some operations inexpensive, like transposing a tensor or extracting a subtensor, because they do not lead to memory reallocations. Instead, they consist of allocating a new Tensor object with a different value for size, storage offset, or stride.
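The indexing formula can be checked numerically. This is a sketch that again uses a flat tensor `base` as a stand-in for the storage; `storage_index` is a hypothetical helper written for this example, not a PyTorch function.

```python
import torch

base = torch.tensor([4.0, 1.0, 5.0, 3.0, 2.0, 1.0])
points = base.view(3, 2)
points_t = points.t()   # same memory, strides swapped
tail = points[1:]       # same memory, storage offset 2


def storage_index(t, i, j):
    """Flat position of element (i, j) of a 2D tensor in its storage."""
    s0, s1 = t.stride()
    return t.storage_offset() + s0 * i + s1 * j


# Every element of every view should be found at the computed position.
all_match = all(
    base[storage_index(t, i, j)].item() == t[i, j].item()
    for t in (points, points_t, tail)
    for i in range(t.shape[0])
    for j in range(t.shape[1])
)
print(all_match)
```

The three views differ only in their size, offset, and stride, yet all of them resolve to positions in the same six-element buffer.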

### Transposing without copying

In [54]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [55]:
points_t = points.t()
points_t

tensor([[4., 5., 2.],
        [1., 3., 1.]])

In [56]:
# verify that the two tensors share the same memory; comparing id() of the
# temporary objects returned by storage() is unreliable, so compare addresses
points.data_ptr() == points_t.data_ptr()

True

In [57]:
# only differ in shape and stride
points.stride()

(2, 1)

In [58]:
points_t.stride()

(1, 2)

### Transposing in higher dimensions

In [59]:
some_t = torch.ones(3, 4, 5)
transpose_t = some_t.transpose(0, 2)
some_t.shape

torch.Size([3, 4, 5])

In [60]:
transpose_t.shape

torch.Size([5, 4, 3])

In [61]:
some_t.stride()

(20, 5, 1)

In [62]:
transpose_t.stride()

(1, 5, 20)

### Contiguous tensors

A tensor whose values are laid out in the storage starting from the rightmost dimension onward (that is, moving along rows for a 2D tensor) is defined as contiguous.

In [63]:
points.is_contiguous()

True

Obtaining a new contiguous tensor from a non-contiguous one:

In [64]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()
points_t

tensor([[4., 5., 2.],
        [1., 3., 1.]])

In [65]:
points_t.storage()

 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.FloatStorage of size 6]

In [66]:
points_t.stride()

(1, 2)

In [67]:
points_t_cont = points_t.contiguous()
points_t_cont

tensor([[4., 5., 2.],
        [1., 3., 1.]])

In [68]:
points_t_cont.stride()

(3, 1)
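The cells above can be summarized in one sketch: `contiguous()` on a transposed tensor copies the data into a new, row-major layout, and operations such as `view` that depend on the layout fail on the non-contiguous original. The flag `view_failed` is introduced here for illustration.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()
points_t_cont = points_t.contiguous()

print(points_t.is_contiguous(), points_t_cont.is_contiguous())

# contiguous() had to copy, so the new tensor lives at a different address
# and its elements are stored row by row for the transposed shape.
print(points_t.data_ptr() == points_t_cont.data_ptr())
print(points_t_cont.flatten().tolist())

# view() needs compatible strides and raises on the non-contiguous tensor;
# reshape() falls back to copying when a view is not possible.
try:
    points_t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True
flat = points_t.reshape(6)
print(view_failed, flat.tolist())
```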

## Moving tensors to GPU

In [69]:
points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')

In [70]:
# copying a tensor created on the CPU onto the GPU
points_gpu = points.to(device='cuda')

If the machine has more than one GPU, we can also decide which GPU the tensor is allocated on by passing a zero-based integer identifying the GPU on the machine, such as:

In [72]:
points_gpu = points.to(device='cuda:0')
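The cells above raise an error on a machine without a CUDA-capable GPU. A common pattern, sketched here with illustrative variable names, is to pick the device once and fall back to the CPU when no GPU is available.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_dev = points.to(device=device)

# Operations run on the device that holds the tensor.
doubled = 2 * points_dev

# Move the result back to the CPU, for example before converting to NumPy.
back_on_cpu = doubled.to(device='cpu')
print(points_dev.device, back_on_cpu.device)
```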

## NumPy interoperability

In [73]:
# converting tensors to numpy array
points = torch.ones(3, 4)
points_np = points.numpy()
points_np

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], dtype=float32)

In [74]:
# PyTorch tensor from a NumPy array
points = torch.from_numpy(points_np)
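For CPU tensors, both conversions share the underlying buffer rather than copying it. The sketch below shows the consequence: a write through one object is visible through the other. The values 7.0 and 9.0 are arbitrary.

```python
import torch

points = torch.ones(3, 4)
points_np = points.numpy()   # shares memory with the CPU tensor

# Writing through the NumPy array changes the tensor as well.
points_np[0, 0] = 7.0
print(points[0, 0])

# from_numpy shares memory in the other direction too.
from_np = torch.from_numpy(points_np)
from_np[1, 1] = 9.0
print(points_np[1, 1], points[1, 1])
```

If an independent copy is needed, calling `.clone()` on the tensor or `.copy()` on the array breaks the link.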

## Serializing tensors

Creating a tensor on the fly is all well and good, but if the data inside is valuable, we will want to save it to a file and load it back at some point. After all, we don’t want to have to retrain a model from scratch every time we start running our program! PyTorch uses pickle under the hood to serialize the tensor object, plus dedicated serialization code for the storage. Here’s how we can save our points tensor to an ourpoints.t file:

In [76]:
path = '/content/drive/MyDrive/PyTorch/Deep Learning with PyTorch Book/Chapter-3/ourpoints.t'
torch.save(points, path)

In [77]:
# alternative
with open(path, 'wb') as f:
    torch.save(points, f) 

In [78]:
# loading back 
points = torch.load(path)

In [79]:
points

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
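The path above is specific to a mounted Google Drive. As a portable sketch of the same round trip, the example below writes to a temporary directory and, alternatively, to an in-memory buffer; the file name `ourpoints.t` is kept from the text, and everything else is an assumption made for illustration.

```python
import io
import os
import tempfile

import torch

points = torch.ones(3, 4)

# Round trip through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ourpoints.t')
    torch.save(points, path)
    loaded = torch.load(path)

# torch.save and torch.load also accept file-like objects.
buffer = io.BytesIO()
torch.save(points, buffer)
buffer.seek(0)
loaded_from_buffer = torch.load(buffer)

print(torch.equal(points, loaded), torch.equal(points, loaded_from_buffer))
```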

### Serializing to HDF5 with h5py

HDF5 is a portable, widely supported format for representing serialized multidimensional arrays, organized in a nested key-value dictionary. Python supports HDF5 through the h5py library (www.h5py.org), which accepts and returns data in the form of NumPy arrays.

In [80]:
import h5py
path = '/content/drive/MyDrive/PyTorch/Deep Learning with PyTorch Book/Chapter-3/ourpoints.hdf5'
f = h5py.File(path, 'w')
dset = f.create_dataset('coords', data=points.numpy())
f.close()

Here 'coords' is a key into the HDF5 file. We can have other keys, even nested ones. One of the interesting things in HDF5 is that we can index the dataset while it is on disk and access only the elements we’re interested in. Let’s suppose we want to load just the last two points in our dataset:



In [81]:
f = h5py.File(path, 'r')
dset = f['coords']
last_points = dset[-2:]

In [82]:
last_points

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]], dtype=float32)

The data is not loaded when the file is opened or when the dataset is requested. Rather, the data stays on disk until we request the second and last rows in the dataset. At that point, h5py reads those two rows and returns them as a NumPy array holding just that region of the dataset.
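Since the slice comes back as a NumPy array, it can be handed to `torch.from_numpy` to get a tensor. The sketch below assumes h5py is installed, writes to a temporary directory instead of the Drive path, and uses context managers so that the file is closed afterwards; the sample values are made up so that the rows are distinguishable.

```python
import os
import tempfile

import h5py
import torch

# Distinct values per row so the slice is easy to check.
points = torch.arange(12, dtype=torch.float32).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ourpoints.hdf5')

    with h5py.File(path, 'w') as f:
        f.create_dataset('coords', data=points.numpy())

    with h5py.File(path, 'r') as f:
        dset = f['coords']
        # Only the last two rows are read from disk here.
        last_points = torch.from_numpy(dset[-2:])

print(last_points)
```

Once the file is closed the dataset object can no longer be indexed, but `last_points` remains valid because its data was already read into memory.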