In [1]:
import torch
import numpy as np

In [2]:
a = [1,2,3]
b = np.array([4,5,6],dtype=np.int32)

t_a = torch.tensor(a)
t_b = torch.from_numpy(b)

print(t_a)
print(t_b)

tensor([1, 2, 3])
tensor([4, 5, 6], dtype=torch.int32)


This produces the tensors t_a and t_b, whose properties (shape and data type) are inherited from their sources.
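One detail worth keeping in mind: according to the PyTorch documentation, torch.from_numpy() shares memory with the source array, whereas torch.tensor() copies its input. The following is a minimal sketch of that difference; the names src, shared, and copied are illustrative and not part of the notes above.

```python
import numpy as np
import torch

src = np.array([4, 5, 6], dtype=np.int32)
shared = torch.from_numpy(src)  # shares memory with src
copied = torch.tensor(src)      # makes an independent copy

src[0] = 100                    # modify the NumPy array in place

print(shared)  # the change is visible through the shared tensor
print(copied)  # the copy still holds the original values
```

If an independent tensor is needed, torch.tensor() (or calling .clone() on the result of from_numpy) avoids surprises from later in-place edits.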

In [3]:
t_ones = torch.ones(2,3)
t_ones.shape

torch.Size([2, 3])

In [4]:
print(t_ones)

tensor([[1., 1., 1.],
        [1., 1., 1.]])


Finally, creating a tensor from random values can be done as follows:

In [5]:
rand_tensor = torch.rand(2,3)
print(rand_tensor)

tensor([[0.3623, 0.3692, 0.8871],
        [0.9506, 0.0168, 0.1237]])


Manipulating the data type and shape of a tensor:
- PyTorch provides several functions that cast, reshape, transpose, and squeeze (remove dimensions from) tensors.
- The to() method of a tensor (called below as t_a.to()) changes the data type of a tensor to a desired type:

In [6]:
t_a

tensor([1, 2, 3])

In [7]:
t_a_new = t_a.to(torch.int64)
print(t_a_new.dtype)

torch.int64
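Since t_a was created from Python integers, it is already an int64 tensor, so the cast above does not change its type. A small sketch of a cast that does change the type, assuming we want 32-bit floats (the names t_int and t_float are illustrative):

```python
import torch

t_int = torch.tensor([1, 2, 3])     # Python ints default to torch.int64
t_float = t_int.to(torch.float32)   # returns a new tensor with the requested dtype

print(t_int.dtype)    # the original tensor keeps its dtype
print(t_float.dtype)
print(t_float)
```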


Manipulating the shape of a tensor:
- PyTorch provides useful functions (or operations) to achieve this, such as torch.transpose(), torch.reshape(), and torch.squeeze().
- For example, transposing a tensor swaps two of its dimensions:

In [8]:
t = torch.rand(3,5)
t_tr = torch.transpose(t,0,1)
print(t.shape,' --> ',t_tr.shape)

torch.Size([3, 5])  -->  torch.Size([5, 3])


Reshaping a tensor (for example, from 1D to 2D):

In [9]:
t = torch.zeros(30)
t_reshape = t.reshape(5,6)
print(t_reshape.shape)

torch.Size([5, 6])
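When only some of the target dimensions are known, reshape() accepts -1 for a single dimension and infers its size from the number of elements. A minimal sketch (variable names are illustrative):

```python
import torch

t = torch.arange(30)

t_rows = t.reshape(5, -1)    # 30 elements / 5 rows -> 6 columns
t_cols = t.reshape(-1, 10)   # 30 elements / 10 columns -> 3 rows

print(t_rows.shape)
print(t_cols.shape)
```

The total number of elements must stay the same, so a target shape such as (4, -1) would raise an error here because 30 is not divisible by 4.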


Removing unnecessary dimensions (dimensions of size 1 that are not needed):

In [10]:
t = torch.zeros(1,2,1,4,1)
t_sqz = torch.squeeze(t, 2)
print(t.shape,' --> ',t_sqz.shape)

torch.Size([1, 2, 1, 4, 1])  -->  torch.Size([1, 2, 4, 1])
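The call above removes only the dimension at index 2. As a sketch of the related behavior: calling torch.squeeze() without a dimension removes every size-1 dimension, torch.unsqueeze() adds one back, and squeezing a dimension whose size is not 1 leaves the tensor unchanged.

```python
import torch

t = torch.zeros(1, 2, 1, 4, 1)

t_all = torch.squeeze(t)             # drop every size-1 dimension
t_back = torch.unsqueeze(t_all, 0)   # insert a size-1 dimension at index 0
t_same = torch.squeeze(t, 1)         # dimension 1 has size 2, so nothing changes

print(t_all.shape)
print(t_back.shape)
print(t_same.shape)
```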


**Applying mathematical operations to tensors:**
- Applying mathematical operations, in particular linear algebra operations, is necessary for building most machine learning models. We now cover some widely used ones, such as the element-wise product, matrix multiplication, and computing the norm of a tensor.
- First, let's instantiate two random tensors, one with a uniform distribution in the range [-1, 1) and the other with a standard normal distribution:

In [11]:
torch.manual_seed(1)

<torch._C.Generator at 0x17ad9b7ea50>

In [12]:
t1 = 2*torch.rand(5,2)-1
t2 = torch.normal(mean=0, std=1, size=(5,2))

Now, to compute the element-wise product of t1 and t2, we use the following:

In [13]:
t3 = torch.multiply(t1,t2)
print(t3)

tensor([[ 0.4426, -0.3114],
        [ 0.0660, -0.5970],
        [ 1.1249,  0.0150],
        [ 0.1569,  0.7107],
        [-0.0451, -0.0352]])


To compute the mean, sum, and standard deviation along a certain axis (or axes), we use torch.mean(), torch.sum(), and torch.std(). For example, the mean of each column in t1 can be computed as follows (t1 is displayed first for reference):

In [14]:
t1

tensor([[ 0.5153, -0.4414],
        [-0.1939,  0.4694],
        [-0.9414,  0.5997],
        [-0.2057,  0.5087],
        [ 0.1390, -0.1224]])

In [15]:
t4 = torch.mean(t1, axis=0)
print(t4)

tensor([-0.1373,  0.2028])
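The same pattern applies to torch.sum() and torch.std(). The sketch below recreates t1 with the same seed and reduces along either axis; it uses dim, which is PyTorch's native name for the axis argument. Note that torch.std() applies Bessel's correction (dividing by n - 1) by default.

```python
import torch

torch.manual_seed(1)
t1 = 2 * torch.rand(5, 2) - 1

col_sum = torch.sum(t1, dim=0)    # one value per column
col_std = torch.std(t1, dim=0)    # sample standard deviation per column
row_mean = torch.mean(t1, dim=1)  # one value per row

print(col_sum.shape, col_std.shape, row_mean.shape)
```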


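The matrix-matrix product between t1 and t2 is computed using torch.matmul(). Because both tensors have shape (5, 2), one of them must be transposed first: multiplying t1 by the transpose of t2 yields a 5x5 matrix, while multiplying the transpose of t1 by t2 yields a 2x2 matrix.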

In [17]:
t1.shape, t2.shape

(torch.Size([5, 2]), torch.Size([5, 2]))

In [18]:
t5 = torch.matmul(t1, torch.transpose(t2,0,1))
print(t5)

tensor([[ 0.1312,  0.3860, -0.6267, -1.0096, -0.2943],
        [ 0.1647, -0.5310,  0.2434,  0.8035,  0.1980],
        [-0.3855, -0.4422,  1.1399,  1.5558,  0.4781],
        [ 0.1822, -0.5771,  0.2585,  0.8676,  0.2132],
        [ 0.0330,  0.1084, -0.1692, -0.2771, -0.0804]])


In [19]:
t6 = torch.matmul(torch.transpose(t1,0,1),t2)
print(t6)

tensor([[ 1.7453,  0.3392],
        [-1.6038, -0.2180]])
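As a shorthand, the @ operator and the .T attribute of a 2D tensor give the same results as the torch.matmul() and torch.transpose() calls above. A minimal sketch, recreating t1 and t2 with the same seed:

```python
import torch

torch.manual_seed(1)
t1 = 2 * torch.rand(5, 2) - 1
t2 = torch.normal(mean=0, std=1, size=(5, 2))

t5_short = t1 @ t2.T   # (5, 2) @ (2, 5) -> (5, 5)
t6_short = t1.T @ t2   # (2, 5) @ (5, 2) -> (2, 2)

print(t5_short.shape, t6_short.shape)
```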


Finally, the torch.linalg.norm() function computes the Lp norm of a tensor. For example, we can calculate the L2 norm of each row of t1 as follows:

In [20]:
t1

tensor([[ 0.5153, -0.4414],
        [-0.1939,  0.4694],
        [-0.9414,  0.5997],
        [-0.2057,  0.5087],
        [ 0.1390, -0.1224]])

In [27]:
norm_t1 = torch.linalg.norm(t1, ord=2, dim=1)

In [28]:
norm_t1

tensor([0.6785, 0.5078, 1.1162, 0.5488, 0.1853])

In [29]:
((0.5153**2)+(-0.4414)**2)**0.5

0.6785042741206573

In [30]:
np.sqrt(np.sum(np.square(t1.numpy()),axis=1))

array([0.67846215, 0.5078282 , 1.1162277 , 0.5487652 , 0.18525197],
      dtype=float32)
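The ord argument selects which norm is computed. As a sketch on a small hand-checkable tensor (the values are chosen for illustration), the L1, L2, and infinity norms of each row are:

```python
import torch

t = torch.tensor([[3.0, -4.0],
                  [1.0,  2.0]])

l2 = torch.linalg.norm(t, ord=2, dim=1)               # sqrt of the sum of squares per row
l1 = torch.linalg.norm(t, ord=1, dim=1)               # sum of absolute values per row
linf = torch.linalg.norm(t, ord=float('inf'), dim=1)  # largest absolute value per row

print(l2)    # first row: sqrt(9 + 16) = 5
print(l1)    # first row: 3 + 4 = 7
print(linf)  # first row: max(3, 4) = 4
```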

Split, Stack, and Concatenate Tensors:
- We now cover PyTorch operations for splitting a tensor into multiple tensors, or the reverse: stacking and concatenating multiple tensors into a single one.
- Assume we have a single tensor and want to split it into two or more tensors. PyTorch provides the convenient torch.chunk() function, which divides an input tensor into a tuple of equally sized tensors. The chunks argument sets the desired number of splits as an integer, and the dim argument specifies the dimension to split along. For all pieces to have the same size, the size of the input tensor along that dimension must be divisible by the number of splits. Alternatively, we can provide the desired sizes in a list using the torch.split() function.

Providing the number of splits:

In [31]:
torch.manual_seed(1)

<torch._C.Generator at 0x17ad9b7ea50>

In [32]:
t = torch.rand(6)
print(t)

tensor([0.7576, 0.2793, 0.4031, 0.7347, 0.0293, 0.7999])


In [33]:
t_splits = torch.chunk(t, 3)

In [34]:
t_splits

(tensor([0.7576, 0.2793]), tensor([0.4031, 0.7347]), tensor([0.0293, 0.7999]))

In [35]:
[item.numpy() for item in t_splits]

[array([0.7576316 , 0.27931088], dtype=float32),
 array([0.40306926, 0.73468447], dtype=float32),
 array([0.02928156, 0.7998586 ], dtype=float32)]

A tensor of size 6 was divided into 3 tensors, each of size 2. If the tensor size isn't divisible by the chunks value, the last chunk will be smaller.
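To illustrate the non-divisible case, here is a small sketch that splits a tensor of size 7 into 3 chunks (the tensor and names are illustrative):

```python
import torch

t = torch.arange(7)
parts = torch.chunk(t, 3)

sizes = [p.numel() for p in parts]
print(parts)
print(sizes)  # [3, 3, 1]
```

Each chunk holds ceil(7 / 3) = 3 elements until the data runs out, so the final chunk has only one element.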

Providing the sizes of different splits:
- Alternatively, instead of defining the number of splits, we can also specify the sizes of output tensors directly. Here, we're splitting a tensor of size 5 into tensors of size 3 and 2:

In [36]:
torch.manual_seed(1)

<torch._C.Generator at 0x17ad9b7ea50>

In [37]:
t = torch.rand(5)
print(t)

tensor([0.7576, 0.2793, 0.4031, 0.7347, 0.0293])


In [38]:
t_splits = torch.split(t, split_size_or_sections=[3,2])

In [39]:
t_splits

(tensor([0.7576, 0.2793, 0.4031]), tensor([0.7347, 0.0293]))

In [40]:
[item.numpy() for item in t_splits]

[array([0.7576316 , 0.27931088, 0.40306926], dtype=float32),
 array([0.73468447, 0.02928156], dtype=float32)]
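torch.split() also accepts a single integer, in which case it is interpreted as the size of each piece rather than the number of pieces. A minimal sketch under that reading of the split_size_or_sections argument:

```python
import torch

t = torch.arange(5)

by_size = torch.split(t, 2)        # pieces of size 2; the last one is smaller
by_list = torch.split(t, [3, 2])   # explicit sizes, which must sum to 5

print([p.tolist() for p in by_size])  # [[0, 1], [2, 3], [4]]
print([p.tolist() for p in by_list])  # [[0, 1, 2], [3, 4]]
```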

Concatenating and stacking tensors:
- Sometimes we're working with multiple tensors and need to concatenate or stack them to create a single tensor. In this case, PyTorch functions such as torch.stack() and torch.cat() come in handy. For example, let's create a 1D tensor A of size 3 containing 1s and a 1D tensor B of size 2 containing 0s, and concatenate them into a 1D tensor C of size 5:

In [46]:
A = torch.ones(3)
B = torch.zeros(2)
C = torch.cat([A,B], axis=0)
print(C)

tensor([1., 1., 1., 0., 0.])


If we create 1D tensors A and B, both with size 3, then we can stack them together to form a 2D tensor S:

In [48]:
A = torch.ones(3)
B = torch.zeros(3)
S = torch.stack([A,B], axis=1)
print(S)

tensor([[1., 0.],
        [1., 0.],
        [1., 0.]])
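The difference between the two functions is easiest to see in the resulting shapes: torch.cat() joins tensors along an existing dimension, while torch.stack() creates a new one. A short sketch (using dim, PyTorch's native name for the axis argument):

```python
import torch

A = torch.ones(3)
B = torch.zeros(3)

cat0 = torch.cat([A, B], dim=0)      # joins along the existing dimension
stack0 = torch.stack([A, B], dim=0)  # new leading dimension: A and B become rows
stack1 = torch.stack([A, B], dim=1)  # new trailing dimension: A and B become columns

print(cat0.shape)    # torch.Size([6])
print(stack0.shape)  # torch.Size([2, 3])
print(stack1.shape)  # torch.Size([3, 2])
```

Stacking requires all inputs to have the same shape, whereas concatenation only requires them to match in the dimensions that are not being joined.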


**Building Input Pipelines in PyTorch:**

- torch.nn is PyTorch's module for building neural network (NN) models. When the training dataset is small enough to be loaded into memory as a tensor, we can use that tensor directly for training. In typical use cases, however, the dataset is too large to fit into memory, so we need to load the data from the main storage device in chunks, that is, batch by batch. In addition, we may need to construct a data-processing pipeline that applies transformations and preprocessing steps to the data, such as mean centering, scaling, or adding noise to augment the training procedure and prevent overfitting.
- Applying preprocessing functions manually every time can be quite cumbersome. Luckily, PyTorch provides a special class for constructing efficient and convenient preprocessing pipelines.

**Creating a PyTorch DataLoader from existing tensors:**
- If the data already exists in the form of a tensor object, a Python list, or a NumPy array, we can easily create a dataset loader using the torch.utils.data.DataLoader class. Calling it returns a DataLoader object, which we can use to iterate through the individual elements of the input dataset. As a simple example, consider the following code, which creates a dataset from a tensor of values from 0 to 5:

In [49]:
from torch.utils.data import DataLoader

In [50]:
t = torch.arange(6, dtype=torch.float32)

In [51]:
data_loader = DataLoader(t)

We can easily iterate through the dataset entry by entry as follows:

In [52]:
for item in data_loader:
    print(item)

tensor([0.])
tensor([1.])
tensor([2.])
tensor([3.])
tensor([4.])
tensor([5.])


If we want to create batches from this dataset, with a desired batch size of 3, we can do this with the batch_size argument as follows:

In [53]:
data_loader = DataLoader(t, batch_size=3, drop_last=False)

In [54]:
for i,batch in enumerate(data_loader,1):
    print(f'batch {i}:',batch)

batch 1: tensor([0., 1., 2.])
batch 2: tensor([3., 4., 5.])


We can always iterate through a dataset directly, but as you saw, DataLoader provides automatic and customizable batching of a dataset.
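The drop_last argument only matters when the dataset size is not a multiple of the batch size. A small sketch with 7 elements and a batch size of 3 (the names keep and drop are illustrative):

```python
import torch
from torch.utils.data import DataLoader

t = torch.arange(7, dtype=torch.float32)

# Keep the final, smaller batch (the default behavior)
keep = [b.tolist() for b in DataLoader(t, batch_size=3, drop_last=False)]
# Discard the final batch because it is incomplete
drop = [b.tolist() for b in DataLoader(t, batch_size=3, drop_last=True)]

print(keep)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
print(drop)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```

Passing shuffle=True to DataLoader would additionally randomize the order of the elements on each pass through the data.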

**Combining two tensors into a joint dataset:**
- Often, we may have the data in two (or possibly more) tensors. For example, we could have a tensor for features and a tensor for labels. In such cases, we need to build a dataset that combines these tensors, which will allow us to retrieve the elements of these tensors in tuples.
- Assume we have two tensors, t_x, and t_y. Tensor t_x holds our feature values, each of size 3, and t_y stores the class labels. 

In [None]:
torch.manual_seed(1)

t_x = torch.rand([4,3])
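The cell above defines only t_x. As a minimal sketch of how such a joint dataset might be built, the example below assumes integer class labels t_y = torch.arange(4) (an assumption for illustration, not given in these notes) and uses torch.utils.data.TensorDataset, which is one way to pair tensors element by element:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(1)
t_x = torch.rand([4, 3])   # four examples with three features each
t_y = torch.arange(4)      # assumed labels 0..3 (hypothetical, for illustration)

joint_dataset = TensorDataset(t_x, t_y)

# Each entry is a (features, label) tuple
for x, y in joint_dataset:
    print('x:', x, ' y:', y)

# The joint dataset can be batched like any other dataset
loader = DataLoader(joint_dataset, batch_size=2, shuffle=False)
label_batches = [y_batch.tolist() for _, y_batch in loader]
print(label_batches)  # [[0, 1], [2, 3]]
```

A custom subclass of torch.utils.data.Dataset that implements __len__ and __getitem__ would be an alternative when more control over how entries are retrieved is needed.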

In [None]:
# function to unpivot columns in a pandas DataFrame
def unpivot(t):