- torch.Tensor is the fundamental data structure in PyTorch.
- it is used to store and manipulate data.
- like a NumPy array, a tensor contains elements of a single data type.
- can hold scalars, vectors, matrices and n-dimensional arrays
- every tensor is an instance of the torch.Tensor class.
- tensor operations can run on a GPU, which is typically much faster than NumPy on the CPU for large workloads
- tensors can be stored and manipulated at scale using distributed processing on multiple CPUs and GPUs
- tensors keep track of their graph computations (autograd), which is a key part of implementing a deep learning library

- By default, the tensor data type is derived from the input data type and the tensor is allocated on the CPU device.
- we can add two tensors directly with the + sign, as tensors support operator overloading.

In [1]:
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
y = torch.tensor([[7, 8, 9], [10, 11, 12]])
z = x + y
print(z)
print(z.size())

tensor([[ 8, 10, 12],
        [14, 16, 18]])
torch.Size([2, 3])


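torch.Tensor() is the legacy constructor of the tensor class. it is an alias for the default tensor type, torch.FloatTensor(), so it always produces float32 data. torch.tensor() infers the data type from its input and is the preferred way to create a tensor from existing data.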

- the storage location can be accessed with the z.device attribute
- torch.cuda.is_available() returns True if the machine has a GPU that PyTorch can use.
- torch.cuda.is_available() only returns a boolean. torch.cuda.device_count() tells how many GPUs are available, and a specific one is selected with an index such as 'cuda:0'.

In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.tensor([[1, 2, 3], [4, 5, 6]], device = device)
y = torch.tensor([[7, 8, 9], [10, 11, 12]], device = device)
z = x + y
print(z)
print(z.size())
print(z.device)

tensor([[ 8, 10, 12],
        [14, 16, 18]], device='cuda:0')
torch.Size([2, 3])
cuda:0


- it's common to transfer a tensor from one device to another
- this is done with the tensor's to() method, which returns a tensor on the target device.
- further operations on the moved tensor store their results on the target device
- operations between tensors on different devices result in errors.

In [3]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = x.to(device)
y = y.to(device)
z = x+y
z = z.to('cpu')
print(z.device)

cpu
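
the device rules above can be checked with a short sketch. the helper name try_mixed_device_add is hypothetical (it is not part of PyTorch), and the cross-device branch only runs when a GPU is present, so on a CPU-only machine it simply reports that it was skipped.

```python
import torch

def try_mixed_device_add():
    """Add a CPU tensor to a CUDA tensor and report what happens (hypothetical helper)."""
    if not torch.cuda.is_available():
        return "skipped (no GPU)"
    cpu_t = torch.tensor([1, 2, 3])                  # lives on the CPU
    gpu_t = torch.tensor([4, 5, 6], device="cuda")   # lives on the GPU
    try:
        cpu_t + gpu_t                                # operands on different devices
    except RuntimeError as err:
        return "error: " + str(err).splitlines()[0]
    return "no error"

print("GPUs visible:", torch.cuda.device_count())    # 0 on a CPU-only machine
print(try_mixed_device_add())

# to() returns a tensor on the target device; results of later operations stay there.
a = torch.tensor([1, 2, 3]).to("cpu")
b = a * 2
print(b.device)
```

moving both operands to the same device with to() before the operation avoids the error.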


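the following all select a GPU, and a plain string works wherever a torch.device object is expected. 'cuda' without an index means the current CUDA device, which is GPU 0 by default, so the four forms are equivalent unless the current device has been changed.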
- device = 'cuda'
- device = torch.device('cuda')
- device = 'cuda:0'
- device = torch.device('cuda:0')

- tensors can be created from preexisting numeric data (Python data structures, NumPy arrays etc.) or from random samplings.

In [4]:
import numpy

# python list
w = torch.tensor([1, 2, 3])
# python tuple
w = torch.tensor((1, 2, 3))
# numpy array
w = torch.tensor(numpy.array([1, 2, 3]))
# uninitialized
w = torch.empty(100, 200)
# all elements are 0s and shape is provided
w = torch.zeros(100, 200)
# all elements are 1s and shape is provided
w = torch.ones(100, 200)

In [5]:
# 100x200 tensor with elements from uniform distribution on the interval [0, 1)
w = torch.rand(100, 200)
# 100x200 tensor with elements from normal distribution with mean 0 and variance 1
w = torch.randn(100, 200)
# 100x200 elements are random integers from 5 (inclusive) to 10 (exclusive)
w = torch.randint(5, 10, (100, 200))
# specifying device and datatype (reuses the device chosen earlier, so this also runs without a GPU)
w = torch.empty((100, 200), dtype = torch.float64, device = device)
# uninitialized tensor with the same size, data type and device as another tensor
w = torch.empty_like(w)

- ones_like() and zeros_like() can also be used to create tensors similar to other tensors.
- linspace() creates a tensor with linearly spaced points between two end points.
- logspace() creates a tensor with logarithmically spaced points between two end points.
- eye() creates a tensor with ones on the diagonal and zeros everywhere else, i.e. an identity matrix
- full() creates a tensor of the specified shape filled with the specified value
- load() and save() load tensors from and save them to pickle-based files
- the tensor methods numpy() and tolist() convert tensors to NumPy arrays and Python lists.
- torch.bernoulli() draws binary random numbers from a Bernoulli distribution
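
a small sketch of several of the creation and conversion functions listed above. the shapes and values are arbitrary choices for illustration.

```python
import torch

lin = torch.linspace(0, 1, steps=5)      # 5 evenly spaced points from 0 to 1
log = torch.logspace(0, 2, steps=3)      # 10**0, 10**1, 10**2
eye = torch.eye(3)                       # 3x3 identity matrix
sevens = torch.full((2, 2), 7.0)         # 2x2 tensor filled with 7.0

# bernoulli() takes a tensor of probabilities and draws 0 or 1 for each element
probs = torch.full((2, 3), 0.5)
draws = torch.bernoulli(probs)

# converting back to NumPy and plain Python
arr = eye.numpy()        # NumPy array backed by the same memory as the CPU tensor
lst = lin.tolist()       # plain Python list of floats

print(lin)
print(log)
print(sevens)
print(draws)
print(type(arr), lst)
```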

- x.dtype to get data type 
- x.device to get device
- x.shape to get shape
- x.ndim to get rank aka number of dimensions
- x.requires_grad indicates whether the tensor keeps track of graph computations 
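- x.grad holds the gradients once backward() has been called, provided requires_grad is True. it is None before that
- x.grad_fn references the graph function that produced the tensor. it is None for tensors created directly by the user
- x.is_cuda, x.is_sparse, x.is_quantized, x.is_leaf, x.is_mkldnn are various boolean indicators
- x.layout indicates how the tensor is laid out in memory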

In [6]:
w.layout

torch.strided
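
the other attributes can be inspected the same way. a minimal sketch, assuming the default CPU device; the tensor names t and s are arbitrary.

```python
import torch

t = torch.ones(2, 3, dtype=torch.float32, requires_grad=True)

print(t.dtype)          # torch.float32
print(t.device)         # cpu
print(t.shape)          # torch.Size([2, 3])
print(t.ndim)           # 2
print(t.requires_grad)  # True
print(t.is_leaf)        # True: t was created directly, not by an operation
print(t.grad)           # None: backward() has not been called yet

s = (t * 2).sum()       # s is the result of operations on t
print(s.grad_fn)        # the graph node that produced s
print(s.is_leaf)        # False
```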

In [8]:
w = torch.tensor([1, 2, 3], dtype = torch.float32)
print(w.dtype)
# converting to int
w = w.int()
print(w.dtype)
# to() can be used to convert to other datatypes
w = w.to(torch.float64)
print(w.dtype)

torch.float32
torch.int32
torch.float64


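- in operations that mix data types, PyTorch automatically promotes the operands to a suitable common data type (for example int64 + float32 gives float32).
- to save memory, many operations have an in-place variant that reuses the tensor's storage. its name is the function name with _ appended.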

    for example: y.add_(x) adds x to y and stores it again in y. 

- torch.empty_like() creates an uninitialized tensor with the shape, dtype, device and layout of the target tensor.
- zeros_like(), full_like(), rand_like(), randn_like(), randint_like() work similarly.
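
type promotion, in-place operations and the *_like() functions can be seen together in a short sketch. the values are arbitrary.

```python
import torch

i = torch.tensor([1, 2, 3])            # int64
f = torch.tensor([0.5, 0.5, 0.5])      # float32
print((i + f).dtype)                   # torch.float32: the integer operand is promoted

x = torch.ones(3)
y = torch.zeros(3)
y.add_(x)                              # in place: y now holds y + x
print(y)                               # all ones

ref = torch.zeros(2, 3, dtype=torch.float64)
r = torch.rand_like(ref)               # same shape, dtype and device as ref
k = torch.randint_like(ref, 5, 10)     # integers in [5, 10), stored with ref's dtype
print(r.shape, r.dtype, k.dtype)
```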

- indexing and slicing return a tensor object even if the result is only a single element. use item() to convert a single-element tensor to a Python value, for example when passing it to functions that expect a plain number

In [9]:
x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
print(x)
print(x[1, 1])
print(x[1, 1].item())

tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
tensor(4)
4


In [10]:
# indexing and slicing work much like they do for NumPy arrays.
# transpose 
print(x.t())

tensor([[1, 3, 5, 7],
        [2, 4, 6, 8]])


In [11]:
# view() never copies data but needs a compatible (e.g. contiguous) memory layout; reshape() may copy
print(x.view((2, 4)))

tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])


In [12]:
# tensors can be combined with torch.stack()
y = torch.stack((x, x))
print(y)

tensor([[[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]],

        [[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]]])


In [13]:
# a tensor can be split along a dimension with unbind(), which removes that dimension
a, b = x.unbind(dim = 1)
print(a, b)

tensor([1, 3, 5, 7]) tensor([2, 4, 6, 8])


- torch.cat() concatenates a sequence of tensors along an existing dimension
- torch.chunk() splits a tensor into a specific number of chunks. each chunk is a view of the input
- torch.narrow() returns a narrowed version of the input tensor, i.e. a slice along one dimension
- torch.reshape() returns a reshaped tensor, which may be a copy. use the tensor's view() method to ensure the data is not copied.
- torch.squeeze() returns a tensor with all dimensions of size 1 removed. used to remove unused dimensions, for example to convert an image from 4d to 3d
- torch.unsqueeze() adds a dimension of size 1. most PyTorch models expect a batch of data as input, so unsqueeze() helps when we have only one data sample: a 3d image becomes a 4d batch of one image
- torch.transpose() swaps only the two specified dimensions. best for multidimensional tensors
- torch.where() returns a tensor whose elements are selected from one of two inputs depending on the specified condition
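
a sketch of the restructuring functions above. the 3x32x32 image shape is an assumed example, not something required by PyTorch.

```python
import torch

x = torch.arange(1, 9).view(4, 2)            # [[1, 2], [3, 4], [5, 6], [7, 8]]

cat0 = torch.cat((x, x), dim=0)              # shape (8, 2): no new dimension is added
top, bottom = torch.chunk(x, 2, dim=0)       # two chunks of shape (2, 2)
mid = torch.narrow(x, 0, 1, 2)               # 2 rows starting at row 1: [[3, 4], [5, 6]]

img = torch.zeros(3, 32, 32)                 # one image: channels, height, width
batch = img.unsqueeze(0)                     # shape (1, 3, 32, 32): a batch of one
back = batch.squeeze(0)                      # shape (3, 32, 32) again

swapped = torch.transpose(batch, 2, 3)       # swaps only height and width
picked = torch.where(x > 4, x, torch.zeros_like(x))  # keep values > 4, else 0

print(cat0.shape, top.shape, mid.tolist())
print(batch.shape, back.shape, swapped.shape)
print(picked)
```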

- basic math funcs: add(), div(), mul(), neg(), reciprocal(), true_divide()
- truncation and other pointwise funcs: ceil(), clamp(), floor(), floor_divide(), fmod(), frac(), lerp(), remainder(), round(), sigmoid(), trunc()
- complex num funcs: abs(), angle(), conj(), imag(), real()
- trigonometry funcs: acos(), asin(), atan(), cos(), cosh(), deg2rad(), rad2deg(), sin(), sinh(), tan(), tanh()
- bitwise op funcs: bitwise_not(), bitwise_and(), bitwise_or(), bitwise_xor()
- error funcs: erf(), erfc(), erfinv()
- combined math funcs (divide or multiply two tensors element-wise, then add the result to the input): addcdiv(), addcmul()
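
a few of the pointwise functions above applied to a small tensor. the input values are arbitrary.

```python
import torch

v = torch.tensor([-1.5, -0.2, 0.7, 2.4])

up = torch.ceil(v)                            # round each element up
down = torch.floor(v)                         # round each element down
cut = torch.trunc(v)                          # drop the fractional part
limited = torch.clamp(v, min=-1.0, max=1.0)   # limit values to [-1, 1]
negated = torch.neg(v)                        # same as -v

# addcmul computes input + value * (t1 * t2) element-wise
base = torch.zeros(3)
t1 = torch.tensor([1.0, 2.0, 3.0])
t2 = torch.tensor([4.0, 5.0, 6.0])
combined = torch.addcmul(base, t1, t2, value=0.5)   # 0.5 * [4, 10, 18]

print(up, down, cut)
print(limited, negated)
print(combined)
```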

reduction operations reduce the dimensionality or rank of a tensor.
- torch.argmax() returns the indices of the maximum values across all elements or along a dimension
- torch.dist() returns the p-norm of the difference between two tensors
- torch.sum() returns the sum of all elements or along a dimension
- torch.unique() removes duplicates across the tensor or along a dimension
- torch.unique_consecutive() removes consecutive duplicates
- a lot of these functions take a dim parameter. it's similar to axis in NumPy
- if dim is not specified, the operation is performed across all elements. ex: for a 2d tensor, dim = 1 computes the operation across each row, giving one result per row.

- it is common to chain methods together, for example torch.rand(2, 2).max().item() returns the max value as a Python number
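
the effect of the dim parameter is easiest to see on a small 2d tensor. a sketch with arbitrary values:

```python
import torch

m = torch.tensor([[1, 5, 2],
                  [7, 3, 6]])

total = m.sum()             # 24: all elements
per_col = m.sum(dim=0)      # [8, 8, 8]: dim 0 is reduced, one value per column
per_row = m.sum(dim=1)      # [8, 16]: dim 1 is reduced, one value per row
flat_idx = m.argmax()       # 3: position of 7 in the flattened tensor
row_idx = m.argmax(dim=1)   # [1, 0]: position of the max within each row

uniq = torch.unique(m)                                            # [1, 2, 3, 5, 6, 7]
runs = torch.unique_consecutive(torch.tensor([1, 1, 2, 2, 1]))    # [1, 2, 1]

# 2-norm of the difference between two tensors: sqrt(3**2 + 4**2) = 5
d = torch.dist(torch.tensor([0.0, 0.0]), torch.tensor([3.0, 4.0]), p=2)

# chaining: largest element of a random tensor as a Python float
biggest = torch.rand(2, 2).max().item()

print(total, per_col, per_row)
print(flat_idx, row_idx, uniq, runs, d, biggest)
```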

comparison functions return a tensor of booleans after the comparison
- compare tensors with one another: eq(), ge(), gt(), le(), lt(), ne() or ==, >=, >, <=, <, !=
- test tensor status or conditions: isclose(), isfinite(), isinf(), isnan()
- return a single boolean for the whole tensor: allclose(), equal()
- find values over a tensor or along a dimension: argsort(), kthvalue(), max(), min(), sort(), topk()
- torch.eq() returns a boolean tensor and torch.equal() returns a single boolean value
- torch.allclose() returns a single boolean that is True if all elements of two tensors are close to each other within a tolerance
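
the difference between element-wise and whole-tensor comparisons, sketched with two arbitrary tensors:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.5, 3.0])

elementwise = a == b                 # boolean tensor: [True, False, True]
same_as_eq = torch.eq(a, b)          # identical to a == b
whole = torch.equal(a, b)            # single Python bool: False
close = torch.allclose(a, a + 1e-9)  # single Python bool: True within default tolerances
nan_mask = torch.isnan(torch.tensor([1.0, float("nan")]))   # [False, True]

values, indices = torch.topk(a, k=2)                  # two largest values and their positions
sorted_vals, order = torch.sort(b, descending=True)   # sorted values and original positions

print(elementwise, whole, close, nan_mask)
print(values, indices)
print(sorted_vals, order)
```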

many computations such as gradient descent use linear algebra to implement calculations.
- PyTorch linalg operations are based on the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) standardized libraries.
- torch.matmul() matrix product of two tensors. supports broadcasting
- torch.chain_matmul() computes a matrix product of n tensors (superseded by torch.linalg.multi_dot() in newer releases)
- torch.mm() matrix product of two tensors without broadcasting
- torch.addmm() matrix product of two tensors and adds it to the input
- torch.bmm() batch of matrix products
- torch.addbmm() computes a batch of matrix products, sums them and adds the result to the input
- torch.baddbmm() computes a batch of matrix products and adds it to the input batch
- torch.mv() computes the product of a matrix and a vector
- torch.addmv() computes the product of a matrix and a vector and adds it to the input
- torch.matrix_power() returns a matrix raised to the power of n (for square matrices)
- torch.eig() finds the eigenvalues and eigenvectors of a real square matrix (superseded by torch.linalg.eig() in newer releases)
- torch.inverse() computes the inverse of a square matrix
- torch.det() determinant of a matrix/batch of matrices
- torch.solve() returns the solution to a system of linear equations (superseded by torch.linalg.solve() in newer releases)
- torch.svd() performs singular value decomposition (superseded by torch.linalg.svd() in newer releases)
- torch.pca_lowrank() performs linear principal component analysis
- torch.cholesky() performs Cholesky decomposition (superseded by torch.linalg.cholesky() in newer releases)
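
a sketch of several of these operations on a small symmetric matrix. it assumes a reasonably recent PyTorch release that ships the torch.linalg module, and the matrix values are arbitrary.

```python
import torch

A = torch.tensor([[2.0, 1.0],
                  [1.0, 3.0]])      # symmetric positive-definite
v = torch.tensor([1.0, 2.0])
B = torch.tensor([[3.0], [5.0]])

AA = torch.mm(A, A)                 # 2x2 matrix product
Av = torch.mv(A, v)                 # matrix-vector product: [4, 7]
Av2 = torch.matmul(A, v)            # matmul handles the same case

batch = torch.stack((A, A))         # shape (2, 2, 2)
batch_prod = torch.bmm(batch, batch)    # one matrix product per batch entry

det = torch.det(A)                  # 2*3 - 1*1 = 5
ident = torch.inverse(A) @ A        # approximately the identity
squared = torch.matrix_power(A, 2)  # same as A @ A

X = torch.linalg.solve(A, B)        # solves A @ X = B
L = torch.linalg.cholesky(A)        # lower-triangular L with A = L @ L.T

print(AA, Av, det)
print(X, L)
```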

- fast, inverse and short-time Fourier transforms: fft(), ifft(), stft() (in newer releases fft() and ifft() live in the torch.fft module)
- histogram and bin counts: histc(), bincount()
- windowing algorithms: bartlett_window(), blackman_window(), hamming_window(), hann_window()
- matrix reduction and restructuring functions: flatten(), flip(), rot90(), repeat_interleave(), meshgrid(), roll(), combinations()
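
some of the restructuring and counting functions above, sketched on tiny inputs with arbitrary values:

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])

flat = torch.flatten(x)                    # [1, 2, 3, 4]
flipped = torch.flip(x, dims=[0])          # row order reversed: [[3, 4], [1, 2]]
rotated = torch.rot90(x, 1, [0, 1])        # rotated by 90 degrees: [[2, 4], [1, 3]]
rolled = torch.roll(x, shifts=1, dims=1)   # each row shifted by one: [[2, 1], [4, 3]]

# occurrences of the values 0, 1, 2, 3 in a non-negative integer tensor
counts = torch.bincount(torch.tensor([0, 1, 1, 3]))

# histogram with 4 equal-width bins over the range [0, 3]
hist = torch.histc(torch.tensor([1.0, 2.0, 1.0]), bins=4, min=0, max=3)

print(flat, flipped, rotated, rolled)
print(counts, hist)
```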

the backward() function uses PyTorch's automatic differentiation package, torch.autograd, to differentiate and compute gradients of tensors based on the chain rule. this is what makes PyTorch powerful for deep learning.

as an example, define a function f = sum(x^2) and find df/dx for each element of the matrix. in order to do this we set requires_grad = True for tensor x

In [14]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype = torch.float, requires_grad = True)
print(x)

tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)


In [15]:
f = x.pow(2).sum()
print(f)

tensor(91., grad_fn=<SumBackward0>)


In [16]:
f.backward()
print(x.grad)

tensor([[ 2.,  4.,  6.],
        [ 8., 10., 12.]])


f.backward() differentiates f with respect to x and stores the result in the x.grad attribute.

the derivative of x^2 is 2x, so x.grad contains every element of x multiplied by 2

training neural networks requires us to compute weight gradients on the backward pass. as our neural networks get deeper and more complex, this feature automates the complex computations.
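
to connect this to training, the sketch below checks that x.grad equals 2x and then performs one hand-written gradient descent step on f. the learning rate lr = 0.1 is an assumed value chosen for illustration; real training code would normally use an optimizer from torch.optim instead.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], requires_grad=True)

f = x.pow(2).sum()
f.backward()
matches = torch.equal(x.grad, 2 * x.detach())   # True: df/dx = 2x
print(matches)

# one gradient descent step; lr is an assumed learning rate
lr = 0.1
with torch.no_grad():          # the update itself should not be tracked by autograd
    x -= lr * x.grad           # x <- x - lr * 2x, i.e. 0.8 * x
x.grad.zero_()                 # gradients accumulate, so reset them before the next pass

print(x)
```

after the step every element of x has moved toward 0, which is where f = sum(x^2) reaches its minimum.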