## 1.1 Getting started

* Tensors are n-dimensional arrays of numerical values.
* In the one-dimensional case, i.e., when only one axis exists, a tensor is called a `vector`.
* With two axes, a tensor is called a `matrix`.
* With $k > 2$ axes, we drop the specialized names and refer to the object as a $k^{\text{th}}$-order tensor.
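* As a minimal sketch of this naming scheme, the number of axes can be read from the `ndim` attribute. The variable names below are illustrative only.

```python
import torch

vector = torch.arange(4)        # 1 axis
matrix = torch.zeros(2, 3)      # 2 axes
third_order = torch.ones(2, 3, 4)  # 3 axes: a 3rd-order tensor

orders = [t.ndim for t in (vector, matrix, third_order)]
print(orders)  # [1, 2, 3]
```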


In [2]:
import torch

In [3]:
## arange(n) creates a vector of evenly spaced values,
## starting from 0 (included) and ending at n (not included)
x = torch.arange(12,dtype=torch.float32)
x

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

In [4]:
##inspecting the number of elements
x.numel()

12

In [5]:
##inspecting shape of the tensor
x.shape

torch.Size([12])

* We can alter the shape of our input tensor x to a desired shape.
* Let's transform it into a `(3,4)` tensor.

In [6]:
## reshaping the tensor
X = x.reshape(3,4)
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
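* When all but one dimension is known, the remaining one does not have to be spelled out. The sketch below assumes the same 12-element vector as above and passes `-1` so that `reshape` infers the missing size from the element count.

```python
import torch

x = torch.arange(12, dtype=torch.float32)

# -1 asks PyTorch to infer that dimension from the number of elements
A = x.reshape(-1, 4)   # 12 / 4 = 3 rows
B = x.reshape(3, -1)   # 12 / 3 = 4 columns
print(A.shape, B.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```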

In [7]:
##initializing a tensor that only contains zeros
torch.zeros((2,3,4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

In [8]:
##initializing a tensor that only contains ones
torch.ones((2,3,4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

In [9]:
##creating a tensor that contains random elements
## which are drawn from a normal distribution with
## mean 0 and standard deviation 1.
torch.randn(3,4)

tensor([[-0.5232,  0.4025,  0.5351, -0.1273],
        [ 0.9866, -0.9827,  0.2127,  0.1972],
        [ 0.2483, -1.1980, -0.9886, -2.8093]])

In [10]:
##constructing a tensor by supplying exact values for each element,
## passed as (nested) Python lists containing numerical literals.
torch.tensor([[2,1,4,3],[1,2,3,4],[4,3,2,1]])

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

## 1.2 Indexing and Slicing

* As with Python lists, we can access tensor elements by indexing (starting with 0).
* To access an element based on its position relative to the end of the list, we can use negative indexing.
* We can also access whole ranges of indices by slicing, e.g., `X[start:finish]`, where `start` is included and `finish` is not.
* When only one index or slice is specified for a $k^{\text{th}}$-order tensor, it is applied along axis 0. So in the code below, `[-1]` selects the last row and `[1:3]` selects the second and third rows.


In [11]:
X[-1],X[1:3]

(tensor([ 8.,  9., 10., 11.]),
 tensor([[ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]]))
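* Indices for further axes are separated by commas. As a small sketch (using a fresh copy of the `3x4` matrix so it runs on its own), `:` keeps a whole axis while a second slice restricts the columns.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

last_col = X[:, -1]    # every row, last column
block = X[1:3, 0:2]    # second and third rows, first two columns
print(last_col)        # tensor([ 3.,  7., 11.])
print(block.tolist())  # [[4.0, 5.0], [8.0, 9.0]]
```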

In [12]:
##writing elements of a matrix by specifying indices
X[1,2] = 17
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5., 17.,  7.],
        [ 8.,  9., 10., 11.]])

* In the code above, we replaced the third element of the second row with 17.

In [13]:
###replacing multiple elements with the same value
X[:2,:] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])

* In the code above, we replaced every element in the first two rows with 12.
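* One consequence worth keeping in mind: for a contiguous tensor, `reshape` returns a view, so writing through `X` also changes the vector `x` it was built from. The sketch below assumes the same construction as above; `clone()` is shown as one way to get an independent copy.

```python
import torch

x = torch.arange(12, dtype=torch.float32)
X = x.reshape(3, 4)        # a view: no data is copied here
X[:2, :] = 12              # write through the view

print(x[:8])               # the first eight entries of x are now 12 as well
shares = x.data_ptr() == X.data_ptr()
print(shares)              # True

# clone() copies the data first, so later writes do not propagate
independent = x.clone().reshape(3, 4)
independent[2, :] = 0
```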

## 1.3 Operations

* Tensors can be manipulated mathematically.
* Among the most useful of these manipulations are elementwise operations.
* These apply a standard scalar operation to each element of a tensor.
* For functions with two inputs, elementwise operations apply a standard binary operator to each pair of corresponding elements.

In [14]:
## x shares memory with X (reshape returned a view), so its first eight entries are now 12
torch.exp(x)

tensor([162754.7969, 162754.7969, 162754.7969, 162754.7969, 162754.7969,
        162754.7969, 162754.7969, 162754.7969,   2980.9580,   8103.0840,
         22026.4648,  59874.1406])

* Given any two vectors `u` and `v` of the same shape, and a binary operator $f$, we can produce a vector `c = F(u, v)` by setting $c_i \gets f(u_i, v_i)$ for all $i$, where $c_i$, $u_i$ and $v_i$ are the $i^{\text{th}}$ elements of the vectors `c`, `u` and `v`. Here `F` denotes the elementwise (vectorized) version of the scalar function $f$.

In [15]:
##elementwise arithmetic on two tensors of the same shape,
## i.e. addition, subtraction, multiplication, division
x = torch.randn(2,4)
y = torch.randn(2,4)
x_sum = x+y
x_sub = x-y
x_mul = x*y
x_div = x/y
print(f"Sum:\n {x_sum}")
print(f"Subtraction:\n {x_sub}")
print(f"Multiplication:\n {x_mul}")
print(f"Division:\n {x_div}")

Sum:
 tensor([[-0.4361,  2.8039, -1.7413,  1.0569],
        [ 1.2013,  0.8472,  0.1963,  0.7587]])
Subtraction:
 tensor([[-0.5367,  0.2197, -1.3062, -1.6376],
        [-2.4212,  0.5981,  0.6461, -1.7001]])
Multiplication:
 tensor([[-0.0245,  1.9533,  0.3315, -0.3912],
        [-1.1047,  0.0900, -0.0947, -0.5787]])
Division:
 tensor([[-9.6626,  1.1701,  7.0033, -0.2155],
        [-0.3367,  5.8006, -1.8731, -0.3829]])
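* To connect the operators with the definition $c_i = f(u_i, v_i)$, the sketch below writes the rule out as an explicit Python loop (with $f$ chosen as division, an arbitrary pick for illustration) and checks it against the vectorized operator.

```python
import torch

u = torch.tensor([1.0, 2.0, 4.0, 8.0])
v = torch.tensor([2.0, 2.0, 2.0, 2.0])

# The definition spelled out: c_i = f(u_i, v_i), here with f = division
c_loop = torch.tensor([float(ui) / float(vi) for ui, vi in zip(u, v)])

# The vectorized operator applies the same rule in a single call
c_vec = u / v
print(torch.equal(c_loop, c_vec))  # True
```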


* Beyond elementwise computations, we can also perform linear algebra operations such as dot products and matrix multiplication.
* We can also `concatenate` multiple tensors, stacking them end-to-end to form a larger one. We just need to provide a list of tensors and tell the system along which axis to concatenate.

In [16]:
##concatenating tensors along axis 0 (rows) and axis 1 (columns)
X = torch.arange(12,dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0,1,4,3],[1,2,4,5],[4,5,7,9]])
concat_0 = torch.cat((X,Y),dim=0)
concat_1 = torch.cat((X,Y),dim=1)
print(f"Concatenation along axis 0 (rows):\n{concat_0}")
print(f"Concatenation along axis 1 (columns):\n {concat_1}")

Concatenation along axis 0 (rows):
tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  4.,  5.],
        [ 4.,  5.,  7.,  9.]])
Concatenation along axis 1 (columns):
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  4.,  5.],
        [ 8.,  9., 10., 11.,  4.,  5.,  7.,  9.]])
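* For the dot product and matrix multiplication mentioned above, a minimal sketch follows. Note that, unlike `x * y`, both sum over products of entries, so the result generally has a different shape from the inputs. The tensors here are made up for illustration.

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])
dot = torch.dot(u, v)      # 1*4 + 2*5 + 3*6 = 32
print(dot)                 # tensor(32.)

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(3, 2)
product = A @ B            # (2, 3) times (3, 2) gives (2, 2)
print(product.tolist())    # [[3.0, 3.0], [12.0, 12.0]]
```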


In [17]:
## checking if individual elements in positions
## X[i,j] are equal to their counterparts in Y[i,j]
X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])

* Summing all elements in the tensor yields a tensor with only one element.

In [18]:
X.sum()

tensor(66.)

## 1.4 Broadcasting

* Under certain conditions, even when the shapes of two tensors differ, we can still perform elementwise binary operations by invoking the `broadcasting mechanism`.
* Broadcasting works according to the following two-step procedure:
    1. Expand one or both arrays by copying elements along axes with length 1 so that, after this transformation, the two tensors have the same shape.
    2. Perform an elementwise operation on the resulting arrays.

In [19]:

a = torch.arange(3).reshape((3,1))
b = torch.arange(2).reshape((1,2))
a,b

(tensor([[0],
         [1],
         [2]]),
 tensor([[0, 1]]))

* Since `a` has shape `3x1` and `b` has shape `1x2`, their shapes do not match, so the elementwise addition cannot be applied to them directly.
* With broadcasting, both are expanded to a larger `3x2` matrix: the single column of `a` is replicated along the columns and the single row of `b` is replicated along the rows before the two are added elementwise.

In [20]:
a+b

tensor([[0, 1],
        [1, 2],
        [2, 3]])

* In effect, entry $(i, j)$ of the result is `a[i, 0] + b[0, j]`: the first column of the result adds `b[0, 0] = 0` to every element of `a`, and the second column adds `b[0, 1] = 1`.
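* The two-step procedure can be sketched by hand. The example below uses `expand` to perform the copying step explicitly (one possible way to do it, chosen here for illustration) and checks that the result matches what `a + b` produces automatically.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Step 1: stretch the length-1 axes so both tensors become 3x2
a_big = a.expand(3, 2)
b_big = b.expand(3, 2)

# Step 2: ordinary elementwise addition on equal shapes
manual = a_big + b_big
print(torch.equal(manual, a + b))  # True
```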

## 1.5 Saving memory

* Running operations can cause new memory to be allocated to hold the results.
* For example, if we write `Y = X + Y`, we dereference the tensor that `Y` used to point to and instead point `Y` at newly allocated memory.
* We can demonstrate this with Python's `id()` function, which returns a unique identifier for the referenced object (in CPython, its address in memory).
* After we run `Y = Y + X`, `id(Y)` points to a different location.
* This is because Python first evaluates `Y + X`, allocating new memory for the result, and then points `Y` to this new location.

In [21]:
before = id(Y)
Y = Y+X
after = id(Y)
print(f"Original address of Y :\n {before}")
print(f"New address of Y after operation:\n {after}")


Original address of Y :
 135065117010480
New address of Y after operation:
 135064793736576


* This might be undesirable for two reasons.
   1. We do not want to allocate memory unnecessarily all the time. In machine learning, we often have hundreds of megabytes of parameters and update all of them multiple times per second. Whenever possible, we want to perform these updates in place.
   2. We might point at the same parameters from multiple variables. If we do not update in place, we must be careful to update all these references, lest we spring a memory leak or accidentally refer to stale parameters.

* Performing an in-place operation (one that directly modifies the contents of an existing object without creating a new object in memory) is easy.
* We can assign the result of an operation to a previously allocated array `Y` by using the slice notation `Y[:] = <expression>`.
* To illustrate this, we overwrite the values of a tensor `Z`, after initializing it with `zeros_like` to have the same shape as `Y`.

In [24]:
Z = torch.zeros_like(Y)
print("id(Z) before operations:  ",id(Z))
Z[:] = X+Y
print("id(Z) after operations: ",id(Z))

id(Z) before operations:   135064794087040
id(Z) after operations:  135064794087040


* We can see that the memory address of `Z` is the same before and after the addition.
* If the value of `X` is not reused in subsequent computations, we can also use `X[:] = X + Y` or `X += Y` to reduce the memory overhead of the operation.

In [25]:
before = id(X)
X += Y
after = id(X)
print(f"Original address of X :\n {before}")
print(f"New address of X after operation:\n {after}")

Original address of X :
 135065117009760
New address of X after operation:
 135065117009760
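* The second reason above (several variables pointing at the same tensor) can be sketched as follows. The name `alias` is hypothetical and only stands in for "another reference to the same parameters".

```python
import torch

X = torch.ones(2, 2)
Y = torch.ones(2, 2)
alias = X                        # a second name for the same tensor object

X += Y                           # in place: the shared tensor is modified
same_after_inplace = alias is X
print(same_after_inplace, alias[0, 0].item())   # True 2.0

X = X + Y                        # out of place: X is rebound to a new tensor
same_after_rebind = alias is X
print(same_after_rebind, alias[0, 0].item(), X[0, 0].item())  # False 2.0 3.0
```

* After the out-of-place update, `alias` still holds the old values, which is the "stale parameters" situation described above.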


## 1.6 Conversion to Other Python Objects

* Here we discuss how to convert a torch tensor into a NumPy array (`ndarray`), and vice versa.
* The torch tensor and the NumPy array share their underlying memory, so changing one through an in-place operation also changes the other.

In [29]:
x = torch.rand(3,4)
##converting our torch tensor into a Numpy array
x_numpy = x.numpy()
print(f"X numpy array:\n {x_numpy} with type {x_numpy.dtype}")
print(f"X torch tensor:\n {x} with type {x.dtype}")


X numpy array:
 [[0.3624351  0.27228266 0.43480718 0.08316082]
 [0.51023644 0.9535912  0.88890636 0.3480937 ]
 [0.55317515 0.9988607  0.14375871 0.01171231]] with type float32
X torch tensor:
 tensor([[0.3624, 0.2723, 0.4348, 0.0832],
        [0.5102, 0.9536, 0.8889, 0.3481],
        [0.5532, 0.9989, 0.1438, 0.0117]]) with type torch.float32


In [30]:
##converting numpy array to torch tensor
import numpy as np
y_numpy = np.random.rand(3,4)
y_tensor = torch.from_numpy(y_numpy)
print(f"Y numpy array:\n {y_numpy} with type {y_numpy.dtype}")
print(f"Y torch tensor:\n {y_tensor} with type {y_tensor.dtype}")

Y numpy array:
 [[0.3451951  0.67849378 0.82285378 0.68915998]
 [0.2131686  0.44357157 0.54637831 0.9502011 ]
 [0.55519485 0.23948984 0.35470996 0.8815368 ]] with type float64
Y torch tensor:
 tensor([[0.3452, 0.6785, 0.8229, 0.6892],
        [0.2132, 0.4436, 0.5464, 0.9502],
        [0.5552, 0.2395, 0.3547, 0.8815]], dtype=torch.float64) with type torch.float64
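* The memory sharing described above can be checked directly. The sketch below assumes CPU tensors; it also shows `torch.tensor(...)`, which copies the data and therefore does not track later changes to the array.

```python
import numpy as np
import torch

t = torch.zeros(3)
n = t.numpy()             # shares memory with t
t += 1                    # in-place update on the tensor
print(n)                  # [1. 1. 1.]

m = np.zeros(3)
s = torch.from_numpy(m)   # shares memory with m
m[0] = 5.0                # in-place update on the array
print(s[0].item())        # 5.0

copy = torch.tensor(m)    # torch.tensor() makes a copy
m[1] = 7.0
print(copy[1].item())     # 0.0, the copy is unaffected
```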


In [31]:
## converting a size-1 (one-dimensional) tensor to a Python scalar
a = torch.tensor([3.5])
a,a.item(),float(a),int(a)

(tensor([3.5000]), 3.5, 3.5, 3)

## 1.7 Practice problems

* Create two tensors `X` and `Y`, change the conditional statement `X == Y` to `X < Y` or `X > Y`, and see what kind of tensor you get.

In [34]:
X = torch.rand(5,4)
Y = torch.rand(5,4)
X, Y

(tensor([[0.6070, 0.7414, 0.1732, 0.2428],
         [0.9420, 0.2906, 0.7648, 0.6884],
         [0.2344, 0.3913, 0.0099, 0.2126],
         [0.9556, 0.8379, 0.6033, 0.5656],
         [0.1678, 0.6160, 0.5200, 0.7301]]),
 tensor([[0.3945, 0.6983, 0.2189, 0.6039],
         [0.2304, 0.8795, 0.6217, 0.0367],
         [0.2784, 0.0497, 0.4455, 0.0983],
         [0.8323, 0.9304, 0.7798, 0.3126],
         [0.6691, 0.4858, 0.8877, 0.0028]]))

In [32]:
X > Y

tensor([[ True,  True, False, False],
        [ True, False,  True,  True],
        [False,  True, False,  True],
        [ True, False, False,  True],
        [False,  True, False,  True]])

In [35]:
X < Y

tensor([[False, False,  True,  True],
        [False,  True, False, False],
        [ True, False,  True, False],
        [False,  True,  True, False],
        [ True, False,  True, False]])

* Replace the two tensors that operate elementwise in the broadcasting mechanism with tensors of other shapes, e.g., 3-dimensional tensors.

In [36]:
M = torch.randn(5,4)
N = torch.rand(1,4)
M, N

(tensor([[ 0.1268, -0.6906,  0.4353,  0.6531],
         [-0.7315,  2.0004,  0.7396, -0.0297],
         [-0.8235,  1.1948,  0.6285,  0.7980],
         [-0.5931, -0.4258, -0.2687,  1.0047],
         [ 0.0716, -0.9957,  0.6677,  1.3002]]),
 tensor([[0.8118, 0.2217, 0.5634, 0.6096]]))

In [37]:
P = M + N
P

tensor([[ 0.9386, -0.4689,  0.9987,  1.2626],
        [ 0.0803,  2.2221,  1.3030,  0.5799],
        [-0.0117,  1.4165,  1.1919,  1.4075],
        [ 0.2187, -0.2041,  0.2947,  1.6142],
        [ 0.8834, -0.7741,  1.2311,  1.9097]])
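* The same exercise with 3-dimensional tensors might look like the sketch below. The shapes are chosen arbitrarily for illustration: every axis where the sizes differ has length 1 in one of the operands, so broadcasting applies. The last lines show that shapes which differ on an axis where neither size is 1 are rejected.

```python
import torch

A = torch.arange(6).reshape(2, 3, 1)
B = torch.arange(4).reshape(1, 1, 4)

C = A + B                 # length-1 axes are stretched to (2, 3, 4)
print(C.shape)            # torch.Size([2, 3, 4])
print(C[1, 2].tolist())   # [5, 6, 7, 8]

# Sizes 3 and 4 on the last axis cannot be broadcast against each other
try:
    torch.ones(2, 3) + torch.ones(2, 4)
    failed = False
except RuntimeError:
    failed = True
print(failed)             # True
```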