# Introduction to PyTorch tensors (and a bit of linear algebra)
This small notebook serves two purposes: an introduction to Jupyter notebooks and to PyTorch. By now you should have installed the provided Conda environment or set up your own. If you have not started the notebook yet and are just reading it on screen, do the following:

1) activate the conda environment: 
<br> <center>
    **conda activate ai_lecture**
</center>

2) change into the folder where your notebook is located and call: 
<br> <center>
    **jupyter notebook**
</center>

A browser window should open, and you should see this notebook.

<br>

First, we import PyTorch and additional libraries for plotting.

In [1]:
import torch
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib notebook
sns.set_theme()
sns.set_style("darkgrid")

The main object we will work with is torch.Tensor. As already mentioned in the lecture slides, even though we call them tensors, mathematically they are not tensors but multidimensional arrays. There are multiple constructors. Below, we create some tensors.

In [2]:
# create an uninitialized ("empty") tensor; its values are whatever happens to be in memory
x1 = torch.Tensor(10)
print(x1)
print(f'dim(x1) = {x1.shape}')

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
dim(x1) = torch.Size([10])


In [3]:
# torch.empty is the more explicit way to create an uninitialized tensor
x2 = torch.empty(10)
print(x2)
print(f'dim(x2) = {x2.shape}')

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
dim(x2) = torch.Size([10])


In [4]:
# create an uninitialized 3x3 tensor (separate integers are interpreted as a shape)
x3 = torch.Tensor(3,3)
print(x3)
print(f'dim(x3) = {x3.shape}')

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
dim(x3) = torch.Size([3, 3])


In [5]:
# careful: a tuple is interpreted as data, not as a shape -> 1D tensor with the entries 3 and 3
x4 = torch.Tensor((3,3))
print(x4)
print(f'dim(x4) = {x4.shape}')

tensor([3., 3.])
dim(x4) = torch.Size([2])
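
Because torch.Tensor(...) accepts both a shape and data, the two are easy to mix up. The following sketch (assuming a reasonably recent PyTorch version) contrasts it with torch.empty, which always interprets its arguments as a shape, and the lower-case torch.tensor, which always interprets its argument as data:

```python
import torch

# torch.empty interprets its arguments as a shape, given separately or as a tuple
a = torch.empty(3, 3)
b = torch.empty((3, 3))
print(a.shape, b.shape)  # both 3x3; the values are uninitialized

# torch.tensor (lower case) always interprets its argument as data
c = torch.tensor((3, 3))
print(c)  # tensor([3, 3])
```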


In [6]:
# create zero tensor 
x5 = torch.zeros(3,3)
print(x5)
print(f'dim(x5) = {x5.shape}')

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
dim(x5) = torch.Size([3, 3])


In [7]:
# create tensor filled with ones
x6 = torch.ones(3,3)
print(x6)
print(f'dim(x6) = {x6.shape}')

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
dim(x6) = torch.Size([3, 3])


In [8]:
# create integer tensor filled with ones (dtype=int maps to torch.int64)
x7 = torch.ones(3,3, dtype=int)
print(x7)
print(f'dim(x7) = {x7.shape}')

tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])
dim(x7) = torch.Size([3, 3])


In [9]:
# create a float tensor and convert it to an integer tensor (torch.int32)
x8 = torch.ones(2,3).int()
print(x8)
print(f'dim(x8) = {x8.shape}')

tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int32)
dim(x8) = torch.Size([2, 3])


In [10]:
# create a float tensor and convert it to a long integer tensor (torch.int64)
x9 = torch.ones(2,3,4).long()
print(x9)
print(f'dim(x9) = {x9.shape}')

tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]],

        [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]])
dim(x9) = torch.Size([2, 3, 4])
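
Why does the output of x8 show dtype=torch.int32 while x7 and x9 do not? PyTorch only prints the dtype when it differs from the default for that kind of data, and the default integer type is torch.int64. A quick check of the dtypes, assuming the default floating point type has not been changed:

```python
import torch

x7 = torch.ones(3, 3, dtype=int)  # Python int maps to torch.int64
x8 = torch.ones(2, 3).int()       # .int() converts to torch.int32
x9 = torch.ones(2, 3, 4).long()   # .long() converts to torch.int64
f = torch.ones(3, 3)              # default floating point type, normally torch.float32

for name, t in [("x7", x7), ("x8", x8), ("x9", x9), ("f", f)]:
    print(name, t.dtype)
```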


In [11]:
# create a tensor in R^(2x2x2x2) filled with 2s
x10 = torch.ones(2,2,2,2)*2
print(x10)
print(f'dim(x10) = {x10.shape}')

tensor([[[[2., 2.],
          [2., 2.]],

         [[2., 2.],
          [2., 2.]]],


        [[[2., 2.],
          [2., 2.]],

         [[2., 2.],
          [2., 2.]]]])
dim(x10) = torch.Size([2, 2, 2, 2])


In [12]:
# init tensor via lists 
x11 = torch.tensor([[1,2,3],[4,5,6]])
print(x11)
print(f'dim(x11) = {x11.shape}')

tensor([[1, 2, 3],
        [4, 5, 6]])
dim(x11) = torch.Size([2, 3])


In [13]:
# init tensor via range
x12 = torch.tensor(range(5))
print(x12)
print(f'dim(x12) = {x12.shape}')

tensor([0, 1, 2, 3, 4])
dim(x12) = torch.Size([5])


In [14]:
# init tensor via arange
x13 = torch.arange(5)
print(x13)
print(f'dim(x13) = {x13.shape}')

tensor([0, 1, 2, 3, 4])
dim(x13) = torch.Size([5])
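
torch.arange also accepts a start value and a step size, and like Python's range it excludes the end value. If you need a fixed number of evenly spaced points including the end value, torch.linspace is usually more convenient. A small sketch:

```python
import torch

# arange(start, end, step): the end value is excluded
ar = torch.arange(0, 1, 0.25)
print(ar)   # tensor([0.0000, 0.2500, 0.5000, 0.7500])

# linspace(start, end, steps): the end value is included, the number of points is fixed
lin = torch.linspace(0, 1, 5)
print(lin)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```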


In [15]:
# init random tensor, uniformly distributed on [0, 1)
x14 = torch.rand(3,3)
print(x14)
print(f'dim(x14) = {x14.shape}')

tensor([[0.6012, 0.6950, 0.5248],
        [0.0682, 0.9152, 0.6645],
        [0.3765, 0.7342, 0.0662]])
dim(x14) = torch.Size([3, 3])


In [16]:
# init random tensor drawn from the standard normal distribution
x15 = torch.randn(3,3)
print(x15)
print(f'dim(x15) = {x15.shape}')

tensor([[-2.5130, -0.4173,  0.8839],
        [ 1.6019, -0.0243,  0.0776],
        [-0.5087,  2.3142,  0.7596]])
dim(x15) = torch.Size([3, 3])
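
Since the values above are random, your output will differ from the one shown here. If you need reproducible results, you can fix the seed of PyTorch's random number generator; a minimal sketch (the seed 0 is an arbitrary choice):

```python
import torch

torch.manual_seed(0)
a = torch.rand(3, 3)
torch.manual_seed(0)  # reset the generator to the same state
b = torch.rand(3, 3)
print(torch.equal(a, b))  # True: same seed, same random numbers
```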


In [17]:
# change view of tensor
x16 = torch.arange(10)
print(x16)
print(f'dim(x16) = {x16.shape}')
x16_ = x16.view(2,5)
print(x16_)
print(f'dim(x16) = {x16_.shape}')

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
dim(x16) = torch.Size([10])
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
dim(x16) = torch.Size([2, 5])


In [18]:
# change view of tensor; -1 means "infer this dimension from the number of elements"
x17 = torch.arange(10)
print(x17)
print(f'dim(x17) = {x17.shape}')
x17_ = x17.view(2,-1)
print(x17_)
print(f'dim(x17) = {x17_.shape}')

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
dim(x17) = torch.Size([10])
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
dim(x17) = torch.Size([2, 5])


In [19]:
# reshape tensor
x18 = torch.arange(10)
print(x18)
print(f'dim(x18) = {x18.shape}')
x18_ = x18.reshape(2,-1)
print(x18_)
print(f'dim(x18) = {x18_.shape}')

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
dim(x18) = torch.Size([10])
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
dim(x18) = torch.Size([2, 5])
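
view and reshape look identical here, but they differ: view never copies data and therefore only works if the requested shape is compatible with the tensor's memory layout, while reshape returns a view when possible and otherwise falls back to a copy. A sketch using a transposed, and therefore non-contiguous, tensor:

```python
import torch

t = torch.arange(6).view(2, 3).T  # transposing yields a non-contiguous tensor
print(t.is_contiguous())          # False

view_failed = False
try:
    t.view(6)                     # view needs a compatible memory layout
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

r = t.reshape(6)                  # reshape copies the data if necessary
print(r)                          # tensor([0, 3, 1, 4, 2, 5])
```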


In [20]:
# "=" only assigns a reference to the same tensor, it does not copy the data
# how can we create a real copy?
x19 = torch.rand(2,2)
print(x19)
x19_ = x19.clone()
x19_[0,0] = -1.
print(x19)
print(x19_)

tensor([[0.7193, 0.7052],
        [0.2441, 0.1376]])
tensor([[0.7193, 0.7052],
        [0.2441, 0.1376]])
tensor([[-1.0000,  0.7052],
        [ 0.2441,  0.1376]])
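
To make the difference between assignment, views and copies explicit, here is a small sketch (variable names are arbitrary) that modifies a tensor and inspects the other names afterwards:

```python
import torch

a = torch.zeros(2, 2)
b = a          # "=" only binds a second name to the same tensor
c = a.view(4)  # a view shares the underlying memory
d = a.clone()  # clone allocates new memory and copies the values

a[0, 0] = 7.0
print(b[0, 0].item(), c[0].item(), d[0, 0].item())  # 7.0 7.0 0.0
```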


In [21]:
# indexing: we have already seen that we can index via []
# e.g.
x20 = torch.rand(2,2)
print(x20)
print("\nindex element 0,0:")
print(x20[0,0])
print("\nget first column:")
print(x20[:,0])
print("\nget first row:")
print(x20[0,:])

tensor([[0.3473, 0.6251],
        [0.0852, 0.0093]])

index element 0,0:
tensor(0.3473)

get first column:
tensor([0.3473, 0.0852])

get first row:
tensor([0.3473, 0.6251])


In [22]:
# indexing: we can also assign values
print("\nset element 0,0:")
x20[0,0] = -5
print(x20)
print("\nset first column:")
x20[:,0] = 5
print(x20)
print("\nset first row:")
x20[0,:] = -3
print(x20)


set element 0,0:
tensor([[-5.0000,  0.6251],
        [ 0.0852,  0.0093]])

set first column:
tensor([[5.0000, 0.6251],
        [5.0000, 0.0093]])

set first row:
tensor([[-3.0000, -3.0000],
        [ 5.0000,  0.0093]])


In [23]:
# compare elements
x21 = torch.rand(5,5)
print('create random matrix:')
print(x21)
x21_geq = x21 >= 0.5
print('\nelements >= 0.5:')
print(x21_geq)
print("\ncan be used for indexing:")
print(x21[x21_geq])
print("\nor for assigning values:")
x21[x21_geq] = 0.0
print(x21)
print("\nwe can also write the condition directly:")
x21[x21 == 0.0] = 1.0
print(x21)

create random matrix:
tensor([[0.0407, 0.1988, 0.0360, 0.0462, 0.2466],
        [0.8939, 0.3612, 0.8495, 0.0500, 0.3038],
        [0.0247, 0.9837, 0.6193, 0.7061, 0.5910],
        [0.6764, 0.4181, 0.5315, 0.0891, 0.7415],
        [0.8460, 0.4814, 0.8796, 0.9385, 0.0199]])

elements >= 0.5:
tensor([[False, False, False, False, False],
        [ True, False,  True, False, False],
        [False,  True,  True,  True,  True],
        [ True, False,  True, False,  True],
        [ True, False,  True,  True, False]])

can be used for indexing:
tensor([0.8939, 0.8495, 0.9837, 0.6193, 0.7061, 0.5910, 0.6764, 0.5315, 0.7415,
        0.8460, 0.8796, 0.9385])

or for assigning values:
tensor([[0.0407, 0.1988, 0.0360, 0.0462, 0.2466],
        [0.0000, 0.3612, 0.0000, 0.0500, 0.3038],
        [0.0247, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.4181, 0.0000, 0.0891, 0.0000],
        [0.0000, 0.4814, 0.0000, 0.0000, 0.0199]])

we can also write the condition directly:
tensor([[0.0407, 0.1988, 0.0360, 0.0462, 0.2466],
        [1.0000, 0.3612, 1.0000, 0.0500, 0.3038],
        [0.0247, 1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 0.4181, 1.0000, 0.0891, 1.0000],
        [1.0000, 0.4814, 1.0000, 1.0000, 0.0199]])
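
Assigning through a mask modifies the tensor in place. If you want to keep the original, torch.where builds a new tensor instead; a minimal sketch assuming the same 0.5 threshold as above:

```python
import torch

x = torch.rand(5, 5)
mask = x >= 0.5
# out-of-place alternative to x[mask] = 0.0: entries where mask is True come from
# the zero tensor, all others from x; x itself stays unchanged
y = torch.where(mask, torch.zeros_like(x), x)
print(y)
```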

Products of two vectors $\boldsymbol{v}, \boldsymbol{w} \in \mathbb{R}^d$:
1) element-wise (Hadamard) product

<br><center>
    $\boldsymbol{u} = 
    \boldsymbol{v} \odot \boldsymbol{w} =  
    \begin{bmatrix}
         v_1 \cdot w_1  \\
         \vdots \\
         v_d \cdot w_d
    \end{bmatrix}$  
</center>
 
2) "scalar product" or "inner product", with various notations:
<br><center>
    $\sum_{i=1}^d v_i \cdot w_i = 
    \langle \boldsymbol{v}, \boldsymbol{w} \rangle = 
    \boldsymbol{v}^T \boldsymbol{w} = 
    \begin{bmatrix}
         v_1 & \ldots & v_d
    \end{bmatrix} \cdot
    \begin{bmatrix}
         w_1 \\ \vdots  \\ w_d
    \end{bmatrix}$
</center>
 
    
3) the "outer product" is defined as
<br><center>
    $\boldsymbol{v} \boldsymbol{w}^T = 
    \begin{bmatrix}
         v_1 \\ \vdots \\ v_d
    \end{bmatrix} \cdot
    \begin{bmatrix}
         w_1 & \ldots  & w_d
    \end{bmatrix} =
    \begin{bmatrix}
         v_1 w_1 & \ldots  & v_1 w_d \\
          \vdots & \ddots & \vdots \\
         v_d w_1 & \ldots & v_d w_d
    \end{bmatrix}$
</center>
    
    
The length of a vector in Euclidean space is measured by the Euclidean norm:
<br><center>
     $||\boldsymbol{v}|| = \sqrt{ \boldsymbol{v}^T \boldsymbol{v}} = \sqrt{ \sum_{i=1}^d v_i^2}$
</center>

In [24]:
# vector vector multiplication
# elementwise
v = torch.ones(5)*2
w = torch.rand(5)
print(f'v: {v}')
print(f'w: {w}')
u = v*w
print(f'u: {u}')


# sum up all elements
s = u.sum()
print(f's: {s}')


# inner product: multiple options
ip1 = v@w              # matrix multiplication operator
print(f'ip1: {ip1}')
ip2 = v.dot(w)         # tensor method
print(f'ip2: {ip2}')
ip3 = torch.dot(v, w)  # function from the torch namespace
print(f'ip3: {ip3}')


# outer product 1
op1 = v.outer(w)
print(f'op1: {op1}')


# outer product 2: column vector (5x1) times row vector (1x5)
op2 = v.view(-1,1) @ w.view(1,-1)
print(f'op2: {op2}')


# calculate the norm (length) of vector v
n1 = v.norm()
print(f'n1: {n1}')


# calculate the norm via the inner product
n2 = (v*v).sum().sqrt()
print(f'n2: {n2}')
n3 = v.dot(v).sqrt()
print(f'n3: {n3}')

v: tensor([2., 2., 2., 2., 2.])
w: tensor([0.0927, 0.5078, 0.8095, 0.2511, 0.1804])
u: tensor([0.1854, 1.0157, 1.6189, 0.5023, 0.3608])
s: 3.6830899715423584
ip1: 3.6830899715423584
ip2: 3.6830899715423584
ip3: 3.6830899715423584
op1: tensor([[0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608]])
op2: tensor([[0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608],
        [0.1854, 1.0157, 1.6189, 0.5023, 0.3608]])
n1: 4.4721360206604
n2: 4.4721360206604
n3: 4.4721360206604


In [25]:
w.outer(v)

tensor([[0.1854, 0.1854, 0.1854, 0.1854, 0.1854],
        [1.0157, 1.0157, 1.0157, 1.0157, 1.0157],
        [1.6189, 1.6189, 1.6189, 1.6189, 1.6189],
        [0.5023, 0.5023, 0.5023, 0.5023, 0.5023],
        [0.3608, 0.3608, 0.3608, 0.3608, 0.3608]])
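
The output above is the transpose of op1: the outer product is not commutative, since $\boldsymbol{w}\boldsymbol{v}^T = (\boldsymbol{v}\boldsymbol{w}^T)^T$. The following sketch checks this with small, arbitrarily chosen vectors; it also shows that, unlike the inner product, the outer product does not require both vectors to have the same length:

```python
import torch

v = torch.tensor([1., 2., 3.])
w = torch.tensor([4., 5.])
vw = v.outer(w)  # shape (3, 2): entry (i, j) is v_i * w_j
wv = w.outer(v)  # shape (2, 3): entry (i, j) is w_i * v_j
print(vw.shape, wv.shape)
print(torch.equal(wv, vw.T))  # True: swapping the factors transposes the result
```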

In [26]:
# creating 5x5 identity matrix 
I = torch.eye(5)
print(I)

tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])


In [27]:
# create N=200 normally distributed data points in R^2 (one point per column)
X = torch.randn(2,200)
# plot the data points
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
plt.show()


In [28]:
# create a new dataset by adding the vector [20,20] to every data point
b = torch.tensor([20,20])
X2 = X + b.view(2,1) # why do we need to change the view?
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=X2[0,:], y=X2[1,:])
plt.show()


As you can see, adding a vector translates our data points. In detail, the operation is
<br><center>
    $X_2 = X + \boldsymbol{b} = \begin{bmatrix}
                            x_{1,1} & x_{1,2} \\
                            x_{2,1} & x_{2,2}
                           \end{bmatrix} + 
                           \begin{bmatrix}
                            b_1 \\
                            b_2
                           \end{bmatrix} =
                            \begin{bmatrix}
                            x_{1,1} + b_1 & x_{1,2} + b_1 \\
                            x_{2,1} + b_2 & x_{2,2} + b_2
                           \end{bmatrix}$
</center>
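
The question in the code comment above comes down to PyTorch's broadcasting rules: shapes are compared from the last dimension backwards, and a dimension of size 1 is stretched to match the other tensor. The sketch below reuses the shapes from above; the explicit expand only serves to make the stretching visible:

```python
import torch

X = torch.randn(2, 200)
b = torch.tensor([20., 20.])

# b.view(2, 1) has shape (2, 1); broadcasting stretches it along the 200 columns
X2 = X + b.view(2, 1)
X2_explicit = X + b.view(2, 1).expand(2, 200)
print(torch.equal(X2, X2_explicit))  # True

# with shape (2,) the trailing dimensions (200 vs. 2) do not match
broadcast_failed = False
try:
    X + b
except RuntimeError as err:
    broadcast_failed = True
    print("broadcasting failed:", err)
```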

Below, instead of adding a vector (i.e. translating the data points), we multiply the data points with a matrix.

In [29]:
# create a 2x2 diagonal matrix
W = torch.zeros(2,2)
W[0,0] = -0.5
W[1,1] = -6.
print(f'W = {W}')
# now create a new dataset according to X3 = W X (no translation this time)
X3 = W@X
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=X2[0,:], y=X2[1,:])
sns.scatterplot(x=X3[0,:], y=X3[1,:])
plt.show()

W = tensor([[-0.5000,  0.0000],
        [ 0.0000, -6.0000]])



The matrix we multiply our data with is $W = \begin{bmatrix}
                            -0.5 & 0 \\
                            0 & -6
                           \end{bmatrix}$.
                           
Writing it out (each column of $X$ is one data point):

<br><center>
     $ W X = \begin{bmatrix}
                            -0.5 & 0 \\
                            0 & -6
                           \end{bmatrix}
                            \begin{bmatrix}
                            x_{1,1} & x_{1,2} \\
                            x_{2,1} & x_{2,2}
                           \end{bmatrix} =
                            \begin{bmatrix}
                            -0.5 ~ x_{1,1} & -0.5 ~ x_{1,2} \\
                            -6 ~ x_{2,1} & -6 ~ x_{2,2}
                           \end{bmatrix}$
</center>
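
The same row-wise scaling can be checked numerically. A sketch that rebuilds the diagonal matrix with torch.diag (an alternative to filling a zero matrix by hand):

```python
import torch

X = torch.randn(2, 200)
W = torch.diag(torch.tensor([-0.5, -6.0]))  # same diagonal matrix as above
X3 = W @ X
# row i of W @ X holds the i-th coordinate of every point, scaled by W[i, i]
print(torch.allclose(X3[0], -0.5 * X[0]), torch.allclose(X3[1], -6.0 * X[1]))
```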

<br>

We see that multiplying our data with a diagonal matrix scales it along the $x$ and $y$ axes (negative entries additionally mirror it). In the following cell we will add values $\neq 0$ to the off-diagonal entries.

In [30]:
# now create a 2x2 matrix that also has off-diagonal entries
# what happens if you switch the sign of the off-diagonal entries?
W2 = torch.zeros(2,2)
W2[0,0] = 0.5
W2[1,1] = -4.
W2[0,1] = 2
W2[1,0] = -2
print(f'W2 = {W2}')
# now create a new dataset according to X4 = W2 X
X4 = W2@X
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=X2[0,:], y=X2[1,:])
sns.scatterplot(x=X3[0,:], y=X3[1,:])
sns.scatterplot(x=X4[0,:], y=X4[1,:])
plt.show()

W2 = tensor([[ 0.5000,  2.0000],
        [-2.0000, -4.0000]])



As we can see, multiplying the data with a matrix can not only scale the data but also transform it in other ways, e.g. rotate it.
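
To see that a matrix can also rotate the data, consider the standard 2D rotation matrix. The sketch below uses an arbitrarily chosen angle of 45 degrees and checks that the rotation changes the direction of each data point but not its length:

```python
import math
import torch

theta = math.pi / 4  # rotate by 45 degrees (arbitrary choice)
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])
X = torch.randn(2, 200)
XR = R @ X
# norm over dim=0 gives the length of every point (column)
print(torch.allclose(XR.norm(dim=0), X.norm(dim=0), atol=1e-5))  # True
```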
Do we lose information when multiplying the data with a matrix? Let's see if we can find another matrix that reverts the operation from before.

In [31]:
W2inv = torch.inverse(W2)
print(f'W2inv = {W2inv}')
Xinv = W2inv@X4  # undo the transformation
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=X2[0,:], y=X2[1,:])
sns.scatterplot(x=X3[0,:], y=X3[1,:])
sns.scatterplot(x=Xinv[0,:], y=Xinv[1,:])
plt.show()

W2inv = tensor([[-2.0000, -1.0000],
        [ 1.0000,  0.2500]])



We see that we recovered the original data cloud exactly (up to floating point rounding). What we have calculated above is the so-called inverse matrix ${W_2}^{-1}$ of $W_2$. For a matrix $B$ to be the inverse matrix of another matrix $A$, it must hold that:
<br><center>
       $B A = A B = A^{-1} A = A A^{-1} = I$
</center>
where $I$ is the identity matrix we already know from above. This can only be true if $A$ is a square matrix. But do all square matrices have an inverse? No! Square matrices without an inverse are called "singular" matrices; non-square matrices have no (two-sided) inverse at all. We won't go deeper into this topic for the moment. 
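
A numerical sketch of both statements: we verify that the inverse computed by torch.linalg.inv (equivalent to torch.inverse used above) satisfies the definition up to rounding errors, and we try to invert a hand-picked singular matrix whose second row is twice its first. Catching RuntimeError is an assumption based on recent PyTorch versions, where the raised torch.linalg.LinAlgError is a subclass of RuntimeError:

```python
import torch

W2 = torch.tensor([[0.5, 2.0], [-2.0, -4.0]])
W2inv = torch.linalg.inv(W2)
I = torch.eye(2)
print(torch.allclose(W2inv @ W2, I), torch.allclose(W2 @ W2inv, I))  # True True

# second row is twice the first row, so the matrix is singular (determinant 0)
S = torch.tensor([[1.0, 2.0], [2.0, 4.0]])
print(torch.linalg.det(S))
inversion_failed = False
try:
    torch.linalg.inv(S)
except RuntimeError as err:  # LinAlgError in recent versions, a RuntimeError subclass
    inversion_failed = True
    print("no inverse:", err)
```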

<br>

Matrix multiplications are so-called "linear mappings". Linear mappings are a general concept in mathematics with the following characteristics: a mapping $f$ is linear if
<br><center>
       $f(x + y) = f(x) + f(y)$ and $f(\lambda x) = \lambda f(x)$
</center>
    
For our matrix multiplication $f(X) = AX$ this means $A(X + Y) = AX + AY$ and $A(\lambda X) = \lambda AX$. The product is linear in the matrix as well: if we multiply our data $X$ with the sum of two matrices $A + B$ or with a scaled matrix $\lambda A$, we have
<br><center>
       $(A + B)X = AX + BX$ and $(\lambda A)X = \lambda (AX) = A(\lambda X)$
</center>
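
These properties can be checked numerically for random matrices and data; because of floating point rounding we compare with torch.allclose instead of exact equality. A minimal sketch ($\lambda = 3$ is arbitrary):

```python
import torch

A = torch.randn(2, 2)
B = torch.randn(2, 2)
X = torch.randn(2, 200)
Y = torch.randn(2, 200)
lam = 3.0

additive = torch.allclose(A @ (X + Y), A @ X + A @ Y, atol=1e-5)
homogeneous = torch.allclose(A @ (lam * X), lam * (A @ X), atol=1e-5)
linear_in_matrix = torch.allclose((A + B) @ X, A @ X + B @ X, atol=1e-5)
print(additive, homogeneous, linear_in_matrix)  # True True True
```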

    
All transformations that can be expressed as a matrix multiplication plus a translation are so-called "affine transformations". The most basic neural networks (fully connected networks) consist of several such affine transformations stacked on each other, separated by pointwise nonlinear functions. 
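
In PyTorch, an affine transformation with learnable parameters is available as torch.nn.Linear. Note that it expects one data point per row and computes $x W^T + b$, so we transpose our column-wise data. The following sketch checks this against the explicit formula and stacks two affine layers with a pointwise nonlinearity in between; the layer sizes and the seed are arbitrary choices:

```python
import torch

torch.manual_seed(0)
X = torch.randn(2, 200)        # 200 points as columns, as in the cells above
layer = torch.nn.Linear(2, 2)  # holds a weight matrix (2x2) and a bias vector (2)

# nn.Linear expects one data point per row, so we transpose in and out
out = layer(X.T).T
manual = layer.weight @ X + layer.bias.view(-1, 1)
print(torch.allclose(out, manual, atol=1e-6))  # True

# two affine layers separated by a pointwise nonlinearity: a tiny fully connected network
net = torch.nn.Sequential(torch.nn.Linear(2, 8), torch.nn.Sigmoid(), torch.nn.Linear(8, 1))
print(net(X.T).shape)  # one output per data point: torch.Size([200, 1])
```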

Let us apply a pointwise nonlinear function to the output of an affine transformation of our input data. What does "pointwise" mean? Suppose we have a function $\sigma: \mathbb{R} \rightarrow \mathbb{R}$ and we apply it to a vector(!) $\boldsymbol{v} \in \mathbb{R}^d$. What we really do is the following:
<br><center>
    $\sigma(\boldsymbol{v}) = \begin{bmatrix}
         \sigma(v_1) \\ \vdots  \\ \sigma(v_d)
    \end{bmatrix}$
</center>
i.e. we apply the function to every entry. PyTorch makes this very convenient: if we pass a multidimensional array to a scalar function such as torch.sigmoid, it applies the function to every entry of the array.
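
A small sketch confirming that torch.sigmoid, $\sigma(z) = 1/(1 + e^{-z})$, acts entry by entry: applying it to the whole vector gives the same result as applying the scalar formula to each entry in a Python loop (the vector is an arbitrary example):

```python
import math
import torch

v = torch.tensor([-2.0, 0.0, 3.0])
s = torch.sigmoid(v)  # applied to every entry at once
s_loop = torch.tensor([1.0 / (1.0 + math.exp(-vi)) for vi in v.tolist()])
print(s.shape)                    # same shape as v: torch.Size([3])
print(torch.allclose(s, s_loop))  # True
```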

In [None]:
# let's take X, apply an affine transformation and then a pointwise nonlinearity!
W3 = torch.rand(2,2)
b3 = torch.rand(2)
WXb = W3@X + b3.view(-1,1)
sX = torch.sigmoid(WXb)
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=WXb[0,:], y=WXb[1,:])
sns.scatterplot(x=sX[0,:], y=sX[1,:])
plt.show()

In [33]:
# let's apply a different pointwise nonlinearity to the same affine transformation!
sX = torch.tanh(WXb)
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=WXb[0,:], y=WXb[1,:])
sns.scatterplot(x=sX[0,:], y=sX[1,:])
plt.show()


In [34]:
# let's apply yet another pointwise nonlinearity!
# can you guess what the function looks like?
sX = torch.relu(WXb)
plt.close()
sns.scatterplot(x=X[0,:], y=X[1,:])
sns.scatterplot(x=WXb[0,:], y=WXb[1,:])
sns.scatterplot(x=sX[0,:], y=sX[1,:])
plt.show()

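
The shapes of the transformed point clouds reflect the ranges of the nonlinearities: sigmoid maps to (0, 1), tanh to (-1, 1), and relu sets all negative entries to zero. A sketch that checks two identities behind this, relu(z) = max(z, 0) and tanh(z) = 2 sigmoid(2z) - 1, on a few arbitrary points:

```python
import torch

z = torch.linspace(-3, 3, 7)
# relu keeps non-negative entries and sets negative ones to zero
relu_ok = torch.equal(torch.relu(z), torch.clamp(z, min=0))
# tanh is a shifted and rescaled sigmoid
tanh_ok = torch.allclose(torch.tanh(z), 2 * torch.sigmoid(2 * z) - 1, atol=1e-6)
print(relu_ok, tanh_ok)  # True True
```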

Tensors also have two important attributes: .data and .grad

In [35]:
X5 = torch.rand(2,2)
print(f'X5 = {X5}')
print(f'X5.data = {X5.data}')
print(f'X5.grad = {X5.grad}')

X5 = tensor([[0.2207, 0.2692],
        [0.7696, 0.6550]])
X5.data = tensor([[0.2207, 0.2692],
        [0.7696, 0.6550]])
X5.grad = None


We will see later what they are used for!