# The Mathematics Behind Data Science

This workshop will cover material ranging from what a vector is all the way to L<sup>p</sup> norms, loss functions, and gradient descent! We want to emphasize that a strong math background is not required for this workshop, as we'll be presenting the material in a beginner-oriented, hands-on way. That means we will introduce material both in terms of what you may code up in any given project and in terms of the abstract math objects that represent it. In the simplest case, a vector can be described as a 1D array, but that's not enough to justify many of the techniques employed in deep learning. To go further, we will dive into the math that powers the code. Specifically, this workshop aims to cover:

- Tensors
    - Vectors
    - Matrices
    - Tensors (multi-dimensional arrays)

- Functional Operations
    - Sum/Product reductions
    - Tensor tiling
- Norms
    - Euclidean distance
    - Metrics
    - L<sup>1</sup>, L<sup>2</sup>, L<sup>p</sup> norms
- Loss Functions
    - Mean Squared Error
    - Cross Entropy
- Dimensionality Reduction
    - Principal Component Analysis

## Tensors and Data

### Vectors

Vectors form the basis for the vast majority of data science but are especially significant in deep learning! To begin with, vectors are the most fundamental way by which we represent data. Ultimately, a vector can be considered a "list" of numbers, with each number being assigned an **index**. These indices give us a way to explicitly refer to the individual elements composing the vector. Beyond that, we'll also talk about the operations that can be performed on vectors, and more complex ways to manipulate them. Python doesn't come with any data type that's particularly nice for what we intend to do, so instead we look to NumPy's arrays. NumPy provides us an incredibly convenient way to represent such an abstraction, and we'll explore it in great depth throughout this workshop.

In [1]:
import numpy as np
np.set_printoptions(suppress=True) # Convenience setting. Can safely ignore


a=[1,2,3,4] # Standard Python list
b=np.array([5,6,7]) # Numpy array made based on the given python list
c=np.array(a) # Another way to create a numpy array based on an existing list

print(a) 
print(b) # Note the numpy array prints without commas ","
print(c) # This cleans up the print statement and is helpful for larger constructions

print()
print(a[0])
print(b[1]) # Access elements just as you regularly would
print(c[2])


[1, 2, 3, 4]
[5 6 7]
[1 2 3 4]

1
6
3


Mathematically, a vector is referred to as $N$ *dimensional* if and only if it has exactly $N$ *elements*. A vector is also a *tensor of order 1*, or alternatively, a *first order tensor* (we'll use the first naming scheme). We're going to draw a sharp distinction between dimensionality and order. In a lot of data science literature, these two are interchangeably referred to as *dimension*, but that leads to way too many headaches for us to justify using it. We've defined dimensionality for our purposes, so what is order? Well, roughly speaking, the order of a vector/matrix/tensor is **the number of indices required to specify which element you're talking about**. So again, a vector is a first order tensor, meaning we need only one index to exactly specify which element we want to access. This is shown in our last example!
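NumPy happens to expose both ideas, although under different names. The short sketch below uses variable names of our own choosing, and it borrows a two-index array (a matrix, covered in the next section) purely for contrast. It shows that NumPy's `ndim` matches what we call order, while `shape` reports how many elements lie along each axis.

```python
import numpy as np

v = np.array([1, 2, 3, 4])              # a vector: one index picks out an element
m = np.array([[1, 2, 3], [4, 5, 6]])    # a matrix: two indices are needed

# NumPy's `ndim` counts the indices needed, which is what we call the order
print(v.ndim, m.ndim)

# `shape` lists how many elements lie along each axis;
# for a vector, its single entry is what we call the dimensionality
print(v.shape, m.shape)
```

So when NumPy documentation says "number of dimensions", it is talking about order in our vocabulary.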

For notation, we'll write out a vector, such as "a" in the code above like this: $\vec a = \left<1,2,3,4 \right>$. Other acceptable notations include: $a=[1,2,3,4]^\intercal$, or $a=\begin{bmatrix}1\\2\\3\\4\end{bmatrix}$. We use the first one because it's honestly just easier, and doesn't mess up vertical spacing unlike *some* methods.

Now, suppose we have $\vec{v}$, an $N$ dimensional vector, and suppose every element is a real number. Then we say that $\vec{v}\in\mathbb{R}^N$, read as *v is an element of R-N* (not *R to the power of N*). If your elements are instead integers, the statement would be $\vec{v}\in\mathbb{Z}^N$. In general, if your elements belong to a set $S$, then you write $\vec{v}\in S^N$.

So if we had a vector $\vec{u}$ which had 6 elements which are real numbers, we'd write $\vec{u}\in\mathbb{R}^6$. If it had 6 integers instead, $\vec{u}\in\mathbb{Z}^6$. Now suppose 4 of those elements were real numbers, and 2 were integers, then $\vec{u}\in\mathbb{R}^4\times\mathbb{Z}^2$. Note that $\mathbb{R}^6=\mathbb{R}^3\times\mathbb{R}^3$. This language gives us much more flexibility in discussing more complex ideas.

**Note:** for convenience, we will stop writing the arrow above the vector, since in context it is generally clear whether or not a variable refers to a vector.

### Matrices

Many of you may be familiar with matrices, but most of you will have only seen them in passing. A matrix can be defined in many ways, but the most programmer-friendly description is as a **collection of elements**, much like how we defined our vectors! With matrices, however, we need two indices in order to specify which element we're talking about.  

In [6]:
x=np.array([[1,2,3]
          ,[4,5,6]
          ,[7,8,9]]) # We'll explain this creation in a little bit

print(x)
print(x[0,0]) # First row, first column, the top-left element
print(x[0,1]) # First row, second column
print(x[1,0]) # Second row, first column
print(x[2,2]) # Third row, third column, the bottom-right element

print(x[0])

[[1 2 3]
 [4 5 6]
 [7 8 9]]
1
2
4
9
[1 2 3]


As we can see in the above example, we require two indices (e.g. 0,1) in order to specify a particular element, hence this is a *tensor of order 2*. So what exactly happened when we only specified one index? Well, it returned the first row of the matrix. This leads to another interpretation of a matrix: **a list of vectors**. If you have vectors $v_1,v_2,v_3$, then you can construct a matrix $A=[v_1,v_2,v_3]^\intercal$ where our three vectors serve as *rows* in the matrix. In the above code, $v_1=\left<1,2,3\right>$, $v_2=\left<4,5,6\right>$, $v_3=\left<7,8,9\right>$. This is why when we run `print(x[0])` we get $v_1$.
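To make the "list of vectors" reading concrete, here is a minimal sketch that builds the same matrix from three separate vectors. The names `v1`, `v2`, `v3` mirror the notation above and are otherwise arbitrary.

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v3 = np.array([7, 8, 9])

# Stacking along a new first axis makes each vector a row of A
A = np.stack([v1, v2, v3], axis=0)
print(A)

# Indexing with a single index returns a whole row, i.e. one of the vectors
print(A[0])
print(np.array_equal(A[1], v2))
```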

In math literature we generally construct matrices using vectors as their *columns*, and this can cause a bit of confusion. In practice it doesn't matter all too much though, since you can always use a handy function to change your rows to columns and vice versa known as the *transpose*, usually denoted as $A^\intercal$. In fact, when we defined $A$ originally, we used the transpose so that we could set each $v_i$ as a row instead of column. Let's see it in action below.

In [16]:
print(x) # Regular
print()
print(x[0]) # First row of x

print()
y=x.transpose()
print(y) # Switch rows with columns
print()
print(y[0]) # First row of y, but first column of x

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[1 2 3]

[[1 4 7]
 [2 5 8]
 [3 6 9]]

[1 4 7]


Undeniably the most common operation regarding matrices is *matrix multiplication*. This can be a little unintuitive at first, but we hope to help explain the weird behaviour. First off, matrices need to have the right kind of shape before being multiplied together. We say a matrix $A$ is $m\times n$ when it has $m$ rows and $n$ columns. If we want to multiply matrices $A$ and $B$ to get $C=AB$, then $B$ needs to have the same number of rows as $A$ has columns. So if $A$ is $m\times n$ then $B$ must be $n\times l$, and $C$ ends up being an $m\times l$ matrix. An easy, albeit mathematically lax, way to figure out shape compatibilities for matrices is that the shape of $C=AB$ is $(m\times n)\times(n\times l)\implies m\times (n\times n) \times l \implies m\times l$. You can imagine it as the inner variable $n$ *canceling out*, which also helps to remind you that $B$ needs to have the same number of rows as $A$ has columns; otherwise the inner dimensions wouldn't match and hence wouldn't cancel.
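The "canceling out" rule can be written down as a few lines of code. The sketch below is only an illustration: `product_shape` is a hypothetical helper of our own, not a NumPy function, and it handles two-index shapes only.

```python
import numpy as np

def product_shape(shape_a, shape_b):
    """Return the shape of AB, or raise ValueError if the shapes are incompatible.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    m, n = shape_a
    n2, l = shape_b
    if n != n2:
        raise ValueError(f"inner dimensions differ: {n} != {n2}")
    return (m, l)

A = np.ones((3, 2))
B = np.ones((2, 4))

print(product_shape(A.shape, B.shape))  # shape predicted by the rule
print(np.dot(A, B).shape)               # shape NumPy actually produces
```

Calling `product_shape((3, 2), (3, 2))` raises an error, just as `np.dot` would refuse to multiply those two matrices.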

In [17]:
A = np.array([[1,2],[3,4],[1,4]]) # A is 3x2
B = np.array([[5,6],[7,8]]) # B is 2x2

print('This is the first matrix:')
print(A)
print()

print('This is the second matrix:')
print(B)
print()

print('This is their product:')

#multiplies the 2 matrices together
print(np.dot(A,B)) # AB is (3x2)x(2x2)=>3x(2x2)x2=>3x2

This is the first matrix:
[[1 2]
 [3 4]
 [1 4]]

This is the second matrix:
[[5 6]
 [7 8]]

This is their product:
[[19 22]
 [43 50]
 [33 38]]


So what exactly is going on here? Let's start with a simple example. Take $A=[\vec a_1,\vec a_2]$ where $\vec a_i\in\mathbb{R}^3$ (note that since we're not using the transpose, each $\vec a_i$ is a *column* of $A$), so $A$ is a $3\times 2$ matrix, and take $\vec b= \left<b_1,b_2 \right>$, a $2\times 1$ matrix, which is just a *vector*. Then $C=A\vec b$ is going to be $3\times 1$, so it'll also be a vector! We can define $A\vec b=\sum_{i=1}^2b_i\vec a_i=b_1\vec a_1 + b_2\vec a_2$.

**Note**: We switched back to arrow notation here since we'll be juggling between vectors and scalars, so it's not entirely obvious from context which is which.

In [5]:
#initializes a 3x2 matrix A
A = np.array([[1,1],[3,0],[2,1]])

#initializes a vector b of length 2
b = np.array([-1,2])

print('This is a 3x2 matrix A:')
print(A)
print()

print('This is a length 2 vector b:')
print(b)
print()

#multiplies Ab
print('This is their product Ab:')
print(np.dot(A,b))
print()

#demonstrates the internal steps in calculating Ab
b1a1 = b[0] * np.transpose(A)[0]
print('This is the first value of b times the first column of A:')
print(b1a1)
print()

b2a2 = b[1] * np.transpose(A)[1]
print('This is the second value of b times the second column of A:')
print(b2a2)
print()

print('This is their sum:')
print(b1a1 + b2a2)
print()

This is a 3x2 matrix A:
[[1 1]
 [3 0]
 [2 1]]

This is a length 2 vector b:
[-1  2]

This is their product Ab:
[ 1 -3  0]

This is the first value of b times the first column of A:
[-1 -3 -2]

This is the second value of b times the second column of A:
[2 0 2]

This is their sum:
[ 1 -3  0]



Let's look at a few more examples:

In [44]:
#initializes a 5x4 matrix A
A = np.array([[1,1,1,1],[1,0,1,0],[1,2,3,4],[0,0,0,0],[-1,1,-1,0]])
print('This is a 5x4 matrix A:')
print(A)
print()
print("Shape of A: ",A.shape) # Prints the shape of A


#initializes a vector b of length 4
b = np.array([1,0,-1,2])
print('This is a length 4 vector b:')
print(b)
print()

#multiplies Ab
print('This is the product Ab:')
print(np.dot(A,b))
print()

#initializes a 6x6 identity matrix
I = np.eye(6)
print('This is a 6x6 identity matrix I:')
print(I)
print()

#initializes a vector v of length 6
v = np.linspace(-1,1,6)
print('This is a length 6 vector v:')
print(v)
print()

#multiplies Iv. Note the result of multiplication by the "identity matrix"
print('This is the product Iv:')
print(np.dot(I,v))
print()

#The next print statement would fail because it multiplies a 5x4 matrix A against a length 6 vector v
#print(np.dot(A,v))

This is a 5x4 matrix A:
[[ 1  1  1  1]
 [ 1  0  1  0]
 [ 1  2  3  4]
 [ 0  0  0  0]
 [-1  1 -1  0]]

Shape of A:  (5, 4)
This is a length 4 vector b:
[ 1  0 -1  2]

This is the product Ab:
[2 0 6 0 0]

This is a 6x6 identity matrix I:
[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]

This is a length 6 vector v:
[-1.  -0.6 -0.2  0.2  0.6  1. ]

This is the product Iv:
[-1.  -0.6 -0.2  0.2  0.6  1. ]



So far we've only multiplied a matrix against a vector, but what about against another matrix? Well, once again, thinking about a matrix as a collection of vectors will help us out. Instead of getting the whole matrix in one shot, we'll do it column by column. Take the same $A=[\vec a_1,\vec a_2]$ as defined before, but now consider a matrix $B=[\vec b_1, \vec b_2, \vec b_3]$ with $\vec b_i\in\mathbb{R}^2$, so $B$ is $2\times3$. Since $A$ is $3\times2$, our resulting matrix $C=AB$ will be $3\times3$. We'll first calculate $\vec c_1=\sum_{i=1}^2b_{1,i}\vec a_i=b_{1,1}\vec a_1 + b_{1,2}\vec a_2$, where $b_{j,i}$ refers to the $i^\text{th}$ element of vector $\vec b_j$. Hopefully this looks at least a *little* familiar. This is the exact same process as we had for our matrix times vector case! So $\vec c_i=A\vec b_i$. There are many other ways to imagine matrix multiplication, each with its own merits, but this definition is what we recommend for most people.

In [6]:
#initializes a 2x3 matrix A
A = np.array([[1,0,1],[-1,1,0]])
print('This is a 2x3 matrix A:')
print(A)
print()

#initializes a 3x2 matrix B
B = np.array([[3,2],[4,1],[0,3]])
print('This is a 3x2 matrix B:')
print(B)
print()

#multiplies AB
print('This is the product AB:')
print(np.dot(A,B))
print()

#multiplies A*b1
Ab1 = np.dot(A,B[:,0])
print('This is the product A*b1:')
print(Ab1)
print()

#multiplies A*b2
Ab2 = np.dot(A,B[:,1])
print('This is the product A*b2:')
print(Ab2)
print()

#constructs [A*b1, A*b2]
Ab12 = np.stack([Ab1,Ab2], 1)
print('This is the matrix with columns A*b1 and A*b2:')
print(Ab12)
print()

#multiplies BA
print('This is the product BA:')
print(np.dot(B,A))

This is a 2x3 matrix A:
[[ 1  0  1]
 [-1  1  0]]

This is a 3x2 matrix B:
[[3 2]
 [4 1]
 [0 3]]

This is the product AB:
[[ 3  5]
 [ 1 -1]]

This is the product A*b1:
[3 1]

This is the product A*b2:
[ 5 -1]

This is the matrix with columns A*b1 and A*b2:
[[ 3  5]
 [ 1 -1]]

This is the product BA:
[[ 1  2  3]
 [ 3  1  4]
 [-3  3  0]]



**Remark:** a very special case of matrix multiplication arises when, given two vectors $a,b$, we calculate $c=b^\intercal a=\sum_{i=1}^na_ib_i$, which is perhaps better known as the dot product! So $c=b^\intercal a=a\cdot b$.
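As a quick check of the remark, the sketch below computes the same number three ways. The vectors are arbitrary values chosen for illustration, and `reshape` is used only to turn them into explicit $1\times3$ and $3\times1$ matrices.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The sum from the remark, written out explicitly
explicit = sum(a_i * b_i for a_i, b_i in zip(a, b))

# The same value via NumPy's dot product
dot = np.dot(a, b)

# The same value as a (1x3)(3x1) matrix product, which yields a 1x1 matrix
as_matrices = np.dot(b.reshape(1, 3), a.reshape(3, 1))

print(explicit, dot, as_matrices)
```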

### Tensors

Tensors are the greatest abstraction of vectors and matrices we'll be covering during this workshop. Tensors are best discussed in terms of their *order*. As a reminder, we defined *order* as the number of indices required to specify a single element in the object. We've already covered tensors of order 1 and 2, so let's look at *tensors of order 3*. Just as matrices were considered **lists of vectors**, a *tensor of order 3* can be considered a **list of matrices**.

In [37]:
#initializes a 4x3x2 tensor with the values from 1 to 24
A1=np.array([[1,2],[3,4],[5,6]])
A2=np.array([[7,8],[9,10],[11,12]])
A3=np.array([[13,14],[15,16],[17,18]])
A4=np.array([[19,20],[21,22],[23,24]])
print("The matrices:")
print(A1)
print(A2)
print(A3)
print(A4)
t = np.array([A1,A2,A3,A4])
print('\nThis is t:')
print(t)

#prints the shape of the tensor t
print(t.shape)
print()

#note that to access a specific element, an index must be specified for each axis
print('These are some elements of t:')
print(t[0,0,0]) #element located in the first position of each axis
print(t[1,0,0]) #element located in the second position of first axis, first position of second and third axes
print(t[0,1,0]) #element located in the first position of first axis, second position of second axis, first position of third axis
print(t[0,0,1]) #element located in the first position of first and second axes, second position of third axis
print(t[3,2,1]) #element located in the fourth position of first axis, third position of second axis, and second position of third axis
print()

The matrices:
[[1 2]
 [3 4]
 [5 6]]
[[ 7  8]
 [ 9 10]
 [11 12]]
[[13 14]
 [15 16]
 [17 18]]
[[19 20]
 [21 22]
 [23 24]]

This is t:
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]

 [[13 14]
  [15 16]
  [17 18]]

 [[19 20]
  [21 22]
  [23 24]]]
(4, 3, 2)

These are some elements of t:
1
7
3
2
24



There's no reason to stop there though. In general we have *tensors of order $n$* which can be seen as lists of *tensors of order $(n-1)$* (...or as matrices of *tensors of order $(n-2)$*, or as *tensors of order 3* of *tensors of order $(n-3)$*). Let's look at a *tensor of order 5*.

In [38]:
#initializes a 2x2x2x2x2 tensor with values from 1 to 32
b = np.array([[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]],[[[[17,18],[19,20]],[[21,22],[23,24]]],[[[25,26],[27,28]],[[29,30],[31,32]]]]])
print('This is a big tensor\'s shape and values')
print(b.shape)
print(b)

This is a big tensor's shape and values
(2, 2, 2, 2, 2)
[[[[[ 1  2]
    [ 3  4]]

   [[ 5  6]
    [ 7  8]]]


  [[[ 9 10]
    [11 12]]

   [[13 14]
    [15 16]]]]



 [[[[17 18]
    [19 20]]

   [[21 22]
    [23 24]]]


  [[[25 26]
    [27 28]]

   [[29 30]
    [31 32]]]]]


It's a pain to write out every value of a tensor, surrounded by the appropriate brackets, so there are very handy initialization tools for creating some commonly used tensors. NumPy also allows us to specify what type of elements a tensor will contain. We've been assuming all our elements are real numbers, or equivalently, `float` values. In general they can be 32-bit integers (`int32`), 64-bit integers (`int64`), 32-bit floats (`float32`), 64-bit floats (`float64`), 128-bit complex numbers (`complex`), booleans (`bool`), objects (`object`), strings (`string_`, named `bytes_` in NumPy 2.0 and later), or unicode strings (`unicode_`, named `str_` in NumPy 2.0 and later).
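One more shortcut is worth a brief sketch here, with the caveat that it goes slightly beyond the tools used in this section: `np.arange` generates a run of values and `reshape` lays them out in a given shape, which rebuilds the order 5 tensor from earlier without typing a single nested bracket. `astype` then converts the element type.

```python
import numpy as np

# Values 1..32 laid out as a 2x2x2x2x2 tensor, without typing any nested brackets
b = np.arange(1, 33).reshape(2, 2, 2, 2, 2)
print(b.shape)
print(b[0, 0, 0, 0, 0], b[1, 1, 1, 1, 1])  # first and last elements

# The element type is stored in `dtype` and can be converted with `astype`
b32 = b.astype(np.float32)
print(b.dtype, b32.dtype)
```

The default integer type printed for `b` depends on your platform, which is one reason to pass `dtype` explicitly when it matters.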

In [45]:
#initializes a 2x2x5 tensor with the boolean value False in each position
z = np.zeros([2,2,5], dtype=bool)
print('This is a tensor of boolean zeroes:')
print(z)
print()

#initializes a 2x2x5 tensor with the 32-bit float value 1 in each position
o = np.ones([2,2,5], dtype=np.float32)
print('This is a tensor of 32-bit float ones:')
print(o)
print()

#initializes a 2x2x5 tensor with the 64-bit integer value of c in each position
c = 42
f = np.full([2,2,5], c, np.int64)
print('This is a tensor with a constant 64-bit integer value in each position:')
print(f)
print()

#initializes a 2x2x5 tensor with random values in each position
r = np.random.random([2,2,5])
print('This is a tensor with random values in each position:')
print(r)
print()

#constructs a tensor from other tensors
b = np.stack([z,o,f,r])
print('This is a tensor formed by stacking the others along a new axis:')
print(b)
print()

This is a tensor of boolean zeroes:
[[[False False False False False]
  [False False False False False]]

 [[False False False False False]
  [False False False False False]]]

This is a tensor of 32-bit float ones:
[[[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]]

This is a tensor with a constant 64-bit integer value in each position:
[[[42 42 42 42 42]
  [42 42 42 42 42]]

 [[42 42 42 42 42]
  [42 42 42 42 42]]]

This is a tensor with random values in each position:
[[[0.42118609 0.97747345 0.91497386 0.0629408  0.77644124]
  [0.63281608 0.33425122 0.28135532 0.1361897  0.85222988]]

 [[0.80931678 0.09038362 0.44470766 0.9199382  0.45210217]
  [0.96078022 0.06880233 0.32282673 0.41682529 0.39713642]]]

This is a tensor formed by stacking the others along a new axis:
[[[[ 0.          0.          0.          0.          0.        ]
   [ 0.          0.          0.          0.          0.        ]]

  [[ 0.          0.          0.          0.          0.        ]
   [ 0.          0.          0.          0.          0.        ]]]


 [[[ 1.          1. 

Next up we'll take a look at the different ways that you can manipulate tensors, starting with simple arithmetic operations. They can be broadly divided into two categories: scalar, and tensor. Scalar arithmetic is just arithmetic that involves one tensor and a scalar. Tensor arithmetic happens between two tensors. Here are some basic scalar arithmetic operations:

In [46]:
#initializes a 2x2x2 tensor with specified values
v = np.array([[[2,3.5],[4,2]],[[1.5,1],[2,1]]])
print('This is v:')
print(v)
print()

#constructs a new 2x2x2 tensor formed by adding 2 to each of the entries in v
vAdd = v + 2
print('This is v+2:')
print(vAdd)
print()

#constructs a new 2x2x2 tensor formed by subtracting 2 from each of the entries in v
vSubtract = v - 2
print('This is v-2:')
print(vSubtract)
print()

#constructs a new 2x2x2 tensor formed by multiplying each of the entries by 2
vMultiply = v * 2
print('This is v*2:')
print(vMultiply)
print()

#constructs a new 2x2x2 tensor formed by dividing each of the entries by 2
vDivide = v / 2
print('This is v/2: ')
print(vDivide)

This is v:
[[[2.  3.5]
  [4.  2. ]]

 [[1.5 1. ]
  [2.  1. ]]]

This is v+2:
[[[4.  5.5]
  [6.  4. ]]

 [[3.5 3. ]
  [4.  3. ]]]

This is v-2:
[[[ 0.   1.5]
  [ 2.   0. ]]

 [[-0.5 -1. ]
  [ 0.  -1. ]]]

This is v*2:
[[[4. 7.]
  [8. 4.]]

 [[3. 2.]
  [4. 2.]]]

This is v/2: 
[[[1.   1.75]
  [2.   1.  ]]

 [[0.75 0.5 ]
  [1.   0.5 ]]]


In [47]:
#binds the name w to the same array as v; no copy is made, so w and v share their data
w = v

#constructs a new tensor x with the values of v, creating an independent copy
x = np.copy(v)

print('This is v: ')
print(v)
print()

#adds 1 to each value in v
v += 1
print('This is v after adding 1')
print(v)
print()

#divides each value in v by 2
v /= 2
print('This is v after dividing by 2')
print(v)
print()

print('This is w, changed by the operations performed on v: ')
print(w)
print()

print('This is x, unchanged by the operations performed on v: ')
print(x)

This is v: 
[[[2.  3.5]
  [4.  2. ]]

 [[1.5 1. ]
  [2.  1. ]]]

This is v after adding 1
[[[3.  4.5]
  [5.  3. ]]

 [[2.5 2. ]
  [3.  2. ]]]

This is v after dividing by 2
[[[1.5  2.25]
  [2.5  1.5 ]]

 [[1.25 1.  ]
  [1.5  1.  ]]]

This is w, changed by the operations performed on v: 
[[[1.5  2.25]
  [2.5  1.5 ]]

 [[1.25 1.  ]
  [1.5  1.  ]]]

This is x, unchanged by the operations performed on v: 
[[[2.  3.5]
  [4.  2. ]]

 [[1.5 1. ]
  [2.  1. ]]]


These occur as element-wise operations. We say an operation is element-wise when, instead of applying the operation to the tensors as a whole, we apply it directly to their elements. For example, if we have a tensor $A$ and a scalar $k$ with $C=A+k$, then $c_i=a_i+k$, so the operation is defined for every individual element. Tensor arithmetic can also be element-wise. Suppose we have tensors $A,B$ of the same shape and define $C=A*B$; then we can define $C$ element-wise like this: $c_{i,j}=a_{i,j}*b_{i,j}$ for every pair of indices $i,j$.

**Note:** here we use $*$ to represent element-wise multiplication instead of proper tensor/matrix multiplication, which is instead denoted by placing the matrices/tensors adjacent to each other, e.g. $C=AB$.
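Because the two kinds of product are easy to mix up in code, here is a small side-by-side sketch on two arbitrary $2\times2$ matrices. It relies on the fact that, for matrices, Python's `@` operator gives the same result as `np.dot`.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise product: c[i, j] = a[i, j] * b[i, j]
print(A * B)

# Matrix product, written AB in the text
print(np.dot(A, B))

# For matrices, np.dot(A, B) and A @ B agree
print(np.array_equal(np.dot(A, B), A @ B))
```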

Below are some examples of tensor arithmetic:

In [2]:
#defines two 2x3x2 tensors, S and T
S = np.array([[[1,0],[-2,1],[3,2]],[[-3,0],[2,2],[0,-1]]])
T = np.array([[[-2,4],[3,3],[-1,-1]],[[1,-3],[1,1],[2,1]]])
print('This is a 2x3x2 tensor S:')
print(S)
print()
print('This is a 2x3x2 tensor T:')
print(T)
print()

#element-wise operations between 2 tensors work the same way as the scalar operations
print('This is the sum of S and T:')
print(S+T)
print()

print('This is the difference of S and T:')
print(S-T)
print()

print('This is the element-wise product of S and T:')
print(S*T)
print()

print('This is the element-wise quotient of S and T:')
print(S/T)

This is a 2x3x2 tensor S:
[[[ 1  0]
  [-2  1]
  [ 3  2]]

 [[-3  0]
  [ 2  2]
  [ 0 -1]]]

This is a 2x3x2 tensor T:
[[[-2  4]
  [ 3  3]
  [-1 -1]]

 [[ 1 -3]
  [ 1  1]
  [ 2  1]]]

This is the sum of S and T:
[[[-1  4]
  [ 1  4]
  [ 2  1]]

 [[-2 -3]
  [ 3  3]
  [ 2  0]]]

This is the difference of S and T:
[[[ 3 -4]
  [-5 -2]
  [ 4  3]]

 [[-4  3]
  [ 1  1]
  [-2 -2]]]

This is the element-wise product of S and T:
[[[-2  0]
  [-6  3]
  [-3 -2]]

 [[-3  0]
  [ 2  2]
  [ 0 -1]]]

This is the element-wise quotient of S and T:
[[[-0.5         0.        ]
  [-0.66666667  0.33333333]
  [-3.         -2.        ]]

 [[-3.         -0.        ]
  [ 2.          2.        ]
  [ 0.         -1.        ]]]


#### Aggregate Functions
So far we've talked a lot about tensors and their shapes, but one term we should define is an **axis**. A *tensor of order $n$* has $n$ axes. You can think of a single axis as a "direction" along which you can traverse the tensor. Since a vector is a *tensor of order 1*, it only has one axis. You can only traverse it vertically (or horizontally, depending on notation). A matrix has two axes. You can traverse it horizontally or vertically. In practice, different axes store different *types* of information.

For example, if we have a single 360x720 image with RGB values for each pixel, then it would be stored as a *tensor of order 3* with a shape of (3,360,720) representing (#color channels, height, width). Abstractly, it doesn't matter what order the axes are in, i.e. (720,3,360) is completely valid, but in practice there are often standards for what axes represent what information. The most important thing is making sure you stay consistent with whatever standards you adopt. Here we're using PyTorch's "channels first" standard of putting the number of (color) channels first. Now, if we wanted to, we could also represent the image in this shape: (1,3,1,360,1,1,1,720). Would that be useful? Probably not. With that being said, sometimes it is quite helpful to include an extra axis in machine learning applications, but that'll be covered later.
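The axis bookkeeping described above can be sketched with a blank stand-in image. Two caveats: the array of zeros is a placeholder rather than real image data, and treating the extra leading axis as a "batch of one image" is our own illustrative assumption about why you might add it.

```python
import numpy as np

# A stand-in for a 360x720 RGB image in "channels first" layout
image = np.zeros((3, 360, 720))

# Reordering the axes to (height, width, channels) keeps the same data
channels_last = np.transpose(image, (1, 2, 0))
print(channels_last.shape)

# Adding an axis of length 1 at the front (assumed here to mean a batch of one image)
batch = np.expand_dims(image, axis=0)
print(batch.shape)
```

Neither operation changes any pixel value; only the way elements are indexed changes.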

One of the best features of NumPy is its incredibly well optimized system of aggregate functions. These functions also make tensors much more useful. An aggregate function is one that *aggregates*, or *reduces*, the values of a tensor along certain axes. Perhaps the most common aggregate function is `np.sum`. By default, most aggregate functions will reduce along *all* axes and return a single scalar, but you can specify along which axes they reduce.

In [14]:
#initializes a 3x2x2 tensor T with random values from 0 to 10
T = np.random.random([3,2,2]) * 10
print('This is a tensor T:')
print(T)
print()

#sums the elements of T
print('This is the sum of the elements of T:')
print(np.sum(T))
print()

#sums the elements of T along the first axis
print('This is the sum of the elements of T along the first axis:')
print(np.sum(T, axis=0))
print()

#sums the elements of T along the first axis with a for-loop
U = np.zeros([T.shape[1], T.shape[2]])
for i in range(T.shape[0]):
    for j in range(T.shape[1]):
        for k in range(T.shape[2]):
            U[j,k] += T[i,j,k]
print('This is the sum of the elements of T along the first axis given by a for-loop:')
print(U)
print()

#sums the elements of T along the second and third axes simultaneously
print('This is the sum of the elements of T along the second and third axes:')
print(np.sum(T, axis=(1,2)))
print()

#sums the elements of T along the second and third axes
V = np.zeros(T.shape[0])
for i in range(T.shape[0]):
    for j in range(T.shape[1]):
        for k in range(T.shape[2]):
            V[i] += T[i,j,k]
print('This is the sum of the elements of T along the second and third axes given by a for-loop:')
print(V)
print()

#multiplies the elements of T together
print('This is the product of the elements of T:')
print(np.prod(T))
print()

#multiplies the elements of T along the third axis
#this produces a 3x2 matrix consisting of the columns of each matrix of T element-wise multiplied together
print('This is the product of the elements of T along the third axis:')
print(np.prod(T, axis=2))
print()

#finds the maximum and minimum elements of T along the 2nd axis
#produces 3x2 matrices consisting of the maximal/minimal element of each column of the matrices composing T
print('This is maximum reduction of T along the second axis:')
print(np.max(T, axis=1))
print('This is the minimum reduction of T along the second axis:')
print(np.min(T, axis=1))
print()

#finds the average and median of the elements of T along the first axis
print('This is the average reduction of T along the first axis:')
print(np.mean(T, axis=0))
print('This is the median reduction of T along the first axis:')
print(np.median(T, axis=0))

This is a tensor T:
[[[1.59758585 1.84357114]
  [5.31202091 7.47264605]]

 [[5.287028   4.9141707 ]
  [0.06699536 3.06867199]]

 [[5.35072292 9.10017147]
  [9.20637863 9.67819355]]]

This is the sum of the elements of T:
62.89815657790939

This is the sum of the elements of T along the first axis:
[[12.23533678 15.8579133 ]
 [14.5853949  20.2195116 ]]

This is the sum of the elements of T along the first axis given by a for-loop:
[[12.23533678 15.8579133 ]
 [14.5853949  20.2195116 ]]

This is the sum of the elements of T along the second and third axes:
[16.22582395 13.33686605 33.33546657]

This is the sum of the elements of T along the second and third axes given by a for-loop:
[16.22582395 13.33686605 33.33546657]

This is the product of the elements of T:
2709321.3614799776

This is the product of the elements of T along the third axis:
[[ 2.94526317 39.69485205]
 [25.98135808  0.20558679]
 [48.69249607 89.1011143 ]]

This is maximum reduction of T along the second axis:
[[5.312020

#### Tiling

Tiling is a technique used when creating or modifying tensors which essentially repeats tensors along certain axes.

In [18]:
#generates a vector of length 6 with values from 0 to 5
v = np.linspace(0,5,6)
print('This is a vector v of length 6 with values from 0 to 5:')
print(v)
print()

#constructs a vector length 18 by tiling v
print('This is a vector of length 18 constructed by tiling v 3 times along its axis')
print(np.tile(v, 3))
print()

#constructs a 6x6 matrix by tiling v
print('This is a 6x6 matrix constructed by tiling v 6 times along a new first axis:')
print(np.tile(v, [6,1]))
print()

#constructs a 2x4x6 tensor by tiling v
print('This is a 2x4x6 tensor constructed by tiling v 2 times along a new first axis and 4 times along a new second axis:')
print(np.tile(v, [2,4,1]))
print()

#initializes a 6x6 matrix with values from 0 to 10
A = np.stack([np.linspace(i,i+5,6) for i in range(0,6,1)], axis=1)
print('This is a 6x6 matrix with values from 0 to 10:')
print(A)
print()

#constructs a 2x2x6x6 tensor with values formed by tiling A
print('This is a 2x2x6x6 tensor constructed by tiling A 2 times along two new axes:')
print(np.tile(A, [2,2,1,1]))

This is a vector v of length 6 with values from 0 to 5:
[0. 1. 2. 3. 4. 5.]

This is a vector of length 18 constructed by tiling v 3 times along its axis
[0. 1. 2. 3. 4. 5. 0. 1. 2. 3. 4. 5. 0. 1. 2. 3. 4. 5.]

This is a 6x6 matrix constructed by tiling v 6 times along a new first axis:
[[0. 1. 2. 3. 4. 5.]
 [0. 1. 2. 3. 4. 5.]
 [0. 1. 2. 3. 4. 5.]
 [0. 1. 2. 3. 4. 5.]
 [0. 1. 2. 3. 4. 5.]
 [0. 1. 2. 3. 4. 5.]]

This is a 2x4x6 tensor constructed by tiling v 2 times along a new first axis and 4 times along a new second axis:
[[[0. 1. 2. 3. 4. 5.]
  [0. 1. 2. 3. 4. 5.]
  [0. 1. 2. 3. 4. 5.]
  [0. 1. 2. 3. 4. 5.]]

 [[0. 1. 2. 3. 4. 5.]
  [0. 1. 2. 3. 4. 5.]
  [0. 1. 2. 3. 4. 5.]
  [0. 1. 2. 3. 4. 5.]]]

This is a 6x6 matrix with values from 0 to 10:
[[ 0.  1.  2.  3.  4.  5.]
 [ 1.  2.  3.  4.  5.  6.]
 [ 2.  3.  4.  5.  6.  7.]
 [ 3.  4.  5.  6.  7.  8.]
 [ 4.  5.  6.  7.  8.  9.]
 [ 5.  6.  7.  8.  9. 10.]]

This is a 2x2x6x6 tensor constructed by tiling A 2 times along two new axes:
[[[[ 0.  1.  2.  3.  

## Norms and Metrics

The mathematical definition of **distance** is formalized as a **metric**. A metric is a function $d:X\times X\rightarrow[0,\infty)$ that obeys four properties for all $x,y,z\in X$:
$$
\begin{aligned}
&d(x,y)\geq 0\\
&d(x,y)=0 \iff x=y\\
&d(x,y)=d(y,x)\\
&d(x,z)\leq d(x,y)+d(y,z)
\end{aligned}
$$

You already have some experience with metrics, whether you know them by that name or not. How far apart are $3$ and $7$? They're $4$ units apart. What about $-3$ and $-7$? Also $4$. The same goes for $7$ and $3$. When dealing with real numbers, the metric we usually use is $d(x,y)=|x-y|$, which obeys all four properties above. There's nothing stopping us from having a metric on things other than real numbers, for example, vectors! The most classic example is probably *Euclidean distance*, defined as the square root of the sum of squared coordinate differences. Formally, the *Euclidean distance* metric $d: \mathbb{R}^n\times\mathbb{R}^n\rightarrow [0,\infty)$ is defined by $d(x,y)=\sqrt{\sum_{i=1}^n(x_i-y_i)^2}$ for all $x,y\in\mathbb{R}^n$.
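The four metric properties can be spot-checked numerically. The sketch below is only an illustration on three hand-picked points (the helper name `euclidean` and the sample points are assumptions made for this example); passing these checks on a few points does not prove the properties in general.

```python
import numpy as np

def euclidean(x, y):
    # square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

# three sample points in R^2, chosen arbitrarily for illustration
x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
z = np.array([6.0, 0.0])

non_negative = euclidean(x, y) >= 0                       # d(x,y) >= 0
identity = euclidean(x, x) == 0 and euclidean(x, y) > 0   # zero distance only for identical points
symmetric = euclidean(x, y) == euclidean(y, x)            # d(x,y) = d(y,x)
triangle = euclidean(x, z) <= euclidean(x, y) + euclidean(y, z)  # triangle inequality

print(non_negative, identity, symmetric, triangle)
```

Here the direct trip from `x` to `z` has length 6, while the detour through `y` has length 5 + 5 = 10, which is what the triangle inequality predicts.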


Closely tied to metrics is the idea of a **norm**, which measures the size of a single element. You can think of it as a generalization of the absolute value function. In fact, $f(x)=|x|$ is a proper norm. For the metrics we'll work with, such as Euclidean distance, the associated norm is $f(x)=d(0,x)$: the distance of $x$ from zero (or whatever represents "zero", e.g. the zero vector). Not every metric yields a norm this way, but every norm induces a metric by defining $d(x,y)=f(x-y)$, so we'll usually discuss the two semi-interchangeably. When you've already established the norm that you're working with, or it is otherwise obvious, you can use the double-bar notation instead of naming the norm function: $f(x)=\|x\|$.
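To make the relationship between a norm and its distance concrete, here is a minimal sketch using the Euclidean case. The function names `norm` and `distance` and the sample vectors are illustrative assumptions, not part of the workshop code.

```python
import numpy as np

def norm(x):
    # Euclidean length of x, i.e. its distance from the zero vector
    return np.sqrt(np.sum(x ** 2))

def distance(x, y):
    # metric induced by the norm: d(x, y) = ||x - y||
    return norm(x - y)

x = np.array([1.0, 2.0, 2.0])
y = np.array([1.0, -2.0, 5.0])
zero = np.zeros_like(x)

print(norm(x))            # length of x
print(distance(zero, x))  # same value, since d(0, x) = ||x||
print(distance(x, y))     # distance between x and y
```

Defining `distance` in terms of `norm` mirrors the formula $d(x,y)=f(x-y)$: once a norm is chosen, the distance comes for free.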

Norms and metrics are incredibly important in data science, and machine learning especially, since they define the idea of **distance**. That notion of **distance** in turn defines how similar or dissimilar two pieces of data are, which determines the shape, or *topology*, of the space that your data lives in. It's all very abstract right now, but we'll see concrete examples of this in later workshops.

A special family of norms, known as the $L^p$ norms $L^p:\mathbb{R}^n\rightarrow[0,\infty)$, is defined for $p\geq 1$ by: $$L^p(x)=\left(\sum_{i=1}^n|x_i|^p\right)^{1/p}$$ This widely used family includes two very heavily used norms: $L^1(x)=\sum_{i=1}^n|x_i|$ and $L^2(x)=\left(\sum_{i=1}^n|x_i|^2\right)^{1/2}=\sqrt{\sum_{i=1}^nx_i^2}$.
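The general formula translates almost directly into NumPy. The sketch below assumes $p\geq 1$; the helper name `lp_norm` and the sample vector are made up for illustration, and the result is cross-checked against `np.linalg.norm`, which accepts the order as its second argument.

```python
import numpy as np

def lp_norm(x, p):
    # (sum_i |x_i|^p)^(1/p), assuming p >= 1
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])

l1 = lp_norm(x, 1)  # |3| + |-4|
l2 = lp_norm(x, 2)  # sqrt(3^2 + 4^2)
l3 = lp_norm(x, 3)  # (27 + 64)^(1/3)

# cross-check each value against NumPy's implementation
for p, value in [(1, l1), (2, l2), (3, l3)]:
    assert np.isclose(value, np.linalg.norm(x, p))

print(l1, l2, l3)
```

Note that the same vector gets a smaller norm as $p$ grows: larger values of $p$ put more weight on the largest entry and less on the rest.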

$L^2$ is the familiar Euclidean norm, while $L^1$ is a norm popularly used in image analysis. Another nice property is that in $\mathbb{R}^n, L^2(x)=$, 

Let's check out what these things look like when coded up!

In [27]:
#initializes 2 vectors, u and v, of length 3
u = np.array([1,0,1])
v = np.array([-1,0,1])
print('This is a vector u:')
print(u)
print()
print('This is a vector v:')
print(v)
print()

def euclideanDistance(a, b, debug=False):
    # Euclidean distance: square root of the sum of squared element-wise differences
    difference = a - b
    squaredDifference = difference * difference
    total = np.sum(squaredDifference)  # named total to avoid shadowing the built-in sum
    distance = np.sqrt(total)
    if debug:
        # show each intermediate step of the computation
        print('The difference between a and b is ', difference)
        print('Element-wise squaring gets us  ', squaredDifference)
        print('The sum of those squares is ', total)
        print('The square root of the sum is ', distance)
        print('Therefore the euclidean distance is: ')
    return distance

#finds the euclidean distance between u and v
print('This is the euclidean distance between u and v:')
print(euclideanDistance(u, v, debug=True))
print()

#initializes two vectors, u and v, of length 4
u = np.array([1,2,3,4])
v = np.array([-1,4,-3,5])
print('This is a vector u:')
print(u)
print()
print('This is a vector v:')
print(v)
print()

#finds the euclidean distance between u and v
print('This is the euclidean distance between u and v:')
print(euclideanDistance(u, v, debug=True))
print()

print('This is a vector u:')
print(u)
print()
print('This is a vector v:')
print(v)
print()

#finds the L1 norm of v
print('This is the L1 norm of v:')
print(np.linalg.norm(v,1))
print()

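#finds the L1 norm of v by following the formula: sum the absolute values of its entries
l1Norm = np.sum(np.abs(v))
assert l1Norm == np.linalg.norm(v, 1)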


This is a vector u:
[1 0 1]

This is a vector v:
[-1  0  1]

This is the euclidean distance between u and v:
The difference between a and b is  [2 0 0]
Element-wise squaring gets us   [4 0 0]
The sum of those squares is  4
The square root of the sum is  2.0
Therefore the euclidean distance is: 
2.0

This is a vector u:
[1 2 3 4]

This is a vector v:
[-1  4 -3  5]

This is the euclidean distance between u and v:
The difference between a and b is  [ 2 -2  6 -1]
Element-wise squaring gets us   [ 4  4 36  1]
The sum of those squares is  45
The square root of the sum is  6.708203932499369
Therefore the euclidean distance is: 
6.708203932499369

This is a vector u:
[1 2 3 4]

This is a vector v:
[-1  4 -3  5]

This is the L1 norm of v:
13.0
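Since a norm induces a distance through $d(x,y)=\|x-y\|$, the same pair of vectors can be compared under different norms. The following sketch reuses the values of `u` and `v` from the cell above and relies on `np.linalg.norm`; the variable names are illustrative assumptions.

```python
import numpy as np

u = np.array([1, 2, 3, 4])
v = np.array([-1, 4, -3, 5])

difference = u - v

# L2 (Euclidean) distance: np.linalg.norm defaults to the 2-norm for vectors
l2_distance = np.linalg.norm(difference)
# L1 distance: sum of the absolute coordinate differences
l1_distance = np.linalg.norm(difference, 1)

print(l2_distance)  # same value as euclideanDistance(u, v) above
print(l1_distance)
```

The two distances differ for the same pair of vectors, which is a small instance of how the choice of norm changes what "close" means.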

