# The Mathematics Behind Data Science

This workshop covers material ranging from what a vector is all the way to L<sup>p</sup> norms, loss functions, and gradient descent! A strong math background is not required, as we'll present the material in a beginner-oriented, hands-on way. That means we will introduce each idea both in terms of what you might code up in a project and in terms of the abstract math objects that the code represents. In the simplest case, a vector can be described as a 1D array, but that description alone isn't enough to justify many of the techniques employed in deep learning. To go further, we will dive into the math that powers the code. Specifically, this workshop aims to cover:

- Tensors
    - Vectors
    - Matrices
    - Tensors (multi-dimensional arrays)

- Functional Reduction
- Norms
    - Euclidean Distance
    - Metrics
    - L<sup>1</sup>, L<sup>2</sup>, L<sup>p</sup> norms
- Loss Functions
    - Mean Squared Error
    - Cross Entropy
- Derivatives and Jacobians
    - Single-variable derivatives
    - Multi-variable derivatives (Jacobians)
    - Gradients
- Dimensionality Reduction
    - Principal Component Analysis
    - t-SNE
    - UMAP
- Chain Rule
- Computational Graphs

## Tensors and Data

### Vectors

Vectors form the basis for the vast majority of data science, and are especially significant in deep learning! To begin with, vectors are the most fundamental way by which we represent data. A vector can be considered a "list" of numbers, with each number assigned an **index**. These indices give us a way to refer explicitly to the individual elements composing the vector. Beyond that, we'll also talk about the operations that can be performed on vectors, and more complex ways to manipulate them. Python's built-in types aren't particularly well suited to what we intend to do, so instead we look to NumPy's arrays. NumPy gives us an incredibly convenient way to represent such an abstraction, and we'll explore it in great depth throughout this workshop.

```python
import numpy as np

a = [1, 2, 3, 4]         # Standard Python list
b = np.array([5, 6, 7])  # NumPy array built from the given Python list
c = np.array(a)          # Another way to create a NumPy array, from an existing list

print(a)
print(b)  # Note the NumPy array prints without commas ","
print(c)  # This cleans up the print statement and is helpful for larger constructions

print()
print(a[0])
print(b[1])  # Access elements just as you regularly would
print(c[2])
```

```
[1, 2, 3, 4]
[5 6 7]
[1 2 3 4]

1
6
3
```


Mathematically, a vector is referred to as $N$ *dimensional* if and only if it has exactly $N$ *elements*. A vector is also a *first order tensor*. We're going to draw a sharp distinction between dimensionality and order. In a lot of data science literature, the two are referred to interchangeably as *dimension*, but that leads to way too many headaches for us to justify doing the same. We've defined dimensionality for our purposes, so what is order? Roughly speaking, the order of a vector/matrix/tensor is **the number of indices required to specify which element you're talking about**. So again, a vector is a first order tensor, meaning we need only one index to specify exactly which element we want to access. This is what we did in the code above, where `a[0]`, `b[1]`, and `c[2]` each used a single index.
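
To make the distinction concrete, here is a minimal sketch of how the two ideas show up in NumPy. The variable names `v` and `M` are just for illustration. Note that NumPy's own attribute for what we call *order* is named `ndim`, which is exactly the naming clash described above.

```python
import numpy as np

v = np.array([1, 2, 3, 4])            # a 4-dimensional vector
M = np.array([[1, 2, 3], [4, 5, 6]])  # a 2x3 array, for comparison

# Dimensionality (our term): how many elements the vector holds.
print(v.shape)  # (4,)
print(v.size)   # 4

# Order (our term): how many indices are needed to pick out one element.
# NumPy calls this `ndim`.
print(v.ndim)   # 1
print(M.ndim)   # 2
```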

For notation, we'll write out a vector, such as `a` in the code above, like this: $\vec a = \left<1,2,3,4 \right>$. Other acceptable notations include $a=[1,2,3,4]^\intercal$ and $a=\begin{bmatrix}1\\2\\3\\4\end{bmatrix}$. We use the first one because it's honestly just easier, and doesn't mess up vertical spacing unlike *some* methods.

Now, suppose $\vec{v}$ is an $N$ dimensional vector whose elements are all real numbers. Then we say that $\vec{v}\in\mathbb{R}^N$, read as *v is an element of R-N* (not *R to the power of N*). If the elements are instead integers, the statement would be $\vec{v}\in\mathbb{Z}^N$. In general, if the elements belong to a set $S$, then we write $\vec{v}\in S^N$.

So if we had a vector $\vec{u}$ with 6 elements which are all real numbers, we'd write $\vec{u}\in\mathbb{R}^6$. If it had 6 integers instead, $\vec{u}\in\mathbb{Z}^6$. Now suppose the first 4 elements were real numbers and the last 2 were integers; then $\vec{u}\in\mathbb{R}^4\times\mathbb{Z}^2$. Note that $\mathbb{R}^6=\mathbb{R}^3\times\mathbb{R}^3$. This language gives us much more flexibility in discussing more complex ideas.
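
As a rough illustration, the set a vector's elements belong to corresponds loosely to a NumPy array's `dtype`. This is only a sketch: floating-point numbers approximate the reals rather than representing them exactly, and the names `u_real` and `u_int` are made up for this example.

```python
import numpy as np

u_real = np.array([0.5, 1.0, 2.5, 3.0, 4.2, 5.1])  # stands in for u in R^6
u_int = np.array([1, 2, 3, 4, 5, 6])               # stands in for u in Z^6

print(np.issubdtype(u_real.dtype, np.floating))  # True
print(np.issubdtype(u_int.dtype, np.integer))    # True

# R^6 = R^3 x R^3: gluing two vectors from R^3 together gives one in R^6.
first, second = u_real[:3], u_real[3:]
print(np.array_equal(np.concatenate([first, second]), u_real))  # True
```

One caveat: a NumPy array holds a single `dtype`, so a vector in $\mathbb{R}^4\times\mathbb{Z}^2$ stored in one array would have its integer entries converted to floats.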

**Note:** for convenience, we will stop writing the arrow above the vector, since in context it is generally clear whether or not a variable refers to a vector.

### Matrices

Some of you may be familiar with matrices, though many will have only seen them in passing. A matrix can be defined in many ways, but the most programmer-friendly description is as a **collection of elements**, much like how we defined our vectors! With matrices, however, we need two indices in order to specify which element we're talking about.

```python
x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])  # We'll explain this creation in a little bit

print(x)
print(x[0, 0])  # First row, first column, the top-left element
print(x[0, 1])  # First row, second column
print(x[1, 0])  # Second row, first column
print(x[2, 2])  # Third row, third column, the bottom-right element

print(x[0])  # Only one index given
```

```
[[1 2 3]
 [4 5 6]
 [7 8 9]]
1
2
4
9
[1 2 3]
```


As we can see in the above example, we require two indices (e.g. `0,1`) in order to specify a particular element. So what exactly happened when we only specified one index? Well, it returned the first row of the matrix. This leads to another interpretation of a matrix: **a list of vectors**. If you have vectors $v_1,v_2,v_3$, then you can construct a matrix $A=[v_1,v_2,v_3]^\intercal$ where our three vectors serve as *rows* in the matrix. In the above code, $v_1=\left<1,2,3\right>$, $v_2=\left<4,5,6\right>$, $v_3=\left<7,8,9\right>$. This is why we get $v_1$ when we run `print(x[0])`.
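
The "list of vectors" view can be sketched directly in code. Here `v1`, `v2`, and `v3` are the three vectors named above; `A` stacks them as rows, and, purely for contrast, `C` uses `np.column_stack` to place the same vectors as columns.

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v3 = np.array([7, 8, 9])

A = np.array([v1, v2, v3])         # each v_i becomes a row
print(A[0])                        # [1 2 3]

C = np.column_stack([v1, v2, v3])  # each v_i becomes a column
print(C[:, 0])                     # [1 2 3]
print(C[0])                        # [1 4 7]
```

With rows, a single index such as `A[0]` hands back $v_1$; with columns, we have to slice with `C[:, 0]` to recover it.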

In math literature we generally construct matrices using vectors as their *columns*, and this can cause a bit of confusion. In practice it doesn't matter all too much, since you can always use a handy function known as the *transpose*, usually denoted $A^\intercal$, to change your rows to columns and vice versa. In fact, when we defined $A$ originally, we used the transpose so that we could set each $v_i$ as a row instead of a column. Let's see it in action below.

```python
print(x)  # Regular
print()
print(x[0])  # First row of x

print()
y = x.transpose()
print(y)  # Switch rows with columns
print()
print(y[0])  # First row of y, but first column of x
```

```
[[1 2 3]
 [4 5 6]
 [7 8 9]]

[1 2 3]

[[1 4 7]
 [2 5 8]
 [3 6 9]]

[1 4 7]
```


Arguably the most common operation on matrices is *matrix multiplication*. It can be a little unintuitive at first, but we hope to help explain the weird behaviour. First off, matrices need to have compatible shapes before they can be multiplied together. We say a matrix $A$ is $m\times n$ when it has $m$ rows and $n$ columns. If we want to multiply matrices $A$ and $B$ to get $C=AB$, then $B$ needs to have the same number of rows as $A$ has columns. So if $A$ is $m\times n$, then $B$ must be $n\times l$, and $C$ ends up being an $m\times l$ matrix. An easy, albeit mathematically lax, way to figure out shape compatibility is to write the shape of $C=AB$ as $(m\times n)\times(n\times l)\implies m\times (n\times n) \times l \implies m\times l$. You can imagine the inner variable $n$ *canceling out*. This also helps to remind you that $B$ needs to have the same number of rows as $A$ has columns, or else the inner dimensions wouldn't match and hence wouldn't cancel.

```python
A = np.array([[1, 2], [3, 4], [1, 4]])  # A is 3x2
B = np.array([[5, 6], [7, 8]])          # B is 2x2

print('This is the first matrix:')
print(A)
print()

print('This is the second matrix:')
print(B)
print()

print('This is their product:')

# Multiply the two matrices together
print(np.dot(A, B))  # AB is (3x2)x(2x2)=>3x(2x2)x2=>3x2
```

```
This is the first matrix:
[[1 2]
 [3 4]
 [1 4]]

This is the second matrix:
[[5 6]
 [7 8]]

This is their product:
[[19 22]
 [43 50]
 [33 38]]
```
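
It can also help to see what happens when the shapes are *not* compatible. The sketch below reuses the same `A` and `B` and tries the product in the opposite order; the flag `mismatch_raised` is a name introduced only for this illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [1, 4]])  # 3x2
B = np.array([[5, 6], [7, 8]])          # 2x2

C = np.dot(A, B)
print(A.shape, B.shape, C.shape)  # (3, 2) (2, 2) (3, 2)

mismatch_raised = False
try:
    np.dot(B, A)  # (2x2)x(3x2): the inner dimensions 2 and 3 differ
except ValueError as err:
    mismatch_raised = True
    print("Cannot multiply:", err)
```

So $AB$ is defined here but $BA$ is not, which is a quick reminder that the order of the factors matters in matrix multiplication.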


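So what exactly is going on here? Let's start with a simple example. Take $A=[a_1,a_2]$, a $3\times 2$ matrix where each $a_i\in\mathbb{R}^3$ (note that since we're not using the transpose, each $a_i$ is a *column* of $A$), and $b$, a $2\times 1$ matrix, which is alternatively just a *vector* in $\mathbb{R}^2$. Now we can define $Ab=\sum_{i=1}^2b_ia_i$. In other words, $Ab$ is the sum of the columns of $A$, each scaled by the corresponding element of $b$, so the result is a vector in $\mathbb{R}^3$.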
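
The definition can be checked numerically. In this sketch, `A` is the $3\times 2$ matrix from the earlier code and `b` is chosen (as an assumption for illustration) to be the first column of the earlier `B`. Keep in mind that the math indexes from 1 while NumPy indexes from 0, so $a_1$ is `A[:, 0]` and $b_1$ is `b[0]`.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [1, 4]])  # columns: a_1 = <1,3,1>, a_2 = <2,4,4>
b = np.array([5, 7])

# b_1 * a_1 + b_2 * a_2, written out by hand
combo = b[0] * A[:, 0] + b[1] * A[:, 1]
print(combo)         # [19 43 33]

# The same thing via NumPy's matrix-vector product
print(np.dot(A, b))  # [19 43 33]
```

The result matches the first column of the product $AB$ computed earlier, which hints at how the full matrix product is built up one column at a time.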

Let's look at a few more examples