# Vectors: An Introduction

### Learning Objectives:
- [Points & Space](#Points-&-Space)
- [Vector Length & Euclidean Distance](#Vector-Length-&-Euclidean-Distance)
- [Vector Addition & Subtraction](#Vector-Addition-&-Subtraction)

# Points & Space

A __point__ is simply a list of numbers, its __coordinates__, that specifies a position in space. The number of coordinates determines the number of __dimensions__ of that space. If our space is a line, we need only one coordinate to specify a position. On a plane we need two coordinates, and in 3-D space we need three. The same logic applies to any N-D space; a space with more than three dimensions is often called a __hyperspace__.

<img src="images/points_in_space.png"
     alt="Points in space"
     style="display: block; margin-left:auto; margin-right:auto; width:60%"
     />

So what are __vectors__? Vectors are a useful way to represent points in any N-D space. In general, a vector is an ordered list of __components__, each of which can take a range of values and gives the coordinate along one dimension. The list is ordered because each vector __entry__ refers to the coordinate along a specific dimension, so $[1, 2]$ and $[2, 1]$ are different vectors.

In this section, we will help you visualize vectors and carry out vector operations in Python. In Python, we can create vectors either as standard lists or as NumPy arrays.

In [30]:
import numpy as np

x1_list = [1, 2]  # 2-D vector using a standard list
x1_numpy = np.array([1, 2])  # 2-D vector using a NumPy array
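Lists and NumPy arrays look alike here, but they behave differently under arithmetic operators, which is one reason NumPy is usually preferred for vector maths. A small illustration (the variable names below are our own):

```python
import numpy as np

x1_list = [1, 2]             # 2-D vector as a standard list
x1_numpy = np.array([1, 2])  # 2-D vector as a NumPy array

# For lists, + joins the two lists and * repeats them; no arithmetic is done
list_sum = x1_list + x1_list     # [1, 2, 1, 2]
list_scaled = x1_list * 2        # [1, 2, 1, 2]

# For NumPy arrays, the same operators work component by component
numpy_sum = x1_numpy + x1_numpy  # array([2, 4])
numpy_scaled = x1_numpy * 2      # array([2, 4])

print(list_sum, list_scaled)
print(numpy_sum, numpy_scaled)
```

With plain lists you have to loop over the components yourself to get vector arithmetic, as we will do later in this notebook.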

To build some intuition, let us consider a 2-D vector space whose axes correspond to the components $x_{1}$ and $x_{2}$. If we draw this in Cartesian coordinates, we can picture a vector as an arrow: an instruction to move from the origin $(0,0)$ to the point given by the values of its components. Let us consider an example vector in this space, $\vec{\mathbf{v}}$. Vectors are commonly written as columns, as follows:

$$\vec{\mathbf{v}} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

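It can also be written as a row (strictly, a row vector is the transpose of a column vector, a distinction that matters once we work with matrices):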

$$\vec{\mathbf{v}} = \begin{bmatrix} 1 & 2 \end{bmatrix}$$

In this case, the vector instructs us to move from (0,0) to (1,2). This is visualized in the plot below:

In [33]:
# Visualisation Code

import plotly.graph_objects as go

x1 = [0,1]
x2 = [0,2]

fig = go.Figure(data=[go.Scatter(
    x=x1, y=x2,
    mode='markers',
    marker=dict(size=[10,50],
            color=["black","orange"])
    )])

fig.update_layout(
    title="Initial Vector Plot (v)",
    xaxis_title="$x_{1}$",
    yaxis_title="$x_{2}$",
)
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 2],marker_color="black"))
fig.update_layout(showlegend=False)
fig.show()

But what does this actually mean, and how is it useful? In data science and machine learning, data can have multiple __features__, which are properties of our data that can take a range of values. We can therefore use vectors whose components hold the values of different properties of our data. Let us consider an example dataset of animals, each with a value for two features: cuteness and size!

<table>
  <tr>
    <th>Animal</th>
    <th>Cuteness</th>
    <th>Size</th>
  </tr>
  <tr>
    <td>Lion</td>
    <td>80</td>
    <td>50</td>
  </tr>
  <tr>
    <td>Elephant</td>
    <td>75</td>
    <td>95</td>
  </tr>
  <tr>
    <td>Hyena</td>
    <td>10</td>
    <td>30</td>
  </tr>
  <tr>
    <td>Mouse</td>
    <td>60</td>
    <td>8</td>
  </tr>
  <tr>
    <td>Pig</td>
    <td>30</td>
    <td>30</td>
  </tr>
  <tr>
    <td>Horse</td>
    <td>50</td>
    <td>65</td>
  </tr>
  <tr>
    <td>Dolphin</td>
    <td>90</td>
    <td>45</td>
  </tr>
  <tr>
    <td>Wasp</td>
    <td>2</td>
    <td>1</td>
  </tr>
  <tr>
    <td>Giraffe</td>
    <td>60</td>
    <td>80</td>
  </tr>
  <tr>
    <td>Dog</td>
    <td>95</td>
    <td>20</td>
  </tr>
  <tr>
    <td>Alligator</td>
    <td>8</td>
    <td>40</td>
  </tr>
  <tr>
    <td>Mole</td>
    <td>30</td>
    <td>12</td>
  </tr>
  <tr>
    <td>Black Widow</td>
    <td>100</td>
    <td>30</td>
  </tr>
</table>

By representing the information above as vectors, we can encode multiple properties in a single entity. This gives us a more holistic view of how examples relate to each other, such as which animals are similar to which other animals. For instance, plotting the table above gives the following visual representation:

In [6]:
import plotly.graph_objects as go

animal_labels = ["Lion", "Elephant", "Hyena", "Mouse", "Pig", "Horse", "Dolphin", "Wasp", "Giraffe", "Dog", "Alligator", "Mole", "Scarlett Johansson", "The Rock"]
animal_cuteness = [80, 75, 10, 60, 30, 50, 90, 2, 60, 95, 8, 30, 100, 50]
animal_size = [50, 95, 30, 8, 30, 65, 45, 1, 80, 20, 40, 12, 30, 100]


fig = go.Figure(data=[go.Scatter(
    x=animal_cuteness, y=animal_size,
    text=animal_labels,
    mode='markers+text',
    marker_color='orange',
    marker_size=50)
])

fig.update_layout(
    title="Animal Cuteness vs Animal Size",
    xaxis_title="Animal Cuteness",
    yaxis_title="Animal Size",
)

fig.show()


We have plotted this data in what is known as __feature space__, which simply means that the dimensions of the vectors correspond to the features (properties) of the data. Thanks to the vector representation, we can see that elephants are very far from the origin in feature space, whereas wasps are very close to it. We can even go as far as saying that alligators and hyenas are remarkably similar, as they are very close together in feature space, whereas wasps are very different from elephants!
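In code, a dataset like this is often stored as a 2-D NumPy array in which each row is one animal's feature vector and each column is one feature. A minimal sketch using a handful of rows from the table (the variable names are our own, not from the cells above):

```python
import numpy as np

# A few rows copied from the cuteness/size table above
animal_names = ["Elephant", "Wasp", "Hyena", "Alligator"]
features = np.array([
    [75, 95],  # Elephant: [cuteness, size]
    [2, 1],    # Wasp
    [10, 30],  # Hyena
    [8, 40],   # Alligator
])

# Each row is one animal's vector; each column is one feature (dimension)
print(features.shape)  # (4, 2): 4 animals, 2 features

# Look up a single animal's vector by its row index
elephant = features[animal_names.index("Elephant")]
print(elephant)  # [75 95]
```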

We are now building an intuition as to how far things are from the origin or from each other in feature space, which leads us to the concept of vector length and distance.

# Vector Length & Euclidean Distance
We can now go over the concept of vector __length__, also known as __magnitude__. Vector length measures the size of a vector from its components and does not depend on the direction the vector points in. Consider the 2-D vector $\mathbf{\vec{v}}$ shown below, together with its component along each dimension. The vector and its two components form a right-angled triangle, so we can find the length with Pythagoras' theorem!

In [7]:
# Visualisation Code

import plotly.graph_objects as go

x1 = [0,1,0,1]
x2 = [0,2,2,0]


fig = go.Figure(data=[go.Scatter(
    x=x1, y=x2,
    mode='markers',
    marker=dict(size=[50,50,25,25]),
    marker_color="orange")
])

fig.update_layout(
    title="Component-decomposed Vector (v)",
    xaxis_title="$x_{1}$",
    yaxis_title="$x_{2}$",
)
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 2],marker_color="black"))
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 0],marker_color="black"))
fig.add_trace(go.Scatter(x=[0, 0], y=[0, 2],marker_color="black"))
# adding annotations
fig.add_annotation(
            x=0.5,
            y=0.6,
            text=r"$\sqrt{x_{1}^{2} + x_{2}^{2}}$")  # raw string so "\s" is not treated as an escape
fig.add_annotation(
            x=0.02,
            y=0.8,
            text="$x_{2}$")
fig.add_annotation(
            x=0.8,
            y=0.05,
            text="$x_{1}$")
fig.update_annotations(dict(
            xref="x",
            yref="y",
            showarrow=False,
            ax=0,
            ay=-40
))

fig.update_layout(showlegend=False)
fig.show()

Hence, the length of a 2-D vector, which is the hypotenuse of the triangle above, is given by the equation below:

$$||\vec{\mathbf{x}}|| = \sqrt{x_{1}^{2} + x_{2}^{2}}$$

Where $||\vec{\mathbf{x}}||$ denotes the length of the vector $\vec{\mathbf{x}}$.

For our example vector $\vec{\mathbf{v}}$, this gives $||\vec{\mathbf{v}}|| = \sqrt{1^{2} + 2^{2}} = \sqrt{5} \approx 2.24$. So how do we extend this to higher dimensions? Fortunately, the same process can be applied to any number of dimensions, giving the following equations:

$$\text{3-D Case:     }||\vec{\mathbf{x}}|| = \sqrt{x_{1}^{2} + x_{2}^{2} + x_{3}^{2}}$$
$$\text{N-D Case:     }||\vec{\mathbf{x}}|| = \sqrt{\sum_{i=1}^{N} x_{i}^{2}} $$
($\sum$ is the capital Greek letter sigma, and means summation)
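One way to see why the 3-D formula holds is to apply Pythagoras' theorem twice: first in the $x_{1}$-$x_{2}$ plane, then to the right-angled triangle formed by that first hypotenuse and the $x_{3}$ component, which is perpendicular to the plane:

$$||\vec{\mathbf{x}}|| = \sqrt{\left(\sqrt{x_{1}^{2} + x_{2}^{2}}\right)^{2} + x_{3}^{2}} = \sqrt{x_{1}^{2} + x_{2}^{2} + x_{3}^{2}}$$

Repeating this step once for each additional dimension gives the N-D formula.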

Now that we have the mathematics, how can we use Python to calculate the length of a vector? Below, we calculate it in three ways: with plain Python, with the standard library's `math` module, and with NumPy. Note that for plain Python, a common rule of thumb is that __a summation over many terms suggests using iteration__, as it is a repetitive task. Once you have written a program that computes the length of a vector, print the lengths of all the vectors from the animal example!

In [None]:
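import math
import numpy as np

# Example vector
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Plain Python: accumulate the sum of squares in a loop, then take the square root
def vector_length(v):
    sum_of_squares = 0
    for value in v:
        sum_of_squares += value ** 2
    return sum_of_squares ** 0.5

print("Vector length:", vector_length(x))

# Math module: square each component, sum the squares, then use math.sqrt
x_squared = [math.pow(val, 2) for val in x]
length2 = math.sqrt(sum(x_squared))
print("Vector length:", length2)

# NumPy: np.linalg.norm returns the Euclidean length (2-norm) of a 1-D array by default
x = np.array(x)
length3 = np.linalg.norm(x)
print("Vector length:", length3)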
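The exercise above also asks for the length of every animal vector. One possible way to do it, reusing the loop-based approach and cross-checking with NumPy; the dictionary simply re-types the cuteness and size values from the table, and the names are our own:

```python
import numpy as np

# [cuteness, size] for each animal, copied from the table above
animals = {
    "Lion": [80, 50], "Elephant": [75, 95], "Hyena": [10, 30], "Mouse": [60, 8],
    "Pig": [30, 30], "Horse": [50, 65], "Dolphin": [90, 45], "Wasp": [2, 1],
    "Giraffe": [60, 80], "Dog": [95, 20], "Alligator": [8, 40], "Mole": [30, 12],
    "Black Widow": [100, 30],
}

def vector_length(v):
    # Square root of the sum of squared components
    return sum(value ** 2 for value in v) ** 0.5

lengths = {}
for name, vector in animals.items():
    lengths[name] = vector_length(vector)
    # Cross-check the loop-based result against NumPy's norm
    assert np.isclose(lengths[name], np.linalg.norm(vector))
    print(f"{name}: {lengths[name]:.2f}")
```

The largest length belongs to the elephant and the smallest to the wasp, matching what we saw in the feature-space plot.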

Now that you have seen how to compute the length of a vector, how can we extend this concept to gauge how far apart two vectors are in feature space? Consider the following decomposition of the elephant and pig vectors:

In [23]:
import plotly.graph_objects as go

animal_labels = ["Elephant", "Pig"]
animal_cuteness = [75, 30]
animal_size = [95, 30]

fig = go.Figure(data=[go.Scatter(x=[30, 75], y=[30, 95], line_color='black')])
fig.add_trace(go.Scatter(
    x=animal_cuteness, y=animal_size,
    text=animal_labels,
    mode='markers+text',
    marker_color='orange',
    marker_size=50)
)

# Adding traces for decomposing our vectors
# Pig components (red)
fig.add_trace(go.Scatter(x=[0, 30], y=[0, 0], line_color='red'))
fig.add_trace(go.Scatter(x=[0, 0], y=[0, 30], line_color='red'))
# Differences between the elephant and pig components (blue)
fig.add_trace(go.Scatter(x=[30, 75], y=[0, 0], line_color='blue'))
fig.add_trace(go.Scatter(x=[0, 0], y=[30, 95], line_color='blue'))
# Elephant components (green), offset slightly from the axes so they stay visible
fig.add_trace(go.Scatter(x=[0, 75], y=[3, 3], line_color='green'))
fig.add_trace(go.Scatter(x=[2, 2], y=[0, 95], line_color='green'))

# Redraw the elephant-pig line so it sits on top of the markers
fig.add_trace(go.Scatter(x=[30, 75], y=[30, 95], line_color='black'))

fig.update_layout(
    title="Animal Cuteness vs Animal Size",
    xaxis_title="Animal Cuteness",
    yaxis_title="Animal Size",
    showlegend=False
)

fig.show()

From the above decomposition, we see that the elephant and the pig differ by 45 in cuteness and 65 in size, which is exactly what we get when we subtract the pig's cuteness and size from the elephant's. These two differences form the legs of a right-angled triangle whose hypotenuse is the straight-line distance between the two animals. This is known as the __Euclidean distance__ between two vectors/points, often just called the _distance_. For two 2-D vectors $x$ and $y$, such as the elephant and pig examples, the Euclidean distance is given by:

$$d(x,y) = \sqrt{(x_{1} - y_{1})^{2} + (x_{2} - y_{2})^{2}}$$
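As a worked example, plugging in the elephant $(75, 95)$ and pig $(30, 30)$ vectors from the table:

$$d(\text{elephant}, \text{pig}) = \sqrt{(75 - 30)^{2} + (95 - 30)^{2}} = \sqrt{45^{2} + 65^{2}} = \sqrt{2025 + 4225} = \sqrt{6250} \approx 79.06$$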

Just as with length, we can extend the concept of distance to higher dimensions:
$$\text{N-D Case:     }d(x,y) = \sqrt{\sum_{i=1}^{N} (x_{i}-y_{i})^{2}} $$

Great! We now have a measure of how close two vectors are in space (in our example, feature space). Let us write a function that computes the Euclidean distance between two input vectors, then use it to compute the distance between the alligator and hyena vectors and between the elephant and pig vectors.

In [25]:
## Computing distance
def d(x, y):
    dist = 0
    for xi, yi in zip(x, y):
        dist += (xi - yi)**2
    return dist ** 0.5

elephant, pig = [75, 95], [30, 30]
alligator, hyena = [8, 40], [10, 30]
print(d(elephant, pig))
print(d(alligator, hyena))


79.05694150420949
10.198039027185569
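Since the terms $x_{i} - y_{i}$ are exactly the components of the difference vector $\vec{\mathbf{x}} - \vec{\mathbf{y}}$ (vector subtraction is covered in the next section), the distance is simply the length of that difference. A short sketch of two library shortcuts for the same calculation: `np.linalg.norm` applied to the difference, and `math.dist`, which is in the standard library from Python 3.8 onwards:

```python
import math
import numpy as np

elephant, pig = np.array([75, 95]), np.array([30, 30])

# Euclidean distance as the length of the difference vector
dist_numpy = np.linalg.norm(elephant - pig)

# Standard-library equivalent (Python 3.8+), working directly on sequences
dist_math = math.dist([8, 40], [10, 30])  # alligator, hyena

print(dist_numpy)
print(dist_math)
```

Both values agree with the loop-based `d` function above.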


# Vector Addition & Subtraction

Another factor that makes vectors (and, as you will see later, matrices) powerful representations of information is that they provide a simple notation for procedures involving large numbers of calculations. The simplest of these operations are __vector addition__ and __vector subtraction__.

Let us consider 2-D vectors for now. To carry out these operations, we simply __add__ or __subtract__ the corresponding components of the vectors, as shown in the equations below:

$$ \vec{\mathbf{x}}+\vec{\mathbf{y}} = \begin{bmatrix} x_{1}+y_{1} \\ x_{2}+y_{2} \end{bmatrix}$$

$$ \vec{\mathbf{x}}-\vec{\mathbf{y}} = \begin{bmatrix} x_{1}-y_{1} \\ x_{2}-y_{2} \end{bmatrix}$$

If we think about what we are using these vectors for, these operations make sense. Adding or subtracting corresponding components gives the same result as carrying out each operation one by one, so we get a concise way of subtracting cuteness from cuteness, size from size, and so on. Below is an example:

$$\begin{bmatrix} 3 \\ 3 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$$

This may not seem like a big step, but as we get into more complex operations, the appeal of vector notation will become clearer. Vector sums and differences can also be visualised in space, as we see below:

In [28]:
# Visualisation Code

x1 = [0,3,2,5]
x2 = [0,3,1,4]


fig = go.Figure(data=[go.Scatter(
    x=x1, y=x2,
    mode='markers',
    marker = dict(size=[10,30,30,60], 
                  color=["black","orange","orange","orange"]),
    )
])

fig.update_layout(
    title="Vector Addition",
    xaxis_title="$x_{1}$",
    yaxis_title="$x_{2}$",
)
fig.add_trace(go.Scatter(x=[0, 3], y=[0, 3],marker_color="black"))
fig.add_trace(go.Scatter(x=[0, 2], y=[0, 1],marker_color="black"))
fig.add_trace(go.Scatter(x=[0, 5], y=[0, 4],marker_color="black"))
# adding annotations
fig.add_annotation(
            x=1,
            y=0.3,
            text=r"$\mathbf{y}$")  # raw strings keep "\m" from being read as an escape
fig.add_annotation(
            x=1.5,
            y=1.8,
            text=r"$\mathbf{x}$")
fig.add_annotation(
            x=3,
            y=2.5,
            text=r"$\mathbf{x+y}$")
fig.update_annotations(dict(
            xref="x",
            yref="y",
            showarrow=False,
            ax=0,
            ay=-40
))

fig.update_layout(showlegend=False)
fig.show()

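Subtraction also has a geometric meaning: $\vec{\mathbf{x}}-\vec{\mathbf{y}}$ is the vector that, added to $\vec{\mathbf{y}}$, gives back $\vec{\mathbf{x}}$, i.e. the arrow from the tip of $\vec{\mathbf{y}}$ to the tip of $\vec{\mathbf{x}}$. Reversing the addition example above: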

$$\begin{bmatrix} 5 \\ 4 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

So, now that we have the intuition, let's have a short code break so you can code up vector addition and subtraction with basic lists, then see how it can be done with NumPy.

In [31]:
## Example vectors
vector1 = [1,2,3,4,5,6,7,8,9,10]
vector2 = [1,1,1,1,1,1,1,1,1,1]

## Basic Python
def vec_add(v1, v2):
    resultant_vector = []
    for v1_val, v2_val in zip(v1, v2):
        resultant_vector.append(v1_val + v2_val)
    return resultant_vector

def vec_sub(v1, v2):
    resultant_vector = []
    for v1_val, v2_val in zip(v1, v2):
        resultant_vector.append(v1_val - v2_val)
    return resultant_vector

print("vector1 + vector2 =", vec_add(vector1, vector2))
print("vector1 - vector2 =", vec_sub(vector1, vector2))
print()


## NumPy
vector1 = np.array(vector1)
vector2 = np.array(vector2)

print("vector1 + vector2 =", vector1 + vector2)
print("vector1 - vector2 =", vector1 - vector2)

vector1 + vector2 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
vector1 - vector2 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

vector1 + vector2 = [ 2  3  4  5  6  7  8  9 10 11]
vector1 - vector2 = [0 1 2 3 4 5 6 7 8 9]


These concepts also work in higher dimensions. Before focusing on words themselves, let's briefly add a third feature, ferocity, to our animal vectors and see what that looks like:

<table>
  <tr>
    <th>Animal</th>
    <th>Cuteness</th>
    <th>Size</th>
    <th>Ferocity</th>
  </tr>
  <tr>
    <td>Lion</td>
    <td>80</td>
    <td>50</td>
    <td>85</td>
  </tr>
  <tr>
    <td>Elephant</td>
    <td>75</td>
    <td>95</td>
    <td>20</td>
  </tr>
  <tr>
    <td>Hyena</td>
    <td>10</td>
    <td>30</td>
    <td>90</td>
  </tr>
  <tr>
    <td>Mouse</td>
    <td>60</td>
    <td>8</td>
    <td>1</td>
  </tr>
  <tr>
    <td>Pig</td>
    <td>30</td>
    <td>30</td>
    <td>10</td>
  </tr>
  <tr>
    <td>Horse</td>
    <td>50</td>
    <td>65</td>
    <td>30</td>
  </tr>
  <tr>
    <td>Dolphin</td>
    <td>90</td>
    <td>45</td>
    <td>20</td>
  </tr>
  <tr>
    <td>Wasp</td>
    <td>2</td>
    <td>1</td>
    <td>100</td>
  </tr>
  <tr>
    <td>Giraffe</td>
    <td>60</td>
    <td>80</td>
    <td>65</td>
  </tr>
  <tr>
    <td>Dog</td>
    <td>95</td>
    <td>20</td>
    <td>15</td>
  </tr>
  <tr>
    <td>Alligator</td>
    <td>8</td>
    <td>40</td>
    <td>90</td>
  </tr>
  <tr>
    <td>Mole</td>
    <td>30</td>
    <td>12</td>
    <td>15</td>
  </tr>
  <tr>
    <td>Black Widow</td>
    <td>100</td>
    <td>30</td>
    <td>69</td>
  </tr>
</table>


In [37]:
# just to remind us ;)
animal_labels = ["Lion", "Elephant", "Hyena", "Mouse", "Pig", "Horse", "Dolphin", "Wasp", "Giraffe", "Dog", "Alligator", "Mole", "Scarlett Johansson", "The Rock"]
animal_cuteness = [80, 75, 10, 60, 30, 50, 90, 2, 60, 95, 8, 30, 100, 50]
animal_size = [50, 95, 30, 8, 30, 65, 45, 1, 80, 20, 40, 12, 30, 100]
animal_ferocity = [85, 20, 90, 1, 10, 30, 20, 100, 65, 15, 90, 15, 69, 100]


# nothing particularly important... just used for visualisation purposes
animal_mean_stats = [np.mean(k) for k in zip(animal_cuteness, animal_size, animal_ferocity)]

In [35]:
fig = go.Figure(data=[go.Scatter3d(
    x=animal_cuteness, y=animal_size, z=animal_ferocity,
    text=animal_labels,
    mode='markers+text',
    marker=dict(
        size=12,
        color=animal_mean_stats,                # set color to an array/list of desired values
        colorscale='Viridis',   # choose a colorscale
        opacity=0.8
    ))
])

fig.update_layout(title="Animal Cuteness vs Animal Size vs Animal Ferocity",
    scene = dict(
    xaxis_title='Animal Cuteness',
    yaxis_title='Animal Size',
    zaxis_title='Animal Ferocity')
)


fig.show()


Pretty neat, isn't it? With 3-D vectors we can still visualise the animals relative to each other in space and see differences that may not have been clear when we omitted their ferocity. We can extend the ideas we have covered so far to compute the mean animal vector of our dataset by taking the average of each individual component. This procedure is easier to express in vector notation. Given $N$ vectors $\vec{v_{1}}, \vec{v_{2}}, \ldots, \vec{v_{N}}$, the mean vector is:

$$\vec{v_{\mu}} = \frac{1}{N}\sum_{i=1}^{N} \vec{v_{i}}$$

As we will see in more detail in the next notebook, multiplying or dividing a vector by a __scalar__ (a single value) is equivalent to applying that operation to each component. So the formula above adds the $N$ vectors component by component and then divides each component of the sum by $N$. Now, let's use this below and see what the mean animal looks like for our example!
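The mean-vector formula maps directly onto NumPy: stack the vectors as rows, sum down the rows, then divide by the scalar $N$. A minimal sketch using the 13 rows of the three-feature table (the cell below also includes an extra entry, so its mean will differ slightly); the names are our own:

```python
import numpy as np

# Rows are animals, columns are [cuteness, size, ferocity], from the table above
data = np.array([
    [80, 50, 85], [75, 95, 20], [10, 30, 90], [60, 8, 1], [30, 30, 10],
    [50, 65, 30], [90, 45, 20], [2, 1, 100], [60, 80, 65], [95, 20, 15],
    [8, 40, 90], [30, 12, 15], [100, 30, 69],
])

N = data.shape[0]  # number of vectors

# Sum the N vectors component by component, then divide by the scalar N
mean_by_formula = data.sum(axis=0) / N

# NumPy's mean over axis 0 (down the rows) gives the same mean vector
mean_by_numpy = data.mean(axis=0)

print(mean_by_formula)
assert np.allclose(mean_by_formula, mean_by_numpy)
```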

In [41]:
# Adding the mean to the dataset
cute_mean, size_mean, ferocity_mean = np.mean(animal_cuteness), np.mean(animal_size), np.mean(animal_ferocity)

animal_labels.append('Mean')
animal_cuteness.append(cute_mean)
animal_size.append(size_mean)
animal_ferocity.append(ferocity_mean)

# nothing particularly important... just used for visualisation purposes
animal_mean_stats = [np.mean(k) for k in zip(animal_cuteness, animal_size, animal_ferocity)]

fig = go.Figure(data=[go.Scatter3d(
    x=animal_cuteness, y=animal_size, z=animal_ferocity,
    text=animal_labels,
    mode='markers+text',
    marker=dict(
        size=12,
        color=animal_mean_stats,                # set color to an array/list of desired values
        colorscale='Viridis',   # choose a colorscale
        opacity=0.8
    ))
])

# Short code script to add lines joining animals to the mean animal
for cuteness, size, ferocity in zip(animal_cuteness[:-1], animal_size[:-1], animal_ferocity[:-1]):
    fig.add_trace(go.Scatter3d(x=[cuteness, cute_mean], y=[size, size_mean], z=[ferocity, ferocity_mean]))

fig.update_layout(title="Animal Cuteness vs Animal Size vs Animal Ferocity",
    scene = dict(
    xaxis_title='Animal Cuteness',
    yaxis_title='Animal Size',
    zaxis_title='Animal Ferocity'),
    showlegend=False
)


fig.show()


Beyond three dimensions we can no longer visualize vectors, but the intuition behind them remains the same. In data science and machine learning, you may work with data that has thousands of dimensions, and this is where the vector representation becomes really handy!

# Challenges

__Question 1__: Create a `Vector` class with the following properties:
- Takes a standard list as input containing the vector's components
- Has a magic method `__add__` that adds another `Vector` object to the current vector
- Has a magic method `__sub__` that subtracts another `Vector` object from the current vector
- Has a method called `length` that returns the length (magnitude) of the vector
- Has a method called `dist` that returns the distance between the current `Vector` object and an input `Vector` object
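Try the challenge yourself first. For comparison, here is one possible sketch. The `components` attribute name, the size check and the `__repr__` method are our own design choices, not part of the specification above:

```python
class Vector:
    """One possible Vector class built on a plain Python list."""

    def __init__(self, components):
        # Copy the input so later changes to the caller's list don't affect this vector
        self.components = list(components)

    def _check_same_size(self, other):
        # zip() would silently truncate, so reject mismatched sizes explicitly
        if len(self.components) != len(other.components):
            raise ValueError("Vectors must have the same number of components")

    def __add__(self, other):
        self._check_same_size(other)
        return Vector([a + b for a, b in zip(self.components, other.components)])

    def __sub__(self, other):
        self._check_same_size(other)
        return Vector([a - b for a, b in zip(self.components, other.components)])

    def length(self):
        # Square root of the sum of squared components
        return sum(c ** 2 for c in self.components) ** 0.5

    def dist(self, other):
        # Euclidean distance is the length of the difference vector
        return (self - other).length()

    def __repr__(self):
        return f"Vector({self.components})"


v = Vector([1, 2])
w = Vector([3, 3])
print(v + w)  # Vector([4, 5])
print(w - v)  # Vector([2, 1])
print(v.length())
print(Vector([75, 95]).dist(Vector([30, 30])))  # elephant vs pig
```

Defining `dist` in terms of `__sub__` and `length` keeps the distance formula in one place, mirroring the relationship $d(x, y) = ||\vec{\mathbf{x}} - \vec{\mathbf{y}}||$.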