### Geometry of Vectors

Vectors can be written as `row` or `col` (column) vectors. For example, as a `col` vector:

$\mathbf{x} = \begin{bmatrix}1\\7\\0\\1\end{bmatrix}$


In [1]:
x = [1, 7, 0, 1]

### Dot Products and Angles
If we take two `col` vectors `u` and `v`, we can compute their dot product:

$\mathbf{u}^\top\mathbf{v} = \sum_i u_i\cdot v_i.$

The `dot` product also has a geometric interpretation: it is closely related to the `angle` between the two vectors.
Consider the following figure:

<img src="http://www.d2l.ai/_images/vec-angle.svg"/>

We take two specific vectors

$\mathbf{v} = (r,0) \; \text{and} \; \mathbf{w} = (s\cos(\theta), s \sin(\theta)).$

The vector `v` has length `r` and runs parallel to the `x-axis`, and the vector `w` has length `s` and makes an angle $\theta$ with the `x-axis`. If we compute the `dot` product of these vectors, we get

$\mathbf{v}\cdot\mathbf{w} = rs\cos(\theta) = \|\mathbf{v}\|\|\mathbf{w}\|\cos(\theta).$
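As a quick numerical check of the identity above, here is a minimal sketch in plain Python. The values `r = 2`, `s = 3`, and $\theta = 60^\circ$ are illustrative assumptions, and the helpers `dot` and `norm` are hypothetical names introduced only for this example.

```python
import math

def dot(a, b):
    """Dot product: the sum of elementwise products."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

# Illustrative values (assumptions, not from the text)
r, s, theta = 2.0, 3.0, math.pi / 3

v = (r, 0.0)                                    # length r, along the x-axis
w = (s * math.cos(theta), s * math.sin(theta))  # length s, at angle theta

lhs = dot(v, w)                            # sum_i v_i * w_i
rhs = norm(v) * norm(w) * math.cos(theta)  # ||v|| ||w|| cos(theta)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Both sides come out at approximately `3`, which is $rs\cos(\theta)$ for these values.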

In [2]:
import torch
import torchvision
from IPython import display
from torchvision import transforms

In [3]:
# Angle (in radians) between two vectors: theta = arccos(v.w / (|v| |w|))
def angle(v, w):
    return torch.acos(v.dot(w) / (torch.norm(v) * torch.norm(w)))

angle(torch.tensor([0, 1, 2], dtype=torch.float32), torch.tensor([2.0, 3, 4]))

tensor(0.4190)

### Cosine Similarity
In ML contexts where the angle is used to measure the `closeness` of two vectors, we refer to the following quantity as the `cosine similarity`:

$\cos(\theta) = \frac{\mathbf{v}\cdot\mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|}.$

The `cosine` takes a maximum value of `1` when the two vectors point in the same direction, a minimum value of `-1` when they point in opposite directions, and a value of `0` when the two vectors are `orthogonal`. If the components of high-dimensional vectors are sampled randomly with mean `0`, their `cosine` will nearly always be close to `0`.
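The three special cases and the high-dimensional behaviour can be checked with a short sketch. It uses plain Python so that it runs without extra dependencies; the function name `cosine_similarity`, the dimension `d = 10_000`, and the standard-normal sampling are assumptions made for illustration.

```python
import math
import random

def cosine_similarity(v, w):
    """cos(theta) = v.w / (||v|| ||w||)"""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)

same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite directions
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # right angle

# Two high-dimensional vectors with zero-mean random components
random.seed(0)
d = 10_000
a = [random.gauss(0.0, 1.0) for _ in range(d)]
b = [random.gauss(0.0, 1.0) for _ in range(d)]
high_dim = cosine_similarity(a, b)

print(same, opposite, orthogonal, high_dim)
```

The first three values are approximately `1`, `-1`, and `0`, and the cosine of the two random vectors is small in magnitude.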

### Hyperplanes
A hyperplane is the generalization to higher dimensions of a `line` (in two dimensions) or a `plane` (in three dimensions). In a `d`-dimensional vector space, a hyperplane has `d-1` dimensions.

Suppose we have a `col` vector $\mathbf{w}=[2,1]^\top$ and we want to know which points `v` satisfy `w.v = 1`. Recalling the connection between `dot` products and `angles`, we can see that this is equivalent to

$\|\mathbf{v}\|\|\mathbf{w}\|\cos(\theta) = 1 \; \iff \; \|\mathbf{v}\|\cos(\theta) = \frac{1}{\|\mathbf{w}\|} = \frac{1}{\sqrt{5}}.$

<img src="http://www.d2l.ai/_images/proj-vec.svg"/>

If we consider the geometric meaning of this expression, we see it is equivalent to saying that the length of the projection of `v` onto the direction of `w` is exactly $1/\|\mathbf{w}\|$. The set of all points where this is true is a line at right angles to the vector `w`.

If we instead ask about the set of points with $\mathbf{w}\cdot\mathbf{v} > 1$ or $\mathbf{w}\cdot\mathbf{v} < 1$, we get the cases where the projection is longer or shorter than $1/\|\mathbf{w}\|$, respectively. In this way, we have found a way to cut our space into two halves: all the points on one side have a dot product below the threshold, and all the points on the other side have a dot product above it.
<img src="http://www.d2l.ai/_images/space-division.svg"/>
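The 2D case with $\mathbf{w}=[2,1]^\top$ can be sketched in a few lines of plain Python. The sample points and the helper names `dot`, `projection_length`, and `side` are hypothetical, chosen only to illustrate the line $\mathbf{w}\cdot\mathbf{v} = 1$ and the two half-spaces on either side of it.

```python
import math

w = (2.0, 1.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def projection_length(v, w):
    """Signed length of the projection of v onto the direction of w."""
    return dot(v, w) / math.sqrt(dot(w, w))

# Three points chosen by hand so that w.v = 1
on_line = [(0.0, 1.0), (0.5, 0.0), (1.0, -1.0)]
for v in on_line:
    # Each projection has length 1 / ||w|| = 1 / sqrt(5)
    print(v, dot(w, v), projection_length(v, w))

# The difference of two points on the line is orthogonal to w
direction = (on_line[1][0] - on_line[0][0], on_line[1][1] - on_line[0][1])
print(dot(w, direction))

def side(v, w, threshold=1.0):
    """Report which side of the hyperplane w.v = threshold the point v is on."""
    value = dot(w, v)
    if value > threshold:
        return "above"
    if value < threshold:
        return "below"
    return "on"

print(side((1.0, 1.0), w), side((0.0, 0.0), w), side((0.0, 1.0), w))
```

The same `side` function works unchanged for vectors of any dimension, since it only relies on the dot product.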

The story in higher dimensions is much the same. If we take $\mathbf{w} = [1,2,3]^\top$ and ask about the points in 3D with $\mathbf{w}\cdot\mathbf{v} = 1$, we obtain a plane at `right` angles to the vector `w`. The two inequalities again define the two sides of the plane.
<img src="http://www.d2l.ai/_images/space-division-3d.svg"/>

We can understand `linear classification models` as methods to find `hyperplanes` that separate the different target classes. In this context, such `hyperplanes` are referred to as `decision planes`. Most deep learned classification models end with a `linear` layer fed into a `softmax`, so one can interpret the role of the deep neural network as finding a `non-linear` embedding such that the `target classes` can be separated cleanly by hyperplanes.

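For example, below we classify two classes (labels `0` and `1`) of the Fashion-MNIST dataset by just taking the vector between their means to define the `decision plane`. The cell below loads the data and computes the two class means.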

In [5]:
trans = transforms.Compose([
    transforms.ToTensor()
])

train = torchvision.datasets.FashionMNIST(root='data/', transform=trans, train=True, download=True)
test = torchvision.datasets.FashionMNIST(root='data/', transform=trans, train=False, download=True)

# Training images for class 0, rescaled from [0, 1] to [0, 256]
X_train_0 = torch.stack(
    [x[0] * 256 for x in train if x[1] == 0]).type(torch.float32)
# Training images for class 1. Stack the image x[0], not the integer label
# x[1]; stacking labels raises
# "TypeError: expected Tensor as element 0 in argument 0, but got int"
X_train_1 = torch.stack(
    [x[0] * 256 for x in train if x[1] == 1]).type(torch.float32)
# Test images and labels restricted to classes 0 and 1
X_test = torch.stack(
    [x[0] * 256 for x in test if x[1] == 0 or x[1] == 1]).type(torch.float32)
y_test = torch.stack([torch.tensor(x[1]) for x in test
                      if x[1] == 0 or x[1] == 1]).type(torch.float32)

# Compute averages
ave_0 = torch.mean(X_train_0, axis=0)
ave_1 = torch.mean(X_train_1, axis=0)
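The mean-difference idea itself can be sketched on synthetic data without downloading anything. This is a minimal illustration under stated assumptions, not the notebook's Fashion-MNIST pipeline: the two classes are hypothetical 2D Gaussian clusters, and the threshold is placed at the midpoint between the two class means, which is one simple choice among many.

```python
import random

random.seed(0)

def sample(center, n):
    """Draw n points from a unit-variance Gaussian around center (synthetic data)."""
    return [(random.gauss(center[0], 1.0), random.gauss(center[1], 1.0))
            for _ in range(n)]

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical training sets for the two classes
class_0 = sample((-3.0, 0.0), 200)
class_1 = sample((3.0, 0.0), 200)

ave_0 = mean(class_0)
ave_1 = mean(class_1)

# Normal vector of the decision plane: the vector between the class means
w = (ave_1[0] - ave_0[0], ave_1[1] - ave_0[1])

# Assumed threshold: the plane passes through the midpoint of the two means
midpoint = ((ave_0[0] + ave_1[0]) / 2, (ave_0[1] + ave_1[1]) / 2)
threshold = dot(w, midpoint)

def predict(x):
    """Class 1 if x lies on the ave_1 side of the hyperplane w.x = threshold."""
    return 1 if dot(w, x) > threshold else 0

# Fresh synthetic test points from the same two clusters
test_0 = sample((-3.0, 0.0), 100)
test_1 = sample((3.0, 0.0), 100)
correct = (sum(predict(x) == 0 for x in test_0)
           + sum(predict(x) == 1 for x in test_1))
accuracy = correct / (len(test_0) + len(test_1))
print(accuracy)
```

With image data, the same steps would apply after flattening each image into a vector, so that `w` is the flattened difference `ave_1 - ave_0`; the threshold would then typically need to be chosen by inspecting the data.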