## Distance in n-dimensional space
- The length of a vector
- Measuring the distance between two points: norm-based distance

20180303

### 1. Introduction

- From Wikipedia

In linear algebra, functional analysis, and related areas of mathematics, a **norm** is a function that assigns a strictly positive length or size to each vector in a vector space, except for the zero vector, which is assigned a length of zero. A **seminorm**, on the other hand, is allowed to assign zero length to some non-zero vectors (in addition to the zero vector).

A norm must also satisfy certain properties pertaining to scalability and additivity which are given in the formal definition below.

A simple example is two-dimensional Euclidean space $\mathbb{R}^2$ equipped with the "Euclidean norm" (see below). Elements in this vector space (e.g., (3, 7)) are usually drawn as arrows in a 2-dimensional Cartesian coordinate system starting at the origin (0, 0). The Euclidean norm assigns to each vector the length of its arrow. Because of this, the Euclidean norm is often known as the magnitude.

A vector space on which a norm is defined is called a normed vector space. Similarly, a vector space with a seminorm is called a seminormed vector space. It is often possible to supply a norm for a given vector space in more than one way.

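- Additional notes

Distance is a broad concept: any function that satisfies non-negativity, identity (the distance is zero exactly when the two points coincide), symmetry, and the triangle inequality can be called a distance. A norm is a strengthened notion of distance: its definition adds one more rule, for scalar multiplication. Every norm induces a distance through d(u, v) = p(u - v), so it is sometimes convenient to think of a norm as a distance.

In mathematics, norms include vector norms and matrix norms. A vector norm characterizes the size of a vector in a vector space; a matrix norm characterizes the size of the change that a matrix causes. Informally: every vector in a vector space has a size, and a norm is how that size is measured. Different norms can all measure it, just as meters and feet can both measure how far away something is. For matrix norms, recall from linear algebra that the operation AX = B transforms the vector X into B; a matrix norm measures how large that transformation is.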
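
The informal description of a matrix norm can be checked numerically. The sketch below assumes NumPy and uses the spectral norm (`np.linalg.norm(A, 2)`), which is only one of several matrix norms; the matrix `A` and vector `x` are made-up values for illustration.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
x = np.array([3.0, 4.0])

b = A @ x                                    # A transforms x into b
spectral_norm = np.linalg.norm(A, 2)         # largest singular value of A
stretch = np.linalg.norm(b) / np.linalg.norm(x)

# The spectral norm is an upper bound on how much A can stretch any vector.
print(spectral_norm, stretch)
```

For this particular `x` the stretch factor is below the spectral norm; the bound is reached only for vectors along the direction that `A` stretches most.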

### 2. Definition

Given a vector space $V$ over a subfield $F$ of the complex numbers, a norm on $V$ is a function $p: V \to \mathbb{ R }$ (a mapping from every vector in the space to a real number) with the following properties:

For all a ∈ F and all u, v ∈ V,

1. p(av) = |a| p(v) (being absolutely homogeneous or absolutely scalable).
2. p(u + v) ≤ p(u) + p(v) (being subadditive or satisfying the triangle inequality).
3. p(v) ≥ 0 (being positive or, more precisely, non-negative).
4. If p(v) = 0 then v = 0 is the zero vector (being definite or being point-separating).
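
As a quick numerical spot check of the properties above (not a proof), the following sketch takes the Euclidean norm as the candidate `p` and tests it on random vectors. The seed, dimension, and scalar are arbitrary choices made for this example.

```python
import numpy as np

def p(v):
    """Euclidean norm, used here as the candidate norm."""
    return np.sqrt(np.sum(np.abs(v) ** 2))

rng = np.random.default_rng(0)    # fixed seed so the check is repeatable
u = rng.normal(size=4)
v = rng.normal(size=4)
a = -2.5

homogeneous = np.isclose(p(a * v), abs(a) * p(v))    # property 1
subadditive = p(u + v) <= p(u) + p(v) + 1e-12        # property 2
non_negative = p(v) >= 0                             # property 3
zero_has_zero_length = p(np.zeros(4)) == 0           # consistent with property 4

print(homogeneous, subadditive, non_negative, zero_has_zero_length)
```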


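### 3. Lp norm

- Raise the absolute value of each component of the $n$-dimensional vector to the p-th power, sum them, then take the p-th root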

${\displaystyle \left\|x\right\|_{p}=\left(|x_{1}|^{p}+|x_{2}|^{p}+\dotsb +|x_{n}|^{p}\right)^{\frac {1}{p}}.} $
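
The formula translates almost directly into NumPy. This is a minimal sketch; the helper name `lp_norm` is made up for this example, and in practice `np.linalg.norm(x, p)` computes the same value for vectors.

```python
import numpy as np

def lp_norm(x, p):
    """Sum |x_i|^p over all components, then take the p-th root."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = [3, -4]
print(lp_norm(x, 1))   # 7.0
print(lp_norm(x, 2))   # 5.0
print(lp_norm(x, 3))   # cube root of 27 + 64
```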


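### 4. L2 norm (Euclidean norm)

This is the norm used most often in machine learning. The familiar Euclidean distance between two points is the L2 norm of their difference. In n-dimensional space the L2 norm is computed as follows: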

${\displaystyle \left\|x\right\|_{2}=\left({x_{1}}^{2}+{x_{2}}^{2}+\dotsb +{x_{n}}^{2}\right)^{\frac {1}{2}} = \sqrt{\sum_{i=1}^{n}{x_i^2}}} $

Using the definition of the L2 norm, the common mean squared error (MSE) cost function can also be written in the following form:

$L(\theta) = MSE(\theta) = \frac{1}{m}\sum_{i=1}^{m}{(\hat{y}^{(i)} - y^{(i)})^2} = \frac{1}{m} \sum_{i=1}^{m} {(\theta^T \cdot x^{(i)} - y^{(i)})^2} = \frac{1}{m} ||X \cdot \theta - y||_2^2$

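- m is the number of samples in the training set

The expression above does more than shorten the notation for MSE; it also gives a different way to look at the error between predicted and actual values. If the training set has m samples, the rightmost expression is the squared L2 distance between the vector of all predictions (an m-dimensional vector) and the vector of all true observations (the training labels, also an m-dimensional vector), divided by m. In this view, **all the predictions and all the observations are treated as two points in m-dimensional space**.

- Fascinating, this is my biggest discovery of 2018 so far! One point in three-dimensional space can be represented by three points on a two-dimensional plane (one (index, value) pair per coordinate), and m such points on a two-dimensional plane can be represented by a single point in m-dimensional space!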
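
The equivalence between the per-sample sum and the norm form of MSE can be verified numerically. The sketch below uses random data; the sizes `m` and `n`, the seed, and the variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3                        # hypothetical number of samples and features
X = rng.normal(size=(m, n))        # one sample per row
theta = rng.normal(size=n)
y = rng.normal(size=m)

# Per-sample form: average of the squared residuals.
mse_sum = np.mean([(theta @ X[i] - y[i]) ** 2 for i in range(m)])

# Vector form: squared L2 distance between predictions and labels, divided by m.
mse_norm = np.linalg.norm(X @ theta - y, 2) ** 2 / m

print(np.isclose(mse_sum, mse_norm))
```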

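### 5. Unit circles of different norms in two dimensions

- You can draw them yourself here: https://www.geogebra.org/graphing
  - Unit circle for $l_{2/3}$ (for $p < 1$ this is not a true norm, because the triangle inequality fails): `nroot(abs(x)^(2 / 3) + abs(y)^(2 / 3), 2 / 3) = 1`

- From Wikipedia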

![](https://upload.wikimedia.org/wikipedia/commons/d/d4/Vector-p-Norms_qtl1.svg)
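
Points on these unit circles can also be generated in code. The sketch below relies on absolute homogeneity: dividing any non-zero direction by its p-norm yields a point whose p-norm is 1. The function name `unit_circle_points` is made up for this example.

```python
import numpy as np

def unit_circle_points(p, num=8):
    """Return `num` points in the plane whose p-norm equals 1, one per direction."""
    angles = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    norms = np.sum(np.abs(directions) ** p, axis=1) ** (1.0 / p)
    return directions / norms[:, None]    # rescale each direction onto the unit circle

for p in (1, 2, 4):
    pts = unit_circle_points(p)
    print(p, np.round(pts[:2], 3))
```

Passing the returned points to a plotting library should reproduce the diamond (p = 1), circle (p = 2), and rounded-square (larger p) shapes shown in the figure.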

### Reference

https://zh.wikipedia.org/wiki/%E8%8C%83%E6%95%B0

https://en.wikipedia.org/wiki/Norm_(mathematics)

http://blog.csdn.net/shijing_0214/article/details/51757564


### 6. Computing norms

#### 6.1 By hand

In [1]:
import numpy as np

In [2]:
a = np.array([3, 4])
b = np.array([5, 6])

In [3]:
np.sqrt(np.sum(np.square(a - b)))

2.8284271247461903

#### 6.2 scikit-learn
- http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html

In [4]:
from sklearn.metrics.pairwise import euclidean_distances

In [5]:
euclidean_distances(a.reshape(1,-1), b.reshape(1,-1))

array([[ 2.82842712]])

In [6]:
np.random.randint?

Docstring:
randint(low, high=None, size=None, dtype='l')

Return random integers from `low` (inclusive) to `high` (exclusive).

Return random integers from the "discrete uniform" distribution of
the specified dtype in the "half-open" interval [`low`, `high`). If
`high` is None (the default), then results are from [0, `low`).

Parameters
----------
low : int
    Lowest (signed) integer to be drawn from the distribution (unless
    ``high=None``, in which case this parameter is one above the
    *highest* such integer).
high : int, optional
    If provided, one above the largest (signed) integer to be drawn
    from the distribution (see above for behavior if ``high=None``).
size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.  Default is None, in which case a
    single value is returned.
dtype : dtype, optional
    Desired dtype of the result. All dtypes are determined by their
    name

In [7]:
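# Draw a random 3x2 integer matrix with entries in [0, 10); each row is one point
a2 = np.random.randint(10, size=(3, 2))
# b2 = np.random.randint(10, size=(3, 2))
print(a2)
# Pairwise Euclidean distances between the rows of a2 (a symmetric 3x3 matrix)
a2_dis = euclidean_distances(a2, a2)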

[[0 2]
 [5 8]
 [0 2]]


In [8]:
import pandas as pd

In [9]:
pd.DataFrame(data=a2_dis, columns=['a', 'b', 'c'], index=['a', 'b', 'c'])

|   | a | b | c |
|---|---|---|---|
| a | 0.0 | 7.81025 | 0.0 |
| b | 7.81025 | 0.0 | 7.81025 |
| c | 0.0 | 7.81025 | 0.0 |


#### 6.3 scipy
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html

In [10]:
from scipy.spatial import distance

In [11]:
distance.euclidean(a, b)  # expects 1-D vectors, so no reshape is needed

2.8284271247461903

#### 6.4 numpy
- https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html

In [12]:
from numpy import linalg as LA

In [13]:
LA.norm(a-b, 2)

2.8284271247461903

In [14]:
LA.norm(np.array([1, 0.805750223149824]) - np.array([1, 0.70998168853072]), 2)

0.095768534619104062
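
The second argument of `LA.norm` selects which norm is computed. As a small sketch reusing the values of `a` and `b` from above, the same difference vector can be measured with the L1, L2, and infinity norms:

```python
import numpy as np
from numpy import linalg as LA

a = np.array([3, 4])
b = np.array([5, 6])
d = a - b                       # [-2, -2]

print(LA.norm(d, 1))            # 4.0, sum of absolute values
print(LA.norm(d, 2))            # Euclidean length, sqrt(8)
print(LA.norm(d, np.inf))       # 2.0, largest absolute value
```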

#### 6.5 Computing pairwise distances between points

In [15]:
def get_cluster_distance(X):
    """
    Return the (mean, min, max) pairwise Euclidean distance between the rows of X.

    Returns (0, 0, 0) when X has fewer than two rows.
    """
    m, n = X.shape
    dis = (0, 0, 0)
    if m >= 2:
        # Indices of the strict upper triangle: each unordered pair (i, j), i < j, once
        iu1 = np.triu_indices(m, 1)
        each_node_distance = euclidean_distances(X)[iu1]
        mean_dis = each_node_distance.mean()
        max_dis = each_node_distance.max()
        min_dis = each_node_distance.min()
        dis = (mean_dis, min_dis, max_dis)
    return dis

In [16]:
X = np.arange(10).reshape(5,2)
X

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [17]:
get_cluster_distance(X)  # each row is a separate sample

(5.6568542494923815, 2.8284271247461903, 11.313708498984761)

In [18]:
euclidean_distances(X)

array([[  0.        ,   2.82842712,   5.65685425,   8.48528137,  11.3137085 ],
       [  2.82842712,   0.        ,   2.82842712,   5.65685425,
          8.48528137],
       [  5.65685425,   2.82842712,   0.        ,   2.82842712,
          5.65685425],
       [  8.48528137,   5.65685425,   2.82842712,   0.        ,
          2.82842712],
       [ 11.3137085 ,   8.48528137,   5.65685425,   2.82842712,   0.        ]])
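
For comparison, the same pairwise distance matrix can be built with NumPy broadcasting alone. This is a sketch of an alternative, not what `euclidean_distances` does internally; it materializes an intermediate array of shape (m, m, n), so it is only suitable for small inputs.

```python
import numpy as np

X = np.arange(10).reshape(5, 2)              # same sample matrix as above

# diff[i, j] is the vector X[i] - X[j]; broadcasting gives shape (5, 5, 2).
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt(np.sum(diff ** 2, axis=-1))      # full symmetric distance matrix

pairs = D[np.triu_indices(len(X), 1)]        # each unordered pair once
print(pairs.mean(), pairs.min(), pairs.max())
```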