# Weight, Mean, Normalization and Distance

Contents:

1. How to compute weights?
    - What about negative values in the input? e.g. `[0.6, 100.0, 12, -2.0, 3456.0]`
2. Arithmetic mean, Harmonic mean, Geometric mean
3. Normalization
    - Normalize weights
    - Min Max normalization
    - Standardization
    - Normalize to be sum of x
4. Distances








In [1]:
import numpy as np

# Weights

A weight is a positive scalar value (i.e. a single real number). The word *scalar* emphasizes that it is used to multiply a *vector* (producing a new vector).

Often a series of weights $w_i$ (with $i=1 \ldots n$) is *normalized*, that is, for all $i$, $0 \leq w_i \leq 1$ and $\sum w_i = 1$.


## Applying Weights

### Weighted Sum 

$\sum w_i x_i$

### Weighted Average

$\overline{x} = \frac{\sum w_i x_i}{\sum w_i}$
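As a small illustration, both quantities can be computed with NumPy. The values in `x` and `w` below are made up for this example. Note that `np.average` divides by the sum of the weights, so the weights do not need to be normalized beforehand.

```python
import numpy as np

# Hypothetical values and (unnormalized) weights, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0])
w = np.array([2.0, 1.0, 1.0])

# Weighted sum: sum_i w_i * x_i
weighted_sum = np.dot(w, x)

# Weighted average: the weighted sum divided by the sum of the weights
weighted_avg = weighted_sum / w.sum()

# np.average with the `weights` argument performs the same division
weighted_avg_np = np.average(x, weights=w)

print(weighted_sum, weighted_avg, weighted_avg_np)
```

If the weights are already normalized ($\sum w_i = 1$), the weighted sum and the weighted average coincide.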


## Normalizing weights

$\hat{w}_i = \frac{w_i}{\sum_{j=1}^n w_j}$

Q: What if $w$ contains negative values? e.g. $w = [0.6, 100, 12, -2, 3456]$

In [48]:
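def normalize(w: np.ndarray) -> np.ndarray:
    """Normalize w to be 0 <= w <= 1 and sum(w) = 1.

    Negative entries are removed by shifting, which changes the ratios between entries.
    """
    assert w.ndim == 1, "expected a 1-D array"
    w = w - w.min() + 1  # shift so that min(w) == 1, removing any negatives
    return w / w.sum()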

In [49]:
x = np.array([0.6, 100, 12, -2, 34])
x

array([  0.6, 100. ,  12. ,  -2. ,  34. ])

In [50]:
normalize(x)

array([0.02255639, 0.64536341, 0.09398496, 0.00626566, 0.23182957])

In [51]:
normalize(x).sum()

1.0

In [55]:
x = np.array([2, 2, 2, 2, 2])
normalize(x)

array([0.2, 0.2, 0.2, 0.2, 0.2])

In [53]:
x = np.array([2, 3, 4, 0.5, 0.5])
normalize(x)

array([0.2 , 0.28, 0.36, 0.08, 0.08])

In [54]:
x / x.sum()

array([0.2 , 0.3 , 0.4 , 0.05, 0.05])

**So the conclusion is**:

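- $\sum w_i$ cannot be 0, e.g. when $w = [-1, 1]$.
- Negative values are not accepted as weights: shifting them away, as `normalize` does, changes the ratios between the weights (compare `normalize(x)` with `x / x.sum()` above).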
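One way to act on this conclusion is to reject such inputs instead of shifting them. The following is a minimal sketch; `normalize_strict` is a hypothetical name introduced here, not a function from the notebook or from NumPy.

```python
import numpy as np

def normalize_strict(w: np.ndarray) -> np.ndarray:
    """Divide w by its sum, rejecting inputs that are not valid weights.

    Hypothetical helper: raises instead of shifting negative values.
    """
    w = np.asarray(w, dtype=float)
    if w.ndim != 1:
        raise ValueError("expected a 1-D array of weights")
    if np.any(w < 0):
        raise ValueError("negative values are not accepted as weights")
    total = w.sum()
    if total == 0:
        raise ValueError("weights sum to 0, cannot normalize")
    return w / total

# Same input as the last example above; the ratios between entries are preserved.
result = normalize_strict(np.array([2, 3, 4, 0.5, 0.5]))
print(result)
```

With this version, the result agrees with `x / x.sum()`, and an input such as `[-1, 1]` raises a `ValueError` rather than producing misleading weights.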

# Mean


## Arithmetic Mean

$\overline{x} = \frac{\sum_i^n x_i}{n}$

## Harmonic mean

$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + ... + \frac{1}{x_n}}= \frac{n}{\sum_i^n \frac{1}{x_i}}$

Q: When to use harmonic mean?

In short, the harmonic mean is used when you want to combine numbers computed on different criteria. In ML, the *F1* score is the harmonic mean of *precision* and *recall*. Because the two are computed differently, combining them into a single value that reflects a classifier's performance calls for a mean that is high only when both are high. The harmonic mean has this property because it is pulled toward the smaller value.

Check this [post](https://stackoverflow.com/questions/26355942/why-is-the-f-measure-a-harmonic-mean-and-not-an-arithmetic-mean-of-the-precision) for more details.


In [63]:
precision = 0.9
recall = 0.1

In [64]:
# arithmetic mean
(precision + recall) / 2

0.5

In [65]:
# harmonic mean
2 / (1/precision + 1/recall)

0.18
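The two-value computation generalizes to $n$ values following the formula for $H$. The sketch below assumes strictly positive inputs; `harmonic_mean` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def harmonic_mean(x) -> float:
    """Harmonic mean n / sum(1 / x_i) of strictly positive values."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("harmonic mean here assumes strictly positive values")
    return float(x.size / np.sum(1.0 / x))

# Both pairs have the same arithmetic mean (0.5) ...
unbalanced = harmonic_mean([0.9, 0.1])
balanced = harmonic_mean([0.5, 0.5])

# ... but the harmonic mean penalizes the unbalanced pair.
print(unbalanced, balanced)
```

The unbalanced pair scores well below the balanced one, which is the behavior wanted from the *F1* score.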

## Geometric mean

$G = \left(\prod_i^n x_i\right)^{\frac{1}{n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$

Q: When to use geometric mean?

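When mixing numbers that differ greatly in scale and you don't want to lose the information carried by the smaller numbers. In the example below, the arithmetic mean of `g2` is dominated by its single huge value, while the geometric mean still reflects that its other entries are ten times smaller than those of `g1`.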

In [101]:
g1 = np.array([10, 200, 2430, 3350, 10000]).astype('float32')
g2 = np.array([1, 20, 243, 335, 10000000]).astype('float32')

In [102]:
g1.mean()

3198.0

In [103]:
g2.mean()

2000119.8

In [104]:
g1.prod()**(1/5)

695.5625857987039

In [105]:
g2.prod()**(1/5)

438.87032657778997
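For long arrays or large values, the direct product can overflow before the root is taken. A mathematically equivalent alternative, shown here as a sketch rather than as what the cells above do, is to average the logarithms and exponentiate. `geometric_mean` is a hypothetical helper name, and the approach assumes strictly positive inputs.

```python
import numpy as np

def geometric_mean(x) -> float:
    """Geometric mean computed in log space: exp(mean(log(x)))."""
    x = np.asarray(x, dtype=np.float64)
    if np.any(x <= 0):
        raise ValueError("the log-space form needs strictly positive values")
    return float(np.exp(np.log(x).mean()))

# Same data as above, stored as float64 for this sketch.
g1 = np.array([10, 200, 2430, 3350, 10000], dtype=np.float64)
g2 = np.array([1, 20, 243, 335, 10000000], dtype=np.float64)

gm1 = geometric_mean(g1)
gm2 = geometric_mean(g2)
print(gm1, gm2)
```

Both results should agree with `g.prod() ** (1 / 5)` up to floating-point rounding.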

## Fun facts

Given two positive values $x_1$ and $x_2$, their harmonic mean is:

$H = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2 x_1 x_2}{x_1 + x_2}$

their arithmetic mean is:

$A = \frac{x_1 + x_2}{2}$

and their geometric mean is:

$G = \sqrt{x_1 x_2}$

and we have:

$H = \frac{G^2}{A}$
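The identity follows from $G^2 = x_1 x_2$ and $A = \frac{x_1 + x_2}{2}$, and it can be checked numerically. The values below reuse the precision and recall example; any two positive numbers would do.

```python
import numpy as np

x1, x2 = 0.9, 0.1  # any two positive values work

A = (x1 + x2) / 2           # arithmetic mean
G = np.sqrt(x1 * x2)        # geometric mean
H = 2 / (1 / x1 + 1 / x2)   # harmonic mean

# H should equal G^2 / A up to floating-point rounding
identity_holds = np.isclose(H, G**2 / A)
print(A, G, H, identity_holds)
```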

# Normalization vs Standardization