# Weights, Mean and Normalization

In [38]:
import numpy as np
np.random.seed(1)

# Weights

A weight is a positive scalar value (i.e. a single real number). The word *scalar* emphasizes that it is used to multiply a *vector* (to produce a new vector).

Often a series of weights $w_i$ (with $i=1 \ldots n$) is *normalized*, that is, for all $i$, $0 \leq w_i \leq 1$ and $\sum w_i = 1$.


## Applying Weights

### Weighted Sum 

### $\sum w_i x_i$

### Weighted Average

### $\overline{x} = \frac{\sum w_i x_i}{\sum w_i}$
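The two formulas above can be sketched in NumPy as follows. The values of `x` and `w` are illustrative only; `np.average` with the `weights` argument is used as a cross-check.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # values (illustrative)
w = np.array([1.0, 1.0, 2.0])   # unnormalized weights (illustrative)

weighted_sum = (w * x).sum()            # sum of w_i * x_i
weighted_avg = weighted_sum / w.sum()   # divide by the total weight

# np.average computes the same weighted average directly
assert np.isclose(weighted_avg, np.average(x, weights=w))
```

Because the weighted average divides by $\sum w_i$, the weights do not need to be normalized first.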


## Normalizing weights

### $w_i = \frac{w_i}{\sum w_i}$

### $w_i = \frac{w_i}{\sum w_i} x$ 
(**Normalize weights to sum to $x$**. If you can make them sum to 1, you can make them sum to anything.)
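A minimal sketch of this rescaling, assuming non-negative weights with a nonzero sum. The function name `normalize_to` and the parameter `total` (playing the role of $x$ in the formula) are hypothetical names chosen for illustration.

```python
import numpy as np

def normalize_to(w: np.ndarray, total: float = 1.0) -> np.ndarray:
    """Rescale non-negative weights so that they sum to `total`."""
    return w / w.sum() * total

w = np.array([2.0, 3.0, 5.0])
scaled = normalize_to(w, total=100.0)   # approximately [20., 30., 50.]
```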

Q: What if $w$ contains negative values? e.g. $w = [0.6, 100, 12, -2, 3456]$

In [40]:
def normalize(w: np.ndarray):
    """Normalize w to be 0 <= w <= 1 and sum(w) = 1."""
    assert w.ndim == 1
    w = w - w.min() + 1  # Shift so the minimum becomes 1: removes negatives, but also changes the proportions
    return w / w.sum()

In [49]:
x = np.array([0.6, 100, 12, -2, 34])
x

array([  0.6, 100. ,  12. ,  -2. ,  34. ])

In [50]:
normalize(x)

array([0.02255639, 0.64536341, 0.09398496, 0.00626566, 0.23182957])

In [51]:
normalize(x).sum()

1.0

In [55]:
x = np.array([2, 2, 2, 2, 2])
normalize(x)

array([0.2, 0.2, 0.2, 0.2, 0.2])

In [53]:
x = np.array([2, 3, 4, 0.5, 0.5])
normalize(x)

array([0.2 , 0.28, 0.36, 0.08, 0.08])

In [54]:
x / x.sum()

array([0.2 , 0.3 , 0.4 , 0.05, 0.05])

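**So the conclusion is**:

- $\sum w_i$ cannot be 0, e.g. when $w = [-1, 1]$.
- Negative values are not accepted as weights. Shifting them away, as `normalize` does, changes the proportions (compare the last two outputs above).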
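One way to act on these two conclusions is to reject invalid inputs instead of shifting them. The sketch below is an assumption about how that could look, not part of the original notebook; `normalize_strict` is a hypothetical name.

```python
import numpy as np

def normalize_strict(w: np.ndarray) -> np.ndarray:
    """Normalize w to sum to 1, rejecting inputs that cannot be valid weights."""
    w = np.asarray(w, dtype=float)
    if w.ndim != 1:
        raise ValueError("w must be one-dimensional")
    if (w < 0).any():
        raise ValueError("weights must be non-negative")
    total = w.sum()
    if total == 0:
        raise ValueError("weights must not sum to 0")
    return w / total

result = normalize_strict(np.array([2, 3, 4, 0.5, 0.5]))
```

For non-negative input this matches `x / x.sum()`, so the proportions are preserved.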

## Computing the optimal weight $\alpha$ and $1 - \alpha$


Given three $n$-dimensional vectors $f$, $g$ and $d$, compute the optimal weight $\alpha$ that minimizes the root mean square error between 
$\alpha f + (1 - \alpha) g$ and $d$. That is, $f$ and $g$ are both forecasts over $n$ time steps where the true demand is $d$, and we need to find the mix of the two forecasts that best fits the demand.

### Solving the problem analytically

$rmse(\alpha) = \sqrt{\frac{1}{n} \sum_{i=1}^{n}{[\alpha f_i + (1 - \alpha) g_i - d_i]^2}}$

The quantity under the square root is a quadratic function of $\alpha$ with a global minimum, and the square root is monotonically increasing, so we can find the optimal $\alpha$ by setting the derivative to 0. 

Rearrange the function to be:

$r(\alpha) = [{\sum_{i=1}^{n} \frac{1}{n} (\alpha f_i + (1 - \alpha) g_i - d_i)^2}]^\frac{1}{2}$

Let $p(h) = h^\frac{1}{2}$ and $h(\alpha) = \sum_{i=1}^{n} \frac{1}{n} (\alpha f_i + (1 - \alpha) g_i - d_i)^2$, so that $r(\alpha) = p(h(\alpha))$. By the chain rule,

$r'(\alpha) = \frac{dp}{dh} \frac{dh}{d\alpha}$, where

$\frac{dp}{dh} = \frac{1}{2 \sqrt{h}}$

$\frac{dh}{d\alpha} = \sum_{i=1}^{n} \frac{d}{d\alpha} [\frac{1}{n} (\alpha f_i + (1 - \alpha) g_i - d_i)^2]$

$\frac{dh}{d\alpha} = \sum_{i=1}^{n} \frac{1}{n} 2(\alpha f_i + (1 - \alpha) g_i - d_i)(f_i - g_i)$

$\frac{dh}{d\alpha} = \frac{2}{n} \sum_{i=1}^{n}{[(f_i - g_i)^2\alpha + (f_i - g_i)(g_i - d_i)]}$

Thus:

$r'(\alpha) = \frac{1}{n \sqrt{h}} \sum_{i=1}^{n}{[(f_i - g_i)^2\alpha + (f_i - g_i)(g_i - d_i)]}$

Since $\sqrt{h} > 0$ whenever the error is nonzero, $r'(\alpha) = 0$ only when the sum is $0$. Thus:

$\sum_{i=1}^{n}{[(f_i - g_i)^2\alpha + (f_i - g_i)(g_i - d_i)]} = 0$

$\alpha \sum_{i=1}^{n}{(f_i - g_i)^2} = - \sum_{i=1}^{n}{(f_i - g_i)(g_i - d_i)}$

finally,

### $\alpha = - \frac{\sum_{i=1}^{n}{(f_i - g_i)(g_i - d_i)}}{\sum_{i=1}^{n}{(f_i - g_i)^2}}$

In [2]:
d = np.random.randint(low=0, high=100, size=(10,))
f = d - 0.7 * d
g = (d - 0.3 * f) / 0.7  # chosen so that 0.3 * f + 0.7 * g == d, i.e. the true alpha is 0.3

print(f'd: {d}')
print(f'f: {f}')
print(f'g: {g}')

d: [37 12 72  9 75  5 79 64 16  1]
f: [11.1  3.6 21.6  2.7 22.5  1.5 23.7 19.2  4.8  0.3]
g: [ 48.1  15.6  93.6  11.7  97.5   6.5 102.7  83.2  20.8   1.3]


In [14]:
def compute_alpha(f: np.ndarray, g: np.ndarray, d: np.ndarray):
    """Return the alpha that minimizes the RMSE between alpha * f + (1 - alpha) * g and d."""
    numerator = -1 * ((f - g) * (g - d)).sum()
    denominator = ((f - g)**2).sum()
    return numerator / denominator

In [15]:
compute_alpha(f, g, d)

0.3
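The example above recovers $\alpha$ exactly because $d$ is an exact mix of $f$ and $g$. As a hedged sanity check on noisy data, the closed-form result can be compared with a brute-force scan over candidate values. The synthetic data, the noise levels, and the `rmse` helper below are assumptions made for illustration.

```python
import numpy as np

def compute_alpha(f, g, d):
    """Closed-form alpha from the derivation above."""
    return -((f - g) * (g - d)).sum() / ((f - g) ** 2).sum()

def rmse(alpha, f, g, d):
    """RMSE between the mixed forecast and the demand."""
    return np.sqrt(np.mean((alpha * f + (1 - alpha) * g - d) ** 2))

rng = np.random.default_rng(0)            # illustrative synthetic data
d = rng.uniform(0, 100, size=50)          # "true" demand
f = d + rng.normal(0, 5, size=50)         # forecast with smaller noise
g = d + rng.normal(0, 10, size=50)        # forecast with larger noise

alpha = compute_alpha(f, g, d)

# Brute-force check: scan a grid of alphas and keep the one with the lowest RMSE
grid = np.linspace(-1, 2, 3001)
best = grid[np.argmin([rmse(a, f, g, d) for a in grid])]
```

Note that the closed-form $\alpha$ is unconstrained: nothing in the derivation forces it into $[0, 1]$, so it may need clipping if it is to be used as a normalized weight.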

# Mean


## Arithmetic Mean

### $\overline{x} = \frac{\sum\limits_{i=1}^{n} x_i}{n}$

## Harmonic mean

### $H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum\limits_{i=1}^{n} \frac{1}{x_i}}$

Q: When to use harmonic mean?

In short, the harmonic mean is useful when you want to combine numbers computed on different criteria into one value that is high only when all of them are high, because it is dominated by the smallest input. In ML, the *F1* score is the harmonic mean of *precision* and *recall*: the two are computed differently, and a single value that reflects the performance of a classifier should be high only when both of them are high.

Check this [post](https://stackoverflow.com/questions/26355942/why-is-the-f-measure-a-harmonic-mean-and-not-an-arithmetic-mean-of-the-precision) for more details.


In [63]:
precision = 0.9
recall = 0.1

In [64]:
# arithmetic mean
(precision + recall) / 2

0.5

In [65]:
# harmonic mean
2 / (1/precision + 1/recall)

0.18
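The two-value computation above generalizes to any number of positive values. The sketch below uses a hypothetical helper `harmonic_mean` and checks that, for precision and recall, it agrees with the usual F1 formula $\frac{2PR}{P + R}$.

```python
import numpy as np

def harmonic_mean(x) -> float:
    """Harmonic mean of strictly positive values."""
    x = np.asarray(x, dtype=float)
    return len(x) / (1.0 / x).sum()

precision, recall = 0.9, 0.1
f1 = 2 * precision * recall / (precision + recall)   # usual F1 formula

assert np.isclose(harmonic_mean([precision, recall]), f1)
```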

## Geometric mean

### $G = (\prod\limits_{i=1}^{n} x_i)^{\frac{1}{n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$

Q: When to use geometric mean?

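Use it when mixing numbers that differ greatly in scale and you don't want to lose the information carried by the smaller numbers. In the example below, `g2` has four values ten times smaller than in `g1` and one value a thousand times larger: the arithmetic mean is dominated by that single large value, while the geometric mean still reflects the smaller ones.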

In [101]:
g1 = np.array([10, 200, 2430, 3350, 10000]).astype('float32')
g2 = np.array([1, 20, 243, 335, 10000000]).astype('float32')

In [102]:
g1.mean()

3198.0

In [103]:
g2.mean()

2000119.8

In [104]:
g1.prod()**(1/5)

695.5625857987039

In [105]:
g2.prod()**(1/5)

438.87032657778997
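Taking the product first, as above, can overflow when there are many values or the values are large, especially in `float32`. A common alternative, sketched here under the assumption that all values are strictly positive, is to average the logarithms and exponentiate. `geometric_mean` is a hypothetical helper name.

```python
import numpy as np

def geometric_mean(x) -> float:
    """Geometric mean of strictly positive values, computed in log space."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.exp(np.log(x).mean()))

g1 = np.array([10, 200, 2430, 3350, 10000])
g2 = np.array([1, 20, 243, 335, 10000000])

gm1 = geometric_mean(g1)
gm2 = geometric_mean(g2)
```

Up to floating-point rounding, this gives the same values as `prod() ** (1 / n)` for the two arrays above.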

## Fun facts

Given two positive values $x_1$ and $x_2$, their harmonic mean is:

$H = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2 x_1 x_2}{x_1 + x_2}$

their arithmetic mean is:

$A = \frac{x_1 + x_2}{2}$

and their geometric mean is:

$G = \sqrt{x_1 x_2}$

and we have:

$H = \frac{G^2}{A}$
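The identity follows from $G^2 = x_1 x_2$ and $A = \frac{x_1 + x_2}{2}$, and it can be checked numerically. The values of `x1` and `x2` below are arbitrary and chosen only for illustration.

```python
import numpy as np

x1, x2 = 4.0, 9.0                     # any two positive values (illustrative)

H = 2 * x1 * x2 / (x1 + x2)           # harmonic mean
A = (x1 + x2) / 2                     # arithmetic mean
G = np.sqrt(x1 * x2)                  # geometric mean

assert np.isclose(H, G**2 / A)        # the identity above
assert H <= G <= A                    # the usual ordering of the three means
```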

# Normalization

## Min Max Normalization (Feature Scaling)

Rescale inputs to lie in the range $[0, 1]$.


### $x' = \frac{x - min(x)}{max(x) - min(x)}$

Reverse scaling:

### $x = a + \frac{(x' - min(x'))(b - a)}{max(x') - min(x')}$ 

where $a = min(x)$, $b = max(x)$
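When $x'$ is the full min-max scaled vector, $min(x') = 0$ and $max(x') = 1$, so the reverse formula reduces to $x = a + x'(b - a)$. A minimal sketch of that simplified form follows; `min_max_inverse` is a hypothetical name, and it assumes the same $a$ and $b$ that were used for scaling are still available.

```python
import numpy as np

def min_max_inverse(x_scaled: np.ndarray, a: float, b: float) -> np.ndarray:
    """Map values from [0, 1] back to [a, b], assuming they were scaled with a and b."""
    return a + x_scaled * (b - a)

x = np.array([-1.0, 0.0, 3.0])
a, b = x.min(), x.max()
x_scaled = (x - a) / (b - a)          # array([0.  , 0.25, 1.  ])
x_back = min_max_inverse(x_scaled, a, b)
```

One design consequence: because this form does not recompute the minimum and maximum from its input, it also works on a subset of the scaled values, e.g. `min_max_inverse(x_scaled[:2], a, b)`.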


In [18]:
def min_max_scaling(x: np.ndarray):
    a = x.min()
    b = x.max()
    return (x - a) / (b - a), a, b

def re_scale(x: np.ndarray, a, b):
    return a + ((x - x.min()) * (b - a)) / (x.max() - x.min())

In [9]:
x = np.random.randn(10)
x

array([-0.3224172 , -0.38405435,  1.13376944, -1.09989127, -0.17242821,
       -0.87785842,  0.04221375,  0.58281521, -1.10061918,  1.14472371])

In [12]:
_x, a, b = min_max_scaling(x)
_x.round(2)

array([0.35, 0.32, 1.  , 0.  , 0.41, 0.1 , 0.51, 0.75, 0.  , 1.  ])

In [19]:
re_scale(_x, a, b)

array([-0.3224172 , -0.38405435,  1.13376944, -1.09989127, -0.17242821,
       -0.87785842,  0.04221375,  0.58281521, -1.10061918,  1.14472371])

## Standardization

Normalize inputs to have **0 mean** and **1 standard deviation**. This shifts and rescales the inputs but does not change the shape of their distribution.

Q: When do we need to use standardization?

### $x' = \frac{x - \overline{x}}{\sigma}$

### $x' = \frac{x - \overline{x}}{max(x) - min(x)}$
(**Mean Normalization**)

In [20]:
def standarlize(x: np.ndarray):
    return (x - np.mean(x)) / np.std(x)

In [32]:
x = np.random.randn(10)
x

array([ 1.46210794, -2.06014071, -0.3224172 , -0.38405435,  1.13376944,
       -1.09989127, -0.17242821, -0.87785842,  0.04221375,  0.58281521])

In [33]:
np.mean(x)

-0.16958838211535887

In [34]:
np.std(x)

0.9991398390871093

In [35]:
_x = standarlize(x)
_x

array([ 1.63310105, -1.89217991, -0.15296039, -0.21465061,  1.30447989,
       -0.93110378, -0.00284227, -0.70887979,  0.21198447,  0.75305134])

In [36]:
np.mean(_x).round(2)

-0.0

In [37]:
np.std(_x)

1.0
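The mean normalization formula given earlier in this section can be sketched in the same way. `mean_normalize` is a hypothetical helper name, and the input is assumed to be non-constant so that the range is nonzero.

```python
import numpy as np

def mean_normalize(x: np.ndarray) -> np.ndarray:
    """Center x on its mean and divide by its range (max - min)."""
    return (x - x.mean()) / (x.max() - x.min())

x = np.array([1.0, 2.0, 3.0, 6.0])
y = mean_normalize(x)                 # approximately [-0.4, -0.2, 0.0, 0.6]
```

The result has zero mean and a range of exactly 1, but unlike standardization its standard deviation is generally not 1.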