In machine learning we often use regularization to constrain or adjust $\theta$. Regularization keeps the entries of $\theta$ small, which is similar to keeping the model low in complexity.
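
As a rough sketch of how a norm can enter a regularized objective, the snippet below adds a penalty on $\theta$ to a mean squared error. The helper `regularized_loss`, the weight `lam`, and the toy data are hypothetical and chosen only for illustration; some formulations penalize the squared norm instead.

```python
import numpy as np

def regularized_loss(theta, X, y, lam=0.1, order=2):
    """Mean squared error plus lam * ||theta||_order (hypothetical helper)."""
    residuals = X @ theta - y
    mse = np.mean(residuals ** 2)
    penalty = lam * np.linalg.norm(theta, order)
    return mse + penalty

# Toy data (assumed): the two columns of X are identical,
# so many different theta fit y exactly.
X = np.array([[1.0, 1.0], [2.0, 2.0]])
y = np.array([1.0, 2.0])

small_theta = np.array([0.5, 0.5])    # fits y exactly, small entries
large_theta = np.array([10.0, -9.0])  # also fits y exactly, large entries

print(regularized_loss(small_theta, X, y))
print(regularized_loss(large_theta, X, y))
```

Both parameter vectors reproduce `y` exactly, so the data term alone cannot tell them apart; the penalty makes the smaller $\theta$ the cheaper choice.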

## Norms

### L1

The $l^1$ norm is simply the sum of the absolute values $$ \sum_{i=1}^n \lvert x_i \rvert$$ and is notated for vector **x** as $$ \lVert \mathbf{x} \rVert_1 $$

In [2]:
import numpy as np
from numpy.linalg import norm

In [1]:
a = [1, -2, 3]
a

[1, -2, 3]

In [3]:
transform = lambda x: abs(x)
transform(a[1])

2

In [5]:
# manual L1 norm: sum of the absolute values
sum(transform(xi) for xi in a)

6

In [7]:
# same L1 norm via numpy.linalg.norm
norm(a, 1)

6.0

### L2

The $l^2$ norm is the square root of the sum of the squared values $$\sqrt{ \sum_{i=1}^n x_i^2}$$ and is notated for vector **x** as $$ \lVert \mathbf{x} \rVert_2 $$ If no subscript is given, this is usually the norm that is meant.

In [8]:
a

[1, -2, 3]

In [11]:
transform_2 = lambda x: x**2
transform_2(a[1])

4

In [13]:
# manual L2 norm: square root of the sum of squares
np.sqrt(sum(transform_2(xi) for xi in a))

3.7416573867739413

In [14]:
# same L2 norm via numpy.linalg.norm
norm(a,2)

3.7416573867739413
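
The remark that the $l^2$ norm is the usual default can be checked against `numpy.linalg.norm` itself. This is a small sketch for 1-D vectors only; for 2-D arrays the default of `norm` is the Frobenius norm, which is not the same as passing `2` for a matrix.

```python
import numpy as np
from numpy.linalg import norm

a = [1, -2, 3]

default_norm = norm(a)   # no order given
l2_norm = norm(a, 2)     # explicit order 2

# For a 1-D vector the two calls agree
print(np.isclose(default_norm, l2_norm))
```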

### Max

The $l^\infty$ (max) norm is the largest absolute value among the vector's entries $$\max_i \lvert x_i \rvert$$ and is notated for vector **x** as $$ \lVert \mathbf{x} \rVert_\infty $$

In [20]:
max(transform(xi) for xi in a)

3

In [22]:
norm(a, np.inf)

3.0
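
The three norms above are members of one family, $\lVert \mathbf{x} \rVert_p = \left( \sum_{i=1}^n \lvert x_i \rvert^p \right)^{1/p}$, and the max norm is what this expression approaches as $p$ grows. The sketch below illustrates that; `p_norm` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

def p_norm(x, p):
    """(sum |x_i|^p)^(1/p) for a finite p >= 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

a = [1, -2, 3]

# As p increases, the largest absolute entry dominates the sum
for p in (1, 2, 4, 16, 64):
    print(p, p_norm(a, p))

# The limiting value: the max norm
print("inf", np.max(np.abs(a)))
```

With `p = 1` and `p = 2` the helper reproduces the manual computations above, and for large `p` the value settles close to the largest absolute entry.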