# Algorithm Tour

In [15]:
import numpy as np

## Support Vector Machine (SVM)

### Concept

Effective in high dimensional spaces.

Still effective in cases where the number of dimensions is greater than the number of samples.

Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.

Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

If the number of features is much greater than the number of samples, avoiding over-fitting through the choice of kernel function and regularization term is crucial.

SVMs do not directly provide probability estimates.

Support Vector Machine algorithms are not scale invariant, so it is highly recommended to scale your data.

### Hello world!

In [6]:
import pandas as pd

In [1]:
from sklearn import svm

In [2]:
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]

In [11]:
df = pd.concat([pd.DataFrame(X), pd.Series(y)],axis=1)

In [12]:
df.columns = ['x1','x2','y']

In [13]:
df

   x1  x2    y
0   0   0  0.5
1   2   2  2.5


In [3]:
regr = svm.SVR()
regr.fit(X, y)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [4]:
regr.predict([[1, 1]])

array([1.5])
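
The prediction above can be reproduced by hand, which makes the role of the support vectors concrete. The sketch below is illustrative only: it fixes `gamma=0.5` explicitly (an assumption made so the kernel can be recomputed outside the model) and the helper name `rbf_decision` is hypothetical. It rebuilds the prediction as a weighted sum of RBF kernels centred on the support vectors, plus the intercept.

```python
import numpy as np
from sklearn import svm

X = np.array([[0, 0], [2, 2]], dtype=float)
y = np.array([0.5, 2.5])

# gamma is fixed explicitly (assumption) so the kernel can be recomputed by hand.
gamma = 0.5
regr = svm.SVR(kernel="rbf", gamma=gamma).fit(X, y)

def rbf_decision(x):
    """Weighted sum of RBFs centred on the support vectors, plus the intercept."""
    sq_dist = np.sum((regr.support_vectors_ - x) ** 2, axis=1)
    kernel_values = np.exp(-gamma * sq_dist)
    return float(regr.dual_coef_[0] @ kernel_values + regr.intercept_[0])

x_new = np.array([1.0, 1.0])
manual = rbf_decision(x_new)               # hand-computed prediction
library = float(regr.predict([x_new])[0])  # prediction from scikit-learn
```

If the two values agree, the fitted model is fully described by `support_vectors_`, `dual_coef_`, and `intercept_`.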

In [17]:
a = np.ones((2,2))

In [20]:
(a - a.min())/a.max()

array([[0., 0.],
       [0., 0.]])

In [23]:
a = np.cumsum(np.ones(10)).reshape((2,5))

In [24]:
(a - a.min())/a.max()

array([[0. , 0.1, 0.2, 0.3, 0.4],
       [0.5, 0.6, 0.7, 0.8, 0.9]])

In [27]:
a/a.max()

array([[0.1, 0.2, 0.3, 0.4, 0.5],
       [0.6, 0.7, 0.8, 0.9, 1. ]])
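
The cells above divide by `a.max()`, so the shifted version tops out at 0.9 rather than 1. For comparison, here is a minimal sketch of min-max scaling, which divides by the range instead. The function name `min_max_scale` and the handling of constant arrays are assumptions made for illustration.

```python
import numpy as np

def min_max_scale(a):
    """Rescale an array to [0, 1] using its global minimum and maximum."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    if span == 0:
        # Constant input: map everything to 0 rather than divide by zero.
        return np.zeros_like(a)
    return (a - a.min()) / span

a = np.cumsum(np.ones(10)).reshape((2, 5))
scaled = min_max_scale(a)  # smallest entry maps to 0, largest to 1
```

This scales over the whole array; for a feature matrix one would usually scale each column separately.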

### Implementation

Follow general supervised learning steps

- split data
- convert to proper array container
- pick a kernel function (the primary degree of freedom): start with linear, then try RBF.
    - RBF effectively lifts your data into a higher dimensional space where it's easier to find a hyperplane that separates the classes
    - RBF works by placing a radial basis function on each support vector, weighting them, and summing them
- feed to implementation
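
The steps above can be sketched with scikit-learn as follows. The toy dataset from `make_moons`, the 75/25 split, and the use of `StandardScaler` in a pipeline are assumptions chosen for illustration, not part of the original notes.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy two-class data (assumption: stands in for a real dataset).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Split data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Try the linear kernel first, then RBF; scale inside the pipeline so the
# scaler is fitted on the training split only.
scores = {}
for kernel in ("linear", "rbf"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # test-set accuracy
```

Comparing `scores["linear"]` with `scores["rbf"]` gives a quick indication of whether the extra flexibility of the RBF kernel helps on the data at hand.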

## Clustering (K-means, HDBSCAN)

### Concept

- turn the data into a set of features per sample_id
- pick a distance function
- create a similarity or distance matrix
- apply the algorithm

1. randomly define centroids
2. assign samples to their nearest centroid
3. calculate the mean of each group as its new centroid
4. repeat 2 and 3 until the centroids stop changing

In basic terms, the algorithm has three steps. The first step chooses the initial centroids, with the most basic method being to choose k samples from the dataset X. After initialization, K-means loops between the two other steps. The first assigns each sample to its nearest centroid. The second creates new centroids by taking the mean of all the samples assigned to each previous centroid. The difference between the old and the new centroids is computed, and the algorithm repeats these last two steps until this value is less than a threshold. In other words, it repeats until the centroids do not move significantly.
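
The loop described above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans`, the Euclidean distance, the tolerance, and the handling of empty clusters are all assumptions.

```python
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=100, seed=0):
    """Basic K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centroid = mean of its assigned samples
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move significantly.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

# Two well-separated blobs (synthetic data for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
centroids, labels = kmeans(X, k=2)
```

On data this well separated, each blob should end up with its own label regardless of which samples are drawn as initial centroids.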

### Hello world!

### Implementation

## Dimensionality Reduction (PCA)

### Concept

### Hello world!

### Implementation

## Gaussian Process (GP)

### Concept

The advantages of Gaussian processes are:

- The prediction interpolates the observations (at least for regular kernels).

- The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest.

- Versatile: different kernels can be specified. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of Gaussian processes include:

- They are not sparse, i.e., they use all of the sample and feature information to make a prediction.

- They lose efficiency in high dimensional spaces, namely when the number of features exceeds a few dozen.
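
The first two advantages can be seen in a small sketch with scikit-learn's `GaussianProcessRegressor`. The training points, the fixed RBF length scale, and `optimizer=None` (which skips hyperparameter tuning) are assumptions made to keep the example small and deterministic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Four noise-free observations of sin(x), with a gap around x = 2.
X_train = np.array([[0.0], [1.0], [3.0], [4.0]])
y_train = np.sin(X_train).ravel()

# Fixed kernel, no hyperparameter optimisation (assumption for simplicity).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None)
gp.fit(X_train, y_train)

# At the training points the mean reproduces the observations.
mean_train, std_train = gp.predict(X_train, return_std=True)

# In the gap the predictive standard deviation is larger.
mean_gap, std_gap = gp.predict(np.array([[2.0]]), return_std=True)
```

A large `std_gap` relative to `std_train` is the kind of signal one could use to decide where to collect more data or refit.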

### Hello world!

### Implementation

## Gradient Boosting Machine (GBM)

### Concept

### Hello world!

### Implementation

## Attention

### Concept

### Hello world!

### Implementation

## Convnet

### Concept

- each layer has a receptive field, over which it applies a convolution operation
- it runs this operation across the entire image or sequence
- the output is a lower dimensional representation of the input; what the representation captures depends on what the network is optimizing for
- stacking these layers forms a hierarchy of representations
- with depth come challenges in training
    - vanishing gradients
    - the need for residual connections (skip connections) to counter them
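
The ideas above can be illustrated without a deep learning framework. The sketch below uses a 1D signal and a hand-picked kernel (both assumptions; a real layer learns its weights) to show a sliding receptive field, the shrinking output, stacking, and a residual connection. The function names are hypothetical.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    # Each output value only sees a window of k inputs: its receptive field.
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def residual_block(x, kernel):
    """Skip connection: the input is added back to the convolved output."""
    # mode="same" pads so the output has the same length as the input.
    return x + np.correlate(x, kernel, mode="same")

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])  # fixed difference filter, for illustration only

h1 = conv1d(x, kernel)   # length 8; each value sees 3 inputs
h2 = conv1d(h1, kernel)  # length 6; each value sees 5 original inputs
r = residual_block(x, kernel)
```

Because the residual block adds `x` back unchanged, the input always has a direct path to the output, which is what helps gradients flow through deep stacks.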

### Hello world!

### Implementation