# Linear multiclass classifier
In this assignment, we implement another machine learning model, the linear classifier. A linear classifier learns a set of weights for each class; each feature value is multiplied by its weight and the products are summed. The class with the highest sum is the model's prediction.

## Softmax function

To start with, we need a softmax function, which takes the scores for each class as input and converts them into probabilities between $0$ and $1$ that sum to $1$:

$$
\sigma(z)_n = \frac{e^{z_n}}{\sum_{i=1}^N e^{z_i}} \ \ \text{for} \ n = 1, \ldots, N
$$

**Nota bene:** In practice, computing this function means exponentiating potentially very large numbers, which can produce values in the numerator and denominator that exceed the floating-point range.

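Fortunately, there is a simple solution. Subtracting the maximum score from every score does not change the result, because the common factor cancels between the numerator and the denominator, so we do this before calculating the softmax: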
```
predictions -= np.max(predictions)
```
(For more details, see the section `Practical issues: Numeric stability` at http://cs231n.github.io/linear-classify/#softmax)
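To make the stability trick concrete, here is a minimal sketch for a single sample. The name `stable_softmax` is hypothetical and chosen so that it does not clash with the `softmax` function of the assignment; the expected signature and batch handling of the real function may differ.

```python
import numpy as np


def stable_softmax(predictions):
    """Hypothetical helper: softmax over a 1-D array of class scores."""
    # Shift so that the largest score becomes 0. The result is unchanged
    # because the common factor cancels in the ratio.
    shifted = predictions - np.max(predictions)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


# Scores this large would overflow np.exp without the shift.
scores = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(scores)

# The same scores shifted by a constant give the same probabilities.
probs_small = stable_softmax(np.array([0.0, 1.0, 2.0]))
```

Both calls should give the same probabilities, since softmax only depends on the differences between scores.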

In [1]:
#!source<mlpractice.linear_classifier.softmax>

In [None]:
from mlpractice.tests.linear_classifier.test_softmax import test_all

test_all(softmax)

## Cross entropy loss

Next, we implement the `cross-entropy loss`, which we will use as the error function. In general form:

$$
H(p, q) = -\sum\limits_x p(x)\log q(x)
$$

where:
- $x$ - ranges over all classes
- $p(x)$ - the true probability that the sample belongs to class $x$
- $q(x)$ - the probability of class $x$ predicted by the model

In our case the sample belongs to exactly one class, whose index is passed to the function. For this class $p(x)$ is $1$, and for all other classes it is $0$.

This makes the function easier to implement: only the term for the true class survives in the sum!
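As an illustration, with a one-hot $p$ the cross-entropy (taken with its usual leading minus sign) reduces to the negative log of the probability predicted for the true class. The sketch below compares that shortcut with the full sum; `cross_entropy_sketch` is a hypothetical name, not the `cross_entropy_loss` function of the assignment, and it handles only a single sample.

```python
import numpy as np


def cross_entropy_sketch(probs, target_index):
    """Hypothetical helper: loss for one sample with a one-hot true label."""
    # Only the term with p(x) = 1 survives the sum over classes.
    return -np.log(probs[target_index])


probs = np.array([0.1, 0.7, 0.2])
target_index = 1

# Full sum with an explicit one-hot p, for comparison.
one_hot = np.zeros_like(probs)
one_hot[target_index] = 1.0
full_sum = -np.sum(one_hot * np.log(probs))

loss = cross_entropy_sketch(probs, target_index)
```

The shortcut and the full sum agree, and the loss shrinks toward $0$ as the probability of the true class approaches $1$.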

In [None]:
#!source<mlpractice.linear_classifier.cross_entropy_loss>

In [None]:
from mlpractice.tests.linear_classifier.test_cross_entropy_loss import test_all

test_all(cross_entropy_loss)

Once we have implemented the functions themselves, we can implement the gradient.

It turns out that calculating the gradient becomes much easier if we combine these functions into one, which first calculates the probabilities via `softmax` and then uses them to calculate the error function via `cross-entropy loss`.

This `softmax_with_cross_entropy` function will return both the error value and the gradient of the error with respect to the input predictions.
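The reason the combined gradient is simple is that, for a single sample, the derivative of the loss with respect to the scores is the softmax output minus the one-hot vector of the true class. The sketch below, under the assumption of 1-D input and a hypothetical name `softmax_ce_sketch`, compares that analytic gradient against a central-difference estimate; the real `softmax_with_cross_entropy` may expect different shapes.

```python
import numpy as np


def softmax_ce_sketch(predictions, target_index):
    """Hypothetical helper: loss and gradient for one sample (1-D scores)."""
    shifted = predictions - np.max(predictions)  # numeric stability
    exps = np.exp(shifted)
    probs = exps / np.sum(exps)
    loss = -np.log(probs[target_index])
    # d(loss)/d(predictions) = probs - one_hot(target_index)
    grad = probs.copy()
    grad[target_index] -= 1.0
    return loss, grad


predictions = np.array([1.0, 2.0, -1.0])
loss, grad = softmax_ce_sketch(predictions, target_index=0)

# Central-difference estimate of the same gradient.
eps = 1e-5
numeric = np.zeros_like(predictions)
for i in range(predictions.size):
    step = np.zeros_like(predictions)
    step[i] = eps
    loss_plus, _ = softmax_ce_sketch(predictions + step, 0)
    loss_minus, _ = softmax_ce_sketch(predictions - step, 0)
    numeric[i] = (loss_plus - loss_minus) / (2 * eps)
```

A numeric check of this kind is a convenient way to validate a hand-derived gradient before using it for training.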

In [None]:
#!source<mlpractice.linear_classifier.softmax_with_cross_entropy>

In [None]:
from mlpractice.tests.linear_classifier.test_softmax_with_cross_entropy import test_all

test_all(softmax_with_cross_entropy)

## Finally, let's implement the linear classifier!

`softmax` and `cross-entropy` take as input the predictions produced by the linear classifier. The classifier computes them very simply: for each class there is a set of weights; each pixel of the picture is multiplied by its weight and the products are summed. The resulting number is the score for that class, which goes to the softmax input. Thus, a linear classifier can be represented as the multiplication of a pixel vector by a matrix W of size `num_features, num_classes`. This approach is easily extended to a batch of pixel vectors X of size `batch_size, num_features`, where the product of X and W has size `batch_size, num_classes`.
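The shapes involved can be sketched as follows. The sizes and random data are made up for illustration, and this covers only the score computation, not the loss or gradient that `linear_softmax` is expected to return.

```python
import numpy as np

batch_size, num_features, num_classes = 4, 6, 3
rng = np.random.default_rng(0)

X = rng.normal(size=(batch_size, num_features))   # one row per sample
W = rng.normal(size=(num_features, num_classes))  # one column per class

# Row i holds the class scores of sample i.
predictions = X @ W  # shape: (batch_size, num_classes)

# The same score computed by hand for sample 0 and class 2:
# multiply each feature by its weight and add up the products.
manual = np.sum(X[0] * W[:, 2])

# The class with the highest score is the prediction for each sample.
predicted_classes = np.argmax(predictions, axis=1)
```

Each entry of `predictions` is a dot product between one sample and one column of W, which is why a single matrix multiplication handles the whole batch.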


In [None]:
#!source<mlpractice.linear_classifier.linear_softmax>

In [None]:
from mlpractice.tests.linear_classifier.test_linear_softmax import test_all

test_all(linear_softmax)

## l2_regularization description

In [None]:
#!source<mlpractice.linear_classifier.l2_regularization>

In [None]:
from mlpractice.tests.linear_classifier.test_l2_regularization import test_all

test_all(l2_regularization)

## LinearSoftmaxClassifier description

In [None]:
#!source<mlpractice.linear_classifier.LinearSoftmaxClassifier>

In [None]:
# Some testing code