# Classification
Classification can be defined as follows:

> Given *n<sub>c</sub>* different classes, a classifier algorithm builds a model that **predicts** for every unlabelled instance *i* the class *C* to which it belongs, with a certain degree of accuracy.

In other words, we build a classification model to predict a *label* given an *instance*. The *label* is the class assigned to a record: for example, yes/no is a binomial label stating whether or not a record belongs to a certain category. The *instance* is a table record containing information about the element we are analyzing.

An instance is defined by **features**, which are what the classifier uses to decide how to classify the instance.

![class_example](../images/classification_example.png)

Many different kinds of classifiers exist. Most of them can be found in the Python library **scikit-learn**, which can be imported by simply running

> `from sklearn import <what-you-need>`

Most scikit-learn classifiers implement two methods:

- **fit(train_data, train_label)** - method for the training phase: it takes a training dataset and its labels, and learns a model from them using the algorithm of the specific classifier.
- **predict(test_data)** - method for the prediction phase: it applies the model learned by *fit* to test_data in order to predict its labels.
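
The two phases can be sketched end to end as follows. This is only an illustrative sketch: the Iris dataset, the 70/30 split and the choice of `KNeighborsClassifier` are assumptions made here, and any other scikit-learn classifier could be swapped in.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small labelled dataset: X holds the features, y the labels.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the instances to play the role of unlabelled data.
train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(train_data, train_label)        # training phase
predicted = clf.predict(test_data)      # prediction phase

# Fraction of test instances whose predicted label matches the true one.
accuracy = accuracy_score(test_label, predicted)
print(f"accuracy: {accuracy:.2f}")
```

Comparing the predictions with the held-out labels gives an estimate of the "degree of accuracy" mentioned in the definition.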

### Majority Class Classifier
The **majority class classifier** is the most basic classifier:

- in the **training phase**, it computes the majority class of the dataset - the label that appears the most in the training dataset
- in the **prediction phase**, it outputs the majority class of the dataset - an array with as many values as there are instances in the test dataset, all equal to the label found in the training phase.
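
To make the two phases concrete, here is a minimal from-scratch sketch. The class name `MajorityClassClassifier` and the toy data are hypothetical and only meant for illustration; this is not how scikit-learn implements it.

```python
from collections import Counter


class MajorityClassClassifier:
    """Hypothetical from-scratch version, for illustration only."""

    def fit(self, train_data, train_label):
        # Training: remember the most frequent label (the features are ignored).
        self.majority_ = Counter(train_label).most_common(1)[0][0]
        return self

    def predict(self, test_data):
        # Prediction: one copy of the majority label per test instance.
        return [self.majority_] * len(test_data)


clf = MajorityClassClassifier().fit([[1], [2], [3]], ["yes", "no", "yes"])
predictions = clf.predict([[10], [20]])
print(predictions)  # ['yes', 'yes']
```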

This is implemented in the *scikit-learn* library via the **DummyClassifier**, with the **strategy='most_frequent'** option.

```python
from sklearn.dummy import DummyClassifier
mjclass = DummyClassifier(strategy="most_frequent")
```
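
A short usage sketch, on toy data invented for this illustration, shows that the features play no role in the prediction. The `score` call reports the accuracy of always answering with the majority class, which can serve as a baseline for other classifiers.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy data (an assumption for this sketch): "yes" appears 3 times out of 5.
X_train = np.array([[0], [1], [2], [3], [4]])
y_train = np.array(["no", "yes", "yes", "no", "yes"])
X_test = np.array([[5], [6], [7]])

mjclass = DummyClassifier(strategy="most_frequent")
mjclass.fit(X_train, y_train)

y_pred = mjclass.predict(X_test)
print(y_pred)  # ['yes' 'yes' 'yes']

# Accuracy of the majority-class guess on the training data: 3 out of 5.
baseline = mjclass.score(X_train, y_train)
```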

### k-Nearest Neighbours [k-NN Classifier]

In the k-NN classifier, the *k* training instances closest to the instance being analysed are used to predict its label by majority vote.

- in the **training phase**, all instances are stored in memory
    - it is not appropriate for big data, due to huge requirements in memory space
- in the **prediction phase**, a majority classifier is applied on the k-nearest instances
    - the choice of *k* is very important, and is best made after first inspecting the data
        - a larger *k* reduces the effect of *noise*, but a *k* that is too large blurs the boundaries between classes
        - generally *k* is between 3 and 10
    - a distance measure is needed to decide which instances are the nearest (examples are *Euclidean d.*, *Manhattan d.*, *Minkowski d.*, and *Hamming d.*, the last one only for categorical variables)
    - distance computation can be computationally intensive
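
The procedure above can be sketched from scratch as follows. The function name `knn_predict` and the toy points are hypothetical, and Euclidean distance is assumed; this illustrates the idea rather than the scikit-learn implementation.

```python
from collections import Counter

import numpy as np


def knn_predict(train_data, train_label, instance, k=3):
    """Predict the label of one instance by majority vote among its k nearest neighbours."""
    train_data = np.asarray(train_data, dtype=float)
    # Euclidean distance from the instance to every stored training instance.
    distances = np.linalg.norm(train_data - np.asarray(instance, dtype=float), axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority class among the k nearest instances.
    votes = Counter(train_label[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_data = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.5], [6.5, 6.0]]
train_label = ["a", "a", "a", "b", "b", "b"]

pred_low = knn_predict(train_data, train_label, [1.2, 1.1])
pred_high = knn_predict(train_data, train_label, [6.4, 6.6])
print(pred_low, pred_high)  # a b
```

Note that every prediction computes a distance to all stored instances, which is where the memory and computation costs mentioned above come from.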
    
#### Implementation
    
The k-NN classifier is implemented in the *scikit-learn* library in the **sklearn.neighbors** module, which contains **KNeighborsClassifier**:

```python
from sklearn.neighbors import KNeighborsClassifier
knnClass = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
```
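
As a usage sketch, the one-dimensional toy data below (invented for this illustration, with one deliberately mislabelled point) suggests how the choice of *k* affects the prediction for an instance that lies next to a noisy training record.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: the instance at 2.0 carries a deliberately "noisy" label.
X_train = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 1, 0, 1, 1, 1]
query = [[2.1]]

knn1 = KNeighborsClassifier(n_neighbors=1, metric='euclidean').fit(X_train, y_train)
knn3 = KNeighborsClassifier(n_neighbors=3, metric='euclidean').fit(X_train, y_train)

pred_k1 = knn1.predict(query)[0]  # only the noisy neighbour at 2.0 votes
pred_k3 = knn3.predict(query)[0]  # neighbours at 2.0, 3.0 and 1.0 vote
print(pred_k1, pred_k3)  # 1 0
```

With *k* = 1 the noisy neighbour decides the label, while with *k* = 3 it is outvoted by the two correctly labelled neighbours.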

### Naïve Bayes

Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features: given the class, the features are assumed not to influence each other in any way (in practice, this is most plausible when the correlation among features is low).

![nbtheorem](https://www.analyticsvidhya.com/wp-content/uploads/2015/09/Bayes_rule-300x172-300x172.png)

- P(c|x) is the **posterior probability of class** (c, target) given predictor (x, attributes) - the probability that the instance x belongs to class c given its attributes *x<sub>n</sub>*.
- P(c) is the **prior probability of class**.
- P(x|c) is the **likelihood**, which is the **probability of predictor given class** - the probability that an instance of class c has the attributes *x<sub>n</sub>* observed in instance x.
- P(x) is the **prior probability of predictor**.

What this classifier does is estimate, from the training data, the probability *P(x<sub>n</sub>|c)* of observing each attribute *x<sub>n</sub>* given the class, and the prior probability *P(c)*.

#### Naïve Bayes algorithm: an example
![nb_example](../images/nb_example.png)
With the training dataset in the first image, we need to classify (i.e. predict) whether players will play or not based on the weather conditions. A few easy steps can be followed to do that:

1. Convert the data from the default dataset (image1) to a frequency table (image2).
2. Build the likelihood table (image3): summing rows/columns and dividing by the total number of cases gives the probabilities of each weather condition, P(x), and of each class, P(c), while dividing each cell by the total of its class gives the P(x|c) of the Naïve Bayes equation.
3. Calculate the posterior probability for each class with the Naïve Bayes equation. The class with the highest posterior probability is the outcome of the prediction.

So, if we want to predict whether players will play if the weather is sunny, our equation would become:

> P(Yes|Sunny) = P(Sunny|Yes) \* P(Yes) / P(Sunny) = (3/9 \* 9/14) / (5/14) = 0.6
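
The same arithmetic can be checked with a few lines of Python. This sketch uses only the four counts quoted in the equation above (14 records, 9 "Yes", 5 "Sunny", 3 "Sunny and Yes"); the counts for "No" are derived by subtraction, and the helper name `posterior` is hypothetical.

```python
from fractions import Fraction

# Counts quoted in the worked example above.
total = 14          # training records
yes = 9             # records labelled "Yes"
sunny = 5           # records with Sunny weather
sunny_and_yes = 3   # Sunny records labelled "Yes"

# The "No" counts follow by subtraction.
no = total - yes
sunny_and_no = sunny - sunny_and_yes


def posterior(joint_count, class_count):
    """P(c|Sunny) = P(Sunny|c) * P(c) / P(Sunny)."""
    likelihood = Fraction(joint_count, class_count)  # P(Sunny|c)
    prior = Fraction(class_count, total)             # P(c)
    evidence = Fraction(sunny, total)                # P(Sunny)
    return likelihood * prior / evidence


p_yes = posterior(sunny_and_yes, yes)
p_no = posterior(sunny_and_no, no)
print(float(p_yes), float(p_no))  # 0.6 0.4

# The class with the highest posterior probability is the prediction.
prediction = "Yes" if p_yes > p_no else "No"
```

Since 0.6 > 0.4, the predicted class for a sunny day is "Yes".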

#### Multinomial Naïve Bayes

A particular case of Naïve Bayes classifier is the Multinomial Naïve Bayes: it is used for document classification, because it computes the probability that a document *d* belongs to class *c*. The document is considered as a bag-of-words: the quantities to estimate then become the probability of observing word *w* given the class, and the prior probability *P(c)*:

![mnb_eq.png](../images/mnb_eq.png)

where *n<sub>w<sub>d</sub></sub>* is the number of times the word *w* appears in the document *d*.
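
A minimal sketch of this bag-of-words approach with scikit-learn is shown below. The four-document corpus and its spam/ham labels are invented for this illustration; `CountVectorizer` is used here to produce the word counts, and `MultinomialNB` is left at its default settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus, invented for this illustration.
docs = ["free money now", "win free prize", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words: each document becomes a vector of word counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

mnb = MultinomialNB()
mnb.fit(counts, labels)

# New documents must be transformed with the same vocabulary.
new_counts = vectorizer.transform(["free prize money", "meeting tomorrow"])
doc_pred = mnb.predict(new_counts)
print(doc_pred)  # ['spam' 'ham']
```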

#### Pros/cons and applications

A Naïve Bayes model is easy to build and particularly useful for very large datasets. Despite its simplicity, Naïve Bayes can outperform even highly sophisticated classification methods when the independence assumption holds. It tends to perform better with categorical input variables than with numerical ones.

Naïve Bayes algorithms are very useful for:

- Real time prediction - it's very fast
- Multi class prediction
- Text classification/Spam filtering/Sentiment Analysis
- Recommendation System

#### Implementation

The **sklearn.naive_bayes** module implements Naïve Bayes algorithms in the *scikit-learn* library. There are a few different classifiers based on the Naïve Bayes algorithm, among which:

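- GaussianNB() - for continuous features, assumed to follow a Gaussian distribution within each class; a common default choice
- BernoulliNB() - for multivariate Bernoulli models (binary features)
- MultinomialNB() - for multinomial models (e.g. word counts)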

```python
from sklearn.naive_bayes import GaussianNB
nbClass = GaussianNB()
```
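
As a usage sketch, the classifier can be trained and queried like any other scikit-learn classifier. The one-feature toy data below is an assumption made for this illustration, with two well-separated classes.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: class 0 is centred around 1.0, class 1 around 5.0.
X_train = np.array([[0.8], [1.0], [1.2], [4.8], [5.0], [5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.1], [4.9]])

nbClass = GaussianNB()
nbClass.fit(X_train, y_train)

nb_pred = nbClass.predict(X_new)         # most probable class per instance
nb_proba = nbClass.predict_proba(X_new)  # posterior probability of each class
print(nb_pred)  # [0 1]
```

`predict_proba` returns one row per instance with the estimated posterior probability of each class, and `predict` picks the class with the highest one.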

### Perceptron / Neuron