# What is classification?
Classification is a supervised machine learning approach that assigns a given input to one of a fixed set of classes. For example, given the pixel data of an image, a classifier can label the image as a cat, or as a particular digit.
![img](https://i.imgur.com/2hpxZwW.png)

From a mathematical point of view, a classifier can be seen as one or more decision boundaries. A decision boundary separates the inputs of 2 different classes when they are plotted. For a 2D plot, the decision boundary is a line (or curve). For a 3D plot, it is a surface such as a plane, and so on...
- - - -
The goal of training a classifier is to find these decision boundaries :)
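
To make the idea of a decision boundary concrete, here is a minimal sketch of a straight-line boundary in 2D. The weights `W` and offset `B` are made up for illustration (a real classifier would learn them from data), and `classify` is a hypothetical helper, not part of any library.

```python
# A hypothetical linear decision boundary in 2D: w1*x1 + w2*x2 + b = 0.
# These values are invented for illustration, not learned from data.
W = (1.0, -1.0)
B = 0.0


def classify(point, w=W, b=B):
    """Return 1 if the point lies on the positive side of the line, else 0."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else 0


print(classify((3.0, 1.0)))  # score = 3 - 1 = 2, positive side, class 1
print(classify((1.0, 3.0)))  # score = 1 - 3 = -2, negative side, class 0
```

Training amounts to picking values like `W` and `B` so that as many training points as possible land on the correct side.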

### How is classification different from regression? 
Regression is used to predict a quantity (like a house price), whereas classification is used to find out what class an input belongs to. In other words, regression is used for a continuous-valued output, and classification is used for a discrete-valued output.
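
Here is a small sketch of that difference using scikit-learn's `DecisionTreeRegressor` and `DecisionTreeClassifier`. The house sizes, prices, and "large" labels are invented toy data, used only to show what kind of value each model returns.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: one feature (house size in square metres).
sizes = [[50], [80], [120], [200]]
prices = [150.0, 220.0, 310.0, 500.0]  # continuous target (made-up numbers)
is_large = [0, 0, 1, 1]                # discrete target (1 = "large")

regressor = DecisionTreeRegressor(random_state=0).fit(sizes, prices)
classifier = DecisionTreeClassifier(random_state=0).fit(sizes, is_large)

new_house = [[90]]
predicted_price = regressor.predict(new_house)[0]
predicted_class = classifier.predict(new_house)[0]

print(predicted_price)  # a real number
print(predicted_class)  # one of the labels seen during training (0 or 1)
```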

# Does everyone use the same algorithm to classify?
NO! There are a ton of classification algorithms out there, and you can pick whichever suits your needs best. Here are some popular ones, in no particular order:
- Logistic Regression (Don't get confused by the terminology, logistic "regression" is actually a classifier :P)
- Naive Bayes (Not as "naive" as it sounds, often gives pretty good results. Based on probability.)
- Decision Tree (Consists of several if-else like conditions, which the classifier learns on its own from the given data)
- Random Forests (A forest is a bunch of trees. A Random Forest is a bunch of Decision Trees.)
- Support Vector Machines or SVMs (Can use the "kernel trick" to classify data that is not linearly separable by mapping it to a higher dimension. ~~Interstellar~~)
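
In scikit-learn, all of the algorithms listed above share the same `fit` / `predict` interface, so swapping one for another is mostly a one-line change. The sketch below assumes a tiny invented dataset and mostly default settings; it is meant to show the common interface, not to compare accuracy.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two features per sample, two classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]

models = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=10, random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)  # same training call for every algorithm
    predictions[name] = int(model.predict([[0.9, 1.0]])[0])
    print(name, predictions[name])
```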

# A very simple example

Consider an example where we try to classify a movie as good or bad based on its rating (R) and its first-day collection (C).
The output classes are "Good" = 1 and "Bad" = 0.
Here's the training data:

<table>
    <tr>
        <th>R</th>
        <th>C</th>
        <th>Result</th>
    </tr>
    <tr>
        <td>4</td>
        <td>1000</td>
        <td>1</td>
    </tr>
    <tr>
        <td>2</td>
        <td>250</td>
        <td>0</td>
    </tr>
    <tr>
        <td>3</td>
        <td>700</td>
        <td>1</td>
    </tr>
    <tr>
        <td>5</td>
        <td>600</td>
        <td>1</td>
    </tr>
    <tr>
        <td>2</td>
        <td>450</td>
        <td>0</td>
    </tr>
</table>    

Now let's train a decision tree classifier using scikit-learn and see what the results look like!

```python
# Prepare the input data X and the output data Y.
# Each row of X has 2 values, since we have 2 input features (R and C).

from sklearn.tree import DecisionTreeClassifier

X = [[4, 1000], [2, 250], [3, 700], [5, 600], [2, 450]]
Y = [1, 0, 1, 1, 0]

dtree = DecisionTreeClassifier()
dtree.fit(X, Y)
```

Output (the exact parameter list printed depends on your scikit-learn version):

```
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
```

```python
# Let's try classifying a new movie with R=1 and C=50!
X_new = [[1, 50]]
prediction = dtree.predict(X_new)
print(prediction)
```

```
[0]
```


```python
# Now let's try a movie with R=5 and C=2000 (you know it's good :P)
X_new = [[5, 2000]]
prediction = dtree.predict(X_new)
print(prediction)
```

```
[1]
```


# We did it! 
Our classifier learned to associate low ratings and low collections with bad movies. We never explicitly told the classifier about any of these patterns; it figured them out on its own from the training data. The results demonstrate how our classifier "learned" what a good movie looks like. The mathematics behind this is quite interesting and is explained really well [here](http://www.doc.ic.ac.uk/~sgc/teaching/pre2012/v231/lecture11.html).
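
If you are curious what the tree actually learned, scikit-learn can print its rules as text with `export_text`. The sketch below retrains on the same movie data; `random_state=0` and the feature names `"R"` and `"C"` are choices made here for readability, not part of the original example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Same training data as in the movie example above.
X = [[4, 1000], [2, 250], [3, 700], [5, 600], [2, 450]]
Y = [1, 0, 1, 1, 0]

# random_state is fixed only so that repeated runs print the same tree.
dtree = DecisionTreeClassifier(random_state=0)
dtree.fit(X, Y)

# Print the learned if-else rules, labelling the two columns as R and C.
rules = export_text(dtree, feature_names=["R", "C"])
print(rules)
```

With a dataset this small, a single threshold on either R or C is enough to separate the two classes, so the printed tree will likely contain just one split. Which feature it splits on can vary between runs when `random_state` is not fixed.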