# Classification
## A useful predictor
Classification is a family of Machine Learning algorithms with a **discrete** output, meaning the prediction is limited to a fixed set of values (the classes). Classification shines in many areas, and these methods make it possible to automate human-like decisions in a wide range of applications.

There are many classification algorithms, such as **Logistic Regression**, **Random Forest**, and **K-Nearest Neighbors**. A classification problem is either binary (Yes/No) or multi-class, and in general every classification algorithm can handle both, either directly or through a simple extension.

## Algorithms

### Logistic Regression
The logistic regression model function $h_\theta(x)$ can be defined with the sigmoid function as:

## $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

The output is limited to $0 < h_\theta(x) < 1$, so it can be read as a probability score: the estimated probability that $y = 1$ for the input $x$.

#### Cost Function
The cost to minimize in Logistic Regression can be defined as:

### $E(\theta) = \frac{1}{n}\displaystyle\sum_{i=1}^n L(h_\theta(x_i), y_i)$

### $L(h_\theta(x), y) = \begin{cases}
      -\log_2(h_\theta(x)) & \text{if } y=1\\
      -\log_2(1 - h_\theta(x)) & \text{if } y=0
    \end{cases} $
    
Put simply, when the true value is $y=1$, the cost gets closer to $0$ as $h_\theta(x)$ gets closer to $1$, and it grows without bound as $h_\theta(x)$ approaches $0$. The mirror image holds when the true output is $y=0$: the cost approaches $0$ as $h_\theta(x)$ approaches $0$. Which branch applies is dictated by the conditions of the piecewise function.
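The behaviour of the cost can be checked numerically. The sketch below follows the piecewise definition with a base-2 logarithm; the function names `sample_loss` and `cost` and the example probabilities are illustrative choices only.

```python
import math

def sample_loss(h, y):
    """Piecewise loss L(h, y) for one example, using a base-2 logarithm."""
    # h must lie strictly between 0 and 1, otherwise log2 is undefined.
    return -math.log2(h) if y == 1 else -math.log2(1.0 - h)

def cost(predictions, labels):
    """Average loss E over all n examples."""
    n = len(labels)
    return sum(sample_loss(h, y) for h, y in zip(predictions, labels)) / n

# A confident, correct prediction is cheap; a confident, wrong one is expensive.
print(sample_loss(0.99, 1))  # close to 0
print(sample_loss(0.01, 1))  # large
print(sample_loss(0.5, 1))   # exactly 1.0, since -log2(0.5) = 1

# Made-up predicted probabilities and true labels for three examples.
print(cost([0.9, 0.2, 0.7], [1, 0, 1]))
```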


#### Multi-class classification
The cost function may suggest that Logistic Regression only works for binary classification, but that's not true. A simple idea extends it to multi-class outputs: with the "one-vs-all" approach, we train one binary classifier per class (that class against all the others), predict a probability for every possible class, and choose the class with the highest probability. **There are many more details to this, and I will update this section later**.
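To make the one-vs-all idea concrete, here is a small sketch in plain Python. It rests on several assumptions: the three-class toy dataset is made up, each binary model is fitted with plain batch gradient descent, the learning rate and step count are arbitrary, and the gradient is that of the natural-log version of the loss (which differs from the base-2 cost only by a constant factor).

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    # theta[0] is the bias; the remaining entries pair with the features.
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))

def train_binary(X, y, lr=0.1, steps=5000):
    """Fit one binary logistic regression with batch gradient descent."""
    n, d = len(X), len(X[0])
    theta = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(theta, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta

def train_one_vs_all(X, y):
    """Train one model per class: that class (label 1) versus the rest (label 0)."""
    return {c: train_binary(X, [1 if yi == c else 0 for yi in y])
            for c in sorted(set(y))}

def predict(models, x):
    """Pick the class whose binary model reports the highest probability."""
    return max(models, key=lambda c: predict_proba(models[c], x))

# Made-up toy data: three well-separated clusters, one per class.
X = [(0, 0), (1, 0), (0, 1),   # class 0
     (4, 0), (5, 0), (4, 1),   # class 1
     (0, 4), (1, 4), (0, 5)]   # class 2
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_all(X, y)
print([predict(models, x) for x in X])
```

Each entry of `models` is an ordinary binary classifier; only the final `max` step makes the prediction multi-class.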



### K-Nearest Neighbors
K-Nearest Neighbors is probably the simplest algorithm in Machine Learning, whether you use it for classification or regression. It is naive, but training is very fast because the model only stores the training data; the work happens at prediction time, so predictions get slower as the dataset grows. You set the value $K$, and when a new point needs a prediction, the algorithm computes the distance from that point to the training points and picks the majority class among the $K$ closest neighbors.
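The whole algorithm fits in a few lines. The sketch below is a brute-force version that assumes Euclidean distance (the text does not fix a distance measure) and uses a made-up dataset; `knn_predict` is an illustrative name, not a library function.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Sort all training points by their Euclidean distance to x_new.
    by_distance = sorted(zip(X_train, y_train),
                         key=lambda pair: math.dist(pair[0], x_new))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training data: two small clusters labelled "a" and "b".
X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y_train = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(X_train, y_train, (2, 2)))      # a
print(knn_predict(X_train, y_train, (6.5, 6.5)))  # b
```

An odd $K$ avoids ties in binary problems, and scaling the features first keeps one feature from dominating the distance.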



### Random Forest / Decision Trees

#### Decision Tree
A decision tree is basically a Q&A algorithm: you can picture it as a tree of questions or, if you are a programmer, as nested if-else statements. In practice the tree is not hand-written as nested if-else statements, although in theory it could be. It is learned from the data, which makes it far more general.

A decision tree is built by measuring the **impurity** produced by each candidate split of a region $R$ into two smaller regions $R_1$ and $R_2$. For the region to split, the algorithm tests the different features and thresholds to find the split with the lowest impurity across both regions, and then continues recursively in each new region until a stopping criterion is met, for example when no split of a region would reduce its impurity any further.
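A single step of that search can be sketched as follows. The text does not say which impurity measure is used, so Gini impurity is an assumption here, as are the midpoint thresholds, the function names, and the toy data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure region, higher for a mixed one."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best split.

    Returns None when no feature has two distinct values to split between.
    """
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]    # region R_1
            right = [yi for row, yi in zip(X, y) if row[j] > t]    # region R_2
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Made-up data: feature 0 separates the classes, feature 1 does not.
X = [(2.0, 1.0), (3.0, 1.5), (6.0, 1.2), (7.0, 0.9)]
y = [0, 0, 1, 1]

print(gini(y))           # 0.5, the impurity before splitting
print(best_split(X, y))  # (0, 4.5, 0.0): split feature 0 at 4.5, both regions pure
```

A full tree would call `best_split` again on each of the two resulting regions and stop once the returned impurity is no lower than the impurity of the region itself.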


#### Random Forest
Random Forest is a so-called **Ensemble Method**: a combination of multiple models working together to produce a more accurate result. A Random Forest is basically $n$ decision trees built from one dataset, where each tree is trained on a different random selection of the observations. To classify a new value, every tree makes a prediction and the **majority vote** is picked as the predicted class.
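The sampling and voting steps can be illustrated with a deliberately simplified sketch. To stay short, each "tree" below is a depth-one tree (a stump) chosen by Gini impurity, which is a stand-in for a full decision tree; the toy data, tree count, and seed are made up. Library implementations such as scikit-learn's `RandomForestClassifier` grow deeper trees and also randomize which features each split may consider.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y):
    """Fit a depth-one tree: (feature, threshold, left label, right label)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # every sampled row is identical: predict a constant
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    j, t, left_label, right_label = stump
    return left_label if x[j] <= t else right_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Majority vote over the predictions of all trees."""
    return majority([predict_stump(stump, x) for stump in forest])

# Made-up data: two well-separated clusters.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (2, 1.5),
     (6, 6), (6, 7), (7, 6), (7, 7), (6.5, 6.5), (7, 6.5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, (1.5, 1.5)), predict_forest(forest, (6.5, 6.5)))
```

Because every tree sees a slightly different sample, individual trees can disagree; the vote smooths out those differences.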



```python
# Code example with Logistic Regression

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("datasets/university-admission-dataset.csv")

# Extract the independent variables (features) and the dependent variable (target)
X = df[["math", "english"]].values
y = df["admission"].values

# The features can differ a lot in scale, so we rescale them to the range [0, 1].
# There is a page available to understand this process in the directory.
# Note: the scaler is fitted on the full dataset here; fitting it on the
# training split only would keep test-set statistics out of training.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split into train and test data so we can test the accuracy later
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train the model to find the best parameters for this dataset
model = LogisticRegression()
model.fit(X_train, y_train)

prediction_test = model.predict(X_test)

# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(y_test, prediction_test))
```



Output: `0.85`