## Logistic Regression

Logistic regression calculates the probability that a given value belongs to a specific class. If the probability is more than 50%, it assigns the value to that class; otherwise, the value is assigned to the other class. Therefore, we can say that logistic regression acts as a binary classifier.

##### Working of a Logistic Model
For linear regression, the model is defined by:

$$ y = \beta_0 + \beta_1 x \qquad \text{(i)} $$

For logistic regression, we calculate a probability, i.e. $y$ is the probability that a given input $x$ belongs to a certain class. Thus, the value of $y$ should lie between 0 and 1.

However, equation (i) is unbounded: if we use it to calculate a probability, we can get values less than 0 as well as values greater than 1.
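To see the problem concretely, here is a minimal sketch that evaluates equation (i) for a few inputs. The coefficients `beta_0 = 0.1` and `beta_1 = 0.3` are made-up values for illustration, not fitted ones.

```python
# Hypothetical coefficients, chosen only to illustrate the issue.
beta_0, beta_1 = 0.1, 0.3


def linear_model(x):
    """Equation (i): y = beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x


# The output is unbounded, so it cannot be read as a probability.
for x in (-5, 0, 5):
    print(x, linear_model(x))
```

For `x = -5` the result is negative and for `x = 5` it is greater than 1, so neither can be interpreted as a probability.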

#### Sigmoid function 

We use the sigmoid function as the underlying function in Logistic regression. Mathematically and graphically, it is shown as:

$$ \text{Sigmoid}(t) = \frac{1}{1+e^{-t}} $$

<img src="sigmoid.PNG" width="400">

**Why do we use the Sigmoid Function?**

1. The sigmoid function’s range is between 0 and 1, which makes it suitable for modelling a probability.
2. Its derivative is easier to calculate than that of many other functions, which is useful during gradient descent.
3. It is a simple way of introducing non-linearity to the model.

Other functions can also be used, but the sigmoid is the most common choice for logistic regression.
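The first two properties can be checked with a short sketch. It relies on the standard identity that the derivative of the sigmoid equals `sigmoid(t) * (1 - sigmoid(t))`; the function names are illustrative, and the sketch ignores the overflow that `math.exp` can raise for very large negative `t`.

```python
import math


def sigmoid(t):
    """Sigmoid function: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def sigmoid_derivative(t):
    """Derivative of the sigmoid, written in terms of the sigmoid itself."""
    s = sigmoid(t)
    return s * (1.0 - s)


# Compare the closed-form derivative with a central finite difference.
h = 1e-6
t = 0.7
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)

print(sigmoid(0.0))  # 0.5
print(sigmoid_derivative(t), numeric)
```

Because the derivative reuses the value of the sigmoid, no extra exponential has to be evaluated during gradient descent.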


The logistic function is given as:

<img src="logistic_function.PNG" width="250">

We can see that the logit function (the log of the odds) is linear in $x$.
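The linearity can be verified numerically. The sketch below assumes the usual form of the simple logistic model, $p = \frac{1}{1+e^{-(\beta_0 + \beta_1 x)}}$, and the usual logit, $\ln\frac{p}{1-p}$; the coefficient values are hypothetical.

```python
import math

# Hypothetical coefficients for illustration only.
beta_0, beta_1 = -1.0, 2.0


def probability(x):
    """Simple logistic model: sigmoid applied to beta_0 + beta_1 * x."""
    return 1.0 / (1.0 + math.exp(-(beta_0 + beta_1 * x)))


def logit(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# The logit of the predicted probability recovers the linear term.
for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, logit(probability(x)), beta_0 + beta_1 * x)
```

The last two columns agree up to floating-point error, which is what "linear in $x$" means here.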


### Multiple Logistic Function

We can generalise the simple logistic function for multiple features as:
<img src="multi.PNG" width="300">

And the logit function can be written as:

<img src="logit.PNG" width="400">
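As a rough sketch of the multi-feature case, assuming the model has the usual form (sigmoid of an intercept plus a weighted sum of the features), the probability can be computed as follows. The function name, intercept and coefficients are all hypothetical.

```python
import math


def predict_proba(features, intercept, coefficients):
    """Probability from a logistic model with several features.

    The linear part intercept + sum(b_i * x_i) is the logit; the sigmoid
    maps it to a probability between 0 and 1.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with two features.
p = predict_proba([1.0, 2.0], intercept=0.5, coefficients=[1.5, -1.0])
print(p)  # 0.5, because the linear part is 0.5 + 1.5 - 2.0 = 0
```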

The coefficients are calculated the same way as for the simple logistic function, by passing the above equation into the cost function.

Just as we did for multiple linear regression, we will check for correlation between the different features for multiple logistic regression as well.
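One simple way to run such a check is the Pearson correlation coefficient between pairs of features. The sketch below uses made-up feature columns; a value close to 1 or -1 suggests that two features carry largely the same information.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(var_a * var_b)


# Made-up features: the same height in two units, plus an unrelated age.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_in = [h / 2.54 for h in height_cm]
age = [30.0, 25.0, 40.0, 35.0]

print(pearson_correlation(height_cm, height_in))  # very close to 1
print(pearson_correlation(height_cm, age))        # about 0.6
```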

### Multinomial Logistic Regression (Number of Labels > 2)

Although we have libraries that we can use to perform multinomial logistic regression, **we rarely use logistic regression for classification problems where the number of classes is more than 2.**

### Learning Algorithm

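Unlike linear regression, logistic regression has no closed-form solution for the coefficients, so we will use gradient descent instead. Specifically, we will use batch gradient descent, which calculates the gradient from all data points in the data set.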

Luckily, our "cross-entropy" error measure is convex, so it has no local minima other than the global one. Thus the minimum we arrive at is the global minimum.
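A minimal sketch of batch gradient descent for a one-feature logistic model is shown below. It assumes the mean cross-entropy loss, whose gradient with respect to each coefficient is the average of `(prediction - label)` times the corresponding input. The toy data, learning rate and iteration count are made up for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def cross_entropy(xs, ys, b0, b1):
    """Mean cross-entropy loss of the model sigmoid(b0 + b1 * x)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def batch_gradient_descent(xs, ys, learning_rate=0.1, n_iterations=2000):
    """Fit b0 and b1, using ALL data points for every gradient step."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Made-up, slightly overlapping toy data (so the optimum is finite).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = batch_gradient_descent(xs, ys)
print("loss before:", cross_entropy(xs, ys, 0.0, 0.0))
print("loss after: ", cross_entropy(xs, ys, b0, b1))
```

Since every step uses the whole data set, each iteration costs time proportional to the number of data points; that is the price of the "batch" variant.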

## Evaluation of a Classification Model

For a regression problem, we have different metrics like R Squared score, Mean Squared Error, etc. What are the metrics to measure the credibility of a classification model?

#### Metrics

In a regression problem, the accuracy is generally measured in terms of the difference between the actual values and the predicted values.

In a classification problem, the credibility of the model is measured using the confusion matrix generated, i.e., how accurately the true positives and true negatives were predicted.

The different metrics used for this purpose are:
1. Accuracy
2. Recall
3. Precision
4. F1 Score
5. Specificity
6. AUC (Area Under the Curve)
7. ROC (Receiver Operating Characteristic)

### Confusion Matrix

A typical confusion matrix looks like the figure shown.

<img src="confusionMatrix.PNG" width="300">
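The four cells of the matrix can be counted directly from the actual and predicted labels. The sketch below assumes binary labels where 1 is the positive class; the function name and the example labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # positive correctly predicted as positive
        elif actual == 0 and predicted == 0:
            tn += 1  # negative correctly predicted as negative
        elif actual == 0 and predicted == 1:
            fp += 1  # negative wrongly predicted as positive
        else:
            fn += 1  # positive wrongly predicted as negative
    return tp, tn, fp, fn


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```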

### 1. Accuracy : Correct predictions out of total predictions.

The mathematical formula is :

  $$ Accuracy=  \frac{ (TP+TN)}{(TP+TN+FP+FN)} $$
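As a small sketch, the formula translates directly into code. The counts used in the example call are made up.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up counts: 6 of 8 predictions are correct.
print(accuracy(tp=3, tn=3, fp=1, fn=1))  # 0.75
```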

### 2. Recall or Sensitivity : True Positives out of Total Actual Positives
The mathematical formula is:

$$ Recall=  \frac{TP}{(TP+FN)} $$

Sensitivity measures how many of the total (actual) positive results were correctly predicted by the model.

Let’s suppose in the previous model, the model gave 50 correct predictions (TP) but failed to identify 200 cancer patients (FN). Recall in that case will be:

Recall = $ \frac {50}{(50+200)} $ = 0.2 (The model was able to recall only 20% of the cancer patients)
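The same calculation as a short sketch, using the counts from the example (TP = 50, FN = 200); the function name is illustrative.

```python
def recall(tp, fn):
    """True positives divided by all actual positives (TP + FN)."""
    return tp / (tp + fn)


print(recall(tp=50, fn=200))  # 0.2
```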

### 3. Precision : True Positives out of Total Predicted Positives.

Precision measures, amongst all the positive predictions, how many were actually positive. Mathematically,

Precision=$ \frac {TP}{(TP+FP)} $

Let’s suppose in the previous example, the model correctly identified 50 people as cancer patients (TP) but also raised a false alarm for 100 patients (FP). Hence,

Precision=$ \frac {50}{(50+100)} $=0.33 (The model only has a precision of 33%)
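The same calculation as a short sketch, using the counts from the example (TP = 50, FP = 100); the function name is illustrative.

```python
def precision(tp, fp):
    """True positives divided by all predicted positives (TP + FP)."""
    return tp / (tp + fp)


print(round(precision(tp=50, fp=100), 2))  # 0.33
```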


### But we have a problem!!

As evident from the previous example, the model had a very high Accuracy but performed poorly in terms of Precision and Recall. So _Accuracy_ is not necessarily the right metric for evaluating the model in this case.
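The effect can be reproduced with a quick sketch. TP = 50, FN = 200 and FP = 100 are the counts from the examples above; the number of true negatives is not stated there, so `tn = 9650` is purely an assumed value, chosen to represent a data set in which healthy patients vastly outnumber cancer patients.

```python
# Counts from the examples above.
tp, fn, fp = 50, 200, 100
# ASSUMPTION: the true-negative count is not given in the text.
tn = 9650

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Under this assumption the accuracy is 0.97 even though the precision is 0.33 and the recall is 0.20, because the large number of true negatives dominates the accuracy.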


### A Trade-off?

With an increase in the Recall, there is typically a drop in the Precision of the model, and vice versa.
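One way to observe this trade-off is to vary the probability threshold above which a sample is predicted positive. The sketch below uses made-up predicted probabilities and labels; the function name is illustrative.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up predicted probabilities and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.8, 0.5, 0.25):
    precision, recall = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.8 to 0.25 raises the recall from 0.50 to 1.00 while the precision falls from 1.00 to 0.67.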

So the question is - what to go for? Precision or Recall?

Well, the answer is: it depends on the business requirement.

For example, if you are predicting cancer, you need 100% recall. But if you are predicting whether a person is innocent or not, you need 100% precision.

Can we maximise both at the same time? Usually not.

So, is there a need for a better metric then?

Yes. And it’s called the _F1 Score_.

### 4. F1 Score

F1 score is defined as the harmonic mean of Precision and Recall. 

The mathematical formula is:
$$ F1\ Score = 2 \times \left( \frac { Precision \times Recall}{Precision+Recall} \right) $$
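As a short sketch, here is the formula applied to the precision (50/150) and recall (50/250) from the earlier cancer example; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention used here when both are zero
    return 2 * (precision * recall) / (precision + recall)


print(round(f1_score(precision=50 / 150, recall=50 / 250), 2))  # 0.25
```

Because the harmonic mean is pulled towards the smaller of the two values, a model cannot reach a high F1 Score by doing well on only one of Precision or Recall.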


### 5. Specificity or True Negative Rate

True Negatives out of Total Actual Negatives.

  $$ Specificity = \frac {TN}{(TN+FP)} $$

Similarly, the False Positive Rate can be defined as (1 - Specificity), or $ \frac {FP}{(TN+FP)} $
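Both quantities can be sketched in a few lines. The counts in the example are made up, and the function names are illustrative.

```python
def specificity(tn, fp):
    """True negatives divided by all actual negatives (TN + FP)."""
    return tn / (tn + fp)


def false_positive_rate(tn, fp):
    """False positives divided by all actual negatives; equals 1 - specificity."""
    return fp / (tn + fp)


# Made-up counts: 90 negatives correctly rejected, 10 false alarms.
print(specificity(tn=90, fp=10))          # 0.9
print(false_positive_rate(tn=90, fp=10))  # 0.1
```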
