### Logistic Regression - StatQuest

[YouTube Link](https://www.youtube.com/watch?v=yIYKR4sgzI8&t=1s)

We review LinReg first

When we fit a best fit line to our data using LinReg we can do the following things:

![](./data/img/diag1.png)

#### Overview of LogReg

LogReg is similar to LinReg, but it predicts something as True/False instead of predicting something continuous

LogReg does not fit a line to the data. Instead, it fits an "S" shaped curve called the Logistic Function

![](./data/img/diag2.png)

The curve goes from 0 to 1. It gives the **prob that the mouse is obese based on its weight**

If we weigh a very heavy mouse, there is a high prob that the mouse is obese

![](./data/img/diag3.png)


Similarly, if we weigh an intermediate mouse, there is only a 50% chance that the mouse is obese

![](./data/img/diag4.png)

Lastly there is only a small prob that a light mouse may be obese

**Although LogReg tells the prob that a mouse is obese or not, it is usually used for Classification**

If the prob that a mouse is obese is > 50%, we classify it as obese, else not obese
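A minimal sketch of the "S" shaped curve and the 50% cut-off described above. The intercept and slope are made-up values chosen only for illustration (they are not from the video), and `prob_obese` and `classify` are hypothetical helper names.

```python
import math

def logistic(x):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients, for illustration only
INTERCEPT = -10.0
SLOPE = 0.4  # change in log(odds) per unit of weight

def prob_obese(weight):
    """Prob that a mouse is obese given its weight, under the assumed curve."""
    return logistic(INTERCEPT + SLOPE * weight)

def classify(weight, threshold=0.5):
    """Classify as obese only if the predicted prob is above the threshold."""
    return "obese" if prob_obese(weight) > threshold else "not obese"

# A light, an intermediate and a heavy mouse
for w in (15, 25, 35):
    print(w, round(prob_obese(w), 3), classify(w))
```

With these assumed coefficients the intermediate mouse (weight 25) sits exactly at prob 0.5, so it is not classified as obese because the rule requires a prob strictly above 50%.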

Here this is a very simple model where obesity is predicted by weight

We can have more complex models where obesity is predicted by weight + genotype + age + astrological sign + ...

**Just like LinReg, LogReg can also work with both continuous and discrete data**

We can also test to see if each var is useful for predicting obesity

However, **unlike LinReg, we can't easily compare the complicated model to the simple model**

Instead, we just check whether a var's effect on the prediction is significantly diff from 0. If it is not, the var is not helping the prediction. We use "**Wald's Test**" to figure this out
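A rough sketch of the idea behind the Wald test: divide a coefficient's estimate by its standard error and see how far from 0 that is. The estimates and standard errors below are made up for illustration, and `wald_z` and `wald_p_value` are hypothetical helper names, not functions from any library.

```python
import math

def wald_z(estimate, std_error):
    """Wald statistic: how many standard errors the estimate is away from 0."""
    return estimate / std_error

def wald_p_value(estimate, std_error):
    """Two-sided p-value, treating the Wald statistic as standard normal."""
    z = wald_z(estimate, std_error)
    return math.erfc(abs(z) / math.sqrt(2))

# Made-up coefficient estimates and standard errors, for illustration only
for name, est, se in [("weight", 1.2, 0.5), ("astrological sign", 0.1, 0.6)]:
    print(name, round(wald_z(est, se), 2), round(wald_p_value(est, se), 3))
```

A small p-value suggests the var's effect is significantly diff from 0. A large one suggests the var is not helping the prediction.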

#### How do we fit the data

In LinReg we fit the line using **Least Squares**

![](./data/img/diag5.png)

We find the line that minimizes the sum of the squares of these residuals

We also use the residuals to compute R^2 to compare simple models to complicated models
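As a small sketch of Least Squares and R^2 for a single predictor, using the standard closed-form slope and intercept. The data points are made up, and `fit_line` and `r_squared` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least Squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up (weight, size) measurements, for illustration only
mouse_weights = [1.0, 2.0, 3.0, 4.0, 5.0]
mouse_sizes = [1.2, 1.9, 3.2, 3.8, 5.1]

slope, intercept = fit_line(mouse_weights, mouse_sizes)
print(slope, intercept, r_squared(mouse_weights, mouse_sizes, slope, intercept))
```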

LogReg does not have the same concept of a residual, so it can't use Least Squares and can't calculate R^2

It uses **Maximum Likelihood**




#### Maximum Likelihood

[YouTube link](https://www.youtube.com/watch?v=XepXtl9YKwc)

Let's say we weighed a bunch of mice

The goal of MaxL is to **find an optimal way to fit a distribution to the data**

There are lots of diff dist for diff types of data

![](./data/img/diag6.png)

The reason we want to fit a dist to our data is:

- It makes the data easier to work with and is also more general: it applies to every experiment of the same type

In this case we think that the wts may be normally distributed. That means we think **the data came** from a Normal Dist

Normal Dist characteristics:

![](./data/img/diag7.png)

Normal Dist can come in many shapes as shown:

![](./data/img/diag8.png)

Once we settle on the shape, we have to decide where to place (center) the ND on the data

We pick any ND and check how it fits the data

![](./data/img/diag9.png)

The avg of the ND is the black line. Most of the values should be near this avg

In this case it's diff from the avg of the actual measurements, and most of the values are far from the avg of the ND

![](./data/img/diag10.png)

Now we shift the normal dist so that its mean is the same as the avg wt:

![](./data/img/diag11.png)

If we shift the ND to the right:

![](./data/img/diag12.png)

Then the prob or "likelihood" of observing these measurements would go down again

Now we plot the likelihood of observing the data vs the loc of the center of the dist

![](./data/img/diag13.png)

We want the **Location that maximizes the likelihood of observing the weights we measured**

The location of the ND near the center maximizes the likelihood of observing the weights we measured.
Thus, it is the **Maximum Likelihood Estimate for the mean**

Here we are talking about the mean of the dist, not the mean of the data, but for the ND the Max Likelihood estimate of the dist's mean is equal to the mean of the data

We figured out Max Likelihood estimate for the Mean
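The "shift the ND and check the likelihood" idea can be sketched as a simple search over candidate locations. The weights and the fixed std dev are made up for illustration. The sketch adds up log-likelihoods instead of multiplying raw likelihoods, which picks the same location and avoids very small numbers.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def log_likelihood(data, mean, sd):
    """Log of the likelihood of observing all the data under Normal(mean, sd)."""
    return sum(math.log(normal_pdf(x, mean, sd)) for x in data)

# Made-up mouse weights and a fixed std dev, for illustration only
weights = [28.0, 30.5, 31.0, 32.5, 33.0, 35.0]
sd = 2.0

# Try centers from 20.00 to 40.00 in steps of 0.01 and keep the best one
candidates = [20 + 0.01 * i for i in range(2001)]
best_mean = max(candidates, key=lambda m: log_likelihood(weights, m, sd))

sample_mean = sum(weights) / len(weights)
print(best_mean, sample_mean)
```

Up to the grid step, the best center found by the search agrees with the plain avg of the weights.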

---

#### Max Likelihood Est for Std Dev

The std dev is the spread of the ND

We similarly plot Likelihood of observing the data vs the std dev and we pick the ND with std dev that max the likelihood

![](./data/img/diag14.png)
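The same kind of search can be sketched for the std dev, keeping the mean fixed at the avg of the data. The weights are made up for illustration. For the ND, the value that maximizes the likelihood is the square root of the avg squared deviation (dividing by n, not n - 1), which the sketch computes for comparison.

```python
import math

def log_likelihood(data, mean, sd):
    """Log-likelihood of the data under a Normal(mean, sd) distribution."""
    n = len(data)
    return (-n * math.log(sd * math.sqrt(2 * math.pi))
            - sum((x - mean) ** 2 for x in data) / (2 * sd ** 2))

# Made-up mouse weights, for illustration only
weights = [28.0, 30.5, 31.0, 32.5, 33.0, 35.0]
mean = sum(weights) / len(weights)

# Try std devs from 0.50 to 10.00 in steps of 0.01 and keep the best one
candidates = [0.5 + 0.01 * i for i in range(951)]
best_sd = max(candidates, key=lambda s: log_likelihood(weights, mean, s))

# Closed-form value for comparison (divides by n)
closed_form_sd = math.sqrt(sum((x - mean) ** 2 for x in weights) / len(weights))
print(best_sd, closed_form_sd)
```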

When someone says they have the max likelihood est for the mean or sd or something, it means that they found the value for the mean or sd or whatever that maximizes the likelihood of observing the data they observed

![](./data/img/diag15.png)


![](./data/img/diag16.png)

**Using Max Likelihood concepts we fit a distribution to the data**



#### How does LogReg use Max Likelihood

Given an "S" shaped curve we find the **likelihood of the data**

We calculate the likelihood for observing one mouse:

![](./data/img/diag17.png)

Similarly we do that for all mice:

![](./data/img/diag18.png)

We multiply all the likelihoods together

That is the **likelihood of the data, given the "S" shaped curve we chose**

Then we shift the curve and calculate a new likelihood for the data

We do this again and again..

The curve with **max likelihood** is selected

![](./data/img/diag19.png)
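A small sketch of the steps above: for each candidate "S" shaped curve, multiply the per-mouse likelihoods together and keep the curve with the largest product. The data and the candidate (intercept, slope) pairs are made up for illustration. Real implementations use a numerical optimizer rather than a hand-picked list of curves.

```python
import math

def logistic(x):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def likelihood(weights, obese, intercept, slope):
    """Product of per-mouse likelihoods under one candidate curve."""
    total = 1.0
    for w, is_obese in zip(weights, obese):
        p = logistic(intercept + slope * w)  # prob of obese from the curve
        # Obese mice contribute p, non-obese mice contribute (1 - p)
        total *= p if is_obese else (1.0 - p)
    return total

# Made-up data: weight and whether the mouse is obese (1) or not (0)
weights = [18, 21, 24, 26, 29, 33]
obese = [0, 0, 1, 0, 1, 1]

# Hypothetical candidate curves as (intercept, slope) pairs
candidate_curves = [(-5.0, 0.2), (-10.0, 0.4), (-15.0, 0.6)]

best_curve = max(candidate_curves, key=lambda c: likelihood(weights, obese, *c))
for c in candidate_curves:
    print(c, likelihood(weights, obese, *c))
print("best:", best_curve)
```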



### Odds and Log(Odds), Clearly Explained

[YouTube link](https://www.youtube.com/watch?v=ARfXDSkQf1Y)

odds of my team winning = no of times my team wins / no of times my team loses

This is diff from probability

#### Deriving odds from prob

odds of my team winning = no of times my team wins / no of times my team loses

prob of my team winning = no of times my team wins / (no of times my team wins + no of times my team loses)

prob of my team losing = no of times my team loses / (no of times my team wins + no of times my team loses)

odds of my team winning = prob of my team winning / prob of my team losing

= prob of my team winning / (1 - prob of my team winning)

odds of something happening = p / (1 - p)
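A quick check that odds computed from counts and odds computed from prob agree. The 5 wins and 3 losses are example numbers, and the function names are hypothetical.

```python
def odds_from_counts(wins, losses):
    """Odds of winning = wins / losses."""
    return wins / losses

def prob_from_counts(wins, losses):
    """Prob of winning = wins / (wins + losses)."""
    return wins / (wins + losses)

def odds_from_prob(p):
    """Odds of something happening = p / (1 - p)."""
    return p / (1 - p)

wins, losses = 5, 3  # example counts
p = prob_from_counts(wins, losses)
print(odds_from_counts(wins, losses), p, odds_from_prob(p))
```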

#### Log of the odds

The worse my team is, the closer the odds of winning get to zero

**So, if the odds are against my team winning, then they will be between 0 and 1**

The better my team is, the further the odds of winning go above 1

**So, if the odds are for my team winning, then they will be between 1 and infinity**

We can see this on a number line

![](./data/img/diag22.png)

Red: odds of my team losing

Blue: odds of my team winning

This dist is Asymmetrical

For eg if odds are against 1 to 6 => 1/6 = 0.17

But if the odds are in favour 6 to 1 then its 6/1 = 6

The magnitude of the odds against is way smaller than the magnitude of the odds in favor

**Taking the log(odds) solves this problem by making everything symmetrical**

For eg if odds are against 1 to 6 => 1/6 = 0.17, and log(odds) = log(1/6) = -1.79 (natural log)

But if the odds are in favour 6 to 1 then it's 6/1 = 6, and log(odds) = log(6/1) = 1.79

This is because

log(a/b) = - log(b/a)

So log(odds against) = -log(odds in favor)
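The symmetry can be checked numerically with the natural log, using the 1 to 6 and 6 to 1 example from above:

```python
import math

odds_against = 1 / 6   # odds are against, 1 to 6
odds_in_favor = 6 / 1  # odds are in favor, 6 to 1

log_odds_against = math.log(odds_against)
log_odds_in_favor = math.log(odds_in_favor)

# Same distance from 0, opposite signs
print(round(log_odds_against, 2), round(log_odds_in_favor, 2))
```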

Here we calculated log(odds) from counts

We can also calculate using prob

log(odds) = log(p/(1-p))

**The log of the ratio of the probabilities, log(p/(1-p)), is called the Logit Function and forms the basis of LogReg**
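A minimal sketch of the logit and its inverse, the logistic function, which takes log(odds) back to a prob. The function names are hypothetical.

```python
import math

def logit(p):
    """log(odds) for a prob p strictly between 0 and 1."""
    return math.log(p / (1 - p))

def logistic(log_odds):
    """Inverse of the logit: turns log(odds) back into a prob."""
    return 1.0 / (1.0 + math.exp(-log_odds))

for p in (0.1, 0.5, 0.8):
    print(p, logit(p), logistic(logit(p)))
```

A prob of 0.5 corresponds to log(odds) of 0, and probs below or above 0.5 map to negative or positive log(odds).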

The log(odds) makes things symmetrical

### Logistic Regression Details Pt1: Coefficients

[YouTube link](https://www.youtube.com/watch?v=vN5cNN2-HWE)


Here we will talk about the **coefficients** that are the result of any LogReg and how they are determined and interpreted

![](./data/img/diag20.png)

![](./data/img/diag21.png)


