<img src="../../predictioNN_Logo_JPG(72).jpg" width=200>

## Book Solutions
### Chapter 14: Classification with Logistic Regression

Last updated: January 14, 2023\4

---

1. Explain why the direct use of a linear combination of predictors is not appropriate for the logistic regression model.

Answer: A logistic regression model predicts the probability that an event occurs, which must lie in $[0, 1]$. A linear combination of predictors can take any value in $(-\infty, \infty)$, so it cannot be used directly as a probability.
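
To make the range mismatch concrete, the sketch below evaluates a linear combination and its logistic transform at a few points. The coefficients `beta0`, `beta1` and the predictor values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and predictor values (not from the book)
beta0, beta1 = -1.0, 2.0
x = np.array([-3.0, 0.0, 0.5, 4.0])

linear = beta0 + beta1 * x   # unbounded: can be negative or exceed 1
probs = sigmoid(linear)      # always strictly between 0 and 1

print(linear)
print(probs)
```

The raw linear values fall outside $[0, 1]$, while the transformed values are valid probabilities.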

---

3. If the odds of an event are 1/3, what is the probability of the event?

Answer: 1/4

We can write the definition of the odds and solve for the probability.  
Let o = odds and p = probability of event

Then

$o = \frac{p}{1-p}$

$o(1-p)=p$  
$o-op=p$  
$p+op=o$  
$p(1+o)=o$  
$p=\frac{o}{1+o}$

Plugging in,

$p = \frac{1/3}{1+1/3} = 1/4$
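
As a quick check of the algebra, here is a small sketch that applies $p=\frac{o}{1+o}$ with exact fractions. The helper name `odds_to_probability` is made up for this example.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Convert odds o = p / (1 - p) into the probability p = o / (1 + o)."""
    return odds / (1 + odds)

p = odds_to_probability(Fraction(1, 3))
print(p)  # 1/4
```

Converting back with `p / (1 - p)` recovers the original odds of 1/3.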

---

5. True or False: The parameters that maximize the likelihood will also maximize the log-likelihood.


Answer: True

This follows because the `log()` function is monotonically increasing, so applying it changes the value of the maximum but not where it occurs.
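
A numerical illustration, assuming hypothetical data of 7 successes in 10 Bernoulli trials: the likelihood and the log-likelihood are evaluated on a grid of candidate probabilities, and the location of the maximum is compared.

```python
import numpy as np

# Hypothetical data: 7 successes in 10 Bernoulli trials
successes, trials = 7, 10
p_grid = np.linspace(0.01, 0.99, 99)  # candidate values of p

likelihood = p_grid**successes * (1 - p_grid)**(trials - successes)
log_likelihood = np.log(likelihood)

# Both curves peak at the same grid point
p_hat_lik = p_grid[np.argmax(likelihood)]
p_hat_loglik = p_grid[np.argmax(log_likelihood)]
print(p_hat_lik, p_hat_loglik)
```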

---

9. In gradient descent, does it always make sense to use a large step size? Explain your answer.


Answer: No. 

If the step size is too large, the update can overshoot, moving the next parameter estimate farther from the extremum rather than closer to it.
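
The overshoot can be seen on a simple stand-in problem. The sketch below minimizes $f(x) = x^2$ (not the logistic regression objective) with two illustrative step sizes; the function name and step values are assumptions for this example.

```python
def gradient_descent(step_size, start=1.0, n_steps=20):
    """Minimize f(x) = x**2 (gradient 2x) and return the visited points."""
    x = start
    path = [x]
    for _ in range(n_steps):
        x = x - step_size * 2 * x  # standard gradient descent update
        path.append(x)
    return path

small = gradient_descent(step_size=0.1)  # each step shrinks x toward 0
large = gradient_descent(step_size=1.1)  # each step overshoots past 0
print(abs(small[-1]), abs(large[-1]))
```

With the smaller step the iterates approach the minimum at 0, while with the larger step each update lands farther away than the last.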

---

10. In computing the gradient of the log-likelihood, we made a substitution for the event probability. Show that the two representations below are equivalent.

Expression I

$
\frac{\exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p} x_{ip})}{1+\exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p} x_{ip})}
$

Expression II

$
\frac{1}{1+\exp(-(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p} x_{ip}))}
$


Answer:

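This can be shown by multiplying the numerator and denominator of Expression II by the factor

$\exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p} x_{ip})$

If we write $a = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p} x_{ip}$, the factor is $e^a$, and

$\frac{1}{1+e^{-a}} = \frac{e^a}{e^a + e^a e^{-a}} = \frac{e^a}{1+e^a}$

which is Expression I. This uses the property that $e^a e^{-a} = e^{a+(-a)} = e^0 = 1$.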
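
The equivalence can also be checked numerically. In this sketch, `z` is a stand-in for the linear combination $\beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p} x_{ip}$, evaluated over an arbitrary range of values.

```python
import numpy as np

# z stands in for the linear combination of predictors
z = np.linspace(-10, 10, 201)

expr1 = np.exp(z) / (1 + np.exp(z))  # Expression I
expr2 = 1 / (1 + np.exp(-z))         # Expression II

print(np.allclose(expr1, expr2))
```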

---

13. True or False: When the logistic regression parameter estimates are computed with gradient descent, we should expect them to be very different from estimates derived by a statistical package like `statsmodels`.

Answer: False

We should expect gradient descent, once it has converged, to yield estimates that are similar or identical to those from a package.
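
A rough illustration of this agreement, using only `numpy`: the sketch fits the same model by gradient ascent on the log-likelihood and by Newton's method. Newton's method stands in for a package solver here, which is an assumption of this example rather than a statement about any particular package's internals. The toy data are hypothetical and deliberately not perfectly separable, so the maximum likelihood estimate is finite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, non-separable toy data with a centered predictor
x = np.arange(8) - 3.5
y = np.array([0, 0, 1, 0, 1, 0, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])  # intercept column + predictor

# Gradient ascent on the mean log-likelihood
beta_gd = np.zeros(2)
for _ in range(5000):
    gradient = X.T @ (y - sigmoid(X @ beta_gd)) / len(y)
    beta_gd += 0.5 * gradient

# Newton's method on the same log-likelihood
beta_newton = np.zeros(2)
for _ in range(25):
    p = sigmoid(X @ beta_newton)
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1 - p))[:, None])  # X^T W X
    beta_newton += np.linalg.solve(hessian, gradient)

print(beta_gd, beta_newton)
```

Both procedures maximize the same log-likelihood, so the estimates should match closely once each has converged.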

---

15. Using the `sklearn` module, fit a logistic regression model to this dataset: $\{(3,0),(3.2,0),(4,1)\}$. Show your code and print the intercept and slope coefficient estimates. For a predictor value of 6, what is the predicted target?

Answer: The code below will accomplish the task:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
```

```python
data = np.array([[3,0],[3.2,0],[4,1]])
data
```

```
array([[3. , 0. ],
       [3.2, 0. ],
       [4. , 1. ]])
```

```python
reg = LogisticRegression().fit(data[:,0].reshape(-1, 1),
                               data[:,1])
```

```python
reg.intercept_
```

```
array([-2.5127071])
```

```python
reg.coef_
```

```
array([[0.53268991]])
```

```python
reg.predict(np.array([6]).reshape(-1,1))
```

```
array([1.])
```
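
To see why the predicted target is 1, the probability can be recomputed by hand from the estimates reported above. This is a sketch that copies the printed intercept and slope and assumes the default 0.5 cutoff.

```python
import numpy as np

# Estimates copied from the printed output above
intercept = -2.5127071
slope = 0.53268991

x_new = 6
log_odds = intercept + slope * x_new             # linear predictor
probability = 1.0 / (1.0 + np.exp(-log_odds))    # logistic transform
label = int(probability >= 0.5)                  # default 0.5 cutoff
print(probability, label)
```

The probability is above 0.5, which is consistent with the predicted label of 1.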

---

16. Explain the difference between the `sklearn` functions `predict()` and `predict_proba()`. Specifically, when would it make sense to use each of them?

Answer:

The `predict_proba()` function produces a predicted probability for each class.  
These probabilities can be thresholded to produce predicted labels.  
This function is useful, for example, when a threshold other than 0.5 might be applied to determine the predicted label.

The `predict()` function produces a predicted label directly.  
This can be useful for producing a confusion matrix, for example,
when the default probability cutoff of 0.5 is acceptable.
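
The contrast can be sketched with the dataset from Exercise 15. The new predictor values in `X_new` and the stricter cutoff of 0.8 are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Dataset from Exercise 15
X = np.array([[3.0], [3.2], [4.0]])
y = np.array([0, 0, 1])
model = LogisticRegression().fit(X, y)

X_new = np.array([[3.0], [4.0], [6.0]])  # hypothetical new predictor values

# Column 1 of predict_proba holds P(y = 1) for each row
proba_positive = model.predict_proba(X_new)[:, 1]

labels_default = model.predict(X_new)                 # built-in 0.5 cutoff
labels_manual = (proba_positive >= 0.5).astype(int)   # same cutoff by hand
labels_strict = (proba_positive >= 0.8).astype(int)   # stricter custom cutoff
print(proba_positive, labels_default, labels_strict)
```

Thresholding the probabilities at 0.5 reproduces `predict()`, while any other cutoff requires `predict_proba()`.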


---

18. A logistic regression model was applied to a live dataset. The resulting precision was deemed too low by product leadership. What can be done to the probability threshold to produce a higher precision?

Answer:

Setting a higher probability threshold may increase the precision, as the hurdle to predict a positive label will be higher.  
This may reduce the number of false positives.
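
A small sketch of this effect, using made-up labels and predicted probabilities (not from any real dataset). It also reports recall, which tends to fall as the threshold rises.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
scores = np.array([0.10, 0.30, 0.35, 0.55, 0.60, 0.65, 0.72, 0.80, 0.90, 0.95])

def precision_recall(threshold):
    """Return (precision, recall) when scores >= threshold are labeled positive."""
    predicted = scores >= threshold
    true_pos = np.sum(predicted & (y_true == 1))
    false_pos = np.sum(predicted & (y_true == 0))
    false_neg = np.sum(~predicted & (y_true == 1))
    return true_pos / (true_pos + false_pos), true_pos / (true_pos + false_neg)

print(precision_recall(0.50))
print(precision_recall(0.75))
```

On this toy data the higher threshold removes the false positives, so precision rises, at the cost of missing more true positives.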

---