# Logistic Regression 

* The first classifier we will discuss in this class is  **Logistic Regression**. 

* In Linear Regression, we fit a line to data. 

* In simple (two-class) Logistic Regression, we fit a curve to the probability that an observation belongs to one **class**.

* Many AI models are complicated versions of logistic regression models. 


<p align="center">
  <img src="Exam_pass_logistic_curve.svg.png" alt="alt text" width="50%">
</p>


### Logistic Function 

Logistic Regression addresses the problem of estimating a probability model, $P(Y = 1 \mid x)$.

The logistic regression model uses a function for the probability model, called the logistic function:

$$ P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$





In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score 

In [None]:


def logistic(x, beta0=0, beta1=1):
    p =  1 / (1 + np.exp(-(beta0 + beta1 * x)))
    return p

# Generate a range of x values
x = np.linspace(-10, 10, 400)

# Parameters
beta0 = 0   # Intercept
beta1 = 1   # Slope

# Compute logistic values
y = logistic(x, beta0, beta1)

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, y, label=fr'Logistic: $\beta_0={beta0}$, $\beta_1={beta1}$')
plt.title("Logistic Function", fontsize=14)
plt.xlabel("x")
plt.ylabel("P(Y=1|x)")
plt.grid(True)
plt.legend()
plt.ylim(-0.1, 1.1)
plt.show()


The probability model predicts $P(Y = 1 \mid x)$ with an S-shaped curve:

* $\beta_0$ shifts the curve right or left: the midpoint, where $P(Y = 1 \mid x) = \frac{1}{2}$, sits at $c = \frac{-\beta_0}{\beta_1}$
* $\beta_1$ controls the steepness of the S-shaped curve. The distance from ½ to almost 1, or from almost 0 to ½, is about $\frac{2}{|\beta_1|}$ (at that distance from $c$ the probability is roughly 0.88 or 0.12)
* If $\beta_1$ is positive, then the predicted $P(Y = 1 \mid x)$ rises from near zero for small values of $x$ to near one for large values of $x$
* If $\beta_1$ is negative, then the predicted $P(Y = 1 \mid x)$ falls from near one for small values of $x$ to near zero for large values of $x$
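
The roles of $\beta_0$ and $\beta_1$ can be checked numerically. The sketch below is only an illustration: the parameter values are made up, the helper name `logistic_scalar` is hypothetical, and it uses the standard-library `math` module on single numbers so that it runs on its own. It evaluates the curve at the midpoint $c = -\beta_0/\beta_1$ and at a distance $2/\beta_1$ on either side.

```python
import math

def logistic_scalar(x_value, b0, b1):
    """Logistic function P(Y=1|x) for a single number x_value."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_value)))

# Illustrative parameter values (assumed, not fitted to any data)
b0_demo, b1_demo = 2.0, 0.5

midpoint = -b0_demo / b1_demo   # x where P(Y=1|x) = 0.5
half_width = 2.0 / b1_demo      # distance from 1/2 to "almost" 0 or 1

p_mid = logistic_scalar(midpoint, b0_demo, b1_demo)
p_high = logistic_scalar(midpoint + half_width, b0_demo, b1_demo)
p_low = logistic_scalar(midpoint - half_width, b0_demo, b1_demo)

print(f"x = {midpoint:.1f}: P = {p_mid:.3f}")
print(f"x = {midpoint + half_width:.1f}: P = {p_high:.3f}")
print(f"x = {midpoint - half_width:.1f}: P = {p_low:.3f}")
```

Doubling `b1_demo` halves `half_width`, which is one way to see that a larger slope gives a steeper curve.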


<p align="center">
  <img src="logistic.png" alt="alt text" width="50%">
</p>

* It's useful to rewrite the logistic regression model in terms of odds. The log of the odds is called the **logit** function by statisticians and economists.


$$ \text{logit}\big(P(Y = 1 \mid x)\big) = \ln\left( \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} \right) = \beta_0 + \beta_1 x $$

* The ratio inside the logarithm is the **odds**: the probability that $Y = 1$ divided by the probability that $Y = 0$, where $Y$ can only be 1 or 0.

* A one-unit increase in $x$ multiplies the odds that $Y = 1$ by a factor of $e^{\beta_1}$.

* What happens when the odds are 1, i.e., $P(Y = 1) = 0.5$?

* Since $P(Y = 0) = 1 - P(Y = 1) = 0.5$,

$$ \ln\left( \frac{P(Y = 1)}{1 - P(Y = 1)} \right) = \ln(1) = 0  = \beta_0 + \beta_1 x $$

$$ x = -\frac{\beta_0}{\beta_1} $$
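
Both statements about the odds can be verified with a few lines of arithmetic. This is a minimal sketch with made-up parameter values and hypothetical helper names (`prob_y1`, `odds`). It compares the odds at $x$ and $x + 1$, and evaluates the log odds at $x = -\beta_0/\beta_1$.

```python
import math

def prob_y1(x_value, b0, b1):
    """P(Y=1|x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_value)))

def odds(p):
    """Odds of Y=1 versus Y=0 for a probability p."""
    return p / (1.0 - p)

# Illustrative values (assumed for this example only)
b0_odds, b1_odds = -1.0, 0.7
x_start = 2.0

odds_before = odds(prob_y1(x_start, b0_odds, b1_odds))
odds_after = odds(prob_y1(x_start + 1.0, b0_odds, b1_odds))
odds_factor = odds_after / odds_before

print(f"odds multiply by {odds_factor:.4f}; exp(beta1) = {math.exp(b1_odds):.4f}")

# At x = -beta0/beta1 the odds are 1, so the log odds are 0
x_half = -b0_odds / b1_odds
log_odds_at_half = math.log(odds(prob_y1(x_half, b0_odds, b1_odds)))
print(f"log odds at x = {x_half:.3f}: {log_odds_at_half:.2e}")
```

The factor does not depend on `x_start`: any starting point gives the same multiplicative change in the odds.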


In [None]:

# Generate a range of x values
x = np.linspace(-10, 10, 400)

# Parameters
beta0 = 2   # Intercept
beta1 = 1   # Slope
c = -beta0 / beta1  # x value where P(Y=1|x) = 0.5  
# Compute logistic values
y = logistic(x, beta0, beta1)

# Plot
plt.figure(figsize=(8, 5))
plt.plot(x, y, label=fr'Logistic: $\beta_0={beta0}$, $\beta_1={beta1}$')
plt.plot([c,c], [0 ,1], 'r-',label = 'Decision Boundary')  # Point where P(Y=1|x) = 0.5
plt.title("Logistic Function", fontsize=14)
plt.xlabel("x")
plt.ylabel("P(Y=1|x)")
plt.grid(True)
plt.legend()
plt.ylim(-0.1, 1.1)
plt.show()


### Probability mass function for logistic regression 

* In logistic regression, the response variable $Y$ is binary, taking values in

$$ Y \in \{0, 1\} $$


* We define

$$     P(Y = 1 \mid x) = p \quad \text{and} \quad P(Y = 0 \mid x) = 1 - p $$


* $p$ is modeled using the logistic function:

$$ p = \frac{1}{1 + e^{- (\beta_0 + \beta_1 x)}} $$


* The probability mass function (PMF) of a Bernoulli random variable can be written compactly as

$$P(Y = y \mid x) = p^{\,y} (1 - p)^{\,1 - y}, \quad \text{for } y \in \{0,1\} $$


* This expression encodes both possible outcomes in a single formula. Specifically:
    * If $y = 1$
    $$ P(Y = 1 \mid x) = p^1 (1 - p)^0 = p $$
    * If $y = 0$
    $$ P(Y = 0 \mid x) = p^0 (1 - p)^1 = 1 - p $$


* The notation $P(Y = y)$ means *the probability that the random variable $Y$ takes the specific observed value $y$*. Since $y$ can only be $0$ or $1$, this single expression

$$ P(Y = y) = p^{\,y}(1-p)^{\,1-y} $$

automatically selects the correct probability term depending on whether the observed outcome was $0$ or $1$.
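
As a small illustration, the compact PMF can be written as a one-line function. The name `bernoulli_pmf` and the value $p = 0.3$ are assumptions made for this sketch.

```python
def bernoulli_pmf(y_obs, p):
    """P(Y = y_obs) for a Bernoulli variable with P(Y = 1) = p."""
    if y_obs not in (0, 1):
        raise ValueError("y_obs must be 0 or 1")
    # The exponents switch each factor on or off depending on y_obs
    return p ** y_obs * (1 - p) ** (1 - y_obs)

p_example = 0.3  # illustrative probability of Y = 1
prob_one = bernoulli_pmf(1, p_example)   # reduces to p
prob_zero = bernoulli_pmf(0, p_example)  # reduces to 1 - p
print(prob_one, prob_zero)
```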


### Likelihood for Logistic Regression

* Given a dataset $\{(x_i, y_i)\}_{i=1}^N$ with $y_i \in \{0,1\}$ and

$$ p_i = P(Y_i = 1 \mid x_i) = \sigma(\beta_0 + \beta_1 x_i) = \frac{1}{1 + e^{- (\beta_0 + \beta_1 x_i)}} $$

* the Likelihood of the parameter vector $\beta = (\beta_0, \beta_1)$ is

$$    L(\beta \mid x_{1:N}, y_{1:N}) = \prod_{i=1}^N p_i^{\,y_i} (1 - p_i)^{\,1 - y_i} $$

* Taking the logarithm yields the log-likelihood:

$$ \ell(\beta) = \ln L(\beta) = \sum_{i=1}^N \left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right] $$
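
The log-likelihood is just a sum over the data, so it is easy to evaluate by hand for candidate parameters. The sketch below uses a tiny made-up dataset (`toy_x`, `toy_y`) and a hypothetical helper `log_likelihood`. It compares three choices of $\beta$ to show that parameters matching the trend in the data give a higher (less negative) value.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Sum of y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i) over all observations."""
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        p_i = 1.0 / (1.0 + math.exp(-(b0 + b1 * x_i)))
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i)
    return total

# Made-up data: larger x tends to go with y = 1, with some overlap
toy_x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
toy_y = [0, 0, 1, 0, 1, 1]

ll_flat = log_likelihood(0.0, 0.0, toy_x, toy_y)   # every p_i = 0.5
ll_up = log_likelihood(0.0, 1.0, toy_x, toy_y)     # curve rises with x
ll_down = log_likelihood(0.0, -1.0, toy_x, toy_y)  # curve falls with x

print(f"flat: {ll_flat:.3f}, rising: {ll_up:.3f}, falling: {ll_down:.3f}")
```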


### Loss Function 

* In machine learning, the term loss function is used to refer to some measure of error that you are trying to minimize.  If you have a probability model, as in Logistic Regression,  the loss function is simply the **negative log-Likelihood** 

$$ -\ell(\beta) = -\ln L(\beta) = -\sum_{i=1}^N \left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right] $$

* As mentioned in the video lecture, in machine learning this is called the (binary) cross-entropy loss.
* How do we minimize this? Differentiate with respect to $\beta$ and equate to zero. Unlike linear regression, the resulting equations have no closed-form solution.
* So we stick the loss into some numerical procedure (gradient descent usually) to find the minimum.
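
The numerical route can be sketched as plain gradient descent. The gradient of the negative log-likelihood is $\sum_i (p_i - y_i)$ with respect to $\beta_0$ and $\sum_i (p_i - y_i)\,x_i$ with respect to $\beta_1$. Everything else here is an assumption chosen for illustration: the toy data, the step size, the number of iterations, and the helper names. It is not how any particular library fits the model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll_and_grad(b0, b1, xs, ys):
    """Negative log-likelihood and its gradient with respect to (b0, b1)."""
    nll, g0, g1 = 0.0, 0.0, 0.0
    for x_i, y_i in zip(xs, ys):
        p_i = sigmoid(b0 + b1 * x_i)
        nll -= y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i)
        g0 += p_i - y_i
        g1 += (p_i - y_i) * x_i
    return nll, g0, g1

# Made-up data with overlapping classes, so a finite minimum exists
gd_x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
gd_y = [0, 0, 1, 0, 1, 1]

b0_hat, b1_hat = 0.0, 0.0      # starting guess
step = 0.5 / len(gd_x)         # step size applied to the summed gradient
nll_start, _, _ = nll_and_grad(b0_hat, b1_hat, gd_x, gd_y)

for _ in range(2000):
    _, g0, g1 = nll_and_grad(b0_hat, b1_hat, gd_x, gd_y)
    b0_hat -= step * g0        # move against the gradient
    b1_hat -= step * g1

nll_end, g0_end, g1_end = nll_and_grad(b0_hat, b1_hat, gd_x, gd_y)
print(f"beta0 = {b0_hat:.3f}, beta1 = {b1_hat:.3f}")
print(f"loss went from {nll_start:.3f} to {nll_end:.3f}")
```

At the end of the loop the gradient is close to zero, which is the "differentiate and equate to zero" condition reached numerically.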


### An example of real data:  

## Diabetes Prediction Example 
[Pima Indians Diabetes Study](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)

In [None]:

#col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
# load dataset
pima = pd.read_csv("../data/diabetes.csv")

In [None]:
pima.head()

In [None]:
pima.info()

In [None]:
#I grabbed a list of all the columns 
cols = pima.columns

In [None]:
#Examine how many of each outcome
pima["Outcome"].value_counts()

* I always like to take a first glance at all the data. A pair plot works for continuous-valued data, and stays readable if you have fewer than about 10 variables.

In [None]:
sns.pairplot(pima, hue="Outcome", height=3);
plt.show()

In [None]:
diabetes = pima['Outcome']
predictors = pima['Glucose']

In [None]:
plt.figure(figsize=(8,5))
plt.plot(predictors, diabetes,'ro', alpha=0.7)
plt.xlabel('Glucose')
plt.ylabel('Diabetes (0 = No, 1 = Yes)')
plt.title('Diabetes Outcome vs Glucose Level')
plt.grid(True)

* First step is always to set aside some data for testing after we train the model. 

In [None]:
predictors_train, predictors_test, diabetes_train, diabetes_test = train_test_split(predictors, diabetes, test_size=0.25, random_state=16)

* Let's examine how the outcomes are distributed in the training and test data.

In [None]:
# Count diabetes (1) and undiagnosed (0) cases in each split
print(np.sum(diabetes_train == 1), np.sum(diabetes_train == 0))
print(np.sum(diabetes_test == 1), np.sum(diabetes_test == 0))

* The syntax in sklearn is standard across models.

In [None]:
# I need to do a reshape here because I have a single predictor.
predictors_train.values.reshape(-1,1)
# this forces it to be a 2D array with one column and many rows.

* First, let's fit the model.

In [None]:
lr = LogisticRegression()
lr.fit(predictors_train.values.reshape(-1,1), diabetes_train)


* Now let's evaluate the model performance. The `score` method returns accuracy for Logistic Regression.

In [None]:
accuracy_train = lr.score(predictors_train.values.reshape(-1,1), diabetes_train)
accuracy_test = lr.score(predictors_test.values.reshape(-1,1), diabetes_test)
print(f"Training Accuracy: {accuracy_train:.3f}")
print(f"Test Accuracy: {accuracy_test:.3f}")    

* Is that good?
* Is there anything unusual?   

* What's the model it generated?

In [None]:
beta = list()
beta.append(lr.intercept_[0])
beta.append(lr.coef_[0][0])
print(f"beta0 (intercept): {beta[0]:.3f}")
print(f"beta1 (slope): {beta[1]:.3f}")

In [None]:
decision_boundary = -beta[0] / beta[1]
print(f"Decision Boundary (Glucose level where P(Y=1|x)=0.5): {decision_boundary:.3f}")

### Confusion Matrix 

* A Confusion Matrix provides better insight into classifier performance than simple accuracy 
* To obtain a confusion matrix we need predictions from the model
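
To make the idea concrete, here is a minimal sketch of how a two-class confusion matrix is tallied and what it shows beyond accuracy. The labels are invented for illustration and the helper `confusion_counts` is hypothetical. Its layout (rows are actual labels, columns are predicted labels) matches the convention used by sklearn's `confusion_matrix`.

```python
def confusion_counts(actual, predicted):
    """2x2 table of counts: rows = actual label, columns = predicted label."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Made-up labels: 6 undiagnosed (0) and 4 diabetes (1) cases
toy_actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_predicted = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]

cm = confusion_counts(toy_actual, toy_predicted)
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / len(toy_actual)
recall = tp / (tp + fn)        # fraction of actual diabetes cases found
precision = tp / (tp + fp)     # fraction of diabetes predictions that are right

print(cm)
print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}, precision = {precision:.2f}")
```

In this toy case the accuracy is 0.70, yet only half of the actual diabetes cases are detected, which accuracy alone would hide.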

In [None]:
diabetes_train_pred = lr.predict(predictors_train.values.reshape(-1,1))
diabetes_test_pred = lr.predict(predictors_test.values.reshape(-1,1))


Compute the confusion matrix for the training data

In [None]:
cnf_matrix_train = confusion_matrix(diabetes_train, diabetes_train_pred)
print(cnf_matrix_train)

In [None]:
# Never say "Healthy" or "Normal"; just say "Undiagnosed"
class_names = ['Undiagnosed', 'Diabetes']  # names of the classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
# create heatmap (rows = actual labels, columns = predicted labels)
sns.heatmap(pd.DataFrame(cnf_matrix_train), annot=True, cmap="jet", fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix (training data)', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.xticks(tick_marks + 0.5, class_names)
plt.yticks(tick_marks + 0.5, class_names)

In [None]:
# Compute the confusion matrix for the test data
cnf_matrix_test = confusion_matrix(diabetes_test, diabetes_test_pred)
# Never say "Healthy" or "Normal"; just say "Undiagnosed"
class_names = ['Undiagnosed', 'Diabetes']  # names of the classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
# create heatmap (rows = actual labels, columns = predicted labels)
sns.heatmap(pd.DataFrame(cnf_matrix_test), annot=True, cmap="jet", fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix (test data)', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.xticks(tick_marks + 0.5, class_names)
plt.yticks(tick_marks + 0.5, class_names)
