# Logistic Regression Details Pt1: Coefficients

This notebook covers the coefficients of a logistic regression model: first when a continuous variable is used to predict a binary outcome, then when testing whether a discrete variable is related to the binary outcome.

To start with, let's import the necessary libraries.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

## Logistic Regression

Logistic regression is a statistical model that uses the logistic function to model a binary dependent variable. Fitting a logistic regression (also called logit regression) means estimating the parameters of that logistic model, which is a form of binary regression.
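To make the role of the logistic function concrete, here is a minimal sketch of how it turns a linear expression in the predictor into a probability. The parameter values `beta_0` and `beta_1` are hypothetical and chosen only for illustration; they are not fitted to any data in this notebook.

```python
import numpy as np


def logistic(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical intercept and slope, for illustration only.
beta_0, beta_1 = -1.0, 2.0

x_values = np.array([-2.0, 0.0, 0.5, 2.0])
log_odds = beta_0 + beta_1 * x_values  # linear in x
probabilities = logistic(log_odds)     # squeezed into (0, 1)

for x_val, p in zip(x_values, probabilities):
    print(f"x = {x_val:+.1f} -> P(y = 1) = {p:.3f}")
```

With a positive slope, the probability rises monotonically with x and equals 0.5 exactly where the log-odds are zero.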

Let's create a synthetic dataset and use it to illustrate the concept of logistic regression.

In [None]:
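# Create a synthetic dataset
np.random.seed(0)
x = np.random.normal(0, 1, 100)
y = (x > 0).astype(float)  # outcome is 1 when the initial draw is positive
x[x > 0] *= 4  # stretch the positive values to spread out the y = 1 group
x += .3 * np.random.normal(0, 1, 100)  # add noise to the predictor

# scikit-learn expects a 2D array of shape (n_samples, n_features)
x = x[:, np.newaxis]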

# Plot the data
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='black', zorder=20)
plt.show()

From the plot, we see that the outcome takes only two values, 0 and 1, and that data points with larger x values fall mostly in the group with y = 1. We can fit a logistic regression model to this data and predict the probability of a data point being in the group with a value of 1.

## Logistic Regression Coefficients

In logistic regression, each coefficient multiplies its predictor in a linear expression for the log-odds of the outcome. The sign of each coefficient indicates the direction of the relationship between a predictor variable and the response variable. A positive sign indicates that as the predictor variable increases, the log-odds, and therefore the probability, of the outcome being 1 increase. A negative sign indicates the opposite.

Let's fit a logistic regression model to this data and examine the coefficients.

In [None]:
# Fit the logistic regression model
clf = LogisticRegression(solver='lbfgs')
clf.fit(x, y)

# Print the coefficients
print(f'Coefficient: {clf.coef_}')

From the output, we can see that the coefficient is positive, which means that as the predictor variable (x) increases, the probability of the outcome (y) being 1 increases.
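To connect the fitted coefficient to predicted probabilities, the sketch below rebuilds the synthetic data from above, refits the model, and checks that applying the logistic function to `intercept_ + coef_ * x` reproduces the output of `predict_proba`. The names `beta_0`, `beta_1`, `manual_proba`, and `sklearn_proba` are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rebuild the synthetic dataset used earlier in this notebook.
np.random.seed(0)
x = np.random.normal(0, 1, 100)
y = (x > 0).astype(float)
x[x > 0] *= 4
x += .3 * np.random.normal(0, 1, 100)
x = x[:, np.newaxis]

clf = LogisticRegression(solver='lbfgs')
clf.fit(x, y)

beta_1 = clf.coef_[0, 0]    # slope for the single predictor
beta_0 = clf.intercept_[0]  # intercept

# Log-odds are linear in x; the logistic function maps them to probabilities.
log_odds = beta_0 + beta_1 * x[:, 0]
manual_proba = 1.0 / (1.0 + np.exp(-log_odds))

# Column 1 of predict_proba holds P(y = 1).
sklearn_proba = clf.predict_proba(x)[:, 1]

print(f'Intercept: {beta_0:.3f}, slope: {beta_1:.3f}')
print(f'Manual and sklearn probabilities agree: {np.allclose(manual_proba, sklearn_proba)}')
```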

## Understanding Logistic Regression Coefficients

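The coefficients of a logistic regression model can be interpreted in terms of odds ratios. The odds of the outcome are the ratio of the probability of success to the probability of failure, $p / (1 - p)$, and an odds ratio compares the odds at two values of a predictor. For a coefficient $\beta$, the odds ratio for a one-unit increase in the predictor is given by $e^{\beta}$, which is the exponential function of the coefficient.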

If $\beta$ is positive, $e^{\beta}$ is greater than 1, which means that as the predictor variable increases, the odds of the outcome being 1 increase. If $\beta$ is negative, $e^{\beta}$ is less than 1, which means that as the predictor variable increases, the odds of the outcome being 1 decrease.
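The relationship between $\beta$ and the odds can be checked numerically. The sketch below uses hypothetical values for the intercept and slope (`beta_0` and `beta_1` are assumptions, not fitted values) and shows that the odds at `x + 1` divided by the odds at `x` come out the same regardless of the starting point `x`, and match $e^{\beta}$.

```python
import numpy as np

# Hypothetical intercept and slope, for illustration only.
beta_0, beta_1 = -0.5, 0.8


def odds(x_val):
    """Odds p / (1 - p) of the outcome being 1 at a given predictor value."""
    p = 1.0 / (1.0 + np.exp(-(beta_0 + beta_1 * x_val)))
    return p / (1.0 - p)


# The ratio of odds for a one-unit increase does not depend on where we start.
for x_val in (-1.0, 0.0, 2.5):
    ratio = odds(x_val + 1.0) / odds(x_val)
    print(f"x = {x_val:+.1f}: odds(x + 1) / odds(x) = {ratio:.4f}")

print(f"exp(beta_1) = {np.exp(beta_1):.4f}")
```

Note that the probability itself does not change by a constant amount per unit of x; only the odds are multiplied by a constant factor.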

Let's calculate the odds ratio for our logistic regression model.

In [None]:
# Calculate the odds ratio
odds_ratio = np.exp(clf.coef_)
print(f'Odds ratio: {odds_ratio}')

The odds ratio is greater than 1, which confirms that as the predictor variable increases, the odds of the outcome being 1 increase.

## Logistic Regression with a Discrete Variable

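Logistic regression can also be used with a discrete predictor variable. The process is similar, but a predictor with $k$ levels is typically encoded as $k - 1$ indicator (dummy) variables, each with its own coefficient measured relative to a reference level. A binary predictor coded as 0/1, as in the example below, therefore has a single coefficient.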
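As a sketch of how a discrete predictor with more than two levels might be prepared, the snippet below applies dummy (indicator) coding with a reference level using `pd.get_dummies`. The column name `group` and its levels `a`, `b`, `c` are hypothetical and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical three-level predictor; names are illustrative only.
df = pd.DataFrame({'group': ['a', 'b', 'c', 'a', 'c', 'b']})

# drop_first=True drops level 'a', which becomes the reference level.
dummies = pd.get_dummies(df['group'], prefix='group', drop_first=True, dtype=float)
print(dummies)
```

Each remaining column would get its own coefficient in a fitted model, interpreted as the change in log-odds relative to the dropped reference level.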

In [None]:
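# Create a synthetic dataset with a discrete predictor variable
np.random.seed(0)
x_discrete = np.random.choice([0, 1], size=100, p=[.5, .5])
# The outcome copies the predictor exactly, so the two classes are perfectly separated
y_discrete = (x_discrete == 1).astype(float)
x_discrete = x_discrete[:, np.newaxis]  # shape (n_samples, 1) for scikit-learn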

# Fit the logistic regression model
clf_discrete = LogisticRegression(solver='lbfgs')
clf_discrete.fit(x_discrete, y_discrete)

# Print the coefficients
print(f'Coefficient: {clf_discrete.coef_}')

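In this case, the coefficient is positive, which means that the odds of the outcome being 1 are higher for level 1 of the predictor variable than for level 0. Note that in this dataset the outcome equals the predictor for every observation, so the classes are perfectly separated. The estimate stays finite only because scikit-learn's `LogisticRegression` applies L2 regularization by default, so its magnitude reflects the penalty rather than the data alone.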
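For a binary predictor whose levels overlap in outcome, the exponentiated coefficient can be compared with the odds ratio computed directly from a 2x2 table of counts. The sketch below uses hypothetical counts, and it assumes a large `C` (together with a tight `tol`) so that the default L2 penalty has a negligible effect on the estimate; the names `x_tab`, `y_tab`, and `clf_tab` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical counts, chosen so that both outcomes occur at both levels:
#   x = 0: 30 observations with y = 0, 10 with y = 1
#   x = 1: 15 observations with y = 0, 25 with y = 1
counts = [30, 10, 15, 25]
x_tab = np.repeat([0, 0, 1, 1], counts)[:, np.newaxis]
y_tab = np.repeat([0, 1, 0, 1], counts)

# Odds ratio computed directly from the table.
odds_level_0 = 10 / 30
odds_level_1 = 25 / 15
table_odds_ratio = odds_level_1 / odds_level_0

# Assumption: a large C makes the default L2 penalty negligible.
clf_tab = LogisticRegression(solver='lbfgs', C=1e4, tol=1e-8, max_iter=1000)
clf_tab.fit(x_tab, y_tab)
model_odds_ratio = np.exp(clf_tab.coef_[0, 0])

print(f'Odds ratio from the table: {table_odds_ratio:.3f}')
print(f'Odds ratio from the model: {model_odds_ratio:.3f}')
```

With the default `C=1.0`, the model's odds ratio would be pulled toward 1 by the penalty, so the two numbers would no longer be expected to agree closely.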

In conclusion, the coefficients in logistic regression are related to the odds ratio of the outcome and can be interpreted in terms of the direction and strength of the relationship between the predictor variables and the response variable. They are the key to understanding the output of a logistic regression model. 
