# Logistic Regression

- Predicts the probability of a binary outcome.
- Uses the sigmoid function to map a linear combination of the inputs to a probability between 0 and 1.
- Used for classification, despite the name "regression".
- Types of logistic regression:
    - Binomial (binary): 2 classes (0 or 1), uses sigmoid
    - Multinomial: more than 2 unordered classes (e.g. cat, dog, sheep), uses softmax
    - Ordinal: more than 2 ordered classes (e.g. low, medium, high)
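Here is a quick check of the multinomial case with scikit-learn. With the default lbfgs solver, recent versions fit a multinomial (softmax) model when the target has more than two classes. So `predict_proba` should equal the softmax of the per-class scores returned by `decision_function`. The iris dataset (three unordered classes) and `max_iter=1000` are choices made for this sketch, not part of the notes above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # 3 unordered classes
clf = LogisticRegression(max_iter=1000).fit(X, y)

scores = clf.decision_function(X)            # one linear score per class, shape (150, 3)
# Softmax over the class axis (subtract the row max to avoid overflow)
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
manual_proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

proba = clf.predict_proba(X)
print(proba.shape)                           # (150, 3)
print(np.allclose(proba, manual_proba))      # True for the multinomial (softmax) model
print(np.allclose(proba.sum(axis=1), 1.0))   # True: each row is a distribution over classes
```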


- softmax: $\text{softmax}(x)_i = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$

- sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
    - if $\sigma(x) < 0.5$, predict class 0; otherwise predict class 1
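The two formulas and the 0.5 threshold can be sketched in NumPy as follows. The helper names `sigmoid`, `softmax` and `predict_class` are hypothetical. Subtracting the maximum inside softmax is a standard overflow guard and leaves the result unchanged.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow."""
    z = np.asarray(z, dtype=float)
    exp_z = np.exp(z - z.max(axis=-1, keepdims=True))
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

def predict_class(z, threshold=0.5):
    """Binary decision: class 0 if sigmoid(z) < threshold, else class 1."""
    return (sigmoid(z) >= threshold).astype(int)

scores = np.array([-2.0, 0.0, 3.0])
print(sigmoid(scores))            # ≈ [0.119, 0.5, 0.953]
print(predict_class(scores))      # [0 1 1]
print(softmax([1.0, 2.0, 3.0]))   # ≈ [0.090, 0.245, 0.665]
```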

- Goal: find the weights $w$ and bias $b$ that maximise the likelihood of the observed labels (equivalently, minimise the log loss).
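To make the goal concrete, here is a minimal gradient-descent sketch that maximises the log-likelihood by minimising the mean negative log-likelihood. The function names, learning rate, iteration count and synthetic data are assumptions for illustration. scikit-learn's `LogisticRegression` instead uses solvers such as lbfgs and applies L2 regularisation by default, so its weights will differ.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    """Hypothetical helper: plain gradient descent on the mean negative log-likelihood."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y=1 | x)
        error = p - y                           # gradient of the NLL w.r.t. the linear score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

def neg_log_likelihood(X, y, w, b, eps=1e-12):
    """Mean negative log-likelihood (binary cross-entropy)."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ w + b))), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic data drawn from a logistic model with known weights (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = (rng.random(200) < 1 / (1 + np.exp(-(X @ true_w + true_b)))).astype(float)

w, b = fit_logistic_gd(X, y)
nll_start = neg_log_likelihood(X, y, np.zeros(2), 0.0)  # log(2) ≈ 0.693 when all p = 0.5
nll_end = neg_log_likelihood(X, y, w, b)
print("weights:", w, "bias:", b)
print("NLL before:", nll_start, "after:", nll_end)
```

Because the negative log-likelihood is convex in $w$ and $b$, a small fixed step size is enough here; the learned weights should land near the true ones.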


## Assumptions
- Linear relationship between the independent variables and the log odds of the dependent variable.
- No extreme outliers.
- A large sample size.
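The first assumption describes the model itself: the log odds $\log\frac{p}{1-p}$ it predicts are a linear function $w \cdot x + b$ of the inputs. The sketch below checks this on synthetic data by comparing the logit of `predict_proba` with the linear score built from `coef_` and `intercept_`. The data-generating weights are an assumption for illustration. This shows what the model imposes; it does not test whether real data satisfy the assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): labels drawn from a logistic model with known weights
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
true_logit = X @ np.array([1.5, -2.0, 0.5]) + 0.3
y = (rng.random(300) < 1 / (1 + np.exp(-true_logit))).astype(int)

clf = LogisticRegression().fit(X, y)

p = clf.predict_proba(X)[:, 1]                            # predicted P(y = 1 | x)
log_odds = np.log(p) - np.log1p(-p)                       # log(p / (1 - p))
linear_score = X @ clf.coef_.ravel() + clf.intercept_[0]  # w·x + b

print(np.allclose(log_odds, linear_score))                # True
```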

## Evaluating Logistic Regression
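- Accuracy: proportion of all predictions that are correct.
    - accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision: proportion of positive predictions that are actually positive.
    - precision = TP / (TP + FP)
- Recall: proportion of actual positive instances that are correctly predicted as positive.
    - recall = TP / (TP + FN)
- F1: the harmonic mean of precision and recall.
    - F1 = 2 × precision × recall / (precision + recall)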
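The metric definitions can be computed by hand from confusion-matrix counts and cross-checked against `sklearn.metrics`. The labels are made up for illustration, with 1 as the positive class.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels, made up for illustration (1 = positive class)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # 3
fp = sum(t == 0 and p == 1 for t, p in pairs)  # 1
fn = sum(t == 1 and p == 0 for t, p in pairs)  # 2
tn = sum(t == 0 and p == 0 for t, p in pairs)  # 4

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.700 precision=0.750 recall=0.600 f1=0.667

# The same metrics from scikit-learn, for comparison
print(f"sklearn: {accuracy_score(y_true, y_pred):.3f} {precision_score(y_true, y_pred):.3f} "
      f"{recall_score(y_true, y_pred):.3f} {f1_score(y_true, y_pred):.3f}")
```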

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# max_iter raised so the default lbfgs solver can converge on the unscaled features
clf = LogisticRegression(max_iter=10000, random_state=0)

# Hold-out split: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=75)
clf.fit(X_train, y_train)

# Accuracy on the held-out test set, as a percentage
acc = accuracy_score(y_test, clf.predict(X_test)) * 100
print(f"Logistic Regression model accuracy: {acc:.2f}%")
```

Output:

```
Logistic Regression model accuracy: 98.25%
```
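A single 80/20 split gives one accuracy figure. As a sketch, stratified 5-fold cross-validation with `cross_validate` reports the mean and spread of accuracy, precision, recall and F1 for the same classifier; the fold count and shuffling seed are assumptions. scikit-learn treats label 1 as the positive class, which in this dataset is *benign* (0 is malignant), so precision and recall here refer to benign predictions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=10000, random_state=0)

# Stratified folds keep the class ratio roughly equal in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=75)
metrics = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(clf, X, y, cv=cv, scoring=metrics)

for metric in metrics:
    scores = results[f"test_{metric}"]  # one score per fold
    print(f"{metric:>9}: {scores.mean():.3f} ± {scores.std():.3f}")
```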
