# Logistic Regression for Machine Learning
*Curtis Miller*

**Logistic regression** (also referred to as a **logit model**) is a form of regression that, given an observation's features, produces the probability that the observation belongs to a certain class. Logit models are common in machine learning and are popular statistical models in general, appearing in fields such as economics and medicine.
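
To make "produces a probability" concrete, here is a minimal sketch of the calculation a binary logit model performs: a linear combination of the features is passed through the logistic (sigmoid) function. The coefficients, intercept, and feature values below are made up purely for illustration and do not come from any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and intercept, chosen only for illustration
coef = np.array([0.8, -0.5])
intercept = 0.1

x = np.array([1.0, 2.0])      # one observation with two features
z = intercept + coef @ x      # linear predictor (the log-odds)
prob = sigmoid(z)             # probability of belonging to the class
print(prob)
```

A linear predictor of zero corresponds to a probability of exactly 0.5; larger values push the probability toward 1 and smaller values toward 0.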

After fitting a logit model, we make predictions using the probability the model assigns to an observation belonging to a certain class. Typically we predict the class to which the observation is most likely to belong; in other words, if the probability that an observation belongs to a class is greater than 0.5, we predict it belongs to that class. (In principle we could choose a threshold other than 0.5.)
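
The thresholding rule can be sketched as follows. This example uses synthetic data from `make_classification` rather than a real dataset, and the stricter threshold of 0.8 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for a real dataset (an assumption for illustration)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Column 1 of predict_proba holds the probability of class 1
p_class1 = model.predict_proba(X)[:, 1]

# The usual rule: predict class 1 when its probability exceeds 0.5
default_pred = (p_class1 > 0.5).astype(int)

# A stricter, hypothetical threshold: require a probability above 0.8
strict_pred = (p_class1 > 0.8).astype(int)

print("Predicted class 1 at threshold 0.5:", default_pred.sum())
print("Predicted class 1 at threshold 0.8:", strict_pred.sum())
```

Raising the threshold can only reduce the number of observations assigned to class 1, which trades recall for precision on that class.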

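Logit models are considered linear models because the log-odds of class membership are a linear function of the features, but by transforming the features the model uses (for example, adding squared or interaction terms) we can express non-linear relationships.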
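
As a rough illustration of changing the features, the sketch below compares a plain logit model with one fitted on degree-2 polynomial features. The data is synthetic (two concentric circles from `make_circles`), chosen because no straight line separates the classes; using `PolynomialFeatures` is one possible transformation, not the only one.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: an inner circle (class 1) surrounded by an outer circle (class 0)
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

# Logit model on the raw features: the decision boundary is a straight line
linear_model = LogisticRegression().fit(X, y)

# Logit model on squared and interaction terms: the boundary can be curved
poly_model = make_pipeline(PolynomialFeatures(degree=2),
                           LogisticRegression()).fit(X, y)

linear_acc = linear_model.score(X, y)
poly_acc = poly_model.score(X, y)
print("Accuracy with raw features:       ", linear_acc)
print("Accuracy with polynomial features:", poly_acc)
```

The second model is still linear in its (transformed) inputs; only the feature set changed.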

Logit models are implemented in **scikit-learn** in the `LogisticRegression` class.

Again we will work with the *Titanic* dataset. We will make the same transformations that were needed when fitting an SVM.

In [None]:
import pandas as pd
from pandas import DataFrame
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.metrics import classification_report

In [None]:
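titanic = pd.read_csv("titanic.csv")
# Encode Sex as a number (male = 0, female = 1)
titanic.replace({'Sex': {'male': 0, 'female': 1}}, inplace=True)
# Drop the free-text Name column, which the model cannot use directly
titanic.drop("Name", axis=1, inplace=True)
# One-hot encode passenger class, then remove the original Pclass column
titanic = titanic.join(pd.get_dummies(titanic.Pclass, prefix='Pclass')).drop("Pclass", axis=1)
# Split into training and test sets (by default, 25% of rows go to the test set)
titanic_train, titanic_test = train_test_split(titanic)
titanic_train.head()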

## Fitting a Logit Model

Fitting logit models is similar to what we've seen before.

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
logit = LogisticRegression()
logit.fit(X=titanic_train.drop("Survived", axis=1),
          y=titanic_train.Survived)
# Example prediction for a single observation; the values must be listed
# in the same order as the training columns
logit.predict([[0, 26, 0, 0, 30, 0, 1, 0]])

In [None]:
logit.predict_proba([[0, 26, 0, 0, 30, 0, 1, 0]])    # What is the probability of belonging to certain classes?
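
To read the output of `predict_proba`, it helps to know that it returns one row per observation and one column per class, with columns ordered as in the fitted model's `classes_` attribute. The sketch below shows this on a small made-up dataset; the column names `age` and `fare` and the rule that generates the labels are hypothetical. It also builds the new observation as a one-row `DataFrame` with the training column names, which makes the feature order explicit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Small made-up dataset; column names and the labeling rule are hypothetical
rng = np.random.default_rng(0)
X = pd.DataFrame({"age": rng.uniform(1, 80, 100),
                  "fare": rng.uniform(5, 100, 100)})
y = (X["fare"] + rng.normal(0, 20, 100) > 50).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Build the new observation with the same column names used for training
new_obs = pd.DataFrame([[26, 30]], columns=X.columns)
proba = model.predict_proba(new_obs)

print(model.classes_)   # class labels, in the column order of predict_proba
print(proba)            # one row per observation, one column per class
```

Each row of the result sums to 1, so in the binary case the second column is simply one minus the first.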

In [None]:
print(classification_report(titanic_train.Survived, logit.predict(titanic_train.drop("Survived", axis=1))))

Let's see the logit model's performance on test data.

In [None]:
print(classification_report(titanic_test.Survived, logit.predict(titanic_test.drop("Survived", axis=1))))